have both appeared recently and are getting remarkably close to the original DALL-E (maybe better, but I can't test the real thing...)
So - this was pretty good timing if OpenAI want to appear to be ahead of the pack. Of course I'd always pick a model I can actually use over a better one I'm not allowed to...
With GLIDE I think we've reached something of a plateau in terms of architecture on the "text to image generator S curve". DALL-E 2 is a very similar architecture to GLIDE and has some notable downsides (poorer language understanding).
glid-3 is a relatively small model trained by a single guy on his workstation (aka me), so it's not going to be as good. It's also not fully baked yet, so YMMV, although it really depends on the prompt. The new latent diffusion model is really amazing though, and is much closer to DALL-E 2 for 256px images.
I think the open source community will rapidly catch up with OpenAI in the coming months. The data, code and compute are all there to train a model of similar size and quality.
glid-3 is trained specifically on photographic-style images, and is a bit better at generalization compared to the latent diffusion model.
e.g. prompt: "half human half Eiffel tower. A human Eiffel tower hybrid" (I get mostly normal Eiffel towers from LDM but some sensible results from glid-3)
glid-3 will be worse for things that require detailed recall, like a specific person.
With smaller models you kind of have to generate a lot of samples and pick out the best ones.
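For what it's worth, the generate-a-lot-and-pick loop can be sketched roughly like this. This is only an illustration, not glid-3's actual code: `pick_best`, `fake_generate` and `fake_score` are hypothetical names, and the scorer is a stand-in (in practice you'd pick by eye, or plug in something like a text-image similarity score, which is my assumption rather than anything the notebooks are confirmed to do).

```python
import heapq
import random
from typing import Callable, List, Tuple, TypeVar

Sample = TypeVar("Sample")


def pick_best(
    prompt: str,
    generate: Callable[[str, int], Sample],
    score: Callable[[str, Sample], float],
    n_samples: int = 16,
    keep: int = 4,
) -> List[Tuple[float, Sample]]:
    """Generate n_samples candidates for a prompt and keep the top-scoring ones."""
    # One candidate per seed, so a good result can be reproduced later.
    candidates = [generate(prompt, seed) for seed in range(n_samples)]
    scored = [(score(prompt, c), c) for c in candidates]
    # Highest score first; only the best `keep` are returned.
    return heapq.nlargest(keep, scored, key=lambda pair: pair[0])


# Hypothetical stand-ins so the sketch runs without any model installed.
def fake_generate(prompt: str, seed: int) -> str:
    return f"{prompt} [sample {seed}]"


def fake_score(prompt: str, sample: str) -> float:
    # Deterministic pseudo-score; a real scorer would look at the image.
    return random.Random(sample).random()


if __name__ == "__main__":
    for s, img in pick_best("half human half Eiffel tower", fake_generate, fake_score):
        print(f"{s:.3f}  {img}")
```

Swap the two stand-ins for a real sampler and scorer; the point is just that with a small model the budget goes into `n_samples` rather than into any single sample.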
They're also not censored on the dataset front and thus produce much more interesting outputs.
OpenAI has a low-resolution checkpoint with similar functionality to this, called GLIDE, and its output is super boring compared to community-driven efforts, in large part because of dataset restrictions similar to the ones this has likely been subjected to.