Hacker News

Some freely available models

GLID-3: https://colab.research.google.com/drive/1x4p2PokZ3XznBn35Q5B...

and a new Latent Diffusion notebook: https://colab.research.google.com/github/multimodalart/laten...

have both appeared recently and are getting remarkably close to the original Dall-E (maybe even better, though I can't test the real thing...)

So - this was pretty good timing if OpenAI want to appear to be ahead of the pack. Of course I'd always pick a model I can actually use over a better one I'm not allowed to...



With GLIDE I think we've reached something of a plateau in terms of architecture on the "text to image generator S curve". DALL-E 2 has a very similar architecture to GLIDE and has some notable downsides (poorer language understanding).

glid-3 is a relatively small model trained by a single guy on his workstation (aka me) so it's not going to be as good. It's also not fully baked yet so ymmv, although it really depends on the prompt. The new latent diffusion model is really amazing though and is much closer to DALL-E 2 for 256px images.

I think the open source community will rapidly catch up with OpenAI in the coming months. The data, code and compute are all there to train a model of similar size and quality.


Wow. Thanks for GLID-3. It was genuinely exciting for a few days but then I must admit latent diffusion stole my attention somewhat ;-)

What kind of prompts is GLID-3 especially good for? I remember getting lucky when I was playing around a few times but I didn't do it systematically.


glid-3 is trained specifically on photographic-style images, and is a bit better at generalization compared to the latent diffusion model.

e.g. prompt: half human half Eiffel tower. A human Eiffel tower hybrid (I get mostly normal Eiffel towers from LDM, but some sensible results from glid-3)

glid-3 will be worse for things that require detailed recall, like a specific person.

With smaller models you kind of have to generate a lot of samples and pick out the best ones.
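The generate-a-lot-and-pick approach can be sketched roughly like this. Both `generate_sample` and `score_sample` are hypothetical stand-ins: in practice the first would call the diffusion model and the second could be an automatic image/text similarity score or just a human looking at a grid. Neither is part of glid-3 or the latent diffusion notebook.

```python
import random
from typing import Callable, List, Tuple


def generate_sample(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one model call; returns a fake image id."""
    return f"{prompt}#{rng.randrange(10**6)}"


def score_sample(prompt: str, sample: str) -> float:
    """Hypothetical stand-in for a quality score (higher is better)."""
    # Toy deterministic score derived from the fake image id.
    return (int(sample.rsplit("#", 1)[1]) % 1000) / 1000.0


def best_of_n(
    prompt: str,
    n: int,
    keep: int,
    generate: Callable[[str, random.Random], str],
    score: Callable[[str, str], float],
    seed: int = 0,
) -> List[Tuple[float, str]]:
    """Generate n samples and return the `keep` highest-scoring ones."""
    rng = random.Random(seed)
    samples = [generate(prompt, rng) for _ in range(n)]
    scored = [(score(prompt, s), s) for s in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:keep]


top = best_of_n("a human Eiffel tower hybrid", n=16, keep=4,
                generate=generate_sample, score=score_sample)
for value, sample in top:
    print(f"{value:.3f}  {sample}")
```

The only real knobs are `n` (how many samples you can afford) and `keep` (how many you want to look at afterwards).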


Thanks!

Do you happen to know how much GPU RAM I need to run glid-3 and/or the latent diffusion model, if I don't want to run on colab?


Just tried glid-3 with a batch size of one and I'm getting 4781 MiB. The latent diffusion model peaks at 8403 MiB.

These are fp16 numbers though, so you might need a recent NVIDIA card to run it.


I'll try them out. I have an RTX 2070, which apparently supports fp16. But it only has 8GB RAM.

I used the instructions here to check: https://github.com/wang-xinyu/tensorrtx/blob/master/tutorial...
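A quick back-of-the-envelope check of the peak numbers quoted above against an 8 GB card, as a sketch. It assumes "8GB" means 8 GiB (8192 MiB) and ignores whatever the driver and desktop already occupy, so the real headroom would be smaller.

```python
MIB_PER_GIB = 1024

# Peak usage reported in the thread (fp16, batch size 1).
PEAK_MIB = {"glid-3": 4781, "latent diffusion": 8403}


def headroom_mib(card_gib: float, peak_mib: int) -> float:
    """Memory left over on the card in MiB; negative means it does not fit."""
    return card_gib * MIB_PER_GIB - peak_mib


for name, peak in PEAK_MIB.items():
    room = headroom_mib(8, peak)
    verdict = "fits" if room >= 0 else "does not fit"
    print(f"{name}: {verdict} ({room:+.0f} MiB headroom)")
```

Under those assumptions glid-3 leaves 3411 MiB spare, while the latent diffusion peak is 211 MiB over the card's total.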


They're also not censored on the dataset front and thus produce much more interesting outputs.

OpenAI has a low-resolution checkpoint with similar functionality to this, called GLIDE, and its output is super boring compared to community-driven efforts, in large part because of dataset restrictions similar to those this has likely been subjected to.


How do you run such a Google Colab thing?

I don't see a run button?

Oh.. maybe "Runtime -> Run All" from the menu ...

Shows me a spinning circle around "Download model" ...

26% ...

Fascinating, that Google offers you a computer in the cloud for free ..

Now it is running the model. Wow, I'm curious ..

Ha, it worked!

Nothing compared to the images in the Dall-E 2 article but still impressive.


Google is a company with a lot of spare VMs and GPUs.

However, the free GPU is now a K80 which is obsolete and barely sufficient for running these types of models.


You sometimes still get T4s. I got one last week and it was great.
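To see which GPU a session actually got, something like the following works as a sketch. It shells out to `nvidia-smi` with its CSV query flags; the helper names are made up for illustration, and the `Tesla T4, 15109 MiB` line in the docstring is only an example of the output shape, not a measured value. In a Colab cell, simply running `!nvidia-smi` shows the same information.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def parse_gpu_line(line: str) -> Tuple[str, int]:
    """Parse one CSV line such as 'Tesla T4, 15109 MiB' into (name, MiB)."""
    name, memory = (part.strip() for part in line.rsplit(",", 1))
    return name, int(memory.split()[0])


def list_gpus() -> Optional[List[Tuple[str, int]]]:
    """Return [(name, total MiB), ...], or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


print(list_gpus())
```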


I think this is really neat, but definitely not on the same tier as DALL-E 2, at least from the cherry-picked images I saw.


I'm not sure what you've seen but I've been very impressed indeed by some results I've obtained. Some less so.

It's hard to compare because we don't know how much cherry picking is going on with published Dall-E results (either v1 or v2)

My gut feeling is that it's in the same ballpark as Dall-E 1


a cow and a farmer in their field looking at the sky



