
Fill 3D takes a different approach than diffusion, in that it tries to build an actual 3D scene (kinda like a clone) of what's in the image you upload. In some sense, that's actually the most fundamental representation of your image's content (or, said another way, your image is just one rendering of that original scene).

So it works by trying to estimate a 3D 'room' that matches your image: everything from the geometry, to the light fixtures, to the windows. It's heavily inspired by how humans (weird to contrast 'human' with AI work) do image/video compositing.

TL;DR: Image in, 3D scene out.
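
To make "image in, 3D scene out" concrete, here's a minimal sketch of the geometry half only. This is not Fill 3D's actual pipeline (which isn't described here); it just assumes a pinhole camera and a per-pixel depth map from some monocular depth estimator, then back-projects pixels into a camera-space point cloud. The intrinsics (fx, fy, cx, cy) are made-up example values.

    # Minimal sketch: lift a depth map to a 3D point cloud with a
    # pinhole camera model. A real system would get `depth` from a
    # monocular depth network and fit planes/meshes/lights on top.
    import numpy as np

    def backproject(depth, fx, fy, cx, cy):
        """Lift an HxW depth map to an (H*W, 3) point cloud in camera space."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
        z = depth
        x = (u - cx) * z / fx  # pinhole model: x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Toy example: a flat "wall" 3 m away, 640x480 image, made-up intrinsics.
    depth = np.full((480, 640), 3.0)
    points = backproject(depth, fx=554.0, fy=554.0, cx=320.0, cy=240.0)
    print(points.shape)  # (307200, 3)

From a point cloud like this, layout estimation is typically a matter of fitting planes (walls, floor, ceiling) and segmenting objects; lighting estimation is a separate problem on top.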



Could you elaborate on how that's done technically? I'm curious how you estimate the 3D room. Are you using ML-based estimation like LayoutNet? How about the lighting?



