
In the highly visible consumer space, search engines / answer engines (like ChatGPT) are the big ones.

On the other hand, it has also led to improvements in many places hidden behind the scenes. For example, vision transformers are much more powerful and scalable than many earlier computer vision models, which has probably unlocked new capabilities.
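
To make that concrete, here's a rough sketch of the vision transformer idea in PyTorch. It's purely illustrative (the class name, sizes, and layer counts are made up, not any particular published model), but it shows the core move: chop the image into patch tokens and let self-attention do the rest.

    import torch
    import torch.nn as nn

    class TinyViT(nn.Module):
        """Minimal ViT-style classifier: patchify -> transformer encoder -> pool."""
        def __init__(self, image_size=224, patch_size=16, dim=256, depth=4, heads=8, num_classes=1000):
            super().__init__()
            num_patches = (image_size // patch_size) ** 2
            # Patch embedding: a strided conv turns the image into a grid of tokens.
            self.patchify = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
            self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, dim))
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.head = nn.Linear(dim, num_classes)

        def forward(self, images):                        # images: (B, 3, H, W)
            tokens = self.patchify(images)                # (B, dim, H/16, W/16)
            tokens = tokens.flatten(2).transpose(1, 2)    # (B, num_patches, dim)
            tokens = self.encoder(tokens + self.pos_emb)  # self-attention over patches
            return self.head(tokens.mean(dim=1))          # mean-pool, then classify

    logits = TinyViT()(torch.randn(2, 3, 224, 224))       # -> (2, 1000)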

In general, transformers aren't just for generating text; they're a new foundational model architecture that enables a step change in many tasks that require modeling!


Transformers also make for a damn good base to graft just about any other architecture onto.

Like, vision transformers? They seem to work best when they still have a CNN backbone, but the "transformer" component is very good at focusing on the relevant information and at adapting to whatever you want done with those images.
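
For a concrete (and heavily simplified) picture of that hybrid, here's a sketch in PyTorch: a small ResNet backbone produces a feature map, each spatial location becomes a token, and a transformer encoder attends over those tokens. The sizes and class name are invented for illustration, and positional encodings are omitted for brevity.

    import torch
    import torch.nn as nn
    import torchvision

    class HybridViT(nn.Module):
        """CNN backbone extracts features; a transformer encoder attends over them."""
        def __init__(self, dim=256, depth=4, heads=8, num_classes=10):
            super().__init__()
            resnet = torchvision.models.resnet18(weights=None)
            # Keep everything up to the final feature map (drop avgpool and fc).
            self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # -> (B, 512, H/32, W/32)
            self.proj = nn.Conv2d(512, dim, kernel_size=1)                # match transformer width
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.head = nn.Linear(dim, num_classes)

        def forward(self, images):
            feats = self.proj(self.backbone(images))   # (B, dim, h, w)
            tokens = feats.flatten(2).transpose(1, 2)  # (B, h*w, dim): one token per location
            tokens = self.encoder(tokens)              # attention decides which locations matter
            return self.head(tokens.mean(dim=1))

    logits = HybridViT()(torch.randn(2, 3, 224, 224))  # -> (2, 10)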

And if you bolt that hybrid vision transformer onto an even larger language-oriented transformer? That also imbues it with basic problem-solving, world knowledge, and commonsense reasoning capabilities, which are very welcome in things like advanced OCR systems.
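
Here's a very rough sketch of what that "bolting on" can look like, again just illustrative: a learned projection maps vision tokens into the language model's embedding space, they get prepended to the text tokens, and one transformer attends over the joint sequence. Real systems do this with large pretrained components and a causal decoder; the plain encoder below is a stand-in.

    import torch
    import torch.nn as nn

    class TinyVLM(nn.Module):
        """Toy vision-language model: project image tokens into the LM's
        embedding space and prepend them to the text tokens, so one
        transformer attends over both modalities at once."""
        def __init__(self, vision_dim=256, lm_dim=512, vocab_size=32000, depth=6, heads=8):
            super().__init__()
            self.token_emb = nn.Embedding(vocab_size, lm_dim)
            # The "glue": a learned projection from vision features to LM width.
            self.vision_proj = nn.Linear(vision_dim, lm_dim)
            layer = nn.TransformerEncoderLayer(d_model=lm_dim, nhead=heads, batch_first=True)
            self.lm = nn.TransformerEncoder(layer, num_layers=depth)  # stand-in for a causal decoder LM
            self.lm_head = nn.Linear(lm_dim, vocab_size)

        def forward(self, vision_tokens, text_ids):
            # vision_tokens: (B, N, vision_dim) from a (hybrid) vision transformer
            # text_ids:      (B, T) token ids, e.g. an OCR-style prompt
            img = self.vision_proj(vision_tokens)
            txt = self.token_emb(text_ids)
            seq = torch.cat([img, txt], dim=1)  # image tokens first, then text
            return self.lm_head(self.lm(seq))   # logits over the joint sequence

    model = TinyVLM()
    logits = model(torch.randn(2, 196, 256), torch.randint(0, 32000, (2, 16)))  # -> (2, 212, 32000)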
