
+1, I am also a big user of PGMs, and also a big user of transformers, and I don't know what the parent comment is talking about, beyond that for e.g. LLMs, sampling the next token can be thought of as sampling from a conditional distribution (of the next token, given the previous tokens). But that connection is about autoregressive generation and training with a next-token prediction loss, not about the transformer architecture itself, which mostly seems to be good because it is expressive and scalable (i.e., it can be hardware-optimized).
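To make the conditional-distribution view concrete, here is a minimal sketch of an autoregressive sampling loop. The `model(tokens)` callable is a placeholder for whatever returns next-token logits, not any particular library's API; the point is just that each step softmaxes logits into p(next token | previous tokens) and samples from it:

    import numpy as np

    def sample_next(logits, temperature=1.0):
        # Softmax over the logits gives the conditional distribution
        # p(x_t | x_<t); sampling from it picks the next token.
        z = logits / temperature
        z = z - z.max()  # subtract max for numerical stability
        probs = np.exp(z) / np.exp(z).sum()
        return int(np.random.choice(len(probs), p=probs))

    def generate(model, prompt_tokens, n_new):
        # Autoregressive generation: each sampled token is appended
        # and conditioned on at the next step.
        tokens = list(prompt_tokens)
        for _ in range(n_new):
            logits = model(tokens)  # hypothetical: logits over the vocab
            tokens.append(sample_next(np.asarray(logits)))
        return tokens

Nothing here is transformer-specific: any model that maps a prefix to next-token logits (an RNN, an n-gram table) plugs into the same loop, which is the sense in which the "sampling from a conditional" part is about the training/generation setup rather than the architecture.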

Source: I am a PhD student; this is kinda my wheelhouse.




