
Any idea what "output token efficiency" refers to? Gemini Flash is billed by number of input/output tokens, which I assume is fixed for the same output, so I'm struggling to understand how it could result in lower cost. Unless of course they have changed tokenization in the new version?


They provide the answer in fewer words (while still conveying what needs to be said).

Which is a good thing in my book, as the current models are far too verbose (and I suspect one of the reasons is the billing by tokens).


The post implies that the new models are better at thinking, so less time/cost is spent overall.

The first chart implies the gains are minimal for non-thinking models.


Models are less verbose, so they produce fewer output tokens, and answers therefore cost less.
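
A rough sketch of the arithmetic, for anyone wondering how cost drops without any change to pricing or tokenization. The per-million-token prices below are made up for illustration, not Gemini Flash's actual rates:

    # Hypothetical prices (USD per 1M tokens) -- not actual Gemini Flash rates.
    INPUT_PRICE_PER_M = 0.10
    OUTPUT_PRICE_PER_M = 0.40

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Cost of one request billed purely by token counts."""
        return (input_tokens * INPUT_PRICE_PER_M +
                output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

    # Same prompt, same tokenizer: only the answer length differs.
    verbose = request_cost(input_tokens=1_000, output_tokens=800)
    concise = request_cost(input_tokens=1_000, output_tokens=400)
    print(f"verbose: ${verbose:.6f}, concise: ${concise:.6f}")
    # The concise answer is cheaper even though the rates are identical.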





