
I would guess the “secret sauce” here is distillation: pretraining on an extremely high-quality synthetic dataset built from the prompted output of their state-of-the-art models like o3, rather than on generic internet text. A number of research results have shown that highly curated technical problem-solving data is unreasonably effective at boosting smaller models’ performance.

This would be much more efficient than relying purely on RL post-training on a small model; with low baseline capabilities, the useful learning signal would be very sparse and the training very inefficient.
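
To make the idea concrete, here is a minimal sketch of that kind of synthetic-data distillation, not anything from OpenAI's actual pipeline: prompt a strong teacher on curated problems, save the transcripts, and fine-tune the student on them with ordinary supervised learning. `query_teacher` is a hypothetical placeholder for a frontier-model API call; the fine-tuning step itself is assumed to be a standard next-token-prediction loop.

    # Sketch: build a distillation dataset from a teacher model's outputs.
    # `query_teacher` is a hypothetical stand-in for a real model API call.
    import json
    from typing import Callable

    def build_distillation_set(problems: list[str],
                               query_teacher: Callable[[str], str],
                               out_path: str) -> None:
        """Prompt the teacher on curated problems and write (prompt, completion)
        pairs as JSONL for supervised fine-tuning of a small student model."""
        with open(out_path, "w", encoding="utf-8") as f:
            for problem in problems:
                solution = query_teacher(
                    "Solve step by step, showing your reasoning:\n" + problem
                )
                f.write(json.dumps({"prompt": problem,
                                    "completion": solution}) + "\n")

    if __name__ == "__main__":
        # Stand-in teacher so the sketch runs without any external API.
        fake_teacher = lambda p: "worked solution for: " + p
        build_distillation_set(["Prove that sqrt(2) is irrational."],
                               fake_teacher, "distill.jsonl")

The point of the contrast with RL is that every example here carries a dense supervised signal, whereas a weak student exploring on its own rarely stumbles onto rewarding trajectories.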



> research results have shown that highly curated technical problem solving data is unreasonably effective at boosting smaller models’ performance.

The same seems to be true for humans.


Yes, if I understand correctly, what it means is "a very smart teacher can do wonders for their pupils' education".


Wish they gave us access to learn from those grandmother models instead of distilled slop.


It behooves them to keep the best stuff internal, or at least greatly limit any API usage to avoid giving the goods away to other labs they are racing with.


Which, presumably, is the reason they removed 4.5 from the API: the only people willing to pay that much for that model were mostly their competitors. (I mean, I would pay even more than they were charging, but even if I scaled out my use cases, which for just me are mostly satisfied by being trapped in their UI, I imagine it would be a pittance compared to the simpler stuff people keep using.)



