Hacker News

Is it? Training is only done once; inference requires GPUs to scale, especially for a 685B-parameter model. And now there's an open-source o1-equivalent model that companies can run locally, which means there's a much bigger market for underutilized on-prem GPUs.
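To make the scaling point concrete, here's a rough back-of-envelope for why a 685B-parameter model needs multiple GPUs just to hold its weights. The precision choices and the 80 GB-per-GPU figure are my own assumptions for illustration (roughly an H100-class card), not anything stated in the thread, and this ignores KV cache and activation memory entirely:

```python
import math

# Sketch: weight memory for a 685B-parameter model at a few precisions.
# 80 GB per GPU is an assumed H100-class figure; KV cache and
# activations (which add substantially more at serving time) are ignored.
PARAMS = 685e9
GPU_MEM_GB = 80

for name, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    weight_gb = PARAMS * bytes_per_param / 1e9
    gpus = math.ceil(weight_gb / GPU_MEM_GB)
    print(f"{name}: ~{weight_gb:,.0f} GB of weights -> >= {gpus} x 80 GB GPUs")
```

Even at aggressive quantization the weights alone span several cards, and every concurrent user adds cache overhead on top, which is why serving at scale, not the one-time training run, dominates GPU demand in this picture.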


I'd be really curious about the hardware split between training and inference. My read was that the ratio is heavily skewed: training isn't a significant portion of the required hardware, and instead inference at scale soaks up most of the available datacenter GPU share.

Could be entirely wrong here - would love a fact-check from an industry insider or journalist.



