Is it? Training is only done once; inference requires GPUs to scale, especially for a 685B-parameter model. And now there's an open-source o1-equivalent model that companies can run locally, which means there's a much bigger market for underutilized on-prem GPUs.
I'd be really curious about the hardware split between training and inference - my read was that the ratio is lopsided enough that training isn't a significant share of the required hardware; instead, inference at scale soaks up most of the available datacenter GPU capacity.
Could be entirely wrong here - would love a fact-check from an industry insider or journalist.