Also, next year, there will be GPT 5. I find it fascinating how much attention small models get, when at the same time the big models just get bigger and prohibitively expensive to train. No leading lab would do that if they thought there was a decent chance that small models could compete.
So who will be interested in a shitty assistant next year when you can have an amazing one, is what I wonder? Is this just the biggest cup of wishful thinking that we have ever seen?
If I’ve raised $1B to buy GPUs and train a “bigger model”, a major part of my competitive advantage is having $1B to spend on sufficient GPUs to train a bigger model.
If, after having raised that money, it becomes apparent that consumer hardware can run smaller, optimized models that perform as well without all that money going into training them, how am I going to pivot my business to something that works, given that these smaller models are released this way on purpose to undermine my efforts?
It seems there are two major possibilities: one, people raising billions find a new and expensive intelligence step function that at least time-locally separates them from the pack, or two (and significantly more likely in my view) they don’t, and the improvements come from layering on different systems that do not require acres of GPUs, while the “more data more GPUs” crowd is found to have hit a nonlinearity that in practical terms means they are generations of technology away from the next tier.
Is it still even worth the electricity to do this on a GPU? It wouldn’t surprise me if some startups were renting them out, but is anyone still mining any volume of crypto on GPUs?
edit: I guess to your point if it is not knowingly then the electricity costs are not a factor either.
What you suggest is not impossible but simply flies in the face of all currently available evidence and what all leading labs say and do. We know they are actively looking for ways to do things more efficiently. OpenAI alone has made a couple of releases to that effect. Because of how easy it is to switch providers, if only one lab found a way to run a small model that competed with the big ones, it would simply win the entire space, so everyone has to be looking for that (and clearly they are, given that all of them do have smaller versions of their models).
Scepticism is fine if it's plausible. If not, it's conspiratorial.
There are at least two different optimizations happening:
1) optimizing the model training
2) optimizing the model operation
The $1B-spend holy grail is that it costs a lot of money to train, and almost nothing to operate, a proprietary model that benchmarks and chats better than anyone else’s.
OpenAI’s optimizations fall into the latter category. The risk to the business model is in the former — if someone can train a world-beating model without lots of money, it’s a tough day for the big players.
I disagree. Not axiomatically, because you’re kind of right, but enough to comment. OpenAI doesn’t believe in optimizing the training costs of AI but believes in optimizing (read: maxing) the training period. Their billions go to collecting, collating, and transforming as much training data as they can get their hands on.
To see what optimizing model operation looks like, Groq is a good example. OpenAI isn’t (yet) obviously doing that kind of optimization, though I’m sure they’re working on it internally.
My argument wasn’t that the well-funded entities were optimizing to reduce training costs, but the opposite: they need creative ways to spend $1B that provide some tangible advantage. But they need operating costs to be low or they lose money and try to somehow make it up on volume.
I would roll data acquisition/cleaning processes into training costs for purposes of this because what else is the data for if not training?
If 4o wasn’t an optimization for model operation costs, what was it?
Why would anyone buy a Raspberry Pi when they can get a fully decked out Mac Pro?
There are different use cases and computers are already pretty powerful. Maybe your local model won't be able to produce tests that check all the corner cases of the class you just wrote for work in your massive code base.
But the small model is perfectly capable of summarizing the weather from an API call and maybe tacking on a joke that can be read out to you on your speakers in the morning.
My memory is fuzzy, but I recall that some models had very limited hardware acceleration support in the driver stack for things like video codecs, OpenCL, and Vulkan, unless you used the official kernel with the Broadcom blob. I never liked running that due to bloat and the age of the kernel/Debian they ship. All that, combined with the performance of the SoC compared to its peers from Rockchip/MediaTek/Samsung and the lack of eMMC support, pretty much drove me away from Raspberry Pi devices in favor of Radxa and ODROID boards.
One of the reasons I run local is that the models are completely uncensored and unfiltered. If you're doing anything slightly 'risky' the only thing APIs are good for is a slew of very politely written apology letters, and the definition of 'risky' will change randomly without notice or fail to accommodate novel situations.
It is also evident in the moderation that your usage is subject to human review and I don't think that should even be possible.