It doesn't need to: during inference there's little data exchange between one chip and another (just a single embedding vector per token).
It's completely different during training, where the backward pass and weight updates put a lot of strain on inter-chip communication, but during inference even a x4 PCIe 4.0 link is enough to connect GPUs together without losing speed.
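A back-of-envelope sketch of why the link is enough. The hidden size, precision, and token rate below are illustrative assumptions, not measurements of any particular model:

```python
# How much inter-GPU bandwidth does pipeline-parallel inference need?
# (Assumed numbers: 4096-dim activations, fp16, 50 tokens/s decode.)
hidden_size = 4096        # width of the embedding vector crossing the link
bytes_per_value = 2       # fp16
tokens_per_second = 50    # decode speed

# One activation vector crosses each pipeline boundary per token.
bytes_per_token = hidden_size * bytes_per_value
bandwidth_needed = bytes_per_token * tokens_per_second  # bytes/s

pcie4_x4 = 8e9  # roughly 8 GB/s usable on a PCIe 4.0 x4 link

print(f"needed: {bandwidth_needed / 1e6:.2f} MB/s")   # ~0.4 MB/s
print(f"headroom: {pcie4_x4 / bandwidth_needed:.0f}x")
```

Even with much larger models or batched requests, the per-token transfer stays orders of magnitude below what the link provides.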
With the exception of diffusion language models, which work differently but are very niche, language models are autoregressive, which means you do indeed need to process tokens in order.
And that's why model speed is such a big deal, you can't just throw more hardware at the problem because the problem is latency, not compute.
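To make the serial dependency concrete, here is a minimal decode-loop sketch; `model` is a hypothetical stand-in for one full forward pass, and the toy model below is invented for illustration:

```python
# Why autoregressive decoding is latency-bound: each token depends on
# every previous token, so the forward passes cannot run in parallel
# across the sequence, no matter how much hardware you add.
def generate(model, prompt_tokens, n_new):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        # Serial dependency: this call cannot start until the
        # previous iteration has produced its token.
        next_token = model(tokens)
        tokens.append(next_token)
    return tokens

# Toy "model" that predicts the next integer. Wall-clock time is
# n_new sequential forward passes regardless of available compute.
toy = lambda ts: ts[-1] + 1
print(generate(toy, [1, 2, 3], 4))  # -> [1, 2, 3, 4, 5, 6, 7]
```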
Kvark was leading the engineering effort for wgpu while he was at Mozilla.
But he was doing that on his work time and did so collaborating with other Mozilla engineers, whereas AFAIK blade has been more of a personal side project.
And it made almost zero impact: it was just a bigger version of DeepSeek V2 and went mostly unnoticed because its performance wasn't particularly notable, especially for its size.
It was R1, with its RL training, that made the news and crashed the stock market.
And the author completely misses the point thinking it's somehow mandatory in plaster walls, when it's just a convenience thing that avoids making holes in the plaster…
I do appreciate why people want to avoid that, plaster does crumble pretty easily. Combined with 100+ year old lath that is as hard as iron, it can be a mild pain in the ass to hang a picture without doing more damage to the plaster than you want.
Thanks to another comment here I went looking for the strategy guides that are injected. To save everyone else the trouble, here [0]. Look at (e.g.) default/STRATEGY.md.jinja. Also adding a permalink [1] for future readers' sake.
> and you’ll realize that not only are “AI takeover” fears justified
It's quite the opposite, actually: the “AI takeover risk” is manufactured bullshit to make people disregard the actual risks of the technology. That's why Dario Amodei keeps talking about it all the time; it's a red herring to distract people from the real social damage his product is doing right now.
As long as he keeps the media (and regulators) obsessed with hypothetical future risks, they don't spend too much time criticizing and regulating his actual business.
Fortunately deaths are only a fraction of the accidents though, and it's not even necessarily the kind of accident that bothers insurance companies the most as long as the driver only kills himself.
> Credit scores are universally hated but they make it possible to offer lower interest rates to more people.
That's probably true in theory, but not in practice, given how high US credit interest rates are compared to European countries for instance.
> Without credit scores, fewer people would have access to credit.
Too many people having access to credit is exactly how we got the worst financial crisis of the century, so it's not really something to brag about… People talk about US public debt a lot, but private debt is even more worrisome.
> This to me sounds a lot like the SpaceX conversation
The problem is that it is absolutely indiscernible from the Theranos conversation as well…
If Anthropic stopped making false claims about the current capabilities of their models (like “it compiles the Linux kernel” here, and it's far from the first time), maybe neutral people would give them the benefit of the doubt.
For every grifter who happens to succeed at delivering on his grandiose promises (Elon), how many grifters will fail?