1. AMD isn’t different enough. They’d be subject to the same export restrictions and political instability as Nvidia, so why would global companies switch to them?
2. CUDA has been a huge moat, but the incentives are incredibly strong for everybody except Nvidia to change that. The fact that it was an insurmountable moat five years ago in a $5B market does not mean it’s equally powerful in a $300B market.
3. AMD’s culture and core competencies are really not aligned to playing disruptor here. Nvidia is generally more agile and more experimental. It would have taken a serious pivot years ago for AMD to be the right company to compete.
AMD is HIGHLY successful in the GPU compute market. They have the Instinct line, which actually outperforms most Nvidia chips for less money.
It's the CUDA software ecosystem they haven't been able to overcome. AMD has had multiple ecosystem false starts, but ROCm, which is open source and multi-vendor, does finally appear to be taking off.
AMD is unifying their GPU architectures for the next generation (like Nvidia) so that development can be subsidized by sales of gaming and other consumer cards (also like Nvidia).
Why doesn't AMD just write a CUDA translation layer? Yeah, it's a bit difficult to say "just", but they're a pretty big company. It's not like one guy doing it in a basement.
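For what it's worth, AMD's HIP/ROCm stack already takes roughly this route at the source level, and for the runtime-API subset most code touches, the mapping is close to one-for-one. Below is a minimal sketch (not actual hipify output, just an illustration) of a CUDA program with comments showing the HIP names a translation tool would substitute:

```cuda
// Minimal CUDA vector add; comments show the HIP equivalents a source-level
// translation tool (e.g. AMD's hipify tooling) would substitute.
#include <cuda_runtime.h>   // HIP: #include <hip/hip_runtime.h>
#include <cstdio>

__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // identical in HIP
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);               // HIP: hipMallocManaged
    cudaMallocManaged(&b, bytes);               // HIP: hipMallocManaged
    cudaMallocManaged(&c, bytes);               // HIP: hipMallocManaged
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);  // same launch syntax under hipcc
    cudaDeviceSynchronize();                    // HIP: hipDeviceSynchronize

    printf("c[0] = %f (expected 3.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);      // HIP: hipFree
    return 0;
}
```

The renaming really is the easy part; the harder question is matching the behavior and performance of the tuned libraries and driver work that sit underneath those calls.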
Does Nvidia have patents on CUDA? They're probably invalid in China, which would explain why Chinese companies can ship a translation layer and AMD can't.
The CUDA moat is extremely exaggerated for deep learning, especially for inference. It’s simply not hard to do matrix multiplication and a few activation functions here and there.
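To make that concrete, here's a deliberately naive sketch of the core compute in a dense inference layer: one matrix multiply followed by an elementwise ReLU, as a single kernel. The kernel and test values are mine for illustration; real stacks call a tuned GEMM library (cuBLAS, rocBLAS, etc.) rather than hand-rolling this, but the essential compute surface a framework needs is roughly this small.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// C[m x n] = ReLU(A[m x k] * B[k x n]), row-major, one thread per output element.
__global__ void linear_relu(const float* A, const float* B, float* C,
                            int m, int k, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m || col >= n) return;

    float acc = 0.0f;
    for (int i = 0; i < k; ++i)
        acc += A[row * k + i] * B[i * n + col];

    C[row * n + col] = acc > 0.0f ? acc : 0.0f;  // the "activation function here and there"
}

int main() {
    const int m = 64, k = 128, n = 32;
    float *A, *B, *C;
    cudaMallocManaged(&A, m * k * sizeof(float));
    cudaMallocManaged(&B, k * n * sizeof(float));
    cudaMallocManaged(&C, m * n * sizeof(float));
    for (int i = 0; i < m * k; ++i) A[i] = 0.01f;
    for (int i = 0; i < k * n; ++i) B[i] = 0.02f;

    // 16x16 thread blocks tiled over the output matrix.
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (m + block.y - 1) / block.y);
    linear_relu<<<grid, block>>>(A, B, C, m, k, n);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected 0.0256)\n", C[0]);  // 128 * 0.01 * 0.02
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Getting a kernel like this anywhere near a tuned GEMM (tiling, shared memory, tensor cores) is where the real engineering goes, and that part of the moat is not "CUDA the API."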
It regularly shocks me that AMD doesn't release their cards with at least enough CUDA reimplementation to run DL models. As you point out, AI applications use a tiny subset of the overall API, the courts have ruled that APIs can't be protected by copyright, and CUDA is NVIDIA's largest advantage. It seems like an easy win, so I assume there's some good reason.
A very cynical take: AMD's and Nvidia's CEOs are cousins, and there's more money to be made with one dominant monopoly than with two competing companies. And that income could be an existential difference-maker for Taiwan.
AMD can't even figure out how to release decent Linux drivers in a timely fashion. It might not be the largest market, but it would at least have given them a competitive advantage in reaching some developers. Either there is something very incompetent about their software team, or there are business reasons intentionally restraining them.
From what I've been reading, the inference workload tends to ebb and flow throughout the day, with much lower loads overnight than at, say, 10 AM PT / 1 PM ET. I understand companies fill that gap with training (because an idle GPU is the most expensive GPU).
So for data centers, training is just as important as inference.
> So for data centers, training is just as important as inference.
Sure, and I’m not saying buying Nvidia is a bad bet. It’s the most flexible and mature hardware out there, and the huge installed base also means you know future innovations will align with this hardware. But it’s not primarily a CUDA thing or even a software thing. The Nvidia moat is much broader than just CUDA.
And it would be a big bet for AMD. They don't create and manufacture chips "just in time"; it takes man-hours and MONEY to spin up a fab, not to mention marketing dollars.