I tried this (gpt-oss-120b on Cerebras) with Roo Code. It repeatedly failed to use the tools correctly, and then I started getting 429 Too Many Requests errors. So much for the "as fast as I can think" idea!
I'll have to try again later but it was a bit underwhelming.
The latency also seemed pretty high; not sure why. I think with that much latency, the throughput ends up not making much difference.
Btw, Groq has the 20b model at 4000 TPS, but I haven't tried that one.