> By keeping computation and memory on a single wafer-scale processor, we eliminate the data-movement penalties that dominate GPU systems. The result is up to 15× faster inference, without sacrificing model size or accuracy.
I hope all AI tools will reach 300ms response times, even for 200-line diffs. Querying a million rows or telling the user their codebase is wrong used to take minutes, but now it happens instantly.