
Respectfully, I don't think that's true. "The Cloud" is just computers in a warehouse somewhere.

$5/month's worth of "cloud" works out to less raw CPU than a low-end Raspberry Pi running full time in-house.



I don't actually think either of those claims is true.

One second of Google Cloud TPU time delivers roughly the same number of floating-point operations as 4 hours of Raspberry Pi 4B time.

So 3 minutes of cloud TPU time already covers a whole month of Raspberry Pi usage. Pretty sure it costs them less than $5 as well, since they have the hardware anyway.
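Back-of-envelope check of that step, taking the 1 s of TPU ≈ 4 h of Pi figure above as the only assumption:

```python
# Rough arithmetic behind the "3 minutes covers a month" claim.
# Assumption from the comment above: 1 s of TPU time ~ 4 h of Raspberry Pi 4B time.
tpu_to_pi_ratio = 4 * 3600               # 14,400 Pi-seconds per TPU-second

pi_seconds_per_month = 30 * 24 * 3600    # Pi running full time for 30 days

tpu_seconds_needed = pi_seconds_per_month / tpu_to_pi_ratio
print(tpu_seconds_needed / 60)           # -> 3.0 minutes of TPU time
```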


"The cloud" is also massively parallel software. If I run a Google search, many thousands of CPUs will be brought to bear on my query, and a gazillion DIMMs, and all the throughput of a hell of a lot of SSDs, and so on. If you just happened to have a copy of the web, and an index of it, on "a computer" no matter how big, it would be impossible to get prompt answers.

If Google (or whomever) needs to run voice models, they take your query and all the other queries that arrive in the same millisecond, smoosh them all together and shove the batch into a TPU and run it. You don't have any TPUs and you also don't have any traffic you can use to amortize the cost of your infrequent queries.
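A minimal sketch of that kind of request batching, in Python with made-up names (a real serving stack does this with far more care around latency budgets and the accelerator call would be a TPU/GPU model, not the stand-in below):

```python
import queue
import threading
import time
import numpy as np

# Hypothetical micro-batcher: collect the queries that arrive within the same
# short window, run them through the accelerator as ONE batch, then hand each
# caller its own row of the result. All names here are illustrative only.

BATCH_WINDOW_S = 0.001   # gather everything that arrives within ~1 ms
MAX_BATCH = 64

request_queue = queue.Queue()   # holds (input_array, reply_queue) pairs

def fake_model(batch: np.ndarray) -> np.ndarray:
    # Stand-in for a single TPU/GPU call; cost is mostly per-call, not per-row.
    return batch.sum(axis=1, keepdims=True)

def batching_loop() -> None:
    while True:
        first = request_queue.get()                 # block until the first query
        items = [first]
        deadline = time.monotonic() + BATCH_WINDOW_S
        while len(items) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                items.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        batch = np.stack([x for x, _ in items])     # smoosh them together
        out = fake_model(batch)                     # one accelerator call
        for row, (_, reply) in zip(out, items):
            reply.put(row)                          # each caller gets its row back

threading.Thread(target=batching_loop, daemon=True).start()

def query(x: np.ndarray) -> np.ndarray:
    reply = queue.Queue(maxsize=1)
    request_queue.put((x, reply))
    return reply.get()

print(query(np.ones(4)))   # many concurrent callers would share one model call
```

A single home user never has enough concurrent traffic to fill that batch window, which is the amortization point being made.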

The idea that you could run these kinds of ML inference tasks is economically fanciful. You would need a huge investment in hardware and the opex would be ridiculous.


> The idea that you could run these kinds of ML inference tasks is economically fanciful. You would need a huge investment in hardware and the opex would be ridiculous.

Google, Apple, Amazon, and even Sonos are all releasing voice assistants that run locally on their relatively low-powered speakers.

Apple seems to be ahead in what runs locally, while Google's seems to be the smartest. (Sonos doesn't have a cloud, but it's not 'general purpose' AFAIK.)

Sure, you can’t amortize queries across a bunch of TPUs, but instead they can ship custom hardware. A TPU needs to be big and support many parallel streams; a home server may only ever need to serve one. There are Arduino-style devices that can run basic TensorFlow audio models in real time now, and obviously most phones can do this locally, so depending on your view that may already count as affordable.
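For a sense of what a "basic TensorFlow audio model" on small hardware looks like, here's a minimal single-stream keyword-spotting sketch using the TensorFlow Lite interpreter in Python; the model file and label set are placeholders, and on an actual microcontroller you'd use the C++ TensorFlow Lite Micro runtime instead:

```python
import numpy as np
import tensorflow as tf

# Hypothetical on-device keyword spotting with a quantized TFLite model.
# "kws_model.tflite" and the label list are placeholders, not real artifacts.
interpreter = tf.lite.Interpreter(model_path="kws_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

def classify(audio_window: np.ndarray) -> int:
    # audio_window: one window of audio, already preprocessed into the
    # spectrogram/feature shape the model expects (assumed here).
    x = audio_window.astype(input_details["dtype"]).reshape(input_details["shape"])
    interpreter.set_tensor(input_details["index"], x)
    interpreter.invoke()                       # single-stream inference, no batching
    scores = interpreter.get_tensor(output_details["index"])[0]
    return int(np.argmax(scores))              # index into e.g. ["silence", "yes", "no"]
```

The whole point is that one stream at household scale needs nothing like a data-center accelerator.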


I don’t think a $5 instance is enough for ML/AI workloads. You need something with a GPU.



