
I've heard that the M-series chips with Metal do great on the small-model, low-latency front, but I have no practical experience with this yet. I'm hoping to add some local LLM/STT functionality to my office without heating my house.

I'm uncertain whether any M-series Mac will be performant enough, the M1/M2 Mac minis specifically, or whether there are features in the M3/M4/M5 architecture that make it worth my while to buy new.

Are these generational updates actually massive for model performance and latency, or are the gains just as incremental there?



As someone who purchased their first M-series Mac this year (M4 Pro), I've been thrilled to discover how well it does with local genAI tasks for producing text, code, and images. For example, openai/gpt-oss-20b runs locally quite well with 24GB of memory. If I had known beforehand how performant the Mac would be for these kinds of tasks, I probably would have purchased more RAM in order to load larger models. Performance for genAI is a function of the GPU (number of cores) and memory bandwidth. I think your biggest gains come from going from a base chip to a Pro/Max/Ultra version, with more GPU cores and higher bandwidth.
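If you want a quick way to try this yourself, here's a minimal sketch using the mlx-lm package (pip install mlx-lm) on Apple Silicon. The model repo name below is a placeholder, not a specific build I'm vouching for; substitute whichever quantized gpt-oss-20b (or other model) you actually download, or just use Ollama/llama.cpp instead.

    # Minimal sketch: local text generation on Apple Silicon with MLX.
    # NOTE: the repo name is a placeholder; point it at whatever quantized
    # build of gpt-oss-20b (or a smaller model) you actually pulled.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/gpt-oss-20b-4bit")  # placeholder repo name

    messages = [{"role": "user", "content": "Write a haiku about memory bandwidth."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # verbose=True prints tokens/sec, which is the number to watch when
    # comparing base vs Pro/Max/Ultra chips (more GPU cores, more bandwidth).
    text = generate(model, tokenizer, prompt=prompt, verbose=True)
    print(text)

The tokens/sec it reports makes it easy to compare machines before committing to a pricier configuration.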


The M5 is a huge upgrade over the M4 for local inference. Apple advertises 400%, and there is reason to believe this isn't a totally BS number: they redesigned the GPU cores so that certain operations in the core inner loop of LLM inference no longer have to be emulated.

I have an M4 and it is plenty fast enough. But honestly, the local models are just not anywhere near the hosted models in quality, due to their lower parameter counts, so I haven't had much success with them yet.



