Anyone tried running this on a Mac M1 with 16 GB of RAM yet? I've never run a model larger than 8 GB, but apparently this one is specifically designed to work well in 16 GB of RAM.
It works fine, although with a bit more latency than non-local models. However, swap usage goes way beyond what I’m comfortable with, so I’ll continue to use smaller models for the foreseeable future.
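If you want to quantify that swap pressure rather than eyeballing Activity Monitor, a small psutil polling loop works. This is just a sketch: the 1-second interval is an arbitrary choice, and you'd run it alongside the model process, not inside it.

    import time
    import psutil  # pip install psutil

    # Poll system-wide RAM and swap once per second while the model
    # runs in another process; Ctrl-C to stop.
    while True:
        mem = psutil.virtual_memory()
        swap = psutil.swap_memory()
        print(f"RAM used: {mem.used / 2**30:5.1f} GiB ({mem.percent:4.1f}%) | "
              f"swap used: {swap.used / 2**30:5.1f} GiB")
        time.sleep(1)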
Hopefully other quantizations of these OpenAI models will be available soon.
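For a rough sense of why a different quantization matters on a 16 GB machine: weight memory is roughly parameter count times bits per weight. A back-of-the-envelope calculator (the 20B parameter count below is just for illustration, not a spec of these models):

    # Rough weight-only memory estimate: params * bits_per_weight / 8 bytes.
    # Ignores KV cache, activations, and runtime overhead, which add more on top.
    def weight_gib(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    for bits in (16, 8, 4.5, 4):  # fp16, 8-bit, ~q4_K_M-style, 4-bit
        print(f"{bits:>4} bits/weight -> {weight_gib(20, bits):5.1f} GiB "
              f"for a hypothetical 20B model")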
M2 with 16 GB: It's slow for me. ~13 GB RAM usage; it doesn't lock up my Mac, but it spends a very long time thinking and outputs tokens slowly. I wouldn't consider it usable day to day.
Update: I tried it out. It took about 8 seconds per token and didn't seem to be using much of my GPU (MPS), but it was using a lot of RAM. Not a model I can use practically on my machine.
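If anyone wants to put a number on "seconds per token" rather than stopwatching it, a generic timing wrapper over whatever streaming API you use does the job. The llama-cpp-python call in the usage comment is just one example of a token stream, not something specific to these models:

    import time
    from typing import Iterable

    def time_tokens(token_stream: Iterable[str]) -> None:
        """Print per-token latency and a running average for any token iterator."""
        start = prev = time.perf_counter()
        for i, tok in enumerate(token_stream, 1):
            now = time.perf_counter()
            print(f"token {i:3d}: {now - prev:6.2f}s "
                  f"(avg {(now - start) / i:6.2f}s/token)")
            prev = now

    # Usage: wrap your runtime's streaming generator, e.g. with llama-cpp-python:
    # time_tokens(chunk["choices"][0]["text"] for chunk in llm(prompt, stream=True))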