Anyone tried running this on a Mac M1 with 16 GB of RAM yet? I've never run a model larger than 8 GB, but apparently this one is specifically designed to work well in 16 GB of RAM.
It works fine, although with a bit more latency than non-local models. However, swap usage goes way beyond what I’m comfortable with, so I’ll continue to use smaller models for the foreseeable future.
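If you want to quantify that swap pressure rather than eyeballing Activity Monitor, a small psutil polling loop works. This is just a sketch: the 1-second interval is an arbitrary choice, and you'd run it alongside the model process, not inside it.

    import time
    import psutil  # pip install psutil

    # Poll system-wide RAM and swap once per second while the model
    # runs in another process; Ctrl-C to stop.
    while True:
        mem = psutil.virtual_memory()
        swap = psutil.swap_memory()
        print(f"RAM used: {mem.used / 2**30:5.1f} GiB ({mem.percent:4.1f}%) | "
              f"swap used: {swap.used / 2**30:5.1f} GiB")
        time.sleep(1)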
Hopefully other quantizations of these OpenAI models will be available soon.
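For a rough sense of why a different quantization matters on a 16 GB machine: weight memory is roughly parameter count times bits per weight. A back-of-the-envelope calculator (the 20B parameter count below is just for illustration, not a spec of these models):

    # Rough weight-only memory estimate: params * bits_per_weight / 8 bytes.
    # Ignores KV cache, activations, and runtime overhead, which add more on top.
    def weight_gib(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 2**30

    for bits in (16, 8, 4.5, 4):  # fp16, 8-bit, ~q4_K_M-style, 4-bit
        print(f"{bits:>4} bits/weight -> {weight_gib(20, bits):5.1f} GiB "
              f"for a hypothetical 20B model")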
M2 with 16 GB: It's slow for me. ~13 GB RAM usage; it doesn't lock up my Mac, but it spends a very long time thinking and outputs tokens slowly. I wouldn't consider it usable day to day.
Update: I tried it out. It took about 8 seconds per token and didn't seem to be using much of my GPU (MPS), but it was using a lot of RAM. Not a model I can use practically on my machine.
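If anyone wants to put a number on "seconds per token" rather than stopwatching it, a generic timing wrapper over whatever streaming API you use does the job. The llama-cpp-python call in the usage comment is just one example of a token stream, not something specific to these models:

    import time
    from typing import Iterable

    def time_tokens(token_stream: Iterable[str]) -> None:
        """Print per-token latency and a running average for any token iterator."""
        start = prev = time.perf_counter()
        for i, tok in enumerate(token_stream, 1):
            now = time.perf_counter()
            print(f"token {i:3d}: {now - prev:6.2f}s "
                  f"(avg {(now - start) / i:6.2f}s/token)")
            prev = now

    # Usage: wrap your runtime's streaming generator, e.g. with llama-cpp-python:
    # time_tokens(chunk["choices"][0]["text"] for chunk in llm(prompt, stream=True))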