
Yes! 3-bit, and maybe even 4-bit, can also fit! llama.cpp supports MoE offloading, so your GPU holds the active experts and the non-MoE layers — you only need 16 GB to 24 GB of VRAM. I wrote about how to do this in this section: https://docs.unsloth.ai/basics/qwen3-coder#improving-generat...
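A minimal sketch of the setup described above, using llama.cpp's `--override-tensor` flag to keep MoE expert weights in CPU RAM while everything else stays on the GPU (the model filename, context size, and exact tensor regex here are illustrative assumptions, not taken from the linked docs):

```shell
# Put all layers on the GPU (--n-gpu-layers 99), then override the MoE
# expert weight tensors (names matching ".ffn_.*_exps.") back to CPU RAM,
# so VRAM only needs to hold the attention and other non-MoE weights.
llama-server \
  --model Qwen3-Coder-Q3_K_M.gguf \
  --n-gpu-layers 99 \
  --override-tensor ".ffn_.*_exps.=CPU" \
  --ctx-size 16384
```

Since only a few experts are active per token, the expert weights streamed from CPU RAM cost far less than keeping the whole model in VRAM.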


Awesome documentation, I'll try this. Thank you!



