Oh, you can run the Q8_0 / Q8_K_XL, which is nearly equivalent to FP8 (maybe off by 0.01% or less) -> you will need 500GB of VRAM + RAM + disk space combined. Via MoE layer offloading it should run OK.
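
For a rough sense of where the ~500GB figure could come from, here's a back-of-envelope sketch. The ~470B parameter count is a hypothetical placeholder, not from the comment; the 8.5 bits/weight comes from Q8_0's GGUF layout (blocks of 32 int8 weights plus one fp16 scale).

    # Back-of-envelope size estimate for a Q8_0 GGUF quant.
    # The parameter count below is a hypothetical placeholder.

    def q8_0_gigabytes(n_params: float, bits_per_weight: float = 8.5) -> float:
        """Approximate size of a Q8_0 quant in GB (8-bit weights + block scales)."""
        return n_params * bits_per_weight / 8 / 1e9

    params = 470e9  # assumed parameter count, not the actual model config
    print(f"~{q8_0_gigabytes(params):.0f} GB of weights")  # ~499 GB

    # With MoE layer offloading, only the attention/dense layers and the
    # currently active experts need to sit in VRAM; the rest can live in
    # system RAM or be memory-mapped from disk, which is why VRAM + RAM +
    # disk can be pooled to cover the full ~500 GB.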


This should work well with MLX Distributed. The low-activation MoE is great for multi-node inference; a quick sketch of why is below.
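
With expert parallelism, a token's hidden state only has to travel to the nodes hosting its top-k activated experts, so cross-node traffic scales with activated rather than total parameters. A minimal sketch of the arithmetic; every number here is a hypothetical placeholder, not the actual model config:

    # Why a low-activation MoE suits multi-node inference: per-token
    # cross-node traffic scales with the number of *activated* experts,
    # not the total. All numbers below are assumed placeholders.

    hidden_size   = 7168  # model dim (assumed)
    total_experts = 256   # experts per MoE layer (assumed)
    top_k         = 8     # experts activated per token (assumed)
    moe_layers    = 60    # number of MoE layers (assumed)
    bytes_per_val = 2     # fp16 activations

    # Worst case: every activated expert lives on a remote node, so each
    # token ships its hidden state out and back once per activated expert.
    per_token_bytes = moe_layers * top_k * 2 * hidden_size * bytes_per_val
    print(f"~{per_token_bytes / 1e6:.1f} MB of traffic per token")  # ~13.8 MB

    # Fraction of expert weights touched per token:
    print(f"active experts: {top_k}/{total_experts} = {top_k / total_experts:.1%}")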


1. What hardware would you need for that? 2. Can you run a benchmark?



