Hacker News

LLaMA-13B quantized to 4-bit is around 7 GB and outperforms GPT-3 (175B) on most benchmarks.

It runs at acceptable speeds on a MacBook Air M1 or a $100 consumer video card.
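The ~7 GB figure checks out as back-of-the-envelope arithmetic: 13B parameters at 4 bits each, plus per-group quantization metadata. A quick sketch (the group size of 128 and fp16 scale/zero-point per group are assumptions about a typical GPTQ setup, not stated in the thread):

```python
# Rough size estimate for a 4-bit quantized 13B-parameter model.
params = 13e9
bits_per_weight = 4
raw_gb = params * bits_per_weight / 8 / 1e9  # packed weights: 6.5 GB

# Assumed quantization overhead: one fp16 scale and one fp16 zero-point
# per group of 128 weights (typical GPTQ-style grouping).
group_size = 128
overhead_gb = (params / group_size) * 2 * 2 / 1e9  # 2 values * 2 bytes each

total_gb = raw_gb + overhead_gb
print(f"{raw_gb:.1f} GB weights + {overhead_gb:.2f} GB metadata ~= {total_gb:.1f} GB")
```

That lands right around 7 GB, which is why the model fits in the memory of an 8 GB laptop or a budget GPU.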



Quantization does cost some accuracy, so benchmarks of the full-precision model don't carry over directly. Benchmarks for the quantized models are here: https://github.com/qwopqwop200/GPTQ-for-LLaMa
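For intuition on where that accuracy drop comes from: squeezing weights onto a 16-level grid introduces reconstruction error. A minimal round-trip sketch with naive round-to-nearest symmetric quantization (GPTQ itself is smarter, compensating for error layer by layer; the weight scale of 0.02 is an assumed, typical value):

```python
import numpy as np

# Round-trip a synthetic weight row through 4-bit quantization and
# measure how much of the signal is lost.
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)  # assumed typical weight scale

# Symmetric 4-bit: integer levels -8..7, one scale for the whole row.
scale = np.abs(w).max() / 7
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # what gets stored
w_hat = q * scale                                        # dequantized weights

rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The per-weight error is small but nonzero, and it accumulates across dozens of layers, which is why the quantized model needs its own benchmark numbers rather than inheriting the fp16 ones.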



