You can run llama-30b right now on high-end consumer hardware (an RTX 3090 or better) using int4 quantization. With two GPUs, llama-65b is within reach. And even 30b is surprisingly good, although it's clearly not tuned as well as ChatGPT for dialog-style tasks.
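
A rough back-of-the-envelope check of those numbers, as a sketch rather than a measurement: it counts only the weights, uses the nominal parameter counts from the model names (30e9 and 65e9), and assumes 24 GiB of VRAM per card (the RTX 3090 figure). Activations, the KV cache, and quantization metadata are ignored, so real usage will be higher. The helper names `weight_memory_gib` and `gpus_needed` are made up for illustration.

```python
import math

VRAM_GIB = 24  # assumed per-GPU memory (RTX 3090)


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def gpus_needed(n_params: float, bits_per_weight: float,
                vram_gib: float = VRAM_GIB) -> int:
    """Minimum number of cards whose combined VRAM holds the weights."""
    return math.ceil(weight_memory_gib(n_params, bits_per_weight) / vram_gib)


# Nominal parameter counts taken from the model names (an assumption).
for name, n_params in [("llama-30b", 30e9), ("llama-65b", 65e9)]:
    for bits in (16, 4):
        gib = weight_memory_gib(n_params, bits)
        n_gpus = gpus_needed(n_params, bits)
        print(f"{name} @ {bits}-bit: ~{gib:.1f} GiB of weights, "
              f"at least {n_gpus} x {VRAM_GIB} GiB GPU(s)")
```

Under these assumptions the int4 weights come to roughly 14 GiB for 30b and roughly 30 GiB for 65b, which is why one card is enough for the former and two for the latter.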