
I was able to get gpt-oss:20b wired up to claude code locally via a thin proxy and ollama.

It's fun that it works, but the prefill time makes it feel unusable (2-3 minutes per tool use / completion), which means a 10-20 tool-use interaction could take 30-60 minutes.

(This was editing a single server.py file of ~1000 lines; the tool definitions + Claude context came to around 30k input tokens, and after the file read the input was around 50k tokens. Definitely could be optimized. Also, I'm not sure if ollama supports a kv-cache between invocations of /v1/completions, which could help.)
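Roughly, the proxy shape looks like this (a minimal sketch, not the actual code: it assumes ollama's default port 11434, skips streaming and tool-call translation, and relies on Claude Code honoring ANTHROPIC_BASE_URL, e.g. ANTHROPIC_BASE_URL=http://localhost:8082):

    # Thin proxy sketch: accept Anthropic-style /v1/messages requests from
    # Claude Code and forward them to ollama's OpenAI-compatible
    # /v1/chat/completions endpoint. Streaming and tool use are omitted.
    import json
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # ollama default port
    MODEL = "gpt-oss:20b"

    def to_text(content):
        # Anthropic content may be a plain string or a list of content blocks.
        if isinstance(content, str):
            return content
        return "".join(b.get("text", "") for b in content if b.get("type") == "text")

    class Proxy(BaseHTTPRequestHandler):
        def do_POST(self):
            # Path is ignored; Claude Code posts to /v1/messages.
            body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))

            # Translate Anthropic message format to OpenAI chat format.
            messages = []
            if body.get("system"):
                messages.append({"role": "system", "content": to_text(body["system"])})
            for m in body.get("messages", []):
                messages.append({"role": m["role"], "content": to_text(m["content"])})

            req = urllib.request.Request(
                OLLAMA_URL,
                data=json.dumps({"model": MODEL, "messages": messages}).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                completion = json.loads(resp.read())
            text = completion["choices"][0]["message"]["content"]

            # Map the completion back into an Anthropic-style message response.
            reply = {
                "id": completion.get("id", "msg_0"),
                "type": "message",
                "role": "assistant",
                "model": MODEL,
                "content": [{"type": "text", "text": text}],
                "stop_reason": "end_turn",
                "usage": {"input_tokens": 0, "output_tokens": 0},
            }
            data = json.dumps(reply).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    HTTPServer(("localhost", 8082), Proxy).serve_forever()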



> Also, I'm not sure if ollama supports a kv-cache between invocations of /v1/completions, which could help.

Not sure about ollama, but llama-server does have a transparent kv cache.

You can run it with

    llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none

Web UI at http://localhost:8080 (it also serves an OpenAI-compatible API)
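For example, from Python with the openai client (a quick sketch; the model name is just a label since llama-server serves whatever it loaded, and the API key is unused). Re-sending the same conversation prefix each turn lets the server reuse its kv cache, so only the new tokens have to be prefilled:

    # Talk to llama-server's OpenAI-compatible endpoint. Assumes `pip install openai`.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # key is ignored

    history = [{"role": "user", "content": "Read server.py and summarize it."}]
    reply = client.chat.completions.create(model="gpt-oss-20b", messages=history)
    history.append({"role": "assistant", "content": reply.choices[0].message.content})

    # Follow-up turn: the shared prefix (everything sent before) can be served
    # from the kv cache instead of being re-prefilled.
    history.append({"role": "user", "content": "Now refactor the main loop."})
    reply = client.chat.completions.create(model="gpt-oss-20b", messages=history)
    print(reply.choices[0].message.content)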



