I just open sourced a REST API and framework that you can point any of your projects at, such as OpenClaw, so you can do TTS (text-to-speech) and STT (speech-to-text) on your own hardware.
It's built specifically for Apple Silicon (M-series) Macs and uses MLX:
https://github.com/Sogni-AI/sogni-voice
All free and open source.
Internally it uses Parakeet, Kokoro TTS, and Qwen3 TTS (with voice cloning support!). It also supports creating your own API key so you can lock the API down to your own apps.
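
As a rough sketch of what calling a key-protected endpoint from your own app could look like: the host, port, `/tts` route, `Bearer` header scheme, and JSON fields below are placeholder assumptions for illustration, not the documented API, so check the repo README for the real ones.

```python
import json
import urllib.request

# Assumptions for illustration only: the real host, port, route, auth header,
# and payload fields may differ. See the repository README.
BASE_URL = "http://localhost:8000"   # hypothetical local server address
TTS_ROUTE = "/tts"                   # hypothetical route name
API_KEY = "my-secret-key"            # placeholder for the key you created


def build_tts_request(text: str, voice: str = "default") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request that carries the API key."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + TTS_ROUTE,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
    )


def synthesize(text: str, out_path: str = "speech.wav") -> None:
    """Send the request and write whatever bytes come back to out_path."""
    with urllib.request.urlopen(build_tts_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

With a server running locally, `synthesize("Hello from my own hardware")` would save the response body to `speech.wav`. Building the request separately from sending it makes it easy to swap in the real route and header once you've checked the docs.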