Setup VoiceMode
Install and configure VoiceMode MCP for voice interactions in the target client.
Steps
- Install VoiceMode:
uvx voice-mode-install --yes
- Add the MCP server to the target client:
claude mcp add --scope user voicemode -- uvx --refresh voice-mode
- Configure local endpoints (Kokoro TTS + Whisper STT):
voicemode config set VOICEMODE_TTS_BASE_URLS http://127.0.0.1:8880/v1
voicemode config set VOICEMODE_STT_BASE_URLS http://127.0.0.1:2022/v1
voicemode config set VOICEMODE_PREFER_LOCAL true
voicemode config set VOICEMODE_ALWAYS_TRY_LOCAL true
This step is critical. Without explicit _BASE_URLS values, the default list includes https://api.openai.com/v1 as a fallback, and requests fail with OPENAI_API_KEY errors even when the local services are running.
- Verify installation:
claude mcp list
- Test voice mode:
- Restart the target client
- If the target runtime exposes a VoiceMode tool, use it to verify; otherwise confirm voice input/output directly in the restarted client
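Before restarting the client, it can help to confirm that something is actually listening on the two local endpoints from step 3. The following is a minimal Bash sketch and not part of VoiceMode: the helper names `url_hostport` and `port_open` are hypothetical, and the check only tests whether the TCP port accepts a connection (through Bash's `/dev/tcp` redirection), not whether the service behind it is healthy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not voicemode commands.

# Print "host:port" for a URL such as http://127.0.0.1:8880/v1
url_hostport() {
  local rest="${1#*://}"        # drop the scheme
  printf '%s\n' "${rest%%/*}"   # drop the path
}

# Succeed if a TCP connection to HOST PORT can be opened (Bash-only feature).
port_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Endpoints taken from step 3 of this guide.
for url in http://127.0.0.1:8880/v1 http://127.0.0.1:2022/v1; do
  hp="$(url_hostport "$url")"
  if port_open "${hp%%:*}" "${hp##*:}"; then
    echo "listening: $url"
  else
    echo "not reachable: $url"
  fi
done
```

A "not reachable" line for port 8880 shortly after installation is expected while Kokoro is still loading its model.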
First Run Note
Kokoro TTS may take 5+ minutes to load on first run while it downloads and initializes the model (~111MB). Check status with:
voicemode service kokoro status
Two MCP restarts required:
- After initial setup (step 5)
- After Kokoro model finishes downloading
Without the second restart, you may get "OpenAI API key" errors even with local config.
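The wait between the two restarts can be scripted. The sketch below is illustrative only: `retry_until` and `kokoro_port_open` are hypothetical helpers, and the probe assumes that "port 8880 accepts a TCP connection" is a usable proxy for "Kokoro has finished loading", which may not hold exactly. `voicemode service kokoro status` remains the authoritative check.

```bash
#!/usr/bin/env bash
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds or ATTEMPTS runs out.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Probe: succeed if the Kokoro port from step 3 accepts a TCP connection.
# Relies on Bash's /dev/tcp redirection.
kokoro_port_open() {
  (exec 3<>/dev/tcp/127.0.0.1/8880) 2>/dev/null
}
```

For example, `retry_until 60 10 kokoro_port_open && echo "Kokoro port is up; restart the client now"` polls every 10 seconds for up to about ten minutes, which covers the 5+ minute first-run load described above.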
Configuration Options
Edit config with:
voicemode config edit
List all options:
voicemode config list
Key Settings
| Setting | Description |
|---|---|
| VOICEMODE_PREFER_LOCAL | Prefer local providers over cloud (true/false) |
| VOICEMODE_ALWAYS_TRY_LOCAL | Always attempt local providers first (true/false) |
| VOICEMODE_SAVE_AUDIO | Save audio files (true/false, default: false) |
| VOICEMODE_WHISPER_MODEL | Whisper model (tiny, base, small, medium, large-v2) |
| VOICEMODE_KOKORO_DEFAULT_VOICE | Default voice (e.g., af_sky) |
| OPENAI_API_KEY | Required only for cloud processing |
Provider Options
- Local-only (default, recommended): Set VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1 and VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1 (no API key needed)
- Cloud-only: Set OPENAI_API_KEY and set URLs to https://api.openai.com/v1
- Hybrid (local-first, cloud fallback): Set OPENAI_API_KEY and set URLs to http://127.0.0.1:8880/v1,https://api.openai.com/v1 (TTS) and http://127.0.0.1:2022/v1,https://api.openai.com/v1 (STT)
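The three provider modes differ only in which URLs appear in the comma-separated _BASE_URLS value, so a value can be classified mechanically. The sketch below assumes a hypothetical helper, `classify_urls`, and makes a simplifying assumption: any URL that does not point at 127.0.0.1 or localhost is counted as cloud. It does not read VoiceMode's configuration; you pass it the value yourself.

```bash
#!/usr/bin/env bash
# classify_urls "URL[,URL...]"
# Hypothetical helper: prints local-only, cloud-only, hybrid, or empty.
classify_urls() {
  local list="$1" has_local=0 has_cloud=0 url
  local old_ifs="$IFS"
  IFS=','                       # split the list on commas
  for url in $list; do
    case "$url" in
      http://127.0.0.1:*|http://localhost:*) has_local=1 ;;
      *) has_cloud=1 ;;         # assumption: everything else is cloud
    esac
  done
  IFS="$old_ifs"

  if [ "$has_local" -eq 1 ] && [ "$has_cloud" -eq 1 ]; then
    echo hybrid
  elif [ "$has_local" -eq 1 ]; then
    echo local-only
  elif [ "$has_cloud" -eq 1 ]; then
    echo cloud-only
  else
    echo empty
  fi
}
```

If a value you intended to be local-only classifies as hybrid, the OpenAI endpoint is still in the fallback chain and an API key error is likely.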
Troubleshooting
- OpenAI API key error: Ensure VOICEMODE_TTS_BASE_URLS and VOICEMODE_STT_BASE_URLS point to local endpoints only (step 3). The PREFER_LOCAL flag alone is NOT sufficient: it does not remove OpenAI from the fallback chain
- Kokoro stuck "starting up": Wait 5+ mins on first run, or check logs: voicemode service kokoro logs
- macOS M3 crash: Known issue with ggml_metal; use CPU mode
- WSL audio issues: Install PulseAudio packages
- Slow transcription: Use GPU acceleration or smaller Whisper model
Improved Accuracy (Optional)
The default tiny model is fast but less accurate. For better transcription:
| Model | Size | Accuracy | Speed |
|---|---|---|---|
| tiny | 75MB | ~70% | Fastest |
| small | 466MB | ~82% | Fast |
| medium | 1.4GB | ~88% | Moderate |
voicemode config set VOICEMODE_WHISPER_MODEL small
# or for best accuracy:
voicemode config set VOICEMODE_WHISPER_MODEL medium
Restart Whisper service after changing:
voicemode service whisper restart
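Because a mistyped model name would only surface after the service restarts, the two commands above can be wrapped in a small guard. This is a sketch under stated assumptions: `is_known_whisper_model` and `set_whisper_model` are hypothetical helpers, and the accepted names simply mirror the Key Settings table in this guide, so the list may be incomplete for your VoiceMode version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: accept only the model names listed under Key Settings.
is_known_whisper_model() {
  case "$1" in
    tiny|base|small|medium|large-v2) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: validate the name, then run the two commands
# shown in this section (config set, then service restart).
set_whisper_model() {
  local model="$1"
  if ! is_known_whisper_model "$model"; then
    echo "unknown Whisper model: $model" >&2
    return 1
  fi
  voicemode config set VOICEMODE_WHISPER_MODEL "$model" &&
    voicemode service whisper restart
}
```

Calling `set_whisper_model small` then behaves like the manual steps, while `set_whisper_model smal` stops before touching the configuration.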
macOS Metal GPU Acceleration (Optional)
For significantly faster transcription on Apple Silicon, convert Whisper to Core ML:
Prerequisites
# Install whisper.cpp via Homebrew
brew install whisper-cpp
# Set Whisper directory
WHISPER_DIR=~/.voicemode/services/whisper
Steps
1. Download model
cd "$WHISPER_DIR/models"
./download-ggml-model.sh medium
2. Install Python dependencies
pip3 install torch coremltools openai-whisper ane_transformers
3. Convert to Core ML
cd "$WHISPER_DIR"
./models/generate-coreml-model.sh medium
4. Update config
voicemode config set VOICEMODE_WHISPER_MODEL medium
5. Restart Whisper
voicemode service whisper restart
Verification
# Check Core ML model exists
ls -la "$WHISPER_DIR/models/ggml-medium-encoder.mlmodelc"
When running, logs should show: GPU: Metal, Core ML: Enabled
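The file check can be generalized to other model names. The sketch below is a minimal illustration with hypothetical helper names (`coreml_encoder_path`, `coreml_encoder_present`); it assumes the converted encoder follows the same `ggml-<model>-encoder.mlmodelc` naming as the medium example above, and it only checks that the path exists, not that Whisper actually loads it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the expected Core ML encoder path.
# Usage: coreml_encoder_path WHISPER_DIR MODEL
coreml_encoder_path() {
  local dir="$1" model="$2"
  printf '%s/models/ggml-%s-encoder.mlmodelc\n' "$dir" "$model"
}

# Hypothetical helper: succeed if that path exists on disk.
coreml_encoder_present() {
  [ -e "$(coreml_encoder_path "$1" "$2")" ]
}
```

For example, `coreml_encoder_present "$HOME/.voicemode/services/whisper" medium || echo "Core ML model missing; rerun the conversion step"` flags a conversion that did not complete.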
Links
- GitHub: https://github.com/mbailey/voicemode
- Docs: https://voice-mode.readthedocs.io
- LiveKit Cloud: https://cloud.livekit.io