ac-tools-setup-voice-mode

Installs and configures VoiceMode MCP for voice interactions in the target client. Triggers on keywords: setup voice, voice mode, install voicemode, configure voice

Source: WaterplanAI/agentic-config
Updated: 2026-05-25

// install — copy + paste into any project

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/WaterplanAI/agentic-config/HEAD/packages/pi-ac-tools/skills/ac-tools-setup-voice-mode/SKILL.md -o .claude/skills/ac-tools-setup-voice-mode.md

Drops the SKILL.md into .claude/skills/ac-tools-setup-voice-mode.md. Works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Setup VoiceMode

Install and configure VoiceMode MCP for voice interactions in the target client.

Steps

  1. Install VoiceMode:
uvx voice-mode-install --yes
  2. Add the MCP server to the target client:
claude mcp add --scope user voicemode -- uvx --refresh voice-mode
  3. Configure local endpoints (Kokoro TTS + Whisper STT):
voicemode config set VOICEMODE_TTS_BASE_URLS http://127.0.0.1:8880/v1
voicemode config set VOICEMODE_STT_BASE_URLS http://127.0.0.1:2022/v1
voicemode config set VOICEMODE_PREFER_LOCAL true
voicemode config set VOICEMODE_ALWAYS_TRY_LOCAL true

This is critical. Without explicit _BASE_URLS, the default list includes https://api.openai.com/v1 as a fallback, which fails with OPENAI_API_KEY errors even when local services are running.

  4. Verify installation:
claude mcp list
  5. Test voice mode:
  • Restart the target client
  • If the target runtime exposes a VoiceMode tool, use it to verify; otherwise confirm voice input/output directly in the restarted client
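
The installation, MCP registration, endpoint configuration, and verification commands above can be collected into one script. The following is a minimal sketch, not part of the upstream skill: the `run` helper, the `DRY_RUN` variable, and the `setup_voicemode` function name are hypothetical, while the commands themselves are the ones listed in the steps.

```shell
# Local endpoints used in the configuration step.
TTS_URL=http://127.0.0.1:8880/v1
STT_URL=http://127.0.0.1:2022/v1

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Runs the install, registration, configuration, and verification commands
# in order and stops at the first failure.
setup_voicemode() {
  run uvx voice-mode-install --yes &&
  run claude mcp add --scope user voicemode -- uvx --refresh voice-mode &&
  run voicemode config set VOICEMODE_TTS_BASE_URLS "$TTS_URL" &&
  run voicemode config set VOICEMODE_STT_BASE_URLS "$STT_URL" &&
  run voicemode config set VOICEMODE_PREFER_LOCAL true &&
  run voicemode config set VOICEMODE_ALWAYS_TRY_LOCAL true &&
  run claude mcp list
}
```

Calling `DRY_RUN=1 setup_voicemode` prints the seven commands without running them; `setup_voicemode` on its own executes them. Restarting the target client remains a manual step.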

First Run Note

Kokoro TTS may take 5+ minutes to load on first run while it downloads and initializes the model (~111MB). Check status with:

voicemode service kokoro status

Two MCP restarts required:

  1. After initial setup (step 5)
  2. After Kokoro model finishes downloading

Without the second restart, you may get "OpenAI API key" errors even with local config.
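
Since the second restart only helps once Kokoro is serving, it can be convenient to wait for the local TTS endpoint before restarting. This is a minimal sketch, assuming `curl` is installed and that any HTTP response from the base URL means the service is listening. `endpoint_up` and `wait_for_endpoint` are hypothetical helper names, and the check does not prove that the model has finished loading.

```shell
# Succeeds if anything answers HTTP at the URL, whatever the status code.
# curl exits non-zero when the connection is refused or times out.
endpoint_up() {
  curl -s -o /dev/null --noproxy '*' --max-time 2 "$1"
}

# Poll the URL until it answers or the timeout (in seconds) expires.
# Arguments: url [timeout=600] [interval=5]
wait_for_endpoint() {
  local url timeout interval elapsed
  url=$1
  timeout=${2:-600}
  interval=${3:-5}
  elapsed=0
  while ! endpoint_up "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

For example, `wait_for_endpoint http://127.0.0.1:8880/v1 600` returns 0 as soon as the Kokoro port answers and 1 if nothing answers within ten minutes; `voicemode service kokoro status` remains the authoritative check.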

Configuration Options

Edit config with:

voicemode config edit

List all options:

voicemode config list

Key Settings

| Setting | Description |
| --- | --- |
| VOICEMODE_PREFER_LOCAL | Prefer local providers over cloud (true/false) |
| VOICEMODE_ALWAYS_TRY_LOCAL | Always attempt local providers first (true/false) |
| VOICEMODE_SAVE_AUDIO | Save audio files (true/false, default: false) |
| VOICEMODE_WHISPER_MODEL | Whisper model (tiny, base, small, medium, large-v2) |
| VOICEMODE_KOKORO_DEFAULT_VOICE | Default voice (e.g., af_sky) |
| OPENAI_API_KEY | Required only for cloud processing |

Provider Options

  • Local-only (what step 3 configures, recommended): Set VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8880/v1 and VOICEMODE_STT_BASE_URLS=http://127.0.0.1:2022/v1 (no API key needed)
  • Cloud-only: Set OPENAI_API_KEY and set URLs to https://api.openai.com/v1
  • Hybrid (local-first, cloud fallback): Set OPENAI_API_KEY and set URLs to http://127.0.0.1:8880/v1,https://api.openai.com/v1 (TTS) and http://127.0.0.1:2022/v1,https://api.openai.com/v1 (STT)
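
The three provider modes differ only in the URL lists they set. As an illustration, the sketch below builds the comma-separated value for each mode; the `base_urls` function and the variable names are hypothetical, and the URLs are the ones given in the list above.

```shell
LOCAL_TTS=http://127.0.0.1:8880/v1
LOCAL_STT=http://127.0.0.1:2022/v1
OPENAI=https://api.openai.com/v1

# Print the base URL list for a mode (local|cloud|hybrid) and a
# service kind (tts|stt). Returns 1 for unknown input.
base_urls() {
  local local_url
  case "$2" in
    tts) local_url=$LOCAL_TTS ;;
    stt) local_url=$LOCAL_STT ;;
    *) return 1 ;;
  esac
  case "$1" in
    local)  printf '%s\n' "$local_url" ;;
    cloud)  printf '%s\n' "$OPENAI" ;;
    hybrid) printf '%s,%s\n' "$local_url" "$OPENAI" ;;  # local first, cloud fallback
    *) return 1 ;;
  esac
}
```

A hybrid setup could then be applied with `voicemode config set VOICEMODE_TTS_BASE_URLS "$(base_urls hybrid tts)"` and the matching `stt` call, with OPENAI_API_KEY set as described above.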

Troubleshooting

  • OpenAI API key error: Ensure VOICEMODE_TTS_BASE_URLS and VOICEMODE_STT_BASE_URLS point to local endpoints only (step 3). The VOICEMODE_PREFER_LOCAL flag alone is NOT sufficient: it does not remove OpenAI from the fallback chain
  • Kokoro stuck "starting up": Wait 5+ minutes on first run, or check logs: voicemode service kokoro logs
  • macOS M3 crash: Known issue with ggml_metal; use CPU mode
  • WSL audio issues: Install PulseAudio packages
  • Slow transcription: Use GPU acceleration or smaller Whisper model
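
For the OpenAI API key error, the quickest check is whether a base URL list still contains a non-local entry. A minimal sketch, assuming the list is passed in as a string (for example copied from `voicemode config list`); `has_remote_endpoint` is a hypothetical helper, and it treats an empty value as remote because the default list includes the OpenAI endpoint.

```shell
# Returns 0 if the comma-separated URL list contains any entry that is
# not on 127.0.0.1 or localhost, or if the list is empty (defaults apply).
has_remote_endpoint() {
  local IFS url
  [ -n "$1" ] || return 0
  IFS=','
  for url in $1; do
    case "$url" in
      http://127.0.0.1:*|http://127.0.0.1/*|http://localhost:*|http://localhost/*) ;;
      *) return 0 ;;
    esac
  done
  return 1
}
```

If `has_remote_endpoint "http://127.0.0.1:8880/v1,https://api.openai.com/v1"` succeeds, the list still allows a cloud fallback and needs either an API key or a local-only value.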

Improved Accuracy (Optional)

The default tiny model is fast but less accurate. For better transcription:

| Model | Size | Accuracy | Speed |
| --- | --- | --- | --- |
| tiny | 75MB | ~70% | Fastest |
| small | 466MB | ~82% | Fast |
| medium | 1.4GB | ~88% | Moderate |

voicemode config set VOICEMODE_WHISPER_MODEL small
# or for best accuracy:
voicemode config set VOICEMODE_WHISPER_MODEL medium

Restart Whisper service after changing:

voicemode service whisper restart
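
Changing the model and restarting the service can be wrapped together so the restart is not forgotten. This is a sketch under assumptions: `is_whisper_model` and `set_whisper_model` are hypothetical names, and the accepted values are simply the model names listed under Key Settings.

```shell
# Accept only the model names listed under Key Settings.
is_whisper_model() {
  case "$1" in
    tiny|base|small|medium|large-v2) return 0 ;;
    *) return 1 ;;
  esac
}

# Validate the name, apply it, then restart Whisper so it takes effect.
set_whisper_model() {
  if ! is_whisper_model "$1"; then
    echo "unknown Whisper model: $1" >&2
    return 1
  fi
  voicemode config set VOICEMODE_WHISPER_MODEL "$1" &&
  voicemode service whisper restart
}
```

With this in place, `set_whisper_model small` performs both commands, and a misspelled name is rejected before any configuration changes.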

macOS Metal GPU Acceleration (Optional)

For significantly faster transcription on Apple Silicon, convert Whisper to Core ML:

Prerequisites

# Install whisper.cpp via Homebrew
brew install whisper-cpp

# Set Whisper directory
WHISPER_DIR=~/.voicemode/services/whisper

Steps

1. Download model

cd "$WHISPER_DIR/models"
./download-ggml-model.sh medium

2. Install Python dependencies

pip3 install torch coremltools openai-whisper ane_transformers

3. Convert to Core ML

cd "$WHISPER_DIR"
./models/generate-coreml-model.sh medium

4. Update config

voicemode config set VOICEMODE_WHISPER_MODEL medium

5. Restart Whisper

voicemode service whisper restart

Verification

# Check Core ML model exists
ls -la "$WHISPER_DIR/models/ggml-medium-encoder.mlmodelc"

When running, logs should show: GPU: Metal, Core ML: Enabled
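
The existence check can be scripted so it runs before the config change and restart. A minimal sketch: `coreml_model_path` and `coreml_ready` are hypothetical helpers, and applying the `ggml-<model>-encoder.mlmodelc` naming to models other than medium is an assumption generalized from the path shown above.

```shell
# Default Whisper directory from the prerequisites; override by exporting WHISPER_DIR.
WHISPER_DIR=${WHISPER_DIR:-$HOME/.voicemode/services/whisper}

# Print the expected location of the compiled Core ML encoder for a model name.
coreml_model_path() {
  printf '%s\n' "$WHISPER_DIR/models/ggml-$1-encoder.mlmodelc"
}

# Succeeds only if the compiled model exists on disk.
coreml_ready() {
  [ -e "$(coreml_model_path "$1")" ]
}
```

For example, `coreml_ready medium && voicemode service whisper restart` restarts Whisper only once the converted model is present.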

Links