Configuration

All configuration is done through the Settings modal in the app.

Settings Modal

The Settings modal opens automatically on first launch. Reopen it anytime by clicking the gear icon on the main screen.

Gateway Settings

| Setting | Description | Default |
|---|---|---|
| Gateway URL | WebSocket URL for the OpenClaw gateway | ws://127.0.0.1:18789 |
| GATEWAY_TOKEN | Authentication token for the gateway | — |

The gateway connects to an OpenClaw instance that routes requests to your configured LLM.

STT Settings

| Setting | Description | Default |
|---|---|---|
| STT Provider | Speech-to-text engine | ElevenLabs Scribe |
| ELEVENLABS_API_KEY | API key (ElevenLabs) | — |
| OPENAI_API_KEY | API key (OpenAI Whisper) | — |
| whisper.cpp Binary Path | Path to whisper-cli binary | — |

See Providers for details on each provider.

TTS Settings

| Setting | Description | Default |
|---|---|---|
| TTS Provider | Text-to-speech engine | ElevenLabs |
| TTS Voice ID | ElevenLabs voice | pFZP5JQG7iQjIQuC4Bku (Lily) |
| TTS Model | ElevenLabs model | eleven_multilingual_v2 |
| VOICEVOX URL | VOICEVOX server URL | http://localhost:50021 |
| VOICEVOX Speaker ID | VOICEVOX speaker | 1 |
| Kokoro URL | Kokoro server URL | http://localhost:8880 |
| Kokoro Voice | Kokoro voice ID | jf_alpha |
| Piper Binary Path | Path to piper binary | — |
| Piper Model Path | Path to .onnx model | — |

See Providers for details on each provider.

VAD Settings

| Setting | Description | Default |
|---|---|---|
| Sensitivity | auto, or a manual value (0.001–0.05) | auto |

See Voice Activity Detection below.

Connectivity Checks

Each section has a Test button that validates the connection:

  • Gateway: HTTP GET to the gateway URL (requires GATEWAY_TOKEN)
  • STT: Provider-specific check (API key validation or binary existence)
  • TTS: Provider-specific check (API key validation, server connectivity, or binary existence)

All three checks must pass before the Start button is enabled.
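The gating logic amounts to running the three checks and enabling Start only when every one succeeds. A minimal sketch follows; the async check functions are stand-ins for the app's real gateway/STT/TTS probes:

```typescript
// Sketch: run the connectivity checks and gate the Start button.
// The check functions here are placeholders, not the app's real probes.
type CheckResult = { name: string; ok: boolean };

async function runChecks(
  names: string[],
  checks: Array<() => Promise<boolean>>,
): Promise<CheckResult[]> {
  const outcomes = await Promise.all(checks.map((check) => check()));
  return outcomes.map((ok, i) => ({ name: names[i], ok }));
}

// Start is enabled only when every check passed.
function startEnabled(results: CheckResult[]): boolean {
  return results.every((r) => r.ok);
}
```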


API Key Management

Storage

API keys are encrypted with AES-256-CBC and stored locally:

| Platform | Location |
|---|---|
| macOS | ~/Library/Application Support/TalkLob/keys.json |
| Windows | %APPDATA%/TalkLob/keys.json |
| Linux | ~/.config/TalkLob/keys.json |

Keys never leave your machine unencrypted.
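As a rough illustration of the storage scheme, here is an AES-256-CBC roundtrip using Node's built-in crypto module. How TalkLob derives and stores its encryption key is not documented here, so the random throwaway key below is purely illustrative:

```typescript
// Sketch: AES-256-CBC encrypt/decrypt with node:crypto.
// NOTE: real key management (derivation, storage) is app-specific and
// not shown in these docs; this uses a random key for illustration.
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encryptKey(plaintext: string, key: Buffer): { iv: Buffer; data: Buffer } {
  const iv = randomBytes(16); // CBC requires a fresh 16-byte IV per message
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, data };
}

function decryptKey(payload: { iv: Buffer; data: Buffer }, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-cbc", key, payload.iv);
  return Buffer.concat([decipher.update(payload.data), decipher.final()]).toString("utf8");
}
```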

Key Sources

Each key can be loaded from three sources (shown as buttons in the Settings modal):

| Source | Description |
|---|---|
| Manual | Type the key directly into the input field |
| OpenClaw | Auto-load from ~/.openclaw/openclaw.json |
| Env | Read from environment variables |
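A sketch of how such a lookup might resolve. The precedence shown (manual, then OpenClaw config, then environment) is only an assumption for illustration; in the app you pick the source explicitly with the buttons:

```typescript
// Sketch: resolve a key from the three sources.
// ASSUMPTION: manual > OpenClaw > environment precedence, for illustration.
interface KeySources {
  manual?: string;                          // typed into the input field
  openclaw?: Record<string, string>;        // "env" section of openclaw.json
  env?: Record<string, string | undefined>; // process.env
}

function resolveKey(name: string, sources: KeySources): string | undefined {
  return sources.manual ?? sources.openclaw?.[name] ?? sources.env?.[name];
}
```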

OpenClaw Config Format

```json
{
  "gateway": {
    "auth": { "token": "your-gateway-token" }
  },
  "env": {
    "ELEVENLABS_API_KEY": "your-elevenlabs-key",
    "OPENAI_API_KEY": "your-openai-key"
  }
}
```
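Pulling the gateway token and provider keys out of that shape is straightforward; this sketch mirrors the field names in the example above, with minimal validation:

```typescript
// Sketch: extract GATEWAY_TOKEN and provider keys from the config shape
// shown above. Field names match the example; error handling is minimal.
interface OpenClawConfig {
  gateway?: { auth?: { token?: string } };
  env?: Record<string, string>;
}

function extractKeys(config: OpenClawConfig): Record<string, string> {
  const keys: Record<string, string> = { ...(config.env ?? {}) };
  const token = config.gateway?.auth?.token;
  if (token) keys.GATEWAY_TOKEN = token;
  return keys;
}
```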

Environment Variables

| Variable | Used For |
|---|---|
| GATEWAY_TOKEN | OpenClaw gateway authentication |
| ELEVENLABS_API_KEY | ElevenLabs STT and TTS |
| OPENAI_API_KEY | OpenAI Whisper STT |

Voice Activity Detection

TalkLob uses Silero VAD (neural network-based) to detect when you start and stop speaking. No push-to-talk button needed.

Auto Calibration (Default)

When you enable the microphone, the app runs a 1.5-second calibration:

  1. Captures ambient noise via the microphone (with WebRTC echo cancellation enabled)
  2. Measures the median RMS (root mean square) noise level
  3. Maps the noise floor to a VAD threshold:
    • Quiet room (low RMS) → lower threshold → more sensitive
    • Noisy environment (high RMS) → higher threshold → less sensitive

During calibration, the status shows "Calibrating...".
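The calibration steps can be sketched as follows. The exact noise-floor-to-threshold mapping is not documented, so the linear gain below is illustrative; the clamp bounds are borrowed from the manual sensitivity range (0.001–0.05):

```typescript
// Sketch of auto calibration: median RMS over the captured frames,
// then a noise-floor-to-threshold mapping.
function medianRms(frames: Float32Array[]): number {
  const levels = frames
    .map((f) => Math.sqrt(f.reduce((sum, x) => sum + x * x, 0) / f.length))
    .sort((a, b) => a - b);
  return levels[Math.floor(levels.length / 2)];
}

function thresholdFromNoiseFloor(noise: number): number {
  const MIN = 0.001, MAX = 0.05; // bounds from the manual sensitivity range
  // Quiet room -> low threshold (more sensitive); noisy room -> high.
  return Math.min(MAX, Math.max(MIN, noise * 3)); // gain of 3 is illustrative
}
```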

Manual Sensitivity

If auto calibration doesn't work well for your environment, switch to manual mode in VAD Settings and adjust the slider. Lower values = more sensitive (picks up quieter speech), higher values = less sensitive (ignores more background noise).


Speaker Monitor

The speaker monitor captures system audio output to detect when media (YouTube, music, etc.) is playing. When system audio is detected, VAD is temporarily suppressed to prevent the AI from responding to non-speech sounds.

  • Uses Electron's desktopCapturer API
  • 800ms debounce to avoid rapid toggling
  • Fails gracefully if system audio capture is not available
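The 800 ms debounce is a standard pattern: a burst of rapid on/off flips from the audio detector collapses into a single state change once the signal settles. A generic sketch (the suppression callback in the comment is hypothetical):

```typescript
// Sketch: generic trailing-edge debounce, like the 800 ms one above.
function debounce<T extends unknown[]>(
  fn: (...args: T) => void,
  ms: number,
): (...args: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer); // drop the pending call
    timer = setTimeout(() => fn(...args), ms);    // reschedule with latest args
  };
}

// Hypothetical usage: suppress VAD only after the signal settles.
// const setSuppressed = debounce((on: boolean) => vad.suppress(on), 800);
```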

Aizuchi (Backchanneling)

During the Thinking state (while waiting for the LLM response), TalkLob plays subtle audio cues to fill the silence and signal that the AI is processing. This mimics the Japanese conversational habit of "aizuchi" (相槌).

  • Initial delay: 1.5–2.5 seconds
  • Interval: 3–5 seconds between cues
  • Automatically stops when the AI starts speaking
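The timing policy above can be sketched as below. The random ranges match the documented values; `playCue` is an abstract stand-in for the actual audio cue:

```typescript
// Sketch of the aizuchi timing policy described above.
const rand = (min: number, max: number) => min + Math.random() * (max - min);

const initialDelayMs = () => rand(1500, 2500); // first cue 1.5–2.5 s in
const nextIntervalMs = () => rand(3000, 5000); // then every 3–5 s

function startAizuchi(playCue: () => void): () => void {
  let timer: ReturnType<typeof setTimeout>;
  const tick = () => {
    playCue();
    timer = setTimeout(tick, nextIntervalMs());
  };
  timer = setTimeout(tick, initialDelayMs());
  return () => clearTimeout(timer); // call this when the AI starts speaking
}
```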

Settings File

Settings are stored as JSON at ~/.config/lobster/settings.json. While you can edit this file directly, using the Settings modal is recommended.

```json
{
  "gatewayUrl": "ws://127.0.0.1:18789",
  "sttProvider": "elevenlabs",
  "ttsProvider": "elevenlabs",
  "ttsVoiceId": "pFZP5JQG7iQjIQuC4Bku",
  "ttsModelId": "eleven_multilingual_v2",
  "voicevoxUrl": "http://localhost:50021",
  "voicevoxSpeakerId": 1,
  "kokoroUrl": "http://localhost:8880",
  "kokoroVoice": "jf_alpha",
  "vadSensitivity": "auto"
}
```
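Loading that file and filling in defaults for any omitted field might look like this sketch; the interface and default values mirror the sample above:

```typescript
// Sketch: parse settings.json, falling back to the documented defaults
// for any field the file omits.
interface Settings {
  gatewayUrl: string;
  sttProvider: string;
  ttsProvider: string;
  ttsVoiceId: string;
  ttsModelId: string;
  voicevoxUrl: string;
  voicevoxSpeakerId: number;
  kokoroUrl: string;
  kokoroVoice: string;
  vadSensitivity: string | number; // "auto" or a manual value
}

const DEFAULTS: Settings = {
  gatewayUrl: "ws://127.0.0.1:18789",
  sttProvider: "elevenlabs",
  ttsProvider: "elevenlabs",
  ttsVoiceId: "pFZP5JQG7iQjIQuC4Bku",
  ttsModelId: "eleven_multilingual_v2",
  voicevoxUrl: "http://localhost:50021",
  voicevoxSpeakerId: 1,
  kokoroUrl: "http://localhost:8880",
  kokoroVoice: "jf_alpha",
  vadSensitivity: "auto",
};

function loadSettings(raw: string): Settings {
  return { ...DEFAULTS, ...JSON.parse(raw) };
}
```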
