
Providers

Talkative Lobster supports multiple speech-to-text (STT) and text-to-speech (TTS) providers. Choose the combination that best fits your needs.

Speech-to-Text (STT)

| Provider | Type | Languages | API Key | Latency |
|---|---|---|---|---|
| ElevenLabs Scribe | Cloud | Multilingual | Required | Low |
| OpenAI Whisper | Cloud | Multilingual | Required | Low |
| whisper.cpp | Local | Multilingual | Not needed | Medium |

ElevenLabs Scribe

High-accuracy cloud STT using ElevenLabs' Scribe v2 model.

  • API key: Get one at elevenlabs.io
  • Model: scribe_v2 (fixed)
  • Timeout: 5 seconds
  • Best for: Production-quality multilingual transcription

OpenAI Whisper

OpenAI's cloud-hosted Whisper model (whisper-1).

  • API key: Get one at platform.openai.com
  • Timeout: 5 seconds
  • Best for: General-purpose transcription

whisper.cpp

Runs Whisper locally on your machine. No data leaves your device.

  • Binary: Requires the whisper-cli binary. Build it from whisper.cpp or install it via Homebrew: brew install whisper-cpp
  • Model: ggml-medium.bin, downloaded automatically to ~/.config/lobster/models/ on first use
  • Language: Hardcoded to Japanese (--language ja), even though the model itself is multilingual
  • Timeout: 60 seconds
  • Best for: Privacy-conscious use, offline operation
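
As a rough illustration of the settings above, the following sketch shows how a whisper-cli call with the documented model path, language flag, and 60-second timeout might look. The function names and the --no-timestamps flag are assumptions made for this example; the app's actual invocation is not documented here.

```python
import subprocess
from pathlib import Path

# Documented default model location.
MODEL_PATH = Path.home() / ".config" / "lobster" / "models" / "ggml-medium.bin"


def build_whisper_command(wav_path, model_path=MODEL_PATH, binary="whisper-cli"):
    """Build the argument list for a whisper-cli run (hypothetical helper)."""
    return [
        binary,
        "--model", str(model_path),
        "--file", str(wav_path),
        "--language", "ja",      # the documented hardcoded language
        "--no-timestamps",       # assumption: plain text output is wanted
    ]


def transcribe_locally(wav_path):
    """Run whisper-cli and return its stdout; raises on failure or after 60 s."""
    result = subprocess.run(
        build_whisper_command(wav_path),
        capture_output=True,
        text=True,
        timeout=60,   # the documented timeout
        check=True,
    )
    return result.stdout.strip()
```

subprocess.run raises subprocess.TimeoutExpired when the 60-second limit is hit, which a caller could treat the same way as any other provider failure.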

STT Fallback

If more than one STT provider is configured (with a valid API key where one is required), the app tries them in order (ElevenLabs → OpenAI → whisper.cpp) and uses the first successful result.
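
The fallback order can be sketched as a simple loop over provider callables. This is a minimal sketch, not the app's implementation: transcribe_with_fallback and the provider functions are hypothetical, and treating "any call that does not raise" as success is an assumption.

```python
from typing import Callable, Sequence, Tuple

# A provider is modelled as a function from audio bytes to text (assumption).
Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes,
    providers: Sequence[Tuple[str, Transcriber]],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, text) for the first success."""
    errors = []
    for name, transcribe in providers:
        try:
            text = transcribe(audio)
        except Exception as exc:  # timeouts, HTTP errors, missing binary, ...
            errors.append(f"{name}: {exc}")
            continue
        return name, text
    raise RuntimeError("all STT providers failed: " + "; ".join(errors))
```

A caller would pass only the providers that are actually configured, in the documented order, for example [("elevenlabs", ...), ("openai", ...), ("whisper.cpp", ...)].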


Text-to-Speech (TTS)

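| Provider | Type | Languages | API Key | Streaming |
|---|---|---|---|---|
| ElevenLabs | Cloud | Multilingual | Required | Yes |
| VOICEVOX | Local | Japanese | Not needed | No |
| Kokoro | Local | Japanese, English | Not needed | No |
| Piper | Local | Many | Not needed | No |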

ElevenLabs

High-quality cloud TTS with natural-sounding voices and real-time streaming.

  • API key: Same as ElevenLabs Scribe (ELEVENLABS_API_KEY)
  • Output format: PCM, 24 kHz, 16-bit mono (streamed)
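
Raw PCM carries no header, so a player cannot know the sample rate or bit depth unless told. The sketch below shows one way to wrap 24 kHz, 16-bit mono PCM in a WAV container and to work out its duration, using only the Python standard library. The helper names are made up for this example.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24_000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container (16-bit means 2 bytes per sample)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 24_000,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Duration = bytes / (samples per second * channels * bytes per sample)."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

At this format, one second of audio is 24,000 samples × 2 bytes = 48,000 bytes.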

Voices

| Voice | ID |
|---|---|
| Morioki | KnMBELSmBGHPqfZxMRw6 |
| Lily (default) | pFZP5JQG7iQjIQuC4Bku |
| Alice | Xb7hH8MSUJpSbSDYk0k2 |
| Matilda | XrExE9yKIg1WjnnlVkGX |
| Sarah | EXAVITQu4vr4xnSDxMaL |
| Daniel | onwK4e9ZLuTAKqWW03F9 |
| Brian | nPczCjzI2devNBz1zQrb |
| George | JBFqnCBsd6RMkjVDRZzb |
| Liam | TX3LPaxmHKxFdv7VOQHJ |

Models

| Model | Description |
|---|---|
| eleven_multilingual_v2 | Highest quality, multilingual (default) |
| eleven_turbo_v2_5 | Balanced quality and speed |
| eleven_flash_v2_5 | Fastest response time |
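
To show how the voice ID, model, and output format fit together, here is a hedged sketch of a streaming request against ElevenLabs' public text-to-speech endpoint. The endpoint path, the xi-api-key header, and the output_format=pcm_24000 query parameter reflect the public API as understood at the time of writing; the helper names are hypothetical and the app's actual request may differ.

```python
import json
import urllib.request

DEFAULT_VOICE_ID = "pFZP5JQG7iQjIQuC4Bku"   # Lily (documented default)
DEFAULT_MODEL_ID = "eleven_multilingual_v2"  # documented default model


def build_tts_request(text, api_key, voice_id=DEFAULT_VOICE_ID,
                      model_id=DEFAULT_MODEL_ID):
    """Build a POST request for streamed 24 kHz PCM (hypothetical helper)."""
    url = (
        "https://api.elevenlabs.io/v1/text-to-speech/"
        f"{voice_id}/stream?output_format=pcm_24000"
    )
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def stream_pcm(request, chunk_size=4096):
    """Yield PCM chunks as they arrive, so playback can start early."""
    with urllib.request.urlopen(request) as response:
        while chunk := response.read(chunk_size):
            yield chunk
```

Switching to eleven_flash_v2_5 only changes the model_id field in the request body; the voice ID and output format stay the same.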

VOICEVOX

Free, open-source Japanese TTS engine. Runs as a local HTTP server.

  • Download: voicevox.hiroshiba.jp
  • Server URL: Default http://localhost:50021
  • Speaker ID: Integer (default: 1). See VOICEVOX docs for available speakers.
  • Process: Two steps: audio_query, then synthesis
  • Output format: WAV
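
The two-step process can be sketched as follows: the first call turns text into a query document, and the second call turns that document into WAV audio. The endpoint names and defaults come from the list above; the helper names and the timeouts are assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:50021"  # documented default server URL


def build_audio_query_request(text, speaker=1, base_url=BASE_URL):
    """Step 1: POST /audio_query with text and speaker as query parameters."""
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return urllib.request.Request(f"{base_url}/audio_query?{params}", method="POST")


def build_synthesis_request(query, speaker=1, base_url=BASE_URL):
    """Step 2: POST /synthesis with the query JSON from step 1 as the body."""
    params = urllib.parse.urlencode({"speaker": speaker})
    return urllib.request.Request(
        f"{base_url}/synthesis?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, speaker=1, base_url=BASE_URL):
    """Run both steps and return WAV bytes. Timeouts are illustrative."""
    with urllib.request.urlopen(
        build_audio_query_request(text, speaker, base_url), timeout=10
    ) as response:
        query = json.load(response)
    with urllib.request.urlopen(
        build_synthesis_request(query, speaker, base_url), timeout=30
    ) as response:
        return response.read()
```

Because the query is plain JSON, a caller could adjust its fields between the two steps before sending it to /synthesis.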

WARNING

VOICEVOX must be running before you start Talkative Lobster. The app connects to its HTTP API.

Kokoro

Lightweight local TTS supporting Japanese and English.

  • Server URL: Default http://localhost:8880
  • API: OpenAI-compatible (POST /v1/audio/speech)
  • Output format: MP3

Voices

| Voice | Language |
|---|---|
| jf_alpha (default) | Japanese |
| jf_gongitsune | Japanese |
| jf_nezumi | Japanese |
| jf_tebukuro | Japanese |
| jm_kumo | Japanese |
| af_heart | English |
| af_jadzia | English |
| af_jessica | English |
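
Because the Kokoro server exposes an OpenAI-compatible endpoint, a request can be sketched with the same JSON fields the OpenAI speech API uses. The "model": "kokoro" value and the helper names are assumptions made for this example; check your Kokoro server's documentation for the model name it expects.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8880"  # documented default server URL


def build_speech_request(text, voice="jf_alpha", base_url=BASE_URL):
    """Build a POST /v1/audio/speech request that asks for MP3 output."""
    payload = {
        "model": "kokoro",         # assumption: model name used by the server
        "input": text,
        "voice": voice,            # one of the voices in the table above
        "response_format": "mp3",  # documented output format
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice="jf_alpha", base_url=BASE_URL):
    """Send the request and return the MP3 bytes."""
    with urllib.request.urlopen(build_speech_request(text, voice, base_url)) as response:
        return response.read()
```

Choosing an English voice such as af_heart only changes the voice field; the endpoint and the rest of the payload are unchanged.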

Piper

Fast local TTS with broad language support. Runs entirely on your machine as a subprocess.

  • Binary: Download from Piper releases
  • Model: ONNX voice model file (.onnx)
  • Timeout: 30 seconds per synthesis
  • Output format: WAV

You need to set both the binary path and model path in Settings.
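
The sketch below shows how the two paths and the 30-second timeout might combine in a subprocess call. It follows the Piper command-line usage of piping text on stdin and writing a WAV file via --output_file; the helper names are hypothetical and the app's actual invocation may differ.

```python
import subprocess


def build_piper_command(binary_path, model_path, output_path):
    """Build the argument list for one Piper synthesis run (hypothetical helper)."""
    return [
        str(binary_path),
        "--model", str(model_path),         # the .onnx voice model
        "--output_file", str(output_path),  # WAV file to write
    ]


def synthesize(text, binary_path, model_path, output_path):
    """Feed text to Piper on stdin; raises on failure or after 30 s."""
    subprocess.run(
        build_piper_command(binary_path, model_path, output_path),
        input=text.encode("utf-8"),
        capture_output=True,
        check=True,
        timeout=30,  # the documented per-synthesis timeout
    )
    return output_path
```

Piper voice models usually ship with a matching .onnx.json config file, which is typically expected to sit next to the .onnx file.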
