
Architecture

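Talkative Lobster is an Electron app with a React renderer. The main process runs the voice pipeline (speech-to-text, the OpenClaw gateway connection, and text-to-speech synthesis). The renderer captures microphone audio, detects speech, plays synthesized audio, and displays the UI.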

Process Overview

┌─────────────────────────────────────────────────┐
│ Main Process                                    │
│                                                 │
│  orchestrator.ts ── coordinates all engines     │
│       │                                         │
│       ├── voice-machine.ts (xstate)             │
│       │     idle → listening → processing       │
│       │     → thinking → speaking               │
│       │                                         │
│       ├── stt-engine.ts                         │
│       │     ElevenLabs / Whisper / whisper.cpp  │
│       │                                         │
│       ├── tts/ (provider implementations)       │
│       │     ElevenLabs / VOICEVOX               │
│       │     / Kokoro / Piper                    │
│       │                                         │
│       └── openclaw-client.ts                    │
│             WebSocket → OpenClaw gateway        │
│                                                 │
└────────────── IPC (contextBridge) ──────────────┘
                        │
┌───────────────────────┴─────────────────────────┐
│ Renderer Process (React 19)                     │
│                                                 │
│  App.tsx                                        │
│    ├── VoiceView ── main conversation UI        │
│    │     └── Waveform ── audio visualization    │
│    └── SetupModal ── first-run configuration    │
│                                                 │
│  hooks/                                         │
│    ├── useVoiceState ── voice machine state     │
│    ├── useTtsPlayback ── audio playback         │
│    └── useKeys ── encrypted key management      │
│                                                 │
└─────────────────────────────────────────────────┘
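The diagram lists several interchangeable STT and TTS back ends. One way to keep the orchestrator independent of that choice is a registry of factories keyed by provider id, sketched below for TTS. The `TtsProvider` interface, the `TtsRegistry` class, and the lowercase provider ids are hypothetical; the actual `tts/` modules may be wired differently.

```typescript
// Provider ids taken from the diagram; the lowercase spellings are assumptions.
type TtsProviderId = "elevenlabs" | "voicevox" | "kokoro" | "piper";

// Hypothetical common contract that each tts/ implementation would satisfy.
interface TtsProvider {
  readonly id: TtsProviderId;
  synthesize(text: string): Promise<Uint8Array>;
}

type TtsFactory = () => TtsProvider;

class TtsRegistry {
  private readonly factories = new Map<TtsProviderId, TtsFactory>();

  // Called once per provider at startup.
  register(id: TtsProviderId, factory: TtsFactory): void {
    this.factories.set(id, factory);
  }

  // Builds the provider named in settings; fails loudly if it was never registered.
  create(id: TtsProviderId): TtsProvider {
    const factory = this.factories.get(id);
    if (!factory) throw new Error(`TTS provider not registered: ${id}`);
    return factory();
  }
}
```

Using factories rather than pre-built instances means a provider that needs a local model or a network client is only constructed when the user actually selects it.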

Data Flow

  1. Renderer captures microphone audio via @ricky0123/vad-web (Silero VAD)
  2. VAD detects speech start/end and sends audio chunks to the main process via IPC
  3. STT engine converts audio to text using the configured provider
  4. Orchestrator sends the transcribed text to the OpenClaw gateway via WebSocket
  5. OpenClaw streams LLM response tokens back
  6. Speech filter processes the response text for TTS
  7. TTS engine synthesizes audio from the filtered text
  8. Renderer plays synthesized audio via useTtsPlayback
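Steps 3 through 8 can be read as a single conversation turn driven by the orchestrator. The sketch below models that turn with injected, hypothetical interfaces (`SttEngine`, `GatewayClient`, `TtsEngine`) and a toy `filterForSpeech` that stands in for `speech-filter.ts`. It also assumes streamed tokens are buffered into sentences so TTS can start before the full reply arrives; whether the real orchestrator chunks this way, and which rules the real speech filter applies, are not stated above.

```typescript
// Hypothetical engine interfaces; the real modules' signatures may differ.
interface SttEngine {
  transcribe(audio: Float32Array): Promise<string>;
}
interface GatewayClient {
  // Yields LLM response tokens as they arrive from the gateway.
  streamReply(prompt: string): AsyncIterable<string>;
}
interface TtsEngine {
  synthesize(text: string): Promise<Uint8Array>;
}

// Toy stand-in for speech-filter.ts: removes markdown symbols a TTS voice would read aloud.
function filterForSpeech(text: string): string {
  return text.replace(/[*_`#>]/g, "").replace(/\s+/g, " ").trim();
}

// A sentence ends at ., ! or ? followed by whitespace (deliberately naive).
const SENTENCE_END = /[.!?]\s/;

// Steps 3-7 for one utterance. Each synthesized clip is passed to `play`,
// which stands in for step 8 (playback in the renderer).
async function runTurn(
  audio: Float32Array,
  stt: SttEngine,
  gateway: GatewayClient,
  tts: TtsEngine,
  play: (clip: Uint8Array) => void,
): Promise<string> {
  const prompt = await stt.transcribe(audio); // step 3

  const speak = async (chunk: string): Promise<void> => {
    const spoken = filterForSpeech(chunk); // step 6
    if (spoken.length > 0) play(await tts.synthesize(spoken)); // steps 7-8
  };

  let reply = "";
  let pending = "";
  for await (const token of gateway.streamReply(prompt)) { // steps 4-5
    reply += token;
    pending += token;
    // Hand each finished sentence to TTS without waiting for the full reply.
    let match = SENTENCE_END.exec(pending);
    while (match !== null) {
      const end = match.index + match[0].length;
      await speak(pending.slice(0, end));
      pending = pending.slice(end);
      match = SENTENCE_END.exec(pending);
    }
  }
  await speak(pending); // flush whatever remains when the stream ends
  return reply;
}
```

Sentence-level chunking trades a little naturalness at chunk boundaries for a much shorter delay before the first audio plays.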

Voice State Machine

The voice state machine (voice-machine.ts) is built with xstate v5 and manages the conversation lifecycle:

State        Description
idle         Waiting for user to speak
listening    VAD detected speech, recording audio
processing   STT converting speech to text
thinking     Waiting for LLM response from OpenClaw
speaking     TTS playing the AI response

Transitions happen automatically as each stage completes. While the machine is in the speaking state, the user can interrupt by starting to talk, which transitions it back to listening.
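The state table and the interrupt rule can be summarized as a transition function. The real module uses xstate v5; the dependency-free sketch below only mirrors the five states listed above, and the event names (`SPEECH_START`, `SPEECH_END`, `TRANSCRIPT`, `AUDIO_READY`, `PLAYBACK_DONE`) are hypothetical.

```typescript
type VoiceState = "idle" | "listening" | "processing" | "thinking" | "speaking";

// Hypothetical event names for illustration.
type VoiceEvent =
  | "SPEECH_START"   // VAD detected the user talking
  | "SPEECH_END"     // VAD detected the end of the utterance
  | "TRANSCRIPT"     // STT produced text
  | "AUDIO_READY"    // first synthesized audio is available
  | "PLAYBACK_DONE"; // TTS playback finished

// state -> event -> next state; a missing entry means the event is ignored.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { SPEECH_START: "listening" },
  listening: { SPEECH_END: "processing" },
  processing: { TRANSCRIPT: "thinking" },
  thinking: { AUDIO_READY: "speaking" },
  speaking: {
    PLAYBACK_DONE: "idle",
    SPEECH_START: "listening", // interruption: the user talks over the reply
  },
};

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}
```

In xstate v5 the same table maps onto a machine config passed to `createMachine`, with each inner object becoming a state's `on` map (for example `idle: { on: { SPEECH_START: 'listening' } }`).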

Key Modules

Module           Path                          Responsibility
Orchestrator     src/main/orchestrator.ts      Central IPC + engine coordination
Voice Machine    src/main/voice-machine.ts     xstate state machine for conversation flow
OpenClaw Client  src/main/openclaw-client.ts   WebSocket client for LLM gateway
STT Engine       src/main/stt-engine.ts        Multi-provider speech-to-text
Speech Filter    src/main/speech-filter.ts     Text processing before TTS
Keys             src/main/keys.ts              API key encryption (AES-256-CBC)
Settings Store   src/main/settings-store.ts    Settings persistence (JSON)
Piper TTS        src/main/tts/piper-tts.ts     Local TTS via Piper
VOICEVOX TTS     src/main/tts/voicevox-tts.ts  Japanese TTS via VOICEVOX
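The Keys module is listed as encrypting API keys with AES-256-CBC. A minimal sketch using Node's built-in `crypto` module follows. It assumes a 32-byte key is already available (how `keys.ts` obtains or stores that key is not described here), generates a fresh random 16-byte IV for every value, and stores the result as `ivHex:ciphertextHex`; that storage format and the function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";
import { Buffer } from "node:buffer";

const ALGORITHM = "aes-256-cbc";
const IV_LENGTH = 16; // AES block size in bytes

// Encrypts a UTF-8 string and returns "ivHex:ciphertextHex" (hypothetical format).
function encryptKey(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(IV_LENGTH); // never reuse an IV with the same key
  const cipher = createCipheriv(ALGORITHM, key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return `${iv.toString("hex")}:${ciphertext.toString("hex")}`;
}

// Reverses encryptKey; throws on malformed input or (usually) on a wrong key.
function decryptKey(stored: string, key: Buffer): string {
  const [ivHex, dataHex] = stored.split(":");
  if (!ivHex || !dataHex) throw new Error("malformed encrypted value");
  const decipher = createDecipheriv(ALGORITHM, key, Buffer.from(ivHex, "hex"));
  const plain = Buffer.concat([decipher.update(Buffer.from(dataHex, "hex")), decipher.final()]);
  return plain.toString("utf8");
}
```

CBC mode provides confidentiality but not integrity: a modified ciphertext may decrypt to garbage instead of failing. An authenticated mode such as AES-256-GCM, or an HMAC over the IV and ciphertext, would also detect tampering.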

Directory Structure

src/
  main/                 # Electron main process
    orchestrator.ts     #   Central IPC + engine coordination
    voice-machine.ts    #   xstate state machine
    openclaw-client.ts  #   WebSocket client for OpenClaw gateway
    stt-engine.ts       #   Multi-provider speech-to-text
    speech-filter.ts    #   Text processing before TTS
    keys.ts             #   API key encryption (AES-256-CBC)
    settings-store.ts   #   Settings persistence (JSON)
    tts/                #   TTS provider implementations
    __tests__/          #   Unit tests
  preload/              # contextBridge (window.lobster API)
  renderer/             # React 19 UI
    hooks/              #   useVoiceState, useTtsPlayback, etc.
    components/         #   VoiceView, SetupModal, Waveform
  shared/               # Types and IPC channel definitions
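The `preload/` and `shared/` entries describe how the renderer reaches the main process: shared channel definitions plus a `window.lobster` API exposed through contextBridge. The sketch below illustrates that shape. The channel names, the `LobsterApi` interface, and the injected `Transport` are hypothetical; the transport is passed in so the sketch runs without Electron, and the commented wiring shows how it might map onto `ipcRenderer` in a real preload script.

```typescript
// Hypothetical channel names; the real definitions in src/shared/ may differ.
const IPC = {
  getVoiceState: "voice:get-state",
  voiceStateChanged: "voice:state-changed",
  sendSpeechAudio: "voice:speech-audio",
} as const;

type VoiceState = "idle" | "listening" | "processing" | "thinking" | "speaking";

// Minimal transport abstraction over request/response and push-style IPC.
interface Transport {
  invoke(channel: string, ...args: unknown[]): Promise<unknown>;
  subscribe(channel: string, listener: (...args: unknown[]) => void): () => void;
}

// The narrow surface the renderer would see as window.lobster (hypothetical shape).
interface LobsterApi {
  getVoiceState(): Promise<VoiceState>;
  sendSpeechAudio(samples: Float32Array): Promise<void>;
  onVoiceStateChanged(listener: (state: VoiceState) => void): () => void;
}

function createLobsterApi(transport: Transport): LobsterApi {
  return {
    getVoiceState: async () => (await transport.invoke(IPC.getVoiceState)) as VoiceState,
    sendSpeechAudio: async (samples) => {
      await transport.invoke(IPC.sendSpeechAudio, samples);
    },
    onVoiceStateChanged: (listener) =>
      transport.subscribe(IPC.voiceStateChanged, (state) => listener(state as VoiceState)),
  };
}

// In a preload script the wiring might look like this (not runnable outside Electron):
//   contextBridge.exposeInMainWorld("lobster", createLobsterApi({
//     invoke: (channel, ...args) => ipcRenderer.invoke(channel, ...args),
//     subscribe: (channel, listener) => {
//       const handler = (_event: unknown, ...args: unknown[]) => listener(...args);
//       ipcRenderer.on(channel, handler);
//       return () => { ipcRenderer.removeListener(channel, handler); };
//     },
//   }));
```

Exposing a few purpose-built functions, rather than `ipcRenderer` itself, keeps the renderer from sending arbitrary messages to the main process.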
