
LiveKit Server Overview

AI Agent

The livekit-server component contains the Python AI agent workers that run inside LiveKit rooms. Each worker is an autonomous process that takes voice input from an ESP32 device, runs it through an LLM, and streams TTS audio back in real time. The MQTT gateway dispatches a worker when a device connects or switches mode.

Workers

| File | Agent Name | Port | Character / Mode |
| --- | --- | --- | --- |
| workers/cheeko_worker.py | cheeko-agent | 8081 | Main conversational companion (Cheeko) |
| workers/math_tutor_worker.py | math-tutor-agent | 8082 | Math Tutor game: arithmetic Q&A with Indian-themed stories |
| workers/riddle_solver_worker.py | riddle-solver-agent | 8085 | Riddle Solver game: riddles with hints |
| workers/word_ladder_worker.py | word-ladder-agent | 8086 | Word Ladder game: chain words by last/first letter |
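
The Word Ladder rule ("chain words by last/first letter") can be illustrated with a small check. This is a minimal sketch that assumes the rule means each new word must start with the last letter of the previous one; the function names are hypothetical and are not taken from word_ladder_worker.py.

```python
from typing import Sequence


def is_valid_link(previous_word: str, next_word: str) -> bool:
    """Return True if next_word starts with the last letter of previous_word.

    Assumed rule: comparison is case-insensitive and ignores surrounding spaces.
    """
    prev = previous_word.strip().lower()
    nxt = next_word.strip().lower()
    if not prev or not nxt:
        return False
    return nxt[0] == prev[-1]


def is_valid_chain(words: Sequence[str]) -> bool:
    """Check every adjacent pair of words in the chain."""
    return all(is_valid_link(a, b) for a, b in zip(words, words[1:]))
```

For example, `is_valid_chain(["cat", "tiger", "rabbit"])` passes because each word begins with the final letter of the word before it.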

Services (src/services/)

| File | Purpose |
| --- | --- |
| prompt_service.py | Fetches agent prompts from Manager API (/config/agent-prompt) or falls back to config.yaml; also fetches model config (/config/agent-models) and extracts TTS configuration |
| analytics_service.py | Sends session start/end, game attempts, streaks, and media playback events to Manager API analytics endpoints |
| elevenlabs_tts_service.py | Generates TTS audio via ElevenLabs API; used by cheeko_worker for rhyme/animal card playback |
| animal_audio_service.py | Resolves local MP3 animal sound files by name (e.g. "Cow" → cow.mp3) |
| rhyme_cache_service.py | Caches ElevenLabs-generated rhyme audio to S3 and notifies firmware via data channel |
| mem0_service.py | Interfaces with Mem0 for long-term memory search and injection during conversation |
| unified_audio_player.py | Plays audio streams through the LiveKit session (used by game workers) |
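
The fetch-then-fall-back behaviour described for prompt_service.py could look roughly like the sketch below. It is an illustration only: `resolve_prompt`, the `fetch_remote` callable (standing in for the HTTP call to /config/agent-prompt), and the `default_prompt` key are all hypothetical names, and the real service may structure this differently.

```python
from typing import Callable, Mapping, Optional


def resolve_prompt(
    fetch_remote: Callable[[], Optional[str]],
    local_config: Mapping[str, str],
    key: str = "default_prompt",  # hypothetical key in the parsed config.yaml
) -> str:
    """Prefer the remotely fetched prompt; fall back to the local config value."""
    try:
        prompt = fetch_remote()
    except (OSError, ValueError):
        # Network failures (OSError family) or malformed responses (ValueError)
        # are treated as "no remote prompt available".
        prompt = None
    if prompt and prompt.strip():
        return prompt
    return local_config[key]
```

Passing the fetcher in as a callable keeps the fallback logic testable without a running Manager API.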

External AI Providers

| Category | Provider / Package | Notes |
| --- | --- | --- |
| LLM + Voice (realtime) | Google Gemini (livekit-plugins-google) | gemini-2.5-flash-native-audio-preview-12-2025, voice Zephyr by default; all workers use google.realtime.RealtimeModel |
| Web Search | Google Search (google.tools.GoogleSearch) | Attached to cheeko-agent session only |
| TTS (pre-synthesized) | ElevenLabs (livekit-plugins-elevenlabs) | Used in cheeko_worker for session.say() playback of rhymes and animal descriptions |
| TTS (configurable) | Edge-TTS, OpenAI TTS, Groq TTS | Selected per device via Manager API model config; handled in prompt_service.extract_tts_config() |
| STT | Deepgram (livekit-plugins-deepgram) | Available as plugin; actual STT in production is handled natively by Gemini Realtime |
| Vector Search | Qdrant (qdrant-client) + sentence-transformers | For semantic content matching |
| Memory | Mem0 (mem0ai==1.0.0) | Long-term per-child memory storage and retrieval |
| VAD | Silero VAD (silero-vad==6.2.0) | Voice activity detection |
| Logging | Grafana Loki (python-logging-loki) | Centralized log shipping |
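
To make the per-device TTS selection more concrete, here is a hedged sketch of how a TTS choice might be pulled out of a model config payload. The payload shape (`{"tts": {"provider": ..., "voice": ...}}`), the provider identifiers, the `edge` default, and the function name are all assumptions for illustration; the actual prompt_service.extract_tts_config() may use a different schema.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Assumed provider identifiers; the real config may spell these differently.
SUPPORTED_PROVIDERS = {"edge", "openai", "groq", "elevenlabs"}


@dataclass(frozen=True)
class TTSConfig:
    provider: str
    voice: Optional[str] = None


def extract_tts_config_sketch(
    model_config: Mapping[str, Any],
    default_provider: str = "edge",  # assumed default, not confirmed by the source
) -> TTSConfig:
    """Pick a TTS provider and voice from a (hypothetical) model config dict."""
    tts = model_config.get("tts") or {}
    provider = str(tts.get("provider", default_provider)).lower()
    if provider not in SUPPORTED_PROVIDERS:
        # Unknown providers fall back to the default instead of failing the session.
        provider = default_provider
    return TTSConfig(provider=provider, voice=tts.get("voice"))
```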

Run Commands

```bash
cd main/livekit-server
pip install -r requirements.txt

# Main conversation agent
python workers/cheeko_worker.py dev

# Game workers
python workers/math_tutor_worker.py dev
python workers/riddle_solver_worker.py dev
python workers/word_ladder_worker.py dev

# Media API (music/story bots, separate FastAPI process)
python media_api.py
```

Each worker registers with the LiveKit SFU under its agent_name. The MQTT gateway dispatches jobs to the correct worker using the CHARACTER_AGENT_MAP in mqtt-gateway.js.
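
The dispatch lookup can be pictured as a simple map from character or mode to agent name. The real CHARACTER_AGENT_MAP lives in mqtt-gateway.js; the Python rendering below is only an illustration. The agent names come from the Workers table, while the character keys and the fallback to cheeko-agent are assumptions.

```python
# Hypothetical character keys; agent names match the Workers table above.
CHARACTER_AGENT_MAP = {
    "cheeko": "cheeko-agent",
    "math_tutor": "math-tutor-agent",
    "riddle_solver": "riddle-solver-agent",
    "word_ladder": "word-ladder-agent",
}


def agent_for(character: str, default: str = "cheeko-agent") -> str:
    """Return the agent name to dispatch for a character or mode.

    Falling back to the main companion for unknown characters is an assumption.
    """
    return CHARACTER_AGENT_MAP.get(character.strip().lower(), default)
```

A worker only receives jobs whose requested agent name matches the name it registered under, so the map and each worker's agent_name have to stay in sync.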

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| GOOGLE_API_KEY | Yes | Gemini Realtime API key (can also be set in config.yaml) |
| MANAGER_API_URL | Yes | Base URL of Manager API (e.g. http://localhost:8002/toy) |
| MANAGER_API_SECRET | Yes | Bearer token for Manager API authentication |
| LIVEKIT_URL | Yes | LiveKit server WebSocket URL |
| LIVEKIT_API_KEY | Yes | LiveKit API key |
| LIVEKIT_API_SECRET | Yes | LiveKit API secret |
| ELEVENLABS_API_KEY | Yes* | ElevenLabs API key (*required if using ElevenLabs TTS) |
| ELEVENLABS_VOICE_ID | No | Defaults to ecp3DWciuUyW7BYM7II1 |
| MEM0_API_KEY | No | Mem0 memory service API key |
| QDRANT_URL | No | Qdrant cluster URL |
| QDRANT_API_KEY | No | Qdrant API key |
| CHEEKO_PORT | No | Override port for cheeko-agent (default 8081) |
| MATH_TUTOR_PORT | No | Override port for math-tutor-agent (default 8082) |
| RIDDLE_SOLVER_PORT | No | Override port for riddle-solver-agent (default 8085) |
| WORD_LADDER_PORT | No | Override port for word-ladder-agent (default 8086) |
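
A startup check along the following lines can catch missing variables before a worker tries to connect. This is a sketch, not the project's actual startup code: the helper names are hypothetical, and the variable names and default ports are taken from the table above.

```python
import os
from typing import List, Mapping

# Variables marked "Yes" in the table. GOOGLE_API_KEY may instead come from
# config.yaml, so a real check might relax that entry.
REQUIRED_VARS = (
    "GOOGLE_API_KEY",
    "MANAGER_API_URL",
    "MANAGER_API_SECRET",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)

DEFAULT_PORTS = {
    "CHEEKO_PORT": 8081,
    "MATH_TUTOR_PORT": 8082,
    "RIDDLE_SOLVER_PORT": 8085,
    "WORD_LADDER_PORT": 8086,
}


def missing_required(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def resolve_port(var_name: str, env: Mapping[str, str] = os.environ) -> int:
    """Return the port override if set, otherwise the documented default."""
    raw = env.get(var_name)
    return int(raw) if raw else DEFAULT_PORTS[var_name]
```

Calling `missing_required()` at the top of a worker's entry point and exiting with the list of missing names gives a clearer failure than a later authentication error.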