Architecture

System Diagram

┌─────────────────────────────────────────────────────────────────────┐
│  ESP32 Device (Firmware)                                            │
│  - State machine (idle/connecting/listening/speaking/...)           │
│  - MQTT client                                                      │
│  - UDP socket (AES-128-CTR encrypted Opus audio)                    │
│  - SD card (RFID skill cache)                                       │
│  - RFID reader                                                      │
└──────────────┬──────────────────────────────┬───────────────────────┘
               │ MQTT (publish/subscribe)     │ UDP (audio packets)
               ▼                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  MQTT/UDP Gateway (Node.js)                                         │
│  main/mqtt-gateway/                                                 │
│  - EMQX MQTT broker bridge                                          │
│  - UDP server (AES-128-CTR encrypted Opus audio)                    │
│  - VirtualMQTTConnection per device                                 │
│  - LiveKitBridge (per device session)                               │
│  - Calls Manager API for device config/RFID lookups                 │
└──────────────┬──────────────────────────────┬───────────────────────┘
               │ REST HTTP                    │ LiveKit SDK (WebRTC)
               ▼                              ▼
┌─────────────────────────┐      ┌────────────────────────────────────┐
│  Manager API (Node.js)  │      │  LiveKit Cloud + livekit-server    │
│  main/manager-api-node  │      │  workers/cheeko_worker.py          │
│  - Device registry      │      │  workers/math_tutor_worker.py      │
│  - OTA check/activate   │      │  workers/riddle_solver_worker.py   │
│  - RFID card lookup     │      │  workers/word_ladder_worker.py     │
│  - Agent config/prompts │      └────────────────────────────────────┘
│  - Content manifest     │
│  - Child profiles       │
│  - Analytics            │
└─────────────────────────┘

Every device session starts with an OTA check against the Manager API, then moves to the MQTT/UDP Gateway for the real-time protocol (MQTT for control messages, UDP for audio).

Service Port Map

| Service | Language | Port | Base Path | Notes |
|---|---|---|---|---|
| manager-api-node | Node.js / Express | 8002 | /toy | Active implementation |
| mqtt-gateway | Node.js | | | MQTT + UDP bridge |
| livekit-server | Python | | | LiveKit agent workers |
| manager-web | Vue.js | | | Admin dashboard |
| MQTT broker (EMQX) | | 1883 | | Device MQTT endpoint |
| Swagger / API Docs | | 8002 | /toy/doc.html | OpenAPI UI |
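
As a small illustration of how the port and base path in the table combine, the sketch below builds Manager API URLs. The host name and the plain `http` scheme are assumptions for local use, and `manager_api_url` is a hypothetical helper, not part of the codebase.

```python
from urllib.parse import urljoin

# Values taken from the port map above.
MANAGER_API_PORT = 8002
MANAGER_API_BASE_PATH = "/toy"
MQTT_BROKER_PORT = 1883


def manager_api_url(host: str, endpoint: str) -> str:
    """Build a Manager API URL under the /toy base path.

    Assumption: plain HTTP; a real deployment may sit behind TLS.
    """
    base = f"http://{host}:{MANAGER_API_PORT}{MANAGER_API_BASE_PATH}/"
    # Strip a leading slash so urljoin keeps the /toy prefix.
    return urljoin(base, endpoint.lstrip("/"))


if __name__ == "__main__":
    print(manager_api_url("localhost", "ota/"))
    print(manager_api_url("localhost", "doc.html"))
```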

Boot-to-Conversation Flow: 8 Phases

| Phase | Name | Description |
|---|---|---|
| 1 | OTA Check | Device POSTs to /toy/ota/ and gets MQTT credentials, firmware info, activation status, server time |
| 2 | Activation Loop | If not activated, device polls POST /toy/ota/activate until it receives 200 success |
| 3 | MQTT Connect + Hello | Device connects to EMQX broker and publishes {"type":"hello"}; gateway responds with server hello containing UDP credentials |
| 4 | UDP Channel Setup | Device opens UDP socket to the address in server hello; configures AES-128-CTR cipher |
| 5 | Mode Update (Deferred) | Gateway queries Manager API for device mode, character, and child profile; sends {"type":"mode_update"} to device |
| 6 | Conversation Loop | Bidirectional voice: device streams mic audio uplink (UDP), gateway streams TTS audio downlink (UDP); MQTT carries control messages |
| 7 | Abort / Interrupt | Device sends {"type":"abort"} to interrupt the assistant mid-speech; gateway stops TTS and waits for next listen |
| 8 | Session End | Either side sends {"type":"goodbye"}; device closes UDP channel and returns to idle without disconnecting MQTT |

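
Phases 1 to 3 reduce to a small decision on the OTA response. The sketch below is illustrative only: the field names `firmware_update_available` and `activated` are assumptions, since this page does not specify the response schema. The firmware-update branch follows the boot flow described in this section.

```python
from enum import Enum


class NextStep(Enum):
    UPDATE_FIRMWARE = "update_firmware"  # download firmware, then reboot
    ACTIVATE = "activate"                # Phase 2: poll POST /toy/ota/activate
    CONNECT_MQTT = "connect_mqtt"        # Phase 3: connect with OTA credentials


def next_boot_step(ota_response: dict) -> NextStep:
    """Pick the next boot step from a (hypothetical) OTA response dict."""
    # Assumed field name; the real schema is not documented here.
    if ota_response.get("firmware_update_available", False):
        return NextStep.UPDATE_FIRMWARE
    # Assumed field name for the activation status.
    if not ota_response.get("activated", False):
        return NextStep.ACTIVATE
    return NextStep.CONNECT_MQTT


if __name__ == "__main__":
    print(next_boot_step({"activated": True}))
```
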
Device Boot


[Phase 1] POST /toy/ota/
│ ← Returns: mqtt creds, firmware info, activation status, server_time

├─ If firmware update available → download firmware → reboot

├─ If not activated → [Phase 2] POST /toy/ota/activate (loop)


[Phase 3] MQTT CONNECT (using OTA credentials)


Firmware publishes: {"type":"hello", ...} → Gateway


[Phase 4] Gateway returns: {"type":"hello", "udp":{server,port,key,nonce,...}, ...}


Firmware opens UDP socket to server:port


[Phase 5] Gateway sends (deferred): {"type":"mode_update", ...}
│ (after querying Manager API for device mode/character/profile)


[Phase 6] Conversation loop begins

├─ Firmware: {"type":"listen","state":"start","mode":"auto|manual|realtime"}
├─ Firmware→Gateway: UDP encrypted Opus audio packets (uplink)
├─ Firmware: {"type":"speech_end"}
├─ Gateway→Firmware: {"type":"llm","state":"think"}
├─ Gateway→Firmware: {"type":"tts","state":"start"}
├─ Gateway→Firmware: UDP encrypted Opus audio packets (downlink, 24kHz)
├─ Gateway→Firmware: {"type":"stt","text":"..."} (user transcript)
├─ Gateway→Firmware: {"type":"llm","text":"...","emotion":"..."}
└─ Gateway→Firmware: {"type":"tts","state":"stop"}
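
The control messages in the conversation loop are small JSON objects. As a sketch of how either side might build and inspect them, the helpers below use only the fields shown in the flow above. The function names are hypothetical, and real messages may carry additional fields that this page elides.

```python
import json

# Listen modes as they appear in the flow above.
LISTEN_MODES = {"auto", "manual", "realtime"}


def listen_start(mode: str) -> str:
    """Serialize the firmware's listen-start control message."""
    if mode not in LISTEN_MODES:
        raise ValueError(f"unknown listen mode: {mode!r}")
    return json.dumps({"type": "listen", "state": "start", "mode": mode})


def is_tts_stop(raw: str) -> bool:
    """Return True if a gateway message marks the end of TTS playback."""
    msg = json.loads(raw)
    return msg.get("type") == "tts" and msg.get("state") == "stop"


if __name__ == "__main__":
    print(listen_start("auto"))
    print(is_tts_stop('{"type":"tts","state":"stop"}'))
```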

Component Responsibilities

ESP32 Firmware

  • Implements the device state machine (starting / activating / idle / connecting / listening / speaking)
  • Manages MQTT connection lifecycle and publishes/subscribes to control topics
  • Sends mic audio as AES-128-CTR encrypted Opus frames over UDP (16kHz uplink)
  • Plays TTS audio received over UDP (24kHz downlink)
  • Handles RFID card tap events; maintains local SD card cache of content skills
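
To make the state machine concrete, here is a minimal sketch using the six states named above. The transition table is an assumption inferred from the boot flow (for example, that a session returns to idle on goodbye); the firmware's actual transitions are not listed on this page.

```python
from enum import Enum


class DeviceState(Enum):
    STARTING = "starting"
    ACTIVATING = "activating"
    IDLE = "idle"
    CONNECTING = "connecting"
    LISTENING = "listening"
    SPEAKING = "speaking"


# Hypothetical transition table, inferred from the boot-to-conversation flow.
ALLOWED = {
    DeviceState.STARTING: {DeviceState.ACTIVATING, DeviceState.IDLE},
    DeviceState.ACTIVATING: {DeviceState.IDLE},
    DeviceState.IDLE: {DeviceState.CONNECTING},
    DeviceState.CONNECTING: {DeviceState.LISTENING, DeviceState.IDLE},
    DeviceState.LISTENING: {DeviceState.SPEAKING, DeviceState.IDLE},
    DeviceState.SPEAKING: {DeviceState.LISTENING, DeviceState.IDLE},
}


def transition(current: DeviceState, target: DeviceState) -> DeviceState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


if __name__ == "__main__":
    state = DeviceState.IDLE
    state = transition(state, DeviceState.CONNECTING)
    print(state.value)
```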

mqtt-gateway

The gateway is the real-time protocol hub. It is organized into layers under main/mqtt-gateway/:

| Layer | Directory | Purpose |
|---|---|---|
| Protocol handlers | gateway/ | MQTT/UDP handlers: mqtt-gateway.js, udp-server.js, emqx-broker.js |
| LiveKit integration | livekit/ | livekit-bridge.js, audio-processor.js, mcp-handler.js |
| Shared utilities | core/ | opus-initializer.js, worker-pool-manager.js |
| Config / logging | utils/ | Logging, config management |

For each device session the gateway maintains a VirtualMQTTConnection and a LiveKitBridge. It resamples uplink audio from 16kHz to 24kHz before forwarding to LiveKit.
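
Going from 16kHz to 24kHz means producing three output samples for every two input samples. The sketch below shows the idea with plain linear interpolation on decoded PCM samples. Linear interpolation is a stand-in chosen for brevity; this page does not say which resampling method the gateway uses, and `resample_linear` is a hypothetical name.

```python
from typing import List, Sequence


def resample_linear(
    samples: Sequence[int], src_rate: int = 16000, dst_rate: int = 24000
) -> List[int]:
    """Resample PCM samples by linear interpolation (illustrative only)."""
    if not samples:
        return []
    out_len = len(samples) * dst_rate // src_rate
    out: List[int] = []
    for i in range(out_len):
        # Position of output sample i on the input time axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


if __name__ == "__main__":
    # 320 input samples become 480 output samples (ratio 3:2).
    print(len(resample_linear([0] * 320)))
```

A production resampler would normally apply a low-pass filter as well; this version only demonstrates the rate conversion arithmetic.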

manager-api-node

REST API serving both the gateway (config lookups) and the firmware (OTA). Base path /toy. Modules:

| Module | Path | Role |
|---|---|---|
| agent | src/routes/agent.routes.js | Agent config and prompts per MAC |
| device | src/routes/device.routes.js | Device registry, mode, character |
| content | src/routes/content.routes.js | Music, stories, textbooks |
| rfid | src/routes/rfid.routes.js | RFID card lookup and content manifest |
| security / auth | src/routes/auth.routes.js | User authentication (Supabase Auth) |
| analytics | src/routes/analytics.routes.js | Game sessions, media playback, usage stats |
| profile | src/routes/profile.routes.js | Child profiles (mobile API) |

livekit-server

Python-based LiveKit agent workers. Each worker handles a specific mode:

| Worker | Character | Triggered by |
|---|---|---|
| cheeko_worker.py | Cheeko | Default / conversation mode |
| math_tutor_worker.py | Math Tutor | Character set to Math Tutor |
| riddle_solver_worker.py | Riddle Solver | Character set to Riddle Solver |
| word_ladder_worker.py | Word Ladder | Character set to Word Ladder |

The gateway resolves the active character (from Manager API) and dispatches the corresponding agent worker into the LiveKit room.
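
The character-to-worker resolution can be pictured as a lookup with a default. The mapping below comes from the worker table in this section; the fallback to the Cheeko worker for unknown or missing characters is an assumption based on its "Default" label, and `resolve_worker` is a hypothetical helper rather than the gateway's actual dispatch code.

```python
from typing import Optional

DEFAULT_WORKER = "cheeko_worker.py"

# Character name -> worker file, as listed in the worker table.
WORKER_BY_CHARACTER = {
    "Cheeko": "cheeko_worker.py",
    "Math Tutor": "math_tutor_worker.py",
    "Riddle Solver": "riddle_solver_worker.py",
    "Word Ladder": "word_ladder_worker.py",
}


def resolve_worker(character: Optional[str]) -> str:
    """Map the active character to a worker file name.

    Assumption: unknown or missing characters fall back to the default.
    """
    if character is None:
        return DEFAULT_WORKER
    return WORKER_BY_CHARACTER.get(character.strip(), DEFAULT_WORKER)


if __name__ == "__main__":
    print(resolve_worker("Math Tutor"))
    print(resolve_worker(None))
```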

External Services

| Service | Purpose |
|---|---|
| LiveKit Cloud | Real-time voice/video WebRTC infrastructure |
| Groq / Google | LLM providers for conversation |
| ElevenLabs / Edge-TTS | Text-to-speech synthesis |
| Deepgram / Whisper | Speech-to-text transcription |
| Qdrant | Vector search for semantic content matching |
| Mem0 | Memory and personalization across sessions |
| Grafana Loki | Centralized log aggregation |
| Supabase | PostgreSQL database and auth for manager-api-node |
| EMQX | MQTT broker for device-to-gateway messaging |

CI/CD

CircleCI (.circleci/config.yml) handles branch-specific deployments, Docker builds for each component, and EMQX broker deployment.