Architecture
Layers
VoxCore
Shared runtime types and utilities:
- runtime metadata
- transcription and synthesis metrics
- performance samples
- filesystem paths
- trace utilities
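As a rough illustration of what a tagged performance sample could look like from a consumer's point of view, the sketch below sums durations per stage. The runtime types live in Swift and are not shown here, so the `PerformanceSample` fields (`tag`, `stage`, `durationMs`) and the stage names are assumptions made only for this example.

```typescript
// Hypothetical shape; the real VoxCore sample type may carry different fields.
interface PerformanceSample {
  tag: string;        // assumed tag, e.g. "asr" or "tts"
  stage: string;      // assumed stage name, e.g. "audioPrep" or "inference"
  durationMs: number; // wall-clock duration of the stage in milliseconds
}

// Sum the recorded duration for each stage across all samples.
function totalByStage(samples: PerformanceSample[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const sample of samples) {
    totals.set(sample.stage, (totals.get(sample.stage) ?? 0) + sample.durationMs);
  }
  return totals;
}

const samples: PerformanceSample[] = [
  { tag: "asr", stage: "audioPrep", durationMs: 12 },
  { tag: "asr", stage: "inference", durationMs: 80 },
  { tag: "asr", stage: "audioPrep", durationMs: 8 },
];

const totals = totalByStage(samples);
```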
VoxEngine
Model-facing speech layer:
- model installation and preload
- ASR provider routing and audio preparation
- annotation provider routing and speaker-attribution contracts
- TTS provider routing and voice discovery
- Parakeet inference
- AVSpeech, OpenAI, and external synthesis backends
- stage-level timing
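Provider routing can be pictured as a registry lookup with a per-kind default. The following is a minimal sketch of that idea, not the VoxEngine implementation: the `Provider` shape, the `ProviderRouter` class, the "first registered wins" default rule, and the provider ids are all hypothetical.

```typescript
// Hypothetical provider shape; the real VoxEngine contracts are not shown in this document.
interface Provider {
  id: string;
  kind: "asr" | "annotation" | "tts";
}

class ProviderRouter {
  private readonly providers = new Map<string, Provider>();
  private readonly defaults = new Map<Provider["kind"], string>();

  // Register a provider; the first one registered for a kind becomes its default (an assumed rule).
  register(provider: Provider): void {
    this.providers.set(provider.id, provider);
    if (!this.defaults.has(provider.kind)) {
      this.defaults.set(provider.kind, provider.id);
    }
  }

  // Resolve an explicitly requested id, or fall back to the default for that kind.
  resolve(kind: Provider["kind"], requestedId?: string): Provider {
    const id = requestedId ?? this.defaults.get(kind);
    const provider = id === undefined ? undefined : this.providers.get(id);
    if (provider === undefined || provider.kind !== kind) {
      throw new Error(`no ${kind} provider for id: ${String(id)}`);
    }
    return provider;
  }
}

const router = new ProviderRouter();
router.register({ id: "parakeet", kind: "asr" });
router.register({ id: "avspeech", kind: "tts" });
router.register({ id: "openai", kind: "tts" });

const defaultTts = router.resolve("tts");
const explicitTts = router.resolve("tts", "openai");
```

Checking the provider's kind during resolution keeps a TTS id from being routed to an ASR request by mistake.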
VoxService
Daemon-side orchestration:
- JSON-RPC bridge
- annotation route dispatch
- live session coordination
- synthesis session coordination
- microphone recording
- warm-up scheduling
- performance sample recording
TypeScript SDK
@voxd/sdk: health, models, voices, warm-up, file transcription, synthesis, live sessions, metrics parsing.
Browser SDK
@voxd/client: probe, transcribe, align, live sessions over the HTTP bridge.
Companion bridge
VoxBridge / voxbridge: browser-facing HTTP bridge that proxies into the companion daemon while keeping the browser surface narrower than the underlying WebSocket RPC runtime.
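One way to keep the browser surface narrower than the RPC runtime is an explicit allowlist that maps each exposed HTTP route to a single RPC method and rejects everything else. The sketch below assumes that design. The route paths and RPC method names are invented for illustration and are not taken from VoxBridge.

```typescript
// Hypothetical allowlist: "<HTTP verb> <path>" -> RPC method name.
const exposedRoutes = new Map<string, string>([
  ["GET /probe", "health.probe"],
  ["POST /transcribe", "transcribe.file"],
  ["POST /align", "align.run"],
]);

// Return the RPC method for an HTTP request, or undefined if the route is not exposed.
function toRpcMethod(httpVerb: string, path: string): string | undefined {
  return exposedRoutes.get(`${httpVerb.toUpperCase()} ${path}`);
}

const allowed = toRpcMethod("post", "/transcribe");
const blocked = toRpcMethod("POST", "/models/install");
```

Because the mapping is closed, any RPC method without an entry stays unreachable from the browser, even though the daemon itself supports it.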
CLI
@voxd/cli: operator tool. Doctor, daemon lifecycle, model management, voices, transcription, synthesis, benchmarks, dashboards.
Ownership
| Surface | Owns |
|---|---|
| Swift runtime | Daemon lifecycle, audio prep, model lifecycle, provider routing, transcription, annotation, synthesis, perf recording |
| TypeScript SDK | Connection lifecycle, typed request/response shapes, live-session ergonomics, transcription and synthesis metric parsing |
| Browser SDK | Companion discovery, audio upload, job polling, live sessions over HTTP bridge |
| CLI | Operator commands, terminal output (human and machine), warm-up controls, transcription, synthesis, dashboards |
| Site and docs | Architecture docs, onboarding, OG images, landing page |
Data flow
- Client creates a connection with a stable `clientId`
- CLI or SDK issues JSON-RPC to `voxd`, while browser clients reach the same runtime through `VoxBridge`
- `VoxService` coordinates model state and route dispatch
- `VoxEngine` prepares ASR input, annotation input, or TTS requests and dispatches them to the selected provider
- `VoxCore` types and trace utilities shape the result
- Runtime appends tagged performance samples for local inspection
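The first two steps can be sketched as a small request builder that stamps every call with the same client identifier. The envelope fields (`jsonrpc`, `id`, `method`, `params`) follow JSON-RPC 2.0. Carrying `clientId` inside `params`, the method name `transcribe.file`, and the sample values are assumptions for illustration, not the documented wire format.

```typescript
// JSON-RPC 2.0 request envelope.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

// Returns a builder that reuses one stable clientId and increments the request id per call.
function createRequestBuilder(clientId: string) {
  let nextId = 1;
  return function build<P extends object>(
    method: string,
    params: P,
  ): JsonRpcRequest<P & { clientId: string }> {
    // Assumption: the daemon reads clientId from params.
    return { jsonrpc: "2.0", id: nextId++, method, params: { ...params, clientId } };
  };
}

const build = createRequestBuilder("cli-example");
const first = build("transcribe.file", { path: "/tmp/sample.wav" });
const second = build("transcribe.file", { path: "/tmp/other.wav" });
const wire = JSON.stringify(first);
```

Keeping the identifier in one closure means a reconnecting client can present the same `clientId` without threading it through every call site.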