Runtime
Core Flow
- A companion-connected client connects to voxd over local WebSocket JSON-RPC, or reaches it indirectly through the local HTTP bridge.
- The runtime resolves health, model state, and optional warm-up state.
- The client triggers file transcription, file annotation, a live ASR session, one-shot synthesis, or a synthesis session.
- VoxEngine resolves the right ASR, annotation, or TTS provider and returns transcript text, speaker attribution, word timings, or WAV bytes, plus stage metrics.
- The runtime records a tagged performance sample to ~/.vox/performance.jsonl.
- The daemon appends operational logs to ~/.vox/logs/voxd.log.
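The flow above depends on matching each reply to the request that produced it over a single socket. The sketch below shows one way a client could do that. It assumes voxd uses standard JSON-RPC 2.0 framing (jsonrpc, id, method, params), but this page only says "JSON-RPC", so the exact envelope is an assumption. The RpcCorrelator class is hypothetical and is not part of @voxd/sdk.

```typescript
// Assumption: voxd frames messages as JSON-RPC 2.0. RpcCorrelator is a hypothetical helper.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

type ReplyHandler = (response: RpcResponse) => void;

class RpcCorrelator {
  private nextId = 1;
  private pending: { [id: number]: ReplyHandler } = {};

  // Build a request frame and remember which handler is waiting for its reply.
  encode(method: string, params: unknown, onReply: ReplyHandler): string {
    const request: RpcRequest = { jsonrpc: "2.0", id: this.nextId++, method, params };
    this.pending[request.id] = onReply;
    return JSON.stringify(request);
  }

  // Route an incoming frame to the handler with the matching id.
  // Returns false for frames that answer no pending request (for example, notifications).
  dispatch(frame: string): boolean {
    const message = JSON.parse(frame) as Partial<RpcResponse>;
    if (typeof message.id !== "number") return false;
    const handler = this.pending[message.id];
    if (handler === undefined) return false;
    delete this.pending[message.id];
    handler(message as RpcResponse);
    return true;
  }

  pendingCount(): number {
    return Object.keys(this.pending).length;
  }
}
```

To use it with a WebSocket, pass each encoded frame to socket.send and each incoming message's data string to dispatch. The correlator itself does not depend on any transport.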
Warm-Up
Warm-up is a public API, not a hidden side effect. It applies to both ASR and TTS models.
- warmup.status: check whether the model is hot
- warmup.start: warm immediately
- warmup.schedule: warm after a delay
Typical pattern: create a VoxClient with a stable clientId, warm when the user opens a voice affordance, then transcribe or synthesize once the model is ready.
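The typical pattern can be sketched as two small functions over an injected call function, which keeps the logic independent of any particular transport. Several details here are assumptions for illustration: the Rpc type, the WarmupStatus shape with state values "cold", "warming", and "ready", and the modelId and delayMs parameter names. Check the real warmup.status response before relying on them.

```typescript
// Hypothetical transport: sends one RPC and resolves with its result.
type Rpc = (method: string, params: object) => Promise<unknown>;

// Assumed shape of a warmup.status result; the real payload may differ.
interface WarmupStatus {
  modelId: string;
  state: "cold" | "warming" | "ready";
}

// Call when the user opens a voice affordance: warm immediately, but only if the model is cold.
// Returns the state the client should assume afterwards.
async function warmOnOpen(call: Rpc, modelId: string): Promise<WarmupStatus["state"]> {
  const status = (await call("warmup.status", { modelId })) as WarmupStatus;
  if (status.state === "cold") {
    await call("warmup.start", { modelId });
    return "warming";
  }
  return status.state;
}

// Call at app launch: ask the runtime to warm later instead of competing with startup work.
async function warmLater(call: Rpc, modelId: string, delayMs: number): Promise<void> {
  await call("warmup.schedule", { modelId, delayMs });
}
```

Once warmup.status reports the model as ready, the client can call transcribe.file or synthesize.generate without waiting on a cold model.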
File Transcription
transcribe.file is best for benchmarks because it takes mic capture out of the measurement. It returns transcript text, word-level timestamps, the modelId, elapsed time, and stage metrics.
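For benchmarking, the fields returned by transcribe.file are enough to compute a real-time factor and to sanity-check word timings. The interface below is a guess at the response shape. The field names (text, words, elapsedMs, metrics) and the use of seconds for word times are assumptions, and the caller must supply the audio duration.

```typescript
// Assumed response shape for transcribe.file; field names are illustrative.
interface WordTiming {
  word: string;
  start: number; // seconds from the start of the file (assumed unit)
  end: number; // seconds from the start of the file (assumed unit)
}

interface TranscribeFileResult {
  text: string;
  words: WordTiming[];
  modelId: string;
  elapsedMs: number;
  metrics: { [stage: string]: number };
}

// Real-time factor: processing time divided by audio duration.
// Values below 1.0 mean the file was transcribed faster than real time.
function realTimeFactor(result: TranscribeFileResult, audioSeconds: number): number {
  if (audioSeconds <= 0) throw new Error("audioSeconds must be positive");
  return result.elapsedMs / 1000 / audioSeconds;
}

// Benchmark sanity check: each word ends after it starts, and consecutive words do not overlap.
function timingsAreOrdered(words: WordTiming[]): boolean {
  for (let i = 0; i < words.length; i++) {
    if (words[i].end < words[i].start) return false;
    if (i > 0 && words[i].start < words[i - 1].end) return false;
  }
  return true;
}
```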
File Annotation
annotate.file is the file-first speaker annotation route. It accepts an audio path plus optional transcript text and word timings, and returns speaker segments, speaker-attributed words, and annotation metrics.
The route is already scaffolded so that evaluation harnesses and future diarization backends can share one contract, even before the default annotation backend lands.
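To make the contract concrete, the sketch below models the annotate.file inputs. It also shows one simple way to turn speaker segments into speaker-attributed words: assign each word to the segment it overlaps most. The field names are assumptions. Maximum-overlap assignment is an illustrative technique, not necessarily what the default annotation backend will do.

```typescript
// Hypothetical request shape: an audio path plus optional transcript text and word timings.
interface TimedWord {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

interface AnnotateFileParams {
  path: string;
  transcript?: string;
  words?: TimedWord[];
}

interface SpeakerSegment {
  speaker: string;
  start: number;
  end: number;
}

interface AttributedWord extends TimedWord {
  speaker: string | null; // null when the word overlaps no segment
}

// Length of the intersection of two time intervals, or 0 if they do not overlap.
function overlapSeconds(aStart: number, aEnd: number, bStart: number, bEnd: number): number {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

// Assign each word to the speaker whose segment covers most of it.
function attributeWords(words: TimedWord[], segments: SpeakerSegment[]): AttributedWord[] {
  return words.map((w) => {
    let speaker: string | null = null;
    let best = 0;
    for (const s of segments) {
      const o = overlapSeconds(w.start, w.end, s.start, s.end);
      if (o > best) {
        best = o;
        speaker = s.speaker;
      }
    }
    return { word: w.word, start: w.start, end: w.end, speaker };
  });
}
```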
Synthesis
synthesize.voices lists available voices for a TTS model.
synthesize.generate returns:
- modelId
- voiceId
- format
- contentType
- base64-encoded audio bytes
- elapsed time
- synthesis metrics
Longer-running output flows use:
- synthesize.startSession
- synthesize.sessionStatus
- synthesize.cancel
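Because synthesize.generate returns audio as base64, a client has to decode it before playing or saving it. The sketch below decodes base64 without relying on platform globals such as atob or Buffer. It then checks for the RIFF/WAVE header that WAV files start with. The result interface is an assumption: the field that carries the base64 string (audio here), and the elapsedMs and metrics names, are guesses.

```typescript
// Assumed result shape; only modelId, voiceId, format, and contentType are named on this page.
interface SynthesizeGenerateResult {
  modelId: string;
  voiceId: string;
  format: string;
  contentType: string;
  audio: string; // hypothetical field holding the base64-encoded bytes
  elapsedMs: number;
  metrics: { [stage: string]: number };
}

const BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 decode: accumulate 6 bits per character, emit a byte whenever 8 are available.
function decodeBase64(input: string): Uint8Array {
  const clean = input.replace(/=+$/, "");
  const bytes: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value < 0) throw new Error(`invalid base64 character at index ${i}`);
    buffer = ((buffer << 6) | value) & 0xffff; // only the low bits are ever needed
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
    }
  }
  return new Uint8Array(bytes);
}

// Read `length` bytes starting at `offset` as ASCII text.
function asciiAt(bytes: Uint8Array, offset: number, length: number): string {
  let text = "";
  for (let i = offset; i < offset + length && i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// A WAV file starts with "RIFF", a 4-byte chunk size, then "WAVE".
function looksLikeWav(bytes: Uint8Array): boolean {
  return bytes.length >= 12 && asciiAt(bytes, 0, 4) === "RIFF" && asciiAt(bytes, 8, 4) === "WAVE";
}
```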
RPC Routes
The runtime exposes these RPC routes:
- transcribe.file
- annotate.file
- transcribe.startSession
- transcribe.sessionStatus
- transcribe.stopSession
- transcribe.cancelSession
- synthesize.voices
- synthesize.generate
- synthesize.startSession
- synthesize.sessionStatus
- synthesize.cancel
- warmup.status
- warmup.start
- warmup.schedule
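On the client side, the route list can be captured as a string-literal union, so a misspelled method name is caught before it is sent. This is a hypothetical client-side helper, not something @voxd/sdk is documented to export. It mirrors the fourteen routes listed.

```typescript
// Mirrors the RPC route list; hypothetical client-side constant.
const RPC_ROUTES = [
  "transcribe.file",
  "annotate.file",
  "transcribe.startSession",
  "transcribe.sessionStatus",
  "transcribe.stopSession",
  "transcribe.cancelSession",
  "synthesize.voices",
  "synthesize.generate",
  "synthesize.startSession",
  "synthesize.sessionStatus",
  "synthesize.cancel",
  "warmup.status",
  "warmup.start",
  "warmup.schedule",
] as const;

type RpcRoute = (typeof RPC_ROUTES)[number];

// Narrow an arbitrary string (for example, from config) to a known route.
function isRpcRoute(name: string): name is RpcRoute {
  return (RPC_ROUTES as readonly string[]).indexOf(name) !== -1;
}

// Routes under one namespace prefix: "transcribe", "annotate", "synthesize", or "warmup".
function routesIn(namespace: string): RpcRoute[] {
  return RPC_ROUTES.filter((route) => route.split(".")[0] === namespace);
}
```

transcribe.live appears only as a performance route, not as an RPC route, so isRpcRoute rejects it.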
Performance routes
These are the route values currently emitted into performance.jsonl:
- transcribe.file
- annotate.file
- transcribe.live
- synthesize.generate
- synthesize.startSession
Cancelled synthesis sessions are currently recorded as synthesize.startSession with outcome: "cancelled".
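Because performance.jsonl holds one JSON object per line, it can be summarized in a few lines of code. The sketch below counts samples per route and outcome, which separates cancelled synthesis sessions from completed ones. Only the route and outcome fields come from this page. Any other fields are ignored, and reading the file from ~/.vox is left to the caller.

```typescript
// Only `route` and `outcome` are documented fields; everything else in a sample is ignored here.
interface PerfSample {
  route: string;
  outcome?: string;
}

// Parse one line; returns null for blank, truncated, or unrecognized lines.
function parseSample(line: string): PerfSample | null {
  const trimmed = line.trim();
  if (trimmed === "") return null;
  try {
    const value = JSON.parse(trimmed);
    return value !== null && typeof value === "object" && typeof value.route === "string"
      ? (value as PerfSample)
      : null;
  } catch {
    return null; // e.g. a final line that is still being written
  }
}

// Count samples by "route outcome", e.g. "synthesize.startSession cancelled".
function summarizePerformance(jsonl: string): { [key: string]: number } {
  const counts: { [key: string]: number } = {};
  for (const line of jsonl.split("\n")) {
    const sample = parseSample(line);
    if (sample === null) continue;
    const key = `${sample.route} ${sample.outcome ?? "unknown"}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```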
Live Sessions
- Speech sessions are coordinated in VoxService.
- Session ownership is tied to both the connectionID and the clientId.
- Stop and cancel are distinct operations.
- Final transcript events include metrics and word-level timestamps.
- Synthesis session status includes the selected model and voice.
- Active session state can be inspected for operator recovery.
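The ownership rules can be illustrated with a small in-memory table. This is a hypothetical model, not VoxService code. It assumes a caller must match both the connectionID and the clientId of the session's owner. It also assumes that stop ends a session normally, so a final transcript can follow, while cancel abandons it. voxd's actual rules may differ, for example for a client that reconnects with the same clientId.

```typescript
// Hypothetical model of live-session ownership; not VoxService's implementation.
interface SessionOwner {
  connectionId: string;
  clientId: string;
}

type SessionState = "active" | "stopped" | "cancelled";

interface LiveSession {
  id: string;
  owner: SessionOwner;
  state: SessionState;
}

class SessionTable {
  private sessions: { [id: string]: LiveSession } = {};

  start(id: string, owner: SessionOwner): LiveSession {
    const session: LiveSession = { id, owner, state: "active" };
    this.sessions[id] = session;
    return session;
  }

  // Assumed rule: the caller must match both identifiers of the owner.
  private requireOwned(id: string, caller: SessionOwner): LiveSession {
    const session = this.sessions[id];
    if (session === undefined) throw new Error(`unknown session ${id}`);
    const { connectionId, clientId } = session.owner;
    if (connectionId !== caller.connectionId || clientId !== caller.clientId) {
      throw new Error(`session ${id} belongs to another connection or client`);
    }
    if (session.state !== "active") throw new Error(`session ${id} is already ${session.state}`);
    return session;
  }

  // Stop: end capture normally (assumed: a final transcript event follows).
  stop(id: string, caller: SessionOwner): void {
    this.requireOwned(id, caller).state = "stopped";
  }

  // Cancel: abandon the session (assumed: no final transcript is produced).
  cancel(id: string, caller: SessionOwner): void {
    this.requireOwned(id, caller).state = "cancelled";
  }

  // Operator view: sessions that are still running.
  active(): LiveSession[] {
    return Object.keys(this.sessions)
      .map((id) => this.sessions[id])
      .filter((session) => session.state === "active");
  }
}
```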
Configuration
Ports and bind address are configurable via environment variables.
Vox uses two named companion ports:
| Role | Default port | Transport | Env var | Port recorded in |
|---|---|---|---|---|
| companion-ws | 42137 | WebSocket | VOX_PORT | runtime.json |
| companion-http | 43115 | HTTP | VOX_BRIDGE_PORT | process/env only |
companion-ws is the voxd daemon port that @voxd/sdk discovers through ~/.vox/runtime.json. companion-http is the browser-facing bridge port used by @voxd/client.
Both surfaces share a single bind host, VOX_HOST. The full set of environment variables:
| Variable | Default | Description |
|---|---|---|
| VOX_PORT | 42137 | companion-ws daemon WebSocket port |
| VOX_BRIDGE_PORT | 43115 | companion-http bridge port |
| VOX_HOST | 127.0.0.1 | Bind address for both services |
| VOX_HOME | ~/.vox | Runtime data directory |
The --port CLI flag takes precedence over environment variables for both voxd and voxbridge.
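The precedence rule (CLI flag, then environment variable, then the default from the table) can be sketched as a pure function over argv and an environment map, which makes it testable without a running process. The PortSpec type and all function names are hypothetical. The parser only understands the space-separated --port form, so it illustrates the documented precedence rather than voxd's or voxbridge's actual argument parsing.

```typescript
// Hypothetical description of one listening surface.
interface PortSpec {
  envVar: string;
  defaultPort: number;
}

const COMPANION_WS: PortSpec = { envVar: "VOX_PORT", defaultPort: 42137 };
const COMPANION_HTTP: PortSpec = { envVar: "VOX_BRIDGE_PORT", defaultPort: 43115 };

type Env = { [name: string]: string | undefined };

// Parse a TCP port; undefined or empty input means "not set".
function parsePort(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const port = Number(raw);
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${raw}`);
  }
  return port;
}

// Value following a space-separated flag such as `--port 42137`.
function flagValue(argv: string[], flag: string): string | undefined {
  const index = argv.indexOf(flag);
  return index === -1 ? undefined : argv[index + 1];
}

// Precedence: --port flag, then the surface's env var, then the built-in default.
function resolvePort(spec: PortSpec, argv: string[], env: Env): number {
  return parsePort(flagValue(argv, "--port")) ?? parsePort(env[spec.envVar]) ?? spec.defaultPort;
}

// Both surfaces share one bind address.
function resolveHost(env: Env): string {
  const host = env["VOX_HOST"];
  return host !== undefined && host.trim() !== "" ? host : "127.0.0.1";
}
```

In Node, the inputs would be process.argv.slice(2) and process.env. Passing them in explicitly keeps the function free of global state.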
Important Swift entry points
- swift/Sources/voxd/main.swift
- swift/Sources/VoxService/VoxRuntimeService.swift
- swift/Sources/VoxService/LiveSessionCoordinator.swift
- swift/Sources/VoxService/SynthesisSessionCoordinator.swift
- swift/Sources/VoxService/WarmupCoordinator.swift
- swift/Sources/VoxEngine/ProviderRegistry.swift
- swift/Sources/VoxEngine/TTSProviderRegistry.swift