Observability
Telemetry is built into the runtime for both transcription and synthesis. Each performance sample includes:
- `clientId`
- `route`
- `modelId`
- `voiceId` (for synthesis routes when a voice is selected)
- `outcome`
- a nested `metrics` object
Metrics
Transcription metrics:
`fileCheckMs`, `modelCheckMs`, `modelLoadMs`, `audioLoadMs`, `audioPrepareMs`, `inferenceMs`, `totalMs`, `audioDurationMs`
Synthesis metrics:
`modelCheckMs`, `modelLoadMs`, `voiceResolveMs`, `synthesisMs`, `totalMs`, `audioDurationMs`, `outputBytes`, `characterCount`
Derived values: `realtimeFactor`, warm vs. cold runs inferred from `modelLoadMs`, audio-to-text speed (`audioDurationMs / inferenceMs`), and text-to-audio speed (`audioDurationMs / synthesisMs`).
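The speed ratios above can be sketched directly from a raw metrics object. This helper is illustrative, not part of vox: the field names come from the lists above, but the warm/cold cutoff (and the exact formula behind `realtimeFactor`, which the runtime computes internally) are assumptions.

```python
def derive(metrics: dict) -> dict:
    """Compute derived speed figures from one raw metrics object."""
    audio_ms = metrics["audioDurationMs"]
    derived = {}
    if "inferenceMs" in metrics:
        # Audio-to-text speed: ms of audio processed per ms of inference.
        derived["asrSpeed"] = audio_ms / metrics["inferenceMs"]
    if "synthesisMs" in metrics:
        # Text-to-audio speed: ms of audio produced per ms of synthesis.
        derived["ttsSpeed"] = audio_ms / metrics["synthesisMs"]
    # A cold run pays model load time; the 50 ms cutoff is an assumption.
    derived["warm"] = metrics.get("modelLoadMs", 0) < 50
    return derived
```

For a synthesis sample with `audioDurationMs: 1240` and `synthesisMs: 182`, `ttsSpeed` works out to roughly 6.8, i.e. audio is generated about 6.8× faster than real time.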
Storage
The runtime appends JSON lines to:
`~/.vox/performance.jsonl`
The CLI dashboard reads from this file. You can also export it to another metrics backend.
Current emitted performance routes:
`transcribe.file`, `transcribe.live`, `synthesize.generate`, `synthesize.startSession`
Cancelled synthesis sessions are recorded under `synthesize.startSession` with `outcome: "cancelled"`.
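Because the log is plain JSON lines, filtering it needs no special tooling. A minimal sketch, assuming only the fields documented above; the helper names are hypothetical, not part of vox:

```python
import json
from pathlib import Path

def load_samples(path):
    """Yield one parsed performance sample per non-empty JSON line."""
    with Path(path).expanduser().open() as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)

def cancelled_sessions(samples):
    """Samples recorded for cancelled synthesis sessions."""
    return [s for s in samples
            if s.get("route") == "synthesize.startSession"
            and s.get("outcome") == "cancelled"]
```

Usage: `cancelled_sessions(load_samples("~/.vox/performance.jsonl"))`.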
Operator Commands
```
vox transcribe file --metrics /tmp/sample.wav
vox transcribe bench /tmp/sample.wav 5
vox speak --metrics "hello world"
vox speak bench "hello world" 5
vox perf dashboard
vox perf dashboard --client vox-cli
```
Reading the numbers
`inferenceMs`, `synthesisMs`, and `totalMs` measure different things:

- For ASR, `inferenceMs` is how fast the hot model ran.
- For TTS, `synthesisMs` is how long audio generation took once the request was inside the model.
- `totalMs` is what the user experienced end-to-end.
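One useful derived number is the gap between `totalMs` and the model time, which captures everything outside the model (request handling, audio I/O). A sketch using the field names above; the helper itself is hypothetical:

```python
def overhead_ms(sample):
    """Milliseconds of totalMs spent outside the model itself."""
    m = sample["metrics"]
    # Synthesis routes report synthesisMs; transcription routes report inferenceMs.
    model_ms = m.get("synthesisMs", m.get("inferenceMs", 0))
    return m["totalMs"] - model_ms
```

For a sample with `totalMs: 196` and `synthesisMs: 182`, this gives 14 ms of overhead.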
Example sample
```json
{
  "timestamp": "2026-04-25T17:54:26Z",
  "clientId": "menu-bar",
  "route": "synthesize.generate",
  "modelId": "avspeech:system",
  "voiceId": "com.apple.voice.compact.en-US.Samantha",
  "outcome": "ok",
  "textLength": 14,
  "metrics": {
    "audioDurationMs": 1240,
    "synthesisMs": 182,
    "inferenceMs": 182,
    "totalMs": 196
  }
}
```
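Working the sample through by hand (a quick sanity check, not a vox feature):

```python
import json

sample = json.loads("""{
  "route": "synthesize.generate",
  "metrics": {"audioDurationMs": 1240, "synthesisMs": 182, "totalMs": 196}
}""")

m = sample["metrics"]
# Text-to-audio speed: 1240 ms of audio from 182 ms of synthesis.
print(round(m["audioDurationMs"] / m["synthesisMs"], 2))  # 6.81
# Overhead outside the model: request handling and audio packaging.
print(m["totalMs"] - m["synthesisMs"])  # 14
```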
Dashboard tips
- Only compare clients when the prompt or audio shape is similar.
- Use `inferenceMs` for loaded-model ASR speed.
- Use `synthesisMs` for loaded-model TTS speed.
- Use `totalMs` for end-user latency.
- Large `modelLoadMs` spikes are warm-up events, not inference regressions.
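The tips above can be combined into a small aggregation: group samples by client, drop cold runs so warm-up events don't skew the comparison, and compare median `totalMs`. This helper and its 50 ms warm threshold are assumptions, not part of the vox dashboard:

```python
from collections import defaultdict
from statistics import median

def median_total_by_client(samples, warm_threshold_ms=50):
    """Median end-user latency (totalMs) per clientId, warm runs only."""
    buckets = defaultdict(list)
    for s in samples:
        m = s["metrics"]
        # Skip cold runs: a large modelLoadMs is a warm-up event.
        if m.get("modelLoadMs", 0) >= warm_threshold_ms:
            continue
        buckets[s["clientId"]].append(m["totalMs"])
    return {client: median(v) for client, v in buckets.items()}
```

Median (rather than mean) keeps a single slow outlier from dominating a client's number.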