Vox Overview
Vox is a local-first voice stack for macOS and iOS. It supports both speech-to-text (STT / ASR) and text-to-speech (TTS), and you can use it in two main ways:
- Embed mode: Apple apps link Vox’s Swift packages directly and keep speech in and speech out in process.
- Companion mode: voxd exposes the runtime over local WebSocket JSON-RPC, and VoxBridge / voxbridge exposes an HTTP bridge for browser clients when web or shared-process access is useful.
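To make the companion transport more concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope a client would send as a WebSocket text frame. The envelope fields follow the public JSON-RPC 2.0 specification. The method name warmup.start and its params are hypothetical and may not match the real voxd protocol; in practice @voxd/sdk is the supported way to talk to voxd.

```typescript
// JSON-RPC 2.0 request and response shapes, per the public JSON-RPC 2.0 spec.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};

type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number | null;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
};

let nextId = 1;

// Build a request envelope with a fresh numeric id.
function buildRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

// Parse a raw response frame and return its result, or throw on a JSON-RPC error.
function unwrapResponse(raw: string, expectedId: number): unknown {
  const response = JSON.parse(raw) as JsonRpcResponse;
  if (response.jsonrpc !== "2.0") throw new Error("not a JSON-RPC 2.0 response");
  if (response.id !== expectedId) throw new Error(`unexpected response id ${response.id}`);
  if (response.error) {
    throw new Error(`RPC error ${response.error.code}: ${response.error.message}`);
  }
  return response.result;
}

// HYPOTHETICAL method name and params, for illustration only.
const warmupRequest = buildRequest("warmup.start", {
  modelId: "parakeet:v3",
  clientId: "example-client",
});

// This string is what would be sent as a single WebSocket text frame.
const frame = JSON.stringify(warmupRequest);
```

Matching responses to requests by id is what allows several calls to be in flight on one socket, which matters when multiple clients share a warm daemon.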
Main surfaces:
- Swift packages: VoxCore, VoxEngine, VoxService, VoxBridge for embedded Apple app integrations.
- voxd: Vox Companion, the Swift daemon. Warm-up, telemetry, bridge transport, shared-process coordination.
- @voxd/sdk: TypeScript SDK for Bun/Node and other companion-connected integrations. WebSocket JSON-RPC to voxd.
- @voxd/client: Browser SDK. HTTP bridge to the Vox Companion for web apps.
- @voxd/cli: Node CLI. Health checks, model management, transcription, synthesis, voices, warm-up, and benchmarks.
Which surface to reach for
- Use the Swift packages when you are modifying a macOS or iOS app and want speech to stay in process.
- Use @voxd/sdk when you want a local Bun or Node client to talk to the companion daemon over WebSocket JSON-RPC.
- Use @voxd/client when you want a browser app to talk to the local HTTP bridge on the same Mac.
- Use @voxd/cli when you want operator commands, warm-up controls, voice listing, or reproducible benchmarks.
Why it exists
Many voice stacks hide lifecycle, warm-up, and latency. Vox tries to keep those parts visible:
- Model stays local
- STT and TTS stay explicit runtime capabilities instead of hidden backend switches
- Warm-up is an explicit API
- Latency dimensions (clientId, route, modelId) are preserved
- Runtime stays easy to inspect from the start
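Keeping clientId, route, and modelId on every measurement lets latency be broken down per client, per speech path, and per model instead of being averaged away. The sketch below shows one way to do that grouping. The LatencySample shape, the route values, and the numbers are illustrative assumptions, not Vox's actual telemetry schema or measured results.

```typescript
// One latency sample tagged with the three dimensions named above.
// HYPOTHETICAL shape; the real telemetry payload may differ.
type LatencySample = {
  clientId: string;
  route: string;
  modelId: string;
  latencyMs: number;
};

type LatencySummary = { count: number; meanMs: number; maxMs: number };

// Group samples by (clientId, route, modelId) and summarize each group.
function summarizeByDimensions(samples: LatencySample[]): Map<string, LatencySummary> {
  const groups = new Map<string, number[]>();
  for (const s of samples) {
    // JSON-encode the tuple so the key cannot collide on separator characters.
    const key = JSON.stringify([s.clientId, s.route, s.modelId]);
    const bucket = groups.get(key) ?? [];
    bucket.push(s.latencyMs);
    groups.set(key, bucket);
  }

  const summaries = new Map<string, LatencySummary>();
  for (const [key, values] of groups) {
    const total = values.reduce((sum, v) => sum + v, 0);
    summaries.set(key, {
      count: values.length,
      meanMs: total / values.length,
      maxMs: Math.max(...values),
    });
  }
  return summaries;
}

// Illustrative values only; these are not measured numbers.
const samples: LatencySample[] = [
  { clientId: "vox-cli", route: "transcribe", modelId: "parakeet:v3", latencyMs: 120 },
  { clientId: "vox-cli", route: "transcribe", modelId: "parakeet:v3", latencyMs: 80 },
  { clientId: "vox-cli", route: "speak", modelId: "avspeech:system", latencyMs: 40 },
];

const summary = summarizeByDimensions(samples);
```

Because the key keeps all three dimensions, a slow model on one route does not hide behind a fast one on another.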
Repository layout
- swift/: VoxCore, VoxEngine, VoxService, VoxBridge, voxd companion
- packages/client/: @voxd/sdk (TypeScript SDK)
- packages/web-client/: @voxd/client (browser SDK)
- packages/cli/: @voxd/cli (Node CLI)
- docs/: Dewey source content
- site/: website and docs UI
Design principles
- Root cause over workaround.
- Warm-up is part of the product, not an implementation detail.
- Instrumentation is part of the API surface.
- Multi-client support stays visible in the protocol and telemetry.
How it fits together
Apple app teams embed the Swift packages directly and keep the same provider, warm-up, and telemetry semantics in process. Bun and Node tools use @voxd/sdk against voxd, and browser apps use @voxd/client against the local HTTP bridge. Operators use the vox CLI (@voxd/cli) to check health, list voices, transcribe, synthesize, warm models, and benchmark both speech paths. Companion mode lets voxd stay warm across browser integrations, local tools, and other shared-process clients while the bridge stays narrow and browser-facing.
Reference implementations
- examples/macos-minimal/ is a good standalone reference for the direct Apple embed path.
- packages/client/ and packages/web-client/ are good companion-mode SDK references in the repo.
Workflows
These examples assume vox is on your PATH. In a repo checkout, replace vox with node packages/cli/dist/index.js after bun run build.
# Build and verify
bun install && bun run build
vox daemon start
vox doctor
# Speech to text
vox warmup start parakeet:v3
vox transcribe file --model parakeet:v3 --metrics --timestamps /tmp/sample.wav
# Text to speech
vox voices --model avspeech:system
vox speak --model avspeech:system --metrics "Hello from Vox"
# Compare warm-path performance
vox transcribe bench --model parakeet:v3 /tmp/sample.wav 5
vox speak bench --model avspeech:system "Hello from Vox" 5
vox perf dashboard --client vox-cli