SDK (Companion Client)

Use @voxd/sdk when you want a Bun or Node tool to talk to voxd over local WebSocket JSON-RPC. For Apple apps, embed the Swift packages directly. For web apps or browser extensions, use @voxd/client instead.

packages/client/ connects to voxd when you want out-of-process access to models, voices, warm-up, transcription, synthesis, and stage metrics.

Example

import { VoxClient } from "@voxd/sdk";

const client = new VoxClient({ clientId: "menu-bar" });

await client.connect();
await client.scheduleWarmup("parakeet:v3", 500);

const transcript = await client.transcribeFile("/tmp/sample.wav", "parakeet:v3");
const voices = await client.listVoices("avspeech:system");
const speech = await client.synthesize("Hello from Vox", {
  modelId: "avspeech:system",
  voiceId: voices[0]?.id,
  format: "wav",
});

console.log(transcript.text);
console.log(transcript.metrics?.inferenceMs);
console.log(transcript.words);
console.log(speech.audioBytes);
console.log(speech.metrics?.synthesisMs);

client.disconnect();

Client Identity

clientId identifies the integration making requests. It is used to attribute latency to a specific consumer, compare route-level behavior across integrations, and support multi-client workflows.
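
For example, a product that ships both a menu-bar app and a CLI can keep the two attributable by giving each surface its own client. The ids below follow the naming suggested under Integration advice:

import { VoxClient } from "@voxd/sdk";

// One client per product surface; each id shows up in daemon-side metrics.
const menuBarClient = new VoxClient({ clientId: "menu-bar" });
const cliClient = new VoxClient({ clientId: "vox-cli" });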

Main methods

interface VoxClientSurface {
  connect(): Promise<void>;
  disconnect(): void;
  doctor(): Promise<unknown>;
  listModels(): Promise<unknown>;
  listVoices(modelId?: string): Promise<unknown>;
  installModel(modelId?: string): Promise<unknown>;
  preloadModel(modelId?: string): Promise<unknown>;
  getWarmupStatus(modelId?: string): Promise<unknown>;
  startWarmup(modelId?: string): Promise<unknown>;
  scheduleWarmup(modelId?: string, delayMs?: number): Promise<unknown>;
  transcribeFile(path: string, modelId?: string): Promise<FileTranscriptionResult>;
  synthesize(text: string, options?: SynthesisOptions): Promise<SynthesisResult>;
  getLiveSessionStatus(): Promise<LiveSessionStatus | null>;
  cancelLiveSession(sessionId?: string): Promise<{ cancelled: boolean; sessionId: string }>;
  createLiveSession(): Promise<unknown>;
}
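
Most model-management calls return loosely typed payloads (unknown above), so inspect and narrow them in your own code. A minimal install-then-preload sketch, reusing the model id from the earlier example:

const modelId = "parakeet:v3";

// Ask the daemon what it knows about available models before narrowing the type.
const models = await client.listModels();
console.log(models);

// Request installation, then preload so the first transcription doesn't pay the load cost.
await client.installModel(modelId);
await client.preloadModel(modelId);

console.log(await client.getWarmupStatus(modelId));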

File result shape

interface FileTranscriptionResult {
  modelId: string;
  text: string;
  elapsedMs: number;
  metrics?: TranscriptionMetrics;
  words: WordTiming[];
}
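
The WordTiming entries are what you use for word-level alignment. Their exact fields are not reproduced here, so the sketch below guesses at word, startMs, and endMs; check the package's type definitions for the real names.

const result = await client.transcribeFile("/tmp/sample.wav", "parakeet:v3");

// Field names on WordTiming are assumed for illustration (word, startMs, endMs).
for (const w of result.words as unknown as Array<{ word: string; startMs: number; endMs: number }>) {
  console.log(`${w.startMs}-${w.endMs} ms: ${w.word}`);
}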

Synthesis result shape

interface SynthesisResult {
  modelId: string;
  voiceId: string;
  format: string;
  contentType: string;
  audio: Uint8Array;
  audioBytes: number;
  elapsedMs: number;
  metrics?: SynthesisMetrics;
}
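
audio is a Uint8Array, so persisting it in Bun or Node is a one-liner. The output path below is arbitrary:

import { writeFile } from "node:fs/promises";

const speech = await client.synthesize("Hello from Vox", {
  modelId: "avspeech:system",
  format: "wav",
});

// Write the raw bytes to disk; the extension comes from the reported format.
await writeFile(`/tmp/vox-output.${speech.format}`, speech.audio);
console.log(`${speech.audioBytes} bytes (${speech.contentType}) in ${speech.elapsedMs} ms`);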

Error handling

All client methods throw when voxd is unreachable, the model isn’t installed, or a transcription or synthesis request fails. Errors are plain Error instances, so inspect err.message for a human-readable description.

try {
  const result = await client.transcribeFile("/tmp/audio.wav");
} catch (err) {
  // Common causes:
  // - Companion not running: start with `vox daemon start`
  // - Model not installed: run `vox models install` first
  // - Voice mismatch: inspect `client.listVoices(modelId)`
  // - Request failed: daemon logs have details (`vox logs daemon`)
  console.error(err.message);
}

For live sessions, call session.cancel() in a finally block to ensure the microphone is always released:

const session = await client.createLiveSession();
try {
  // ...use session
} finally {
  await session.cancel();
}

Configuration

const client = new VoxClient({
  clientId: "menu-bar",    // stable identity for telemetry
  port: 42137,             // override the `companion-ws` daemon port
  host: "127.0.0.1",       // override daemon host
});

On the daemon side, set the VOX_PORT or VOX_HOST environment variables to override the defaults. VOX_PORT controls the companion-ws daemon port, which is otherwise discovered from ~/.vox/runtime.json.
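
If a tool should honor the same overrides when connecting, it can read the environment before constructing the client. The SDK is not documented to do this automatically, so treat this as a convention rather than built-in behavior:

const port = process.env.VOX_PORT ? Number(process.env.VOX_PORT) : undefined;
const host = process.env.VOX_HOST;

// Assuming undefined options fall back to the client's built-in defaults.
const client = new VoxClient({ clientId: "vox-cli", port, host });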

Integration advice

  • embed the Swift packages directly for macOS and iOS apps; use @voxd/sdk when you want Vox Companion access from JS or tooling
  • use a stable clientId per product surface such as menu-bar, browser-extension, or vox-cli
  • warm on intent, not on every keystroke (see the sketch after this list)
  • call listVoices(modelId) before pinning a TTS voice in product code
  • benchmark with representative audio clips and read inferenceMs separately from totalMs
  • preserve raw transcription and synthesis metrics in your own telemetry if the app already exports traces
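
A minimal warm-on-intent sketch: schedule a warmup when the user focuses a dictation control instead of on every keystroke. The handler names and the 500 ms delay are illustrative, and the model id matches the earlier example:

import type { VoxClient } from "@voxd/sdk";

let warmupRequested = false;

// Call when the user shows intent to dictate (e.g. focuses the input).
async function onDictationFocus(client: VoxClient) {
  if (warmupRequested) return; // only one warmup per focus session
  warmupRequested = true;
  await client.scheduleWarmup("parakeet:v3", 500);
}

// Call on blur so the next focus can schedule a fresh warmup.
function onDictationBlur() {
  warmupRequested = false;
}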
