API Reference

The DVAI config and the types that go with it.

DVAIConfig

The object you pass when you build a DVAI instance.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `backend` | `"webllm" \| "transformers" \| "native" \| "auto"` | `"webllm"` | The inference engine to use. Use "auto" to pick an engine based on the detected environment. |
| `modelId` | `string` | `"gemma-2-2b-it-q4f16_1-MLC"` | WebLLM-specific model identifier. |
| `transformersModelId` | `string` | `"onnx-community/gemma-3n-E2B-it-ONNX"` | Hugging Face model ID for Transformers.js. |
| `pipelineTask` | `string` | `"text-generation"` | Pipeline task for Transformers.js (e.g., "text-generation", "feature-extraction", "image-text-to-text"). |
| `device` | `"webgpu" \| "cpu" \| "auto"` | `"auto"` | Device for Transformers.js inference. "auto" detects WebGPU availability. |
| `dtype` | `string` | | Quantization for Transformers.js models (e.g., "q4", "q4f16", "q8", "fp16"). |
| `transformersModelClass` | `string` | | Name of a transformers.js export to load via ClassName.from_pretrained(modelId). Enables the declarative multimodal loader. Works in the worker and on the main-thread fallback. Leave unset to use the stock pipeline() factory. Example: "Gemma4ForConditionalGeneration". |
| `transformersProcessorClass` | `string` | `"AutoProcessor"` | Processor class name for the declarative loader. Only used when transformersModelClass is set. |
| `transformersDisableEncoders` | `string[]` | `[]` | Model submodule fields to null out after load (e.g. ["vision_encoder"]). Purely declarative: the library nulls each named field if present; unknown or absent names are silently ignored. The host app controls this based on which modalities it actually uses. |
| `createPipeline` | `CreatePipelineFn` | | Custom pipeline factory for models whose processor call signature the declarative loader can't express. Main-thread only, because function closures don't cross the Worker boundary. See Custom Pipeline Factory. |
| `mockUrl` | `string` | `"https://api.openai.local/v1/chat/completions"` | The URL that MSW intercepts for OpenAI-compatible requests. |
| `serviceWorkerUrl` | `string` | `"/mockServiceWorker.js"` | Path to the MSW service worker script. Set to "" to disable MSW. |
| `transformersWorkerUrl` | `string` | `"/dvai-transformers.worker.js"` | Path to the Transformers.js inference worker. Set to "" to run on the main thread. |
| `webllmWorkerUrl` | `string` | `"/dvai-webllm.worker.js"` | Path to the WebLLM inference worker. |
| `nativeModelPath` | `string` | | Path to the GGUF model file for the Native backend. |
| `nativeGpuLayers` | `number` | `99` | Number of layers to offload to the GPU in the Native backend. |
| `nativeThreads` | `number` | `4` | Number of CPU threads for Native inference. |
| `nativeContextSize` | `number` | `2048` | Context window size for the Native backend. |
| `nativeEmbeddingMode` | `boolean` | `false` | Initialize the native llama.cpp context in embedding mode. Required for /v1/embeddings on the native backend. |
| `maxRetries` | `number` | `2` | Number of automatic recovery attempts on fatal WebLLM errors. |
| `generationTimeout` | `number` | `60000` | Maximum time (ms) allowed for generation before timing out. |
| `maxBlankChunks` | `number` | `20` | Abort streaming after this many consecutive empty chunks. |
| `licenseKeyPath` | `string` | | Path or URL to a DVAI-Bridge license JWT. Auto-discovered from dvai-license.jwt at platform-conventional locations when unset. See License setup. |
| `licenseToken` | `string` | | Inline DVAI-Bridge license JWT (full token string). Highest-priority discovery path; wins over licenseKeyPath and env vars. Useful for serverless / CI deployments. See License setup. |
| `autoInit` | `boolean` | `true` | Whether to initialize the backend immediately on mount (React only). |
| `transport` | `"auto" \| "msw" \| "http" \| "none"` | `"auto"` | Transport selection. "auto" picks MSW in the browser and HTTP in Node. |
| `httpBasePort` | `number` | `38883` | HTTP transport base port (retries +1 up to 16 times). |
| `httpMaxPortAttempts` | `number` | `16` | Max HTTP port fallback attempts before throwing. |
| `corsOrigin` | `string \| string[]` | `"*"` | HTTP Access-Control-Allow-Origin value or allowlist. |
| `httpBindHost` | `string \| undefined` | `"127.0.0.1"` | v3.1+. Network interface to bind. Defaults to loopback only, which is safe for single-device deployments. Set to "0.0.0.0" for LAN-target deployments (the v3.1 Hub, native SDKs running in target mode). Phone-as-source / single-device deployments should leave the default; a 0.0.0.0 bind without pairing protection exposes the OpenAI surface. |
| `offload` | `OffloadConfig \| undefined` | `undefined` | Phase 3 (v3.0+): distributed-inference / device-offload config. See OffloadConfig below and the Distributed Inference guide. When unset, the library behaves exactly as v2.x. |
| `chatCompletionInterceptor` | `(body, ctx, headers?) => Promise<Response \| null> \| undefined` | `undefined` | v3.1+. First-chance hook for /v1/chat/completions. Return a Response to short-circuit; return null to fall through to the default local-backend handler. The Hub uses this to enforce its substitution policy and route through external engines. See Chat completion interceptor. |
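
As a rough illustration, a configuration for the Transformers.js backend might look like the sketch below. `DVAIConfigSketch` is a hypothetical local subset of the properties above, declared only so the snippet type-checks on its own; the real type ships with the library, and the way the object is handed to a DVAI instance is not shown here.

```typescript
// Hypothetical local subset of DVAIConfig, declared so this sketch is self-contained.
type DVAIConfigSketch = {
  backend?: "webllm" | "transformers" | "native" | "auto";
  transformersModelId?: string;
  pipelineTask?: string;
  device?: "webgpu" | "cpu" | "auto";
  dtype?: string;
  transformersWorkerUrl?: string;
  transport?: "auto" | "msw" | "http" | "none";
  generationTimeout?: number;
};

const config: DVAIConfigSketch = {
  backend: "transformers",
  transformersModelId: "onnx-community/gemma-3n-E2B-it-ONNX", // documented default
  pipelineTask: "text-generation",
  device: "auto", // let the library detect WebGPU availability
  dtype: "q4",
  transformersWorkerUrl: "", // empty string runs inference on the main thread
  transport: "auto",
  generationTimeout: 60000, // milliseconds
};
```

Properties left out of the object fall back to the defaults listed in the table.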

OffloadConfig (v3.0+)

Turns on peer discovery and inference offload. Leave it unset and the library behaves exactly like v2.x.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `boolean` | `false` | Master switch. Opt-in at v3.0; nothing changes when off. |
| `discoverLAN` | `boolean` | `true` | Run mDNS / DNS-SD to discover peers on the local network. Browsers skip this step because they can't speak mDNS; native SDKs use the platform-native API. |
| `minLocalCapability` | `number` | `10` | Estimated decode tok/s the local device must hit to run locally. Below this, the library looks for a peer. |
| `rendezvousUrl` | `string \| undefined` | `undefined` | URL of a self-hosted rendezvous server. If unset, the internet path is disabled and only LAN works. |
| `knownPeers` | `Peer[] \| undefined` | `undefined` | Pre-known peers (skip discovery). Useful for corporate device registries or persisted pairings. |
| `onPairingRequest` | `(peer: Peer) => Promise<boolean \| { approved: true; pairingKey: string } \| { approved: false }>` | denies | Hook to surface an "Allow this device to pair?" UI to the user. Default: deny. v3.1+: the return type was widened to a tagged union: return { approved: true, pairingKey } when your host app maintains its own pairing state (e.g. the Hub's MultiTenantPairing) and wants the library to use that key instead of generating a fresh one. v3.0 boolean returns continue to work. |
| `onOffload` | `(peer: Peer) => void` | no-op | Diagnostic callback when a request is offloaded. Useful for analytics and UI feedback. |
| `customDiscovery` | `() => Promise<Peer[]>` | `undefined` | Optional plug-in for app-specific discovery (e.g. corporate device registry). Combined with mDNS and knownPeers. |
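
The widened onPairingRequest return type can be sketched as follows. The types are trimmed local copies of the documented shapes, and `knownPairingKeys` is a hypothetical host-side store invented for this example; a real app would usually prompt the user rather than consult a map.

```typescript
// Trimmed local copies of the documented shapes, so the sketch stands alone.
type PeerSketch = { deviceId: string; deviceName: string };
type PairingDecision =
  | boolean
  | { approved: true; pairingKey: string }
  | { approved: false };

// Hypothetical host-side store of pairing keys the app already manages.
const knownPairingKeys = new Map<string, string>([
  ["phone-pixel-9", "example-pairing-key"],
]);

async function onPairingRequest(peer: PeerSketch): Promise<PairingDecision> {
  const pairingKey = knownPairingKeys.get(peer.deviceId);
  // Reuse the app's own key when one exists; otherwise deny, matching the default.
  return pairingKey ? { approved: true, pairingKey } : { approved: false };
}

// The hook slots into an offload config next to the other documented options.
const offload = {
  enabled: true,
  discoverLAN: true,
  minLocalCapability: 10,
  onPairingRequest,
};
```

Returning a plain `true` or `false` remains valid for hosts that let the library generate the pairing key.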

Per-request override (X-DVAI-Offload header)

| Header value | Meaning |
| --- | --- |
| `prefer` (default) | Offload if the local device can't serve fast enough and a faster peer exists. |
| `never` | Always run locally, even if slow. Use for privacy-sensitive prompts and on-device-only requirements. |
| `require` | Refuse rather than fall back. Returns the structured no_capable_device error if no qualified peer is reachable. |
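
A minimal sketch of setting the override on a single request. `chatRequestInit` is a hypothetical helper, not part of the library; it only builds the options object that would be passed to fetch().

```typescript
type OffloadMode = "prefer" | "never" | "require";

type ChatRequestInit = {
  method: string;
  headers: Record<string, string>;
  body: string;
};

// Builds fetch() options for a chat completion with a per-request offload override.
function chatRequestInit(body: object, mode: OffloadMode): ChatRequestInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-DVAI-Offload": mode,
    },
    body: JSON.stringify(body),
  };
}

// A privacy-sensitive prompt that must stay on this device.
const init = chatRequestInit(
  { messages: [{ role: "user", content: "Summarize my notes." }] },
  "never",
);

// Usage, assuming baseUrl is the value reported by dvai.baseUrl and already ends in /v1:
// await fetch(`${baseUrl}/chat/completions`, init);
```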

Peer type

| Property | Type | Description |
| --- | --- | --- |
| `deviceId` | `string` | Stable per-install peer device ID. |
| `deviceName` | `string` | Human-readable hint (iOS device name, hostname). |
| `dvaiVersion` | `string` | Library SemVer the peer is running. |
| `baseUrl` | `string` | OpenAI-compatible base URL the peer's local server exposes. |
| `appId` | `string \| undefined` | v3.1+. Identifies which application on the peer device is making the request; used by multi-tenant targets (Hub) to isolate per-app state. Optional for v3.0 SDK back-compat (Hub falls back to deviceId). |
| `loadedModels` | `string[]` | Models the peer claims to have loaded. |
| `capability` | `Record<string, number>` | Peer-reported {modelId → tok/s} map (advisory; verified before first use). |
| `via` | `"mdns" \| "static" \| "rendezvous" \| "custom"` | Discovery source. |
| `secure` | `boolean` | Whether the peer's URL uses TLS. |
| `lastSeenAt` | `number` | Unix ms. Discovery sources update this. |
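
To show how the capability map might be read, here is a small sketch that picks the peer advertising the highest tok/s for a model. `fastestPeer` is a hypothetical helper and does not describe the library's own selection logic; the scores are advisory, as noted above.

```typescript
// Local copy of the documented Peer fields this sketch needs.
type Peer = {
  deviceId: string;
  loadedModels: string[];
  capability: Record<string, number>;
};

// Returns the peer reporting the highest tok/s for modelId at or above
// minTokPerSec, or undefined when no peer qualifies.
function fastestPeer(peers: Peer[], modelId: string, minTokPerSec: number): Peer | undefined {
  let best: Peer | undefined;
  let bestScore = -Infinity;
  for (const peer of peers) {
    const score = peer.capability[modelId];
    if (score !== undefined && score >= minTokPerSec && score > bestScore) {
      best = peer;
      bestScore = score;
    }
  }
  return best;
}

const peers: Peer[] = [
  { deviceId: "laptop", loadedModels: ["m"], capability: { m: 18 } },
  { deviceId: "desktop", loadedModels: ["m"], capability: { m: 42 } },
  { deviceId: "tablet", loadedModels: [], capability: {} },
];
const chosen = fastestPeer(peers, "m", 10);
```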

no_capable_device error response

No peer fast enough? The library returns an OpenAI-shaped error — HTTP 503 with Retry-After: 30.

```json
{
  "error": {
    "type": "no_capable_device",
    "code": 503,
    "message": "No device with capability ≥ 10 tok/s for model … was reachable.",
    "checked": [
      { "deviceId": "self", "capabilityScore": 4.2, "reason": "below threshold" }
    ],
    "localCapability": 4.2,
    "requiredAtLeast": 10,
    "rendezvousConfigured": true,
    "pairedRemotePeers": 0,
    "requestId": "..."
  }
}
```
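
A client can recognize this payload with a small type guard. The sketch below assumes only the fields shown in the sample response; `asNoCapableDevice` and `summarizeNoCapableDevice` are hypothetical helper names.

```typescript
type NoCapableDeviceError = {
  type: "no_capable_device";
  code: number;
  message: string;
  localCapability: number;
  requiredAtLeast: number;
  rendezvousConfigured: boolean;
  pairedRemotePeers: number;
};

// Returns the inner error object when the payload is a no_capable_device error.
function asNoCapableDevice(payload: unknown): NoCapableDeviceError | undefined {
  if (typeof payload !== "object" || payload === null) return undefined;
  const error = (payload as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) return undefined;
  return (error as { type?: unknown }).type === "no_capable_device"
    ? (error as NoCapableDeviceError)
    : undefined;
}

// Turns the structured fields into a short message for logs or UI.
function summarizeNoCapableDevice(err: NoCapableDeviceError): string {
  const hint =
    err.pairedRemotePeers === 0
      ? "no paired remote peers"
      : `${err.pairedRemotePeers} paired remote peers did not qualify`;
  return `Local device managed ${err.localCapability} tok/s, needs ${err.requiredAtLeast}; ${hint}.`;
}
```

The guard only checks the `type` discriminator, so callers that depend on other fields may want to validate those as well.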

New DVAI instance methods (v3.0+)

| Method | Returns | Description |
| --- | --- | --- |
| `dvai.probeCapability()` | `Promise<CapabilityScore \| undefined>` | Run a 50-token cold run against the active backend; persist the score per (modelId, libraryVersion). No-op if offload.enabled is false. |
| `dvai.getCapability(modelId?)` | `Promise<CapabilityScore \| undefined>` | Return the cached probe score or a heuristic fallback. No-op if offload.enabled is false. |
| `dvai.getPeers()` | `Peer[]` | Snapshot of currently discovered peers. |

See the Distributed Inference guide for the full design and the flows that drive it.


CreatePipelineFn

A factory for loading models the built-in pipeline() can't reach. DVAI hands you the dynamically-imported @huggingface/transformers module and a context object. You return a PipelineCallable.

```typescript
type CreatePipelineFn = (
	transformers: any,
	ctx: {
		modelId: string;
		device: "webgpu" | "wasm";
		dtype?: string;
		onProgress?: (info: any) => void;
	},
) => Promise<PipelineCallable>;
```

PipelineCallable

The function createPipeline returns. Takes chat messages and generation options. Returns the same shape Transformers.js pipelines return.

```typescript
type PipelineCallable = (messages: any, options?: any) => Promise<any>;
// Expected return shape: [{ generated_text: string }]
```
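
A factory satisfying both types might look like the sketch below. It loads a text-only model for brevity, using the AutoTokenizer / AutoModelForCausalLM pattern from Transformers.js; a multimodal model would load its own class and processor instead. The `max_new_tokens` option name and the 256-token fallback are assumptions, since this page does not specify which options DVAI forwards to the callable.

```typescript
type PipelineCallable = (messages: any, options?: any) => Promise<any>;
type CreatePipelineFn = (
  transformers: any,
  ctx: {
    modelId: string;
    device: "webgpu" | "wasm";
    dtype?: string;
    onProgress?: (info: any) => void;
  },
) => Promise<PipelineCallable>;

const createPipeline: CreatePipelineFn = async (transformers, ctx) => {
  const tokenizer = await transformers.AutoTokenizer.from_pretrained(ctx.modelId, {
    progress_callback: ctx.onProgress,
  });
  const model = await transformers.AutoModelForCausalLM.from_pretrained(ctx.modelId, {
    device: ctx.device,
    dtype: ctx.dtype,
    progress_callback: ctx.onProgress,
  });

  return async (messages, options = {}) => {
    // Render the chat messages into model inputs (input_ids, attention_mask).
    const inputs = tokenizer.apply_chat_template(messages, {
      add_generation_prompt: true,
      return_dict: true,
    });
    const output = await model.generate({
      ...inputs,
      max_new_tokens: options.max_new_tokens ?? 256, // assumed option name
    });
    // Drop the prompt tokens so only the newly generated text is decoded.
    const dims: number[] = inputs.input_ids.dims;
    const promptLength = dims[dims.length - 1];
    const completion = output.slice(null, [promptLength, null]);
    const [text] = tokenizer.batch_decode(completion, { skip_special_tokens: true });
    return [{ generated_text: text }];
  };
};
```

Because the factory is a closure, it has to be passed as `createPipeline` on the main thread, as noted in the DVAIConfig table.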

ChatOptions

The options you pass to chatCompletion or createStreamingResponse.

| Property | Type | Description |
| --- | --- | --- |
| `messages` | `ChatMessage[]` | Array of { role: "user" \| "assistant" \| "system", content: string }. |
| `stream` | `boolean` | Whether to stream the response. |
| `max_tokens` | `number` | Maximum number of tokens to generate. |
| `temperature` | `number` | Sampling temperature (usually 0 to 1). |
| `top_p` | `number` | Nucleus sampling threshold. |

DVAIInstance (Core Class)

The methods you call on a DVAI instance.

initialize(onProgress?)

Boots the backend. Starts workers. Registers MSW handlers. Downloads and loads the model. Pass a progress callback if you want to show a loader.

chatCompletion(options)

Returns an OpenAI-shaped response object. Works the same way for stock pipeline models and for custom createPipeline models.

createStreamingResponse(options)

Returns a ReadableStream of OpenAI SSE chunks. On the Transformers.js backend, those chunks are real per-token output via TextStreamer — not word-by-word fakery.
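
One way to consume such a stream is sketched below. It assumes the stream yields UTF-8 bytes (or strings) containing standard `data: {json}` SSE events with OpenAI-style `choices[0].delta.content` fields and a final `data: [DONE]` marker; `collectStreamText` is a hypothetical helper.

```typescript
// Reads an SSE stream to the end and concatenates the text deltas.
async function collectStreamText(
  stream: ReadableStream<Uint8Array | string>,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (value !== undefined) {
      buffer += typeof value === "string" ? value : decoder.decode(value, { stream: true });
    }
    // Keep a trailing partial line in the buffer until more data arrives.
    const lines = buffer.split("\n");
    buffer = done ? "" : (lines.pop() ?? "");
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;
      const data = trimmed.slice("data:".length).trim();
      if (data === "[DONE]") continue;
      const chunk = JSON.parse(data);
      text += chunk.choices?.[0]?.delta?.content ?? "";
    }
    if (done) return text;
  }
}

// Usage, assuming dvai is an initialized instance:
// const text = await collectStreamText(dvai.createStreamingResponse({ messages, stream: true }));
```

A UI would normally append each delta as it arrives instead of waiting for the full string.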

embedding(inputs)

Returns embedding vectors (number[][]) for a string or an array of strings.

  • backend: "transformers" needs pipelineTask: "feature-extraction".
  • backend: "native" needs nativeEmbeddingMode: true.
  • WebLLM doesn't do embeddings — this throws.
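
Since the result is a plain `number[][]`, the vectors can be compared directly. The sketch below shows cosine similarity, a common way to score two embeddings; `cosineSimilarity` is a hypothetical helper, not a library export.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as unrelated rather than returning NaN.
  return denominator === 0 ? 0 : dot / denominator;
}

// Usage, assuming dvai is an initialized instance on an embedding-capable backend:
// const [first, second] = await dvai.embedding(["cat", "kitten"]);
// const score = cosineSimilarity(first, second);
```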

runPipeline(inputs, options?)

Calls the underlying Transformers.js pipeline directly. Reach for it when you need non-chat tasks — image generation, ASR, anything else.

unload()

Tears the engine down. Frees memory and workers.

getActiveBackend()

Hands you the resolved backend instance.

Instance fields

  • dvai.baseUrl?: string — The URL to hand to any OpenAI SDK. undefined when transport="none".
  • dvai.port?: number — The bound HTTP port. HTTP transport only.

Methods

  • dvai.getBaseUrl(): string | undefined — Same value as dvai.baseUrl, in method form.
  • dvai.getPort(): number | undefined — Same value as dvai.port, in method form.
  • dvai.getActiveTransport(): "msw" | "http" | "none" — The transport DVAI picked during initialize().

OpenAI-Compatible Endpoints

DVAI-Bridge mounts MSW handlers for these endpoints. The base URL is derived from mockUrl (default: https://api.openai.local/v1/chat/completions). If mockUrl ends in /chat/completions, the part before that suffix is used as the base (https://api.openai.local/v1 for the default), and the sibling endpoints are mounted under it.

| Method | Endpoint | Notes |
| --- | --- | --- |
| POST | `/v1/chat/completions` | Full chat API. Streaming supported on all backends. |
| POST | `/v1/completions` | Legacy OpenAI completion endpoint. The prompt field is wrapped into a single user message and forwarded to /v1/chat/completions; the response is rewritten to the legacy text_completion shape. Streaming supported. |
| POST | `/v1/embeddings` | Returns embeddings. Gated on backend: transformers + pipelineTask: "feature-extraction", or native + nativeEmbeddingMode: true. Returns 400 on WebLLM. |
| GET | `/v1/models` | Returns a single-entry list with the currently loaded model ID. |
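
The relationship between mockUrl and the sibling endpoints can be sketched as below. This is an illustration of the rule described above, not the library's implementation; in particular, how a mockUrl that does not end in /chat/completions is treated is an assumption (the sketch uses it unchanged as the base).

```typescript
const CHAT_SUFFIX = "/chat/completions";

// Strips a trailing /chat/completions so the sibling endpoints share one base.
function deriveBaseUrl(mockUrl: string): string {
  const trimmed = mockUrl.replace(/\/+$/, "");
  return trimmed.endsWith(CHAT_SUFFIX)
    ? trimmed.slice(0, -CHAT_SUFFIX.length)
    : trimmed; // assumption: other URLs are treated as the base already
}

// Lists the four OpenAI-compatible endpoints under the derived base.
function endpointUrls(mockUrl: string): Record<string, string> {
  const base = deriveBaseUrl(mockUrl);
  return {
    chat: `${base}/chat/completions`,
    completions: `${base}/completions`,
    embeddings: `${base}/embeddings`,
    models: `${base}/models`,
  };
}

const urls = endpointUrls("https://api.openai.local/v1/chat/completions");
```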

Distributed-inference plane (/v1/dvai/*, v3.0+; v3.1 wire fixes)

Only mounted when offload.enabled = true. v3.0 defined these handlers but never dispatched them. v3.1 wires them into the HTTP transport — they finally return JSON instead of 404.

| Method | Endpoint | Notes |
| --- | --- | --- |
| GET | `/v1/dvai/health` | Liveness, version, uptime, currentModelId. |
| GET | `/v1/dvai/peers` | Discovered LAN peers. |
| GET | `/v1/dvai/capability` | Local capability cache (per-model tok/s). |
| POST | `/v1/dvai/probe` | Run a fresh capability probe against the active backend. |
| POST | `/v1/dvai/handshake` | LAN-pairing handshake. v3.1 request body adds optional appId; response now echoes pairingKey + peerDeviceId so the requester can HMAC-sign subsequent calls. |
| POST | `/v1/dvai/pair-qr` | Rendezvous QR-pair (partial in v3.0; per-SDK glue is a v3.1 finalization item). |
| POST | `/v1/dvai/pair-scan` | Rendezvous QR-scan (same status). |

Handshake request shape (v3.1)

```jsonc
// POST /v1/dvai/handshake
{
  "peerDeviceId": "phone-pixel-9",
  "peerDeviceName": "Pixel 9",
  "appId": "com.acme.chat",        // v3.1+ optional; falls back to peerDeviceId
  "via": "lan-handshake"
}
```

Handshake response shape (v3.1)

```jsonc
{
  "paired": true,
  "pairedAt": 1778184673778,
  "via": "lan-handshake",
  "pairingKey": "yxQwo0Xv9dws…",   // v3.1+: base64url-encoded 256-bit HMAC secret
  "peerDeviceId": "phone-pixel-9"  // v3.1+: echoed for confirmation
}
```

The pairing key travels back over the same Wi-Fi the handshake came in on — that's the LAN trust model. Rendezvous-QR pairings use ECDH key agreement and never touch this handler.

Identity-signed /v1/chat/completions (v3.1)

Once paired, a peer can sign its requests with four headers — and the audit log records who it was.

| Header | Description |
| --- | --- |
| `X-DVAI-Peer-Device-Id` | The peer's deviceId (matches the handshake). |
| `X-DVAI-App-Id` | The peer's appId (matches the handshake). |
| `X-DVAI-Nonce` | Per-request nonce (any unique string). |
| `X-DVAI-Signature` | Hex HMAC-SHA256(pairingKey, composeSignedMessage(nonce, "POST", "/v1/chat/completions", bodyJson)). |

composeSignedMessage, signHmac, and verifyHmac are exported from the @dvai-bridge/core package root. Targets — like the Hub — check the signature against the stored pairing key:

  • All four headers present and the signature verifies — the audit row records the real appId and peerDeviceId.
  • All four absent — the legacy anonymous path runs. Audit logs appId: "anonymous". v3.0 SDKs land here.
  • Some headers but not all — 401 with the reason "all four or none".
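
The signing flow and the all-or-none rule can be sketched with Node's built-in crypto module. The sketch does not call the library's exports; instead the message composer is passed in as `compose`, a stand-in for composeSignedMessage whose exact output format is not described on this page. Decoding the base64url pairing key to raw bytes before using it as the HMAC key is also an assumption.

```typescript
import { createHmac, randomUUID } from "node:crypto";

// Stand-in for composeSignedMessage; the real message format is not documented here.
type ComposeFn = (nonce: string, method: string, path: string, bodyJson: string) => string;

const IDENTITY_HEADERS = [
  "X-DVAI-Peer-Device-Id",
  "X-DVAI-App-Id",
  "X-DVAI-Nonce",
  "X-DVAI-Signature",
] as const;

// Builds the four identity headers for one /v1/chat/completions request.
function signedHeaders(
  identity: { deviceId: string; appId: string; pairingKey: string },
  bodyJson: string,
  compose: ComposeFn,
  nonce: string = randomUUID(),
): Record<string, string> {
  const message = compose(nonce, "POST", "/v1/chat/completions", bodyJson);
  // Assumption: the base64url pairing key is decoded to raw bytes for the HMAC.
  const key = Buffer.from(identity.pairingKey, "base64url");
  const signature = createHmac("sha256", key).update(message).digest("hex");
  return {
    "X-DVAI-Peer-Device-Id": identity.deviceId,
    "X-DVAI-App-Id": identity.appId,
    "X-DVAI-Nonce": nonce,
    "X-DVAI-Signature": signature,
  };
}

// Mirrors the three cases above. "complete" still needs signature verification.
function classifyIdentityHeaders(
  headers: Record<string, string | undefined>,
): "complete" | "anonymous" | "partial" {
  const present = new Set(
    Object.keys(headers)
      .filter((name) => headers[name] !== undefined)
      .map((name) => name.toLowerCase()),
  );
  const count = IDENTITY_HEADERS.filter((name) => present.has(name.toLowerCase())).length;
  if (count === IDENTITY_HEADERS.length) return "complete";
  return count === 0 ? "anonymous" : "partial"; // "partial" maps to the 401 case
}
```

In application code, the exported composeSignedMessage, signHmac, and verifyHmac helpers are the safer choice, since they stay in step with what targets verify.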

The Hub interceptor rejects requests whose model name parses to family: "unknown", which is the parser's sentinel for a name it can't read. Substituting one sentinel for another tells you nothing, so the Hub refuses the request instead of applying its substitution policy.