API Reference
The DVAI config and the types that go with it.
DVAIConfig
The object you pass when you build a DVAI instance.
| Property | Type | Default | Description |
|---|---|---|---|
| `backend` | `"webllm" \| "transformers" \| "native" \| "auto"` | `"webllm"` | The inference engine to use. Use `"auto"` for intelligent environment selection. |
| `modelId` | `string` | `"gemma-2-2b-it-q4f16_1-MLC"` | WebLLM-specific model identifier. |
| `transformersModelId` | `string` | `"onnx-community/gemma-3n-E2B-it-ONNX"` | HuggingFace model ID for Transformers.js. |
| `pipelineTask` | `string` | `"text-generation"` | Pipeline task for Transformers.js (e.g., `"text-generation"`, `"feature-extraction"`, `"image-text-to-text"`). |
| `device` | `"webgpu" \| "cpu" \| "auto"` | `"auto"` | Device for Transformers.js inference. `"auto"` detects WebGPU availability. |
| `dtype` | `string` | (none) | Quantization for Transformers.js models (e.g., `"q4"`, `"q4f16"`, `"q8"`, `"fp16"`). |
| `transformersModelClass` | `string` | (none) | Name of a transformers.js export to load via `ClassName.from_pretrained(modelId)`. Enables the declarative multimodal loader. Works in the worker AND on main-thread fallback. Leave unset to use the stock `pipeline()` factory. Example: `"Gemma4ForConditionalGeneration"`. |
| `transformersProcessorClass` | `string` | `"AutoProcessor"` | Processor class name for the declarative loader. Only used when `transformersModelClass` is set. |
| `transformersDisableEncoders` | `string[]` | `[]` | Model submodule fields to null out after load (e.g. `["vision_encoder"]`). Purely declarative: the library nulls each named field if present; unknown/absent names are silently ignored. Host app controls this based on which modalities it actually uses. |
| `createPipeline` | `CreatePipelineFn` | (none) | Custom pipeline factory for models whose processor call signature the declarative loader can't express. Main-thread only, because function closures don't cross the Worker boundary. See Custom Pipeline Factory. |
| `mockUrl` | `string` | `"https://api.openai.local/v1/chat/completions"` | The URL that MSW intercepts for OpenAI-compatible requests. |
| `serviceWorkerUrl` | `string` | `"/mockServiceWorker.js"` | Path to the MSW service worker script. Set to `""` to disable MSW. |
| `transformersWorkerUrl` | `string` | `"/dvai-transformers.worker.js"` | Path to the Transformers.js inference worker. Set to `""` to run on main thread. |
| `webllmWorkerUrl` | `string` | `"/dvai-webllm.worker.js"` | Path to the WebLLM inference worker. |
| `nativeModelPath` | `string` | (none) | Path to the GGUF model file for the Native backend. |
| `nativeGpuLayers` | `number` | `99` | Number of layers to offload to GPU in Native backend. |
| `nativeThreads` | `number` | `4` | Number of CPU threads for Native inference. |
| `nativeContextSize` | `number` | `2048` | Context window size for Native backend. |
| `nativeEmbeddingMode` | `boolean` | `false` | Initialize the native llama.cpp context in embedding mode. Required for `/v1/embeddings` on the native backend. |
| `maxRetries` | `number` | `2` | Number of automatic recovery attempts on fatal WebLLM errors. |
| `generationTimeout` | `number` | `60000` | Maximum time (ms) allowed for generation before timing out. |
| `maxBlankChunks` | `number` | `20` | Abort streaming after this many consecutive empty chunks. |
| `licenseKeyPath` | `string` | (none) | Path or URL to a DVAI-Bridge license JWT. Auto-discovered from `dvai-license.jwt` at platform-conventional locations when unset. See License setup. |
| `licenseToken` | `string` | (none) | Inline DVAI-Bridge license JWT (full token string). Highest-priority discovery path; wins over `licenseKeyPath` and env vars. Useful for serverless / CI deployments. See License setup. |
| `autoInit` | `boolean` | `true` | Whether to initialize the backend immediately on mount (React only). |
| `transport` | `"auto" \| "msw" \| "http" \| "none"` | `"auto"` | Transport selection. `"auto"` picks MSW in browser, HTTP in Node. |
| `httpBasePort` | `number` | `38883` | HTTP transport base port (retries +1 up to 16 times). |
| `httpMaxPortAttempts` | `number` | `16` | Max HTTP port fallback attempts before throwing. |
| `corsOrigin` | `string \| string[]` | `"*"` | HTTP `Access-Control-Allow-Origin` value or allowlist. |
| `httpBindHost` | `string \| undefined` | `"127.0.0.1"` | v3.1+. Network interface to bind. Defaults to loopback only, which is safe for single-device deployments. Set to `"0.0.0.0"` for LAN-target deployments (the v3.1 Hub, native SDKs running in target mode). Phone-as-source / single-device deployments should leave the default; a `0.0.0.0` bind without pairing protection exposes the OpenAI surface. |
| `offload` | `OffloadConfig \| undefined` | `undefined` | Phase 3 (v3.0+). Distributed-inference / device-offload config. See OffloadConfig below and the Distributed Inference guide. When unset, the library behaves exactly as v2.x. |
| `chatCompletionInterceptor` | `(body, ctx, headers?) => Promise<Response \| null> \| undefined` | `undefined` | v3.1+. First-chance hook for `/v1/chat/completions`. Return a `Response` to short-circuit; return `null` to fall through to the default local-backend handler. The Hub uses this to enforce its substitution policy and route through external engines. See Chat completion interceptor. |
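To make the table concrete, here is a minimal sketch of a config object for the Transformers.js backend over the HTTP transport. `DVAIConfigSketch` is a hypothetical local subset of `DVAIConfig`, declared only so the block type-checks on its own; an application would use the type exported by the library instead.

```typescript
// Hypothetical local subset of DVAIConfig (not the library's exported type).
// Property names and defaults are taken from the table above.
type DVAIConfigSketch = {
  backend?: "webllm" | "transformers" | "native" | "auto";
  transformersModelId?: string;
  pipelineTask?: string;
  device?: "webgpu" | "cpu" | "auto";
  dtype?: string;
  transport?: "auto" | "msw" | "http" | "none";
  httpBasePort?: number;
  httpBindHost?: string;
  generationTimeout?: number;
};

// Chat on the Transformers.js backend, served over loopback HTTP.
const config: DVAIConfigSketch = {
  backend: "transformers",
  transformersModelId: "onnx-community/gemma-3n-E2B-it-ONNX", // documented default
  pipelineTask: "text-generation",
  device: "auto", // let the library detect WebGPU
  dtype: "q4",
  transport: "http",
  httpBasePort: 38883,
  httpBindHost: "127.0.0.1", // loopback only, the documented default
  generationTimeout: 60000,
};
```

Every value except `backend`, `dtype`, and `transport` repeats a documented default, so the object could be trimmed to those three keys.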
OffloadConfig (v3.0+)
Turns on peer discovery and inference offload. Leave it unset and the library behaves exactly like v2.x.
| Property | Type | Default | Description |
|---|---|---|---|
| `enabled` | `boolean` | `false` | Master switch. Opt-in at v3.0; nothing changes when off. |
| `discoverLAN` | `boolean` | `true` | Run mDNS / DNS-SD to discover peers on the local network. Browsers skip this (they can't speak mDNS); native SDKs use the platform-native API. |
| `minLocalCapability` | `number` | `10` | Estimated decode tok/s the local device must hit to run locally. Below this, the library looks for a peer. |
| `rendezvousUrl` | `string \| undefined` | `undefined` | URL of a self-hosted rendezvous server. If unset, the internet path is disabled and only LAN works. |
| `knownPeers` | `Peer[] \| undefined` | `undefined` | Pre-known peers (skip discovery). Useful for corporate device registries or persisted pairings. |
| `onPairingRequest` | `(peer: Peer) => Promise<boolean \| { approved: true; pairingKey: string } \| { approved: false }>` | denies | Hook to surface an "Allow this device to pair?" UI to the user. Default: deny. v3.1+: the return type was widened to a tagged union. Return `{ approved: true, pairingKey }` when your host app maintains its own pairing state (e.g. the Hub's `MultiTenantPairing`) and wants the library to use that key instead of generating a fresh one. v3.0 boolean returns continue to work. |
| `onOffload` | `(peer: Peer) => void` | no-op | Diagnostic callback when a request is offloaded. Useful for analytics and UI feedback. |
| `customDiscovery` | `() => Promise<Peer[]>` | `undefined` | Optional plug-in for app-specific discovery (e.g. corporate device registry). Combined with mDNS and `knownPeers`. |
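The widened `onPairingRequest` return type accepts both the v3.0 boolean and the v3.1 tagged union. The sketch below shows one way a consumer of that value could normalize the three shapes into one; `normalizePairingDecision` and `NormalizedDecision` are hypothetical names for illustration, not library exports.

```typescript
// The three shapes onPairingRequest may resolve to, per the table above.
type PairingDecision =
  | boolean
  | { approved: true; pairingKey: string }
  | { approved: false };

// Hypothetical normalized form: a flag plus an optional host-supplied key.
type NormalizedDecision = { approved: boolean; pairingKey?: string };

function normalizePairingDecision(decision: PairingDecision): NormalizedDecision {
  // v3.0 style: a bare boolean, no host-supplied key.
  if (typeof decision === "boolean") return { approved: decision };
  // v3.1 style: approved together with the host app's own pairing key.
  if (decision.approved) return { approved: true, pairingKey: decision.pairingKey };
  return { approved: false };
}
```

When `pairingKey` is absent after normalization, the documented behavior is that the library generates a fresh key itself.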
Per-request override (X-DVAI-Offload header)
| Header value | Meaning |
|---|---|
| `prefer` (default) | Offload if local can't serve fast enough AND a faster peer exists. |
| `never` | Always run locally, even if slow. Privacy-sensitive prompts; on-device-only requirements. |
| `require` | Refuse rather than fall back. Returns the structured `no_capable_device` error if no qualified peer is reachable. |
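A minimal sketch of setting the header per request and recognizing the `no_capable_device` error documented in this reference. Both helper names are hypothetical; only the header name, its three values, the 503 status, and the `error.type` string come from this page.

```typescript
type OffloadMode = "prefer" | "never" | "require";

// Build fetch() options for a chat request with an explicit offload mode.
function buildChatRequestInit(
  body: unknown,
  mode: OffloadMode = "prefer",
): { method: "POST"; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-DVAI-Offload": mode },
    body: JSON.stringify(body),
  };
}

// True when a response is the structured "no capable device" error.
function isNoCapableDevice(status: number, payload: unknown): boolean {
  const error = (payload as { error?: { type?: unknown } } | null)?.error;
  return status === 503 && error?.type === "no_capable_device";
}
```

The object returned by `buildChatRequestInit` can be passed as the second argument to `fetch`. A caller sending `require` would typically check `isNoCapableDevice` and honor the `Retry-After` header before retrying.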
Peer type
| Property | Type | Description |
|---|---|---|
| `deviceId` | `string` | Stable per-install peer device ID. |
| `deviceName` | `string` | Human-readable hint (iOS device name, hostname). |
| `dvaiVersion` | `string` | Library SemVer the peer is running. |
| `baseUrl` | `string` | OpenAI-compatible base URL the peer's local server exposes. |
| `appId` | `string \| undefined` | v3.1+. Identifies which application on the peer device is making the request; used by multi-tenant targets (Hub) to isolate per-app state. Optional for v3.0 SDK back-compat (Hub falls back to `deviceId`). |
| `loadedModels` | `string[]` | Models the peer claims to have loaded. |
| `capability` | `Record<string, number>` | Peer-reported `{modelId → tok/s}` map (advisory; verified before first use). |
| `via` | `"mdns" \| "static" \| "rendezvous" \| "custom"` | Discovery source. |
| `secure` | `boolean` | Whether the peer's URL uses TLS. |
| `lastSeenAt` | `number` | Unix ms. Discovery sources update this. |
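The table maps directly onto a TypeScript interface. The sketch below restates it and adds a hypothetical helper, `pickFastestPeer`, that ranks peers by their advertised tok/s for one model. The library's actual selection logic is not documented here, and the scores are advisory, so treat this only as an illustration of how the fields fit together.

```typescript
// Restated from the Peer table above.
interface Peer {
  deviceId: string;
  deviceName: string;
  dvaiVersion: string;
  baseUrl: string;
  appId?: string; // v3.1+
  loadedModels: string[];
  capability: Record<string, number>; // modelId -> advertised tok/s
  via: "mdns" | "static" | "rendezvous" | "custom";
  secure: boolean;
  lastSeenAt: number; // Unix ms
}

// Hypothetical helper: the peer advertising the highest tok/s for modelId,
// ignoring peers with no score or a score below the threshold.
function pickFastestPeer(peers: Peer[], modelId: string, minTokPerSec: number): Peer | undefined {
  let best: Peer | undefined;
  let bestScore = -Infinity;
  for (const peer of peers) {
    const score = peer.capability[modelId];
    if (typeof score !== "number" || score < minTokPerSec) continue;
    if (score > bestScore) {
      best = peer;
      bestScore = score;
    }
  }
  return best;
}
```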
no_capable_device error response
No peer fast enough? The library returns an OpenAI-shaped error — HTTP 503 with Retry-After: 30.
{
  "error": {
    "type": "no_capable_device",
    "code": 503,
    "message": "No device with capability ≥ 10 tok/s for model … was reachable.",
    "checked": [
      { "deviceId": "self", "capabilityScore": 4.2, "reason": "below threshold" }
    ],
    "localCapability": 4.2,
    "requiredAtLeast": 10,
    "rendezvousConfigured": true,
    "pairedRemotePeers": 0,
    "requestId": "..."
  }
}
New DVAI instance methods (v3.0+)
| Method | Returns | Description |
|---|---|---|
| `dvai.probeCapability()` | `Promise<CapabilityScore \| undefined>` | Run a 50-token cold-run against the active backend; persist the score per `(modelId, libraryVersion)`. No-op if `offload.enabled` is false. |
| `dvai.getCapability(modelId?)` | `Promise<CapabilityScore \| undefined>` | Return the cached probe score or a heuristic fallback. No-op if `offload.enabled` is false. |
| `dvai.getPeers()` | `Peer[]` | Snapshot of currently-discovered peers. |
See the Distributed Inference guide for the full design and the flows that drive it.
CreatePipelineFn
A factory for loading models the built-in pipeline() can't reach. DVAI hands you the dynamically-imported @huggingface/transformers module and a context object. You return a PipelineCallable.
type CreatePipelineFn = (
  transformers: any,
  ctx: {
    modelId: string;
    device: "webgpu" | "wasm";
    dtype?: string;
    onProgress?: (info: any) => void;
  },
) => Promise<PipelineCallable>;
PipelineCallable
The function createPipeline returns. Takes chat messages and generation options. Returns the same shape Transformers.js pipelines return.
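As an illustration of the contract, here is a sketch of a factory that loads a causal language model by hand and returns a callable with the expected result shape. It assumes the Transformers.js exports `AutoTokenizer` and `AutoModelForCausalLM` and their `from_pretrained`, `apply_chat_template`, `generate`, and `batch_decode` methods. The `max_new_tokens` option name is an assumption, because this page does not say which option keys DVAI forwards to the callable.

```typescript
type PipelineCallable = (messages: any, options?: any) => Promise<any>;
type CreatePipelineCtx = {
  modelId: string;
  device: "webgpu" | "wasm";
  dtype?: string;
  onProgress?: (info: any) => void;
};

// Sketch of a CreatePipelineFn that bypasses the stock pipeline() factory.
async function createPipeline(transformers: any, ctx: CreatePipelineCtx): Promise<PipelineCallable> {
  const tokenizer = await transformers.AutoTokenizer.from_pretrained(ctx.modelId);
  const model = await transformers.AutoModelForCausalLM.from_pretrained(ctx.modelId, {
    device: ctx.device,
    dtype: ctx.dtype,
    progress_callback: ctx.onProgress,
  });

  return async (messages, options = {}) => {
    // Tokenize the conversation with the model's own chat template.
    const inputs = tokenizer.apply_chat_template(messages, {
      add_generation_prompt: true,
      return_dict: true,
    });
    // ASSUMPTION: the caller passes max_new_tokens; 256 is an arbitrary fallback.
    const output = await model.generate({
      ...inputs,
      max_new_tokens: options.max_new_tokens ?? 256,
    });
    // Drop the prompt tokens so only newly generated text is decoded.
    const dims: number[] = inputs.input_ids.dims;
    const promptLength = dims[dims.length - 1];
    const [text] = tokenizer.batch_decode(output.slice(null, [promptLength, null]), {
      skip_special_tokens: true,
    });
    return [{ generated_text: text }];
  };
}
```

Because the factory is a function closure, it would be passed as `createPipeline` in the config and run on the main thread only, as the `DVAIConfig` table notes.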
type PipelineCallable = (messages: any, options?: any) => Promise<any>;
// Expected return shape: [{ generated_text: string }]
ChatOptions
The options you pass to chatCompletion or createStreamingResponse.
| Property | Type | Description |
|---|---|---|
| `messages` | `ChatMessage[]` | Array of `{ role: "user" \| "assistant" \| "system", content: string }`. |
| `stream` | `boolean` | Whether to stream the response. |
| `max_tokens` | `number` | Maximum number of tokens to generate. |
| `temperature` | `number` | Sampling temperature (usually 0 to 1). |
| `top_p` | `number` | Nucleus sampling threshold. |
DVAIInstance (Core Class)
The methods you call on a DVAI instance.
initialize(onProgress?)
Boots the backend. Starts workers. Registers MSW handlers. Downloads and loads the model. Pass a progress callback if you want to show a loader.
chatCompletion(options)
Returns an OpenAI-shaped response object. Works the same way for stock pipeline models and for custom createPipeline models.
createStreamingResponse(options)
Returns a ReadableStream of OpenAI SSE chunks. On the Transformers.js backend, those chunks are real per-token output via TextStreamer — not word-by-word fakery.
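A sketch of consuming such a stream and concatenating the text deltas. It assumes the stream carries standard OpenAI-style SSE framing (`data: {...}` events separated by a blank line and terminated by `data: [DONE]`) and that chunks arrive as either bytes or strings. The chunk type is not stated on this page, so the helper accepts both.

```typescript
// Read an OpenAI-style SSE stream and return the concatenated delta text.
async function collectStreamedText(stream: ReadableStream<Uint8Array | string>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += typeof value === "string" ? value : decoder.decode(value, { stream: true });
    // An SSE event ends at a blank line; a chunk may end mid-event, so keep the tail.
    let boundary: number;
    while ((boundary = buffer.indexOf("\n\n")) !== -1) {
      const event = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const data = line.slice(5).trim();
        if (data === "[DONE]") return text;
        text += JSON.parse(data).choices?.[0]?.delta?.content ?? "";
      }
    }
  }
  return text;
}
```

A UI would normally append each delta as it arrives instead of waiting for the full string; the buffering logic stays the same.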
embedding(inputs)
Returns embedding vectors (number[][]) for a string or an array of strings.
- `backend: "transformers"` needs `pipelineTask: "feature-extraction"`.
- `backend: "native"` needs `nativeEmbeddingMode: true`.
- WebLLM doesn't do embeddings, so this throws.
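These three rules can be checked before calling `embedding()`. The sketch below encodes them in a hypothetical helper, `embeddingsSupported`, over a hypothetical subset of the config. It takes the resolved backend, so `"auto"` is deliberately not handled.

```typescript
// Hypothetical subset of the config that matters for embeddings.
type EmbeddingRelevantConfig = {
  backend: "webllm" | "transformers" | "native";
  pipelineTask?: string;
  nativeEmbeddingMode?: boolean;
};

// True when the documented preconditions for embedding() are met.
function embeddingsSupported(cfg: EmbeddingRelevantConfig): boolean {
  if (cfg.backend === "transformers") return cfg.pipelineTask === "feature-extraction";
  if (cfg.backend === "native") return cfg.nativeEmbeddingMode === true;
  return false; // WebLLM has no embeddings support
}
```

Since `pipelineTask` defaults to `"text-generation"`, a Transformers.js config that omits it does not qualify.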
runPipeline(inputs, options?)
Calls the underlying Transformers.js pipeline directly. Reach for it when you need non-chat tasks — image generation, ASR, anything else.
unload()
Tears the engine down. Frees memory and workers.
getActiveBackend()
Hands you the resolved backend instance.
Instance fields
- `dvai.baseUrl?: string`: The URL to hand to any OpenAI SDK. `undefined` when `transport="none"`.
- `dvai.port?: number`: The bound HTTP port. HTTP transport only.
Methods
- `dvai.getBaseUrl(): string | undefined`: Same value as `dvai.baseUrl`, in method form.
- `dvai.getPort(): number | undefined`: Same value as `dvai.port`, in method form.
- `dvai.getActiveTransport(): "msw" | "http" | "none"`: The transport DVAI picked during `initialize()`.
OpenAI-Compatible Endpoints
DVAI-Bridge mounts MSW handlers for these endpoints. The base URL comes from mockUrl (default: https://api.openai.local/v1/chat/completions). If mockUrl ends in /chat/completions, its parent is the base — the siblings live next to it.
| Method | Endpoint | Notes |
|---|---|---|
| POST | `/v1/chat/completions` | Full chat API. Streaming supported on all backends. |
| POST | `/v1/completions` | Legacy OpenAI completion endpoint. The `prompt` field is wrapped into a single user message and forwarded to `/v1/chat/completions`; the response is rewritten to the legacy `text_completion` shape. Streaming supported. |
| POST | `/v1/embeddings` | Returns embeddings. Gated on `backend: transformers` + `pipelineTask: "feature-extraction"`, or `native` + `nativeEmbeddingMode: true`. Returns 400 on WebLLM. |
| GET | `/v1/models` | Returns a single-entry list with the currently loaded model ID. |
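The base-URL rule for `mockUrl` can be sketched as a small helper. `deriveBaseUrl` is a hypothetical name, and treating a `mockUrl` that does not end in `/chat/completions` as the base itself is an assumption; this page only specifies the suffix case.

```typescript
// Derive the base URL that sibling endpoints hang off.
function deriveBaseUrl(mockUrl: string): string {
  const suffix = "/chat/completions";
  // Documented case: the parent of .../chat/completions is the base.
  if (mockUrl.endsWith(suffix)) return mockUrl.slice(0, -suffix.length);
  // ASSUMPTION: any other URL is already the base.
  return mockUrl;
}

const base = deriveBaseUrl("https://api.openai.local/v1/chat/completions");
const embeddingsUrl = `${base}/embeddings`;
const modelsUrl = `${base}/models`;
```

With the default `mockUrl`, `base` is `https://api.openai.local/v1`, so the sibling endpoints resolve to `/v1/embeddings` and `/v1/models` on the same host.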
Distributed-inference plane (/v1/dvai/*, v3.0+; v3.1 wire fixes)
Only mounted when offload.enabled = true. v3.0 defined these handlers but never dispatched them. v3.1 wires them into the HTTP transport — they finally return JSON instead of 404.
| Method | Endpoint | Notes |
|---|---|---|
| GET | `/v1/dvai/health` | Liveness, version, uptime, `currentModelId`. |
| GET | `/v1/dvai/peers` | Discovered LAN peers. |
| GET | `/v1/dvai/capability` | Local capability cache (per-model tok/s). |
| POST | `/v1/dvai/probe` | Run a fresh capability probe against the active backend. |
| POST | `/v1/dvai/handshake` | LAN-pairing handshake. v3.1 request body adds optional `appId`; response now echoes `pairingKey` + `peerDeviceId` so the requester can HMAC-sign subsequent calls. |
| POST | `/v1/dvai/pair-qr` | Rendezvous QR-pair (partial in v3.0; per-SDK glue is a v3.1 finalization item). |
| POST | `/v1/dvai/pair-scan` | Rendezvous QR-scan (same status). |
Handshake request shape (v3.1)
POST /v1/dvai/handshake
{
  "peerDeviceId": "phone-pixel-9",
  "peerDeviceName": "Pixel 9",
  "appId": "com.acme.chat", // v3.1+ optional; falls back to peerDeviceId
  "via": "lan-handshake"
}
Handshake response shape (v3.1)
{
  "paired": true,
  "pairedAt": 1778184673778,
  "via": "lan-handshake",
  "pairingKey": "yxQwo0Xv9dws…", // v3.1+: base64url-encoded 256-bit HMAC secret
  "peerDeviceId": "phone-pixel-9" // v3.1+: echoed for confirmation
}
The pairing key travels back over the same Wi-Fi the handshake came in on; that's the LAN trust model. Rendezvous-QR pairings use ECDH key agreement and never touch this handler.
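The two shapes can be sketched as TypeScript types, together with a requester-side check that only accepts the key when the echoed `peerDeviceId` matches what was sent. The interface and function names are hypothetical; the field names and the `"lan-handshake"` value come from the examples above.

```typescript
interface HandshakeRequest {
  peerDeviceId: string;
  peerDeviceName: string;
  appId?: string; // v3.1+, optional
  via: string;
}

interface HandshakeResponse {
  paired: boolean;
  pairedAt: number;
  via: string;
  pairingKey?: string; // v3.1+ targets only
  peerDeviceId?: string; // v3.1+ targets only
}

function buildHandshakeRequest(deviceId: string, deviceName: string, appId?: string): HandshakeRequest {
  const request: HandshakeRequest = { peerDeviceId: deviceId, peerDeviceName: deviceName, via: "lan-handshake" };
  if (appId !== undefined) request.appId = appId;
  return request;
}

// Return the pairing key only when pairing succeeded and the echo matches.
function extractPairingKey(request: HandshakeRequest, response: HandshakeResponse): string | undefined {
  if (!response.paired) return undefined;
  if (response.peerDeviceId !== request.peerDeviceId) return undefined;
  return response.pairingKey;
}
```

A response without `pairingKey` yields `undefined`, in which case the requester cannot sign its requests and would use the anonymous path.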
Identity-signed /v1/chat/completions (v3.1)
Once paired, a peer can sign its requests with four headers — and the audit log records who it was.
| Header | Description |
|---|---|
| `X-DVAI-Peer-Device-Id` | The peer's `deviceId` (matches the handshake). |
| `X-DVAI-App-Id` | The peer's `appId` (matches the handshake). |
| `X-DVAI-Nonce` | Per-request nonce (any unique string). |
| `X-DVAI-Signature` | Hex `HMAC-SHA256(pairingKey, composeSignedMessage(nonce, "POST", "/v1/chat/completions", bodyJson))`. |
composeSignedMessage, signHmac, and verifyHmac are exported from the @dvai-bridge/core package root. Targets — like the Hub — check the signature against the stored pairing key:
- All four headers present and the signature verifies: the audit row records the real `appId` and `peerDeviceId`.
- All four absent: the legacy anonymous path runs. Audit logs `appId: "anonymous"`. v3.0 SDKs land here.
- Some headers but not all: 401 with the reason "all four or none".
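The signing flow and the three outcomes above can be sketched with Node's built-in `crypto` module. Two things here are assumptions, not the library's behavior: `composeMessageSketch` is a hypothetical stand-in for the exported `composeSignedMessage` (its real byte layout is not documented on this page), and the pairing key is assumed to be base64url-decoded to raw bytes before use. In real code, call the helpers exported from `@dvai-bridge/core` instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HYPOTHETICAL stand-in for composeSignedMessage; the real format may differ.
function composeMessageSketch(nonce: string, method: string, path: string, bodyJson: string): string {
  return [nonce, method, path, bodyJson].join("\n");
}

// Hex HMAC-SHA256. ASSUMPTION: the base64url key is decoded to raw bytes.
function signHex(pairingKey: string, message: string): string {
  return createHmac("sha256", Buffer.from(pairingKey, "base64url")).update(message, "utf8").digest("hex");
}

// Constant-time comparison of the expected and presented signatures.
function verifyHex(pairingKey: string, message: string, signatureHex: string): boolean {
  const expected = Buffer.from(signHex(pairingKey, message), "hex");
  const actual = Buffer.from(signatureHex, "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const IDENTITY_HEADERS = ["X-DVAI-Peer-Device-Id", "X-DVAI-App-Id", "X-DVAI-Nonce", "X-DVAI-Signature"];

function buildIdentityHeaders(p: {
  deviceId: string; appId: string; nonce: string; pairingKey: string; bodyJson: string;
}): Record<string, string> {
  const message = composeMessageSketch(p.nonce, "POST", "/v1/chat/completions", p.bodyJson);
  return {
    "X-DVAI-Peer-Device-Id": p.deviceId,
    "X-DVAI-App-Id": p.appId,
    "X-DVAI-Nonce": p.nonce,
    "X-DVAI-Signature": signHex(p.pairingKey, message),
  };
}

// Map header presence to the three documented outcomes.
function classifyIdentityHeaders(
  headers: Record<string, string | undefined>,
): "verify-signature" | "anonymous" | "reject-401" {
  const present = IDENTITY_HEADERS.filter((name) => headers[name] !== undefined).length;
  if (present === 0) return "anonymous";
  return present === IDENTITY_HEADERS.length ? "verify-signature" : "reject-401";
}
```

The signature covers the exact `bodyJson` string, so the same serialized string must be sent as the request body. The lookup in `classifyIdentityHeaders` is case-sensitive for brevity, whereas real HTTP header names are not.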
The Hub interceptor rejects requests whose model name parses to family: "unknown" — that's the parser's sentinel for "I can't read this." Substituting one sentinel for another tells you nothing, so the Hub refuses.
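To illustrate the shape of such an interceptor, the sketch below follows the `chatCompletionInterceptor` signature from the `DVAIConfig` table: it returns a `Response` to short-circuit and `null` to fall through. Everything else is assumed for illustration: `parseFamilySketch` is a hypothetical parser (the Hub's real one is not shown here), and the 400 status and error body are placeholders, not the Hub's actual response.

```typescript
type ChatBody = { model?: string; [key: string]: unknown };

// HYPOTHETICAL parser: recognizes a few family prefixes, else "unknown".
function parseFamilySketch(model: string | undefined): string {
  const match = /^(gemma|llama|qwen|phi)/i.exec(model ?? "");
  return match?.[1]?.toLowerCase() ?? "unknown";
}

// Interceptor sketch: reject unparseable model names, otherwise fall through.
async function rejectUnknownFamily(
  body: ChatBody,
  _ctx?: unknown,
  _headers?: unknown,
): Promise<Response | null> {
  if (parseFamilySketch(body.model) !== "unknown") return null; // default handler runs
  // ASSUMPTION: status code and error payload are illustrative only.
  return new Response(
    JSON.stringify({ error: { type: "invalid_request_error", message: "Unrecognized model family." } }),
    { status: 400, headers: { "Content-Type": "application/json" } },
  );
}
```

Returning `null` for recognized families keeps the local-backend handler in charge, which matches the fall-through behavior described for the hook.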
