Native LLM (Capacitor)

DVAI-Bridge ships first-party Capacitor plugins. They run a local OpenAI-compatible HTTP server inside your iOS or Android app, fronted by a native inference backend.

TIP

Don't need Capacitor? Native iOS, native Android, and React Native consumers have direct paths that skip the Capacitor wrapper. Pick the right guide for your stack:

SwiftUI / UIKit: iOS Native SDK — @dvai-bridge/ios
Compose / Views / KMP: Android Native SDK — @dvai-bridge/android
React Native ≥ 0.77 (Bridgeless ON): React Native SDK — @dvai-bridge/react-native

All three expose the same 8-method DVAIBridge API and serve the same OpenAI HTTP surface. The Capacitor packages on this page remain the right choice for hybrid apps and for RN ≤ 0.73 consumers.

NOTE

The previous llama-cpp-capacitor integration is deprecated. The replacement is the @dvai-bridge/capacitor package family documented here. See Migration from llama-cpp-capacitor at the bottom of this page.

Architecture

The Capacitor surface splits across five packages.

@dvai-bridge/capacitor              ← JS routing shim (no native code)
  ├─ @dvai-bridge/capacitor-llama        ← native: llama.cpp
  ├─ @dvai-bridge/capacitor-foundation   ← native: Apple Foundation Models (iOS 26+)
  ├─ @dvai-bridge/capacitor-mediapipe    ← native: MediaPipe LLM (Android)
  └─ @dvai-bridge/capacitor-mlx          ← native: MLX (Apple Silicon, iOS 17+)

Install the shim plus one or more backend plugins. The shim chooses which backend's start() to call based on StartOptions.backend.

How a backend plugin boots

1. JS calls DVAIBridge.start({ backend, modelPath, ... }).
2. The shim looks up the registered native plugin (DVAIBridgeLlama, DVAIBridgeFoundation, DVAIBridgeMediaPipe, or DVAIBridgeMLX) and dispatches.
3. The native side loads the model (mmap on iOS and Android), opens an HTTP server bound to 127.0.0.1:<port>, and starts the /v1/* route handlers.
4. The promise resolves with { baseUrl, port, backend, modelId }. baseUrl looks like http://127.0.0.1:38883/v1.
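
The boot sequence above can be pictured as a registry lookup followed by a delegated call. The sketch below is a hypothetical model of that routing step, not the shim's actual source: the createShim factory, the Map-based registry, and the error message are assumptions made for illustration. Only the four plugin names and the shape of the resolved value come from this page.

```typescript
// Hypothetical model of the shim's routing step; real internals may differ.
type Backend = "llama" | "foundation" | "mediapipe" | "mlx";

interface StartOptions {
  backend: Backend;
  modelPath?: string;
}

interface StartResult {
  baseUrl: string;
  port: number;
  backend: Backend;
  modelId: string;
}

interface BackendPlugin {
  start(options: StartOptions): Promise<StartResult>;
}

// Plugin names as listed in the boot steps; the lookup table itself is assumed.
const PLUGIN_NAMES: Record<Backend, string> = {
  llama: "DVAIBridgeLlama",
  foundation: "DVAIBridgeFoundation",
  mediapipe: "DVAIBridgeMediaPipe",
  mlx: "DVAIBridgeMLX",
};

function createShim(registry: Map<string, BackendPlugin>) {
  return {
    async start(options: StartOptions): Promise<StartResult> {
      const name = PLUGIN_NAMES[options.backend];
      const plugin = registry.get(name);
      if (!plugin) {
        // Assumed behavior: fail loudly when the backend package is not installed.
        throw new Error(`Backend plugin ${name} is not installed`);
      }
      // The shim holds no inference logic; it only forwards the call.
      return plugin.start(options);
    },
  };
}
```

The point of the design is that the shim carries no native code, so installing an extra backend only adds an entry to the registry.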

What runs on which platform

Plugin	iOS	Android
`capacitor-llama`	Swift + ObjC++ wrapping `llama.cpp`	Kotlin + JNI wrapping `libllama.so`
`capacitor-foundation`	Swift wrapping `LanguageModelSession`	Stub: returns `iOS-only` error
`capacitor-mediapipe`	Stub: returns `Android-only` error	Kotlin wrapping MediaPipe `LlmInference`
`capacitor-mlx`	Swift wrapping `mlx-swift-lm` (Apple Silicon only)	Stub: returns `iOS-only` error
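
Because three of the four plugins are stubs on one platform, it can be useful to check support before calling start(). The helper below is a sketch that simply encodes the table above. The names NATIVE_SUPPORT, isBackendSupported, and supportedBackends are hypothetical and not part of the DVAIBridge API.

```typescript
type Backend = "llama" | "foundation" | "mediapipe" | "mlx";
type Platform = "ios" | "android";

// Mirrors the table above: a stub on a platform counts as unsupported.
const NATIVE_SUPPORT: Record<Backend, readonly Platform[]> = {
  llama: ["ios", "android"],
  foundation: ["ios"],
  mediapipe: ["android"],
  mlx: ["ios"],
};

function isBackendSupported(backend: Backend, platform: Platform): boolean {
  return NATIVE_SUPPORT[backend].includes(platform);
}

function supportedBackends(platform: Platform): Backend[] {
  return (Object.keys(NATIVE_SUPPORT) as Backend[]).filter((b) =>
    isBackendSupported(b, platform),
  );
}
```

In a Capacitor app the platform string could come from Capacitor.getPlatform() in @capacitor/core, which can also return "web". This check covers the platform only: it says nothing about OS version (foundation needs iOS 26+) or Apple Silicon (required by mlx).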

The HTTP server library differs per OS.

iOS — Hummingbird 2.x, built on swift-nio. Replaced Telegraph in v3.2.0 to enable SSE streaming through the offload proxy and fix a clang-module collision.
Android — NanoHTTPD.

Both serve the OpenAI-compatible surface — /v1/chat/completions, /v1/completions, /v1/models, /v1/embeddings where applicable — plus preflight CORS. The full route table lives in the Phase 1 design spec in the source tree.
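
As a quick check that the server is up, a client can query the models route. This is a minimal sketch that assumes /v1/models answers in the standard OpenAI list shape ({ data: [{ id }] }); the listModelIds helper and the FetchLike type are hypothetical names introduced here so the function can be exercised without a running server.

```typescript
// Minimal structural type so a fake can stand in for fetch in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

// Assumption: the response follows the OpenAI list shape { data: [{ id: "..." }] }.
async function listModelIds(baseUrl: string, fetchImpl: FetchLike): Promise<string[]> {
  // baseUrl already ends in /v1, so only the route name is appended.
  const res = await fetchImpl(`${baseUrl}/models`);
  if (!res.ok) {
    throw new Error(`GET /models failed with HTTP ${res.status}`);
  }
  const body = await res.json();
  return (body.data as Array<{ id: string }>).map((m) => m.id);
}
```

In an app, pass the global fetch as the second argument: `await listModelIds(baseUrl, fetch)`.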

Setup

See the Capacitor quickstart for end-to-end install + first-run code, including:

Package install + npx cap sync flow.
DVAIBridge.start() minimal example.
Streaming SSE consumption.
The downloadModel() helper for sha256-verified GGUF caching.
Common errors and fixes.
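
For orientation before opening the quickstart, here is a rough sketch of what consuming a streamed chat completion involves. It assumes the server follows the usual OpenAI streaming convention: `data:` lines carrying JSON chunks with choices[0].delta.content, ended by a `data: [DONE]` sentinel. The createSseTextAccumulator helper is a hypothetical name, and the quickstart remains the reference for the supported approach.

```typescript
// Accumulates raw SSE text and returns newly decoded content per push.
// Handles chunks that end in the middle of a line by buffering the remainder.
function createSseTextAccumulator() {
  let buffer = "";
  let done = false;

  return {
    push(chunk: string): string {
      buffer += chunk;
      let out = "";
      let newline: number;
      while ((newline = buffer.indexOf("\n")) !== -1) {
        const line = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (!line.startsWith("data:")) continue; // skip blank lines and comments
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") {
          done = true;
          continue;
        }
        const parsed = JSON.parse(payload);
        out += parsed.choices?.[0]?.delta?.content ?? "";
      }
      return out;
    },
    isDone(): boolean {
      return done;
    },
  };
}
```

A caller would typically read the response body with a stream reader, decode each chunk with TextDecoder, and feed the resulting strings to push() until isDone() returns true.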

Choosing a backend

Goal	Backend
Run any GGUF model, broadest selection	`llama`
Zero-download text on iOS 26+	`foundation`
Vision-capable Gemma on Android	`mediapipe`
Embeddings	`llama` with `embeddingMode: true`
MLX-converted HF models on Apple Silicon	`mlx`

MLX backend

@dvai-bridge/capacitor-mlx loads MLX-converted HuggingFace models via mlx-swift-lm. The modelPath start option is the HuggingFace model id — not a local path. For example, mlx-community/Llama-3.2-1B-Instruct-4bit. The first call downloads the weights into the user's local HF cache (~/Library/Caches/...). Subsequent calls hit the cache.

import { DVAIBridge } from "@dvai-bridge/capacitor";

const result = await DVAIBridge.start({
  backend: "mlx",
  // A HuggingFace model id, not a filesystem path.
  modelPath: "mlx-community/Llama-3.2-1B-Instruct-4bit",
});
// result.baseUrl looks like "http://127.0.0.1:38883/v1"

Constraints:

iOS 17+ at link time. The floor is set in the Package.swift of @dvai-bridge/ios-mlx-core, and the iOS podspec of @dvai-bridge/capacitor-mlx inherits it.
Apple Silicon only at runtime. MLX runs on the Apple GPU through Metal. iOS Simulator on Intel Macs has no MLX device, so start() throws there. Real devices and iOS Simulator on Apple-Silicon Macs work.
The first run downloads model weights from HuggingFace Hub. A typical 4-bit quantized 1B model is ~700 MB. 8B models are several GB. No HF token is needed for public model repos.
embeddings() is not implemented on the MLX backend in this release. Use llama with embeddingMode: true for embeddings.
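
Since start() throws when no MLX device is available, an app that prefers MLX may want to fall back to another backend. The following is a sketch under stated assumptions: startWithFallback is a hypothetical helper, the start function is injected rather than imported, and the sketch does not address whether a failed start() leaves any state that needs DVAIBridge.stop().

```typescript
interface StartOptions {
  backend: string;
  modelPath?: string;
}

interface StartResult {
  baseUrl: string;
  backend: string;
}

type StartFn = (options: StartOptions) => Promise<StartResult>;

// Tries each candidate in order and returns the first successful start.
async function startWithFallback(
  start: StartFn,
  candidates: StartOptions[],
): Promise<StartResult> {
  const failures: string[] = [];
  for (const options of candidates) {
    try {
      return await start(options);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${options.backend}: ${reason}`);
    }
  }
  throw new Error(`No backend could start (${failures.join("; ")})`);
}
```

A call might look like `startWithFallback((o) => DVAIBridge.start(o), [{ backend: "mlx", modelPath: hfModelId }, { backend: "llama", modelPath: localGgufPath }])`, where hfModelId and localGgufPath are placeholders. Note that the two backends take different kinds of modelPath: a HuggingFace id for mlx and an absolute file path for llama.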

CocoaPods consumers: the MLX backend is currently SwiftPM-only, because mlx-swift-lm's transitive Swift packages don't publish CocoaPods specs. Consumers using pod install should pick a different backend, such as llama. SwiftPM consumers (Package.swift with the Capacitor SwiftPM shim) get full MLX support.

See Multimodal for the per-backend image and audio support matrix.

Migration from `llama-cpp-capacitor`

Apps using the deprecated llama-cpp-capacitor plugin should migrate to the @dvai-bridge/capacitor plugin family. The new API is OpenAI-compatible: once start() returns, you make HTTP requests instead of calling bridge methods.

Before (deprecated)

import { LlamaCpp } from "llama-cpp-capacitor";

const ctx = await LlamaCpp.init({
  modelPath: "models/llama-2-7b.Q4_K_M.gguf",
  contextSize: 2048,
});

const reply = await LlamaCpp.completion(ctx.handle, {
  prompt: "Why is the sky blue?",
  maxTokens: 256,
});

After (new)

import { DVAIBridge } from "@dvai-bridge/capacitor";

const { baseUrl, modelId } = await DVAIBridge.start({
  backend: "llama",
  modelPath: "/data/.../llama-3.2-1b.gguf",
  contextSize: 2048,
});

const res = await fetch(`${baseUrl}/chat/completions`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: modelId,
    messages: [{ role: "user", content: "Why is the sky blue?" }],
    max_tokens: 256,
  }),
});
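// Surface HTTP errors instead of failing later on a missing choices array.
if (!res.ok) {
  throw new Error(`chat/completions failed with HTTP ${res.status}`);
}
const data = await res.json();
const reply = data.choices[0].message.content;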

API mapping

`llama-cpp-capacitor`	New equivalent
`LlamaCpp.init({ modelPath, contextSize, gpuLayers, threads })`	`DVAIBridge.start({ backend: "llama", modelPath, contextSize, gpuLayers, threads })`
`LlamaCpp.completion(handle, { prompt, maxTokens, … })`	`POST ${baseUrl}/chat/completions` (or `/completions`)
`LlamaCpp.tokenize(handle, text)`	Not yet exposed — file an issue if needed.
`LlamaCpp.embedding(handle, text)`	`POST ${baseUrl}/embeddings` after `start({ embeddingMode: true })`.
`LlamaCpp.release(handle)`	`DVAIBridge.stop()` (idempotent).
`nativeModelPath` core option	`modelPath` on `start()`.
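
The embeddings row is the one mapping that changes shape the most, so a short sketch may help. It assumes the server accepts the standard OpenAI embeddings request ({ model, input }) and returns { data: [{ embedding }] }. The embed helper and FetchLike type are hypothetical names, and the server must have been started with embeddingMode: true as noted in the table.

```typescript
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<any> }>;

// Assumption: OpenAI embeddings shapes on both request and response.
async function embed(
  baseUrl: string,
  modelId: string,
  input: string[],
  fetchImpl: FetchLike,
): Promise<number[][]> {
  const res = await fetchImpl(`${baseUrl}/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: modelId, input }),
  });
  if (!res.ok) {
    throw new Error(`POST /embeddings failed with HTTP ${res.status}`);
  }
  const body = await res.json();
  // One vector per input string, in the order the server returned them.
  return (body.data as Array<{ embedding: number[] }>).map((d) => d.embedding);
}
```

In an app, baseUrl and modelId would come from the start() result and fetchImpl would be the global fetch.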

Differences worth flagging:

The new API does not ship a default nativeModelPath resolver pointing at public/models/. Pass an absolute path — typically from downloadModel() — or assemble one yourself from Filesystem.
The gpuLayers default for the llama backend is now 99, which requests maximum GPU offload. The runtime decides how many layers it can actually offload.
The HTTP boundary lets you swap in any OpenAI-compatible client — Vercel AI SDK, LangChain, the openai SDK with a baseURL override — without changing application code when you switch to a hosted endpoint later.
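
To make the last point concrete, the sketch below keeps the request-building code identical for a local and a hosted endpoint and varies only a small config object. The Endpoint type and buildChatRequest helper are hypothetical, as is the hosted URL in the usage note. Whether the local server expects an Authorization header is not stated on this page, so the sketch sends one only when an apiKey is supplied.

```typescript
interface Endpoint {
  baseUrl: string; // local: the baseUrl from start(); hosted: the provider's /v1 URL
  apiKey?: string; // assumed to be needed only by hosted endpoints
}

function buildChatRequest(endpoint: Endpoint, model: string, prompt: string) {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (endpoint.apiKey) {
    headers["Authorization"] = `Bearer ${endpoint.apiKey}`;
  }
  return {
    url: `${endpoint.baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

Usage is the same in both cases: `const { url, init } = buildChatRequest(endpoint, modelId, "Why is the sky blue?"); const res = await fetch(url, init);`. Switching to a hosted service then means replacing `{ baseUrl }` from start() with something like `{ baseUrl: "https://api.example.com/v1", apiKey }`, where the URL is a placeholder.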
