Tested models

The DVAI-Bridge Capacitor plugins are model-agnostic. Anything in GGUF form runs on capacitor-llama. Anything in MediaPipe .task form runs on capacitor-mediapipe. This page lists the specific models we exercise in CI and pre-release smoke tests, organized by tier of verification effort.

NOTE

Some entries below carry a placeholder tag because the upstream name wasn't finalized when this document was written. Verify the exact Hugging Face revision before pinning a model into your app.

Tier 1 — development (per-PR smoke)

Cached on the self-hosted CI runner. Loaded on every PR-level smoke run on iOS and Android.

| Backend | Model | Format | ~Size | Notes |
| --- | --- | --- | --- | --- |
| capacitor-llama | Llama-3.2-1B-Instruct | GGUF Q4_K_M | ~770 MB | Text completion baseline. contextSize: 2048, maxTokens: 256. |
| capacitor-llama (embeddings) | bge-small-en-v1.5 | GGUF Q8_0 | ~133 MB | Started with embeddingMode: true. |
| capacitor-mediapipe | gemma-3n-E2B-it (placeholder if gemma-4-E2B-it not yet shipped) | .task | ~1.5 GB | Vision-capable variant when the E2B .task artifact is published. |
| capacitor-foundation | Apple-managed (implicit on iOS 26+) | | 0 (zero-download) | No modelPath. Real-device required for full coverage; Simulator coverage is limited. |

Runtime memory is roughly 1.3–1.6× the on-disk size for GGUF — working buffers plus KV cache scaled by contextSize. .task artifacts are closer to 1.1–1.3× because MediaPipe manages memory differently.
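
As a rough illustration of the multipliers above, the following sketch turns an on-disk size into an estimated runtime range. The function and type names are hypothetical and not part of any plugin API; only the multipliers come from this page.

```typescript
// Multipliers taken from the paragraph above; all names here are illustrative.
type ModelFormat = "gguf" | "task";

const RUNTIME_MULTIPLIER: Record<ModelFormat, { low: number; high: number }> = {
  gguf: { low: 1.3, high: 1.6 }, // working buffers plus KV cache
  task: { low: 1.1, high: 1.3 }, // MediaPipe-managed memory
};

/** Returns a rough [low, high] runtime memory range in MB for an on-disk size in MB. */
function estimateRuntimeMemoryMb(
  diskSizeMb: number,
  format: ModelFormat,
): [number, number] {
  const { low, high } = RUNTIME_MULTIPLIER[format];
  return [Math.round(diskSizeMb * low), Math.round(diskSizeMb * high)];
}

// Llama-3.2-1B-Instruct Q4_K_M, ~770 MB on disk.
const [lowMb, highMb] = estimateRuntimeMemoryMb(770, "gguf");
console.log(`~${lowMb}-${highMb} MB`); // prints "~1001-1232 MB"
```

Treat the result as a planning figure only: the real footprint depends on contextSize and on the device.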

Tier 2 — pre-release (manual, fuller coverage)

Run against a real device in the week before tagging a release.

| Backend | Model | Format | ~Size | Notes |
| --- | --- | --- | --- | --- |
| capacitor-llama | Qwen2.5-1.5B-Instruct (or Qwen3.5-1.5B once shipped) | GGUF Q4_K_M | ~1.0 GB | Multilingual smoke. |
| capacitor-llama | Phi-3.5-mini-instruct | GGUF Q4_K_M | ~2.3 GB | Long-context (128k) smoke; cap to contextSize: 8192 on phones. |
| capacitor-llama | Llama-3.2-3B-Instruct | GGUF Q4_K_M | ~2.0 GB | Larger-text-only smoke. |
| capacitor-llama (vision) | gemma-4-E2B-it (placeholder) + matching mmproj | GGUF Q4_0 + mmproj | ~1.5 GB pair | Image content parts; requires mmprojPath. |
| capacitor-llama (vision flagship) | gemma-4-E4B-it (placeholder) + mmproj | GGUF Q4_K_M + mmproj | ~3.5 GB pair | Higher-quality vision; phone-RAM-class. |
| capacitor-llama (audio) | Phi-4-Multimodal-Instruct | GGUF Q4_K_M | ~7 GB | Audio + image; tablet / high-RAM phone only. |
| capacitor-mediapipe | gemma-2b-it | .task | ~1.3 GB | Reference MediaPipe artifact from Google's tasks-genai distribution. |
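
Two notes in the table above lend themselves to a pre-flight check: vision models require mmprojPath, and long-context models should be capped to contextSize: 8192 on phones. The sketch below shows one way an app might enforce both before starting a session. The interface and function are hypothetical; only the field names contextSize, maxTokens, modelPath and mmprojPath appear on this page, and the plugin's real option type may differ.

```typescript
// Hypothetical option shape for illustration; not the plugin's actual type.
interface LlamaStartSketch {
  modelPath: string;
  mmprojPath?: string;
  contextSize: number;
  maxTokens: number;
}

// Cap suggested in the Phi-3.5-mini-instruct row.
const PHONE_CONTEXT_CAP = 8192;

/** Rejects vision setups without an mmproj file and clamps contextSize for phones. */
function prepareForPhone(
  opts: LlamaStartSketch,
  usesVision: boolean,
): LlamaStartSketch {
  if (usesVision && !opts.mmprojPath) {
    throw new Error("Vision models need mmprojPath pointing at the matching mmproj file");
  }
  // Return a copy so the caller's object is left untouched.
  return { ...opts, contextSize: Math.min(opts.contextSize, PHONE_CONTEXT_CAP) };
}
```

Returning a copy keeps the check side-effect free, so the same base options can be reused for tablet runs without the cap.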

Tier 3 — backend-specific real-device

| Backend | Notes |
| --- | --- |
| capacitor-foundation | Apple's curated 3B-class model. Auto-loaded on iOS 26+. Cannot run on Simulator beyond a thin smoke; tier requires a real device with the system download present. |
| capacitor-mediapipe (vision) | Vision-enabled Gemma .task variants from tasks-genai. Requires visionEnabled: true at session build-time; we wire this through StartOptions. |
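
| Model class | contextSize | gpuLayers | maxTokens |
| --- | --- | --- | --- |
| 1B text | 2048–4096 | 99 (full) | 256 |
| 1.5–3B text | 2048 | 99 | 256 |
| 1.5B vision (image) | 4096 | 99 | 384 |
| 3.5B vision (image) | 4096 | 99 | 384 |
| 7B multimodal (audio + image) | 4096 | 99 | 384 |
| Embeddings | 512 | 99 | n/a |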

gpuLayers: 99 means "request maximum offload": llama.cpp decides what fits. Lowering it manually is rarely needed; do so only if you see the device thermal-throttling under sustained inference.
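
The per-class values in the table above can be kept in one typed lookup so that every model in an app starts from the same defaults. This is a minimal sketch: the class keys, the interface, and the helper are hypothetical names chosen for illustration, and only the numbers come from this page.

```typescript
// Hypothetical keys for the model classes listed in the table above.
type ModelClass =
  | "text-1b"
  | "text-1.5-3b"
  | "vision-1.5b"
  | "vision-3.5b"
  | "multimodal-7b"
  | "embeddings";

interface StartingSettings {
  contextSize: number;
  gpuLayers: number;
  maxTokens?: number; // omitted for embeddings, where the table says n/a
}

const STARTING_SETTINGS: Record<ModelClass, StartingSettings> = {
  "text-1b": { contextSize: 2048, gpuLayers: 99, maxTokens: 256 }, // table allows up to 4096
  "text-1.5-3b": { contextSize: 2048, gpuLayers: 99, maxTokens: 256 },
  "vision-1.5b": { contextSize: 4096, gpuLayers: 99, maxTokens: 384 },
  "vision-3.5b": { contextSize: 4096, gpuLayers: 99, maxTokens: 384 },
  "multimodal-7b": { contextSize: 4096, gpuLayers: 99, maxTokens: 384 },
  embeddings: { contextSize: 512, gpuLayers: 99 },
};

/** Returns the table defaults for a class, with optional per-device overrides. */
function settingsFor(
  modelClass: ModelClass,
  overrides: Partial<StartingSettings> = {},
): StartingSettings {
  return { ...STARTING_SETTINGS[modelClass], ...overrides };
}

// Example: drop GPU offload on a device that throttles under sustained load.
const throttled = settingsFor("text-1b", { gpuLayers: 20 });
console.log(throttled.gpuLayers); // prints 20
```

The override value of 20 is arbitrary; a suitable number depends on the model's layer count and the device.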

What we deliberately don't verify

  • Output quality. No BLEU, no benchmark accuracy. These tests check mechanics — load, respond, stream, free — not whether the model is good.
  • Specific tokens-per-second figures. They vary too widely across device tiers for a per-PR gate.
  • Long-running soak (>60s). Tier 2 covers a single round-trip per model. Longer sessions are an explicit user-test exercise pre-launch.

Curating this list

This file is the authoritative reference. Updates are docs-only commits. When upstream renames or removes a model, update both the entry above and the matching pinned URL and sha256 in your app's distribution config.
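
To illustrate the pinned URL and sha256 mentioned above, here is a minimal sketch of a pin record and a checksum comparison. It assumes a Node.js build or CI script (it uses node:crypto, which is not available inside the app's WebView). The PinnedModel shape, the example URL, and the function names are hypothetical, not the format of any real distribution config.

```typescript
import { createHash } from "node:crypto";

// Hypothetical pin record; your distribution config may use another shape.
interface PinnedModel {
  name: string;
  url: string;
  sha256: string; // hex digest of the exact artifact
}

/** Computes the hex SHA-256 digest of a byte buffer. */
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

/** True when the downloaded bytes match the pinned digest (case-insensitive). */
function matchesPin(bytes: Uint8Array, pin: PinnedModel): boolean {
  return sha256Hex(bytes) === pin.sha256.toLowerCase();
}

// Illustration with a tiny stand-in payload instead of a real model file.
const demoPin: PinnedModel = {
  name: "demo",
  url: "https://example.com/models/demo.gguf", // placeholder URL
  sha256: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
};
console.log(matchesPin(new TextEncoder().encode("abc"), demoPin)); // prints true
```

For multi-gigabyte artifacts, a streaming hash over the file would avoid loading the whole model into memory; the buffer-based version is kept here for brevity.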