Tested models

The DVAI-Bridge Capacitor plugins are model-agnostic. Anything in GGUF form runs on `capacitor-llama`. Anything in MediaPipe `.task` form runs on `capacitor-mediapipe`. This page lists the specific models we exercise in CI and pre-release smoke tests, organized by tier of verification effort.
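The format-to-backend rule can be sketched as a small lookup. This is only an illustration: the helper name `backendForModelFile` and the `ModelFileBackend` type are assumptions made for this example and are not part of either plugin's API.

```typescript
// Hypothetical helper (not part of the plugins): picks the plugin that
// can load a model file, based purely on its extension.
type ModelFileBackend = "capacitor-llama" | "capacitor-mediapipe";

function backendForModelFile(fileName: string): ModelFileBackend | undefined {
  const lower = fileName.toLowerCase();
  if (lower.endsWith(".gguf")) return "capacitor-llama"; // GGUF artifacts
  if (lower.endsWith(".task")) return "capacitor-mediapipe"; // MediaPipe artifacts
  return undefined; // unknown format: this page does not cover it
}
```

Returning `undefined` for anything else keeps the caller responsible for deciding how to handle an unsupported file.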

NOTE

Some entries below carry a placeholder tag because the upstream name wasn't finalized when this document was written. Verify the exact Hugging Face revision before pinning a model into your app.

Tier 1 — development (per-PR smoke)

Cached on the self-hosted CI runner. Loaded on every PR-level smoke run on iOS and Android.

Backend	Model	Format	~Size	Notes
`capacitor-llama`	`Llama-3.2-1B-Instruct`	GGUF Q4_K_M	~770 MB	Text completion baseline. `contextSize: 2048`, `maxTokens: 256`.
`capacitor-llama` (embeddings)	`bge-small-en-v1.5`	GGUF Q8_0	~133 MB	Started with `embeddingMode: true`.
`capacitor-mediapipe`	`gemma-3n-E2B-it` (placeholder if `gemma-4-E2B-it` not yet shipped)	`.task`	~1.5 GB	Vision-capable variant when the E2B `.task` artifact is published.
`capacitor-foundation`	Apple-managed (implicit on iOS 26+)	—	0 (zero-download)	No `modelPath`. Real-device required for full coverage; Simulator coverage is limited.

Runtime memory for GGUF models is roughly 1.3–1.6× the on-disk size: working buffers plus a KV cache that scales with `contextSize`. For `.task` artifacts the figure is closer to 1.1–1.3× because MediaPipe manages memory differently.
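As a rough planning aid, those multipliers can be turned into a range estimate. The sketch below is an assumption-laden illustration: the function and type names are hypothetical, and the multipliers are the approximate figures quoted above, not measured values.

```typescript
// Hypothetical estimator using the approximate multipliers quoted on this page.
type ArtifactFormat = "gguf" | "task";

interface MemoryEstimateMb {
  lowMb: number;
  highMb: number;
}

// [low, high] runtime-to-disk ratios per artifact format.
const RUNTIME_MULTIPLIERS: Record<ArtifactFormat, [number, number]> = {
  gguf: [1.3, 1.6],
  task: [1.1, 1.3],
};

function estimateRuntimeMemoryMb(onDiskMb: number, format: ArtifactFormat): MemoryEstimateMb {
  const [low, high] = RUNTIME_MULTIPLIERS[format];
  return {
    lowMb: Math.round(onDiskMb * low),
    highMb: Math.round(onDiskMb * high),
  };
}
```

For the ~770 MB Tier 1 Llama artifact, `estimateRuntimeMemoryMb(770, "gguf")` gives a range of about 1001 to 1232 MB. Actual usage depends on `contextSize`, so treat the result as a sanity check rather than a guarantee.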

Tier 2 — pre-release (manual, fuller coverage)

Run against a real device in the week before tagging a release.

Backend	Model	Format	~Size	Notes
`capacitor-llama`	`Qwen2.5-1.5B-Instruct` (or `Qwen3.5-1.5B` once shipped)	GGUF Q4_K_M	~1.0 GB	Multilingual smoke.
`capacitor-llama`	`Phi-3.5-mini-instruct`	GGUF Q4_K_M	~2.3 GB	Long-context (128k) smoke; cap to `contextSize: 8192` on phones.
`capacitor-llama`	`Llama-3.2-3B-Instruct`	GGUF Q4_K_M	~2.0 GB	Larger text-only smoke.
`capacitor-llama` (vision)	`gemma-4-E2B-it` (placeholder) + matching mmproj	GGUF Q4_0 + mmproj	~1.5 GB pair	Image content parts; requires `mmprojPath`.
`capacitor-llama` (vision flagship)	`gemma-4-E4B-it` (placeholder) + mmproj	GGUF Q4_K_M + mmproj	~3.5 GB pair	Higher-quality vision; phone-RAM-class.
`capacitor-llama` (audio)	`Phi-4-Multimodal-Instruct`	GGUF Q4_K_M	~7 GB	Audio + image; tablet / high-RAM phone only.
`capacitor-mediapipe`	`gemma-2b-it`	`.task`	~1.3 GB	Reference MediaPipe artifact from Google's `tasks-genai` distribution.

Tier 3 — backend-specific real-device

Backend	Notes
`capacitor-foundation`	Apple's curated 3B-class model. Auto-loaded on iOS 26+. Cannot run on Simulator beyond a thin smoke; this tier requires a real device with the system download present.
`capacitor-mediapipe` (vision)	Vision-enabled Gemma `.task` variants from `tasks-genai`. Requires `visionEnabled: true` at session build time; we wire this through `StartOptions`.
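The per-backend requirements mentioned in the tables (`mmprojPath` for llama vision, `visionEnabled: true` for MediaPipe vision, no `modelPath` for the foundation backend) can be checked before a session starts. The sketch below is hypothetical: `StartOptionsSketch` only mirrors the option names used on this page and is not the plugins' real `StartOptions` type, and `visionConfigProblems` is a name invented for this example.

```typescript
// Hypothetical pre-flight check; the real StartOptions type may differ.
type PluginName = "capacitor-llama" | "capacitor-mediapipe" | "capacitor-foundation";

interface StartOptionsSketch {
  modelPath?: string;
  mmprojPath?: string;
  visionEnabled?: boolean;
}

function visionConfigProblems(
  plugin: PluginName,
  opts: StartOptionsSketch,
  wantsVision: boolean,
): string[] {
  const problems: string[] = [];
  if (plugin === "capacitor-foundation" && opts.modelPath !== undefined) {
    problems.push("capacitor-foundation uses the Apple-managed model; omit modelPath");
  }
  if (wantsVision && plugin === "capacitor-llama" && !opts.mmprojPath) {
    problems.push("llama vision needs mmprojPath for the matching mmproj file");
  }
  if (wantsVision && plugin === "capacitor-mediapipe" && opts.visionEnabled !== true) {
    problems.push("MediaPipe vision needs visionEnabled: true");
  }
  return problems; // empty array means no problem was detected
}
```

Returning a list of messages instead of throwing lets a smoke test report every misconfiguration in one pass.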

Recommended `start()` parameters by class

Model class	`contextSize`	`gpuLayers`	`maxTokens`
1B text	2048–4096	99 (full)	256
1.5–3B text	2048	99	256
1.5B vision (image)	4096	99	384
3.5B vision (image)	4096	99	384
7B multimodal (audio + image)	4096	99	384
Embeddings	512	99	n/a

`gpuLayers: 99` means "request maximum offload": llama.cpp decides what fits. Lowering it manually is rarely needed; do so only if you see the device thermal-throttling under sustained inference.
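The table above can be kept in code as a lookup with per-call overrides. This is a sketch under assumptions: the class keys, `RecommendedStart`, and `recommendedStart` are names invented for this example, and the 1B text entry uses 2048, the lower bound of the recommended 2048–4096 range.

```typescript
// Hypothetical lookup mirroring the recommendations table on this page.
type ModelClass =
  | "text-1b"
  | "text-1.5-3b"
  | "vision-1.5b"
  | "vision-3.5b"
  | "multimodal-7b"
  | "embeddings";

interface RecommendedStart {
  contextSize: number;
  gpuLayers: number;
  maxTokens?: number; // not applicable to embeddings
}

const MAX_OFFLOAD = 99; // "request maximum offload"

const RECOMMENDED: Record<ModelClass, RecommendedStart> = {
  "text-1b": { contextSize: 2048, gpuLayers: MAX_OFFLOAD, maxTokens: 256 },
  "text-1.5-3b": { contextSize: 2048, gpuLayers: MAX_OFFLOAD, maxTokens: 256 },
  "vision-1.5b": { contextSize: 4096, gpuLayers: MAX_OFFLOAD, maxTokens: 384 },
  "vision-3.5b": { contextSize: 4096, gpuLayers: MAX_OFFLOAD, maxTokens: 384 },
  "multimodal-7b": { contextSize: 4096, gpuLayers: MAX_OFFLOAD, maxTokens: 384 },
  embeddings: { contextSize: 512, gpuLayers: MAX_OFFLOAD },
};

// Returns a copy of the recommendation with any caller overrides applied.
function recommendedStart(
  modelClass: ModelClass,
  overrides: Partial<RecommendedStart> = {},
): RecommendedStart {
  return { ...RECOMMENDED[modelClass], ...overrides };
}
```

A caller that needs to reduce offload on a throttling device could write `recommendedStart("text-1b", { gpuLayers: 20 })`; the value 20 is arbitrary and only illustrates the override.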

What we deliberately don't verify

Output quality. No BLEU, no benchmark accuracy. These tests check mechanics — load, respond, stream, free — not whether the model is good.
Specific tokens-per-second figures. They vary too widely across device tiers for a per-PR gate.
Long-running soak (>60s). Tier 2 covers a single round-trip per model. Longer sessions are an explicit user-test exercise pre-launch.

Curating this list

This file is the authoritative reference. Updates are docs-only commits. When upstream renames or removes a model, update both the entry above and the matching pinned URL and sha256 in your app's distribution config.
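When a pin changes, a build-time check can confirm that the downloaded bytes still match the recorded digest. The following is a minimal sketch for a Node-based script, assuming a hypothetical `PinnedModel` shape; your distribution config may store these fields differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one pinned entry in an app's distribution config.
interface PinnedModel {
  url: string;
  sha256: string; // hex digest recorded when the model was pinned
}

// Computes the SHA-256 digest of the given bytes as lowercase hex.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when the bytes hash to the pinned digest (case-insensitive on the pin).
function matchesPin(bytes: Uint8Array, pin: PinnedModel): boolean {
  return sha256Hex(bytes) === pin.sha256.toLowerCase();
}
```

For multi-gigabyte artifacts a streaming hash would be more memory-friendly than loading the whole file; the in-memory version here is kept short for illustration.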
