Handler parity
The three Capacitor backend plugins all expose the same OpenAI-compatible HTTP surface. They must produce byte-equivalent JSON shapes for the same input fixtures. This page documents the shared contract, the one known SSE-frame asymmetry, and the discipline rule that keeps drift from silently accumulating.
The shared contract
Each plugin ships a *Handlers type:
| Plugin | Handler type | Bridge dependency |
|---|---|---|
| capacitor-llama | LlamaHandlers (Swift) / LlamaHandlers (Kotlin) | LlamaCppBridge / JNI shim |
| capacitor-foundation | FoundationHandlers (Swift) | FoundationBridge (LanguageModelSession) |
| capacitor-mediapipe | MediaPipeHandlers (Kotlin) | MediaPipeBridge (LlmInference) |
All three implement the same logical handlers:
- handleChatCompletions: POST /v1/chat/completions (text + content parts).
- handleCompletions: POST /v1/completions (legacy single-string prompt).
- handleModels: GET /v1/models (returns the active modelId).
- handleEmbeddings: POST /v1/embeddings (llama-only when embeddingMode: true; the others return 400).
For a given fixture in fixtures/transport-fixtures.json, every implementation produces:
- The same HTTP status code.
- The same JSON keys at every level.
- The same error wording on documented error paths (see Multimodal § Error semantics).
Cross-language parity is enforced by handler-equivalence tests that all three platforms run against the same JSON file.
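The status and key-shape checks listed above can be sketched as follows. This is a minimal illustration, not the project's actual test harness: the names jsonShape, parityDiff, and HandlerResponse are hypothetical, and the sketch assumes each platform's response has already been captured as a status code plus a parsed JSON body.

```typescript
// Hypothetical helper names; the real parity tests may be structured differently.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface HandlerResponse {
  status: number;
  body: Json;
}

// Reduce a JSON value to its key structure: leaf values are replaced by
// their type names, and object keys are sorted so key order is ignored.
function jsonShape(value: Json): Json {
  if (value === null) return "null";
  if (Array.isArray(value)) return value.map(jsonShape);
  if (typeof value === "object") {
    const shape: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      shape[key] = jsonShape(value[key]);
    }
    return shape;
  }
  return typeof value;
}

// Compare every platform against the first one listed.
// Returns human-readable differences; an empty list means parity holds.
function parityDiff(responses: Record<string, HandlerResponse>): string[] {
  const names = Object.keys(responses);
  const problems: string[] = [];
  if (names.length === 0) return problems;
  const [reference, ...rest] = names;
  const refShape = JSON.stringify(jsonShape(responses[reference].body));
  for (const name of rest) {
    if (responses[name].status !== responses[reference].status) {
      problems.push(`${name}: status differs from ${reference}`);
    }
    if (JSON.stringify(jsonShape(responses[name].body)) !== refShape) {
      problems.push(`${name}: JSON keys differ from ${reference}`);
    }
  }
  return problems;
}
```

Comparing shapes rather than values lets generated fields such as ids and timestamps differ between platforms while still catching a missing or renamed key.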
The legacy chatToLegacyCompletion adapter
Each plugin also implements two small adapters:
- chatToLegacyCompletion: converts a chat/completions request body into a completions body (single-string prompt).
- adaptChunkToLegacy: converts a chat.completion.chunk SSE frame into a text_completion chunk frame.
These are currently duplicated across all three plugins. They are a candidate for extraction into a shared in-language module (Swift package shared between iOS plugins; Kotlin module shared between Android plugins). Tracked as a Phase 2 cleanup.
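For orientation, the two adapters might look roughly like the sketch below. The field names follow the public OpenAI wire format. The prompt-flattening rule (role-prefixed lines joined by newlines) and all interface names are assumptions made for this illustration, and the plugins' real Swift and Kotlin implementations may flatten messages differently.

```typescript
type ContentPart = { type: string; text?: string };

interface ChatMessage {
  role: string;
  content: string | ContentPart[];
}

interface ChatRequest {
  model?: string;
  messages: ChatMessage[];
  stream?: boolean;
  max_tokens?: number;
}

interface LegacyRequest {
  model?: string;
  prompt: string;
  stream?: boolean;
  max_tokens?: number;
}

// Collect the text of a message, ignoring non-text content parts.
function messageText(message: ChatMessage): string {
  if (typeof message.content === "string") return message.content;
  return message.content
    .filter((part) => part.type === "text")
    .map((part) => part.text ?? "")
    .join("");
}

// ASSUMPTION: messages are flattened as "role: text" lines.
function chatToLegacyCompletion(request: ChatRequest): LegacyRequest {
  const { messages, ...rest } = request;
  const prompt = messages
    .map((m) => `${m.role}: ${messageText(m)}`)
    .join("\n");
  return { ...rest, prompt };
}

interface ChatChunk {
  id: string;
  choices: {
    index?: number;
    delta: { content?: string };
    finish_reason: string | null;
  }[];
}

interface LegacyChunk {
  id: string;
  object: "text_completion";
  choices: { index: number; text: string; finish_reason: string | null }[];
}

// Move delta.content into text; an empty delta becomes an empty string.
function adaptChunkToLegacy(chunk: ChatChunk): LegacyChunk {
  return {
    id: chunk.id,
    object: "text_completion",
    choices: chunk.choices.map((choice, position) => ({
      index: choice.index ?? position,
      text: choice.delta.content ?? "",
      finish_reason: choice.finish_reason,
    })),
  };
}
```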
Per-plugin SSE asymmetry
There is one documented difference in how the three plugins frame SSE streams. All three are valid OpenAI-compatible streams; SDK clients tolerate both shapes.
LlamaHandlers (Swift + Kotlin) and FoundationHandlers (Swift)
Both emit a separate empty-delta finish frame at the end of the stream:
data: {"id":"…","choices":[{"delta":{"content":"final"},"finish_reason":null}]}
data: {"id":"…","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]

LlamaHandlers mirrors llama.cpp's upstream convention. FoundationHandlers emits the synthetic finish frame after the last partial returned by session.streamResponse(to:) for shape parity with the llama side.
MediaPipeHandlers (Kotlin)
Folds the finish reason into the last content frame:
data: {"id":"…","choices":[{"delta":{"content":"final"},"finish_reason":"stop"}]}
data: [DONE]

This mirrors how MediaPipe surfaces end-of-turn signals: the engine's ProgressListener callback receives (partial, done) where done == true on the last invocation, so the natural fold point is on that final partial. Inserting a synthetic empty-delta frame would mean buffering one partial behind, costing one token of latency for no behavioral win.
Why we don't normalize
Both shapes are emitted by real OpenAI-compatible servers in the wild. Forcing the separate-finish shape would mean inserting a synthetic frame on the MediaPipe side; forcing the folded shape would mean buffering one frame on the llama and Foundation side. Either change adds latency or complexity for no behavioral win. SDK clients (Vercel AI SDK, official openai SDK, LangChain) handle either shape.
If your application code parses raw SSE chunks and assumes a specific shape, normalize on the client side.
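One way to normalize on the client is to split any folded frame into a content frame plus an empty-delta finish frame, so downstream code only ever sees the separate-finish shape. The sketch below assumes frames have already been parsed from their data: lines into objects; StreamChunk and splitFoldedFinish are hypothetical names for this illustration, not part of any plugin or SDK.

```typescript
interface StreamChoice {
  index?: number;
  delta: { role?: string; content?: string };
  finish_reason: string | null;
}

interface StreamChunk {
  id: string;
  choices: StreamChoice[];
  [extra: string]: unknown; // pass through fields this sketch does not model
}

// Split a folded frame (delta content and finish_reason together) into a
// content frame followed by an empty-delta finish frame. Frames that are
// already in the separate-finish shape are returned unchanged.
function splitFoldedFinish(chunk: StreamChunk): StreamChunk[] {
  const folded = chunk.choices.some(
    (c) => c.finish_reason != null && Object.keys(c.delta).length > 0,
  );
  if (!folded) return [chunk];

  const contentFrame: StreamChunk = {
    ...chunk,
    choices: chunk.choices.map((c) => ({ ...c, finish_reason: null })),
  };
  const finishFrame: StreamChunk = {
    ...chunk,
    choices: chunk.choices
      .filter((c) => c.finish_reason != null)
      .map((c) => ({ ...c, delta: {} })),
  };
  return [contentFrame, finishFrame];
}
```

Applying this to every parsed frame before your own handling leaves llama and Foundation streams untouched and expands only the final MediaPipe frame.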
Error wording parity (spec §8.5)
The exact error strings below are asserted by parity tests across all three plugins. They will not change without a CHANGELOG entry.
| Situation | Wording |
|---|---|
| Image content part, no mmproj loaded | Request includes an image but no mmproj was loaded. Set nativeMmprojPath when starting. |
| Image content part on Foundation | Image input not supported by Apple Foundation Models in this version. |
| Audio content part, no audio encoder | Loaded model has no native audio encoder. Use a multimodal model like Gemma 4 or Phi-4 Multimodal. |
| Image fetch failure | Failed to fetch image: <reason> |
| Audio decode failure | Audio decode failed: <reason> |
| Unsupported audio format | Unsupported audio format: <fmt>. Supported on this platform: <list>. |
When you add a new error path, add the exact wording to all three handler implementations, plus a parity test that loads the same fixture and asserts each platform returns the same body.
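A body-level parity assertion could be sketched as below. The helper names are hypothetical, and the error envelope used in the usage example ({ error: { message } }) is an assumption borrowed from the OpenAI error format rather than something this page specifies; only the wording string comes from the table above.

```typescript
// Serialize with sorted keys so key order cannot cause a false mismatch.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonical).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonical(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns the platforms whose response body differs from the expected body.
function errorParityFailures(
  expected: unknown,
  actual: Record<string, unknown>,
): string[] {
  const want = canonical(expected);
  return Object.keys(actual).filter(
    (platform) => canonical(actual[platform]) !== want,
  );
}

// Usage: the envelope shape here is assumed, the wording is from the table.
const expectedBody = {
  error: {
    message:
      "Request includes an image but no mmproj was loaded. Set nativeMmprojPath when starting.",
  },
};
```

A test would then call errorParityFailures(expectedBody, bodiesByPlatform) and fail if the returned list is not empty, naming the platforms that drifted.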
The discipline rule
When you change any handler logic:
- Update the matching fixture in fixtures/transport-fixtures.json (or add a new one).
- Update all three handler implementations to match.
- Run all three platforms' parity test suites locally before committing — TS and Kotlin from any host, Swift via Mac remote builds.
- CI re-runs the same suites; do not rely on CI to catch parity drift that you can catch in seconds locally.
Drift that lands silently because someone updated only the language they happened to be working in is the failure mode this rule exists to prevent.
See also
- Testing — how to run each layer.
- Multimodal — error wordings in user-facing context.
