Phi 4 Multimodal

by Microsoft

Phi-4 Multimodal is a lightweight, 5.6-billion-parameter open-weight foundation model that unifies speech, vision, and text processing in a single architecture using a mixture-of-LoRAs design. It supports a 128K-token context window and accepts text in 23 languages, images (visual understanding in English), and audio in eight languages. Trained on 5 trillion text tokens, 2.3 million speech hours, and 1.1 trillion image-text tokens, the model handles image understanding, optical character recognition, chart and table reasoning, speech recognition and translation, and multi-image analysis. It ranks first on Hugging Face's OpenASR leaderboard and is the first open-source model offering speech summarization. Released under the MIT license in February 2025, Phi-4 Multimodal is designed for memory- and compute-constrained environments while maintaining competitive performance on vision benchmarks against larger multimodal models.

Key info

Input
Output
Features
Context window: 128K
Max output: 4K
Input price: $0.11 / 1M
Output price: $0.43 / 1M
  • EU residency available
  • Zero data retention on pay-as-you-go
  • No training by default
  • GDPR DPA available
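
As a rough illustration of the listed prices, the sketch below estimates the cost of a single request. It assumes the "/1M" figures are US dollars per one million tokens and that input and output tokens are billed separately; `estimateCostUSD` and the constant names are hypothetical and not part of any Opper SDK.

```typescript
// Listed prices for Phi 4 Multimodal, assumed to be USD per 1M tokens.
const INPUT_PRICE_PER_MILLION = 0.11;
const OUTPUT_PRICE_PER_MILLION = 0.43;

// Hypothetical helper: estimate the cost of one request in USD.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = (inputTokens / 1_000_000) * INPUT_PRICE_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION;
  return inputCost + outputCost;
}

// Example: a 10,000-token prompt with a 1,000-token reply.
const cost = estimateCostUSD(10_000, 1_000);
console.log(`Estimated cost: $${cost.toFixed(5)}`);
```

Under these assumptions, that example request comes to roughly $0.0015.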

Available routes

Phi 4 Multimodal runs on 1 route through the Opper gateway. Compare residency, ZDR, and training posture at a glance; full data-handling detail per route is below.

Provider  Region  Zero data retention  Training  Input  Output
Evroc     EU      Yes                  No        $0.11  $0.43

Training posture across routes: No training on prompts by default.

Data handling per route

Each route hosting Phi 4 Multimodal has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

Evroc, Sweden 🇸🇪

Zero data retention is on by default on Pay-as-you-go; no action required. No training on customer data. Hosted in the EU; DPA available.

Zero data retention: On by default on Pay-as-you-go.
Training: No training on customer data.
Logging: Abuse monitoring
Third-party access: None disclosed
GDPR DPA: DPA available
Transfer mechanism: Not applicable; data stays in EU
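
For teams that screen routes against an internal data-handling policy, the posture fields above can be captured as a small typed record. This is only a sketch: the `RoutePosture` shape and the `meetsStrictEuPolicy` check are hypothetical names invented for illustration, not an Opper API, and the values are transcribed by hand from the Evroc route on this page.

```typescript
// Hypothetical shape for the per-route posture fields listed above.
interface RoutePosture {
  provider: string;
  region: string;
  zeroDataRetention: boolean;
  trainsOnCustomerData: boolean;
  logging: string;
  thirdPartyAccess: string;
  dpaAvailable: boolean;
}

// Values transcribed from the Evroc route on this page.
const evroc: RoutePosture = {
  provider: "Evroc",
  region: "EU",
  zeroDataRetention: true, // on by default on Pay-as-you-go
  trainsOnCustomerData: false,
  logging: "Abuse monitoring",
  thirdPartyAccess: "None disclosed",
  dpaAvailable: true,
};

// Hypothetical policy check: EU-hosted, ZDR on, no training, DPA on offer.
function meetsStrictEuPolicy(route: RoutePosture): boolean {
  return (
    route.region === "EU" &&
    route.zeroDataRetention &&
    !route.trainsOnCustomerData &&
    route.dpaAvailable
  );
}

console.log(meetsStrictEuPolicy(evroc));
```

A check like this is only as current as the values copied into it, so the posture should be re-read from the route listing whenever it changes.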

Get started

Call Phi 4 Multimodal through the Opper gateway with one API key. Let your coding agent set it up, or call it directly; Opper is drop-in compatible with the OpenAI, Anthropic, and Google AI SDKs.

Set it up with your agent

Copy this and paste it into your coding agent (Claude Code, Cursor, Codex, and more) and it'll wire up Opper for you.

Or call it directly

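import OpenAI from "openai";

// Point the OpenAI SDK at Opper's OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.OPPER_API_KEY, // Opper API key, read from the environment
  baseURL: "https://api.opper.ai/v3/compat",
});

// Send a single user message to Phi 4 Multimodal on the Evroc route.
// Top-level await requires an ES module context.
const completion = await client.chat.completions.create({
  model: "evroc/phi-4-multimodal-instruct",
  messages: [{ role: "user", content: "Hello" }],
});

// Print the text of the first returned choice.
console.log(completion.choices[0].message.content);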
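
Because the model accepts images as well as text, a request can in principle carry both. The sketch below only builds the request body, using the OpenAI chat-completions content-parts format (`type: "text"` and `type: "image_url"`). Whether the Opper compatibility endpoint forwards image parts to this route is an assumption to verify; `buildImageRequest`, the example URL, and reading the listed 4K max output as 4096 tokens are hypothetical choices made for illustration.

```typescript
// Assumption: the "4K" max output listed on this page means 4096 tokens.
const MAX_OUTPUT_TOKENS = 4096;

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Hypothetical helper: build a text-plus-image request body.
function buildImageRequest(prompt: string, imageUrl: string, maxTokens = 1024): ChatRequest {
  return {
    model: "evroc/phi-4-multimodal-instruct",
    // Never ask for more output than the listed maximum.
    max_tokens: Math.min(maxTokens, MAX_OUTPUT_TOKENS),
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

const request = buildImageRequest("Describe this chart.", "https://example.com/chart.png");
console.log(JSON.stringify(request, null, 2));
```

If the route does accept image parts, the resulting object could be passed to `client.chat.completions.create(request)` using the client from the call above.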
