Opper AI partners with Infercom for sovereign LLM inference

By Felix Wunderlich

Stockholm, Sweden — May 2026. Opper AI and Infercom are partnering to add Infercom's sovereign LLM inference to the Opper AI gateway. Builders can now route to Infercom-hosted models, including MiniMax-M2.5 at 400+ tokens/sec, alongside the 300+ models already in Opper, and pick the region for each call.

Why this matters

A different kind of accelerator. Most EU inference providers are racks of GPUs in an EU postcode. Infercom runs on SambaNova's RDU dataflow architecture with a three-tier memory system, serving large open-weight models in native BF16. No quantization, no shrinking the model to fit. Per Infercom's published numbers, that delivers up to ~10× faster inference and ~5× better energy efficiency than comparable GPU stacks.

Sovereign by design. Infercom serves its catalog from multiple regions: EU (German datacenters by default), US, and JP. EU-routed inference stays in German jurisdiction with no exposure to the US CLOUD Act, aligned with the German BDSG, the GDPR, the EU AI Act, and ISO 27001:2022. For DACH customers, regulated industries, and the public sector, that's the difference between a usable API and a procurement dead end.

One gateway with production guardrails. Opper is the AI gateway for agents on any model. A single LLM router across 300+ models, now including Infercom's catalog. Pin a specific model per call, configure fallbacks for rate limits and outages, and get evaluations and observability built in.

"Developers want fast inference and enterprises need data sovereignty. With Opper, they get both — route to Infercom for the fastest EU-hosted inference available, with zero code changes."

— Altug Eker, Managing Director, Infercom

"Europe needs more than one sovereign inference provider — it needs a network of them. Every additional EU node shortens latency for developers and keeps the data under European law for enterprises. Infercom raises that bar with SambaNova hardware in Munich, and now our 50,000 developers can reach it through a single API."

— Göran Sandahl, Co-founder and CEO, Opper AI

Models live today

The catalog below is fetched live from Opper's model API and filtered to Infercom-hosted models. Availability, context windows, and pricing stay in sync with what's actually callable through Opper. Region reflects the deployment region for each model.

| Model | Region | Context | Input / 1M | Output / 1M |
|---|---|---|---|---|
| infercom/deepseek-v3.1 | US | 128K | $3.24 | $4.86 |
| infercom/deepseek-v3.2 | US | 32K | $3.24 | $4.86 |
| infercom/gemma-3-12b-it-jp | JP | 128K | $0.22 | $0.38 |
| infercom/gpt-oss-120b-eu | EU | 128K | $0.24 | $0.64 |
| infercom/llama-3.3-70b-instruct | US | 128K | $0.65 | $1.30 |
| infercom/minimax-m2.5-eu | EU | 160K | $0.32 | $1.30 |
USD per 1M tokens. Pricing and availability subject to change.
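The per-1M prices above translate into per-call cost with simple arithmetic. A quick sketch for estimating spend before you route traffic (the `PRICES` dict and `call_cost` helper are illustrative, not part of the Opper SDK; prices are copied from the table):

```python
# Per-1M-token USD prices, copied from the catalog table above.
PRICES = {
    "infercom/minimax-m2.5-eu": {"input": 0.32, "output": 1.30},
    "infercom/gpt-oss-120b-eu": {"input": 0.24, "output": 0.64},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call, given token counts."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + (
        output_tokens / 1_000_000
    ) * p["output"]

# Summarizing a 200K-token document into a 10K-token summary on the
# EU-hosted MiniMax model:
print(f"${call_cost('infercom/minimax-m2.5-eu', 200_000, 10_000):.3f}")
```

At these rates the example call above comes out to roughly $0.08, with the input side dominating for long documents.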

Get started

Point the Opper SDK at any Infercom model with a provider-prefixed name:

from opperai import Opper

opper = Opper()

# Pin the call to Infercom's EU-hosted MiniMax model by name
result = opper.call(
    name="summarize",
    model="infercom/minimax-m2.5-eu",
    input="Long document goes here...",
)
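Because each Infercom model name carries a deployment region in the catalog, keeping traffic inside one jurisdiction comes down to selecting the right entry. A small illustrative helper (the catalog list is copied from the table above; `models_in_region` is our name, not an Opper SDK function):

```python
# (model, region) pairs from the Infercom catalog table above.
CATALOG = [
    ("infercom/deepseek-v3.1", "US"),
    ("infercom/deepseek-v3.2", "US"),
    ("infercom/gemma-3-12b-it-jp", "JP"),
    ("infercom/gpt-oss-120b-eu", "EU"),
    ("infercom/llama-3.3-70b-instruct", "US"),
    ("infercom/minimax-m2.5-eu", "EU"),
]

def models_in_region(region: str) -> list[str]:
    """Return the catalog model names deployed in the given region."""
    return [name for name, r in CATALOG if r == region]

print(models_in_region("EU"))
# ['infercom/gpt-oss-120b-eu', 'infercom/minimax-m2.5-eu']
```

Any name the helper returns can be passed as the `model` argument in the snippet above, so an EU-only policy is a one-line filter rather than a code change.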

Follow the quick start in our docs to wire this into a task with evaluations and a fallback model.

About Infercom

Infercom is a sovereign AI inference platform headquartered in Luxembourg, serving open-weight models in native BF16 on SambaNova RDU dataflow hardware across EU, US, and JP regions. EU-routed inference runs in German datacenters and is aligned with the German BDSG, the GDPR, the EU AI Act, and ISO 27001:2022.

About Opper AI

Opper AI is the AI gateway for agents on any model. A unified API across 300+ models with smart routing, automatic fallbacks, built-in evaluations and observability, and full OpenAI SDK compatibility.