High-availability inference: introducing model families
By Göran Sandahl
When you call a model like aws/claude-sonnet-4-5-eu, you've actually made three choices: the model, the provider, and the region. And you've tied your app's uptime to a single endpoint.
That's fine until that one endpoint has a hiccup. The provider hits rate limits and your traffic has nowhere to go. A region degrades and your calls start failing. A better-priced provider comes online, but you can only reach it by editing code. Every fix is a deploy, and none of them is really about your app. They're all about keeping tokens flowing.
What you actually want is something simpler: give me Claude Sonnet 4.5, and keep it available. Which cloud serves it, and from which region, is a routing decision — and that belongs in the gateway, not your code.
That's why we're introducing model families: high-availability inference behind a single name. One name stands in for every deployment of a model, across providers and regions, and Opper routes each call to a healthy one. If one deployment fails, the next takes over — your app never sees the outage. We've now made families the default way to target a model across the product.
Name the model, not the deployment
A family is the bare, canonical name of a model — no provider prefix:
claude-sonnet-4-5
gpt-oss-120b
mistral-medium-3.5
claude-opus-4-7
Behind claude-sonnet-4-5 sit all of its deployments: Anthropic direct, Azure, AWS in the EU, and GCP in the US and EU. You don't enumerate them. You name the family and Opper picks a member.
So instead of pinning a single deployment:
{
  "name": "summarize",
  "instructions": "Summarize the following text",
  "model": "aws/claude-sonnet-4-5-eu",
  "input": { "text": "..." }
}
you name the family and let the gateway resolve it:
{
  "name": "summarize",
  "instructions": "Summarize the following text",
  "model": "claude-sonnet-4-5",
  "input": { "text": "..." }
}
Same call and same response shape. The difference is that the second one keeps working when a provider has a bad day, and picks up new regions and deployments as we add them, without you touching the code.
You can still pin a single deployment with a full provider/model id whenever you need exact control. Families are the default, not the only option.
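As a minimal sketch of that switch in application code, the snippet below builds the same request body in Python and keeps the model name in one constant. The helper `summarize_request` and the constant `MODEL` are illustrative names, not part of any Opper SDK, and how the body is sent is deliberately left out.

```python
import json

# Family name by default. Swap in "aws/claude-sonnet-4-5-eu" to pin one deployment.
MODEL = "claude-sonnet-4-5"


def summarize_request(text: str, model: str = MODEL) -> str:
    """Build the JSON body for the summarize call shown above (hypothetical helper)."""
    return json.dumps({
        "name": "summarize",
        "instructions": "Summarize the following text",
        "model": model,
        "input": {"text": text},
    })


family_body = summarize_request("...")
pinned_body = summarize_request("...", model="aws/claude-sonnet-4-5-eu")
```

The two bodies differ only in the `model` field, so moving between a family and a pinned deployment is a one-line change.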
What happens when you call a family
Resolving a family isn't round-robin roulette. Opper orders the members so the common cases are both fast and stable:
- Sticky within a session. The first call in a session lands on a member; subsequent calls in that session prefer the same one. That keeps you on a provider's warm prompt cache instead of paying full price for the prefix on every turn, which is where a lot of multi-turn cost quietly goes.
- Fallback on failure. If a member returns a 429 or 5xx, Opper moves to the next member in the family and retries. For structured outputs it also retries JSON/XML parsing per model before moving on. The caller sees a result, not an outage. This is the same machinery described in Fallbacks and Aliases, now applied automatically to every family member.
- Inside your policy. Resolution only ever considers providers and models your org and project allow. If your rules allow only EU regions, the family expands to EU members only, so claude-sonnet-4-5 stays in Europe without you hand-picking the EU id.
- Even spread on cold start. With no session to be sticky to, members are rotated so load doesn't pile onto one provider. That way, each cold call goes to the provider most likely to have fresh capacity.
In addition, every attempt can optionally be traced, so you can see which member actually served a call, where fallbacks kicked in, and where spend went.
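To make the ordering above concrete, here is a toy model of the resolution steps in Python. It is a sketch under stated assumptions, not Opper's implementation: the `Family` class, `order_members`, `call_family`, the `allowed` policy callback and the `send` callback are all hypothetical names, and of the member ids in the usage only aws/claude-sonnet-4-5-eu appears in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    name: str
    members: list                                # deployment ids (placeholders)
    cursor: int = 0                              # rotation position for cold starts
    sticky: dict = field(default_factory=dict)   # session id -> last member used


def is_retryable(status: int) -> bool:
    """429 and 5xx move on to the next member; anything else is returned."""
    return status == 429 or 500 <= status < 600


def order_members(family, allowed, session_id=None):
    """Filter by policy, then start at the sticky member or the rotation cursor."""
    members = [m for m in family.members if allowed(m)]
    if not members:
        return []
    pinned = family.sticky.get(session_id) if session_id else None
    if pinned in members:
        start = members.index(pinned)            # sticky within a session
    else:
        start = family.cursor % len(members)     # even spread on cold start
        family.cursor += 1
    return members[start:] + members[:start]


def call_family(family, send, allowed, session_id=None):
    """Try members in order. Returns (member, body, trace of attempts)."""
    trace = []
    for member in order_members(family, allowed, session_id):
        status, body = send(member)
        trace.append((member, status))
        if is_retryable(status):
            continue                             # fallback on failure
        if session_id:
            family.sticky[session_id] = member
        return member, body, trace
    raise RuntimeError(f"no healthy member for {family.name}: {trace}")


# Usage with stubbed providers: EU-only policy, AWS is rate-limited.
family = Family("claude-sonnet-4-5", [
    "anthropic/claude-sonnet-4-5",   # placeholder id
    "aws/claude-sonnet-4-5-eu",
    "gcp/claude-sonnet-4-5-eu",      # placeholder id
])
eu_only = lambda member: member.endswith("-eu")
stub_send = lambda member: (429, None) if member.startswith("aws/") else (200, "ok")

member, body, trace = call_family(family, stub_send, eu_only, session_id="s1")
```

In this run the EU policy drops the Anthropic-direct member, the AWS member returns 429, and the GCP member serves the call; session `s1` then prefers the GCP member on its next call. The `trace` list plays the role of the per-attempt trace described above.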
Families and aliases: built-in lists vs. your lists
If you've used Opper's aliases, families will feel familiar. Both turn one name into an ordered list of concrete models with fallback. The difference is who curates the list.
| | Model family | Alias |
|---|---|---|
| Who maintains it | Opper (platform) | You (per organization) |
| The name | Canonical model name, e.g. claude-sonnet-4-5 | Whatever you choose, e.g. sonnet, fast, default-llm |
| What it expands to | Every deployment of that model, across providers and regions | The exact ordered list you define |
| Stays current as new regions/providers appear | Automatically | When you edit it |
| Best for | "Give me this model, wherever it runs best" | Org-wide conventions, mixing custom + vendor models, exact fallback order |
They compose. Point an alias at a couple of families and you get a stable team-wide name on top of always-current member lists:
curl -X POST https://api.opper.ai/v2/models/aliases \
  -H "Authorization: Bearer YOUR_OPPER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "default-llm",
    "fallback_models": ["claude-sonnet-4-5", "gpt-oss-120b"],
    "description": "Sonnet 4.5 across all its deployments, then open-weight as a floor"
  }'
Now your whole codebase calls "model": "default-llm". Sonnet's deployments are kept current by us; the fallback floor and the org convention are yours.
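One way to picture how an alias composes with families is as a two-level expansion: the alias gives the order of families, and each family contributes its current members. The Python sketch below assumes that shape. The `FAMILIES` and `ALIASES` tables and the `expand` function are hypothetical, and every member id except aws/claude-sonnet-4-5-eu is a placeholder.

```python
# Hypothetical catalog: family name -> deployment ids, maintained by the platform.
FAMILIES = {
    "claude-sonnet-4-5": ["anthropic/claude-sonnet-4-5", "aws/claude-sonnet-4-5-eu"],
    "gpt-oss-120b": ["provider-a/gpt-oss-120b"],
}

# Alias defined by the organization, as in the curl call above.
ALIASES = {"default-llm": ["claude-sonnet-4-5", "gpt-oss-120b"]}


def expand(name: str) -> list:
    """Flatten an alias, family, or pinned id into an ordered list of deployments."""
    if name in ALIASES:
        ordered = []
        for entry in ALIASES[name]:
            for member in expand(entry):
                if member not in ordered:   # keep the first occurrence only
                    ordered.append(member)
        return ordered
    if name in FAMILIES:
        return list(FAMILIES[name])
    return [name]                           # treat anything else as a pinned id


candidates = expand("default-llm")
```

Adding a deployment to a family in this model changes what `default-llm` expands to without touching the alias, which is the split of responsibilities described above. The sketch does not guard against aliases that reference each other in a cycle.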
Why we standardized on families
- Painless migrations. Point releases and new regional deployments roll in behind the family name. No PR to bump an id.
- Resilience by default. Multi-provider fallback is just how a family behaves — you don't wire it up per call.
- EU residency without bookkeeping. Set the region policy once; families respect it. claude-sonnet-4-5 stays in the EU because your project says so, not because you memorized the EU id.
- Cheaper multi-turn work. Session stickiness keeps you on warm provider caches instead of re-paying for the prompt prefix every turn.
- Cleaner code. Your application says what model it wants. The gateway handles where it runs.
Get started
If you're already on Opper, swap a pinned provider/model id for its family name on your next call and watch the trace. You'll see the member it resolved to. The full list of families and their deployments is in the model catalog and the docs. New to Opper? Sign up, then name a model and let us find the best route. Questions? We're on Discord or at hello@opper.ai.