Buy to build: an enterprise case for renting the AI gateway

By Göran Sandahl - 1/15/2025

Every enterprise that gets serious about AI eventually draws the same diagram. On one side, the applications and agents your teams are building. On the other, the models — from a handful of providers, across a handful of regions. In the middle, a box: routing, fallback, key management, budgets, evals, prompt caching, region policy, tracing. The industry has settled on a name for that box — the AI gateway — and on a question to go with it: do we build it or buy it?

We think the build-versus-buy framing hides the decision that actually matters. The objective of an AI program is not to own a gateway. It is to discover how AI drives business outcomes for your organization — which workflows, which models, which patterns move a real number — and to get there before your competitors do. The gateway is plumbing on the route to that answer. The right question is therefore narrower and more strategic: which choice maximizes your speed to learn and preserves your optionality while the answer is still unknown? This note argues that, for the access layer specifically, that choice is to buy.

Key Findings

The competitive variable in enterprise AI is not infrastructure ownership; it is the cadence of experiments you can run against real business metrics, and the freedom to act on what they teach you. Building the access layer trades both away for control you do not need.
The "how" of enterprise AI is unsettled and changing fast. Model trackers logged hundreds of new releases in 2025 (by some counts a new model roughly every two days), and most models are deprecated within 12 to 18 months of release. Any architecture hard-coded to today's model is a re-architecting project waiting to happen.
A gateway is not a thing you build; it is a thing you maintain and run, indefinitely. Naive build estimates routinely capture only the visible 20 to 40 percent of total cost of ownership and miss the standing maintain-and-run burden that follows.
The market is standardizing on the gateway as a control plane. Analysts project that the large majority of teams building multi-model applications will route through an AI gateway within a few years, up from roughly a quarter today — precisely because heterogeneity, cost, and governance are becoming unmanageable by hand.
The durable rule is build the differentiated layer, buy the commoditizing one. Enterprises are right to build domain-specific models, agents, and workflows — that is where advantage lives. The access layer beneath them is undifferentiated, fast-moving plumbing, and it is the part to rent.

Strategic Planning Assumptions

Through 2027, enterprises that hard-code the model layer into application code will re-architect those applications more often than they extend them, as point releases, deprecations, and price changes force unplanned migrations.
By 2028, the large majority of engineering teams building multi-model applications will route production traffic through an AI gateway, up from roughly one in four today, making the access layer a standard architectural tier rather than a per-team concern.
By 2027, multimodal and domain-specific models will dominate enterprise GenAI — with the share of domain- or function-specific models rising from low single digits a few years ago to more than half — widening the gap between single-vendor stacks and the frontier.
Through 2027, a large share of agentic AI initiatives — by some estimates approaching half — will be cancelled or stalled, disproportionately those that scaled before instrumenting cost, evaluation, and governance at the model layer.

Analysis

The objective is outcomes, not infrastructure

It is worth stating plainly what an AI program is for, because the build-versus-buy debate tends to forget it. No customer renews because your routing table is elegant. No board approves a budget to admire your retry logic. The program exists to find where AI changes a business outcome — deflected support tickets, faster underwriting, higher conversion, lower cost-to-serve — and to compound that learning faster than the competition.

Consultancy analysis of enterprise AI reaches the same conclusion from the cost side: as workloads scale from pilots to production, the meaningful comparison shifts from cost per technical unit — cost per token — to cost per business outcome. A stack that is cheap per token but slow to learn what those tokens are worth has optimized the wrong number.

Two capabilities determine who wins that race, and neither is "owns more infrastructure."

The first is speed to learn: how many real experiments your teams can run per quarter against actual business metrics. That cadence, far more than any single model choice, separates the organizations pulling ahead from the ones still standing up platforms. The second is flexibility in the how: when an experiment teaches you something — a different model is better, a cheaper route is good enough, a different pattern wins — how cheaply can you act on it? In a domain this immature, optionality is the asset and premature commitment is the liability.

Building your own access layer works against both. Every design decision baked into home-grown infrastructure is a decision frozen — and frozen before you know the answer. The engineering hours go into plumbing instead of experiments, and the architecture you committed to becomes the thing you have to unwind when the data surprises you. That is the real cost of building, and it is paid in the currency an AI program can least afford.

The "how" is unsettled — and will stay that way

The case for building rests on an unstated assumption: that the thing underneath the gateway holds still. It does not. The model layer is among the fastest-moving surfaces in enterprise software, and a gateway is a standing promise to absorb that motion so your applications do not feel it.

The pace is the headline. Model trackers logged hundreds of new model releases in 2025 — by some counts a new model arriving roughly every two days — and most models reach end-of-life within 12 to 18 months of shipping. A single-model bet is therefore not a one-time integration; it is a subscription to recurring migrations, each one re-tuning prompts and re-validating behavior against a model that did not exist last quarter.

The shape of the layer is shifting too, not just its membership. Industry research expects a large share of GenAI solutions to become multimodal within a couple of years, and the share of models that are domain- or function-specific to rise from low single digits to more than half. The frontier is fragmenting into many specialized models rather than consolidating into one. A stack wired to a single general-purpose endpoint does not just risk falling behind on quality; it structurally cannot reach where the value is moving.

And the providers do not agree with each other. Tool calling, structured output, system prompts, streaming, token accounting, error semantics, even model-naming conventions differ across vendors and change within each one over time. Reconciling that divergence so your application sees one stable contract is not a one-time adapter you write and forget. It is a permanent translation job that grows with every provider and every release.
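As a minimal sketch of what "one stable contract" means in practice, the adapter pattern below translates two differently shaped responses into a single type. The providers (`provider_a`, `provider_b`), their payload fields, and the `Completion` type are all hypothetical, invented for illustration; real vendor payloads differ in their own ways and change over time, which is the point of the paragraph above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Completion:
    """The one stable contract the application sees (hypothetical shape)."""
    text: str
    input_tokens: int
    output_tokens: int


# Hypothetical payload shapes for two made-up providers. Each adapter is the
# only place that knows how its provider names things.
def from_provider_a(payload: Dict[str, Any]) -> Completion:
    return Completion(
        text=payload["output"]["text"],
        input_tokens=payload["usage"]["prompt"],
        output_tokens=payload["usage"]["completion"],
    )


def from_provider_b(payload: Dict[str, Any]) -> Completion:
    return Completion(
        text="".join(part["content"] for part in payload["parts"]),
        input_tokens=payload["tokens_in"],
        output_tokens=payload["tokens_out"],
    )


ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Completion]] = {
    "provider_a": from_provider_a,
    "provider_b": from_provider_b,
}


def normalize(provider: str, payload: Dict[str, Any]) -> Completion:
    """Translate a provider-specific payload into the shared contract."""
    if provider not in ADAPTERS:
        raise ValueError(f"no adapter registered for provider {provider!r}")
    return ADAPTERS[provider](payload)
```

Every new provider adds an adapter, and every provider release can break one. The code is small; the standing obligation to keep it correct is what the translation job consists of.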

None of this is individually hard. The difficulty is that it never stops. A gateway is not a feature you ship; it is a treadmill you agree to run, on time, every time, for as long as the product lives, and the speed of the treadmill is set by the market, not by you.

What building actually entails: build, maintain, run

A demonstration gateway is a weekend. A production one an enterprise can bet on is a different undertaking, and the build phase is the smallest part of it.

Build is the visible surface: a normalized interface across providers; a request/response contract that hides their differences; routing and fallback with session-aware stickiness; resilience machinery (timeouts, circuit breakers, rate-limit handling, retry budgets); cost controls and prompt caching; observability and tracing for every call; an eval harness to prove a model is good for your tasks rather than a public benchmark; and the governance and security apparatus — key rotation, region and residency policy, access control, audit logging, PII handling — your security function will require. None of these are research problems, but the gateway is only useful when all of them are in place at once, which is the part the "we'll build it in a quarter" estimate quietly omits.
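To make two of those items concrete, here is a minimal sketch of priority-ordered fallback guarded by a per-route circuit breaker. It is an illustration under stated assumptions, not a production design: `ProviderError`, `CircuitBreaker`, and `call_with_fallback` are hypothetical names, each route is just a Python callable standing in for a provider client, and timeouts, retry budgets, and session stickiness are left out.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ProviderError(Exception):
    """Raised by a route when the upstream call fails (hypothetical)."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call
    again once `cooldown` seconds have passed."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a trial call through after the cooldown.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


Route = Tuple[str, Callable[[str], str]]


def call_with_fallback(prompt: str, routes: List[Route],
                       breakers: Dict[str, CircuitBreaker]) -> Tuple[str, str]:
    """Try routes in priority order; return (route_name, response)."""
    errors: List[str] = []
    for name, call in routes:
        breaker = breakers.setdefault(name, CircuitBreaker())
        if not breaker.allow():
            errors.append(f"{name}: circuit open")
            continue
        try:
            response = call(prompt)
        except ProviderError as exc:
            breaker.record_failure()
            errors.append(f"{name}: {exc}")
            continue
        breaker.record_success()
        return name, response
    raise ProviderError("all routes failed: " + "; ".join(errors))
```

Even this toy version carries state that must be shared, observed, and tuned per provider, which hints at why the list above only pays off once every item is in place together.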

Maintain is where the cost actually lives. By most accounts, naive build estimates capture only the visible fraction of total cost of ownership — frequently cited at 20 to 40 percent — and miss the 60 to 80 percent that is ongoing: keeping pace with model releases, absorbing provider API changes, re-capturing prompt-caching discounts as the rules shift, and adding regions and providers as they appear. Time-to-value for built infrastructure is routinely underestimated by 6 to 12 months, and falling behind has a compounding cost. A gateway that lags the frontier by six to nine months means every product team is six to nine months from the better, cheaper model — and that much slower to learn whether it moves the number.
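The arithmetic behind that range is worth spelling out. If a build estimate covers only a fraction f of total cost of ownership, the implied total is the estimate divided by f. The sketch below works that through; the function name and the build figure in the usage note are hypothetical, and only the 20 to 40 percent range comes from the text above.

```python
from typing import Dict


def implied_tco(build_estimate: float, visible_fraction: float) -> Dict[str, float]:
    """Back out total and ongoing cost from a build estimate that is assumed
    to capture only `visible_fraction` of total cost of ownership."""
    if not 0 < visible_fraction <= 1:
        raise ValueError("visible_fraction must be in (0, 1]")
    total = build_estimate / visible_fraction
    return {
        "total": total,
        "ongoing": total - build_estimate,
        "multiple_of_estimate": 1 / visible_fraction,
    }


# Illustrative only: a hypothetical build estimate of 1.0 cost units.
for fraction in (0.2, 0.4):
    print(fraction, implied_tco(1.0, fraction))
```

At the cited bounds, the implied total is roughly 2.5 to 5 times the original estimate, with the difference falling entirely in the maintain-and-run phases.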

Run is the standing operational load nobody puts on the build estimate: on-call for providers' outages, where your team owns mitigation for someone else's incident; capacity and rate-limit management across vendors; cost monitoring so token spend does not quietly compound; security patching; and keeping the compliance story current. The recurring version of this load is the model migration, and it is where home-grown gateways quietly fail. A new model lands on a Tuesday and the business wants it live.

Okay, new model. We have a hundred agents running on the old one. How do we move them over? How do we know the new one is actually better for our tasks? Who writes the evals — the platform team, or every application team, separately, by hand?

Standing up the routing is the easy 80 percent. The expensive 20 percent is the operational machinery around a swap: evals to prove the new model is better on your workloads, a safe rollout, tracing to catch regressions, and a path that does not make every team redo the work. Skip it and a migration becomes a coordinated, multi-team, multi-month effort — every single time the model changes, and the model changes constantly.
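A minimal sketch of the first piece of that machinery, an eval gate that decides whether a candidate model may replace the incumbent, might look like the following. Everything here is an assumption made for illustration: models are plain callables, each eval case pairs a prompt with a pass/fail checker, and `min_gain` is a hypothetical knob for how much better the candidate must be. Real harnesses also need graded scoring, sampling, and statistical care.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]
Case = Tuple[str, Callable[[str], bool]]  # (prompt, checker for the output)


def pass_rate(model: Model, cases: List[Case]) -> float:
    """Fraction of eval cases whose output passes its checker."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def should_promote(incumbent: Model, candidate: Model, cases: List[Case],
                   min_gain: float = 0.0) -> bool:
    """Promote only if the candidate matches or beats the incumbent on the
    organization's own tasks, by at least `min_gain`."""
    return pass_rate(candidate, cases) >= pass_rate(incumbent, cases) + min_gain
```

The design point is that the eval set is shared: written once against your workloads and rerun for every swap, rather than rebuilt by each application team.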

There is a final, quieter exposure that analysts would call concentration risk and engineering leaders recognize by name. Someone builds the gateway — call him Gunnar — and he is excellent. He understands every provider quirk and every reason the routing table looks the way it does. The problem is not that Gunnar cannot build it; it is that the organization cannot afford to depend on Gunnar. The knowledge lives in one head, on a critical and fast-moving system where being six months behind is a competitive problem.

We know Gunnar can build it. That was never the question. The question is whether we can be a company that depends on Gunnar — and we can't.

Skilled platform and MLOps talent is scarce and expensive, and it is mobile: it takes new roles, new companies, and leaves of absence. Buying converts a person-dependency into a vendor relationship with an SLA, a roadmap, and a team whose entire job is to keep this layer current. For infrastructure this fast-moving and this far from your differentiation, that is not a downgrade. It is the trade you want.

Why the market is converging on the gateway

The strongest signal that this is plumbing, not differentiation, is that the market is standardizing it. Analysts project that the large majority of teams building multi-model applications will route through an AI gateway within a few years — up from roughly a quarter today — and that a growing share of enterprises will run more than one to govern increasingly heterogeneous, multi-agent systems. Capabilities that were bespoke in 2024 are becoming table stakes: centralized authentication and policy enforcement, observability across distributed agents, cost controls, and the AI trust, risk, and security management (AI TRiSM) that turns experimental endpoints into governed production APIs.

Industry analysis increasingly frames platform selection not as a single procurement decision but as three linked choices — the model, the application architecture, and the control plane — each with its own implications for cost, lock-in, and governance. The gateway is that control-plane choice. Conflating it with the others, or optimizing it on headline token price alone, is how organizations end up over-spending, under-governing, or both.

This convergence is also where the cost story turns, and the token-economics picture is more counterintuitive than it looks. Headline token prices have fallen steeply (by some measures the blended cost of frontier-model output more than halved year over year), yet a majority of enterprises report AI spend running over budget. Falling unit prices collide with rising usage, and with poor price comparability across providers' billing meters: text billed by token, speech by duration, vision by image. The gateway is where tokens are metered, routed to the cheapest model that is good enough, and governed. Deliberate tiered routing (cheap models for simple traffic, frontier models only where they earn it, with aggressive caching) is reported to cut effective cost per unit of intelligence by as much as 85 to 90 percent, savings that are difficult to capture call-by-call in hand-rolled application code. It is also the layer where the governance demanded by emerging interoperability standards (such as model-context and agent-to-agent protocols) is most naturally enforced.
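"Cheapest model that is good enough" can be stated as a small selection rule. The sketch below assumes each model has a price and a score from your own eval set; the `ModelOption` type, the field names, and any numbers you feed it are hypothetical placeholders, not real prices or benchmark results.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModelOption:
    name: str
    price_per_million_tokens: float  # illustrative units, not real prices
    eval_score: float                # pass rate on your own eval set, 0..1


def cheapest_good_enough(options: Iterable[ModelOption],
                         min_score: float) -> ModelOption:
    """Return the lowest-priced model whose eval score meets the bar."""
    eligible = [o for o in options if o.eval_score >= min_score]
    if not eligible:
        raise LookupError(f"no model meets min_score={min_score}")
    return min(eligible, key=lambda o: o.price_per_million_tokens)
```

Setting a different `min_score` per traffic class is one way to express tiering: simple traffic tolerates a lower bar and lands on a cheaper model, while demanding traffic raises the bar and pays for the frontier only where it earns it.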

The cautionary half of the same research matters too: a large share of agentic AI initiatives are expected to be cancelled or stalled through 2027, disproportionately those that scaled before they could see and control cost, quality, and risk at the model layer. The initiatives that survive will be the ones that instrumented and governed that layer early — which is exactly what a gateway is for, and exactly the capability most organizations should not be inventing from scratch under deadline.

Build the differentiated layer, buy the commoditizing one

The apparent paradox is that the same research pointing to a buy decision here also shows enterprises building more, not less: a rising share of GenAI models becoming domain- and function-specific, and a meaningful portion of production GenAI being built in-house rather than bought off the shelf. This is not a contradiction. It is the answer.

Build is the right call for the layer where you have advantage — your domain-tuned models, your agents, your workflows, your proprietary data and evaluation criteria. That is the differentiated layer, and owning it is how AI becomes a moat rather than a feature. The access layer beneath it is the opposite: undifferentiated, identical to what every other company building on AI also needs, and changing too fast to be worth a standing internal team. Drawing the line there — own the experiments and the outcomes, rent the plumbing — is what lets you build the part that matters faster, because your best engineers are spending their hours on it instead of on provider reconciliation.

Dimension	Build the gateway	Buy the gateway, build the product
What you optimize for	Owning the plumbing	Speed to learn what drives outcomes
Where your best engineers spend time	Reconciling provider APIs, chasing model changes	Experiments against real business metrics
Trying a new model or pattern	A migration project, every time	Change one line, ship the experiment
When the data says "switch the how"	Re-plumb, under deadline	Already supported; you just opt in
Provider API / caching changes	You patch it, under deadline	Absorbed below your code
Evals for a model swap	Each team, by hand	Shared machinery, run once
Region / residency / governance	Hand-maintained routing	Set once, enforced centrally
Key person leaves	Critical knowledge walks out	Vendor team owns continuity
Share of TCO that is ongoing	Yours to carry (60–80%)	Priced into the platform
Time to the frontier	6–9 months behind	The day it ships

Recommendations

Treat the access layer as a control plane to acquire, not a system to build. Reserve internal engineering for the differentiated layer above it — domain models, agents, workflows, and the evaluation criteria that encode your standards.
Make optionality an explicit design requirement. Insist on an abstraction layer and model-name indirection so that switching a model, provider, or region is a configuration change, not a code rewrite. The ability to move cheaply is the asset; protect it deliberately.
Measure programs by speed to learn, not by infrastructure delivered. Track experiments run per quarter against real business metrics. If standing up plumbing is consuming the cadence, the build decision is costing you the thing you actually came for.
Instrument evaluation and tracing before scaling agents, not after. The agentic initiatives that fail tend to scale first and govern later. Put cost visibility, evals, and tracing at the model layer from the start.
Centralize governance and residency policy once. Region rules, access control, budgets, and audit logging belong in the access layer, enforced uniformly — not re-implemented per application team.
Eliminate key-person concentration on the model layer. This surface moves too fast and matters too much to depend on a single internal expert. A vendor with an SLA and a roadmap is the more resilient dependency.
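The model-name indirection recommended above can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's API: the alias `support-triage`, the JSON config layout, and the provider, model, and region values are all hypothetical. The idea is that application code names a purpose, and a config file decides which provider, model, and region serve it.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Target:
    provider: str
    model: str
    region: str


def load_aliases(config_text: str) -> Dict[str, Target]:
    """Parse an alias -> target mapping from JSON config text."""
    raw = json.loads(config_text)
    return {alias: Target(**fields) for alias, fields in raw.items()}


def resolve(alias: str, aliases: Dict[str, Target]) -> Target:
    """Look up the concrete target behind an alias used in application code."""
    if alias not in aliases:
        raise KeyError(f"unknown model alias {alias!r}")
    return aliases[alias]


# Hypothetical config; swapping the model means editing this text only.
CONFIG = """
{
  "support-triage": {"provider": "provider_a", "model": "model-x", "region": "eu"}
}
"""

target = resolve("support-triage", load_aliases(CONFIG))
print(target)
```

Because the application only ever references `support-triage`, moving that workload to another provider, model, or region is a change to the config text and leaves the calling code untouched.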

The Opper perspective

This is the layer Opper operates. One API sits in front of every major provider and region, with routing, automatic fallback, evals, tracing, budgets, region policy, and prompt caching preserved across turns — kept current as models and APIs change, so your code does not move when the floor does.

You call a model by its plain name and we serve it across every provider that runs it. New models, point releases, and new regions roll in behind that name. When a provider has a bad day, traffic fails over. When an experiment shows a different model or pattern wins, switching is a one-line change rather than a migration project. That is the entire point: the plumbing stays current on its own, so your teams spend their cycles learning what drives outcomes — and stay free to change the how the day the data tells them to.

Get started

If you are sketching that box on a whiteboard right now, talk to us before you staff the project. Sign up and point one application at Opper, browse the full catalog at opper.ai/models, or read how it works in the docs. For an enterprise buy-versus-build conversation — security, residency, migration, the whole picture — reach us at hello@opper.ai or on Discord.