Introducing the Model Interaction Protocol (MIP) - a Spec for Decoupling the What from the How in AI
By Göran Sandahl
Over the last 12 months we've been building and running over 50 AI agents in production, including thousands of live agentic tasks across finance, operations, engineering and customer-facing workflows. In parallel, we have built and released research previews of state-of-the-art agents:
- State-of-the-art coding (Delvin)
- General computer use (Opperator)
- Business workflow agents (getagento.com)
Doing this has given us first-hand experience of the pain of operating agents and LLMs at scale and in production.
While a lot of focus nowadays is on the right prompt, the right agent SDK and the right model, the lifecycle of these artifacts becomes an even more pressing issue in production and at scale. To operate agents at scale we need to be able to treat them as reliable cattle, not as delicate pets that need constant care.
To address this, we believe something more is needed than better evals and tracing. We need to address the underlying approach to how we interact with models from client-side code.
Therefore we're introducing the Model Interaction Protocol (MIP). It aims to establish a pattern for decoupling what we want models to do from how we get models to do it successfully.
Why MIP?
All foundational technologies that have reached scaled adoption have implemented a level of abstraction and declarative standardization.
- Networking went through this. Before HTTP standardized the contract between clients and servers, every system invented its own wire protocol. Then the what (request a resource) got separated from the how (TCP details, routing, caching), and the web became possible.
- Containers went through this. Before standardized specs, deploying software meant handcrafting machine configurations. Then the what (run this image) got separated from the how (orchestration, scheduling, scaling), and cloud-native became possible.
AI is still stuck in the "before" era. Every agent hard-codes its own model name, prompt, context window, routing rules, retry heuristics, and fallback logic. Knowledge lives in the heads of engineers. Switching models means rewriting agents. Improving a prompt means QA'ing everything downstream. Scaling agents scales operational burden linearly.
What is MIP?
MIP is a task specification for decoupling the WHAT from the HOW.
It cleanly separates:
- WHAT needs to be done (the task contract): input schema, output schema, constraints, success requirements, and potential feedback.
- HOW it gets done (the execution process): exactly which models are used, what the most effective prompt looks like, how to structure the available context, which hyperparameter settings are optimal, fallback logic, retries, confidence checks, evaluations, etc.
The goal of MIP is for the client to hold only task definitions in code, while an executor of choice finds the optimal way to get models to complete the task.
The task definition can be of arbitrary complexity, from a simple "parse this image" to a full-blown "balance this financial balance sheet". The executor is in turn free to apply any degree of sophistication in its execution strategy, from naive prompt templating to more advanced evolutionary mechanisms that derive the right strategy over time and with data.
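To make the separation concrete, here is a minimal Python sketch of what the client side could look like. The Task dataclass and the executor.run(...) call are hypothetical illustrations rather than a published SDK; MIP only specifies the shape of the contract.

from dataclasses import dataclass, field

@dataclass
class Task:
    """The WHAT: a task contract that the client owns, versions and ships."""
    name: str
    input_schema: dict
    output_schema: dict
    constraints: dict = field(default_factory=dict)

summarize_ticket = Task(
    name="summarize_customer_ticket",
    input_schema={"ticket_text": "string", "customer_tier": "enum['gold','silver','bronze']"},
    output_schema={"summary": "string", "confidence": "number(0-1)"},
    constraints={"max_latency_ms": 1200},
)

# The HOW (model choice, prompting, retries, fallbacks) lives entirely behind
# the executor; the client only submits the contract plus a concrete input, e.g.:
# result = executor.run(summarize_ticket, {"ticket_text": "...", "customer_tier": "gold"})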
Why this matters
Decoupling what we want the models to do from how we get them to do it offers many advantages:
1. Agents That Run Independently of Models
Currently, altering a model or updating a prompt necessitates modifications across every agent, followed by comprehensive quality assurance and a hope that no regressions occur in other workflows. With MIP, the agent merely requests a task, and the executor determines which model(s) and strategies to employ. This allows for prompt improvements, the substitution of more economical models for straightforward cases, or the gradual implementation of improved strategies, all without requiring changes to the client caller.
2. Systemized Optimization of Completion Quality
Presently, every significant AI workload is underpinned by a complex array of prompt engineering, routing rules, retries, and fallback mechanisms for more expensive models. This logic often resides in ad hoc code or is implicitly understood by individuals. MIP centralizes this logic within the executor. By declaring the desired outcome and constraints, the executor is empowered to optimize, retry, escalate, re-prompt, or chain sub-steps as necessary to fulfill the contract, thereby reducing the incidence of late-night production failures caused by upstream model drift.
3. Scaling Agents Without Scaling Operations
As agents and agentic runs increase from single digits to hundreds or thousands, things like latency management, fallback logic, and cost control become substantial concerns. MIP enables systematic approaches to optimizing towards numerical goals at the execution layer, rather than within the agent logic itself. This makes it possible to pursue growth without needing more full-time resources to maintain the system.
4. Prioritization of Cost and Latency as First-Class Constraints
The task contract can incorporate parameters such as "max latency 1200ms" or "use the cheap model unless confidence <0.7," thereby enabling automatic cost-aware routing. Routine cases remain cost-effective, with only complex cases being escalated to more advanced models. This allows for significant savings without the need to rewrite business logic.
5. Embracing Iteration Without Production Disruption
Since the execution is handled at a central place, MIP allows for systematic task versioning and testing. This can mean directing 10% of agentic operations to a new revision while retaining 90% on a previous revision. Both versions are explicit and can operate in parallel, facilitating clean roll-forwards or roll-backs. This avoids "big-bang" model upgrades that surreptitiously alter behavior across the entire system.
6. A Path to an Open System
MIP enables the executor to independently implement observability, audit trails, and transparency into the process of orchestrating reliability. This means that the internal workings of how tasks are handled, from model selection to retry logic, can be made visible and auditable without requiring modifications to the agent or other upstream components. This fosters greater trust, debugging capabilities, and adherence to regulatory or internal compliance requirements.
In essence, MIP aims to keep agents running in production without requiring them to be treated as delicate pets that need constant care.
How MIP works
The core of MIP is an explicit JSON spec for a task's input schema, output schema, and constraints.
- Input schema: what the executor will receive, with strict typing. This can include text, images, audio chunks, tool handles, structured data, etc.
- Output schema: what the executor must return, with strict typing.
- Constraints: performance requirements (latency, cost ceilings, compliance rules, confidence thresholds, etc.).
Here's a simplified sketch of what it could look like:
{
  "task": "summarize_customer_ticket",
  "input_schema": {
    "ticket_text": "string",
    "customer_tier": "enum['gold','silver','bronze']",
    "include_sentiment": "boolean"
  },
  "output_schema": {
    "summary": "string",
    "sentiment": "enum['positive','neutral','negative']",
    "requires_handoff": "boolean",
    "confidence": "number(0-1)"
  },
  "constraints": {
    "max_latency_ms": 1200,
    "compliance_policy": "gdpr"
  }
}
At runtime:
- The agent submits the task spec together with the actual input instance.
- The executor handles the HOW:
  - validates schemas
  - picks model(s)
  - assembles prompts and context
  - maybe chains multiple steps (classify sentiment, then summarize)
  - retries and falls back
  - returns validated output or a structured error
  - optionally logs and evaluates the entire sequence
The agent doesn't need to know how any of that happened. The agent just gets reliable, validated structured output.
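For intuition, a deliberately naive executor loop might look like the Python sketch below. All function and field names are illustrative assumptions; MIP does not prescribe how an executor validates, prompts, retries or escalates, only that the returned output honors the contract.

import time

def call_model(model: str, prompt: str) -> dict:
    # Stand-in for a real LLM call; an actual executor would call a provider SDK here.
    return {"summary": "stub summary", "sentiment": "neutral",
            "requires_handoff": False, "confidence": 0.92}

def execute(task: dict, input_instance: dict) -> dict:
    # 1. Validate the input against the task's input schema (shallow check here).
    missing = [k for k in task["input_schema"] if k not in input_instance]
    if missing:
        return {"status": "error", "code": "invalid_input", "missing_fields": missing}

    deadline = time.monotonic() + task["constraints"]["max_latency_ms"] / 1000
    # 2. Try candidate models, cheapest first, escalating when output doesn't conform.
    for model in ("cheap-model", "capable-model"):
        prompt = f"{task['task']}: {input_instance}"  # prompt shape is an executor detail
        output = call_model(model, prompt)
        # 3. Only return output that satisfies the output schema within the latency budget.
        if all(k in output for k in task["output_schema"]) and time.monotonic() < deadline:
            return {"status": "ok", "output": output, "model": model}

    # 4. If the contract can't be met, return a structured error instead of a guess.
    return {"status": "error", "code": "contract_not_met", "task": task["task"]}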
What is the executor?
In MIP terms, the executor is the runtime that works to fulfill the task contract.
It can be:
- your in-house library
- a hosted runtime (such as an LLM proxy)
- a specialized executor (such as Opper's Task Completion API)
The executor's job is to satisfy the task's WHAT under its constraints; the client's / developer's job is to define the task, the context to send and the output to receive, and to invoke it. This separation is the point of MIP: it allows the execution strategy (prompts, routing, fallbacks, models) to evolve without rewriting any client-side agent code.
Things MIP is intended to allow for
Cost-aware escalation. "Use the cheap local model for summaries. If confidence <0.7, re-run just that case on the expensive model." This allows for reducing spending without sacrificing quality.
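Expressed as constraints, such a rule might look like the sketch below; the key names are assumptions for illustration, since MIP does not fix a constraint vocabulary.

escalation_constraints = {
    "preferred_tier": "cheap",                  # start every call on the inexpensive model
    "escalate_if": {"confidence_below": 0.7},   # re-run only the low-confidence cases
    "escalate_to_tier": "premium",
    "max_cost_usd_per_call": 0.05,
}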
Multimodal pipelines as a single task. "Take an image, OCR it, extract key fields, generate a risk score, return a structured incident report." That is one MIP task. Under the hood it may hit 3+ models. The caller doesn't care.
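As a sketch, the whole pipeline might be declared as one contract like this; the field names are illustrative assumptions.

incident_report_task = {
    "task": "structured_incident_report",
    "input_schema": {"incident_photo": "image", "site_id": "string"},
    "output_schema": {
        "extracted_fields": "object",
        "risk_score": "number(0-1)",
        "report": "string",
    },
    # The executor may chain an OCR model, an extraction model and a scoring model
    # to satisfy this, but the caller only ever sees this single contract.
    "constraints": {"max_latency_ms": 8000},
}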
Model choice optimization. A task can run on whichever model best satisfies its definition and constraints: the best model for time-series analysis if that is the task, or for OCR, or for code generation.
Production-safe iteration. You can improve prompts or routing logic inside the executor daily while keeping the task contract stable. That means your agents don't go stale, but also don't constantly need to be refactored.
Production behavior that's predictable. Instead of "sometimes it's fast, sometimes it's weird," you can actually say: "this task will answer under 1200ms, with a confidence score."
Q&A
Isn't this just structured output + function calling?
Not really. Function calling assumes the model itself decides how to act and which tool to call, and most of the guidance to the model is in the prompt or weights. With MIP you tell an executor the outcome you want and the shape of the result, and the executor decides how to achieve it (which may involve pre-filtering tools, injecting additional context, etc.). It's built for repeatability and optimization, not one-shot inference.
How is this similar or different from Model Context Protocol?
It comes at the problem from the same place, spiritually, as MCP. MCP abstracts APIs or underlying systems into simple Tools. MIP abstracts underlying models into simple Tasks. Both MCP and MIP incorporate the idea of an executor or middleware that handles the translation from agent interaction to system interaction. The goal is the same: to build abstractions and systems that enable greater flow in agent development.
Why not just keep using chat completions?
Because with chat completions, the "what" and the "how" are fused into one giant prompt that you now have to babysit forever. Every prompt edit is effectively a production code change. Every model swap is a fire drill. MIP splits those concerns so you can evolve how tasks are executed without touching every consumer.
Who owns versioning?
Either you or the executor can. You may call task:v3, and later task:v4, from code, or allow the executor to gracefully A/B test new revisions and implement systematic quality processes such as canary releases or blue/green deployments.
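A canary split between two revisions could be as simple as the sketch below; the task identifiers and the 10% split are illustrative assumptions.

def pick_revision(request_id: int) -> str:
    # Send roughly 10% of traffic to the new revision and keep the rest on the
    # previous one; keying on the request id keeps a given request sticky.
    return ("summarize_customer_ticket:v4" if request_id % 10 == 0
            else "summarize_customer_ticket:v3")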
What about latency and cost?
They're part of the contract. You can say "max 1200ms" or "only escalate to premium model if confidence <0.7" or "never exceed X tokens." The executor is responsible for meeting those. That's how you run AI in high volume without blowing up either your SLOs or your invoice.
What happens if the executor can't fully solve it?
You get a structured error back rather than a best-effort guess. This is what moves error handling in behavioral systems towards the reliability and quality we expect in production: errors will happen, and we need to be able to catch them as errors.
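A structured error could look something like this; the field names are illustrative assumptions.

contract_error = {
    "status": "error",
    "code": "contract_not_met",
    "task": "summarize_customer_ticket",
    "detail": "confidence 0.41 stayed below the required 0.7 after 3 attempts",
    "attempts": 3,
}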
Is this just vendor lock-in with a cooler name?
The opposite, actually. By stabilizing the WHAT as a contract, you can keep swapping HOW: new models, new routing strategies, new prompt stacks — even different executors — without rewriting your agents. It's how you stop being dependent on one model API and start behaving like you own your runtime. It is a cool name, isn't it? :)
Call to action
We're looking for teams who are already feeling the pain of scale:
- your agents work, but only because you have a team caring for them and their evals
- changing a prompt feels like deploying a new microservice
- cost, latency, and "does it actually answer?" all fight each other every day
If that sounds familiar, we would love to have a conversation on this spec and how it could be evolved or implemented.
We think MIP is sorely needed as we scale from single-digit agents in production to thousands. To get there we need to be on a path where we can treat agents and models as cattle, not pets that need constant care.