Opper

By Göran Sandahl - 9/15/2024
Using o1-preview and o1-mini with RAG and structured output

In this blog post we do a quick exploratory test of OpenAI's new reasoning models o1-mini and o1-preview on a retrieval use case that combines challenging reasoning with structured output.

Since these models are very new - essentially preview releases - they lack a lot of the features we have come to expect from model APIs. For example, there is no support for structured output and little documentation on how to prompt them properly.

At Opper we have evolved a generic way of interacting with models with structured input and output, and we provide very accessible APIs for indexing and retrieval of custom knowledge. In the example below, we simply plug o1-mini and o1-preview into an existing RAG pipeline (very similar to what is described in our earlier blog post on Simple RAG with Mistral). We will show that the responses are very good, that they require no modifications to prompts, and that the models follow instructions and utilize context very well. One difference from the Mistral blog post is that we plug in a lot more context for the model to reason over (north of 25k words).

We decided to use a SWOT analysis of the Reddit S1 filing as a use case since it is a typical high-reasoning task that can be challenging. It requires utilizing a lot of context data and reasoning over that data to form a comprehensive analysis that is to the point and well structured.

The task we will try to complete is: Provide a data-driven SWOT analysis of Reddit with emphasis on impact from AI using the Reddit S1 filing PDF as source knowledge.

The pipeline is built with the following steps (available in full in our Python cookbook):

Index the Reddit S1 filing as a PDF using the Opper SDK.
To retrieve large amounts of relevant context, generate relevant "sub-questions" with o1-mini and retrieve segments using the Opper SDK.
Use o1-preview to reason through the retrieved knowledge and provide a SWOT analysis, using structured input and output. We also ask for clear citations showing where data is cited from, highlighting page number and document name.

We will be using structured input and output in all parts of the pipeline, something that isn't expressly supported with these models.

The Result:

Strengths:

Reddit's massive corpus of conversational data is foundational to current AI technology and many LLMs, making it valuable for model training [1].
Reddit is investing in AI to enhance the user experience, making it more personalized and safer, and to improve search capabilities, which is expected to increase user engagement and retention [2].
AI is expected to improve Reddit's ability to localize content and moderate content as they expand internationally [2].

Weaknesses:

New AI applications require additional investment, increasing costs and complexity, which may impact gross margin [3].
Market acceptance of AI technologies is uncertain; Reddit may be unsuccessful in its product development efforts [3].
Reddit may face competition from LLMs; users might choose to use AI models instead of visiting Reddit directly [4].

Opportunities:

Emerging opportunity in data licensing given the value of Reddit's data in sentiment analysis and trend identification [1].
Reddit can harness AI to improve content recommendations, driving user growth and retention [2].

Threats:

AI is subject to evolving regulatory scrutiny; Reddit may need to adjust offerings as legal frameworks develop [3].
Potential misuse of Reddit data by third parties could harm Reddit's business and reputation [3].

[1] "We believe that Reddit will be core to the capabilities of organizations that use data as well as the next generation of generative AI and LLM platforms." from reddit-sec.pdf page 17
[2] "We are investing in ways to harness AI to make the user experience more personalized and safer and to improve our search capabilities, which we expect will increase user engagement and retention." from reddit-sec.pdf page 136
[3] "Developing, testing and deploying these technologies may also increase the cost profile of our offerings due to the nature of the computing costs involved in such initiatives. Moreover, market acceptance of AI technologies is uncertain, and we may be unsuccessful in our service or product development efforts." from reddit-sec.pdf page 66
[4] "In addition, we face competition from large language models ("LLMs"), such as ChatGPT, Gemini, and Anthropic; Redditors may choose to find information using LLMs, which in some cases may have been trained using Reddit data, instead of visiting Reddit directly." from reddit-sec.pdf page 43

Here are a few selected segments of the implementation:

Using o1-mini to drive context retrieval

We used o1-mini to generate subquestions that drive the context retrieval. We used the opper.call API to call the model. Note that we ask the model to return List[str], giving us a structured set of subquestions to iterate over for retrieval.

question = "Provide a data-driven SWOT analysis of Reddit with emphasis on impact from AI"

subquestions, _ = opper.call(
    name="generate_subquestions",
    instructions="Given that you can query Reddit's S1 filing to answer the question, generate a list of subquestions that you would want the answer to in order to answer the main question. Only return the subquestions, not the question.",
    input=question,
    output_type=List[str],
    model="openai/o1-mini"
)

knowledge = []
for subquestion in subquestions:
    print(subquestion)
    result = index.query(
        query=subquestion,
        k=1
    )
    knowledge.append(result)

# How does Reddit's revenue model currently perform, and what are its primary sources of income?
# What weaknesses does Reddit face in its platform infrastructure and user experience?
# In what ways is Reddit integrating AI to enhance content moderation and user interactions?
# How is Reddit addressing ethical considerations related to AI, such as bias and transparency in algorithms?
# What are the projected financial implications of AI integration on Reddit's operational costs and revenue growth?
# What opportunities does AI present for Reddit to innovate its services or expand its user base?

Using o1-preview for the SWOT

We built a response object that includes a thought process, an answer and a list of citations, and gave that to o1-preview to complete. Knowledge in this case was roughly 25k words. This call took around 90 seconds to complete, which is around 5-10 times longer than with other models. Note that we have a slightly more complex output structure in this call, with a Response type that contains a list of Citation types.

class Citation(BaseModel):
    file_name: str
    page_number: int
    citation: str

class Response(BaseModel):
    thoughts: str
    answer: str
    citations: List[Citation]

response, _ = opper.call(
    name="o1/respond",
    model="openai/o1-preview",
    instructions="Produce an answer to the question using knowledge. Refer to any facts with [1], [2] etc.",
    input={
        "question": question,
        "knowledge": knowledge
    },
    output_type=Response
)

print(response.answer)

The response was printed in full above, so we will leave it out here.

As for the response, I find it to be to the point, with correct and relevant citations. It is interesting how plug and play these models were with our existing pipeline. Structured output and RAG worked out of the box with no modifications or adaptations to the pipeline. I believe these models may become very useful for integrating into our AI pipelines, especially for high-reasoning tasks.

Takeaways

In this blog post we showed how to utilize the new reasoning models o1-mini and o1-preview from OpenAI to answer a question using knowledge retrieval and structured output in a plug and play manner. We used the Opper indexing API to store and query the PDF and then used Opper's opper.call API to call the models.

We think these new models offer an exciting new addition to the toolbox for building capable AI features. We look forward to exploring them in greater depth :)
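Because the answer refers to facts with [1], [2] and so on, and the citations come back as a structured list, it is easy to check that every marker has a matching citation before showing the result. The following is only a minimal sketch of that idea: it uses a plain dataclass with the same three fields as the Citation type above, and the helper names and sample data are hypothetical and not part of the Opper SDK.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    # Same fields as the Citation model used in the post
    file_name: str
    page_number: int
    citation: str


def render_citations(citations: List[Citation]) -> str:
    """Format citations as numbered lines: [n] "quote" from file page N."""
    return "\n".join(
        f'[{i}] "{c.citation}" from {c.file_name} page {c.page_number}'
        for i, c in enumerate(citations, start=1)
    )


def unmatched_markers(answer: str, citations: List[Citation]) -> List[int]:
    """Return the [n] markers in the answer that have no citation entry."""
    used = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in used if n < 1 or n > len(citations))


# Hypothetical sample data for illustration only
citations = [
    Citation("example.pdf", 1, "First quoted sentence."),
    Citation("example.pdf", 2, "Second quoted sentence."),
]
answer = "A claim backed by the filing [1]. A claim with a missing source [3]."

print(render_citations(citations))
print(unmatched_markers(answer, citations))  # [3]
```

A check like this only confirms that the numbering is consistent; it does not verify that the quoted text actually appears on the cited page.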

Models

Added fireworks/deepseek-v3-0324 model to the platform

Platform

Introduced a new pricing model with more granular service tracking, providing greater transparency and cost control for your AI operations

OpenAI GPT Image 1 Model: New Image Generation Capability

We've added OpenAI's new gpt-image-1 model to our platform, expanding our image generation capabilities. This addition gives you access to OpenAI's latest image generation technology:

openai/gpt-image-1

Embeddings Support in Node.js SDK

We've added embeddings functionality to the Node.js SDK, enabling you to generate vector embeddings for your text content. This new capability allows you to perform semantic search, content clustering, and other vector-based operations directly through our Node.js interface.

Evaluations Support in Python and Node.js SDKs

We've added support for creating evaluations directly from the Node.js SDK. This new capability allows you to create and manage evaluations programmatically.

Embeddings Support in Python SDK

We've added embeddings functionality to the Python SDK, enabling you to generate vector embeddings for your text content. This new capability allows you to perform semantic search, content clustering, and other vector-based operations directly through our Python interface.

Gemini 2.5 Flash Model

We've added Google's Gemini 2.5 Flash exp model to our platform:

gcp/gemini-2.5-flash

Claude 3.7 Sonnet on AWS

We've added the Claude 3.7 Sonnet model to our AWS provider:

aws/claude-3.7-sonnet-eu

OpenAI o3 and o4-mini models

Added openai/o3
Added openai/o4-mini

OpenAI GPT-4.1 Models: New AI Options

We've added OpenAI's latest GPT-4.1 models to our platform, giving you access to their newest and most capable AI models. These additions expand your options for powerful, state-of-the-art AI capabilities:

Added openai/gpt-4.1
Added openai/gpt-4.1-mini
Added openai/gpt-4.1-nano

New Grok 3 Models Available

Added xai/grok-3
Added xai/grok-3-mini-beta

Updated Mistral Models on Azure

We've replaced the retired Mistral model with the latest version available on Azure. This update ensures continued access to Mistral's powerful language capabilities with improved performance:

azure/mistral-large-eu is using the latest Mistral Large model (2411)
azure/mistral-large-2407-eu has been removed

PDF Media Type: Node.js SDK

We've added PDF media type support to the Node.js SDK (v2.7.0), enabling you to work with PDF documents in your applications. This enhancement expands the range of file types you can process using our SDK and simplifies PDF document handling in your Node.js projects.

Llama 4 Scout: New Model on Groq

We've added Meta's Llama 4 Scout model to our Groq integration, giving you access to this powerful new instruction-tuned model. Llama 4 Scout provides excellent performance while maintaining efficiency, expanding your options for AI-powered applications.

groq/llama-4-scout-17b-16e-instruct

Llama 4 Maverick: New Model on Groq

We've added Meta's Llama 4 Maverick model to our Groq integration, giving you access to this powerful new instruction-tuned model. Llama 4 Maverick features an impressive 131,072 token context window and 8,192 max completion tokens, allowing you to process much larger documents and conversations in a single request.

groq/llama-4-maverick-17b-128e-instruct

Gemini 2.5 Pro: Experimental Version

We have updated the Gemini 2.5 Pro model to an experimental version, providing our customers with access to the latest advancements in AI technology.

gcp/gemini-2.5-pro-exp-03-25

Cursor Rules: AI-Powered Code Assistance

We have released markdown files for AI code editors like Cursor that provide context for using the Opper SDK. These files serve as comprehensive guides for AI tools to understand how to interact with Opper for structured calls, indexing operations, tracing, and evaluations.

Available for both Python and TypeScript
Place in your project as .cursor/rules/opper.mdc
Enhances AI coding assistance with Opper-specific knowledge
Learn more at https://docs.opper.ai/sdks/llmtxt

OpenAI GPT-4.5 preview

We have added support for the new GPT-4.5 model.

openai/gpt-4.5-preview

Claude 3.7 Sonnet

We have added support for the new Sonnet model.

anthropic/claude-3.7-sonnet
anthropic/claude-3.7-sonnet-20250219

Thinking

In order to use the new Thinking mode in Claude 3.7, you can do something like this:

import asyncio
import os
from opperai import AsyncOpper
from opperai.types import CallConfiguration

opper = AsyncOpper()

async def main():
    result, _ = await opper.call(
        name="respond",
        model="anthropic/claude-3.7-sonnet",
        input="What is the capital of Sweden?",
        configuration=CallConfiguration(
            model_parameters={
                "thinking": {
                    "type": "enabled",
                    "budget_tokens": 1024,
                },
            }
        ),
    )

    print(result)

asyncio.run(main())

Embeddings API

The API now supports getting embeddings for arbitrary input. While our indexes are the most straightforward way of using external knowledge for RAG and similar use cases, this gives advanced users greater control over embeddings for custom use cases.

Example: Input as string

curl -X POST "https://api.opper.ai/v1/embeddings" \
  -H "x-opper-api-key: op-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure/text-embedding-3-large",
    "input": "The text"
  }'

Input as list of strings

curl -X POST "https://api.opper.ai/v1/embeddings" \
  -H "x-opper-api-key: op-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure/text-embedding-3-large",
    "input": ["First text", "Second text", "Third text"]
  }'

Available embedding models

azure/text-embedding-ada-002
azure/text-embedding-3-large
azure/text-embedding-3-large-1536
openai/text-embedding-ada-002
openai/text-embedding-3-large
openai/text-embedding-3-small
opper/e5-mistral-7b-instruct
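Once you have vectors for your texts, semantic search usually comes down to ranking documents by cosine similarity against a query vector. The sketch below illustrates that step only. The three-dimensional vectors are made-up toy values rather than real model output, and the helper names are hypothetical; the response format of the embeddings endpoint is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings
docs = [
    ("First text", [1.0, 0.0, 0.0]),
    ("Second text", [0.0, 1.0, 0.0]),
    ("Third text", [0.7, 0.7, 0.0]),
]
query = [1.0, 0.1, 0.0]

ranking = rank(query, docs)
print([text for text, _ in ranking])  # ['First text', 'Third text', 'Second text']
```

Real embeddings have hundreds or thousands of dimensions, but the ranking logic stays the same.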

Gemini 2.0: Flash-Lite

We have added support for Gemini 2.0 Flash-Lite, hosted by Google Cloud Platform (US). You can access it in Opper using the model name:

gcp/gemini-2.0-flash-lite-preview-02-05

OpenAI API/SDKs Compatibility Layer

We have added an OpenAI compatibility layer that allows you to use Opper models with the OpenAI API and SDKs. This gives you the ability to use any model provided by Opper in any project that uses the OpenAI API/SDKs. The compatibility layer supports additional Opper functionality through extra body arguments:

fallback_models: A list of models to use if the primary model is not available
tags: A dictionary of tags to add to the request
span_uuid: The UUID of the span to add to the request
evaluate: Whether to evaluate the generation or not

Python example using these features:

import os
from openai import OpenAI
from opperai import Opper

opper = Opper()

client = OpenAI(
    base_url="https://api.opper.ai/compat/openai",
    api_key="-", # must not be blank
    default_headers={"x-opper-api-key": os.getenv("OPPER_API_KEY")},
)

with opper.spans.start("reverse-name") as span:
    response = client.chat.completions.create(
        model="gorq/deepseek-r1-distill-llama-70", # This model is not available since provider is called "gorq" and not "groq"
        messages=[
            {"role": "user", "content": "What is the capital of France? Please reverse the name before answering."}
        ],
        extra_body={
            "fallback_models": [
                "groq/deepseek-r1-distill-llama-70b",
            ],
            "tags": {
                "user_id": "123",
            },
            "span_uuid": str(span.uuid),
            "evaluate": False,
        }
    )

Node example using these features:

import { OpenAI } from "openai";
import OpperAI from "opperai";

const opper = new OpperAI();

const client = new OpenAI({
    baseURL: "https://api.opper.ai/compat/openai",
    apiKey: "OPPER_API_KEY",
    defaultHeaders: { "x-opper-api-key": "OPPER_API_KEY" },
});

async function main() {
    const trace = await opper.traces.start({
        name: "node-sdk/using-the-openai-sdk",
        input: "What is the capital of France? Please reverse the name before answering.",
    });

    const completion = await client.chat.completions.create({
        model: "openai/gpt-4o-mini",
        messages: [
            {
                role: "user",
                content: "What is the capital of France? Please reverse the name before answering.",
            },
        ],

        // @ts-expect-error These are Opper specific params.
        // fallback_models: ["openai/gpt-4o-mini"],
        span_uuid: trace.uuid.toString(),
        // evaluate: false,
    });

    await trace.end({ output: { foo: completion.choices[0].message.content } });
}

main();

Gemini 2.0: Flash

We have added support for Gemini 2.0 Flash, hosted by Google Cloud Platform (US). You can access it in Opper using the model name:

gcp/gemini-2.0-flash

Billing enabled

You can now add your credit card in the Opper platform to enjoy unlimited usage. The free tier continues to exist, but has a limited usage allowance per month for experimentation and testing.

Gemini 2.0 Flash Thinking

The new experiment from Google called Gemini 2.0 Flash Thinking is now available to test in Opper.

gcp/gemini-2.0-flash-thinking-exp

Deepseek R1

We have added support for Deepseek R1. You can access it in Opper using this model name:

fireworks/deepseek-r1

Deepseek v3

We have added support for Deepseek v3, hosted by Fireworks AI (US). You can access it in Opper using this model name:

fireworks/deepseek-v3

New models

We have added support for the following new models.

gcp/gemini-2.0-flash-exp
groq/llama-3.3-70b-versatile

OpperCLI now supports showing usage information

The OpperCLI now supports showing usage information for your account. This can be used to get an overview of your usage, and optionally grouped by your custom call tags.

The basic usage showing total_tokens looks like this:

➜  opper usage list --fields=total_tokens
Usage Events:


By Göran Sandahl - 2/17/2025
New OpenAI-compatible endpoint: Use Opper with OpenAI SDKs and frameworks

We're excited to announce that Opper now provides an OpenAI-compatible API endpoint, making it easier than ever to access many models and capabilities through a single API. This compatibility layer allows you to use Opper with any tool or library designed for OpenAI's API or SDKs (such as LangChain, Vercel AI SDK, DSPy, etc).

Introducing the OpenAI-compatible API endpoint

Our new compatibility endpoint (https://api.opper.ai/compat/openai) allows you to use Opper API as a drop-in replacement for OpenAI's API. This means you can leverage Opper's capabilities (50+ models, fallbacks, trace logging, load balancing and centralized billing) while maintaining your existing codebase and tooling. Whether you're using the OpenAI SDK directly or working with higher-level frameworks like LangChain, you can now easily get the benefits of Opper.

Here's a simple example using Deepseek-R1 from provider Groq through our OpenAI-compatible endpoint using the OpenAI SDK:

TypeScript

import { OpenAI } from "openai";

const client = new OpenAI({
    baseURL: "https://api.opper.ai/compat/openai", // Opper API
    apiKey: "-", // must not be blank
    defaultHeaders: { "x-opper-api-key": "OPPER_API_KEY" },
});

const completion = await client.chat.completions.create({
    model: "groq/deepseek-r1-distill-llama-70b", // Opper Model
    messages: [
        {
            role: "user",
            content: "What is the capital of France? Please reverse the name before answering.",
        },
    ],
});

console.log(completion.choices[0].message.content);

This yields:


Okay, so I need to figure out the capital of France and then reverse the name. Hmm, I know that the capital of France is Paris. Wait, is that right? I'm pretty sure it's Paris, but maybe I should double-check. Let me think... Yes, Paris is definitely the capital. Now, I need to reverse the name. So, Paris spelled backwards would be... let's see, P-A-R-I-S. Reversing the letters would be S-I-R-A-P. So, the reversed name is Sirap. That seems correct. I don't think I made any mistakes there. Paris is the capital, and reversing it gives Sirap. Yeah, that makes sense.


The capital of France is Paris. When reversed, the name becomes Sirap. 

Answer: Sirap

Use with popular frameworks

Since this is fundamentally a REST endpoint, you can use it with any framework that uses OpenAI's Python or TypeScript SDKs. Here are some examples:

LangChain

LangChain is a popular framework for experimenting with LLMs.

from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
import os

# Initialize the LLM
llm = ChatOpenAI(
    base_url="https://api.opper.ai/compat/openai",
    api_key="-", # must not be blank
    default_headers={"x-opper-api-key": os.environ.get("OPPER_API_KEY")},
    model_name="openai/o3-mini",
)

# Create a prompt template
prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer this question: {question}"
)

# Create and run the chain using pipe syntax
chain = prompt | llm

# Run the chain
response = chain.invoke({"question": "What is the capital of Sweden?"})
print(response.content)

# Output: The capital of Sweden is Stockholm.

Vercel AI SDK

The Vercel AI SDK provides a set of utilities for building AI-powered streaming text and chat UIs. Here's how to integrate Opper:

import { createOpenAICompatible } from '@ai-sdk/openai-compatible'
import { streamText, convertToCoreMessages } from 'ai'

export const maxDuration = 30

export async function POST(req: Request) {
  const opper = createOpenAICompatible({
    name: 'opper',
    baseURL: 'https://api.opper.ai/compat/openai', // Opper API
    apiKey: '-', // required but not used
    headers: {
      'x-opper-api-key': process.env.OPPER_API_KEY, // Opper API key
    },
  })

  const { messages } = await req.json()

  const result = streamText({
    model: opper('groq/deepseek-r1-distill-llama-70b'), // Opper Model

    messages: convertToCoreMessages(messages),
  })

  return result.toDataStreamResponse()
}

When to use Opper Structured API vs OpenAI-compatible API?

The Opper Structured API via call() is the recommended choice for a consistent and predictable developer experience. It lets you focus on writing structured, maintainable, clean AI code with significantly less reliance on model-specific prompting.

The OpenAI-compatible API is a great way to get started with Opper if you want to manage your prompts fully, but still want to get some of the main benefits of a Unified API, such as:

Go beyond just OpenAI models: Access 50+ models through a single API - switch between models with a single parameter change to find what works best for your use case
Debug and Improve Model Behavior: Built-in logging, tracing, and evaluation tools help you understand model behavior and systematically improve output quality
Manage Rate Limits: Load balancing, fallback across models and providers, and monitoring give you the control and reliability needed for production deployments
Manage API keys across models: Manage API keys on a project level, and allow developers to use this across all models and providers.
Reduce Model Dependencies: As new models emerge, instantly access them through the same API - no need to rewrite code or manage multiple integrations
Cost Control: Unified billing and usage tracking across all models, with a path to optimizing costs with smaller models.

Getting Started

To start using the OpenAI-compatible endpoint:

Sign up for an Opper API key at opper.ai - it's free up to $10/month.
Replace the base URL in your OpenAI client with https://api.opper.ai/compat/openai
Add your Opper API key in the headers as shown in the examples above
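For readers who are not using an SDK, the steps above amount to sending a standard chat completions request to the Opper base URL with the x-opper-api-key header set. The sketch below only builds such a request with the Python standard library and does not send it. It assumes the usual /chat/completions path that OpenAI SDKs append to the base URL; whether the endpoint also expects the Authorization header that the SDKs add is not stated here, and build_chat_request is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.opper.ai/compat/openai"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for the compat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # path assumed from OpenAI SDK conventions
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-opper-api-key": api_key,
        },
        method="POST",
    )


req = build_chat_request(
    "op-your-key-here",
    "groq/deepseek-r1-distill-llama-70b",
    "What is the capital of France?",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; in practice the OpenAI SDK examples above are the simpler route.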
What's Next?

For more information on using Tracing, Fallbacks and other features in connection to the OpenAI-compatible API, see our OpenAI-compatible API documentation.

While we recommend using our structured API calls for the best experience, we believe the compatibility API can be a great way for many to benefit from a model-independent inference layer for better resilience, control and quality.

For detailed documentation and more examples, visit our OpenAI-compatible API documentation. If you have any questions or need support, join our Discord community.




Time Bucket: 2024-12-03T00:00:00Z
Cost: 0.029731
Count: 25
total_tokens: 4806

Time Bucket: 2024-12-04T00:00:00Z
Cost: 0.025908
Count: 13
total_tokens: 4155

Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.017290
Count: 7
total_tokens: 2689

More usage information can be found by running the command:

➜  opper usage                           
Manage usage information

Usage:
  opper usage [command]

Examples:
  # List usage information
  opper usage list

  # List usage with time range and granularity
  opper usage list --from-date=2024-01-01T00:00:00Z --to-date=2024-12-31T23:59:59Z --granularity=day

  # List usage with specific fields and grouping
  opper usage list --fields=completion_tokens,total_tokens --group-by=model,project.name

  # Show count over time as ASCII graph (default)
  opper usage list --graph

  # Show cost over time as ASCII graph
  opper usage list --graph=cost

  # Show count over time by model
  opper usage list --group-by model --graph

  # Export usage as CSV
  opper usage list --out csv
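The --from-date and --to-date flags in the help text take UTC timestamps in the form 2024-01-01T00:00:00Z. When scripting reports, it can be handy to compute a rolling window instead of typing dates by hand. The following is a small sketch of that, using only the flags shown above; usage_command is a hypothetical helper, and "day" is the only granularity value that appears in the help text.

```python
from datetime import datetime, timedelta, timezone
from typing import List

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def usage_command(now: datetime, days: int, granularity: str = "day") -> List[str]:
    """Build the argument list for `opper usage list` covering the last `days` days."""
    start = now - timedelta(days=days)
    return [
        "opper",
        "usage",
        "list",
        f"--from-date={start.strftime(TIMESTAMP_FORMAT)}",
        f"--to-date={now.strftime(TIMESTAMP_FORMAT)}",
        f"--granularity={granularity}",
    ]


# A fixed "now" keeps the example reproducible; use datetime.now(timezone.utc) in practice
now = datetime(2024, 12, 7, tzinfo=timezone.utc)
print(" ".join(usage_command(now, days=7)))
```

The returned list can be passed directly to subprocess.run if the Opper CLI is installed.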

Tracking calls using a customer tag looks like this. First include the customer tag in the call:

opper.call(
    name="my-function",
    input="Hello, world!",
    tags={"customer": "mycustomer"},
)

Then run the opper usage list --group-by=customer command to see the usage information grouped by the customer tag.

➜  opper usage list --fields=total_tokens --group-by=customer 
Usage Events:

Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.025908
Count: 13
customer: <nil>
total_tokens: 4155

Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.000007
Count: 1
customer: mycustomer
total_tokens: 23
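If you want to post-process the text output shown above, for example to total tokens per customer, the key: value blocks are simple to parse. The sketch below works on a copy of the sample output; parse_usage and tokens_by are hypothetical helpers, and the --out csv option listed in the help text is probably a sturdier choice for real automation.

```python
from collections import defaultdict
from typing import Dict, List

SAMPLE = """\
Usage Events:

Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.025908
Count: 13
customer: <nil>
total_tokens: 4155

Time Bucket: 2024-12-06T00:00:00Z
Cost: 0.000007
Count: 1
customer: mycustomer
total_tokens: 23
"""


def parse_usage(text: str) -> List[Dict[str, str]]:
    """Split the output into one dict per blank-line-separated event block."""
    events: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if current:
                events.append(current)
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep:  # lines without "key: value" (such as the header) are skipped
            current[key] = value
    if current:
        events.append(current)
    return events


def tokens_by(events: List[Dict[str, str]], field: str) -> Dict[str, int]:
    """Sum total_tokens per value of the given grouping field."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        totals[event.get(field, "<nil>")] += int(event["total_tokens"])
    return dict(totals)


events = parse_usage(SAMPLE)
print(tokens_by(events, "customer"))  # {'<nil>': 4155, 'mycustomer': 23}
```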

New feature: Run evaluations on alternative models and prompts

Opper now supports running ad hoc evaluations with different models, instructions and function configurations. It works by running through a function's dataset entries and evaluating the results. This lets you test how a function performs with its current configuration or an alternative one.

Evaluating a function with different models to find the best one

See our documentation on Offline Evals for more information.
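Conceptually, an offline evaluation runs each candidate configuration over the same dataset entries and compares the scores. The sketch below illustrates that loop in plain Python and does not use the Opper API: the two candidate functions are stand-ins for a function running under different models or instructions, the dataset is made up, and exact match against an expected value is just one possible scoring rule.

```python
from typing import Callable, Dict, List

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical dataset entries with an input and an expected output
dataset = [
    {"input": "Today is Tuesday, yesterday was Monday", "expected": "Tuesday"},
    {"input": "Monday is the start of the week", "expected": "Monday"},
]


def first_mentioned(text: str) -> str:
    """Stand-in for configuration A: the weekday that appears earliest in the text."""
    found = [(text.find(day), day) for day in WEEKDAYS if day in text]
    return min(found)[1] if found else ""


def first_in_calendar_order(text: str) -> str:
    """Stand-in for configuration B: the first weekday in calendar order that appears."""
    return next((day for day in WEEKDAYS if day in text), "")


def evaluate(
    candidates: Dict[str, Callable[[str], str]],
    entries: List[Dict[str, str]],
) -> Dict[str, float]:
    """Return the share of exact matches per candidate."""
    scores = {}
    for name, fn in candidates.items():
        hits = sum(1 for entry in entries if fn(entry["input"]) == entry["expected"])
        scores[name] = hits / len(entries)
    return scores


scores = evaluate(
    {"config-a": first_mentioned, "config-b": first_in_calendar_order},
    dataset,
)
best = max(scores, key=scores.get)
print(scores, best)  # {'config-a': 1.0, 'config-b': 0.5} config-a
```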

Updates to managing datasets

We have improved handling of datasets to help make it easier to populate them:

Dataset entries now include an expected field that is used in evaluations and in few-shot configuration.
Dataset entries can be populated from any trace, by uploading a JSON file or through the SDKs.

Adding an entry to a dataset from a trace

See our documentation on Datasets for more information.

Added llms.txt to https://opper.ai

We added an llms.txt file to https://opper.ai to help AI code editors like Cursor find relevant documentation about Opper. See https://llmstxt.org/ for more information.

New models

We have added support for the following new models:

gcp/gemini-exp-1114
gcp/gemini-exp-1121
mistral/pixtral-large-latest-eu
xai/grok-beta
xai/grok-vision-beta

Support for custom models

There is now support for custom models in Opper. This means that you can bring your own key to an existing model or add a completely custom model.

The easiest way to add a model is to use the Opper CLI. The README explains how to add a model, but here is an example of adding your own Azure deployment:

opper models create example/my-gpt4 azure/gpt4-production my-api-key-here '{"api_base": "https://my-gpt4-deployment.openai.azure.com/", "api_version": "2024-06-01"}'

This adds your custom deployment at my-gpt4-deployment.openai.azure.com with the model name gpt4-production, using the my-api-key-here API key. The model is then accessible in Opper under the name example/my-gpt4.
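The last argument of the command is a JSON object, which is easy to get wrong when quoting by hand in a shell. A small sketch of generating the command line from Python follows; it reuses the argument order and values from the example above, and models_create_command is a hypothetical helper rather than part of the Opper CLI.

```python
import json
import shlex
from typing import Dict


def models_create_command(name: str, identifier: str, api_key: str, extra: Dict[str, str]) -> str:
    """Build a shell-safe `opper models create` command with JSON-encoded extra settings."""
    args = ["opper", "models", "create", name, identifier, api_key, json.dumps(extra)]
    return " ".join(shlex.quote(arg) for arg in args)


cmd = models_create_command(
    "example/my-gpt4",
    "azure/gpt4-production",
    "my-api-key-here",
    {
        "api_base": "https://my-gpt4-deployment.openai.azure.com/",
        "api_version": "2024-06-01",
    },
)
print(cmd)
```

shlex.quote wraps the JSON argument in single quotes, so the shell passes it through as one argument.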

Support for fallback models

The Opper API now supports providing a list of fallback models in addition to the main model used in a call. They will be tried in order until a model returns successfully.

Python sync example

from opperai import Opper
opper = Opper()
response, _ = opper.call(
    name="GetFirstWeekday",
    input="Today is Tuesday, yesterday was Monday",
    instructions="Extract the first weekday mentioned in the text",
    model="azure/gpt-4o-eu",
    fallback_models=["openai/gpt-4o"],
)
print(response)

Python async example

from opperai import AsyncOpper
import asyncio
opper = AsyncOpper()
async def main():
    response, _ = await opper.call(
        name="GetFirstWeekday",
        input="Today is Tuesday, yesterday was Monday",
        instructions="Extract the first weekday mentioned in the text",
        model="azure/gpt-4o-eu",
        fallback_models=["openai/gpt-4o"],
    )
    print(response)
if __name__ == "__main__":
    asyncio.run(main())

Node example

import OpperAI from 'opperai';

async function testCallFallback() {
    // Replace 'your-api-key' with your actual OpperAI API key
    const client = new OpperAI({ apiKey: 'your-api-key' });

    const { message, span_id } = await client.call({
        name: "GetFirstWeekday",
        input: "Today is Tuesday, yesterday was Monday",
        instructions: "Extract the first weekday mentioned in the text",
        model: "azure/gpt-4o-eu",
        fallback_models: ["openai/gpt-4o"],
    });

    console.log(message);
}

testCallFallback();
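To make the "tried in order until a model returns successfully" behaviour concrete, here is a client-side sketch of the same idea. It is only an illustration of the semantics: Opper performs the fallback on the platform side, and the ModelUnavailable exception, call_with_fallbacks and fake_invoke are hypothetical names used to simulate an outage.

```python
from typing import Callable, List, Tuple


class ModelUnavailable(Exception):
    """Raised by the stand-in invoke function when a model cannot answer."""


def call_with_fallbacks(
    invoke: Callable[[str], str],
    model: str,
    fallback_models: List[str],
) -> Tuple[str, str]:
    """Try the main model first, then each fallback in order; return (model_used, response)."""
    errors = []
    for candidate in [model, *fallback_models]:
        try:
            return candidate, invoke(candidate)
        except ModelUnavailable as exc:
            errors.append(f"{candidate}: {exc}")
    raise ModelUnavailable("; ".join(errors))


def fake_invoke(model: str) -> str:
    # Simulate the primary model being down
    if model == "azure/gpt-4o-eu":
        raise ModelUnavailable("simulated outage")
    return "Tuesday"


used, response = call_with_fallbacks(fake_invoke, "azure/gpt-4o-eu", ["openai/gpt-4o"])
print(used, response)  # openai/gpt-4o Tuesday
```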

Added support for Anthropic Claude 3.5 Haiku

import asyncio
from opperai import AsyncOpper

async def haiku():
    aopper = AsyncOpper()
    res, _ = await aopper.call(
        model="anthropic/claude-3.5-haiku",
        name="new-haiku-3-5",
        instructions="answer the following question",
        input="what are some uses of 42",
    )
    print(res)


asyncio.run(haiku())

Enhanced Sidebar for Project Navigation

With our new sidebar update, users can now effortlessly select their desired projects directly from the side panel. This improved navigation persists across indexes, traces, and functions, ensuring a seamless workflow experience.

Added Metrics Filtering

We've upgraded our metrics display within trace spans. Users can now apply filters to better manage the metrics they need to focus on. These enhancements provide a clearer, more accessible presentation of data within the trace table.

Streaming support for `call()`

It is now possible to stream the response from the call() method.

import asyncio
from opperai import AsyncOpper

async def stream():
    aopper = AsyncOpper()
    res = await aopper.call(
        model="anthropic/claude-3.5-sonnet",
        input="what are some uses of 42",
        stream=True,
    )
    async for chunk in res.deltas:
        print(chunk)

asyncio.run(stream())

For the Node SDK, see the examples.
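A common pattern when streaming is to show each chunk as it arrives while also keeping the full text for later use. The sketch below shows that pattern with a fake async generator in place of res.deltas, so it runs without network access. It assumes the chunks are plain strings, which is not stated above, and fake_deltas and collect are hypothetical helpers.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_deltas(chunks: List[str]) -> AsyncIterator[str]:
    """Stand-in for a streamed response: yields chunks one at a time."""
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield chunk


async def collect(deltas: AsyncIterator[str]) -> str:
    """Print chunks as they arrive and return the assembled text."""
    parts: List[str] = []
    async for chunk in deltas:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


full_text = asyncio.run(collect(fake_deltas(["42 is ", "the answer ", "to everything."])))
```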

Added support for updated version of Anthropic Claude 3.5 Sonnet

import asyncio
from opperai import AsyncOpper

async def sonnet():
    aopper = AsyncOpper()
    res, _ = await aopper.call(
        model="anthropic/claude-3.5-sonnet-20241022",
        name="new-sonnet-3-5",
        instructions="answer the following question",
        input="what are some uses of 42",
    )
    print(res)


asyncio.run(sonnet())

The anthropic/claude-3.5-sonnet model now defaults to the updated version.

Updated default model

If you do not explicitly provide a model in your call(), it will now default to the azure/gpt-4o-eu model.

Added support for Imagen 3 in the Python and Node SDKs

Opper now supports two image generation models, azure/dall-e-3-eu and gcp/imagen-3.0-generate-001-eu. Here is an example of generating an image from a description in Python:

def generate_image(description: str) -> ImageOutput:
    image, _ = opper.call(
        name="generate_image",
        output_type=ImageOutput,
        input=description,
        model="gcp/imagen-3.0-generate-001-eu",
        configuration=CallConfiguration(
            model_parameters={
                "aspectRatio": "9:16",
            }
        ),
    ) 
    return image


description = "portrait of a person standing in front of a park. vibrant, autumn colors"

path = save_file(generate_image(description).bytes)
print(path)

Here is a similar example in TypeScript:

async function testImageGeneration() {
    const image = await client.generateImage({
        model: "gcp/imagen-3.0-generate-001-eu",
        prompt: "portrait of a person standing in front of a park. vibrant, autumn colors",
        configuration: {
            model_parameters: {
                aspectRatio: "9:16",
            }
        }
    });

    const tempFilePath = path.join(os.tmpdir(), "image.png");
    fs.writeFileSync(tempFilePath, image.bytes);
    console.log(`image written to temporary file: ${tempFilePath}`);
}

testImageGeneration();

Model parameters vary between models, but here are the supported ones for each model:

azure/dall-e-3-eu:

style: natural, vivid
quality: standard, hd
size: 1024x1024, 1792x1024, 1024x1792

gcp/imagen-3.0-generate-001-eu:

aspectRatio: 1:1, 3:4, 4:3, 16:9, 9:16
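
Because the accepted parameters differ per model, it can help to check them on the client before making a call. The following is a minimal sketch and not part of the Opper SDK: the SUPPORTED_PARAMETERS table only restates the values listed above, and validate_model_parameters is a hypothetical helper name.

```python
from typing import Dict, List

# Supported values per model, copied from the list above.
SUPPORTED_PARAMETERS: Dict[str, Dict[str, List[str]]] = {
    "azure/dall-e-3-eu": {
        "style": ["natural", "vivid"],
        "quality": ["standard", "hd"],
        "size": ["1024x1024", "1792x1024", "1024x1792"],
    },
    "gcp/imagen-3.0-generate-001-eu": {
        "aspectRatio": ["1:1", "3:4", "4:3", "16:9", "9:16"],
    },
}


def validate_model_parameters(model: str, params: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    supported = SUPPORTED_PARAMETERS.get(model)
    if supported is None:
        return [f"unknown image model: {model}"]
    problems = []
    for key, value in params.items():
        if key not in supported:
            problems.append(f"{model} does not support parameter '{key}'")
        elif value not in supported[key]:
            problems.append(f"'{value}' is not a supported value for '{key}'")
    return problems
```

Running the check before the call turns a typo such as passing aspectRatio to the DALL-E 3 model into an immediate, readable message instead of a failed request.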

Images as input to multimodal models

You are now able to pass images as input to multimodal models.

Python SDK

# special type for images, this is to capture the need for encoding the image in the right format
from opperai import ImageInput 

description, response = await aopper.call(
    name="async_describe_image",
    instructions="Create a short description of the image",
    output_type=Description,
    input=Image(
        image=ImageInput.from_path("examples/cat.png"),
    ),
    model="openai/gpt-4o",
)

Node SDK

// special function to read images, this is to capture the need for encoding the image in the right format
import { opperImage } from "opperai"; 

const { message } = await client.call({
    parent_span_uuid: trace.uuid,
    name: "node-sdk/call/multimodal/image-input",
    instructions: "Create a short description of the image",
    input: { image: opperImage("examples/cat.png") },
    model: "openai/gpt-4o",
});

Image generation using DALL-E 3 now available

Using the ImageOutput type you are now able to generate images via call using DALL-E 3 in the Python SDK.

from opperai import ImageOutput

cat, _ = await aopper.call(
    name="generate_cat",
    output_type=ImageOutput,
    input="Create an image of a cat",
)

Using the Node SDK you can generate images using DALL-E 3.

const cat = await client.generateImage({
    parent_span_uuid: trace.uuid,
    prompt: "Create an image of a cat",
});

New models added

aws/claude-3.5-sonnet-eu
cerebras/llama3.1-8b
cerebras/llama3.1-70b
gcp/gemini-1.5-pro-002-eu
gcp/gemini-1.5-flash-002-eu
groq/llama-3.1-70b-versatile
groq/llama-3.1-8b-instant
groq/gemma2-9b-it
mistral/pixtral-12b-2409-eu
openai/o1-preview
openai/o1-mini

See Cerebras for more information about these models.

Updated default embedding model

The new default embedding model for indexes is text-embedding-3-large.

New models added

azure/meta-llama-3.1-405b
azure/meta-llama-3.1-70b-eu
azure/mistral-large-2407
mistral/mistral-large-2407
openai/gpt-4o-2024-05-13 (openai/gpt-4o currently points to this)
openai/gpt-4o-2024-08-06

Add examples at call time

You can now add examples at call time. This is useful if you have a set of examples that you want to use as a reference for your model without having to manage a dataset.

output, _ = opper.call(
    name="changelog/python/call-with-examples",
    instructions="extract the weekday from a text",
    examples=[
        Example(input="Today is Monday", output="Monday"),
        Example(input="Friday is the best day of the week", output="Friday"),
        Example(
            input="Saturday is the second best day of the week", output="Saturday"
        ),
    ],
    input="Wonder what day it is on Sunday",
)

The three ways of tracing your code using the Python SDK

Manually

span = opper.traces.start_trace(name="my_function", input="Hello, world!")
# business logic here
span.end()

Using context manager

with opper.traces.start(name="my_function", input="Hello, world!") as span:
    # business logic here
    ...

Using the @trace decorator

@trace
def my_function(input: str) -> str:
    # business logic here
    ...

Call an LLM without explicitly creating a function using the Python SDK

You can now call an LLM without explicitly creating a function.

opper.call(name="anthropic/claude-3-haiku", input="Hello, world!")

Manually trace using the Node SDK

You can now manually trace using the Node SDK.

// Start parent trace
const trace = await client.traces.start({
    name: "node-sdk/tracing-manual",
    input: "Trace initialization",
});

// You can optionally start a child span and provide the input
const span = await trace.startSpan({
    name: "node-sdk/tracing-manual/span",
    input: "Some input given to the span",
});

// A metric and/or comment can be saved to the span
// A span generation can also be saved using .saveGeneration()
await span.saveMetric({
    dimension: "accuracy",
    score: 1,
    comment: "This is a comment",
});

By Johnny Chadda - 4/8/2025
Building a Simple GitHub PR Review Agent with ReAct

Imagine having an intelligent assistant that could automatically review your GitHub pull requests, providing thoughtful feedback, detecting bugs, and ensuring code quality standards are met. In this post, we'll build an initial version of that - a simple but effective GitHub PR review agent using the ReAct pattern.

There are many agent frameworks and patterns out there, and most of them are reasonably simple under the hood. That is why we argue for building agents yourself, so they can be tuned to your specific tasks. The ReAct (Reasoning, Acting, Observation) pattern is a powerful approach to building AI agents that can reason through complex problems step-by-step. It's an iterative process where the agent:

Reasons about the current state and goals
Acts by selecting and executing a relevant tool or action
Observes the results of the action

This approach leads to more transparent, reliable, and effective agents compared to agents that attempt to solve problems in a single step.

Note: All the code for this blog post is available in the react-agent/01-github-pr-reviewer repository. You can run and modify these examples to see the ReAct pattern in action.

Prerequisites

To follow along, you'll need:

Python 3.9+
A GitHub account and personal access token (optional for public repositories)
Basic familiarity with Pydantic and async Python
The Opper AI SDK (pip install opperai)

Understanding the ReAct Pattern

Before diving into code, let's understand why the ReAct pattern is so effective for building agents:

Transparent reasoning: Each step in the agent's thought process is explicit
Modularity: Tools and actions can be added, removed, or modified independently
Error recovery: The agent can observe errors and try alternative approaches
Traceability: Each step can be logged and analyzed for debugging

The ReAct pattern mirrors how humans solve problems - reasoning about the situation, taking an action, observing the result, and then continuing with this new information.

Core Architecture: The ReAct Loop

At the heart of our agent is the ReAct loop - a cycle of reasoning, action, and observation. Here's a simplified version of the core loop:

# ReAct loop
while current_step < self.max_steps:
    # Step 1: REASONING - Analyze the current state
    reasoning = await self._react_reasoning(agent, context)
    
    # Step 2: ACTION SELECTION - Select the next action
    action = await self._react_action_selection(agent, context, reasoning)
    
    # If the action is to finish, we're done
    if action.action_type == "finish":
        return action.output or {}
        
    # Step 3: OBSERVATION - Execute the selected tool
    if action.action_type == "use_tool" and action.tool_name:
        tool_name = action.tool_name
        tool_params = action.tool_params or {}
        
        # Execute the tool
        result = await self.tools[tool_name](tool_params)
        observation = str(result)
        
        # Update context with observation
        context["last_observation"] = observation
        context["intermediate_results"][f"step_{current_step}"] = result

    # Advance the step counter so the loop stops at max_steps
    current_step += 1
    context["current_step"] = current_step

This loop represents the core of our agent's execution model, alternating between reasoning, selecting actions, and making observations. The full implementation can be found in agent_runner.py.
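
To make the control flow concrete, here is a self-contained toy version of such a loop that runs without an LLM. It is a sketch under stated assumptions, not the code from agent_runner.py: select_action is a hard-coded stand-in for the reasoning and action-selection calls, add_tool is a made-up tool, and feeding an unknown tool name back as an observation is one possible way to handle that case.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Tool = Callable[[Dict[str, Any]], Awaitable[Any]]


async def add_tool(params: Dict[str, Any]) -> int:
    """Toy tool: add two numbers."""
    return params["a"] + params["b"]


async def select_action(context: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the LLM: call the tool once, then finish with its result."""
    if context["last_observation"] is None:
        return {"action_type": "use_tool", "tool_name": "add", "tool_params": {"a": 2, "b": 3}}
    return {"action_type": "finish", "output": {"answer": context["last_observation"]}}


async def run_loop(tools: Dict[str, Tool], max_steps: int = 5) -> Dict[str, Any]:
    context: Dict[str, Any] = {"last_observation": None, "intermediate_results": {}}
    for step in range(max_steps):
        action = await select_action(context)
        if action["action_type"] == "finish":
            return action.get("output") or {}
        tool = tools.get(action.get("tool_name", ""))
        if tool is None:
            # Feed the problem back as an observation so the next step can react to it.
            context["last_observation"] = f"unknown tool: {action.get('tool_name')}"
            continue
        result = await tool(action.get("tool_params") or {})
        context["last_observation"] = str(result)
        context["intermediate_results"][f"step_{step}"] = result
    return {"error": "max_steps reached without finishing"}


result = asyncio.run(run_loop({"add": add_tool}))
print(result)
```

Using a for loop over range(max_steps) keeps the step counter and the termination condition in one place, which makes it harder to forget the increment.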

Schema-Driven Design with Pydantic

A key principle in our agent implementation is using schemas to clearly define inputs and outputs. We use Pydantic models for this purpose:

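from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class AgentReasoning(BaseModel):
    """Model for agent's reasoning step output."""
    content: str = Field(..., description="The agent's reasoning about the current state")
    # Read later as reasoning.confidence when selecting the next action
    confidence: float = Field(..., ge=0.0, le=1.0, description="Confidence in the reasoning, from 0.0 to 1.0")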

class AgentAction(BaseModel):
    """Model for agent's action selection output."""
    action_type: str = Field(..., description="Type of action: 'use_tool' or 'finish'")
    tool_name: Optional[str] = Field(None, description="Name of the tool to use")
    tool_params: Optional[Dict[str, Any]] = Field(None, description="Parameters for the tool")
    output: Optional[Dict[str, Any]] = Field(None, description="Final output if finishing")

class AgentOutput(BaseModel):
    """Model for the final PR review output."""
    review_summary: str = Field(..., description="Summary of the PR changes")
    issues_found: List[str] = Field(default_factory=list, description="List of issues found")
    suggestions: List[str] = Field(default_factory=list, description="List of suggestions")
    overall_assessment: str = Field(..., description="Overall assessment of the PR")

These schemas serve multiple purposes:

Validation: Ensure data meets our expectations
Documentation: Self-document the interface for developers
Structure for LLMs: Give the language model clear guidance on expected outputs
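
Since action_type is declared as a plain str, the schema on its own accepts values other than 'use_tool' and 'finish', and it cannot know which tools are registered. A small check before dispatch can cover that gap. The sketch below is an illustration only: check_action is a hypothetical helper, it works on a plain dict so it runs without extra dependencies, and requiring a non-empty output on finish is a stricter choice than the loop shown earlier makes.

```python
from typing import Any, Dict, Iterable, List

VALID_ACTION_TYPES = {"use_tool", "finish"}


def check_action(action: Dict[str, Any], available_tools: Iterable[str]) -> List[str]:
    """Return the problems found in an action dict; empty means safe to dispatch."""
    problems: List[str] = []
    action_type = action.get("action_type")
    if action_type not in VALID_ACTION_TYPES:
        problems.append(f"unexpected action_type: {action_type!r}")
    elif action_type == "use_tool":
        tool_name = action.get("tool_name")
        if not tool_name:
            problems.append("use_tool requires tool_name")
        elif tool_name not in set(available_tools):
            problems.append(f"tool not registered: {tool_name}")
    elif not action.get("output"):
        # action_type is "finish" here
        problems.append("finish requires a non-empty output")
    return problems
```

Any problems returned can be written back into the context as an observation, giving the model a chance to correct itself on the next step.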
The GitHub PR Tool Implementation

Our agent needs a way to interact with GitHub. While we could create a generic MCP client that connects to a GitHub MCP server, here we implement a simple GitHubPRTool class that fetches PR information:

@trace(name="github_pr_tool.execute")
async def execute(self, params: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the GitHub PR tool."""
    try:
        # Get PR information
        pr_info = await self._get_pr_info(params["owner"], params["repo"], params["pr_number"])
        
        # Check if repository is private and we're not authenticated
        if pr_info.get("private", False) and "Authorization" not in self.headers:
            return {
                "error": "This is a private repository. A GitHub token is required for access.",
                "status": "error"
            }
        
        # Get PR files and diff
        files = await self._get_pr_files(params["owner"], params["repo"], params["pr_number"])
        diff = await self._get_pr_diff(params["owner"], params["repo"], params["pr_number"])
        
        # Return structured result
        return {
            "pr_title": pr_info["title"],
            "pr_author": pr_info["user"]["login"],
            "changed_files": [f["filename"] for f in files],
            "additions": pr_info["additions"],
            "deletions": pr_info["deletions"],
            "diff": self._truncate_diff(diff),
            "pr_description": pr_info["body"] or "",
            "pr_url": pr_info["html_url"],
            "repository_private": pr_info.get("private", False),
            "status": "success"
        }
    except Exception as e:
        logger.error(f"Error executing GitHub PR tool: {e}", exc_info=True)
        return {"error": f"Error retrieving PR information: {str(e)}", "status": "error"}

The tool handles fetching various pieces of PR information from the GitHub API and returns them in a structured format. For the complete implementation, see github_pr_tool.py.
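
The _truncate_diff helper called above is not shown in this post. As a rough idea of what such a helper could do, here is a minimal sketch; the function name truncate_diff, the default limit, and the marker text are all assumptions rather than the repository's actual implementation.

```python
def truncate_diff(diff: str, max_chars: int = 20_000) -> str:
    """Cut a diff at a line boundary and note how much was dropped."""
    if len(diff) <= max_chars:
        return diff
    # Prefer cutting at the last newline inside the budget so no line is split.
    cut = diff.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars
    omitted = len(diff) - cut
    return diff[:cut] + f"\n... [diff truncated, {omitted} characters omitted]"
```

Keeping an explicit marker in the output lets the model know that it is looking at a partial diff, so it can say so in the review instead of assuming it saw every change.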

Using Opper for LLM Calls

A critical part of our agent is the LLM-powered reasoning and decision making. We use the Opper SDK for making structured LLM calls:

async def _react_reasoning(self, agent: Dict[str, Any], context: Dict[str, Any]) -> AgentReasoning:
    """Generate reasoning based on the current context."""
    reasoning_instructions = """
    You are in the REASONING phase of a ReAct (Reasoning-Acting-Observation) loop.
    
    In this phase, you should:
    1. Analyze the current state and context
    2. Think step-by-step about what you know and what you need to find out
    3. Consider what tools or actions might be helpful
    4. Determine your next steps
    
    Your reasoning should be thorough, logical, and clear.
    
    Additionally, provide a confidence score from 0.0 to 1.0 indicating how
    confident you are in your reasoning.
    """
    
    result, _ = await opper.call(
        name="agent_reasoning",
        instructions=reasoning_instructions,
        input={
            "agent_instructions": agent.get("instructions", ""),
            "context": context,
            "step_number": context.get("current_step", 0),
            "last_observation": context.get("last_observation", None),
        },
        output_type=AgentReasoning,
    )
    return result

async def _react_action_selection(
    self, agent: Dict[str, Any], context: Dict[str, Any], reasoning: AgentReasoning
) -> AgentAction:
    """Select the next action based on reasoning."""
    # Get the list of available tools
    available_tools = list(self.tools.keys())
    
    action_instructions = """
    You are in the ACTION SELECTION phase of a ReAct (Reasoning-Acting-Observation) loop.
    
    Based on your prior reasoning, you must now decide on the next action to take.
    
    You have two options:
    1. Use a tool to gather more information or make progress:
       - action_type: "use_tool"
       - tool_name: Select from the available tools in the input
       - tool_params: Provide the necessary parameters for the tool
       
    2. Finish the task if you have enough information:
       - action_type: "finish"
       - output: Provide your final review with:
         - review_summary: A concise summary of the PR changes
         - issues_found: A list of issues or concerns
         - suggestions: A list of improvement suggestions
         - overall_assessment: Your final assessment of the PR
    """
    
    result, _ = await opper.call(
        name="agent_action",
        instructions=action_instructions,
        input={
            "reasoning": reasoning.content,
            "reasoning_confidence": reasoning.confidence,
            "context": context,
            "available_tools": available_tools,
            "agent_instructions": agent.get("instructions", ""),
            "step_number": context.get("current_step", 0),
        },
        output_type=AgentAction,
    )
    return result

The key features here are:

Structured inputs: We carefully prepare the context for the LLM
Structured outputs: We use Pydantic models to define expected responses
Tracing: The @trace decorator enables observability

Comprehensive Tracing

One important aspect of building reliable agents is observability. We use Opper's tracing capabilities to track each step of execution:

@trace(name="agent_runner.run_agent")
async def run_agent(self, agent_id: str, agent: Dict[str, Any], input_data: Dict[str, Any]) -> Dict[str, Any]:
    """Run an agent with the given input data."""
    # ... implementation ...

By wrapping key functions with @trace, we get comprehensive traces for each agent run, including:

Time spent in each function
Inputs and outputs at each step
Error conditions
Custom metrics
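
To illustrate the kind of information a tracing decorator can capture, here is a small stand-in written with the standard library only. It is not Opper's @trace and says nothing about how that decorator is implemented: record_span and the in-memory SPANS list are hypothetical names used purely for this sketch.

```python
import asyncio
import functools
import time
from typing import Any, Callable, Dict, List

SPANS: List[Dict[str, Any]] = []  # in-memory stand-in for a tracing backend


def record_span(name: str) -> Callable:
    """Decorator that records inputs, output or error, and duration of an async call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            span: Dict[str, Any] = {"name": name, "input": {"args": args, "kwargs": kwargs}}
            start = time.perf_counter()
            try:
                span["output"] = await func(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call leaves a span behind.
                span["duration_ms"] = (time.perf_counter() - start) * 1000
                SPANS.append(span)
        return wrapper
    return decorator


@record_span("demo.double")
async def double(x: int) -> int:
    return x * 2


print(asyncio.run(double(21)))  # the recorded span is now in SPANS
```
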
Putting It All Together

Our main script ties everything together, creating a complete GitHub PR Reviewer:

# Agent configuration
PR_REVIEW_AGENT = {
    "instructions": """
    You are a GitHub PR reviewer. Your task is to review pull requests and provide helpful feedback.
    You should:
    1. Fetch the PR information using the github_pr_tool
    2. Analyze the changes and their impact
    3. Identify potential issues or improvements
    4. Provide a detailed review with actionable feedback
    
    Your final output should include:
    - A summary of the changes
    - List of issues found (if any)
    - Suggestions for improvement
    - Overall assessment
    """,
    "verbose": False,  # Will be set from command line args
}

# Initialize services
agent_runner = AgentRunnerService()

# Initialize GitHub PR tool
github_token = os.getenv("GITHUB_TOKEN")  # Optional for public repositories
github_pr_tool = GitHubPRTool(github_token)

# Register tools
agent_runner.register_tools({
    "github_pr_tool": github_pr_tool.execute
})

# Run the agent
result = await agent_runner.run_agent(
    agent_id="github_pr_reviewer",
    agent=PR_REVIEW_AGENT,
    input_data={
        "owner": args.owner,
        "repo": args.repo,
        "pr_number": args.pr_number,
    }
)

For the complete working example, see main.py.

Running the Example

To run the complete example:

Clone the repository
Navigate to the example directory: cd react-agent/01-github-pr-reviewer
Install dependencies: pip install -r requirements.txt
Create a .env file with your Opper API key (GitHub token optional)
Run the script: python main.py   
Add -v flag to see the agent's thought process: python main.py    -v
Detailed instructions are available in the README.

Next Steps

This implementation demonstrates the core concepts, but there are many ways to enhance it:

Error handling and retries: Add robust error handling for API calls and LLM calls
Caching: Cache API responses to avoid rate limiting. Opper supports returning cached responses for the same inputs.
Advanced PR analysis: Add code quality checks and security scanning
State persistence: Save and retrieve agent state between runs
Human feedback: Allow humans to provide feedback on the agent's reviews
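
As a starting point for the first item, retries with exponential backoff can be added around any awaitable call. This is a generic sketch rather than code from the example repository: with_retries, its parameters, and the choice to retry on every exception are assumptions, and in practice you would likely retry only on errors you know to be transient.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

The function takes a zero-argument callable rather than a coroutine object because a coroutine can only be awaited once; each retry needs a fresh one, for example from a lambda wrapping the call to retry.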
Conclusion

We've built a simple but functional GitHub PR review agent using the ReAct pattern. This agent demonstrates several key principles:

ReAct pattern for structured reasoning, action, and observation cycles
Schema-driven design with Pydantic for clear interfaces
Opper tracing for comprehensive observability
Modular tools architecture for extensibility

For a deep-dive into the implementation, explore the complete code.

In a future post, we'll expand the agent's capabilities with more advanced GitHub features and make the agent itself more capable.

Stay tuned!


// End the span and provide the output
await span.end({
    output: JSON.stringify({ foo: "bar" }),
});

// End the parent trace
await trace.end({ output: JSON.stringify({ foo: "bar" }) });

Call an LLM without explicitly creating a function using the Node SDK

You can now call an LLM without explicitly creating a function.

const { message } = await client.call({
    name: "node-sdk/call/basic",
    input: "what is the capital of sweden",
});

Manually adding generations

You can now manually add generations to your traces. This is useful if you call an LLM outside of Opper but still want to use the tracing capabilities of Opper.

from datetime import datetime, timezone

from opperai import Opper


def run():
    opper = Opper()
    spans = opper.spans

    with spans.start("transform", input="Hello, world!") as span:
        t0 = datetime.now(timezone.utc)
        manually_call_llm()  # placeholder for an LLM call made outside Opper
        t1 = datetime.now(timezone.utc)
        span.save_generation(
            called_at=t0,
            duration_ms=int((t1 - t0).total_seconds() * 1000),
            response="I'm happy because I'm happy",
            model="anthropic/claude-3-haiku",
            messages=[
                {
                    "role": "user",
                    "content": "Hello, world!",
                }
            ],
            cost=3.1,
            prompt_tokens=10,
            completion_tokens=10,
            total_tokens=20,
        )

OpenAI model GPT-4o mini now available

We have added support for the just released GPT-4o mini model from OpenAI.

Projects now available

Projects allow you to create separation in Opper. Currently, the following is tied to a project:

Functions
Indexes
Traces
API keys

When you create an API key for a specific project, all usage is automatically associated with that project, so there is no need to pass the project when making calls.

Manage organizations and invite your colleagues

You are now able to create your own organizations in Opper. Go to Settings --> Organization and click Create Organization in the top right corner to get started.

Once you have created your organization, you are able to invite your colleagues by sending an invite to their email address. Once they are in, you are able to collaborate and have a common view of your AI usage.

