By Göran Sandahl - 11/27/2024

Agentic customer service chatbot with tools, tracing and evals

When chatbots move beyond simple message exchanges to performing complex tasks, they face unique challenges due to the conversational nature of interactions. Unlike traditional interfaces where we can design strict input forms, chatbots must extract necessary information from often ambiguous or unclear requests while maintaining a natural dialogue with users.

In this blog post, we'll explore how to build a robust chatbot that can interact with external tools and APIs in a structured way. The chatbot can enable an experience like the following:

Challenges with chat

Before diving into the implementation details, let's examine the key challenges that arise when building intelligent chatbots:

Understanding user intent from natural language - Users express their needs in varied and often ambiguous ways, making it crucial to reliably interpret their true intentions
Structured data extraction - Converting free-form text into well-defined parameters needed for API calls and database operations while handling edge cases gracefully
Asynchronous task management - Executing potentially long-running operations while maintaining engaging conversation flow and providing status updates
Error handling and observability - Implementing robust error recovery, detailed logging, and monitoring to ensure reliable operation and enable quick debugging
Context management - Maintaining conversation history and state to provide coherent, contextual responses across multiple interactions

Let's explore how building on Opper helps address each of these challenges systematically.

An agentic approach

In this example, we implement the chatbot with an agentic approach.

An agentic system moves beyond hardcoded command pipelines. Instead of following predetermined paths, it maintains a dynamic trajectory - typically implemented as a log that accumulates interaction history and results. The next action is determined by analyzing this trajectory rather than following fixed steps.

This approach helps solve many of the challenges mentioned above by enabling flexible, context-aware responses. Rather than hardcoding paths, we allow the assistant to reason about the conversation state and determine appropriate next steps.

In our implementation, we use a shared messages log to store:

User inputs
Assistant responses
Contextual data from function calls

We introduce a special function role to inject analysis and context between user messages and assistant responses. This helps guide the assistant in providing accurate, contextual responses by performing intent identification, database lookups, and other background tasks while maintaining conversation flow.

Here is how we implement this:


# Shared message log, created once per conversation
messages = []

# Get user input
user_input = input("User: ")

messages.append({
    "role": "user",
    "content": user_input
})

# Analyse the conversation and return an analysis
analysis = process_message(messages)

messages.append({
    "role": "function",
    "content": analysis
})

# Bake response to the user 
response = bake_response(messages)

print(f"Assistant: {response}")

messages.append({
    "role": "assistant",
    "content": response
})

This will yield a list like the following:

# Example conversation array
messages = [
    {
        "role": "user", 
        "content": "Hi, I'd like to check the status of my order #12345"
    },
    {
        "role": "function",
        "content": "User is requesting order status information. Got order id 12345 but missing email."
    },
    {
        "role": "assistant",
        "content": "I'll help you check the status of your order #12345. Could you please provide the email address associated with the order for verification?"
    },
    {
        "role": "user",
        "content": "Sure, it's john.doe@example.com"
    },
    {
        "role": "function", 
        "content": "User is requesting order status information. Got order id 12345 and email john.doe@example.com and can continue getting the status of the order"
    },
    {
        "role": "assistant",
        "content": "Thank you. Let me look up that order for you..."
    }
]

In this message log design, each role serves a distinct purpose:

The user role captures and preserves user inputs in their original form
The assistant role stores the chatbot's responses to maintain conversation flow
The function role acts as a processing layer that analyzes user messages and extracts the structured information the system needs. It bridges raw user input and the system's internal logic by identifying intents, extracting entities, interacting with external systems, and passing the results to the assistant as context.

This message log forms the backbone of our agent system by maintaining a complete conversation history and state. At each turn, the agent can process and reason over the entire conversation context to determine the most appropriate next action. This allows for dynamic, context-aware responses rather than following rigid predetermined paths.

For example, in the conversation above, the function role identified missing email information and guided the assistant to request it naturally, demonstrating how this architecture enables flexible, stateful interactions.
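
To make the turn structure concrete, here is a minimal, self-contained sketch of one turn over the shared log. The run_turn helper and the two fake_ functions are hypothetical stand-ins for process_message() and bake_response(), so the example runs without any model calls.

```python
from typing import Callable

Message = dict[str, str]


def run_turn(
    messages: list[Message],
    user_input: str,
    analyze: Callable[[list[Message]], str],
    respond: Callable[[list[Message]], str],
) -> str:
    """Append user input, analysis, and reply to the shared log; return the reply."""
    messages.append({"role": "user", "content": user_input})
    messages.append({"role": "function", "content": analyze(messages)})
    reply = respond(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


# Hypothetical stand-ins for process_message() and bake_response().
def fake_analyze(messages: list[Message]) -> str:
    last_user = messages[-1]["content"]
    return "Have email." if "@" in last_user else "Missing email."


def fake_respond(messages: list[Message]) -> str:
    analysis = messages[-1]["content"]
    if analysis == "Missing email.":
        return "Could you share your email?"
    return "Looking up your order..."


log: list[Message] = []
run_turn(log, "Where is order #12345?", fake_analyze, fake_respond)
run_turn(log, "It's john.doe@example.com", fake_analyze, fake_respond)
print([m["role"] for m in log])
```

Each turn always adds three entries in the same order, which is what lets later turns reason over the full trajectory.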

Implementing the function role

In the example above, the function role takes its content from process_message(). Let's take a look at how we have implemented it.


# Function to process messages
def process_message(messages):

    # We first determine the intent of the conversation
    intent = determine_intent(messages)

    # Process based on intent
    if intent.intent == "get_order_status":

        # Extract requested order information
        order_request = extract_order_from_messages(messages)

        # Verify we have all needed order info
        missing = []
        if not order_request.order_id:
            missing.append("order ID")
        if not order_request.email:
            missing.append("email")
        if missing:
            return f"Need {' and '.join(missing)}"

        # Perform database operation to get the order
        order = get_order(id=order_request.order_id, email=order_request.email)
        if order:
            # Return the details from the retrieved order record
            return {
                "order_id": order_request.order_id,
                "status": order["status"],
                "email": order["email"],
                "address": order["address"]
            }
        else:
            # Return no such order
            return f"Could not find an order with id {order_request.order_id} and email {order_request.email}"

    # Other intents are not handled in this example
    return f"No handler for intent: {intent.intent}"

The process_message() function implements two essential components of our chatbot system:

Intent determination - First, we analyze the user's message to understand what they're trying to accomplish using the determine_intent() function.
Intent-specific processing - Based on the identified intent, we execute the appropriate business logic. For example, with an order status request:
- Extract order details from the conversation
- Validate that we have all required information (order ID and email)
- Query the database to retrieve the order if we have complete information
- Return either the order details or a request for missing information

This two-step approach allows our chatbot to first understand what the user wants, then systematically gather and validate the information needed to fulfill that request.
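
process_message() calls get_order(), which is not shown in this post. As a rough sketch, assuming an in-memory dictionary in place of a real database, it could look like the following. The ORDERS data and the field names are made up for illustration; the parameter is named id only to match the call in process_message().

```python
from typing import Optional

# Hypothetical in-memory stand-in for a real orders database.
ORDERS: dict[int, dict[str, str]] = {
    12345: {
        "status": "shipped",
        "email": "john.doe@example.com",
        "address": "1 Example Street",
    },
}


def get_order(id: int, email: str) -> Optional[dict[str, str]]:
    """Return the order only if the ID exists and the email matches."""
    order = ORDERS.get(id)
    if order is None:
        return None
    # Compare case-insensitively; treat a mismatch like a missing order
    # so the caller never learns that the order ID exists.
    if order["email"].lower() != email.strip().lower():
        return None
    return order
```

Returning None for both a missing order and a mismatched email keeps the verification step from leaking which order IDs are valid.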

Let's examine how we implement intent classification in more detail...

Identifying intent of the user

The determine_intent() function analyzes the conversation and classifies the user's intent into one of three categories: checking order status, searching products, or an unsupported request.

It uses a structured IntentClassification output type with two fields:

thoughts: The model's reasoning about the user's intent
intent: The classified intent (get_order_status, query_products, or unsupported)

This structured approach ensures consistent, predictable outputs while giving the model a way to handle requests outside its capabilities.


from typing import Literal

from pydantic import BaseModel

# `opper` is assumed to be an already initialized Opper client

class IntentClassification(BaseModel):
    thoughts: str
    intent: Literal["get_order_status", "query_products", "unsupported"]

def determine_intent(messages):
    intent, _ = opper.call(
        name="determine_intent",
        instructions="Analyze the conversation and determine the user's intent. Supported intents are get_order_status and query_products. Use unsupported for anything else.",
        input={"messages": messages},
        output_type=IntentClassification
    )
    return intent

# Example output:
# {
#   "thoughts": "The user is asking about their recent order and wants to know its status. They mentioned tracking number which indicates they want order status information.",
#   "intent": "get_order_status"
# }
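
As more intents are added, the if/else chain in process_message() can grow unwieldy. One possible alternative, sketched here with hypothetical handler functions that are not part of the original example, is a dispatch table keyed on the intent string that falls back to the unsupported handler.

```python
from typing import Callable

Messages = list[dict[str, str]]


# Hypothetical handlers; each returns context for the "function" role.
def handle_order_status(messages: Messages) -> str:
    return "User wants order status."


def handle_product_query(messages: Messages) -> str:
    return "User wants to search products."


def handle_unsupported(messages: Messages) -> str:
    return "Request is not supported; offer the supported options."


HANDLERS: dict[str, Callable[[Messages], str]] = {
    "get_order_status": handle_order_status,
    "query_products": handle_product_query,
    "unsupported": handle_unsupported,
}


def dispatch(intent: str, messages: Messages) -> str:
    """Route to the handler for this intent, defaulting to unsupported."""
    handler = HANDLERS.get(intent, handle_unsupported)
    return handler(messages)
```

With this shape, supporting a new intent means adding one Literal value to IntentClassification and one entry to the table.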

Extracting the necessary information for tools

After identifying the intent, the next step is extracting the specific information needed to fulfill the user's request. For example, checking an order status requires both an order_id and email address for security verification.

To systematically extract this information from the conversation, we'll create a dedicated function using another structured output type. This ensures we receive the required data in a consistent, type-safe format while also capturing the model's reasoning about what information it found or couldn't find in the conversation:

# A parsed order request
class ParsedOrder(BaseModel):
    thoughts: str
    order_id: int | None = None
    email: str | None = None

# Function to extract order data from conversation
def extract_order_from_messages(messages):
    order_info, _ = opper.call(
        name="extract_order_info",
        instructions="Extract order ID and email from the conversation if present",
        input={"messages": messages},
        output_type=ParsedOrder
    )
    return order_info

# Example output:
# {
#   "thoughts": "The user mentioned order #12345 but did not provide their email address",
#   "order_id": 12345,
#   "email": None
# }

This structured output of a ParsedOrder makes it easy to determine our next steps:

If both order_id and email are present, we can proceed with checking the order status
If either field is missing, we can prompt the user for the specific information we need
The thoughts field provides context about what was found or missing from the conversation

This structured approach ensures we have all required information before proceeding with any operations, while maintaining a natural conversation flow by only asking for what's actually needed.
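
The decision rules above can be sketched as a small helper. To keep the example dependency-free, ParsedOrderSketch is a plain dataclass standing in for the ParsedOrder model, and missing_fields() and next_step() are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedOrderSketch:
    """Plain-dataclass stand-in for the ParsedOrder model."""
    thoughts: str
    order_id: Optional[int] = None
    email: Optional[str] = None


def missing_fields(order: ParsedOrderSketch) -> list[str]:
    """List the human-readable names of fields still needed."""
    missing = []
    if order.order_id is None:
        missing.append("order ID")
    if not order.email:
        missing.append("email")
    return missing


def next_step(order: ParsedOrderSketch) -> str:
    """Describe what should happen next for this parsed request."""
    missing = missing_fields(order)
    if missing:
        return "Need " + " and ".join(missing)
    return "Ready to look up the order"
```

Checking order_id against None rather than truthiness means an order ID of 0 would still count as provided.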

Return the contextual output for the next step

Going back to our process_message() function, we can see that it returns the context needed to determine the next step: either the result of a database lookup or a note about which fields are still missing. This keeps the conversation coherent while we gather everything needed to complete the user's request.
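
Because process_message() can return either a string or a dictionary, it may help to normalize the result before appending it to the log. The helper below is a sketch under that assumption; to_function_message() is a hypothetical name, not part of the original example.

```python
import json
from typing import Union


def to_function_message(result: Union[str, dict, None]) -> dict[str, str]:
    """Wrap a process_message() result as a "function" role log entry."""
    if result is None:
        content = "No analysis available."
    elif isinstance(result, dict):
        # sort_keys keeps the serialized form stable across runs.
        content = json.dumps(result, sort_keys=True)
    else:
        content = result.strip()
    return {"role": "function", "content": content}
```

This way every entry in the log has string content, regardless of which branch produced it.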

Baking a response from the assistant

Now that our message log contains the complete conversation history in chronological order, including context from function calls, we can pass this information to a bake_response() function that will construct a natural and contextually-aware response to the user. This function leverages the full conversation context to generate responses that feel cohesive and maintain continuity with previous interactions.

Here is what this function can look like:

# Function to build a friendly AI response
def bake_response(messages):
    response, _ = opper.call(
        name="generate_response",
        instructions="Generate a helpful, friendly but brief response to the user's message in the conversation.",
        input={"messages": messages},
        output_type=str,
        # stream=True
    )
    return response

The output of this call could look like this:

"I've found your order #12345! It looks like it was shipped yesterday and is currently in transit. You can expect delivery by Friday. Let me know if you need anything else!"

This output is then added to the message log.

For UI integration purposes, we could modify bake_response to return a structured object instead of a plain string. This would make it easier to render a UI component for the order, like the one in the screenshot at the top.
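
As a rough illustration of that idea, the sketch below defines a structured reply with an optional order card and converts it into a JSON-friendly payload. AssistantReply, OrderCard, and render() are hypothetical names, and plain dataclasses are used here only to keep the example dependency-free; in the chatbot itself the equivalent would presumably be a model passed as output_type.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OrderCard:
    order_id: int
    status: str


@dataclass
class AssistantReply:
    text: str
    order: Optional[OrderCard] = None  # present only when there is an order to show


def render(reply: AssistantReply) -> dict:
    """Convert a reply into a JSON-friendly payload for a front end."""
    payload = {"text": reply.text, "components": []}
    if reply.order is not None:
        payload["components"].append({"type": "order_card", **asdict(reply.order)})
    return payload
```

The front end can then render the text as a chat bubble and each component as its own widget.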

Observing and debugging sessions

In production, these conversations can take many different shapes, so we need a way to see what is going on in each of them.

By implementing proper tracing, which the Opper SDKs support, we get a clean view of all user sessions, every turn, and the underlying calls such as determine_intent and extract_order_info. This lets us debug any user session in a straightforward manner.

A trace of logs showing user interacting with chatbot
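
To illustrate what a trace records, here is a generic sketch using only the Python standard library. It is not the Opper tracing API; span() and SPANS are hypothetical names, and the point is only to show the kind of data (step name, session, outcome, duration) a trace ties together.

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []  # in-memory sink; a real system would export these


@contextmanager
def span(name: str, session_id: str):
    """Record the duration and outcome of one step in a session."""
    record = {"name": name, "session_id": session_id, "status": "ok"}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)


with span("determine_intent", session_id="session-1") as s:
    s["output"] = "get_order_status"  # stand-in for the real call
```

Grouping spans by session_id gives a per-conversation view, and the status field makes failed steps easy to find.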

Optimizing functions to find the best model and prompt

One common question is how to optimize prompts and select the most appropriate models. Since we've built task-oriented function calls, we're well positioned to optimize each component independently. Using Opper's datasets and evaluation tools, you can systematically test different models and prompts for each task to find the optimal combination.

Evaluating a function with different models to find the best one

For example:

Intent detection may work well with smaller, faster models
Complex information extraction might benefit from more capable models
Response generation could require models with strong natural language abilities, preferably with good multi-lingual skills if your chatbot serves markets in multiple languages.

By evaluating each function separately, you can optimize both performance and cost while maintaining high quality.
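
The idea of evaluating one function in isolation can be sketched with a tiny harness. This is not Opper's evaluation tooling: the dataset is made up, and the two keyword-based candidates are hypothetical stand-ins for different model and prompt combinations.

```python
from typing import Callable

# Hypothetical labeled examples: (user message, expected intent).
DATASET = [
    ("Where is my order #12345?", "get_order_status"),
    ("Do you sell running shoes?", "query_products"),
    ("Tell me a joke", "unsupported"),
]


def accuracy(classify: Callable[[str], str], dataset: list[tuple[str, str]]) -> float:
    """Fraction of examples where the classifier matches the label."""
    correct = sum(1 for text, expected in dataset if classify(text) == expected)
    return correct / len(dataset)


# Keyword rules standing in for two model/prompt candidates.
def candidate_a(text: str) -> str:
    return "get_order_status" if "order" in text.lower() else "unsupported"


def candidate_b(text: str) -> str:
    lowered = text.lower()
    if "order" in lowered:
        return "get_order_status"
    if "sell" in lowered or "product" in lowered:
        return "query_products"
    return "unsupported"


scores = {fn.__name__: accuracy(fn, DATASET) for fn in (candidate_a, candidate_b)}
best = max(scores, key=scores.get)
print(scores, best)
```

The same loop works for any function with a structured output, since the comparison is a simple equality check on the expected field.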

Conclusion

In this blog post, we've explored building a chatbot that effectively uses tools while maintaining natural conversation flow. Through structured input/output patterns and an agentic approach, we've addressed key challenges:

Extracting structured data from conversations
Safely performing validated external actions
Maintaining conversational context
Building natural, contextual responses

The key insight is treating conversations as dynamic trajectories rather than fixed pipelines. By storing context in a shared message log and using structured function calls, we create a flexible system that handles ambiguity while performing reliable operations.

This pattern extends to complex scenarios like multi-step workflows, error handling, and backend integrations. The structured approach ensures control and observability while enabling natural AI conversations.

The full code example, which is intended as a proof of concept, is available in our docs.