Agents

This category covers AI agent reasoning and tool selection: it tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures. These are some of the most challenging tasks because they require open-ended reasoning and decision-making.

100
Duration: 12s
Input Tokens: 1329
Output Tokens: 401
Cost: $0.00
Context
Input
You have just received a trigger with a user request. What is the first thing you do?
Expected output
The first thing I do is to fully understand the user's intent, reason carefully, and break the problem into clear steps.
Model output
Reasoning: Upon receiving a trigger of type `user_request`, my very first action is to carefully read the trigger content so I can comprehend exactly what the user is asking for. This means parsing the user’s words, identifying the core intent (add the yearly company dinner to the calendar with a reminder), spotting any missing information (the exact date of the dinner, how many days before the reminder should fire), and noting the tools that might be relevant (email search to find the details, calendar actions to create the event, user-input tool if information is still missing). Only after I have a clear picture of the request and its gaps do I proceed to break the task into concrete steps and select the appropriate tools.

Response: First, I read and fully understand the user’s request, extract the key objectives, and note any missing details I must gather before taking further action.
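
The "understand first, then plan" step described in the model output above can be sketched roughly as follows. This is only an illustrative sketch, not the benchmark's or the agent's actual implementation: `RequestAnalysis`, `analyze_request`, the detail names, and the tool names (`email_search`, `calendar_create_event`, `user_input`) are all hypothetical, chosen to mirror the calendar example in the reasoning.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestAnalysis:
    """Result of the first step: what is wanted, what is missing, what may help."""
    intent: str
    missing: list[str] = field(default_factory=list)
    candidate_tools: list[str] = field(default_factory=list)


def analyze_request(
    intent: str,
    details: dict[str, Optional[str]],
    tools: dict[str, list[str]],
) -> RequestAnalysis:
    """Understand the request before acting: record gaps and relevant tools.

    `details` maps each required piece of information to its value, or to
    None when the request did not supply it. `tools` groups hypothetical
    tool names by role: "lookup", "act", and "ask_user".
    """
    # A detail counts as missing when the request gave no value for it.
    missing = [name for name, value in details.items() if value is None]

    candidate_tools: list[str] = []
    if missing:
        # Try to find missing details before acting.
        candidate_tools += tools["lookup"]
    candidate_tools += tools["act"]
    if missing:
        # Fall back to asking the user if lookup does not fill the gaps.
        candidate_tools += tools["ask_user"]

    return RequestAnalysis(intent, missing, candidate_tools)


# Hypothetical request mirroring the example in the model output.
analysis = analyze_request(
    intent="add the yearly company dinner to the calendar with a reminder",
    details={
        "title": "Yearly company dinner",
        "event_date": None,
        "reminder_days_before": None,
    },
    tools={
        "lookup": ["email_search"],
        "act": ["calendar_create_event"],
        "ask_user": ["user_input"],
    },
)
print(analysis.missing)
print(analysis.candidate_tools)
```

The point of the sketch is the ordering: no tool is called at this stage. The function only produces a picture of the request and its gaps, which a later planning step could turn into concrete tool calls.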