Agents

The agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

100
Duration
39s
Input Tokens
2283
Output Tokens
289
Cost
$0.00
Context
Input
What is your next action?
Expected output
The next action is to call the flight tool with date: 2025-05-29, from_city: London, to_city: Paris
Model output
Reasoning: Task requires retrieving flight information for London-Paris tomorrow. Context shows calendar tool already confirmed date as 2025-05-29. Available flights tool is specifically designed for this purpose. Direct tool usage satisfies instruction requirements for efficiency and appropriate tool selection.

Response: Next action: Call flights tool with {from_city: 'London', to_city: 'Paris', date: '2025-05-29'}
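
The page does not say how the model output is scored against the expected output. As a rough illustration only, the comparison could be sketched as a structural check on the tool name and arguments rather than a string match. The helper names `parse_tool_call` and `same_call` are hypothetical, and treating "flight" and "flights" as the same tool is an assumption made for this sketch.

```python
import re

def parse_tool_call(text):
    """Extract (tool_name, args) from text such as
    "Call flights tool with {from_city: 'London', ...}".
    Returns None when no tool call is found."""
    match = re.search(r"[Cc]all (?:the )?(\w+) tool with \{(.*?)\}", text)
    if match is None:
        return None
    tool, body = match.group(1), match.group(2)
    # Collect key: 'value' pairs; argument order is irrelevant.
    args = dict(re.findall(r"(\w+)\s*:\s*'([^']*)'", body))
    return tool, args

def same_call(actual, expected_tool, expected_args):
    """Compare a parsed call with the expected tool and arguments."""
    if actual is None:
        return False
    tool, args = actual
    # Assumption: singular and plural tool names refer to the same tool.
    return tool.rstrip("s") == expected_tool.rstrip("s") and args == expected_args

model_output = (
    "Next action: Call flights tool with "
    "{from_city: 'London', to_city: 'Paris', date: '2025-05-29'}"
)
expected_args = {"date": "2025-05-29", "from_city": "London", "to_city": "Paris"}

call = parse_tool_call(model_output)
print(same_call(call, "flight", expected_args))
```

Comparing parsed arguments as a dictionary means the differing argument order between the expected and actual outputs above does not affect the result.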