Agents

The AI agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. The tasks are among the most challenging because they require open-ended reasoning and decision-making.

1 **calendar.get_current_date** → {"action": "get_current_date"}
1b **memory.store_item** → {"action": "store_item", "key": "current_date", "value": "<current_date_result>"}
2 **calendar.add_business_days** → {"action": "add_business_days", "start_date": "<current_date>", "days_to_add": 2}
2b **memory.store_item** → {"action": "store_item", "key": "target_date", "value": "<result_from_add_business_days>"}
3 **calendar.format_date** → {"action": "format_date", "date": "<target_date>T09:00:00", "format": "YYYY-MM-DDTHH:mm:ssZ"}
3b **memory.store_item** → {"action": "store_item", "key": "formatted_send_time", "value": "<formatted_datetime_result>"}
4 **email.draft_email** → {"action": "draft_email", "to": "<customer_email>", "subject": "Address Update Confirmation", "body": "<confirmation_message>"}
4b **memory.store_item** → {"action": "store_item", "key": "draft_id", "value": "<draft_email_id>"}
5 **email.schedule_email** → {"action": "schedule_email", "draft_id": "<draft_id>", "send_at": "<formatted_send_time>"}
5b **memory.store_item** → {"action": "store_item", "key": "scheduled_email", "value": "Email scheduled for <formatted_send_time>"}
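The date arithmetic behind the add_business_days and send-time steps above can be sketched in Python. This is only an illustration under stated assumptions: the real calendar tool's semantics are not documented here, so the sketch assumes that "business days" means Monday through Friday with no holiday calendar, and the function names `add_business_days` and `send_time` are hypothetical local helpers, not the tool's API.

```python
from datetime import date, datetime, time, timedelta


def add_business_days(start: date, days_to_add: int) -> date:
    """Advance `start` by `days_to_add` weekdays, skipping Saturday and Sunday.

    Assumption: no holiday calendar is consulted.
    """
    if days_to_add < 0:
        raise ValueError("days_to_add must be non-negative")
    current = start
    remaining = days_to_add
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current


def send_time(start: date, days_to_add: int = 2, hour: int = 9) -> str:
    """Return the local send time as an ISO 8601 string without a UTC offset."""
    target = add_business_days(start, days_to_add)
    return datetime.combine(target, time(hour, 0)).isoformat()


if __name__ == "__main__":
    # Tuesday 2025-08-19 plus two business days lands on Thursday 2025-08-21.
    print(send_time(date(2025, 8, 19)))  # 2025-08-21T09:00:00
    # Starting on a Friday, the weekend is skipped.
    print(send_time(date(2025, 8, 22)))  # 2025-08-26T09:00:00
```

The returned string carries no timezone offset; a scheduler that requires one would need the offset appended separately.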

Reasoning:

Analyze: We need to schedule a follow-up confirmation email exactly two business days from now at 09:00. The calendar tool can compute business-day offsets; the email tool can draft and schedule. We’ll minimize calls while adhering to memory discipline (store salient outputs after each call) and retry policy (up to 3 per call, implicitly).

Plan: 1) Get today’s date; 2) Add 2 business days; 3) Draft the email; 4) Schedule it for 09:00 on the computed date.

Act/Reflect/Decide: After each step, persist key result to memory and proceed; no redundant calls.

Assumption: recipient address (customer_email) is already known and valid; if not, an upstream step would retrieve/confirm it.

Response: Tool sequence and key parameters (with memory persistence after each call):

1) calendar.get_current_date
- Tool: calendar
- Params: { "action": "get_current_date" }
- Persist: memory.store_item { "action": "store_item", "key": "today", "value": "<ISO date from step 1, e.g., 2025-08-19>" }

2) calendar.add_business_days
- Tool: calendar
- Params: { "action": "add_business_days", "start_date": "<value of today>", "days_to_add": 2 }
- Persist: memory.store_item { "action": "store_item", "key": "target_date", "value": "<ISO date two business days from today>" }

3) email.draft_email
- Tool: email
- Params: { "action": "draft_email", "to": "<customer_email>", "subject": "Follow-up confirmation", "body": "Hi <First Name>,\n\nJust following up to confirm we have everything we need. If you have any updates or questions, feel free to reply to this email.\n\nBest regards,\n<Your Name>" }
- Persist: memory.store_item { "action": "store_item", "key": "draft_id", "value": "<draft_id returned by draft_email>" }

4) email.schedule_email
- Tool: email
- Params: { "action": "schedule_email", "draft_id": "<draft_id>", "send_at": "<target_date>T09:00:00" }
  Note: send_at is in local time; include timezone offset if your system requires it, e.g., "<target_date>T09:00:00-04:00".
- Persist: memory.store_item { "action": "store_item", "key": "scheduled_email_id", "value": "<email_id or schedule reference returned by schedule_email>" }

Retries: For each tool call above, retry up to 3 times on transient failure; if all fail, store the error via memory.store_item with an "*_error" key and stop gracefully.
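The retry and persistence policy described above (up to 3 attempts per tool call, store the result after success, store the failure under an "*_error" key and stop) could look roughly like the following Python sketch. Everything here is hypothetical scaffolding for illustration: `TransientError`, `call_with_retry`, and the plain dict standing in for the memory tool are assumptions, not part of any real agent framework or of the tools named in the transcript.

```python
from typing import Any, Callable, Dict, Optional


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(
    tool_call: Callable[[], Any],
    memory: Dict[str, Any],
    key: str,
    max_attempts: int = 3,
) -> Optional[Any]:
    """Run `tool_call` up to `max_attempts` times and persist the outcome.

    On success the result is stored under `key`. If every attempt raises
    TransientError, the last error message is stored under `<key>_error`
    and None is returned so the caller can stop gracefully.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            result = tool_call()
        except TransientError as exc:
            last_error = exc
            continue
        memory[key] = result
        return result
    memory[f"{key}_error"] = str(last_error)
    return None


if __name__ == "__main__":
    memory: Dict[str, Any] = {}
    attempts = {"count": 0}

    def flaky_draft() -> str:
        # Simulated tool call that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TransientError("temporary outage")
        return "draft-123"

    call_with_retry(flaky_draft, memory, "draft_id")
    print(memory)  # {'draft_id': 'draft-123'}
```

A dict keeps the sketch self-contained; in an agent loop the two writes to `memory` would instead be memory.store_item calls.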
