Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: anthropic/claude-3.5-haiku
Score: 54
Average duration: 8s
Average tokens: 2455
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_agents_sample_01      100        7s    1300
opper_agents_sample_02      100        9s    1548
opper_agents_sample_03       50        7s    1476
opper_agents_sample_04      100        9s    1708
opper_agents_sample_05      100        8s    1801
opper_agents_sample_06       50       10s    1843
opper_agents_sample_07       50        6s    1950
opper_agents_sample_08       50        9s    2086
opper_agents_sample_09      100        9s    3435
opper_agents_sample_10        0        7s    3300
opper_agents_sample_11      100       10s    3200
opper_agents_sample_12       50        9s    1574
opper_agents_sample_13        0       10s    1666
opper_agents_sample_14      100        6s    2105
opper_agents_sample_15        0        9s    1737
opper_agents_sample_16       50        7s    1573
opper_agents_sample_17        0        7s    1762
opper_agents_sample_18        0        9s    1812
opper_agents_sample_19      100        6s    4282
opper_agents_sample_20      100        7s    4317
opper_agents_sample_21      100       10s    4588
opper_agents_sample_22        0        9s    3352
opper_agents_sample_23        0       10s    4630
opper_agents_sample_24        0        7s    1885
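
The page does not state how the headline figures are derived from the per-sample results above. A minimal sketch, assuming each figure is a plain unweighted mean over the 24 samples rounded to the nearest integer, reproduces the reported score, average duration, and average tokens. The names `RESULTS` and `summarize` are illustrative and not part of any benchmark tooling.

```python
from collections import Counter
from statistics import mean

# (score, duration in seconds, tokens) for opper_agents_sample_01..24,
# copied in order from the per-sample results above.
RESULTS = [
    (100, 7, 1300), (100, 9, 1548), (50, 7, 1476), (100, 9, 1708),
    (100, 8, 1801), (50, 10, 1843), (50, 6, 1950), (50, 9, 2086),
    (100, 9, 3435), (0, 7, 3300), (100, 10, 3200), (50, 9, 1574),
    (0, 10, 1666), (100, 6, 2105), (0, 9, 1737), (50, 7, 1573),
    (0, 7, 1762), (0, 9, 1812), (100, 6, 4282), (100, 7, 4317),
    (100, 10, 4588), (0, 9, 3352), (0, 10, 4630), (0, 7, 1885),
]


def summarize(rows):
    """Return rounded unweighted means (assumed aggregation, not confirmed)."""
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(RESULTS)
# How many samples landed on each score value.
distribution = Counter(score for score, _, _ in RESULTS)

print(summary)       # {'score': 54, 'duration_s': 8, 'tokens': 2455}
print(distribution)  # 10 samples at 100, 6 at 50, 8 at 0
```

Under this assumption the overall score of 54 comes from 10 samples scoring 100, 6 scoring 50, and 8 scoring 0.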