Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: anthropic/claude-3.5-sonnet
Score: 85
Average duration: 11s
Average tokens: 2575
Average cost: $0.00

Score  Duration  Tokens  Sample
100    10s       1411    opper_agents_sample_01
100    12s       1645    opper_agents_sample_02
100    8s        1498    opper_agents_sample_03
100    12s       1854    opper_agents_sample_04
100    11s       1969    opper_agents_sample_05
50     12s       1958    opper_agents_sample_06
100    9s        2018    opper_agents_sample_07
100    10s       2143    opper_agents_sample_08
100    10s       3420    opper_agents_sample_09
100    13s       3503    opper_agents_sample_10
100    11s       3322    opper_agents_sample_11
100    11s       1607    opper_agents_sample_12
100    10s       1685    opper_agents_sample_13
100    7s        2165    opper_agents_sample_14
0      10s       1846    opper_agents_sample_15
100    9s        1681    opper_agents_sample_16
0      11s       1876    opper_agents_sample_17
100    11s       1843    opper_agents_sample_18
100    9s        4419    opper_agents_sample_19
100    10s       4512    opper_agents_sample_20
100    12s       4615    opper_agents_sample_21
50     12s       3426    opper_agents_sample_22
50     21s       5461    opper_agents_sample_23
100    9s        1918    opper_agents_sample_24
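
The summary figures (score 85, average duration 11s, average tokens 2575) are consistent with plain arithmetic means over the 24 per-sample results, rounded to the nearest integer. The page does not state how the aggregates are computed, so the sketch below is only a check under that assumption; the `summarize` helper and the dictionary keys are hypothetical names chosen for illustration.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..24,
# copied from the per-sample results listed above.
rows = [
    (100, 10, 1411), (100, 12, 1645), (100, 8, 1498), (100, 12, 1854),
    (100, 11, 1969), (50, 12, 1958), (100, 9, 2018), (100, 10, 2143),
    (100, 10, 3420), (100, 13, 3503), (100, 11, 3322), (100, 11, 1607),
    (100, 10, 1685), (100, 7, 2165), (0, 10, 1846), (100, 9, 1681),
    (0, 11, 1876), (100, 11, 1843), (100, 9, 4419), (100, 10, 4512),
    (100, 12, 4615), (50, 12, 3426), (50, 21, 5461), (100, 9, 1918),
]


def summarize(rows):
    """Average each column and round to the nearest integer.

    Assumption: the page aggregates with an unweighted mean per column.
    """
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(rows)
print(summary)  # {'score': 85, 'duration_s': 11, 'tokens': 2575}
```

The unrounded means are roughly 85.4, 10.8, and 2574.8. Only five samples fall short of a full score: samples 15 and 17 score 0, and samples 06, 22, and 23 score 50.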