Agents

The agent reasoning and tool selection tasks test planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: anthropic/claude-sonnet-4
Score: 90
Average duration: 16s
Average tokens: 2687
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_agents_sample_01    100    15s       1478
opper_agents_sample_02    100    16s       1705
opper_agents_sample_03    100    15s       1665
opper_agents_sample_04    100    21s       2102
opper_agents_sample_05    100    15s       2003
opper_agents_sample_06    100    18s       2106
opper_agents_sample_07    100    15s       2109
opper_agents_sample_08    100    14s       2173
opper_agents_sample_09    100    14s       3533
opper_agents_sample_10    100    17s       3617
opper_agents_sample_11    100    17s       3572
opper_agents_sample_12    100    18s       1786
opper_agents_sample_13    100    18s       1807
opper_agents_sample_14    100    14s       2272
opper_agents_sample_15    0      18s       1985
opper_agents_sample_16    100    18s       1940
opper_agents_sample_17    100    18s       2121
opper_agents_sample_18    100    13s       1863
opper_agents_sample_19    100    11s       4508
opper_agents_sample_20    100    14s       4660
opper_agents_sample_21    100    19s       5089
opper_agents_sample_22    50     19s       3649
opper_agents_sample_23    0      15s       4824
opper_agents_sample_24    100    9s        1925
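The headline figures can be cross-checked against the 24 per-sample results. The sketch below assumes that the overall score, average duration, and average tokens are plain unweighted means over the samples, rounded to the nearest integer; the page does not state this, and the name `rows` and the tuple layout are illustrative only. Under that assumption, the per-sample values reproduce the reported 90, 16s, and 2687.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..24,
# copied from the per-sample results in order.
rows = [
    (100, 15, 1478), (100, 16, 1705), (100, 15, 1665), (100, 21, 2102),
    (100, 15, 2003), (100, 18, 2106), (100, 15, 2109), (100, 14, 2173),
    (100, 14, 3533), (100, 17, 3617), (100, 17, 3572), (100, 18, 1786),
    (100, 18, 1807), (100, 14, 2272), (0, 18, 1985), (100, 18, 1940),
    (100, 18, 2121), (100, 13, 1863), (100, 11, 4508), (100, 14, 4660),
    (100, 19, 5089), (50, 19, 3649), (0, 15, 4824), (100, 9, 1925),
]

# Assumption: the summary values are unweighted means over all samples.
avg_score = mean(r[0] for r in rows)      # 2150 / 24
avg_duration = mean(r[1] for r in rows)   # 381 / 24
avg_tokens = mean(r[2] for r in rows)     # 64492 / 24

summary = (round(avg_score), round(avg_duration), round(avg_tokens))
print(summary)  # (90, 16, 2687)
```

Only three samples fall short of a full score (15 and 23 at 0, 22 at 50), which accounts for the whole gap between 100 and the reported 90.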