Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket-triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: anthropic/claude-sonnet-4.5
Score: 85
Average duration: 14s
Average tokens: 2913
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_agents_sample_01   100    11s       1695
opper_agents_sample_02   100    12s       1816
opper_agents_sample_03   100    11s       1659
opper_agents_sample_04   100    19s       2262
opper_agents_sample_05   100    15s       2284
opper_agents_sample_06   100    15s       2366
opper_agents_sample_07   100    8s        2232
opper_agents_sample_08   100    14s       2496
opper_agents_sample_09   100    15s       3835
opper_agents_sample_10   100    13s       3837
opper_agents_sample_11   100    16s       3715
opper_agents_sample_12   100    12s       1861
opper_agents_sample_13   0      19s       2100
opper_agents_sample_14   100    8s        2397
opper_agents_sample_15   100    14s       2096
opper_agents_sample_16   100    13s       1987
opper_agents_sample_17   0      15s       2318
opper_agents_sample_18   100    17s       2455
opper_agents_sample_19   100    12s       4797
opper_agents_sample_20   100    11s       4777
opper_agents_sample_21   100    15s       5199
opper_agents_sample_22   50     26s       4484
opper_agents_sample_23   0      16s       5131
opper_agents_sample_24   100    8s        2121
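
The summary figures at the top of the page appear to be plain averages over the 24 per-sample results. The sketch below is a minimal check of that reading, not the benchmark's own code. It assumes the three numbers listed for each sample are score, duration in seconds, and token count (the page does not label them), and the variable names are illustrative only.

```python
# Per-sample results in order opper_agents_sample_01 .. opper_agents_sample_24.
# Assumed column meaning: (score, duration in seconds, tokens).
rows = [
    (100, 11, 1695), (100, 12, 1816), (100, 11, 1659), (100, 19, 2262),
    (100, 15, 2284), (100, 15, 2366), (100, 8, 2232), (100, 14, 2496),
    (100, 15, 3835), (100, 13, 3837), (100, 16, 3715), (100, 12, 1861),
    (0, 19, 2100), (100, 8, 2397), (100, 14, 2096), (100, 13, 1987),
    (0, 15, 2318), (100, 17, 2455), (100, 12, 4797), (100, 11, 4777),
    (100, 15, 5199), (50, 26, 4484), (0, 16, 5131), (100, 8, 2121),
]

n = len(rows)
avg_score = sum(r[0] for r in rows) / n
avg_duration = sum(r[1] for r in rows) / n
avg_tokens = sum(r[2] for r in rows) / n

# Samples that did not receive a full score, by 1-based sample number.
not_full = [i for i, r in enumerate(rows, start=1) if r[0] < 100]

print(round(avg_score), round(avg_duration), round(avg_tokens))  # 85 14 2913
print(not_full)  # [13, 17, 22, 23]
```

Under these assumptions the rounded averages match the reported score of 85, average duration of 14s, and average of 2913 tokens. The entire gap from a perfect score comes from samples 13, 17, and 23 (score 0) and sample 22 (score 50).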