Agents

The agent reasoning and tool selection benchmark tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: anthropic/claude-opus-4
Score: 83
Average duration: 48s
Average tokens: 2643
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_agents_sample_01 | 100 | 40s | 1371 |
| opper_agents_sample_02 | 100 | 1m 24s | 1748 |
| opper_agents_sample_03 | 100 | 40s | 1611 |
| opper_agents_sample_04 | 100 | 42s | 1918 |
| opper_agents_sample_05 | 100 | 25s | 1833 |
| opper_agents_sample_06 | 100 | 1m 41s | 2189 |
| opper_agents_sample_07 | 100 | 48s | 2115 |
| opper_agents_sample_08 | 100 | 27s | 2216 |
| opper_agents_sample_09 | 100 | 48s | 3731 |
| opper_agents_sample_10 | 100 | 30s | 3481 |
| opper_agents_sample_11 | 100 | 38s | 3484 |
| opper_agents_sample_12 | 50 | 33s | 1629 |
| opper_agents_sample_13 | 100 | 1m 43s | 1884 |
| opper_agents_sample_14 | 100 | 22s | 2157 |
| opper_agents_sample_15 | 100 | 44s | 1987 |
| opper_agents_sample_16 | 100 | 1m 19s | 1895 |
| opper_agents_sample_17 | 0 | 33s | 2005 |
| opper_agents_sample_18 | 100 | 1m 34s | 1997 |
| opper_agents_sample_19 | 100 | 16s | 4385 |
| opper_agents_sample_20 | 100 | 19s | 4540 |
| opper_agents_sample_21 | 100 | 33s | 4784 |
| opper_agents_sample_22 | 50 | 1m 24s | 3562 |
| opper_agents_sample_23 | 0 | 40s | 4943 |
| opper_agents_sample_24 | 0 | 20s | 1978 |
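
The headline figures can be cross-checked against the per-sample results listed above. The page does not state how the aggregates are computed, so the sketch below assumes each one is a plain arithmetic mean over the 24 samples, rounded to the nearest integer. The names `RESULTS` and `summarize` are illustrative only and are not part of any benchmark tooling.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01 through 24,
# transcribed from the per-sample results above ("1m 24s" -> 84, etc.).
RESULTS = [
    (100, 40, 1371), (100, 84, 1748), (100, 40, 1611), (100, 42, 1918),
    (100, 25, 1833), (100, 101, 2189), (100, 48, 2115), (100, 27, 2216),
    (100, 48, 3731), (100, 30, 3481), (100, 38, 3484), (50, 33, 1629),
    (100, 103, 1884), (100, 22, 2157), (100, 44, 1987), (100, 79, 1895),
    (0, 33, 2005), (100, 94, 1997), (100, 16, 4385), (100, 19, 4540),
    (100, 33, 4784), (50, 84, 3562), (0, 40, 4943), (0, 20, 1978),
]


def summarize(rows):
    """Return rounded means of score, duration, and tokens.

    Assumption: the aggregates are simple unweighted means.
    """
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "avg_duration_s": round(mean(durations)),
        "avg_tokens": round(mean(tokens)),
    }


summary = summarize(RESULTS)
print(summary)
# {'score': 83, 'avg_duration_s': 48, 'avg_tokens': 2643}
```

Under that assumption the computed values agree with the reported score of 83, average duration of 48s, and average of 2643 tokens. The score falls below 100 because of two partial results (samples 12 and 22) and three failures (samples 17, 23, and 24).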