Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: openai/o3
Score: 92
Average duration: 29s
Average tokens: 2320
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_agents_sample_01 | 100 | 30s | 1365 |
| opper_agents_sample_02 | 100 | 15s | 1444 |
| opper_agents_sample_03 | 100 | 18s | 1680 |
| opper_agents_sample_04 | 100 | 19s | 1623 |
| opper_agents_sample_05 | 100 | 12s | 1730 |
| opper_agents_sample_06 | 100 | 19s | 1829 |
| opper_agents_sample_07 | 100 | 11s | 1872 |
| opper_agents_sample_08 | 100 | 17s | 2040 |
| opper_agents_sample_09 | 100 | 19s | 3006 |
| opper_agents_sample_10 | 100 | 1m 28s | 2991 |
| opper_agents_sample_11 | 100 | 40s | 2811 |
| opper_agents_sample_12 | 100 | 26s | 1852 |
| opper_agents_sample_13 | 100 | 15s | 1577 |
| opper_agents_sample_14 | 100 | 15s | 2017 |
| opper_agents_sample_15 | 50 | 19s | 1738 |
| opper_agents_sample_16 | 100 | 1m 33s | 1470 |
| opper_agents_sample_17 | 100 | 17s | 1972 |
| opper_agents_sample_18 | 100 | 19s | 1706 |
| opper_agents_sample_19 | 100 | 17s | 3620 |
| opper_agents_sample_20 | 100 | 17s | 3745 |
| opper_agents_sample_21 | 100 | 24s | 4050 |
| opper_agents_sample_22 | 50 | 29s | 3685 |
| opper_agents_sample_23 | 0 | 1m 36s | 4040 |
| opper_agents_sample_24 | 100 | 19s | 1805 |
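
The headline figures can be cross-checked against the 24 per-sample results listed above. The sketch below assumes each headline number is the unweighted mean of the per-sample values, rounded to the nearest integer. The page does not state how the figures are aggregated, so this is an assumption, and the names `SAMPLES`, `round_half_up`, and `summarize` are illustrative only.

```python
import math

# Per-sample results copied from the listing above, in order (sample_01 to sample_24):
# (score, duration in seconds, tokens). Durations such as "1m 28s" are converted to seconds.
SAMPLES = [
    (100, 30, 1365), (100, 15, 1444), (100, 18, 1680), (100, 19, 1623),
    (100, 12, 1730), (100, 19, 1829), (100, 11, 1872), (100, 17, 2040),
    (100, 19, 3006), (100, 88, 2991), (100, 40, 2811), (100, 26, 1852),
    (100, 15, 1577), (100, 15, 2017), (50, 19, 1738), (100, 93, 1470),
    (100, 17, 1972), (100, 19, 1706), (100, 17, 3620), (100, 17, 3745),
    (100, 24, 4050), (50, 29, 3685), (0, 96, 4040), (100, 19, 1805),
]


def round_half_up(value):
    """Round to the nearest integer, with .5 rounding up."""
    return math.floor(value + 0.5)


def summarize(samples):
    """Unweighted means of score, duration, and tokens (assumed aggregation)."""
    n = len(samples)
    return {
        "score": round_half_up(sum(s for s, _, _ in samples) / n),
        "duration_s": round_half_up(sum(d for _, d, _ in samples) / n),
        "tokens": round_half_up(sum(t for _, _, t in samples) / n),
    }


print(summarize(SAMPLES))
# → {'score': 92, 'duration_s': 29, 'tokens': 2320}
```

Under this assumption the recomputed values match the reported score of 92, average duration of 29s, and average of 2320 tokens. The score falls short of 100 only because of the two partial scores (samples 15 and 22) and the single failure (sample 23).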