Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket-triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks in the benchmark because they require open-ended reasoning and decision-making.

Model: openai/o3-mini
Score: 63
Average duration: 19s
Average tokens: 2489
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100       19s    1613
opper_agents_sample_02    100       19s    1395
opper_agents_sample_03     50       19s    1596
opper_agents_sample_04     50       18s    1739
opper_agents_sample_05    100       18s    2187
opper_agents_sample_06    100       17s    1853
opper_agents_sample_07    100       17s    2068
opper_agents_sample_08    100       18s    2108
opper_agents_sample_09      0       19s    3354
opper_agents_sample_10      0       17s    3296
opper_agents_sample_11    100       19s    3144
opper_agents_sample_12    100       17s    1812
opper_agents_sample_13      0       18s    1889
opper_agents_sample_14    100       18s    2129
opper_agents_sample_15      0       18s    2165
opper_agents_sample_16    100       19s    1626
opper_agents_sample_17      0       17s    1737
opper_agents_sample_18    100       19s    2164
opper_agents_sample_19    100       17s    3793
opper_agents_sample_20    100       17s    3838
opper_agents_sample_21     50       35s    4787
opper_agents_sample_22     50       18s    3412
opper_agents_sample_23      0       19s    4103
opper_agents_sample_24      0       18s    1921
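
The headline figures for openai/o3-mini can be cross-checked against the per-sample rows. The sketch below assumes, since the page does not say so, that each headline number is a plain mean over the 24 samples, rounded half up to an integer. The names `RESULTS`, `round_half_up`, and `summarize` are illustrative and not part of any benchmark tooling.

```python
import math
from collections import Counter

# (sample suffix, score, duration in seconds, tokens), copied from the rows above.
RESULTS = [
    ("01", 100, 19, 1613), ("02", 100, 19, 1395), ("03", 50, 19, 1596),
    ("04", 50, 18, 1739), ("05", 100, 18, 2187), ("06", 100, 17, 1853),
    ("07", 100, 17, 2068), ("08", 100, 18, 2108), ("09", 0, 19, 3354),
    ("10", 0, 17, 3296), ("11", 100, 19, 3144), ("12", 100, 17, 1812),
    ("13", 0, 18, 1889), ("14", 100, 18, 2129), ("15", 0, 18, 2165),
    ("16", 100, 19, 1626), ("17", 0, 17, 1737), ("18", 100, 19, 2164),
    ("19", 100, 17, 3793), ("20", 100, 17, 3838), ("21", 50, 35, 4787),
    ("22", 50, 18, 3412), ("23", 0, 19, 4103), ("24", 0, 18, 1921),
]


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 going up.

    Python's built-in round() rounds halves to even, which would
    turn a mean of 62.5 into 62 instead of 63.
    """
    return math.floor(x + 0.5)


def summarize(rows):
    """Return the rounded means of score, duration, and tokens (assumed aggregation)."""
    n = len(rows)
    return {
        "score": round_half_up(sum(r[1] for r in rows) / n),
        "duration_s": round_half_up(sum(r[2] for r in rows) / n),
        "tokens": round_half_up(sum(r[3] for r in rows) / n),
    }


summary = summarize(RESULTS)
score_counts = Counter(r[1] for r in RESULTS)

print(summary)            # {'score': 63, 'duration_s': 19, 'tokens': 2489}
print(dict(score_counts)) # {100: 13, 50: 4, 0: 7}
```

Under that assumption the recomputed values match the summary: the mean score is 62.5 (displayed as 63), the mean duration is 18.75s (displayed as 19s), and the mean token count is about 2488.7 (displayed as 2489). The score breakdown is 13 samples at 100, 4 at 50, and 7 at 0.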