Agents

The AI agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are some of the most challenging tasks because they require open-ended reasoning and decision-making.

openai/o4-mini
Overall score: 88
Average duration: 19s
Average tokens: 2509
Average cost: $0.00

Per-sample results:

Sample                  Score  Duration  Tokens
opper_agents_sample_01    100       22s    1843
opper_agents_sample_02    100       21s    1557
opper_agents_sample_03     50       18s    1599
opper_agents_sample_04    100       14s    1669
opper_agents_sample_05    100       22s    2077
opper_agents_sample_06    100       21s    2199
opper_agents_sample_07    100       18s    2526
opper_agents_sample_08    100       21s    2078
opper_agents_sample_09    100       21s    3573
opper_agents_sample_10    100       18s    3009
opper_agents_sample_11    100       21s    3086
opper_agents_sample_12    100       21s    1593
opper_agents_sample_13    100       21s    1942
opper_agents_sample_14    100       13s    2052
opper_agents_sample_15    100       18s    1641
opper_agents_sample_16    100       18s    1578
opper_agents_sample_17      0       21s    1855
opper_agents_sample_18    100       18s    1838
opper_agents_sample_19    100       13s    3907
opper_agents_sample_20    100       13s    3756
opper_agents_sample_21    100       17s    4415
opper_agents_sample_22     50       50s    4497
opper_agents_sample_23      0       18s    4263
opper_agents_sample_24    100        7s    1655
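The headline figures for openai/o4-mini (score 88, 19s, 2509 tokens) are consistent with plain, unweighted means of the 24 per-sample rows. The sketch below recomputes them under two assumptions the page does not state: every sample carries equal weight, and results are rounded half-up to whole numbers. `summarize` and `round_half_up` are hypothetical helpers written for this illustration, not part of any Opper API. Average cost is left out because per-sample costs are not listed.

```python
import math
from statistics import mean

# Per-sample results transcribed from the table:
# (sample name, score, duration in seconds, tokens)
RESULTS = [
    ("opper_agents_sample_01", 100, 22, 1843),
    ("opper_agents_sample_02", 100, 21, 1557),
    ("opper_agents_sample_03", 50, 18, 1599),
    ("opper_agents_sample_04", 100, 14, 1669),
    ("opper_agents_sample_05", 100, 22, 2077),
    ("opper_agents_sample_06", 100, 21, 2199),
    ("opper_agents_sample_07", 100, 18, 2526),
    ("opper_agents_sample_08", 100, 21, 2078),
    ("opper_agents_sample_09", 100, 21, 3573),
    ("opper_agents_sample_10", 100, 18, 3009),
    ("opper_agents_sample_11", 100, 21, 3086),
    ("opper_agents_sample_12", 100, 21, 1593),
    ("opper_agents_sample_13", 100, 21, 1942),
    ("opper_agents_sample_14", 100, 13, 2052),
    ("opper_agents_sample_15", 100, 18, 1641),
    ("opper_agents_sample_16", 100, 18, 1578),
    ("opper_agents_sample_17", 0, 21, 1855),
    ("opper_agents_sample_18", 100, 18, 1838),
    ("opper_agents_sample_19", 100, 13, 3907),
    ("opper_agents_sample_20", 100, 13, 3756),
    ("opper_agents_sample_21", 100, 17, 4415),
    ("opper_agents_sample_22", 50, 50, 4497),
    ("opper_agents_sample_23", 0, 18, 4263),
    ("opper_agents_sample_24", 100, 7, 1655),
]


def round_half_up(x: float) -> int:
    """Round a non-negative number to the nearest integer, with .5 going up.

    Assumption: the page rounds half-up. Python's built-in round() uses
    round-half-to-even instead, which happens to give the same result here.
    """
    return math.floor(x + 0.5)


def summarize(rows):
    """Unweighted mean of each column, plus a count of full-score samples."""
    return {
        "score": round_half_up(mean(r[1] for r in rows)),       # 2100 / 24 = 87.5
        "duration_s": round_half_up(mean(r[2] for r in rows)),  # 465 / 24 = 19.375
        "tokens": round_half_up(mean(r[3] for r in rows)),      # 60208 / 24 ~ 2508.67
        "full_marks": sum(1 for r in rows if r[1] == 100),
    }


print(summarize(RESULTS))
# {'score': 88, 'duration_s': 19, 'tokens': 2509, 'full_marks': 20}
```

Because the overall score is a simple mean, the two zero-score samples (17 and 23) and the two half-credit samples (03 and 22) account for the entire gap between 88 and a perfect 100.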