Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures. These are some of the most challenging tasks because they require open-ended reasoning and decision-making.

Model: openai/gpt-5-mini
Score: 83
Average duration: 24s
Average tokens: 3307
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_agents_sample_01    100    12s       1729
opper_agents_sample_02    100    20s       2113
opper_agents_sample_03    100    17s       2099
opper_agents_sample_04    100    13s       2047
opper_agents_sample_05    100    23s       2425
opper_agents_sample_06    100    38s       3610
opper_agents_sample_07    100    17s       2415
opper_agents_sample_08    100    27s       3377
opper_agents_sample_09    100    16s       3780
opper_agents_sample_10    100    13s       3447
opper_agents_sample_11    100    28s       4168
opper_agents_sample_12    100    22s       2386
opper_agents_sample_13    0      23s       2998
opper_agents_sample_14    100    10s       2116
opper_agents_sample_15    50     28s       2323
opper_agents_sample_16    100    21s       2302
opper_agents_sample_17    0      32s       3430
opper_agents_sample_18    100    19s       2707
opper_agents_sample_19    100    23s       4508
opper_agents_sample_20    100    20s       4490
opper_agents_sample_21    100    1m 2s     7417
opper_agents_sample_22    50     38s       5395
opper_agents_sample_23    0      38s       5941
opper_agents_sample_24    100    13s       2136
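
The summary figures near the top of the page are consistent with plain arithmetic means over the 24 per-sample results. The Python sketch below recomputes them under that assumption; the page does not state how the aggregates are actually produced, and the variable names are illustrative only.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01 through 24,
# copied from the per-sample results. "1m 2s" is entered as 62 seconds.
results = [
    (100, 12, 1729), (100, 20, 2113), (100, 17, 2099), (100, 13, 2047),
    (100, 23, 2425), (100, 38, 3610), (100, 17, 2415), (100, 27, 3377),
    (100, 16, 3780), (100, 13, 3447), (100, 28, 4168), (100, 22, 2386),
    (0, 23, 2998), (100, 10, 2116), (50, 28, 2323), (100, 21, 2302),
    (0, 32, 3430), (100, 19, 2707), (100, 23, 4508), (100, 20, 4490),
    (100, 62, 7417), (50, 38, 5395), (0, 38, 5941), (100, 13, 2136),
]

# Assumption: each headline number is an unweighted mean, rounded to an integer.
avg_score = round(mean(score for score, _, _ in results))
avg_duration = round(mean(seconds for _, seconds, _ in results))
avg_tokens = round(mean(tokens for _, _, tokens in results))

# 1-based sample numbers that did not receive the full score of 100.
imperfect = [i for i, (score, _, _) in enumerate(results, start=1) if score < 100]

print(avg_score, avg_duration, avg_tokens)  # 83 24 3307
print(imperfect)                            # [13, 15, 17, 22, 23]
```

Under this reading, the overall score of 83 comes from 19 full-score samples, two half-score samples (15 and 22), and three zero-score samples (13, 17, and 23).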