Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are among the most challenging tasks in the benchmark because they require open-ended reasoning and decision-making.

Model: openai/gpt-5-nano
Overall score: 65
Average duration: 21s
Average tokens: 4459
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_agents_sample_01      50       17s    2774
opper_agents_sample_02     100       21s    3075
opper_agents_sample_03     100       20s    3130
opper_agents_sample_04       0       21s    3069
opper_agents_sample_05     100       17s    3254
opper_agents_sample_06      50       21s    3790
opper_agents_sample_07     100       19s    3831
opper_agents_sample_08     100       17s    3769
opper_agents_sample_09     100       21s    4309
opper_agents_sample_10       0       21s    4515
opper_agents_sample_11     100       21s    5899
opper_agents_sample_12     100       25s    4379
opper_agents_sample_13       0       19s    4234
opper_agents_sample_14     100       19s    2909
opper_agents_sample_15       0       21s    3935
opper_agents_sample_16     100       21s    3302
opper_agents_sample_17       0       21s    3838
opper_agents_sample_18     100       22s    4400
opper_agents_sample_19     100       19s    5242
opper_agents_sample_20     100       17s    5352
opper_agents_sample_21     100       34s    9132
opper_agents_sample_22      50       31s    8683
opper_agents_sample_23       0       19s    6862
opper_agents_sample_24       0       19s    3325
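
The summary figures can be checked against the per-sample results above. The following is a minimal sketch, assuming the overall score and the averages are plain arithmetic means rounded to the nearest integer; the source does not state how they are aggregated, and the variable names are illustrative only.

```python
# Per-sample values copied from the results above, in order sample_01 .. sample_24.
scores = [50, 100, 100, 0, 100, 50, 100, 100, 100, 0, 100, 100,
          0, 100, 0, 100, 0, 100, 100, 100, 100, 50, 0, 0]
durations_s = [17, 21, 20, 21, 17, 21, 19, 17, 21, 21, 21, 25,
               19, 19, 21, 21, 21, 22, 19, 17, 34, 31, 19, 19]
tokens = [2774, 3075, 3130, 3069, 3254, 3790, 3831, 3769, 4309, 4515, 5899, 4379,
          4234, 2909, 3935, 3302, 3838, 4400, 5242, 5352, 9132, 8683, 6862, 3325]


def mean(values):
    """Arithmetic mean (assumed aggregation; not confirmed by the source)."""
    return sum(values) / len(values)


avg_score = round(mean(scores))
avg_duration_s = round(mean(durations_s))
avg_tokens = round(mean(tokens))

print(f"score={avg_score} duration={avg_duration_s}s tokens={avg_tokens}")
```

Under that assumption the script reproduces the reported summary: a score of 65, an average duration of 21s, and an average of 4459 tokens.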