Agents

The AI agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: openai/gpt-5
Score: 88
Average duration: 29s
Average tokens: 3455
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_agents_sample_01   100    14s       1803
opper_agents_sample_02   100    16s       2104
opper_agents_sample_03   100    33s       2389
opper_agents_sample_04   100    18s       1825
opper_agents_sample_05   100    23s       2600
opper_agents_sample_06   100    29s       3207
opper_agents_sample_07   100    12s       2524
opper_agents_sample_08   100    31s       3764
opper_agents_sample_09   100    18s       3683
opper_agents_sample_10   100    20s       3382
opper_agents_sample_11   100    55s       4807
opper_agents_sample_12   100    31s       2856
opper_agents_sample_13   0      34s       3514
opper_agents_sample_14   100    22s       2194
opper_agents_sample_15   50     25s       2380
opper_agents_sample_16   100    14s       1976
opper_agents_sample_17   100    24s       2863
opper_agents_sample_18   100    24s       2951
opper_agents_sample_19   100    30s       4337
opper_agents_sample_20   100    24s       4318
opper_agents_sample_21   100    1m 15s    7829
opper_agents_sample_22   50     1m 2s     7184
opper_agents_sample_23   0      48s       5777
opper_agents_sample_24   100    18s       2662
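
As a rough cross-check, the summary figures can be recomputed from the per-sample results above. The Python sketch below assumes that each sample's three values are its score, its duration (converted here to seconds), and its token count, and that the reported averages are plain arithmetic means rounded to whole numbers. The names `RESULTS` and `summarize` are illustrative and do not come from the source.

```python
# Per-sample results transcribed from the listing above, in sample order
# (opper_agents_sample_01 .. _24): (score, duration in seconds, tokens).
RESULTS = [
    (100, 14, 1803), (100, 16, 2104), (100, 33, 2389), (100, 18, 1825),
    (100, 23, 2600), (100, 29, 3207), (100, 12, 2524), (100, 31, 3764),
    (100, 18, 3683), (100, 20, 3382), (100, 55, 4807), (100, 31, 2856),
    (0, 34, 3514), (100, 22, 2194), (50, 25, 2380), (100, 14, 1976),
    (100, 24, 2863), (100, 24, 2951), (100, 30, 4337), (100, 24, 4318),
    (100, 75, 7829), (50, 62, 7184), (0, 48, 5777), (100, 18, 2662),
]


def summarize(rows):
    """Return unrounded means plus the 1-based indices of samples scoring below 100."""
    n = len(rows)
    return {
        "mean_score": sum(score for score, _, _ in rows) / n,
        "mean_seconds": sum(seconds for _, seconds, _ in rows) / n,
        "mean_tokens": sum(tokens for _, _, tokens in rows) / n,
        "below_full": [i for i, (score, _, _) in enumerate(rows, start=1) if score < 100],
    }


summary = summarize(RESULTS)
print(f"mean score:    {summary['mean_score']:.1f}")
print(f"mean duration: {summary['mean_seconds']:.1f}s")
print(f"mean tokens:   {summary['mean_tokens']:.1f}")
print(f"samples below 100: {summary['below_full']}")
```

Under those assumptions the means come out at 87.5, about 29.2 seconds, and 3455.375 tokens, which agree with the reported 88, 29s, and 3455 after rounding. The samples that did not score 100 are 13, 15, 22, and 23.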