Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are some of the most challenging tasks, because they require open-ended reasoning and decision-making.

Model: openai/gpt-4o
Score: 67
Average duration: 18s
Average tokens: 2117
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100       12s    1173
opper_agents_sample_02    100       19s    1374
opper_agents_sample_03      0       19s    1335
opper_agents_sample_04    100       18s    1474
opper_agents_sample_05    100       19s    1644
opper_agents_sample_06    100       19s    1810
opper_agents_sample_07    100       19s    1764
opper_agents_sample_08    100       12s    1758
opper_agents_sample_09    100       16s    2871
opper_agents_sample_10      0       19s    2853
opper_agents_sample_11    100       22s    2949
opper_agents_sample_12    100       19s    1309
opper_agents_sample_13      0       16s    1366
opper_agents_sample_14    100       12s    1791
opper_agents_sample_15      0       12s    1481
opper_agents_sample_16    100       16s    1454
opper_agents_sample_17      0       18s    1502
opper_agents_sample_18    100       17s    1551
opper_agents_sample_19    100       17s    3634
opper_agents_sample_20    100       21s    3553
opper_agents_sample_21    100       35s    3925
opper_agents_sample_22      0       30s    2900
opper_agents_sample_23      0       17s    3709
opper_agents_sample_24      0        9s    1625
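
The summary figures can be checked against the per-sample results above. The sketch below assumes (this is not stated on the page) that the overall score, duration, and token count are plain unweighted means over the 24 samples, rounded to the nearest integer. Under that assumption the numbers agree with the reported summary. The variable names are illustrative only.

```python
# Per-sample results copied from the listing above:
# (score, duration in seconds, tokens), in order sample_01 .. sample_24.
rows = [
    (100, 12, 1173), (100, 19, 1374), (0, 19, 1335), (100, 18, 1474),
    (100, 19, 1644), (100, 19, 1810), (100, 19, 1764), (100, 12, 1758),
    (100, 16, 2871), (0, 19, 2853), (100, 22, 2949), (100, 19, 1309),
    (0, 16, 1366), (100, 12, 1791), (0, 12, 1481), (100, 16, 1454),
    (0, 18, 1502), (100, 17, 1551), (100, 17, 3634), (100, 21, 3553),
    (100, 35, 3925), (0, 30, 2900), (0, 17, 3709), (0, 9, 1625),
]

n = len(rows)

# Assumption: each summary value is an unweighted mean, rounded to an integer.
avg_score = round(sum(score for score, _, _ in rows) / n)
avg_duration = round(sum(duration for _, duration, _ in rows) / n)
avg_tokens = round(sum(tokens for _, _, tokens in rows) / n)

# Number of samples that received the full score of 100.
passed = sum(1 for score, _, _ in rows if score == 100)

print(f"samples: {n}, passed: {passed}")
print(f"score: {avg_score}, duration: {avg_duration}s, tokens: {avg_tokens}")
```

Since every sample scores either 0 or 100, the overall score is simply the pass rate: 16 of 24 samples, or about 67.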