Agents

The AI agent reasoning and tool selection benchmark tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: openai/o1-mini
Score: 69
Average duration: 19s
Average tokens: 3074
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_agents_sample_01     100       19s    2146
opper_agents_sample_02     100       18s    2502
opper_agents_sample_03     100       19s    3228
opper_agents_sample_04     100       19s    2309
opper_agents_sample_05     100       18s    2587
opper_agents_sample_06      50       18s    2949
opper_agents_sample_07      50       19s    2586
opper_agents_sample_08       0       18s    2561
opper_agents_sample_09     100       16s    3463
opper_agents_sample_10       0       18s    4020
opper_agents_sample_11     100       20s    3488
opper_agents_sample_12     100       20s    2560
opper_agents_sample_13     100       20s    2180
opper_agents_sample_14     100       17s    2691
opper_agents_sample_15       0       20s    2717
opper_agents_sample_16     100       20s    2017
opper_agents_sample_17       0       20s    2568
opper_agents_sample_18     100       19s    2506
opper_agents_sample_19     100       20s    4625
opper_agents_sample_20     100       19s    4408
opper_agents_sample_21       0       17s    5189
opper_agents_sample_22      50       19s    3702
opper_agents_sample_23       0       20s    4281
opper_agents_sample_24     100       19s    2497
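
The summary figures for this run appear to be plain averages of the per-sample results. The following is a minimal sketch of that arithmetic, assuming the overall score, duration, and token count are each an unweighted mean rounded to the nearest integer; the page itself does not state how the summary is computed, and the names `RESULTS` and `summarize` are hypothetical.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..24,
# copied from the per-sample results on this page.
RESULTS = [
    (100, 19, 2146), (100, 18, 2502), (100, 19, 3228), (100, 19, 2309),
    (100, 18, 2587), (50, 18, 2949), (50, 19, 2586), (0, 18, 2561),
    (100, 16, 3463), (0, 18, 4020), (100, 20, 3488), (100, 20, 2560),
    (100, 20, 2180), (100, 17, 2691), (0, 20, 2717), (100, 20, 2017),
    (0, 20, 2568), (100, 19, 2506), (100, 20, 4625), (100, 19, 4408),
    (0, 17, 5189), (50, 19, 3702), (0, 20, 4281), (100, 19, 2497),
]


def summarize(results):
    """Return the rounded mean score, duration (s), and token count.

    Assumption: every sample is weighted equally.
    """
    scores, durations, tokens = zip(*results)
    return round(mean(scores)), round(mean(durations)), round(mean(tokens))


if __name__ == "__main__":
    score, duration, tokens = summarize(RESULTS)
    print(f"Score: {score}")
    print(f"Average duration: {duration}s")
    print(f"Average tokens: {tokens}")
```

Under that assumption the per-sample values average to 68.75, 18.83s, and 3074.17 tokens, which round to the reported 69, 19s, and 3074.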