Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: openai/o1
Score: 75
Average duration: 20s
Average tokens: 2899
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_agents_sample_01   100    19s       1683
opper_agents_sample_02   100    19s       1813
opper_agents_sample_03   100    23s       2273
opper_agents_sample_04   100    23s       2973
opper_agents_sample_05   100    23s       2353
opper_agents_sample_06   50     23s       3218
opper_agents_sample_07   50     23s       2714
opper_agents_sample_08   100    9s        2054
opper_agents_sample_09   100    23s       3581
opper_agents_sample_10   100    24s       3897
opper_agents_sample_11   100    22s       3384
opper_agents_sample_12   100    19s       2007
opper_agents_sample_13   100    22s       2814
opper_agents_sample_14   100    12s       2327
opper_agents_sample_15   0      23s       2838
opper_agents_sample_16   100    22s       2130
opper_agents_sample_17   0      18s       2232
opper_agents_sample_18   100    18s       1995
opper_agents_sample_19   100    18s       3837
opper_agents_sample_20   100    22s       4131
opper_agents_sample_21   100    34s       5521
opper_agents_sample_22   0      18s       3711
opper_agents_sample_23   0      12s       3932
opper_agents_sample_24   0      22s       2150
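
The headline figures appear consistent with plain arithmetic means over the 24 per-sample results. The sketch below checks this under two assumptions that the page does not state: that each sample contributes a (score, duration, tokens) triple, and that the summary values are rounded to the nearest integer. The names `samples` and `summarize` are illustrative only and are not part of any benchmark tooling.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..24,
# copied from the per-sample results above.
samples = [
    (100, 19, 1683), (100, 19, 1813), (100, 23, 2273), (100, 23, 2973),
    (100, 23, 2353), (50, 23, 3218), (50, 23, 2714), (100, 9, 2054),
    (100, 23, 3581), (100, 24, 3897), (100, 22, 3384), (100, 19, 2007),
    (100, 22, 2814), (100, 12, 2327), (0, 23, 2838), (100, 22, 2130),
    (0, 18, 2232), (100, 18, 1995), (100, 18, 3837), (100, 22, 4131),
    (100, 34, 5521), (0, 18, 3711), (0, 12, 3932), (0, 22, 2150),
]


def summarize(rows):
    """Return (mean score, mean duration, mean tokens), each rounded to an integer.

    Assumption: the summary is an unweighted mean over all samples.
    """
    scores, durations, tokens = zip(*rows)
    return round(mean(scores)), round(mean(durations)), round(mean(tokens))


print(summarize(samples))  # (75, 20, 2899)
```

Under these assumptions the result matches the reported score of 75, average duration of 20s, and average of 2899 tokens. The average cost cannot be checked the same way, since no per-sample cost is listed.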