Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows in which a model must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: groq/gpt-oss-20b
Score: 73
Average duration: 3s
Average tokens: 2443
Average cost: $0.00

Sample                   Score   Duration   Tokens
opper_agents_sample_01   100     3s         1501
opper_agents_sample_02   100     3s         1515
opper_agents_sample_03   50      3s         1784
opper_agents_sample_04   100     3s         1645
opper_agents_sample_05   100     3s         1770
opper_agents_sample_06   50      3s         1935
opper_agents_sample_07   100     3s         1855
opper_agents_sample_08   0       3s         2421
opper_agents_sample_09   100     3s         3131
opper_agents_sample_10   100     3s         3241
opper_agents_sample_11   100     3s         3067
opper_agents_sample_12   100     3s         1811
opper_agents_sample_13   0       3s         1802
opper_agents_sample_14   100     3s         2020
opper_agents_sample_15   0       3s         1716
opper_agents_sample_16   100     3s         1463
opper_agents_sample_17   0       3s         1987
opper_agents_sample_18   100     3s         1956
opper_agents_sample_19   100     3s         3725
opper_agents_sample_20   100     3s         4104
opper_agents_sample_21   100     4s         4508
opper_agents_sample_22   50      3s         3584
opper_agents_sample_23   0       3s         4091
opper_agents_sample_24   100     3s         1990
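
The summary figures appear to be plain arithmetic means of the per-sample values listed above, rounded to the nearest integer. The sketch below checks this. It is an assumption about how the page aggregates its results, not a documented formula, and the names `results` and `summarize` are hypothetical.

```python
# Per-sample (score, tokens) pairs copied from the results above,
# in order from opper_agents_sample_01 to opper_agents_sample_24.
results = [
    (100, 1501), (100, 1515), (50, 1784), (100, 1645),
    (100, 1770), (50, 1935), (100, 1855), (0, 2421),
    (100, 3131), (100, 3241), (100, 3067), (100, 1811),
    (0, 1802), (100, 2020), (0, 1716), (100, 1463),
    (0, 1987), (100, 1956), (100, 3725), (100, 4104),
    (100, 4508), (50, 3584), (0, 4091), (100, 1990),
]


def summarize(rows):
    """Return (mean score, mean tokens), each rounded to the nearest integer.

    Assumption: the page reports unweighted means over all samples.
    """
    n = len(rows)
    mean_score = sum(score for score, _ in rows) / n
    mean_tokens = sum(tokens for _, tokens in rows) / n
    return round(mean_score), round(mean_tokens)


avg_score, avg_tokens = summarize(results)
print(avg_score, avg_tokens)  # 73 2443
```

Under this assumption the computed values match the reported score of 73 and average of 2443 tokens. Sixteen samples scored 100, three scored 50, and five scored 0.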