Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket-triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose what went wrong when something fails. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: cerebras/qwen-3-235b-a22b-instruct-2507
Overall score: 71
Average duration: 4s
Average tokens: 2134
Average cost: $0.00
Per-sample results:

Sample                   Score   Duration   Tokens
opper_agents_sample_01      50         3s     1170
opper_agents_sample_02     100         4s     1330
opper_agents_sample_03      50         4s     1301
opper_agents_sample_04     100         4s     1546
opper_agents_sample_05     100         3s     1515
opper_agents_sample_06      50         3s     1597
opper_agents_sample_07     100         3s     1726
opper_agents_sample_08     100         3s     1816
opper_agents_sample_09     100         5s     3140
opper_agents_sample_10     100         3s     3048
opper_agents_sample_11     100         6s     3128
opper_agents_sample_12     100         4s     1245
opper_agents_sample_13       0         3s     1438
opper_agents_sample_14     100         3s     1893
opper_agents_sample_15      50         3s     1551
opper_agents_sample_16     100         3s     1472
opper_agents_sample_17       0         3s     1604
opper_agents_sample_18     100         3s     1563
opper_agents_sample_19     100         3s     3551
opper_agents_sample_20     100         3s     3588
opper_agents_sample_21     100         4s     3766
opper_agents_sample_22       0         4s     2756
opper_agents_sample_23       0         4s     3792
opper_agents_sample_24       0         3s     1688
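The summary figures can be cross-checked against the per-sample rows. The sketch below assumes that the overall score is the unweighted arithmetic mean of the per-sample scores and that average tokens is the plain mean of per-sample token counts, both rounded to the nearest integer. The page does not state its aggregation method, so this is an assumption, though it reproduces the displayed 71 and 2134. Durations are left out because the per-sample values are already rounded to whole seconds, so their mean cannot be recovered exactly. The names `SAMPLES`, `summarize`, and `score_distribution` are hypothetical helpers for illustration.

```python
from collections import Counter

# Per-sample rows copied from the results table: (sample, score, duration_s, tokens).
SAMPLES = [
    ("opper_agents_sample_01", 50, 3, 1170),
    ("opper_agents_sample_02", 100, 4, 1330),
    ("opper_agents_sample_03", 50, 4, 1301),
    ("opper_agents_sample_04", 100, 4, 1546),
    ("opper_agents_sample_05", 100, 3, 1515),
    ("opper_agents_sample_06", 50, 3, 1597),
    ("opper_agents_sample_07", 100, 3, 1726),
    ("opper_agents_sample_08", 100, 3, 1816),
    ("opper_agents_sample_09", 100, 5, 3140),
    ("opper_agents_sample_10", 100, 3, 3048),
    ("opper_agents_sample_11", 100, 6, 3128),
    ("opper_agents_sample_12", 100, 4, 1245),
    ("opper_agents_sample_13", 0, 3, 1438),
    ("opper_agents_sample_14", 100, 3, 1893),
    ("opper_agents_sample_15", 50, 3, 1551),
    ("opper_agents_sample_16", 100, 3, 1472),
    ("opper_agents_sample_17", 0, 3, 1604),
    ("opper_agents_sample_18", 100, 3, 1563),
    ("opper_agents_sample_19", 100, 3, 3551),
    ("opper_agents_sample_20", 100, 3, 3588),
    ("opper_agents_sample_21", 100, 4, 3766),
    ("opper_agents_sample_22", 0, 4, 2756),
    ("opper_agents_sample_23", 0, 4, 3792),
    ("opper_agents_sample_24", 0, 3, 1688),
]


def summarize(samples):
    """Return (overall score, average tokens), ASSUMING both are plain rounded means."""
    n = len(samples)
    mean_score = sum(score for _, score, _, _ in samples) / n
    mean_tokens = sum(tokens for _, _, _, tokens in samples) / n
    return round(mean_score), round(mean_tokens)


def score_distribution(samples):
    """Count how many samples received each score, sorted by score."""
    counts = Counter(score for _, score, _, _ in samples)
    return dict(sorted(counts.items()))


print(summarize(SAMPLES))           # (71, 2134)
print(score_distribution(SAMPLES))  # {0: 5, 50: 4, 100: 15}
```

Under this assumption the raw means are about 70.83 and 2134.33, and the distribution shows 15 full passes, 4 partial scores, and 5 failures across the 24 samples.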