Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: groq/moonshotai/kimi-k2-instruct
Score: 67
Average duration: 4s
Average tokens: 2062
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100        4s    1117
opper_agents_sample_02    100        4s    1334
opper_agents_sample_03    100        3s    1297
opper_agents_sample_04    100        4s    1574
opper_agents_sample_05    100        4s    1477
opper_agents_sample_06    100        4s    1562
opper_agents_sample_07    100        4s    1699
opper_agents_sample_08    100        4s    1874
opper_agents_sample_09      0        4s    2817
opper_agents_sample_10    100        3s    2737
opper_agents_sample_11    100        4s    2586
opper_agents_sample_12      0        4s    1293
opper_agents_sample_13      0        4s    1540
opper_agents_sample_14    100        3s    1749
opper_agents_sample_15      0        3s    1416
opper_agents_sample_16    100        9s    1429
opper_agents_sample_17      0        8s    1478
opper_agents_sample_18    100        3s    1440
opper_agents_sample_19    100        4s    3501
opper_agents_sample_20    100        3s    3589
opper_agents_sample_21     50        4s    3752
opper_agents_sample_22     50        4s    2843
opper_agents_sample_23      0        4s    3770
opper_agents_sample_24      0        3s    1624
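
The headline figures can be cross-checked against the per-sample results above. The sketch below assumes the overall score and the average token count are plain arithmetic means over the 24 samples, rounded to the nearest integer; the page does not state its aggregation method, so this is an assumption, though it is consistent with the reported 67 and 2062. The variable names are illustrative only.

```python
from statistics import mean

# Per-sample scores for opper_agents_sample_01 .. _24, copied from the results above.
scores = [100, 100, 100, 100, 100, 100, 100, 100, 0, 100, 100, 0,
          0, 100, 0, 100, 0, 100, 100, 100, 50, 50, 0, 0]

# Per-sample token counts, in the same order.
tokens = [1117, 1334, 1297, 1574, 1477, 1562, 1699, 1874, 2817, 2737, 2586, 1293,
          1540, 1749, 1416, 1429, 1478, 1440, 3501, 3589, 3752, 2843, 3770, 1624]

# Assumption: headline numbers are unweighted means, rounded to whole numbers.
overall_score = round(mean(scores))
average_tokens = round(mean(tokens))

# Tally of outcomes by score value.
full = sum(1 for s in scores if s == 100)
partial = sum(1 for s in scores if 0 < s < 100)
failed = sum(1 for s in scores if s == 0)

print(f"overall score: {overall_score}")
print(f"average tokens: {average_tokens}")
print(f"full: {full}, partial: {partial}, failed: {failed}")
```

Under this assumption, 15 samples score 100, 2 score 50, and 7 score 0, which gives 1600 / 24 ≈ 66.7 and rounds to the reported 67.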