Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket-triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and recognize when something has gone wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: mistral/pixtral-large-latest-eu
Score: 65
Average duration: 14s
Average tokens: 2306
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100        5s    1239
opper_agents_sample_02    100        5s    1432
opper_agents_sample_03     50        4s    1349
opper_agents_sample_04     50        6s    1701
opper_agents_sample_05    100        5s    1663
opper_agents_sample_06     50        7s    1807
opper_agents_sample_07    100        4s    1879
opper_agents_sample_08    100       37s    2005
opper_agents_sample_09    100        5s    3363
opper_agents_sample_10    100        5s    3337
opper_agents_sample_11      0        6s    3291
opper_agents_sample_12    100        5s    1366
opper_agents_sample_13      0       36s    1521
opper_agents_sample_14    100        4s    1954
opper_agents_sample_15      0       35s    1585
opper_agents_sample_16    100     1m 7s    1529
opper_agents_sample_17      0        5s    1609
opper_agents_sample_18    100        5s    1633
opper_agents_sample_19    100        5s    3936
opper_agents_sample_20    100       35s    3989
opper_agents_sample_21    100        8s    4212
opper_agents_sample_22      0        4s    2951
opper_agents_sample_23      0       35s    4194
opper_agents_sample_24      0        5s    1799
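The headline figures appear to be unweighted means over the 24 samples. The sketch below reproduces them under that assumption (each sample counts equally; durations are taken as displayed, already rounded to whole seconds). The names `RESULTS`, `parse_duration`, and `summarize` are illustrative only, not part of the benchmark's own tooling.

```python
# Per-sample results transcribed from the list above: (sample, score, duration, tokens).
RESULTS = [
    ("opper_agents_sample_01", 100, "5s", 1239),
    ("opper_agents_sample_02", 100, "5s", 1432),
    ("opper_agents_sample_03", 50, "4s", 1349),
    ("opper_agents_sample_04", 50, "6s", 1701),
    ("opper_agents_sample_05", 100, "5s", 1663),
    ("opper_agents_sample_06", 50, "7s", 1807),
    ("opper_agents_sample_07", 100, "4s", 1879),
    ("opper_agents_sample_08", 100, "37s", 2005),
    ("opper_agents_sample_09", 100, "5s", 3363),
    ("opper_agents_sample_10", 100, "5s", 3337),
    ("opper_agents_sample_11", 0, "6s", 3291),
    ("opper_agents_sample_12", 100, "5s", 1366),
    ("opper_agents_sample_13", 0, "36s", 1521),
    ("opper_agents_sample_14", 100, "4s", 1954),
    ("opper_agents_sample_15", 0, "35s", 1585),
    ("opper_agents_sample_16", 100, "1m 7s", 1529),
    ("opper_agents_sample_17", 0, "5s", 1609),
    ("opper_agents_sample_18", 100, "5s", 1633),
    ("opper_agents_sample_19", 100, "5s", 3936),
    ("opper_agents_sample_20", 100, "35s", 3989),
    ("opper_agents_sample_21", 100, "8s", 4212),
    ("opper_agents_sample_22", 0, "4s", 2951),
    ("opper_agents_sample_23", 0, "35s", 4194),
    ("opper_agents_sample_24", 0, "5s", 1799),
]


def parse_duration(text: str) -> int:
    """Convert a displayed duration such as '5s' or '1m 7s' into whole seconds."""
    total = 0
    for part in text.split():
        if part.endswith("m"):
            total += int(part[:-1]) * 60
        elif part.endswith("s"):
            total += int(part[:-1])
        else:
            raise ValueError(f"unrecognized duration part: {part!r}")
    return total


def summarize(results):
    """Return (score, seconds, tokens) as unweighted means rounded to integers."""
    n = len(results)
    avg_score = sum(score for _, score, _, _ in results) / n
    avg_seconds = sum(parse_duration(d) for _, _, d, _ in results) / n
    avg_tokens = sum(tokens for _, _, _, tokens in results) / n
    return round(avg_score), round(avg_seconds), round(avg_tokens)


print(summarize(RESULTS))  # (65, 14, 2306)
```

The mean score of about 64.58 rounds to the reported 65, and the 55,344 total tokens divide evenly by 24 to give 2306, which is consistent with equal weighting across samples.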