Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose when something goes wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: mistral/magistral-medium-2506-eu
Overall score: 73
Average duration: 20s
Average tokens: 2473
Average cost: $0.00

Per-sample results:

Sample                  Score  Duration  Tokens
opper_agents_sample_01    100    1m 38s    1259
opper_agents_sample_02    100        8s    1643
opper_agents_sample_03    100        5s    1360
opper_agents_sample_04    100        4s    1415
opper_agents_sample_05    100       39s    1891
opper_agents_sample_06     50        5s    1565
opper_agents_sample_07    100        6s    1870
opper_agents_sample_08    100        6s    1918
opper_agents_sample_09    100        9s    3538
opper_agents_sample_10    100        4s    2992
opper_agents_sample_11    100       14s    3979
opper_agents_sample_12      0       39s    1700
opper_agents_sample_13    100       16s    2212
opper_agents_sample_14    100       27s    1954
opper_agents_sample_15      0        6s    1604
opper_agents_sample_16    100       13s    2140
opper_agents_sample_17      0       42s    4340
opper_agents_sample_18    100        4s    1494
opper_agents_sample_19    100       36s    3566
opper_agents_sample_20    100       35s    3703
opper_agents_sample_21    100       12s    4342
opper_agents_sample_22      0        3s    2566
opper_agents_sample_23      0       42s    4487
opper_agents_sample_24      0        6s    1810
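As a sanity check, the summary figures can be recomputed from the 24 per-sample results. This is a minimal sketch that assumes the page reports unweighted arithmetic means rounded to whole numbers, with "1m 38s" converted to 98 seconds; the variable names are illustrative, not part of any benchmark tooling.

```python
from statistics import mean

# Per-sample results transcribed from the listing above:
# (sample, score, duration in seconds, tokens). "1m 38s" is stored as 98.
rows = [
    ("opper_agents_sample_01", 100, 98, 1259),
    ("opper_agents_sample_02", 100, 8, 1643),
    ("opper_agents_sample_03", 100, 5, 1360),
    ("opper_agents_sample_04", 100, 4, 1415),
    ("opper_agents_sample_05", 100, 39, 1891),
    ("opper_agents_sample_06", 50, 5, 1565),
    ("opper_agents_sample_07", 100, 6, 1870),
    ("opper_agents_sample_08", 100, 6, 1918),
    ("opper_agents_sample_09", 100, 9, 3538),
    ("opper_agents_sample_10", 100, 4, 2992),
    ("opper_agents_sample_11", 100, 14, 3979),
    ("opper_agents_sample_12", 0, 39, 1700),
    ("opper_agents_sample_13", 100, 16, 2212),
    ("opper_agents_sample_14", 100, 27, 1954),
    ("opper_agents_sample_15", 0, 6, 1604),
    ("opper_agents_sample_16", 100, 13, 2140),
    ("opper_agents_sample_17", 0, 42, 4340),
    ("opper_agents_sample_18", 100, 4, 1494),
    ("opper_agents_sample_19", 100, 36, 3566),
    ("opper_agents_sample_20", 100, 35, 3703),
    ("opper_agents_sample_21", 100, 12, 4342),
    ("opper_agents_sample_22", 0, 3, 2566),
    ("opper_agents_sample_23", 0, 42, 4487),
    ("opper_agents_sample_24", 0, 6, 1810),
]

# Unweighted means over all samples (assumed aggregation method).
avg_score = mean(score for _, score, _, _ in rows)
avg_duration = mean(seconds for _, _, seconds, _ in rows)
avg_tokens = mean(tokens for _, _, _, tokens in rows)

print(f"Average score: {avg_score:.0f}")        # Average score: 73
print(f"Average duration: {avg_duration:.0f}s")  # Average duration: 20s
print(f"Average tokens: {avg_tokens:.0f}")      # Average tokens: 2473
```

The recomputed values match the reported overall score (1750 / 24 ≈ 72.9), average duration (479 s / 24 ≈ 20 s), and average tokens (59348 / 24 ≈ 2473), which supports reading the summary as simple per-sample means.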