Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: mistral/mistral-medium-2508-eu
Score: 81
Average duration: 27s
Average tokens: 3685
Average cost: $0.00

| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_agents_sample_01 | 100 | 22s | 2158 |
| opper_agents_sample_02 | 100 | 15s | 1885 |
| opper_agents_sample_03 | 100 | 21s | 2465 |
| opper_agents_sample_04 | 100 | 17s | 1847 |
| opper_agents_sample_05 | 100 | 12s | 1953 |
| opper_agents_sample_06 | 100 | 54s | 5087 |
| opper_agents_sample_07 | 100 | 21s | 2801 |
| opper_agents_sample_08 | 100 | 44s | 3898 |
| opper_agents_sample_09 | 100 | 15s | 3557 |
| opper_agents_sample_10 | 100 | 21s | 4199 |
| opper_agents_sample_11 | 100 | 1m 6s | 6451 |
| opper_agents_sample_12 | 100 | 29s | 2520 |
| opper_agents_sample_13 | 100 | 18s | 1876 |
| opper_agents_sample_14 | 100 | 8s | 2093 |
| opper_agents_sample_15 | 50 | 10s | 1758 |
| opper_agents_sample_16 | 100 | 33s | 3580 |
| opper_agents_sample_17 | 0 | 24s | 2737 |
| opper_agents_sample_18 | 100 | 11s | 1886 |
| opper_agents_sample_19 | 100 | 15s | 4260 |
| opper_agents_sample_20 | 100 | 17s | 4695 |
| opper_agents_sample_21 | 100 | 1m 1s | 9878 |
| opper_agents_sample_22 | 0 | 44s | 6070 |
| opper_agents_sample_23 | 0 | 50s | 7986 |
| opper_agents_sample_24 | 0 | 21s | 2788 |
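
The headline figures appear to be plain arithmetic means of the per-sample results above. The sketch below checks that, assuming each value is rounded half up to the nearest integer; the `summarize` helper and the rounding rule are assumptions made for illustration, not something the benchmark documents.

```python
import math
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..24,
# transcribed from the per-sample results above ("1m 6s" -> 66, "1m 1s" -> 61).
samples = [
    (100, 22, 2158), (100, 15, 1885), (100, 21, 2465), (100, 17, 1847),
    (100, 12, 1953), (100, 54, 5087), (100, 21, 2801), (100, 44, 3898),
    (100, 15, 3557), (100, 21, 4199), (100, 66, 6451), (100, 29, 2520),
    (100, 18, 1876), (100, 8, 2093), (50, 10, 1758), (100, 33, 3580),
    (0, 24, 2737), (100, 11, 1886), (100, 15, 4260), (100, 17, 4695),
    (100, 61, 9878), (0, 44, 6070), (0, 50, 7986), (0, 21, 2788),
]


def round_half_up(value: float) -> int:
    """Round to the nearest integer, with .5 going up (assumed rounding rule)."""
    return math.floor(value + 0.5)


def summarize(rows):
    """Hypothetical helper: mean score, duration and token count over all rows."""
    scores, durations, tokens = zip(*rows)
    return {
        "score": round_half_up(mean(scores)),
        "duration_s": round_half_up(mean(durations)),
        "tokens": round_half_up(mean(tokens)),
    }


summary = summarize(samples)
print(summary)  # {'score': 81, 'duration_s': 27, 'tokens': 3685}
```

Python's built-in `round` rounds halves to the nearest even integer, so it would give 3684 for the token mean of 3684.5; the explicit half-up helper is what reproduces the reported 3685.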