Agents

The agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: mistral/mistral-large-eu
Score: 75
Average duration: 37s
Average tokens: 3558
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_agents_sample_01      100       22s    1974
opper_agents_sample_02      100       16s    1740
opper_agents_sample_03      100       32s    2678
opper_agents_sample_04      100       17s    1750
opper_agents_sample_05      100    1m 16s    3126
opper_agents_sample_06      100       27s    2449
opper_agents_sample_07      100       35s    3315
opper_agents_sample_08      100       16s    2234
opper_agents_sample_09      100       33s    4663
opper_agents_sample_10        0       16s    3498
opper_agents_sample_11      100       53s    4505
opper_agents_sample_12      100       25s    2286
opper_agents_sample_13        0       13s    1610
opper_agents_sample_14      100       11s    2136
opper_agents_sample_15       50       18s    1826
opper_agents_sample_16      100     1m 6s    4448
opper_agents_sample_17        0       55s    4012
opper_agents_sample_18      100       23s    1938
opper_agents_sample_19      100       17s    4079
opper_agents_sample_20      100       21s    4533
opper_agents_sample_21      100     2m 5s    9867
opper_agents_sample_22       50     2m 2s    8935
opper_agents_sample_23        0       36s    5614
opper_agents_sample_24        0       19s    2175
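
The headline figures can be cross-checked against the per-sample results above. The following is a minimal Python sketch, assuming the overall score and the reported averages are plain arithmetic means over the 24 samples (the page does not state how they are aggregated). The names `RESULTS` and `summarize` are illustrative and not part of any benchmark tooling; durations are converted to seconds by hand.

```python
# Per-sample results copied from the listing above, in order 01..24:
# (score, duration in seconds, tokens). "1m 16s" is entered as 76, and so on.
RESULTS = [
    (100, 22, 1974), (100, 16, 1740), (100, 32, 2678), (100, 17, 1750),
    (100, 76, 3126), (100, 27, 2449), (100, 35, 3315), (100, 16, 2234),
    (100, 33, 4663), (0, 16, 3498), (100, 53, 4505), (100, 25, 2286),
    (0, 13, 1610), (100, 11, 2136), (50, 18, 1826), (100, 66, 4448),
    (0, 55, 4012), (100, 23, 1938), (100, 17, 4079), (100, 21, 4533),
    (100, 125, 9867), (50, 122, 8935), (0, 36, 5614), (0, 19, 2175),
]


def summarize(results):
    """Aggregate per-sample rows, assuming unweighted arithmetic means."""
    scores, durations, tokens = zip(*results)
    n = len(results)
    return {
        "score": sum(scores) / n,
        "duration_s": sum(durations) / n,
        "tokens": sum(tokens) / n,
        "full": sum(1 for s in scores if s == 100),     # samples scored 100
        "partial": sum(1 for s in scores if 0 < s < 100),  # samples scored 50
        "zero": sum(1 for s in scores if s == 0),       # samples scored 0
    }


summary = summarize(RESULTS)
print(summary)
```

Under that assumption the mean score is exactly 75, and the mean duration and token count round to 37s and 3558, which agrees with the reported summary. The breakdown is 17 samples at 100, 2 at 50, and 5 at 0.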