Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: mistral/mistral-tiny-eu
Score: 17
Average duration: 24s
Average tokens: 3977
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_agents_sample_01 | 50 | 1m 1s | 3623 |
| opper_agents_sample_02 | 0 | 15s | 3099 |
| opper_agents_sample_03 | 0 | 6s | 1927 |
| opper_agents_sample_04 | 50 | 14s | 3368 |
| opper_agents_sample_05 | 100 | 14s | 3532 |
| opper_agents_sample_06 | 50 | 1m 10s | 5775 |
| opper_agents_sample_07 | 0 | 14s | 3289 |
| opper_agents_sample_08 | 0 | 28s | 2541 |
| opper_agents_sample_09 | 0 | 42s | 5847 |
| opper_agents_sample_10 | 0 | 7s | 4206 |
| opper_agents_sample_11 | 0 | 12s | 4967 |
| opper_agents_sample_12 | 50 | 1m 3s | 3974 |
| opper_agents_sample_13 | 0 | 12s | 2775 |
| opper_agents_sample_14 | 0 | 7s | 3112 |
| opper_agents_sample_15 | 0 | 16s | 3804 |
| opper_agents_sample_16 | 50 | 12s | 3342 |
| opper_agents_sample_17 | 0 | 9s | 2521 |
| opper_agents_sample_18 | 0 | 41s | 4196 |
| opper_agents_sample_19 | 0 | 10s | 5135 |
| opper_agents_sample_20 | 0 | 18s | 3924 |
| opper_agents_sample_21 | 50 | 24s | 6502 |
| opper_agents_sample_22 | 0 | 13s | 5084 |
| opper_agents_sample_23 | 0 | 1m 3s | 6036 |
| opper_agents_sample_24 | 0 | 11s | 2866 |
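
The headline figures appear to be consistent with plain averages over the 24 samples. The sketch below recomputes them from the per-sample values. It assumes an unweighted mean rounded to the nearest integer, which is an inference from the numbers and not a documented scoring rule. The `ROWS` list and the `summarize` helper are hypothetical names introduced for illustration.

```python
from statistics import mean

# (score, duration in seconds, tokens) for opper_agents_sample_01 through _24,
# transcribed from the per-sample results. Durations such as "1m 1s" are
# converted to seconds (61).
ROWS = [
    (50, 61, 3623), (0, 15, 3099), (0, 6, 1927), (50, 14, 3368),
    (100, 14, 3532), (50, 70, 5775), (0, 14, 3289), (0, 28, 2541),
    (0, 42, 5847), (0, 7, 4206), (0, 12, 4967), (50, 63, 3974),
    (0, 12, 2775), (0, 7, 3112), (0, 16, 3804), (50, 12, 3342),
    (0, 9, 2521), (0, 41, 4196), (0, 10, 5135), (0, 18, 3924),
    (50, 24, 6502), (0, 13, 5084), (0, 63, 6036), (0, 11, 2866),
]


def summarize(rows):
    """Return rounded unweighted means of score, duration, and tokens."""
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(ROWS)
print(summary)
```

Under these assumptions the result is a score of 17, an average duration of 24 seconds, and an average of 3977 tokens, matching the reported summary.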