Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket-triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose what went wrong when something fails. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

mistral/mistral-small-eu
Score: 42
Average duration: 19s
Average tokens: 2340
Average cost: $0.00
Per-sample results

Sample                  Score  Duration  Tokens
opper_agents_sample_01    100        7s    1288
opper_agents_sample_02    100       10s    1572
opper_agents_sample_03      0        7s    1470
opper_agents_sample_04    100        6s    1565
opper_agents_sample_05     50        6s    1731
opper_agents_sample_06     50        5s    1626
opper_agents_sample_07     50        5s    1888
opper_agents_sample_08      0        4s    1824
opper_agents_sample_09      0       34s    3356
opper_agents_sample_10      0       33s    3394
opper_agents_sample_11      0        8s    3368
opper_agents_sample_12      0        6s    1443
opper_agents_sample_13    100        7s    1474
opper_agents_sample_14    100        5s    1995
opper_agents_sample_15      0        6s    1650
opper_agents_sample_16      0        8s    1562
opper_agents_sample_17      0     1m 6s    1857
opper_agents_sample_18    100        8s    1722
opper_agents_sample_19    100     1m 6s    4103
opper_agents_sample_20    100        6s    3985
opper_agents_sample_21     50    1m 18s    4255
opper_agents_sample_22      0       46s    3089
opper_agents_sample_23      0        6s    4223
opper_agents_sample_24      0       35s    1717
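As a sanity check on the summary figures, the averages can be recomputed from the per-sample rows. This is a minimal sketch that assumes the headline score of 42 is the unweighted mean of the per-sample scores and that every average is a plain arithmetic mean; those assumptions fit the numbers shown, but the benchmark's actual aggregation method is not documented here. The names `RESULTS` and `summarize` are hypothetical, chosen only for illustration.

```python
# Per-sample rows transcribed from the results above:
# (sample id, score 0-100, duration in seconds, tokens).
# "1m 6s" is written as 66 and "1m 18s" as 78.
RESULTS = [
    ("opper_agents_sample_01", 100, 7, 1288),
    ("opper_agents_sample_02", 100, 10, 1572),
    ("opper_agents_sample_03", 0, 7, 1470),
    ("opper_agents_sample_04", 100, 6, 1565),
    ("opper_agents_sample_05", 50, 6, 1731),
    ("opper_agents_sample_06", 50, 5, 1626),
    ("opper_agents_sample_07", 50, 5, 1888),
    ("opper_agents_sample_08", 0, 4, 1824),
    ("opper_agents_sample_09", 0, 34, 3356),
    ("opper_agents_sample_10", 0, 33, 3394),
    ("opper_agents_sample_11", 0, 8, 3368),
    ("opper_agents_sample_12", 0, 6, 1443),
    ("opper_agents_sample_13", 100, 7, 1474),
    ("opper_agents_sample_14", 100, 5, 1995),
    ("opper_agents_sample_15", 0, 6, 1650),
    ("opper_agents_sample_16", 0, 8, 1562),
    ("opper_agents_sample_17", 0, 66, 1857),
    ("opper_agents_sample_18", 100, 8, 1722),
    ("opper_agents_sample_19", 100, 66, 4103),
    ("opper_agents_sample_20", 100, 6, 3985),
    ("opper_agents_sample_21", 50, 78, 4255),
    ("opper_agents_sample_22", 0, 46, 3089),
    ("opper_agents_sample_23", 0, 6, 4223),
    ("opper_agents_sample_24", 0, 35, 1717),
]


def summarize(rows):
    """Return the unweighted arithmetic mean of score, duration, and tokens."""
    n = len(rows)
    return {
        "score": sum(r[1] for r in rows) / n,
        "duration_s": sum(r[2] for r in rows) / n,
        "tokens": sum(r[3] for r in rows) / n,
    }


summary = summarize(RESULTS)
print(
    f"score {summary['score']:.1f}, "
    f"duration {summary['duration_s']:.1f}s, "
    f"tokens {summary['tokens']:.1f}"
)
```

The recomputed means are about 41.7 for score, 19.5 s for duration, and 2339.9 for tokens. The score and token figures round to the reported 42 and 2340; the duration lands at 19.5 s rather than exactly 19 s, plausibly because the per-sample durations shown are themselves rounded to whole seconds.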