Agents

The agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: fireworks/deepseek-r1
Score: 75
Average duration: 13s
Average tokens: 2771
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100       39s    2572
opper_agents_sample_02    100        8s    1679
opper_agents_sample_03     50        8s    1744
opper_agents_sample_04    100        8s    1886
opper_agents_sample_05    100       10s    1931
opper_agents_sample_06     50        9s    2050
opper_agents_sample_07    100        9s    2371
opper_agents_sample_08    100       10s    2301
opper_agents_sample_09    100        8s    3191
opper_agents_sample_10    100       11s    3149
opper_agents_sample_11    100       12s    3358
opper_agents_sample_12    100       11s    1962
opper_agents_sample_13      0       15s    2235
opper_agents_sample_14    100        6s    2002
opper_agents_sample_15      0       12s    2009
opper_agents_sample_16    100       11s    1931
opper_agents_sample_17      0       10s    1913
opper_agents_sample_18    100       12s    2135
opper_agents_sample_19    100        7s    3868
opper_agents_sample_20    100       18s    4324
opper_agents_sample_21    100       30s    6421
opper_agents_sample_22      0       38s    5442
opper_agents_sample_23      0       10s    4212
opper_agents_sample_24    100        7s    1825
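
The headline figures can be cross-checked against the per-sample results above. The following is a minimal sketch, assuming each sample contributes a score, a duration in seconds, and a token count (these column meanings are inferred from the page layout, not stated by it) and that the summary values are plain arithmetic means rounded to the nearest integer. The names `ROWS` and `summarize` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# (sample, score, duration in seconds, tokens), transcribed from the results above.
# Column meanings are an assumption inferred from the page layout.
ROWS = [
    ("opper_agents_sample_01", 100, 39, 2572),
    ("opper_agents_sample_02", 100, 8, 1679),
    ("opper_agents_sample_03", 50, 8, 1744),
    ("opper_agents_sample_04", 100, 8, 1886),
    ("opper_agents_sample_05", 100, 10, 1931),
    ("opper_agents_sample_06", 50, 9, 2050),
    ("opper_agents_sample_07", 100, 9, 2371),
    ("opper_agents_sample_08", 100, 10, 2301),
    ("opper_agents_sample_09", 100, 8, 3191),
    ("opper_agents_sample_10", 100, 11, 3149),
    ("opper_agents_sample_11", 100, 12, 3358),
    ("opper_agents_sample_12", 100, 11, 1962),
    ("opper_agents_sample_13", 0, 15, 2235),
    ("opper_agents_sample_14", 100, 6, 2002),
    ("opper_agents_sample_15", 0, 12, 2009),
    ("opper_agents_sample_16", 100, 11, 1931),
    ("opper_agents_sample_17", 0, 10, 1913),
    ("opper_agents_sample_18", 100, 12, 2135),
    ("opper_agents_sample_19", 100, 7, 3868),
    ("opper_agents_sample_20", 100, 18, 4324),
    ("opper_agents_sample_21", 100, 30, 6421),
    ("opper_agents_sample_22", 0, 38, 5442),
    ("opper_agents_sample_23", 0, 10, 4212),
    ("opper_agents_sample_24", 100, 7, 1825),
]


def summarize(rows):
    """Return the mean score, duration, and token count over all rows."""
    return {
        "score": mean(r[1] for r in rows),
        "duration_s": mean(r[2] for r in rows),
        "tokens": mean(r[3] for r in rows),
    }


summary = summarize(ROWS)
# Rounded means: score, average duration (s), average tokens.
print(round(summary["score"]), round(summary["duration_s"]), round(summary["tokens"]))
# prints: 75 13 2771
```

Under these assumptions the rounded means reproduce the reported 75, 13s, and 2771. The five samples scored 0 and the two scored 50 account for the entire gap between the overall score and 100.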