Agents

The AI agent reasoning and tool selection category tests a model's planning, tool choice, and self-diagnosis capabilities. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks in the benchmark because they require open-ended reasoning and decision-making.

Model: fireworks/deepseek-v3
Score: 60
Average duration: 7s
Average tokens: 2124
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01  100    8s        1136
opper_agents_sample_02  100    7s        1350
opper_agents_sample_03  100    7s        1325
opper_agents_sample_04  100    7s        1450
opper_agents_sample_05  50     7s        1543
opper_agents_sample_06  50     8s        1655
opper_agents_sample_07  100    6s        1765
opper_agents_sample_08  100    7s        1863
opper_agents_sample_09  100    6s        2907
opper_agents_sample_10  100    8s        2836
opper_agents_sample_11  100    9s        2869
opper_agents_sample_12  0      7s        1325
opper_agents_sample_13  0      6s        1354
opper_agents_sample_14  100    7s        1843
opper_agents_sample_15  0      6s        1415
opper_agents_sample_16  50     8s        1375
opper_agents_sample_17  0      7s        1498
opper_agents_sample_18  100    7s        1500
opper_agents_sample_19  50     5s        3665
opper_agents_sample_20  100    6s        3674
opper_agents_sample_21  50     10s       3981
opper_agents_sample_22  0      12s       3151
opper_agents_sample_23  0      7s        3832
opper_agents_sample_24  0      7s        1656
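
The headline figures appear to be plain averages of the per-sample results listed above. The sketch below checks that arithmetic. It assumes the first number in each result is a score out of 100 and that the summary uses an unweighted mean rounded to the nearest integer; the page does not state either, and the variable names are illustrative.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01 to 24,
# copied from the per-sample results above.
samples = [
    (100, 8, 1136), (100, 7, 1350), (100, 7, 1325), (100, 7, 1450),
    (50, 7, 1543), (50, 8, 1655), (100, 6, 1765), (100, 7, 1863),
    (100, 6, 2907), (100, 8, 2836), (100, 9, 2869), (0, 7, 1325),
    (0, 6, 1354), (100, 7, 1843), (0, 6, 1415), (50, 8, 1375),
    (0, 7, 1498), (100, 7, 1500), (50, 5, 3665), (100, 6, 3674),
    (50, 10, 3981), (0, 12, 3151), (0, 7, 3832), (0, 7, 1656),
]

scores, durations, tokens = zip(*samples)

# Assumption: unweighted mean, rounded to the nearest integer.
avg_score = round(mean(scores))
avg_duration = round(mean(durations))
avg_tokens = round(mean(tokens))

# Breakdown of outcomes by score value.
full = sum(1 for s in scores if s == 100)
partial = sum(1 for s in scores if s == 50)
failed = sum(1 for s in scores if s == 0)

print(avg_score, avg_duration, avg_tokens)  # 60 7 2124
print(full, partial, failed)                # 12 5 7
```

Under these assumptions the computed values agree with the reported score of 60, average duration of 7s, and average of 2124 tokens. Of the 24 samples, 12 score 100, 5 score 50, and 7 score 0.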