Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are some of the most challenging tasks, as they require open-ended reasoning and decision-making.

Model: xai/grok-3
Score: 81
Average duration: 19s
Average tokens: 2344
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100       13s    1245
opper_agents_sample_02    100       13s    1394
opper_agents_sample_03    100       14s    1450
opper_agents_sample_04    100       19s    1689
opper_agents_sample_05    100       26s    1769
opper_agents_sample_06    100       28s    2195
opper_agents_sample_07    100       17s    1950
opper_agents_sample_08    100       18s    2000
opper_agents_sample_09    100       24s    2978
opper_agents_sample_10    100       31s    3515
opper_agents_sample_11    100       26s    3145
opper_agents_sample_12     50       13s    1394
opper_agents_sample_13    100       19s    1583
opper_agents_sample_14    100       11s    1867
opper_agents_sample_15     50       21s    1855
opper_agents_sample_16    100       19s    1681
opper_agents_sample_17      0       24s    1743
opper_agents_sample_18    100       13s    1680
opper_agents_sample_19    100       15s    3746
opper_agents_sample_20    100       17s    3830
opper_agents_sample_21    100       28s    4723
opper_agents_sample_22     50       20s    3149
opper_agents_sample_23      0       14s    4010
opper_agents_sample_24      0       11s    1670
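
The summary figures appear to be plain arithmetic means of the per-sample results above. This is an assumption, since the page does not state how it aggregates. The sketch below recomputes them in Python. The `rows` list is copied from the results for samples 01 to 24, and the `summarize` helper and its key names are hypothetical, introduced only for illustration.

```python
from collections import Counter
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..24, in order,
# copied from the per-sample results above.
rows = [
    (100, 13, 1245), (100, 13, 1394), (100, 14, 1450), (100, 19, 1689),
    (100, 26, 1769), (100, 28, 2195), (100, 17, 1950), (100, 18, 2000),
    (100, 24, 2978), (100, 31, 3515), (100, 26, 3145), (50, 13, 1394),
    (100, 19, 1583), (100, 11, 1867), (50, 21, 1855), (100, 19, 1681),
    (0, 24, 1743), (100, 13, 1680), (100, 15, 3746), (100, 17, 3830),
    (100, 28, 4723), (50, 20, 3149), (0, 14, 4010), (0, 11, 1670),
]


def summarize(rows):
    """Return unweighted means per column (assumed aggregation method)."""
    scores, durations, tokens = zip(*rows)
    return {
        "score": mean(scores),
        "duration_s": mean(durations),
        "tokens": mean(tokens),
        # How many samples received each distinct score.
        "score_counts": Counter(scores),
    }


summary = summarize(rows)
print(round(summary["score"]), round(summary["duration_s"]), round(summary["tokens"]))
print(dict(summary["score_counts"]))
```

Under that assumption the rounded means come out to 81, 19, and 2344, which matches the reported score, average duration, and average tokens. The score counts show 18 samples at 100, three at 50, and three at 0.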