Agents

The AI agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket-triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are among the most challenging tasks in the benchmark because they require open-ended reasoning and decision-making.

Model: xai/grok-4
Score: 92
Average duration: 47s
Average tokens: 2161
Average cost: $0.00
Per-sample results:

Sample                  Score  Duration  Tokens
opper_agents_sample_01    100       12s    1137
opper_agents_sample_02    100       19s    1462
opper_agents_sample_03    100       30s    1424
opper_agents_sample_04    100       16s    1584
opper_agents_sample_05    100        1m    1530
opper_agents_sample_06    100       19s    1882
opper_agents_sample_07    100    1m 14s    1949
opper_agents_sample_08    100       28s    2029
opper_agents_sample_09    100       19s    2881
opper_agents_sample_10    100       19s    2823
opper_agents_sample_11    100    1m 18s    2881
opper_agents_sample_12     50       24s    1395
opper_agents_sample_13    100       29s    1345
opper_agents_sample_14    100       25s    1801
opper_agents_sample_15      0       39s    1470
opper_agents_sample_16    100       20s    1441
opper_agents_sample_17    100    2m 12s    1543
opper_agents_sample_18    100    1m 32s    1709
opper_agents_sample_19    100       14s    3509
opper_agents_sample_20    100       18s    3706
opper_agents_sample_21    100    1m 52s    4158
opper_agents_sample_22     50    3m 20s    2762
opper_agents_sample_23    100       40s    3923
opper_agents_sample_24    100       10s    1516
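The summary figures for xai/grok-4 appear consistent with a plain unweighted mean over the 24 per-sample rows. The sketch below recomputes them under that assumption. The averaging rule, the reading of "1m" as exactly 60 seconds, and the helper names `to_seconds` and `summarize` are illustrative assumptions, not the benchmark's documented method.

```python
# Recompute the xai/grok-4 summary figures from the per-sample rows.
# Assumption: each summary figure is an unweighted mean over all samples,
# rounded to the nearest integer; "1m" is read as exactly 60 seconds.

SAMPLES = [
    # (sample, score, duration, tokens)
    ("opper_agents_sample_01", 100, "12s", 1137),
    ("opper_agents_sample_02", 100, "19s", 1462),
    ("opper_agents_sample_03", 100, "30s", 1424),
    ("opper_agents_sample_04", 100, "16s", 1584),
    ("opper_agents_sample_05", 100, "1m", 1530),
    ("opper_agents_sample_06", 100, "19s", 1882),
    ("opper_agents_sample_07", 100, "1m 14s", 1949),
    ("opper_agents_sample_08", 100, "28s", 2029),
    ("opper_agents_sample_09", 100, "19s", 2881),
    ("opper_agents_sample_10", 100, "19s", 2823),
    ("opper_agents_sample_11", 100, "1m 18s", 2881),
    ("opper_agents_sample_12", 50, "24s", 1395),
    ("opper_agents_sample_13", 100, "29s", 1345),
    ("opper_agents_sample_14", 100, "25s", 1801),
    ("opper_agents_sample_15", 0, "39s", 1470),
    ("opper_agents_sample_16", 100, "20s", 1441),
    ("opper_agents_sample_17", 100, "2m 12s", 1543),
    ("opper_agents_sample_18", 100, "1m 32s", 1709),
    ("opper_agents_sample_19", 100, "14s", 3509),
    ("opper_agents_sample_20", 100, "18s", 3706),
    ("opper_agents_sample_21", 100, "1m 52s", 4158),
    ("opper_agents_sample_22", 50, "3m 20s", 2762),
    ("opper_agents_sample_23", 100, "40s", 3923),
    ("opper_agents_sample_24", 100, "10s", 1516),
]


def to_seconds(text: str) -> int:
    """Convert a duration such as '1m 14s', '1m' or '47s' to seconds."""
    total = 0
    for part in text.split():
        if part.endswith("m"):
            total += int(part[:-1]) * 60
        elif part.endswith("s"):
            total += int(part[:-1])
        else:
            raise ValueError(f"unrecognized duration part: {part!r}")
    return total


def summarize(samples):
    """Return (mean score, mean duration in seconds, mean tokens), rounded."""
    n = len(samples)
    score = sum(s[1] for s in samples) / n
    seconds = sum(to_seconds(s[2]) for s in samples) / n
    tokens = sum(s[3] for s in samples) / n
    return round(score), round(seconds), round(tokens)


print(summarize(SAMPLES))  # (92, 47, 2161)

# Samples that did not receive full marks.
failed = [name for name, score, _, _ in SAMPLES if score < 100]
print(failed)  # ['opper_agents_sample_12', 'opper_agents_sample_15', 'opper_agents_sample_22']
```

Under this reading, the score of 92 is 2200 points spread over 24 samples (91.67, rounded up), and the three samples below 100 account for the entire gap from a perfect score.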