Agents

The agent reasoning and tool selection benchmark tests a model's planning, tool choice, and self-diagnosis capabilities. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: anthropic/claude-3.7-sonnet
Score: 83
Average duration: 11s
Average tokens: 2646
Average cost: $0.00
| Sample | Score | Duration | Tokens |
| opper_agents_sample_01 | 100 | 9s | 1426 |
| opper_agents_sample_02 | 100 | 12s | 1692 |
| opper_agents_sample_03 | 100 | 9s | 1594 |
| opper_agents_sample_04 | 100 | 14s | 2053 |
| opper_agents_sample_05 | 100 | 11s | 1939 |
| opper_agents_sample_06 | 100 | 13s | 2122 |
| opper_agents_sample_07 | 100 | 8s | 2120 |
| opper_agents_sample_08 | 100 | 10s | 2219 |
| opper_agents_sample_09 | 100 | 13s | 3645 |
| opper_agents_sample_10 | 100 | 10s | 3472 |
| opper_agents_sample_11 | 100 | 12s | 3432 |
| opper_agents_sample_12 | 100 | 12s | 1796 |
| opper_agents_sample_13 | 0 | 12s | 1833 |
| opper_agents_sample_14 | 100 | 9s | 2262 |
| opper_agents_sample_15 | 100 | 10s | 1874 |
| opper_agents_sample_16 | 100 | 9s | 1747 |
| opper_agents_sample_17 | 0 | 14s | 2114 |
| opper_agents_sample_18 | 100 | 10s | 1902 |
| opper_agents_sample_19 | 100 | 10s | 4501 |
| opper_agents_sample_20 | 100 | 10s | 4534 |
| opper_agents_sample_21 | 100 | 13s | 4701 |
| opper_agents_sample_22 | 0 | 14s | 3691 |
| opper_agents_sample_23 | 0 | 13s | 4849 |
| opper_agents_sample_24 | 100 | 9s | 1979 |
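
The summary figures can be cross-checked against the per-sample results listed above. The following is a minimal sketch, assuming the headline score and the averages are unweighted means over the 24 samples, rounded to the nearest integer; the page does not state how it aggregates, so that is an assumption. The names `RESULTS` and `summarize` are illustrative and not part of any benchmark tooling.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..24,
# copied in order from the per-sample results above.
RESULTS = [
    (100, 9, 1426), (100, 12, 1692), (100, 9, 1594), (100, 14, 2053),
    (100, 11, 1939), (100, 13, 2122), (100, 8, 2120), (100, 10, 2219),
    (100, 13, 3645), (100, 10, 3472), (100, 12, 3432), (100, 12, 1796),
    (0, 12, 1833), (100, 9, 2262), (100, 10, 1874), (100, 9, 1747),
    (0, 14, 2114), (100, 10, 1902), (100, 10, 4501), (100, 10, 4534),
    (100, 13, 4701), (0, 14, 3691), (0, 13, 4849), (100, 9, 1979),
]


def summarize(results):
    """Aggregate per-sample rows, assuming plain (unweighted) averages."""
    scores = [row[0] for row in results]
    durations = [row[1] for row in results]
    tokens = [row[2] for row in results]
    return {
        "score": round(mean(scores)),
        "avg_duration_s": round(mean(durations)),
        "avg_tokens": round(mean(tokens)),
        # 1-based sample numbers that scored 0
        "failed_samples": [i + 1 for i, s in enumerate(scores) if s == 0],
    }


summary = summarize(RESULTS)
print(summary)
```

Under that assumption the sketch gives a score of 83, an average duration of 11s, and an average of 2646 tokens, which agrees with the reported summary. The four samples that scored 0 are 13, 17, 22, and 23.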