Agents

The agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks in the benchmark because they require open-ended reasoning and decision-making.
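To make the idea of a tool-choice test concrete, here is a minimal sketch of how one such case might be graded. The source does not describe the actual grading method. The per-sample scores on this page are all either 0 or 100, which suggests pass/fail grading, but that is an assumption. The `ToolChoiceCase` type, the `score_tool_choice` function, the ticket text, and the tool names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolChoiceCase:
    """Hypothetical test case: a ticket plus the tool a model should pick."""
    ticket: str
    available_tools: List[str]
    expected_tool: str


def score_tool_choice(case: ToolChoiceCase, chosen_tool: str) -> int:
    """Assumed pass/fail grading: 100 for the expected tool, 0 otherwise."""
    if chosen_tool not in case.available_tools:
        # Picking a tool that does not exist counts as a failure.
        return 0
    return 100 if chosen_tool == case.expected_tool else 0


# Illustrative case only; not taken from the benchmark.
case = ToolChoiceCase(
    ticket="Customer reports being charged twice for one order",
    available_tools=["lookup_order", "issue_refund", "escalate_to_human"],
    expected_tool="lookup_order",
)

print(score_tool_choice(case, "lookup_order"))
print(score_tool_choice(case, "issue_refund"))
```

A real harness would likely also check the planning and self-diagnosis steps, which a single equality check like this does not capture.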

Model: gcp/gemini-flash-latest
Score: 88
Average duration: 9s
Average tokens: 3379
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_agents_sample_01     100        6s    1851
opper_agents_sample_02     100        7s    2254
opper_agents_sample_03     100        7s    2151
opper_agents_sample_04     100        8s    2505
opper_agents_sample_05     100        7s    2587
opper_agents_sample_06     100       10s    2935
opper_agents_sample_07     100        5s    2496
opper_agents_sample_08     100        7s    2789
opper_agents_sample_09     100        7s    4182
opper_agents_sample_10     100       10s    4655
opper_agents_sample_11     100        8s    4096
opper_agents_sample_12     100        6s    2198
opper_agents_sample_13     100        8s    2291
opper_agents_sample_14     100        6s    2671
opper_agents_sample_15     100        8s    2662
opper_agents_sample_16     100        6s    2173
opper_agents_sample_17       0        9s    2879
opper_agents_sample_18     100        7s    2533
opper_agents_sample_19     100        5s    4400
opper_agents_sample_20     100        6s    4647
opper_agents_sample_21     100       11s    5651
opper_agents_sample_22       0       35s    8633
opper_agents_sample_23     100       10s    5337
opper_agents_sample_24       0        6s    2524
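
The headline figures can be cross-checked against the per-sample results. The sketch below assumes that each summary value is a plain arithmetic mean over the 24 samples, rounded half-up to a whole number. The page does not state how the summaries are computed, so this is an assumption, and the names `RESULTS`, `mean_rounded`, and `summarize` are illustrative.

```python
# (sample number, score, duration in seconds, tokens), copied from the results above.
RESULTS = [
    (1, 100, 6, 1851), (2, 100, 7, 2254), (3, 100, 7, 2151),
    (4, 100, 8, 2505), (5, 100, 7, 2587), (6, 100, 10, 2935),
    (7, 100, 5, 2496), (8, 100, 7, 2789), (9, 100, 7, 4182),
    (10, 100, 10, 4655), (11, 100, 8, 4096), (12, 100, 6, 2198),
    (13, 100, 8, 2291), (14, 100, 6, 2671), (15, 100, 8, 2662),
    (16, 100, 6, 2173), (17, 0, 9, 2879), (18, 100, 7, 2533),
    (19, 100, 5, 4400), (20, 100, 6, 4647), (21, 100, 11, 5651),
    (22, 0, 35, 8633), (23, 100, 10, 5337), (24, 0, 6, 2524),
]


def mean_rounded(values):
    """Arithmetic mean rounded half-up (assumed rounding rule)."""
    return int(sum(values) / len(values) + 0.5)


def summarize(results):
    """Return (score, average duration in s, average tokens)."""
    scores = [r[1] for r in results]
    durations = [r[2] for r in results]
    tokens = [r[3] for r in results]
    return mean_rounded(scores), mean_rounded(durations), mean_rounded(tokens)


# Samples that scored 0, identified by sample number.
failed = [r[0] for r in RESULTS if r[1] == 0]

print(summarize(RESULTS))  # (88, 9, 3379)
print(failed)              # [17, 22, 24]
```

Under these assumptions the computed values match the reported score of 88, average duration of 9s, and average of 3379 tokens. The three failing samples are 17, 22, and 24; sample 22 is also the slowest (35s) and the most token-heavy (8633).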