Agents

The AI agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: gcp/gemini-2.5-flash
Score: 77
Average duration: 9s
Average tokens: 3028
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100        8s    1634
opper_agents_sample_02    100       13s    2357
opper_agents_sample_03    100        8s    1799
opper_agents_sample_04    100       13s    2879
opper_agents_sample_05    100        5s    1596
opper_agents_sample_06     50       12s    2936
opper_agents_sample_07    100        6s    1961
opper_agents_sample_08    100        7s    2343
opper_agents_sample_09    100        9s    3946
opper_agents_sample_10    100        9s    3733
opper_agents_sample_11    100       11s    4014
opper_agents_sample_12    100        9s    2217
opper_agents_sample_13    100        9s    2090
opper_agents_sample_14    100        7s    2309
opper_agents_sample_15      0       12s    2844
opper_agents_sample_16    100        9s    2411
opper_agents_sample_17      0        6s    2060
opper_agents_sample_18    100        7s    1888
opper_agents_sample_19    100        5s    4067
opper_agents_sample_20    100        6s    4420
opper_agents_sample_21    100       17s    6702
opper_agents_sample_22      0       15s    5175
opper_agents_sample_23      0       11s    5210
opper_agents_sample_24      0        6s    2085
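
The headline figures can be cross-checked against the per-sample results above. The sketch below assumes the summary values are plain arithmetic means of the 24 samples, rounded to the nearest integer; the page does not state how they are aggregated, so treat that as an assumption that the numbers happen to be consistent with. The `summarize` helper is a hypothetical name introduced only for this illustration.

```python
from statistics import mean

# (score, duration in seconds, tokens) for opper_agents_sample_01 .. _24,
# copied from the per-sample results above.
samples = [
    (100, 8, 1634), (100, 13, 2357), (100, 8, 1799), (100, 13, 2879),
    (100, 5, 1596), (50, 12, 2936), (100, 6, 1961), (100, 7, 2343),
    (100, 9, 3946), (100, 9, 3733), (100, 11, 4014), (100, 9, 2217),
    (100, 9, 2090), (100, 7, 2309), (0, 12, 2844), (100, 9, 2411),
    (0, 6, 2060), (100, 7, 1888), (100, 5, 4067), (100, 6, 4420),
    (100, 17, 6702), (0, 15, 5175), (0, 11, 5210), (0, 6, 2085),
]


def summarize(rows):
    """Return the rounded means of score, duration (s), and tokens.

    Assumption: the summary is an unweighted mean over all samples.
    """
    scores, durations, tokens = zip(*rows)
    return round(mean(scores)), round(mean(durations)), round(mean(tokens))


score, duration, tokens = summarize(samples)
print(f"score={score} duration={duration}s tokens={tokens}")
# prints "score=77 duration=9s tokens=3028"
```

Under that assumption the result reproduces the reported score of 77, average duration of 9s, and average of 3028 tokens. The score is pulled below 100 by five samples that scored 0 (15, 17, 22, 23, 24) and one that scored 50 (06).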