Agents

The Agents category evaluates AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: gcp/gemini-2.0-flash
Score: 65
Average duration: 13s
Average tokens: 2189
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_agents_sample_01     100       15s    1160
opper_agents_sample_02     100       14s    1341
opper_agents_sample_03      50       18s    1819
opper_agents_sample_04     100       15s    1482
opper_agents_sample_05      50       14s    1518
opper_agents_sample_06      50       15s    1703
opper_agents_sample_07      50       15s    1720
opper_agents_sample_08     100       15s    1899
opper_agents_sample_09     100       14s    3146
opper_agents_sample_10      50       14s    3069
opper_agents_sample_11     100       15s    2901
opper_agents_sample_12     100       15s    1286
opper_agents_sample_13     100       15s    1344
opper_agents_sample_14     100       14s    1853
opper_agents_sample_15     100       15s    1597
opper_agents_sample_16      50       11s    1339
opper_agents_sample_17       0       11s    1468
opper_agents_sample_18       0       11s    1555
opper_agents_sample_19     100       11s    3763
opper_agents_sample_20     100       11s    3820
opper_agents_sample_21      50        9s    4271
opper_agents_sample_22       0        8s    2946
opper_agents_sample_23       0        8s    3876
opper_agents_sample_24       0        5s    1671
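
The headline figures appear to be plain arithmetic means of the per-sample results above, rounded to the nearest integer. That is an assumption rather than something the page states, but the published numbers are consistent with it. A minimal Python sketch that recomputes them (the `samples` list and the `summarize` helper are illustrative names, not part of any Opper tooling):

```python
from statistics import mean

# (score, duration in seconds, tokens) for opper_agents_sample_01..24,
# copied from the per-sample results above.
samples = [
    (100, 15, 1160), (100, 14, 1341), (50, 18, 1819), (100, 15, 1482),
    (50, 14, 1518), (50, 15, 1703), (50, 15, 1720), (100, 15, 1899),
    (100, 14, 3146), (50, 14, 3069), (100, 15, 2901), (100, 15, 1286),
    (100, 15, 1344), (100, 14, 1853), (100, 15, 1597), (50, 11, 1339),
    (0, 11, 1468), (0, 11, 1555), (100, 11, 3763), (100, 11, 3820),
    (50, 9, 4271), (0, 8, 2946), (0, 8, 3876), (0, 5, 1671),
]


def summarize(rows):
    """Return rounded means of score, duration, and tokens.

    Assumes the headline numbers are simple unweighted averages.
    """
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(samples)
print(summary)  # {'score': 65, 'duration_s': 13, 'tokens': 2189}
```

Under this reading, the score of 65 comes from 12 samples at 100, 7 at 50, and 5 at 0 (1550 / 24 ≈ 64.6).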