Agents

The AI agent reasoning and tool selection category tests planning, tool choice, and self-diagnosis capabilities. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose when things go wrong. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: gcp/gemini-2.5-flash-lite
Average score: 54
Average duration: 10s
Average tokens: 2394
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100       10s    1165
opper_agents_sample_02    100       10s    1359
opper_agents_sample_03     50       10s    1352
opper_agents_sample_04    100       17s    3160
opper_agents_sample_05    100       10s    1723
opper_agents_sample_06     50       10s    2008
opper_agents_sample_07      0       10s    1798
opper_agents_sample_08     50        8s    1835
opper_agents_sample_09    100       10s    3741
opper_agents_sample_10      0        8s    3242
opper_agents_sample_11    100        8s    3160
opper_agents_sample_12      0       10s    1398
opper_agents_sample_13    100       10s    1423
opper_agents_sample_14    100       10s    1860
opper_agents_sample_15    100        8s    1549
opper_agents_sample_16     50       10s    1312
opper_agents_sample_17      0       10s    1517
opper_agents_sample_18    100       10s    1632
opper_agents_sample_19     50       10s    3990
opper_agents_sample_20     50        5s    3890
opper_agents_sample_21      0        5s    3803
opper_agents_sample_22      0        8s    3910
opper_agents_sample_23      0       29s    4965
opper_agents_sample_24      0        5s    1662
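The headline figures can be cross-checked against the per-sample rows. The sketch below assumes, without confirmation from the page, that each headline value is the unweighted mean of the 24 samples, rounded to the nearest integer. Under that assumption it reproduces the reported 54 score, 10s duration, and 2394 tokens. It also tallies how many samples scored 100, 50, and 0.

```python
from collections import Counter
from statistics import mean

# Per-sample rows transcribed from the results: (sample, score, seconds, tokens).
RESULTS = [
    ("opper_agents_sample_01", 100, 10, 1165),
    ("opper_agents_sample_02", 100, 10, 1359),
    ("opper_agents_sample_03", 50, 10, 1352),
    ("opper_agents_sample_04", 100, 17, 3160),
    ("opper_agents_sample_05", 100, 10, 1723),
    ("opper_agents_sample_06", 50, 10, 2008),
    ("opper_agents_sample_07", 0, 10, 1798),
    ("opper_agents_sample_08", 50, 8, 1835),
    ("opper_agents_sample_09", 100, 10, 3741),
    ("opper_agents_sample_10", 0, 8, 3242),
    ("opper_agents_sample_11", 100, 8, 3160),
    ("opper_agents_sample_12", 0, 10, 1398),
    ("opper_agents_sample_13", 100, 10, 1423),
    ("opper_agents_sample_14", 100, 10, 1860),
    ("opper_agents_sample_15", 100, 8, 1549),
    ("opper_agents_sample_16", 50, 10, 1312),
    ("opper_agents_sample_17", 0, 10, 1517),
    ("opper_agents_sample_18", 100, 10, 1632),
    ("opper_agents_sample_19", 50, 10, 3990),
    ("opper_agents_sample_20", 50, 5, 3890),
    ("opper_agents_sample_21", 0, 5, 3803),
    ("opper_agents_sample_22", 0, 8, 3910),
    ("opper_agents_sample_23", 0, 29, 4965),
    ("opper_agents_sample_24", 0, 5, 1662),
]

# Assumption: headline numbers are plain (unweighted) means, rounded to integers.
avg_score = round(mean(r[1] for r in RESULTS))
avg_duration = round(mean(r[2] for r in RESULTS))
avg_tokens = round(mean(r[3] for r in RESULTS))

# Count how many samples landed on each score value (100, 50, or 0).
score_counts = Counter(r[1] for r in RESULTS)

print(f"Average score: {avg_score}")
print(f"Average duration: {avg_duration}s")
print(f"Average tokens: {avg_tokens}")
print(f"Score distribution: {dict(sorted(score_counts.items(), reverse=True))}")
```

The raw means are about 54.17, 10.04, and 2393.92. Ten samples scored 100, six scored 50, and eight scored 0, so partial credit accounts for a noticeable share of the overall score.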