Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Score: 52
Model: gcp/gemini-2.0-flash-lite
Average duration: 11s
Average tokens: 2191
Average cost: $0.00
Sample                  Score  Duration  Tokens
opper_agents_sample_01    100        9s    1231
opper_agents_sample_02    100       10s    1383
opper_agents_sample_03      0       18s    1325
opper_agents_sample_04     50       10s    1459
opper_agents_sample_05    100       10s    1544
opper_agents_sample_06     50       18s    1670
opper_agents_sample_07    100       10s    1842
opper_agents_sample_08    100       10s    1882
opper_agents_sample_09    100       16s    3395
opper_agents_sample_10      0       15s    3118
opper_agents_sample_11    100       10s    3063
opper_agents_sample_12      0        9s    1352
opper_agents_sample_13      0       10s    1377
opper_agents_sample_14    100       15s    1882
opper_agents_sample_15      0        9s    1595
opper_agents_sample_16     50        9s    1380
opper_agents_sample_17     50        9s    1608
opper_agents_sample_18    100        9s    1483
opper_agents_sample_19     50        5s    3716
opper_agents_sample_20     50        5s    3749
opper_agents_sample_21     50       10s    3889
opper_agents_sample_22      0       10s    2907
opper_agents_sample_23      0       10s    4021
opper_agents_sample_24      0       10s    1707
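
The headline figures appear to be plain means over the 24 samples. The sketch below recomputes them from the per-sample values listed above. It assumes the first value in each sample's row is a score out of 100 and that the summary is rounded to the nearest integer; neither is stated on the page, and the variable names are illustrative only.

```python
# (score, duration in seconds, tokens) for opper_agents_sample_01 .. _24,
# copied from the per-sample results above.
rows = [
    (100, 9, 1231), (100, 10, 1383), (0, 18, 1325), (50, 10, 1459),
    (100, 10, 1544), (50, 18, 1670), (100, 10, 1842), (100, 10, 1882),
    (100, 16, 3395), (0, 15, 3118), (100, 10, 3063), (0, 9, 1352),
    (0, 10, 1377), (100, 15, 1882), (0, 9, 1595), (50, 9, 1380),
    (50, 9, 1608), (100, 9, 1483), (50, 5, 3716), (50, 5, 3749),
    (50, 10, 3889), (0, 10, 2907), (0, 10, 4021), (0, 10, 1707),
]

n = len(rows)

# Assumption: each summary value is an unweighted mean across samples.
mean_score = sum(score for score, _, _ in rows) / n
mean_duration = sum(duration for _, duration, _ in rows) / n
mean_tokens = sum(tokens for _, _, tokens in rows) / n

print(round(mean_score), round(mean_duration), round(mean_tokens))
# prints: 52 11 2191
```

Under these assumptions the recomputed means match the reported score of 52, average duration of 11s, and average of 2191 tokens.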