Agents

The Agents tests evaluate AI agent reasoning and tool selection, covering planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where a model must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

Model: gcp/gemini-flash-lite-latest
Score: 81
Average duration: 3s
Average tokens: 2671
Average cost: $0.00
Score  Duration  Tokens  Sample
  100        2s    1466  opper_agents_sample_01
  100        3s    1805  opper_agents_sample_02
  100        3s    1747  opper_agents_sample_03
  100        4s    2152  opper_agents_sample_04
  100        2s    1895  opper_agents_sample_05
  100        4s    2597  opper_agents_sample_06
  100        2s    2084  opper_agents_sample_07
  100        3s    2322  opper_agents_sample_08
  100        3s    3699  opper_agents_sample_09
  100        2s    3467  opper_agents_sample_10
  100        4s    3626  opper_agents_sample_11
  100        3s    1803  opper_agents_sample_12
    0        3s    1882  opper_agents_sample_13
  100        2s    2172  opper_agents_sample_14
  100        3s    1820  opper_agents_sample_15
   50        3s    1997  opper_agents_sample_16
  100        4s    2440  opper_agents_sample_17
  100        3s    1950  opper_agents_sample_18
  100        2s    4063  opper_agents_sample_19
  100        3s    4069  opper_agents_sample_20
  100        4s    5097  opper_agents_sample_21
    0        3s    3516  opper_agents_sample_22
    0        3s    4348  opper_agents_sample_23
    0        3s    2075  opper_agents_sample_24
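
The headline figures (81, 3s, 2671) appear to be plain averages over the 24 per-sample results above. The following is a minimal sketch of that arithmetic, not the benchmark's own code. It assumes each sample contributes a score, a duration in whole seconds, and a token count, and that averages are rounded half up; the names `rows` and `summarize` are hypothetical. Because the per-sample durations shown are already rounded to whole seconds, the duration average is only approximate.

```python
import math

# (score, duration in seconds, tokens) for opper_agents_sample_01 .. _24,
# copied from the per-sample results above.
rows = [
    (100, 2, 1466), (100, 3, 1805), (100, 3, 1747), (100, 4, 2152),
    (100, 2, 1895), (100, 4, 2597), (100, 2, 2084), (100, 3, 2322),
    (100, 3, 3699), (100, 2, 3467), (100, 4, 3626), (100, 3, 1803),
    (0, 3, 1882), (100, 2, 2172), (100, 3, 1820), (50, 3, 1997),
    (100, 4, 2440), (100, 3, 1950), (100, 2, 4063), (100, 3, 4069),
    (100, 4, 5097), (0, 3, 3516), (0, 3, 4348), (0, 3, 2075),
]


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 going up (assumed rounding rule)."""
    return math.floor(x + 0.5)


def summarize(rows):
    """Average each column over all samples and round half up."""
    n = len(rows)
    return {
        "score": round_half_up(sum(r[0] for r in rows) / n),
        "duration_s": round_half_up(sum(r[1] for r in rows) / n),
        "tokens": round_half_up(sum(r[2] for r in rows) / n),
    }


summary = summarize(rows)
print(summary)  # {'score': 81, 'duration_s': 3, 'tokens': 2671}
```

The mean token count is exactly 2670.5, so Python's built-in `round`, which rounds halves to the nearest even integer, would give 2670 rather than the 2671 shown. That is why the sketch uses an explicit half-up helper.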