Agents

This category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose failures. These are among the most challenging tasks because they require open-ended reasoning and decision-making.

gcp/gemini-2.5-pro: 83
Average duration: 0
Average tokens: 0
Average cost: $0
Sample                   Score
opper_agents_sample_01   100    null   0
opper_agents_sample_02   100    null   0
opper_agents_sample_03   100    null   0
opper_agents_sample_04   100    null   0
opper_agents_sample_05   100    null   0
opper_agents_sample_06   100    null   0
opper_agents_sample_07   100    null   0
opper_agents_sample_08   100    null   0
opper_agents_sample_09   100    null   0
opper_agents_sample_10   100    null   0
opper_agents_sample_11   100    null   0
opper_agents_sample_12   0      null   0
opper_agents_sample_13   100    null   0
opper_agents_sample_14   100    null   0
opper_agents_sample_15   0      null   0
opper_agents_sample_16   100    null   0
opper_agents_sample_17   100    null   0
opper_agents_sample_18   100    null   0
opper_agents_sample_19   100    null   0
opper_agents_sample_20   100    null   0
opper_agents_sample_21   100    null   0
opper_agents_sample_22   0      null   0
opper_agents_sample_23   0      null   0
opper_agents_sample_24   100    null   0
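
The headline value of 83 is consistent with a plain average of the 24 per-sample scores listed above. The page does not state how the aggregate is computed, so the sketch below is only an assumption: it treats each sample as scored 0 or 100 and rounds the mean to the nearest integer. The function name `aggregate` is hypothetical and not part of any benchmark tooling.

```python
from statistics import mean

# Per-sample scores in listing order, copied from the results above.
scores = [
    100, 100, 100, 100, 100, 100,  # samples 01-06
    100, 100, 100, 100, 100, 0,    # samples 07-12
    100, 100, 0, 100, 100, 100,    # samples 13-18
    100, 100, 100, 0, 0, 100,      # samples 19-24
]


def aggregate(sample_scores):
    """Assumed aggregation: mean of per-sample scores, rounded to an integer."""
    return round(mean(sample_scores))


# Count the samples that received full marks.
passed = sum(1 for s in scores if s == 100)

print(f"{passed}/{len(scores)} passed, aggregate = {aggregate(scores)}")
```

Under this assumption, 20 of the 24 samples score 100 and the rounded mean is 83, matching the reported figure. Samples 12, 15, 22, and 23 are the four that score 0.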