Agents

The Agents category tests AI agent reasoning and tool selection: planning, tool choice, and self-diagnosis. These skills are crucial for autonomous ticket triage systems and complex agent workflows, where models must decide which tools to use, plan multi-step processes, and diagnose what went wrong when a step fails. These are some of the most challenging tasks because they require open-ended reasoning and decision-making.

Model: openai/gpt-4o-mini
Score: 60
Average duration: 17s
Average tokens: 2071
Average cost: $0.00

Sample                   Score  Duration  Tokens
opper_agents_sample_01     100       14s    1093
opper_agents_sample_02     100       14s    1241
opper_agents_sample_03       0       14s    1251
opper_agents_sample_04       0       24s    1369
opper_agents_sample_05     100       16s    1476
opper_agents_sample_06      50       16s    1525
opper_agents_sample_07     100       16s    1675
opper_agents_sample_08     100       16s    1768
opper_agents_sample_09     100       14s    2825
opper_agents_sample_10     100       16s    2788
opper_agents_sample_11     100       14s    2715
opper_agents_sample_12     100       14s    1275
opper_agents_sample_13       0       16s    1262
opper_agents_sample_14     100       16s    1747
opper_agents_sample_15       0       16s    1485
opper_agents_sample_16      50       16s    1473
opper_agents_sample_17       0       16s    1413
opper_agents_sample_18     100       16s    1468
opper_agents_sample_19     100       16s    3735
opper_agents_sample_20     100       18s    3790
opper_agents_sample_21      50       32s    3698
opper_agents_sample_22       0       19s    3037
opper_agents_sample_23       0       16s    4010
opper_agents_sample_24       0       14s    1580
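
The summary figures at the top (60, 17s, 2071) appear to be plain averages of the per-sample values. The sketch below recomputes them as a sanity check. It assumes that the first number in each sample's entry is a score out of 100, the second a duration in seconds, and the third a token count. The page does not label these columns, and the names `ROWS` and `summarize` are hypothetical, introduced only for this illustration.

```python
from statistics import mean

# (score, duration_seconds, tokens) for opper_agents_sample_01 .. _24,
# copied from the listing above. Column meanings are an assumption:
# the source page does not label the per-sample values.
ROWS = [
    (100, 14, 1093), (100, 14, 1241), (0, 14, 1251), (0, 24, 1369),
    (100, 16, 1476), (50, 16, 1525), (100, 16, 1675), (100, 16, 1768),
    (100, 14, 2825), (100, 16, 2788), (100, 14, 2715), (100, 14, 1275),
    (0, 16, 1262), (100, 16, 1747), (0, 16, 1485), (50, 16, 1473),
    (0, 16, 1413), (100, 16, 1468), (100, 16, 3735), (100, 18, 3790),
    (50, 32, 3698), (0, 19, 3037), (0, 16, 4010), (0, 14, 1580),
]


def summarize(rows):
    """Return the rounded mean of each column across all samples."""
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(ROWS)
print(summary)  # {'score': 60, 'duration_s': 17, 'tokens': 2071}
```

Under these assumptions the recomputed values match the reported summary, which supports reading the first per-sample number as the score: 13 samples score 100, 3 score 50, and 8 score 0.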