Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the provided context, rather than hallucinating information. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications.
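As an illustration only (not the benchmark's actual harness), the sketch below shows one way a single context-grounded Q&A sample could be graded on the 0/50/100 scale that appears in the results table. The `ContextSample` type, its field names, and the keyword-based grader are assumptions made for this example; real graders typically use an LLM judge or a rubric.

```python
from dataclasses import dataclass

@dataclass
class ContextSample:
    context: str      # the reference passage the answer must be grounded in
    question: str
    reference: str    # expected answer derived from the context

def score_answer(sample: ContextSample, answer: str) -> int:
    """Toy grader: 100 for a fully grounded answer, 50 for a partial match,
    0 otherwise. The keyword check only illustrates the 0/50/100 scale."""
    expected = sample.reference.lower()
    got = answer.lower()
    if expected in got:
        return 100
    # partial credit if roughly half of the reference terms appear
    terms = expected.split()
    hits = sum(1 for t in terms if t in got)
    return 50 if terms and hits >= len(terms) / 2 else 0

sample = ContextSample(
    context="The refund window for annual plans is 30 days from purchase.",
    question="How long is the refund window for annual plans?",
    reference="30 days",
)
print(score_answer(sample, "Annual plans can be refunded within 30 days."))  # 100
```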

Model: openai/gpt-5-nano
Average score: 85
Average duration: 19s
Average tokens: 20419
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_context_sample_01 | 0 | 21s | 4332 |
| opper_context_sample_02 | 50 | 21s | 3727 |
| opper_context_sample_03 | 100 | 18s | 3599 |
| opper_context_sample_04 | 50 | 33s | 8692 |
| opper_context_sample_05 | 0 | 18s | 3919 |
| opper_context_sample_06 | 100 | 18s | 3660 |
| opper_context_sample_07 | 50 | 18s | 4647 |
| opper_context_sample_08 | 100 | 18s | 3886 |
| opper_context_sample_09 | 100 | 18s | 3828 |
| opper_context_sample_10 | 100 | 21s | 4196 |
| opper_context_sample_11 | 100 | 21s | 4715 |
| opper_context_sample_12 | 100 | 21s | 4605 |
| opper_context_sample_13 | 100 | 17s | 4103 |
| opper_context_sample_14 | 100 | 17s | 4162 |
| opper_context_sample_15 | 100 | 18s | 3948 |
| opper_context_sample_16 | 100 | 21s | 3951 |
| opper_context_sample_17 | 100 | 17s | 5183 |
| opper_context_sample_18 | 100 | 21s | 6912 |
| opper_context_sample_19 | 100 | 17s | 7248 |
| opper_context_sample_20 | 100 | 17s | 7563 |
| opper_context_sample_21 | 100 | 32s | 9650 |
| opper_context_sample_22 | 100 | 21s | 8012 |
| opper_context_sample_23 | 50 | 21s | 7238 |
| opper_context_sample_24 | 50 | 17s | 8032 |
| opper_context_sample_25 | 100 | 21s | 80391 |
| opper_context_sample_26 | 100 | 19s | 80805 |
| opper_context_sample_27 | 100 | 13s | 80211 |
| opper_context_sample_28 | 100 | 14s | 80392 |
| opper_context_sample_29 | 100 | 19s | 81211 |
| opper_context_sample_30 | 100 | 14s | 79759 |
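The headline averages can be checked directly against the per-sample rows. The sketch below recomputes them in plain Python; per-sample durations are displayed rounded to whole seconds, so the recomputed duration average is only approximate.

```python
# (score, duration in seconds, tokens) for each of the 30 samples above.
rows = [
    (0, 21, 4332), (50, 21, 3727), (100, 18, 3599), (50, 33, 8692),
    (0, 18, 3919), (100, 18, 3660), (50, 18, 4647), (100, 18, 3886),
    (100, 18, 3828), (100, 21, 4196), (100, 21, 4715), (100, 21, 4605),
    (100, 17, 4103), (100, 17, 4162), (100, 18, 3948), (100, 21, 3951),
    (100, 17, 5183), (100, 21, 6912), (100, 17, 7248), (100, 17, 7563),
    (100, 32, 9650), (100, 21, 8012), (50, 21, 7238), (50, 17, 8032),
    (100, 21, 80391), (100, 19, 80805), (100, 13, 80211), (100, 14, 80392),
    (100, 19, 81211), (100, 14, 79759),
]
n = len(rows)
avg_score = sum(r[0] for r in rows) / n      # 85.0
avg_duration = sum(r[1] for r in rows) / n   # ~19s (rows show rounded durations)
avg_tokens = sum(r[2] for r in rows) / n     # ~20419
print(avg_score, round(avg_duration), round(avg_tokens))
```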