Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on how well their answers are grounded in the given context rather than hallucinated.
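To make "grounded in the given context" concrete, here is a minimal, purely illustrative sketch of a grounding check: it flags answer sentences with little word overlap against the context. The function name, threshold, and examples are hypothetical; this is not the benchmark's actual grading method, which is not described here.

```python
def is_grounded(answer: str, context: str) -> bool:
    """Naive grounding check (illustrative only): every sentence of the
    answer must share at least half of its words with the context."""
    context_words = set(context.lower().split())
    for sentence in answer.split("."):
        words = set(sentence.lower().split())
        if not words:  # skip empty fragments after the final period
            continue
        overlap = len(words & context_words) / len(words)
        if overlap < 0.5:  # hypothetical threshold
            return False
    return True

context = "The refund policy allows returns within 30 days of purchase."
print(is_grounded("Returns are allowed within 30 days.", context))             # True
print(is_grounded("Refunds require a manager signature in person.", context))  # False
```

Real graders typically use an LLM judge or reference answers rather than lexical overlap; this sketch only illustrates the distinction the task measures.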

Model: openai/gpt-5
Score: 88
Average duration: 36s
Average tokens: 19500
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_context_sample_01 | 0 | 53s | 4833 |
| opper_context_sample_02 | 50 | 18s | 3018 |
| opper_context_sample_03 | 100 | 24s | 3163 |
| opper_context_sample_04 | 50 | 54s | 6417 |
| opper_context_sample_05 | 100 | 24s | 3358 |
| opper_context_sample_06 | 100 | 34s | 3153 |
| opper_context_sample_07 | 50 | 28s | 4011 |
| opper_context_sample_08 | 100 | 24s | 3154 |
| opper_context_sample_09 | 100 | 34s | 3146 |
| opper_context_sample_10 | 100 | 34s | 3097 |
| opper_context_sample_11 | 100 | 1m 39s | 3724 |
| opper_context_sample_12 | 100 | 24s | 3446 |
| opper_context_sample_13 | 100 | 18s | 3320 |
| opper_context_sample_14 | 100 | 28s | 3619 |
| opper_context_sample_15 | 100 | 33s | 3296 |
| opper_context_sample_16 | 100 | 17s | 3370 |
| opper_context_sample_17 | 100 | 1m 39s | 3568 |
| opper_context_sample_18 | 100 | 24s | 6218 |
| opper_context_sample_19 | 100 | 19s | 6548 |
| opper_context_sample_20 | 100 | 34s | 6515 |
| opper_context_sample_21 | 100 | 38s | 7264 |
| opper_context_sample_22 | 50 | 34s | 6546 |
| opper_context_sample_23 | 50 | 28s | 6711 |
| opper_context_sample_24 | 100 | 38s | 7318 |
| opper_context_sample_25 | 100 | 34s | 79274 |
| opper_context_sample_26 | 100 | 34s | 79561 |
| opper_context_sample_27 | 100 | 56s | 79280 |
| opper_context_sample_28 | 100 | 34s | 79510 |
| opper_context_sample_29 | 100 | 27s | 79144 |
| opper_context_sample_30 | 100 | 34s | 79428 |
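The headline numbers can be reproduced from the per-sample rows. The snippet below, a quick check rather than the benchmark's own tooling, transcribes the (score, tokens) pairs from the table and confirms that the mean score rounds to 88 and the mean token count to 19500:

```python
# Per-sample (score, tokens) pairs transcribed from the results table.
rows = [
    (0, 4833), (50, 3018), (100, 3163), (50, 6417), (100, 3358),
    (100, 3153), (50, 4011), (100, 3154), (100, 3146), (100, 3097),
    (100, 3724), (100, 3446), (100, 3320), (100, 3619), (100, 3296),
    (100, 3370), (100, 3568), (100, 6218), (100, 6548), (100, 6515),
    (100, 7264), (50, 6546), (50, 6711), (100, 7318), (100, 79274),
    (100, 79561), (100, 79428), (100, 79280), (100, 79510), (100, 79144),
]
avg_score = sum(s for s, _ in rows) / len(rows)    # 2650 / 30 ≈ 88.3
avg_tokens = sum(t for _, t in rows) / len(rows)   # 585010 / 30 ≈ 19500.3
print(round(avg_score))   # 88
print(round(avg_tokens))  # 19500
```

The average duration does not reproduce exactly from the rounded per-sample durations (they average to roughly 35s against the reported 36s), which suggests the summary stats were computed from unrounded values.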