Context Reasoning

Context understanding and reasoning tasks test whether a model can answer accurately using only the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are properly grounded in the given context rather than hallucinated.
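As a rough illustration (not the benchmark's actual harness), a single context-grounded Q&A sample can be evaluated along the lines of the sketch below: the model is prompted with the context and question, and a grader compares its answer against a reference. All names here (`ContextSample`, `call_model`, `score`) are hypothetical.

```python
# Minimal sketch of a context-grounded Q&A evaluation. This is an assumed
# structure for illustration only, not the benchmark's actual implementation.

from dataclasses import dataclass


@dataclass
class ContextSample:
    context: str    # the document the answer must be grounded in
    question: str
    reference: str  # expected answer


def build_prompt(sample: ContextSample) -> str:
    # Instruct the model to answer only from the supplied context.
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{sample.context}\n\nQuestion: {sample.question}"
    )


def call_model(prompt: str) -> str:
    # Placeholder for the model under test (e.g. openai/gpt-5-mini);
    # a real run would call the provider's API here.
    return "Full-time employees accrue 25 vacation days per year."


def score(answer: str, sample: ContextSample) -> int:
    # Toy grader: exact containment of the reference answer.
    # A real benchmark likely uses a more robust (e.g. LLM-based) grader.
    return 100 if sample.reference.lower() in answer.lower() else 0


if __name__ == "__main__":
    sample = ContextSample(
        context="Policy 4.2: Full-time employees accrue 25 vacation days per year.",
        question="How many vacation days do full-time employees get?",
        reference="25 vacation days",
    )
    print(score(call_model(build_prompt(sample)), sample))  # prints 100
```

The per-sample results below use the same 0–100 scale.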

Model: openai/gpt-5-mini
Overall score: 97
Average duration: 23s
Average tokens: 19447
Average cost: $0.00
| Sample | Score | Duration | Tokens |
| --- | --- | --- | --- |
| opper_context_sample_01 | 100 | 26s | 4337 |
| opper_context_sample_02 | 100 | 19s | 3127 |
| opper_context_sample_03 | 100 | 11s | 3191 |
| opper_context_sample_04 | 100 | 48s | 6043 |
| opper_context_sample_05 | 50 | 17s | 3546 |
| opper_context_sample_06 | 100 | 9s | 3026 |
| opper_context_sample_07 | 100 | 1m 12s | 3698 |
| opper_context_sample_08 | 100 | 15s | 3355 |
| opper_context_sample_09 | 100 | 14s | 3319 |
| opper_context_sample_10 | 100 | 13s | 3171 |
| opper_context_sample_11 | 100 | 9s | 3548 |
| opper_context_sample_12 | 100 | 16s | 3711 |
| opper_context_sample_13 | 100 | 12s | 3656 |
| opper_context_sample_14 | 100 | 10s | 3493 |
| opper_context_sample_15 | 100 | 7s | 3306 |
| opper_context_sample_16 | 100 | 10s | 3390 |
| opper_context_sample_17 | 100 | 18s | 3644 |
| opper_context_sample_18 | 100 | 1m 18s | 6336 |
| opper_context_sample_19 | 100 | 15s | 6345 |
| opper_context_sample_20 | 100 | 18s | 6586 |
| opper_context_sample_21 | 100 | 23s | 7022 |
| opper_context_sample_22 | 100 | 12s | 6387 |
| opper_context_sample_23 | 50 | 1m 30s | 6576 |
| opper_context_sample_24 | 100 | 34s | 6706 |
| opper_context_sample_25 | 100 | 11s | 79164 |
| opper_context_sample_26 | 100 | 13s | 79386 |
| opper_context_sample_27 | 100 | 16s | 79314 |
| opper_context_sample_28 | 100 | 12s | 79270 |
| opper_context_sample_29 | 100 | 18s | 79633 |
| opper_context_sample_30 | 100 | 11s | 79127 |