Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on how well their answers stay grounded in the provided context rather than hallucinating information.
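As a rough sketch of the kind of call such an evaluation exercises (not the benchmark's actual harness), the snippet below uses the OpenAI Python SDK to query the evaluated gpt-4o-mini model with a question that must be answered from a supplied context. The policy text, question, and system instruction are illustrative assumptions, not taken from the benchmark data.

```python
# Minimal sketch of a context-grounded Q&A call, assuming the OpenAI Python SDK.
# The context and question below are made up for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

context = (
    "Refund policy: customers may return unopened items within 30 days "
    "of delivery for a full refund. Opened items are eligible for store "
    "credit only."
)
question = "Can I get a cash refund for an opened item?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer using only the provided context. "
                "If the context does not contain the answer, say so instead of guessing."
            ),
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```

Instructing the model to admit when the context lacks the answer is the simplest guard against the hallucination failure mode this evaluation targets.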

Model: openai/gpt-4o-mini
Score: 53
Average duration: 21s
Average tokens: 18949
Average cost: $0.00
Per-sample results:

Sample                     Score   Duration   Tokens
opper_context_sample_01        0         9s     2834
opper_context_sample_02       50         8s     2805
opper_context_sample_03      100         9s     2849
opper_context_sample_04        0        15s     3328
opper_context_sample_05        0         8s     2856
opper_context_sample_06      100         8s     2825
opper_context_sample_07        0         8s     2840
opper_context_sample_08      100     1m 27s     2835
opper_context_sample_09      100        13s     2840
opper_context_sample_10      100         8s     2845
opper_context_sample_11      100        12s     3322
opper_context_sample_12      100         8s     3222
opper_context_sample_13      100         8s     3207
opper_context_sample_14       50         8s     3225
opper_context_sample_15      100         6s     3172
opper_context_sample_16      100         8s     3201
opper_context_sample_17      100         9s     3266
opper_context_sample_18      100         8s     5942
opper_context_sample_19      100        18s     6059
opper_context_sample_20        0         9s     6024
opper_context_sample_21        0         9s     6037
opper_context_sample_22        0         9s     6002
opper_context_sample_23        0         8s     5982
opper_context_sample_24      100        16s     6658
opper_context_sample_25      100     2m 14s    79057
opper_context_sample_26        0        17s    79102
opper_context_sample_27        0        30s    79028
opper_context_sample_28        0     1m 47s    79030
opper_context_sample_29        0        15s    79041
opper_context_sample_30        0        17s    79032
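For reference, the aggregate figures at the top of this section are consistent with plain averages over the 30 rows above. The sketch below recomputes them from the table values (hard-coded here purely for illustration), with durations converted to seconds first.

```python
# Recompute the aggregate metrics from the per-sample rows above.
scores = [0, 50, 100, 0, 0, 100, 0, 100, 100, 100,
          100, 100, 100, 50, 100, 100, 100, 100, 100, 0,
          0, 0, 0, 100, 100, 0, 0, 0, 0, 0]
durations_s = [9, 8, 9, 15, 8, 8, 8, 87, 13, 8,
               12, 8, 8, 8, 6, 8, 9, 8, 18, 9,
               9, 9, 8, 16, 134, 17, 30, 107, 15, 17]
tokens = [2834, 2805, 2849, 3328, 2856, 2825, 2840, 2835, 2840, 2845,
          3322, 3222, 3207, 3225, 3172, 3201, 3266, 5942, 6059, 6024,
          6037, 6002, 5982, 6658, 79057, 79102, 79028, 79030, 79041, 79032]

print(round(sum(scores) / len(scores)))            # 53
print(round(sum(durations_s) / len(durations_s)))  # 21 (seconds)
print(round(sum(tokens) / len(tokens)))            # 18949
```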