Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are grounded in the supplied context rather than hallucinated.

Model: xai/grok-4
Overall score: 88
Average duration: 22s
Average tokens: 17253
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_context_sample_01 | 100 | 36s | 2897 |
| opper_context_sample_02 | 0 | 27s | 2857 |
| opper_context_sample_03 | 100 | 18s | 2891 |
| opper_context_sample_04 | 50 | 1m 39s | 2947 |
| opper_context_sample_05 | 0 | 21s | 2960 |
| opper_context_sample_06 | 100 | 10s | 2862 |
| opper_context_sample_07 | 100 | 19s | 2881 |
| opper_context_sample_08 | 100 | 19s | 2912 |
| opper_context_sample_09 | 100 | 19s | 2927 |
| opper_context_sample_10 | 100 | 18s | 2842 |
| opper_context_sample_11 | 100 | 10s | 3269 |
| opper_context_sample_12 | 100 | 9s | 3235 |
| opper_context_sample_13 | 100 | 13s | 3216 |
| opper_context_sample_14 | 100 | 19s | 3202 |
| opper_context_sample_15 | 100 | 24s | 3170 |
| opper_context_sample_16 | 50 | 13s | 3152 |
| opper_context_sample_17 | 100 | 8s | 3259 |
| opper_context_sample_18 | 100 | 19s | 5923 |
| opper_context_sample_19 | 100 | 13s | 5940 |
| opper_context_sample_20 | 100 | 13s | 5955 |
| opper_context_sample_21 | 100 | 11s | 6144 |
| opper_context_sample_22 | 100 | 9s | 5874 |
| opper_context_sample_23 | 100 | 24s | 6087 |
| opper_context_sample_24 | 50 | 20s | 6208 |
| opper_context_sample_25 | 100 | 27s | 70638 |
| opper_context_sample_26 | 100 | 34s | 70710 |
| opper_context_sample_27 | 100 | 27s | 70649 |
| opper_context_sample_28 | 100 | 29s | 70664 |
| opper_context_sample_29 | 100 | 21s | 70716 |
| opper_context_sample_30 | 100 | 19s | 70606 |
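
The headline figures can be cross-checked against the per-sample results listed above. The following is a minimal Python sketch, with the values transcribed from that listing in sample order. It assumes the headline numbers are simple unweighted means rounded to the nearest integer, which the page does not state explicitly; the variable and function names are illustrative only.

```python
from statistics import mean

# Per-sample results transcribed from the listing above, in order
# opper_context_sample_01 .. opper_context_sample_30.
scores = [
    100, 0, 100, 50, 0, 100, 100, 100, 100, 100,
    100, 100, 100, 100, 100, 50, 100, 100, 100, 100,
    100, 100, 100, 50, 100, 100, 100, 100, 100, 100,
]
# Durations in seconds; "1m 39s" for sample_04 is written as 99.
durations_s = [
    36, 27, 18, 99, 21, 10, 19, 19, 19, 18,
    10, 9, 13, 19, 24, 13, 8, 19, 13, 13,
    11, 9, 24, 20, 27, 34, 27, 29, 21, 19,
]
tokens = [
    2897, 2857, 2891, 2947, 2960, 2862, 2881, 2912, 2927, 2842,
    3269, 3235, 3216, 3202, 3170, 3152, 3259, 5923, 5940, 5955,
    6144, 5874, 6087, 6208, 70638, 70710, 70649, 70664, 70716, 70606,
]


def summarize(scores, durations_s, tokens):
    """Return rounded unweighted means (assumed aggregation, see lead-in)."""
    return {
        "score": round(mean(scores)),
        "avg_duration_s": round(mean(durations_s)),
        "avg_tokens": round(mean(tokens)),
    }


summary = summarize(scores, durations_s, tokens)
print(summary)
# {'score': 88, 'avg_duration_s': 22, 'avg_tokens': 17253}
```

Under that assumption the per-sample rows reproduce the reported score of 88, average duration of 22s, and average of 17253 tokens. The token average is dominated by samples 25 to 30, which each use roughly 70,000 tokens.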