Context Reasoning

Context understanding and reasoning tasks test whether a model can answer accurately using only the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are properly grounded in the given context rather than hallucinated.
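The evaluation flow for this capability can be pictured as a simple grounded-QA check: give the model the context and the question, then grade the answer against the expected grounded response. The sketch below is illustrative only and is not the benchmark's actual harness; `call_model` and the exact-match scoring are hypothetical stand-ins for the model under test and the real grading step.

```python
# Minimal sketch of a context-grounded Q&A check (illustrative only).
# `call_model` is a hypothetical stand-in for the model under test;
# the benchmark's real harness and scoring rubric are not shown here.

GROUNDED_QA_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, say "Not stated in the context."

Context:
{context}

Question:
{question}
"""


def call_model(prompt: str) -> str:
    """Hypothetical model call; replace with your provider's client."""
    raise NotImplementedError


def score_sample(context: str, question: str, expected: str) -> int:
    """Return 100 if the answer contains the expected grounded answer, else 0.

    Real harnesses typically use an LLM judge or a rubric instead of this
    substring comparison; exact matching just keeps the sketch self-contained.
    """
    answer = call_model(GROUNDED_QA_PROMPT.format(context=context, question=question))
    return 100 if expected.strip().lower() in answer.strip().lower() else 0
```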

Model: xai/grok-3
Score: 87
Average duration: 40s
Average tokens: 17290
Average cost: $0.00
Sample                     Score  Duration  Tokens
opper_context_sample_01    0      1m        2875
opper_context_sample_02    0      53s       2828
opper_context_sample_03    100    29s       2894
opper_context_sample_04    50     1m 27s    3235
opper_context_sample_05    0      43s       2893
opper_context_sample_06    100    55s       2871
opper_context_sample_07    100    1m 6s     3042
opper_context_sample_08    100    1m        2941
opper_context_sample_09    100    41s       2950
opper_context_sample_10    100    1m        2871
opper_context_sample_11    100    1m        3260
opper_context_sample_12    100    1m        3214
opper_context_sample_13    100    55s       3216
opper_context_sample_14    100    1m        3247
opper_context_sample_15    100    26s       3142
opper_context_sample_16    100    27s       3187
opper_context_sample_17    100    32s       3292
opper_context_sample_18    100    29s       5966
opper_context_sample_19    100    42s       6271
opper_context_sample_20    100    17s       6019
opper_context_sample_21    100    34s       6175
opper_context_sample_22    100    8s        5918
opper_context_sample_23    50     15s       5977
opper_context_sample_24    100    35s       6258
opper_context_sample_25    100    27s       70684
opper_context_sample_26    100    28s       70750
opper_context_sample_27    100    26s       70673
opper_context_sample_28    100    27s       70696
opper_context_sample_29    100    26s       70694
opper_context_sample_30    100    28s       70656