Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are supported by the supplied context rather than hallucinated.

Model: groq/gpt-oss-120b
Score: 88
Average duration: 4s
Average tokens: 19311
Average cost: $0.00

Sample                    Score  Duration  Tokens
opper_context_sample_01   100    4s        3674
opper_context_sample_02   100    2s        3135
opper_context_sample_03   100    2s        3183
opper_context_sample_04   100    5s        4556
opper_context_sample_05   0      3s        3173
opper_context_sample_06   100    2s        3042
opper_context_sample_07   100    3s        3328
opper_context_sample_08   100    3s        3203
opper_context_sample_09   100    2s        3162
opper_context_sample_10   100    2s        3128
opper_context_sample_11   100    3s        3867
opper_context_sample_12   100    2s        3423
opper_context_sample_13   100    2s        3443
opper_context_sample_14   100    2s        3523
opper_context_sample_15   100    2s        3402
opper_context_sample_16   50     2s        3485
opper_context_sample_17   100    3s        3625
opper_context_sample_18   100    3s        6258
opper_context_sample_19   100    3s        6248
opper_context_sample_20   100    3s        6341
opper_context_sample_21   100    3s        6714
opper_context_sample_22   100    3s        6348
opper_context_sample_23   100    3s        6398
opper_context_sample_24   100    3s        6774
opper_context_sample_25   100    6s        79235
opper_context_sample_26   100    6s        79589
opper_context_sample_27   0      5s        79197
opper_context_sample_28   100    8s        79247
opper_context_sample_29   0      8s        79442
opper_context_sample_30   100    11s       79197
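
The headline figures (88, 4s, 19311) are consistent with plain arithmetic means of the 30 per-sample values, rounded to the nearest integer. The sketch below checks that arithmetic. It rests on assumptions not stated on the page: that the three unlabeled numbers for each sample are score, duration in seconds, and token count, and that the summary is an unweighted mean. The names `RESULTS` and `summarize` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..30, in listed order.
# Column meanings are an assumption; the page does not label them.
RESULTS = [
    (100, 4, 3674), (100, 2, 3135), (100, 2, 3183), (100, 5, 4556),
    (0, 3, 3173), (100, 2, 3042), (100, 3, 3328), (100, 3, 3203),
    (100, 2, 3162), (100, 2, 3128), (100, 3, 3867), (100, 2, 3423),
    (100, 2, 3443), (100, 2, 3523), (100, 2, 3402), (50, 2, 3485),
    (100, 3, 3625), (100, 3, 6258), (100, 3, 6248), (100, 3, 6341),
    (100, 3, 6714), (100, 3, 6348), (100, 3, 6398), (100, 3, 6774),
    (100, 6, 79235), (100, 6, 79589), (0, 5, 79197), (100, 8, 79247),
    (0, 8, 79442), (100, 11, 79197),
]


def summarize(results):
    """Return unweighted means of each column, rounded to integers."""
    scores, durations, tokens = zip(*results)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(RESULTS)
print(summary)  # {'score': 88, 'duration_s': 4, 'tokens': 19311}
```

The per-sample durations are already rounded to whole seconds, so the duration mean is only approximate. The three samples scoring 0 (05, 27, 29) and the one scoring 50 (16) account for the entire gap between 88 and 100.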