Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on answers that are grounded in the provided context rather than hallucinated.

Model: groq/gpt-oss-20b
Score: 77
Average duration: 4s
Average tokens: 19355
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_context_sample_01      0        8s    5321
opper_context_sample_02     50        4s    2996
opper_context_sample_03    100        4s    3076
opper_context_sample_04     50        5s    4749
opper_context_sample_05      0        2s    3048
opper_context_sample_06    100        2s    3141
opper_context_sample_07     50        2s    3414
opper_context_sample_08    100        3s    3419
opper_context_sample_09    100        2s    3052
opper_context_sample_10    100        2s    3030
opper_context_sample_11    100        2s    3649
opper_context_sample_12    100        2s    3696
opper_context_sample_13    100        3s    3472
opper_context_sample_14    100        2s    3498
opper_context_sample_15    100        2s    3469
opper_context_sample_16    100        2s    3325
opper_context_sample_17    100        2s    3716
opper_context_sample_18    100        2s    6253
opper_context_sample_19    100        2s    6239
opper_context_sample_20    100        4s    6393
opper_context_sample_21    100        2s    6605
opper_context_sample_22    100        3s    6240
opper_context_sample_23     50        4s    6489
opper_context_sample_24    100        4s    6639
opper_context_sample_25    100        9s   79236
opper_context_sample_26      0        9s   79283
opper_context_sample_27      0        9s   79163
opper_context_sample_28    100        9s   79272
opper_context_sample_29    100       10s   79598
opper_context_sample_30      0        8s   79180
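
The headline figures appear to be plain averages of the per-sample results listed above. The following is a minimal sketch of that arithmetic, not the benchmark's own code: the function name `summarize`, the dictionary keys, and the rounding to the nearest integer are assumptions made for illustration. The tuples are copied from the per-sample values, in order from sample 01 to sample 30.

```python
from statistics import mean

# (score, duration in seconds, tokens) for opper_context_sample_01 .. _30,
# copied in order from the per-sample results above.
RESULTS = [
    (0, 8, 5321), (50, 4, 2996), (100, 4, 3076), (50, 5, 4749), (0, 2, 3048),
    (100, 2, 3141), (50, 2, 3414), (100, 3, 3419), (100, 2, 3052), (100, 2, 3030),
    (100, 2, 3649), (100, 2, 3696), (100, 3, 3472), (100, 2, 3498), (100, 2, 3469),
    (100, 2, 3325), (100, 2, 3716), (100, 2, 6253), (100, 2, 6239), (100, 4, 6393),
    (100, 2, 6605), (100, 3, 6240), (50, 4, 6489), (100, 4, 6639), (100, 9, 79236),
    (0, 9, 79283), (0, 9, 79163), (100, 9, 79272), (100, 10, 79598), (0, 8, 79180),
]


def summarize(results):
    """Average each column and round to the nearest integer (assumed rounding)."""
    scores, durations, tokens = zip(*results)
    return {
        "score": round(mean(scores)),
        "avg_duration_s": round(mean(durations)),
        "avg_tokens": round(mean(tokens)),
    }


summary = summarize(RESULTS)
print(summary)  # {'score': 77, 'avg_duration_s': 4, 'avg_tokens': 19355}
```

Under these assumptions the result matches the reported score of 77, average duration of 4s, and average of 19355 tokens. The six samples with roughly 79,000 tokens each dominate the token average; the other 24 samples use between about 3,000 and 6,700 tokens.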