Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are grounded in the provided context rather than hallucinated.

Model: groq/moonshotai/kimi-k2-instruct
Overall score: 82
Average duration: 26s
Average tokens: 17930
Average cost: $0.00
| Sample | Score | Duration | Tokens |
| --- | --- | --- | --- |
| opper_context_sample_01 | 0 | 4s | 2923 |
| opper_context_sample_02 | 50 | 5s | 2865 |
| opper_context_sample_03 | 100 | 6s | 2899 |
| opper_context_sample_04 | 0 | 12s | 3050 |
| opper_context_sample_05 | 0 | 5s | 2953 |
| opper_context_sample_06 | 100 | 3s | 2881 |
| opper_context_sample_07 | 0 | 4s | 2877 |
| opper_context_sample_08 | 100 | 3s | 2933 |
| opper_context_sample_09 | 100 | 6s | 2907 |
| opper_context_sample_10 | 100 | 3s | 2946 |
| opper_context_sample_11 | 100 | 14s | 3330 |
| opper_context_sample_12 | 100 | 4s | 3247 |
| opper_context_sample_13 | 100 | 8s | 3286 |
| opper_context_sample_14 | 100 | 4s | 3294 |
| opper_context_sample_15 | 100 | 33s | 3218 |
| opper_context_sample_16 | 100 | 3s | 3246 |
| opper_context_sample_17 | 100 | 9s | 3306 |
| opper_context_sample_18 | 100 | 17s | 6143 |
| opper_context_sample_19 | 100 | 10s | 6114 |
| opper_context_sample_20 | 100 | 9s | 6189 |
| opper_context_sample_21 | 100 | 5s | 6378 |
| opper_context_sample_22 | 100 | 6s | 6055 |
| opper_context_sample_23 | 100 | 16s | 6122 |
| opper_context_sample_24 | 100 | 17s | 6314 |
| opper_context_sample_25 | 100 | 12s | 73709 |
| opper_context_sample_26 | 100 | 1m 3s | 73762 |
| opper_context_sample_27 | 100 | 1m 9s | 73745 |
| opper_context_sample_28 | 100 | 56s | 73754 |
| opper_context_sample_29 | 0 | 2m 57s | 73752 |
| opper_context_sample_30 | 100 | 3m 23s | 73705 |
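
The headline figures can be cross-checked against the per-sample results above. The following is a minimal sketch, assuming each headline number is an unweighted mean over the 30 samples, rounded to the nearest integer; the source does not state how the figures are aggregated. The names `ROWS`, `to_seconds`, and `summarize` are illustrative and not part of any published tooling.

```python
import re

# Per-sample results copied from the listing above, in sample order 01..30:
# (score, duration as displayed, tokens).
ROWS = [
    (0, "4s", 2923), (50, "5s", 2865), (100, "6s", 2899),
    (0, "12s", 3050), (0, "5s", 2953), (100, "3s", 2881),
    (0, "4s", 2877), (100, "3s", 2933), (100, "6s", 2907),
    (100, "3s", 2946), (100, "14s", 3330), (100, "4s", 3247),
    (100, "8s", 3286), (100, "4s", 3294), (100, "33s", 3218),
    (100, "3s", 3246), (100, "9s", 3306), (100, "17s", 6143),
    (100, "10s", 6114), (100, "9s", 6189), (100, "5s", 6378),
    (100, "6s", 6055), (100, "16s", 6122), (100, "17s", 6314),
    (100, "12s", 73709), (100, "1m 3s", 73762), (100, "1m 9s", 73745),
    (100, "56s", 73754), (0, "2m 57s", 73752), (100, "3m 23s", 73705),
]


def to_seconds(text: str) -> int:
    """Convert a displayed duration such as '33s' or '1m 3s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


def summarize(rows):
    """Return unweighted means (assumed aggregation), rounded to integers."""
    n = len(rows)
    return {
        "score": round(sum(score for score, _, _ in rows) / n),
        "seconds": round(sum(to_seconds(d) for _, d, _ in rows) / n),
        "tokens": round(sum(tokens for _, _, tokens in rows) / n),
    }


summary = summarize(ROWS)
print(summary)
```

Under that assumption the computed means come out to 82, 26 seconds, and 17930 tokens, which agrees with the reported overall score and averages. The six samples near 73,700 tokens account for most of the token average and for all of the durations of 56 seconds or longer.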