Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are grounded in the provided context rather than hallucinated.

Score: 87
Model: mistral/mistral-medium-2508-eu
Average duration: 16s
Average tokens: 22086
Average cost: $0.00

Sample                   Score  Duration  Tokens
opper_context_sample_01      0       12s    3706
opper_context_sample_02    100        7s    3334
opper_context_sample_03    100        7s    3279
opper_context_sample_04     50        9s    3670
opper_context_sample_05     50       10s    3447
opper_context_sample_06    100        4s    3228
opper_context_sample_07    100        5s    3331
opper_context_sample_08    100       10s    3597
opper_context_sample_09    100       10s    3388
opper_context_sample_10    100        5s    3346
opper_context_sample_11    100       10s    4007
opper_context_sample_12    100       10s    3956
opper_context_sample_13    100        6s    3867
opper_context_sample_14    100        6s    3921
opper_context_sample_15    100        6s    3831
opper_context_sample_16    100        6s    3810
opper_context_sample_17    100        6s    3965
opper_context_sample_18    100        6s    7168
opper_context_sample_19    100       41s    8622
opper_context_sample_20    100       11s    7339
opper_context_sample_21    100       18s    7922
opper_context_sample_22    100       21s    7135
opper_context_sample_23    100       13s    7468
opper_context_sample_24    100       27s    7592
opper_context_sample_25    100       40s   91217
opper_context_sample_26    100       37s   91171
opper_context_sample_27      0       38s   91331
opper_context_sample_28    100       26s   91179
opper_context_sample_29      0       47s   91787
opper_context_sample_30    100       23s   90971
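
The summary figures at the top appear to be plain means over the 30 per-sample results, rounded to whole numbers. The sketch below recomputes them under that assumption. The page does not document its aggregation method, and the `RESULTS` layout and the `summarize` helper are illustrative names, not part of any published tooling.

```python
from statistics import mean

# (sample number, score, duration in seconds, tokens), copied from the results above.
RESULTS = [
    (1, 0, 12, 3706), (2, 100, 7, 3334), (3, 100, 7, 3279),
    (4, 50, 9, 3670), (5, 50, 10, 3447), (6, 100, 4, 3228),
    (7, 100, 5, 3331), (8, 100, 10, 3597), (9, 100, 10, 3388),
    (10, 100, 5, 3346), (11, 100, 10, 4007), (12, 100, 10, 3956),
    (13, 100, 6, 3867), (14, 100, 6, 3921), (15, 100, 6, 3831),
    (16, 100, 6, 3810), (17, 100, 6, 3965), (18, 100, 6, 7168),
    (19, 100, 41, 8622), (20, 100, 11, 7339), (21, 100, 18, 7922),
    (22, 100, 21, 7135), (23, 100, 13, 7468), (24, 100, 27, 7592),
    (25, 100, 40, 91217), (26, 100, 37, 91171), (27, 0, 38, 91331),
    (28, 100, 26, 91179), (29, 0, 47, 91787), (30, 100, 23, 90971),
]


def summarize(rows):
    """Return the rounded mean score, duration (seconds), and token count.

    Assumes unweighted means; the real aggregation may differ.
    """
    return {
        "score": round(mean(r[1] for r in rows)),
        "duration_s": round(mean(r[2] for r in rows)),
        "tokens": round(mean(r[3] for r in rows)),
    }


summary = summarize(RESULTS)
print(summary)  # {'score': 87, 'duration_s': 16, 'tokens': 22086}
```

The recomputed values match the reported score of 87, average duration of 16s, and average of 22086 tokens. Average cost is not recomputed because no per-sample cost is listed.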