Context Reasoning

Context understanding and reasoning tasks test whether a model can produce accurate answers grounded in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on how reliably their answers stay grounded in the given context rather than hallucinating information.
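
The grading rubric behind these results is not published alongside the table, so the sketch below is only an illustration of how a grounded-answer check might produce the 0/50/100 per-sample scores shown further down: full credit when the answer matches the reference and stays within the context, partial credit when it is correct but adds unsupported material, and zero otherwise. The function names and the word-overlap heuristic are assumptions, not the actual harness.

```python
# Minimal, illustrative sketch of a grounded context Q&A check.
# Not the benchmark's actual grader; names and the overlap heuristic are assumptions.

def tokenize(text: str) -> set[str]:
    """Return the lowercase word set of `text`, stripped of surrounding punctuation."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return {w for w in words if w}

def grounded_score(answer: str, context: str, reference: str) -> int:
    """Score an answer on a 0/50/100 scale.

    100 - the answer covers the reference and every word appears in the context
     50 - the answer covers the reference but adds words absent from the context
      0 - the answer misses the reference (wrong or unanswered)
    """
    ans, ctx, ref = tokenize(answer), tokenize(context), tokenize(reference)
    if not ref or not ref.issubset(ans):
        return 0
    return 100 if ans.issubset(ctx | ref) else 50

if __name__ == "__main__":
    context = "Refunds are issued within 14 days of purchase if the item is unused."
    reference = "within 14 days"
    print(grounded_score("Refunds are issued within 14 days.", context, reference))  # 100
    print(grounded_score("Refunds usually take 30 days.", context, reference))       # 0
```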

Model: mistral/magistral-medium-2506-eu

| Metric | Value |
| --- | --- |
| Average score | 48 |
| Average duration | 34s |
| Average tokens | 3666 |
| Average cost | $0.00 |

| Sample | Score | Duration | Tokens |
| --- | --- | --- | --- |
| opper_context_sample_01 | 0 | 43s | 3299 |
| opper_context_sample_02 | 0 | 13s | 3222 |
| opper_context_sample_03 | 100 | 5m 41s | 3151 |
| opper_context_sample_04 | 0 | 1m | 3745 |
| opper_context_sample_05 | 50 | 38s | 3257 |
| opper_context_sample_06 | 100 | 44s | 3267 |
| opper_context_sample_07 | 50 | 6s | 3172 |
| opper_context_sample_08 | 0 | 1m 40s | 3163 |
| opper_context_sample_09 | 100 | 36s | 3163 |
| opper_context_sample_10 | 100 | 10s | 3217 |
| opper_context_sample_11 | 100 | 42s | 3718 |
| opper_context_sample_12 | 100 | 16s | 3907 |
| opper_context_sample_13 | 50 | 8s | 3795 |
| opper_context_sample_14 | 100 | 14s | 3757 |
| opper_context_sample_15 | 100 | 12s | 3671 |
| opper_context_sample_16 | 100 | 5s | 3693 |
| opper_context_sample_17 | 0 | 38s | 3807 |
| opper_context_sample_18 | 100 | 47s | 7013 |
| opper_context_sample_19 | 100 | 11s | 7038 |
| opper_context_sample_20 | 50 | 15s | 7159 |
| opper_context_sample_21 | 100 | 17s | 7422 |
| opper_context_sample_22 | 0 | 14s | 6937 |
| opper_context_sample_23 | 0 | 37s | 8223 |
| opper_context_sample_24 | 50 | 40s | 7194 |
| opper_context_sample_25 | 0 | N/A | 0 |
| opper_context_sample_26 | 0 | N/A | 0 |
| opper_context_sample_27 | 0 | N/A | 0 |
| opper_context_sample_28 | 0 | N/A | 0 |
| opper_context_sample_29 | 0 | N/A | 0 |
| opper_context_sample_30 | 0 | N/A | 0 |
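
The summary figures appear to be plain unweighted means over all 30 samples, including samples 25–30, which show no duration or token usage and presumably did not complete. A quick check against the per-sample values above, under that assumption:

```python
# Recompute the summary row from the per-sample table, assuming unweighted means
# over all 30 samples (the six runs with no output counted as score 0, tokens 0).
scores = [0, 0, 100, 0, 50, 100, 50, 0, 100, 100,
          100, 100, 50, 100, 100, 100, 0, 100, 100, 50,
          100, 0, 0, 50, 0, 0, 0, 0, 0, 0]
tokens = [3299, 3222, 3151, 3745, 3257, 3267, 3172, 3163, 3163, 3217,
          3718, 3907, 3795, 3757, 3671, 3693, 3807, 7013, 7038, 7159,
          7422, 6937, 8223, 7194, 0, 0, 0, 0, 0, 0]

print(sum(scores) / len(scores))  # 48.33..., reported as 48
print(sum(tokens) / len(tokens))  # 3666.33..., reported as 3666
```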