Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are properly grounded in the supplied context rather than hallucinated. A sketch of the task format follows below.
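To make the task format concrete, here is a minimal sketch of how a grounded context-QA check could be structured. This is an illustration only: the prompt wording, sample fields, and substring-based grading are assumptions, not the benchmark's actual harness.

```python
# Minimal sketch of a grounded context-QA check (illustrative only; the
# benchmark's real prompts, samples, and grading rubric are not shown here).
from dataclasses import dataclass


@dataclass
class ContextSample:
    context: str    # source text the answer must be grounded in
    question: str
    reference: str  # expected answer derived from the context


def build_prompt(sample: ContextSample) -> str:
    # Instruct the model to answer only from the supplied context,
    # and to admit when the context does not contain the answer.
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{sample.context}\n\n"
        f"Question: {sample.question}"
    )


def score_answer(answer: str, sample: ContextSample) -> int:
    # Toy grading: full credit if the reference answer appears verbatim.
    # A real evaluation would typically use an LLM judge or a rubric with
    # partial credit, which is one way per-sample scores of 0, 50, and 100
    # could arise.
    return 100 if sample.reference.lower() in answer.lower() else 0
```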

| Model | Score | Average duration | Average tokens | Average cost |
|---|---|---|---|---|
| mistral/mistral-large-eu | 78 | 18s | 22013 | $0.00 |
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_context_sample_01 | 0 | 21s | 3868 |
| opper_context_sample_02 | 0 | 7s | 3270 |
| opper_context_sample_03 | 100 | 9s | 3321 |
| opper_context_sample_04 | 50 | 16s | 3693 |
| opper_context_sample_05 | 0 | 16s | 3349 |
| opper_context_sample_06 | 100 | 43s | 3288 |
| opper_context_sample_07 | 100 | 32s | 3432 |
| opper_context_sample_08 | 100 | 16s | 3434 |
| opper_context_sample_09 | 100 | 14s | 3327 |
| opper_context_sample_10 | 100 | 8s | 3280 |
| opper_context_sample_11 | 100 | 10s | 3973 |
| opper_context_sample_12 | 100 | 16s | 4035 |
| opper_context_sample_13 | 100 | 14s | 3877 |
| opper_context_sample_14 | 100 | 8s | 3906 |
| opper_context_sample_15 | 100 | 9s | 3743 |
| opper_context_sample_16 | 100 | 6s | 3789 |
| opper_context_sample_17 | 100 | 13s | 3958 |
| opper_context_sample_18 | 100 | 9s | 7086 |
| opper_context_sample_19 | 100 | 24s | 7836 |
| opper_context_sample_20 | 50 | 13s | 7152 |
| opper_context_sample_21 | 100 | 36s | 8346 |
| opper_context_sample_22 | 100 | 11s | 7067 |
| opper_context_sample_23 | 50 | 18s | 7250 |
| opper_context_sample_24 | 100 | 23s | 7368 |
| opper_context_sample_25 | 100 | 25s | 91112 |
| opper_context_sample_26 | 100 | 23s | 91085 |
| opper_context_sample_27 | 0 | 42s | 91317 |
| opper_context_sample_28 | 100 | 24s | 91129 |
| opper_context_sample_29 | 0 | 27s | 91104 |
| opper_context_sample_30 | 100 | 20s | 90997 |
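For reference, the summary figures are consistent with the per-sample rows: 22 samples scored 100, 3 scored 50, and 5 scored 0, giving (22 × 100 + 3 × 50 + 5 × 0) / 30 = 2350 / 30 ≈ 78.3, matching the reported overall score of 78, and the per-sample durations sum to roughly 553s, or about 18s on average.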