Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on how well their answers stay grounded in the given context rather than hallucinating information.

Model: mistral/mistral-tiny-eu
Overall score: 12
Average duration: 14s
Average tokens: 4447
Average cost: $0.00

| Test case | Score | Duration | Tokens |
|---|---:|---:|---:|
| opper_context_sample_01 | 0 | 24s | 3407 |
| opper_context_sample_02 | 0 | 20s | 3730 |
| opper_context_sample_03 | 0 | 2s | 3403 |
| opper_context_sample_04 | 0 | 35s | 4892 |
| opper_context_sample_05 | 0 | 11s | 3415 |
| opper_context_sample_06 | 0 | 42s | 3432 |
| opper_context_sample_07 | 0 | 19s | 3399 |
| opper_context_sample_08 | 0 | 22s | 3434 |
| opper_context_sample_09 | 100 | 34s | 3419 |
| opper_context_sample_10 | 100 | 16s | 5188 |
| opper_context_sample_11 | 0 | 4s | 4940 |
| opper_context_sample_12 | 0 | 8s | 6002 |
| opper_context_sample_13 | 0 | 23s | 4923 |
| opper_context_sample_14 | 50 | 12s | 5666 |
| opper_context_sample_15 | 0 | 11s | 5479 |
| opper_context_sample_16 | 0 | N/A | 0 |
| opper_context_sample_17 | 0 | 7s | 5056 |
| opper_context_sample_18 | 0 | 32s | 8454 |
| opper_context_sample_19 | 0 | 5s | 8453 |
| opper_context_sample_20 | 0 | 34s | 10303 |
| opper_context_sample_21 | 0 | 11s | 10424 |
| opper_context_sample_22 | 100 | 4s | 8366 |
| opper_context_sample_23 | 0 | 16s | 8535 |
| opper_context_sample_24 | 0 | 10s | 9085 |
| opper_context_sample_25 | 0 | N/A | 0 |
| opper_context_sample_26 | 0 | N/A | 0 |
| opper_context_sample_27 | 0 | N/A | 0 |
| opper_context_sample_28 | 0 | N/A | 0 |
| opper_context_sample_29 | 0 | N/A | 0 |
| opper_context_sample_30 | 0 | N/A | 0 |
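As a sanity check, the summary figures are consistent with a plain mean over all 30 test cases, with failed runs (N/A duration) counted as 0 tokens. A minimal sketch of that arithmetic, using the per-sample values listed above (the aggregation method is an assumption inferred from the numbers, not documented by the benchmark):

```python
# Per-sample scores and token counts, in sample order (01-30), copied from
# the results table. Samples with N/A duration are recorded as 0 tokens.
scores = [0, 0, 0, 0, 0, 0, 0, 0, 100, 100,
          0, 0, 0, 50, 0, 0, 0, 0, 0, 0,
          0, 100, 0, 0, 0, 0, 0, 0, 0, 0]
tokens = [3407, 3730, 3403, 4892, 3415, 3432, 3399, 3434, 3419, 5188,
          4940, 6002, 4923, 5666, 5479, 0, 5056, 8454, 8453, 10303,
          10424, 8366, 8535, 9085, 0, 0, 0, 0, 0, 0]

# Averaging over all 30 samples reproduces the summary row:
avg_score = round(sum(scores) / len(scores))    # 350 / 30  -> 12
avg_tokens = round(sum(tokens) / len(tokens))   # 133405 / 30 -> 4447
print(avg_score, avg_tokens)
```

Note that the reported average duration (14s) only matches a mean over all 30 samples if the rounding differs slightly (the raw mean of the listed durations, counting N/A as 0s, is about 13.4s), so the exact duration aggregation is uncertain.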