Context Reasoning

Context understanding and reasoning tasks test whether a model gives accurate answers that are grounded in the provided context rather than hallucinated. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications.

Model: mistral/pixtral-large-latest-eu
Score: 72
Average duration: 20s
Average tokens: 23530
Average cost: $0.00

Sample                    Score  Duration  Tokens
opper_context_sample_01       0  12s         3493
opper_context_sample_02     100  12s         3467
opper_context_sample_03     100  35s         3440
opper_context_sample_04       0  7s          3620
opper_context_sample_05      50  5s          3483
opper_context_sample_06     100  5s          3443
opper_context_sample_07      50  7s          3442
opper_context_sample_08     100  5s          3455
opper_context_sample_09     100  3s          3429
opper_context_sample_10      50  6s          3451
opper_context_sample_11     100  7s          4510
opper_context_sample_12     100  12s         4429
opper_context_sample_13     100  5s          4429
opper_context_sample_14      50  5s          4430
opper_context_sample_15     100  4s          4387
opper_context_sample_16      50  4s          4393
opper_context_sample_17     100  6s          4489
opper_context_sample_18     100  10s         8400
opper_context_sample_19     100  18s         8561
opper_context_sample_20      50  5s          8486
opper_context_sample_21     100  39s         8758
opper_context_sample_22     100  5s          8430
opper_context_sample_23      50  5s          8469
opper_context_sample_24     100  18s         8524
opper_context_sample_25     100  1m 26s     96743
opper_context_sample_26       0  25s        96786
opper_context_sample_27       0  1m 2s      96779
opper_context_sample_28     100  1m 47s     96739
opper_context_sample_29       0  22s        96714
opper_context_sample_30     100  57s        96719
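
The summary figures can be cross-checked against the per-sample results above. The following is a minimal sketch, assuming the headline score and the averages are unweighted means over the 30 samples, rounded to the nearest integer; the page does not state how they are aggregated. The names `ROWS`, `parse_duration`, and `summarize` are illustrative and not part of any benchmark tooling.

```python
import re
from statistics import mean

# (score, duration, tokens) for opper_context_sample_01 .. 30, copied from the results above.
ROWS = [
    (0, "12s", 3493), (100, "12s", 3467), (100, "35s", 3440),
    (0, "7s", 3620), (50, "5s", 3483), (100, "5s", 3443),
    (50, "7s", 3442), (100, "5s", 3455), (100, "3s", 3429),
    (50, "6s", 3451), (100, "7s", 4510), (100, "12s", 4429),
    (100, "5s", 4429), (50, "5s", 4430), (100, "4s", 4387),
    (50, "4s", 4393), (100, "6s", 4489), (100, "10s", 8400),
    (100, "18s", 8561), (50, "5s", 8486), (100, "39s", 8758),
    (100, "5s", 8430), (50, "5s", 8469), (100, "18s", 8524),
    (100, "1m 26s", 96743), (0, "25s", 96786), (0, "1m 2s", 96779),
    (100, "1m 47s", 96739), (0, "22s", 96714), (100, "57s", 96719),
]


def parse_duration(text: str) -> int:
    """Convert a duration such as '35s' or '1m 26s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))


def summarize(rows):
    """Return rounded unweighted means (assumed aggregation, not confirmed by the page)."""
    return {
        "score": round(mean(score for score, _, _ in rows)),
        "duration_s": round(mean(parse_duration(d) for _, d, _ in rows)),
        "tokens": round(mean(tokens for _, _, tokens in rows)),
    }


if __name__ == "__main__":
    print(summarize(ROWS))  # {'score': 72, 'duration_s': 20, 'tokens': 23530}
```

Under that assumption the recomputed values match the reported score of 72, average duration of 20s, and average of 23530 tokens. The six samples at roughly 96,700 tokens each account for most of the token average and include four of the five zero scores.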