Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on how well their answers stay grounded in the supplied context rather than hallucinating information.
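
To make the task concrete, the sketch below shows one way such an evaluation can be set up: a prompt that pins the model to the supplied context, and a grading rule that compares the response to a reference answer. The function names, the example context, and the 0/100 scoring rule are illustrative assumptions, not the benchmark's actual harness.

```python
# Hypothetical sketch of a context-grounded Q&A check.
# Everything here (names, example data, scoring rule) is an assumption
# used for illustration; it is not the benchmark's real implementation.

def build_prompt(context: str, question: str) -> str:
    """Assemble a prompt that instructs the model to answer only from the context."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def score_answer(model_answer: str, reference: str) -> int:
    """Toy grading rule: 100 if the reference answer appears in the output, else 0."""
    return 100 if reference.lower() in model_answer.lower() else 0

if __name__ == "__main__":
    context = "The refund policy allows returns within 30 days of purchase."
    question = "How many days do customers have to return an item?"
    print(build_prompt(context, question))
    print(score_answer("Customers can return items within 30 days.", "30 days"))  # 100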

| Model | Score | Average duration | Average tokens | Average cost |
|---|---|---|---|---|
| fireworks/deepseek-v3 | 63 | 7s | 19423 | $0.00 |

| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_context_sample_01 | 0 | 6s | 2902 |
| opper_context_sample_02 | 0 | 7s | 2872 |
| opper_context_sample_03 | 100 | 6s | 2865 |
| opper_context_sample_04 | 0 | 6s | 2966 |
| opper_context_sample_05 | 0 | 5s | 2890 |
| opper_context_sample_06 | 100 | 3s | 2861 |
| opper_context_sample_07 | 0 | 6s | 2868 |
| opper_context_sample_08 | 100 | 6s | 2886 |
| opper_context_sample_09 | 100 | 4s | 2859 |
| opper_context_sample_10 | 100 | 5s | 2912 |
| opper_context_sample_11 | 100 | 6s | 3341 |
| opper_context_sample_12 | 100 | 5s | 3311 |
| opper_context_sample_13 | 100 | 7s | 3312 |
| opper_context_sample_14 | 100 | 4s | 3277 |
| opper_context_sample_15 | 100 | 5s | 3222 |
| opper_context_sample_16 | 100 | 7s | 3250 |
| opper_context_sample_17 | 100 | 5s | 3315 |
| opper_context_sample_18 | 100 | 6s | 6074 |
| opper_context_sample_19 | 100 | 8s | 6101 |
| opper_context_sample_20 | 100 | 7s | 6107 |
| opper_context_sample_21 | 0 | 8s | 6195 |
| opper_context_sample_22 | 50 | 5s | 6028 |
| opper_context_sample_23 | 50 | 8s | 6114 |
| opper_context_sample_24 | 100 | 7s | 6108 |
| opper_context_sample_25 | 100 | 15s | 81355 |
| opper_context_sample_26 | 0 | 12s | 81387 |
| opper_context_sample_27 | 0 | 13s | 81361 |
| opper_context_sample_28 | 0 | 14s | 81335 |
| opper_context_sample_29 | 0 | 12s | 81327 |
| opper_context_sample_30 | 100 | 12s | 81282 |
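
The summary row follows directly from the per-sample results: each aggregate is the mean of the 30 values, rounded. A minimal sketch that recomputes them from the table above; the simple-mean aggregation rule is an assumption that happens to match the reported figures.

```python
# Recompute the summary figures from the per-sample rows above.
# Values are copied from the table; the rounding/aggregation rule is assumed.

scores = [0, 0, 100, 0, 0, 100, 0, 100, 100, 100,
          100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
          0, 50, 50, 100, 100, 0, 0, 0, 0, 100]
durations_s = [6, 7, 6, 6, 5, 3, 6, 6, 4, 5,
               6, 5, 7, 4, 5, 7, 5, 6, 8, 7,
               8, 5, 8, 7, 15, 12, 13, 14, 12, 12]
tokens = [2902, 2872, 2865, 2966, 2890, 2861, 2868, 2886, 2859, 2912,
          3341, 3311, 3312, 3277, 3222, 3250, 3315, 6074, 6101, 6107,
          6195, 6028, 6114, 6108, 81355, 81387, 81361, 81335, 81327, 81282]

print(round(sum(scores) / len(scores)))            # 63    (overall score)
print(round(sum(durations_s) / len(durations_s)))  # 7     (average duration, seconds)
print(round(sum(tokens) / len(tokens)))            # 19423 (average tokens)
```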