Context Reasoning

Context understanding and reasoning tasks test whether a model can produce accurate answers grounded in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on how reliably they answer from the given context rather than hallucinating information.
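As a rough illustration of what such an evaluation involves, the sketch below prompts a model with a question plus its reference context and grades the answer against an expected answer. All names here (`ContextSample`, `run_eval`, the toy substring grader) are hypothetical stand-ins for the benchmark's actual harness and grading rubric, which are not shown on this page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ContextSample:
    """One test case: a reference context, a question, and the expected answer."""
    context: str
    question: str
    expected_answer: str


def score_answer(model_answer: str, expected_answer: str) -> int:
    """Toy grader: 100 if the expected answer appears in the model's answer, else 0.
    (The real benchmark also awards partial credit: per-sample scores are 0, 50, or 100.)"""
    return 100 if expected_answer.strip().lower() in model_answer.strip().lower() else 0


def run_eval(samples: List[ContextSample], ask_model: Callable[[str], str]) -> float:
    """Prompt the model with each question plus its context, grade the answers,
    and return the mean per-sample score."""
    scores = []
    for sample in samples:
        prompt = (
            "Answer using only the context below. "
            "If the context does not contain the answer, say so.\n\n"
            f"Context:\n{sample.context}\n\n"
            f"Question: {sample.question}"
        )
        scores.append(score_answer(ask_model(prompt), sample.expected_answer))
    return sum(scores) / len(scores)


# Example with a stubbed model call; swap in a real API client to evaluate an actual model.
demo = [ContextSample(context="Refund requests are accepted within 30 days of purchase.",
                      question="How long is the refund window?",
                      expected_answer="30 days")]
print(run_eval(demo, ask_model=lambda prompt: "The refund window is 30 days."))  # 100.0
```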

| Model | Score | Average duration | Average tokens | Average cost |
|---|---|---|---|---|
| fireworks/deepseek-r1 | 82 | 10s | 19930 | $0.00 |
| Test case | Score | Duration | Tokens |
|---|---|---|---|
| opper_context_sample_01 | 0 | 28s | 5367 |
| opper_context_sample_02 | 50 | 7s | 3241 |
| opper_context_sample_03 | 100 | 5s | 3139 |
| opper_context_sample_04 | 50 | 36s | 6306 |
| opper_context_sample_05 | 50 | 6s | 3202 |
| opper_context_sample_06 | 100 | 6s | 3211 |
| opper_context_sample_07 | 50 | 7s | 3273 |
| opper_context_sample_08 | 100 | 6s | 3192 |
| opper_context_sample_09 | 100 | 4s | 3069 |
| opper_context_sample_10 | 100 | 5s | 3113 |
| opper_context_sample_11 | 100 | 5s | 3477 |
| opper_context_sample_12 | 100 | 5s | 3506 |
| opper_context_sample_13 | 100 | 5s | 3557 |
| opper_context_sample_14 | 100 | 6s | 3573 |
| opper_context_sample_15 | 100 | 4s | 3394 |
| opper_context_sample_16 | 100 | 6s | 3481 |
| opper_context_sample_17 | 100 | 7s | 3561 |
| opper_context_sample_18 | 100 | 6s | 6288 |
| opper_context_sample_19 | 100 | 8s | 6427 |
| opper_context_sample_20 | 100 | 9s | 6521 |
| opper_context_sample_21 | 50 | 10s | 6545 |
| opper_context_sample_22 | 100 | 6s | 6295 |
| opper_context_sample_23 | 100 | 10s | 6603 |
| opper_context_sample_24 | 100 | 15s | 6842 |
| opper_context_sample_25 | 100 | 7s | 81581 |
| opper_context_sample_26 | 100 | 15s | 81721 |
| opper_context_sample_27 | 0 | 15s | 81601 |
| opper_context_sample_28 | 100 | 9s | 81749 |
| opper_context_sample_29 | 0 | 24s | 82601 |
| opper_context_sample_30 | 100 | 9s | 81462 |
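The headline figures are consistent with plain averages over the per-test rows. A quick check, assuming straight unweighted means (which this page does not state explicitly):

```python
# Per-test scores and durations, copied in order from the table above.
scores = [0, 50, 100, 50, 50, 100, 50, 100, 100, 100,
          100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
          50, 100, 100, 100, 100, 100, 0, 100, 0, 100]
durations_s = [28, 7, 5, 36, 6, 6, 7, 6, 4, 5,
               5, 5, 5, 6, 4, 6, 7, 6, 8, 9,
               10, 6, 10, 15, 7, 15, 15, 9, 24, 9]

print(sum(scores) / len(scores))            # 81.67 -> rounds to the reported score of 82
print(sum(durations_s) / len(durations_s))  # 9.7   -> rounds to the reported 10s average
```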