Context Reasoning

Context reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are grounded in the provided context rather than hallucinated.

Model: anthropic/claude-sonnet-4.5
Score: 98
Average duration: 10s
Average tokens: 22208
Average cost: $0.00
Sample                   Score  Duration  Tokens
opper_context_sample_01    100       14s    3865
opper_context_sample_02    100        8s    3501
opper_context_sample_03    100       10s    3632
opper_context_sample_04     50       15s    4063
opper_context_sample_05    100       11s    3678
opper_context_sample_06    100       10s    3491
opper_context_sample_07    100        9s    3654
opper_context_sample_08    100        9s    3633
opper_context_sample_09    100        9s    3664
opper_context_sample_10    100        5s    3472
opper_context_sample_11    100        9s    4540
opper_context_sample_12    100        8s    4471
opper_context_sample_13    100        7s    4372
opper_context_sample_14    100        9s    4365
opper_context_sample_15    100        5s    4277
opper_context_sample_16    100        6s    4284
opper_context_sample_17    100        8s    4456
opper_context_sample_18    100       10s    8225
opper_context_sample_19    100       19s    8577
opper_context_sample_20    100       14s    8321
opper_context_sample_21    100       14s    8573
opper_context_sample_22    100        8s    8080
opper_context_sample_23    100       16s    8434
opper_context_sample_24    100       12s    8312
opper_context_sample_25    100       13s   90147
opper_context_sample_26    100       12s   90069
opper_context_sample_27    100       10s   90036
opper_context_sample_28    100        9s   89997
opper_context_sample_29    100       11s   90169
opper_context_sample_30    100        7s   89891
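
The summary figures can be cross-checked against the per-sample results. The sketch below assumes the summary values are simple unweighted means over the 30 samples, rounded to the nearest integer; the page does not state how they are computed, and the names `RESULTS` and `summarize` are illustrative only.

```python
from statistics import mean

# (sample number, score, duration in seconds, tokens),
# copied from the per-sample results for opper_context_sample_01..30.
RESULTS = [
    (1, 100, 14, 3865), (2, 100, 8, 3501), (3, 100, 10, 3632),
    (4, 50, 15, 4063), (5, 100, 11, 3678), (6, 100, 10, 3491),
    (7, 100, 9, 3654), (8, 100, 9, 3633), (9, 100, 9, 3664),
    (10, 100, 5, 3472), (11, 100, 9, 4540), (12, 100, 8, 4471),
    (13, 100, 7, 4372), (14, 100, 9, 4365), (15, 100, 5, 4277),
    (16, 100, 6, 4284), (17, 100, 8, 4456), (18, 100, 10, 8225),
    (19, 100, 19, 8577), (20, 100, 14, 8321), (21, 100, 14, 8573),
    (22, 100, 8, 8080), (23, 100, 16, 8434), (24, 100, 12, 8312),
    (25, 100, 13, 90147), (26, 100, 12, 90069), (27, 100, 10, 90036),
    (28, 100, 9, 89997), (29, 100, 11, 90169), (30, 100, 7, 89891),
]


def summarize(results):
    """Return the rounded mean score, duration (seconds), and token count."""
    return {
        "score": round(mean(r[1] for r in results)),
        "duration_s": round(mean(r[2] for r in results)),
        "tokens": round(mean(r[3] for r in results)),
    }


summary = summarize(RESULTS)
print(summary)

# Samples that did not reach a full score of 100.
below_full = [f"opper_context_sample_{n:02d}" for n, score, _, _ in RESULTS if score < 100]
print(below_full)
```

Under that assumption the computed means agree with the reported score of 98, average duration of 10s, and average of 22208 tokens. The only sample below a full score is opper_context_sample_04, and the token average is dominated by samples 25 to 30, which each use roughly 90,000 tokens.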