Context Reasoning

Context understanding and reasoning tasks test whether a model can answer accurately using the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are properly grounded in the supplied context rather than hallucinated.

Model: anthropic/claude-sonnet-4
Score: 97
Average duration: 10s
Average tokens: 22091
Average cost: $0.00

Sample                   Score  Duration  Tokens
opper_context_sample_01    100       10s    3708
opper_context_sample_02    100        7s    3384
opper_context_sample_03    100        8s    3485
opper_context_sample_04     50       11s    3698
opper_context_sample_05    100       11s    3671
opper_context_sample_06    100        7s    3382
opper_context_sample_07    100        8s    3542
opper_context_sample_08    100        9s    3495
opper_context_sample_09    100        8s    3510
opper_context_sample_10    100        8s    3462
opper_context_sample_11    100       13s    4389
opper_context_sample_12    100        7s    4275
opper_context_sample_13    100        6s    4223
opper_context_sample_14    100        7s    4250
opper_context_sample_15    100        6s    4181
opper_context_sample_16    100        5s    4221
opper_context_sample_17    100        7s    4308
opper_context_sample_18    100       11s    8201
opper_context_sample_19    100       15s    8404
opper_context_sample_20    100       10s    8071
opper_context_sample_21    100       12s    8405
opper_context_sample_22    100        8s    8018
opper_context_sample_23    100       11s    8191
opper_context_sample_24    100       12s    8313
opper_context_sample_25    100       16s   90232
opper_context_sample_26    100       14s   90029
opper_context_sample_27    100        8s   89821
opper_context_sample_28    100       19s   89955
opper_context_sample_29     50       26s   90029
opper_context_sample_30    100       10s   89881
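The summary figures for anthropic/claude-sonnet-4 (score 97, duration 10s, tokens 22091) are consistent with a plain unweighted mean of the 30 per-sample rows, rounded to the nearest integer. The page does not state how its averages are computed, so the sketch below is an assumption used only as a consistency check. Per-sample durations are already rounded to whole seconds, so this recovers the displayed averages rather than exact ones, and average cost cannot be checked because no per-sample cost is listed.

```python
from statistics import mean

# Per-sample results transcribed from the table: (score, duration in seconds, tokens).
# Assumption: the summary row is an unweighted arithmetic mean of these rows.
SAMPLES = [
    (100, 10, 3708), (100, 7, 3384), (100, 8, 3485), (50, 11, 3698), (100, 11, 3671),      # 01-05
    (100, 7, 3382), (100, 8, 3542), (100, 9, 3495), (100, 8, 3510), (100, 8, 3462),        # 06-10
    (100, 13, 4389), (100, 7, 4275), (100, 6, 4223), (100, 7, 4250), (100, 6, 4181),       # 11-15
    (100, 5, 4221), (100, 7, 4308), (100, 11, 8201), (100, 15, 8404), (100, 10, 8071),     # 16-20
    (100, 12, 8405), (100, 8, 8018), (100, 11, 8191), (100, 12, 8313), (100, 16, 90232),   # 21-25
    (100, 14, 90029), (100, 8, 89821), (100, 19, 89955), (50, 26, 90029), (100, 10, 89881),  # 26-30
]


def summarize(samples):
    """Return (average score, average duration in s, average tokens), each rounded to an int."""
    scores, durations, tokens = zip(*samples)
    return round(mean(scores)), round(mean(durations)), round(mean(tokens))


avg_score, avg_duration, avg_tokens = summarize(SAMPLES)
print(f"score={avg_score} duration={avg_duration}s tokens={avg_tokens}")
# prints: score=97 duration=10s tokens=22091
```

The unrounded means are 96.67, 10.33 s, and 22091.13 tokens. The mean token count is dominated by the six longest-context samples (25 to 30, roughly 90k tokens each), so it says little about a typical request; the two 50-point results (samples 04 and 29) account for the whole gap between 97 and a perfect score.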