Context Reasoning

Context understanding and reasoning tasks test whether a model gives accurate answers that are grounded in the provided context rather than hallucinating information. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications.

Model: anthropic/claude-3.5-sonnet
Score: 90
Average duration: 11s
Average tokens: 21961
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_context_sample_01   100    10s       3438
opper_context_sample_02   100    8s        3327
opper_context_sample_03   100    10s       3361
opper_context_sample_04   50     12s       3623
opper_context_sample_05   100    9s        3420
opper_context_sample_06   100    8s        3371
opper_context_sample_07   100    11s       3374
opper_context_sample_08   100    11s       3414
opper_context_sample_09   100    9s        3398
opper_context_sample_10   100    8s        3364
opper_context_sample_11   100    10s       4276
opper_context_sample_12   100    10s       4261
opper_context_sample_13   100    8s        4224
opper_context_sample_14   100    9s        4232
opper_context_sample_15   100    10s       4168
opper_context_sample_16   100    8s        4125
opper_context_sample_17   100    9s        4252
opper_context_sample_18   100    11s       8006
opper_context_sample_19   100    11s       8054
opper_context_sample_20   50     9s        7943
opper_context_sample_21   100    12s       8110
opper_context_sample_22   100    9s        7890
opper_context_sample_23   100    11s       8045
opper_context_sample_24   100    10s       8056
opper_context_sample_25   100    18s       89855
opper_context_sample_26   100    18s       89848
opper_context_sample_27   100    17s       89824
opper_context_sample_28   100    16s       89843
opper_context_sample_29   0      29s       89973
opper_context_sample_30   0      14s       89757
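
The summary figures can be checked against the per-sample results. The sketch below is a minimal illustration, not the benchmark's own tooling: it copies the 30 results into a list and recomputes the averages. The names `ROWS`, `summarize`, and `below_full_marks` are hypothetical, and treating the overall score as a plain mean of the per-sample scores is an assumption (one that happens to reproduce the reported 90).

```python
from statistics import mean

# (sample, score, duration in seconds, tokens), copied from the results above.
ROWS = [
    ("opper_context_sample_01", 100, 10, 3438),
    ("opper_context_sample_02", 100, 8, 3327),
    ("opper_context_sample_03", 100, 10, 3361),
    ("opper_context_sample_04", 50, 12, 3623),
    ("opper_context_sample_05", 100, 9, 3420),
    ("opper_context_sample_06", 100, 8, 3371),
    ("opper_context_sample_07", 100, 11, 3374),
    ("opper_context_sample_08", 100, 11, 3414),
    ("opper_context_sample_09", 100, 9, 3398),
    ("opper_context_sample_10", 100, 8, 3364),
    ("opper_context_sample_11", 100, 10, 4276),
    ("opper_context_sample_12", 100, 10, 4261),
    ("opper_context_sample_13", 100, 8, 4224),
    ("opper_context_sample_14", 100, 9, 4232),
    ("opper_context_sample_15", 100, 10, 4168),
    ("opper_context_sample_16", 100, 8, 4125),
    ("opper_context_sample_17", 100, 9, 4252),
    ("opper_context_sample_18", 100, 11, 8006),
    ("opper_context_sample_19", 100, 11, 8054),
    ("opper_context_sample_20", 50, 9, 7943),
    ("opper_context_sample_21", 100, 12, 8110),
    ("opper_context_sample_22", 100, 9, 7890),
    ("opper_context_sample_23", 100, 11, 8045),
    ("opper_context_sample_24", 100, 10, 8056),
    ("opper_context_sample_25", 100, 18, 89855),
    ("opper_context_sample_26", 100, 18, 89848),
    ("opper_context_sample_27", 100, 17, 89824),
    ("opper_context_sample_28", 100, 16, 89843),
    ("opper_context_sample_29", 0, 29, 89973),
    ("opper_context_sample_30", 0, 14, 89757),
]


def summarize(rows):
    """Return the mean score, duration (seconds), and token count."""
    return {
        "score": mean(score for _, score, _, _ in rows),
        "duration_s": mean(duration for _, _, duration, _ in rows),
        "tokens": mean(tokens for _, _, _, tokens in rows),
    }


def below_full_marks(rows):
    """Return (sample, score) pairs for every sample scoring under 100."""
    return [(name, score) for name, score, _, _ in rows if score < 100]


summary = summarize(ROWS)
print(summary)
print(below_full_marks(ROWS))
```

Under these assumptions the mean score is 90 and the mean token count rounds to 21961, both matching the summary. The recomputed mean duration is 11.5s rather than the reported 11s, presumably because the per-sample durations are shown rounded to whole seconds. The samples below full marks are 04 and 20 (50 each) and 29 and 30 (0 each).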