Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on how well their answers stay grounded in the provided context rather than hallucinating information.
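
The results below grade each sample 0, 50, or 100, and those per-sample values average to the headline score, which implies an automated grader that checks whether an answer is supported by the supplied context. As a minimal sketch only, assuming a hypothetical sample format with a `required_facts` field and a simple substring check standing in for the benchmark's actual grader, a harness of this shape could look like:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sample shape: a context passage, a question, and the facts an
# answer must contain to count as grounded. This mirrors the 0-100 per-sample
# scale reported below, not the benchmark's real data format.
@dataclass
class ContextSample:
    context: str
    question: str
    required_facts: list[str]  # substrings the answer should include

def build_prompt(sample: ContextSample) -> str:
    """Ask the model to answer strictly from the supplied context."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{sample.context}\n\nQuestion: {sample.question}"
    )

def score_answer(answer: str, sample: ContextSample) -> float:
    """Return 0-100: the fraction of required facts present in the answer."""
    if not sample.required_facts:
        return 100.0
    hits = sum(fact.lower() in answer.lower() for fact in sample.required_facts)
    return 100.0 * hits / len(sample.required_facts)

def run_eval(samples: list[ContextSample], generate: Callable[[str], str]) -> float:
    """Average per-sample scores; `generate` is any prompt -> answer callable."""
    scores = [score_answer(generate(build_prompt(s)), s) for s in samples]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    demo = [ContextSample(
        context="Refunds are available within 30 days of purchase.",
        question="How long do customers have to request a refund?",
        required_facts=["30 days"],
    )]
    # A stub generator stands in for a real model call.
    print(run_eval(demo, lambda prompt: "Customers have 30 days to request a refund."))
```

In practice the grading step is often another model judging groundedness rather than a substring match, but the overall structure (build prompt, generate, score, average) stays the same.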

Model: anthropic/claude-3.5-haiku
Score: 78
Average duration: 9s
Average tokens: 21931
Average cost: $0.00

Per-sample results:

| Sample                  | Score | Duration | Tokens |
|-------------------------|-------|----------|--------|
| opper_context_sample_01 |     0 |       6s |   3447 |
| opper_context_sample_02 |   100 |       4s |   3304 |
| opper_context_sample_03 |   100 |       4s |   3320 |
| opper_context_sample_04 |     0 |       7s |   3505 |
| opper_context_sample_05 |     0 |       6s |   3355 |
| opper_context_sample_06 |   100 |       4s |   3320 |
| opper_context_sample_07 |    50 |       5s |   3328 |
| opper_context_sample_08 |   100 |       4s |   3287 |
| opper_context_sample_09 |   100 |       5s |   3366 |
| opper_context_sample_10 |   100 |       4s |   3313 |
| opper_context_sample_11 |   100 |       7s |   4262 |
| opper_context_sample_12 |   100 |       7s |   4227 |
| opper_context_sample_13 |   100 |       5s |   4221 |
| opper_context_sample_14 |   100 |       6s |   4224 |
| opper_context_sample_15 |   100 |       5s |   4161 |
| opper_context_sample_16 |   100 |       4s |   4163 |
| opper_context_sample_17 |   100 |       5s |   4212 |
| opper_context_sample_18 |   100 |       6s |   7917 |
| opper_context_sample_19 |   100 |       9s |   8059 |
| opper_context_sample_20 |    50 |       6s |   7929 |
| opper_context_sample_21 |   100 |       9s |   8114 |
| opper_context_sample_22 |   100 |       7s |   7892 |
| opper_context_sample_23 |     0 |       8s |   8012 |
| opper_context_sample_24 |   100 |       7s |   8014 |
| opper_context_sample_25 |   100 |      23s |  89878 |
| opper_context_sample_26 |   100 |      22s |  89838 |
| opper_context_sample_27 |   100 |      18s |  89794 |
| opper_context_sample_28 |    50 |      21s |  89866 |
| opper_context_sample_29 |     0 |      21s |  89828 |
| opper_context_sample_30 |   100 |      19s |  89766 |