Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal Q&A applications. Models are scored on whether their answers are correct and grounded in the supplied context rather than hallucinated.

Model: gcp/gemini-2.0-flash
Average score: 77
Average duration: 12s
Average tokens: 21862
Average cost: $0.00

Sample                   Score  Duration  Tokens
opper_context_sample_01  100    22s       3255
opper_context_sample_02  100    6s        3159
opper_context_sample_03  100    21s       3183
opper_context_sample_04  0      21s       3518
opper_context_sample_05  0      22s       3211
opper_context_sample_06  100    6s        3126
opper_context_sample_07  50     22s       3149
opper_context_sample_08  0      22s       3159
opper_context_sample_09  100    21s       3182
opper_context_sample_10  50     21s       3182
opper_context_sample_11  100    6s        4067
opper_context_sample_12  100    6s        3958
opper_context_sample_13  100    21s       3938
opper_context_sample_14  100    6s        4000
opper_context_sample_15  100    21s       3941
opper_context_sample_16  100    6s        3950
opper_context_sample_17  100    6s        4042
opper_context_sample_18  100    6s        7463
opper_context_sample_19  100    6s        7481
opper_context_sample_20  50     6s        7499
opper_context_sample_21  100    6s        7887
opper_context_sample_22  100    6s        7501
opper_context_sample_23  50     6s        7525
opper_context_sample_24  100    6s        7763
opper_context_sample_25  100    6s        90456
opper_context_sample_26  100    10s       90496
opper_context_sample_27  100    9s        90440
opper_context_sample_28  0      11s       90505
opper_context_sample_29  0      9s        90442
opper_context_sample_30  100    9s        90393
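The summary figures are consistent with plain averages over the 30 samples. The minimal Python sketch below recomputes them from the per-sample rows. It assumes each summary value is the unweighted arithmetic mean of its column, rounded to the nearest integer; the page does not state how it aggregates. `RESULTS` and `summarize` are illustrative names, not part of any Opper API. Average cost is left out because no per-sample costs are listed.

```python
from statistics import mean

# Per-sample results as listed on this page: name -> (score, duration in s, tokens).
# Durations are the rounded whole-second values shown, not raw timings.
RESULTS = {
    "opper_context_sample_01": (100, 22, 3255),
    "opper_context_sample_02": (100, 6, 3159),
    "opper_context_sample_03": (100, 21, 3183),
    "opper_context_sample_04": (0, 21, 3518),
    "opper_context_sample_05": (0, 22, 3211),
    "opper_context_sample_06": (100, 6, 3126),
    "opper_context_sample_07": (50, 22, 3149),
    "opper_context_sample_08": (0, 22, 3159),
    "opper_context_sample_09": (100, 21, 3182),
    "opper_context_sample_10": (50, 21, 3182),
    "opper_context_sample_11": (100, 6, 4067),
    "opper_context_sample_12": (100, 6, 3958),
    "opper_context_sample_13": (100, 21, 3938),
    "opper_context_sample_14": (100, 6, 4000),
    "opper_context_sample_15": (100, 21, 3941),
    "opper_context_sample_16": (100, 6, 3950),
    "opper_context_sample_17": (100, 6, 4042),
    "opper_context_sample_18": (100, 6, 7463),
    "opper_context_sample_19": (100, 6, 7481),
    "opper_context_sample_20": (50, 6, 7499),
    "opper_context_sample_21": (100, 6, 7887),
    "opper_context_sample_22": (100, 6, 7501),
    "opper_context_sample_23": (50, 6, 7525),
    "opper_context_sample_24": (100, 6, 7763),
    "opper_context_sample_25": (100, 6, 90456),
    "opper_context_sample_26": (100, 10, 90496),
    "opper_context_sample_27": (100, 9, 90440),
    "opper_context_sample_28": (0, 11, 90505),
    "opper_context_sample_29": (0, 9, 90442),
    "opper_context_sample_30": (100, 9, 90393),
}


def summarize(results):
    """Unweighted mean of each column, rounded to the nearest integer.

    Assumption: the page's summary row is computed this way; it is not stated.
    """
    scores, durations, tokens = zip(*results.values())
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


print(summarize(RESULTS))  # → {'score': 77, 'duration_s': 12, 'tokens': 21862}
```

Because the per-sample durations are already rounded to whole seconds, the recomputed duration only approximates the true mean; here it still rounds to the reported 12s.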