Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are grounded in the provided context rather than hallucinated.

Model: gcp/gemini-2.5-flash
Score: 93
Average duration: 7s
Average tokens: 22132
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_context_sample_01     100       10s    4610
opper_context_sample_02     100        4s    3274
opper_context_sample_03     100        5s    3275
opper_context_sample_04       0       23s    7077
opper_context_sample_05     100       11s    4325
opper_context_sample_06     100        5s    3409
opper_context_sample_07      50       15s    4732
opper_context_sample_08     100        7s    3529
opper_context_sample_09     100        7s    3578
opper_context_sample_10     100        7s    3649
opper_context_sample_11     100        5s    4169
opper_context_sample_12     100        5s    4176
opper_context_sample_13     100        5s    4166
opper_context_sample_14     100        7s    4336
opper_context_sample_15     100        4s    4133
opper_context_sample_16     100        4s    4010
opper_context_sample_17     100        8s    4261
opper_context_sample_18     100        5s    8007
opper_context_sample_19     100       10s    8636
opper_context_sample_20     100        3s    7753
opper_context_sample_21     100        6s    8311
opper_context_sample_22     100        7s    7751
opper_context_sample_23      50        8s    8187
opper_context_sample_24     100        7s    8169
opper_context_sample_25     100        7s   89451
opper_context_sample_26     100        7s   89448
opper_context_sample_27     100        6s   89316
opper_context_sample_28     100        7s   89424
opper_context_sample_29     100        6s   89522
opper_context_sample_30     100        5s   89264
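
The summary figures can be cross-checked against the per-sample results. The sketch below assumes, since the page does not say so, that each summary value is a plain arithmetic mean over the 30 samples, rounded to the nearest integer. The names SAMPLES and summarize are illustrative only and are not part of any benchmark tooling.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..30,
# copied from the per-sample results on this page.
SAMPLES = [
    (100, 10, 4610), (100, 4, 3274), (100, 5, 3275), (0, 23, 7077),
    (100, 11, 4325), (100, 5, 3409), (50, 15, 4732), (100, 7, 3529),
    (100, 7, 3578), (100, 7, 3649), (100, 5, 4169), (100, 5, 4176),
    (100, 5, 4166), (100, 7, 4336), (100, 4, 4133), (100, 4, 4010),
    (100, 8, 4261), (100, 5, 8007), (100, 10, 8636), (100, 3, 7753),
    (100, 6, 8311), (100, 7, 7751), (50, 8, 8187), (100, 7, 8169),
    (100, 7, 89451), (100, 7, 89448), (100, 6, 89316), (100, 7, 89424),
    (100, 6, 89522), (100, 5, 89264),
]


def summarize(samples):
    """Return rounded arithmetic means of score, duration and tokens.

    Assumption: the summary card uses simple unweighted averages.
    """
    scores, durations, tokens = zip(*samples)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(SAMPLES)
print(summary)
```

Under that assumption the recomputed values agree with the reported 93, 7s and 22132. The token average is dominated by samples 25 to 30, which each use roughly 89,000 tokens, while most other samples use between 3,000 and 9,000.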