Context Reasoning

Context understanding and reasoning tasks test whether a model gives accurate answers grounded in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are supported by the given context rather than hallucinated.

Model: gcp/gemini-2.5-flash-lite
Score: 77
Average duration: 13s
Average tokens: 21737
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_context_sample_01       0       18s    3276
opper_context_sample_02       0       13s    3160
opper_context_sample_03     100       17s    3249
opper_context_sample_04       0       15s    3359
opper_context_sample_05      50       17s    3281
opper_context_sample_06     100       18s    3338
opper_context_sample_07     100       15s    3374
opper_context_sample_08     100       13s    3293
opper_context_sample_09     100       15s    3309
opper_context_sample_10       0       17s    3441
opper_context_sample_11     100       15s    4257
opper_context_sample_12     100       13s    4100
opper_context_sample_13     100       13s    4036
opper_context_sample_14     100       17s    4076
opper_context_sample_15     100       17s    3988
opper_context_sample_16     100        9s    3983
opper_context_sample_17     100        9s    4092
opper_context_sample_18     100        9s    7587
opper_context_sample_19     100        9s    8140
opper_context_sample_20      50        8s    7738
opper_context_sample_21     100        8s    7853
opper_context_sample_22     100        8s    7564
opper_context_sample_23      50        8s    7801
opper_context_sample_24     100        6s    7888
opper_context_sample_25     100       12s   89218
opper_context_sample_26     100       13s   89606
opper_context_sample_27     100       15s   89268
opper_context_sample_28       0       11s   89263
opper_context_sample_29      50       13s   89367
opper_context_sample_30     100       29s   89211
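
The summary figures near the top of this page (77, 13s, 21737) match what you get by taking the arithmetic mean of the per-sample results above and rounding to the nearest integer. The page does not state how the aggregates are computed, so the sketch below is an assumption that happens to be consistent with the listed numbers; the names `RESULTS` and `summarize` are hypothetical and not part of any benchmark tooling.

```python
# Per-sample results copied from the list above, in order from
# opper_context_sample_01 to opper_context_sample_30.
# Each tuple is (score, duration in seconds, tokens).
RESULTS = [
    (0, 18, 3276), (0, 13, 3160), (100, 17, 3249), (0, 15, 3359),
    (50, 17, 3281), (100, 18, 3338), (100, 15, 3374), (100, 13, 3293),
    (100, 15, 3309), (0, 17, 3441), (100, 15, 4257), (100, 13, 4100),
    (100, 13, 4036), (100, 17, 4076), (100, 17, 3988), (100, 9, 3983),
    (100, 9, 4092), (100, 9, 7587), (100, 9, 8140), (50, 8, 7738),
    (100, 8, 7853), (100, 8, 7564), (50, 8, 7801), (100, 6, 7888),
    (100, 12, 89218), (100, 13, 89606), (100, 15, 89268), (0, 11, 89263),
    (50, 13, 89367), (100, 29, 89211),
]


def summarize(results):
    """Return the rounded mean score, duration (s), and token count.

    Assumption: aggregates are plain arithmetic means over all samples.
    """
    n = len(results)
    scores, durations, tokens = zip(*results)
    return {
        "score": round(sum(scores) / n),
        "avg_duration_s": round(sum(durations) / n),
        "avg_tokens": round(sum(tokens) / n),
    }


summary = summarize(RESULTS)
print(summary)
# → {'score': 77, 'avg_duration_s': 13, 'avg_tokens': 21737}
```

Because the mean weights every sample equally, the six samples using roughly 89,000 tokens each dominate the token average even though most samples use under 10,000.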