Context Reasoning

Context understanding and reasoning tasks test whether a model gives accurate answers grounded in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on how well their answers stay grounded in the given context rather than hallucinating information.

Model: gcp/gemini-2.0-flash-lite
Score: 67
Average duration: 17s
Average tokens: 21837
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_context_sample_01       0       24s    3210
opper_context_sample_02      50       23s    3139
opper_context_sample_03     100       14s    3146
opper_context_sample_04       0       10s    3236
opper_context_sample_05      50       24s    3155
opper_context_sample_06     100       23s    3148
opper_context_sample_07      50       15s    3138
opper_context_sample_08       0       23s    3188
opper_context_sample_09     100       14s    3162
opper_context_sample_10     100       24s    3182
opper_context_sample_11     100       15s    4012
opper_context_sample_12     100       24s    3949
opper_context_sample_13     100       23s    3964
opper_context_sample_14     100       24s    3961
opper_context_sample_15     100       23s    3943
opper_context_sample_16     100       14s    3938
opper_context_sample_17      50       12s    4002
opper_context_sample_18     100       12s    7463
opper_context_sample_19     100       12s    7525
opper_context_sample_20     100       12s    7514
opper_context_sample_21       0       10s    7647
opper_context_sample_22     100       12s    7459
opper_context_sample_23      50       12s    7546
opper_context_sample_24      50       12s    7639
opper_context_sample_25     100       12s   90539
opper_context_sample_26       0       11s   90491
opper_context_sample_27     100       18s   90437
opper_context_sample_28     100       15s   90498
opper_context_sample_29       0       11s   90467
opper_context_sample_30       0       18s   90405
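
The headline figures (67, 17s, 21837) appear to be unweighted means of the per-sample results above, rounded to the nearest integer. The page does not state how they are aggregated, so the sketch below treats that as an assumption and simply recomputes them. The names `ROWS` and `summarize` are illustrative and not part of any benchmark tooling.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..30,
# copied from the per-sample results above.
ROWS = [
    (0, 24, 3210), (50, 23, 3139), (100, 14, 3146), (0, 10, 3236), (50, 24, 3155),
    (100, 23, 3148), (50, 15, 3138), (0, 23, 3188), (100, 14, 3162), (100, 24, 3182),
    (100, 15, 4012), (100, 24, 3949), (100, 23, 3964), (100, 24, 3961), (100, 23, 3943),
    (100, 14, 3938), (50, 12, 4002), (100, 12, 7463), (100, 12, 7525), (100, 12, 7514),
    (0, 10, 7647), (100, 12, 7459), (50, 12, 7546), (50, 12, 7639), (100, 12, 90539),
    (0, 11, 90491), (100, 18, 90437), (100, 15, 90498), (0, 11, 90467), (0, 18, 90405),
]


def summarize(rows):
    """Return rounded unweighted means of score, duration, and tokens.

    Assumption: the page aggregates with a plain mean over all samples.
    """
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "duration_s": round(mean(durations)),
        "tokens": round(mean(tokens)),
    }


summary = summarize(ROWS)
print(summary)  # {'score': 67, 'duration_s': 17, 'tokens': 21837}
```

Under this assumption the recomputed values match the reported summary. The mean token count is pulled up sharply by samples 25 to 30, which each use roughly 90,000 tokens.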