Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are grounded in the provided context rather than hallucinated.

Model: gcp/gemini-flash-lite-latest
Score: 82
Average duration: 3s
Average tokens: 21989
Average cost: $0.00
| Sample                  | Score | Duration | Tokens |
|-------------------------|-------|----------|--------|
| opper_context_sample_01 | 0     | 5s       | 4110   |
| opper_context_sample_02 | 100   | 2s       | 3294   |
| opper_context_sample_03 | 100   | 3s       | 3476   |
| opper_context_sample_04 | 100   | 6s       | 5433   |
| opper_context_sample_05 | 50    | 3s       | 3666   |
| opper_context_sample_06 | 100   | 2s       | 3345   |
| opper_context_sample_07 | 50    | 2s       | 3509   |
| opper_context_sample_08 | 0     | 3s       | 3984   |
| opper_context_sample_09 | 100   | 2s       | 3403   |
| opper_context_sample_10 | 100   | 2s       | 3431   |
| opper_context_sample_11 | 100   | 2s       | 4279   |
| opper_context_sample_12 | 100   | 2s       | 4187   |
| opper_context_sample_13 | 100   | 2s       | 4220   |
| opper_context_sample_14 | 100   | 2s       | 4232   |
| opper_context_sample_15 | 100   | 2s       | 4131   |
| opper_context_sample_16 | 100   | 2s       | 4154   |
| opper_context_sample_17 | 100   | 2s       | 4227   |
| opper_context_sample_18 | 100   | 2s       | 7879   |
| opper_context_sample_19 | 100   | 3s       | 8296   |
| opper_context_sample_20 | 100   | 2s       | 7945   |
| opper_context_sample_21 | 100   | 3s       | 7930   |
| opper_context_sample_22 | 100   | 2s       | 7732   |
| opper_context_sample_23 | 50    | 3s       | 8124   |
| opper_context_sample_24 | 100   | 3s       | 8214   |
| opper_context_sample_25 | 100   | 3s       | 89391  |
| opper_context_sample_26 | 100   | 3s       | 89416  |
| opper_context_sample_27 | 100   | 4s       | 89358  |
| opper_context_sample_28 | 100   | 3s       | 89386  |
| opper_context_sample_29 | 0     | 3s       | 89518  |
| opper_context_sample_30 | 0     | 3s       | 89388  |
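
The headline figures appear to be plain averages of the per-sample results listed above. The sketch below checks that arithmetic. It assumes, without confirmation from the page, that the overall score is an unweighted mean of the 30 sample scores rounded to the nearest integer. The names `RESULTS` and `summarize` are illustrative and not part of any benchmark tooling.

```python
from statistics import mean

# (score, tokens) for opper_context_sample_01 .. opper_context_sample_30,
# copied in order from the per-sample results above.
RESULTS = [
    (0, 4110), (100, 3294), (100, 3476), (100, 5433), (50, 3666),
    (100, 3345), (50, 3509), (0, 3984), (100, 3403), (100, 3431),
    (100, 4279), (100, 4187), (100, 4220), (100, 4232), (100, 4131),
    (100, 4154), (100, 4227), (100, 7879), (100, 8296), (100, 7945),
    (100, 7930), (100, 7732), (50, 8124), (100, 8214), (100, 89391),
    (100, 89416), (100, 89358), (100, 89386), (0, 89518), (0, 89388),
]


def summarize(results):
    """Aggregate per-sample results.

    Assumption: the headline score is an unweighted mean, rounded.
    """
    scores = [score for score, _ in results]
    tokens = [tok for _, tok in results]
    return {
        "samples": len(results),
        "mean_score": round(mean(scores)),
        "mean_tokens": round(mean(tokens)),
        "full_credit": sum(1 for s in scores if s == 100),
        "partial_credit": sum(1 for s in scores if 0 < s < 100),
        "no_credit": sum(1 for s in scores if s == 0),
    }


summary = summarize(RESULTS)
print(summary)
```

Under that assumption the computed means agree with the reported score of 82 and average of 21989 tokens. The breakdown is 23 samples with full credit, 3 with partial credit, and 4 with none.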