Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are grounded in the supplied context rather than hallucinated.

Model: gcp/gemini-flash-latest
Score: 93
Average duration: 6s
Average tokens: 22376
Average cost: $0.00
Sample                    Score  Duration  Tokens
opper_context_sample_01   100    7s        4125
opper_context_sample_02   100    5s        3767
opper_context_sample_03   100    6s        3971
opper_context_sample_04   50     17s       6020
opper_context_sample_05   50     6s        3923
opper_context_sample_06   100    3s        3472
opper_context_sample_07   50     6s        4071
opper_context_sample_08   100    5s        3819
opper_context_sample_09   100    5s        3890
opper_context_sample_10   100    3s        3620
opper_context_sample_11   100    5s        4772
opper_context_sample_12   100    4s        4434
opper_context_sample_13   100    4s        4543
opper_context_sample_14   100    4s        4540
opper_context_sample_15   100    3s        4275
opper_context_sample_16   100    3s        4282
opper_context_sample_17   100    5s        4634
opper_context_sample_18   100    6s        8317
opper_context_sample_19   100    10s       8955
opper_context_sample_20   100    8s        8640
opper_context_sample_21   100    7s        8665
opper_context_sample_22   100    4s        7992
opper_context_sample_23   50     10s       8904
opper_context_sample_24   100    8s        8792
opper_context_sample_25   100    7s        89838
opper_context_sample_26   100    9s        90207
opper_context_sample_27   100    5s        89461
opper_context_sample_28   100    7s        89864
opper_context_sample_29   100    6s        89818
opper_context_sample_30   100    7s        89676
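
The headline figures can be cross-checked against the per-sample results. The sketch below is a minimal illustration that assumes the summary values are plain arithmetic means rounded to the nearest integer; the page does not state how it aggregates, and the `summarize` helper and its key names are hypothetical. Per-sample durations are shown in whole seconds, so the recomputed average duration is only approximate.

```python
from statistics import mean

# (score, duration in seconds, tokens) for samples 01..30,
# copied from the per-sample results above.
rows = [
    (100, 7, 4125), (100, 5, 3767), (100, 6, 3971), (50, 17, 6020),
    (50, 6, 3923), (100, 3, 3472), (50, 6, 4071), (100, 5, 3819),
    (100, 5, 3890), (100, 3, 3620), (100, 5, 4772), (100, 4, 4434),
    (100, 4, 4543), (100, 4, 4540), (100, 3, 4275), (100, 3, 4282),
    (100, 5, 4634), (100, 6, 8317), (100, 10, 8955), (100, 8, 8640),
    (100, 7, 8665), (100, 4, 7992), (50, 10, 8904), (100, 8, 8792),
    (100, 7, 89838), (100, 9, 90207), (100, 5, 89461), (100, 7, 89864),
    (100, 6, 89818), (100, 7, 89676),
]


def summarize(rows):
    """Recompute summary statistics (assumed: simple means, rounded)."""
    scores, durations, tokens = zip(*rows)
    return {
        "score": round(mean(scores)),
        "avg_duration_s": round(mean(durations)),
        "avg_tokens": round(mean(tokens)),
        # Samples that did not receive the full score of 100.
        "below_100": sum(1 for s in scores if s < 100),
    }


summary = summarize(rows)
print(summary)
```

Under these assumptions the recomputed values agree with the summary: a score of 93, an average duration of 6s, and 22376 average tokens. Four samples (04, 05, 07, and 23) scored 50, which accounts for the gap from 100.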