Context Reasoning

Context understanding and reasoning tasks test a model's ability to produce accurate answers grounded in provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are properly grounded in the given context rather than hallucinated.
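As a rough illustration of what "grounded in the given context" means, the sketch below checks whether every content word of an answer appears in the source passage. This is a hypothetical, naive heuristic for illustration only; the benchmark's actual scoring method is not described in this report.

```python
# Hypothetical illustration of a context-grounding check.
# The real benchmark's scorer is not specified here; this is a naive sketch.
def is_grounded(answer: str, context: str) -> bool:
    """Naive check: every content word of the answer appears in the context."""
    stop = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}
    context_lower = context.lower()
    words = [w.strip(".,").lower() for w in answer.split()]
    return all(w in context_lower for w in words if w and w not in stop)

context = "Refunds are issued within 14 days of purchase."
print(is_grounded("Refunds are issued within 14 days.", context))  # True
print(is_grounded("Refunds are issued within 30 days.", context))  # False
```

A grounded answer draws only on facts present in the passage, while a hallucinated one (the "30 days" example) introduces information the context does not support.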

Model: openai/o1-mini
Score: 78
Average duration: 37s
Average tokens: 19998
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_context_sample_01 | 0 | 35s | 4219 |
| opper_context_sample_02 | 50 | 35s | 3530 |
| opper_context_sample_03 | 100 | 14s | 3565 |
| opper_context_sample_04 | 50 | 34s | 4861 |
| opper_context_sample_05 | 0 | 34s | 3930 |
| opper_context_sample_06 | 100 | 34s | 3441 |
| opper_context_sample_07 | 50 | 14s | 3464 |
| opper_context_sample_08 | 100 | 35s | 3792 |
| opper_context_sample_09 | 100 | 35s | 3205 |
| opper_context_sample_10 | 100 | 15s | 3884 |
| opper_context_sample_11 | 100 | 34s | 4413 |
| opper_context_sample_12 | 100 | 34s | 3774 |
| opper_context_sample_13 | 100 | 34s | 3712 |
| opper_context_sample_14 | 100 | 34s | 3858 |
| opper_context_sample_15 | 100 | 14s | 3770 |
| opper_context_sample_16 | 50 | 34s | 3858 |
| opper_context_sample_17 | 100 | 16s | 4376 |
| opper_context_sample_18 | 100 | 34s | 6510 |
| opper_context_sample_19 | 100 | 14s | 6840 |
| opper_context_sample_20 | 50 | 14s | 6924 |
| opper_context_sample_21 | 100 | 34s | 7807 |
| opper_context_sample_22 | 100 | 34s | 7463 |
| opper_context_sample_23 | 50 | 1m 34s | 8079 |
| opper_context_sample_24 | 50 | 14s | 7319 |
| opper_context_sample_25 | 100 | 34s | 80045 |
| opper_context_sample_26 | 0 | 1m 7s | 82233 |
| opper_context_sample_27 | 100 | 1m 52s | 79772 |
| opper_context_sample_28 | 100 | 34s | 79859 |
| opper_context_sample_29 | 100 | 26s | 81194 |
| opper_context_sample_30 | 100 | 1m 43s | 80238 |
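The summary figures above can be reproduced from the per-sample rows. The sketch below transcribes the per-sample scores, durations, and token counts from the table and recomputes the averages; the rounding convention (round to nearest integer) is an assumption.

```python
# Per-sample values transcribed from the results table above (samples 01-30).
scores = [0, 50, 100, 50, 0, 100, 50, 100, 100, 100,
          100, 100, 100, 100, 100, 50, 100, 100, 100, 50,
          100, 100, 50, 50, 100, 0, 100, 100, 100, 100]
durations = ["35s", "35s", "14s", "34s", "34s", "34s", "14s", "35s", "35s", "15s",
             "34s", "34s", "34s", "34s", "14s", "34s", "16s", "34s", "14s", "14s",
             "34s", "34s", "1m 34s", "14s", "34s", "1m 7s", "1m 52s", "34s", "26s", "1m 43s"]
tokens = [4219, 3530, 3565, 4861, 3930, 3441, 3464, 3792, 3205, 3884,
          4413, 3774, 3712, 3858, 3770, 3858, 4376, 6510, 6840, 6924,
          7807, 7463, 8079, 7319, 80045, 82233, 79772, 79859, 81194, 80238]

def parse_seconds(d: str) -> int:
    """Convert a duration like '1m 34s' or '26s' to seconds."""
    total = 0
    for part in d.split():
        if part.endswith("m"):
            total += 60 * int(part[:-1])
        else:
            total += int(part[:-1])
    return total

avg_score = round(sum(scores) / len(scores))                                  # 78
avg_duration = round(sum(parse_seconds(d) for d in durations) / len(durations))  # 37
avg_tokens = round(sum(tokens) / len(tokens))                                 # 19998
print(avg_score, avg_duration, avg_tokens)
```

The recomputed values match the summary block (score 78, average duration 37s, average tokens 19998), which also makes the token average plausible: the last six samples consumed roughly 80k tokens each, pulling the mean well above the ~3-8k typical of the other rows.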