Context Reasoning

Context understanding and reasoning tasks test whether a model answers accurately using only the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are grounded in the provided context rather than hallucinated.

Score: 0
Duration: 24s
Input tokens: 5882
Output tokens: 220
Cost: $0.00
Input: Which patient had a more dangerous medical timeline?
Expected output: Melisa Farrow
Model output:
Oscar Jansson had a more dangerous medical timeline because he was diagnosed with a chronic condition (mild persistent asthma) with ongoing management, multiple tests and visits, and a documented history of respiratory symptoms over time. In contrast, Melisa Farrow experienced an acute, resolved pneumonia without long-term complications.