Context Reasoning

Context understanding and reasoning tasks test whether a model can answer a question accurately using the information in a provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are correct and grounded in the given context rather than hallucinated.

100
Duration: 20s
Input tokens: 7,455
Output tokens: 1,581
Cost: $0.00
Input: Which patient had a more dangerous medical timeline?
Expected output: Melisa Farrow
Model output: Melisa Farrow had a more dangerous medical timeline. Her condition, community-acquired pneumonia, escalated rapidly within 24 hours, leading to a high fever, elevated heart rate, and a drop in oxygen saturation to 93%. This required immediate hospitalization, intravenous antibiotics, and supplemental oxygen. In contrast, Oscar Jansson's condition was a manageable flare-up of mild persistent asthma that was treated entirely on an outpatient basis without any life-threatening signs like oxygen desaturation.
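This page does not say how the model output is scored against the expected output. As a hypothetical illustration only, the simplest possible check is a normalized substring match: lowercase both strings, collapse whitespace, and test whether the expected answer appears in the model's answer. The function names `normalize` and `contains_expected` are invented for this sketch and are not part of any evaluation framework shown here.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so formatting differences don't matter."""
    return " ".join(text.lower().split())


def contains_expected(expected: str, model_output: str) -> bool:
    """Hypothetical grader: True if the normalized expected answer occurs in the normalized output."""
    return normalize(expected) in normalize(model_output)


expected = "Melisa Farrow"
model_output = (
    "Melisa Farrow had a more dangerous medical timeline. Her condition, "
    "community-acquired pneumonia, escalated rapidly within 24 hours."
)
print(contains_expected(expected, model_output))
```

A substring check like this is crude: it would also accept an answer that names the expected patient only to reject it (for example, "Not Melisa Farrow"), so graders for grounded Q&A tasks often rely on a rubric or a model-based judge instead.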