Context Reasoning

Context understanding and reasoning tasks test whether a model gives accurate answers grounded in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are evaluated on whether their answers are supported by the given context rather than hallucinated.

100
Duration: 10s
Input Tokens: 7592
Output Tokens: 1363
Cost: $0.00
Context
Input
Which patient had a more dangerous medical timeline?
Expected output
Melisa Farrow
Model output
Melisa Farrow had a more dangerous medical timeline. Her condition rapidly progressed from initial symptoms to confirmed Community-Acquired Pneumonia (CAP) with associated pleural effusion. This required urgent unscheduled hospitalization, treatment with intravenous antibiotics, and supplemental oxygen due to significant systemic signs of infection and hypoxia (O2 saturation dropped to 93%). In contrast, Oscar Jansson's timeline involved the diagnosis and management of a mild persistent asthma exacerbation, which was treated successfully with outpatient inhaled medication and resulted in a full recovery without ever requiring hospitalization or critical care interventions.
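
The page does not say how the model output is compared with the expected output. As a rough illustration only, the sketch below assumes a simple normalized containment check: the answer counts as matching if the expected string appears as a whole-word sequence in the model output. The function names `normalize` and `contains_expected` are hypothetical and are not part of any benchmark tooling.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def contains_expected(expected: str, model_output: str) -> bool:
    """Return True if the expected answer appears as a whole-word sequence.

    Assumption: containment is used as the match rule. The benchmark's
    real scoring method is not documented in the source.
    """
    # Pad with spaces so "ann" does not match inside "joanna".
    return f" {normalize(expected)} " in f" {normalize(model_output)} "


expected = "Melisa Farrow"
model_output = (
    "Melisa Farrow had a more dangerous medical timeline. "
    "In contrast, Oscar Jansson's timeline involved a mild asthma exacerbation."
)
print(contains_expected(expected, model_output))  # prints True
```

A check like this has an obvious limitation for comparison questions: the output above also names Oscar Jansson, so the same test would pass if he were the expected answer. A stricter grader would need to confirm which patient the output actually selects, for example with a rubric or a judge model.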