Context Reasoning

Context reasoning tasks test whether a model answers accurately from the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers are supported by the provided context rather than hallucinated.

100
Duration: 25s
Input Tokens: 5986
Output Tokens: 1507
Cost: $0.00
Context
Input
Which patient had a more dangerous medical timeline?
Expected output
Melisa Farrow
Model output
Melisa Farrow had a more dangerous medical timeline. While Oscar Jansson had mild persistent asthma with a respiratory infection that was treated outpatient, Melisa developed severe pneumonia with pleural effusion that required hospitalization and intravenous antibiotics. Her condition rapidly deteriorated within 24 hours, leading to decreased oxygen saturation (93%), high fever, and respiratory distress that necessitated immediate aggressive treatment to prevent potential respiratory compromise. Pneumonia with pleural effusion carries a higher immediate health risk compared to Oscar's asthma exacerbation.
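
The page does not say how the expected output is compared with the model output. As a rough illustration only, the sketch below assumes a lenient containment check: the answer passes if the normalized expected string appears as whole words in the normalized model output. The names `normalize` and `grade`, and the 0/100 scale, are hypothetical and not taken from the benchmark.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def grade(expected: str, model_output: str) -> int:
    """Hypothetical grader: 100 if the expected answer appears in the output, else 0.

    Both strings are padded with spaces so the match respects word boundaries
    (for example, "ann" does not match inside "joanna").
    """
    needle = f" {normalize(expected)} "
    haystack = f" {normalize(model_output)} "
    return 100 if needle in haystack else 0


# The example from this page: the expected name opens the model's answer.
expected = "Melisa Farrow"
model_output = (
    "Melisa Farrow had a more dangerous medical timeline. While Oscar Jansson "
    "had mild persistent asthma, Melisa developed severe pneumonia."
)
score = grade(expected, model_output)
print(score)
```

A containment check like this confirms only that the right name was produced. It does not verify that the supporting explanation is grounded in the context, which would need a separate comparison against the source passage.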