Context Reasoning

Context understanding and reasoning tasks test whether a model answers questions accurately using only the information in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal Q&A applications. A good answer is both correct and grounded: its claims come from the given context rather than being hallucinated.

100
Duration: 19s
Input tokens: 7,846
Output tokens: 731
Cost: $0.00
Input: Which patient had a more dangerous medical timeline?
Expected output: Melisa Farrow
Model output:
Melisa Farrow had a significantly more dangerous medical timeline. While Oscar Jansson experienced mild persistent asthma with a viral infection that was managed entirely as an outpatient, Melisa developed acute community-acquired pneumonia with pleural effusion that rapidly deteriorated, requiring emergency hospital admission, IV antibiotics, and oxygen therapy. Her oxygen saturation dropped to 93%, she had significant systemic inflammatory markers (CRP: 98 mg/L, WBC: 13.7), and showed signs of respiratory compromise with tachycardia and tachypnea. Without prompt hospitalization, her condition could have progressed to respiratory failure or sepsis, making her case medically dangerous and requiring urgent intervention, whereas Oscar's condition was never life-threatening and resolved with outpatient management.
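The page does not say how answers are scored. As an illustration only, here is a minimal Python sketch of one naive check. It assumes a hypothetical `contains_expected` helper that tests whether the normalized expected answer appears as a whole phrase in the model output. This checks only whether the answer is correct. It does not check whether the answer is grounded in the context.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumeric characters to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def contains_expected(expected: str, model_output: str) -> bool:
    """Hypothetical scorer: True if the normalized expected answer appears
    as a whole-word phrase inside the normalized model output."""
    exp = normalize(expected)
    out = normalize(model_output)
    # Pad with spaces so "farrow" does not match inside a longer word.
    return f" {exp} " in f" {out} "


expected = "Melisa Farrow"
excerpt = "Melisa Farrow had a significantly more dangerous medical timeline."
print(contains_expected(expected, excerpt))  # True
```

A check this simple is easy to fool. The full model output above also names "Oscar Jansson", so a containment test would accept that name too, even though it is the wrong answer. Scoring comparative questions reliably therefore needs more than string matching, for example checking which entity the answer actually asserts.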