Context Reasoning

Context understanding and reasoning tasks test whether a model can answer questions accurately using only the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal Q&A applications. Models are scored on whether their answers are correct and properly grounded in the supplied context, rather than hallucinated.
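For questions with a single numeric answer, such as the one below, grading can be as simple as comparing numbers. The page does not say how answers are actually scored, so the following is only a minimal sketch under an assumed exact-match rule. `score_numeric_answer` and `extract_first_integer` are hypothetical helpers, not part of any evaluation framework.

```python
import re
from typing import Optional


def extract_first_integer(text: str) -> Optional[int]:
    """Return the first integer appearing in text, or None if there is none."""
    match = re.search(r"-?\d+", text)
    return int(match.group()) if match else None


def score_numeric_answer(model_output: str, expected: str) -> int:
    """Hypothetical exact-match scorer (an assumption, not the benchmark's rule):
    1 if the first number in the model's answer equals the expected number, else 0."""
    predicted = extract_first_integer(model_output)
    return int(predicted is not None and predicted == int(expected))


print(score_numeric_answer("The author mentions **17 cities** in total", "19"))  # 0
```

Taking the first integer is a deliberately naive heuristic. It would misfire if the model stated some other number before its final answer, so a real grader would need a more careful extraction step.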

0
Duration: 12s
Input tokens: 3,061
Output tokens: 645
Cost: $0.00
Context
Input
How many cities does the author mention?
Expected output
19
Model output

The author mentions **17 cities** in total across their journal entries:
- **Colombia**: Cartagena, Leticia (2)
- **Spain**: Granada, Córdoba, Sevilla, Cádiz, Málaga, Jaén, Toledo, Madrid, Ávila, Salamanca, Zamora, San Sebastián, Bilbao (13)
- **Morocco**: Tangier, Chefchaouen (2).

*Note*: Manchester (England) is referenced but not counted as part of the travel log.
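The model's per-country tally can be re-checked mechanically. This is a minimal sketch that copies the city names verbatim from the model output above. It checks only the arithmetic of that list against the expected answer. It cannot tell which cities are missing, because the source context is not shown on this page.

```python
# Cities exactly as listed in the model output, grouped by country.
model_listed = {
    "Colombia": ["Cartagena", "Leticia"],
    "Spain": ["Granada", "Córdoba", "Sevilla", "Cádiz", "Málaga", "Jaén",
              "Toledo", "Madrid", "Ávila", "Salamanca", "Zamora",
              "San Sebastián", "Bilbao"],
    "Morocco": ["Tangier", "Chefchaouen"],
}
EXPECTED = 19  # expected output for this test case

# Count cities per country, then sum for the overall total.
per_country = {country: len(cities) for country, cities in model_listed.items()}
total = sum(per_country.values())

# Sanity check: no city appears twice in the model's list.
all_cities = [city for cities in model_listed.values() for city in cities]
assert len(set(all_cities)) == len(all_cities)

print(per_country)  # {'Colombia': 2, 'Spain': 13, 'Morocco': 2}
print(total, "listed vs", EXPECTED, "expected; short by", EXPECTED - total)
# 17 listed vs 19 expected; short by 2
```

The model's internal arithmetic is consistent: 2 + 13 + 2 = 17. Its answer is therefore wrong because of omissions, not miscounting. Even adding the excluded Manchester would give 18, still one short of the expected 19.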