Context Reasoning

Context understanding and reasoning tasks test whether a model can answer accurately using only the context it is given. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledge Q&A applications. Models are scored on whether their answers stay grounded in that context rather than introducing hallucinated information.

Duration: 4s
Input tokens: 2804
Output tokens: 343
Cost: $0.00
Context
Input
Order all the cities named in the text by time of visiting them
Expected output
1. Cartagena (14 Jun 1995)
2. Leticia (mid-Jun 1995, the day after Cartagena)
3. Manchester (3 Oct 2002 entry)
4. Tangier (mid-Apr 2004, one week before 22 Apr)
5. Chefchaouen (mid-Apr 2004, just after Tangier)
6. Granada (22 Apr 2004)
7. Córdoba
8. Sevilla
9. Cádiz
10. Málaga
11. Jaén
12. Toledo
13. Madrid
14. Ávila
15. Salamanca
16. Zamora
17. San Sebastián
18. Bilbao (8 May 2004)
Model output
1. Cartagena, Colombia - June 14, 1995
2. Leticia, Colombia (mentioned as a destination from Cartagena) - June 15, 1995 onwards
3. Granada, Spain - April 22, 2004
4. Córdoba, Spain - after April 22, 2004
5. Sevilla, Spain - after Córdoba
6. Cádiz, Spain - after Sevilla
7. Málaga, Spain - after Cádiz
8. Jaén, Spain - after Málaga
9. Toledo, Spain - after Jaén
10. Madrid, Spain - after Toledo
11. Ávila, Spain - after Madrid
12. Salamanca, Spain - after Ávila
13. Zamora, Spain - after Salamanca
14. San Sebastián, Spain - after Zamora
15. Bilbao, Spain - May 8, 2004
16. Manchester, England - October 3, 2002 (visited before, as it's a reflection on past travels)
17. Chefchaouen, Morocco - before April 22, 2004 (mentioned as being visited before arriving in Granada)
18. Tangier, Morocco - before Chefchaouen (mentioned as being on the way from Morocco to Spain)
19. Valencia, Spain - mentioned as a potential future destination from Granada
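The gap between the expected and model outputs can be made concrete with a small comparison script. This is a minimal sketch, not the benchmark's actual scorer: the metrics (exact-position matches, pairwise order agreement over shared cities, missing and extra cities) and the function name `compare_orderings` are hypothetical choices for illustration, and the city lists were copied by hand from the two outputs above.

```python
from itertools import combinations

# Reference order, copied by hand from the expected output.
EXPECTED = [
    "Cartagena", "Leticia", "Manchester", "Tangier", "Chefchaouen",
    "Granada", "Córdoba", "Sevilla", "Cádiz", "Málaga", "Jaén",
    "Toledo", "Madrid", "Ávila", "Salamanca", "Zamora",
    "San Sebastián", "Bilbao",
]

# City names from the model output, in the order the model listed them.
MODEL = [
    "Cartagena", "Leticia", "Granada", "Córdoba", "Sevilla", "Cádiz",
    "Málaga", "Jaén", "Toledo", "Madrid", "Ávila", "Salamanca",
    "Zamora", "San Sebastián", "Bilbao", "Manchester", "Chefchaouen",
    "Tangier", "Valencia",
]


def compare_orderings(expected, predicted):
    """Hypothetical scorer: exact-position matches, pairwise order
    agreement over cities present in both lists, and missing/extra cities."""
    shared = [c for c in expected if c in predicted]
    pos = {c: i for i, c in enumerate(predicted)}
    # combinations() keeps input order, so each pair (a, b) has a before b
    # in the expected list; the pair agrees if the model keeps that order.
    pairs = list(combinations(shared, 2))
    agreeing = sum(1 for a, b in pairs if pos[a] < pos[b])
    exact = sum(1 for i, c in enumerate(expected)
                if i < len(predicted) and predicted[i] == c)
    return {
        "exact_position": exact,
        "pairwise_agreement": agreeing / len(pairs) if pairs else 1.0,
        "missing": [c for c in expected if c not in predicted],
        "extra": [c for c in predicted if c not in expected],
    }


if __name__ == "__main__":
    for key, value in compare_orderings(EXPECTED, MODEL).items():
        print(key, value)
```

Run on the lists above, this reports no missing cities, Valencia as the only extra (it is mentioned only as a possible future destination), and just 2 cities in their exact expected position. Pairwise agreement is 113 of 153 pairs: the 40 disagreeing pairs come from placing Manchester, Tangier and Chefchaouen after the Spanish leg and swapping Tangier with Chefchaouen. A pairwise measure is used here because a single early misplacement would otherwise make almost every later exact position count as wrong.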