Context Reasoning
Context understanding and reasoning tasks test whether a model gives accurate answers grounded in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledg...
Benchmark results across diverse tasks, including reasoning, generation, and understanding, so you can choose the right model with confidence.
Natural language to SQL query generation evaluates text-to-query fidelity and schema reasoning. This task is particularly relevant for analytics chat assistants and simplified database interfaces....
AI agent reasoning and tool selection tests planning, tool choice, and self-diagnosis capabilities. These skills are crucial for autonomous ticket triage systems and complex agent workflows where mode...
Data processing and normalization tasks evaluate how well a model produces structured output from messy prose and inconsistently structured input. This capability is essential for catalogue and product pipelines....
Multilingual comprehension and reasoning in Swedish, covering fact checking, summarization, inference, literary and legal analysis, and dialectal understanding....