Context Reasoning
Answer questions accurately based on a given document or knowledge source — without hallucinating what isn't there. The core skill behind doc Q&A, support bots, and internal knowledge lookup....
LLM Evaluations
How leading models perform on the work AI is actually put to: context reasoning, SQL generation, agent decisions, data extraction, and multilingual understanding. Each task is scored against ground truth.
Turn plain-English questions into correct SQL against a schema. The skill behind analytics chat, natural-language dashboards, and 'just ask the database' interfaces....
Decide what to do next: pick the right tool, plan multi-step actions, recover when something goes wrong. The hardest category: open-ended decisions, no single correct answer....
Pull structured data out of messy text: ingredients from a recipe, fields from a contract, attributes from a product description. The plumbing behind catalogue, intake, and processing pipelines....
How well models follow Swedish instructions and produce well-formed Swedish answers, covering disambiguation, summarisation, inference, and dialectal nuance....