See How AI Models Perform

Benchmark results across diverse tasks — reasoning, generation, and understanding — so you can choose the right model with confidence.

Context Reasoning

Context understanding and reasoning tasks test whether a model gives accurate answers grounded in the provided context. This capability is essential for knowledge-base support bots, policy lookup systems, and internal knowledg...

gemini-3.1-pro-preview

claude-opus-4.5

claude-sonnet-4.5

SQL

Natural language to SQL query generation evaluates text-to-query fidelity and schema reasoning. This task is particularly relevant for analytics chat assistants and simplified database interfaces....

Agents

AI agent reasoning and tool selection tasks test planning, tool choice, and self-diagnosis capabilities. These skills are crucial for autonomous ticket triage systems and complex agent workflows where mode...

claude-opus-4.6

gemini-3.1-pro-preview

claude-sonnet-4.6

Data Normalization

Data processing and normalization tasks evaluate how well a model produces structured output from messy prose and inputs with differing structures. This capability is essential for catalogue and product pipelines....

gemini-3-flash-preview

gemini-3-pro-preview

kimi-k2-thinking-turbo

Swedish Language Understanding

Multilingual comprehension and reasoning in Swedish, covering fact checking, summarization, inference, literary and legal analysis, and dialectal understanding....