SQL

Natural language to SQL query generation evaluates text-to-query fidelity and schema reasoning. This task is particularly relevant for analytics chat assistants and simplified database interfaces where users need to query data using natural language. Models must understand both the intent behind the question and the structure of the underlying database schema.
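
One common way to check text-to-query fidelity is to execute the generated query and a reference query against the same database and compare the rows they return. The sketch below illustrates that idea only; it is not the scoring method behind the results on this page, which the page does not describe. The schema, the sample rows, the question, and the names `execution_match`, `REFERENCE`, `GOOD`, and `BAD` are all hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical schema and rows, invented purely for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada', 'SE'), (2, 'Bo', 'NO'), (3, 'Cy', 'SE');
INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0), (4, 3, 10.0);
"""


def run_query(conn, sql):
    """Return the result rows as a multiset so that row order is ignored."""
    return Counter(conn.execute(sql).fetchall())


def execution_match(candidate_sql, reference_sql):
    """True if both queries return the same rows on a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        expected = run_query(conn, reference_sql)
        try:
            actual = run_query(conn, candidate_sql)
        except sqlite3.Error:
            # Invalid SQL or a reference to a missing table/column counts as a miss.
            return False
        return actual == expected
    finally:
        conn.close()


# Hypothetical question: "What is the total order value per country?"
REFERENCE = """
SELECT c.country, SUM(o.total)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.country
"""

# Written differently, but returns the same rows as the reference.
GOOD = """
SELECT country, SUM(total)
FROM orders JOIN customers ON customers.id = orders.customer_id
GROUP BY country ORDER BY country
"""

# Valid SQL that answers a different question (customers per country).
BAD = "SELECT country, COUNT(*) FROM customers GROUP BY country"

if __name__ == "__main__":
    print(execution_match(GOOD, REFERENCE))
    print(execution_match(BAD, REFERENCE))
```

Comparing result sets rather than query text lets differently phrased but equivalent queries pass, while a query that misreads the question or the schema fails. A single small database can still let a wrong query match by coincidence, so a check like this is a lower bound on correctness rather than a proof of it.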

Model: anthropic/claude-sonnet-4
Score: 90
Average duration: 12s
Average tokens: 1091
Average cost: $0.00

Sample               Score  Duration  Tokens
opper_sql_sample_01  100    11s       735
opper_sql_sample_02  100    11s       760
opper_sql_sample_03  100    11s       764
opper_sql_sample_04  100    11s       789
opper_sql_sample_05  100    11s       824
opper_sql_sample_06  100    11s       837
opper_sql_sample_07  100    11s       748
opper_sql_sample_08  75     11s       748
opper_sql_sample_09  100    11s       765
opper_sql_sample_10  100    11s       807
opper_sql_sample_11  100    11s       1044
opper_sql_sample_12  100    11s       1066
opper_sql_sample_13  100    11s       1093
opper_sql_sample_14  100    11s       1096
opper_sql_sample_15  100    11s       1112
opper_sql_sample_16  100    12s       1166
opper_sql_sample_17  100    12s       1108
opper_sql_sample_18  75     12s       1197
opper_sql_sample_19  50     12s       1107
opper_sql_sample_20  75     12s       1224
opper_sql_sample_21  100    12s       1280
opper_sql_sample_22  100    12s       1381
opper_sql_sample_23  100    12s       1426
opper_sql_sample_24  75     11s       1370
opper_sql_sample_25  75     14s       1414
opper_sql_sample_26  100    14s       1374
opper_sql_sample_27  100    14s       1377
opper_sql_sample_28  75     11s       1444
opper_sql_sample_29  0      14s       1391
opper_sql_sample_30  100    14s       1292