SQL

This task measures natural-language-to-SQL query generation, testing both text-to-query fidelity and schema reasoning. It is particularly relevant for analytics chat assistants and simplified database interfaces, where users query data in natural language. A model must understand both the intent behind the question and the structure of the underlying database schema.
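
As a rough illustration of what checking text-to-query fidelity can look like, the sketch below runs a generated query and a reference query against the same small SQLite database and compares their result sets. The schema, the question, both queries, and the `same_result` helper are hypothetical and are not taken from the benchmark; the page does not describe how its scores are actually computed.

```python
import sqlite3


def run_query(conn, sql):
    """Execute a query and return its rows in a canonical (sorted) order."""
    rows = conn.execute(sql).fetchall()
    return sorted(rows, key=repr)


def same_result(conn, generated_sql, reference_sql):
    """Return True when both queries yield the same rows, ignoring row order."""
    try:
        return run_query(conn, generated_sql) == run_query(conn, reference_sql)
    except sqlite3.Error:
        # Invalid SQL or a reference to a missing table/column counts as a miss.
        return False


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'SE'), (2, 'Bo', 'NO');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0);
""")

question = "What is the total order value per customer?"

reference_sql = """
    SELECT c.name, SUM(o.total) FROM customers c
    JOIN orders o ON o.customer_id = c.id GROUP BY c.name
"""

# A differently phrased but equivalent query, as a model might produce.
generated_sql = """
    SELECT customers.name, SUM(orders.total) AS total_value
    FROM orders JOIN customers ON customers.id = orders.customer_id
    GROUP BY customers.name ORDER BY total_value DESC
"""

matches = same_result(conn, generated_sql, reference_sql)
print(question, "->", "match" if matches else "mismatch")
```

Comparing executed results rather than query text lets differently written but equivalent SQL pass. The comparison here ignores row order, so it would not catch a question that requires a specific `ORDER BY`.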

Model: openai/o3-mini
Score: 93
Average duration: 31s
Average tokens: 1281
Average cost: $0.00

Sample               Score  Duration  Tokens
opper_sql_sample_01    100  42s          808
opper_sql_sample_02    100  26s          840
opper_sql_sample_03    100  23s          828
opper_sql_sample_04    100  23s          819
opper_sql_sample_05    100  23s          851
opper_sql_sample_06    100  42s          965
opper_sql_sample_07    100  26s          877
opper_sql_sample_08    100  26s          808
opper_sql_sample_09    100  26s          759
opper_sql_sample_10    100  23s          862
opper_sql_sample_11    100  26s          955
opper_sql_sample_12    100  41s         1114
opper_sql_sample_13    100  41s         1263
opper_sql_sample_14    100  23s         1140
opper_sql_sample_15     75  41s         1103
opper_sql_sample_16    100  23s         1466
opper_sql_sample_17    100  41s         1146
opper_sql_sample_18    100  41s         1532
opper_sql_sample_19     50  26s         1690
opper_sql_sample_20    100  41s         1837
opper_sql_sample_21    100  1m 8s       1220
opper_sql_sample_22    100  23s         1434
opper_sql_sample_23    100  23s         1547
opper_sql_sample_24     75  26s         1701
opper_sql_sample_25    100  23s         1750
opper_sql_sample_26    100  41s         1635
opper_sql_sample_27      0  41s         2818
opper_sql_sample_28     75  26s         1518
opper_sql_sample_29    100  23s         1781
opper_sql_sample_30    100  26s         1351
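
The headline figures appear to be plain averages of the 30 per-sample values. The sketch below recomputes them from the numbers listed above, assuming unweighted means rounded half up; the page does not state its aggregation or rounding rule, so `half_up` is an assumption.

```python
import math

# Per-sample values for opper_sql_sample_01 through opper_sql_sample_30.
scores = [
    100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
    100, 100, 100, 100, 75, 100, 100, 100, 50, 100,
    100, 100, 100, 75, 100, 100, 0, 75, 100, 100,
]
tokens = [
    808, 840, 828, 819, 851, 965, 877, 808, 759, 862,
    955, 1114, 1263, 1140, 1103, 1466, 1146, 1532, 1690, 1837,
    1220, 1434, 1547, 1701, 1750, 1635, 2818, 1518, 1781, 1351,
]
# Durations in seconds ("1m 8s" for sample 21 is 68 seconds).
durations = [
    42, 26, 23, 23, 23, 42, 26, 26, 26, 23,
    26, 41, 41, 23, 41, 23, 41, 41, 26, 41,
    68, 23, 23, 26, 23, 41, 41, 26, 23, 26,
]


def half_up(x):
    """Round to the nearest integer, with .5 rounding up (assumed rule)."""
    return math.floor(x + 0.5)


def mean(values):
    return sum(values) / len(values)


mean_score = mean(scores)        # 2775 / 30
mean_tokens = mean(tokens)       # 38418 / 30
mean_duration = mean(durations)  # 944 / 30

print(half_up(mean_score), half_up(mean_tokens), half_up(mean_duration))
```

Under these assumptions the recomputed values agree with the reported score of 93, average of 1281 tokens, and average duration of 31s. Python's built-in `round` is avoided because it rounds halves to the nearest even integer, which would turn 92.5 into 92.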