SQL

This benchmark tests natural-language-to-SQL query generation, measuring text-to-query fidelity and schema reasoning. The task is particularly relevant for analytics chat assistants and simplified database interfaces, where users query data in natural language. Models must understand both the intent behind the question and the structure of the underlying database schema.
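To make the task concrete, here is a minimal sketch of how a generated query could be checked against a reference query by executing both on the same database. The schema, data, question, and both queries are invented for illustration, and execution matching is only one possible check; it is an assumption, not a description of how the scores on this page were produced.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada', 'SE'), (2, 'Bo', 'NO'), (3, 'Cy', 'SE');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 40.0), (4, 3, 10.0);
"""

# Hypothetical user question the model would be asked to translate.
QUESTION = "What is the total order value per country?"

# Hypothetical reference answer written by a human.
REFERENCE_SQL = """
SELECT c.country, SUM(o.total)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.country
"""

# Hypothetical model output: different wording, same meaning.
CANDIDATE_SQL = """
SELECT country, SUM(total)
FROM orders JOIN customers ON customers.id = orders.customer_id
GROUP BY country
"""


def run_query(sql):
    """Run sql against a fresh in-memory database and return sorted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return sorted(conn.execute(sql).fetchall())
    finally:
        conn.close()


def execution_match(candidate, reference):
    """True if both queries return the same rows, ignoring row order."""
    try:
        return run_query(candidate) == run_query(reference)
    except sqlite3.Error:
        # A query that fails to execute counts as a mismatch.
        return False


if __name__ == "__main__":
    print(QUESTION)
    print(execution_match(CANDIDATE_SQL, REFERENCE_SQL))
```

Comparing result sets rather than query text lets differently phrased but equivalent queries pass, while a query that references a missing column or table fails outright.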

Model: openai/o3
Score: 93
Average duration: 26s
Average tokens: 1146
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_sql_sample_01 | 100 | 28s | 611 |
| opper_sql_sample_02 | 100 | 27s | 753 |
| opper_sql_sample_03 | 100 | 27s | 764 |
| opper_sql_sample_04 | 100 | 19s | 823 |
| opper_sql_sample_05 | 100 | 28s | 855 |
| opper_sql_sample_06 | 100 | 26s | 887 |
| opper_sql_sample_07 | 100 | 19s | 683 |
| opper_sql_sample_08 | 100 | 27s | 685 |
| opper_sql_sample_09 | 100 | 19s | 758 |
| opper_sql_sample_10 | 100 | 19s | 935 |
| opper_sql_sample_11 | 100 | 27s | 963 |
| opper_sql_sample_12 | 100 | 28s | 969 |
| opper_sql_sample_13 | 100 | 28s | 1055 |
| opper_sql_sample_14 | 100 | 19s | 1192 |
| opper_sql_sample_15 | 75 | 26s | 948 |
| opper_sql_sample_16 | 100 | 26s | 1189 |
| opper_sql_sample_17 | 100 | 19s | 1272 |
| opper_sql_sample_18 | 75 | 28s | 1317 |
| opper_sql_sample_19 | 50 | 26s | 2159 |
| opper_sql_sample_20 | 100 | 19s | 1288 |
| opper_sql_sample_21 | 100 | 26s | 1280 |
| opper_sql_sample_22 | 100 | 27s | 1278 |
| opper_sql_sample_23 | 100 | 28s | 1540 |
| opper_sql_sample_24 | 75 | 26s | 1490 |
| opper_sql_sample_25 | 100 | 19s | 1419 |
| opper_sql_sample_26 | 100 | 27s | 1435 |
| opper_sql_sample_27 | 50 | 28s | 1255 |
| opper_sql_sample_28 | 75 | 19s | 1504 |
| opper_sql_sample_29 | 100 | 26s | 1774 |
| opper_sql_sample_30 | 100 | 1m 3s | 1310 |
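As a sanity check, the headline figures can be recomputed from the 30 per-sample values listed above. The sketch below assumes the headline numbers are plain arithmetic means rounded to the nearest integer; that aggregation rule is an assumption, although the results agree with the reported 93, 26s, and 1146.

```python
from statistics import mean

# Per-sample values transcribed from the results above, samples 01 to 30.
scores = [
    100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
    100, 100, 100, 100, 75, 100, 100, 75, 50, 100,
    100, 100, 100, 75, 100, 100, 50, 75, 100, 100,
]
durations_s = [
    28, 27, 27, 19, 28, 26, 19, 27, 19, 19,
    27, 28, 28, 19, 26, 26, 19, 28, 26, 19,
    26, 27, 28, 26, 19, 27, 28, 19, 26, 63,  # 1m 3s = 63s
]
tokens = [
    611, 753, 764, 823, 855, 887, 683, 685, 758, 935,
    963, 969, 1055, 1192, 948, 1189, 1272, 1317, 2159, 1288,
    1280, 1278, 1540, 1490, 1419, 1435, 1255, 1504, 1774, 1310,
]

# Assumed aggregation: unweighted mean, rounded to the nearest integer.
avg_score = round(mean(scores))
avg_duration_s = round(mean(durations_s))
avg_tokens = round(mean(tokens))

if __name__ == "__main__":
    print(avg_score, avg_duration_s, avg_tokens)
```

Six of the 30 samples score below 100 (four at 75 and two at 50), which accounts for the whole gap between the overall score and a perfect 100.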