SQL

Natural language to SQL query generation evaluates text-to-query fidelity and schema reasoning. This task is particularly relevant for analytics chat assistants and simplified database interfaces where users need to query data using natural language. Models must understand both the intent behind the question and the structure of the underlying database schema.
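As a rough illustration of what such a task involves, the sketch below pairs a natural-language question with a small schema and checks a generated query against a reference query by comparing their result sets on an in-memory SQLite database. The schema, data, question, and both queries are hypothetical and are not taken from the benchmark samples. Comparing result sets is only one possible way to judge a query, and the page does not state how this benchmark actually scores answers.

```python
import sqlite3

# Hypothetical schema and data for illustration; not from the benchmark.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0);
"""

# The natural-language input a model would receive alongside the schema.
QUESTION = "Which customers have spent more than 50 in total?"

# A hand-written reference answer (assumed, for illustration).
REFERENCE_SQL = """
SELECT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING SUM(o.total) > 50
"""

# A differently phrased query of the kind a model might generate.
CANDIDATE_SQL = """
SELECT name
FROM customers
WHERE id IN (
    SELECT customer_id FROM orders
    GROUP BY customer_id
    HAVING SUM(total) > 50
)
"""


def run_query(sql):
    """Run sql against a fresh in-memory database and return sorted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return sorted(conn.execute(sql).fetchall())
    finally:
        conn.close()


def results_match(candidate_sql, reference_sql):
    """True when both queries return the same rows, ignoring row order."""
    return run_query(candidate_sql) == run_query(reference_sql)


if __name__ == "__main__":
    print(QUESTION)
    print(results_match(CANDIDATE_SQL, REFERENCE_SQL))
```

The two queries are written differently but answer the same question, which is why a comparison on results rather than on query text can be useful: it accepts a correct query even when its structure differs from the reference.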

Model: openai/gpt-5
Score: 95
Average duration: 27s
Average tokens: 1780
Average cost: $0.00
| Sample | Score | Duration | Tokens |
| --- | --- | --- | --- |
| opper_sql_sample_01 | 100 | 28s | 998 |
| opper_sql_sample_02 | 100 | 13s | 1079 |
| opper_sql_sample_03 | 100 | 28s | 941 |
| opper_sql_sample_04 | 100 | 18s | 1185 |
| opper_sql_sample_05 | 100 | 13s | 1226 |
| opper_sql_sample_06 | 75 | 18s | 1449 |
| opper_sql_sample_07 | 100 | 1m 17s | 991 |
| opper_sql_sample_08 | 100 | 28s | 1256 |
| opper_sql_sample_09 | 100 | 18s | 954 |
| opper_sql_sample_10 | 100 | 18s | 1116 |
| opper_sql_sample_11 | 100 | 18s | 1187 |
| opper_sql_sample_12 | 100 | 28s | 1172 |
| opper_sql_sample_13 | 100 | 28s | 1588 |
| opper_sql_sample_14 | 100 | 28s | 1328 |
| opper_sql_sample_15 | 100 | 23s | 1587 |
| opper_sql_sample_16 | 100 | 28s | 1983 |
| opper_sql_sample_17 | 100 | 23s | 1825 |
| opper_sql_sample_18 | 100 | 23s | 2864 |
| opper_sql_sample_19 | 100 | 28s | 2243 |
| opper_sql_sample_20 | 100 | 35s | 2294 |
| opper_sql_sample_21 | 100 | 23s | 1667 |
| opper_sql_sample_22 | 75 | 28s | 1936 |
| opper_sql_sample_23 | 100 | 23s | 2221 |
| opper_sql_sample_24 | 75 | 23s | 2395 |
| opper_sql_sample_25 | 75 | 39s | 3479 |
| opper_sql_sample_26 | 100 | 28s | 2407 |
| opper_sql_sample_27 | 100 | 23s | 2465 |
| opper_sql_sample_28 | 75 | 23s | 1801 |
| opper_sql_sample_29 | 100 | 49s | 3503 |
| opper_sql_sample_30 | 75 | 31s | 2245 |
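
The headline figures appear consistent with simple means over the 30 per-sample results listed above. The sketch below recomputes them from those values, assuming each sample contributes a score, a duration, and a token count in that order, and that "1m 17s" means 77 seconds. The `ROWS` list and the variable names are illustrative and are not part of the benchmark tooling.

```python
# (score, duration in seconds, tokens) for opper_sql_sample_01 .. _30,
# transcribed from the per-sample results; "1m 17s" is taken as 77 seconds.
ROWS = [
    (100, 28, 998), (100, 13, 1079), (100, 28, 941), (100, 18, 1185),
    (100, 13, 1226), (75, 18, 1449), (100, 77, 991), (100, 28, 1256),
    (100, 18, 954), (100, 18, 1116), (100, 18, 1187), (100, 28, 1172),
    (100, 28, 1588), (100, 28, 1328), (100, 23, 1587), (100, 28, 1983),
    (100, 23, 1825), (100, 23, 2864), (100, 28, 2243), (100, 35, 2294),
    (100, 23, 1667), (75, 28, 1936), (100, 23, 2221), (75, 23, 2395),
    (75, 39, 3479), (100, 28, 2407), (100, 23, 2465), (75, 23, 1801),
    (100, 49, 3503), (75, 31, 2245),
]

n = len(ROWS)
avg_score = sum(score for score, _, _ in ROWS) / n
avg_duration = sum(seconds for _, seconds, _ in ROWS) / n
avg_tokens = sum(tokens for _, _, tokens in ROWS) / n

if __name__ == "__main__":
    print(f"samples: {n}")
    print(f"average score: {avg_score:.1f}")
    print(f"average duration: {avg_duration:.1f}s")
    print(f"average tokens: {avg_tokens:.1f}")
```

Under these assumptions the means come to 95 for the score, about 27 seconds for duration, and 1779.5 tokens, which matches the reported 1780 after rounding.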