SQL

Natural language to SQL query generation evaluates text-to-query fidelity and schema reasoning. This task is particularly relevant for analytics chat assistants and simplified database interfaces where users need to query data using natural language. Models must understand both the intent behind the question and the structure of the underlying database schema.
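
One common way to check text-to-query fidelity is to run the generated query and a reference query against the same database and compare the rows they return. The sketch below illustrates that idea with Python's built-in `sqlite3` module. The schema, the data, the queries, and the helper names `run_query` and `same_result` are all hypothetical, and nothing here is meant to describe how this benchmark actually computes its scores.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada', 'SE'), (2, 'Bo', 'NO'), (3, 'Cy', 'SE');
INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0), (4, 3, 10.0);
"""


def run_query(sql):
    """Run one query against a fresh in-memory database and return sorted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        # Sorting makes the comparison independent of row order.
        return sorted(conn.execute(sql).fetchall())
    finally:
        conn.close()


def same_result(candidate_sql, reference_sql):
    """Return True when both queries run and yield the same set of rows."""
    try:
        return run_query(candidate_sql) == run_query(reference_sql)
    except sqlite3.Error:
        # A query that fails to parse or execute counts as a mismatch.
        return False


# Hypothetical question: "What is the total order value per country?"
REFERENCE = (
    "SELECT c.country, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id GROUP BY c.country"
)
CANDIDATE = (
    "SELECT country, SUM(total) FROM orders "
    "JOIN customers ON customers.id = orders.customer_id GROUP BY country"
)
WRONG = "SELECT country, COUNT(*) FROM customers GROUP BY country"

print(same_result(CANDIDATE, REFERENCE))  # True
print(same_result(WRONG, REFERENCE))      # False
```

Comparing result sets rather than query text lets two differently written but equivalent queries both pass, which matters because the same question can usually be expressed in SQL in several ways. A binary match like this would not by itself explain partial scores such as 75 or 50.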

Model: anthropic/claude-opus-4
Score: 94
Average duration: 17s
Average tokens: 1107
Average cost: $0.00

Sample               Score  Duration  Tokens
opper_sql_sample_01    100       10s     734
opper_sql_sample_02    100       15s     747
opper_sql_sample_03    100       20s     792
opper_sql_sample_04    100       12s     783
opper_sql_sample_05    100       20s     837
opper_sql_sample_06    100       17s     867
opper_sql_sample_07    100        9s     749
opper_sql_sample_08     75       15s     749
opper_sql_sample_09    100       10s     765
opper_sql_sample_10    100       19s     823
opper_sql_sample_11    100       15s    1053
opper_sql_sample_12    100       15s    1067
opper_sql_sample_13    100       15s    1087
opper_sql_sample_14    100       17s    1144
opper_sql_sample_15     75       19s    1142
opper_sql_sample_16    100       19s    1209
opper_sql_sample_17    100       16s    1149
opper_sql_sample_18     75       27s    1273
opper_sql_sample_19     50       15s    1122
opper_sql_sample_20    100       22s    1243
opper_sql_sample_21    100       11s    1290
opper_sql_sample_22    100       22s    1371
opper_sql_sample_23    100       22s    1420
opper_sql_sample_24     75       22s    1383
opper_sql_sample_25     75       16s    1417
opper_sql_sample_26    100       20s    1415
opper_sql_sample_27    100       19s    1350
opper_sql_sample_28    100       19s    1469
opper_sql_sample_29    100       23s    1458
opper_sql_sample_30    100       18s    1293
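
The headline figures (94, 17s, 1107) appear consistent with plain unweighted means over the 30 samples. The page does not state how they are aggregated, so treat that as an assumption. The short sketch below recomputes the three averages from the per-sample values listed above.

```python
from statistics import mean

# (score, duration in seconds, tokens) for opper_sql_sample_01 through _30,
# copied from the per-sample results.
RESULTS = [
    (100, 10, 734), (100, 15, 747), (100, 20, 792), (100, 12, 783),
    (100, 20, 837), (100, 17, 867), (100, 9, 749), (75, 15, 749),
    (100, 10, 765), (100, 19, 823), (100, 15, 1053), (100, 15, 1067),
    (100, 15, 1087), (100, 17, 1144), (75, 19, 1142), (100, 19, 1209),
    (100, 16, 1149), (75, 27, 1273), (50, 15, 1122), (100, 22, 1243),
    (100, 11, 1290), (100, 22, 1371), (100, 22, 1420), (75, 22, 1383),
    (75, 16, 1417), (100, 20, 1415), (100, 19, 1350), (100, 19, 1469),
    (100, 23, 1458), (100, 18, 1293),
]

# Assumption: every sample carries equal weight in each average.
avg_score = mean(score for score, _, _ in RESULTS)
avg_duration = mean(duration for _, duration, _ in RESULTS)
avg_tokens = mean(tokens for _, _, tokens in RESULTS)

print(round(avg_score), round(avg_duration), round(avg_tokens))  # 94 17 1107
```

Six of the 30 samples score below 100 (five at 75 and one at 50), which accounts for the whole gap between the overall score and 100.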