SQL

The natural-language-to-SQL task evaluates text-to-query fidelity and schema reasoning: given a question in plain language and a database schema, the model must produce a SQL query that answers it. The task is particularly relevant for analytics chat assistants and simplified database interfaces where users query data in natural language. Models must understand both the intent behind the question and the structure of the underlying database schema.
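To make the "intent plus schema" requirement concrete, here is a minimal sketch of one question paired with a candidate query and run against an in-memory SQLite database. The schema, question, data, and query are all hypothetical and written for illustration; they are not taken from the benchmark's samples, and the benchmark's actual scoring method is not described here.

```python
import sqlite3

# Hypothetical schema, question, and query for illustration only;
# none of these come from the benchmark's samples.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL
);
"""

QUESTION = "Which customers have spent more than 100 in total?"

# Answering the question requires knowing that spend lives in orders.total
# and that orders link to customers through orders.customer_id.
CANDIDATE_SQL = """
SELECT c.name, SUM(o.total) AS spent
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING SUM(o.total) > 100
ORDER BY spent DESC;
"""


def run_candidate(sql):
    """Execute a candidate query against a small in-memory fixture."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany(
            "INSERT INTO customers (id, name) VALUES (?, ?)",
            [(1, "Ada"), (2, "Grace"), (3, "Linus")],
        )
        conn.executemany(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            [(1, 80.0), (1, 40.0), (2, 150.0), (3, 20.0)],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


rows = run_candidate(CANDIDATE_SQL)
print(QUESTION)
print(rows)
```

A query that summed `total` without the join, or filtered with `WHERE o.total > 100` instead of `HAVING`, would run without error but answer a different question, which is the kind of mistake this task is meant to surface.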

Model: xai/grok-4
Score: 96
Average duration: 29s
Average tokens: 868
Average cost: $0.00
| Sample | Score | Duration | Tokens |
|---|---|---|---|
| opper_sql_sample_01 | 100 | 14s | 598 |
| opper_sql_sample_02 | 100 | 18s | 614 |
| opper_sql_sample_03 | 100 | 18s | 622 |
| opper_sql_sample_04 | 100 | 18s | 611 |
| opper_sql_sample_05 | 100 | 17s | 654 |
| opper_sql_sample_06 | 100 | 22s | 690 |
| opper_sql_sample_07 | 100 | 18s | 614 |
| opper_sql_sample_08 | 100 | 18s | 607 |
| opper_sql_sample_09 | 100 | 18s | 605 |
| opper_sql_sample_10 | 100 | 17s | 661 |
| opper_sql_sample_11 | 100 | 18s | 818 |
| opper_sql_sample_12 | 100 | 33s | 838 |
| opper_sql_sample_13 | 100 | 22s | 862 |
| opper_sql_sample_14 | 100 | 17s | 872 |
| opper_sql_sample_15 | 75 | 35s | 884 |
| opper_sql_sample_16 | 100 | 46s | 912 |
| opper_sql_sample_17 | 100 | 16s | 888 |
| opper_sql_sample_18 | 100 | 33s | 1004 |
| opper_sql_sample_19 | 50 | 1m 30s | 884 |
| opper_sql_sample_20 | 100 | 25s | 1024 |
| opper_sql_sample_21 | 100 | 12s | 994 |
| opper_sql_sample_22 | 100 | 32s | 1050 |
| opper_sql_sample_23 | 100 | 28s | 1113 |
| opper_sql_sample_24 | 75 | 43s | 1056 |
| opper_sql_sample_25 | 100 | 1m 22s | 1157 |
| opper_sql_sample_26 | 100 | 29s | 1090 |
| opper_sql_sample_27 | 100 | 35s | 1033 |
| opper_sql_sample_28 | 75 | 19s | 1148 |
| opper_sql_sample_29 | 100 | 1m 11s | 1133 |
| opper_sql_sample_30 | 100 | 15s | 1003 |
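
The headline figures appear to be plain means over the 30 samples. The sketch below recomputes them from the per-sample values listed above, read as score, duration, and tokens. That the page rounds to the nearest integer is an assumption; the variable and function names are illustrative.

```python
# (score, duration in seconds, tokens) for samples 01..30,
# transcribed from the per-sample listing above.
SAMPLES = [
    (100, 14, 598), (100, 18, 614), (100, 18, 622), (100, 18, 611),
    (100, 17, 654), (100, 22, 690), (100, 18, 614), (100, 18, 607),
    (100, 18, 605), (100, 17, 661), (100, 18, 818), (100, 33, 838),
    (100, 22, 862), (100, 17, 872), (75, 35, 884), (100, 46, 912),
    (100, 16, 888), (100, 33, 1004), (50, 90, 884), (100, 25, 1024),
    (100, 12, 994), (100, 32, 1050), (100, 28, 1113), (75, 43, 1056),
    (100, 82, 1157), (100, 29, 1090), (100, 35, 1033), (75, 19, 1148),
    (100, 71, 1133), (100, 15, 1003),
]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


avg_score = mean([score for score, _, _ in SAMPLES])
avg_duration = mean([seconds for _, seconds, _ in SAMPLES])
avg_tokens = mean([tokens for _, _, tokens in SAMPLES])

# Assumption: the summary rounds each mean to the nearest integer.
print(round(avg_score), round(avg_duration), round(avg_tokens))
# prints: 96 29 868
```

The rounded means match the reported score of 96, average duration of 29s, and average of 868 tokens. The four samples scoring below 100 (15, 19, 24, and 28) account for the entire gap from a perfect score.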