What Will Replace SQL?

Nothing. Ok, I should probably write a bit more than that.

Nothing replaces SQL outright. Instead, three complementary layers are emerging around SQL. It will likely be with us forever, but it’s interesting to understand what alternatives exist and why.

  1. Domain-specific query languages (Malloy, PRQL, SQL++) that compile to or extend SQL;
  2. Polyglot graph standards (ISO GQL) and SQL/PGQ that add graph semantics;
  3. Interchange/semantic layers (Substrait, dbt Semantic Layer/MetricFlow, DataFrame APIs) that standardize plans and metrics across engines.
    Together, these make SQL less of a single universal language and more of a stable execution substrate under higher-level contracts, open table formats, and semantic models. (Peter Eisentraut, gqlstandards.org, Wikipédia, substrait.io, dbt Labs)

SQL Isn’t Dead – It’s Evolving

  • SQL:2023 (ISO/IEC 9075:2023) introduced Part 16 – Property Graph Queries (SQL/PGQ), plus major JSON enhancements. This lets RDBMSs express graph patterns and JSON natively, reducing the incentive to flee SQL for these use cases. (Wikipédia, blogs.oracle.com)
  • Vendors (PostgreSQL, Exasol, etc.) are tracking SQL:2023 features, underscoring ongoing investment rather than retreat. (PostgreSQL, docs.exasol.com)

Implication: SQL is expanding to cover more modalities (graphs, JSON). The “replacement” narrative misunderstands the standard’s trajectory. (Wikipédia)

Domain-Specific Languages That Compile to SQL

Malloy (semantic modeling + query)

Malloy is an open-source language that describes data relationships and metrics once, then generates efficient SQL for multiple backends (BigQuery, Snowflake, Postgres, Trino, DuckDB). It aims to reduce verbose, error-prone SQL by encoding business semantics and reusable patterns. (GitHub, Malloy Data). Here is a simple example using the Malloy Python library.

%%malloy_model flights

source: airports is duckdb.table('../data/airports.parquet') extend {
  primary_key: code
  measure: airport_count is count()
}

source: flights is duckdb.table('../data/flights.parquet') extend {
  rename:
    origin_code is origin
    destination_code is destination

  join_one: origin is airports with origin_code
  join_one: destination is airports with destination_code

  measure:
    flight_count is count()
    destination_count is destination.count()
    origin_count is origin.count()
}

PRQL (Pipelined Relational Transformations)

PRQL is a modern, composable, pipeline-style relational language that compiles to SQL, adding variables, functions, and clearer dataflow syntax. It targets any SQL database, and has ecosystem bindings (e.g., R). (GitHub, PRQL, CRAN)

Why it matters: PRQL improves authoring ergonomics while preserving SQL compatibility — a likely pattern for many teams that want nicer syntax, not a new engine. (GitHub)

Here is a simple SQL example:

SELECT name, salary, hire_date
FROM employees
WHERE department = 'Engineering'
  AND salary > 80000
ORDER BY salary
LIMIT 10;

And the same code written in PRQL

from employees
filter department == "Engineering"
filter salary > 80000
select {name, salary, hire_date}
sort salary
take 10

SQL++ / N1QL (Semi-Structured JSON first)

SQL++ generalizes SQL for nested JSON, with Couchbase’s N1QL a commercial implementation; it brings path navigation, arrays, and heterogeneity into a SQL-like grammar. Academic lineage comes via AsterixDB. (docs.couchbase.com, asterixdb.apache.org, Couchbase, odbms.org)

Bottom line: These languages don’t kill SQL; they augment or transpile to it so you keep portability and optimizer maturity.

There is a simple SQL example:

SELECT p.name, p.age, a.city, COUNT(h.hobby_id) as hobby_count
FROM people p
JOIN addresses a ON p.id = a.person_id
JOIN person_hobbies ph ON p.id = ph.person_id
JOIN hobbies h ON ph.hobby_id = h.id
WHERE p.age > 25
  AND a.country = 'USA'
  AND h.name = 'coding'
GROUP BY p.id, p.name, p.age, a.city
ORDER BY p.age DESC
LIMIT 10;

And the same code written in SQL++:

SELECT p.name, p.age, p.address.city, ARRAY_LENGTH(p.hobbies) as hobby_count
FROM people p
WHERE p.age > 25 
  AND p.address.country = "USA"
  AND ANY hobby IN p.hobbies SATISFIES hobby = "coding" END
ORDER BY p.age DESC
LIMIT 10;

Remember that SQL++ is designed for NoSQL databases with nested JSON as shown below. Meaning that the joins necessary in SQL do not apply:

{
  "name": "Alice",
  "age": 30,
  "address": {
    "city": "San Francisco",
    "country": "USA"
  },
  "hobbies": ["coding", "hiking", "photography"]
}

Graph Workloads: ISO GQL and SQL/PGQ Converge

  • GQL (ISO/IEC 39075:2024) is the first ISO database language standard beyond SQL, targeting property graphs. It standardizes creation, querying, and maintenance of graphs across native graph DBs. (iso.org, gqlstandards.org, jtc1info.org)
  • SQL/PGQ (part of SQL:2023) lets you write graph pattern queries inside SQL over tabular data. Think: RDBMS graph views without leaving SQL. (Peter Eisentraut)

Take: rather than a replacement, we now have two interoperable standards (SQL/PGQ for relational, GQL for native graph). Expect dual-stack systems and cross-compilers over time. (Wikipédia)

Here is a simple SQL example:

SELECT p.name, p.age, c.name AS company, COUNT(ps.skill_id) AS skill_count
FROM people p
JOIN employment e ON p.id = e.person_id
JOIN companies c ON e.company_id = c.id
JOIN person_skills ps ON p.id = ps.person_id
JOIN skills s ON ps.skill_id = s.id
WHERE p.age > 25
  AND c.country = 'USA'
  AND EXISTS (
    SELECT 1 FROM person_skills ps2
    JOIN skills s2 ON ps2.skill_id = s2.id
    WHERE ps2.person_id = p.id AND s2.name = 'coding'
  )
GROUP BY p.id, p.name, p.age, c.name
ORDER BY p.age DESC
LIMIT 10;

And the same code wrotten in ISO GQL:

MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WHERE p.age > 25 
  AND c.country = "USA"
  AND (p)-[:HAS_SKILL]->(:Skill {name: "coding"})
RETURN p.name, p.age, c.name AS company, 
       COUNT((p)-[:HAS_SKILL]->()) AS skill_count
ORDER BY p.age DESC
LIMIT 10

Cross-Engine Interop via Substrait

Substrait defines a cross-language, unambiguous representation for relational compute (logical/physical plans). Systems can exchange plans independent of a specific query language string, enabling consistent optimization and portability across engines. (substrait.io, Medium)

Why it matters: If the plan is portable, “universal language” pressure weakens. You can author in SQL, PRQL, Malloy, or a UI and still ship a common plan to Spark, DuckDB, DataFusion, etc. (substrait.io)

The Rise of Semantic Layers

dbt Semantic Layer (MetricFlow) formalizes business metrics (definitions, grains, time bindings) and compiles queries dynamically for the underlying engine. This makes metrics the interface rather than raw SQL – a different kind of “universal language” that business tools can call consistently. (docs.getdbt.com, dbt Labs, GitHub)

Why it matters: When consumers ask for revenue_7d instead of writing joins, the “universal language” becomes a metrics API that happens to emit SQL under the hood. (dbt Labs)

semantic_models:
  - name: order_item
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Items contained in each order. The grain of the table is one row per order item.
    model: ref('order_items')

For more information on dbt (Data Built Tool), please see the blog post below.


DataFrames as the Developer

Python/R DataFrames (Ibis, Pandas) are effectively a lingua franca for developers. Engines like DuckDB run SQL directly on DataFrames or provide relational APIs, while Ibis offers a portable DataFrame API that targets multiple SQL engines. (DuckDB, GitHub)

Net effect: The surface language may be Pythonic, but execution often compiles to SQL or a relational plan — reinforcing SQL as the substrate. (DuckDB)

So What Could Replace SQL?

  • Nothing single. The industry is standardizing on a stack:
    • Authoring layer: Malloy / PRQL / SQL++ / DataFrames / (NL→SQL in tools)
    • Semantic layer: metrics & governance (dbt SL/MetricFlow, catalogs)
    • Plan layer: Substrait or engine-native IRs
    • Execution layer: SQL engines (Snowflake, BigQuery, DuckDB, Postgres, Trino), plus graph via SQL/PGQ and native GQL.
  • You get human-friendly authoring, consistent business semantics, portable plans, and battle-tested optimizers, without a risky big-bang language migration.

Decision Guide (2025)

  • Heavy BI / governed metrics? Put a semantic layer in front of your warehouse; keep SQL under the hood. (dbt SL/MetricFlow) (dbt Labs)
  • Analytical ergonomics with reuse? Try Malloy for semantic modeling + cross-engine SQL generation. (GitHub)
  • Cleaner transformation syntax? PRQL if you want pipelines that compile to native SQL. (GitHub)
  • JSON-first apps? SQL++/N1QL for nested JSON workloads. (docs.couchbase.com)
  • Graph-centric problems? Use SQL/PGQ inside your RDBMS or go native with ISO GQL depending on scale & features. (Peter Eisentraut, iso.org)
  • Portability across engines? Invest in Substrait so tools can exchange query plans cleanly. (substrait.io)
  • Developer experience? DataFrames (Ibis) for code-first teams; route execution to your chosen engine. (GitHub)

Summary

Discover more from Data Lingua. Where Data Engineering Meets Agentic Business Strategy

Subscribe now to keep reading and get access to the full archive.

Continue reading