Databases and Set Theory

I always find joy in understanding concepts at a deep and fundamental level. This stems from a love of mathematics and the mental journeys you can embark on. Being able to understand set theory, this fundamental branch of mathematics, is one of those journeys, providing the theoretical underpinnings of relational databases. This post explores how concepts from set theory, such as sets, unions, intersections, and differences are manifested in SQL operations and the relational model.

By thinking about tables as sets of tuples and SQL commands as set operations, I can illustrate the deep links between these fields. Through examples and analysis, it’s possible to demonstrate how understanding set theory enhances SQL proficiency. This is a high-level summary of the key concepts only and not a deep exposé of George Cantors career.

Basic Sets and Tables

In set theory, a set is a collection of distinct elements. In SQL, a table is a collection of distinct rows (when considering set operations).

Set Theory:

A = \{1, 2, 3, 4, 5\}
B = \{4, 5, 6, 7, 8\}

SQL:

CREATE TABLE set_a (value INTEGER);
INSERT INTO set_a VALUES (1), (2), (3), (4), (5);

CREATE TABLE set_b (value INTEGER);
INSERT INTO set_b VALUES (4), (5), (6), (7), (8);

Union: Combining Sets

Set Theory:

The union of two sets A and B, denoted:

A \cup B

contains all elements that are in A, in B, or in both.

A \cup B = \{1, 2, 3, 4, 5, 6, 7, 8\}

Note: Pure set theory doesn’t have duplicates, but SQL’s UNION ALL operates on multisets (bags) where duplicates are allowed.

Intersection: Common Elements

Set Theory:

The intersection of sets A and B, denoted:

A \cap B

contains only elements present in both sets.

A \cap B = \{4, 5\}

SQL Intersect:

SELECT DISTINCT a.value
FROM set_a a
INNER JOIN set_b b ON a.value = b.value;
-- Result: 4, 5

Difference: Set Subtraction

Set Theory:

The difference (or relative complement):

A - B

contains elements in A but not in B.

A - B = \{1, 2, 3\}
B - A = \{6, 7, 8\}

SQL Difference:

-- left join
SELECT a.value
FROM set_a a
LEFT JOIN set_b b ON a.value = b.value
WHERE b.value IS NULL;
-- Result: 1, 2, 3

--not in
SELECT value FROM set_b
WHERE value NOT IN (SELECT value FROM set_a);
-- Result: 6, 7, 8

Cartesian Product

Set Theory:

The Cartesian product:

A \times B

creates ordered pairs with all possible combinations.

A \times B = \{(a, b) \mid a \in A, b \in B\}

SQL Cross Join:

-- Sample tables
CREATE TABLE colors (color VARCHAR(10));
INSERT INTO colors VALUES ('Red'), ('Blue');

CREATE TABLE sizes (size VARCHAR(10));
INSERT INTO sizes VALUES ('Small'), ('Large');

-- Cartesian product
SELECT c.color, s.size
FROM colors c
CROSS JOIN sizes s;
-- Result:
-- Red, Small
-- Red, Large
-- Blue, Small
-- Blue, Large

Subset and Superset

Set Theory:

A \subseteq B

means A is a subset of B (every element of A is in B).

SQL Check:

-- Check if all values in C exist in A
SELECT CASE 
    WHEN COUNT(*) = 0 THEN 'C is a subset of A'
    ELSE 'C is NOT a subset of A'
END as result
FROM (
    SELECT value FROM set_c
    EXCEPT
    SELECT value FROM set_a
) not_in_a;

-- use not exists
SELECT CASE
    WHEN NOT EXISTS (
        SELECT 1 FROM set_c c
        WHERE NOT EXISTS (
            SELECT 1 FROM set_a a WHERE a.value = c.value
        )
    ) THEN 'C is a subset of A'
    ELSE 'C is NOT a subset of A'
END as result;

Power Set

Set Theory:

The power set:

\mathcal{P}(A)

is the set of all subsets of A.

\mathcal{P}(A) = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}

SQL Power Set:

WITH RECURSIVE power_set AS (
    -- Start with empty set
    SELECT ARRAY[]::INTEGER[] as subset, 0 as idx
    
    UNION ALL
    
    -- Add each element
    SELECT 
        ps.subset || a.value,
        ps.idx + 1
    FROM power_set ps
    CROSS JOIN (SELECT value, ROW_NUMBER() OVER (ORDER BY value) as rn FROM set_a) a
    WHERE a.rn > ps.idx
)
SELECT subset FROM power_set;
-- Result: all possible subsets

Complement

Set Theory:

The complement of A with respect to universal set U, denoted:

A^c
\bar{A}

contains all elements in U but not in A.$

U = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}
A = \{1, 2, 3, 4, 5\}
A^c = \{6, 7, 8, 9, 10\}

SQL Compliment:

CREATE TABLE universal_set (value INTEGER);
INSERT INTO universal_set VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

-- Complement of A
SELECT value FROM universal_set
EXCEPT
SELECT value FROM set_a;
-- Result: 6, 7, 8, 9, 10

Symmetric Difference

Set Theory:

The symmetric difference:

A \triangle B

contains elements in either A or B but not in both.

A \triangle B = (A - B) \cup (B - A) = (A \cup B) - (A \cap B)

SQL Exammple:

-- Method 1: Union of differences
SELECT value FROM set_a
EXCEPT
SELECT value FROM set_b
UNION
SELECT value FROM set_b
EXCEPT
SELECT value FROM set_a;
-- Result: 1, 2, 3, 6, 7, 8

-- Method 2: Union minus intersection
SELECT value FROM (
    SELECT value FROM set_a
    UNION
    SELECT value FROM set_b
) combined
EXCEPT
SELECT value FROM (
    SELECT value FROM set_a
    INTERSECT
    SELECT value FROM set_b
) common;
-- Result: 1, 2, 3, 6, 7, 8

Cardinality

Set Theory:

The cardinality:

|A|

is the number of elements in set A.

|A| = 5
|B| = 5
|A \cap B| = 1
|A \cup B| = 8

SQL Example:

-- Cardinality of A
SELECT COUNT(*) as cardinality FROM set_a;
-- Result: 5

-- Cardinality of intersection
SELECT COUNT(*) as cardinality FROM (
    SELECT value FROM set_a
    JOIN
    SELECT value FROM set_b
) intersection;
-- Result: 1

-- Cardinality of union
SELECT COUNT(*) as cardinality FROM (
    SELECT value FROM set_a
    UNION
    SELECT value FROM set_b
) union_set;
-- Result: 8

Empty Set

Set Theory:

The empty set :

\emptyset

contains no elements.

Set Theory:

-- Query that returns empty set
SELECT value FROM set_a WHERE value > 100;
-- Result: (empty)

-- Check for empty set
SELECT CASE
    WHEN COUNT(*) = 0 THEN 'Empty set'
    ELSE 'Non-empty set'
END as result
FROM (SELECT value FROM set_a WHERE value > 100) empty_check;
-- Result: 'Empty set'

De Morgan’s Laws

De Morgan’s Laws describe how negation distributes over set operations:

First Law: The complement of a union equals the intersection of complements

  • “NOT (A OR B)” is the same as “(NOT A) AND (NOT B)”

Second Law: The complement of an intersection equals the union of complements

  • “NOT (A AND B)” is the same as “(NOT A) OR (NOT B)”

In simple terms: when you negate a combined set operation, you flip both the operation (union ↔ intersection) and negate each individual set.

Set Theory:

De Morgan’s Laws state:

\overline{A \cup B} = \bar{A} \cap \bar{B}

SQL Example:

-- NOT (A OR B) = (NOT A) AND (NOT B)
-- Find values not in A and not in B
SELECT value FROM universal_set
WHERE value NOT IN (SELECT value FROM set_a)
  AND value NOT IN (SELECT value FROM set_b);

-- Equivalent to:
SELECT value FROM universal_set
EXCEPT
(
    SELECT value FROM set_a
    UNION
    SELECT value FROM set_b
);

Key Differences: Set Theory vs SQL

Duplicates

Set Theory: Sets have no duplicates by definition.

SQL: Tables can have duplicate rows (multisets/bags).

-- SQL allows duplicates in tables
INSERT INTO set_a VALUES (5), (5), (5);
-- Now set_a has multiple 5s

-- UNION removes duplicates (set semantics)
SELECT value FROM set_a UNION SELECT value FROM set_b;

-- UNION ALL keeps duplicates (bag semantics)
SELECT value FROM set_a UNION ALL SELECT value FROM set_b;

NULL Values

Set Theory: No concept of NULL or unknown.

SQL: NULL represents unknown/missing values with special logic.

-- NULL in set operations
SELECT NULL as value
UNION
SELECT NULL as value;
-- Result: One NULL (duplicates removed)

-- NULL in comparisons
SELECT * FROM set_a WHERE value = NULL;    -- Returns nothing
SELECT * FROM set_a WHERE value IS NULL;   -- Correct way

Ordering

Set Theory: Sets are unordered.

SQL: Results can be ordered with ORDER BY (returns a list, not a set).

-- Set theory: {3, 1, 2} = {1, 2, 3}
-- SQL without ORDER BY: order is undefined
SELECT value FROM set_a;  -- Order not guaranteed

-- SQL with ORDER BY: returns ordered list
SELECT value FROM set_a ORDER BY value;  -- Guaranteed order

Understanding the set theory foundations of SQL helps you:

  1. Reason about queries mathematically – Know what operations will produce
  2. Optimize queries – Choose the right set operation for your needs
  3. Debug complex queries – Break them down into set operations
  4. Design better schemas – Think in terms of sets and relationships

Remember: SQL is applied set theory. Every table is a set (or multiset), every query is a combination of set operations, and every JOIN is a form of Cartesian product with filtering. Master set theory, and you’ll master SQL.

Discover more from Where Data Engineering Meets Business Strategy

Subscribe now to keep reading and get access to the full archive.

Continue reading