Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

If you’ve ever queried a database with SQL (Structured Query Language) then you’ve been standing on the mathematical shoulders of Georg Cantor. His 19th-century work on set theory laid the conceptual foundation that later revolutionised how we store and retrieve data. This is the unlikely but fascinating link between infinite sets and your company’s customer records.
SQL is the standard for managing and querying relational databases, which were formalized by Edgar F. Codd in 1970 based on relational algebra. The relational model treats data as relations (via tables in the physical sense or entities in the logical) that can be manipulated using algebraic operations, many of which directly correspond to set theory constructs.
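To make that correspondence concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `newsletter` and `buyers` tables and their email addresses are invented for illustration. It runs SQL's UNION, INTERSECT, and EXCEPT operators and compares each result with the equivalent Python set operation.

```python
import sqlite3

# Hypothetical tables, made up purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newsletter (email TEXT);
    CREATE TABLE buyers (email TEXT);
    INSERT INTO newsletter VALUES ('ann@example.com'), ('bob@example.com');
    INSERT INTO buyers VALUES ('bob@example.com'), ('cho@example.com');
""")


def emails(operator):
    """Combine the two tables with a SQL set operator and return a Python set."""
    sql = f"SELECT email FROM newsletter {operator} SELECT email FROM buyers"
    return {row[0] for row in conn.execute(sql)}


union = emails("UNION")              # A ∪ B
intersection = emails("INTERSECT")   # A ∩ B
difference = emails("EXCEPT")        # A \ B

# The same sets, built directly with Python's set type.
a = {"ann@example.com", "bob@example.com"}
b = {"bob@example.com", "cho@example.com"}

print(union == a | b, intersection == a & b, difference == a - b)
```

If the mapping holds, all three comparisons print `True`. Other database engines spell some of these operators differently, so treat the exact keywords as SQLite's.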

In the late 1800s, Georg Cantor formalised the mathematical discipline of set theory. Set theory deals with collections of distinct objects called sets, and operations on them such as union, intersection, and difference. It forms the basis of much of modern mathematics and has practical applications in computer science, particularly in database systems.
What Cantor discovered was that infinite sets come in different sizes, depending on the kind of elements they contain. The infinite set of all natural numbers {1, 2, 3, …} is smaller than the infinite set of all real numbers.
He was initially ridiculed for his work and suffered a number of mental health struggles as a result. In his own words:
“… I don’t know when I shall return to the continuation of my scientific work. At the moment I can do absolutely nothing with it, and limit myself to the most necessary duty of my lectures; how much happier I would be to be scientifically active, if only I had the necessary mental freshness.”
Fast forward a century to 1970, when Edgar F. Codd, working at IBM, published A Relational Model of Data for Large Shared Data Banks.
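Codd’s innovation was to recognise that data could be modelled as relations, in essence sets of tuples, and manipulated with operations borrowed from set theory.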
This ability to map the inner workings of relational databases to the foundations laid by Cantor was transformational. Cantor gave us the abstract grammar of relationships. Although databases store finite data (due to the fundamental laws of physics), the relational model is defined over potentially infinite sets. Infinity is something we are simply unable to grasp, but here its use was deliberate, and it’s where Cantor’s influence is most visible.
Each attribute in a table comes from a particular domain, that being the set of all allowable values. For example, DATE is the set of all possible dates, theoretically infinite in both directions.
Even though the stored table contains a finite subset, the definition assumes the infinite set exists. Cantor proved that there are different cardinalities of infinity: the set of integers is infinite but countable, while the set of real numbers is uncountable. A domain in Codd’s model (all possible dates, all possible strings) is conceptually infinite; you can never finish writing out a complete list of every valid date, string, or integer. It’s subtle but important, and usually involves sitting in a quiet room thinking about its implications.
If domains were finite and tiny, you could just hard-code all possible values and relationships. It would take a while, encoding every possible name, but you could do it. Certain domains would actually be workable using this approach. Sets that contain currencies or the names of countries would work with finite bounds, but generally speaking most sets would not. The relational model’s power and universality comes from treating each domain as if it were drawn from a potentially infinite mathematical set. SQL works whether your table has 2 rows or 200 trillion, because operations are defined on sets, not on fixed lists. Every attribute has a domain that’s theoretically unbounded.
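The difference between a small, hard-codable domain and an unbounded one can be sketched with a CHECK constraint. This is only an illustration: the `payments` table, its columns, and the three-currency list are assumptions of mine, and SQLite stands in for whichever database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: `currency` has a finite, hard-coded domain,
# while `payer` is drawn from the unbounded domain of all strings.
conn.execute("""
    CREATE TABLE payments (
        payer    TEXT NOT NULL,
        currency TEXT NOT NULL CHECK (currency IN ('GBP', 'USD', 'EUR'))
    )
""")


def try_insert(payer, currency):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (payer, currency))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_known = try_insert("Ada Lovelace", "GBP")
# Nobody listed this payer name in advance, yet it is still a valid string.
accepted_new_name = try_insert("A name nobody enumerated beforehand", "USD")
# 'XXX' is outside the finite currency domain defined above.
accepted_unknown_currency = try_insert("Georg Cantor", "XXX")

print(accepted_known, accepted_new_name, accepted_unknown_currency)
```

The currency column only works because someone could write the whole domain down. The payer column works precisely because nobody had to.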
The Cartesian product of set theory applies to infinite sets as well as finite ones. When you run a CROSS JOIN in SQL, you’re materialising a finite part of a Cartesian product that, taken over the full domains, would be infinite.
-- orders: 1000 rows
-- customers: 1000 rows
-- orders × customers = 1,000,000 rows (combinatorial explosion!)
SELECT * FROM orders o CROSS JOIN customers c;
Cantor’s idea of one-to-one correspondences, or bijections, is baked into primary keys. The domain of possible keys is infinite, but within a table a primary key defines a one-to-one relationship between the key values actually in use and the stored rows (both finite). A bijection is a one-to-one correspondence between two sets, pairing every element of one with exactly one element of the other. If you can create one, the sets are the same size, even if they are infinite. Cantor used bijections to prove counterintuitive facts: the natural numbers, integers, and fractions are all the same size, but the real numbers are a larger infinity.
Natural Numbers:  1 → 2 → 3 → 4 → 5  → ...
                  ↓   ↓   ↓   ↓   ↓
Even Numbers:     2 → 4 → 6 → 8 → 10 → ...
The guarantee of uniqueness is a finite manifestation of Cantor’s bijection principle. You still with me?
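Both halves of that idea can be checked in a few lines. The sketch below is an illustration under my own assumptions: the `customers` table and its three rows are invented, and the pairing n ↔ 2n is only tested on a finite prefix, since no program can walk an infinite set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration only.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cho")],
)

# One-to-one: there are exactly as many distinct keys as stored rows.
rows = conn.execute("SELECT id, name FROM customers").fetchall()
keys = {key for key, _name in rows}
keys_match_rows = len(keys) == len(rows)

# The database refuses a second row for a key that is already paired.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Dee')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True


def to_even(n):
    """Pair the natural number n with the even number 2n."""
    return 2 * n


def from_even(m):
    """Invert the pairing: map the even number m back to m / 2."""
    return m // 2


# Check the pairing on the first 1,000 naturals: every n comes back
# unchanged, and no two naturals share the same even partner.
prefix = range(1, 1001)
round_trips = all(from_even(to_even(n)) == n for n in prefix)
no_collisions = len({to_even(n) for n in prefix}) == len(prefix)

print(keys_match_rows, duplicate_rejected, round_trips, no_collisions)
```

The primary key check is the finite case and the n ↔ 2n functions are the infinite one. The code can only ever sample the latter, which is rather the point.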
To most people, none of this will seem to matter. Most developers will not have heard of Georg Cantor and won’t want to explore the inner workings of a paper that is more than a century old (a paper that needs a deep understanding of some fundamental mathematical principles…and the ability to read German). But without Cantor’s abstraction, the relational model would be tied to the physical storage limits of the time, making it obsolete as soon as storage grew. Infinity allows the model to be defined once and applied to any dataset, past or future, without redefinition. SQL constantly battles infinity. Joins create explosive combinations, recursive queries can loop forever, and streaming data is truly infinite. Cantor’s set theory gave us the mathematical foundation to understand and control these infinities. Every LIMIT, JOIN, and INDEX is set theory in action!
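Two of those battles fit in a short sketch, again using SQLite through Python's sqlite3 module. The tables `a` and `b` and the CTE name `naturals` are my own illustrative choices. The first query counts the rows of a CROSS JOIN; the second defines the natural numbers recursively and relies on LIMIT to cut the recursion off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: 3 rows and 4 rows.
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (10), (20), (30), (40);
""")

# A CROSS JOIN materialises the Cartesian product: |A × B| = |A| * |B|.
cross_join_rows = conn.execute(
    "SELECT COUNT(*) FROM a CROSS JOIN b"
).fetchone()[0]

# The recursive definition below has no stopping condition of its own.
# The LIMIT inside the CTE is what ends the recursion after five rows.
first_five = [
    row[0]
    for row in conn.execute("""
        WITH RECURSIVE naturals(n) AS (
            SELECT 1
            UNION ALL
            SELECT n + 1 FROM naturals
            LIMIT 5
        )
        SELECT n FROM naturals
    """)
]

print(cross_join_rows)  # 12
print(first_five)       # [1, 2, 3, 4, 5]
```

Remove the LIMIT and the second query never finishes, which is about as close to meeting infinity as a database session gets.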
Cantor, G. (1895). Beiträge zur Begründung der transfiniten Mengenlehre.
Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13(6), 377–387.