Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

When storing and retrieving large volumes of ordered data, the goal is to keep search, insert, and delete operations efficient while staying balanced. B‑trees (Balanced Trees) do exactly this: maintain sorted keys, keep all leaves at the same depth, and deliver search, insert, and delete. Rudolf Bayer and Edward M. McCreight coined the term B-tree data structure at the Boeing Research Labs in 1971 in the paper Organization and Maintenance of Large Ordered Indices.
Here start with an example of a balanced binary search tree (BST). A BST is a very basic structure that obeys a small number of rules:

The time complexity for a balanced BST is Θ(log(n)), where n is the number of individul nodes. This degrades to the following example, which is still a valid BST, but here we have the worst case time complexity, giving Θ(n). This is an example of an unbalanced BST.

To prevent this issue we now look at the B-Tree. You can think of B-Tress as self-balancing trees that avoids the issue observed with an unbalanced BST. One initial change to the BST is that nodes are now allowed to have more than two children; this condition alone controls the height of the resultant tree. In data storage terms, this means less physical blocks of data in which to store the information. Accessing a large number of blocks of data results in condierable latency overhead.
Formally, the definition of a B-Tree is as follows:
- Every node has at most m children.
- Every node, except for the root and the leaves, has at least ⌈m/2⌉ children.
- The root node has at least two children unless it is a leaf.
- All leaves appear on the same level.
- A non-leaf node with k children contains k−1 keys.
Each of the internal nodes separates values that divide its subtrees. In this example, if an internal node has 3 child nodes (or subtrees), then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2. (this example has a specific order – namely the branching factor, the maximum number of children a node may have of 3).

Θ(log(n))Θ(log(n))Θ(log(n))Θ(n)Θ(n)B‑trees deliver the trifecta of fast point lookups, efficient range scans, and stable update performance at scale. The combination of high fan‑out, balanced height, and sorted, linked leaves is why they remain the default general‑purpose index structure decades on.
You must be logged in to post a comment.