D. J. Bernstein
Data structures and program structures
Crit-bit trees
Interface to crit-bit trees
A crit-bit tree stores a prefix-free set of bit strings:
a set of 64-bit integers,
for example,
or a set of variable-length 0-terminated byte strings.
A crit-bit tree supports the following operations at high speed:
- See whether a string x is in the tree.
- Add x to the tree.
- Remove x from the tree.
- Find the lexicographically smallest string in the tree larger than x,
if there is one.
A crit-bit tree can be used as a high-speed associative array.
For example,
an array mapping 54 to 1, 19 to 2, 99 to 3, 85 to 4, and 88 to 5
can be stored as a crit-bit tree containing 54=1, 19=2, 99=3, 85=4, and 88=5.
The smallest string in the crit-bit tree larger than 85= is 85=4.
Definition of crit-bit trees
A crit-bit tree for a nonempty prefix-free set S of bit strings
has an external node for each string in S;
an internal node for each bit string x
such that x0 and x1 are prefixes of strings in S;
and ancestors defined by prefixes.
The internal node for x is compressed:
one simply stores the length of x,
identifying the position of the critical bit (crit bit)
that follows x.
This idea was introduced by Morrison in 1968 under the name PATRICIA,
and independently by Gwehenberger at about the same time.
As with other binary trees,
it's sometimes useful to augment the internal nodes to include
parent pointers, child counts, etc.
Crit-bit trees versus other data structures
Compared to a hash table,
a crit-bit tree has comparable speed and two big advantages.
The first advantage is that a crit-bit tree supports more fast operations:
finding the smallest string, for example.
The second advantage is that a crit-bit tree guarantees good performance:
it doesn't have any tricky slowdowns for unusual (or malicious) data.
Crit-bit trees are faster than
comparison-based structures such as AVL trees and B-trees.
They're also simpler, especially for variable-length strings.
Crit-bit trees have the disadvantage of not (yet!) being widely appreciated.
Very few textbooks explain them,
and very few libraries implement them.
Literature:
- Morrison 1968.
- Gwehenberger 1968.
- Sklower 1993.
- Knuth spends a few pages describing crit-bit trees.
He explains suffix searching and (in an exercise) insertion.
He doesn't mention that crit-bit trees support fast deletion
and lexicographic operations.
- Sedgewick says more but, at least in the first two editions,
doesn't get the algorithms right... I should add details here.
I'm switching to crit-bit trees in my software,
and writing a paper with complete proofs.