Balanced Trees.
Balanced trees are traditionally used for indexing linearly ordered data, - a task also known as the problem of maintaining dynamic dictionaries. A number of balance conditions have been developed over time.
The first example of a balanced tree, an AVL-tree, was introduced in 1962 by G.M.Adelson-Velskii and E.M.Landis. In 1970 J.Hopcroft developed 2-3-trees. A generalization of the latter, called B-trees, was proposed by R.Bayer and E.McCreight in 1971. A number of variants of the classical B-trees have been examined in literature by now. B*-trees, B+-trees, symmetric B-trees, and (a,b)-trees are among them.
The history of balanced trees.
- 1962 AVL-tree G.M.Adelson-Velskii and E.M.Landis
- 1970 2-3-tree J.Hopcroft
- 1971 B-tree R.Bayer, E.McCreight
- 1971 red-black-tree R.Bayer
- B*-tree, B+tree, (a,b)-tree
- 1992 S(1)-tree utilization 1/2 - ε
- 1994 S(2)-tree utilization 2/3 - ε
- 1995 S(b)-tree utilization 1 - ε
Various balance conditions used in balanced trees are intended to provide fast logarithmic time access to the data.
Utilization characterizes the space overhead of storing data in a particular type of balanced trees. Utilization is defined as the ratio of the size of the original data to the space used to represent the data set as a balanced tree.
The main disadvantage of B-trees is that they provide poor utilization, when used to store variable length keys, e.g. strings. If the average length of keys stored in a B-tree is substantially smaller than the key length maximum, then it is not possible to guarantee any lower bound for utilization of B-trees greater than 0, even though that the number of elements in the tree is always more than a half of its capacity.
S(b)-tree is a generalization of B-trees.
A new type of balanced trees, called S(b)-trees, was developed in order to efficiently maintain variable length keys. Contrary to B-trees, S(b)-trees provide optimal utilization of keys of variable length, while the data access time remains logarithmic, the same as for B-trees. The difference with B-trees is that a node capacity for S(b)-trees is evaluated by the total length of keys stored in the node, rather than by their number as in B-trees. The balance condition for S(b)-trees is based on their local incompressibility. That is, any sequence consisting of b+1 neighboring nodes of the tree cannot be compressed into b well-formed nodes.
This is the summary of results proven for S(b)-trees:- S(b)-tree is a generalization of B-trees: a B-tree is a special case of a S(0)-tree.
- The lower bound of utilization for S(b)-trees is 1 - ε, where ε is inversely proportional to the tree branching.
- Search, insertion, and deletion of a key in a n-node S(b)-tree can be performed in time O(log n).
- S(b)-trees form a shrinking hierarchy by parameter b. That is if b < b' then the class of S(b')-trees is strictly contained in the class of S(b)-trees.
See my old site, which describes an implementation of a S(b)-tree library.