Balanced Trees.

Balanced trees are traditionally used for indexing linearly ordered data, - a task also known as the problem of maintaining dynamic dictionaries. A number of balance conditions have been developed over time.

The first example of a balanced tree, an AVL-tree, was introduced in 1962 by G.M.Adelson-Velskii and E.M.Landis. In 1970 J.Hopcroft developed 2-3-trees. A generalization of the latter, called B-trees, was proposed by R.Bayer and E.McCreight in 1971. A number of variants of the classical B-trees have been examined in literature by now. B*-trees, B+-trees, symmetric B-trees, and (a,b)-trees are among them.

The history of balanced trees.

Various balance conditions used in balanced trees are intended to provide fast logarithmic time access to the data.

Utilization characterizes the space overhead of storing data in a particular type of balanced trees. Utilization is defined as the ratio of the size of the original data to the space used to represent the data set as a balanced tree.

The main disadvantage of B-trees is that they provide poor utilization, when used to store variable length keys, e.g. strings. If the average length of keys stored in a B-tree is substantially smaller than the key length maximum, then it is not possible to guarantee any lower bound for utilization of B-trees greater than 0, even though that the number of elements in the tree is always more than a half of its capacity.

S(b)-tree is a generalization of B-trees.

A new type of balanced trees, called S(b)-trees, was developed in order to efficiently maintain variable length keys. Contrary to B-trees, S(b)-trees provide optimal utilization of keys of variable length, while the data access time remains logarithmic, the same as for B-trees. The difference with B-trees is that a node capacity for S(b)-trees is evaluated by the total length of keys stored in the node, rather than by their number as in B-trees. The balance condition for S(b)-trees is based on their local incompressibility. That is, any sequence consisting of b+1 neighboring nodes of the tree cannot be compressed into b well-formed nodes.

This is the summary of results proven for S(b)-trees:

See my old site, which describes an implementation of a S(b)-tree library.