CPSC 223 : https://zoo.cs.yale.edu/classes/cs223/f2022/index.html
Data Structures and Programming Techniques : http://www.cs.yale.edu/homes/aspnes/classes/223/notes.html
Balanced Tree : http://www.cs.yale.edu/homes/aspnes/classes/223/notes.html#balancedTrees
Binary search trees are a fine idea, but they only work if they are balanced, meaning that moving from a tree to its left or right subtree reduces the size by a constant fraction. Balanced binary trees add some extra mechanism to the basic binary search tree to ensure balance.
The problem is that as we insert new nodes, some paths through the tree may become very long, so we need to be able to shrink the long paths by moving nodes elsewhere in the tree.
But how do we do this? The idea is to notice that there may be many binary search trees that contain the same data, and that we can transform one into another by a local modification called a rotation.
y x
/ \ <==> / \
x C A y
/ \ / \
A B B C
If A < x < B < y < C, then both versions of this tree have the binary search tree property. By doing the rotation in one direction, we move A up and C down; in the other direction, we move A down and C up. So rotations can be used to transfer depth from the leftmost grandchild of a node to the rightmost and vice versa.
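In C, a rotation is a constant-time pointer rearrangement. Here is a sketch, assuming a minimal node type whose field names are illustrative rather than taken from any particular implementation:

```c
#include <stddef.h>

/* A minimal BST node (illustrative field names). */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Rotate right at y: the subtree rooted at y (with left child x)
 * becomes rooted at x, moving A up one level and C down one level.
 * B keeps its place between x and y.  Returns the new subtree root. */
struct node *rotate_right(struct node *y)
{
    struct node *x = y->left;
    y->left = x->right;
    x->right = y;
    return x;
}

/* Rotate left at x: the mirror image of rotate_right. */
struct node *rotate_left(struct node *x)
{
    struct node *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}
```

Note that the caller is responsible for storing the returned pointer back into the parent (or the root pointer), since the root of the rotated subtree changes.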
But what if it's the middle grandchild B that's the problem? A single rotation as above doesn't move B up or down. To move B, we have to reposition it so that it's on the end of something. We do this by splitting B into two subtrees B1 and B2, and doing two rotations that split the two subtrees while moving both up:
z z y
/ \ ===> / \ ===> / \
x C y C x z
/ \ / \ /| |\
A y x B2 A B1 B2 C
/ \ / \
B1 B2 A B1
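The two rotations above are often packaged as a single double rotation. A sketch, repeating the illustrative node type and single rotations so it stands alone: first rotate left at x (lifting y above x), then rotate right at z (lifting y above z):

```c
#include <stddef.h>

struct node {
    int key;
    struct node *left;
    struct node *right;
};

static struct node *rotate_right(struct node *y)
{
    struct node *x = y->left;
    y->left = x->right;
    x->right = y;
    return x;
}

static struct node *rotate_left(struct node *x)
{
    struct node *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

/* Double rotation at z, where y is the middle grandchild z->left->right.
 * Afterwards y is the root, with B1 hanging off x and B2 hanging off z. */
struct node *double_rotate_right(struct node *z)
{
    z->left = rotate_left(z->left);   /* lift y above x */
    return rotate_right(z);           /* lift y above z */
}
```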
Rotations in principle let us rebalance a tree, but we still need to decide when to do them. If we try to keep the tree in perfect balance (all paths nearly the same length), we'll spend so much time rotating that we won't be able to do anything else. So we need a scheme that keeps the tree balanced enough that we get paths of length O(log n) while still doing only O(log n) rotations per operation.
One of the simplest balanced tree data structures is the treap, a randomized tree that combines the invariants of a binary search tree and a heap. The idea is to give each node two keys: a tree key, used for searching, and a heap key, which is chosen at random and used to decide which nodes rise to the top. If we ignore the heap keys, a treap is a binary search tree: the tree keys store the contents of the tree, and the tree key in each node is greater than all tree keys in its left subtree and less than all tree keys in its right subtree. If we ignore the tree keys, a treap is a heap: each node has a heap key that is larger than all heap keys in its descendants.
It is not immediately obvious that we can preserve both invariants at once, but if we are presented with an unorganized pile of nodes already assigned tree and heap keys, we can build a treap by making the node with the largest heap key the root, putting all the nodes with smaller tree keys in the left subtree and all the nodes with larger tree keys in the right subtree, then organizing both subtrees recursively in the same way. This tells us that for any assignment of tree and heap keys, a treap exists (and is in fact unique!), so the only trick is to make sure that it doesn't cost too much to preserve the treap property when we insert a new element.
To insert a new node, we assign it a random heap key and insert it at a leaf using the usual method for a binary search tree. Because the new node's random heap key may be larger than its parent's, this may violate the heap property. But we can rotate the new node up until the heap property is restored.
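A sketch of treap insertion in C, assuming a node type that carries both keys (all names here are illustrative):

```c
#include <stdlib.h>

struct treap {
    int key;                /* tree key: binary search tree order */
    int heap_key;           /* random heap key: max-heap order */
    struct treap *left, *right;
};

static struct treap *rotate_right(struct treap *y)
{
    struct treap *x = y->left;
    y->left = x->right;
    x->right = y;
    return x;
}

static struct treap *rotate_left(struct treap *x)
{
    struct treap *y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

struct treap *treap_insert(struct treap *t, int key)
{
    if (t == NULL) {                       /* insert at a leaf as usual */
        t = malloc(sizeof(*t));
        t->key = key;
        t->heap_key = rand();              /* random priority decides depth */
        t->left = t->right = NULL;
        return t;
    }
    if (key < t->key) {
        t->left = treap_insert(t->left, key);
        if (t->left->heap_key > t->heap_key)
            t = rotate_right(t);           /* rotate the new node up */
    } else if (key > t->key) {
        t->right = treap_insert(t->right, key);
        if (t->right->heap_key > t->heap_key)
            t = rotate_left(t);
    }
    return t;
}
```

Because the rotation check happens on the way back up the recursion, the new node rises exactly until its heap key is no longer larger than its parent's.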
For deletion, we first have to search for a node with the key we want to delete, then remove it from the tree. If the node has at most one child, we can just splice it out by changing its parent's pointer to point to the child (or to null, if there is no child). If the node has two children, we pick the child with the larger heap key and rotate it up. Repeating this process will eventually rotate the node we are deleting down until it either has one child or is a leaf.
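A matching deletion sketch; the node type, rotations, and insertion are repeated from the insertion sketch so this example stands alone, and a small search helper is added for checking results:

```c
#include <stdlib.h>

struct treap {
    int key;
    int heap_key;
    struct treap *left, *right;
};

static struct treap *rotate_right(struct treap *y)
{
    struct treap *x = y->left;
    y->left = x->right; x->right = y;
    return x;
}

static struct treap *rotate_left(struct treap *x)
{
    struct treap *y = x->right;
    x->right = y->left; y->left = x;
    return y;
}

struct treap *treap_insert(struct treap *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof(*t));
        t->key = key;
        t->heap_key = rand();
        t->left = t->right = NULL;
        return t;
    }
    if (key < t->key) {
        t->left = treap_insert(t->left, key);
        if (t->left->heap_key > t->heap_key) t = rotate_right(t);
    } else if (key > t->key) {
        t->right = treap_insert(t->right, key);
        if (t->right->heap_key > t->heap_key) t = rotate_left(t);
    }
    return t;
}

struct treap *treap_delete(struct treap *t, int key)
{
    if (t == NULL) return NULL;
    if (key < t->key) {
        t->left = treap_delete(t->left, key);
    } else if (key > t->key) {
        t->right = treap_delete(t->right, key);
    } else if (t->left == NULL || t->right == NULL) {
        struct treap *child = t->left ? t->left : t->right;
        free(t);                    /* at most one child: splice out */
        return child;
    } else if (t->left->heap_key > t->right->heap_key) {
        t = rotate_right(t);        /* rotate larger-heap-key child up, */
        t->right = treap_delete(t->right, key);   /* then chase the node down */
    } else {
        t = rotate_left(t);
        t->left = treap_delete(t->left, key);
    }
    return t;
}

/* ordinary BST search on the tree keys */
int treap_contains(struct treap *t, int key)
{
    while (t) {
        if (key == t->key) return 1;
        t = key < t->key ? t->left : t->right;
    }
    return 0;
}
```

Each rotation moves the doomed node one level down, so the loop of rotations terminates once it can be spliced out.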
It's not hard to see that the cost of any of these operations is proportional to the length of some path in the treap. If all nodes have random heap keys, then the root is equally likely to be any of the nodes. This doesn't guarantee that we get a good split, but a very bad split requires picking a root node close to one side or the other, which is unlikely. People smarter than I am have analyzed the expected height of a tree constructed in this way and shown that the length of the longest path converges to (4.311…) log n in the limit (Devroye, A note on the height of binary search trees, JACM 1986; the upper bound is due to Robson, The height of binary search trees, Australian Computer Journal, 1979). This gives an O(log n) bound for the expected height in practice. However, we do have to be careful to make sure that whoever is supplying our inputs can't see what heap keys we pick, or they will be able to generate an unbalanced tree by repeatedly inserting and deleting nodes until they draw bad heap keys.
AVL trees solve the balancing problem by enforcing the invariant that the heights of the two subtrees sitting under each node differ by at most one. This does not guarantee perfect balance, but it does get close. Let S(k) be the size of the smallest AVL tree with height k. This tree will have at least one subtree of height k - 1, but its other subtree can be of height k - 2 (and should be, to keep it as small as possible).
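Counting the root plus one minimal subtree of each allowed height gives the recurrence S(k) = 1 + S(k-1) + S(k-2), which grows like the Fibonacci sequence, i.e. exponentially in k; inverting, an AVL tree with n nodes has height O(log n). A quick sketch to tabulate it (the function name is ours, and we count a single node as height 0):

```c
/* S(k): minimum number of nodes in an AVL tree of height k,
 * with a single node having height 0 and the empty tree height -1.
 * Recurrence: S(k) = 1 + S(k-1) + S(k-2). */
long min_avl_size(int k)
{
    if (k < 0)
        return 0;               /* empty tree */
    if (k == 0)
        return 1;               /* single node */
    return 1 + min_avl_size(k - 1) + min_avl_size(k - 2);
}
```

The first few values are 1, 2, 4, 7, 12, 20, …, each one less than a Fibonacci number.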
How do we maintain this invariant? The first thing to do is add extra information to the tree, so that we can tell when the invariant has been violated. AVL trees store in each node the difference between the heights of its left and right subtrees, which will be one of -1, 0, or 1. In an ideal world this would require log2(3) ≈ 1.58 bits per node, but since fractional bits are difficult to represent on modern computers, a typical implementation uses two bits. Inserting a new node into an AVL tree involves:
1. Doing a standard binary search tree insertion.
2. Updating the balance fields for every node on the insertion path.
3. Performing a single or double rotation to restore balance if needed.
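The steps above can be sketched in C. One simplifying assumption we make here (not necessarily what any particular implementation does) is to store each subtree's height instead of the -1/0/1 balance field; it costs more space but is easier to get right:

```c
#include <stdlib.h>

struct avl {
    int key;
    int height;                /* height of this subtree; a leaf has height 1 */
    struct avl *left, *right;
};

static int height(struct avl *t) { return t ? t->height : 0; }
static int balance(struct avl *t) { return height(t->left) - height(t->right); }

static void fix_height(struct avl *t)
{
    int hl = height(t->left), hr = height(t->right);
    t->height = (hl > hr ? hl : hr) + 1;
}

static struct avl *rotate_right(struct avl *y)
{
    struct avl *x = y->left;
    y->left = x->right;
    x->right = y;
    fix_height(y);
    fix_height(x);
    return x;
}

static struct avl *rotate_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left;
    y->left = x;
    fix_height(x);
    fix_height(y);
    return y;
}

/* Restore the AVL invariant at t, assuming both subtrees already satisfy it. */
static struct avl *rebalance(struct avl *t)
{
    fix_height(t);
    if (balance(t) > 1) {                  /* left side too tall */
        if (balance(t->left) < 0)          /* zig-zag: double rotation */
            t->left = rotate_left(t->left);
        return rotate_right(t);            /* zig-zig: single rotation */
    }
    if (balance(t) < -1) {                 /* right side too tall: mirror image */
        if (balance(t->right) > 0)
            t->right = rotate_right(t->right);
        return rotate_left(t);
    }
    return t;
}

struct avl *avl_insert(struct avl *t, int key)
{
    if (t == NULL) {                       /* step 1: ordinary BST insertion */
        t = malloc(sizeof(*t));
        t->key = key;
        t->height = 1;
        t->left = t->right = NULL;
        return t;
    }
    if (key < t->key)
        t->left = avl_insert(t->left, key);
    else if (key > t->key)
        t->right = avl_insert(t->right, key);
    return rebalance(t);   /* steps 2-3: update heights and rotate on the way up */
}
```

For example, inserting 1 through 7 in increasing order yields a perfectly balanced tree rooted at 4 with height 3, rather than the length-7 path an unbalanced BST would produce.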
Implementing this correctly is tricky. Intuitively, we can imagine a version of an AVL tree in which we store the height of each node. When we insert a node, only the heights of its ancestors change. Similarly, it is only these ancestors that can become too tall. It turns out that fixing the closest unbalanced ancestor fixes all the ones above it (because it shortens their longest paths by one as well), so just one single or double rotation restores balance.
Deletions are also possible, but are uglier: a deletion in an AVL tree may require as many as O(log n) rotations. The basic idea is to use the standard binary search tree deletion trick of either splicing out a node if it has no right child, or replacing it with the minimum value in its right subtree (the node holding that value is then spliced out); we then have to check whether we need to rebalance at every node above whatever node we removed.
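A deletion sketch in the same height-based style as the insertion sketch; the node type and helpers are repeated so the example stands alone. Note that rebalance runs at every node on the path back up, which is where the O(log n) rotations can come from:

```c
#include <stdlib.h>

struct avl {
    int key;
    int height;                /* a leaf has height 1 */
    struct avl *left, *right;
};

static int height(struct avl *t) { return t ? t->height : 0; }
static int balance(struct avl *t) { return height(t->left) - height(t->right); }

static void fix_height(struct avl *t)
{
    int hl = height(t->left), hr = height(t->right);
    t->height = (hl > hr ? hl : hr) + 1;
}

static struct avl *rotate_right(struct avl *y)
{
    struct avl *x = y->left;
    y->left = x->right; x->right = y;
    fix_height(y); fix_height(x);
    return x;
}

static struct avl *rotate_left(struct avl *x)
{
    struct avl *y = x->right;
    x->right = y->left; y->left = x;
    fix_height(x); fix_height(y);
    return y;
}

static struct avl *rebalance(struct avl *t)
{
    fix_height(t);
    if (balance(t) > 1) {
        if (balance(t->left) < 0) t->left = rotate_left(t->left);
        return rotate_right(t);
    }
    if (balance(t) < -1) {
        if (balance(t->right) > 0) t->right = rotate_right(t->right);
        return rotate_left(t);
    }
    return t;
}

struct avl *avl_insert(struct avl *t, int key)   /* as sketched earlier */
{
    if (t == NULL) {
        t = malloc(sizeof(*t));
        t->key = key;
        t->height = 1;
        t->left = t->right = NULL;
        return t;
    }
    if (key < t->key) t->left = avl_insert(t->left, key);
    else if (key > t->key) t->right = avl_insert(t->right, key);
    return rebalance(t);
}

struct avl *avl_delete(struct avl *t, int key)
{
    if (t == NULL) return NULL;
    if (key < t->key) {
        t->left = avl_delete(t->left, key);
    } else if (key > t->key) {
        t->right = avl_delete(t->right, key);
    } else if (t->right == NULL) {
        struct avl *l = t->left;   /* no right child: splice the node out */
        free(t);
        return l;
    } else {
        struct avl *m = t->right;  /* minimum of the right subtree */
        while (m->left) m = m->left;
        t->key = m->key;           /* copy it here, then splice it out below */
        t->right = avl_delete(t->right, m->key);
    }
    return rebalance(t);           /* may rotate at every level on the way up */
}
```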
Which rotations we need to do to rebalance depends on how some pair of sibling subtrees became unbalanced. Below, we show the possible cases; a # marks a subtree that has become shorter.
Zig-zig case : This can occur after inserting in A or deleting in C.
y x
/ \ ===> / \
x C A y
/ \ | / \
A B # B C
|
#
Zig-zag case : This can occur after inserting in B or deleting in C. It requires a double rotation.
z z y
/ \ ===> / \ ===> / \
x C y C x z
/ \ / \ /| |\
A y x B2 A B1 B2 C
/ \ / \
B1 B2 A B1
Zig-zag case, again : This last case comes up after a deletion if both nephews of the short node are too tall. The same double rotation we used in the previous case works here, too. Note that one of the subtrees is still one level taller than the others, but that's OK.
z z y
/ \ ===> / \ ===> / \
x C y C x z
/ \ / \ /| |\
A y x B2 A B1 B2 C
| / \ / \ |
# B1 B2 A B1 #
|
#