Red Black Properties
• A binary search tree where
  – Every node is either “red” or “black”
    • For consistency we count “null” pointers as if they were “black” nodes. This means that an empty tree (one where the root is null) satisfies the definition (it’s black).
  – If a node is red, then both children are black
    • Another reason for “null” pointers being considered black
  – For any node x in the tree, all paths from x to a leaf have exactly the same number of black nodes.

Red/Black Tree has height O(log n)
• Let the “black-height” of a node x, written bh(x), be the number of black nodes on any path from x down to a leaf, not counting x itself (null pointers count as black leaves).
• Then, for any node x, the subtree rooted at x has at least $2^{bh(x)} - 1$ nodes.
  – Proof, by induction on the height of node x.
    • If the height of x is 0, then the bh is zero, and $2^0 - 1$ is 0. Of course, if the height of x is 0 then there are no nodes in the tree (x is an empty subtree).
    • If the height of x is k, and the bh of x is b ($b \le k$), then consider each of the two children of x (either of which may be null). The black height of the child must be at least $b - 1$ (which happens if the child is black). By the inductive hypothesis, the subtree rooted at each child has at least $2^{b-1} - 1$ nodes. If we add up the number of nodes in both children, we have at least $2 \cdot (2^{b-1} - 1)$, or $2^b - 2$. When we add in the node x we get $2^b - 1$. So the number of nodes in the subtree rooted at x is at least $2^b - 1$.
• Note that $bh \ge h/2$: a red node cannot have a red child, so at least half the nodes on any path from the root to a leaf are black. So a tree with height h must have at least $2^{h/2} - 1$ nodes in it, i.e., $2^{h/2} - 1 \le n$.
• Therefore (adding 1 and taking the log base 2 of both sides) $h \le 2\log_2(n+1)$.

Huh? Doesn’t that work for any tree?
• That proof kinda stinks of the “let’s prove zero equals one” sort of proofs… in particular, it seems that the technique could be used to prove that all trees are balanced.
• The inductive proof relies on the black height being “well defined” for any node.
  – The height is defined as the longest path to a leaf
  – The black height is the same for all paths to a leaf.
• That’s why you cannot prove that any tree of height h has at least $\Omega(2^h)$ nodes in it.

Making Red/Black Trees
• The basic idea of a Red/Black tree is to use the insert and remove operations of an ordinary binary search tree.
  – But this may result in violating the red/black properties
• So, we’ll “fixup” the tree after each insert/remove

Rotations
• A “right rotate” will interchange a node with its left child. The child will become the parent, and the parent will become a child.
  – The parent becomes the right child of the child.
    • The old “right grandchild” becomes the left child of the parent
• A “left rotate” will interchange a node with its right child
  – The parent becomes the left child of the child
    • The old “left grandchild” becomes the right child of the parent
• Note that these operations are exact opposites (inverses) of each other.
• Note also that rotations do not affect the BST properties (although they will almost certainly affect the red/black properties).

Insert
• Insert the value normally, and make the new node “red”
  – We have not changed the black height of any node in the tree.
  – However, we may have created a red node with a red parent (and this is bad).
• As we “fixup” we’ll always have a pointer to a red node. We’ll always know that the black height is OK, and the only problem we need to worry about is that the parent of this red node is also red.

Fixup
• Let c (child) be the red node we inserted
• Let p (parent) be the parent of c
• Let gp (grandparent) be the parent of p
• Let u (uncle) be the child of gp that is not equal to p
• If p->color == black, we’re done. So, assume p->color == red.
  – We know, therefore, that gp->color == black.
• Two interesting cases
  – Uncle is red (easy), or uncle is black (harder)

Uncle is red
• If the grandparent is black, the parent is red and the uncle is red, then
  – we would not change the black height by making the grandparent red and the parent (and uncle) black.
  – We may, however, have introduced a new problem where the grandparent is now red, and its parent (the great-grandparent) is also red.
• So, if the uncle is red, make it and the parent black. Make the grandparent red, and then repeat fixup where we treat the grandparent as the next “child”.

Uncle is Black
• Make the parent black
  – But this increases the number of black nodes along the path to the child.
• Make the grandparent red
  – This fixes the problem with the path from the root to the child, but it decreases (breaks) the number of black nodes on the path from the root to the uncle
• Rotate around the grandparent and parent
  – So that the path from the root to the uncle now passes through both the parent and the grandparent
  – (and the path from the root to the child no longer passes through the grandparent).

Case Analysis
• Coding this up requires six cases. Three cases are for when the parent is the left child of the grandparent, and three (perfectly symmetric) cases for when the parent is the right child of the grandparent.
• Of the three cases on each side, “uncle is red” is one case.
• Two cases are required for “uncle is black” depending on whether the path from grandparent to child is “straight” or “crooked”
  – If the path is “crooked” then we’ll need to rotate first around the parent and child, and then perform the rotation around the parent and grandparent.

“Root is Black” Sentinel
• Once we reach the root of the tree, we’re done.
• If we can ensure that the root is always black, then we don’t need to worry about the special case of reaching the root in fixup.
  – Fixup stops whenever the next “child” is black, or when the parent is black.
• It’s easy (and always correct) to simply make the root black as the last step in any insert/remove operation.

Time Complexity
• Fixup runs in a loop.
  Each iteration of the loop we do
  – O(1) work in case analysis
  – O(1) work recoloring nodes (“uncle is red” case)
  – O(1) work performing rotations (at worst 2 rotations)
• Each iteration of the loop we either terminate (always the case after a rotation), or we set “child” equal to grandparent
  – i.e., each iteration of the loop uses a node two levels closer to the root than the previous iteration.
• Since the depth of “child” must decrease each iteration, we can do at most h iterations. Since h = O(log n), we do O(log n) iterations with O(1) work each iteration.
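
Example (sketch)
The insert, rotation, and fixup steps described in these slides can be put together roughly as follows. This is a minimal C++ sketch, not a reference implementation. The names Node, RBTree, replaceChild, fixup, checkBlackHeight and height are hypothetical, chosen for illustration. It assumes each node stores a parent pointer, sends duplicate keys to the right, and omits remove and memory cleanup. The two helper functions at the end check the red/black properties and measure the height, so the $h \le 2\log_2(n+1)$ bound can be tested.

```cpp
#include <algorithm>
#include <cassert>

enum Color { RED, BLACK };

struct Node {
    int key;
    Color color = RED;                 // new nodes start out red
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
    explicit Node(int k) : key(k) {}
};

struct RBTree {                        // sketch only: nodes are never freed
    Node* root = nullptr;

    // Null pointers count as black.
    static bool isRed(const Node* n) { return n != nullptr && n->color == RED; }

    // Put `fresh` where `old` hangs from its parent (or at the root).
    void replaceChild(Node* old, Node* fresh) {
        Node* p = old->parent;
        fresh->parent = p;
        if (p == nullptr) root = fresh;
        else if (p->left == old) p->left = fresh;
        else p->right = fresh;
    }

    // Left rotate: x's right child y becomes x's parent;
    // y's old left subtree (the "left grandchild") becomes x's right subtree.
    void rotateLeft(Node* x) {
        Node* y = x->right;
        assert(y != nullptr);
        replaceChild(x, y);
        x->right = y->left;
        if (x->right) x->right->parent = x;
        y->left = x;
        x->parent = y;
    }

    // Right rotate: the exact inverse of rotateLeft.
    void rotateRight(Node* x) {
        Node* y = x->left;
        assert(y != nullptr);
        replaceChild(x, y);
        x->left = y->right;
        if (x->left) x->left->parent = x;
        y->right = x;
        x->parent = y;
    }

    void insert(int key) {
        Node* p = nullptr;
        for (Node* cur = root; cur != nullptr;) {   // ordinary BST descent
            p = cur;
            cur = key < cur->key ? cur->left : cur->right;
        }
        Node* c = new Node(key);
        c->parent = p;
        if (p == nullptr) root = c;
        else if (key < p->key) p->left = c;
        else p->right = c;
        fixup(c);
        root->color = BLACK;           // "root is black" as the last step
    }

    void fixup(Node* c) {
        while (isRed(c->parent)) {     // stop when the parent is black (or c is the root)
            Node* p = c->parent;
            Node* gp = p->parent;      // non-null: a red parent is never the root
            bool pIsLeft = (gp->left == p);
            Node* u = pIsLeft ? gp->right : gp->left;
            if (isRed(u)) {            // uncle is red: recolor and move up two levels
                p->color = BLACK;
                u->color = BLACK;
                gp->color = RED;
                c = gp;
                continue;
            }
            // Uncle is black. Straighten a "crooked" path first.
            if (pIsLeft && c == p->right) { rotateLeft(p); p = c; }
            else if (!pIsLeft && c == p->left) { rotateRight(p); p = c; }
            p->color = BLACK;
            gp->color = RED;
            if (pIsLeft) rotateRight(gp); else rotateLeft(gp);
            break;                     // always done after this rotation
        }
    }
};

// Number of black nodes on every path from n down to null (counting n itself),
// or -1 if a red node has a red child or two paths disagree.
int checkBlackHeight(const Node* n) {
    if (n == nullptr) return 0;
    if (RBTree::isRed(n) && (RBTree::isRed(n->left) || RBTree::isRed(n->right))) return -1;
    int l = checkBlackHeight(n->left), r = checkBlackHeight(n->right);
    if (l < 0 || r < 0 || l != r) return -1;
    return l + (n->color == BLACK ? 1 : 0);
}

// Height as used in the proof: an empty subtree has height 0.
int height(const Node* n) {
    return n ? 1 + std::max(height(n->left), height(n->right)) : 0;
}
```

Inserting keys in sorted order is the worst case for an ordinary binary search tree (it degenerates into a list of height n). With the fixup above, checkBlackHeight should stay non-negative and height should stay within the $2\log_2(n+1)$ bound.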