In hash table, find the minimal value or successor of a value takes \(O(n)\) time. Can we do better?
Binary Search Tree
A data structure: For each node, all values in the left subtree are less than the value of this node; all values in the right subtree are greater than the value of this node.
Each node: value (or key-value pair), pointer to the left subtree, pointer to the right subtree.
Operations
Find(x): Start from the root, do
- If
x<cur,Find(x.left) - If
x>cur,Find(x.right) - If
x==cur, returncur. (If reach the leaf but not equal, returnNot Found)
Find the minimal value: Always go left. The leftmost leaf.
Find the maximal value: Always go right, The rightmost leaf.
Find the successor of value \(x\):
- If \(x\) has the right subtree, find the minimal value in the right subtree.
- If \(x\) doesn't have the right subtree, go up and find the first ancestor on the right.
Insert a value \(x\): Similar to the find operation.
Problem of Binary Search Tree
All these operation calls for \(O(\text{height of the tree})\). In the worst cases the height is \(n\), i.e., the binary search tree becomes a linked list.
Can we insert/delete the element s.t. the height of the tree is always \(O(\log n)\)? Yes
Balanced Binary Search Tree
For example Red-black Tree.
- It is a binary search tree, nodes are either black or red.
- The root is black.
- Leaves are black and
nil. - Red nodes have \(2\) black childs.
- For every node \(v\), all path from \(v\) to leaves have the same number of black nodes.
Definition
The height of node \(v\): The maximum distance from \(v\) to a leaf.
Black height of node \(v\): The maximum number of back node on the path from \(v\) to the leaf. (Not count itself)
Proposition: \(\text{height}(v)\leq2\cdot BH(v)\).
Lower bound for the number of descendants (not count leafs) of a node as a function of its black height:
- \(1+\) size of left subtree \(v\) + size of right subtree of \(v\geq 2^{BH(v)}-1\).
Proof by induction, we get the height of the tree is \(O(\log n)\).
Operations
Find in red-black tree: \(O(\log n)\)
Insert: \(O(\log n)\) We need to keep all properties satisfied
- Do a normal binary search tree insert, set the new node red.
- Rotate and recolor. This step takes \(O(1)\) time.
Delete: Similar to insert, but consider more cases. \(O(\log n)\).
Summary
Red-black tree implements dictionary. Find, insert, delete, find minimal, find maximal, find successor all takes \(O(\log n)\) time
Red-black Tree VS. Hash Table
- Do the benchmark.
- If we need to find minimal/maximal/successor, use red-black tree.
- If the whole dictionary is a key: use red-black tree.
- Real-time request? red-black tree.
Otherwise, use hash table.
Other Balanced Tree
AVL tree, B-tree...