Lecture 35 - Binary search trees
Logistics
- HW6 has been graded, corrections due in a week (Dec 8)
- HW7 due Friday Dec 5 (no corrections)
- Discussion tomorrow will go over data structures so far
Learning objectives
By the end of today, you should be able to:
- Explain what makes a binary search tree (BST) special
- Analyze time complexity of BST operations
- Trace BST operations (search, insert, delete)
- Recognize when BSTs are useful vs other data structures
- Understand the impact of balance on BST performance
Quick review: Tree basics
      A            Root (top node, no parent)
     / \
    B   C          Children of A, parents of D/E/F
   /   / \
  D   E   F        Leaves (no children)
Key terms:
- Root, parent, child, leaf
- Height: number of edges on the longest path from the root to a leaf
- Depth: number of edges from the root to a node
- Binary tree: at most 2 children per node
Last lecture (L34): Binary heaps used complete binary trees for priority queues
Today: Binary Search Trees - use trees for fast search
Types of binary trees
Full binary tree: Every node has 0 or 2 children (no nodes with 1 child)
      5
     / \
    3   8
   / \
  1   4
Complete binary tree: All levels filled except possibly last, which fills left to right
      5
     / \
    3   8
   /
  1
Perfect binary tree: All internal nodes have 2 children, all leaves at same depth
        5
      /   \
     3     8
    / \   / \
   1   4 7   9
The BST (binary search tree) property
Binary Search Tree: A binary tree with a special ordering property:
For every node:
- All values in left subtree are < node value
- All values in right subtree are > node value
Example BST:
        8
      /   \
     3     10
    / \      \
   1   6      14
      / \    /
     4   7  13
Check:
- 8: left subtree (3,1,6,4,7) all < 8, right subtree (10,14,13) all > 8
- 3: left subtree (1) < 3, right subtree (6,4,7) > 3
- And so on...
Not a BST:
      8
     / \
    3   10
   / \
  1   12      12 > 8, so it can't be in 8's left subtree!
              (even though 12 > 3 looks fine locally at node 3)
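To make "all values in the subtree" concrete, here is a minimal Rust sketch of a BST checker. It assumes the `Option<Box<Node>>` representation discussed later in this lecture; `is_bst`, `node`, and `leaf` are hypothetical helper names. The idea is to pass down (low, high) bounds from every ancestor, which is exactly what catches the 12 above.

```rust
// Hypothetical names: Tree, Node, node, leaf, is_bst.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

// Helpers for building small trees by hand.
fn node(value: i32, left: Tree, right: Tree) -> Tree {
    Some(Box::new(Node { value, left, right }))
}

fn leaf(value: i32) -> Tree {
    node(value, None, None)
}

// Each node must lie strictly between bounds inherited from ALL of its
// ancestors (None = unbounded); comparing with the parent alone is not enough.
fn is_bst(tree: &Tree, low: Option<i32>, high: Option<i32>) -> bool {
    match tree {
        None => true,
        Some(n) => {
            let above_low = low.map_or(true, |lo| n.value > lo);
            let below_high = high.map_or(true, |hi| n.value < hi);
            above_low
                && below_high
                // Everything in the left subtree must be < n.value
                && is_bst(&n.left, low, Some(n.value))
                // Everything in the right subtree must be > n.value
                && is_bst(&n.right, Some(n.value), high)
        }
    }
}

fn main() {
    // The "Not a BST" example: 12 sits in 8's left subtree.
    let bad = node(8, node(3, leaf(1), leaf(12)), leaf(10));
    // Same shape, but 6 fits between 3 and 8.
    let good = node(8, node(3, leaf(1), leaf(6)), leaf(10));
    println!("bad: {}", is_bst(&bad, None, None)); // bad: false
    println!("good: {}", is_bst(&good, None, None)); // good: true
}
```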
Why is this useful?
The BST property enables binary search!
Search for 6:
        8             Compare with 8: 6 < 8, go left
      /   \
     3     10         Compare with 3: 6 > 3, go right
    / \      \
   1   6      14      Compare with 6: found it!
      / \    /
     4   7  13
Time complexity: O(height) - at most one comparison per level
If tree is balanced: height = O(log n), so search is O(log n)!
Key difference: BST vs Binary Heap representation
Binary Heap (L34): Complete binary tree
       42
      /  \
    30    25
   /  \   /
  10  20 15
Array: [42, 30, 25, 10, 20, 15]
Easy arithmetic to find parent/children!
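A quick sketch of that index arithmetic, assuming the 0-based `Vec` layout shown above (`parent` and `children` are illustrative names; if L34 used 1-based indexing, the formulas shift by one):

```rust
// Index arithmetic for a binary heap stored in a Vec (0-based, assumed layout).
fn parent(i: usize) -> Option<usize> {
    // The root (index 0) has no parent.
    if i == 0 {
        None
    } else {
        Some((i - 1) / 2)
    }
}

fn children(i: usize) -> (usize, usize) {
    // Left child at 2i + 1, right child at 2i + 2 (either may be past the end).
    (2 * i + 1, 2 * i + 2)
}

fn main() {
    let heap = vec![42, 30, 25, 10, 20, 15];

    // 30 is at index 1; its children are at indices 3 and 4.
    let (l, r) = children(1);
    println!("children of {}: {} and {}", heap[1], heap[l], heap[r]); // 10 and 20

    // 15 is at index 5; its parent is at index 2.
    if let Some(p) = parent(5) {
        println!("parent of {}: {}", heap[5], heap[p]); // 25
    }
}
```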
BST: NOT necessarily complete - can have gaps
      8
     / \
    3   10
   /      \
  1        14
          /
         13
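An array doesn't work well here: index arithmetic assumes no gaps, so a lopsided tree with n nodes could need up to 2^n - 1 array slots. Use pointers/references instead!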
Why this matters:
- Heap: Store in a Vec, use index arithmetic, great cache locality
- BST: Need a struct with pointers to left/right children; recursive structure
// BST node representation (conceptual)
struct Node {
    value: i32,
    left: Option<Box<Node>>,  // Pointer to left child
    right: Option<Box<Node>>, // Pointer to right child
}
BST operations are naturally recursive (traverse left or right subtree)
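As an illustration of that recursion, here is a sketch that builds the example BST from earlier by hand and computes its size and height. The `Tree` alias and the `node`, `example`, `size`, and `height` functions are hypothetical names for this example; height is counted in edges, so an empty tree gets -1.

```rust
// Hypothetical names: Tree, node, example, size, height.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

// Build a node from a value and two (possibly empty) subtrees.
fn node(value: i32, left: Tree, right: Tree) -> Tree {
    Some(Box::new(Node { value, left, right }))
}

// The example BST from this lecture (8 at the root).
fn example() -> Tree {
    let leaf = |v| node(v, None, None);
    node(
        8,
        node(3, leaf(1), node(6, leaf(4), leaf(7))),
        node(10, None, node(14, leaf(13), None)),
    )
}

// Number of nodes: this node plus the sizes of both subtrees.
fn size(tree: &Tree) -> usize {
    match tree {
        None => 0,
        Some(n) => 1 + size(&n.left) + size(&n.right),
    }
}

// Height in edges: empty tree = -1, single node = 0.
fn height(tree: &Tree) -> i32 {
    match tree {
        None => -1,
        Some(n) => 1 + height(&n.left).max(height(&n.right)),
    }
}

fn main() {
    let t = example();
    if let Some(root) = &t {
        println!("root = {}", root.value); // root = 8
    }
    println!("size = {}, height = {}", size(&t), height(&t)); // size = 9, height = 3
}
```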
BST vs sorted array
Search in sorted array: Binary search is also O(log n)
So why use BST?
| Operation | Sorted Array | Balanced BST |
|---|---|---|
| Search | O(log n) | O(log n) |
| Insert | O(n) - must shift elements | O(log n) |
| Delete | O(n) - must shift elements | O(log n) |
| Find min/max | O(1) | O(log n) - but still fast! |
BST wins when you need frequent insertions/deletions!
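To see where the O(n) for a sorted array comes from, here is a sketch using the standard library's `binary_search`, `Vec::insert`, and `Vec::remove` (`sorted_insert` and `sorted_remove` are hypothetical names). Finding the position is O(log n), but inserting or removing shifts every later element by one slot.

```rust
// Hypothetical helpers for keeping a Vec<i32> sorted.

// Insert x, keeping v sorted. Finding the spot is O(log n),
// but Vec::insert shifts every later element right: O(n) overall.
fn sorted_insert(v: &mut Vec<i32>, x: i32) {
    // Ok(i): x already present at i; Err(i): x belongs at i.
    // Either index keeps the vector sorted.
    let pos = match v.binary_search(&x) {
        Ok(i) | Err(i) => i,
    };
    v.insert(pos, x);
}

// Remove x if present. Vec::remove shifts later elements left: O(n).
fn sorted_remove(v: &mut Vec<i32>, x: i32) -> bool {
    match v.binary_search(&x) {
        Ok(i) => {
            v.remove(i);
            true
        }
        Err(_) => false,
    }
}

fn main() {
    let mut v = vec![1, 3, 4, 6, 7, 8, 10, 13, 14];
    sorted_insert(&mut v, 5);
    println!("{:?}", v); // [1, 3, 4, 5, 6, 7, 8, 10, 13, 14]
    sorted_remove(&mut v, 10);
    println!("{:?}", v); // [1, 3, 4, 5, 6, 7, 8, 13, 14]
}
```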
Think/Pair/Share: Is this a BST?
Tree 1: Tree 2: Tree 3:
5 5 5
/ \ / \ / \
3 7 2 8 3 7
/ \ / \ / \
1 4 1 3 4 6
BST Operation 1: Search
Algorithm:
- Start at root
- If value equals current node, found it!
- If value < current node, search left subtree
- If value > current node, search right subtree
- If you reach an empty subtree, the value is not in the tree
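The search algorithm above, sketched in Rust as a recursive function over the `Option<Box<Node>>` representation. `contains`, `node`, `leaf`, and `example` are hypothetical names; the tree is the example BST used throughout this lecture.

```rust
use std::cmp::Ordering;

// Hypothetical names: Tree, node, leaf, example, contains.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

fn node(value: i32, left: Tree, right: Tree) -> Tree {
    Some(Box::new(Node { value, left, right }))
}

fn leaf(value: i32) -> Tree {
    node(value, None, None)
}

// The example BST from this lecture, built by hand.
fn example() -> Tree {
    node(
        8,
        node(3, leaf(1), node(6, leaf(4), leaf(7))),
        node(10, None, node(14, leaf(13), None)),
    )
}

// One comparison per level, so the running time is O(height).
fn contains(tree: &Tree, target: i32) -> bool {
    match tree {
        None => false, // reached an empty subtree: not in the tree
        Some(n) => match target.cmp(&n.value) {
            Ordering::Equal => true,
            Ordering::Less => contains(&n.left, target),
            Ordering::Greater => contains(&n.right, target),
        },
    }
}

fn main() {
    let t = example();
    println!("contains 6? {}", contains(&t, 6)); // true
    println!("contains 5? {}", contains(&t, 5)); // false
}
```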
Example: Search for 6 in BST
        8             6 < 8, go left
      /   \
     3     10         6 > 3, go right
    / \      \
   1   6      14      6 == 6, found!
      / \    /
     4   7  13
Time complexity: O(height) = O(log n) for balanced tree
Operation 2: Insert
Algorithm:
- If tree is empty, create new node
- If value < current node, insert into left subtree
- If value > current node, insert into right subtree
- (If value equals current, either skip or allow duplicates)
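A sketch of insert under the same assumptions (hypothetical names `insert`, `from_values`, `path_to`; this version skips duplicates). Taking `&mut Tree` lets the function replace an empty `None` slot with a new node in place. `path_to` records the values visited on the way down, which reproduces the step-by-step trace below.

```rust
use std::cmp::Ordering;

// Hypothetical names: Tree, insert, from_values, path_to.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

// Walk down as in search; at the first empty slot, put the new node there.
fn insert(tree: &mut Tree, value: i32) {
    match tree {
        None => *tree = Some(Box::new(Node { value, left: None, right: None })),
        Some(n) => match value.cmp(&n.value) {
            Ordering::Less => insert(&mut n.left, value),
            Ordering::Greater => insert(&mut n.right, value),
            Ordering::Equal => {} // duplicate: skip it
        },
    }
}

// Build a BST by inserting values in the given order.
fn from_values(values: &[i32]) -> Tree {
    let mut tree = None;
    for &v in values {
        insert(&mut tree, v);
    }
    tree
}

// Values visited while searching for target (stops early if it is absent).
fn path_to(tree: &Tree, target: i32) -> Vec<i32> {
    let mut path = Vec::new();
    let mut current = tree;
    while let Some(n) = current {
        path.push(n.value);
        current = match target.cmp(&n.value) {
            Ordering::Less => &n.left,
            Ordering::Greater => &n.right,
            Ordering::Equal => break,
        };
    }
    path
}

fn main() {
    let mut t = from_values(&[8, 3, 10, 1, 6, 14, 4, 7, 13]);
    insert(&mut t, 5);
    println!("{:?}", path_to(&t, 5)); // [8, 3, 6, 4, 5]
}
```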
Example: Insert 5 into BST
Original:                 After insert 5:
        8                         8
      /   \                     /   \
     3     10                  3     10
    / \      \                / \      \
   1   6      14             1   6      14
      / \    /                  / \    /
     4   7  13                 4   7  13
                                \
                                 5
Steps:
1. 5 < 8, go left
2. 5 > 3, go right
3. 5 < 6, go left
4. 5 > 4, go right
5. Right of 4 is empty, insert 5 there!
Time complexity: O(height) = O(log n) for balanced tree
Operation 3: Find min/max
Finding minimum: Keep going left until you can't
        8
      /   \
     3     10         min is the leftmost node
    / \      \
   1   6      14      min is 1!
      / \    /
     4   7  13
Finding maximum: Keep going right until you can't
Max is rightmost node = 14
Time complexity: O(height)
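Min and max are short loops under the same assumptions (hypothetical names `min_value` and `max_value`; both return `None` for an empty tree):

```rust
use std::cmp::Ordering;

// Hypothetical names: Tree, insert, from_values, min_value, max_value.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

fn insert(tree: &mut Tree, value: i32) {
    match tree {
        None => *tree = Some(Box::new(Node { value, left: None, right: None })),
        Some(n) => match value.cmp(&n.value) {
            Ordering::Less => insert(&mut n.left, value),
            Ordering::Greater => insert(&mut n.right, value),
            Ordering::Equal => {}
        },
    }
}

fn from_values(values: &[i32]) -> Tree {
    let mut tree = None;
    for &v in values {
        insert(&mut tree, v);
    }
    tree
}

// Minimum: keep going left until there is no left child.
fn min_value(tree: &Tree) -> Option<i32> {
    let mut n = tree.as_ref()?; // an empty tree has no minimum
    while let Some(left) = &n.left {
        n = left;
    }
    Some(n.value)
}

// Maximum: keep going right until there is no right child.
fn max_value(tree: &Tree) -> Option<i32> {
    let mut n = tree.as_ref()?;
    while let Some(right) = &n.right {
        n = right;
    }
    Some(n.value)
}

fn main() {
    let t = from_values(&[8, 3, 10, 1, 6, 14, 4, 7, 13]);
    println!("min = {:?}, max = {:?}", min_value(&t), max_value(&t)); // min = Some(1), max = Some(14)
}
```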
Operation 4: Delete (the tricky one!)
Three cases:
Case 1: Node has no children
- Just remove it!
Delete 13:
        8                       8
      /   \                   /   \
     3     10                3     10
    / \      \              / \      \
   1   6      14   ->      1   6      14
             /                       /
            13                      (removed)
Case 2: Node has one child
- Replace the node with its only child (the child brings its whole subtree along)
Delete 10:
      7                   7
     / \                 / \
    3   10              3   9
       /    ->             /
      9                   8
     /
    8
Case 3: Node has two children (hard!)
- Find in-order successor (smallest value in right subtree)
- i.e., go right once, then keep going left until there is no left child
- Replace node's value with successor's value
- Delete successor from right subtree
Delete 3:
        8                         8
      /   \                     /   \
     3     10                  4     10
    / \      \     ->         / \      \
   1   6      14             1   6      14
      / \    /                    \    /
     4   7  13                     7  13
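Why does this work? The successor is larger than every value in the node's left subtree and smaller than every other value in its right subtree, so putting it in the node's place keeps the BST property. The successor also has no left child, so deleting it from the right subtree is always Case 1 or Case 2.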
Time complexity: O(height) = O(log n) for balanced tree
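Putting the three cases together, here is a sketch of delete (hypothetical names; same `Option<Box<Node>>` representation, with the tree built by repeated insertion). Case 3 copies the in-order successor's value into the node and then deletes the successor from the right subtree, as described above. `main` replays the "Delete 3" example.

```rust
use std::cmp::Ordering;

// Hypothetical names: Tree, insert, from_values, min_value, sorted, delete.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

fn insert(tree: &mut Tree, value: i32) {
    match tree {
        None => *tree = Some(Box::new(Node { value, left: None, right: None })),
        Some(n) => match value.cmp(&n.value) {
            Ordering::Less => insert(&mut n.left, value),
            Ordering::Greater => insert(&mut n.right, value),
            Ordering::Equal => {}
        },
    }
}

fn from_values(values: &[i32]) -> Tree {
    let mut tree = None;
    for &v in values {
        insert(&mut tree, v);
    }
    tree
}

// Smallest value in a subtree (keep going left).
fn min_value(tree: &Tree) -> Option<i32> {
    let mut n = tree.as_ref()?;
    while let Some(left) = &n.left {
        n = left;
    }
    Some(n.value)
}

// In-order traversal: collects the values in sorted order.
fn sorted(tree: &Tree) -> Vec<i32> {
    fn walk(tree: &Tree, out: &mut Vec<i32>) {
        if let Some(n) = tree {
            walk(&n.left, out);
            out.push(n.value);
            walk(&n.right, out);
        }
    }
    let mut out = Vec::new();
    walk(tree, &mut out);
    out
}

fn delete(tree: &mut Tree, value: i32) {
    let Some(n) = tree else { return }; // empty subtree: value not in tree
    match value.cmp(&n.value) {
        Ordering::Less => delete(&mut n.left, value),
        Ordering::Greater => delete(&mut n.right, value),
        Ordering::Equal => {
            // Take ownership of this node so the subtree can be rebuilt.
            let mut node = tree.take().expect("checked non-empty above");
            *tree = match (node.left.take(), node.right.take()) {
                // Case 1: no children, so the slot becomes empty.
                (None, None) => None,
                // Case 2: one child, which moves up with its whole subtree.
                (Some(child), None) | (None, Some(child)) => Some(child),
                // Case 3: two children. Copy the in-order successor's value
                // here, then delete the successor from the right subtree.
                (Some(left), Some(right)) => {
                    node.left = Some(left);
                    node.right = Some(right);
                    let successor = min_value(&node.right).expect("right subtree is non-empty");
                    node.value = successor;
                    delete(&mut node.right, successor);
                    Some(node)
                }
            };
        }
    }
}

fn main() {
    let mut t = from_values(&[8, 3, 10, 1, 6, 14, 4, 7, 13]);
    delete(&mut t, 3); // Case 3: 3 is replaced by its successor 4
    println!("{:?}", sorted(&t)); // [1, 4, 6, 7, 8, 10, 13, 14]
}
```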
Think/Pair/Share: Trace a deletion
Delete 8 from this BST:
        8
      /   \
     3     10
    / \      \
   1   6      14
      / \    /
     4   7  13
BST performance and balance
Best case (balanced):
        4
      /   \
     2     6        Height = 2
    / \   / \       O(log n) operations
   1   3 5   7
Worst case (degenerate - like a linked list!):
1
 \
  2              Height = 6
   \             O(n) operations
    3
     \
      4
       \
        5
         \
          6
           \
            7
How does this happen? Insert sorted data: 1, 2, 3, 4, 5, 6, 7
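A sketch that reproduces the two trees above by inserting the same seven values in two different orders (hypothetical helper names; plain BST insert with no rebalancing; height counted in edges):

```rust
use std::cmp::Ordering;

// Hypothetical names: Tree, insert, from_values, height.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

// Plain BST insert: no rebalancing of any kind.
fn insert(tree: &mut Tree, value: i32) {
    match tree {
        None => *tree = Some(Box::new(Node { value, left: None, right: None })),
        Some(n) => match value.cmp(&n.value) {
            Ordering::Less => insert(&mut n.left, value),
            Ordering::Greater => insert(&mut n.right, value),
            Ordering::Equal => {}
        },
    }
}

fn from_values(values: &[i32]) -> Tree {
    let mut tree = None;
    for &v in values {
        insert(&mut tree, v);
    }
    tree
}

// Height in edges: empty tree = -1, single node = 0.
fn height(tree: &Tree) -> i32 {
    match tree {
        None => -1,
        Some(n) => 1 + height(&n.left).max(height(&n.right)),
    }
}

fn main() {
    let sorted: Vec<i32> = (1..=7).collect();
    let mixed = [4, 2, 6, 1, 3, 5, 7];
    println!("sorted order: height {}", height(&from_values(&sorted))); // 6
    println!("mixed order:  height {}", height(&from_values(&mixed))); // 2

    // Same effect at a larger scale: 1000 sorted keys form a chain.
    let big: Vec<i32> = (1..=1000).collect();
    println!("1000 sorted keys: height {}", height(&from_values(&big))); // 999
}
```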
Impact on performance
| Tree Type | Height | Search | Insert | Delete |
|---|---|---|---|---|
| Balanced | O(log n) | O(log n) | O(log n) | O(log n) |
| Degenerate | O(n) | O(n) | O(n) | O(n) |
With 1000 nodes:
- Balanced: about 10 comparisons per operation (log2 1000 ≈ 10)
- Degenerate: up to 1000 comparisons per operation
Solution: Self-balancing trees
Problem: Ordinary BST can become unbalanced
Solutions (advanced topics, FYI):
- AVL trees: Maintain strict balance (height difference ≤ 1)
- Red-Black trees: Relax balance slightly for faster insertions
- B-trees: Nodes with many children, used in databases
Rust's BTreeMap and BTreeSet: Use B-trees for guaranteed O(log n) operations
For now: Understand that balance matters, real-world implementations maintain it
When to use BST?
Good for:
- Dynamic data (frequent insertions/deletions)
- Need to maintain sorted order
- Range queries (find all values between x and y)
- Fast search, insert, delete (when balanced)
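Sorted order and range queries both fall out of an in-order traversal (left subtree, node, right subtree). A sketch with hypothetical names `in_order` and `range_query`; the range version skips any subtree that cannot contain values in [lo, hi]:

```rust
use std::cmp::Ordering;

// Hypothetical names: Tree, insert, from_values, in_order, range_query.
type Tree = Option<Box<Node>>;

struct Node {
    value: i32,
    left: Tree,
    right: Tree,
}

fn insert(tree: &mut Tree, value: i32) {
    match tree {
        None => *tree = Some(Box::new(Node { value, left: None, right: None })),
        Some(n) => match value.cmp(&n.value) {
            Ordering::Less => insert(&mut n.left, value),
            Ordering::Greater => insert(&mut n.right, value),
            Ordering::Equal => {}
        },
    }
}

fn from_values(values: &[i32]) -> Tree {
    let mut tree = None;
    for &v in values {
        insert(&mut tree, v);
    }
    tree
}

// Left subtree, then the node, then the right subtree.
// Because of the BST property this visits values in sorted order.
fn in_order(tree: &Tree, out: &mut Vec<i32>) {
    if let Some(n) = tree {
        in_order(&n.left, out);
        out.push(n.value);
        in_order(&n.right, out);
    }
}

// Collect all values in [lo, hi] in sorted order.
fn range_query(tree: &Tree, lo: i32, hi: i32, out: &mut Vec<i32>) {
    if let Some(n) = tree {
        // Left subtree holds values < n.value: only worth visiting if lo < n.value.
        if lo < n.value {
            range_query(&n.left, lo, hi, out);
        }
        if lo <= n.value && n.value <= hi {
            out.push(n.value);
        }
        // Right subtree holds values > n.value: only worth visiting if n.value < hi.
        if n.value < hi {
            range_query(&n.right, lo, hi, out);
        }
    }
}

fn main() {
    let t = from_values(&[8, 3, 10, 1, 6, 14, 4, 7, 13]);
    let mut all = Vec::new();
    in_order(&t, &mut all);
    println!("{:?}", all); // [1, 3, 4, 6, 7, 8, 10, 13, 14]
    let mut between = Vec::new();
    range_query(&t, 4, 10, &mut between);
    println!("{:?}", between); // [4, 6, 7, 8, 10]
}
```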
Not ideal for:
- Mostly static data (use sorted array)
- Need constant-time operations (use hash map)
- Very small datasets (overhead not worth it)
Use BST when: You need both dynamic updates AND sorted order
Rust's BTree collections
BTreeSet - for unique values in sorted order:
use std::collections::BTreeSet;

fn main() {
    let mut set = BTreeSet::new();
    set.insert(5);
    set.insert(2);
    set.insert(8);
    set.insert(1);

    // Iterate in sorted order
    for val in &set {
        println!("{}", val);
    }

    // Search
    if set.contains(&5) {
        println!("Found 5");
    }

    // Range query
    for val in set.range(2..=5) {
        println!("{}", val);
    }
}
BTreeMap - for key-value pairs in sorted order:
use std::collections::BTreeMap;

fn main() {
    let mut grades = BTreeMap::new();
    grades.insert("Charlie", 85);
    grades.insert("Alice", 92);
    grades.insert("Bob", 88);

    // Iterate in sorted key order
    for (name, grade) in &grades {
        println!("{}: {}", name, grade);
    }

    // Lookup
    if let Some(&grade) = grades.get("Alice") {
        println!("Alice's grade: {}", grade);
    }

    // Range query by keys
    for (name, grade) in grades.range("B".."D") {
        println!("{}: {}", name, grade);
    }
}
Guaranteed O(log n) insert, remove, and lookup! (Iterating over all elements is still O(n).)
Summary
Key takeaways
- Trees: Hierarchical data structures (root, children, leaves)
- Binary trees: At most 2 children per node
- BST property: Left < Root < Right (enables binary search)
- BST operations: Search, insert, delete all O(height)
- Balance matters: Balanced = O(log n), Unbalanced = O(n)
- Real implementations: Use self-balancing trees (BTreeMap, BTreeSet)
Tree structure comparison: All the trees we've seen
General Tree:       Binary Tree:      Complete Binary:
      A                  5                 42
    / | \               / \               /  \
   B  C  D             3   8            30    25
  / \                 / \              /  \   /
 E   F               1   4            10  20 15
Any # children      <=2 children      <=2 children + filled left-right
Binary Heap:        BST:
      42                  8
     /  \                / \
   30    25             3   10
  /  \   / \           / \    \
 10  20 15  8         1   5    14
Complete binary and Binary and
parent >= children  left < root < right