Lecture 35 - Binary search trees

Logistics

  • HW6 has been graded, corrections due in a week (Dec 8)
  • HW7 due Friday Dec 5 (no corrections)
  • Discussion tomorrow will go over data structures so far

Learning objectives

By the end of today, you should be able to:

  • Explain what makes a binary search tree (BST) special
  • Analyze time complexity of BST operations
  • Trace BST operations (search, insert, delete)
  • Recognize when BSTs are useful vs other data structures
  • Understand the impact of balance on BST performance

Quick review: Tree basics

         A         Root (top node, no parent)
        / \
       B   C       Children of A, Parents of D/E/F
      /   / \
     D   E   F    Leaves (no children)

Key terms:

  • Root, parent, child, leaf
  • Height: longest path from root to leaf
  • Depth: distance from root to a node
  • Binary tree: at most 2 children per node

Last lecture (L34): Binary heaps used complete binary trees for priority queues

Today: Binary Search Trees - use trees for fast search

Types of binary trees

Full binary tree: Every node has 0 or 2 children (no nodes with 1 child)

       5
      / \
     3   8
    / \
   1   4

Complete binary tree: All levels filled except possibly last, which fills left to right

       5
      / \
     3   8     
    / 
   1  

Perfect binary tree: All internal nodes have 2 children, all leaves at same depth

       5
      / \
     3   8
    / \ / \
   1  4 7  9

The BST (binary search tree) property

Binary Search Tree: A binary tree with a special ordering property:

For every node:

  • All values in left subtree are < node value
  • All values in right subtree are > node value

Example BST:

         8
        / \
       3   10
      / \    \
     1   6    14
        / \   /
       4   7 13

Check:
- 8: left subtree (3,1,6,4,7) all < 8, right subtree (10,14,13) all > 8 
- 3: left subtree (1) < 3, right subtree (6,4,7) > 3 
- And so on...

Not a BST:

         8
        / \
       3   10
      / \
     1   12      12 > 8, so it can't be in 8's left subtree!
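
The check above can be sketched in code. This is a minimal sketch, not the course's implementation: the `Node` type, the `node` helper, and the function names are assumptions made for illustration. The idea is to pass down the allowed (low, high) bounds, so a value like 12 is caught even though it is locally fine as the right child of 3.

```rust
// Hypothetical node type for this sketch (names are assumptions).
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Convenience constructor so small trees can be written inline.
fn node(value: i32, left: Option<Box<Node>>, right: Option<Box<Node>>) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left, right }))
}

// True if every value in this subtree lies strictly between `low` and `high`.
// A bound of `None` means "unbounded on that side".
fn in_bounds(tree: &Option<Box<Node>>, low: Option<i32>, high: Option<i32>) -> bool {
    match tree {
        None => true, // an empty subtree never violates the property
        Some(n) => {
            let above_low = low.map_or(true, |lo| n.value > lo);
            let below_high = high.map_or(true, |hi| n.value < hi);
            above_low
                && below_high
                // everything on the left must stay below this node's value
                && in_bounds(&n.left, low, Some(n.value))
                // everything on the right must stay above this node's value
                && in_bounds(&n.right, Some(n.value), high)
        }
    }
}

fn is_bst(tree: &Option<Box<Node>>) -> bool {
    in_bounds(tree, None, None)
}
```

Checking only each node against its direct children would wrongly accept the "Not a BST" tree, which is why the bounds are carried down from the ancestors.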

Why is this useful?

The BST property enables binary search!

Search for 6:

         8          Compare with 8: 6 < 8, go left
        / \
       3   10       Compare with 3: 6 > 3, go right
      / \    \
     1   6    14    Compare with 6: found it!
        / \   /
       4   7 13

Time complexity: O(height) - at most one comparison per level

If tree is balanced: height = O(log n), so search is O(log n)!

Key difference: BST vs Binary Heap representation

Binary Heap (L34): Complete binary tree

       42
      /  \
    30    25
   / \    /
  10 20  15

Array: [42, 30, 25, 10, 20, 15]
       Easy arithmetic to find parent/children!
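
As a reminder of what that arithmetic looks like, here is a small sketch. It assumes the root is stored at index 0 (a common convention, not stated in these notes); the function names are made up for illustration.

```rust
// Index arithmetic for a binary heap stored in a Vec, assuming the root is at index 0.

// Parent of the node at index `i`, or None for the root.
fn parent(i: usize) -> Option<usize> {
    if i == 0 {
        None
    } else {
        Some((i - 1) / 2)
    }
}

// Index where the left child of node `i` would be stored.
fn left_child(i: usize) -> usize {
    2 * i + 1
}

// Index where the right child of node `i` would be stored.
fn right_child(i: usize) -> usize {
    2 * i + 2
}
```

With the array above, `left_child(0)` is index 1 (value 30) and `right_child(0)` is index 2 (value 25). This only works because a complete tree has no gaps.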

BST: NOT necessarily complete - can have gaps

       8
      / \
     3   10
    /     \
   1       14
          /
         13

Can't use an array! Need pointers/references

Why this matters:

  • Heap: Store in Vec, use index arithmetic, great cache locality
  • BST: Need struct with pointers to left/right children, recursive structure

// BST node representation (conceptual)
struct Node {
    value: i32,
    left: Option<Box<Node>>,   // Pointer to left child
    right: Option<Box<Node>>,  // Pointer to right child
}

BST operations are naturally recursive (traverse left or right subtree)

BST vs sorted array

Search in sorted array: Binary search is also O(log n)

So why use BST?

Operation       Sorted Array                   Balanced BST
Search          O(log n)                       O(log n)
Insert          O(n) - must shift elements     O(log n)
Delete          O(n) - must shift elements     O(log n)
Find min/max    O(1)                           O(log n) - but still fast!

BST wins when you need frequent insertions/deletions!

Think/Pair/Share: Is this a BST?

Tree 1:        Tree 2:       Tree 3:
    5              5             5
   / \            / \           / \
  3   7          2   8         3   7
 / \            / \           /     \
1   4          1   3         4       6

Operation 1: Search

Algorithm:

  1. Start at root
  2. If value equals current node, found it!
  3. If value < current node, search left subtree
  4. If value > current node, search right subtree
  5. If you reach an empty subtree, the value is not in the tree

Example: Search for 6 in BST

         8          6 < 8, go left
        / \
       3   10       6 > 3, go right
      / \    \
     1   6    14    6 == 6, found!
        / \   /
       4   7 13

Time complexity: O(height) = O(log n) for balanced tree
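
The search steps can be sketched as a short recursive function. This is one possible version under stated assumptions: the `Node` type matches the conceptual struct in these notes, while `node` and `search` are names chosen for this sketch.

```rust
// Pointer-based BST node, same shape as the conceptual struct in these notes.
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Hypothetical helper for writing small trees inline.
fn node(value: i32, left: Option<Box<Node>>, right: Option<Box<Node>>) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left, right }))
}

// Returns true if `target` is in the tree. One comparison per level: O(height).
fn search(tree: &Option<Box<Node>>, target: i32) -> bool {
    match tree {
        None => false, // reached an empty subtree: not in the tree
        Some(n) => {
            if target == n.value {
                true // found it
            } else if target < n.value {
                search(&n.left, target) // smaller values live on the left
            } else {
                search(&n.right, target) // larger values live on the right
            }
        }
    }
}
```

On the example tree, searching for 6 visits 8, then 3, then 6, exactly as traced above.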

Operation 2: Insert

Algorithm:

  1. If tree is empty, create new node
  2. If value < current node, insert into left subtree
  3. If value > current node, insert into right subtree
  4. (If value equals current, either skip or allow duplicates)

Example: Insert 5 into BST

Original:              After insert 5:
         8                     8
        / \                   / \
       3   10                3   10
      / \    \              / \    \
     1   6    14           1   6    14
        / \   /               / \   /
       4   7 13              4   7 13
                              \  
                               5

Steps:
1. 5 < 8, go left
2. 5 > 3, go right
3. 5 < 6, go left
4. 5 > 4, go right
5. Right of 4 is empty, insert 5 there!

Time complexity: O(height) = O(log n) for balanced tree
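
A minimal sketch of the insert steps, assuming duplicates are skipped (the notes leave that choice open). The `Node` type matches the conceptual struct in these notes; the function name is an assumption.

```rust
// Pointer-based BST node, same shape as the conceptual struct in these notes.
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Inserts `value` into the tree; duplicates are silently skipped (an assumption).
fn insert(tree: &mut Option<Box<Node>>, value: i32) {
    match tree {
        // Empty spot found: this is where the new node goes.
        None => *tree = Some(Box::new(Node { value, left: None, right: None })),
        Some(n) => {
            if value < n.value {
                insert(&mut n.left, value)
            } else if value > n.value {
                insert(&mut n.right, value)
            }
            // value == n.value: already present, nothing to do
        }
    }
}
```

New values always end up as leaves, so the shape of the tree depends on the order of insertion.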

Operation 3: Find min/max

Finding minimum: Keep going left until you can't

         8
        / \
       3   10       min is leftmost node
      / \    \
     1   6    14    min is 1!
        / \   /
       4   7 13

Finding maximum: Keep going right until you can't

Max is rightmost node = 14

Time complexity: O(height)
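
Both walks are simple loops. The sketch below assumes the same pointer-based `Node` as in these notes; `node`, `find_min`, and `find_max` are names chosen for illustration.

```rust
// Pointer-based BST node, same shape as the conceptual struct in these notes.
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Hypothetical helper for writing small trees inline.
fn node(value: i32, left: Option<Box<Node>>, right: Option<Box<Node>>) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left, right }))
}

// Smallest value: keep going left until there is no left child.
fn find_min(tree: &Option<Box<Node>>) -> Option<i32> {
    let mut current = tree.as_ref()?; // empty tree has no minimum
    while let Some(left) = current.left.as_ref() {
        current = left;
    }
    Some(current.value)
}

// Largest value: keep going right until there is no right child.
fn find_max(tree: &Option<Box<Node>>) -> Option<i32> {
    let mut current = tree.as_ref()?; // empty tree has no maximum
    while let Some(right) = current.right.as_ref() {
        current = right;
    }
    Some(current.value)
}
```

Returning `Option<i32>` makes the empty-tree case explicit instead of panicking.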

Operation 4: Delete (the tricky one!)

Three cases:

Case 1: Node has no children

  • Just remove it!
Delete 13:
         8                  8
        / \                / \
       3   10             3   10
      /   /  \           /   /  \
     1   9   14    ->   1   9    14
             /
            13                (13 removed)

Case 2: Node has one child

  • Replace the node with its child (the child's whole subtree moves up with it)
Delete 10:
         7                     7
        / \                   / \
       3   10                3   9
          /       ->            /
         9                     8
        /                      
       8                      

Case 3: Node has two children (hard!)

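  • Find the in-order successor (smallest value in the right subtree)
    • i.e., go right once, then keep going left until you can't
  • Replace the node's value with the successor's value
  • Delete the successor from the right subtree
    • The successor has no left child, so this is always Case 1 or Case 2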
Delete 3:
         8                     8
        / \                   / \
       3   10                4   10   
      / \    \     ->       / \    \
     1   6    14           1   6    14
        / \   /               / \   /
       4   7 13           (rem.) 7 13  

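Why? The successor is larger than everything in the left subtree and smaller than everything else in the right subtree, so the BST property is maintained!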

Time complexity: O(height) = O(log n) for balanced tree
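
The three cases can be sketched as follows. This is one possible way to write it, not the only one: the function takes ownership of a subtree and returns the updated subtree, which sidesteps some borrow-checker friction. The `Node` type matches the conceptual struct in these notes; `node`, `min_value`, and `delete` are names chosen for this sketch.

```rust
// Pointer-based BST node, same shape as the conceptual struct in these notes.
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Hypothetical helper for writing small trees inline.
fn node(value: i32, left: Option<Box<Node>>, right: Option<Box<Node>>) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left, right }))
}

// Smallest value in a non-empty subtree: keep going left.
fn min_value(n: &Node) -> i32 {
    let mut current = n;
    while let Some(left) = current.left.as_deref() {
        current = left;
    }
    current.value
}

// Removes `value` from the subtree (if present) and returns the updated subtree.
fn delete(tree: Option<Box<Node>>, value: i32) -> Option<Box<Node>> {
    let mut n = tree?; // empty subtree: nothing to delete
    if value < n.value {
        n.left = delete(n.left.take(), value);
        Some(n)
    } else if value > n.value {
        n.right = delete(n.right.take(), value);
        Some(n)
    } else {
        match (n.left.take(), n.right.take()) {
            // Case 1: no children, just remove the node.
            (None, None) => None,
            // Case 2: one child, the child takes the node's place.
            (Some(child), None) | (None, Some(child)) => Some(child),
            // Case 3: two children, copy the in-order successor's value here,
            // then delete the successor from the right subtree.
            (Some(left), Some(right)) => {
                let successor = min_value(&right);
                n.value = successor;
                n.left = Some(left);
                n.right = delete(Some(right), successor);
                Some(n)
            }
        }
    }
}
```

A call looks like `tree = delete(tree, 3);`. Deleting a value that is not in the tree leaves it unchanged.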

Think/Pair/Share: Trace a deletion

Delete 8 from this BST:

         8
        / \
       3   10
      / \    \
     1   6    14
        / \   /
       4   7 13

BST performance and balance

Best case (balanced):

       4
      / \
     2   6          Height = 2
    / \ / \         O(log n) operations
   1  3 5  7

Worst case (degenerate - like a linked list!):

   1
    \
     2              Height = 6
      \             O(n) operations
       3
        \
         4
          \
           5
            \
             6
              \
               7

How does this happen? Insert sorted data: 1, 2, 3, 4, 5, 6, 7
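
To see the effect of insertion order, the sketch below builds a plain (non-balancing) BST from a slice and measures its height. It assumes height is counted in edges, with an empty tree at -1, which matches the "Height = 2" and "Height = 6" labels above. All function names are assumptions made for this sketch.

```rust
// Pointer-based BST node, same shape as the conceptual struct in these notes.
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Plain BST insert with no rebalancing; duplicates are skipped.
fn insert(tree: &mut Option<Box<Node>>, value: i32) {
    match tree {
        None => *tree = Some(Box::new(Node { value, left: None, right: None })),
        Some(n) => {
            if value < n.value {
                insert(&mut n.left, value)
            } else if value > n.value {
                insert(&mut n.right, value)
            }
        }
    }
}

// Inserts the values in the given order, starting from an empty tree.
fn build(values: &[i32]) -> Option<Box<Node>> {
    let mut root = None;
    for &v in values {
        insert(&mut root, v);
    }
    root
}

// Height in edges: empty tree is -1, a single node is 0.
fn height(tree: &Option<Box<Node>>) -> i32 {
    match tree {
        None => -1,
        Some(n) => 1 + height(&n.left).max(height(&n.right)),
    }
}
```

Under these assumptions, `height(&build(&[1, 2, 3, 4, 5, 6, 7]))` is 6, while inserting the same values in the order 4, 2, 6, 1, 3, 5, 7 gives height 2.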

Impact on performance

Tree Type     Height      Search      Insert      Delete
Balanced      O(log n)    O(log n)    O(log n)    O(log n)
Degenerate    O(n)        O(n)        O(n)        O(n)

With 1000 nodes:

  • Balanced: ~10 comparisons per operation
  • Degenerate: ~1000 comparisons per operation (worst case)
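
Where those numbers come from: a tree packed as tightly as possible has floor(log2 n) + 1 levels, and a search does at most one comparison per level. A small sketch of the arithmetic, with function names made up for illustration:

```rust
// Number of levels in the most compact binary tree holding `n` nodes:
// floor(log2(n)) + 1, which equals the number of bits needed to write `n`.
fn balanced_levels(n: u32) -> u32 {
    u32::BITS - n.leading_zeros()
}

// A degenerate tree is a chain: one level per node.
fn degenerate_levels(n: u32) -> u32 {
    n
}
```

For n = 1000 this gives 10 levels in the compact case versus 1000 in the chain.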

Solution: Self-balancing trees

Problem: Ordinary BST can become unbalanced

Solutions (advanced topics, FYI):

  • AVL trees: Maintain strict balance (height difference ≤ 1)
  • Red-Black trees: Relax balance slightly for faster insertions
  • B-trees: Nodes with many children, used in databases

Rust's BTreeMap and BTreeSet: Use B-trees for guaranteed O(log n) operations

For now: Understand that balance matters, real-world implementations maintain it

When to use BST?

Good for:

  • Dynamic data (frequent insertions/deletions)
  • Need to maintain sorted order
  • Range queries (find all values between x and y)
  • Fast search, insert, delete (when balanced)

Not ideal for:

  • Mostly static data (use sorted array)
  • Need constant-time operations (use hash map)
  • Very small datasets (overhead not worth it)

Use BST when: You need both dynamic updates AND sorted order
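
To illustrate why sorted order and range queries come naturally, here is a sketch of a range query on a hand-rolled BST. It is an in-order traversal that skips subtrees that cannot contain values in the range. The `Node` type matches the conceptual struct in these notes; `node` and `range_query` are names chosen for this sketch.

```rust
// Pointer-based BST node, same shape as the conceptual struct in these notes.
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Hypothetical helper for writing small trees inline.
fn node(value: i32, left: Option<Box<Node>>, right: Option<Box<Node>>) -> Option<Box<Node>> {
    Some(Box::new(Node { value, left, right }))
}

// Appends every value v with lo <= v <= hi to `out`, in sorted order.
fn range_query(tree: &Option<Box<Node>>, lo: i32, hi: i32, out: &mut Vec<i32>) {
    if let Some(n) = tree {
        // The left subtree can only hold values in range if lo < n.value.
        if lo < n.value {
            range_query(&n.left, lo, hi, out);
        }
        if lo <= n.value && n.value <= hi {
            out.push(n.value);
        }
        // The right subtree can only hold values in range if n.value < hi.
        if n.value < hi {
            range_query(&n.right, lo, hi, out);
        }
    }
}
```

Calling it with the widest possible range (`i32::MIN` to `i32::MAX`) visits every node and yields all values in sorted order.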

Rust's BTree collections

BTreeSet - for unique values in sorted order:

use std::collections::BTreeSet;
fn main() {
    let mut set = BTreeSet::new();
    set.insert(5);
    set.insert(2);
    set.insert(8);
    set.insert(1);
    // Iterate in sorted order
    for val in &set {
        println!("{}", val); 
    }
    // Search
    if set.contains(&5) {
        println!("Found 5");
    }
    // Range query
    for val in set.range(2..=5) {
        println!("{}", val);
    }
}

BTreeMap - for key-value pairs in sorted order:

use std::collections::BTreeMap;
fn main() {
    let mut grades = BTreeMap::new();
    grades.insert("Charlie", 85);
    grades.insert("Alice", 92);
    grades.insert("Bob", 88);
    // Iterate in sorted key order
    for (name, grade) in &grades {
        println!("{}: {}", name, grade);
    }
    // Lookup
    if let Some(&grade) = grades.get("Alice") {
        println!("Alice's grade: {}", grade);  
    }
    // Range query by keys
    for (name, grade) in grades.range("B".."D") {
        println!("{}: {}", name, grade);  
    }
}

Guaranteed O(log n) for search, insert, and remove!

Summary

Key takeaways

  • Trees: Hierarchical data structures (root, children, leaves)
  • Binary trees: At most 2 children per node
  • BST property: Left < Root < Right (enables binary search)
  • BST operations: Search, insert, delete all O(height)
  • Balance matters: Balanced = O(log n), Unbalanced = O(n)
  • Real implementations: Use self-balancing trees (BTreeMap, BTreeSet)

Tree structure comparison: All the trees we've seen

General Tree:        Binary Tree:         Complete Binary:
     A                   5                     42
   / | \                / \                   /  \
  B  C  D              3   8                30    25
 /|                   /\                   / \    /
E F                  1  4                10  20  15

Any # children      <=2 children          <=2 children + filled left-right


Binary Heap:                             BST:
    42                                    8
   /  \                                  / \
 30    25                               3   10
/ \    / \                             / \    \
10 20  15  8                          1   5    14

Complete binary and                  Binary and 
parent >= children                   left < root < right

Activity time (see below / on paper and reporting on gradescope)