Binary search trees

Lecture

Understand binary search trees

Organize data into a binary tree
Similar to binary heaps where the parent was greater (or lesser) than either of its children
Binary Search Tree: the value of a node is greater than the values of all nodes in its left subtree and less than the value of all nodes in the right subtree.
Enables binary search for efficient lookup, addition and removal of items.
Each comparison skips half of the remaining tree.
Complexity for search, insert and delete is $O (lo g n)$ , assuming balanced trees.

Invariant at each node:
- all left descendants $\leq$ parent
- parent $\leq$ all right descendants

Compared to binary heaps:
- different ordering of elements

Basic operations: find a key $k$

How can we do this?

Descend recursively from the root until $k$ found or stuck:
- If current node key is $k$ , return.
- If $k <$ value at the current node, go left
- If $k >$ value at the current node, go right
- If finished searching, return not found

[Example: Find the key 7 in the tree above.]

Basic operations: insert a key $k$

How can we do this?

Keep descending from the root until you leave the tree
- If $k \leq$ value at the current node, go left
- If $k >$ value at the current node, go right
Create a new node containing $k$ there

[Example: Insert value 6 in tree above.]

Basic operations: delete a node

How can we do this?

This can be more complicated because we might have to find a replacement.

Case 1 -- Leaf Node

If node is a leaf node, just remove and return.

Case 2 -- One Child

If node has only one child, remove and move the child up.

Example: Remove 4.

Remove 4 and move 6 up.

      7
     / \
    6   11

Case 3 -- Else

Otherwise, find the left-most descendant of the right subtree and move up.

This is the inorder successor.

Example? Remove 4.

In this case it is 5, so replace 4 with 5.

Cost of these operations?

O( depth of the tree )

Bad news: the depth can be made proportional to $n$ , the number of nodes. How?

Good news: smart ways to make the depth $O (lo g n)$

Balanced binary search trees

There are smart ways to rebalance the tree!

Depth: $O (lo g n)$

Binary search trees -- Implementation

Applications (range searching)
Rust: BTreeMap and BTreeSet
Tries (Prefix Trees)

Let's slightly revise our TreeNode and BinaryTree structs and methods.
The main difference is the insert_at() function.

#[derive(Debug)]
struct TreeNode {
    value: usize,
    left: Option<usize>,
    right: Option<usize>,
}

impl TreeNode {
    fn new(value: usize) -> Self {
        TreeNode {
            value,
            left: None,
            right: None,
        }
    }
}

#[derive(Debug)]
struct BinaryTree {
    nodes: Vec<TreeNode>,
    root: Option<usize>,
}

impl BinaryTree {
    fn new() -> Self {
        BinaryTree { nodes: Vec::new(), root: None }
    }

    fn insert(&mut self, value: usize) {
        let new_node_index = self.nodes.len();
        self.nodes.push(TreeNode::new(value));

        match self.root {
            Some(root_index) => self.insert_at(root_index, new_node_index, value),
            None => self.root = Some(new_node_index),
        }
    }

    fn insert_at(&mut self, current_index: usize, new_node_index: usize, value:usize) {
        let current_node = &mut self.nodes[current_index];
        if current_node.value < value {
            if current_node.right.is_none() {
                current_node.right = Some(new_node_index);
            } else {
                let right = current_node.right.unwrap();
                self.insert_at(right, new_node_index, value);
            }
        } else {
            if current_node.left.is_none() {
                current_node.left = Some(new_node_index);
            } else {
                let left = current_node.left.unwrap();
                self.insert_at(left, new_node_index, value);
            }
        }
    }
}

What happens if we insert the following values?


let mut tree = BinaryTree::new();
tree.insert(1);
tree.insert(2);
tree.insert(3);
tree.insert(6);
tree.insert(7);
tree.insert(11);
tree.insert(23);
tree.insert(34);
println!("{:#?}", tree);

BinaryTree {
    nodes: [
        TreeNode {
            value: 1,
            left: None,
            right: Some(
                1,
            ),
        },
        TreeNode {
            value: 2,
            left: None,
            right: Some(
                2,
            ),
        },
        TreeNode {
            value: 3,
            left: None,
            right: Some(
                3,
            ),
        },
        TreeNode {
            value: 6,
            left: None,
            right: Some(
                4,
            ),
        },
        TreeNode {
            value: 7,
            left: None,
            right: Some(
                5,
            ),
        },
        TreeNode {
            value: 11,
            left: None,
            right: Some(
                6,
            ),
        },
        TreeNode {
            value: 23,
            left: None,
            right: Some(
                7,
            ),
        },
        TreeNode {
            value: 34,
            left: None,
            right: None,
        },
    ],
    root: Some(
        0,
    ),
}

Unbalanced binary search trees can be inefficient!

This is a degenerate tree -- basically a linked list.

Balanced binary search trees

There are smart ways to rebalance the tree!

Depth: $O (lo g n)$
Usually additional information has to be kept at each node
Many popular, efficient examples:
- Red–black trees
- AVL trees
- BTrees (Used in Rust)
- ...
Fundamentally they all support rebalancing operations using some form of tree rotation.

We'll look at some simple approaches.

Basic operations: rebalance a tree

How can we do this?

Quite a bit more complicated
One basic idea. Find the branch that has gotten too long
- Swap the parent with the child that is at the top of the branch by making the child the parent and the parent the child
- If you picked a left child take the right subtree and make it a left child of the old parent
- if you picked a right child take left subtree and make it a right child of the old parent

Example

Let's go back to our degenerate tree. It's just one long branch.

Step 1: Rotate around 1-2-3.

Step 2: Rotate around 3-6-7.

Step 3: Rotate around 7-11-23.

A simple way to rebalance a binary tree

First do an in-order traversal to get the nodes in sorted order
Then use the middle of the sorted vector to be the root of the tree and recursively build the rest

impl BinaryTree {
    fn in_order_traversal(&self, node_index: Option<usize>)->Vec<usize> {
        let mut u: Vec<usize> = vec![];
        if let Some(index) = node_index {
            let node = &self.nodes[index];
            u = self.in_order_traversal(node.left);
            let mut v: Vec<usize> = vec![node.value];
            let mut w: Vec<usize> = self.in_order_traversal(node.right);
            u.append(&mut v);
            u.append(&mut w);
        }
        return u;
    }
}

let mut z = tree.in_order_traversal(tree.root);
z.sort();
println!("{:?}", z);

[1, 2, 3, 6, 7, 11, 23, 34]

impl BinaryTree {
    // This function recursively builds a perfectly balanced BST by:
    // 1. Finding the middle element of a sorted array to use as the root
    // 2. Recursively building the left and right subtrees using the elements
    //    before and after the middle element, respectively.
    fn balance_bst(&mut self, v: &[usize], start:usize, end:usize) -> Option<usize> {
        if start >= end {
            return None;
        }
        let mid = (start+end) / 2;
        let node_index = self.nodes.len();
        self.insert(v[mid]);
        self.nodes[node_index].left = self.balance_bst(v, start, mid);
        self.nodes[node_index].right = self.balance_bst(v, mid+1, end);
        Some(node_index)
    }
}

let mut bbtree = BinaryTree::new();
bbtree.balance_bst(&z, 0, z.len());
println!("{:#?}", bbtree);
println!("{:?}", level_order_traversal2(&bbtree));

BinaryTree {
    nodes: [
        TreeNode {
            value: 7,
            left: Some(
                1,
            ),
            right: Some(
                5,
            ),
        },
        TreeNode {
            value: 3,
            left: Some(
                2,
            ),
            right: Some(
                4,
            ),
        },
        TreeNode {
            value: 2,
            left: Some(
                3,
            ),
            right: None,
        },
        TreeNode {
            value: 1,
            left: None,
            right: None,
        },
        TreeNode {
            value: 6,
            left: None,
            right: None,
        },
        TreeNode {
            value: 23,
            left: Some(
                6,
            ),
            right: Some(
                7,
            ),
        },
        TreeNode {
            value: 11,
            left: None,
            right: None,
        },
        TreeNode {
            value: 34,
            left: None,
            right: None,
        },
    ],
    root: Some(
        0,
    ),
}
7 
3 23 
2 6 11 34 
1 
()

Produces the result:

        7
      /   \
    3      23
   / \    /  \
  2   6  11  34
 /
1

Why use binary search trees?

Hash maps and hash sets give us $O (1)$ time operations?

Reason 1:

Good worst case behavior: no need for a good hash function

Reason 2:

Can answer efficiently questions such as:
- What is the smallest/greatest element?
- What is the smallest element greater than $x$ ?
- List all elements between $x$ and $y$

Example: find the smallest element greater than $x$

Question: How can you list all elements in order in $O (n)$ time?

Answer: recursively starting from the root

visit left subtree
output current node
visit right subtree

Outputting smallest element greater than $x$ :

Like above, ignoring whole subtrees smaller than $x$
Will get the first element greater than $x$ in $O (lo g n)$ time

For balanced trees: listing $t$ first greater elements takes $O (t + lo g n)$ time

$B$ -Trees

Are there binary search trees in Rust's standard library?

Not exactly
Binary Search Trees are computationally very efficent for search/insert/delete ( $O (lo g n)$ ).
In practive, very inefficient on modern computer architectures.
- Every insert triggers a heap allocation
- Every single comparison is a cache-miss.

Enter $B$ -trees:

B-trees are balanced search trees where each node contains between B and 2B keys, with one more subtree than keys.
All leaf nodes are at the same depth, ensuring consistent O(log n) performance for search, insert, and delete operations.
B-trees are widely used in database systems for indexing and efficient range queries.
They're implemented in Rust's standard library as BTreeMap and BTreeSet for in-memory operations.
The structure is optimized for both disk and memory operations, with nodes sized appropriately for the storage medium.

By CyHawk, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11701365

What does the B stand for?

Invented by Bayer and McCreight at Boeing. Suggested explanations have been Boeing, balanced, broad, bushy, and Bayer. McCreight has said that "the more you think about what the B in B-trees means, the better you understand B-trees.

`BTreeSet` and `BTreeMap`

std::collections::BTreeSet
std::collections::BTreeMap

Sets and maps, respectively

// let's create a set
use std::collections::BTreeSet;
let mut set: BTreeSet<i32> = BTreeSet::new();
set.insert(11);
set.insert(7);
set.insert(5);
set.insert(23);
set.insert(25);

// listing a range
set.range(7..25).for_each(|x| println!("{}", x));

7
11
23

// listing a range: left and right bounds and 
// whether they are included or not
use std::ops::Bound::{Included,Excluded};
set.range((Excluded(5),Included(11))).for_each(|x| println!("{}", x));

7
11

// Iterating through the items is the in-order traversal that will give you sorted output
for i in &set {
    println!("{}", *i);
}

// let's make a map now
use std::collections::BTreeMap;
// We can rely on type inference to avoid having to fully write out the types
let mut map = BTreeMap::new();
map.insert("DS310", "Data Mechanics");
map.insert("DS210", "Programming for Data Science");
map.insert("DS120", "Foundations of Data Science I");
map.insert("DS121", "Foundations of Data Science II");
map.insert("DS122", "Foundations of Data Science III");

// Try to find a course
if !map.contains_key("DS111") {
    println!("Course not found");
}

Course not found





()

for (course, name) in &map {
    println!("{course}: \"{name}\"");
}

DS120: "Foundations of Data Science I"
DS121: "Foundations of Data Science II"
DS122: "Foundations of Data Science III"
DS210: "Programming for Data Science"
DS310: "Data Mechanics"





()

Note that the keys are sorted.

// listing a range
map.range("DS000".."DS199").for_each(|(x,y)| println!("{}: \"{}\"", x, y));

DS120: "Foundations of Data Science I"
DS121: "Foundations of Data Science II"
DS122: "Foundations of Data Science III"

BTreeMap vs HashMap

Use HashMap when:
- You need O(1) average-case lookup time for specific keys
- You don't need to maintain any particular order of elements
- You're doing mostly individual key lookups rather than range queries
- Your keys implement Hash, Eq, and PartialEq traits
Use BTreeMap when:
- You need to efficiently list all values in a specific range
- You need ordered iteration over keys/values
- You need to find the smallest/greatest element or elements between two values
- You need consistent O(log n) performance for all operations
- Your keys implement Ord trait
Key Differences:
- HashMap provides O(1) average-case lookup but can have O(n) worst-case
- BTreeMap provides guaranteed O(log n) performance for all operations
- BTreeMap maintains elements in sorted order
- HashMap is generally faster for individual lookups when you don't need ordering
Practical Examples:
- Use HashMap for: counting word frequencies, caching, quick lookups
- Use BTreeMap for: maintaining ordered data, range queries, finding nearest values
Memory Considerations:
- HashMap may use more memory due to its underlying array structure which needs resizing (doubling and copyng) when full
- BTreeMap's memory usage is more predictable and grows linearly with the number of elements -- adds new nodes as needed.

Prefix Tree (Trie)

Can be pronounced /ˈtraɪ/ (rhymes with tide) or /ˈtriː/ (tree).

A very efficient data structure for dictionary search, word suggestions, error corrections etc.

A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". Each complete English word has an arbitrary integer value associated with it. Wikipedia

Available in Rust as an external create https://docs.rs/trie-rs/latest/trie_rs/

:dep trie-rs="0.1.1"
fn testme() {
    use std::str;
    use trie_rs::TrieBuilder;
    let mut builder = TrieBuilder::<u8>::new();  
    
    builder.push("to");
    builder.push("tea");
    builder.push("ted");
    builder.push("ten");
    builder.push("teapot");
    builder.push("in");
    builder.push("inn");
    let trie = builder.build();
    println!("Find suffixes of \"te\"");
    let results_in_u8s: Vec<Vec<u8>> = trie.predictive_search("te");
    let results_in_str: Vec<&str> = results_in_u8s
        .iter()
        .map(|u8s| str::from_utf8(u8s).unwrap())
        .collect();
    println!("{:?}", results_in_str);
}

testme();

Find suffixes of "te"
["tea", "teapot", "ted", "ten"]

To Insert a Word

Start at the Root
- Begin at the root node of the trie
- The root node represents an empty string
Process Each Character
- For each character in the word you want to insert:
  - Check if the current node has a child node for that character
  - If yes: Move to that child node
  - If no: Create a new node for that character and make it a child of the current node
Mark the End
- After processing all characters, mark the final node as an end-of-word node
- This indicates that a complete word ends at this node

Let me illustrate with an example. Suppose we want to insert the word "tea" into an empty trie:

Step 1: Start at root
   (root)

Step 2: Insert 't'
   (root)
     |
     t

Step 3: Insert 'e'
   (root)
     |
     t
     |
     e

Step 4: Insert 'a'
   (root)
     |
     t
     |
     e
     |
     a*

Step 5: Mark 'a' as end-of-word (shown with *)

If we then insert "ted", the trie would look like:

   (root)
     |
     t
     |
     e
    / \
   a*  d*

The asterisk (*) marks nodes where words end. This structure allows us to:

Share common prefixes between words
Efficiently search for words
Find all words with a given prefix
Check if a word exists in the trie

Most likely your spellchecker is based on a trie

In this case, we will compare possible close matches and then sort by frequency of occurence in some corpus and present top 3-5.

If your word is not in the trie do the following:

Step 1: Find the largest prefix that is present and find the trie words with that prefix
Step 2: Delete the first letter from your word and redo Step 1
Step 3: Insert a letter (for all letters) to the beginning of the word and redo Step 1
Step 4: Replace the beginning letter with a different one (for all letters) and redo Step 1
Step 5: Transpose the first two letters and redo Step 1
Step 6: Collect all words from Steps 1-5 sort by frequency of occurrence and present top 3-5 to user

DS210 Course Notes

Binary search trees

About This Module

Prework

Prework Reading

Pre-lecture Reflections

Lecture

Learning Objectives

Understand binary search trees

Basic operations: find a key $k$

Basic operations: insert a key $k$

Basic operations: delete a node

Case 1 -- Leaf Node

Case 2 -- One Child

Case 3 -- Else

Cost of these operations?

Balanced binary search trees

Binary search trees -- Implementation

Balanced binary search trees

Basic operations: rebalance a tree

Example

A simple way to rebalance a binary tree

Why use binary search trees?

Reason 1:

Reason 2:

Example: find the smallest element greater than $x$

$B$ -Trees

`BTreeSet` and `BTreeMap`

BTreeMap vs HashMap

Prefix Tree (Trie)

To Insert a Word

Most likely your spellchecker is based on a trie

Technical Coding Challenge

Coding Challenge

Coding Challenge Review