Collections Deep Dive: Entry API, BTreeMap, and Circular Buffers

About This Module

This module provides a deep dive into advanced collection patterns essential for HW7. You'll master the Entry API for efficient HashMap/BTreeMap updates, learn BTreeMap for ordered data with range queries, use the ordered-float crate for float keys, and implement circular buffers with VecDeque.

Prework

Prework Reading

Please read the following:

Pre-lecture Reflections

  1. What's the difference between using .get() then .insert() vs using the Entry API?
  2. When would you want keys to be sorted (BTreeMap) vs unsorted (HashMap)?
  3. Why can't f64 be used directly as a HashMap/BTreeMap key?
  4. What's the difference between a regular queue and a circular buffer?

Learning Objectives

By the end of this module, you will be able to:

  • Use the Entry API to efficiently update collections without double lookups
  • Choose between HashMap and BTreeMap based on requirements
  • Use BTreeMap for ordered data, range queries, and percentile calculations
  • Work with float keys using the ordered-float crate
  • Implement circular buffers with VecDeque for rolling window calculations

Part 1: Mastering the Entry API

The Double-Lookup Problem

A common pattern when updating HashMaps:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

fn count_words_inefficient(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    
    for word in text.split_whitespace() {
        // DON'T: This does TWO lookups!
        if counts.contains_key(word) {
            let count = counts.get_mut(word).unwrap();
            *count += 1;
        } else {
            counts.insert(word.to_string(), 1);
        }
    }
    counts
}

let result = count_words_inefficient("rust is awesome rust is fast");
println!("{:?}", result);
}

Problem: We look up the key twice - once to check, once to modify.

The Entry API Solution

#![allow(unused)]
fn main() {
use std::collections::HashMap;

fn count_words_efficient(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    
    for word in text.split_whitespace() {
        // DO: Single lookup with Entry API!
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

let result = count_words_efficient("rust is awesome rust is fast");
println!("{:?}", result);
}

Understanding Entry

The .entry() method returns an Entry enum:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

let mut map: HashMap<&str, i32> = HashMap::new();

println!("{:?}", map.entry("key"));

// Entry can be Occupied or Vacant
match map.entry("key") {
    std::collections::hash_map::Entry::Occupied(entry) => {
        println!("Key exists with value: {}", entry.get());
    }
    std::collections::hash_map::Entry::Vacant(entry) => {
        println!("Key doesn't exist, inserting...");
        entry.insert(42);
    }
}

println!("{:?}", map.entry("key"));
}

Entry API Methods

  • or_insert: Insert default if vacant, return mutable reference
  • or_insert_with: Insert computed value if vacant (lazy evaluation)
  • or_default: Insert Default::default() if vacant, e.g. 0 for i32, "" for String, etc. (types with Default trait)
  • and_modify: Modify existing value, or insert default
#![allow(unused)]
fn main() {
use std::collections::HashMap;

let mut scores: HashMap<String, Vec<i32>> = HashMap::new();

// or_insert: Insert default if vacant, return mutable reference
scores.entry("Alice".to_string()).or_insert(vec![]).push(95);
scores.entry("Alice".to_string()).or_insert(vec![]).push(87);

// or_insert_with: Insert computed value if vacant (lazy evaluation)
scores.entry("Bob".to_string()).or_insert_with(|| {
    println!("Computing default for Bob...");
    vec![100]  // This only runs if key is vacant
});

// or_default: Insert Default::default() if vacant
let mut counts: HashMap<String, usize> = HashMap::new();
*counts.entry("hello".to_string()).or_default() += 1;

// and_modify: Modify existing value, or insert default
counts.entry("hello".to_string())
    .and_modify(|c| *c += 1)
    .or_insert(1);

println!("Scores: {:?}", scores);
println!("Counts: {:?}", counts);
}

Entry API for Grouping (Split-Apply-Combine)

Perfect for grouping data by categories:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

let data = vec![("A", 10), ("B", 20), ("A", 30), ("B", 40), ("A", 50)];

// Group values by category
let mut groups: HashMap<&str, Vec<i32>> = HashMap::new();

for (category, value) in data {
    groups.entry(category).or_default().push(value);
}

// Now calculate aggregates per group
for (category, values) in &groups {
    let sum: i32 = values.iter().sum();
    let mean = sum as f64 / values.len() as f64;
    println!("{}: values={:?}, sum={}, mean={:.1}", category, values, sum, mean);
}
}

HW7 Connection: This pattern is the foundation of GroupedSeries in Part 2!

Entry API Works with BTreeMap Too!

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

let mut sorted_counts: BTreeMap<String, usize> = BTreeMap::new();

for word in "rust is awesome rust is fast".split_whitespace() {
    *sorted_counts.entry(word.to_string()).or_insert(0) += 1;
}

// BTreeMap iterates in sorted key order!
for (word, count) in &sorted_counts {
    println!("{}: {}", word, count);
}
}

Part 2: BTreeMap for Ordered Data

HashMap vs BTreeMap

FeatureHashMapBTreeMap
LookupO(1) averageO(log n)
Iteration orderRandomSorted by key
Range queries❌ Not supported✅ Supported
Key requirementHash + EqOrd
MemoryLess predictableMore predictable

When to Use BTreeMap

Use BTreeMap when you need:

  • Sorted iteration over keys
  • Range queries (get all keys between X and Y)
  • Min/max key operations
  • Percentile calculations
  • Keys that don't implement Hash

Note: See modules on graphs, trees, and binary search trees for background.

BTreeMap: Sorted Iteration

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

let mut temps: BTreeMap<u32, f64> = BTreeMap::new();
temps.insert(2020, 14.9);
temps.insert(2018, 14.7);
temps.insert(2022, 15.1);
temps.insert(2019, 14.8);
temps.insert(2021, 15.0);

// Iteration is always in sorted order by key!
println!("Global temperatures by year:");
for (year, temp) in &temps {
    println!("  {}: {:.1}°C", year, temp);
}

// First and last entries
println!("\nFirst: {:?}", temps.first_key_value());
println!("Last: {:?}", temps.last_key_value());
}

Note the order of years inserted and the order from the iteration.

BTreeMap: Range Queries

One of BTreeMap's killer features:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
use std::ops::Bound;

let mut events: BTreeMap<u64, String> = BTreeMap::new();
events.insert(100, "Login".to_string());
events.insert(150, "View page".to_string());
events.insert(200, "Click button".to_string());
events.insert(250, "Submit form".to_string());
events.insert(300, "Logout".to_string());

// Get events in time range [150, 250]
println!("Events from 150-250:");
for (time, event) in events.range(150..=250) {
    println!("  t={}: {}", time, event);
}

// Events before time 200
println!("\nEvents before 200:");
for (time, event) in events.range(..200) {
    println!("  t={}: {}", time, event);
}

// Using Bound for more control
use std::ops::Bound::{Included, Excluded};
println!("\nEvents in (150, 300):");
for (time, event) in events.range((Excluded(150), Excluded(300))) {
    println!("  t={}: {}", time, event);
}
}

BTreeMap for Histogram Bins

Perfect for building sorted histograms:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

fn build_histogram(data: &[f64], bin_width: f64) -> BTreeMap<i64, usize> {
    let mut bins: BTreeMap<i64, usize> = BTreeMap::new();
    
    for &value in data {
        // Calculate bin index (floor division)
        let bin = (value / bin_width).floor() as i64;
        *bins.entry(bin).or_insert(0) += 1;
    }
    
    bins
}

let data = vec![1.2, 2.5, 2.7, 3.1, 3.8, 4.2, 4.5, 5.0, 5.5];
let hist = build_histogram(&data, 1.0);

println!("Histogram (bin_width=1.0):");
for (bin, count) in &hist {
    let start = *bin as f64;
    let end = start + 1.0;
    let bar = "*".repeat(*count);
    println!("  [{:.1}, {:.1}): {} {}", start, end, bar, count);
}
}

HW7 Connection: This is essentially what Histogram in Part 3 does!

Part 3: Using Floats as Keys with ordered-float

The Problem with Float Keys

use std::collections::BTreeMap;

// This WON'T compile!
let mut map: BTreeMap<f64, String> = BTreeMap::new();
map.insert(3.14, "pi".to_string());

// Error: the trait bound `f64: Ord` is not satisfied

Why? Floats have NaN (Not a Number) which breaks ordering:

  • NaN != NaN (violates reflexivity)
  • NaN is not less than, equal to, or greater than any value

Solution: ordered-float Crate

Add to Cargo.toml:

[dependencies]
ordered-float = "4.2"

Then use OrderedFloat:

use ordered_float::OrderedFloat;
use std::collections::BTreeMap;

fn main() {
    let mut map: BTreeMap<OrderedFloat<f64>, String> = BTreeMap::new();
    
    // Wrap floats in OrderedFloat
    map.insert(OrderedFloat(3.14), "pi".to_string());
    map.insert(OrderedFloat(2.72), "e".to_string());
    map.insert(OrderedFloat(1.41), "sqrt(2)".to_string());
    
    // Iteration is sorted by float value!
    for (key, value) in &map {
        println!("{:.2}: {}", key.0, value);
    }
    
    // Access the inner value with .0
    let pi_key = OrderedFloat(3.14);
    println!("\nLookup {}: {:?}", pi_key.0, map.get(&pi_key));
}

OrderedFloat for Histogram Bins

use ordered_float::OrderedFloat;
use std::collections::BTreeMap;

struct Histogram {
    bins: BTreeMap<OrderedFloat<f64>, usize>,
    bin_width: f64,
}

impl Histogram {
    fn new(bin_width: f64) -> Self {
        Histogram {
            bins: BTreeMap::new(),
            bin_width,
        }
    }
    
    fn add(&mut self, value: f64) {
        let bin_edge = (value / self.bin_width).floor() * self.bin_width;
        *self.bins.entry(OrderedFloat(bin_edge)).or_insert(0) += 1;
    }
    
    fn get_count(&self, value: f64) -> usize {
        let bin_edge = (value / self.bin_width).floor() * self.bin_width;
        self.bins.get(&OrderedFloat(bin_edge)).copied().unwrap_or(0)
    }
    
    fn cumulative_distribution(&self) -> Vec<(f64, f64)> {
        let total: usize = self.bins.values().sum();
        let mut cumulative = 0;
        
        self.bins.iter()
            .map(|(bin_edge, &count)| {
                cumulative += count;
                (bin_edge.0 + self.bin_width / 2.0, cumulative as f64 / total as f64)
            })
            .collect()
    }
}

HW7 Connection: This is exactly how Histogram in Part 3 is structured!

Part 4: VecDeque for Circular Buffers

What is a Circular Buffer?

A circular buffer (ring buffer) is a fixed-size data structure that:

  • Overwrites oldest data when full
  • Perfect for "sliding window" or "rolling" calculations
  • Efficient O(1) operations at both ends
Initial (capacity 4):
[_, _, _, _]  (empty)

After push 1, 2, 3:
[1, 2, 3, _]

After push 4:
[1, 2, 3, 4]  (full)

After push 5 (overwrites oldest):
[5, 2, 3, 4] → conceptually [2, 3, 4, 5]

VecDeque Review

#![allow(unused)]
fn main() {
use std::collections::VecDeque;

let mut deque: VecDeque<i32> = VecDeque::new();

// Add to back (queue behavior)
deque.push_back(1);
deque.push_back(2);
deque.push_back(3);

println!("Deque: {:?}", deque);  // [1, 2, 3]

// Remove from front
let first = deque.pop_front();
println!("Popped: {:?}", first);  // Some(1)
println!("Deque: {:?}", deque);   // [2, 3]

// Also supports push_front and pop_back
deque.push_front(0);
println!("Deque: {:?}", deque);   // [0, 2, 3]
}

Implementing a Rolling Buffer

#![allow(unused)]
fn main() {
use std::collections::VecDeque;

struct RollingBuffer {
    buffer: VecDeque<f64>,
    capacity: usize,
}

impl RollingBuffer {
    fn new(capacity: usize) -> Self {
        RollingBuffer {
            buffer: VecDeque::with_capacity(capacity),
            capacity,
        }
    }
    
    fn push(&mut self, value: f64) {
        if self.buffer.len() == self.capacity {
            self.buffer.pop_front();  // Remove oldest
        }
        self.buffer.push_back(value);  // Add newest
    }
    
    fn mean(&self) -> Option<f64> {
        if self.buffer.is_empty() {
            None
        } else {
            let sum: f64 = self.buffer.iter().sum();
            Some(sum / self.buffer.len() as f64)
        }
    }
    
    fn is_full(&self) -> bool {
        self.buffer.len() == self.capacity
    }
}

// Example: Rolling average of last 3 values
let mut rolling = RollingBuffer::new(3);

for value in [10.0, 20.0, 30.0, 40.0, 50.0] {
    rolling.push(value);
    println!("After {}: mean = {:?}, full = {}", 
             value, rolling.mean(), rolling.is_full());
}
}

Rolling Statistics Applications

#![allow(unused)]
fn main() {
use std::collections::VecDeque;

struct RollingStats {
    buffer: VecDeque<f64>,
    capacity: usize,
}

impl RollingStats {
    fn new(capacity: usize) -> Self {
        RollingStats {
            buffer: VecDeque::with_capacity(capacity),
            capacity,
        }
    }
    
    fn push(&mut self, value: f64) {
        if self.buffer.len() == self.capacity {
            self.buffer.pop_front();
        }
        self.buffer.push_back(value);
    }
    
    fn mean(&self) -> Option<f64> {
        if self.buffer.is_empty() {
            return None;
        }
        let sum: f64 = self.buffer.iter().sum();
        Some(sum / self.buffer.len() as f64)
    }
    
    fn std_dev(&self) -> Option<f64> {
        if self.buffer.len() < 2 {
            return None;
        }
        let mean = self.mean()?;
        let variance: f64 = self.buffer.iter()
            .map(|&x| (x - mean).powi(2))
            .sum::<f64>() / (self.buffer.len() - 1) as f64;
        Some(variance.sqrt())
    }
}

// Detect anomalies using rolling statistics
let data = [100.0, 102.0, 98.0, 101.0, 150.0, 99.0, 103.0];
let mut stats = RollingStats::new(4);

for &value in &data {
    stats.push(value);
    if let (Some(mean), Some(std)) = (stats.mean(), stats.std_dev()) {
        let z_score = (value - mean).abs() / std;
        if z_score > 2.0 {
            println!("ANOMALY: {} (z-score: {:.2})", value, z_score);
        } else {
            println!("Normal: {} (mean: {:.1}, std: {:.1})", value, mean, std);
        }
    }
}
}

HW7 Connection: This is the RollingBuffer you'll implement in Part 3!

Summary: Collections for HW7

HW7 PartCollections UsedKey Patterns
Part 1HashMap, HashSetEntry API for counting, set operations
Part 2HashMapEntry API for grouping, split-apply-combine
Part 3BTreeMap, VecDequeOrderedFloat for keys, circular buffer

Key Takeaways

  1. Entry API eliminates double lookups - use it everywhere!
  2. BTreeMap when you need sorted keys or range queries
  3. ordered-float enables float keys in ordered collections
  4. VecDeque is perfect for fixed-size sliding windows

In-Class Exercise: Rolling Window Statistics

Task: Implement a function that computes a rolling mean over a data stream.

Given a stream of temperature readings and a window size, output the rolling mean after each reading.

Use Rust Playground or VSCode to develop your solution.

fn rolling_mean(data: &[f64], window_size: usize) -> Vec<f64> {
    // TODO: Implement using VecDeque
    todo!()
}

// Test it
let data = vec![20.0, 22.0, 21.0, 23.0, 25.0, 24.0];
let means = rolling_mean(&data, 3);

for (i, (val, mean)) in data.iter().zip(means.iter()).enumerate() {
    println!("Step {}: value={}, rolling_mean={:.1}", i, val, mean);
}
// Output:
// Step 0: value=20, rolling_mean=20.0  (window: [20])
// Step 1: value=22, rolling_mean=21.0  (window: [20, 22])
// Step 2: value=21, rolling_mean=21.0  (window: [20, 22, 21])
// Step 3: value=23, rolling_mean=22.0  (window: [22, 21, 23])
// Step 4: value=25, rolling_mean=23.0  (window: [21, 23, 25])
// Step 5: value=24, rolling_mean=24.0  (window: [23, 25, 24])

Bonus: Add detection of values more than 2 standard deviations from the rolling mean.

Solution
#![allow(unused)]
fn main() {
use std::collections::VecDeque;

fn rolling_mean(data: &[f64], window_size: usize) -> Vec<f64> {
    let mut buffer: VecDeque<f64> = VecDeque::with_capacity(window_size);
    let mut results = Vec::new();
    
    for &value in data {
        // If buffer is full, remove oldest value
        if buffer.len() == window_size {
            buffer.pop_front();
        }
        
        // Add new value
        buffer.push_back(value);
        
        // Calculate mean of current window
        let sum: f64 = buffer.iter().sum();
        let mean = sum / buffer.len() as f64;
        results.push(mean);
    }
    
    results
}

// Test it
let data = vec![20.0, 22.0, 21.0, 23.0, 25.0, 24.0];
let means = rolling_mean(&data, 3);

for (i, (val, mean)) in data.iter().zip(means.iter()).enumerate() {
    println!("Step {}: value={}, rolling_mean={:.1}", i, val, mean);
}
// Output:
// Step 0: value=20, rolling_mean=20.0  (window: [20])
// Step 1: value=22, rolling_mean=21.0  (window: [20, 22])
// Step 2: value=21, rolling_mean=21.0  (window: [20, 22, 21])
// Step 3: value=23, rolling_mean=22.0  (window: [22, 21, 23])
// Step 4: value=25, rolling_mean=23.0  (window: [21, 23, 25])
// Step 5: value=24, rolling_mean=24.0  (window: [23, 25, 24])
}

Bonus Solution (with anomaly detection):

#![allow(unused)]
fn main() {
use std::collections::VecDeque;

fn rolling_mean_with_anomaly(data: &[f64], window_size: usize) -> Vec<(f64, bool)> {
    let mut buffer: VecDeque<f64> = VecDeque::with_capacity(window_size);
    let mut results = Vec::new();
    
    for &value in data {
        // Check for anomaly BEFORE adding to buffer (compare to previous window)
        let is_anomaly = if buffer.len() >= 2 {
            let mean: f64 = buffer.iter().sum::<f64>() / buffer.len() as f64;
            let variance: f64 = buffer.iter()
                .map(|&x| (x - mean).powi(2))
                .sum::<f64>() / (buffer.len() - 1) as f64;
            let std_dev = variance.sqrt();
            
            // Z-score > 2 means anomaly
            std_dev > 0.0 && (value - mean).abs() / std_dev > 2.0
        } else {
            false
        };
        
        // Update buffer
        if buffer.len() == window_size {
            buffer.pop_front();
        }
        buffer.push_back(value);
        
        // Calculate current mean
        let mean = buffer.iter().sum::<f64>() / buffer.len() as f64;
        results.push((mean, is_anomaly));
    }
    
    results
}

// Test with an anomaly
let data = vec![20.0, 21.0, 20.0, 22.0, 50.0, 21.0];  // 50.0 is anomaly
let results = rolling_mean_with_anomaly(&data, 4);

for (i, (val, (mean, anomaly))) in data.iter().zip(results.iter()).enumerate() {
    let flag = if *anomaly { " ⚠️ ANOMALY!" } else { "" };
    println!("value={}: mean={:.1}{}", val, mean, flag);
}
}

Key insights:

  • VecDeque gives O(1) push_back and pop_front operations
  • The window naturally "slides" by removing oldest and adding newest
  • For anomaly detection, compare the new value against the previous window's statistics

Next Lecture Preview

In the next lecture, we'll cover:

  • Quantile and percentile calculations
  • Graph representation and traversal (BFS, DFS)
  • Algorithm design patterns