Collections Deep Dive: Entry API, BTreeMap, and Circular Buffers
About This Module
This module provides a deep dive into advanced collection patterns essential for HW7. You'll master the Entry API for efficient HashMap/BTreeMap updates, learn BTreeMap for ordered data with range queries, use the ordered-float crate for float keys, and implement circular buffers with VecDeque.
Prework
Prework Reading
Please read the following:
- The Rust Book: HashMap - Review Entry API section
- BTreeMap Documentation
- VecDeque Documentation
- ordered-float crate
Pre-lecture Reflections
- What's the difference between using
.get()then.insert()vs using the Entry API? - When would you want keys to be sorted (BTreeMap) vs unsorted (HashMap)?
- Why can't
f64be used directly as a HashMap/BTreeMap key? - What's the difference between a regular queue and a circular buffer?
Learning Objectives
By the end of this module, you will be able to:
- Use the Entry API to efficiently update collections without double lookups
- Choose between HashMap and BTreeMap based on requirements
- Use BTreeMap for ordered data, range queries, and percentile calculations
- Work with float keys using the ordered-float crate
- Implement circular buffers with VecDeque for rolling window calculations
Part 1: Mastering the Entry API
The Double-Lookup Problem
A common pattern when updating HashMaps:
#![allow(unused)] fn main() { use std::collections::HashMap; fn count_words_inefficient(text: &str) -> HashMap<String, usize> { let mut counts = HashMap::new(); for word in text.split_whitespace() { // DON'T: This does TWO lookups! if counts.contains_key(word) { let count = counts.get_mut(word).unwrap(); *count += 1; } else { counts.insert(word.to_string(), 1); } } counts } let result = count_words_inefficient("rust is awesome rust is fast"); println!("{:?}", result); }
Problem: We look up the key twice - once to check, once to modify.
The Entry API Solution
#![allow(unused)] fn main() { use std::collections::HashMap; fn count_words_efficient(text: &str) -> HashMap<String, usize> { let mut counts = HashMap::new(); for word in text.split_whitespace() { // DO: Single lookup with Entry API! *counts.entry(word.to_string()).or_insert(0) += 1; } counts } let result = count_words_efficient("rust is awesome rust is fast"); println!("{:?}", result); }
Understanding Entry
The .entry() method returns an Entry enum:
#![allow(unused)] fn main() { use std::collections::HashMap; let mut map: HashMap<&str, i32> = HashMap::new(); println!("{:?}", map.entry("key")); // Entry can be Occupied or Vacant match map.entry("key") { std::collections::hash_map::Entry::Occupied(entry) => { println!("Key exists with value: {}", entry.get()); } std::collections::hash_map::Entry::Vacant(entry) => { println!("Key doesn't exist, inserting..."); entry.insert(42); } } println!("{:?}", map.entry("key")); }
Entry API Methods
or_insert: Insert default if vacant, return mutable referenceor_insert_with: Insert computed value if vacant (lazy evaluation)or_default: Insert Default::default() if vacant, e.g. 0 for i32, "" for String, etc. (types with Default trait)and_modify: Modify existing value, or insert default
#![allow(unused)] fn main() { use std::collections::HashMap; let mut scores: HashMap<String, Vec<i32>> = HashMap::new(); // or_insert: Insert default if vacant, return mutable reference scores.entry("Alice".to_string()).or_insert(vec![]).push(95); scores.entry("Alice".to_string()).or_insert(vec![]).push(87); // or_insert_with: Insert computed value if vacant (lazy evaluation) scores.entry("Bob".to_string()).or_insert_with(|| { println!("Computing default for Bob..."); vec![100] // This only runs if key is vacant }); // or_default: Insert Default::default() if vacant let mut counts: HashMap<String, usize> = HashMap::new(); *counts.entry("hello".to_string()).or_default() += 1; // and_modify: Modify existing value, or insert default counts.entry("hello".to_string()) .and_modify(|c| *c += 1) .or_insert(1); println!("Scores: {:?}", scores); println!("Counts: {:?}", counts); }
Entry API for Grouping (Split-Apply-Combine)
Perfect for grouping data by categories:
#![allow(unused)] fn main() { use std::collections::HashMap; let data = vec![("A", 10), ("B", 20), ("A", 30), ("B", 40), ("A", 50)]; // Group values by category let mut groups: HashMap<&str, Vec<i32>> = HashMap::new(); for (category, value) in data { groups.entry(category).or_default().push(value); } // Now calculate aggregates per group for (category, values) in &groups { let sum: i32 = values.iter().sum(); let mean = sum as f64 / values.len() as f64; println!("{}: values={:?}, sum={}, mean={:.1}", category, values, sum, mean); } }
HW7 Connection: This pattern is the foundation of
GroupedSeriesin Part 2!
Entry API Works with BTreeMap Too!
#![allow(unused)] fn main() { use std::collections::BTreeMap; let mut sorted_counts: BTreeMap<String, usize> = BTreeMap::new(); for word in "rust is awesome rust is fast".split_whitespace() { *sorted_counts.entry(word.to_string()).or_insert(0) += 1; } // BTreeMap iterates in sorted key order! for (word, count) in &sorted_counts { println!("{}: {}", word, count); } }
Part 2: BTreeMap for Ordered Data
HashMap vs BTreeMap
| Feature | HashMap | BTreeMap |
|---|---|---|
| Lookup | O(1) average | O(log n) |
| Iteration order | Random | Sorted by key |
| Range queries | ❌ Not supported | ✅ Supported |
| Key requirement | Hash + Eq | Ord |
| Memory | Less predictable | More predictable |
When to Use BTreeMap
Use BTreeMap when you need:
- Sorted iteration over keys
- Range queries (get all keys between X and Y)
- Min/max key operations
- Percentile calculations
- Keys that don't implement
Hash
Note: See modules on graphs, trees, and binary search trees for background.
BTreeMap: Sorted Iteration
#![allow(unused)] fn main() { use std::collections::BTreeMap; let mut temps: BTreeMap<u32, f64> = BTreeMap::new(); temps.insert(2020, 14.9); temps.insert(2018, 14.7); temps.insert(2022, 15.1); temps.insert(2019, 14.8); temps.insert(2021, 15.0); // Iteration is always in sorted order by key! println!("Global temperatures by year:"); for (year, temp) in &temps { println!(" {}: {:.1}°C", year, temp); } // First and last entries println!("\nFirst: {:?}", temps.first_key_value()); println!("Last: {:?}", temps.last_key_value()); }
Note the order of years inserted and the order from the iteration.
BTreeMap: Range Queries
One of BTreeMap's killer features:
#![allow(unused)] fn main() { use std::collections::BTreeMap; use std::ops::Bound; let mut events: BTreeMap<u64, String> = BTreeMap::new(); events.insert(100, "Login".to_string()); events.insert(150, "View page".to_string()); events.insert(200, "Click button".to_string()); events.insert(250, "Submit form".to_string()); events.insert(300, "Logout".to_string()); // Get events in time range [150, 250] println!("Events from 150-250:"); for (time, event) in events.range(150..=250) { println!(" t={}: {}", time, event); } // Events before time 200 println!("\nEvents before 200:"); for (time, event) in events.range(..200) { println!(" t={}: {}", time, event); } // Using Bound for more control use std::ops::Bound::{Included, Excluded}; println!("\nEvents in (150, 300):"); for (time, event) in events.range((Excluded(150), Excluded(300))) { println!(" t={}: {}", time, event); } }
BTreeMap for Histogram Bins
Perfect for building sorted histograms:
#![allow(unused)] fn main() { use std::collections::BTreeMap; fn build_histogram(data: &[f64], bin_width: f64) -> BTreeMap<i64, usize> { let mut bins: BTreeMap<i64, usize> = BTreeMap::new(); for &value in data { // Calculate bin index (floor division) let bin = (value / bin_width).floor() as i64; *bins.entry(bin).or_insert(0) += 1; } bins } let data = vec![1.2, 2.5, 2.7, 3.1, 3.8, 4.2, 4.5, 5.0, 5.5]; let hist = build_histogram(&data, 1.0); println!("Histogram (bin_width=1.0):"); for (bin, count) in &hist { let start = *bin as f64; let end = start + 1.0; let bar = "*".repeat(*count); println!(" [{:.1}, {:.1}): {} {}", start, end, bar, count); } }
HW7 Connection: This is essentially what
Histogramin Part 3 does!
Part 3: Using Floats as Keys with ordered-float
The Problem with Float Keys
use std::collections::BTreeMap;
// This WON'T compile!
let mut map: BTreeMap<f64, String> = BTreeMap::new();
map.insert(3.14, "pi".to_string());
// Error: the trait bound `f64: Ord` is not satisfied
Why? Floats have NaN (Not a Number) which breaks ordering:
NaN != NaN(violates reflexivity)NaNis not less than, equal to, or greater than any value
Solution: ordered-float Crate
Add to Cargo.toml:
[dependencies]
ordered-float = "4.2"
Then use OrderedFloat:
use ordered_float::OrderedFloat;
use std::collections::BTreeMap;
fn main() {
let mut map: BTreeMap<OrderedFloat<f64>, String> = BTreeMap::new();
// Wrap floats in OrderedFloat
map.insert(OrderedFloat(3.14), "pi".to_string());
map.insert(OrderedFloat(2.72), "e".to_string());
map.insert(OrderedFloat(1.41), "sqrt(2)".to_string());
// Iteration is sorted by float value!
for (key, value) in &map {
println!("{:.2}: {}", key.0, value);
}
// Access the inner value with .0
let pi_key = OrderedFloat(3.14);
println!("\nLookup {}: {:?}", pi_key.0, map.get(&pi_key));
}
OrderedFloat for Histogram Bins
use ordered_float::OrderedFloat;
use std::collections::BTreeMap;
struct Histogram {
bins: BTreeMap<OrderedFloat<f64>, usize>,
bin_width: f64,
}
impl Histogram {
fn new(bin_width: f64) -> Self {
Histogram {
bins: BTreeMap::new(),
bin_width,
}
}
fn add(&mut self, value: f64) {
let bin_edge = (value / self.bin_width).floor() * self.bin_width;
*self.bins.entry(OrderedFloat(bin_edge)).or_insert(0) += 1;
}
fn get_count(&self, value: f64) -> usize {
let bin_edge = (value / self.bin_width).floor() * self.bin_width;
self.bins.get(&OrderedFloat(bin_edge)).copied().unwrap_or(0)
}
fn cumulative_distribution(&self) -> Vec<(f64, f64)> {
let total: usize = self.bins.values().sum();
let mut cumulative = 0;
self.bins.iter()
.map(|(bin_edge, &count)| {
cumulative += count;
(bin_edge.0 + self.bin_width / 2.0, cumulative as f64 / total as f64)
})
.collect()
}
}
HW7 Connection: This is exactly how
Histogramin Part 3 is structured!
Part 4: VecDeque for Circular Buffers
What is a Circular Buffer?
A circular buffer (ring buffer) is a fixed-size data structure that:
- Overwrites oldest data when full
- Perfect for "sliding window" or "rolling" calculations
- Efficient O(1) operations at both ends
Initial (capacity 4):
[_, _, _, _] (empty)
After push 1, 2, 3:
[1, 2, 3, _]
After push 4:
[1, 2, 3, 4] (full)
After push 5 (overwrites oldest):
[5, 2, 3, 4] → conceptually [2, 3, 4, 5]
VecDeque Review
#![allow(unused)] fn main() { use std::collections::VecDeque; let mut deque: VecDeque<i32> = VecDeque::new(); // Add to back (queue behavior) deque.push_back(1); deque.push_back(2); deque.push_back(3); println!("Deque: {:?}", deque); // [1, 2, 3] // Remove from front let first = deque.pop_front(); println!("Popped: {:?}", first); // Some(1) println!("Deque: {:?}", deque); // [2, 3] // Also supports push_front and pop_back deque.push_front(0); println!("Deque: {:?}", deque); // [0, 2, 3] }
Implementing a Rolling Buffer
#![allow(unused)] fn main() { use std::collections::VecDeque; struct RollingBuffer { buffer: VecDeque<f64>, capacity: usize, } impl RollingBuffer { fn new(capacity: usize) -> Self { RollingBuffer { buffer: VecDeque::with_capacity(capacity), capacity, } } fn push(&mut self, value: f64) { if self.buffer.len() == self.capacity { self.buffer.pop_front(); // Remove oldest } self.buffer.push_back(value); // Add newest } fn mean(&self) -> Option<f64> { if self.buffer.is_empty() { None } else { let sum: f64 = self.buffer.iter().sum(); Some(sum / self.buffer.len() as f64) } } fn is_full(&self) -> bool { self.buffer.len() == self.capacity } } // Example: Rolling average of last 3 values let mut rolling = RollingBuffer::new(3); for value in [10.0, 20.0, 30.0, 40.0, 50.0] { rolling.push(value); println!("After {}: mean = {:?}, full = {}", value, rolling.mean(), rolling.is_full()); } }
Rolling Statistics Applications
#![allow(unused)] fn main() { use std::collections::VecDeque; struct RollingStats { buffer: VecDeque<f64>, capacity: usize, } impl RollingStats { fn new(capacity: usize) -> Self { RollingStats { buffer: VecDeque::with_capacity(capacity), capacity, } } fn push(&mut self, value: f64) { if self.buffer.len() == self.capacity { self.buffer.pop_front(); } self.buffer.push_back(value); } fn mean(&self) -> Option<f64> { if self.buffer.is_empty() { return None; } let sum: f64 = self.buffer.iter().sum(); Some(sum / self.buffer.len() as f64) } fn std_dev(&self) -> Option<f64> { if self.buffer.len() < 2 { return None; } let mean = self.mean()?; let variance: f64 = self.buffer.iter() .map(|&x| (x - mean).powi(2)) .sum::<f64>() / (self.buffer.len() - 1) as f64; Some(variance.sqrt()) } } // Detect anomalies using rolling statistics let data = [100.0, 102.0, 98.0, 101.0, 150.0, 99.0, 103.0]; let mut stats = RollingStats::new(4); for &value in &data { stats.push(value); if let (Some(mean), Some(std)) = (stats.mean(), stats.std_dev()) { let z_score = (value - mean).abs() / std; if z_score > 2.0 { println!("ANOMALY: {} (z-score: {:.2})", value, z_score); } else { println!("Normal: {} (mean: {:.1}, std: {:.1})", value, mean, std); } } } }
HW7 Connection: This is the
RollingBufferyou'll implement in Part 3!
Summary: Collections for HW7
| HW7 Part | Collections Used | Key Patterns |
|---|---|---|
| Part 1 | HashMap, HashSet | Entry API for counting, set operations |
| Part 2 | HashMap | Entry API for grouping, split-apply-combine |
| Part 3 | BTreeMap, VecDeque | OrderedFloat for keys, circular buffer |
Key Takeaways
- Entry API eliminates double lookups - use it everywhere!
- BTreeMap when you need sorted keys or range queries
- ordered-float enables float keys in ordered collections
- VecDeque is perfect for fixed-size sliding windows
In-Class Exercise: Rolling Window Statistics
Task: Implement a function that computes a rolling mean over a data stream.
Given a stream of temperature readings and a window size, output the rolling mean after each reading.
Use Rust Playground or VSCode to develop your solution.
fn rolling_mean(data: &[f64], window_size: usize) -> Vec<f64> {
// TODO: Implement using VecDeque
todo!()
}
// Test it
let data = vec![20.0, 22.0, 21.0, 23.0, 25.0, 24.0];
let means = rolling_mean(&data, 3);
for (i, (val, mean)) in data.iter().zip(means.iter()).enumerate() {
println!("Step {}: value={}, rolling_mean={:.1}", i, val, mean);
}
// Output:
// Step 0: value=20, rolling_mean=20.0 (window: [20])
// Step 1: value=22, rolling_mean=21.0 (window: [20, 22])
// Step 2: value=21, rolling_mean=21.0 (window: [20, 22, 21])
// Step 3: value=23, rolling_mean=22.0 (window: [22, 21, 23])
// Step 4: value=25, rolling_mean=23.0 (window: [21, 23, 25])
// Step 5: value=24, rolling_mean=24.0 (window: [23, 25, 24])
Bonus: Add detection of values more than 2 standard deviations from the rolling mean.
Solution
#![allow(unused)] fn main() { use std::collections::VecDeque; fn rolling_mean(data: &[f64], window_size: usize) -> Vec<f64> { let mut buffer: VecDeque<f64> = VecDeque::with_capacity(window_size); let mut results = Vec::new(); for &value in data { // If buffer is full, remove oldest value if buffer.len() == window_size { buffer.pop_front(); } // Add new value buffer.push_back(value); // Calculate mean of current window let sum: f64 = buffer.iter().sum(); let mean = sum / buffer.len() as f64; results.push(mean); } results } // Test it let data = vec![20.0, 22.0, 21.0, 23.0, 25.0, 24.0]; let means = rolling_mean(&data, 3); for (i, (val, mean)) in data.iter().zip(means.iter()).enumerate() { println!("Step {}: value={}, rolling_mean={:.1}", i, val, mean); } // Output: // Step 0: value=20, rolling_mean=20.0 (window: [20]) // Step 1: value=22, rolling_mean=21.0 (window: [20, 22]) // Step 2: value=21, rolling_mean=21.0 (window: [20, 22, 21]) // Step 3: value=23, rolling_mean=22.0 (window: [22, 21, 23]) // Step 4: value=25, rolling_mean=23.0 (window: [21, 23, 25]) // Step 5: value=24, rolling_mean=24.0 (window: [23, 25, 24]) }
Bonus Solution (with anomaly detection):
#![allow(unused)] fn main() { use std::collections::VecDeque; fn rolling_mean_with_anomaly(data: &[f64], window_size: usize) -> Vec<(f64, bool)> { let mut buffer: VecDeque<f64> = VecDeque::with_capacity(window_size); let mut results = Vec::new(); for &value in data { // Check for anomaly BEFORE adding to buffer (compare to previous window) let is_anomaly = if buffer.len() >= 2 { let mean: f64 = buffer.iter().sum::<f64>() / buffer.len() as f64; let variance: f64 = buffer.iter() .map(|&x| (x - mean).powi(2)) .sum::<f64>() / (buffer.len() - 1) as f64; let std_dev = variance.sqrt(); // Z-score > 2 means anomaly std_dev > 0.0 && (value - mean).abs() / std_dev > 2.0 } else { false }; // Update buffer if buffer.len() == window_size { buffer.pop_front(); } buffer.push_back(value); // Calculate current mean let mean = buffer.iter().sum::<f64>() / buffer.len() as f64; results.push((mean, is_anomaly)); } results } // Test with an anomaly let data = vec![20.0, 21.0, 20.0, 22.0, 50.0, 21.0]; // 50.0 is anomaly let results = rolling_mean_with_anomaly(&data, 4); for (i, (val, (mean, anomaly))) in data.iter().zip(results.iter()).enumerate() { let flag = if *anomaly { " ⚠️ ANOMALY!" } else { "" }; println!("value={}: mean={:.1}{}", val, mean, flag); } }
Key insights:
- VecDeque gives O(1) push_back and pop_front operations
- The window naturally "slides" by removing oldest and adding newest
- For anomaly detection, compare the new value against the previous window's statistics
Next Lecture Preview
In the next lecture, we'll cover:
- Quantile and percentile calculations
- Graph representation and traversal (BFS, DFS)
- Algorithm design patterns