Algorithms for Data Science: Quantiles, Graphs, and Algorithm Design
About This Module
This module covers essential algorithms for data science applications. You'll learn quantile calculations for statistical analysis, graph representation and traversal algorithms (BFS/DFS), and algorithm design patterns including split-apply-combine, greedy algorithms, and divide-and-conquer approaches.
Prework
Prework Reading
Please read the following:
- Wikipedia: Percentile - Focus on calculation methods
- Wikipedia: Breadth-first search
- Wikipedia: Depth-first search
Pre-lecture Reflections
- What's the difference between percentile and quantile?
- Why might BFS find the shortest path in an unweighted graph?
- When would you use DFS vs BFS for graph exploration?
- What is the "greedy" approach to solving problems?
Learning Objectives
By the end of this module, you will be able to:
- Calculate quantiles and percentiles using linear interpolation
- Understand interquartile range (IQR) and its uses
- Implement ranking algorithms (standard and dense rank)
- Represent graphs using adjacency lists
- Implement BFS and DFS traversals
- Apply algorithm design patterns to data problems
Part 1: Quantiles and Statistical Algorithms
What are Quantiles?
Quantiles divide sorted data into equal parts:
- Quartiles (4 parts): Q1 (25%), Q2/median (50%), Q3 (75%)
- Percentiles (100 parts): P50 = median, P95 = 95th percentile
- Deciles (10 parts): D1 (10%), D5 (50%), etc.
```
Sorted data: [1, 2, 3, 4, 5, 6, 7, 8, 9]
                    ↑     ↑     ↑
                    Q1    Q2    Q3
                   (25%) (50%) (75%)
```
Calculating Quantiles: Linear Interpolation
For quantile q (0.0 to 1.0) on sorted data of length n:
```
position  = q * (n - 1)
lower_idx = floor(position)
upper_idx = ceil(position)
fraction  = position - lower_idx

if lower_idx == upper_idx:
    result = data[lower_idx]
else:
    result = data[lower_idx] * (1 - fraction)
           + data[upper_idx] * fraction
```
Quantile Implementation
```rust
fn quantile(data: &[f64], q: f64) -> Option<f64> {
    if data.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }

    // Sort the data
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());

    // Calculate position
    let pos = q * (sorted.len() - 1) as f64;
    let lower = pos.floor() as usize;
    let upper = pos.ceil() as usize;

    if lower == upper {
        Some(sorted[lower])
    } else {
        // Linear interpolation
        let fraction = pos - lower as f64;
        Some(sorted[lower] * (1.0 - fraction) + sorted[upper] * fraction)
    }
}

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0];

    println!("Q1 (25%): {:?}", quantile(&data, 0.25)); // Some(3.0)
    println!("Q2 (50%): {:?}", quantile(&data, 0.50)); // Some(5.0)
    println!("Q3 (75%): {:?}", quantile(&data, 0.75)); // Some(7.0)
    println!("P90: {:?}", quantile(&data, 0.90));      // ≈ 8.2
}
```
HW7 Connection: This is the `quantile()` function in Part 3, specifically for `f64` values!
Interquartile Range (IQR)
IQR = Q3 - Q1 measures the spread of the middle 50% of data.
Uses:
- Less sensitive to outliers than using the range (max - min)
- Outlier detection: a value x is flagged as an outlier if x < Q1 − 1.5 × IQR, or x > Q3 + 1.5 × IQR
```rust
#![allow(unused)]
fn main() {
    fn iqr(data: &[f64]) -> Option<f64> {
        let q1 = quantile(data, 0.25)?;
        let q3 = quantile(data, 0.75)?;
        Some(q3 - q1)
    }

    fn quantile(data: &[f64], q: f64) -> Option<f64> {
        if data.is_empty() || !(0.0..=1.0).contains(&q) {
            return None;
        }
        let mut sorted = data.to_vec();
        sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let pos = q * (sorted.len() - 1) as f64;
        let lower = pos.floor() as usize;
        let upper = pos.ceil() as usize;
        if lower == upper {
            Some(sorted[lower])
        } else {
            let fraction = pos - lower as f64;
            Some(sorted[lower] * (1.0 - fraction) + sorted[upper] * fraction)
        }
    }

    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0];
    let iqr_val = iqr(&data).unwrap();
    let q1 = quantile(&data, 0.25).unwrap();
    let q3 = quantile(&data, 0.75).unwrap();
    println!("Q1: {}, Q3: {}, IQR: {}", q1, q3, iqr_val);

    // Outlier bounds
    let lower_bound = q1 - 1.5 * iqr_val;
    let upper_bound = q3 + 1.5 * iqr_val;
    println!("Outlier bounds: [{:.1}, {:.1}]", lower_bound, upper_bound);
}
```
Reminder: The `?` operator propagates failure up the call stack.
- On a `Result`, it returns early with `Err(e)` if the expression is `Err(e)`.
- On an `Option` (as in `iqr()` above), it returns early with `None` if the expression is `None`.
- Otherwise it unwraps the `Ok(x)` or `Some(x)`, and evaluation continues with `x`.
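As a tiny standalone illustration (the helper name `first_plus_last` is made up for this example), `?` exits the function early with `None` when either lookup fails:

```rust
fn first_plus_last(v: &[i32]) -> Option<i32> {
    // `?` returns None from this function immediately if the slice is empty;
    // otherwise it unwraps the Some(&i32) and we continue
    let first = v.first()?;
    let last = v.last()?;
    Some(first + last)
}

fn main() {
    println!("{:?}", first_plus_last(&[1, 2, 3])); // Some(4)
    println!("{:?}", first_plus_last(&[]));        // None
}
```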
IQR with outliers
Let's use a slightly more interesting dataset:
```rust
#![allow(unused)]
fn main() {
    fn iqr(data: &[f64]) -> Option<f64> {
        let q1 = quantile(data, 0.25)?;
        let q3 = quantile(data, 0.75)?;
        Some(q3 - q1)
    }

    fn quantile(data: &[f64], q: f64) -> Option<f64> {
        if data.is_empty() || !(0.0..=1.0).contains(&q) {
            return None;
        }
        let mut sorted = data.to_vec();
        sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let pos = q * (sorted.len() - 1) as f64;
        let lower = pos.floor() as usize;
        let upper = pos.ceil() as usize;
        if lower == upper {
            Some(sorted[lower])
        } else {
            let fraction = pos - lower as f64;
            Some(sorted[lower] * (1.0 - fraction) + sorted[upper] * fraction)
        }
    }

    let data = vec![-1.0, 2.0, 4.4, 6.7, 11.2, 22.8, 83.1, 124.7];
    let iqr_val = iqr(&data).unwrap();
    let q1 = quantile(&data, 0.25).unwrap();
    let q3 = quantile(&data, 0.75).unwrap();
    println!("Q1: {}, Q3: {}, IQR: {}", q1, q3, iqr_val);

    // Outlier bounds
    let lower_bound = q1 - 1.5 * iqr_val;
    let upper_bound = q3 + 1.5 * iqr_val;
    println!("Outlier bounds: [{:.1}, {:.1}]", lower_bound, upper_bound);
}
```
Ranking Algorithms
Standard Rank: Position in sorted order (ties get same rank, gaps follow)
Dense Rank: Position in sorted order (ties get same rank, no gaps)
```
Values:   [100, 95, 95, 90, 85]
Standard: [  1,  2,  2,  4,  5]   ← gap after ties
Dense:    [  1,  2,  2,  3,  4]   ← no gaps
```
Standard and Dense Ranking in Sports
Out of curiosity, I asked Anthropic Opus 4.5 to find examples of standard and dense ranking in sports.
Standard Competition Ranking (1, 2, 2, 4) — Skips positions after ties
Most individual sports and races use this method:
- Golf ⛳ — The classic example. You'll see "T2" (tied for 2nd) on leaderboards, and the next player is listed as 4th if two players tied for 2nd. This emphasizes that a player finished ahead of X competitors.
- Tennis (ATP/WTA rankings) 🎾 — Points-based rankings, but when ties occur in tournament results, standard ranking applies.
- Olympic events 🏅 — Track & field, swimming, skiing, etc. If two athletes tie for silver, no bronze is awarded (they give two silvers). The next finisher is 4th.
- Marathon / Running races 🏃 — If two runners tie for 2nd, the next finisher is 4th place.
- Horse racing 🐎 — Finish positions follow standard ranking.
- Cycling (race stages) 🚴 — Stage finishes use standard ranking.
Dense Ranking (1, 2, 2, 3) — Consecutive positions, no gaps
Less common in sports, but used in some contexts:
- Soccer/Football league tables ⚽ — While ties on points are typically broken by goal difference (so ties are rare), some leagues display positions using dense-style numbering during the season.
- Some fitness leaderboards — Particularly in CrossFit or gym competitions where continuous ranking is preferred.
- Some esports standings — Varies by organization.
Key Insight
The distinction often comes down to what the rank is meant to communicate:
| Standard Rank | Dense Rank |
|---|---|
| "How many competitors finished ahead of you?" | "What tier/bracket are you in?" |
| Emphasizes individual achievement | Emphasizes grouping/classification |
Golf's use of standard ranking makes intuitive sense: if you tied for 2nd, only one person beat you, so you share 2nd place. The 4th-place finisher, however, was beaten by three people, which is why 3rd is skipped.
Implementing Dense Rank
```rust
#![allow(unused)]
fn main() {
    fn dense_rank(data: &[f64]) -> Vec<usize> {
        if data.is_empty() {
            return vec![];
        }

        // Create (index, value) pairs and sort by value
        let mut indexed: Vec<(usize, f64)> = data.iter()
            .enumerate()           // produces an iterator of (index, &value) pairs
            .map(|(i, &v)| (i, v)) // extract index and dereference value
            .collect();

        // Sort by the values (second element of the tuples)
        indexed.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());

        // Assign dense ranks (0-based: the smallest value gets rank 0)
        let mut ranks = vec![0; data.len()];
        let mut current_rank = 0;
        let mut prev_value: Option<f64> = None;
        for &(original_idx, value) in &indexed {
            if let Some(prev) = prev_value {
                // Compare with a small epsilon to avoid floating-point
                // precision issues (e.g. 1.0 vs 1.0000000000000001)
                if (value - prev).abs() > 1e-10 {
                    current_rank += 1; // Only increment for new values
                }
            }
            ranks[original_idx] = current_rank;
            prev_value = Some(value);
        }
        ranks
    }

    let scores = vec![85.0, 95.0, 90.0, 95.0, 80.0];
    let ranks = dense_rank(&scores);
    for (score, rank) in scores.iter().zip(ranks.iter()) {
        println!("Score: {}, Rank: {}", score, rank);
    }
}
```
HW7 Connection: This is the `dense_rank()` function in Part 3!
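For comparison, here is a sketch of standard competition rank (HW7's `rank()` may differ in details such as ordering direction and tie handling). This version ranks higher scores first, matching the leaderboard example above:

```rust
fn standard_rank(data: &[f64]) -> Vec<usize> {
    // Pair each value with its original index, then sort descending
    // so the highest score gets rank 1
    let mut indexed: Vec<(usize, f64)> = data.iter().cloned().enumerate().collect();
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    let mut ranks = vec![0usize; data.len()];
    let mut prev: Option<f64> = None;
    let mut prev_rank = 0usize;
    for (pos, &(orig, value)) in indexed.iter().enumerate() {
        let rank = match prev {
            // A tie reuses the previous rank...
            Some(p) if (value - p).abs() < 1e-10 => prev_rank,
            // ...otherwise the rank is the 1-based sorted position,
            // which leaves gaps after ties
            _ => pos + 1,
        };
        ranks[orig] = rank;
        prev = Some(value);
        prev_rank = rank;
    }
    ranks
}

fn main() {
    let scores = vec![100.0, 95.0, 95.0, 90.0, 85.0];
    println!("{:?}", standard_rank(&scores)); // [1, 2, 2, 4, 5]
}
```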
Part 2: Graph Representation
What is a Graph?
A graph G = (V, E) consists of:
- Vertices (V): nodes/points
- Edges (E): connections between vertices
```
0 --- 1
|     |
|     |
3 --- 2
```
Vertices: {0, 1, 2, 3}
Edges: {(0,1), (1,2), (2,3), (3,0)}
Adjacency List Representation
Store graph as a list of neighbors for each vertex:
```rust
#![allow(unused)]
fn main() {
    use std::collections::HashMap;

    // Using Vec<Vec<usize>>
    fn create_graph_vec(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<usize>> {
        let mut adj = vec![vec![]; n];
        for &(u, v) in edges {
            adj[u].push(v);
            adj[v].push(u); // For undirected graph
        }
        adj
    }

    // Using HashMap for sparse or labeled graphs
    fn create_graph_map<'a>(edges: &[(&'a str, &'a str)]) -> HashMap<&'a str, Vec<&'a str>> {
        let mut adj: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
        for &(u, v) in edges {
            adj.entry(u).or_default().push(v);
            adj.entry(v).or_default().push(u);
        }
        adj
    }

    // Example: square graph
    let edges = vec![(0, 1), (1, 2), (2, 3), (3, 0)];
    let graph = create_graph_vec(4, &edges);
    for (vertex, neighbors) in graph.iter().enumerate() {
        println!("Vertex {}: neighbors = {:?}", vertex, neighbors);
    }

    let edges_map = vec![("0", "1"), ("1", "2"), ("2", "3"), ("3", "0")];
    let graph_map = create_graph_map(&edges_map);
    for (vertex, neighbors) in graph_map.iter() {
        println!("Vertex {}: neighbors = {:?}", vertex, neighbors);
    }
}
```
When to Use Each Representation
| Representation | Best For | Lookup | Memory |
|---|---|---|---|
| `Vec<Vec<usize>>` | Dense graphs, integer vertices | O(1) | O(V + E) |
| `HashMap<K, Vec<K>>` | Sparse graphs, labeled vertices | O(1) avg | O(V + E) |
Part 3: Graph Traversal with BFS and DFS
Breadth-First Search (BFS)
BFS explores nodes level by level using a queue (VecDeque):
```
Graph:          BFS from vertex 0:
    0           Level 0: [0]
   / \          Level 1: [1, 3]   (neighbors of 0)
  1   3         Level 2: [2]      (unvisited neighbors)
   \ /
    2           Visit order: 0 → 1 → 3 → 2
```
BFS Implementation
This BFS implementation uses a HashSet to track visited nodes and a VecDeque as a FIFO queue. Starting from a given vertex, it repeatedly dequeues the front node, marks it visited, and enqueues all unvisited neighbors. The algorithm returns nodes in the order they were first discovered, which corresponds to visiting vertices level by level outward from the start.
```rust
#![allow(unused)]
fn main() {
    use std::collections::{VecDeque, HashSet};

    fn bfs(graph: &[Vec<usize>], start: usize) -> Vec<usize> {
        let mut visited = HashSet::new();
        let mut queue = VecDeque::new();
        let mut order = Vec::new();

        queue.push_back(start);
        visited.insert(start);

        while let Some(current) = queue.pop_front() {
            order.push(current);
            for &neighbor in &graph[current] {
                if !visited.contains(&neighbor) {
                    visited.insert(neighbor);
                    queue.push_back(neighbor);
                }
            }
        }
        order
    }

    // Square graph (4-cycle)
    // 0 --- 3
    // |     |
    // |     |
    // 1 --- 2
    let graph = vec![
        vec![1, 3], // 0
        vec![0, 2], // 1
        vec![1, 3], // 2
        vec![0, 2], // 3
    ];
    let order = bfs(&graph, 0);
    println!("BFS order from 0: {:?}", order);
}
```
Note: `VecDeque` is essential for O(1) queue operations: `pop_front` is O(1), while using `Vec::remove(0)` as a queue would cost O(n) per dequeue!
BFS for Shortest Path (Unweighted)
Why does BFS find shortest paths? Because BFS explores nodes level by level, the first time we reach any node is guaranteed to be via the shortest path. When we visit a node at distance d from the start, we've already visited all nodes at distances 0, 1, ..., d-1. This means we can't later find a shorter path to that node.
Key insight: In an unweighted graph, "shortest path" means fewest edges. BFS naturally discovers nodes in order of increasing distance from the start.
```rust
#![allow(unused)]
fn main() {
    use std::collections::{VecDeque, HashMap};

    fn bfs_distances(graph: &[Vec<usize>], start: usize) -> HashMap<usize, usize> {
        let mut distances = HashMap::new();
        let mut queue = VecDeque::new();

        queue.push_back(start);
        distances.insert(start, 0);

        while let Some(current) = queue.pop_front() {
            let current_dist = distances[&current];
            for &neighbor in &graph[current] {
                if !distances.contains_key(&neighbor) {
                    distances.insert(neighbor, current_dist + 1);
                    queue.push_back(neighbor);
                }
            }
        }
        distances
    }

    let graph = vec![
        vec![1, 3], // 0
        vec![0, 2], // 1
        vec![1, 3], // 2
        vec![0, 2], // 3
    ];
    let distances = bfs_distances(&graph, 0);
    for (node, dist) in &distances {
        println!("Distance from 0 to {}: {}", node, dist);
    }
}
```
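Because the first visit to a node is along a shortest route, recording each node's predecessor lets us recover the path itself, not just its length. A sketch of this extension (not part of HW7):

```rust
use std::collections::{HashMap, VecDeque};

// BFS that records each node's predecessor so the shortest path
// from `start` to `goal` can be reconstructed afterwards.
fn bfs_shortest_path(graph: &[Vec<usize>], start: usize, goal: usize) -> Option<Vec<usize>> {
    let mut parent: HashMap<usize, usize> = HashMap::new();
    let mut queue = VecDeque::new();
    queue.push_back(start);
    parent.insert(start, start); // start is its own parent

    while let Some(current) = queue.pop_front() {
        if current == goal {
            break;
        }
        for &n in &graph[current] {
            if !parent.contains_key(&n) {
                parent.insert(n, current);
                queue.push_back(n);
            }
        }
    }

    if !parent.contains_key(&goal) {
        return None; // goal unreachable from start
    }

    // Walk back from goal to start via parents, then reverse
    let mut path = vec![goal];
    let mut node = goal;
    while node != start {
        node = parent[&node];
        path.push(node);
    }
    path.reverse();
    Some(path)
}

fn main() {
    let graph = vec![vec![1, 3], vec![0, 2], vec![1, 3], vec![0, 2]];
    println!("{:?}", bfs_shortest_path(&graph, 0, 2)); // Some([0, 1, 2])
}
```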
Depth-First Search (DFS)
DFS explores as deep as possible first using a stack (Vec or recursion):
```
Graph:          DFS from vertex 0:
    0           Step 1: Visit 0, push neighbors [1, 3]
   / \          Step 2: Pop 1, visit it, push neighbor [2]
  1   3         Step 3: Pop 2, visit it, push neighbor [3]
   \ /          Step 4: Pop 3, visit it (no new neighbors)
    2
                Visit order: 0 → 1 → 2 → 3
                (Goes deep before exploring siblings)
```
DFS Implementation (Iterative)
This iterative DFS uses a Vec as a LIFO stack and a HashSet to track visited nodes. Starting from a given vertex, it pops the top node, marks it visited if not already seen, and pushes all unvisited neighbors onto the stack. Neighbors are added in reverse order to maintain consistent left-to-right traversal. The algorithm explores as deep as possible along each branch before backtracking.
```rust
#![allow(unused)]
fn main() {
    use std::collections::HashSet;

    fn dfs(graph: &[Vec<usize>], start: usize) -> Vec<usize> {
        let mut visited = HashSet::new();
        let mut stack = vec![start]; // Use Vec as stack
        let mut order = Vec::new();

        while let Some(current) = stack.pop() {
            if visited.contains(&current) {
                continue;
            }
            visited.insert(current);
            order.push(current);

            // Add neighbors to stack (reverse for consistent ordering)
            for &neighbor in graph[current].iter().rev() {
                if !visited.contains(&neighbor) {
                    stack.push(neighbor);
                }
            }
        }
        order
    }

    let graph = vec![
        vec![1, 3], // 0
        vec![0, 2], // 1
        vec![1, 3], // 2
        vec![0, 2], // 3
    ];
    let order = dfs(&graph, 0);
    println!("DFS order from 0: {:?}", order);
}
```
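The slide above mentions that DFS can also use recursion instead of an explicit stack; the call stack plays the role of the `Vec`. A minimal recursive sketch:

```rust
use std::collections::HashSet;

// Recursive DFS: visit `node`, then recurse into each unvisited neighbor.
// Neighbors are taken in stored order, so no reversal is needed.
fn dfs_recursive(
    graph: &[Vec<usize>],
    node: usize,
    visited: &mut HashSet<usize>,
    order: &mut Vec<usize>,
) {
    visited.insert(node);
    order.push(node);
    for &n in &graph[node] {
        if !visited.contains(&n) {
            dfs_recursive(graph, n, visited, order);
        }
    }
}

fn main() {
    let graph = vec![vec![1, 3], vec![0, 2], vec![1, 3], vec![0, 2]];
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    dfs_recursive(&graph, 0, &mut visited, &mut order);
    println!("Recursive DFS order from 0: {:?}", order); // [0, 1, 2, 3]
}
```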
BFS vs DFS Summary
| Feature | BFS | DFS |
|---|---|---|
| Data Structure | Queue (VecDeque) | Stack (Vec) |
| Order | Level by level | Deep first |
| Shortest path | ✅ (unweighted) | ❌ |
| Memory | O(width) | O(depth) |
| Use case | Shortest path, levels | Cycle detection, components |
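The table lists connected components as a DFS use case. A minimal sketch (assuming the same `Vec<Vec<usize>>` adjacency-list representation used above) counts components by launching a DFS from every not-yet-visited vertex:

```rust
// Count connected components: each DFS launched from an unvisited
// vertex marks exactly one component.
fn count_components(graph: &[Vec<usize>]) -> usize {
    let mut visited = vec![false; graph.len()];
    let mut components = 0;

    for start in 0..graph.len() {
        if visited[start] {
            continue;
        }
        components += 1;
        // Iterative DFS over this component
        let mut stack = vec![start];
        while let Some(node) = stack.pop() {
            if visited[node] {
                continue;
            }
            visited[node] = true;
            for &n in &graph[node] {
                if !visited[n] {
                    stack.push(n);
                }
            }
        }
    }
    components
}

fn main() {
    // Two disjoint edges: {0-1} and {2-3}
    let graph = vec![vec![1], vec![0], vec![3], vec![2]];
    println!("Components: {}", count_components(&graph)); // 2
}
```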
Part 4: Algorithm Design Patterns
We'll cover the following patterns:
- Split-Apply-Combine
- Greedy Algorithms
- Divide and Conquer
Let's start with the first pattern: Split-Apply-Combine.
Pattern 1: Split-Apply-Combine
Already covered in HW7 Part 2 with GroupedSeries:
1. SPLIT: Group data by category
2. APPLY: Calculate aggregate per group
3. COMBINE: Collect results
```
data = [(A, 10), (B, 20), (A, 30), (B, 40)]
         ↓ SPLIT
groups = {A: [10, 30], B: [20, 40]}
         ↓ APPLY (mean)
means = {A: 20.0, B: 30.0}
         ↓ COMBINE
result = HashMap with means
```
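The diagram above can be sketched directly with a `HashMap` (a standalone example; the `group_means` helper is made up here and is not HW7's `GroupedSeries` API):

```rust
use std::collections::HashMap;

// SPLIT data by key, APPLY the mean to each group, COMBINE into one map.
fn group_means<'a>(data: &[(&'a str, f64)]) -> HashMap<&'a str, f64> {
    // SPLIT: group values by key
    let mut groups: HashMap<&'a str, Vec<f64>> = HashMap::new();
    for &(key, value) in data {
        groups.entry(key).or_default().push(value);
    }

    // APPLY (mean) + COMBINE: one aggregate per group
    groups
        .iter()
        .map(|(k, vs)| (*k, vs.iter().sum::<f64>() / vs.len() as f64))
        .collect()
}

fn main() {
    let data = vec![("A", 10.0), ("B", 20.0), ("A", 30.0), ("B", 40.0)];
    let means = group_means(&data);
    println!("{:?}", means); // {"A": 20.0, "B": 30.0} in some order
}
```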
Pattern 2: Greedy Algorithms
Greedy: Make the locally optimal choice at each step.
Example: Coin change (when it works)
```rust
#![allow(unused)]
fn main() {
    fn greedy_coin_change(amount: u32, coins: &[u32]) -> Vec<u32> {
        let mut result = Vec::new();
        let mut remaining = amount;

        // Sort coins in descending order
        let mut sorted_coins = coins.to_vec();
        sorted_coins.sort_by(|a, b| b.cmp(a));

        for &coin in &sorted_coins {
            while remaining >= coin {
                result.push(coin);
                remaining -= coin;
            }
        }
        result
    }

    let coins = vec![25, 10, 5, 1]; // US coins
    let change = greedy_coin_change(67, &coins);
    println!("67 cents: {:?}", change); // [25, 25, 10, 5, 1, 1]
}
```
Greedy Coin Change is Not Always Optimal
Warning: Greedy doesn't always give optimal solutions!
The greedy approach to the coin change problem is not optimal for every set of denominations. A coin system for which greedy always produces the minimum number of coins is called a canonical system; US coinage (25, 10, 5, 1) is canonical, but there is no simple rule (such as each coin being some multiple of the next) that guarantees it, so in general a system must be checked against an optimal method such as dynamic programming.
For example, consider the denominations [25, 15, 1] when making change for 30 cents: greedy takes the 25 first and then five 1-cent coins (6 coins total), while the optimal answer is two 15-cent coins.
```rust
#![allow(unused)]
fn main() {
    fn greedy_coin_change(amount: u32, coins: &[u32]) -> Vec<u32> {
        let mut result = Vec::new();
        let mut remaining = amount;

        // Sort coins in descending order
        let mut sorted_coins = coins.to_vec();
        sorted_coins.sort_by(|a, b| b.cmp(a));

        for &coin in &sorted_coins {
            while remaining >= coin {
                result.push(coin);
                remaining -= coin;
            }
        }
        result
    }

    let coins = vec![25, 15, 1];
    let change = greedy_coin_change(30, &coins);
    println!("30 cents: {:?}", change); // [25, 1, 1, 1, 1, 1] — optimal is [15, 15]
}
```
Pattern 3: Divide and Conquer
Divide and Conquer:
- Divide problem into smaller subproblems
- Conquer subproblems recursively
- Combine solutions
Classic example: Binary Search
```rust
#![allow(unused)]
fn main() {
    fn binary_search(sorted: &[i32], target: i32) -> Option<usize> {
        let mut left = 0;
        let mut right = sorted.len();

        while left < right {
            let mid = left + (right - left) / 2;
            match sorted[mid].cmp(&target) {
                std::cmp::Ordering::Equal => return Some(mid),
                std::cmp::Ordering::Less => left = mid + 1,
                std::cmp::Ordering::Greater => right = mid,
            }
        }
        None
    }

    let data = vec![1, 3, 5, 7, 9, 11, 13, 15];
    println!("Index of 7: {:?}", binary_search(&data, 7)); // Some(3)
    println!("Index of 8: {:?}", binary_search(&data, 8)); // None
}
```
Searching item by item would take O(n) time. Binary search takes O(log n), but only on sorted data: if the data isn't already sorted, sorting it first costs O(n log n). Alternatively, keeping the data in an ordered structure such as a `BTreeMap` gives O(log n) lookups directly.
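To make the divide-and-conquer structure explicit, the same search can be written recursively: split the slice at the midpoint, solve the problem in one half, and discard the other (the `offset` parameter is an addition for this sketch; it tracks the position in the original slice):

```rust
// Recursive binary search over a sorted slice.
// `offset` is the index in the original slice where `sorted` begins.
fn binary_search_rec(sorted: &[i32], target: i32, offset: usize) -> Option<usize> {
    if sorted.is_empty() {
        return None; // CONQUER base case: nothing left to search
    }
    let mid = sorted.len() / 2;
    match sorted[mid].cmp(&target) {
        std::cmp::Ordering::Equal => Some(offset + mid),
        // DIVIDE: recurse into the half that can contain the target
        std::cmp::Ordering::Less => {
            binary_search_rec(&sorted[mid + 1..], target, offset + mid + 1)
        }
        std::cmp::Ordering::Greater => binary_search_rec(&sorted[..mid], target, offset),
    }
}

fn main() {
    let data = vec![1, 3, 5, 7, 9, 11, 13, 15];
    println!("Index of 7: {:?}", binary_search_rec(&data, 7, 0)); // Some(3)
    println!("Index of 8: {:?}", binary_search_rec(&data, 8, 0)); // None
}
```

No COMBINE step is needed here: one recursive call fully answers the question, which is why binary search is sometimes called "decrease and conquer".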
Algorithm Design Summary
| Pattern | Key Idea | When to Use |
|---|---|---|
| Split-Apply-Combine | Group, aggregate, collect | Data aggregation by category |
| Greedy | Best local choice | Optimization with greedy property |
| Divide & Conquer | Split, solve, merge | Problems with optimal substructure |
Summary: HW7 Algorithm Connections
| HW7 Component | Concepts Used |
|---|---|
| FrequencyTable | Counting, Entry API |
| GroupedSeries | Split-apply-combine, closures |
| Histogram | BTreeMap, binning |
| quantile/iqr | Sorting, interpolation |
| RollingBuffer | VecDeque, circular buffer |
| rank/dense_rank | Sorting, index tracking |
Key Takeaways
- Quantiles require sorted data and linear interpolation
- IQR is robust to outliers (Q3 - Q1)
- BFS uses VecDeque, finds shortest paths
- DFS uses Vec as stack, explores deeply
- Algorithm patterns help structure solutions
In-Class Exercise: Outlier Detection
Task: Implement a function that finds all outliers in a dataset using the IQR method covered earlier in this lecture.
Recall: A value is an outlier if it falls outside the bounds:
- Lower bound: Q1 - 1.5 × IQR
- Upper bound: Q3 + 1.5 × IQR
```rust
fn find_outliers(data: &[f64]) -> Vec<f64> {
    // TODO: Return a Vec containing all outlier values
    // Hint: You can use the quantile() function from earlier
    // Step 1: Calculate Q1 and Q3
    // Step 2: Calculate IQR
    // Step 3: Calculate bounds
    // Step 4: Filter and collect outliers
    todo!()
}

// Example:
// data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
// Q1 = 2.75, Q3 = 6.25, IQR = 3.5
// Lower bound = 2.75 - 5.25 = -2.5
// Upper bound = 6.25 + 5.25 = 11.5
// Output: [100.0] (only 100.0 is outside the bounds)
```
Hints:
- First implement or copy the `quantile()` function from the slides
- Use `.iter().filter().cloned().collect()` to find values outside the bounds
- Remember to handle the empty data case
Solution
```rust
#![allow(unused)]
fn main() {
    fn quantile(data: &[f64], q: f64) -> Option<f64> {
        if data.is_empty() || !(0.0..=1.0).contains(&q) {
            return None;
        }
        let mut sorted = data.to_vec();
        sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
        let pos = q * (sorted.len() - 1) as f64;
        let lower = pos.floor() as usize;
        let upper = pos.ceil() as usize;
        if lower == upper {
            Some(sorted[lower])
        } else {
            let fraction = pos - lower as f64;
            Some(sorted[lower] * (1.0 - fraction) + sorted[upper] * fraction)
        }
    }

    fn find_outliers(data: &[f64]) -> Vec<f64> {
        if data.is_empty() {
            return vec![];
        }

        // Step 1: Calculate Q1 and Q3
        let q1 = quantile(data, 0.25).unwrap();
        let q3 = quantile(data, 0.75).unwrap();

        // Step 2: Calculate IQR
        let iqr = q3 - q1;

        // Step 3: Calculate bounds
        let lower_bound = q1 - 1.5 * iqr;
        let upper_bound = q3 + 1.5 * iqr;

        // Step 4: Filter and collect outliers
        data.iter()
            .filter(|&&x| x < lower_bound || x > upper_bound)
            .cloned()
            .collect()
    }

    // Test it
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0];
    let outliers = find_outliers(&data);

    let q1 = quantile(&data, 0.25).unwrap();
    let q3 = quantile(&data, 0.75).unwrap();
    let iqr = q3 - q1;
    println!("Q1: {}, Q3: {}, IQR: {}", q1, q3, iqr);
    println!("Lower bound: {}", q1 - 1.5 * iqr);
    println!("Upper bound: {}", q3 + 1.5 * iqr);
    println!("Outliers: {:?}", outliers);

    // Test with data that has outliers on both ends
    let data2 = vec![-50.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 200.0];
    println!("\nData with outliers on both ends:");
    println!("Outliers: {:?}", find_outliers(&data2));
}
```
Key insights:
- Reuses the `quantile()` function from earlier in the lecture
- The filter closure uses `&&x` because we're iterating over `&f64` references
- `.cloned()` converts `&f64` to `f64` before collecting
- This pattern (compute statistics, then filter) is common in data science
HW7 Brings It All Together
HW7 combines everything:
- Generics and traits (Numeric)
- Collections (HashMap, HashSet, BTreeMap, VecDeque)
- Closures (aggregation functions)
- Iterators (data processing)
- Algorithm design (statistics, grouping)
Good luck on HW7!