Lecture 31 - Big O Notation & Algorithmic Complexity
Logistics
- Welcome to the algorithms & data structures unit!
- Retest scores out, corrections tomorrow in discussion
- HW6 due Friday / HW7 released Friday (maybe Saturday)
- HW5 grades will be released tomorrow / corrections due a week from tomorrow
- Readings shift focus: Python DS book + videos (concepts, not syntax!)
- DECKS OF CARDS?
Learning objectives
By the end of today, you should be able to:
- Use Big O notation to describe time and space complexity
- Analyze code to determine its Big O complexity (loops, nested loops, logarithmic patterns)
- Recognize common complexity classes: O(1), O(log n), O(n), O(n^2), O(2^n)
- Apply key rules: drop constants, keep dominant terms
Part 1: Big O notation - The math of "about how fast?"
Motivation: When does speed matter?
Think about:
- Sorting 10 items vs. sorting 1 million items
- Searching through 100 names vs. searching Facebook's 3 billion users
- A game processing 60 frames per second
Our intuition says a task that's twice as big should take about twice as long
In practice it's often not that simple - it depends on the algorithm
Think-pair-share: Counting operations
Part 1: Given this code:
```rust
fn sum_array(arr: &[i32]) -> i32 {
    let mut total = 0;
    for &num in arr {
        total += num;
    }
    total
}
```
Question: If the array has n elements, how many addition operations happen?
Part 2: Now consider this code:
```rust
fn count_pairs(n: usize) -> usize {
    let mut count = 0;
    for i in 1..n {
        for j in i..n {
            count += 1;
        }
    }
    count
}
```
Question: If we call count_pairs(n), how many times does the inner loop execute in total?
- Try with a small value like n=4 to trace through it
- Can you find a pattern or formula?
What is Big O?
Big O notation describes how runtime/memory grows as input size grows.
Key idea: We ignore:
- Exact number of operations
- Constants and performance on small inputs
- Hardware / OS dependent values
We focus on: The growth rate as n goes to infinity
Example: Linear growth
```rust
fn print_all(arr: &[i32]) {
    for &item in arr {    // n iterations
        println!("{}", item);
    }
}
```
- Array of size 10: ~10 operations
- Array of size 100: ~100 operations
- Array of size n: ~n operations
This is O(n) - "linear time"
Example: Quadratic growth
```rust
fn print_all_pairs(arr: &[i32]) {
    for &i in arr {        // n iterations
        for &j in arr {    // n iterations for EACH i
            println!("{}, {}", i, j);
        }
    }
}
```
- Array of size 10: ~100 operations (10 × 10)
- Array of size 100: ~10,000 operations (100 × 100)
- Array of size n: ~n^2 operations
This is O(n^2) - "quadratic time"
Example: Logarithmic growth
```rust
fn binary_search(arr: &[i32], target: i32) -> Option<usize> {
    let mut low = 0;
    let mut high = arr.len();
    while low < high {
        let mid = (low + high) / 2;
        if arr[mid] == target {
            return Some(mid);
        } else if arr[mid] < target {
            low = mid + 1;    // discard the lower half
        } else {
            high = mid;       // discard the upper half
        }
    }
    None
}
```
- Array of size 10: ~3-4 operations (log_2 10 ≈ 3.3)
- Array of size 100: ~6-7 operations (log_2 100 ≈ 6.6)
- Array of size 1,000,000: ~20 operations! (log_2 1,000,000 ≈ 20)
This is O(log n) - "logarithmic time" (very fast!)
Example: Exponential growth
```rust
fn print_all_subsets(arr: &[i32], index: usize, current: &mut Vec<i32>) {
    if index == arr.len() {
        println!("{:?}", current);    // Print one subset
        return;
    }
    // Don't include arr[index]
    print_all_subsets(arr, index + 1, current);
    // Include arr[index]
    current.push(arr[index]);
    print_all_subsets(arr, index + 1, current);
    current.pop();
}
```
- Array of size 3: 8 subsets (2³)
- Array of size 10: 1,024 subsets (2¹⁰)
- Array of size 20: 1,048,576 subsets (2²⁰)
- Array of size n: 2^n subsets
This is O(2^n) - "exponential time" (explodes quickly!)
Example: Constant time
```rust
fn get_first(arr: &[i32]) -> Option<i32> {
    arr.first().copied()
}
```
- Array of size 10: 1 operation
- Array of size 1000: 1 operation
- Array of size n: still 1 operation!
This is O(1) - "constant time" (doesn't depend on n)
Think about: What's the complexity?
```rust
fn find_range(arr: &[i32]) -> Option<i32> {
    let mut min = *arr.first()?;
    for &item in arr {
        if item < min {
            min = item;
        }
    }
    let mut max = *arr.first()?;
    for &item in arr {
        if item > max {
            max = item;
        }
    }
    Some(max - min)
}
```
[PAUSE - think-pair-share]
Common complexity classes (from best to worst)
| Notation | Name | Example |
|---|---|---|
| O(1) | Constant | Array access by index |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Loop through array once |
| O(n log n) | Linearithmic | Good sorting algorithms |
| O(n^2) | Quadratic | Nested loops |
| O(2^n) | Exponential | Trying all subsets |
| O(n!) | Factorial | Trying all permutations |
Rule of thumb: Each step down this list is MUCH slower!
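To make "MUCH slower" concrete, here's a rough sketch (my own illustration; the numbers are just approximate operation counts) printing what each class implies at n = 20:

```rust
fn main() {
    let n: u32 = 20;
    let nf = n as f64;

    // Approximate "number of operations" for each class at n = 20
    println!("O(1)       -> {}", 1);
    println!("O(log n)   -> {:.1}", nf.log2());
    println!("O(n)       -> {}", n);
    println!("O(n log n) -> {:.1}", nf * nf.log2());
    println!("O(n^2)     -> {}", n * n);
    println!("O(2^n)     -> {}", 2u64.pow(n));
    println!("O(n!)      -> {}", (1..=n as u64).product::<u64>());
}
```

Even at n = 20, the jump from n^2 = 400 to 2^n ≈ 10^6 to n! ≈ 2.4 × 10^18 is already enormous.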
Rules for analyzing code
- Loops: Multiply complexity by the number of iterations
  - Loop n times doing O(1) work = O(n)
  - Loop n times doing O(n) work = O(n^2)
  - Outer loop n times, inner loop m times = O(n × m)
- Drop constants and lower-order terms:
  - O(3n) -> O(n)
  - O(n^2 + n) -> O(n^2)
  - O(5) -> O(1)
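For instance, a small sketch of the second rule (function invented just for illustration): two separate passes over the array plus a bit of constant work is O(2n + 1), which we still write as O(n):

```rust
// O(n) + O(n) + O(1) = O(2n + 1)  ->  drop constants  ->  O(n)
fn sum_and_max(arr: &[i32]) -> (i32, i32) {
    let mut sum = 0;
    for &x in arr {          // first pass: n iterations
        sum += x;
    }

    let mut max = i32::MIN;
    for &x in arr {          // second pass: another n iterations
        if x > max {
            max = x;
        }
    }

    (sum, max)               // constant work to build the result
}
```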
Let's do this one together
```rust
fn mystery_function(arr: &[i32]) -> i32 {
    let n = arr.len();
    let mut count = 0;

    for i in 0..n {
        count += arr[i];
    }

    for i in 0..10 {
        count += 1;
    }

    for i in 0..n {
        for j in 0..n {
            if arr[i] == arr[j] {
                count += 1;
            }
        }
    }

    count
}
```
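One way to break it down (for reference): the first loop is O(n), the second loop always runs exactly 10 times so it's O(1), and the nested loops are O(n^2). That's O(n^2 + n + 1) total, and keeping only the dominant term gives O(n^2).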
Space complexity too!
Big O also applies to memory usage.
```rust
fn make_doubles(arr: &[i32]) -> Vec<i32> {
    let mut result = Vec::new();
    for &item in arr {
        result.push(item * 2);
    }
    result
}
```
- Time complexity: O(n) - one loop
- Space complexity: O(n) - create new vector of size n
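For contrast, here's a sketch (not from the original example) that doubles the elements in place - the time is still O(n), but the extra space is only O(1) because no new vector is created:

```rust
fn double_in_place(arr: &mut [i32]) {
    for item in arr.iter_mut() {
        *item *= 2;    // overwrite each slot - no new allocation
    }
}
```

- Time complexity: O(n) - still one loop
- Space complexity: O(1) - extra memory doesn't grow with n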
Best case vs. worst case vs. average case
Example: Linear search
```rust
fn find_position(arr: &[i32], target: i32) -> Option<usize> {
    for (i, &item) in arr.iter().enumerate() {
        if item == target {
            return Some(i);
        }
    }
    None
}
```
- Best case: O(1) - target is first element
- Worst case: O(n) - target not in array (must check all)
- Average case: O(n) - on average, check half the array
Usually we care most about worst case!
Complexity of Rust Operations
Vec operations: What's the complexity?
Let's think about standard Vec operations:
| Operation | Big O | Why |
|---|---|---|
| vec[i] (indexing) | O(1) | Direct memory access |
| vec.push(x) | O(1)* | Usually just increment (amortized*) |
| vec.pop() | O(1) | Just decrement |
| vec.insert(0, x) | O(n) | Must shift all elements |
| vec.remove(i) | O(n) | Must shift elements after i |
| vec.contains(&x) | O(n) | Must check each element |
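A quick sketch exercising each operation from the table (the particular values are arbitrary):

```rust
fn main() {
    let mut v = vec![10, 20, 30];

    println!("{}", v[1]);            // O(1): direct index into the buffer
    v.push(40);                      // O(1) amortized: append at the end
    v.pop();                         // O(1): remove from the end
    v.insert(0, 5);                  // O(n): every element shifts right
    v.remove(1);                     // O(n): elements after index 1 shift left
    println!("{}", v.contains(&30)); // O(n): linear scan
    println!("{:?}", v);             // [5, 20, 30]
}
```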
How Vec::push() is clever
Problem: A Vec's underlying buffer has a fixed capacity. What happens when it fills up?
Solution: When full, allocate double the space and copy everything over.
Example growth: capacity goes 4 -> 8 -> 16 -> 32 -> 64...
Cost analysis:
- Most pushes: O(1) - just add to end
- Occasional push: O(n) - must copy everything
- Amortized over many operations: O(1)!
Example: Implementing a simple dynamic array
Here's a simplified version showing the core idea:
```rust
struct SimpleVec {
    data: Vec<i32>,
    len: usize,
    capacity: usize,
}

impl SimpleVec {
    fn new() -> Self {
        SimpleVec {
            data: Vec::new(),
            len: 0,
            capacity: 0,
        }
    }

    fn push(&mut self, value: i32) {
        // Check if we need to grow
        if self.len == self.capacity {
            // Double capacity (or start with 4)
            let new_capacity = if self.capacity == 0 { 4 } else { self.capacity * 2 };

            // Allocate new space and copy
            let mut new_data = Vec::with_capacity(new_capacity);
            for i in 0..self.len {
                new_data.push(self.data[i]);
            }
            self.data = new_data;
            self.capacity = new_capacity;

            println!("Resized! New capacity: {}", new_capacity);
        }

        // Add the new element
        self.data.push(value);
        self.len += 1;
    }
}

fn main() {
    let mut v = SimpleVec::new();
    for i in 0..10 {
        println!("Pushing {}", i);
        v.push(i);
    }
}
```
What you'll see:
```
Pushing 0
Resized! New capacity: 4
Pushing 1
Pushing 2
Pushing 3
Pushing 4
Resized! New capacity: 8
Pushing 5
...
```
Key insight: Most pushes don't resize. The occasional expensive resize is amortized across many cheap pushes!
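Rough arithmetic for why the average works out: with doubling, the copies done across all resizes form a geometric series (4 + 8 + 16 + ...) that sums to at most about twice the final capacity. So pushing n elements costs O(n) copying in total plus n ordinary writes - an O(1) cost per push on average.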
Bonus: Why "Big-O"? The notation family
You might wonder: Is there a "little-o"? Why "Big"?
Big-O is part of a family of asymptotic notations:
Big-O (O): Upper bound - "at most this fast"
- Both O(n) and O(n^2) algorithms are also O(n^3) - an upper bound doesn't have to be tight
- Most common - used for worst-case analysis
Big-Theta (Θ): Tight bound - "exactly this fast"
- More precise than Big-O
Big-Omega (Ω): Lower bound - "at least this fast"
- E.g. any sorting algorithm is Ω(n) because you must look at all elements
- Used for best-case or impossibility results
Little-o (o): Strict upper bound - "strictly slower than"
- Example: n is o(n^2), but n is not o(n)
- Rarely used in practice
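For reference (not needed for this course), the formal versions of these bounds:
- f(n) is O(g(n)) if f(n) <= c · g(n) for some constant c > 0 and all sufficiently large n
- f(n) is Ω(g(n)) if f(n) >= c · g(n) for some constant c > 0 and all sufficiently large n
- f(n) is Θ(g(n)) if it is both O(g(n)) and Ω(g(n))
- f(n) is o(g(n)) if f(n) / g(n) -> 0 as n -> infinity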
When you'll see the others:
- Θ: Advanced algorithms courses, research papers
- Ω: Proving lower bounds, impossibility results
- o: Theoretical CS, mathematical proofs
Bonus - P vs NP and computational complexity
What is P?
P = Problems solvable in Polynomial time
Polynomial time means O(n^k) for some constant k:
- O(n), O(n^2), O(n^3), O(n^10) are all polynomial
- O(2^n), O(n!), O(n^n) are NOT polynomial
Examples of P problems:
- Sorting: O(n log n)
- Finding max: O(n)
- Matrix multiplication (in the activity!)
- Shortest path (Dijkstra): O(E log V)
Key idea: Problems in P are considered "efficiently solvable"
What is NP?
NP = Nondeterministic Polynomial time
Definition: Problems where:
- Solutions can be verified in polynomial time
- But finding solutions might be harder
Example: Sudoku
- Verifying a solution: O(n^2) - just check rows, columns, boxes
- Finding a solution: Unknown - might need to try many possibilities
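To make the "cheap to verify" claim concrete, here's a minimal sketch of checking a completed 9×9 grid (the grid encoding and function names are my own, just for illustration) - three passes over the cells, each polynomial in the grid size:

```rust
// True if `cells` contains each digit 1-9 exactly once.
fn all_digits_once(cells: &[u8]) -> bool {
    let mut seen = [false; 10];
    for &d in cells {
        if d < 1 || d > 9 || seen[d as usize] {
            return false;
        }
        seen[d as usize] = true;
    }
    true
}

// True if a completed 9x9 grid is a valid Sudoku solution.
fn is_valid_sudoku(grid: &[[u8; 9]; 9]) -> bool {
    // Rows
    for row in grid {
        if !all_digits_once(row) {
            return false;
        }
    }
    // Columns
    for c in 0..9 {
        let col: Vec<u8> = (0..9).map(|r| grid[r][c]).collect();
        if !all_digits_once(&col) {
            return false;
        }
    }
    // 3x3 boxes
    for br in (0..9).step_by(3) {
        for bc in (0..9).step_by(3) {
            let cells: Vec<u8> = (0..3)
                .flat_map(|dr| (0..3).map(move |dc| grid[br + dr][bc + dc]))
                .collect();
            if !all_digits_once(&cells) {
                return false;
            }
        }
    }
    true
}
```

Verifying touches each of the 81 cells a constant number of times; finding a solution from a partially filled grid has no known polynomial-time algorithm (for the generalized n² × n² version).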
All P problems are in NP:
- If you can solve it fast, you can verify it fast too
- P is a subset of NP
The million-dollar question: P vs NP
Question: Does P = NP?
In other words: If we can quickly verify a solution, can we quickly find it too?
Most believe: P != NP (there are problems where verifying is easier than solving)
Why it matters:
- If P = NP: Many "hard" problems become easy (cryptography breaks!)
- If P != NP: Some problems are fundamentally hard
Prize: Solve this and win $1 million (Clay Mathematics Institute)
NP-complete problems
NP-Complete: The "hardest" problems in NP
Examples:
- Traveling Salesman Problem (TSP)
- Boolean satisfiability (SAT)
- Knapsack problem
- Graph coloring
- Sudoku solving
Special property: If you can solve ANY NP-complete problem in polynomial time, then P = NP!
Why should you care?
In practice:
- Recognize when a problem is NP-complete
- Don't waste time looking for fast exact solutions
- Use approximations or heuristics instead
Example:
- Finding the best route through ALL cities (TSP): NP-complete, so settle for approximations
- Finding the shortest path between TWO points (Dijkstra): in P, solvable exactly
Remember: Not all hard-looking problems are NP-complete!
- Some can be solved efficiently with clever algorithms
- Learning algorithms helps you recognize which is which
Complexity cheat sheet
Fast to Slow:
- O(1) - Instant, no matter the size
- O(log n) - Doubling the input size adds only one more step
- O(n) - Proportional to size
- O(n log n) - The best we can do for comparison-based sorting
- O(n^2) - Nested loops, gets bad quickly
- O(2^n) - Explodes! Avoid if possible
Remember: The difference between O(n) and O(n^2) can be seconds vs. hours!