Lecture 18 - Strings and Slices

Logistics

  • Same notes as yesterday re HWs, Exams (Exam corrections are due tonight)
  • I'll schedule the oral exams early Thursday, they'll start on Friday
  • Request from Joey - For homeworks, please just ignore the "feedback" branch/PR, don't merge it

Learning Objectives

By the end of today, you should be able to:

  • (Really!) Understand the difference between String and &str types
  • Understand Unicode and UTF-8 encoding basics
  • Apply ownership rules correctly when working with strings and slices

Part 1 - Review and Clarification

Clarifying - modifying through a mutable reference

fn main() {
    let mut vec = vec![1, 2, 3];
    let vec_ref = &mut vec;

    // Method 1: Call methods that modify in-place
    vec_ref.push(4); 
    // Method 2: Use * to dereference and assign
    *vec_ref = vec![5,6,7];

    println!("vec is now: {:?}", vec);  // [5, 6, 7]

    let mut x = 5;
    let y = &mut x;

    // Dereferencing is your only option here
    *y = 10;  
    println!("x is now: {}", x);  // 10
}

Understanding mut in Different Positions

fn main() {
    let mut x = 5;
    let mut y = &mut x;

    // Two different 'mut' keywords here!
    // First mut: y itself can be reassigned to point elsewhere
    // Second mut: y points to mutable data (can modify *y)

    *y = 10;  // OK - modify the value y points to
    println!("y is now: {}", y); 
    // Now that we're done with y we can look at:
    println!("x is now: {}", x); 

    // y = 5;  // ERROR! Can't assign i32 to &mut i32
    // This would try to change y from a reference into a number

    // but we could make it a different &mut i32:
    let mut z = 6;
    y = &mut z;
    println!("y is now: {}", y);

}

Key insight:

  • *y = value changes the value stored at the place y points to
  • y = &mut other changes which place y points to
  • Methods like .push() automatically dereference, so no * needed
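
The three cases above can be put side by side in one small function. This is only a sketch: the function name write_then_repoint and the variables a, b, and r are made up for illustration.

```rust
/// Writes through a mutable reference, re-points it, and writes again.
/// Returns the final values of the two variables it touched.
fn write_then_repoint() -> (i32, i32) {
    let mut a = 1;
    let mut b = 2;

    let mut r = &mut a; // "mut r": r itself may be re-pointed later
    *r = 10;            // changes the value stored in a
    r = &mut b;         // changes where r points; a is left alone from here on
    *r = 20;            // changes the value stored in b

    (a, b)
}

fn main() {
    let (a, b) = write_then_repoint();
    println!("a = {}, b = {}", a, b); // a = 10, b = 20

    // Method calls dereference automatically, so no * is needed:
    let mut v = vec![1, 2, 3];
    let v_ref = &mut v;
    v_ref.push(4);
    println!("{:?}", v); // [1, 2, 3, 4]
}
```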

Review - Some common borrowing patterns

Pattern 1: Read-Only Processing

fn find_max(numbers: &Vec<i32>) -> Option<&i32> {
    numbers.iter().max()
}

fn count_even(numbers: &Vec<i32>) -> usize {
    let mut count = 0;
    for &n in numbers.iter() {
        if n % 2 == 0 {
            count += 1;
        }
    }
    count
}

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    // Can call multiple read-only functions:
    let max_val = find_max(&data);
    let even_count = count_even(&data);
    let sum: i32 = data.iter().sum();

    println!("Max: {:?}, Even count: {}, Sum: {}", max_val, even_count, sum);
    println!("Original data still available: {:?}", data);
}

Pattern 2: In-Place Modification

fn double_all(numbers: &mut Vec<i32>) {
    for item in numbers.iter_mut() {
        *item *= 2;
    }
}

fn filter_positive(numbers: &mut Vec<i32>) {
    let mut i = 0;
    while i < numbers.len() {
        if numbers[i] <= 0 {
            numbers.remove(i);
        } else {
            i += 1;
        }
    }
}

fn main() {
    let mut data = vec![-2, 1, -1, 3, 0, 4];
    println!("Original: {:?}", data);

    double_all(&mut data);
    println!("Doubled: {:?}", data);

    filter_positive(&mut data);
    println!("Positive only: {:?}", data);
}
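
As an aside, the standard library's Vec::retain does the same kind of in-place filtering without shifting the tail of the vector on every removal. The sketch below assumes the same "keep values greater than zero" rule as filter_positive; keep_positive is a hypothetical name, and the |&n| closure syntax is something we cover later.

```rust
/// Keeps only the strictly positive values, in place.
/// Same result as the remove-based loop, using Vec::retain.
fn keep_positive(numbers: &mut Vec<i32>) {
    numbers.retain(|&n| n > 0);
}

fn main() {
    let mut data = vec![-4, 2, -2, 6, 0, 8];
    keep_positive(&mut data);
    println!("Positive only: {:?}", data); // [2, 6, 8]
}
```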

Part 2 - Slices

A slice is a borrowed view into a contiguous run of elements; it never owns the data:

Key points: &[T] type, borrowed references, syntax &collection[start..end]

#![allow(unused)]
fn main() {
let data = [1, 2, 3, 4, 5, 6];
let slice1 = &data[1..4];    // [2, 3, 4]
let slice2 = &data[..3];     // [1, 2, 3] - from start
let slice3 = &data[2..];     // [3, 4, 5, 6] - to end
println!("Slice3: {:?}", slice3);
}

Mutable Slices

Note the index is relative to the slice!

#![allow(unused)]
fn main() {
let mut numbers = [10, 20, 30, 40, 50];
{
    let slice = &mut numbers[1..4];
    slice[0] = 999;  // Modify through slice
}  // slice scope ends
println!("{:?}", numbers);  // [10, 999, 30, 40, 50]
}
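
The "index is relative to the slice" point is easiest to see when the slice is handed to a function, which has no idea where in the original array the slice began. A minimal sketch, with zero_out as a made-up helper name:

```rust
/// Sets every element of the borrowed slice to zero.
/// The function only sees the slice, so its index 0 is wherever the slice starts.
fn zero_out(slice: &mut [i32]) {
    for i in 0..slice.len() {
        slice[i] = 0;
    }
}

fn main() {
    let mut numbers = [10, 20, 30, 40, 50];
    zero_out(&mut numbers[1..4]); // slice indices 0..3 map to array indices 1..4
    println!("{:?}", numbers); // [10, 0, 0, 0, 50]
}
```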

Slices of Different Types

Slices work with any contiguous data:

  • &[T] - slice of array/Vec elements
  • &str - slice of string bytes (UTF-8)

fn main() {
    // Slice of an array
    let array = [1, 2, 3, 4, 5];
    let array_slice: &[i32] = &array[1..4];
    println!("Array slice: {:?}", array_slice);

    // Slice of a Vec
    let vec = vec![1.1, 2.2, 3.3, 4.4, 5.5];
    let vec_slice: &[f32] = &vec[2..];
    println!("Vec slice: {:?}", vec_slice);

    // Slice of a String (string slice = &str)
    let string = String::from("Hello World");
    let str_slice: &str = &string[0..5];
    println!("String slice: {}", str_slice); 

    // Slice of a slice
    let vec_slice_slice: &[f32] = &vec_slice[0..2];
    println!("Slice of a slice: {:?}", vec_slice_slice); 
}
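
Because arrays, Vecs, and sub-slices can all be viewed as &[T], a function that takes a slice parameter works with any of them. The sketch below uses a hypothetical helper, sum_slice, to show the three call styles:

```rust
/// Sums any contiguous run of i32 values, whatever container owns them.
fn sum_slice(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let array = [1, 2, 3, 4, 5];
    let vec = vec![10, 20, 30];

    println!("{}", sum_slice(&array));       // 15 - whole array
    println!("{}", sum_slice(&vec));         // 60 - &Vec<i32> is accepted as &[i32]
    println!("{}", sum_slice(&array[1..4])); // 9  - 2 + 3 + 4
}
```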

Memory Representation of Slices

Slices are "fat pointers" - they contain pointer + length:

#![allow(unused)]
fn main() {
let v = vec![10, 20, 30, 40, 50];
let x = &v[1..4];  // Points to middle 3 elements
}

     STACK                    HEAP
┌───────────────┐          ┌────────────────────────┐
│ x: &[i32]     │          │ 10 │ 20 │ 30 │ 40 │ 50 │
│    ptr ───────┼──────────|───►│──────────────┤    │
│    len: 3     │          └────────────────────────┘
├───────────────┤                 ▲
│ v: Vec<i32>   │                 │
│    ptr ───────┼─────────────────┘
│    len: 5     │
│    capacity: 5│
└───────────────┘
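
You can check the "pointer + length" picture yourself. The sketch below compares the size of a slice reference with the size of an ordinary reference, and confirms that the slice's pointer lands on v[1]. The helper starts_at is a made-up name, and the printed byte counts depend on your machine (16 and 8 on a typical 64-bit system).

```rust
use std::mem::size_of;

/// True if the slice starts at element offset of v (compares addresses only).
fn starts_at(v: &Vec<i32>, slice: &[i32], offset: usize) -> bool {
    slice.as_ptr() == v[offset..].as_ptr()
}

fn main() {
    let v = vec![10, 20, 30, 40, 50];
    let x = &v[1..4];

    // A slice reference is two words wide: pointer + length.
    println!("&[i32] is {} bytes", size_of::<&[i32]>());
    // A reference to a single i32 is one word: just the pointer.
    println!("&i32   is {} bytes", size_of::<&i32>());

    println!("len = {}", x.len());                       // len = 3
    println!("starts at v[1]: {}", starts_at(&v, x, 1)); // starts at v[1]: true
}
```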

Ownership Interlude: Slice Borrowing

What do you think happens in this code?

#![allow(unused)]
fn main() {
let mut data = vec![1, 2, 3, 4];
let slice1 = &data[0..2];     
let slice2 = &mut data[2..4]; 
println!("{:?} {:?}", slice1, slice2);
}

A) Compiles fine - non-overlapping slices (indices 0 and 1 vs 2 and 3)
B) Compiler error - mixing mutable and immutable borrows
C) Runtime panic

Part 3 - Strings Deep-dive

The FOUR (or 3) kinds of Strings

#![allow(unused)]
fn main() {
let s = String::from("Hello DS210");    // Heap allocation - owned String
let s_ref: &String = &s;                // Reference to the String itself
let literal: &str = "literal";          // String slice from program binary
let slice: &str = &s[0..5];             // String slice from heap (borrows from s)
}

       STACK                    HEAP                PROGRAM BINARY
┌──────────────────┐         ┌─────────────┐      ┌─────────────┐
│ s: String        │◄─┐      │"Hello DS210"│      │  "literal"  │
│   ptr ───────────┼──┼─────►│             │      └─────────────┘
│   len: 11        │  │      └─────────────┘             ▲
│   capacity: 20   │  │           ▲                      │
├──────────────────┤  │           │                      │
│ s_ref: &String   │  │           │                      │
│   ptr ───────────┼──┘           │                      │
├──────────────────┤              │                      │
│ literal: &str    │              │                      │
│   ptr ───────────┼──────────────┼──────────────────────┘
│   len: 7         │              │
├──────────────────┤              │
│ slice: &str      │              │
│   ptr ───────────┼──────────────┘ (points to "Hello" in heap)
│   len: 5         │
└──────────────────┘

String encodings: Unicode and UTF-8

Unicode is a standard that assigns a unique number (called a "code point") to every character across all writing systems. For example:

  • 'A' = U+0041
  • 'é' = U+00E9
  • '你' = U+4F60
  • '🦀' = U+1F980

The char type in Rust stores the code point directly as a 32-bit value, so a char always takes 4 bytes, no matter which character it holds.

UTF-8 is an encoding (a way to represent those Unicode code points as bytes in memory/files). It's one of several ways to encode Unicode:

  • UTF-8: Variable-length (1-4 bytes per character), backward compatible with ASCII
  • UTF-16: Variable-length (2 or 4 bytes per character)
  • UTF-32: Fixed-length (always 4 bytes per character)

UTF-8 encoding uses variable-length bytes per character:

Character   UTF-8 Bytes    Binary Representation
'A'         1 byte         01000001
'é'         2 bytes        11000011 10101001
'你'        3 bytes        11100100 10111101 10100000
'🦀'        4 bytes        11110000 10011111 10100110 10000000

Strings in Rust are UTF-8 encoded, so each character takes 1-4 bytes as needed.
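
To see the encoding for yourself, you can ask Rust for the UTF-8 bytes of any char and print them in binary. This is a small sketch: utf8_bits is a hypothetical helper name, and the closure passed to map uses syntax we cover later.

```rust
/// Returns the UTF-8 encoding of one char as space-separated 8-bit groups.
fn utf8_bits(c: char) -> String {
    let mut buf = [0u8; 4]; // 4 bytes is the longest UTF-8 encoding
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("{:08b}", b))
        .collect::<Vec<String>>()
        .join(" ")
}

fn main() {
    for c in ['A', 'é', '你', '🦀'] {
        println!("{} U+{:04X} {} byte(s): {}", c, c as u32, c.len_utf8(), utf8_bits(c));
    }
    // A char in memory is always 4 bytes, whatever its UTF-8 length:
    println!("size of char: {}", std::mem::size_of::<char>());
}
```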

Strings Are Collections of Characters

A String or &str is a sequence of Unicode characters encoded in UTF-8:

#![allow(unused)]
fn main() {
let emoji = "🦀🚀";                                
println!("Bytes: {}", emoji.len());                // 8 bytes (4 + 4 in UTF-8)
println!("Characters: {}", emoji.chars().count()); // 2 characters

let accents = "Aé";                                
println!("Bytes: {}", accents.len());                // 3 bytes (1 + 2 in UTF-8)
println!("Characters: {}", accents.chars().count()); // 2 characters
}

The key insight:

  • .len() returns bytes, not character count!
  • Use .chars() to iterate over actual characters

Converting Between char and String

#![allow(unused)]
fn main() {
// char to String
let c: char = '🦀';
let s: String = c.to_string();

// String to chars
let text = "Hello";
for ch in text.chars() {  // ch is type char
    println!("{}", ch);
}

// Collecting chars into a String
let chars: Vec<char> = vec!['H', 'i', '!'];
let s: String = chars.iter().collect(); // we'll see collect more soon
}

So THAT'S why string indexing is forbidden

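Rust strings are UTF-8, so text[0] could only give back a single byte, which may be just one piece of a multi-byte character. Rather than risk handing out broken Unicode, Rust rejects string indexing at compile time.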

fn main() {
    let text = "Hello, 世界!";
    // let c = text[0];  // ERROR!

    let first = text.chars().next().unwrap();  // Safe

    let first_three: String = text.chars().take(3).collect(); // Also safe
}

String slices compile without errors, but they panic at runtime if a byte index falls inside a multi-byte character:

fn main() {
    // ASCII - works fine
    let text = "Hello, world!";
    let hello = &text[0..5];    // OK - slices at character boundaries

    // Slice end falls inside the emoji - PANIC!
    // let text = "🦀Hello";
    // let slice = &text[0..2];    // PANIC! - byte 2 is in the middle of 🦀 (4 bytes)

    // Slice starts and ends on character boundaries - OK
    let text = "🦀Hello";
    let slice = &text[4..9];    // OK - starts after 🦀, slices "Hello"
}
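
If you are not sure your byte indices land on character boundaries, there are two safer options in the standard library: str::get returns an Option instead of panicking, and char_indices tells you the byte offset where each character starts. A minimal sketch, with first_n_chars as a made-up helper:

```rust
/// Returns the first n characters of text as a slice, never splitting a character.
/// If text has n or fewer characters, the whole string is returned.
fn first_n_chars(text: &str, n: usize) -> &str {
    match text.char_indices().nth(n) {
        Some((byte_idx, _)) => &text[..byte_idx], // byte_idx is a character boundary
        None => text,
    }
}

fn main() {
    let text = "🦀Hello";

    // get() returns None instead of panicking on a bad boundary
    println!("{:?}", text.get(0..2)); // None
    println!("{:?}", text.get(4..9)); // Some("Hello")

    println!("{}", first_n_chars(text, 2)); // 🦀H
}
```

Unlike text.chars().take(n).collect(), this borrows from the original string instead of building a new String.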

Ownership Interlude: String Ownership Quiz

Question: What happens here?

#![allow(unused)]
fn main() {
let s1 = String::from("Hello");
let s2 = s1;
let s3 = s2.clone();
println!("{} {}", s1, s2);  // What happens?
}

A) Prints "Hello Hello"
B) Compiler error - s1 cannot be assigned to s2 on line 2
C) Compiler error - s2 cannot be cloned on line 3
D) Compiler error - s1 cannot print on line 4
E) Runtime panic

String Concatenation

#![allow(unused)]
fn main() {
// Method 1: Mutation (keeps ownership)
let mut s = String::from("Hello");
s.push_str(" World");  // Mutates s

// Method 2: + operator (moves first string)
let s1 = String::from("Hello");
let s2 = s1 + " World";  // s1 is moved!

// Method 3: format! (no ownership taken)
let name = "Data";
let num = 210;
let result = format!("{} Science {}", name, num);  // name & num still usable
}

Ownership note: + moves first operand, format! borrows all inputs.
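
If you want to use + but still keep the first string, one option is to make an owned copy first and let + consume the copy. The sketch below shows that next to the plain + case; concat_keeping_both is a hypothetical helper name.

```rust
/// Concatenates two strings without giving up either input.
/// to_string() makes an owned copy of first, and + then consumes that copy.
fn concat_keeping_both(first: &str, second: &str) -> String {
    first.to_string() + second
}

fn main() {
    let s1 = String::from("Hello");
    let s2 = String::from(" World");

    let joined = concat_keeping_both(&s1, &s2);
    println!("{}", joined);    // Hello World
    println!("{}{}", s1, s2);  // s1 and s2 are both still usable

    let moved = s1 + &s2;      // s1 is moved here; s2 is only borrowed
    println!("{}", moved);     // Hello World
    println!("{}", s2);        // s2 is still usable, s1 is not
}
```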

Function Parameters: &str vs &String

#![allow(unused)]
fn main() {
// Good: accepts both &String and &str
fn analyze_text(text: &str) -> usize { text.len() }

// Less flexible: only accepts &String
fn analyze_ref(text: &String) -> usize { text.len() }

// Moves ownership
fn analyze_owned(text: String) -> usize { text.len() }
}

Best practice: Use &str parameters - more flexible, no ownership transfer.
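
Here is what that flexibility looks like at the call site. This is an illustrative sketch; count_words is a made-up function, not one from the homework.

```rust
/// Counts whitespace-separated words. Taking &str lets callers pass
/// a literal, a borrowed String, or a slice of either.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let owned = String::from("Hello DS210 class");

    println!("{}", count_words("just a literal")); // 3 - &str literal
    println!("{}", count_words(&owned));           // 3 - &String is accepted as &str
    println!("{}", count_words(&owned[0..11]));    // 2 - slice "Hello DS210"
    println!("{}", owned);                         // owned was only borrowed
}
```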

Memory Layout: Passing &String to &str Parameter

When you pass &String to a function expecting &str, Rust converts it for you:

fn analyze_text(text: &str) -> usize {
    text.len()
}

fn main() {
    let s = String::from("Hello DS210");
    let s_ref = &s;
    analyze_text(s_ref);  // &String → &str conversion
}

         STACK                           HEAP

  ┌─ analyze_text ──┐          ┌─────────────────┐
  │ text: &str      │          │  "Hello DS210"  │
  │   ptr ──────────┼──────────┤─►┤───────────│  │
  │   len: 11       │          └─────────────────┘
  └─────────────────┘                     ▲
                                          │
  ┌──── main ───────┐                     │
  │ s_ref: &String  │                     │
  │   ptr ──────────┼──┐                  │
  ├─────────────────┤  │                  │
  │ s: String       │◄─┘                  │
  │   ptr ──────────┼─────────────────────┘
  │   len: 11       │
  │   capacity: 20  │
  └─────────────────┘

What happens:

  1. s owns the heap data
  2. s_ref is a reference to s itself (points to stack)
  3. When passed to analyze_text, Rust converts &String → &str (this is called deref coercion)
  4. text is a string slice pointing directly to the heap data
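
The conversion in step 3 can also be written out by hand. The sketch below shows three ways to get a &str covering a whole String and checks that they all describe the same heap bytes; same_view is a hypothetical helper that only compares address and length.

```rust
/// True if both string slices start at the same address and have the same length.
fn same_view(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr() && a.len() == b.len()
}

fn main() {
    let s = String::from("Hello DS210");
    let s_ref: &String = &s;

    let coerced: &str = s_ref;       // implicit: what happens at the call site
    let explicit: &str = s.as_str(); // explicit method
    let sliced: &str = &s[..];       // full-range slice

    println!("{}", same_view(coerced, explicit));   // true
    println!("{}", same_view(coerced, sliced));     // true
    println!("{}", coerced.as_ptr() == s.as_ptr()); // true: both point at the heap buffer
}
```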

Think-Pair-Share: String Slice Safety

Thought Experiment:

Consider this situation:

#![allow(unused)]
fn main() {
let mut s = String::from("Hello");
let slice = &s[0..5];  // Points directly to heap data
}

The string slice, named slice, points directly to the heap data, not to s on the stack.

Question: What happens if we modify s after creating the slice?

#![allow(unused)]
fn main() {
let mut s = String::from("Hello");
let slice = &s[0..5];
s.push_str(" World!");  // String grows, might reallocate!
println!("{}", slice);  // Is slice still valid?
}

Since slice points directly to heap memory, and the String might reallocate to a new location when it grows, won't the slice pointer become invalid (dangling)?

Part 4 - Iter and Collect

More on iter() and iter_mut()

We saw .iter() before - now we'll add .iter_mut():

  • .iter(): Gives you immutable references (&T) to each element
  • .iter_mut(): Gives you mutable references (&mut T) to each element

fn main() {
    let mut numbers = vec![1, 2, 3, 4, 5];

    // .iter() - read-only access
    for num in numbers.iter() {
        println!("{}", num);  // num is &i32
        // *num += 1;  // ERROR! Can't modify through immutable reference
    }

    // .iter_mut() - mutable access
    for num in numbers.iter_mut() {
        *num += 10;  // num is &mut i32 - can modify!
    }

    println!("Modified: {:?}", numbers);  // [11, 12, 13, 14, 15]
}

Dereferencing with iter_mut()

With .iter_mut() you always need to dereference with * to modify the value:

Why no pattern matching? With .iter() a pattern like for &n in ... copies each value out of its reference, which is fine when you're only reading. With .iter_mut() a copy would be useless: you need the mutable reference itself to assign through it, so you must use *.

fn main() {
    let mut numbers = vec![1, 2, 3, 4, 5];

    // Must use * to modify through mutable reference
    for num in numbers.iter_mut() {
        *num *= 2;  // num is &mut i32, *num is i32
    }

    println!("{:?}", numbers);  // [2, 4, 6, 8, 10]
}
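
Putting the two loop styles next to each other makes the difference concrete. Both function names below are made up for this sketch:

```rust
/// Read-only: the &n pattern copies each i32 out of its reference.
fn sum_of_squares(values: &[i32]) -> i32 {
    let mut total = 0;
    for &n in values.iter() {
        total += n * n; // n is a plain i32 here
    }
    total
}

/// Mutating: item stays a &mut i32, so every read and write goes through *.
fn clamp_negatives(values: &mut [i32]) {
    for item in values.iter_mut() {
        if *item < 0 {
            *item = 0;
        }
    }
}

fn main() {
    let mut data = vec![-2, 3, -1, 4];
    println!("{}", sum_of_squares(&data)); // 30
    clamp_negatives(&mut data);
    println!("{:?}", data); // [0, 3, 0, 4]
}
```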

Enumerate with iter_mut()

You can combine .iter_mut() with .enumerate() to get both the index and a mutable reference:

fn main() {
    let mut scores = vec![78, 85, 92, 67, 88];

    // enumerate gives (usize, &mut i32)
    for (i, score_ref) in scores.iter_mut().enumerate() {
        println!("Score {}: {}", i, score_ref);

        // Modify based on index
        if i == 0 {
            *score_ref += 10;  // Bonus for first student
        }
    }

    println!("Updated scores: {:?}", scores);  // [88, 85, 92, 67, 88]
}

Intro to functions on iterators: sum() and collect()

.iter() also enables you to use:

  • Math functions like sum() and max()
  • The .collect() method, which can transform an iterator into various types (more on this later)

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];

    // sum() consumes the iterator and returns a single value
    let total: i32 = numbers.iter().sum();
    println!("Total: {}", total);  // 15 (empty iter -> 0)

    // max() returns an Option<&i32>
    let largest = numbers.iter().max();
    println!("Largest: {:?}", largest);  // Some(5) (empty iter -> None)

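    // Collect strings into a single String
    let words = vec!["Hello", "world", "!"];
    // .iter() yields &&str here; .copied() turns each item back into &str,
    // which is a type String knows how to collect
    let sentence: String = words.iter().copied().collect();
    println!("Sentence: {}", sentence);  // "Helloworld!"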
    // .collect() can build different types based on the type annotation!
}
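
The empty-iterator behavior noted in the comments above is worth trying once. In this sketch, total_and_max is a hypothetical helper; it also uses .copied() on the Option so the caller gets an i32 back instead of a reference.

```rust
/// Returns the total and the largest value of a slice.
/// copied() turns the Option<&i32> from max() into an Option<i32>.
fn total_and_max(values: &[i32]) -> (i32, Option<i32>) {
    let total: i32 = values.iter().sum();
    let largest = values.iter().max().copied();
    (total, largest)
}

fn main() {
    println!("{:?}", total_and_max(&[1, 2, 3, 4, 5])); // (15, Some(5))
    println!("{:?}", total_and_max(&[]));              // (0, None)
}
```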

More Examples of .collect()

.collect() is very flexible - it can build different collection types based on your type annotation:

fn main() {
    // Collect chars into a String
    let letters = vec!['H', 'e', 'l', 'l', 'o'];
    let word: String = letters.iter().collect();
    println!("{}", word);  // "Hello"

    // Collect into a Vec
    let numbers = [1, 2, 3, 4, 5];
    // this is "closure" notation we'll learn later:
    let doubled: Vec<i32> = numbers.iter().map(|x| x * 2).collect();
    println!("{:?}", doubled);  // [2, 4, 6, 8, 10]

    // Collect string slices into a String
    let parts = vec!["Data", " ", "Science", " ", "210"];
    let course: String = parts.iter().copied().collect();  // .copied(): &&str -> &str
    println!("{}", course);  // "Data Science 210"

    // Collect range into a Vec
    let range_vec: Vec<i32> = (0..5).collect();
    println!("{:?}", range_vec);  // [0, 1, 2, 3, 4]

    // Collect chars from a string into a Vec
    let text = "Hello";
    let char_vec: Vec<char> = text.chars().collect();
    println!("{:?}", char_vec);  // ['H', 'e', 'l', 'l', 'o']

    // Take first 3 chars and collect back into String
    let first_three: String = "Hello World".chars().take(3).collect();
    println!("{}", first_three);  // "Hel"
}
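
Two more string-flavored uses, tying back to Part 3: because .chars() walks characters rather than bytes, reversing with it keeps multi-byte characters intact. When you want a separator between pieces, the slice method join is usually a simpler choice than collect. Both helper names below are made up for this sketch.

```rust
/// Reverses a string character by character (not byte by byte).
fn reverse_chars(text: &str) -> String {
    text.chars().rev().collect()
}

/// Joins words with a single space between them.
fn join_words(words: &[&str]) -> String {
    words.join(" ")
}

fn main() {
    println!("{}", reverse_chars("Hello")); // olleH
    println!("{}", reverse_chars("A🦀"));   // 🦀A
    println!("{}", join_words(&["Data", "Science", "210"])); // Data Science 210
}
```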