Lecture 18 - Strings and Slices
Logistics
- Same notes as yesterday re HWs, Exams (Exam corrections are due tonight)
- I'll schedule the oral exams early Thursday, they'll start on Friday
- Request from Joey - For homeworks, please just ignore the "feedback" branch/PR, don't merge it
Learning Objectives
By the end of today, you should be able to:
- (Really!) Understand the difference between String and &str types
- Understand Unicode and UTF-8 encoding basics
- Apply ownership rules correctly when working with strings and slices
Part 1 - Review and Clarification
Clarifying - modifying through a mutable reference
    fn main() {
        let mut vec = vec![1, 2, 3];
        let vec_ref = &mut vec;

        // Method 1: Call methods that modify in-place
        vec_ref.push(4);

        // Method 2: Use * to dereference and assign
        *vec_ref = vec![5, 6, 7];
        println!("vec is now: {:?}", vec); // [5, 6, 7]

        let mut x = 5;
        let y = &mut x;
        // Dereferencing is your only option here
        *y = 10;
        println!("x is now: {}", x); // 10
    }
Understanding mut in Different Positions
    fn main() {
        let mut x = 5;
        let mut y = &mut x; // Two different 'mut' keywords here!
        // First mut: y itself can be reassigned to point elsewhere
        // Second mut: y points to mutable data (can modify *y)

        *y = 10; // OK - modify the value y points to
        println!("y is now: {}", y);
        // Now that we're done with y we can look at:
        println!("x is now: {}", x);

        // y = 5; // ERROR! Can't assign i32 to &mut i32
        // This would try to change y from a reference into a number
        // but we could make it a different &mut i32:
        let mut z = 6;
        y = &mut z;
        println!("y is now: {}", y);
    }
Key insight:
- *y = value changes the value that y points to
- y = &mut other changes where y points
- Methods like .push() automatically dereference, so no * needed
Review - Some common borrowing patterns
Pattern 1: Read-Only Processing
    fn find_max(numbers: &Vec<i32>) -> Option<&i32> {
        numbers.iter().max()
    }

    fn count_even(numbers: &Vec<i32>) -> usize {
        let mut count = 0;
        for &n in numbers.iter() {
            if n % 2 == 0 {
                count += 1;
            }
        }
        count
    }

    fn main() {
        let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

        // Can call multiple read-only functions:
        let max_val = find_max(&data);
        let even_count = count_even(&data);
        let sum: i32 = data.iter().sum();

        println!("Max: {:?}, Even count: {}, Sum: {}", max_val, even_count, sum);
        println!("Original data still available: {:?}", data);
    }
Pattern 2: In-Place Modification
    fn double_all(numbers: &mut Vec<i32>) {
        for item in numbers.iter_mut() {
            *item *= 2;
        }
    }

    fn filter_positive(numbers: &mut Vec<i32>) {
        let mut i = 0;
        while i < numbers.len() {
            if numbers[i] <= 0 {
                numbers.remove(i); // later elements shift left, so don't advance i
            } else {
                i += 1;
            }
        }
    }

    fn main() {
        let mut data = vec![-2, 1, -1, 3, 0, 4];
        println!("Original: {:?}", data);

        double_all(&mut data);
        println!("Doubled: {:?}", data);

        filter_positive(&mut data);
        println!("Positive only: {:?}", data);
    }
Part 2 - Slices
A slice is a reference to a contiguous portion of data without ownership:
Key points: &[T] type, borrowed references, syntax &collection[start..end] (start is included, end is excluded)
    #![allow(unused)]
    fn main() {
        let data = [1, 2, 3, 4, 5, 6];
        let slice1 = &data[1..4]; // [2, 3, 4]
        let slice2 = &data[..3];  // [1, 2, 3] - from start
        let slice3 = &data[2..];  // [3, 4, 5, 6] - to end
        println!("Slice3: {:?}", slice3);
    }
Mutable Slices
Note the index is relative to the slice!
    #![allow(unused)]
    fn main() {
        let mut numbers = [10, 20, 30, 40, 50];
        {
            let slice = &mut numbers[1..4];
            slice[0] = 999; // Modify through slice (slice[0] is numbers[1])
        } // slice scope ends
        println!("{:?}", numbers); // [10, 999, 30, 40, 50]
    }
Slices of Different Types
Slices work with any contiguous data:
- &[T] - slice of array/Vec elements
- &str - slice of string bytes (UTF-8)
    fn main() {
        // Slice of an array
        let array = [1, 2, 3, 4, 5];
        let array_slice: &[i32] = &array[1..4];
        println!("Array slice: {:?}", array_slice);

        // Slice of a Vec
        let vec = vec![1.1, 2.2, 3.3, 4.4, 5.5];
        let vec_slice: &[f32] = &vec[2..];
        println!("Vec slice: {:?}", vec_slice);

        // Slice of a String (string slice = &str)
        let string = String::from("Hello World");
        let str_slice: &str = &string[0..5];
        println!("String slice: {}", str_slice);

        // Slice of a slice
        let vec_slice_slice: &[f32] = &vec_slice[0..2];
        println!("Slice of a slice: {:?}", vec_slice_slice);
    }
Memory Representation of Slices
Slices are "fat pointers" - they contain pointer + length:
    #![allow(unused)]
    fn main() {
        let v = vec![10, 20, 30, 40, 50];
        let x = &v[1..4]; // Points to middle 3 elements
    }
STACK                          HEAP
┌────────────────┐             ┌────┬────┬────┬────┬────┐
│ x: &[i32]      │             │ 10 │ 20 │ 30 │ 40 │ 50 │
│   ptr ─────────┼──────┐      └────┴────┴────┴────┴────┘
│   len: 3       │      │        ▲    ▲
├────────────────┤      │        │    │
│ v: Vec<i32>    │      └────────┼────┘   x.ptr points at the 20
│   ptr ─────────┼───────────────┘        v.ptr points at the 10
│   len: 5       │
│   capacity: 5  │
└────────────────┘
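The pointer-plus-length picture above can be checked from code. This is a minimal sketch; the helper name slice_parts is made up for this illustration, and everything else comes from the standard library.

```rust
use std::mem::size_of;

// Hypothetical helper: returns the two fields a slice reference carries.
fn slice_parts(s: &[i32]) -> (*const i32, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let v = vec![10, 20, 30, 40, 50];
    let x = &v[1..4];

    // A slice reference is two machine words: pointer + length.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());
    // A reference to a single i32 is just one word.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());

    let (ptr, len) = slice_parts(x);
    // x's pointer is the address of v[1], not the start of v's buffer.
    assert_eq!(ptr, &v[1] as *const i32);
    assert_eq!(len, 3);

    println!("x = {:?}, len = {}", x, len);
}
```

Note that x stores no capacity: it cannot grow, it only views three of v's elements.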
Ownership Interlude: Slice Borrowing
What do you think happens in this code?
    #![allow(unused)]
    fn main() {
        let mut data = vec![1, 2, 3, 4];
        let slice1 = &data[0..2];
        let slice2 = &mut data[2..4];
        println!("{:?} {:?}", slice1, slice2);
    }
A) Compiles fine - non-overlapping slices (indices 0 and 1 vs 2 and 3)
B) Compiler error - mixing mutable and immutable borrows
C) Runtime panic
Part 3 - Strings Deep-dive
The FOUR (or 3) kinds of Strings
    #![allow(unused)]
    fn main() {
        let s = String::from("Hello DS210"); // Heap allocation - owned String
        let s_ref: &String = &s;             // Reference to the String itself
        let literal: &str = "literal";       // String slice from program binary
        let slice: &str = &s[0..5];          // String slice from heap (borrows from s)
    }
STACK                       HEAP                  PROGRAM BINARY
┌──────────────────┐        ┌─────────────┐       ┌───────────┐
│ s: String        │◄─┐     │"Hello DS210"│       │ "literal" │
│   ptr ───────────┼──┼────►│             │       └───────────┘
│   len: 11        │  │     └─────────────┘             ▲
│   capacity: ≥ 11 │  │       ▲                         │
├──────────────────┤  │       │                         │
│ s_ref: &String   │  │       │                         │
│   ptr ───────────┼──┘       │                         │
├──────────────────┤          │                         │
│ literal: &str    │          │                         │
│   ptr ───────────┼──────────┼─────────────────────────┘
│   len: 7         │          │
├──────────────────┤          │
│ slice: &str      │          │
│   ptr ───────────┼──────────┘  (points to "Hello" in heap)
│   len: 5         │
└──────────────────┘
String encodings: Unicode and UTF-8
Unicode is a standard that assigns a unique number (called a "code point") to every character across all writing systems. For example:
- 'A' = U+0041
- 'é' = U+00E9
- '你' = U+4F60
- '🦀' = U+1F980
The char type in Rust stores this code point value directly, so a char always takes exactly 4 bytes, no matter which character it holds.
UTF-8 is an encoding (a way to represent those Unicode code points as bytes in memory/files). It's one of several ways to encode Unicode:
- UTF-8: Variable-length (1-4 bytes per character), backward compatible with ASCII
- UTF-16: Variable-length (2 or 4 bytes per character)
- UTF-32: Fixed-length (always 4 bytes per character)
UTF-8 encoding uses variable-length bytes per character:
Character   UTF-8 Bytes   Binary Representation
'A'         1 byte        01000001
'é'         2 bytes       11000011 10101001
'你'        3 bytes       11100100 10111101 10100000
'🦀'        4 bytes       11110000 10011111 10100110 10000000
Rust strings (String and &str) are UTF-8 encoded, so each character takes 1 to 4 bytes as needed.
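To see the difference between a char in memory (always 4 bytes) and the same character encoded as UTF-8 (1 to 4 bytes), here is a small sketch. The helper utf8_hex is a name made up for this example; encode_utf8, len_utf8, and size_of are standard library calls.

```rust
// Hypothetical helper: formats the UTF-8 bytes of a char as hex, e.g. "C3 A9".
fn utf8_hex(c: char) -> String {
    let mut buf = [0u8; 4]; // 4 bytes is enough for any char
    let encoded: &str = c.encode_utf8(&mut buf);
    let mut out = String::new();
    for b in encoded.bytes() {
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(&format!("{:02X}", b));
    }
    out
}

fn main() {
    // In memory, every char is 4 bytes.
    assert_eq!(std::mem::size_of::<char>(), 4);

    // Encoded as UTF-8, the same characters take 1 to 4 bytes.
    for c in ['A', 'é', '你', '🦀'] {
        println!(
            "{} U+{:04X} -> {} byte(s): {}",
            c,
            c as u32,
            c.len_utf8(),
            utf8_hex(c)
        );
    }
}
```

The hex bytes printed for each character correspond to the binary column of the table (for example C3 A9 is 11000011 10101001).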
Strings Are Collections of Characters
A String or &str is a sequence of Unicode characters encoded in UTF-8:
    #![allow(unused)]
    fn main() {
        let emoji = "🦀🚀";
        println!("Bytes: {}", emoji.len());                  // 8 bytes (4 + 4 in UTF-8)
        println!("Characters: {}", emoji.chars().count());   // 2 characters

        let accents = "Aé";
        println!("Bytes: {}", accents.len());                // 3 bytes (1 + 2 in UTF-8)
        println!("Characters: {}", accents.chars().count()); // 2 characters
    }
The key insight:
- .len() returns bytes, not character count!
- Use .chars() to iterate over actual characters
Converting Between char and String
    #![allow(unused)]
    fn main() {
        // char to String
        let c: char = '🦀';
        let s: String = c.to_string();

        // String to chars
        let text = "Hello";
        for ch in text.chars() {
            // ch is type char
            println!("{}", ch);
        }

        // Collecting chars into a String
        let chars: Vec<char> = vec!['H', 'i', '!'];
        let s: String = chars.iter().collect(); // we'll see collect more soon
    }
So THAT'S why string indexing is forbidden
If text[0] were allowed, it would have to return a single byte, which could be just one piece of a multi-byte character. Rust therefore rejects integer indexing on strings at compile time.
Range slices compile without errors, but they panic at runtime if a byte index falls in the middle of a character:
    fn main() {
        // ASCII - works fine
        let text = "Hello, world!";
        let hello = &text[0..5]; // OK - slices at character boundaries

        // Emoji at the boundary - PANIC!
        // let text = "🦀Hello";
        // let slice = &text[0..2]; // PANIC! - slices through middle of 🦀 (4 bytes)

        // Emoji not at boundary - OK
        let text = "🦀Hello";
        let slice = &text[4..9]; // OK - starts after 🦀, slices "Hello"
    }
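When the byte indices are not known to be safe, the standard library offers non-panicking alternatives. The sketch below uses str::get, is_char_boundary, and char_indices; the helper first_chars is a hypothetical name introduced only for this example.

```rust
// Hypothetical helper: the first n characters of text as a slice, never panics.
fn first_chars(text: &str, n: usize) -> &str {
    // char_indices yields (byte_offset, char); entry n marks where to cut.
    match text.char_indices().nth(n) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // text has n or fewer characters: return all of it
    }
}

fn main() {
    let text = "🦀Hello";

    // get() returns None instead of panicking on a bad boundary.
    assert_eq!(text.get(0..2), None);
    assert_eq!(text.get(0..4), Some("🦀"));

    // is_char_boundary reports whether a byte index is safe to cut at.
    assert!(!text.is_char_boundary(2));
    assert!(text.is_char_boundary(4));

    // Slicing by character count instead of byte count.
    assert_eq!(first_chars(text, 2), "🦀H");
    println!("{}", first_chars(text, 2));
}
```

Unlike text.chars().take(n).collect(), this returns a borrowed &str and allocates nothing.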
Ownership Interlude: String Ownership Quiz
Question: What happens here?
    let s1 = String::from("Hello");
    let s2 = s1;
    let s3 = s2.clone();
    println!("{} {}", s1, s2); // What happens?
A) Prints "Hello Hello"
B) Compiler error - s1 cannot be assigned to s2 on line 2
C) Compiler error - s2 cannot be cloned on line 3
D) Compiler error - s1 cannot print on line 4
E) Runtime panic
String Concatenation
    #![allow(unused)]
    fn main() {
        // Method 1: Mutation (keeps ownership)
        let mut s = String::from("Hello");
        s.push_str(" World"); // Mutates s

        // Method 2: + operator (moves first string)
        let s1 = String::from("Hello");
        let s2 = s1 + " World"; // s1 is moved!

        // Method 3: format! (no ownership taken)
        let name = "Data";
        let num = 210;
        let result = format!("{} Science {}", name, num); // name & num still usable
    }
Ownership note: + moves first operand, format! borrows all inputs.
Function Parameters: &str vs &String
    // Good: accepts both &String and &str
    fn analyze_text(text: &str) -> usize { ... }

    // Less flexible: only accepts &String
    fn analyze_ref(text: &String) -> usize { ... }

    // Moves ownership
    fn analyze_owned(text: String) -> usize { ... }
Best practice: Use &str parameters - more flexible, no ownership transfer.
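As a concrete illustration of that flexibility, here is a small runnable sketch. The function count_words is a made-up example (it is not one of the functions above); it shows one &str parameter accepting a borrowed String, a literal, and a slice.

```rust
// Hypothetical example function: counts whitespace-separated words.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let owned = String::from("Hello DS210 students");
    let literal = "Hello world";

    println!("{}", count_words(&owned));       // &String is converted to &str
    println!("{}", count_words(literal));      // &str passes directly
    println!("{}", count_words(&owned[0..5])); // a slice of the String works too

    // owned was only borrowed, so it is still usable here.
    println!("{}", owned);
}
```

Had the parameter been &String, the second and third calls would not compile without first building a new String.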
Memory Layout: Passing &String to &str Parameter
When you pass &String to a function expecting &str, Rust converts it for you (this automatic conversion is called deref coercion):
    fn analyze_text(text: &str) -> usize {
        text.len()
    }

    fn main() {
        let s = String::from("Hello DS210");
        let s_ref = &s;
        analyze_text(s_ref); // &String → &str conversion
    }
STACK                           HEAP
┌─ analyze_text ────┐
│ text: &str        │           ┌───────────────┐
│   ptr ────────────┼──────────►│ "Hello DS210" │
│   len: 11         │           └───────────────┘
└───────────────────┘                   ▲
                                        │
┌──── main ─────────┐                   │
│ s_ref: &String    │                   │
│   ptr ────────────┼──┐                │
├───────────────────┤  │                │
│ s: String         │◄─┘                │
│   ptr ────────────┼───────────────────┘
│   len: 11         │
│   capacity: ≥ 11  │
└───────────────────┘
What happens:
- s owns the heap data
- s_ref is a reference to s itself (points to stack)
- When passed to analyze_text, Rust converts &String → &str
- text is a string slice pointing directly to the heap data
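One way to convince yourself of this is to compare addresses. In the sketch below, str_parts is a hypothetical helper written for this example; as_ptr, as_str, and len are standard library methods.

```rust
// Hypothetical helper: returns the address and length a &str carries.
fn str_parts(text: &str) -> (*const u8, usize) {
    (text.as_ptr(), text.len())
}

fn main() {
    let s = String::from("Hello DS210");
    let s_ref: &String = &s;

    // Implicit conversion at the call site...
    let coerced = str_parts(s_ref);
    // ...and two explicit spellings of the same conversion.
    let explicit = str_parts(s.as_str());
    let full_slice = str_parts(&s[..]);

    // All three point at the String's heap buffer and have length 11.
    assert_eq!(coerced, (s.as_ptr(), 11));
    assert_eq!(coerced, explicit);
    assert_eq!(coerced, full_slice);

    println!("ptr = {:?}, len = {}", coerced.0, coerced.1);
}
```

The address inside text is the heap buffer's address, not the address of s on the stack.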
Think-Pair-Share: String Slice Safety
Thought Experiment:
Consider this situation:
The string slice slice points directly to the heap, not to s on the stack.
Question: What happens if we modify s after creating the slice?
    #![allow(unused)]
    fn main() {
        let mut s = String::from("Hello");
        let slice = &s[0..5];
        s.push_str(" World!"); // String grows, might reallocate!
        println!("{}", slice); // Is slice still valid?
    }
Since slice points directly to heap memory, and the String might reallocate to a new location when it grows, won't the slice pointer become invalid (dangling)?
Part 4 - Iter and Collect
More on iter() and iter_mut()
We saw .iter() before - now we'll add .iter_mut():
- .iter(): Gives you immutable references (&T) to each element
- .iter_mut(): Gives you mutable references (&mut T) to each element
    fn main() {
        let mut numbers = vec![1, 2, 3, 4, 5];

        // .iter() - read-only access
        for num in numbers.iter() {
            println!("{}", num); // num is &i32
            // *num += 1; // ERROR! Can't modify through immutable reference
        }

        // .iter_mut() - mutable access
        for num in numbers.iter_mut() {
            *num += 10; // num is &mut i32 - can modify!
        }
        println!("Modified: {:?}", numbers); // [11, 12, 13, 14, 15]
    }
Dereferencing with iter_mut()
With .iter_mut() you always need to dereference with * to modify the value:
Why no pattern matching? With .iter() you can write a pattern like for &n in ... to work with a copy of each value, because you're only reading. With .iter_mut() a copy is no use: you need the mutable reference itself to assign through it, so you must use *.
    fn main() {
        let mut numbers = vec![1, 2, 3, 4, 5];

        // Must use * to modify through mutable reference
        for num in numbers.iter_mut() {
            *num *= 2; // num is &mut i32, *num is i32
        }
        println!("{:?}", numbers); // [2, 4, 6, 8, 10]
    }
Enumerate with iter_mut()
You can combine .iter_mut() with .enumerate() to get both the index and a mutable reference:
    fn main() {
        let mut scores = vec![78, 85, 92, 67, 88];

        // enumerate gives (usize, &mut i32)
        for (i, score_ref) in scores.iter_mut().enumerate() {
            println!("Score {}: {}", i, score_ref);

            // Modify based on index
            if i == 0 {
                *score_ref += 10; // Bonus for first student
            }
        }
        println!("Updated scores: {:?}", scores); // [88, 85, 92, 67, 88]
    }
Intro to functions on iterators: sum() and collect()
.iter() also enables you to use:
- Math functions like sum() and max()
- The .collect() method, which can transform an iterator into various types (more on this later)
    fn main() {
        let numbers = vec![1, 2, 3, 4, 5];

        // sum() consumes the iterator and returns a single value
        let total: i32 = numbers.iter().sum();
        println!("Total: {}", total); // 15 (empty iter -> 0)

        // max() returns an Option<&i32>
        let largest = numbers.iter().max();
        println!("Largest: {:?}", largest); // Some(5) (empty iter -> None)

        // Collect strings into a single String
        let words = vec!["Hello", "world", "!"];
        // .iter() yields &&str here, which String cannot collect directly;
        // .copied() turns each item back into a plain &str
        let sentence: String = words.iter().copied().collect();
        println!("Sentence: {}", sentence); // "Helloworld!"

        // .collect() can build different types based on the type annotation!
    }
More Examples of .collect()
.collect() is very flexible - it can build different collection types based on your type annotation:
    fn main() {
        // Collect chars into a String
        let letters = vec!['H', 'e', 'l', 'l', 'o'];
        let word: String = letters.iter().collect();
        println!("{}", word); // "Hello"

        // Collect into a Vec
        let numbers = [1, 2, 3, 4, 5];
        // this is "closure" notation we'll learn later:
        let doubled: Vec<i32> = numbers.iter().map(|x| x * 2).collect();
        println!("{:?}", doubled); // [2, 4, 6, 8, 10]

        // Collect string slices into a String
        // (.copied() turns each &&str from .iter() into a &str)
        let parts = vec!["Data", " ", "Science", " ", "210"];
        let course: String = parts.iter().copied().collect();
        println!("{}", course); // "Data Science 210"

        // Collect range into a Vec
        let range_vec: Vec<i32> = (0..5).collect();
        println!("{:?}", range_vec); // [0, 1, 2, 3, 4]

        // Collect chars from a string into a Vec
        let text = "Hello";
        let char_vec: Vec<char> = text.chars().collect();
        println!("{:?}", char_vec); // ['H', 'e', 'l', 'l', 'o']

        // Take first 3 chars and collect back into String
        let first_three: String = "Hello World".chars().take(3).collect();
        println!("{}", first_three); // "Hel"
    }