Lecture 22 - Generics and Type Systems

Logistics

HW4 due tonight (new policy doesn't apply)
New HW policy for HW5-7 (see Piazza after class)

Take-aways from "comfort-check" quiz

Most comfortable:

Stack and heap memory
What .clone() does
.iter() vs .iter_mut()
&self, &mut self, self

High variance:

Ownership and borrow-checker rules
Why you can't do text[0] on a String
Modifying a Vec when using .iter()

Least comfortable:

What .collect() does
.entry().or_insert() for hashmaps
.get() on hashmaps returning an Option
Tuple structs

Learning Objectives (TC 12:25)

By the end of today, you should be able to:

Write generic functions and structs using type parameters
Use trait bounds to constrain generic behavior
Recognize when you've been using generics all along

The Problem with Type-Specific Functions

Python is dynamically typed and quite flexible. We can pass many different types to a function:

def max(x, y):
    return x if x > y else y

>>> max(3, 2)
3
>>> max(3.1, 2.2)
3.1
>>> max('s', 't')
't'

Very flexible! Any downsides?

Types must be checked at runtime each time the function is called
Incurs runtime penalty
No compile-time guarantees about type safety

Type system approaches (review)

Dynamic Typing (Python, JavaScript, Ruby, R):

Types checked at runtime
Flexible coding
Prone to runtime errors

Static Typing (C/C++, Java, Rust, Go):

Types checked at compile-time
Fast execution
Early error detection

Rust without generics

Rust is statically typed, and every function parameter needs a concrete type, so without generics we would have to create a version of the function for each type:

#![allow(unused)]
fn main() {
fn max_i32(x: i32, y: i32) -> i32 {
    if x > y { x } else { y }
}

fn max_f64(x: f64, y: f64) -> f64 {
    if x > y { x } else { y }
}

fn max_char(x: char, y: char) -> char {
    if x > y { x } else { y }
}
// ... etc., etc.
}

Problem: Supporting N types = writing N functions!

fn main() {
    println!("{}", max_i32(3, 8));      // 8
    println!("{}", max_f64(3.3, 8.1));  // 8.1
    println!("{}", max_char('a', 'b')); // b
}

The dilemma: Python's flexibility with runtime costs vs. Rust's safety with code duplication?

Solution: Generics give us both flexibility AND compile-time guarantees!

Compiling generic functions (Monomorphization)

Insight: Generics = compile-time code generation for zero runtime cost!

     SOURCE CODE                    COMPILED OUTPUT
┌─ fn max<T>(x: T, y: T) ─┐       ┌ Specialized Functions ─┐
│   where T: PartialOrd   │  ───► │ fn max_i32(x: i32, ...)│
│ {                       │       │ fn max_f64(x: f64, ...)│
│   if x > y { x } else{y}│       │ fn max_char(x: char,..)│
│ }                       │       └────────────────────────┘
└─────────────────────────┘            Monomorphization

A Simple Generic Example (TC 12:30)

Let's try writing a super simple generic function. Use the <T> syntax to indicate that the function is generic:

#![allow(unused)]
fn main() {
fn passit<T>(x: T) -> T {
    x
}
}

The T is a placeholder for the type (could be any letter, but T for "Type" is conventional).

#![allow(unused)]
fn main() {
fn passit<T>(x: T) -> T {
    x
}

let x = passit(5);
println!("x is {}", x);      // x is 5

let x = passit(1.1);
println!("x is {}", x);      // x is 1.1

let x = passit('s');
println!("x is {}", x);      // x is s
}

This works! The function just passes through whatever type it receives.

Okay but that was pretty boring...

Let's try writing a generic max function.

#![allow(unused)]
fn main() {
fn max<T>(x: T, y: T) -> T {
    if x > y { x } else { y }  // error[E0369]: binary operation `>` cannot be applied to type `T`
}
}

... but wait, there's a compiler error!

Problem: Not all types support > comparison!

The Rust compiler checks a generic function body against every type T could possibly be. With no bounds on T, it cannot assume that T supports >.

Solution: Trait bounds specify required behavior.

Trait Bounds: Constraining Generic Types

So how can we make our max function? We need to add a trait bound to specify that T must support comparison:

use std::cmp::PartialOrd;

fn max<T: PartialOrd>(x: T, y: T) -> T {
    if x > y { x } else { y }  // Now it works!
}

fn main() {
    // Type inference determines T:
    println!("{}", max(5, 10));     // T = i32
    println!("{}", max(3.14, 2.7)); // T = f64
    println!("{}", max('a', 'b'));  // T = char
    // Complex comes from the external `num` crate (add it to Cargo.toml to run this):
    let i = num::complex::Complex::new(10, 20);
    let j = num::complex::Complex::new(20, 5);
    // println!("{:?}", max(i, j));  // Won't compile: Complex doesn't implement PartialOrd
}

Key insight: T: PartialOrd = "T must support comparison operations"

We can place restrictions on the generic types we would support.
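The same bound is satisfied by user-defined types, not only built-ins. Here is a minimal sketch, assuming a hypothetical `Version` struct (the name and its fields are made up for illustration). Deriving PartialOrd compares fields in declaration order.

```rust
// Hypothetical type for illustration: the derive gives it the comparison
// operators that the PartialOrd bound asks for.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Version {
    major: u32,
    minor: u32,
}

fn max<T: PartialOrd>(x: T, y: T) -> T {
    if x > y { x } else { y }
}

// Derived PartialOrd compares `major` first, then `minor`.
fn newer(a: Version, b: Version) -> Version {
    max(a, b)
}
```

Calling newer with versions 1.4 and 1.10 should return 1.10, because the minor fields are compared as numbers, not as text.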

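Quick note on `use std::cmp::PartialOrd;`

Writing `use std::cmp::PartialOrd;` is harmless but not required: PartialOrd is part of the prelude, so it is always in scope.

Traits outside the prelude, such as Debug, do need an import before they can be named in a bound. (#[derive(Debug)] works without one because the built-in derive macros are always available.)

Imports we do need for other common bounds:

#![allow(unused)]
fn main() {
use std::fmt::{Debug, Display};
use std::ops::{Add, Sub, Mul, Div};
// Not needed: PartialEq, Eq, PartialOrd, and Ord are already in the prelude.
}

Traits like Copy, Clone, PartialEq, Eq, PartialOrd, and Ord are in the prelude (automatically imported), but others, like Debug, Display, and the operator traits, need explicit imports.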
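As a sketch of bounds whose traits do need a use line, the two functions below rely on Display and Add. The function names `label` and `add_pair` are hypothetical, chosen only for this example.

```rust
use std::fmt::Display;
use std::ops::Add;

// Display is not in the prelude, so the bound needs the import above.
fn label<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

// Add<Output = T> says "T + T produces another T".
fn add_pair<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}
```

Removing either use line should make the corresponding bound fail to resolve, which is a quick way to check which traits the prelude covers.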

Monomorphization in Action (TC 12:35)

// What you write:
fn max<T: PartialOrd>(x: T, y: T) -> T {
    if x > y { x } else { y }
}

fn main() {
    println!("{}", max(5, 10));    
    println!("{}", max(3.14, 2.7));
}

// What the compiler generates (conceptually):
fn max_i32(x: i32, y: i32) -> i32 {
    if x > y { x } else { y }
}

fn max_f64(x: f64, y: f64) -> f64 {
    if x > y { x } else { y }
}

fn main() {
    println!("{}", max_i32(5, 10));    
    println!("{}", max_f64(3.14, 2.7));
}
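One way to see that each instantiation is an ordinary concrete function is to name it explicitly with the turbofish syntax (`max::<i32>`). The sketch below is illustrative only; the helper names `max_small`, `max_for_i32`, and `max_for_f64` are hypothetical.

```rust
fn max<T: PartialOrd>(x: T, y: T) -> T {
    if x > y { x } else { y }
}

// Turbofish names the instantiation instead of relying on inference;
// here the literals become u8 rather than the default i32.
fn max_small() -> u8 {
    max::<u8>(5, 10)
}

// Each instantiation has a fully concrete signature, so it can be
// returned as a plain function pointer.
fn max_for_i32() -> fn(i32, i32) -> i32 {
    max::<i32>
}

fn max_for_f64() -> fn(f64, f64) -> f64 {
    max::<f64>
}
```

The two pointers have different types, which matches the picture of separate max_i32 and max_f64 functions in the compiled output.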

Generic Structs

#![allow(unused)]
fn main() {
#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

// Type inference at work:
let int_point = Point { x: 5, y: 10 };     // Point<i32>
let float_point = Point { x: 3.14, y: 2.7 }; // Point<f64>
}
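Point<T> forces both fields to share one type, so Point { x: 5, y: 2.5 } is rejected. When the fields should be allowed to differ, a struct can take two type parameters. This is a minimal sketch; `Pair`, `first`, `second`, and `flip` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
struct Pair<T, U> {
    first: T,
    second: U,
}

impl<T, U> Pair<T, U> {
    fn new(first: T, second: U) -> Pair<T, U> {
        Pair { first, second }
    }

    // Consumes the pair and returns one with the fields exchanged,
    // so the type parameters trade places as well.
    fn flip(self) -> Pair<U, T> {
        Pair { first: self.second, second: self.first }
    }
}
```

Pair::new(5, 2.5) is inferred as Pair<i32, f64>, and flipping it produces a Pair<f64, i32>.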

Generic Struct Memory Layout

     STACK
┌─ Point<i32> ───┐
│ x: 5  [4 bytes]│
│ y: 10 [4 bytes]│
└────────────────┘
   8 bytes total

┌─ Point<f64> ─────┐
│ x: 3.14 [8 bytes]│
│ y: 2.7  [8 bytes]│
└──────────────────┘
   16 bytes total

Memory insight: Generic structs adapt their size to the contained types!
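The sizes in the diagram can be checked with std::mem::size_of, which reports the size of each concrete Point<T>. A small sketch (the helper name `point_sizes` is hypothetical):

```rust
use std::mem::size_of;

#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

// Each concrete Point<T> is a distinct type with its own size.
// Returns the sizes of Point<u8>, Point<i32>, and Point<f64> in bytes.
fn point_sizes() -> (usize, usize, usize) {
    (
        size_of::<Point<u8>>(),
        size_of::<Point<i32>>(),
        size_of::<Point<f64>>(),
    )
}
```

The i32 and f64 entries should match the 8-byte and 16-byte totals drawn above.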

Methods on Generic Structs

#![allow(unused)]
fn main() {
#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn new(x: T, y: T) -> Point<T> {
        Point { x, y }
    }
    fn get_x(&self) -> &T {
        &self.x
    }
}

// Works for any type:
let point1 = Point::new(1, 2);      // Point<i32>
let point2 = Point::new(1.5, 2.5);  // Point<f64>
println!("{}", point1.get_x());
println!("{}", point2.get_x());
}

Trait Bounds on Methods (TC 12:40)

Sometimes a method only works for certain types. Let's implement a swap method:

#![allow(unused)]
fn main() {
#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}
// This won't compile!
impl<T> Point<T> {
    fn new(x: T, y: T) -> Point<T> {
        Point { x, y }
    }
    fn swap(&mut self) {
        let temp = self.x;   // Might not be Copy!
        self.x = self.y;
        self.y = temp;
    }
}
}

Problem: `let temp = self.x` tries to move x out of a struct we only hold through a &mut reference, and that is only allowed when T implements Copy! (The compiler error offers a different helpful suggestion: cloning the value.)

Solution: Add a trait bound to the impl block:

#![allow(unused)]
fn main() {
#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

// Only implement swap for types that are Copy
impl<T> Point<T> {
    fn new(x: T, y: T) -> Point<T> {
        Point { x, y }
    }
}

impl<T: Copy> Point<T> {
    fn swap(&mut self) {
        let temp = self.x;  // OK - T implements Copy
        self.x = self.y;
        self.y = temp;
    }
}

let mut point = Point::new(2, 3);
println!("{:?}", point);  // Point { x: 2, y: 3 }
point.swap();
println!("{:?}", point);  // Point { x: 3, y: 2 }
}

Key insight: impl<T: Copy> means "this implementation only exists for types that implement Copy"
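As an alternative sketch, not the approach taken above, std::mem::swap exchanges two values through mutable references. Nothing is moved out of self, so this version needs no bound on T at all and also works for non-Copy types such as String.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn new(x: T, y: T) -> Point<T> {
        Point { x, y }
    }

    // mem::swap exchanges the two fields in place through mutable
    // references, so no Copy or Clone bound is required.
    fn swap(&mut self) {
        mem::swap(&mut self.x, &mut self.y);
    }
}
```

The trade-off is mostly pedagogical: the Copy-bounded version shows how bounds restrict an impl block, while this one avoids the restriction entirely.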

Common Traits and Bounds (TC 12:45)

#![allow(unused)]
fn main() {
use std::fmt::Debug;  // Need to import Debug!

// Debug: T can be printed with {:?}
fn debug_value<T: Debug>(val: T) {
    println!("Value: {:?}", val);
}

// Clone: T can be duplicated explicitly with .clone()
fn duplicate<T: Clone>(val: &T) -> T {
    val.clone()
}

// Copy: T is copied implicitly instead of being moved
fn safe_copy<T: Copy>(val: T) -> (T, T) {
    (val, val)  // using val twice is fine: each use is a copy
}
}

Built-in Generic Types (You Know These!)

Remember these from earlier in the semester?

#![allow(unused)]
fn main() {
// Option<T> - maybe has a value
let maybe_number: Option<i32> = Some(42);

// Result<T, E> - success or error
let outcome: Result<i32, String> = Ok(42);

// Vec<T> - growable array
let numbers: Vec<i32> = vec![1, 2, 3];

// Box<T> - heap-allocated value
let boxed_data: Box<i32> = Box::new(5);
}

Now you understand what the <T> means!

These are all generic types that work with any type T.

When you wrote Option<i32>, you were using a generic enum specialized for i32.

When you wrote Result<i32, String>, you were using a generic enum specialized for returning i32 on success and String on error.
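To make "generic enum" concrete, here is a sketch of a home-made type with the same shape as Option<T>. `Maybe`, `value_or`, and `first_item` are hypothetical names; the standard library's Option has far more methods than this.

```rust
// Hypothetical stand-in showing the shape of a generic enum like Option<T>.
#[derive(Debug, PartialEq)]
enum Maybe<T> {
    Just(T),
    Nothing,
}

impl<T> Maybe<T> {
    // Returns the contained value, or `fallback` when there is none.
    fn value_or(self, fallback: T) -> T {
        match self {
            Maybe::Just(value) => value,
            Maybe::Nothing => fallback,
        }
    }
}

// A generic function that returns the generic enum.
fn first_item<T: Clone>(items: &[T]) -> Maybe<T> {
    match items.first() {
        Some(item) => Maybe::Just(item.clone()),
        None => Maybe::Nothing,
    }
}
```

Maybe<i32> and Maybe<String> are separate concrete types generated from the one definition, just like Option<i32> and Option<String>.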

Ownership Interlude: Trait Bounds Quiz

Question: Explain this function signature / why it's a "safe" max

#![allow(unused)]
fn main() {
use std::cmp::PartialOrd;

fn safe_max<T: PartialOrd + Clone>(x: &T, y: &T) -> T {
    if x > y { x.clone() } else { y.clone() }
}
}

Answer: We take &T parameters to avoid moving the arguments, but need Clone to return an owned T. PartialOrd enables the comparison operation!
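A short usage sketch of the answer: because safe_max only borrows its arguments, both originals remain usable afterwards, and only the winner is cloned. The helper `compare_names` is a hypothetical name added for illustration.

```rust
fn safe_max<T: PartialOrd + Clone>(x: &T, y: &T) -> T {
    if x > y { x.clone() } else { y.clone() }
}

// Returns both originals alongside the result to show that borrowing
// left them intact.
fn compare_names() -> (String, String, String) {
    let a = String::from("apple");
    let b = String::from("banana");
    let larger = safe_max(&a, &b);
    (a, b, larger)
}
```

With the by-value max from earlier, passing a and b would move both Strings, and the one that was not returned would be dropped.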

Generic vs. Type-Specific Implementations (TC 12:50)

Even though we have generic methods defined, we can still specify methods for specific types!

#[derive(Debug)]
struct Point<T> {
    x: T,
    y: T,
}

// Generic implementation - works for any type T
impl<T> Point<T> {
    fn new(x: T, y: T) -> Point<T> {
        Point { x, y }
    }
}

// Specialized implementation - ONLY for Point<i32>
impl Point<i32> {
    fn distance_from_origin(&self) -> f64 {
        ((self.x.pow(2) + self.y.pow(2)) as f64).sqrt()
    }
}

// Specialized implementation - ONLY for Point<f64>
impl Point<f64> {
    fn distance_from_origin(&self) -> f64 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main(){
    let int_point = Point::new(3, 4);
    println!("Distance: {}", int_point.distance_from_origin()); // 5.0

    let float_point = Point::new(3.0, 4.0);
    println!("Distance: {}", float_point.distance_from_origin()); // 5.0

    // let char_point = Point::new('a', 'b');
    // char_point.distance_from_origin(); // Error! No such method for Point<char>
}

Why Use Specialized Implementations?

Different algorithms work better for different types (ints, floats)
Some methods only make sense for certain types
Sometimes you want drastically different behavior (e.g., are_you_a_float)
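The last bullet names are_you_a_float without showing it. One possible sketch, where the method bodies are an assumption about what such a method might do, gives each concrete Point its own answer:

```rust
struct Point<T> {
    x: T,
    y: T,
}

// Only Point<f64> gets this version...
impl Point<f64> {
    fn are_you_a_float(&self) -> bool {
        true
    }
}

// ...and only Point<i32> gets this one. Other Point<T> types have
// no such method at all.
impl Point<i32> {
    fn are_you_a_float(&self) -> bool {
        false
    }
}
```

The call site looks identical for both types, but the compiler picks the implementation from the concrete type of the point.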

Readable bounds using `where`

#![allow(unused)]
fn main() {
use std::fmt::Debug;

fn analyze_data<T>(values: &[T]) -> Option<T>
where
    T: Ord + Clone + Debug  // Iterator::max requires Ord, not just PartialOrd
{
    values.iter().max().cloned()
}
}

This is the same as:

#![allow(unused)]
fn main() {
use std::fmt::Debug;

fn analyze_data<T: Ord + Clone + Debug>(values: &[T]) -> Option<T> {
    values.iter().max().cloned()
}
}

Use where when you have multiple bounds - it's more readable!
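The where form pays off most when there is more than one type parameter. A minimal sketch, assuming a hypothetical function `describe_larger` that is not part of the lecture code:

```rust
use std::fmt::{Debug, Display};

// Each type parameter gets its own line of bounds in the where clause.
fn describe_larger<T, U>(a: T, b: T, tag: &U) -> String
where
    T: PartialOrd + Display,
    U: Debug,
{
    let larger = if a > b { a } else { b };
    format!("{:?}: {}", tag, larger)
}
```

Written inline, the same signature would pack both bound lists between the angle brackets, pushing the parameter list far to the right.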

"Polymorphism" and "Monomorphization"

We say max is polymorphic and the compiled functions are monomorphic. The process of going from one to the other is monomorphization.

    GENERIC SOURCE               COMPILER OUTPUT (roughly)
┌─────────────────────┐        ┌─────────────────┐
│ fn max<T>(x, y)     │  ────► │ fn max_i32(...) │
│ where T: PartialOrd │        │ fn max_f64(...) │
│ { ... }             │        │ fn max_char(...)│
└─────────────────────┘        └─────────────────┘
      One source                Multiple functions

What we mean by "zero cost polymorphism"

The compiler generates specialized functions for each type you use.

#![allow(unused)]
fn main() {
max(5, 10);     // Compiles to direct i32 comparison (as fast as hand-written max_i32)
max(3.14, 2.7); // Compiles to direct f64 comparison (as fast as hand-written max_f64)
}

"Zero cost" means:

No runtime type checking ("is this an i32 or f64?")
No performance penalty compared to writing separate functions by hand

This is different from languages like Java (generic types are erased at compile time, so primitive values are boxed and casts are checked at runtime) or Python (types are resolved at runtime on every call).

Activity time

See Gradescope and our B1 website (linked on Piazza) for Activity 22 instructions

Lauren's DS210 Materials