Lecture 27 - Packages, Crates, and Modules

Logistics - midterm recap

  • Discussing distributions / Q&A
  • Stack/Heap and Hand-coding redo opportunity in class on 11/14
  • Corrections - proctored in discussion on 11/18
  • If you would like to pursue either but have an immovable conflict, please let me know ASAP

A note on code commenting

  • Something the TAs have noticed:

Moving forward

  • We're starting the final section of Rust: practical tools for real-world projects
  • Next few lectures are lighter - focused on "how to actually use Rust"
  • Very little new syntax for exams (mostly concepts)
  • Examples can be a reference for future projects

Learning Objectives

By the end of today, you should be able to:

  • Understand what packages, crates, and modules are and how they relate
  • Organize code into modules using mod and control visibility with pub
  • Use external crates by adding them to Cargo.toml
  • Navigate code using paths (crate::, super::, use)
  • Know where to find crates and how to evaluate them for your projects

The big picture: why organization matters

At first, we created Rust programs in just one file (main.rs).

The homeworks added a few more files, but things have stayed simple and we haven't explained how the files relate to each other.

Without organization:

  • Name conflicts
  • Hard to find things or see the big picture
  • Impossible to collaborate

Rust's solution: A three-level hierarchy

Rust's organization hierarchy

  • Package (your project, ~ "your repo")
  • Crate (compilation unit - binary or library, ~ "your program")
  • Module (namespace inside a crate, ~ "a file")

Modules: organizing code within a file

Problem: Everything in one namespace gets messy

fn process_data() { /* ... */ }
fn process_image() { /* ... */ }
fn process_text() { /* ... */ }
// Too many "process" things!

Modules as Namespaces

Solution: Use mod to create "namespaces"

mod data {
    pub fn process() {
        println!("Processing data");
    }
}

mod images {
    pub fn process() {
        println!("Processing images");
    }
}

fn main() {
    data::process();     // Clear which one!
    images::process();   // No confusion!
}

Modules are like folders for your code

Importantly, they're not scopes - just a tool for naming and organizing

The pub Keyword: Public vs. Private

By default, everything in a module is private - it can't be used from outside the module (code inside the module, including its nested modules, still can)

mod math {
    fn helper() {  // Private!
        println!("Internal helper");
    }

    pub fn add(a: i32, b: i32) -> i32 {  // Public!
        helper();  // Can use private from inside
        a + b
    }
}

fn main() {
    math::add(2, 3);     // Works!
    // math::helper();   // Error: private!
}

Why is this useful?

  • Hide implementation details for helper functions
  • Change internal code without affecting users

pub works for everything you've seen

You can control visibility for all the kinds of items you've learned:

mod data_structures {
    // Public struct with private fields
    pub struct Person {
        pub name: String,
        age: i32,  // Private!
    }

    // Public enum
    pub enum Status {
        Active,
        Inactive,
    }

    // Public function
    pub fn create_person(name: String, age: i32) -> Person {
        Person { name, age }
    }

    // Private helper function
    fn validate_age(age: i32) -> bool {
        age > 0 && age < 150
    }
}

fn main() {
    let p = data_structures::create_person("Alice".to_string(), 25);
    println!("{}", p.name);  // Works - name is public
    // println!("{}", p.age);  // Error - age is private!
}

Key insight: Just like with modules, you choose what's part of your public interface, item by item and field by field!
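Because age is private, code outside the module can neither read it nor build a Person with a struct literal. A common way around this is a public constructor plus a read-only accessor. The sketch below is one way that could look; the people module name, the new constructor returning Option, and the age() method are assumptions added for illustration, not part of the example above.

```rust
mod people {
    pub struct Person {
        pub name: String,
        age: i32, // private: only code inside `people` can touch it
    }

    impl Person {
        // Hypothetical constructor: the only way to build a Person from
        // outside the module, so every Person has passed validation.
        pub fn new(name: String, age: i32) -> Option<Person> {
            if validate_age(age) {
                Some(Person { name, age })
            } else {
                None
            }
        }

        // Hypothetical read-only accessor for the private field.
        pub fn age(&self) -> i32 {
            self.age
        }
    }

    // Private helper, same rule as in the example above.
    fn validate_age(age: i32) -> bool {
        age > 0 && age < 150
    }
}

fn main() {
    match people::Person::new("Alice".to_string(), 25) {
        Some(p) => println!("{} is {}", p.name, p.age()),
        None => println!("invalid age"),
    }

    // let p = people::Person { name: "Bob".to_string(), age: 30 };
    // ^ would not compile: field `age` is private

    let rejected = people::Person::new("Bob".to_string(), 200).is_none();
    println!("age 200 rejected: {}", rejected);
}
```

The private field is what makes the validation meaningful: callers cannot skip new and set age directly.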

Nested modules

You can nest modules to create hierarchy:

mod data_processing {
    pub mod cleaning {
        pub fn remove_nulls() {
            println!("Removing nulls");
        }
    }

    pub mod analysis {
        pub fn compute_mean() {
            println!("Computing mean");
        }
    }
}

fn main() {
    data_processing::cleaning::remove_nulls();
    data_processing::analysis::compute_mean();
}

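Note: For something to be usable from outside, it must be pub and every "parent" module on the path to it must be pub too. The exception is the outermost mod when it's in the same file as the code using it: items are always visible to the module they are declared in, so it acts like it's pub for anything in that file.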
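A small sketch of the "every parent must be pub" rule. The module and function names (outer, visible, hidden, reveal) are made up for illustration.

```rust
mod outer {
    pub mod visible {
        pub fn hello() -> &'static str {
            "hello from visible"
        }
    }

    // No `pub` on the module: code outside `outer` cannot name `hidden`,
    // even though the function inside it is marked `pub`.
    mod hidden {
        pub fn secret() -> &'static str {
            "secret"
        }
    }

    // Code inside `outer` can still reach into `hidden`.
    pub fn reveal() -> &'static str {
        hidden::secret()
    }
}

fn main() {
    // `outer` itself is not `pub`, but main() is declared in the same
    // module as `outer` (the crate root), so it can see it.
    println!("{}", outer::visible::hello());
    println!("{}", outer::reveal());

    // println!("{}", outer::hidden::secret());
    // ^ would not compile: module `hidden` is private
}
```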

Paths: Navigating Your Module Tree

Three ways to refer to things:

1. Absolute paths (from crate root)

crate::data_processing::cleaning::remove_nulls();

2. Relative paths (from current location)

super::other_module::function();  // Go up one level (like cd .. )

3. use to bring things into scope

use data_processing::cleaning;
cleaning::remove_nulls();  // Shorter!

// Or even shorter (but less clear):
use data_processing::cleaning::remove_nulls;
remove_nulls();

Convention: Import the module, not the function

  • Makes it clear where things come from
  • HashMap::new() is clearer than just new()
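The three path forms are easier to compare in one complete program. This is a sketch, not the lecture's code: remove_nulls is given a hypothetical signature (it takes and returns data instead of printing), and clean_sum and clean_len are invented helpers so that each path style has something to call.

```rust
use std::collections::HashMap; // import the type, then write HashMap::new()

mod data_processing {
    pub mod cleaning {
        // Hypothetical signature: drop the None entries, keep the values.
        pub fn remove_nulls(values: Vec<Option<i32>>) -> Vec<i32> {
            values.into_iter().flatten().collect()
        }
    }

    pub mod analysis {
        // Relative path: `super` is `data_processing`, the parent of `analysis`.
        use super::cleaning;

        pub fn clean_sum(values: Vec<Option<i32>>) -> i32 {
            cleaning::remove_nulls(values).iter().sum()
        }

        pub fn clean_len(values: Vec<Option<i32>>) -> usize {
            // Absolute path: starts at the crate root, works from anywhere.
            crate::data_processing::cleaning::remove_nulls(values).len()
        }
    }
}

// Bring the module (not the individual functions) into scope.
use data_processing::analysis;

fn main() {
    let raw = vec![Some(1), None, Some(3)];

    let mut results: HashMap<&str, i32> = HashMap::new();
    results.insert("sum", analysis::clean_sum(raw.clone()));
    results.insert("len", analysis::clean_len(raw) as i32);

    println!("sum = {}, len = {}", results["sum"], results["len"]);
}
```

Calling analysis::clean_sum rather than a bare clean_sum keeps the origin of the function visible at the call site, which is the point of the convention above.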

Organizing modules across files

When modules get big, move them to separate files:

File structure:

src/
  main.rs
  data.rs
  analysis.rs

In src/main.rs:

mod data;      // Tells Rust to look for src/data.rs
mod analysis;  // Tells Rust to look for src/analysis.rs

fn main() {
    data::process();
    analysis::compute_stats();
}

In src/data.rs:

// No main() here: this file is a module, not a program
pub fn process() {
    println!("Processing data");
}
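It can help to think of mod data; as "paste the contents of src/data.rs between braces here". The single-file sketch below shows roughly what the compiler ends up seeing. The lecture does not show src/analysis.rs, so the body of compute_stats (and its return value) is a hypothetical stand-in.

```rust
// `mod data;` is treated like this inline module,
// with the contents of src/data.rs as its body.
mod data {
    pub fn process() {
        println!("Processing data");
    }
}

// `mod analysis;` likewise. Hypothetical contents for src/analysis.rs.
mod analysis {
    pub fn compute_stats() -> usize {
        let samples = [3, 1, 4];
        println!("Computing stats for {} samples", samples.len());
        samples.len()
    }
}

fn main() {
    data::process();
    analysis::compute_stats();
}
```

This is also why the functions in data.rs still need pub: the file boundary is a module boundary.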

Packages and crates: the bigger picture

Let's clarify some terms you'll hear:

Package = Your project folder (what cargo new creates)

  • Has a Cargo.toml file
  • Contains one or more crates (at most one library crate, plus any number of binary crates)

Crate = A single program or library that Rust compiles

  • Think: "one thing that gets compiled"

Two types of crates:

Binary crate (a program you run)

my_program/
  Cargo.toml
  src/
    main.rs    <- Has main(), compiles to executable

When you run cargo new my_program, you get a package with one binary crate.

Library crate (code for others to use)

my_library/
  Cargo.toml
  src/
    lib.rs       <- No main(), compiles to library

When you run cargo new --lib my_library, you get a package with one library crate.

Real-world example:

  • rand is a library crate (you add it to your project)
  • Your homework is a binary crate (you run it)

Most of the time: One package = one crate.

Using External Crates

This is where Rust gets powerful: reusing other people's code!

Where to find crates:

  • https://crates.io - Official Rust package registry
  • https://docs.rs - Documentation for all crates

Adding a crate to your project:

Method 1: Edit Cargo.toml

[dependencies]
rand = "0.8"

Method 2: Use cargo command

cargo add rand@0.8   # plain `cargo add rand` takes the newest release, whose API may differ from the example below

Then use it in your code:

use rand::Rng;

fn main() {
    let random_num = rand::thread_rng().gen_range(1..=100);
    println!("Random number: {}", random_num);
}

Cargo automatically downloads and compiles it!

For data science:

  • ndarray - NumPy-like arrays for numerical computing
  • polars - Fast DataFrame library (similar role to pandas, designed for speed)
  • csv - Reading/writing CSV files
  • serde - Serializing/deserializing data (JSON, etc.)
  • plotters - Creating plots and visualizations
  • linfa - Machine learning algorithms
  • statrs - Statistical distributions and functions

General utilities:

  • rand - Random number generation
  • chrono - Date/time handling
  • regex - Regular expressions
  • rayon - Easy data parallelism
  • clap - Command-line argument parsing

You don't have to reinvent the wheel!

Semantic Versioning: Understanding Version Numbers

When you see rand = "0.8", what does it mean?

Version format: MAJOR.MINOR.PATCH

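  • 0.8 means "any 0.8.x" (at least 0.8.0, below 0.9.0) - compatible updates only
  • From 1.0 on, the first number is the breaking one: 1.2 means at least 1.2.0, below 2.0.0
  • =0.8.5 means exactly version 0.8.5
  • ^0.8 is the same as 0.8 (the caret is the default)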
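The "compatible updates only" rule can be written down as a short function. This is a sketch of the default (caret) rule for full three-part versions only; the Version alias and caret_matches are names invented here, not Cargo APIs, and pre-release tags, other operators (~, *, >=), and very short requirements such as "0" are not modeled.

```rust
// (major, minor, patch). Tuples compare left to right, which matches version order.
type Version = (u64, u64, u64);

// Does `candidate` satisfy the caret requirement `req`?
fn caret_matches(req: Version, candidate: Version) -> bool {
    // The leftmost non-zero component of the requirement is the one
    // that is not allowed to change.
    let upper = if req.0 > 0 {
        (req.0 + 1, 0, 0)
    } else if req.1 > 0 {
        (0, req.1 + 1, 0)
    } else {
        (0, 0, req.2 + 1)
    };
    candidate >= req && candidate < upper
}

fn main() {
    // "0.8" is treated as 0.8.0 here.
    println!("{}", caret_matches((0, 8, 0), (0, 8, 5))); // true
    println!("{}", caret_matches((0, 8, 0), (0, 9, 0))); // false
    println!("{}", caret_matches((1, 2, 0), (1, 9, 3))); // true
    println!("{}", caret_matches((1, 2, 0), (2, 0, 0))); // false
}
```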

Why this matters:

  • Your code won't randomly break when crates update
  • Cargo lock file (Cargo.lock) records exact versions
  • Teammates get same dependencies

Example: Building a small project with modules

Let's build a simple data analysis tool:

Project structure:

my_analyzer/
  Cargo.toml
  src/
    main.rs
    loading.rs
    stats.rs

Cargo.toml:

[package]
name = "my_analyzer"
version = "0.1.0"
edition = "2021"

[dependencies]

src/loading.rs:

// Parse whitespace-separated integers, silently skipping anything that isn't one
pub fn load_numbers(data: &str) -> Vec<i32> {
    data.split_whitespace()
        .filter_map(|s| s.parse().ok())
        .collect()
}

src/stats.rs:

// Note: returns NaN for an empty slice (0.0 / 0.0)
pub fn mean(numbers: &[i32]) -> f64 {
    let sum: i32 = numbers.iter().sum();
    sum as f64 / numbers.len() as f64
}

src/main.rs:

mod loading;
mod stats;

fn main() {
    let data = "10 20 30 40 50";
    let numbers = loading::load_numbers(data);
    let average = stats::mean(&numbers);
    println!("Average: {}", average);
}
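One rough edge in this example: mean divides by the slice length, so an empty input yields NaN instead of a clear "no data" signal. Below is a single-file sketch of a variant that returns Option<f64>. The modules are written inline only so the block runs on its own; the Option return type and the i64 accumulator are design choices made for this sketch, not part of the project above.

```rust
mod loading {
    pub fn load_numbers(data: &str) -> Vec<i32> {
        data.split_whitespace()
            .filter_map(|s| s.parse().ok())
            .collect()
    }
}

mod stats {
    // Variant of mean that reports "no data" explicitly.
    pub fn mean(numbers: &[i32]) -> Option<f64> {
        if numbers.is_empty() {
            return None;
        }
        // Accumulate in i64 so a long list of large i32 values cannot overflow.
        let sum: i64 = numbers.iter().map(|&n| i64::from(n)).sum();
        Some(sum as f64 / numbers.len() as f64)
    }
}

fn main() {
    for input in ["10 20 30 40 50", "no numbers here"] {
        let numbers = loading::load_numbers(input);
        match stats::mean(&numbers) {
            Some(average) => println!("Average: {}", average),
            None => println!("Nothing to average in {:?}", input),
        }
    }
}
```

Only stats changed shape here; main had to adapt because the function's public signature is part of the module's interface.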

Choosing External Crates: What to Look For

Not all crates are equal! Here's how to evaluate:

Green flags:

  • High download count (millions)
  • Recent updates (within last year)
  • Good documentation
  • Used by well-known projects

Red flags:

  • Last updated 5 years ago
  • No documentation
  • Lots of open issues, no responses
  • Only 100 downloads total

Example: rand has 200+ million downloads and is actively maintained (by the rust-random organization) -> safe choice

Remember: Every dependency is code you're trusting!

Summary: The Module System

Level           What It Is                          How You Use It
Module          Namespace within a file             mod name { }
Crate           Compilation unit (binary/library)   main.rs or lib.rs
Package         Project with Cargo.toml             cargo new
External Crate  Someone else's package              Add to Cargo.toml

Navigation:

  • crate::path - absolute from root
  • super::path - relative (go up)
  • use - bring into scope
  • pub - make it public

Activity: Organize Modules (on paper)