Rust for Rustaceans - Part 1: Foundations

Foundations

1. Memory

1.1 Value, Variable and Pointer

  • A value is the combination of a type and an element of that type's domain of values.
  • A value is stored in a place, which is the Rust terminology for "a location that can hold a value". The most common place to store a value is a variable, which is a named value slot on the stack.
  • A pointer is a value that holds the address of a region of memory, so the pointer points to a place.
let x = 42;
let y = 43;
let a = &x;
let mut b = &x;
b = &y;
  • A variable bound to a string literal does not hold the characters themselves; its value is a pointer to the first character of the string value.
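The last point can be checked directly. The following is a minimal sketch, not code from the book; the function name `string_pointer_facts` is made up for illustration. It shows that copying a `&str` variable copies the pointer rather than the characters, and that a `&str` is a pointer plus a length.

```rust
use std::mem::size_of;

/// Returns (both variables point at the same bytes,
///          &str is two machine words wide,
///          the first byte the pointer refers to).
fn string_pointer_facts() -> (bool, bool, u8) {
    let var: &str = "Hello world";
    // Copies the pointer (and length), not the characters.
    let copy = var;
    let same_target = var.as_ptr() == copy.as_ptr();
    // A &str stores an address and a length.
    let is_two_words = size_of::<&str>() == 2 * size_of::<usize>();
    // The address refers to the first character, 'H'.
    let first_byte = var.as_bytes()[0];
    (same_target, is_two_words, first_byte)
}
```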

1.2 Variables in Depth

High-Level Model:

In this model, we don't think of variables as places that hold bytes. Instead, we think of them just as names given to values as they are instantiated, moved and used throughout a program.

Low-Level Model:

This model matches the memory model used by C and C++, and many other low-level languages, and is useful for when you need to reason explicitly about memory.
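As a rough illustration of the low-level model, the sketch below treats a variable as a slot with a fixed address: assigning a new value overwrites the bytes in the slot but does not move the slot. The function name is an assumption made for this example.

```rust
/// Returns true if `x` keeps the same address after being assigned a new value.
fn slot_address_is_stable() -> bool {
    let mut x: u32 = 1;
    let before = &x as *const u32 as usize;
    // Overwrites the value stored in the slot; the slot itself stays put.
    x = 2;
    let after = &x as *const u32 as usize;
    before == after && x == 2
}
```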

1.3 Memory Regions

There are many different regions of memory, and perhaps surprisingly, not all of them are stored in the DRAM of the computer.

Stack

The stack is a segment of memory that your program uses as scratch space for function calls. Each time a function is called, a contiguous chunk of memory called a frame is allocated at the top of the stack. Near the bottom of the stack is the frame for the main function, and as functions call other functions, additional frames are pushed onto the stack. A function’s frame contains all the variables within that function, along with any arguments the function takes. When the function returns, its stack frame is reclaimed.

Any variable stored in a frame on the stack cannot be accessed after that frame goes away, so any reference to it must have a lifetime that is at most as long as the lifetime of the frame.
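A minimal sketch of what this rule means in practice, with function names invented for the example: a reference to a local cannot escape its frame, but the value can be returned, and a reference may be passed down to a callee whose frame is shorter-lived.

```rust
// Does not compile: `local` lives in this function's frame, which is
// reclaimed on return, so the returned reference would dangle.
// fn dangling() -> &'static i32 {
//     let local = 42;
//     &local
// }

/// Returning the value itself copies it out of the frame.
fn by_value() -> i32 {
    let local = 42;
    local
}

/// Borrowing from the caller is fine: the caller's frame outlives this one.
fn read_through_ref(r: &i32) -> i32 {
    *r + 1
}

fn caller() -> i32 {
    let in_caller_frame = by_value();
    read_through_ref(&in_caller_frame)
}
```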

Heap

The heap is a pool of memory that isn’t tied to the current call stack of the program. Values in heap memory live until they are explicitly deallocated.

The primary mechanism for interacting with the heap in Rust is the Box type. When you write Box::new(value), the value is placed on the heap, and what you are given back (the Box<T>) is a pointer to that value on the heap. When the Box is eventually dropped, that memory is freed.
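The sketch below illustrates this with two hypothetical helper functions: the heap allocation outlives the frame that created it, and is freed when the owning Box is dropped.

```rust
/// Allocates `n` on the heap and returns the owning pointer.
/// The allocation outlives this function's stack frame.
fn make_boxed(n: i32) -> Box<i32> {
    Box::new(n)
}

fn use_and_free() -> i32 {
    let b = make_boxed(41);
    // Dereference the Box to read the heap value.
    *b + 1
} // `b` is dropped here, which frees the heap memory
```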

If you forget to deallocate heap memory, it will stick around forever, and your application will eventually eat up all the memory on your machine. This is called leaking memory.

However, there are some cases where you explicitly want to leak memory. For example, say you have a read-only configuration that the entire program should be able to access. You can allocate that on the heap and explicitly leak it with Box::leak to get a 'static reference to it.
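A minimal sketch of the configuration case, assuming a hypothetical `Config` struct and `load_config` function (neither comes from the source):

```rust
/// Hypothetical read-only configuration.
struct Config {
    name: String,
    verbose: bool,
}

/// Allocates the configuration on the heap and deliberately leaks it,
/// so the returned reference is valid for the rest of the program.
fn load_config() -> &'static Config {
    let boxed = Box::new(Config {
        name: String::from("demo"),
        verbose: true,
    });
    // Box::leak gives up ownership without freeing the allocation.
    Box::leak(boxed)
}
```

Because the reference is 'static, it can be handed to any thread or stored in any struct without lifetime parameters. The memory is never reclaimed, so this only suits values that are created once.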

Static Memory

Static memory generally refers to memory that is allocated at compile time and stays valid for the entire duration of the program. Static memory is often used for global variables, constants, and string literals.

The static keyword allows you to define global variables. These variables exist for the entire duration of the program and are stored in a fixed memory location.
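As a small sketch, with names chosen only for illustration, the following defines one immutable and one atomically updated static. An atomic type is used so the counter can change without `static mut` or unsafe code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Both items live at a fixed location for the whole program run.
static GREETING: &str = "hello";
static CALLS: AtomicUsize = AtomicUsize::new(0);

/// Returns the global greeting and counts how often it was requested.
fn greet() -> &'static str {
    CALLS.fetch_add(1, Ordering::Relaxed);
    GREETING
}

fn call_count() -> usize {
    CALLS.load(Ordering::Relaxed)
}
```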

The 'static lifetime is the longest possible lifetime, and it signifies that a reference can be valid for the entire duration of the program. This is common for string literals and other static data.

fn main() {  
  let s: &'static str = "Hello, world!";  
  println!("{}", s);  
}  

2. Ownership

Ownership refers to the responsibility for managing the lifecycle of data.

Key Rules and Principles of Ownership:

  • Each value has a single owner. (The ownership can be transferred but cannot be shared directly)
  • There can only be one owner at a time. (After ownership is moved, the previous owner can no longer access the value)
  • When the owner goes out of scope, the value is dropped. (Rust automatically deallocates the memory when the owner goes out of scope)
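The third rule can be observed with a type that records when it is dropped. This is a sketch under assumptions: the `Noisy` type and `drop_order` function are invented for the example.

```rust
use std::cell::RefCell;

/// Hypothetical type that writes its name to a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let a = Noisy { name: "a", log: &log };
        // Ownership moves to `_moved`; `a` is no longer usable and
        // will not be dropped a second time.
        let _moved = a;
        let _b = Noisy { name: "b", log: &log };
        log.borrow_mut().push("end of scope");
    } // `_b` is dropped first, then `_moved` (reverse declaration order)
    log.into_inner()
}
```

Each value is dropped exactly once, by whichever variable owns it when the scope ends.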

If a value's type implements the special Copy trait, the value is not considered to have moved even if it is reassigned to a new memory location. Instead, the value is copied, and both the old and new locations remain accessible.

let x1 = 42;
let y1 = Box::new(84);

{
  // starts a new scope
  let z = (x1, y1);
  // z goes out of scope, and is dropped;
  // it in turn drops the values from x1 and y1
}

// x1's value is Copy, so it was not moved into z
let x2 = x1;

// y1's value is not Copy, so it was moved into z
// let y2 = y1;

3. Borrowing and Lifetimes

Rust allows the owner of a value to lend out that value to others, without giving up ownership, through references.

3.1 References

  • Shared References

A shared reference, &T, is, as the name implies, a pointer that may be shared. Any number of other references may exist to the same value, and each shared reference is Copy, so you can trivially make more of them.

Values behind shared references are not mutable; you cannot modify or reassign the value a shared reference points to, nor can you cast a shared reference to a mutable one.

  • Mutable References

A mutable reference, &mut T, is exclusive. The compiler assumes that there are no other threads accessing the target value, whether through a shared reference or a mutable one.

If the value behind the mutable reference is moved, then another value must be left in its place. If not, the owner would still think it needed to drop the value, but there would be no value for it to drop!

fn replace_with_84(s: &mut Box<i32>) {
  // this is not okay, as *s would be empty
  // let was = *s;

  // but instead
  let was = std::mem::take(s);
  *s = was;

  // we can exchange values behind &mut
  let mut r = Box::new(84);
  std::mem::swap(s, &mut r);

  assert_ne!(*r, 84);
}

let mut s = Box::new(42);

replace_with_84(&mut s);
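To contrast the two reference kinds from this section side by side, here is a minimal sketch with illustrative function names. Shared references can be copied freely but only read; a mutable reference allows writes but must be the only active reference at that point.

```rust
/// Shared references are Copy: `a` and `b` are two copies of the
/// same pointer, and `value` stays usable as well.
fn read_twice(value: &i32) -> i32 {
    let a = value;
    let b = value;
    // *value = 0; // does not compile: cannot assign through a `&` reference
    *a + *b
}

/// A mutable reference grants exclusive access, so writing is allowed.
fn increment(value: &mut i32) {
    *value += 1;
}

fn shared_then_mutable() -> i32 {
    let mut n = 20;
    let doubled = read_twice(&n); // shared borrow ends after the call
    increment(&mut n); // exclusive borrow; n is now 21
    doubled + n
}
```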

3.2 Interior Mutability

Some types provide interior mutability, meaning they allow you to mutate a value through a shared reference. They fall into two groups:

  • Types that give you a mutable reference through a shared reference: Mutex and RefCell
  • Types that let you replace a value given only a shared reference: the atomic integer types in std::sync::atomic, and std::cell::Cell
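The two groups can be sketched as follows. The helper function names are assumptions for this example; the types and methods are from the standard library. Note that every function takes only a shared reference.

```rust
use std::cell::{Cell, RefCell};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// RefCell hands out a mutable reference, checked at runtime.
fn push_via_shared(log: &RefCell<Vec<i32>>, v: i32) {
    log.borrow_mut().push(v);
}

/// Mutex hands out a mutable reference while the lock is held.
fn add_under_lock(total: &Mutex<i32>, n: i32) {
    let mut guard = total.lock().unwrap();
    *guard += n;
}

/// Cell never exposes a reference to its contents; it swaps whole values.
fn replace_cell(c: &Cell<i32>, new: i32) -> i32 {
    c.replace(new) // returns the previous value
}

/// Atomics replace the value in a single indivisible operation.
fn bump(counter: &AtomicUsize) -> usize {
    counter.fetch_add(1, Ordering::SeqCst) // returns the previous value
}
```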

3.3 Lifetimes

A lifetime is a name for a region of code that some reference must be valid for.

  • Generic Lifetime: a parameter used in functions, structs, traits and implementations to indicate that one or more references share the same lifetime, or to express relationships between the lifetimes of multiple references.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
  if x.len() > y.len() {
    x
  } else {
    y
  }
}
  • Lifetime Variance
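// covariant: a longer-lived reference can stand in for a shorter-lived one
let x: &'static str; // more useful, lives longer
let x: &'a str;      // less useful, lives shorter ('a is some shorter lifetime)

// contravariant: a function that accepts any lifetime can stand in for
// one that only accepts 'static
fn take_func1(s: &'static str) {} // stricter, so less useful
fn take_func2<'a>(s: &'a str) {}  // less strict, more useful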
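Both directions of variance can be seen in compilable form. This is a sketch: `longest` is repeated from above so the block stands alone, and the other function names are invented for the example.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

/// Covariance: a &'static str is accepted where a shorter &'a str is expected.
fn covariance_demo() -> usize {
    let long_lived: &'static str = "static text";
    let owned = String::from("local");
    // 'a is chosen to fit `owned`; `long_lived` is shortened to match.
    let r = longest(long_lived, &owned);
    r.len()
}

/// Expects a function that only needs to handle 'static strings.
fn call_with_static(f: fn(&'static str) -> usize) -> usize {
    f("hello")
}

/// Accepts a string of any lifetime, so it is the more useful function.
fn len_any<'a>(s: &'a str) -> usize {
    s.len()
}

/// Contravariance: the less strict function is accepted where the
/// stricter one is expected.
fn contravariance_demo() -> usize {
    call_with_static(len_any)
}
```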