Rust Smart Pointers

· Sricor
  • Box<T> has a single owner.
  • Box<T> lets you store data on the heap.
  • Box<T> is a pointer to data on the heap.
  • When a box goes out of scope, both the box (on the stack) and the data it points to (on the heap) are deallocated.
  • Implementing the Deref trait lets you customize the behavior of the dereference operator *.
  • When a type implements the Drop trait, Rust automatically calls drop when an instance goes out of scope.
  • You are not allowed to call the Drop trait’s drop method explicitly.
  • You can drop a value early with std::mem::drop.
  • Variables are dropped in the reverse order of their creation.
  • Rc<T> allows a single value to have multiple owners.
  • Rc<T> allows only immutable borrows, checked at compile time.
  • Rc<T> gives only immutable access to its data.
  • Rc<T> is only for use in single-threaded scenarios.
  • Rc::clone does not make a deep copy; it only increments the reference count, so it is cheap.
  • The Drop implementation of Rc<T> decreases the reference count automatically when an Rc<T> value goes out of scope.
  • RefCell<T> has a single owner.
  • RefCell<T> allows immutable or mutable borrows, checked at runtime.
  • RefCell<T> panics at runtime if you violate the borrowing rules.
  • RefCell<T> is only for use in single-threaded scenarios.

1. Using Box to Point to Data on the Heap

Boxes allow you to store data on the heap rather than the stack.

You will use them most often in these situations:

  • When you have a type whose size can’t be known at compile time
  • When you have a large amount of data and want to transfer ownership while ensuring the data won’t be copied
  • When you want to own a value and care only that it’s a type that implements a particular trait rather than being of a specific type

fn main() {
    let b = Box::new(5);
    println!("b = {}", b);
}

We define the variable b to have the value of a Box that points to the value 5, which is allocated on the heap. When a box goes out of scope, as b does at the end of main, it is deallocated. The deallocation happens both for the box (stored on the stack) and for the data it points to (stored on the heap).
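The third situation in the list above (owning a value known only by the trait it implements) can be sketched as follows. The Shape trait, the Circle and Rect types, and the total_area function are hypothetical names invented for this illustration; they do not come from the original notes.

```rust
// Hypothetical trait and types, used only to illustrate Box<dyn Trait>.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    radius: f64,
}

struct Rect {
    width: f64,
    height: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.width * self.height
    }
}

// Every element is a box, so the slice can hold values of different
// concrete types as long as each one implements Shape.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Rect { width: 2.0, height: 3.0 }),
        Box::new(Circle { radius: 1.0 }),
    ];
    println!("total area = {}", total_area(&shapes));
}
```

The vector owns each shape through its box, and the concrete type behind each box is only resolved at runtime when area is called.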

// This code does not compile
enum List {
    Cons(i32, List),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Cons(2, Cons(3, Nil)));
}

This code does not compile because Rust can’t figure out how much space to allocate for a recursively defined type: a List would contain another List directly, which would contain another, and so on without end.

Figure: An infinite List consisting of infinite Cons variants

Put a Box<T> inside the Cons variant instead of another List value directly. The Box<T> will point to the next List value that will be on the heap rather than inside the Cons variant.

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}

Because a Box<T> is a pointer, Rust always knows how much space a Box<T> needs: a pointer’s size doesn’t change based on the amount of data it’s pointing to. We now know that any List value takes up the size of an i32 plus the size of a box’s pointer data. By using a box, we’ve broken the infinite, recursive chain, so the compiler can figure out the size it needs to store a List value.
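The size argument can be checked directly with std::mem::size_of. The sketch below is only an illustration: the sum helper is a hypothetical function added here, and the exact size of List depends on the platform, so it is printed rather than asserted.

```rust
use std::mem::size_of;

enum List {
    Cons(i32, Box<List>),
    Nil,
}

// Hypothetical helper: walks the list recursively.
// `rest` is a &Box<List>, which coerces to &List at the call.
fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => *value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    // A Box<List> is a single pointer, however long the list behind it is.
    assert_eq!(size_of::<Box<List>>(), size_of::<usize>());
    // The size of List itself is fixed but platform-dependent.
    println!("size of List = {} bytes", size_of::<List>());

    let list = List::Cons(
        1,
        Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))),
    );
    println!("sum = {}", sum(&list));
}
```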

Figure: A List that is not infinitely sized because Cons holds a Box

Boxes provide only indirection and heap allocation; they don’t have any other special capabilities.

The Box<T> type is a smart pointer because it implements the Deref trait, which allows Box<T> values to be treated like references. When a Box<T> value goes out of scope, the heap data that the box is pointing to is cleaned up as well because of the Drop trait implementation.

2. Treating Smart Pointers Like Regular References with the Deref Trait

Implementing the Deref trait allows you to customize the behavior of the dereference operator *.

By implementing Deref in such a way that a smart pointer can be treated like a regular reference, you can write code that operates on references and use that code with smart pointers too.

// This code does not compile
// Using Box<T> Like a Reference

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

fn main() {
    let x = 5;
    let y = MyBox::new(x);

    assert_eq!(5, x);
    assert_eq!(5, *y);  // error: type `MyBox<{integer}>` cannot be dereferenced
}

The MyBox<T> type can’t be dereferenced because we haven’t implemented that ability on our type. To enable dereferencing with the * operator, we implement the Deref trait.

use std::ops::Deref;

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

Without the Deref trait, the compiler can only dereference & references. The deref method gives the compiler the ability to take a value of any type that implements Deref and call the deref method to get a & reference that it knows how to dereference.

When we entered *y, behind the scenes Rust actually ran this code: *(y.deref()). Each time we use a * in our code, the * operator is replaced with a call to the deref method followed by a single plain dereference. Because this substitution does not recurse infinitely, we end up with data of type i32, which matches the 5 in assert_eq!.
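Putting the pieces of this section together gives a small program that compiles. This is a sketch assembled from the listings above; the only addition is the call through Deref::deref, included to show that all three spellings read the same value.

```rust
use std::ops::Deref;

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    // Returns a reference to the wrapped value, which the compiler
    // then dereferences with the built-in * on plain references.
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let y = MyBox::new(5);

    assert_eq!(5, *y); // what we write
    assert_eq!(5, *(y.deref())); // what the compiler runs
    assert_eq!(5, *Deref::deref(&y)); // the same call, fully spelled out
    println!("y holds {}", *y);
}
```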

Implicit Deref Coercions with Functions and Methods

Deref coercion converts a reference to a type that implements the Deref trait into a reference to another type. For example, deref coercion can convert &String to &str because String implements the Deref trait such that it returns &str.

fn hello(name: &str) {
    println!("Hello, {name}!");
}

fn main() {
    let m = MyBox::new(String::from("Rust"));
    hello(&m);

    // Without deref coercion, we would have to write:
    // hello(&(*m)[..]);
}

When the Deref trait is defined for the types involved, Rust will analyze the types and use Deref::deref as many times as necessary to get a reference to match the parameter’s type. The number of times that Deref::deref needs to be inserted is resolved at compile time, so there is no runtime penalty for taking advantage of deref coercion.
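The chain of deref calls can be made visible by writing it out by hand next to the implicit form. The sketch below uses the standard Box<String> instead of MyBox so that it is self-contained; greeting is a hypothetical function that returns the text instead of printing it.

```rust
// Hypothetical function: like hello, but returns the greeting.
fn greeting(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    let boxed: Box<String> = Box::new(String::from("Rust"));

    // Implicit: &Box<String> -> &String -> &str, inserted by the compiler.
    let implicit = greeting(&boxed);

    // Explicit: deref the box to a String, slice it to a str, borrow it.
    let explicit = greeting(&(*boxed)[..]);

    assert_eq!(implicit, explicit);
    println!("{implicit}");
}
```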

How Deref Coercion Interacts with Mutability

Rust performs deref coercion when it finds types and trait implementations in three cases:

  • From &T to &U when T: Deref<Target=U>
  • From &mut T to &mut U when T: DerefMut<Target=U>
  • From &mut T to &U when T: Deref<Target=U>
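The three cases can be exercised with Box<String>, which implements both Deref and DerefMut. This is a minimal sketch; shout and length are hypothetical helper names chosen for the example.

```rust
// Hypothetical helpers: one needs mutable access, one only reads.
fn shout(text: &mut String) {
    text.push('!');
}

fn length(text: &str) -> usize {
    text.len()
}

fn main() {
    let mut boxed = Box::new(String::from("hi"));

    // &mut T to &mut U: &mut Box<String> becomes &mut String via DerefMut.
    shout(&mut boxed);

    // &T to &U: &Box<String> becomes &String, then &str, via Deref.
    assert_eq!(length(&boxed), 3);

    // &mut T to &U: a mutable reference may be coerced to an immutable one.
    assert_eq!(length(&mut boxed), 3);

    println!("{boxed}");
}
```

The reverse direction, from an immutable reference to a mutable one, is never performed, because it could break the guarantee that a mutable reference is the only reference to its data.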

3. Running Code on Cleanup with the Drop Trait

Specify the code to run when a value goes out of scope by implementing the Drop trait. The Drop trait requires you to implement one method named drop that takes a mutable reference to self.

struct CustomSmartPointer {
    data: String,
}

impl Drop for CustomSmartPointer {
    fn drop(&mut self) {
        println!("Dropping CustomSmartPointer with data `{}`!", self.data);
    }
}

fn main() {
    let c = CustomSmartPointer {
        data: String::from("my stuff"),
    };
    let d = CustomSmartPointer {
        data: String::from("other stuff"),
    };
    println!("CustomSmartPointers created.");
}

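When this program runs, Rust calls drop automatically as c and d go out of scope at the end of main. Variables are dropped in the reverse order of their creation, so d is dropped before c.

Rust doesn’t let us call the Drop trait’s drop method explicitly, because Rust would still automatically call drop on the value at the end of main. This would cause a double free error because Rust would be trying to clean up the same value twice.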
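The drop order can be observed without reading console output by recording each drop in a shared log. The following is a sketch only: the Tracked type and the drop_order function are hypothetical names, and the log uses RefCell, which is covered in section 5.

```rust
use std::cell::RefCell;

// Hypothetical type: appends its name to a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Creates c, then d, lets both go out of scope, and returns the
// names in the order in which they were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _c = Tracked { name: "c", log: &log };
        let _d = Tracked { name: "d", log: &log };
    } // both values go out of scope here
    log.into_inner()
}

fn main() {
    // d was created last, so it is dropped first.
    assert_eq!(drop_order(), vec!["d", "c"]);
    println!("drop order: {:?}", drop_order());
}
```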

Dropping a Value Early with std::mem::drop

fn main() {
    let c = CustomSmartPointer {
        data: String::from("some data"),
    };
    println!("CustomSmartPointer created.");
    drop(c);
    println!("CustomSmartPointer dropped before the end of main.");
}
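The effect of an early drop can also be checked in code. In the sketch below, Guard and drops_before_scope_end are hypothetical names; the counter shows that the Drop implementation has already run right after the call to drop, well before the end of the function.

```rust
use std::cell::Cell;

// Hypothetical type: bumps a shared counter when it is dropped.
struct Guard<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns how many drops have happened right after the early drop.
fn drops_before_scope_end() -> u32 {
    let drops = Cell::new(0);
    let guard = Guard { drops: &drops };

    // std::mem::drop is in the prelude. It takes ownership of guard,
    // so the value is cleaned up as soon as this call returns.
    drop(guard);
    // guard.drop();       // would not compile: explicit destructor call
    // let _ = guard.drops; // would not compile: guard has been moved

    let seen = drops.get();
    seen
}

fn main() {
    assert_eq!(drops_before_scope_end(), 1);
    println!("dropped early {} time(s)", drops_before_scope_end());
}
```

Because drop takes its argument by value, the compiler knows the variable is gone afterwards and does not run the destructor a second time at the end of the scope.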

4. Rc, the Reference Counted Smart Pointer

Enable multiple ownership explicitly by using the Rust type Rc<T>, which is an abbreviation for reference counting. The Rc<T> type keeps track of the number of references to a value to determine whether or not the value is still in use. If there are zero references to a value, the value can be cleaned up without any references becoming invalid.

We use the Rc<T> type when we want to allocate some data on the heap for multiple parts of our program to read and we can’t determine at compile time which part will finish using the data last. If we knew which part would finish last, we could just make that part the data’s owner, and the normal ownership rules enforced at compile time would take effect.

Rc<T> is only for use in single-threaded scenarios.

Using Rc to Share Data

Figure: Two lists, b and c, sharing ownership of a third list, a

Trying to implement this scenario using our definition of List with Box<T> won’t work:

// This code does not compile
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let a = Cons(5, Box::new(Cons(10, Box::new(Nil))));
    let b = Cons(3, Box::new(a));   // error: use of moved value: `a`
    let c = Cons(4, Box::new(a));   // error: value used here after move
}

When we try to use a again when creating c, we’re not allowed to, because a has already been moved into b.

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::rc::Rc;

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(4, Rc::clone(&a));
}

We change the definition of List to use Rc<T> in place of Box<T>. Each Cons variant now holds a value and an Rc<T> pointing to a List. When we create b, instead of taking ownership of a, we clone the Rc<List> that a is holding, thereby increasing the number of references from one to two and letting a and b share ownership of the data in that Rc<List>. We also clone a when creating c, increasing the number of references from two to three. Every time we call Rc::clone, the reference count to the data within the Rc<List> increases, and the data won’t be cleaned up unless there are zero references to it.

We could have called a.clone() rather than Rc::clone(&a), but Rust’s convention is to use Rc::clone in this case. The implementation of Rc::clone doesn’t make a deep copy of all the data like most types’ implementations of clone do. The call to Rc::clone only increments the reference count, which doesn’t take much time. Deep copies of data can take a lot of time. By using Rc::clone for reference counting, we can visually distinguish between the deep-copy kinds of clones and the kinds of clones that increase the reference count.
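The difference between the two kinds of clone can be made concrete with Rc::ptr_eq, which reports whether two Rc values point to the same allocation. This is a sketch under stated assumptions: share and deep_copy are hypothetical helpers, and a Vec<i32> stands in for the list data.

```rust
use std::rc::Rc;

// Hypothetical helper: a reference-counting clone (cheap).
fn share(data: &Rc<Vec<i32>>) -> Rc<Vec<i32>> {
    Rc::clone(data)
}

// Hypothetical helper: a deep copy of the vector inside the Rc.
fn deep_copy(data: &Rc<Vec<i32>>) -> Vec<i32> {
    (**data).clone()
}

fn main() {
    let a = Rc::new(vec![1, 2, 3]);

    // Rc::clone hands back a second pointer to the same allocation.
    let b = share(&a);
    assert!(Rc::ptr_eq(&a, &b));
    assert_eq!(Rc::strong_count(&a), 2);

    // A deep copy has equal contents but its own heap buffer, and it
    // does not touch the reference count.
    let copy = deep_copy(&a);
    assert_eq!(copy, *a);
    assert_ne!(copy.as_ptr(), a.as_slice().as_ptr());
    assert_eq!(Rc::strong_count(&a), 2);

    println!("owners of a: {}", Rc::strong_count(&a));
}
```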

Cloning an Rc<T> Increases the Reference Count

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    println!("count after creating a = {}", Rc::strong_count(&a));
    let b = Cons(3, Rc::clone(&a));
    println!("count after creating b = {}", Rc::strong_count(&a));
    {
        let c = Cons(4, Rc::clone(&a));
        println!("count after creating c = {}", Rc::strong_count(&a));
    }
    println!("count after c goes out of scope = {}", Rc::strong_count(&a));
}

At each point in the program where the reference count changes, we print the reference count, which we get by calling the Rc::strong_count function.

This code prints the following:

count after creating a = 1
count after creating b = 2
count after creating c = 3
count after c goes out of scope = 2

The Drop trait decreases the reference count automatically when an Rc<T> value goes out of scope. Using Rc<T> allows a single value to have multiple owners, and the count ensures that the value remains valid as long as any of the owners still exist.
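That the shared value is cleaned up only when its last owner disappears can be checked with a type that records its own drop. The sketch below is illustrative: Payload and drop_states are hypothetical names, and a Cell<bool> flag stands in for real cleanup work.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type: sets a flag when the shared value is dropped.
struct Payload<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Payload<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Reports whether the payload had been dropped after the first owner
// went away, and again after the second owner went away.
fn drop_states() -> (bool, bool) {
    let dropped = Cell::new(false);
    let first = Rc::new(Payload { dropped: &dropped });
    let second = Rc::clone(&first);

    drop(first); // count goes from 2 to 1, payload stays alive
    let after_first = dropped.get();

    drop(second); // count reaches 0, payload is dropped
    let after_second = dropped.get();

    (after_first, after_second)
}

fn main() {
    assert_eq!(drop_states(), (false, true));
    println!("payload dropped only after the last owner was gone");
}
```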

5. RefCell and the Interior Mutability Pattern

Interior mutability is a design pattern in Rust that allows you to mutate data even when there are immutable references to that data. To mutate data, the pattern uses unsafe code inside a data structure to bend Rust’s usual rules that govern mutation and borrowing. Unsafe code indicates to the compiler that we’re checking the rules manually instead of relying on the compiler to check them for us.

Enforcing Borrowing Rules at Runtime with RefCell

The RefCell<T> type represents single ownership over the data it holds.

With references and Box<T>, the borrowing rules’ invariants are enforced at compile time. With RefCell<T>, these invariants are enforced at runtime. With references, if you break these rules, you’ll get a compiler error. With RefCell<T>, if you break these rules, your program will panic and exit.

Because RefCell<T> allows mutable borrows checked at runtime, you can mutate the value inside the RefCell<T> even when the RefCell<T> is immutable.

RefCell<T> is only for use in single-threaded scenarios and will give you a compile-time error if you try using it in a multithreaded context.

Mutating the value inside an immutable value is the interior mutability pattern.
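A small case of the pattern is a value that counts how often it is used while exposing only &self methods. The sketch below is an illustration with a hypothetical CallCounter type; it is not taken from the original notes.

```rust
use std::cell::RefCell;

// Hypothetical type: counts calls without requiring &mut self.
struct CallCounter {
    calls: RefCell<u32>,
}

impl CallCounter {
    fn new() -> CallCounter {
        CallCounter {
            calls: RefCell::new(0),
        }
    }

    // Takes &self, yet updates the count through the RefCell.
    fn record(&self) {
        *self.calls.borrow_mut() += 1;
    }

    fn total(&self) -> u32 {
        *self.calls.borrow()
    }
}

fn main() {
    // The binding is not declared `mut`.
    let counter = CallCounter::new();
    counter.record();
    counter.record();

    assert_eq!(counter.total(), 2);
    println!("calls = {}", counter.total());
}
```

Each borrow_mut here lasts only for a single statement, so no two borrows ever overlap and the runtime check always passes.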

Interior Mutability: A Mutable Borrow to an Immutable Value

// This code does not compile
fn main() {
    let x = 5;
    let y = &mut x;  // error: cannot borrow as mutable
}

Using RefCell<T> is one way to get interior mutability, but RefCell<T> doesn’t get around the borrowing rules completely: the borrow checker in the compiler allows this interior mutability, and the borrowing rules are checked at runtime instead. If you violate the rules, you’ll get a panic! instead of a compiler error.

Example:

pub trait Messenger {
    fn send(&self, msg: &str);
}

pub struct LimitTracker<'a, T: Messenger> {
    messenger: &'a T,
    value: usize,
    max: usize,
}

impl<'a, T> LimitTracker<'a, T>
where
    T: Messenger,
{
    pub fn new(messenger: &'a T, max: usize) -> LimitTracker<'a, T> {
        LimitTracker {
            messenger,
            value: 0,
            max,
        }
    }

    pub fn set_value(&mut self, value: usize) {
        self.value = value;

        let percentage_of_max = self.value as f64 / self.max as f64;

        if percentage_of_max >= 1.0 {
            self.messenger.send("Error: You are over your quota!");
        } else if percentage_of_max >= 0.9 {
            self.messenger
                .send("Urgent warning: You've used up over 90% of your quota!");
        } else if percentage_of_max >= 0.75 {
            self.messenger
                .send("Warning: You've used up over 75% of your quota!");
        }
    }
}

// This code does not compile
#[cfg(test)]
mod tests {
    use super::*;

    struct MockMessenger {
        sent_messages: Vec<String>,
    }

    impl MockMessenger {
        fn new() -> MockMessenger {
            MockMessenger {
                sent_messages: vec![],
            }
        }
    }

    impl Messenger for MockMessenger {
        fn send(&self, message: &str) {
            self.sent_messages.push(String::from(message)); // error: `self` is a `&` reference, so the data it refers to cannot be borrowed as mutable
        }
    }

    #[test]
    fn it_sends_an_over_75_percent_warning_message() {
        let mock_messenger = MockMessenger::new();
        let mut limit_tracker = LimitTracker::new(&mock_messenger, 100);

        limit_tracker.set_value(80);

        assert_eq!(mock_messenger.sent_messages.len(), 1);
    }
}

We can’t modify the MockMessenger to keep track of the messages, because the send method takes an immutable reference to self. We also can’t use &mut self instead, because then the signature of send wouldn’t match the signature in the Messenger trait definition.

With RefCell<T>, we can mutate an inner value while the outer value is considered immutable:

#[cfg(test)]
mod tests {
    use super::*;
    use std::cell::RefCell;

    struct MockMessenger {
        sent_messages: RefCell<Vec<String>>,
    }

    impl MockMessenger {
        fn new() -> MockMessenger {
            MockMessenger {
                sent_messages: RefCell::new(vec![]),
            }
        }
    }

    impl Messenger for MockMessenger {
        fn send(&self, message: &str) {
            self.sent_messages.borrow_mut().push(String::from(message));
        }
    }

    #[test]
    fn it_sends_an_over_75_percent_warning_message() {
        // --snip--

        assert_eq!(mock_messenger.sent_messages.borrow().len(), 1);
    }
}

Keeping Track of Borrows at Runtime with RefCell

When creating immutable and mutable references, we use the & and &mut syntax, respectively. With RefCell<T>, we use the borrow and borrow_mut methods, which are part of the safe API that belongs to RefCell<T>. The borrow method returns the smart pointer type Ref<T>, and borrow_mut returns the smart pointer type RefMut<T>. Both types implement Deref, so we can treat them like regular references.

The RefCell<T> keeps track of how many Ref<T> and RefMut<T> smart pointers are currently active. Every time we call borrow, the RefCell<T> increases its count of how many immutable borrows are active. When a Ref<T> value goes out of scope, the count of immutable borrows goes down by one. Just like the compile-time borrowing rules, RefCell<T> lets us have many immutable borrows or one mutable borrow at any point in time.

If we try to violate these rules, rather than getting a compiler error as we would with references, the implementation of RefCell<T> will panic at runtime.

// This code panics
impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        let mut one_borrow = self.sent_messages.borrow_mut();
        // Panics here, on the second mutable borrow:
        // already borrowed: BorrowMutError
        let mut two_borrow = self.sent_messages.borrow_mut();

        one_borrow.push(String::from(message));
        two_borrow.push(String::from(message));
    }
}
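The same rule can be observed without a panic by using the try_borrow and try_borrow_mut methods of RefCell<T>, which return a Result instead of panicking. The sketch below is an illustration; the two helper functions are hypothetical names.

```rust
use std::cell::RefCell;

// Hypothetical helper: true if a second mutable borrow is refused
// while the first one is still active.
fn second_mutable_borrow_fails(cell: &RefCell<Vec<i32>>) -> bool {
    let _first = cell.borrow_mut();
    let second = cell.try_borrow_mut();
    second.is_err()
}

// Hypothetical helper: true if two immutable borrows can coexist.
fn shared_borrows_coexist(cell: &RefCell<Vec<i32>>) -> bool {
    let _first = cell.borrow();
    let second = cell.try_borrow();
    second.is_ok()
}

fn main() {
    let cell = RefCell::new(vec![1]);

    assert!(second_mutable_borrow_fails(&cell));
    assert!(shared_borrows_coexist(&cell));

    // All earlier borrows have gone out of scope, so a new mutable
    // borrow succeeds.
    cell.borrow_mut().push(2);
    println!("len = {}", cell.borrow().len());
}
```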

Having Multiple Owners of Mutable Data by Combining Rc and RefCell

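A common way to use RefCell<T> is in combination with Rc<T>. Rc<T> lets you have multiple owners of some data, but it gives only immutable access to that data. If you have an Rc<T> that holds a RefCell<T>, you get a value that can have multiple owners and that you can mutate.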

Using Rc<RefCell<i32>> to create a List that we can mutate.

#[derive(Debug)]
enum List {
    Cons(Rc<RefCell<i32>>, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let value = Rc::new(RefCell::new(5));

    let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));

    let b = Cons(Rc::new(RefCell::new(3)), Rc::clone(&a));
    let c = Cons(Rc::new(RefCell::new(4)), Rc::clone(&a));

    *value.borrow_mut() += 10;

    println!("a after = {:?}", a);
    println!("b after = {:?}", b);
    println!("c after = {:?}", c);
}

This code prints the following:

a after = Cons(RefCell { value: 15 }, Nil)
b after = Cons(RefCell { value: 3 }, Cons(RefCell { value: 15 }, Nil))
c after = Cons(RefCell { value: 4 }, Cons(RefCell { value: 15 }, Nil))

We have an outwardly immutable List value, but we can use the methods on RefCell<T> that provide access to its interior mutability, so we can modify the data when we need to. The runtime checks of the borrowing rules protect us from data races, and it’s sometimes worth trading a bit of speed for this flexibility in our data structures.
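Outside of lists, the same combination is useful whenever several owners need to update one piece of state. The following is a minimal sketch, assuming a hypothetical Writer type whose instances all append to one shared log.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type: each writer co-owns the same log.
struct Writer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Writer {
    // Takes &self; the mutation goes through the shared RefCell.
    fn write(&self, text: &str) {
        self.log
            .borrow_mut()
            .push(format!("{}: {}", self.name, text));
    }
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    let a = Writer { name: "a", log: Rc::clone(&log) };
    let b = Writer { name: "b", log: Rc::clone(&log) };

    a.write("first");
    b.write("second");

    // Three owners: log itself plus the two writers.
    assert_eq!(Rc::strong_count(&log), 3);
    // Both writes landed in the same vector.
    assert_eq!(log.borrow().len(), 2);
    println!("{:?}", log.borrow());
}
```

Rc<T> provides the shared ownership and RefCell<T> provides the mutation; neither type alone would allow both writers to append to the same vector.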