Lifetimes

A lifetime inference algorithm for the Rust programming language written in Soufflé.

Generate Convert Improve

Install / Use

/learn @rljacobson/Lifetimes

About this skill

Quality Score

0/100

README

This is a lifetime inference algorithm for Rust written in Soufflé. It is a proof of concept and will not actually work with your source code automatically—but it could in principle if we hooked it up to Rust’s AST. The To Do section below illustrates how unnecessarily restrictive the Rust compiler is regarding lifetimes.

Usage

Since it’s a proof of concept, it’s very unsophisticated. In “real life,“ the input comes from the AST generated by the compiler. In this script I just manually typed the input into the script itself. The input is labeled as such.

The inference algorithm is just a Soufflé script, so the only dependency is Soufflé. Once Soufflé is installed, run the script like so:

$ souffle -D. lifetimes.dl

What’s nice about Soufflé is that it will tell you how it deduced something. Here’s how:

$ souffle -D. -t explain lifetimes.dl
>      explain <<thing that was deduced here>>

The -t explain flag drops you into a command line. Type explain followed by the thing you want explaining. I got tired of doing this, so I rewrote my “error” predicate to tell me where it comes from together with an error_msg that produces a human readable error message string.

.output error_msg

Lifetime Inference in Rust

Last Monday I decided to take a look at Rust for the first time. Rust has the notion of lifetimes which is a mechanism to prohibit errors like use after free and race conditions. However, the Rust compiler requires the programmer to figure out the lifetimes manually. Programmers new to Rust find the “borrow checker,“ the component of the compiler that checks lifetime correctness, to be a challenging learning curve. The internet is full of people “fighting the borrow checker.”

A lifetime is a type. The Rust compiler infers types, but it doesn’t infer lifetimes.1 Why not? I don’t know. It seems like such a thing would be a priority considering how big of a barrier it is to new users. I decided to write my own lifetime inference engine.

How does it work?

It works the same way type inference works—literally. A lifetime is a type. Consider this example from The Book:

fn longest(x: &str, y: &str) -> &str { // ━━━━━━┱───┐ -x and y borrows here.
    if x.len() > y.len() { //   x's borrow    S4┃ S5│ 
        x     //                overlaps with   ┃   │ -loan_if = x, x drops.
    } else {  //                y's borrow      ┃   │
        y     // ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋───┤ -loan_if = y, y drops.
    } // either x or y is still borrowing here  ┃   │
} // ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┹───┘ -result borrows here, while
                                                   //  x and y go out of scope.
fn main() {
    let string1 = String::from(  // ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ -string1 born
        "long string is long");  //   string1's scope              S1┃
    let result;                  // ┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉┉╂┉┉┐
    { //    result's scope, but not assigned to, so no lifetime yet  ┃S4┋
        let string2 = String::from("xyz"); //─────────────────────┐  ┃  ┋ -string2 born.
        result = longest(string1.as_str(), // string2's scope   S2│  ┃  ┋
                        string2.as_str());       // ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┼┄┄╂┄┄┼┄┄┄┐ -string1 or string2 borrowed here.
    }                                            // ──────────────┘  ┃  ┋ S3┆ -string2 dies.
    println!("The longest string is {}", result);// - S5             ┃  ┋   ┆ -result drops.
                                                 // ━━━━━━━━━━━━━━━━━┹┉┉┘┄┄┄┘ 
}                                                // - S6, global scope.

The boxes labeled with S# represent either lifetimes or regions in which a reference holds a “loan,” which is what I will call a borrow when I need a noun to refer to the abstraction of a reference holding a particular heap-allocated data element.

Program input

The sourcecode data needs to be input into the program. In particular, the input consists of:

The control flow graph. The control flow graph tells us whether program control can flow from point A to point B during execution. The points are program statements, which we identify with line numbers in the source code for simplicity. We write CFG.edge("L3","L4”){:.rust} to mean control flows from line L3 to line L4. The control flow graph itself can be constructed by computing reachability along edges, but we don’t compute it directly, so CFG is only an abstraction. What we actually construct is the data flow graph in our application, because it describes whether a piece of data's lifetime, via borrowing or scope, is "live" at any given point.
Where a loan is created together with some kind of identifier. We write object_born("string1", "L10"){:.rust} and, for simplicity, use the name of the original variable that created the loan as the name of the loan.
Where the loan dies, i.e. where it’s owner drops the loan. We write object_dies("string1", "L20"){:.rust}.
Where which reference borrows from which other reference. We write borrows("loan_if", "x", "L3”){:.rust}. Note that which loan is borrowed is unnecessary—that is inferred by the algorithm.
Where which reference releases their loan, e.g., because it went out of scope. We write drops("x", "string1", "L3”){:.rust}.

Computations

From the program input, we compute the set of all points a given referenceis borrowing a given loan. The reference starts holding the loan exactly when it borrows it:

borrowing(ref, ln, p) :- borrows(ref, ln, p).

It continues borrowing the loan along the control flow graph until it drops the loan, perhaps by going out of scope:

borrowing(ref, ln, p2) :- CFG.edge(p1, p2),
                        borrowing(ref, ln, p1),
                        !drops(ref, ln, p2).

This computes the region of source code that the reference ref holds the loan ln—sort of: crucially, despite the fact that I’ve called ln a loan, it is actually a reference to a loan. So borrowing tells us which reference is borrowing from which other reference at which points in the source code. But which loan is it referencing?

To bootstrap, we declare (perhaps awkwardly) that the loan’s owner references itself. (Recall that we are naming the loan after its owner, so it makes a bit more sense to read “references” as, “is held by” or something.)

references(var, var, pt) :- object_born(var, pt).

Next, whatever reference is borrowing the loan, we need the reference to propogate to the next point on the control graph (the next statement) if it’s still borrowing from the same thing:

references(ref2, ln, pt) :- CFG.edge(p1, p2),
                        borrowing(ref2, ref1, p1),
                        references(ref2, ln, p1),
                        borrowing(ref2, ref1, p2).

The crucial difference between references and borrowing is, references doesn’t update its referrent. That is, the thing being referred to never changes from when it came into being at an object_born:

references(ref2, ln, pt) :- CFG.edge(p1, p2),
                        borrowing(ref1, _, p1),
                        references(ref1, ln, p1),
                        borrowing(ref2, ref1, p2).

Detecting Errors

Conceptually, if ref2 borrows from ref1, then we need the set S2 := { p ∈ Program | borrowing(ref2, ref1, p) } to be a subset of the set S1 := { p | borrowing(ref1, _, p) }, that is, S1⊂ S2 . Said another way, the lifetime of ref2's borrow must be a subset of the lifetime of ref1's borrow.2

A dangling pointer occurs when the loan dies while something is still referencing it, perhaps because its owner goes out of scope. From the conceptual perspective of sets, an error occurs when we find a point p, a reference ref, and a loan ln such that references(ref, ln, p){:.prolog} but NOT borrowing(ln, ln, p){:.prolog} (the owner drops ln). In symbols, S1⊂ S2 is a constraint, but p ∈ S1∖ S2, contradicting the constraint. To be fancy, we also compute the point at which the reference first borrows the object.

error_use_freed(var, ln, errpt, borpt) :- references(var, ln, errpt),
                        object_dies(ln, errpt),
                        references(var, ln, borpt),
                        borrows(var, _, borpt).

It is also an error to have distinct (mutable) references to the same object. I don't actually make a distinction between mutable and immutable borrows, so this error isn't true to life. It is just to show how it would work. Again, we're fancy and compute the reference already pointing to the object and where that reference borrowed from the object.

error_already_borrowed(ref1, ln, errpt, ref2, borpt) :- references(ref1, ln, errpt),
                                borrows(ref1, ln, errpt),
                                references(ref2, ln, errpt), ref2!=ref1,
                                borrows(ref2, ref3, borpt), ref2!=ref3,
                                references(ref2, ln, borpt).

Et voilà!

Error: 'string2' is dropped at L17, but 'result' still holds the loan it borrowed at L15.

And that’s pretty much it, modulo a few detail

Related Skills

node-connect

348.5k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

109.1k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

348.5k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

348.5k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。