laut /laʊt/ - verifiable provenance data and SBOMs with Nix

The name is German for[^1]:

loud, noisy, blatant 📢
(as) per, according to, in accordance with 🕵️‍♀️

🚧 This is a still incomplete implementation of https://dl.acm.org/doi/10.1145/3689944.3696169. 🚧

</div>

The fundamentals are in place, but some of the cool things about it still need some work (marked ❎):

configurable trust model[^2] ✅, implemented correctly ❎ ...
which can be re-configured over time, ✅ based on ...
verifiable provenance data for builders ❎
like realizations for CA derivations, ✅ but also works for IA derivations ❎
based on a new proposed signature format on top of JWS, ✅ with
arbitrary additional ✅ but detachable ❎ metadata
integrates with/extends https://github.com/nikstur/bombon to create verifiable SBOMs ❎

Right now this can resolve the dependencies for and verify a fully content-addressed hello binary, testing Nix against Lix using different keys, in our VM tests.

At the same time, it's not ready for users yet. For example, the Datalog implementation for verification is not entirely correct yet. As a project, we are also not yet committed to supporting the current format of the signatures long term; for example, we might still want to change things like the envelope format.

I want to get a scientific paper, and later my PhD thesis published based on this work, so if you do something that's inspired by this project, please give me a shoutout in your README.md, your docs or the relevant issue in your issue tracker. This really helps me demonstrate the relevance of my work.

How can I use it

This is a standalone command line tool called laut, which has two subcommands:

The first one is

laut sign-and-upload --to [S3 store url]

which signs your derivations with the new signature format and uploads them to the newly introduced traces folder in the provided S3 store. This then happens automatically after each build, in the same way that signatures are normally uploaded from Nix-based builders.

The second one is

laut verify --cache [S3 store url] --trusted-key [path to public key file] [derivation path or flake output path]

which is run manually by the user after building or obtaining an output from the cache. This command tries to verify that an output can be derived from a given derivation according to the stricter validation criteria of the tool. Later on, the tool will check this against a produced result link on disk, there will be more options to configure a specific trust model to verify against, and you will be able to additionally pass an SBOM, which then also has to match the other elements. The goal of the SBOM integration is to connect this with established standards that people outside of the Nix community understand as well.

How does it work

It's a Python program, with some internals written in Rust, and a dependency on Snix for the hashing schemes. The signing itself is very straightforward Python code.
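To give a feel for what a signature "on top of JWS" looks like, here is a minimal sketch of the JWS compact serialization using only the Python standard library. It is not laut's actual code: the payload field names (`in`, `out`) are hypothetical, and HMAC-SHA256 (`HS256`) is swapped in purely to avoid third-party dependencies, whereas signatures from builders would normally use an asymmetric key pair.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Inverse of b64url: restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_compact(payload: dict, key: bytes) -> str:
    """Return 'header.payload.signature' (JWS compact serialization)."""
    header = {"alg": "HS256"}  # assumption: symmetric key, for illustration only
    signing_input = ".".join(
        b64url(json.dumps(part, sort_keys=True, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    tag = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(tag)}"


def verify_compact(token: str, key: bytes) -> Optional[dict]:
    """Return the payload if the signature checks out, otherwise None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64, payload_b64, tag_b64 = parts
    try:
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(key, signing_input, hashlib.sha256).digest()
        tag = b64url_decode(tag_b64)
        header = json.loads(b64url_decode(header_b64))
        payload = json.loads(b64url_decode(payload_b64))
    except ValueError:  # covers bad base64, bad UTF-8/ASCII, and bad JSON
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not hmac.compare_digest(tag, expected):
        return None
    return payload
```

A claim such as `sign_compact({"in": "<resolved input hash>", "out": "<content hash>"}, key)` then yields a single dot-separated string that can be stored next to the build output and checked later with `verify_compact`.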

The verification is more complicated, as it instantiates an actual dependency tree in memory, then walks through that tree to gather information. As part of this verification phase, the tool also gathers signatures from a set of caches, taking into account possible combinations of inputs by content hash, which could satisfy the dependency on those same inputs by input hash. This data then serves as the input to a Datalog program written in Rust, to make the actual determination about the validity of the dependency tree.
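The "possible combinations of inputs by content hash" can be pictured as a Cartesian product over the content hashes that signatures claim for each dependency. The following is a rough sketch of that idea only, not laut's implementation; the function name, the data shapes, and the example hashes are all made up for illustration.

```python
from itertools import product
from typing import Dict, Iterable, Iterator, Set


def candidate_resolutions(
    dependencies: Iterable[str],
    claimed_outputs: Dict[str, Set[str]],
) -> Iterator[Dict[str, str]]:
    """Yield every way to pick one claimed content hash per dependency.

    dependencies:    derivation outputs the derivation depends on (by input hash)
    claimed_outputs: content hashes that gathered signatures claim for each of them
    """
    names = sorted(dependencies)
    # Sorting keeps the enumeration order deterministic.
    choices = [sorted(claimed_outputs.get(name, ())) for name in names]
    for combination in product(*choices):
        yield dict(zip(names, combination))


if __name__ == "__main__":
    claims = {
        "gcc.drv$out": {"sha256-a", "sha256-b"},  # two builders disagree here
        "glibc.drv$out": {"sha256-c"},
    }
    for resolution in candidate_resolutions(claims.keys(), claims):
        print(resolution)
```

Each yielded mapping corresponds to one candidate resolved derivation. A dependency without any claimed content hash produces no candidates at all, which matches the intuition that such a tree cannot be resolved from the available signatures.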

How can I test it

As of now, there are two kinds of tests in this project

The Python tests, which can be run with

pytest -s tests/

inside a nix develop shell, and the NixOS VM tests, which you can run by first building the test driver for one of the tests

nix build .#checks.x86_64-linux.small-verify.driverInteractive

Available VM tests include:

small-sign and small-verify - Quick tests
medium-sign and medium-verify - Medium-sized tests
large-sign and large-verify - Large tests (time out in CI)

Then run the resulting binary to get into the interactive test driver shell.

In that shell you can then run the test using the "test_script()" function.

In the future different VM tests should exercise different trust models, but right now they all uniformly only trust builderA and builderB in combination.
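As an illustration of what "trusting builderA and builderB in combination" means, here is a small Python sketch. In laut this decision is made by the Datalog program, so this is only a stand-in for the idea; the tuple layout and the function name are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical claim shape: (builder, resolved input hash, content hash)
Claim = Tuple[str, str, str]


def agreed_outputs(
    claims: Iterable[Claim],
    required_builders: Iterable[str],
) -> Set[Tuple[str, str]]:
    """Return the (input hash, content hash) pairs signed by every required builder."""
    signers: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for builder, input_hash, content_hash in claims:
        signers[(input_hash, content_hash)].add(builder)
    required = set(required_builders)
    # A pair is accepted only if the required set is a subset of its signers.
    return {pair for pair, builders in signers.items() if required <= builders}


if __name__ == "__main__":
    claims = [
        ("builderA", "rdrv1", "h1"),
        ("builderB", "rdrv1", "h1"),  # both agree on rdrv1
        ("builderA", "rdrv2", "h2"),
        ("builderB", "rdrv2", "h3"),  # disagreement on rdrv2
    ]
    print(agreed_outputs(claims, ["builderA", "builderB"]))
```

Under this model a single builder's signature is never sufficient, and a disagreement between the two builders leaves the output unverified.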

FAQ

Q: Do you want to upstream this?
A: Yes. With this project, I want to lead a credible effort to propose a specific signature format, which does what I want from such a format, as outlined in my paper.

Q: Do you accept contributions?
A: Yes, I am enthusiastic about collaborating on this, and helping people with getting started on that. I also want to reply to proposals and criticism within a week. If I don't and you're waiting on an answer from me, please remind me.

Q: What do you want from a signature format in Nix?
A: To turn Nix into a leading edge supply chain security tool. Nix has interesting properties in that area, but it is not living up to its potential yet.

Q: Are you interested in working with different implementations of Nix?
A: Yes, definitely. I feel like a flexible enough signature format can be especially useful in an increasingly diverse ecosystem. Let's make it possible to let Nix evolve over time, try new ideas, AND interoperate as much as possible while doing it. It's important to me to have a good working relationship with others in the community across various implementations of Nix, who care about these issues as well. Please open issues, reach me on Matrix or via email at m@groundry.org.

Q: Why are you not implementing this in Nix or any of its implementations directly?
A: Eventually that is definitely the way you would want to do this kind of thing, but for now it is meant to prove the concept (also across implementations) and introduce it to an expert audience, with a lot of breakage and much shorter iteration times.

Q: Can I use this now?
A: No, it does not do anything useful yet, but you can help work on it.

Glossary

Here is a list of technical terms we use in this project with their definitions:

<dl>
<dt>derivation / drv</dt>
<dd>Nix uses this term for build steps, which are identified and defined by their characteristic input hash. In this project we will define a derivation strictly as an element <code>i</code> in the domain of a function <code>build(i: input) -> output</code> and not as the pair of both input and output <code>(i, build(i))</code>.</dd>
<dt>unresolved derivation / udrv</dt>
<dd>A derivation which depends on other derivations.</dd>
<dt>resolved derivation / rdrv</dt>
<dd>A derivation which does not depend on other derivations (anymore). The content-addressed derivation RFC also calls this a basic derivation.</dd>
<dt>derivation output / output path</dt>
<dd>Each derivation can have more than one derivation output. These show up in the Nix store at separate output paths, but were created by building the same derivation. This step of indirection and distinction between individual outputs of a derivation is not an important concern when reasoning about trust, but it shows up in the technical details sometimes. Derivation output refers to the abstract name of such an output, written as <code>/nix/store/{hash}-{name}.drv$out</code>, while output path refers to its "physical manifestation" in terms of a path / address and the contents of that output in the store, like <code>/nix/store/{hash}-{name}</code> and its content.</dd>
<dt>content hash</dt>
<dd>Describes the bitwise identity of a file or path by hashing it in a defined manner.</dd>
<dt>dependency resolution / resolution</dt>
<dd>The process of resolving a derivation, by replacing each dependency on another derivation in terms of a derivation output of another unresolved derivation with its bitwise identity in terms of a content hash.
<br> The following adds detail using a bunch of forward references: For derivations using the CA derivation experimental feature, this is done explicitly by replacing entries in the <code>inputDrvs</code> attribute of the drv with entries in the <code>inputSrcs</code> attribute of the drv. For IA derivations or CA derivations with IA dependencies, this happens implicitly every time the contents of an IA path are accessed.</dd>
<dt>input hash</dt>
<dd>The identifying and defining hash of a derivation. If a derivation is the input to, and therefore an element in the domain of, a <code>build</code> function, the input hash is a lookup key, which identifies this element and can therefore be used to store and look up build outputs or their content hashes. All derivations in Nix have an input hash, even CA derivations.</dd>
<dt>unresolved input hash</dt>
<dd>A type of input hash which is constructed from the set of inputs recursively, so that it reflects the bitwise identity of only the leaves in the dependency tree in question, and the <em>build recipe</em> identity of how they are put together. This is called a deep constructive trace up to terminal inputs in the Build Systems à la Carte paper, and my first paper. In Nix it is the hash that is part of the store path of any regular (input-addressed) derivation. It is why they are called input-addressed.</dd>
<dt>resolved input hash</dt>
<dd>A type of input hash which is constructed from the set of inputs and incorporates the identity of all direct dependencies by a content hash. This is called a constructive trace in the Build Systems à la Carte paper, and my first paper. In Nix it is the hash of a resolved content-addressed derivation. The derivation itself is still input-addressed, and it has an input hash, but the individual inputs that factor into that hash are direct dependencies that are included with their content hash.</dd>
<dt>IA derivation</dt>
<dd>A regular derivation in Ni
