Verneuil: streaming replication for SQLite

Verneuil[^verneuil-process] [vɛʁnœj] is a VFS (OS abstraction layer) for SQLite that accesses local database files like the default unix VFS while asynchronously replicating snapshots to S3-compatible blob stores. We wrote it to improve the scalability and availability of pre-existing services for which SQLite is a good fit, at least for single-node deployments.

Backtrace relies on Verneuil to back up and replicate thousands of SQLite databases that range in size from 100KB to a few gigabytes, some of which see updates every second... for less than $40/day in S3 costs.

It has been tested on linux/amd64, linux/aarch64 (little endian), and darwin/aarch64. The SQLite file format and the Verneuil replication data are all platform agnostic.

[^verneuil-process]: The Verneuil process was the first commercial method of manufacturing synthetic gemstones... and DRH insists on pronouncing SQLite like a mineral, surely a precious one (:

The primary design goal of Verneuil is to add asynchronous read replication to working single-node systems without introducing new catastrophic failure modes. Avoiding new failure modes takes precedence over all other considerations, including replication lag: there is no attempt to bound or minimise the staleness of read replicas. Verneuil read replicas should only be used when stale data is acceptable.

In keeping with this conservative approach to replication, the local database file on disk remains the source of truth, and the VFS is fully compatible with SQLite's default unix VFS, even for concurrent (with file locking) accesses. Verneuil stores all state that must persist across SQLite transactions on disk, so multiple processes can still access and replicate the same database with Verneuil.

Verneuil also paces all API calls (with a [currently hardcoded] limit of 30 calls/second/process) to avoid "surprising" cloud bills. It decouples the SQLite VFS from the replication worker threads that upload data to a remote blob store with a crash-safe buffer directory, whose worst-case disk footprint is bounded to roughly four times the size of the source database file. It's thus always safe to disable access to the blob store: buffered replication data may grow over time, but always within bounds.

Replacing the default unix VFS with Verneuil impacts local SQLite operations, of course: writes must be slower, in order to queue updates for replication. However, this slowdown is usually proportional to the time it took to perform the write itself, and often dominated by the two fsyncs incurred by SQLite transaction commits in rollback mode. Moreover, the replication logic runs with the write lock downgraded to a read lock, so subsequent transactions only block on the new replication step once they're ready to commit.

This effort is not directly comparable to litestream: Verneuil is meant for asynchronous read replication, with streaming backups as a nice side effect. The replication approach is thus completely different. In particular, while litestream only works with SQLite databases in WAL mode, Verneuil only supports rollback journaling. See doc/DESIGN.md for details.

What's in this repo

  1. A "Linux" VFS (c/linuxvfs.c) that implements everything that SQLite needs for a non-WAL DB, without all the backward compatibility cruft in SQLite's Unix VFS. The new VFS's behaviour is fully compatible with upstream's Unix VFS! It's a simpler starting point for new (Linux-only) SQLite VFSes.

  2. A Rust crate with a C interface (see include/verneuil.h) to configure and register:

    • The verneuil VFS, which hooks into the Linux VFS to track changes, generate snapshots in spooling directories, and asynchronously upload spooled data to a remote blob store like S3. This VFS is only compatible with SQLite's rollback journal mode. It can be used directly from Rust, or via its C interface.

    • The verneuil_snapshot VFS that lets SQLite access snapshots stored in S3-compatible blob stores.

  3. A runtime-loadable SQLite extension, libverneuil_vfs, that lets SQLite open databases with the verneuil VFS (to replicate the database to remote storage), or with the verneuil_snapshot VFS (to access a replicated snapshot).

  4. The verneuilctl command-line tool to restore snapshots, forcibly upload spooled data, synchronise a database file to remote storage, and perform other ad hoc administrative tasks.

Quick start

There is more detailed setup information, including how to directly link against the verneuil crate instead of loading it as a SQLite extension, in doc/VFS.md and doc/SNAPSHOT_VFS.md. The rusqlite_integration example shows how that works for a Rust crate.

For quick hacks and test drives, the easiest way to use Verneuil is to build it as a runtime loadable extension for SQLite (libverneuil_vfs).

cargo build --release --examples --features='dynamic_vfs'

The verneuilctl tool will also be useful.

cargo build --release --examples --features='vendor_sqlite'

Verneuil needs additional configuration to know where to spool replication data, and where to upload or fetch data from remote storage. That configuration data must be encoded in JSON, and will be deserialised into a verneuil::Options struct (in src/lib.rs).

A minimal configuration string looks as follows. See doc/VFS.md and doc/SNAPSHOT_VFS.md for more details.

{
  // "make_default": true, to use the replicating VFS by default
  // "tempdir": "/my/tmp/", to override the location of temporary files
  "replication_spooling_dir": "/tmp/verneuil/",
  "replication_targets": [
    {
      "s3": {
        "region": "us-east-1",
        // "endpoint": "http://127.0.0.1:9000", // for non-standard regions
        "chunk_bucket": "verneuil_chunks",
        "manifest_bucket": "verneuil_manifests",
        "domain_addressing": true  // or false for the legacy bucket-as-path interface
        // "create_buckets_on_demand": true // to create private buckets as needed
      }
    }
  ]
}

That's a mouthful to pass as query string parameters to sqlite3_open_v2, so Verneuil currently looks for that configuration string in the VERNEUIL_CONFIG environment variable. If that variable's value starts with an at sign, like "@/path/to/config.json", Verneuil looks for the configuration JSON in that file.

The configuration file does not include any credentials: Verneuil gets those from the environment, either by hitting the local EC2 credentials daemon, or by reading the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

Now that the environment is set up, we can load the extension in SQLite, and start replicating our writes to S3, or any other compatible blob server (we use minio for testing).

$ RUST_LOG=warn VERNEUIL_CONFIG=@verneuil.json sqlite3
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ./libverneuil_vfs  -- Load the Verneuil VFS extension.
sqlite> .open file:source.db?vfs=verneuil
-- The contents of source.db will now be spooled for replication before
-- letting each transaction close.
sqlite> .open file:verneuil://source.host.name/path/to/replicated.db?vfs=verneuil_snapshot
-- opens a read replica for the most current snapshot replicated to s3 by `source.host.name`
-- for the database at `/path/to/replicated.db`.

Outside the SQLite shell, extension loading must be enabled in order to allow access to the load_extension SQL function.

URI filenames must also be enabled in order to specify the VFS in the connection string; it's also possible to pass a VFS argument to sqlite3_open_v2.

Replication data is buffered to the replication_spooling_dir synchronously, before the end of each SQLite transaction. Actually uploading the data to remote storage happens asynchronously: we wouldn't want to block transaction commit on network calls.

After exiting the shell or closing an application, we can make sure that all spooled data is flushed to remote storage with verneuilctl flush $REPLICATION_SPOOLING_DIR: this command will attempt to synchronously upload all pending spooled data in the spooling directory, and log noisily / error out on failure.

Find documentation for other verneuilctl subcommands with verneuilctl help:

$ ./verneuilctl --help
verneuilctl 0.1.0
utilities to interact with Verneuil snapshots

USAGE:
    verneuilctl [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help
            Prints help information

    -V, --version
            Prints version information


OPTIONS:
    -c, --config <config>
            The Verneuil JSON configuration used when originally copying the database to remote storage.

            A value of the form "@/path/to/json.file" refers to the contents of that file; otherwise, the argument
            itself is the configuration string.

            This parameter is optional, and defaults to the value of the `VERNEUIL_CONFIG` environment variable.
    -l, --log <log>
            Log level, in the same format as `RUST