S3git
s3git: git for Cloud Storage. Distributed Version Control for Data. Create decentralized and versioned repos that scale infinitely to 100s of millions of files. Clone huge PB-scale repos on your local SSD to make changes, commit and push back. Oh yeah, it dedupes too and offers directory versioning.
s3git: git for Cloud Storage (or Version Control for Data)
s3git applies the git philosophy to Cloud Storage. If you know git, you will know how to use s3git!
s3git is a simple CLI tool that allows you to create a distributed, decentralized and versioned repository. It scales limitlessly to 100s of millions of files and PBs of storage and stores your data safely in S3. Yet huge repos can be cloned on the SSD of your laptop for making local changes, committing and pushing back.
Exactly like git, s3git does not require any server-side components: just download and run the executable. It imports the golang package s3git-go, which can be used from other applications as well. There is also a Python module and a Ruby gem.
Use cases for s3git
- Build and Release Management (see example with all Kubernetes releases).
- DevOps Scenarios
- Data Consolidation
- Analytics
- Photo and Video storage
See use cases for a detailed description of each.
Download binaries
DISCLAIMER: These are PRE-RELEASE binaries -- use at your own peril for now
OSX
Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-darwin-amd64
$ mkdir s3git && cd s3git
$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-darwin-amd64
$ chmod +x s3git
$ export PATH=$PATH:${PWD} # Add the current dir (where s3git was downloaded) to PATH
$ s3git
Linux
Download s3git from https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-linux-amd64
$ mkdir s3git && cd s3git
$ wget -q -O s3git https://github.com/s3git/s3git/releases/download/v0.9.2/s3git-linux-amd64
$ chmod +x s3git
$ export PATH=$PATH:${PWD} # Add the current dir (where s3git was downloaded) to PATH
$ s3git
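The OSX and Linux steps above differ only in the name of the release asset. As a small convenience sketch, the function below picks the download URL from the output of `uname -s`. The function name `s3git_asset_url` is hypothetical and not part of s3git; the URLs are the v0.9.2 ones listed above.

```shell
# Hypothetical helper (not part of s3git): map an OS name to the
# v0.9.2 release URL listed in this README.
s3git_asset_url() {
  local os="${1:-$(uname -s)}" asset
  case "$os" in
    Darwin) asset=s3git-darwin-amd64 ;;
    Linux)  asset=s3git-linux-amd64 ;;
    *) echo "no prebuilt binary listed for $os" >&2; return 1 ;;
  esac
  printf 'https://github.com/s3git/s3git/releases/download/v0.9.2/%s\n' "$asset"
}

# Print the URL for each supported OS name.
s3git_asset_url Linux
s3git_asset_url Darwin
```

Called without an argument, the function uses the current machine's `uname -s`, so `wget -q -O s3git "$(s3git_asset_url)"` would fetch the matching binary.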
Windows
Download s3git.exe from https://github.com/s3git/s3git/releases/download/v0.9.1/s3git.exe
C:\Users\Username\Downloads> s3git.exe
Building from source
Build instructions are as follows (see install golang for setting up a working golang environment):
$ go get -d github.com/s3git/s3git
$ cd $GOPATH/src/github.com/s3git/s3git
$ go install
$ s3git
BLAKE2 Tree Hashing and Storage Format
Read here how s3git uses the BLAKE2 Tree hashing mode for both deduplicated and hydrated storage (and here for info for BLAKE2 at scale).
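To make the idea of hash-based, deduplicated storage concrete, here is a minimal sketch that stores each blob under its own hash, so identical content always maps to the same key and is kept only once. It is an illustration under stated assumptions: it uses `b2sum` from GNU coreutils, which computes plain BLAKE2b-512 rather than the BLAKE2 Tree mode, so its digests will not match the ones s3git prints. The `store_blob` helper and the flat directory layout are hypothetical and are not s3git's actual storage format.

```shell
# Illustrative content-addressed store; NOT s3git's real on-disk format.
# Assumes GNU coreutils b2sum (plain BLAKE2b-512, not BLAKE2 Tree mode).

# store_blob DIR: read stdin, store it as DIR/<hash>, print the hash.
store_blob() {
  local dir="$1" tmp hash
  mkdir -p "$dir"
  tmp=$(mktemp "$dir/tmp.XXXXXX")
  cat > "$tmp"
  hash=$(b2sum < "$tmp" | cut -d' ' -f1)
  mv -f "$tmp" "$dir/$hash"   # identical content lands on the same key
  printf '%s\n' "$hash"
}

store=$(mktemp -d)
first=$(echo "hello s3git" | store_blob "$store")
second=$(echo "hello s3git" | store_blob "$store")   # duplicate content
third=$(echo "other data" | store_blob "$store")

echo "first:  $first"
echo "second: $second"   # same hash as first
echo "objects stored: $(ls "$store" | wc -l)"   # 2, not 3
```

Because the key is derived from the content, adding the same data twice costs no extra storage, which is the property the deduplication claim rests on.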
Example workflow
Here is a simple workflow to create a new repository and populate it with some data:
$ mkdir s3git-repo && cd s3git-repo
$ s3git init
Initialized empty s3git repository in ...
$ # Just stream in some text
$ echo "hello s3git" | s3git add
Added: 18e622875a89cede0d7019b2c8afecf8928c21eac18ec51e38a8e6b829b82c3ef306dec34227929fa77b1c7c329b3d4e50ed9e72dc4dc885be0932d3f28d7053
$ # Add some more files
$ s3git add "*.mp4"
$ # Commit and log
$ s3git commit -m "My first commit"
$ s3git log --pretty
Push to cloud storage
$ # Add remote back end and push to it
$ s3git remote add "primary" -r s3://s3git-playground -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
$ s3git push
$ # Read back content
$ s3git cat 18e6
hello s3git
Note: Do not store any important info in the s3git-playground bucket. It will be auto-deleted within 24-hours.
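The `s3git cat 18e6` call above identifies an object by a short prefix of its hash. The sketch below shows one way such a lookup could work over a directory of objects named by hash. The `resolve_prefix` helper and the shortened object names are hypothetical, and how s3git itself treats an ambiguous prefix is not covered here; this sketch simply rejects it.

```shell
# Hypothetical helper: resolve a hash prefix to exactly one object name.
# resolve_prefix DIR PREFIX
resolve_prefix() {
  local dir="$1" prefix="$2" count=0 match="" f
  for f in "$dir/$prefix"*; do
    [ -e "$f" ] || continue      # unmatched glob stays literal; skip it
    count=$((count + 1))
    match=${f##*/}               # strip the directory part
  done
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$match"
  else
    echo "prefix '$prefix' matches $count objects" >&2
    return 1
  fi
}

# Demo with shortened, made-up object names.
objects=$(mktemp -d)
touch "$objects/18e622875a89" "$objects/18e7000000aa" "$objects/3ad6df690177"

resolve_prefix "$objects" 18e6                      # unique: prints 18e622875a89
resolve_prefix "$objects" 18e || echo "ambiguous prefix rejected"
```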
Directory versioning
You can also use s3git for directory versioning. This allows you to 'capture' changes coherently all the way down from a directory and subsequently go back to previous versions of the full state of the directory (not just of a single file). Think of it as a Time Machine for directories instead of individual files.
So instead of 'saving' a directory by making a full copy into 'MyFolder-v2' (and 'MyFolder-v3', etc.), you capture the state of a directory and give each version a meaningful message ("Changed color to red"), so it is always easy to go back to the version you are looking for.
In addition, you can discard any uncommitted changes and go back to the last version that you captured. This means that, after committing, you can mess around in a directory and rest assured that you can always return to its original state.
If you push your repository into the cloud, you get an automatic backup and can easily collaborate with other people.
Lastly, this also works with huge binary data, not just the text files used in the following 'demo' example:
$ mkdir dir-versioning && cd dir-versioning
$ s3git init .
$ # Just create a single file
$ echo "First line" > text.txt && ls -l
-rw-rw-r-- 1 ec2-user ec2-user 11 May 25 09:06 text.txt
$ #
$ # Create initial snapshot
$ s3git snapshot create -m "Initial snapshot" .
$ # Add new line to initial file and create another file
$ echo "Second line" >> text.txt && echo "Another file" > text2.txt && ls -l
-rw-rw-r-- 1 ec2-user ec2-user 23 May 25 09:08 text.txt
-rw-rw-r-- 1 ec2-user ec2-user 13 May 25 09:08 text2.txt
$ s3git snapshot status .
New: /home/ec2-user/dir-versioning/text2.txt
Modified: /home/ec2-user/dir-versioning/text.txt
$ #
$ # Create second snapshot
$ s3git snapshot create -m "Second snapshot" .
$ s3git log --pretty
3a4c3466264904fed3d52a1744fb1865b21beae1a79e374660aa231e889de41191009afb4795b61fdba9c156 Second snapshot
77a8e169853a7480c9a738c293478c9923532f56fcd02e3276142a1a29ac7f0006b5dff65d5ca245255f09fa Initial snapshot
$ more text.txt
First line
Second line
$ more text2.txt
Another file
$ #
$ # Go back one version in time
$ s3git snapshot checkout . HEAD^
$ more text.txt
First line
$ more text2.txt
text2.txt: No such file or directory
$ #
$ # Switch back to latest revision
$ s3git snapshot checkout .
$ more text2.txt
Another file
Note that snapshotting works for all files in the directory, including any subdirectories. For a more elaborate example, see the repository that includes all releases of the Kubernetes project.
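The `New:` and `Modified:` lines reported by `s3git snapshot status` can be understood as a comparison between two listings of path and content hash. The sketch below reproduces that idea outside s3git, purely as an illustration. The `manifest` and `status` helpers are hypothetical, `b2sum` (plain BLAKE2b from GNU coreutils) stands in for s3git's tree hashing, and deleted files are ignored for brevity.

```shell
# Hypothetical helpers illustrating snapshot comparison; not s3git code.

# manifest DIR: print "<hash> <relative path>" for every file under DIR.
manifest() {
  (cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
    printf '%s %s\n' "$(b2sum < "$f" | cut -d' ' -f1)" "${f#./}"
  done)
}

# status OLD NEW: compare two manifests and report new and modified paths.
status() {
  awk '
    { path = substr($0, index($0, " ") + 1) }   # everything after the hash
    FILENAME == ARGV[1] { old[path] = $1; next }
    !(path in old)      { print "New: " path; next }
    old[path] != $1     { print "Modified: " path }
  ' "$1" "$2"
}

# Same steps as the demo above, on a scratch directory.
work=$(mktemp -d); meta=$(mktemp -d)
echo "First line" > "$work/text.txt"
manifest "$work" > "$meta/snap1"
echo "Second line" >> "$work/text.txt"
echo "Another file" > "$work/text2.txt"
manifest "$work" > "$meta/snap2"

status "$meta/snap1" "$meta/snap2"   # text.txt is Modified, text2.txt is New
```

Because `manifest` walks the whole tree with `find`, files in subdirectories are covered the same way as files at the top level.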
Clone the YFCC100M dataset
Clone a large repo with 100 million files totaling 11.5 TB in size (Multimedia Commons), yet requiring only 7 GB local disk space.
(Note that this takes about 7 minutes on an SSD-equipped MacBook Pro with a 500 Mbit/s download connection, so on less powerful hardware you may want to skip to the next section. If you are unsure whether you have 7 GB of free local disk space, run df -h . first. Then again, it is quite a few files...)
$ s3git clone s3://s3git-100m -a "AKIAI26TSIF6JIMMDSPQ" -s "5NvshAhI0KMz5Gbqkp7WNqXYlnjBjkf9IaJD75x7"
Cloning into ...
Done. Totaling 97,974,749 objects.
$ cd s3git-100m
$ # List all files starting with '123456'
$ s3git ls 123456
12345649755b9f489df2470838a76c9df1d4ee85e864b15cf328441bd12fdfc23d5b95f8abffb9406f4cdf05306b082d3773f0f05090766272e2e8c8b8df5997
123456629a711c83c28dc63f0bc77ca597c695a19e498334a68e4236db18df84a2cdd964180ab2fcf04cbacd0f26eb345e09e6f9c6957a8fb069d558cadf287e
123456675eaecb4a2984f2849d3b8c53e55dd76102a2093cbca3e61668a3dd4e8f148a32c41235ab01e70003d4262ead484d9158803a1f8d74e6acad37a7a296
123456e6c21c054744742d482960353f586e16d33384f7c42373b908f7a7bd08b18768d429e01a0070fadc2c037ef83eef27453fc96d1625e704dd62931be2d1
$ s3git cat cafebad > olympic.jpg
$ # List and count total nr of files
$ s3git ls | wc -l
97974749
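As a rough sanity check on the 7 GB figure, the arithmetic below multiplies the object count reported by the clone by the size of one hash. The hashes listed above are 128 hex characters, i.e. 64 bytes each. That the local clone consists mainly of one raw 64-byte hash per object is an assumption made for this estimate, not something this README states.

```shell
# Back-of-the-envelope estimate; the storage model is an assumption.
objects=97974749   # object count reported by 's3git clone' above
hash_bytes=64      # 128 hex characters = 64 bytes per hash

total=$(( objects * hash_bytes ))
echo "$total bytes"                                    # 6270383936 bytes
echo "about $(( total / 1000000000 )) GB, rounded down"
```

Under that assumption the hashes alone account for roughly 6.3 GB, which is in line with the quoted 7 GB of local disk space.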
Fork that repo
Below is an example for alice and bob working together on a repository.
$ mkdir alice && cd alice
alice $ s3git clone s3://s3git-spoon-knife -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
Cloning into .../alice/s3git-spoon-knife
Done. Totaling 0 objects.
alice $ cd s3git-spoon-knife
alice $ # add a file filled with zeros
alice $ dd if=/dev/zero count=1 | s3git add
Added: 3ad6df690177a56092cb1ac7e9690dcabcac23cf10fee594030c7075ccd9c5e38adbaf58103cf573b156d114452b94aa79b980d9413331e22a8c95aa6fb60f4e
alice $ # add 9 more files (with random content)
alice $ for n in {1..9}; do dd if=/dev/urandom count=1 | s3git add; done
alice $ # commit
alice $ s3git commit -m "Commit from alice"
alice $ # and push
alice $ s3git push
Clone it again as bob on a different computer/different directory/different universe:
$ mkdir bob && cd bob
bob $ s3git clone s3://s3git-spoon-knife -a "AKIAJYNT4FCBFWDQPERQ" -s "OVcWH7ZREUGhZJJAqMq4GVaKDKGW6XyKl80qYvkW"
Cloning into .../bob/s3git-spoon-knife
Done. Totaling 10 objects.
bob $ cd s3git-spoon-knife
bob $ # Check if we can access alice's zero-filled file
bob $ s3git cat 3ad6 | hexdump
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00000200
bob $ # add another 10 files
bob $ for n in {1..10}; do dd if=/dev/urandom count=1 | s3git add; done
bob $ # commit
bob $ s3git commit -m "Commit from bob"
bob $ # and push back
bob $ s3git push
Switch back t
