Iagitup
Archive GitHub, GitLab, Bitbucket & any git repo to the Internet Archive as portable bundles with rich metadata.
Archive git repositories to the Internet Archive.
iagitup clones a git repository from GitHub, GitLab, Bitbucket, Codeberg, or any HTTPS git URL, creates a portable git bundle, and uploads it to the Internet Archive with rich metadata. GitHub repos get full API metadata; all other platforms extract metadata from the local git history. If the repository has a wiki (GitHub only), that is bundled and uploaded too. A companion command, archive-watchlist, continuously archives the most-starred repositories on GitHub -- either all-time, by recency (--days), or within a custom date range (--since/--until).
Features
- Multi-platform support -- GitHub, GitLab, Bitbucket, Codeberg, self-hosted Gitea/Forgejo, and any HTTPS git URL.
- Full-fidelity snapshots -- every branch, tag, and ref is preserved in a single git bundle.
- Git LFS support -- LFS-enabled repos are detected automatically; large file objects are fetched and archived alongside the bundle.
- Wiki archiving -- wiki repositories are detected and bundled automatically (GitHub only).
- Rich IA metadata -- description, README, topics, language, stars, and more are attached to each item. GitHub repos get full API metadata; other platforms extract metadata from git history.
- Duplicate prevention -- two layers (local state cache + IA item check) ensure the same snapshot is never uploaded twice.
- Bulk archiving -- archive-watchlist fetches and archives the top-N most-starred GitHub repos on a schedule, with --days, --since, and --until filters for trending repos.
- Parallel workers -- configurable concurrency for bulk runs.
- Custom metadata -- pass extra key:value pairs to enrich any upload.
Installation
From PyPI
pip install iagitup
This installs two commands: iagitup and archive-watchlist.
From source
git clone https://github.com/gdamdam/iagitup.git
cd iagitup
pip install .
Prerequisites
- Python 3.10, 3.11, 3.12, or 3.13
- git on $PATH
- git-lfs on $PATH (optional; needed to archive LFS objects. Repos are still archived without it, but LFS pointers won't resolve)
- An Internet Archive account
- HTTPS URLs are required (SSH user@host:path syntax is not supported)
- For non-GitHub platforms, standard git authentication applies (credential helpers, ~/.netrc, etc.)
Quick Start
Archive a single repository
# GitHub
iagitup https://github.com/torvalds/linux
# GitLab
iagitup https://gitlab.com/inkscape/inkscape
# Bitbucket
iagitup https://bitbucket.org/berkeleylab/upcxx
# Codeberg
iagitup https://codeberg.org/forgejo/forgejo
# Any HTTPS git URL
iagitup https://git.example.com/org/project
:: Downloading https://github.com/torvalds/linux ...
:: Cloning https://github.com/torvalds/linux.git ...
:: Uploading bundle: torvalds-linux_-_2026-02-28_10-00-00.bundle
:: Upload FINISHED.
Identifier: github.com-torvalds-linux_-_2026-02-28_10-00-00
Archived repository: https://archive.org/details/github.com-torvalds-linux_-_2026-02-28_10-00-00
Git bundle: https://archive.org/download/github.com-torvalds-linux_-_2026-02-28_10-00-00/torvalds-linux_-_2026-02-28_10-00-00.bundle
Bulk-archive top starred repos
# Preview the top 10 without uploading
archive-watchlist --dry-run --top-n 10
# Full run with 8 parallel workers
archive-watchlist --workers 8
# Archive the most-starred repos created in the last 7 days
archive-watchlist --days 7
# Archive repos created in a specific date range
archive-watchlist --since 2025-01-01 --until 2025-06-30
# Archive repos created since a specific date
archive-watchlist --since 2025-06-01
# Preview trending repos from the last month
archive-watchlist --days 30 --dry-run --top-n 20
Usage
iagitup
iagitup [options] <repo_url>
| Flag | Short | Default | Description |
|---|---|---|---|
| repo_url | -- | (required) | Git repository URL to archive (GitHub, GitLab, Bitbucket, or any HTTPS git URL) |
| --metadata | -m | -- | Custom metadata fields (see Custom Metadata) |
| --version | -v | -- | Print version and exit |
archive-watchlist
archive-watchlist [options]
| Flag | Default | Description |
|---|---|---|
| --top-n N | 100 | Number of top repositories to fetch and check (max 100) |
| --days N | (all-time) | Only consider repos created within the last N days |
| --since DATE | -- | Only consider repos created on or after DATE (YYYY-MM-DD) |
| --until DATE | -- | Only consider repos created on or before DATE (YYYY-MM-DD) |
| --workers N | 4 | Number of parallel archive workers |
| --dry-run | off | Preview what would be archived -- no uploads, no state changes |
| --state-file PATH | ./watchlist_state.json | Path to the persistent state cache |
| --version / -v | -- | Print version and exit |
Examples:
# Use a custom state file
archive-watchlist --state-file /var/lib/iagitup/state.json
# Archive trending repos from the past week
archive-watchlist --days 7
# Archive repos from a date range with a custom state file
archive-watchlist --since 2025-01-01 --until 2025-12-31 --state-file /var/lib/iagitup/state.json
# Combine with top-n for a quick daily trending sweep
archive-watchlist --days 1 --top-n 20 --dry-run
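The date filters above map naturally onto the `created:` qualifier of the GitHub Search API. The following is a minimal sketch of how such a query could be assembled, not iagitup's actual code: the function name `build_search_url`, the `stars:>0` base qualifier, and the `today` parameter (added only to make the example deterministic) are assumptions for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_search_url(top_n=100, days=None, since=None, until=None, today=None):
    """Return a GitHub Search API URL for the most-starred repositories.

    Hypothetical helper: iagitup's real query construction may differ.
    """
    today = today or date.today()
    if days is not None:
        # Assumption: --days N means "created on or after today minus N days".
        since = (today - timedelta(days=days)).isoformat()

    if since and until:
        qualifier = f"created:{since}..{until}"
    elif since:
        qualifier = f"created:>={since}"
    elif until:
        qualifier = f"created:<={until}"
    else:
        qualifier = None  # all-time ranking

    # The Search API needs a non-empty query, so start from a star qualifier.
    query = "stars:>0" if qualifier is None else f"stars:>0 {qualifier}"
    params = {
        "q": query,
        "sort": "stars",
        "order": "desc",
        "per_page": min(top_n, 100),  # the API returns at most 100 per page
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)
```

Calling `build_search_url(top_n=20, days=30)` would correspond roughly to `archive-watchlist --days 30 --top-n 20`.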
Configuration
GitHub Authentication (GitHub repos only)
Unauthenticated GitHub API calls are rate-limited to 60 requests/hour. Set GITHUB_TOKEN to raise this to 5,000/hour:
export GITHUB_TOKEN=ghp_your_token_here
iagitup https://github.com/user/repo
Generate a token at https://github.com/settings/tokens -- no specific scopes are required for public repositories.
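To illustrate what the token changes, here is a small sketch of an authenticated request to the GitHub REST API built with the standard library. It is an assumption about how a client might attach `GITHUB_TOKEN`, not a description of iagitup's internals; the helper name `github_repo_request` and the `User-Agent` value are hypothetical.

```python
import os
import urllib.request


def github_repo_request(owner, repo, token=None):
    """Build a request for the repository endpoint of the GitHub REST API.

    Hypothetical helper. If no token is passed, GITHUB_TOKEN is read from
    the environment; without either, the request is sent unauthenticated.
    """
    if token is None:
        token = os.environ.get("GITHUB_TOKEN")
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "iagitup-example",  # illustrative value only
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"https://api.github.com/repos/{owner}/{repo}"
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` returns a JSON document that includes fields such as `pushed_at` and `description`.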
Internet Archive Credentials
On first run, if no credentials are found, iagitup will prompt you to run ia configure interactively. Credentials are stored in ~/.ia or ~/.config/ia.ini and reused on subsequent runs.
You can also configure them manually:
ia configure
Or create ~/.ia directly:
[s3]
access = YOUR_ACCESS_KEY
secret = YOUR_SECRET_KEY
Find your keys at https://archive.org/account/s3.php.
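As a quick sanity check that the file is well formed, the keys can be read back with Python's `configparser`. This is only a sketch: the helper names `read_ia_keys` and `find_ia_keys` are hypothetical, and the lookup order of the two paths mentioned above is an assumption.

```python
import configparser
from pathlib import Path


def read_ia_keys(path):
    """Return (access, secret) from an ia config file, or None if unavailable."""
    # Interpolation is disabled so that '%' characters in a secret are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path) or not parser.has_section("s3"):
        return None
    access = parser.get("s3", "access", fallback=None)
    secret = parser.get("s3", "secret", fallback=None)
    return (access, secret) if access and secret else None


def find_ia_keys():
    """Try the two documented locations in turn (order is an assumption)."""
    for candidate in (Path.home() / ".ia", Path.home() / ".config" / "ia.ini"):
        keys = read_ia_keys(candidate)
        if keys:
            return keys
    return None
```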
Custom Metadata
Pass additional Internet Archive metadata fields as comma-separated key:value pairs:
iagitup --metadata="subject:python;cli,creator:myorg" https://github.com/user/repo
Custom fields are merged into the default metadata. Any key that matches a default field will override it.
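The parse-and-override behaviour can be sketched in a few lines. This is an illustration under stated assumptions rather than iagitup's implementation: `parse_metadata` and `merge_metadata` are hypothetical names, and splitting each pair on its first colon is an assumption that lets values contain further colons.

```python
def parse_metadata(arg):
    """Parse 'key:value,key:value' into a dict (hypothetical helper)."""
    fields = {}
    for pair in arg.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first colon only, so values such as URLs survive.
        key, sep, value = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {pair!r}")
        fields[key.strip()] = value.strip()
    return fields


def merge_metadata(defaults, custom):
    """Return a new dict in which custom fields override matching defaults."""
    merged = dict(defaults)
    merged.update(custom)
    return merged
```

With the example above, `parse_metadata("subject:python;cli,creator:myorg")` yields a `subject` of `python;cli` and a `creator` of `myorg`, and merging replaces any default `subject` and `creator`. A value that itself contains a comma would be split incorrectly by this sketch.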
Default metadata fields
| Field | Value |
|---|---|
| mediatype | software |
| collection | open_source_software |
| creator | GitHub owner login |
| title | IA item identifier |
| date | Last push date (YYYY-MM-DD) |
| year | Last push year |
| subject | {Platform};code;software;git (e.g. GitHub, GitLab, Bitbucket) |
| originalurl | Repository URL |
| pushed_date | Full push timestamp (YYYY-MM-DD HH:MM:SS) |
| uploaded_with | iagitup-vX.X.X |
| description | HTML: repo description + README + restore instructions |
Extra fields added by archive-watchlist
| Field | Value |
|---|---|
| stars_count | Stargazer count at time of archive |
| forks_count | Fork count |
| watchers_count | Watcher count |
| language | Primary programming language |
| topics | Semicolon-joined topic list |
| github_rank | Position in the top-N list |
| subject | Extended: base tags + language + topics |
How It Works
Single repository (iagitup)
- Fetches metadata -- for GitHub repos, from the GitHub API (pushed_at, description, owner, etc.); for all other platforms, from the local git history after cloning.
- Checks for duplicates -- the IA item identifier is derived from the platform hostname, repo name, and pushed_at timestamp ({platform}-{owner}-{repo}_-_{YYYY-MM-DD_HH-MM-SS}). If an item with that identifier already exists, iagitup exits early.
- Clones the repository in full (all branches and tags).
- Downloads the owner's avatar as a cover image (cover.jpg), concurrently with wiki cloning (GitHub only; skipped for other platforms).
- Creates git bundles (git bundle create --all) for the repository and, if present, the wiki.
- Builds an HTML description from the repo description, README (.md or .txt), and restore instructions.
- Uploads the bundle(s) and cover image to the Internet Archive.
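The duplicate check in the steps above hinges on a deterministic identifier. A minimal sketch of that derivation follows, using the documented pattern; the function names are hypothetical, and `item_exists` stands in for whatever lookup against the Internet Archive is actually performed.

```python
from datetime import datetime, timezone


def parse_pushed_at(value):
    """Parse a GitHub-style timestamp such as '2026-02-28T10:00:00Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def build_identifier(platform, owner, repo, pushed_at):
    """Format {platform}-{owner}-{repo}_-_{YYYY-MM-DD_HH-MM-SS}."""
    stamp = pushed_at.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{platform}-{owner}-{repo}_-_{stamp}"


def should_archive(identifier, item_exists):
    """Return False when an item with this identifier is already archived.

    item_exists is a caller-supplied callable (an assumption of this sketch)
    that reports whether the identifier is already present.
    """
    return not item_exists(identifier)
```

Because the timestamp is part of the identifier, a repository that receives a new push produces a new identifier and is archived as a fresh snapshot, while an unchanged repository maps to the existing item and is skipped.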
Each archived repository becomes a single IA item containing:
| File | Description |
|---|---|
| <bundle_name>.bundle | Full git bundle (all branches + tags) |
| cover.jpg | Repository owner's avatar |
| <bundle_name>_wiki.bundle | Wiki git bundle (if wiki exists) |
| <bundle_name>_lfs.tar.gz | Git LFS objects (if repo uses LFS) |
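For readers who want to reproduce the main bundle by hand, the git invocations involved can be sketched as below. The use of `git clone --mirror` is an assumption (the steps above only say the repository is cloned in full), and `bundle_commands` and `run_all` are hypothetical helpers; only `git bundle create <file> --all` is taken from the description above.

```python
import subprocess
from pathlib import Path


def bundle_commands(repo_url, workdir, bundle_name):
    """Return the git command lines for cloning a repo and bundling all refs."""
    # Absolute paths are used because "git -C" changes the working directory.
    workdir = Path(workdir).resolve()
    clone_dir = workdir / "repo.git"
    bundle_path = workdir / f"{bundle_name}.bundle"
    return [
        # Assumption: a mirror clone is one way to fetch every branch and tag.
        ["git", "clone", "--mirror", repo_url, str(clone_dir)],
        ["git", "-C", str(clone_dir), "bundle", "create", str(bundle_path), "--all"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

A bundle produced this way can be restored with an ordinary `git clone <bundle_name>.bundle`, since git treats a bundle file as a clonable remote.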
Bulk archiving (archive-watchlist)
- Fetches the top-N repos from the GitHub Search API (sorted by stars, optionally filtered by --days, --since, or --until).
- Compares each repo's pushed_at against a local
