Dupin
But it is in matters beyond the limits of mere rule that the skill of the analyst is evinced. He makes in silence a host of observations and inferences....
— Edgar Allan Poe, The Murders in the Rue Morgue
Dupin is a tool to help discover secrets in Git repositories.
It is designed to be used as a tool for regularly scanning an organisation's public Git repositories, notifying a nominated email address when it finds anything that looks suspicious.
Quickstart
Install Dupin from source with pip install <path-to-dupin>.
(virtualenv is recommended)
For these examples we'll use ~/.dupin as our root directory, but
you can use anything that makes sense for you.
ROOT=~/.dupin
# sets up a directory for Dupin to store its repositories and results
dupin setup --root $ROOT
# stores a list of your organisation's public repos
dupin update-repos --root $ROOT organisation-name
# if you get rate limit errors you'll need to provide a GitHub
# token with the --token argument
# scans all repositories in the list for secrets, then logs and shows the results
dupin auto-scan-all --root $ROOT
# this logs what it finds in the $ROOT/results directory and the
# details to the console
# it's also possible to email reports, more details below and in the
# config section
Installation
Dupin is an installable Python package, but it is not hosted in
public Python repositories. You can clone the source code and then
use pip to install Dupin. This will also install its dependencies.
As ever, it's better to install Dupin into a virtual environment. This prevents Dupin's dependencies from creating problems with other Python software on your machine.
git clone git@github.com:guardian/dupin.git
# via a virtualenv, or globally (may require sudo)
# note the ./ prefix: it makes pip install from the cloned directory
pip install ./dupin
You should then be able to run dupin.
AWS
This repository includes a CloudFormation template which creates an EC2 instance that runs Dupin on a schedule. If you have an AWS account this is the easiest way to run Dupin.
Usage
Dupin offers several commands. The main ones are described below; check the program's main file for full details.
Note: many of these commands interact with Dupin's directory structure. More information about the layout Dupin uses to store data is available below, in the Directory structure section.
Global arguments
These arguments apply to many/all of Dupin's commands.
--root
Sets the root directory for Dupin's directory structure.
--config
Sets the location of Dupin's config file. By default, the config is
read from ROOT/config if a root is provided. You may instead provide a
custom location, which should point to a YAML file that contains
Dupin's config.
setup
The setup command initialises Dupin's directory structure. Most of Dupin's features depend on the data stored there, so you will usually need to run this command first.
Examples:
dupin setup --root ~/.dupin
update-repos
This command looks up an organisation's public repositories on GitHub and writes their URLs to a file.
Examples:
# provide args via a config file at ~/.dupin/config
dupin update-repos --root ~/.dupin
# provide args explicitly
dupin update-repos myorg --token abcdef
# save the list of repositories in a provided location
dupin update-repos --file /tmp/organisation-repos.txt
# exclude some very large repositories from the scan
dupin update-repos myorg --repo-exclusions organisation/large-repo.git
--file
By default, update-repos writes the list to ROOT/repository-urls
(you'll need to provide a --root argument to take advantage of this).
Use --file to specify an alternative file.
--include-forks
By default, Dupin will not include repositories that are forks. These tend to contain only minor changes, and the source repository is often very large. Dupin's aim is to find secrets in an organisation's repositories. If you'd like to include forks, pass this flag.
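The fork filtering described above can be sketched roughly as follows. This is an illustration, not Dupin's code: it assumes the repository list comes from the GitHub REST API, whose repository objects carry a boolean fork field and a clone_url field. The function name select_repositories and the sample data are hypothetical.

```python
def select_repositories(repos, include_forks=False):
    """Return clone URLs, skipping forks unless include_forks is set.

    `repos` is a list of dicts shaped like GitHub REST API repository
    objects (only the `fork` and `clone_url` keys are used here).
    """
    return [
        repo["clone_url"]
        for repo in repos
        if include_forks or not repo["fork"]
    ]


# Hypothetical sample data for illustration only.
sample = [
    {"clone_url": "https://github.com/myorg/service.git", "fork": False},
    {"clone_url": "https://github.com/myorg/forked-lib.git", "fork": True},
]

originals_only = select_repositories(sample)
everything = select_repositories(sample, include_forks=True)
```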
--repo-exclusions
This setting specifies Git repositories that should be excluded from the resulting list.
Very large Git repositories cannot easily be scanned by TruffleHog, and therefore by Dupin. The resulting log file will likely be too large because of false positives, and the scan itself will likely consume too much memory. You should use this option (or the corresponding config property) to exclude large repositories and use a different approach to check for secrets in those repositories.
auto-scan-all
This command scans all the repositories it finds in ROOT/repository-urls
for secrets, and saves its findings. It will also generate a diff of
these findings compared to the previous version and display this diff
for the user. This makes it easy to spot when secrets have been
introduced (or removed).
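As a rough illustration of the comparison described above, the sketch below diffs two sets of findings with Python's difflib. It is not Dupin's implementation: the results directory is a Git repository, so the real diff most likely comes from Git. The function name findings_diff and the sample findings are made up for the example.

```python
import difflib


def findings_diff(previous, current):
    """Return unified-diff lines describing how the findings changed.

    Both arguments are lists of strings, one finding per entry.
    """
    return list(
        difflib.unified_diff(
            previous,
            current,
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical findings from two consecutive scans.
before = ["config.yml: possible API key"]
after = ["config.yml: possible API key", "deploy.sh: possible password"]

for line in findings_diff(before, after):
    print(line)
```

Lines prefixed with + are findings that appeared since the previous scan, and lines prefixed with - are findings that have gone away.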
If you provide the --notify flag, Dupin will read the provided
configuration and email the changes in its findings.
NOTE: Emailing secrets in plain text is a bad idea, so Dupin supports PGP encryption of its notification emails. To enable this feature, provide a PGP public key in the configuration (read the config section for more info).
Examples:
# scans and prints changes to the console
dupin --root ~/.dupin auto-scan-all
# instruct Dupin to send notification emails (requires config)
dupin --root ~/.dupin auto-scan-all --notify
--notify
This flag tells Dupin to send notification emails. Doing so will require additional configuration. Since this configuration is non-trivial, you should provide it in a config file, rather than as arguments to Dupin.
More information on configuring Dupin for sending email is available below, in the SMTP section under Configuration.
Directory structure
Dupin creates a directory structure for storing its results as follows.
root
├── config
├── repository-urls
├── repositories
│   ├── example.git
│   │   ├── ...etc contents of example repo
│   │   └── .git
│   └── example-2.git
│       ├── ...etc contents of example-2 repo
│       └── .git
└── results
    ├── .git
    ├── example-2
    └── example
config
You may provide a config file, which saves you from passing lots of
arguments to all of Dupin's commands. By default, Dupin looks in
ROOT/config for this file.
repository-urls
This file contains a list of repository URLs, one per line. This is what Dupin uses to determine what to scan.
You can edit the list yourself, or generate it using Dupin's
update-repos command.
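If you want to inspect or post-process the list yourself, a file in this format is easy to read. The sketch below is not part of Dupin; the function name read_repository_urls is hypothetical, and skipping blank lines is an assumption about how such a file is best handled rather than documented Dupin behaviour.

```python
from pathlib import Path


def read_repository_urls(path):
    """Return the repository URLs listed in `path`, one per line.

    Surrounding whitespace is stripped and blank lines are skipped
    (an assumption, not documented Dupin behaviour).
    """
    lines = Path(path).expanduser().read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, read_repository_urls("~/.dupin/repository-urls") would return the list that update-repos wrote for the root used in the Quickstart.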
repositories
This is where Dupin stores a local copy of the repositories it scans. If Dupin finds a new repository while scanning it will clone a copy to this location. If the repo already exists it will update it before scanning.
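The clone-or-update decision described above might look something like the sketch below. It only builds the Git command rather than running it, and the exact commands (git clone, git pull --ff-only) are assumptions for illustration; Dupin may update its copies differently. The function name sync_command is hypothetical. The local directory name is taken from the last segment of the URL, which matches the example.git layout shown in the tree above.

```python
from pathlib import Path


def sync_command(repo_url, repositories_dir):
    """Build the git command that brings a local copy up to date.

    Returns a clone command if no local copy exists yet, otherwise a
    command that updates the existing copy.
    """
    # e.g. ".../example.git" -> "example.git"
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    local = Path(repositories_dir) / name

    if (local / ".git").is_dir():
        # Already cloned: update in place (assumed strategy).
        return ["git", "-C", str(local), "pull", "--ff-only"]
    # Not seen before: clone a fresh copy.
    return ["git", "clone", repo_url, str(local)]
```

The returned list can be passed to subprocess.run if you want to execute it.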
results
The results directory is a Git repository that contains the history of Dupin's scans. Dupin also uses this history to work out what has changed since the previous scan, so that it can email the details of those changes when notifying.
Configuration
You can provide a config file to set some parameters for Dupin without needing to pass them every time. This also lets you keep secrets away from the git repository.
If you provide a --root argument to Dupin it will attempt to read the
config from a file in that root called config. Alternatively, you can
specify the config file location with the --config argument.
root
├── config              <- default location for config
├── repository-urls
├── repositories
│   └── ...etc
└── results
    └── ...etc
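The lookup rule for the config file can be summarised in a few lines. This is a sketch of the behaviour described above, not Dupin's actual code (see config.py for that). The function name resolve_config_path is hypothetical, and giving an explicit --config precedence over --root is an assumption.

```python
from pathlib import Path


def resolve_config_path(root=None, config=None):
    """Work out where the config file should be read from.

    An explicit config location wins (assumed precedence); otherwise
    fall back to ROOT/config; otherwise there is no config file.
    """
    if config is not None:
        return Path(config).expanduser()
    if root is not None:
        return Path(root).expanduser() / "config"
    return None
```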
Here's an example configuration file. The file should be written using YAML. Look at config.py for more info about how this works.
github_token: xxxxxxxx-github-token-xxxxxxxx
organisation_name: your-organisation
notification_email: recipient@example.com
include_forks: true
repo_exclusions:
  - organisation/large-repo.git
  - org2/another-excluded-repo.git
smtp:
  host: smtp-server.example.com
  from: sender@example.com
  username: username
  password: password
pgp_key: |
  -----BEGIN PGP PUBLIC KEY BLOCK-----
  Version: GnuPG v1
  abdefghihjklmnopqrstuvwxyz...etc
  ...etc
  ...etc
Most of these settings can be provided as arguments to Dupin instead of
as configuration, but it's generally simpler and safer to put them in
a config file. In particular, the auto-scan-all command reads its
arguments from the configuration for simplicity, and the SMTP settings
can only be provided from config.
Github token
This is used when Dupin fetches the list of organisation repositories. Dupin searches public repositories, so in theory this token isn't required. In practice, if your organisation has a large number of repositories you'll hit GitHub's rate limit while Dupin runs through the pagination. If this happens you'll need to provide authentication so that you are given a higher rate limit.
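To make the role of the token concrete, here is a sketch of what one paginated request for an organisation's public repositories could look like. It assumes the GitHub REST API endpoint /orgs/{org}/repos and is not taken from Dupin's source; the function name org_repos_request is hypothetical. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def org_repos_request(organisation, page, token=None):
    """Build a request for one page of an organisation's public repos.

    When a token is given it is sent in the Authorization header, which
    is what earns the higher authenticated rate limit.
    """
    query = urlencode({"type": "public", "per_page": 100, "page": page})
    url = f"https://api.github.com/orgs/{organisation}/repos?{query}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return Request(url, headers=headers)


# Placeholder organisation and token, as in the examples above.
request = org_repos_request("myorg", page=2, token="abcdef")
```

Each page needs its own request, which is why an organisation with many repositories can run into the unauthenticated limit.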
Organisation name
This tells Dupin which organisation to use when it creates its list of repositories that should be scanned.
Notification email
Dupin uses this as the "to" address when it emails updates about your organisation's secrets.
Include forks
As described above, Dupin will not include repositories that are forks
by default. If you'd like to include forks in the generated list of
repositories, you can specify this from Dupin's config by setting
include_forks to true.
Repository exclusions
This property allows you to exclude specified Git repositories from the list of repos that will be scanned. This is particularly useful if you have some very large repositories that Dupin is unable to scan.
The exclusions should be provided as a YAML list. Any repository that matches any of the provided strings will be excluded, so be specific (you should probably include .git at the end where possible).
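The sketch below shows why specific exclusion strings matter. It assumes that "matches" means plain substring containment, which is an interpretation of the description above rather than something confirmed from Dupin's source. The function name apply_exclusions and the URLs are hypothetical.

```python
def apply_exclusions(repo_urls, exclusions):
    """Drop every URL that contains any of the exclusion strings."""
    return [
        url
        for url in repo_urls
        if not any(excluded in url for excluded in exclusions)
    ]


# Hypothetical repository list for illustration only.
urls = [
    "https://github.com/organisation/large-repo.git",
    "https://github.com/organisation/large-repo-tools.git",
    "https://github.com/organisation/small-repo.git",
]

# Without the .git suffix, the exclusion also catches large-repo-tools.
loose = apply_exclusions(urls, ["organisation/large-repo"])

# With the suffix, only the intended repository is excluded.
specific = apply_exclusions(urls, ["organisation/large-repo.git"])
```

Under this assumption, the loose exclusion leaves only small-repo in the list, while the specific one keeps both large-repo-tools and small-repo.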
SMTP
If no SMTP host is provided, Dupin will attempt to send an email usin
