Pdscan
Scan your data stores for unencrypted personal data (PII)
- Last names (US)
- Email addresses
- IP addresses (IPv4)
- Street addresses (US)
- Phone numbers
- Credit card numbers
- Social Security numbers (US)
- Dates of birth
- Location data
- OAuth tokens
- MAC addresses
Detects PII using data sampling and field names, and works with compressed files
💥 Zero runtime dependencies and minimal database load
Installation
Download the latest prebuilt binary for your platform.
You can also install it with Homebrew or Docker.
Data Stores
Elasticsearch
pdscan elasticsearch+http://user:pass@host:9200
For HTTPS, use elasticsearch+https://.
You can also specify indices.
pdscan elasticsearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "elasticsearch+http://user:pass@host:9200/index*"
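Quoting the wildcard URI keeps your shell from expanding the * against local file names. To see which indices a comma-separated list or a wildcard would select, here is a small Python sketch. It approximates Elasticsearch-style * matching with fnmatch (which also treats ? and [...] specially), so it illustrates the idea rather than reproducing pdscan's matching logic. select_indices and the index names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def select_indices(spec: str, available: List[str]) -> List[str]:
    """Hypothetical helper: return the indices matched by a spec like "index1,index2" or "index*"."""
    patterns = [p for p in spec.split(",") if p]
    # An index is selected if it matches any of the comma-separated patterns.
    return [name for name in available if any(fnmatchcase(name, p) for p in patterns)]


indices = ["index1", "index2", "logs-2024", "users"]  # hypothetical index names
print(select_indices("index1,index2", indices))  # ['index1', 'index2']
print(select_indices("index*", indices))         # ['index1', 'index2']
```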
Files
pdscan file://path/to/file.txt
You can also specify a directory.
pdscan file://path/to/directory
For absolute paths, use file:///.
pdscan file:///absolute/path/to/file.txt
For paths relative to your home directory on Mac and Linux, use:
pdscan file://$HOME/file.txt
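The rule above comes down to what follows file://: a third slash starts an absolute path, anything else is relative. The $HOME form works because the shell expands it before pdscan runs, so file://$HOME/file.txt becomes something like file:///home/you/file.txt. A Python sketch of that rule; file_uri_to_path is a hypothetical helper, not pdscan's own parsing code.

```python
import os.path


def file_uri_to_path(uri: str) -> str:
    """Hypothetical helper: strip the file:// scheme and return the rest as a path."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError("not a file URI: " + uri)
    # file://path/to/file.txt  -> "path/to/file.txt" (relative)
    # file:///absolute/...     -> "/absolute/..." (the third slash begins an absolute path)
    return uri[len(prefix):]


for uri in ["file://path/to/file.txt", "file:///absolute/path/to/file.txt"]:
    path = file_uri_to_path(uri)
    print(path, "absolute" if os.path.isabs(path) else "relative")
```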
MariaDB
pdscan mariadb://user:pass@host:3306/dbname
MongoDB
pdscan mongodb://user:pass@host:27017/dbname
MySQL
pdscan mysql://user:pass@host:3306/dbname
OpenSearch
pdscan opensearch+http://user:pass@host:9200
For HTTPS, use opensearch+https://.
You can also specify indices.
pdscan opensearch+http://user:pass@host:9200/index1,index2
Wildcards are also supported.
pdscan "opensearch+http://user:pass@host:9200/index*"
Postgres
pdscan postgres://user:pass@host:5432/dbname
Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.
If your connection doesn’t use SSL, append to the URI:
?sslmode=disable
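If your user name or password contains characters such as @, :, or /, they generally need to be percent-encoded in the URI. A minimal Python sketch that builds a Postgres URI this way and defaults to sslmode=verify-full; postgres_uri is a hypothetical helper, and it assumes pdscan decodes percent-encoded credentials the way a standard URI parser does.

```python
from urllib.parse import quote, urlencode


def postgres_uri(user: str, password: str, host: str, port: int, dbname: str,
                 sslmode: str = "verify-full") -> str:
    """Hypothetical helper: build a Postgres URI with percent-encoded credentials."""
    # safe="" forces reserved characters like @ : / to be encoded.
    userinfo = quote(user, safe="") + ":" + quote(password, safe="")
    query = urlencode({"sslmode": sslmode})
    return f"postgres://{userinfo}@{host}:{port}/{quote(dbname, safe='')}?{query}"


print(postgres_uri("user", "p@ss:w/rd", "host", 5432, "dbname"))
# postgres://user:p%40ss%3Aw%2Frd@host:5432/dbname?sslmode=verify-full
```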
For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).
CREATE EXTENSION tsm_system_rows;
Redis
pdscan redis://user:pass@host:6379/db
S3
pdscan s3://bucket/path/to/file.txt
Requires s3:GetObject permission
You can also specify a prefix by ending with a /.
pdscan s3://bucket/path/to/directory/
Requires s3:ListBucket and s3:GetObject permissions
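The trailing slash decides whether the URI names one object or a prefix, and that in turn decides which permissions are needed. A Python sketch of that rule using the permissions listed above; parse_s3_uri and required_permissions are hypothetical helpers, not part of pdscan.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str, bool]:
    """Hypothetical helper: split an s3:// URI into (bucket, key or prefix, is_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: " + uri)
    key = parsed.path.lstrip("/")
    # A trailing slash means every object under this prefix; otherwise it names one object.
    return parsed.netloc, key, key.endswith("/")


def required_permissions(uri: str) -> List[str]:
    """Permissions this README lists for each kind of S3 URI."""
    _, _, is_prefix = parse_s3_uri(uri)
    return ["s3:ListBucket", "s3:GetObject"] if is_prefix else ["s3:GetObject"]


print(required_permissions("s3://bucket/path/to/file.txt"))    # ['s3:GetObject']
print(required_permissions("s3://bucket/path/to/directory/"))  # ['s3:ListBucket', 's3:GetObject']
```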
SQLite
pdscan sqlite://path/to/dbname.sqlite3
Not available with prebuilt binaries
SQL Server
pdscan "sqlserver://user:pass@host:1433?database=dbname"
Options
Show the data found
pdscan --show-data
Show low confidence matches
pdscan --show-all
Change the sample size
pdscan --sample-size 50000
Specify the number of processes to use (defaults to 1)
pdscan --processes 4
Scan for only certain types of data
pdscan --only email,phone,location
Scan for all except certain types of data
pdscan --except ip,mac
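The two options behave like set filters: --only keeps just the listed types, and --except drops the listed ones. A small Python sketch of that behavior; KNOWN_TYPES is a hypothetical subset holding only the type names used in the examples above, and whether pdscan accepts both options in a single run is an assumption this sketch does not settle.

```python
from typing import List, Optional

# Hypothetical subset: only the type names that appear in this README's examples.
KNOWN_TYPES = ["email", "ip", "location", "mac", "phone"]


def selected_types(only: Optional[str] = None, except_: Optional[str] = None) -> List[str]:
    """Hypothetical sketch: --only narrows the type list, --except removes from it."""
    types = list(KNOWN_TYPES)
    if only:
        wanted = set(only.split(","))
        types = [t for t in types if t in wanted]
    if except_:
        unwanted = set(except_.split(","))
        types = [t for t in types if t not in unwanted]
    return types


print(selected_types(only="email,phone,location"))  # ['email', 'location', 'phone']
print(selected_types(except_="ip,mac"))             # ['email', 'location', 'phone']
```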
Specify the minimum number of rows/documents/lines for a match (experimental)
pdscan --min-count 10
Specify a custom pattern (experimental)
pdscan --pattern "\d{16}"
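The pattern is quoted so the shell passes the backslash through unchanged. To check what a pattern like \d{16} catches before running a scan, you can try it against a few sample values. This sketch uses Python's re, whose behavior matches common regex engines for a pattern this simple but may differ from pdscan's on more advanced syntax; it also assumes a value counts as a match when the pattern appears anywhere in it. matches and the sample values are hypothetical.

```python
import re

# The pattern from the example above: 16 consecutive digits.
PATTERN = re.compile(r"\d{16}")


def matches(value: str) -> bool:
    """Hypothetical helper: True if the pattern appears anywhere in the value (unanchored)."""
    return PATTERN.search(value) is not None


samples = ["4111111111111111", "4111 1111 1111 1111", "order 12345"]
print([matches(s) for s in samples])  # [True, False, False]
```

Because the pattern is unanchored, it also matches inside longer digit runs, and it misses numbers written with spaces or dashes.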
Output newline delimited JSON (experimental)
pdscan --format ndjson
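Newline-delimited JSON puts one JSON object on each line, so results can be processed line by line, for example by piping pdscan <connection-uri> --format ndjson into a script. A minimal Python reader; read_ndjson is hypothetical, and it deliberately assumes nothing about the field names pdscan emits, since the format is marked experimental.

```python
import json
from typing import Any, Dict, Iterable, List


def read_ndjson(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]
```

In a script, pass it sys.stdin or an open file.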
Additional Installation Methods
Homebrew
With Homebrew, you can use:
brew install ankane/brew/pdscan
Docker
Get the Docker image with:
docker pull ankane/pdscan
And run it with:
docker run -ti ankane/pdscan <connection-uri>
For data stores on the host machine, use host.docker.internal as the hostname
docker run -ti ankane/pdscan "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
On Linux, this also requires adding --add-host=host.docker.internal:host-gateway to the docker run command
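If you already have a URI that points at localhost, you can rewrite just the hostname before passing it to the container. A Python sketch; for_docker is a hypothetical helper that only handles localhost and 127.0.0.1 and leaves every other URI unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def for_docker(uri: str) -> str:
    """Hypothetical helper: point a localhost URI at host.docker.internal instead."""
    parts = urlsplit(uri)
    if parts.hostname not in LOCAL_HOSTS:
        return uri
    # Keep any user:pass@ prefix and :port suffix; swap only the hostname.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    _, colon, port = hostport.partition(":")
    netloc = f"{userinfo}{at}host.docker.internal{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(for_docker("postgres://user@localhost:5432/dbname?sslmode=disable"))
# postgres://user@host.docker.internal:5432/dbname?sslmode=disable
```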
For files on the host machine, use:
docker run -ti -v /path/to/files:/data ankane/pdscan file:///data
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/pdscan.git
cd pdscan
make test