
MetaDetective

Unleash Metadata Intelligence with MetaDetective. Your Assistant Beyond Metagoofil.


<div id="top" align="center"> <!-- Sponsorship --> <p align="center"> <a href="https://iproyal.com/?r=848799" target="_blank"> <img src="https://raw.githubusercontent.com/franckferman/MetaDetective/stable/docs/github/graphical_resources/IPRoyal-Logo_Transparent_500x500.png" alt="Sponsored by IPRoyal" width="180"> </a> </p> <p align="center"><b>Supported by <a href="https://iproyal.com/?r=848799">IPRoyal</a></b> &mdash; Proxy services for OSINT and security research.</p> <br>


<a href="https://github.com/franckferman/MetaDetective"> <img src="https://raw.githubusercontent.com/franckferman/MetaDetective/stable/docs/github/graphical_resources/Logo-Without_background-MetaDetective.png" alt="MetaDetective" width="340"> </a> <h3 align="center">MetaDetective</h3> <p align="center">Metadata extraction and web scraping for OSINT and pentesting.</p> </div>


About

MetaDetective is a single-file Python 3 tool for metadata extraction and web scraping, built for OSINT and pentesting workflows.

It relies only on the Python standard library; the sole external requirement is exiftool. One curl and you're operational.

What it extracts: authors, software versions, GPS coordinates, creation/modification dates, internal hostnames, serial numbers, hyperlinks, camera models - across documents, images, and email files.

What it does beyond extraction:

  • Direct web scraping of target sites (no search engine dependency, no IP blocks)
  • GPS reverse geocoding with OpenStreetMap, map link generation
  • Export to HTML, TXT, or JSON
  • Selective field extraction with --parse-only
  • Deduplication across multiple files
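
As a rough illustration of the GPS features listed above, the sketch below converts a coordinate string in exiftool's default degrees/minutes/seconds form to decimal degrees and builds an OpenStreetMap marker link. The function names, the regex, the latitude-first ordering, and the URL layout are assumptions made for this example; MetaDetective's own parsing and link format may differ.

```python
import re

# Matches one coordinate such as: 48 deg 51' 29.99" N
_DMS = re.compile(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")


def dms_to_decimal(position: str) -> tuple[float, float]:
    """Convert a 'lat, lon' string in deg/min/sec form to decimal degrees."""
    parts = _DMS.findall(position)
    if len(parts) != 2:
        raise ValueError(f"unrecognized GPS position: {position!r}")
    values = []
    for deg, minutes, seconds, ref in parts:
        value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
        # South and West are negative in decimal notation.
        values.append(-value if ref in "SW" else value)
    # Assumption: latitude comes first, longitude second.
    return values[0], values[1]


def osm_link(lat: float, lon: float, zoom: int = 16) -> str:
    """Build an OpenStreetMap URL with a marker at (lat, lon)."""
    return (
        f"https://www.openstreetmap.org/?mlat={lat:.6f}&mlon={lon:.6f}"
        f"#map={zoom}/{lat:.6f}/{lon:.6f}"
    )
```

Reverse geocoding (turning the coordinates into an address) needs a network call to a geocoding service and is left out of this sketch.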

It was built as a replacement for Metagoofil, which dropped native metadata analysis and relied on Google search (rate limiting, CAPTCHAs, proxy overhead).

<p align="center"> <img src="https://raw.githubusercontent.com/franckferman/MetaDetective/stable/docs/github/graphical_resources/Screenshot-MetaDetective_Demo.png" alt="MetaDetective demo" width="700"> </p> <p align="center"> <img src="https://raw.githubusercontent.com/franckferman/MetaDetective/stable/docs/github/graphical_resources/Screenshot-MetaDetective_Scraping_Demo.png" alt="MetaDetective scraping demo" width="700"> </p>

Installation

Requirements: Python 3, exiftool.

# Debian / Ubuntu / Kali
sudo apt install libimage-exiftool-perl

# macOS
brew install exiftool

# Windows
winget install OliverBetz.ExifTool

Direct download (recommended for field use)

curl -O https://raw.githubusercontent.com/franckferman/MetaDetective/stable/src/MetaDetective/MetaDetective.py
python3 MetaDetective.py -h

pip

pip install MetaDetective
metadetective -h

git clone

git clone https://github.com/franckferman/MetaDetective.git
cd MetaDetective
python3 src/MetaDetective/MetaDetective.py -h

Docker

docker pull franckferman/metadetective
docker run --rm franckferman/metadetective -h

# Mount a local directory
docker run --rm -v "$(pwd)/loot:/data" franckferman/metadetective -d /data

Usage

File analysis

# Analyze a directory (deduplicated singular view by default)
python3 MetaDetective.py -d ./loot/

# Specific file types, filter noise
python3 MetaDetective.py -d ./loot/ -t pdf docx -i admin anonymous

# Per-file display
python3 MetaDetective.py -d ./loot/ --display all

# Formatted output (singular/default display)
python3 MetaDetective.py -d ./loot/ --format formatted

# Single file
python3 MetaDetective.py -f report.pdf

# Multiple files
python3 MetaDetective.py -f report.pdf photo.heic

Summary and timeline

# Quick stats: identities, emails, GPS exposure, tools, date range
python3 MetaDetective.py -d ./loot/ --summary

# Chronological view of document creation/modification
python3 MetaDetective.py -d ./loot/ --timeline

# Both together
python3 MetaDetective.py -d ./loot/ --summary --timeline

# Scripting: no banner, summary only
python3 MetaDetective.py -d ./loot/ --summary --no-banner

Selective parsing

--parse-only limits extraction to specific fields. Useful to cut noise or target a specific data point.

# Extract only Author and Creator fields
python3 MetaDetective.py -d ./loot/ --parse-only Author Creator

# Extract GPS data only from iPhone photos
python3 MetaDetective.py -d ./photos/ -t heic heif --parse-only 'GPS Position' 'Map Link'

Export

# HTML report (default)
python3 MetaDetective.py -d ./loot/ -e

# TXT
python3 MetaDetective.py -d ./loot/ -e txt

# JSON - singular (deduplicated values per field)
python3 MetaDetective.py -d ./loot/ -e json

# JSON - per file
python3 MetaDetective.py -d ./loot/ --display all -e json

# Custom filename suffix and output directory
python3 MetaDetective.py -d ./loot/ -e json -c pentest-corp -o ~/results/

JSON singular output structure:

{
  "tool": "MetaDetective",
  "generated": "2026-03-21T...",
  "unique": {
    "Author": ["Alice Martin", "Bob Dupont"],
    "Creator Tool": ["Microsoft Word 16.0"]
  }
}

Pivot with jq:

jq '.unique.Author' MetaDetective_Export-*.json
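
The same pivot can be done in Python. This is a minimal sketch, assuming the export follows the singular structure shown above (a top-level "unique" object mapping field names to lists of values); the function name unique_values is hypothetical.

```python
import json


def unique_values(export_text: str, field: str) -> list[str]:
    """Return the sorted values of one field from a singular JSON export."""
    data = json.loads(export_text)
    # "unique" maps each metadata field to its list of distinct values.
    return sorted(data.get("unique", {}).get(field, []))


if __name__ == "__main__":
    sample = """
    {
      "tool": "MetaDetective",
      "unique": {
        "Author": ["Bob Dupont", "Alice Martin"],
        "Creator Tool": ["Microsoft Word 16.0"]
      }
    }
    """
    print(unique_values(sample, "Author"))
```

A field that is absent from the export yields an empty list rather than an error, which keeps the helper convenient when looping over several exports.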

Web scraping

MetaDetective can crawl a target website, discover downloadable files (PDF, DOCX, XLSX, images, etc.), and download them for local metadata analysis.

Two scraping modes:

  • --download-dir - Download files to a local directory for analysis. This is the primary mode.
  • --scan - Preview only: list discovered files and stats without downloading. Useful for scoping before a full download.

--scan and --download-dir are mutually exclusive.

The --depth flag is critical. By default, depth is 0: MetaDetective only looks at the URL you provide. Most interesting files (reports, presentations, internal documents) are linked from subpages, not the homepage. Always set --depth 1 or higher for real engagements.

| Depth | Behavior |
|-------|----------|
| 0 (default) | Only the target URL. Finds files directly linked on that single page. |
| 1 | Target URL + all pages linked from it. Covers most site structures. |
| 2+ | Follows links N levels deep. Broader coverage, more requests, slower. |
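
To make the depth semantics concrete, here is a small sketch of a depth-limited breadth-first crawl over an in-memory link graph. It illustrates the behavior described in the table and is not MetaDetective's actual crawler; the SITE mapping, the page names, and the function pages_within_depth are made up for the example.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/about", "/reports"],
    "/about": ["/team"],
    "/reports": ["/reports/2023"],
    "/team": [],
    "/reports/2023": [],
}


def pages_within_depth(start: str, depth: int) -> list[str]:
    """Return pages reachable from start by following at most `depth` links."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue  # depth budget spent: do not follow this page's links
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, level + 1))
    return order
```

With this graph, depth 0 visits only "/", depth 1 adds /about and /reports, and depth 2 reaches the remaining two pages. Files linked from /team or /reports/2023 would therefore be missed at depth 1.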

Download (primary workflow):

# Standard download with depth 1 (recommended starting point)
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --depth 1

# Target specific file types
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --depth 2 --extensions pdf docx xlsx pptx

# Parallel download (8 threads, 10 req/s)
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --depth 2 --threads 8 --rate 10

# Follow external links (CDN, subdomain, partner sites)
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --depth 1 --follow-extern

# Stealth: realistic User-Agent + low rate
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --depth 2 --user-agent stealth --rate 2

Scan (preview):

# Quick preview: how many files are reachable?
python3 MetaDetective.py --scraping --scan --url https://target.com/ --depth 1

# Filter preview by extension
python3 MetaDetective.py --scraping --scan --url https://target.com/ \
  --depth 2 --extensions pdf docx

Full pipeline (scrape + analyze + export):

# Step 1: download files
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --depth 2 --extensions pdf docx xlsx

# Step 2: analyze and export
python3 MetaDetective.py -d ~/loot/ -e html -o ~/results/
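
For repeatable engagements, the two steps can be wrapped in a short script. The sketch below only assembles command lines from the flags documented in this README; the helper names and the SCRIPT path are assumptions, and nothing runs until run_pipeline is called.

```python
import subprocess

SCRIPT = "MetaDetective.py"  # assumed to be in the current directory


def scrape_cmd(url: str, download_dir: str, depth: int, extensions: list[str]) -> list[str]:
    """Build the step 1 command: crawl the target and download files."""
    cmd = ["python3", SCRIPT, "--scraping", "--url", url,
           "--download-dir", download_dir, "--depth", str(depth)]
    if extensions:
        cmd += ["--extensions", *extensions]
    return cmd


def analyze_cmd(directory: str, export: str, output_dir: str) -> list[str]:
    """Build the step 2 command: analyze downloaded files and export a report."""
    return ["python3", SCRIPT, "-d", directory, "-e", export, "-o", output_dir]


def run_pipeline(url: str, loot: str, results: str) -> None:
    """Run both steps in order, stopping if either command fails."""
    subprocess.run(scrape_cmd(url, loot, 2, ["pdf", "docx", "xlsx"]), check=True)
    subprocess.run(analyze_cmd(loot, "html", results), check=True)
```

Because the commands are passed as argument lists without a shell, paths such as ~/loot/ are not expanded automatically; expand them first, for example with os.path.expanduser.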

| Flag | Default | Description |
|------|---------|-------------|
| --url | required | Target URL |
| --download-dir | - | Download destination (created if needed) |
| --scan | - | Preview mode (no download) |
| --depth | 0 | Link depth to follow. Set to 1+ for real use. |
| --extensions | all supported | Filter by file type |
| --threads | 4 | Concurrent download threads (1-100) |
| --rate | 5 | Max requests per second (1-1000) |
| --follow-extern | off | Follow links to external domains |
| --user-agent | MetaDetective/<ver> | Custom or preset UA string |

Display modes

MetaDetective offers two display modes that control how results are structured:

--display singular (default) - Aggregates all unique values per field across every file. Best for OSINT: "who touched these documents?" at a glance.

# Default: deduplicated singular view
python3 MetaDetective.py -d ./loot/

# With formatted style (vertical list with markers)
python3 MetaDetective.py -d ./loot/ --format formatted

# With concise style (comma-separated on one line)
python3 MetaDetective.py -d ./loot/ --format concise

--display all - One block per file with its individual metadata. Best for forensic analysis: examine each document's properties independently.

python3 MetaDetective.py -d ./loot/ --display all
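
The difference between the two modes can be sketched in a few lines. This illustrates the aggregation idea only, assuming per-file metadata is available as a list of dictionaries; the data layout, the sample values, and the function name singular_view are hypothetical and do not reflect MetaDetective's internals.

```python
def singular_view(per_file: list[dict[str, str]]) -> dict[str, list[str]]:
    """Collapse per-file metadata into unique values per field, in first-seen order."""
    merged: dict[str, list[str]] = {}
    for metadata in per_file:
        for field, value in metadata.items():
            values = merged.setdefault(field, [])
            if value not in values:
                values.append(value)
    return merged


# "--display all" style input: one metadata block per file (sample data).
files = [
    {"Author": "Alice Martin", "Creator Tool": "Microsoft Word 16.0"},
    {"Author": "Bob Dupont", "Creator Tool": "Microsoft Word 16.0"},
    {"Author": "Alice Martin"},
]

print(singular_view(files))
```

The per-file list corresponds to what --display all shows, while the merged dictionary corresponds to the singular view: three files collapse to two authors and one creator tool.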

--format only works with --display singular.
