DNS Packet Processor (DPP) — Community Edition
<p align="left"> <img alt="License: GPLv3" src="https://img.shields.io/badge/License-GPLv3-blue.svg"> <a href="https://github.com/dnstelecom/dpp/releases/latest"> <img alt="Latest release" src="https://img.shields.io/github/v/release/dnstelecom/dpp?display_name=tag"> </a> </p> <p align="left"> High-performance offline DNS extraction from PCAP to CSV or Parquet. </p>

DPP is a Rust application for parsing, matching, and exporting DNS query/response traffic from offline PCAP files. It is designed for large captures, bounded parallel processing, and downstream analytics workflows.
Table of Contents
- Overview
- Features
- Architecture
- Documentation
- Prerequisites
- Build
- Usage
- Example run
- Performance Optimization
- Limitations
- Commercial Edition
- License
Overview
DPP reads offline PCAP files, extracts DNS traffic, matches queries with responses, and writes structured output to CSV or Parquet. The pipeline emphasizes throughput, bounded backpressure, deterministic aggregation, and optional IP pseudonymization.
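To make "bounded backpressure" concrete, here is a minimal sketch using only the Rust standard library. It is not DPP's pipeline: the function name `run_bounded_pipeline` and the use of `std::sync::mpsc::sync_channel` are assumptions chosen for illustration. It only shows the general idea of a fixed-capacity queue that makes a fast producer wait for a slower consumer.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical helper: pushes `items` through a bounded channel with the
/// given capacity and returns them in arrival order.
fn run_bounded_pipeline(items: Vec<u32>, capacity: usize) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(capacity);

    // Producer thread: stands in for the packet-reading side.
    let producer = thread::spawn(move || {
        for item in items {
            // `send` blocks while `capacity` items are already queued,
            // so memory use stays bounded regardless of input size.
            if tx.send(item).is_err() {
                break; // the consumer has gone away
            }
        }
        // `tx` is dropped here, which ends the consumer's iteration.
    });

    // Consumer: stands in for the writer side.
    let received: Vec<u32> = rx.iter().collect();
    producer.join().expect("producer thread panicked");
    received
}
```

With a single producer and a single consumer, items arrive in the order they were sent, so `run_bounded_pipeline(vec![1, 2, 3], 2)` yields the same three values in order.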
Features
- Offline capture parsing through a pure-Rust classic-PCAP reader with libpcap fallback for non-classic formats.
- Multi-threaded processing with cheap packet routing, canonical flow-based shard ownership, and deterministic aggregation under parallel load.
- Adaptive runtime behavior: low-core hosts fall back to a simpler phase-parallel path.
- DNS query/response matching using source and destination IPs, ID, QNAME, QTYPE, and closely aligned timestamps.
- A single forward-only matching mode with retry deduplication inside the configurable match-timeout window (1200ms by default).
- Optional monotonic-capture mode for globally ordered captures, enabling batched timeout eviction with fail-fast validation on timestamp regressions.
- CSV and Parquet output, with optional Zstandard (zstd) compression for Parquet.
- Asynchronous output pipeline to reduce write-side overhead.
- Optional deterministic IP pseudonymization.
- Peak RSS memory tracking for performance analysis.
- Graceful SIGINT/SIGTERM handling that stops intake and drains in-flight work; any still-buffered output tail is discarded before final writer teardown to avoid a skewed partial ending.
- Graceful handling of malformed packets and I/O errors.
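The matching rules in the list above (key fields, forward-only matching, retry deduplication inside the timeout window) can be sketched roughly as follows. This is an illustrative model only, not DPP's matcher: `MatchKey`, `Matcher`, and their methods are hypothetical names, timestamps are assumed to be microseconds, and the caller is assumed to build the response key with the client and server addresses swapped back into query orientation.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical key: the fields a query and its response must share.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct MatchKey {
    client: IpAddr,
    server: IpAddr,
    id: u16,
    qname: String,
    qtype: u16,
}

/// Forward-only matcher: a response can only match a query seen earlier.
struct Matcher {
    timeout_us: u64,
    /// Key -> timestamp (microseconds) of the first pending query.
    pending: HashMap<MatchKey, u64>,
}

impl Matcher {
    fn new(timeout_ms: u64) -> Self {
        Matcher { timeout_us: timeout_ms * 1_000, pending: HashMap::new() }
    }

    /// Records a query. Returns `false` when it is a retry of a query that
    /// is still pending inside the timeout window (the retry is dropped).
    fn on_query(&mut self, key: MatchKey, ts_us: u64) -> bool {
        match self.pending.get(&key) {
            Some(&first) if ts_us.saturating_sub(first) <= self.timeout_us => false,
            _ => {
                self.pending.insert(key, ts_us);
                true
            }
        }
    }

    /// Returns the query-to-response delay in microseconds when a pending
    /// query matches inside the window; expired entries yield `None`.
    fn on_response(&mut self, key: &MatchKey, ts_us: u64) -> Option<u64> {
        let first = *self.pending.get(key)?;
        let delta = ts_us.checked_sub(first)?; // a response never precedes its query
        self.pending.remove(key);
        (delta <= self.timeout_us).then_some(delta)
    }
}
```

A real implementation also has to evict queries that never receive a response; the sketch leaves that out to keep the matching rule itself visible.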
Architecture
The table below is a high-level map of the system. For the canonical architecture reference, ownership boundaries, and matcher invariants, see docs/architecture.md.
| Layer | Components | Responsibility |
|-----------------------------|---------------------------------------------------------------|----------------------------------------------------------------------------------------|
| Configuration and contracts | src/config.rs, src/cli.rs, src/record.rs | Runtime constants, CLI/environment contract, and exported DnsRecord schema |
| Input parsing | PacketParser | Offline capture input with pure-Rust classic-PCAP parsing and libpcap fallback |
| DNS processing | DnsProcessor, pipeline.rs | Packet routing, DNS decoding, matching, shard ownership, and deterministic aggregation |
| Runtime and orchestration | src/app.rs, src/runtime.rs, src/allocator.rs | App lifecycle, bootstrap, reporting, thread-pool setup, and allocator selection |
| Output pipeline | src/output.rs, src/csv_writer.rs, src/parquet_writer.rs | Async writer lifecycle and CSV/Parquet serialization |
| Memory monitoring | src/monitor_memory.rs | Peak RSS tracking with explicit stop/join lifecycle |
| References and benchmarks | docs/rfc/, benches/ | Architecture decisions, benchmark workflow, and harnesses |
Documentation
- Canonical architecture reference: docs/architecture.md
- Architecture decision records: docs/rfc/README.md
- Benchmark workflow: benches/README.md
- Benchmark harness: benches/benchmark.sh
- Contribution guide: CONTRIBUTING.md
- Allocator guide: docs/allocator-guide.md
- Allocator benchmarking protocol: benches/allocator-benchmarking.md
- Encapsulation handling playbook: docs/encapsulation-playbook.md
- Synthetic DNS PCAP generator: docs/synthetic-pcap-generator.md
Prerequisites
- Rust
- Cargo
- PCAP files for offline processing
- libpcap development headers for fallback support on non-classic formats
Ubuntu/Debian:
sudo apt-get install libpcap-dev
Build
Standard release build:
cargo build --release
For best throughput on the target host:
RUSTFLAGS='-C target-cpu=native' cargo build --release
Performance benchmarking build:
cargo build --profile perf
The perf profile inherits from release but disables release overflow checks. Use it only for
trusted performance measurements on representative input. The default release profile remains the
safe production build.
Usage
Basic examples
# Export to CSV
dpp input.pcap output.csv
# Export to Parquet
dpp -f parquet input.pcap output.pq
# Export to Parquet with Zstd compression
dpp -f parquet --zstd input.pcap output.pq
# Enable deterministic IP pseudonymization
dpp --anonymize /tmp/anon.key input.pcap output.csv
# Quiet mode
dpp -s -f parquet input.pcap dns_output.pq
# Stream CSV records to stdout
dpp input.pcap - > output.csv
# Emit a machine-readable JSON summary object at the end of the run
dpp --report-format json input.pcap output.csv > dpp-summary.json
If output_filename is omitted, DPP chooses the default file name from the resolved output format:
dns_output.csv for csv and dns_output.parquet for parquet or pq. Use - only when you
want CSV records on stdout.
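The output-target rules described above can be summarized as a small decision function. The sketch below is an assumption-laden illustration, not DPP's source: the types `OutputFormat` and `Target`, the function names, and the error message text are all hypothetical. Only the format names, the default file names, and the CSV-only stdout rule come from this README.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum OutputFormat {
    Csv,
    Parquet,
}

#[derive(Debug, PartialEq)]
enum Target {
    Stdout,
    File(String),
}

/// Maps the `-f` value to a format; `pq` is an alias for `parquet`.
fn parse_format(s: &str) -> Option<OutputFormat> {
    match s {
        "csv" => Some(OutputFormat::Csv),
        "parquet" | "pq" => Some(OutputFormat::Parquet),
        _ => None,
    }
}

/// Resolves where records go, given the format and the optional
/// `output_filename` argument.
fn resolve_target(format: OutputFormat, output: Option<&str>) -> Result<Target, String> {
    match (output, format) {
        // `-` means stdout, which is documented for CSV only.
        (Some("-"), OutputFormat::Csv) => Ok(Target::Stdout),
        (Some("-"), OutputFormat::Parquet) => {
            Err("stdout output is supported only for csv".to_string())
        }
        // An explicit path is used as given.
        (Some(path), _) => Ok(Target::File(path.to_string())),
        // No path: fall back to the documented default names.
        (None, OutputFormat::Csv) => Ok(Target::File("dns_output.csv".to_string())),
        (None, OutputFormat::Parquet) => Ok(Target::File("dns_output.parquet".to_string())),
    }
}
```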
Create an anonymization key
--anonymize expects a text file. DPP reads the file contents as a passphrase and derives the
internal pseudonymization key from that value.
One practical way to create a key file on Linux or macOS is:
umask 077
openssl rand -hex 32 > /tmp/anon.key
Then use it like this:
dpp --anonymize /tmp/anon.key input.pcap output.csv
Notes:
- Keep the key file private. Anyone with the same file can reproduce the same pseudonymized output.
- The key file must contain valid UTF-8 text because DPP reads it as a text passphrase.
- Rotating the key changes the resulting pseudonymized IP addresses for the same input capture.
- DPP intentionally uses a fixed PBKDF2 salt for deterministic pseudonymization. The passphrase is still the operator-controlled secret; changing it rotates the derived pseudonyms.
- If --anonymize or DPP_ANONYMIZE is configured and the key file is missing, unreadable, or invalid, DPP exits with an error. It does not silently fall back to pass-through IP addresses.
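The fail-closed behavior in the notes above might look roughly like this in code. This is a sketch under stated assumptions, not DPP's loader: the function name `load_passphrase` is hypothetical, and treating an empty file as invalid is a guess at what "invalid" covers. The PBKDF2 derivation step is omitted because its parameters are not given here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical loader: returns the passphrase text or an error.
/// It never substitutes a default, so the caller can exit instead of
/// writing unpseudonymized addresses.
fn load_passphrase(path: &Path) -> io::Result<String> {
    // `read_to_string` fails when the file is missing or unreadable, and
    // returns `ErrorKind::InvalidData` when the bytes are not valid UTF-8.
    let text = fs::read_to_string(path)?;

    // Assumption: an empty or whitespace-only file counts as invalid.
    if text.trim().is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "anonymization key file is empty",
        ));
    }
    Ok(text)
}
```

Whether surrounding whitespace such as a trailing newline is part of the passphrase is not stated in this document, so the sketch returns the file contents unchanged.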
Arguments
| Argument | Description |
|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| filename | Path to the input PCAP file |
| output_filename | Optional output file path. If omitted, DPP writes to dns_output.csv for csv and to dns_output.parquet for parquet or pq. Use - for CSV stdout output. |
Options
| Option | Description |
|-----------------------------------|---------------------------------------------------------------------------------------------------------------------|
| -s, --silent | Suppress info-level log output |
| -f, --format <csv\|parquet\|pq> | Select output format; stdout output is supported only for csv |
| --report-format <text\|json> | Select the final process report format; defaults to text; json cannot be combined with output_filename = - |
| --match-timeout-ms <MS> | Set the DNS query-response match timeout in milliseconds; allowed range is 1..=5000, default is 1200 |
| --monotonic-capture | Assume globally monotonic packet timestamps, enable batched timeout eviction, and abort if a regression is detected |
| -b, --bonded <N> | Set I/O channel capacity; 0 uses the safe default bounded capacity |
| -z, --zstd                        | Enable Zstd compression for Parquet output                                                                          |
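
As an illustration of the documented `--match-timeout-ms` contract (range 1..=5000, default 1200), the following sketch validates a raw argument value. The function and constant names and the error wording are hypothetical; DPP's actual CLI parsing may be structured differently.

```rust
/// Default taken from the options table; the constant name is hypothetical.
const DEFAULT_MATCH_TIMEOUT_MS: u64 = 1200;

/// Validates an optional `--match-timeout-ms` value.
/// `None` means the flag was not given, so the default applies.
fn parse_match_timeout_ms(arg: Option<&str>) -> Result<u64, String> {
    let raw = match arg {
        Some(raw) => raw,
        None => return Ok(DEFAULT_MATCH_TIMEOUT_MS),
    };

    // Reject anything that is not an unsigned integer.
    let ms = raw
        .parse::<u64>()
        .map_err(|e| format!("invalid --match-timeout-ms value {:?}: {}", raw, e))?;

    // Enforce the documented inclusive range.
    if (1..=5000).contains(&ms) {
        Ok(ms)
    } else {
        Err(format!("--match-timeout-ms must be in 1..=5000, got {}", ms))
    }
}
```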
