Dpp

High-performance offline processing of DNS traffic from PCAP files.

<p align="left"> <a href="https://github.com/dnstelecom/dpp"> <img src="assets/dpp_mascote.svg" alt="DNS Packet Processor mascot" width="180"> </a> </p>

DNS Packet Processor (DPP) — Community Edition

<p align="left"> <img alt="License: GPLv3" src="https://img.shields.io/badge/License-GPLv3-blue.svg"> <a href="https://github.com/dnstelecom/dpp/releases/latest"> <img alt="Latest release" src="https://img.shields.io/github/v/release/dnstelecom/dpp?display_name=tag"> </a> </p> <p align="left"> High-performance offline DNS extraction from PCAP to CSV or Parquet. </p>

DPP is a Rust application for parsing, matching, and exporting DNS query/response traffic from offline PCAP files. It is designed for large captures, bounded parallel processing, and downstream analytics workflows.

Overview

DPP reads offline PCAP files, extracts DNS traffic, matches queries with responses, and writes structured output to CSV or Parquet. The pipeline emphasizes throughput, bounded backpressure, deterministic aggregation, and optional IP pseudonymization.

Features

  • Offline capture parsing through a pure-Rust classic-PCAP reader with libpcap fallback for non-classic formats.
  • Multi-threaded processing with cheap packet routing, canonical flow-based shard ownership, and deterministic aggregation under parallel load.
  • Adaptive runtime behavior: low-core hosts fall back to a simpler phase-parallel path.
  • DNS query/response matching using source and destination IPs, ID, QNAME, QTYPE, and closely aligned timestamps.
  • A single forward-only matching mode that deduplicates retries inside the configurable match-timeout window (1200 ms by default).
  • Optional monotonic-capture mode for globally ordered captures, enabling batched timeout eviction with fail-fast validation on timestamp regressions.
  • CSV and Parquet output, with optional Zstandard (zstd) compression for Parquet.
  • Asynchronous output pipeline to reduce write-side overhead.
  • Optional deterministic IP pseudonymization.
  • Peak RSS memory tracking for performance analysis.
  • Graceful SIGINT/SIGTERM handling that stops intake and drains in-flight work; any still-buffered output tail is discarded before final writer teardown to avoid a skewed partial ending.
  • Graceful handling of malformed packets and I/O errors.
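
The matching and retry-deduplication bullets above can be illustrated with a minimal sketch. It is not DPP's actual matcher: the `MatchKey` and `Matcher` names, the microsecond timestamps, and the single `HashMap` are assumptions made for illustration. Only the key fields (IPs, ID, QNAME, QTYPE) and the 1200 ms default come from the feature list.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical match key built from the fields named in the feature list.
/// The caller normalizes direction: a response swaps source and destination,
/// so both packets are keyed as (client, server).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct MatchKey {
    pub client: IpAddr,
    pub server: IpAddr,
    pub id: u16,
    pub qname: String,
    pub qtype: u16,
}

/// Forward-only matcher: a response can only match a query seen earlier.
pub struct Matcher {
    timeout_us: u64,
    /// Pending queries: key -> timestamp of the first query, in microseconds.
    pending: HashMap<MatchKey, u64>,
}

impl Matcher {
    pub fn new(timeout_ms: u64) -> Self {
        Self { timeout_us: timeout_ms * 1_000, pending: HashMap::new() }
    }

    /// Record a query. Returns false when it is a retry of a query that is
    /// still pending inside the timeout window (the retry is deduplicated).
    pub fn on_query(&mut self, key: MatchKey, ts_us: u64) -> bool {
        match self.pending.get(&key) {
            Some(&first) if ts_us.saturating_sub(first) <= self.timeout_us => false,
            _ => {
                self.pending.insert(key, ts_us);
                true
            }
        }
    }

    /// Try to match a response. Returns the latency in microseconds when a
    /// pending query exists and the response arrived inside the window.
    pub fn on_response(&mut self, key: &MatchKey, ts_us: u64) -> Option<u64> {
        let first = *self.pending.get(key)?;
        // A response timestamped before its query never matches (forward-only).
        let delta = ts_us.checked_sub(first)?;
        // The query is consumed either by a match or by expiry.
        self.pending.remove(key);
        (delta <= self.timeout_us).then_some(delta)
    }
}
```

In this sketch a second query with the same key inside the window is reported as a retry and does not reset the first timestamp, so latency is measured from the first attempt. Whether DPP measures it the same way is not stated here; see docs/architecture.md for the real matcher invariants.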

Architecture

The table below is a high-level map of the system. For the canonical architecture reference, ownership boundaries, and matcher invariants, see docs/architecture.md.

| Layer | Components | Responsibility |
|-----------------------------|---------------------------------------------------------------|----------------------------------------------------------------------------------------|
| Configuration and contracts | src/config.rs, src/cli.rs, src/record.rs | Runtime constants, CLI/environment contract, and exported DnsRecord schema |
| Input parsing | PacketParser | Offline capture input with pure-Rust classic-PCAP parsing and libpcap fallback |
| DNS processing | DnsProcessor, pipeline.rs | Packet routing, DNS decoding, matching, shard ownership, and deterministic aggregation |
| Runtime and orchestration | src/app.rs, src/runtime.rs, src/allocator.rs | App lifecycle, bootstrap, reporting, thread-pool setup, and allocator selection |
| Output pipeline | src/output.rs, src/csv_writer.rs, src/parquet_writer.rs | Async writer lifecycle and CSV/Parquet serialization |
| Memory monitoring | src/monitor_memory.rs | Peak RSS tracking with explicit stop/join lifecycle |
| References and benchmarks | docs/rfc/, benches/ | Architecture decisions, benchmark workflow, and harnesses |
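
To make the "packet routing" and "shard ownership" responsibilities more concrete, here is a small sketch of one common way to route both directions of a flow to the same shard. It is an illustration only: the function names, the use of `SocketAddr` (address plus port) as the flow endpoint, and the standard-library hasher are assumptions, not DPP's actual routing code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

/// Order the two endpoints so that a query (client -> server) and its
/// response (server -> client) produce the same canonical flow key.
pub fn canonical_flow(a: SocketAddr, b: SocketAddr) -> (SocketAddr, SocketAddr) {
    if a <= b { (a, b) } else { (b, a) }
}

/// Pick the shard that owns a flow. Because the key is canonical, both
/// directions land on the same shard and can be matched without locking
/// across shards.
pub fn shard_for(a: SocketAddr, b: SocketAddr, shards: usize) -> usize {
    assert!(shards > 0, "at least one shard is required");
    // DefaultHasher::new() uses fixed keys, so the result is stable within
    // a given toolchain; a real pipeline may prefer an explicitly chosen hash.
    let mut hasher = DefaultHasher::new();
    canonical_flow(a, b).hash(&mut hasher);
    (hasher.finish() % shards as u64) as usize
}
```

Calling `shard_for(client, server, n)` and `shard_for(server, client, n)` yields the same index, which is the property that lets a single shard see both the query and its response.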

Prerequisites

  • Rust
  • Cargo
  • PCAP files for offline processing
  • libpcap development headers for fallback support on non-classic formats

Ubuntu/Debian:

sudo apt-get install libpcap-dev

Build

Standard release build:

cargo build --release

For best throughput on the target host:

RUSTFLAGS='-C target-cpu=native' cargo build --release

Performance benchmarking build:

cargo build --profile perf

The perf profile inherits from release but disables release overflow checks. Use it only for trusted performance measurements on representative input. The default release profile remains the safe production build.

Usage

Basic examples

# Export to CSV
dpp input.pcap output.csv

# Export to Parquet
dpp -f parquet input.pcap output.pq

# Export to Parquet with Zstd compression
dpp -f parquet --zstd input.pcap output.pq

# Enable deterministic IP pseudonymization
dpp --anonymize /tmp/anon.key input.pcap output.csv

# Quiet mode
dpp -s -f parquet input.pcap dns_output.pq

# Stream CSV records to stdout
dpp input.pcap - > output.csv

# Emit a machine-readable JSON summary object at the end of the run
dpp --report-format json input.pcap output.csv > dpp-summary.json

If output_filename is omitted, DPP chooses the default file name from the resolved output format: dns_output.csv for csv and dns_output.parquet for parquet or pq. Use - only when you want CSV records on stdout.
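
The resolution rule described above can be written down as a short sketch. The function name `resolve_output` and the error strings are hypothetical; only the default file names, the accepted format values, and the CSV-only stdout rule come from this README.

```rust
/// Resolve the output target from the selected format and the optional
/// `output_filename` argument. `None` means the argument was omitted and
/// "-" means stdout. Hypothetical helper, not part of DPP's public API.
pub fn resolve_output(format: &str, output_filename: Option<&str>) -> Result<String, String> {
    match (format, output_filename) {
        // Stdout is supported only for CSV records.
        ("csv", Some("-")) => Ok("-".to_string()),
        (_, Some("-")) => Err("stdout output is supported only for csv".to_string()),
        // An explicit path is used as given for any known format.
        ("csv" | "parquet" | "pq", Some(path)) => Ok(path.to_string()),
        // Defaults depend on the resolved output format.
        ("csv", None) => Ok("dns_output.csv".to_string()),
        ("parquet" | "pq", None) => Ok("dns_output.parquet".to_string()),
        (other, _) => Err(format!("unknown output format: {other}")),
    }
}
```

For example, `resolve_output("pq", None)` yields `dns_output.parquet`, while `resolve_output("parquet", Some("-"))` is rejected.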

Create an anonymization key

--anonymize expects a text file. DPP reads the file contents as a passphrase and derives the internal pseudonymization key from that value.

One practical way to create a key file on Linux or macOS is:

umask 077
openssl rand -hex 32 > /tmp/anon.key

Then use it like this:

dpp --anonymize /tmp/anon.key input.pcap output.csv

Notes:

  • Keep the key file private. Anyone with the same file can reproduce the same pseudonymized output.
  • The key file must contain valid UTF-8 text because DPP reads it as a text passphrase.
  • Rotating the key changes the resulting pseudonymized IP addresses for the same input capture.
  • DPP intentionally uses a fixed PBKDF2 salt for deterministic pseudonymization. The passphrase is still the operator-controlled secret; changing it rotates the derived pseudonyms.
  • If --anonymize or DPP_ANONYMIZE is configured and the key file is missing, unreadable, or invalid, DPP exits with an error. It does not silently fall back to pass-through IP addresses.

Arguments

| Argument | Description |
|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| filename | Path to the input PCAP file |
| output_filename | Optional output file path. If omitted, DPP writes to dns_output.csv for csv and to dns_output.parquet for parquet or pq. Use - for CSV stdout output. |

Options

| Option | Description |
|-----------------------------------|---------------------------------------------------------------------------------------------------------------------|
| -s, --silent | Suppress info-level log output |
| -f, --format <csv\|parquet\|pq> | Select output format; stdout output is supported only for csv |
| --report-format <text\|json> | Select the final process report format; defaults to text; json cannot be combined with output_filename = - |
| --match-timeout-ms <MS> | Set the DNS query-response match timeout in milliseconds; allowed range is 1..=5000, default is 1200 |
| --monotonic-capture | Assume globally monotonic packet timestamps, enable batched timeout eviction, and abort if a regression is detected |
| -b, --bonded <N> | Set I/O channel capacity; 0 uses the safe default bounded capacity |
| -z, --zstd | Enable Zstd compression for Parquet output |
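
Two of the options above carry constraints that are easy to state in code. The sketch below shows a range check for `--match-timeout-ms` and a timestamp-regression check of the kind `--monotonic-capture` implies. Function names, the microsecond unit, and the treatment of equal timestamps as non-regressions are assumptions for illustration; DPP's own validation may differ.

```rust
/// Default match timeout in milliseconds, as documented in the options table.
pub const DEFAULT_MATCH_TIMEOUT_MS: u64 = 1200;

/// Accept only values inside the documented range 1..=5000.
pub fn validate_match_timeout(ms: u64) -> Result<u64, String> {
    if (1..=5000).contains(&ms) {
        Ok(ms)
    } else {
        Err(format!("match timeout {ms} ms is outside the allowed range 1..=5000"))
    }
}

/// Return the index of the first packet whose timestamp is lower than its
/// predecessor's, or None when the capture is monotonic. Equal timestamps
/// are treated as monotonic in this sketch. A fail-fast caller would abort
/// on the first Some(_).
pub fn first_regression(timestamps_us: &[u64]) -> Option<usize> {
    timestamps_us
        .windows(2)
        .position(|pair| pair[1] < pair[0])
        .map(|i| i + 1)
}
```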
