TraceTree

TraceTree - Runtime behavioral analysis tool that maps the process cascade of suspicious packages into a directed tree, catching supply chain attacks that install-time scanners miss.

Runtime behavioral analysis for Python packages, npm modules, DMG and EXE files — catching supply chain attacks that install-time scanners miss.

How It Works

TraceTree executes suspicious packages inside an isolated Docker sandbox. Right after the initial download starts, it drops the container's network interface. Malicious outbound connection attempts are still triggered and logged, but no traffic leaves the sandbox.
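
The sandbox flow above can be sketched as a sequence of Docker CLI calls. This is a minimal illustration, not TraceTree's actual implementation: the function name `sandbox_commands`, the image name `tracetree-sandbox`, and the log path `/tmp/trace.log` are all hypothetical, and the sketch assumes the image ships with `strace`. It does not model the timing of the network drop relative to the download.

```python
from typing import List


def sandbox_commands(container: str, image: str, install_cmd: str,
                     network: str = "bridge") -> List[List[str]]:
    """Return the docker CLI invocations, in order, for one sandboxed run.

    All names here are assumptions made for illustration only.
    """
    # Trace the install command and every child process it forks (-f),
    # writing the syscall log to a file inside the container (-o).
    traced = f"strace -f -o /tmp/trace.log {install_cmd}"
    return [
        # Start the container detached so the next command can run meanwhile.
        ["docker", "run", "-d", "--name", container, image, "sh", "-c", traced],
        # Detach the container from its network: later connection attempts
        # fail inside the sandbox but still appear in the strace log.
        ["docker", "network", "disconnect", network, container],
        # Block until the traced command exits.
        ["docker", "wait", container],
        # Copy the syscall log out of the stopped container.
        ["docker", "cp", f"{container}:/tmp/trace.log", "trace.log"],
        # Remove the container.
        ["docker", "rm", "-f", container],
    ]


commands = sandbox_commands("tt-run", "tracetree-sandbox", "pip install demo-package")
```

Each entry is an argument list that could be handed to `subprocess.run(cmd, check=True)` in turn.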

A regex engine parses the strace output, tracks system calls (like clone, execve, socket, and openat), and builds a directed graph using NetworkX. Finally, a RandomForestClassifier trained on known malware evaluates the graph's topology to detect anomalous behavior.
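
As a rough illustration of the parsing step, the sketch below turns `strace -f` style output into parent/child edges plus per-process events. The regex, the function name `build_process_tree`, and the sample log are assumptions, not TraceTree's code. Plain dictionaries stand in for the NetworkX graph so the example runs with the standard library alone.

```python
import re
from collections import defaultdict

# "<pid> <syscall>(<args>) = <return value>", the shape of `strace -f -o` lines.
LINE_RE = re.compile(r"^(\d+)\s+(\w+)\((.*)\)\s+=\s+(-?\d+)")
SPAWN_CALLS = {"clone", "clone3", "fork", "vfork"}
TRACKED_CALLS = {"execve", "socket", "connect", "openat"}


def build_process_tree(strace_text):
    """Return (children, events): pid -> child pids, pid -> [(call, args, ret)]."""
    children = defaultdict(list)
    events = defaultdict(list)
    for line in strace_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip signals, unfinished calls, exit notices
        pid, name, args, ret = int(match[1]), match[2], match[3], int(match[4])
        if name in SPAWN_CALLS and ret > 0:
            children[pid].append(ret)  # a successful spawn returns the child pid
        elif name in TRACKED_CALLS:
            events[pid].append((name, args, ret))
    return dict(children), dict(events)


# Hand-written sample log for illustration only.
SAMPLE = '''\
100 execve("/usr/bin/pip", ["pip", "install", "demo"], 0x7ffc /* 10 vars */) = 0
100 clone(child_stack=NULL, flags=SIGCHLD) = 101
101 execve("/bin/sh", ["sh", "-c", "curl http://203.0.113.5/x"], 0x55 /* 10 vars */) = 0
101 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
101 connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("203.0.113.5")}, 16) = -1 ENETUNREACH (Network is unreachable)
'''

children, events = build_process_tree(SAMPLE)
```

With NetworkX, the same edges could be loaded into a `DiGraph` via `add_edge(parent, child)`. The failed `connect` in the sample (return value `-1`) is the kind of blocked outbound attempt the sandbox is meant to surface.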

Installation

You need Python 3.9+ and Docker running on your machine.

git clone https://github.com/tejasprasad2008-afk/TraceTree.git
cd TraceTree

# Install the CLI tool in editable mode
pip install -e .

Usage

The pipeline is controlled via a Typer CLI.

# Analyze a PyPI package
cascade-analyze requests

# Evaluate standard dependency files
cascade-analyze requirements.txt
cascade-analyze package.json

# Analyze compiled installers
cascade-analyze malicious_app.dmg
cascade-analyze payload.exe
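
The command accepts several kinds of target. One way a CLI might tell them apart is by file name and extension, as in the sketch below. The function `classify_target` and its return labels are hypothetical. TraceTree's real dispatch logic is not described here.

```python
from pathlib import Path


def classify_target(target: str) -> str:
    """Guess what kind of analysis target a CLI argument refers to."""
    name = Path(target).name.lower()
    if name == "requirements.txt":
        return "pip-requirements"
    if name == "package.json":
        return "npm-manifest"
    suffix = Path(name).suffix
    if suffix == ".dmg":
        return "dmg"
    if suffix == ".exe":
        return "exe"
    # Anything else is treated as a bare PyPI package name.
    return "pypi-package"


kinds = [classify_target(t) for t in
         ["requests", "requirements.txt", "package.json", "malicious_app.dmg", "payload.exe"]]
```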

Advanced Training & Dataset Ingestion

TraceTree features an Online Training Pipeline that can fetch live malware samples from MalwareBazaar.

Local Training

If you want to train the model locally using the datasets in data/:

# Start the interactive training pipeline
cascade-train

During cascade-train, you will be prompted for a MalwareBazaar Auth Key. If provided, the tool will:

Ingest: Fetch the latest malicious Python samples from MalwareBazaar.
Sandbox: Run them through the Docker pipeline to extract fresh behavioral footprints.
Train: Re-calculate the Random Forest weights to include the new data.
Sync: Automatically cache the new model locally.

Model Synchronization

To fetch the latest pre-trained model directly from the global cloud storage:

# Force download the latest global model
cascade-update

Who Is This For

Security Researchers: Hunting undocumented supply chain behavior.
DevOps / DevSecOps: Validating the runtime safety of newly introduced dependencies.
Software Engineers: Profiling the exact syscall requirements of applications.

Architecture

The pipeline is split into five core modules:

/sandbox: Manages the Docker container lifecycle and actively restricts networking during testing.
/monitor: Parses the strace log to track execution paths and network attempts.
/graph: Uses networkx to translate parent/child process relationships into an edge graph.
/ml: Feeds the extracted graph features into a RandomForestClassifier for anomaly detection.
/cli: The Typer entrypoint that orchestrates the pipeline and renders the terminal UI.
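
To illustrate the hand-off from /graph to /ml, the sketch below reduces a process tree to a small numeric feature set. The feature choice and the function name `graph_features` are assumptions. The README does not state which topological features the real classifier uses.

```python
def graph_features(children, events, root):
    """Summarize a process tree as numeric features.

    children: pid -> list of child pids
    events:   pid -> list of (syscall, args, return_value)
    """
    nodes = {root} | set(children) | set(events)
    for kids in children.values():
        nodes.update(kids)

    # Iterative depth-first walk to find the deepest process chain.
    max_depth, seen, stack = 0, set(), [(root, 0)]
    while stack:
        pid, depth = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        max_depth = max(max_depth, depth)
        stack.extend((kid, depth + 1) for kid in children.get(pid, []))

    calls = [call for per_pid in events.values() for call in per_pid]
    return {
        "nodes": len(nodes),
        "edges": sum(len(kids) for kids in children.values()),
        "max_depth": max_depth,
        "max_fanout": max((len(kids) for kids in children.values()), default=0),
        "execve": sum(1 for name, _, _ in calls if name == "execve"),
        "socket": sum(1 for name, _, _ in calls if name == "socket"),
        "failed_connect": sum(1 for name, _, ret in calls
                              if name == "connect" and ret < 0),
    }


# Hand-built example tree: 100 spawns 101 and 102, and 101 spawns 103.
features = graph_features(
    children={100: [101, 102], 101: [103]},
    events={101: [("execve", "", 0)],
            103: [("socket", "", 3), ("connect", "", -1)]},
    root=100,
)
vector = list(features.values())
```

A vector like this is the sort of row that could be passed to scikit-learn's `RandomForestClassifier.predict`, one row per analyzed package.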

Threat Model

The heavily obfuscated XZ Utils backdoor (CVE-2024-3094), disclosed in March 2024, bypassed standard static scanning. Advanced supply chain malware often hides malicious operations deep within legitimate-looking test code or delayed payload fetches. By analyzing the runtime execution graph instead of the source, TraceTree sidesteps code obfuscation and observes which external files, commands, and sockets a package actually tries to open.

Contributing

Pull requests are welcome. Please ensure new features remain decoupled across the existing architecture.

License

MIT
