☠ PHANTOM

Autonomous Adversary Simulation Platform

AI-native penetration testing — autonomous reconnaissance, exploitation, and verified results.

Quick Start · Architecture · Usage · Configuration · Contributing

</div>

Overview

Phantom is an autonomous AI penetration testing agent built on the ReAct (Reason–Act) loop. It connects a large language model to over 30 professional security tools, runs all offensive operations inside an isolated Docker sandbox, and produces verified vulnerability reports — entirely without human intervention.

Unlike CVE-signature scanners, Phantom reasons about your target: it reads HTTP responses, forms hypotheses, selects the right tool, chains multi-step exploits, then writes and executes a proof-of-concept script to confirm every finding before it appears in a report.

Core Capabilities

<table> <tr> <td align="center">🧠</td> <td>Autonomous ReAct Loop — Plans, executes tools, reads results, re-plans. Handles dead ends and unexpected responses without human guidance.</td> </tr> <tr> <td align="center">🔧</td> <td>53 Security Tools — nmap · nuclei · sqlmap · ffuf · httpx · katana · subfinder · nikto · gobuster · arjun · semgrep · playwright — all orchestrated automatically.</td> </tr> <tr> <td align="center">🐳</td> <td>Ephemeral Docker Sandbox — All offensive tooling runs in a network-restricted Kali Linux container. Zero host filesystem access. Container is destroyed after every scan.</td> </tr> <tr> <td align="center">⚡</td> <td>Multi-Agent Parallelism — Spawns specialized sub-agents (SQLi, XSS, recon) that work concurrently and report findings to the coordinator.</td> </tr> <tr> <td align="center">🛡️</td> <td>7-Layer Defense Model — Scope guard → Tool firewall → Docker sandbox → Cost limiter → Time budget → HMAC audit trail → Output sanitizer.</td> </tr> <tr> <td align="center">✅</td> <td>Verified Findings Only — No hallucinations. Every reported vulnerability includes raw HTTP evidence, reproduction steps, and a working exploit script.</td> </tr> <tr> <td align="center">🗺️</td> <td>MITRE ATT&CK Enrichment — Automatic CWE, CAPEC, technique-level tagging, and CVSS 3.1 scoring per finding.</td> </tr> <tr> <td align="center">📋</td> <td>Compliance Coverage — OWASP Top 10 (2021) · PCI DSS v4.0 · NIST 800-53 — mapped automatically per finding.</td> </tr> <tr> <td align="center">💾</td> <td>Knowledge Persistence — Cross-scan memory stores hosts, past findings, and false-positive signatures. Each scan learns from the last.</td> </tr> <tr> <td align="center">💰</td> <td>Full Cost Control — Per-request and per-scan budget caps. Every token and every dollar tracked in real time.</td> </tr> </table>

Architecture

<details open> <summary>① System Architecture — Component Overview</summary>

%%{init: {"theme": "dark"}}%%
flowchart TD
    USER(["👤 User / CI-CD"])

    subgraph IFACE["Interface Layer"]
        CLI["CLI · TUI"]
        PARSER["Output Parser"]
    end

    subgraph ORCH["Orchestration"]
        PROFILE["Scan Profile"]
        SCOPE["Scope Guard"]
        COST["Cost Controller"]
        AUDIT["HMAC Audit Log"]
    end

    subgraph AGENT["Agent Core — ReAct"]
        LLM["LLM via LiteLLM"]
        STATE["State Machine"]
        MEM["Memory Engine"]
        SKILLS["Skills Engine"]
    end

    subgraph SEC["Security Layer"]
        FW["Tool Firewall"]
        VERIFY["Verifier"]
        SANIT["Sanitizer"]
    end

    subgraph SANDBOX["Docker Sandbox — Kali Linux"]
        TSRV["Tool Server :48081"]
        TOOLS["30+ Security Tools"]
        BROWSER["Playwright · Chromium"]
        PROXY["Caido Proxy :48080"]
    end

    subgraph OUTPUT["Output Pipeline"]
        REPORTS["JSON · MD · HTML"]
        GRAPH["Attack Graph"]
        MITRE["MITRE ATT&CK Map"]
    end

    USER --> IFACE
    IFACE --> ORCH
    ORCH --> AGENT
    AGENT <--> SEC
    SEC --> SANDBOX
    AGENT --> OUTPUT

    style IFACE fill:#6c5ce7,stroke:#a29bfe,color:#ffffff
    style ORCH fill:#00b894,stroke:#55efc4,color:#ffffff
    style AGENT fill:#e17055,stroke:#fab1a0,color:#ffffff
    style SEC fill:#d63031,stroke:#ff7675,color:#ffffff
    style SANDBOX fill:#0984e3,stroke:#74b9ff,color:#ffffff
    style OUTPUT fill:#f9ca24,stroke:#f0932b,color:#2d3436

</details> <details> <summary>② Scan Execution Flow — Phase by Phase</summary>

%%{init: {"theme": "dark"}}%%
sequenceDiagram
    actor User
    participant CLI as Phantom CLI
    participant Orch as Orchestrator
    participant Agent as Agent ReAct
    participant FW as Tool Firewall
    participant Box as Docker Sandbox
    participant LLM as LLM Provider
    participant T as Target App

    User->>CLI: phantom scan -t https://app.com
    CLI->>Orch: Validate scope · init cost controller
    Orch->>Box: Spin up ephemeral Kali container
    Orch->>Agent: Begin scan · profile + scope injected

    rect rgb(48, 25, 80)
        Note over Agent,LLM: Phase 1 — Reconnaissance
        Agent->>LLM: Analyze target · plan recon
        LLM-->>Agent: Run katana · httpx · nmap
        Agent->>FW: Validate tool call
        FW-->>Agent: Approved
        Agent->>Box: Execute recon tools
        Box->>T: HTTP probes · port scans · crawl
        T-->>Box: Responses
        Box-->>Agent: Endpoints · tech stack · open ports
    end

    rect rgb(80, 20, 20)
        Note over Agent,LLM: Phase 2 — Exploitation
        Agent->>LLM: Hypothesize attack vectors
        LLM-->>Agent: SQLi on /api/login · XSS on /search
        Agent->>Box: sqlmap · custom payload injection
        Box->>T: Exploit attempts
        T-->>Box: Vulnerability confirmed
        Box-->>Agent: Raw HTTP evidence
    end

    rect rgb(15, 60, 30)
        Note over Agent,LLM: Phase 3 — Verification
        Agent->>Box: Re-exploit with clean PoC script
        Box->>T: Reproduce exact attack
        T-->>Box: Confirmed
        Agent->>Agent: CVSS 3.1 · CWE tag · MITRE map
    end

    Agent->>CLI: Findings compiled
    CLI->>User: Vulnerabilities + PoCs + Compliance
    CLI->>Box: Destroy container

</details> <details> <summary>③ Agent ReAct Loop — Decision Cycle</summary>

%%{init: {"theme": "dark"}}%%
flowchart LR
    INIT(["Scan Start"])

    OBS["Observe\nCollect results"]
    THINK["Reason\nAnalyze context"]
    PLAN["Plan\nChoose tool"]
    ACT["Act\nBuild arguments"]
    FW{"Firewall?"}
    EXEC["Execute\nDocker sandbox"]
    DONE{"Stop\nCondition?"}
    VERIFY["Verify\nRe-test findings"]
    ENRICH["Enrich\nMITRE · CVSS"]
    REPORT["Report\nJSON · HTML · MD"]
    FINISH(["Scan Complete ☠"])

    INIT --> OBS
    OBS --> THINK
    THINK --> PLAN
    PLAN --> ACT
    ACT --> FW
    FW -- "✓ Pass" --> EXEC
    FW -- "✗ Block" --> THINK
    EXEC --> OBS
    OBS --> DONE
    DONE -- "Continue" --> THINK
    DONE -- "Done" --> VERIFY
    VERIFY --> ENRICH
    ENRICH --> REPORT
    REPORT --> FINISH

    style INIT fill:#6c5ce7,stroke:#a29bfe,color:#fff
    style FINISH fill:#6c5ce7,stroke:#a29bfe,color:#fff
    style FW fill:#d63031,stroke:#ff7675,color:#fff
    style DONE fill:#e17055,stroke:#fab1a0,color:#fff
    style EXEC fill:#0984e3,stroke:#74b9ff,color:#fff
    style REPORT fill:#00b894,stroke:#55efc4,color:#fff

</details> <details> <summary>④ Docker Sandbox — Isolation Architecture</summary>

%%{init: {"theme": "dark"}}%%
flowchart LR
    HOST(["Phantom Agent\nHost Machine"])

    subgraph CONTAINER["Kali Linux Container — Network Isolated"]
        TSRV["Tool Server :48081"]
        PROXY["Caido Proxy :48080"]

        subgraph TOOLKIT["Security Toolkit"]
            SCA["nmap · masscan"]
            INJ["sqlmap · nuclei"]
            FUZ["ffuf · gobuster · arjun"]
            WEB["httpx · katana"]
            ANA["nikto · semgrep"]
        end

        subgraph RUNTIME["Runtime Environment"]
            PY["Python 3.12"]
            BR["Playwright + Chromium"]
            SH["Bash Shell"]
        end
    end

    TARGET(["Target\nApplication"])

    HOST -- "Authenticated API" --> TSRV
    TSRV --> TOOLKIT
    TSR

Phantom

Install / Use

README

☠ PHANTOM

Autonomous Adversary Simulation Platform

Overview

Core Capabilities

Architecture