Pwnkit
Attack-driven evals and autonomous pentesting for AI systems, web apps, code, and packages.
Fully autonomous agentic pentesting for web apps, AI/LLM apps, npm packages, and source code.
A PwnKit Labs product.
This README is the fast path. The detailed command reference, configuration, architecture notes, recipes, and benchmark breakdowns live in the docs site.
Quick Start
Docker
docker run --rm -e OPENROUTER_API_KEY="$KEY" \
  ghcr.io/peaktwilight/pwnkit:latest scan --target https://example.com
If you use Azure OpenAI instead, also pass AZURE_OPENAI_BASE_URL and AZURE_OPENAI_MODEL. For the Responses API, the Azure base URL should include /openai/v1.
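The /openai/v1 requirement is easy to get wrong, so a small pre-flight check can help. The sketch below is a hypothetical helper, not part of pwnkit: the function name azure_base_url_ok and the example resource hostname are assumptions. The commented invocation forwards only the two variables named above and leaves credentials out, because this README does not name the Azure key variable.

```shell
# Hypothetical helper (not part of pwnkit): succeed only when the URL
# contains the /openai/v1 path segment expected for the Responses API.
azure_base_url_ok() {
  case "$1" in
    */openai/v1|*/openai/v1/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example wiring, commented out so that sourcing this file runs nothing.
# The hostname is a placeholder; credential variables are intentionally omitted.
# export AZURE_OPENAI_BASE_URL="https://my-resource.openai.azure.com/openai/v1"
# azure_base_url_ok "$AZURE_OPENAI_BASE_URL" && docker run --rm \
#   -e AZURE_OPENAI_BASE_URL -e AZURE_OPENAI_MODEL \
#   ghcr.io/peaktwilight/pwnkit:latest scan --target https://example.com
```

Passing -e NAME without a value makes docker forward the variable from the calling shell, which keeps the URL and model name out of the command line.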
The image ships with Node 20, Playwright/Chromium, and the standard pentest toolbox (sqlmap, nmap, nikto, gobuster, ffuf, hydra, john, …) preinstalled.
npx / bunx
# Scan an AI / LLM endpoint
npx pwnkit-cli scan --target https://example.com/api/chat
# Pentest a web app
npx pwnkit-cli scan --target https://example.com --mode web
# White-box scan with source code access
npx pwnkit-cli scan --target https://example.com --repo ./source
# Audit an npm package
npx pwnkit-cli audit lodash
# Review source code
npx pwnkit-cli review ./my-app
# Auto-detect — just give it a target
npx pwnkit-cli https://example.com
Prefer Bun? Swap npx for bunx — same commands, same flags, zero-install, noticeably faster cold start. pwnkit-cli is pure-JS with a WASM SQLite core, so there are no native bindings to rebuild on either runtime.
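If a script has to work on machines with either runtime, the swap can be automated. This is a minimal sketch, assuming a hypothetical wrapper function named pwnkit_runner that is not shipped with pwnkit-cli; it prefers bunx when it is on PATH and falls back to npx otherwise.

```shell
# Hypothetical wrapper (not part of pwnkit-cli): print the launcher to use.
pwnkit_runner() {
  if command -v bunx >/dev/null 2>&1; then
    echo bunx
  else
    echo npx
  fi
}

# Usage, commented out so that sourcing this file runs nothing:
# "$(pwnkit_runner)" pwnkit-cli scan --target https://example.com --mode web
```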
Global install:
npm i -g pwnkit-cli
# or
bun add -g pwnkit-cli
What It Does
- scan targets AI / LLM apps, web apps, REST / OpenAPI APIs, and MCP servers.
- audit installs and inspects npm packages with npm audit, semgrep, and AI review.
- review performs deep source-code security review on a local repo or Git URL.
- triage-data turns benchmark runs and verified findings into labeled JSONL for triage-model training.
- cloud-sink can stream findings and final reports to an orchestrator with PWNKIT_CLOUD_SINK + PWNKIT_CLOUD_SCAN_ID.
- dashboard, history, findings, and triage provide local persistence and review workflows.
Why It’s Different
- Shell-first web pentesting. The agent uses bash, writes scripts, and chains tools like a human pentester instead of being trapped in a small HTTP-tool DSL.
- Blind verification. Findings are independently re-exploited before they are reported.
- Docs-backed benchmark transparency. The current benchmark details live in the docs and raw artifacts under packages/benchmark/results.
Docs
Snapshot
- XBOW (black-box): 91/104 = 87.5%
- XBOW (white-box best-of-N aggregate): 96/104 = 92.3%
- Cybench: 8/10 = 80%
- AI / LLM regression set: 10/10
Both XBOW numbers are reported separately — no methodology blending. The 5 white-box-only flags (XBEN-023, 056, 063, 075, 061) come from the best-of-N aggregate across features=none / features=experimental / features=all runs with --repo source access. Same model, same tools, only the source-access flag differs. For the full benchmark methodology, caveats, and historical runs, use the benchmark docs page instead of the README.
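The snapshot percentages can be rechecked from the raw counts. The helper below is only an illustration (the function name pct is an assumption, not a pwnkit command); it rounds to one decimal place, matching the figures above.

```shell
# Percentage with one decimal place: pct <solved> <total>
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

pct 91 104          # XBOW black-box                      -> 87.5
pct 96 104          # XBOW white-box best-of-N aggregate  -> 92.3
echo $((96 - 91))   # flags solved only with source access -> 5
```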
GitHub Action
- uses: PwnKit-Labs/pwnkit@main
  with:
    mode: review
    path: .
    format: sarif
  env:
    OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
Development
git clone https://github.com/PwnKit-Labs/pwnkit.git
cd pwnkit
pnpm install
pnpm lint
pnpm test
See CONTRIBUTING.md.
License
Apache 2.0 — built by PwnKit Labs and Doruk Tan Ozturk.
