# p2pspider - DHT Spider
A daemon that crawls the BitTorrent DHT network and an Express web application that provides a searchable database of magnet links with real-time updates through WebSockets.
## Intro

DHT Spider can index over 1 million magnets per 24 hours on modest hardware (2 GB of RAM and roughly a 2 MB/s connection). It is resource-intensive and will use whatever CPU and RAM are available; limits can be set in the `ecosystem.json` file. With 2 GB of RAM, the recommended setup is 8 daemon instances and 2 webserver instances, each limited to 175 MB.
## Getting Started

```sh
# Install dependencies
npm install

# Set up configuration
cp .env.sample .env
# Edit the .env file as needed

# Run the unified application (both crawler and web interface)
npm start

# Alternatively, use PM2 for process management
npm install -g pm2
npm run start:pm2   # uses the ecosystem.json file
pm2 monit
```
## Configuration
You will need to have port 6881 (or your configured port) open to the internet for the DHT crawler to function properly.
The application can be configured through the .env file:
```sh
# Database and server configuration
REDIS_URI=redis://127.0.0.1:6379
MONGO_URI=mongodb://127.0.0.1/magnetdb
SITE_HOSTNAME=http://127.0.0.1:8080
SITE_NAME=DHT Spider
SITE_PORT=8080

# Database options: "mongodb" or "sqlite"
DB_TYPE=sqlite

# Redis options: "true" or "false"
USE_REDIS=false

# SQLite database file location (only used if DB_TYPE=sqlite)
SQLITE_PATH=./data/magnet.db

# Elasticsearch options: "true" or "false"
USE_ELASTICSEARCH=false

# Elasticsearch connection
ELASTICSEARCH_NODE=http://localhost:9200
ELASTICSEARCH_INDEX=magnets

# Component control options: "true" or "false"
RUN_DAEMON=true
RUN_WEBSERVER=true
```
You can also fine-tune the crawler performance in the `daemon.js` file:

```js
const p2p = P2PSpider({
  nodesMaxSize: 250,
  maxConnections: 500,
  timeout: 1000
});
```
Changing `nodesMaxSize` or `maxConnections` is not recommended, but raising `timeout` may increase indexing speed. Higher timeout values require more RAM; 5000 ms is the recommended maximum.
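One way to keep such tweaks safe is to merge any overrides onto the shipped defaults and cap the timeout at the recommended maximum. The helper below is a hypothetical sketch, not code from `daemon.js`; only the default values and the 5000 ms ceiling come from this README.

```javascript
// Default crawler settings as shipped in daemon.js.
const DEFAULTS = { nodesMaxSize: 250, maxConnections: 500, timeout: 1000 };
const MAX_TIMEOUT_MS = 5000; // above this, RAM use grows with little benefit

// Hypothetical helper: merge overrides onto the defaults, clamping timeout.
function crawlerConfig(overrides = {}) {
  const cfg = { ...DEFAULTS, ...overrides };
  cfg.timeout = Math.min(cfg.timeout, MAX_TIMEOUT_MS);
  return cfg;
}
```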
## Component Control
DHT Spider now allows you to run the daemon and webserver components independently:
- `RUN_DAEMON`: Set to "true" to run the P2P Spider daemon, or "false" to disable it
- `RUN_WEBSERVER`: Set to "true" to run the web server, or "false" to disable it
This flexibility allows you to:
- Run only the daemon for dedicated crawling
- Run only the webserver for serving existing data
- Run both components together (default behavior)
Example usage:

```sh
# Run both components (default)
node app.js

# Run only the daemon
RUN_WEBSERVER=false node app.js

# Run only the webserver
RUN_DAEMON=false node app.js
```
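Internally these flags are just strings from the environment. A minimal sketch of how such flags might be interpreted (the `parseFlag` helper is illustrative, not from the repo):

```javascript
// Interpret a "true"/"false" environment string, falling back to a default
// when the variable is unset. Anything other than the string "false"
// (case-insensitive) counts as enabled.
function parseFlag(value, fallback = true) {
  if (value === undefined || value === '') return fallback;
  return value.trim().toLowerCase() !== 'false';
}

const runDaemon = parseFlag(process.env.RUN_DAEMON);
const runWebserver = parseFlag(process.env.RUN_WEBSERVER);
console.log(`daemon: ${runDaemon}, webserver: ${runWebserver}`);
```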
## Database and Redis Configuration
DHT Spider supports both MongoDB and SQLite as database options, and Redis usage can be toggled on/off:
- `DB_TYPE`: Choose between "mongodb" or "sqlite" as your database
- `USE_REDIS`: Set to "true" to use Redis for caching recent infohashes, or "false" to disable Redis
- `SQLITE_PATH`: Path where the SQLite database file will be created (only used when `DB_TYPE=sqlite`)
SQLite is ideal for smaller deployments with reduced dependencies, while MongoDB is better for large-scale operations. Redis provides caching to prevent duplicate processing of recently seen infohashes.
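The dedup idea behind the Redis cache can be sketched in plain JavaScript: remember each infohash for a short window and skip reprocessing repeats. This in-memory version only illustrates the concept; the actual implementation uses Redis.

```javascript
// A tiny time-windowed "seen" cache for infohashes.
class SeenCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // infohash -> timestamp last seen
  }
  // Returns true the first time an infohash appears within the TTL window.
  firstSeen(infohash, now = Date.now()) {
    const last = this.seen.get(infohash);
    this.seen.set(infohash, now);
    return last === undefined || now - last > this.ttlMs;
  }
}
```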
## Elasticsearch Configuration
DHT Spider now includes Elasticsearch integration for powerful full-text search capabilities:
- `USE_ELASTICSEARCH`: Set to "true" to enable Elasticsearch integration
- `ELASTICSEARCH_NODE`: URL of your Elasticsearch server (default: `http://localhost:9200`)
- `ELASTICSEARCH_INDEX`: Name of the Elasticsearch index to use (default: `magnets`)
To bulk index existing data into Elasticsearch, run:
```sh
node utils/bulkIndexToElasticsearch.js
```
Elasticsearch provides significantly improved search performance and relevance, especially for large datasets. When enabled, search queries will use Elasticsearch instead of database queries.
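An Elasticsearch search request is just a JSON query body sent against the configured index. A sketch of what such a body could look like; the `name` and `files` field names here are assumptions for illustration, not necessarily this repo's mapping:

```javascript
// Build an Elasticsearch full-text query body for a search term.
function buildSearchBody(term, size = 25) {
  return {
    size,
    query: {
      multi_match: {
        query: term,
        fields: ['name^2', 'files'], // boost matches in the torrent name
      },
    },
    sort: [{ _score: 'desc' }],
  };
}
```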
## Features
- Real-time DHT network crawling and magnet link indexing
- WebSocket-based live updates on the web interface
- Searchable database of discovered magnet links
- Statistics page with database information
- Support for both MongoDB and SQLite databases
- Elasticsearch integration for powerful full-text search
- Redis caching for improved performance
- Responsive web interface with modern design
## Protocols

- bep_0003 (The BitTorrent Protocol Specification)
- bep_0005 (DHT Protocol)
- bep_0009 (Extension for Peers to Send Metadata Files)
- bep_0010 (Extension Protocol)
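The end product of the BEP 5 (DHT) plus BEP 9 (metadata exchange) flow is a 20-byte infohash and a torrent name, which the web UI presents as a magnet link. A sketch of that last step (the helper name is illustrative):

```javascript
// Build a magnet URI from a hex-encoded infohash and a display name,
// using the standard urn:btih form understood by BitTorrent clients.
function toMagnet(infohashHex, name) {
  if (!/^[0-9a-fA-F]{40}$/.test(infohashHex)) {
    throw new Error('expected a 40-character hex infohash');
  }
  let uri = `magnet:?xt=urn:btih:${infohashHex.toLowerCase()}`;
  if (name) uri += `&dn=${encodeURIComponent(name)}`;
  return uri;
}
```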
## Notes
Cluster mode does not work on Windows. On Linux and other UNIX-like operating systems, multiple instances can listen on the same UDP port, which is not possible on Windows due to operating system limitations.
## Notice

Please don't publish the data DHT Spider crawls to the internet, as it sometimes discovers sensitive, copyrighted, or adult material.
## Performance Optimization
To maximize performance, DHT Spider now includes several optimizations:
### 1. Redis Caching

Enable Redis by setting `USE_REDIS=true` in your `.env` file to significantly reduce database load:

```sh
# Redis options: "true" or "false"
USE_REDIS=true
```
### 2. Production Mode

Run the application in production mode for better performance:

```sh
npm run start:prod   # For the web server
npm run daemon:prod  # For the DHT crawler

# Or with PM2 (recommended for production)
pm2 start ecosystem.json
```
### 3. Optimized PM2 Configuration

The included `ecosystem.json` is configured for optimal performance:
- Web server runs in cluster mode with multiple instances
- DHT crawler runs in a single instance to avoid duplicate crawling
- Memory limits prevent excessive resource usage
### 4. WebSocket Optimizations
The WebSocket server includes:
- Message batching to reduce overhead
- Client connection health monitoring
- Throttled broadcasts to prevent excessive updates
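The batching idea above can be sketched as a queue that is flushed as one payload, so many discovered magnets become a single WebSocket frame. This is a simplified illustration of the concept, not the server's actual code:

```javascript
// Collect outgoing messages and deliver them in batches.
class Batcher {
  constructor(send, maxBatch = 50) {
    this.send = send;       // function that transmits one batched payload
    this.maxBatch = maxBatch;
    this.queue = [];
  }
  push(msg) {
    this.queue.push(msg);
    if (this.queue.length >= this.maxBatch) this.flush();
  }
  flush() {
    if (this.queue.length === 0) return;
    this.send(JSON.stringify(this.queue)); // one frame for the whole batch
    this.queue = [];
  }
}
```

In a real server, a timer would also call `flush()` every few hundred milliseconds so small batches are not delayed indefinitely.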
### 5. Elasticsearch Search Optimization

When dealing with large datasets, enable Elasticsearch for improved search performance:

```sh
# Elasticsearch options: "true" or "false"
USE_ELASTICSEARCH=true
```
## Monitoring Performance
Monitor system resources during operation:
```sh
pm2 monit
```
If the application is still slow:
- Increase server resources (RAM/CPU)
- Use a CDN for static assets
- Consider using a dedicated Redis server
- Consider using a dedicated Elasticsearch cluster
- Scale horizontally with a load balancer
## License
MIT