# p2pspider - DHT Spider
A daemon that crawls the BitTorrent DHT network and an Express web application that provides a searchable database of magnet links with real-time updates through WebSockets.
## Intro

DHT Spider can index over 1 million magnets per 24 hours on modest hardware (2 GB of RAM and roughly a 2 MB/s connection). It is resource-intensive and will use whatever CPU and RAM are available; limits can be set in the `ecosystem.json` file. With 2 GB of RAM, the recommended setup is 8 daemon instances and 2 webserver instances, each limited to 175 MB.
## Getting Started

```sh
# Install dependencies
npm install

# Set up configuration
cp .env.sample .env
# Edit the .env file as needed

# Run the unified application (both crawler and web interface)
npm start

# Alternatively, use PM2 for process management
npm install -g pm2
npm run start:pm2   # uses the ecosystem.json file
pm2 monit
```
## Configuration
You will need to have port 6881 (or your configured port) open to the internet for the DHT crawler to function properly.
The application can be configured through the .env file:
```sh
# Database and server configuration
REDIS_URI=redis://127.0.0.1:6379
MONGO_URI=mongodb://127.0.0.1/magnetdb
SITE_HOSTNAME=http://127.0.0.1:8080
SITE_NAME=DHT Spider
SITE_PORT=8080

# Database options: "mongodb" or "sqlite"
DB_TYPE=sqlite

# Redis options: "true" or "false"
USE_REDIS=false

# SQLite database file location (only used if DB_TYPE=sqlite)
SQLITE_PATH=./data/magnet.db

# Elasticsearch options: "true" or "false"
USE_ELASTICSEARCH=false

# Elasticsearch connection
ELASTICSEARCH_NODE=http://localhost:9200
ELASTICSEARCH_INDEX=magnets

# Component control options: "true" or "false"
RUN_DAEMON=true
RUN_WEBSERVER=true
```
You can also fine-tune the crawler performance in the `daemon.js` file:

```js
const p2p = P2PSpider({
  nodesMaxSize: 250,
  maxConnections: 500,
  timeout: 1000
});
```
Changing `nodesMaxSize` or `maxConnections` is not recommended, but raising `timeout` may increase indexing speed. Higher timeout values require more RAM; 5000 ms is the recommended maximum.
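One way to keep such tweaks safe is to merge any overrides onto the shipped defaults and cap the timeout at the recommended maximum. The helper below is a hypothetical sketch, not code from `daemon.js`; only the default values and the 5000 ms ceiling come from this README.

```javascript
// Default crawler settings as shipped in daemon.js.
const DEFAULTS = { nodesMaxSize: 250, maxConnections: 500, timeout: 1000 };
const MAX_TIMEOUT_MS = 5000; // above this, RAM use grows with little benefit

// Hypothetical helper: merge overrides onto the defaults, clamping timeout.
function crawlerConfig(overrides = {}) {
  const cfg = { ...DEFAULTS, ...overrides };
  cfg.timeout = Math.min(cfg.timeout, MAX_TIMEOUT_MS);
  return cfg;
}
```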
## Component Control
DHT Spider now allows you to run the daemon and webserver components independently:
- `RUN_DAEMON`: Set to "true" to run the P2P Spider daemon, or "false" to disable it
- `RUN_WEBSERVER`: Set to "true" to run the web server, or "false" to disable it
This flexibility allows you to:
- Run only the daemon for dedicated crawling
- Run only the webserver for serving existing data
- Run both components together (default behavior)
Example usage:

```sh
# Run both components (default)
node app.js

# Run only the daemon
RUN_WEBSERVER=false node app.js

# Run only the webserver
RUN_DAEMON=false node app.js
```
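Internally these flags are just strings from the environment. A minimal sketch of how such flags might be interpreted (the `parseFlag` helper is illustrative, not from the repo):

```javascript
// Interpret a "true"/"false" environment string, falling back to a default
// when the variable is unset. Anything other than the string "false"
// (case-insensitive) counts as enabled.
function parseFlag(value, fallback = true) {
  if (value === undefined || value === '') return fallback;
  return value.trim().toLowerCase() !== 'false';
}

const runDaemon = parseFlag(process.env.RUN_DAEMON);
const runWebserver = parseFlag(process.env.RUN_WEBSERVER);
console.log(`daemon: ${runDaemon}, webserver: ${runWebserver}`);
```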
## Database and Redis Configuration
DHT Spider supports both MongoDB and SQLite as database options, and Redis usage can be toggled on/off:
- `DB_TYPE`: Choose between "mongodb" or "sqlite" as your database
- `USE_REDIS`: Set to "true" to use Redis for caching recent infohashes, or "false" to disable Redis
- `SQLITE_PATH`: Path where the SQLite database file will be created (only used when `DB_TYPE=sqlite`)
SQLite is ideal for smaller deployments with reduced dependencies, while MongoDB is better for large-scale operations. Redis provides caching to prevent duplicate processing of recently seen infohashes.
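The dedup idea behind the Redis cache can be sketched in plain JavaScript: remember each infohash for a short window and skip reprocessing repeats. This in-memory version only illustrates the concept; the actual implementation uses Redis.

```javascript
// A tiny time-windowed "seen" cache for infohashes.
class SeenCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // infohash -> timestamp last seen
  }
  // Returns true the first time an infohash appears within the TTL window.
  firstSeen(infohash, now = Date.now()) {
    const last = this.seen.get(infohash);
    this.seen.set(infohash, now);
    return last === undefined || now - last > this.ttlMs;
  }
}
```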
## Elasticsearch Configuration
DHT Spider now includes Elasticsearch integration for powerful full-text search capabilities:
- `USE_ELASTICSEARCH`: Set to "true" to enable Elasticsearch integration
- `ELASTICSEARCH_NODE`: URL of your Elasticsearch server (default: `http://localhost:9200`)
- `ELASTICSEARCH_INDEX`: Name of the Elasticsearch index to use (default: `magnets`)
To bulk index existing data into Elasticsearch, run:
```sh
node utils/bulkIndexToElasticsearch.js
```
Elasticsearch provides significantly improved search performance and relevance, especially for large datasets. When enabled, search queries will use Elasticsearch instead of database queries.
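An Elasticsearch search request is just a JSON query body sent against the configured index. A sketch of what such a body could look like; the `name` and `files` field names here are assumptions for illustration, not necessarily this repo's mapping:

```javascript
// Build an Elasticsearch full-text query body for a search term.
function buildSearchBody(term, size = 25) {
  return {
    size,
    query: {
      multi_match: {
        query: term,
        fields: ['name^2', 'files'], // boost matches in the torrent name
      },
    },
    sort: [{ _score: 'desc' }],
  };
}
```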
## Features
- Real-time DHT network crawling and magnet link indexing
- WebSocket-based live updates on the web interface
- Searchable database of discovered magnet links
- Statistics page with database information
- Support for both MongoDB and SQLite databases
- Elasticsearch integration for powerful full-text search
- Redis caching for improved performance
- Responsive web interface with modern design
## Protocols

- bep_0003 (The BitTorrent Protocol Specification)
- bep_0005 (DHT Protocol)
- bep_0009 (Extension for Peers to Send Metadata Files)
- bep_0010 (Extension Protocol)
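The end product of the BEP 5 (DHT) plus BEP 9 (metadata exchange) flow is a 20-byte infohash and a torrent name, which the web UI presents as a magnet link. A sketch of that last step (the helper name is illustrative):

```javascript
// Build a magnet URI from a hex-encoded infohash and a display name,
// using the standard urn:btih form understood by BitTorrent clients.
function toMagnet(infohashHex, name) {
  if (!/^[0-9a-fA-F]{40}$/.test(infohashHex)) {
    throw new Error('expected a 40-character hex infohash');
  }
  let uri = `magnet:?xt=urn:btih:${infohashHex.toLowerCase()}`;
  if (name) uri += `&dn=${encodeURIComponent(name)}`;
  return uri;
}
```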
## Notes
Cluster mode does not work on Windows. On Linux and other UNIX-like operating systems, multiple instances can listen on the same UDP port, which is not possible on Windows due to operating system limitations.
## Notice

Please don't publish the data DHT Spider crawls to the internet, as it sometimes discovers sensitive, copyrighted, or adult material.
## Performance Optimization
To maximize performance, DHT Spider now includes several optimizations:
### 1. Redis Caching

Enable Redis by setting `USE_REDIS=true` in your `.env` file to significantly reduce database load:

```sh
# Redis options: "true" or "false"
USE_REDIS=true
```
### 2. Production Mode

Run the application in production mode for better performance:

```sh
npm run start:prod   # For the web server
npm run daemon:prod  # For the DHT crawler

# Or with PM2 (recommended for production)
pm2 start ecosystem.json
```
### 3. Optimized PM2 Configuration

The included `ecosystem.json` is configured for optimal performance:
- Web server runs in cluster mode with multiple instances
- DHT crawler runs in a single instance to avoid duplicate crawling
- Memory limits prevent excessive resource usage
### 4. WebSocket Optimizations
The WebSocket server includes:
- Message batching to reduce overhead
- Client connection health monitoring
- Throttled broadcasts to prevent excessive updates
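The batching idea above can be sketched as a queue that is flushed as one payload, so many discovered magnets become a single WebSocket frame. This is a simplified illustration of the concept, not the server's actual code:

```javascript
// Collect outgoing messages and deliver them in batches.
class Batcher {
  constructor(send, maxBatch = 50) {
    this.send = send;       // function that transmits one batched payload
    this.maxBatch = maxBatch;
    this.queue = [];
  }
  push(msg) {
    this.queue.push(msg);
    if (this.queue.length >= this.maxBatch) this.flush();
  }
  flush() {
    if (this.queue.length === 0) return;
    this.send(JSON.stringify(this.queue)); // one frame for the whole batch
    this.queue = [];
  }
}
```

In a real server, a timer would also call `flush()` every few hundred milliseconds so small batches are not delayed indefinitely.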
### 5. Elasticsearch Search Optimization

When dealing with large datasets, enable Elasticsearch for improved search performance:

```sh
# Elasticsearch options: "true" or "false"
USE_ELASTICSEARCH=true
```
## Monitoring Performance
Monitor system resources during operation:
```sh
pm2 monit
```
If the application is still slow:
- Increase server resources (RAM/CPU)
- Use a CDN for static assets
- Consider using a dedicated Redis server
- Consider using a dedicated Elasticsearch cluster
- Scale horizontally with a load balancer
## License
MIT