Paper2Code - AI-Powered Research Paper Implementation Tool
🧬 Transform Research Papers into Working Code
Automatically convert academic research papers into production-ready code implementations
Created by Hesham Haroon - AI Lead
🚀 Overview
Paper2Code is an AI-powered tool that converts research papers into working code using a multi-agent architecture. Built by Hesham Haroon, this standalone system analyzes academic papers and generates complete, testable code implementations.
✨ Key Features
📄 Intelligent Document Processing
- Multi-format Support: PDF, DOCX, HTML, TXT, Markdown
- Smart Segmentation: Handles large papers with token-efficient processing
- Academic Focus: Optimized for research paper structure and content
🤖 Multi-Agent Architecture
- Document Analysis Agent: Extracts algorithms, formulas, and technical details
- Code Planning Agent: Creates comprehensive implementation roadmaps
- Implementation Agent: Generates working code with proper structure
- Repository Agent: Discovers and integrates relevant GitHub repositories
⚡ Advanced Processing
- Configurable Modes: Fast mode vs. comprehensive analysis
- Token Optimization: Intelligent segmentation for large documents
- Error Handling: Robust processing with fallback mechanisms
🏗️ Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Research Paper │ -> │ Multi-Agent AI │ -> │ Working Code │
│ │ │ │ │ │
│ • PDF/DOCX/etc │ │ • Document Agent │ │ • Complete Impl │
│ • Algorithms │ │ • Planning Agent │ │ • Tests │
│ • Formulas │ │ • Code Agent │ │ • Documentation │
└─────────────────┘ └──────────────────┘ └─────────────────┘
Processing Pipeline:
- 📖 Document Analysis: Extract structure, algorithms, and technical details
- 🎯 Implementation Planning: Create comprehensive code implementation plan
- 🔍 Repository Discovery: Find relevant GitHub repositories for reference
- 💻 Code Generation: Produce complete, runnable implementations
- 🧪 Testing & Validation: Generate tests and verify functionality
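The five stages above can be read as a sequential pipeline in which each stage sees the results of the earlier ones. The following is a minimal sketch of that control flow only, assuming fast mode skips repository discovery as described under Usage. `PipelineState`, `run_pipeline`, and the placeholder stage functions are hypothetical names for illustration and are not Paper2Code's internal API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineState:
    """Hypothetical container for the results each stage leaves behind."""
    paper_text: str
    artifacts: Dict[str, object] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], object]


def run_pipeline(state: PipelineState, stages: Dict[str, Stage],
                 fast: bool = False) -> PipelineState:
    """Run stages in insertion order; in fast mode skip repository discovery."""
    for name, stage in stages.items():
        if fast and name == "repository_discovery":
            continue
        state.artifacts[name] = stage(state)
        state.completed.append(name)
    return state


# Placeholder stages (assumption): real stages would call LLM-backed agents.
STAGES: Dict[str, Stage] = {
    "document_analysis": lambda s: {"n_chars": len(s.paper_text)},
    "implementation_planning": lambda s: ["main.py"],
    "repository_discovery": lambda s: [],
    "code_generation": lambda s: {
        name: "# TODO" for name in s.artifacts["implementation_planning"]
    },
    "testing_validation": lambda s: {
        "generated_tests": len(s.artifacts["code_generation"])
    },
}

if __name__ == "__main__":
    final = run_pipeline(PipelineState("Example paper text"), STAGES, fast=True)
    print(final.completed)
```

Keeping the stages in an ordered mapping makes it easy to skip or swap a single stage without touching the others.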
📦 Installation
Prerequisites
- Python 3.9 or higher
- Node.js 16+ (for MCP servers)
- Git
Install from Source
# Clone the repository
git clone https://github.com/h9-tec/paper2code.git
cd paper2code
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js MCP servers
npm install -g @modelcontextprotocol/server-brave-search
npm install -g @modelcontextprotocol/server-filesystem
Configuration
- Copy configuration templates:
cp paper2code/config/config.yaml.template paper2code/config/config.yaml
cp paper2code/config/secrets.yaml.template paper2code/config/secrets.yaml
- Edit paper2code/config/secrets.yaml with your API keys:
openai:
  api_key: "your_openai_api_key_here"
  base_url: "https://api.openai.com/v1"
anthropic:
  api_key: "your_anthropic_api_key_here"
brave_search:
  api_key: "your_brave_api_key_here" # Optional
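Before the first run it can help to confirm that the template placeholders were actually replaced. The sketch below works on an already-parsed secrets mapping (for example, the result of loading secrets.yaml with a YAML parser). The function name `find_placeholder_keys` is hypothetical, and the placeholder pattern (`your_..._here`) is an assumption taken from the template values shown above.

```python
from typing import AbstractSet, List, Mapping


def find_placeholder_keys(
    secrets: Mapping[str, Mapping[str, str]],
    optional: AbstractSet[str] = frozenset({"brave_search"}),
) -> List[str]:
    """Return 'section.api_key' names that are empty or still template text.

    Sections listed in `optional` are skipped, since the Brave key is optional.
    """
    unresolved = []
    for section, values in secrets.items():
        if section in optional:
            continue
        value = values.get("api_key", "")
        # Assumption: untouched template values look like "your_..._here".
        is_placeholder = value.startswith("your_") and value.endswith("_here")
        if not value or is_placeholder:
            unresolved.append(f"{section}.api_key")
    return unresolved


if __name__ == "__main__":
    parsed = {
        "openai": {"api_key": "sk-example", "base_url": "https://api.openai.com/v1"},
        "anthropic": {"api_key": "your_anthropic_api_key_here"},
        "brave_search": {"api_key": "your_brave_api_key_here"},
    }
    for name in find_placeholder_keys(parsed):
        print(f"Missing or placeholder value: {name}")
```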
🎯 Usage
Command Line Interface
# Process a research paper PDF
python -m paper2code --file paper.pdf
# Process from URL
python -m paper2code --url https://arxiv.org/pdf/2301.12345.pdf
# Fast mode (skip repository indexing)
python -m paper2code --file paper.pdf --fast
# Custom output directory
python -m paper2code --file paper.pdf --output /path/to/output
# Enable debug mode
python -m paper2code --file paper.pdf --debug
Web Interface
# 1) Run API server (serves frontend build at /)
paper2code-api
# 2) Run frontend dev (HMR) in another terminal
cd frontend && npm run dev
# 3) Build frontend for production and serve from API static mount
cd frontend && npm run build
paper2code-api
Interactive Mode
# Start interactive session
python -m paper2code
# Follow the prompts to:
# 1. Select input method (file/URL)
# 2. Choose processing mode
# 3. Configure output options
Python API
import asyncio
from paper2code import Paper2CodeProcessor

async def main():
    # Initialize processor
    processor = Paper2CodeProcessor()
    # Process a paper (process_paper is awaited, so it must run inside a coroutine)
    result = await processor.process_paper("path/to/paper.pdf")
    # Check results
    if result.success:
        print(f"✅ Code generated at: {result.output_path}")
        print(f"📊 Files created: {len(result.files)}")
    else:
        print(f"❌ Error: {result.error}")

asyncio.run(main())
⚙️ Configuration
Processing Modes
# config.yaml
processing:
  mode: "comprehensive" # or "fast"
  enable_repository_search: true
  enable_segmentation: true
  segmentation_threshold: 50000 # characters
Document Segmentation
document_segmentation:
  enabled: true
  size_threshold_chars: 50000
  overlap_chars: 1000
  max_segments: 10
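To make the three segmentation settings concrete, here is a minimal sketch of overlapping character-based segmentation. It is an illustration under stated assumptions, not Paper2Code's actual segmenter: the function `segment_text` is hypothetical, the size threshold is assumed to double as the maximum segment length, and text beyond `max_segments` is simply dropped in this sketch.

```python
from typing import List


def segment_text(text: str, size_threshold_chars: int = 50000,
                 overlap_chars: int = 1000, max_segments: int = 10) -> List[str]:
    """Split text into overlapping segments of at most size_threshold_chars."""
    if overlap_chars >= size_threshold_chars:
        raise ValueError("overlap_chars must be smaller than size_threshold_chars")
    # Documents at or below the threshold are processed in one piece.
    if len(text) <= size_threshold_chars:
        return [text]
    step = size_threshold_chars - overlap_chars
    segments: List[str] = []
    start = 0
    while start < len(text) and len(segments) < max_segments:
        segments.append(text[start:start + size_threshold_chars])
        if start + size_threshold_chars >= len(text):
            break  # the last segment already reaches the end of the text
        start += step
    return segments


if __name__ == "__main__":
    parts = segment_text("x" * 120000)
    print(len(parts), [len(p) for p in parts])
```

The overlap means the tail of one segment is repeated at the head of the next, so an algorithm description that straddles a boundary still appears whole in at least one segment, provided it is shorter than the overlap.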
Output Configuration
output:
  base_directory: "./output"
  create_readme: true
  include_analysis: true
  code_format: "python" # primary language
📁 Output Structure
output/
├── paper_name/
│   ├── analysis/
│   │   ├── document_analysis.md
│   │   ├── algorithm_extraction.yaml
│   │   └── implementation_plan.yaml
│   ├── code/
│   │   ├── __init__.py
│   │   ├── main.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   └── core_model.py
│   │   ├── algorithms/
│   │   │   ├── __init__.py
│   │   │   └── main_algorithm.py
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   └── helpers.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_main.py
│   │       └── test_models.py
│   ├── docs/
│   │   ├── README.md
│   │   ├── implementation_notes.md
│   │   └── api_documentation.md
│   └── references/
│       ├── github_repositories.json
│       └── reference_papers.txt
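After a run, a quick way to inspect what was produced is to walk the paper's output directory and compare it with the layout above. This is a small sketch using only the standard library; the helper names `summarize_output` and `missing_subdirs` are hypothetical, and the four top-level folder names are taken from the tree shown above.

```python
from pathlib import Path
from typing import Dict, List

# Top-level folders expected under output/<paper_name>/, per the tree above.
EXPECTED_SUBDIRS = ("analysis", "code", "docs", "references")


def missing_subdirs(paper_dir: Path) -> List[str]:
    """Return the expected subdirectories that do not exist."""
    return [name for name in EXPECTED_SUBDIRS if not (paper_dir / name).is_dir()]


def summarize_output(paper_dir: Path) -> Dict[str, List[str]]:
    """Map each existing expected subdirectory to its files (sorted, relative)."""
    summary: Dict[str, List[str]] = {}
    for name in EXPECTED_SUBDIRS:
        sub = paper_dir / name
        if sub.is_dir():
            summary[name] = sorted(
                p.relative_to(sub).as_posix() for p in sub.rglob("*") if p.is_file()
            )
    return summary


if __name__ == "__main__":
    root = Path("output") / "paper_name"
    print("Missing:", missing_subdirs(root))
    for folder, files in summarize_output(root).items():
        print(f"{folder}/: {len(files)} file(s)")
```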
🔧 Advanced Features
Custom Prompts
Modify prompts in paper2code/prompts/ for specialized domains:
- paper_analysis.py: Document analysis prompts
- code_generation.py: Code implementation prompts
- planning.py: Implementation planning prompts
MCP Server Extensions
Add custom tools in paper2code/tools/:
# custom_tool_server.py
from paper2code.tools.base import BaseMCPServer

class CustomToolServer(BaseMCPServer):
    def setup_tools(self):
        # Define custom tools
        pass
Environment Variables
export PAPER2CODE_CONFIG_PATH="/custom/config/path"
export PAPER2CODE_OUTPUT_DIR="/custom/output/path"
export PAPER2CODE_LOG_LEVEL="DEBUG"
export PAPER2CODE_CACHE_DIR="/custom/cache/path"
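For scripts that wrap Paper2Code, the same variables can be read from Python. The sketch below shows one way to collect them with fallbacks. The `RuntimeSettings` class and `load_settings` function are hypothetical, the `./output` fallback mirrors `output.base_directory` from the configuration section, and the `INFO` default log level is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeSettings:
    """Hypothetical holder for the four PAPER2CODE_* variables."""
    config_path: Optional[str]
    output_dir: str
    log_level: str
    cache_dir: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> RuntimeSettings:
    """Read PAPER2CODE_* variables from `env`, applying assumed defaults."""
    return RuntimeSettings(
        config_path=env.get("PAPER2CODE_CONFIG_PATH"),
        output_dir=env.get("PAPER2CODE_OUTPUT_DIR", "./output"),
        # Assumption: INFO is used when no level is set.
        log_level=env.get("PAPER2CODE_LOG_LEVEL", "INFO").upper(),
        cache_dir=env.get("PAPER2CODE_CACHE_DIR"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.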
🧪 Testing
# Run all tests
python -m pytest tests/
# Run specific test categories
python -m pytest tests/test_agents.py
python -m pytest tests/test_workflows.py
# Test with sample paper
python -m paper2code --file samples/sample_paper.pdf --test-mode
📊 Supported Paper Types
- Machine Learning: Neural networks, algorithms, architectures
- Computer Vision: Image processing, detection, recognition
- Natural Language Processing: Text analysis, language models
- Reinforcement Learning: Agents, environments, policies
- Systems: Distributed systems, databases, protocols
- Theory: Mathematical proofs, algorithms, complexity
🔍 Troubleshooting
Common Issues
- API Rate Limits
# Solution: Configure rate limiting in config.yaml
api:
  rate_limit_delay: 1.0 # seconds between requests
- Large Papers (Token Limits)
# Solution: Enable document segmentation
python -m paper2code --file large_paper.pdf --segment-threshold 30000
- Memory Issues
# Solution: Use fast mode
python -m paper2code --file paper.pdf --fast --no-repository-search
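To illustrate what a fixed delay between requests (the `rate_limit_delay` setting above) amounts to, here is a minimal sketch of a minimum-interval limiter. The class `MinIntervalLimiter` is hypothetical and only shows the idea; how Paper2Code applies the setting internally is not documented here.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce at least `delay` seconds between consecutive calls to wait()."""

    def __init__(self, delay: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


if __name__ == "__main__":
    limiter = MinIntervalLimiter(delay=1.0)
    for i in range(3):
        limiter.wait()  # call this right before each API request
        print(f"request {i} sent")
```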
Debug Mode
# Enable detailed logging
python -m paper2code --file paper.pdf --debug --verbose
# Check logs
tail -f logs/paper2code.log
🎛️ Performance Tuning
Fast Mode Options
# Optimize for speed
performance:
  fast_mode: true
  skip_repository_search: true
  parallel_processing: true
  cache_enabled: true
  max_concurrent_agents: 3
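A cap such as `max_concurrent_agents` is commonly implemented with a semaphore. The following sketch shows that general technique with `asyncio`; the helper `run_bounded` and the demo job are hypothetical and not Paper2Code's scheduler.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def run_bounded(tasks: Sequence[Callable[[], Awaitable[T]]],
                      max_concurrent: int = 3) -> List[T]:
    """Run all tasks, allowing at most `max_concurrent` to execute at once.

    Results are returned in the same order as `tasks`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # blocks while max_concurrent tasks are running
            return await task()

    return list(await asyncio.gather(*(guarded(t) for t in tasks)))


if __name__ == "__main__":
    def make_job(i: int) -> Callable[[], Awaitable[str]]:
        async def job() -> str:
            await asyncio.sleep(0.1)  # stands in for an agent call
            return f"agent-{i} done"
        return job

    print(asyncio.run(run_bounded([make_job(i) for i in range(5)])))
```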
Memory Optimization
# Optimize for memory usage
memory:
  max_document_size: 100000 # chars
  clear_cache_frequency: 100 # operations
  use_streaming: true
🤝 Contributing
- Fork the Repository
Fork https://github.com/h9-tec/paper2code on GitHub, or use the GitHub CLI:
gh repo fork h9-tec/paper2code --clone
- Create Feature Branch
git checkout -b feature/amazing-feature
- Make Changes
  - Add tests for new features
  - Update documentation
  - Follow code style guidelines
- Submit Pull Request
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature
Development Setup
# Install development dependencies
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run tests before committing
python -m pytest tests/
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🆘 Support
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Email: support@paper2code.dev
🙏 Acknowledgments
- Hesham Haroon: AI Lead and Creator of Paper2Code
- **DeepCo
