RAPTOR
RNA-seq Analysis Pipeline Testing and Optimization Resource - Intelligent pipeline selection and comprehensive benchmarking.
🦖 What is RAPTOR?
RAPTOR is a comprehensive framework for RNA-seq analysis that makes sophisticated differential expression workflows accessible to everyone. Stop wondering which pipeline to use or what thresholds to set—RAPTOR provides ML-powered recommendations and ensemble methods for robust, reproducible results.
Why RAPTOR?
| Challenge | RAPTOR Solution |
|-----------|-----------------|
| Which pipeline should I use? | ✅ ML recommendations based on 32 dataset features |
| Which DE method (DESeq2/edgeR/limma)? | ✅ Ensemble analysis combines all methods |
| What thresholds should I use? | ✅ 4 optimization methods for data-driven cutoffs |
| Is my data quality good enough? | ✅ 6 outlier detection methods with consensus |
| How do I know results are reliable? | ✅ Ensemble consensus with direction checking |
| What if methods disagree? | ✅ Brown's method accounts for correlation |
✨ Features
🎯 Ensemble Analysis (NEW!)
- 5 statistical combination methods
- Fisher's, Brown's, RRA, Voting, Weighted
- Direction consistency checking
- Meta-analysis fold changes
- 33% fewer false positives
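As an illustration of the simplest combination method named above, here is a minimal sketch of Fisher's method using only the standard library. The function name `fisher_combined_pvalue` is hypothetical and is not RAPTOR's API. Fisher's method assumes the combined p-values are independent; DESeq2, edgeR, and limma are run on the same counts, so their p-values are correlated, which is the situation Brown's method is meant to correct for.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine p-values for one gene with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom when the k p-values are independent.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0) when a tool reports a p-value of exactly zero.
    clamped = [min(max(p, 1e-300), 1.0) for p in pvalues]
    half_stat = -sum(math.log(p) for p in clamped)  # this is X / 2
    # For even degrees of freedom 2k, the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no SciPy is needed.
    term, total = 1.0, 1.0
    for i in range(1, len(clamped)):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for one gene from three DE methods.
print(fisher_combined_pvalue([0.01, 0.03, 0.02]))
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the formula.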
⚙️ Parameter Optimization (NEW!)
- 4 validated optimization methods
- Ground truth, FDR control, Stability, Reproducibility
- Automated threshold selection
- Performance metrics tracking
- Publication-ready results
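The README does not describe how the FDR control method works internally. As a point of reference, the standard Benjamini-Hochberg adjustment that FDR-based cutoffs usually build on can be sketched as follows; both function names are hypothetical and are not part of RAPTOR.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(pvalues, fdr_target=0.05):
    """Count genes whose adjusted p-value is at or below the FDR target."""
    return sum(q <= fdr_target for q in benjamini_hochberg(pvalues))


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.20, 0.03, 0.004, 0.60]
print(benjamini_hochberg(raw))
print(count_significant(raw, fdr_target=0.05))
```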
📊 32-Feature Data Profiling
- BCV (Biological Coefficient of Variation)
- Sample characteristics & balance
- Dispersion patterns
- Sparsity analysis
- ML-ready feature vectors
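BCV is the square root of the negative binomial dispersion, where the variance of a count with mean μ is modeled as μ + φμ². How RAPTOR estimates it is not stated here, so the following is only a rough method-of-moments sketch under that model; `moment_bcv` is a hypothetical helper, and tools such as edgeR use more careful estimators.

```python
import math
import statistics


def moment_bcv(counts_by_gene):
    """Rough BCV estimate from replicate counts of a single condition.

    counts_by_gene maps a gene ID to a list of normalized counts, one per
    replicate. Genes with fewer than two replicates or zero mean are skipped.
    """
    dispersions = []
    for counts in counts_by_gene.values():
        if len(counts) < 2:
            continue
        mean = statistics.mean(counts)
        if mean <= 0:
            continue
        var = statistics.variance(counts)
        # Solve var = mean + phi * mean^2 for phi; clamp negatives to zero.
        dispersions.append(max(0.0, (var - mean) / mean**2))
    if not dispersions:
        raise ValueError("no genes with usable counts")
    # The median keeps a few extreme genes from dominating the summary.
    return math.sqrt(statistics.median(dispersions))


# Hypothetical normalized counts for three replicates of one condition.
example = {"geneA": [80, 100, 120], "geneB": [400, 520, 460]}
print(f"BCV ~ {moment_bcv(example):.3f}")
```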
🤖 ML-Powered Recommendations
- Random Forest classifier
- 32-feature profiling
- Confidence scoring
- Alternative suggestions
- Feature importance analysis
🔬 6 Production Pipelines
- Salmon ⭐ (recommended)
- Kallisto (fastest)
- STAR + featureCounts
- STAR + RSEM
- STAR + Salmon (unique!)
- HISAT2 + featureCounts
📈 Quality Assessment
- 6 outlier detection methods
- Consensus-based reporting
- Batch effect detection
- Actionable recommendations
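The six detection methods are not named in this README. To illustrate the consensus idea only, the sketch below applies two common detectors (a MAD-based robust z-score and an IQR fence) to per-sample library sizes and reports samples flagged by at least `min_votes` detectors. All function names and cutoffs are assumptions chosen for the example.

```python
import statistics


def mad_outliers(sizes, cutoff=3.5):
    """Flag samples whose robust z-score (based on the MAD) exceeds cutoff."""
    values = list(sizes.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return set()
    return {s for s, v in sizes.items() if 0.6745 * abs(v - med) / mad > cutoff}


def iqr_outliers(sizes, k=1.5):
    """Flag samples outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(list(sizes.values()), n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {s for s, v in sizes.items() if v < low or v > high}


def consensus_outliers(sizes, detectors, min_votes=2):
    """Return samples flagged by at least min_votes detectors, sorted by name."""
    votes = {}
    for detect in detectors:
        for sample in detect(sizes):
            votes[sample] = votes.get(sample, 0) + 1
    return sorted(s for s, n in votes.items() if n >= min_votes)


# Hypothetical library sizes in units of 100,000 reads.
sizes = {"s1": 201, "s2": 198, "s3": 205, "s4": 210, "s5": 195, "s6": 20}
print(consensus_outliers(sizes, [mad_outliers, iqr_outliers]))
```

Requiring agreement between detectors trades some sensitivity for fewer false alarms, which is the usual motivation for consensus reporting.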
🎨 Interactive Dashboard (Verified in v2.2.1)
All 9 pages functionally tested with real data:
- Quality Assessment — 7 visualization tabs: PCA (2D/3D), sample correlation, expression distribution, RLE plot, dendrogram, mean-variance/BCV
- Data Profiler — 32-feature extraction with professional CSS styling
- Pipeline Recommender — rule-based recommendations with confidence scoring
- Import DE — drag-and-drop DESeq2/edgeR/limma result files
- Parameter Optimization — interactive sliders + 4 scientific methods (FDR Control, Ground Truth, Stability, Reproducibility)
- Ensemble Analysis — Fisher's, Brown's, RRA with column name mapping
- Reports — generate QC, DE, and ensemble reports
- Settings — configure analysis parameters
- Visualization — 12 plot types including volcano, MA, heatmap, and 7 gene expression styles (box, violin, beeswarm, raincloud, lollipop, heatmap bar, forest plot)
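For readers unfamiliar with the RLE plot listed under Quality Assessment, relative log expression is usually computed as each gene's log count minus that gene's median log count across samples. The sketch below follows that standard definition and summarizes each sample by its median deviation; `rle_medians` and the `log2(count + 1)` transform are assumptions for illustration, not the dashboard's code.

```python
import math
import statistics


def rle_medians(counts, samples):
    """Return each sample's median relative log expression.

    counts maps a gene ID to a list of counts aligned with samples.
    """
    per_sample = {s: [] for s in samples}
    for values in counts.values():
        logs = [math.log2(v + 1) for v in values]
        gene_median = statistics.median(logs)
        # Deviation of each sample from the gene's typical level.
        for sample, log_value in zip(samples, logs):
            per_sample[sample].append(log_value - gene_median)
    return {s: statistics.median(devs) for s, devs in per_sample.items()}


# Hypothetical counts: sample "c" is shifted upward for every gene.
samples = ["a", "b", "c"]
counts = {"gene1": [7, 7, 31], "gene2": [15, 15, 63]}
print(rle_medians(counts, samples))
```

Samples whose median sits far from zero, like `c` here, are the ones an RLE plot makes visible.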
🆕 What's New in v2.2.1
Dashboard verified and enhanced — all 9 Streamlit pages tested with real data:
- 16 bugs fixed across Import DE, Ensemble, Optimization, Reports, and Quality pages
- Quality Assessment expanded to 7 visualization tabs (PCA 2D/3D, correlation, RLE, dendrogram, BCV)
- Parameter Optimization now integrates all 4 scientific methods from Module 8
- Visualization completely rewritten with 12 plot types and 7 gene expression styles
- Professional styling across all pages (emoji cleanup, CSS cards, clean typography)
- 413 tests passing, 0 failures, 55 CLI commands verified
- Fixed the column-name mismatch between Module 7's standardized output and the column names Modules 8 and 9 expect
🚀 Quick Start
Option 1: Interactive Dashboard (Recommended)
Install with dashboard support:
pip install "raptor-rnaseq[dashboard]"
Launch the dashboard — choose the method that matches your setup:
| Scenario | Command |
|----------|---------|
| pip install (recommended) | python -m raptor.launch_dashboard |
| Cloned from GitHub | python launch_dashboard.py |
| Direct streamlit | python -m streamlit run raptor/dashboard/app.py |
| Inside a virtual environment | Activate venv first, then any command above |
The dashboard opens at http://localhost:8501. Upload data → Profile → Get recommendation → Run ensemble → Done!
Detailed instructions by setup:
<details>
<summary><strong>A. Installed via pip (most users)</strong></summary>
# Install
pip install "raptor-rnaseq[dashboard]"
# Launch
python -m raptor.launch_dashboard
</details>
<details>
<summary><strong>B. Cloned from GitHub (developers)</strong></summary>
git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR
pip install -e ".[dashboard]"
# Launch from repo root
python launch_dashboard.py
# Or directly
python -m streamlit run raptor/dashboard/app.py
</details>
<details>
<summary><strong>C. Using a virtual environment</strong></summary>
# Create and activate venv
python -m venv .venv
# Windows
.venv\Scripts\Activate
# Linux/macOS
source .venv/bin/activate
# Install and launch
pip install "raptor-rnaseq[dashboard]"
python -m raptor.launch_dashboard
</details>
<details>
<summary><strong>D. Conda environment</strong></summary>
conda env create -f environment.yml
conda activate raptor
pip install streamlit altair
python -m streamlit run raptor/dashboard/app.py
</details>
Option 2: Command Line
# 1. Quality check
raptor qc --counts counts.csv --metadata metadata.csv
# 2. Profile your data
raptor profile --counts counts.csv --metadata metadata.csv --group-column condition
# 3. Get ML recommendation
raptor recommend --profile profile.json --method ml
# 4. Import DE results from different methods
raptor import-de --input deseq2.csv --method deseq2
raptor import-de --input edger.csv --method edger
raptor import-de --input limma.csv --method limma
# 5. Optimize thresholds (NEW!)
raptor optimize --de-result de_results.csv --method fdr-control --fdr-target 0.05
# 6. Ensemble analysis - combine all methods (NEW!)
raptor ensemble-compare --deseq2 deseq2.csv --edger edger.csv --limma limma.csv
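If you want to script the first three steps rather than type them, one option is a small Python wrapper around the same commands. This is a sketch under assumptions: it uses only the subcommands and flags shown above, it assumes the profile step produces the `profile.json` that the recommend step reads, and `build_commands` and `run_all` are hypothetical helper names.

```python
import shlex
import subprocess


def build_commands(counts, metadata, group_column="condition"):
    """Build the qc, profile, and recommend commands as argument lists."""
    return [
        ["raptor", "qc", "--counts", counts, "--metadata", metadata],
        ["raptor", "profile", "--counts", counts, "--metadata", metadata,
         "--group-column", group_column],
        # Assumes the profile step above wrote profile.json.
        ["raptor", "recommend", "--profile", "profile.json", "--method", "ml"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(cmd, check=True)


run_all(build_commands("counts.csv", "metadata.csv"))
```

Passing argument lists instead of a single shell string avoids quoting problems with file names that contain spaces.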
Option 3: Python API
from raptor import (
    quick_quality_check,
    profile_data_quick,
    recommend_pipeline,
    optimize_with_fdr_control,
    ensemble_brown,
)
# 1. Quality check
qc_report = quick_quality_check('counts.csv', 'metadata.csv')
print(f"Outliers: {qc_report.outliers}")
# 2. Profile data (32 features extracted)
profile = profile_data_quick('counts.csv', 'metadata.csv', group_column='condition')
print(f"BCV: {profile.bcv:.3f} ({profile.bcv_category})")
# 3. Get ML recommendation
recommendation = recommend_pipeline(profile_file='profile.json', method='ml')
print(f"Recommended: {recommendation.pipeline_name} (confidence: {recommendation.confidence:.2f})")
# 4. After running DE analysis, optimize thresholds (NEW!)
# de_result is a placeholder for DE results you have already loaded
result = optimize_with_fdr_control(de_result, fdr_target=0.05)
print(f"Optimal thresholds: {result.optimal_threshold}")
# 5. Ensemble analysis - combine DESeq2, edgeR, limma (NEW!)
# deseq2_result, edger_result, and limma_result are placeholders for per-method DE results
consensus = ensemble_brown({
    'deseq2': deseq2_result,
    'edger': edger_result,
    'limma': limma_result
})
print(f"Consensus DE genes: {len(consensus.consensus_genes)}")
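To make the idea of direction consistency concrete, here is a minimal voting sketch in plain Python. It is not how `ensemble_brown` works; the input layout (method name mapped to gene mapped to a `(log2fc, padj)` tuple), the function name, and the thresholds are all assumptions for the example. A gene is kept only if enough methods call it significant and none of the significant calls point the opposite way.

```python
def direction_consistent_votes(results, alpha=0.05, min_votes=2):
    """Return {gene: 'up' or 'down'} for genes with consistent support.

    results maps method name -> {gene: (log2fc, adjusted_p)}.
    """
    genes = set().union(*(table.keys() for table in results.values()))
    consensus = {}
    for gene in genes:
        up = down = 0
        for table in results.values():
            if gene not in table:
                continue
            log2fc, padj = table[gene]
            if padj <= alpha:
                if log2fc > 0:
                    up += 1
                elif log2fc < 0:
                    down += 1
        # A significant call in the opposite direction disqualifies the gene.
        if up >= min_votes and down == 0:
            consensus[gene] = "up"
        elif down >= min_votes and up == 0:
            consensus[gene] = "down"
    return consensus


# Hypothetical results: g2 is significant everywhere but edgeR disagrees on sign.
results = {
    "deseq2": {"g1": (2.0, 0.001), "g2": (1.5, 0.01), "g3": (-1.0, 0.2)},
    "edger": {"g1": (1.8, 0.002), "g2": (-1.2, 0.03), "g3": (-1.1, 0.04)},
    "limma": {"g1": (1.9, 0.01), "g2": (1.4, 0.02), "g3": (-0.9, 0.03)},
}
print(sorted(direction_consistent_votes(results).items()))
```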
📦 Installation
Requirements
- Python: 3.8 - 3.12
- R: 4.0+ (optional, for Module 6 DE analysis)
- RAM: 4GB minimum (16GB recommended for pipelines)
- Disk: 500MB (Python package) / 5-8GB (with bioinformatics tools)
Install from PyPI (Recommended)
# Basic installation
pip install raptor-rnaseq
# With dashboard support
pip install "raptor-rnaseq[dashboard]"
# With all features
pip install "raptor-rnaseq[all]"
# Development installation
pip install "raptor-rnaseq[dev]"
Conda Installation
Core environment (Python only, ~500MB, 5-10 min):
conda env create -f environment.yml
conda activate raptor
Full environment (with STAR, Salmon, Kallisto, R, ~5-8GB, 30-60 min):
conda env create -f environment-full.yml
conda activate raptor-full
See docs/CONDA_ENVIRONMENTS.md for a detailed comparison.
Install from Source
# Clone repository
git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR
# Install in editable mode
pip install -e .
# Or with development tools
pip install -e ".[dev]"
# Verify installation
raptor
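You can also check the installation from Python. The sketch below uses only the standard library: it looks up the installed version of the `raptor-rnaseq` distribution (the name used in the pip commands above) and checks the interpreter against the 3.8 to 3.12 range listed under Requirements. The helper names are hypothetical.

```python
import sys
from importlib import metadata


def installed_version(dist_name="raptor-rnaseq"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_supported(version=None):
    """Check a (major, minor) tuple against the documented 3.8-3.12 range."""
    if version is None:
        version = sys.version_info[:2]
    return (3, 8) <= tuple(version) <= (3, 12)


print("raptor-rnaseq version:", installed_version() or "not installed")
print("Python version supported:", python_supported())
```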
