<p align="center"> <img src="https://img.shields.io/badge/🦖_RAPTOR-v2.2.1-brightgreen?style=for-the-badge" alt="RAPTOR v2.2.1"/> </p> <h1 align="center">RAPTOR</h1> <h3 align="center">RNA-seq Analysis Pipeline Testing and Optimization Resource</h3> <p align="center"> <strong>Making free science for everybody around the world 🌍</strong> </p> <p align="center"> <a href="https://pypi.org/project/raptor-rnaseq/"><img src="https://img.shields.io/pypi/v/raptor-rnaseq.svg?style=flat&logo=pypi&logoColor=white" alt="PyPI version"/></a> <a href="https://www.python.org/"><img src="https://img.shields.io/badge/Python-3.8--3.12-blue.svg" alt="Python 3.8-3.12"/></a> <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="MIT License"/></a> <a href="https://doi.org/10.5281/zenodo.17607161"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.17607161.svg" alt="DOI"/></a> <a href="https://github.com/AyehBlk/RAPTOR/releases/tag/v2.2.1"><img src="https://img.shields.io/badge/Release-v2.2.1-orange.svg" alt="Release v2.2.1"/></a> </p> <p align="center"> <a href="#-quick-start">Quick Start</a> • <a href="#-features">Features</a> • <a href="#-installation">Installation</a> • <a href="#-architecture">Architecture</a> • <a href="#-documentation">Documentation</a> • <a href="#-pipelines">Pipelines</a> • <a href="#-citation">Citation</a> </p>

🦖 What is RAPTOR?

RAPTOR is a comprehensive framework for RNA-seq analysis that makes sophisticated differential expression workflows accessible to everyone. Instead of guessing which pipeline to use or which thresholds to set, you get ML-powered recommendations and ensemble methods for robust, reproducible results.

Why RAPTOR?

| Challenge | RAPTOR Solution |
|-----------|-----------------|
| Which pipeline should I use? | ✅ ML recommendations based on 32 dataset features |
| Which DE method (DESeq2/edgeR/limma)? | ✅ Ensemble analysis combines all methods |
| What thresholds should I use? | ✅ 4 optimization methods for data-driven cutoffs |
| Is my data quality good enough? | ✅ 6 outlier detection methods with consensus |
| How do I know results are reliable? | ✅ Ensemble consensus with direction checking |
| What if methods disagree? | ✅ Brown's method accounts for correlation |

✨ Features

<table> <tr> <td width="50%">

🎯 Ensemble Analysis (NEW!)

5 statistical combination methods
Fisher's, Brown's, RRA, Voting, Weighted
Direction consistency checking
Meta-analysis fold changes
33% fewer false positives

⚙️ Parameter Optimization (NEW!)

4 validated optimization methods
Ground truth, FDR control, Stability, Reproducibility
Automated threshold selection
Performance metrics tracking
Publication-ready results

📊 32-Feature Data Profiling

BCV (Biological Coefficient of Variation)
Sample characteristics & balance
Dispersion patterns
Sparsity analysis
ML-ready feature vectors

</td> <td width="50%">

🤖 ML-Powered Recommendations

Random Forest classifier
32-feature profiling
Confidence scoring
Alternative suggestions
Feature importance analysis

🔬 6 Production Pipelines

Salmon ⭐ (recommended)
Kallisto (fastest)
STAR + featureCounts
STAR + RSEM
STAR + Salmon (unique!)
HISAT2 + featureCounts

📈 Quality Assessment

6 outlier detection methods
Consensus-based reporting
Batch effect detection
Actionable recommendations

</td> </tr> </table>
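The consensus-based outlier reporting listed under Quality Assessment can be pictured as a simple vote across detectors. The sketch below is a hypothetical illustration, not RAPTOR's implementation: the detector names, the sample IDs, the `min_votes` parameter, and the `consensus_outliers` helper are all assumptions made for the example.

```python
from collections import Counter


def consensus_outliers(flags_by_method, min_votes):
    """Return samples flagged as outliers by at least `min_votes` detectors.

    flags_by_method maps a detector name to the set of sample IDs it flagged.
    """
    votes = Counter()
    for flagged in flags_by_method.values():
        votes.update(set(flagged))  # each detector votes at most once per sample
    return sorted(sample for sample, n in votes.items() if n >= min_votes)


# Hypothetical detector output (names and sample IDs are made up).
flags = {
    "pca_distance": {"S3", "S7"},
    "correlation": {"S3"},
    "mad_library_size": {"S3", "S5"},
}
print(consensus_outliers(flags, min_votes=2))  # ['S3']
```

Raising `min_votes` trades sensitivity for fewer false alarms: a sample flagged by a single detector is reported only when `min_votes=1`.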

🎨 Interactive Dashboard (Verified in v2.2.1)

All 9 pages functionally tested with real data:

Quality Assessment — 7 visualization tabs: PCA (2D/3D), sample correlation, expression distribution, RLE plot, dendrogram, mean-variance/BCV
Data Profiler — 32-feature extraction with professional CSS styling
Pipeline Recommender — rule-based recommendations with confidence scoring
Import DE — drag-and-drop DESeq2/edgeR/limma result files
Parameter Optimization — interactive sliders + 4 scientific methods (FDR Control, Ground Truth, Stability, Reproducibility)
Ensemble Analysis — Fisher's, Brown's, RRA with column name mapping
Reports — generate QC, DE, and ensemble reports
Settings — configure analysis parameters
Visualization — 12 plot types including volcano, MA, heatmap, and 7 gene expression styles (box, violin, beeswarm, raincloud, lollipop, heatmap bar, forest plot)

🆕 What's New in v2.2.1

Dashboard verified and enhanced — all 9 Streamlit pages tested with real data:

16 bugs fixed across Import DE, Ensemble, Optimization, Reports, and Quality pages
Quality Assessment expanded to 7 visualization tabs (PCA 2D/3D, correlation, RLE, dendrogram, BCV)
Parameter Optimization now integrates all 4 scientific methods from Module 8
Visualization completely rewritten with 12 plot types and 7 gene expression styles
Professional styling across all pages (emoji cleanup, CSS cards, clean typography)
413 tests passing, 0 failures, 55 CLI commands verified
Column name mapping fixed between Module 7 (standardized) and Modules 8/9 (expected)

🚀 Quick Start

Option 1: Interactive Dashboard (Recommended)

Install with dashboard support:

pip install raptor-rnaseq[dashboard]

Launch the dashboard — choose the method that matches your setup:

| Scenario | Command |
|----------|---------|
| pip install (recommended) | python -m raptor.launch_dashboard |
| Cloned from GitHub | python launch_dashboard.py |
| Direct streamlit | python -m streamlit run raptor/dashboard/app.py |
| Inside a virtual environment | Activate venv first, then any command above |

The dashboard opens at http://localhost:8501. Upload data → Profile → Get recommendation → Run ensemble → Done!

Detailed instructions by setup:

<details> <summary><strong>A. Installed via pip (most users)</strong></summary>

# Install
pip install raptor-rnaseq[dashboard]

# Launch
python -m raptor.launch_dashboard

</details> <details> <summary><strong>B. Cloned from GitHub (developers)</strong></summary>

git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR
pip install -e .[dashboard]

# Launch from repo root
python launch_dashboard.py

# Or directly
python -m streamlit run raptor/dashboard/app.py

</details> <details> <summary><strong>C. Using a virtual environment</strong></summary>

# Create and activate venv
python -m venv .venv

# Windows
.venv\Scripts\Activate

# Linux/macOS
source .venv/bin/activate

# Install and launch
pip install raptor-rnaseq[dashboard]
python -m raptor.launch_dashboard

</details> <details> <summary><strong>D. Conda environment</strong></summary>

conda env create -f environment.yml
conda activate raptor
pip install streamlit altair
python -m streamlit run raptor/dashboard/app.py

</details>

Option 2: Command Line

# 1. Quality check
raptor qc --counts counts.csv --metadata metadata.csv

# 2. Profile your data
raptor profile --counts counts.csv --metadata metadata.csv --group-column condition

# 3. Get ML recommendation
raptor recommend --profile profile.json --method ml

# 4. Import DE results from different methods
raptor import-de --input deseq2.csv --method deseq2
raptor import-de --input edger.csv --method edger
raptor import-de --input limma.csv --method limma

# 5. Optimize thresholds (NEW!)
raptor optimize --de-result de_results.csv --method fdr-control --fdr-target 0.05

# 6. Ensemble analysis - combine all methods (NEW!)
raptor ensemble-compare --deseq2 deseq2.csv --edger edger.csv --limma limma.csv
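
Step 5 above asks for a cutoff that keeps the false discovery rate at 5%. One standard way to do this is the Benjamini-Hochberg procedure. Whether `raptor optimize --method fdr-control` uses exactly this procedure is an assumption, and `bh_threshold` is a hypothetical helper written only to illustrate the idea on made-up p-values.

```python
def bh_threshold(pvalues, fdr_target=0.05):
    """Largest p-value cutoff accepted by Benjamini-Hochberg, or None.

    Sort the m p-values, find the largest rank k with p_(k) <= k / m * fdr_target,
    and call every p-value <= p_(k) significant.
    """
    m = len(pvalues)
    cutoff = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank / m * fdr_target:
            cutoff = p  # keep the largest p-value that passes its rank threshold
    return cutoff


pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]  # made-up raw p-values
cutoff = bh_threshold(pvals, fdr_target=0.05)
significant = [p for p in pvals if cutoff is not None and p <= cutoff]
print(cutoff, significant)
```

A return value of `None` means no test passes at the requested FDR, which is itself useful information before moving on to the ensemble step.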

Option 3: Python API

from raptor import (
    quick_quality_check,
    profile_data_quick,
    recommend_pipeline,
    optimize_with_fdr_control,
    ensemble_brown
)

# 1. Quality check
qc_report = quick_quality_check('counts.csv', 'metadata.csv')
print(f"Outliers: {qc_report.outliers}")

# 2. Profile data (32 features extracted)
profile = profile_data_quick('counts.csv', 'metadata.csv', group_column='condition')
print(f"BCV: {profile.bcv:.3f} ({profile.bcv_category})")

# 3. Get ML recommendation (reads a saved profile file, here profile.json)
recommendation = recommend_pipeline(profile_file='profile.json', method='ml')
print(f"Recommended: {recommendation.pipeline_name} (confidence: {recommendation.confidence:.2f})")

# 4. After running DE analysis, optimize thresholds (NEW!)
# `de_result` is a placeholder for a DE result you have already loaded
result = optimize_with_fdr_control(de_result, fdr_target=0.05)
print(f"Optimal thresholds: {result.optimal_threshold}")

# 5. Ensemble analysis - combine DESeq2, edgeR, limma (NEW!)
# `deseq2_result`, `edger_result`, `limma_result` are placeholders for imported DE results
consensus = ensemble_brown({
    'deseq2': deseq2_result,
    'edger': edger_result,
    'limma': limma_result
})
print(f"Consensus DE genes: {len(consensus.consensus_genes)}")
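
To give a feel for what the ensemble step does, here is a minimal standard-library sketch of Fisher's method, one of the combination methods listed under Features, together with a direction consistency check. It is not RAPTOR's code: `fisher_combine`, `same_direction`, and the per-gene numbers are hypothetical. Fisher's method assumes independent tests, which is why the table above lists Brown's method for correlated results.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom. For even degrees of freedom the upper tail has the
    closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


def same_direction(log_fold_changes):
    """True when every method reports the same sign of log fold change."""
    return all(lfc > 0 for lfc in log_fold_changes) or all(
        lfc < 0 for lfc in log_fold_changes
    )


# Made-up per-method results for one gene: (p-value, log2 fold change).
gene = {"deseq2": (0.010, 1.8), "edger": (0.020, 1.6), "limma": (0.040, 1.2)}
pvals = [p for p, _ in gene.values()]
lfcs = [lfc for _, lfc in gene.values()]
print(f"combined p = {fisher_combine(pvals):.4g}, consistent = {same_direction(lfcs)}")
```

A gene would typically count toward the consensus only if the combined p-value is small and the direction check passes. The exact rule RAPTOR applies is not shown here.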

📦 Installation

Requirements

Python: 3.8 - 3.12
R: 4.0+ (optional, for Module 6 DE analysis)
RAM: 4GB minimum (16GB recommended for pipelines)
Disk: 500MB (Python package) / 5-8GB (with bioinformatics tools)

Install from PyPI (Recommended)

# Basic installation
pip install raptor-rnaseq

# With dashboard support
pip install raptor-rnaseq[dashboard]

# With all features
pip install raptor-rnaseq[all]

# Development installation
pip install raptor-rnaseq[dev]

Conda Installation

Core environment (Python only, ~500MB, 5-10 min):

conda env create -f environment.yml
conda activate raptor

Full environment (with STAR, Salmon, Kallisto, R, ~5-8GB, 30-60 min):

conda env create -f environment-full.yml
conda activate raptor-full

See docs/CONDA_ENVIRONMENTS.md for a detailed comparison.

Install from Source

# Clone repository
git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR

# Install in editable mode
pip install -e .

# Or with development tools
pip install -e .[dev]

# Verify installation
raptor
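
If you prefer to verify the installation from Python, the following sketch checks the interpreter against the 3.8 - 3.12 range listed under Requirements and looks up the installed `raptor-rnaseq` distribution. It relies only on the standard library. The helper names `python_supported` and `installed_version` are hypothetical and not part of RAPTOR.

```python
import sys
from importlib import metadata


def python_supported(version=None):
    """True if the interpreter is within the 3.8-3.12 range listed under Requirements."""
    major_minor = tuple(version or sys.version_info[:2])
    return (3, 8) <= major_minor <= (3, 12)


def installed_version(distribution="raptor-rnaseq"):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_supported())
print("raptor-rnaseq version:", installed_version())
```

A `None` version means pip did not install the package into the interpreter you are running, which often points to an inactive virtual or conda environment.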
