# Medea: An omics AI agent for therapeutic discovery
Medea is an AI agent that accelerates therapeutic discovery through multi-omics analysis. Built on the AgentLite framework, Medea addresses a fundamental challenge in biomedical research: how to effectively integrate diverse data modalities, computational resources, and scientific knowledge to identify therapeutic targets and predict drug responses.
Medea consists of three specialized agentic modules that collaborate:
- Research Planning module - Formulates experimental plans, verifies biological context (diseases, cell types, genes), and ensures analytical feasibility
- Analysis module - Generates and executes Python code for single-cell data analysis, including quality checks and debugging
- Literature Reasoning module - Searches, filters, and synthesizes relevant scientific papers using LLM-based relevance assessment
## 📋 Table of Contents
- Installation
- Configuration
- Using Medea as a Library
- Command-Line Interface (CLI) Usage
- Documentation
## Installation

### Quick Install
```bash
# Clone the repository
git clone https://github.com/mims-harvard/Medea.git
cd Medea

# Create virtual environment with uv (recommended)
pip install uv
uv venv medea --python 3.10
source medea/bin/activate  # On Windows: medea\Scripts\activate

# Install Medea
uv pip install -e .
uv pip install openai==1.82.1  # Ensure correct OpenAI version
```
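To confirm the install succeeded before moving on, you can script a quick check. The helper below is a generic sketch (`missing_packages` is not part of Medea) that reports any packages missing from the active environment:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Check the packages the quick install is expected to provide
missing = missing_packages(["medea", "openai"])
if missing:
    print(f"Missing packages: {missing} -- re-run the install steps above")
else:
    print("All required packages are importable")
```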
### Download MedeaDB
Download required datasets from Hugging Face:
```bash
uv pip install -U huggingface_hub
huggingface-cli login  # Enter your token

brew install git-lfs   # macOS, or: sudo apt-get install git-lfs (Linux)
git lfs install
git clone https://huggingface.co/datasets/mims-harvard/MedeaDB

# OPTIONAL: For machine learning tools (COMPASS, etc.)
# Clone and configure the tool in the Medea directory
git clone https://github.com/mims-harvard/COMPASS.git MedeaDB/compass/COMPASS
```
📚 Detailed guide: See docs/QUICKSTART.md
## Configuration
Create a .env file in the project root:
```bash
cp env_template.txt .env
```
### Required Settings
```bash
# Database path
MEDEADB_PATH=/path/to/MedeaDB

# Model configuration
BACKBONE_LLM=gpt-4o
SEED=42

# API Key (recommended: OpenRouter for access to 100+ models)
OPENROUTER_API_KEY=your-key-here
USE_OPENROUTER=true
```
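A common failure mode is a setting that never made it from `.env` into the process environment. The sketch below (a hypothetical helper, not part of Medea; which variables are strictly required is an assumption based on the list above) flags unset or empty settings:

```python
import os

# Settings from the "Required Settings" section above
REQUIRED = ["MEDEADB_PATH", "BACKBONE_LLM", "OPENROUTER_API_KEY"]

def check_env(env=os.environ):
    """Return the required settings that are unset or empty."""
    return [k for k in REQUIRED if not env.get(k)]

missing = check_env()
if missing:
    print(f"Missing settings: {missing} -- add them to your .env file")
```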
### Alternative API Configurations
Azure OpenAI:

```bash
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-10-21
USE_OPENROUTER=false
```

Google Gemini:

```bash
GEMINI_API_KEY=your-key
GEMINI_MODEL=gemini-2.0-flash-exp
```

Anthropic Claude:

```bash
ANTHROPIC_API_KEY=your-key
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```

NVIDIA DeepSeek:

```bash
NVIDIA_DEEPSEEK_ENDPOINT=https://your-endpoint.com/v1
NVIDIA_DEEPSEEK_API_KEY=your-key
```
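How Medea resolves these variables into a single provider is internal to the package; as a rough illustration of the kind of precedence one might implement (hypothetical `pick_provider` helper, not Medea's actual logic), OpenRouter could be preferred when enabled and other providers fall back on whichever key is present:

```python
def pick_provider(env):
    """Choose an API provider from a dict of environment-style settings."""
    if env.get("USE_OPENROUTER", "").lower() == "true" and env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("AZURE_OPENAI_API_KEY"):
        return "azure"
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("NVIDIA_DEEPSEEK_API_KEY"):
        return "nvidia-deepseek"
    raise ValueError("No API credentials found in environment")

print(pick_provider({"USE_OPENROUTER": "true", "OPENROUTER_API_KEY": "sk-..."}))
```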
📋 Full configuration reference: See env_template.txt
## Using Medea as a Library
Once installed, you can use Medea in your own Python scripts. Here are three simple ways to get started:
### 🚀 Option 1: Full Medea Agent (Recommended)
Run the complete Medea agent with research planning, analysis, and literature reasoning modules:
```python
from medea import medea, AgentLLM, LLMConfig
from medea import ResearchPlanning, Analysis, LiteratureReasoning
from medea import (
    ResearchPlanDraft, ContextVerification, IntegrityVerification,
    CodeGenerator, AnalysisExecution, CodeDebug, AnalysisQualityChecker,
    LiteratureSearch, PaperJudge, OpenScholarReasoning
)

# Step 1: Initialize LLMs
backbone_llm = "gpt-4o"
llm_config = LLMConfig({"temperature": 0.4})
research_llm = AgentLLM(llm_config, llm_name=backbone_llm)
analysis_llm = AgentLLM(llm_config, llm_name=backbone_llm)
literature_llm = AgentLLM(llm_config, llm_name=backbone_llm)

# Step 2: Configure module-specific actions
research_actions = [
    ResearchPlanDraft(tmp=0.4, llm_provider=backbone_llm),
    ContextVerification(tmp=0.4, llm_provider=backbone_llm),
    IntegrityVerification(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]
analysis_actions = [
    CodeGenerator(tmp=0.4, llm_provider=backbone_llm),
    AnalysisExecution(),
    CodeDebug(tmp=0.4, llm_provider=backbone_llm),
    AnalysisQualityChecker(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]
literature_actions = [
    LiteratureSearch(model_name=backbone_llm, verbose=True),
    PaperJudge(model_name=backbone_llm, verbose=True),
    OpenScholarReasoning(tmp=0.4, llm_provider=backbone_llm, verbose=True)
]

# Step 3: Create modules
research_planning_module = ResearchPlanning(llm=research_llm, actions=research_actions)
analysis_module = Analysis(llm=analysis_llm, actions=analysis_actions)
literature_module = LiteratureReasoning(llm=literature_llm, actions=literature_actions)

# Step 4: Run Medea
result = medea(
    user_instruction="Which gene is the best therapeutic target for RA in CD4+ T cells?",
    experiment_instruction=None,  # Optional: additional experiment context
    research_planning_module=research_planning_module,
    analysis_module=analysis_module,
    literature_module=literature_module,
    debate_rounds=2,  # Number of panel discussion rounds
    timeout=800       # Timeout in seconds per process
)

# Step 5: Get your answer
print(result['final'])  # Medea (full agent) output
print(result['P'])      # Research plan from ResearchPlanning module
print(result['PA'])     # ResearchPlanning + Analysis module output
print(result['R'])      # LiteratureReasoning output
```
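Full runs can take a while, so it is often worth persisting the returned dict alongside the query. A minimal sketch (`save_result` is a hypothetical helper, assuming `result` holds the string outputs shown above):

```python
import json
from pathlib import Path

def save_result(result, path):
    """Write a Medea result dict to a JSON file and return the path."""
    out = Path(path)
    out.write_text(json.dumps(result, indent=2, default=str))
    return out

# Example with a placeholder result dict
save_result({"final": "...", "P": "...", "PA": "...", "R": "..."}, "medea_run.json")
```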
### 🔬 Option 2: Research Planning + In-Silico Experiment Only
Run computational experiments without literature search:
```python
from medea import experiment_analysis, AgentLLM, LLMConfig
from medea import ResearchPlanning, Analysis
from medea import (
    ResearchPlanDraft, ContextVerification, IntegrityVerification,
    CodeGenerator, AnalysisExecution, CodeDebug, AnalysisQualityChecker
)

# Step 1: Initialize LLMs
backbone_llm = "gpt-4o"
llm_config = LLMConfig({"temperature": 0.4})
research_llm = AgentLLM(llm_config, llm_name=backbone_llm)
analysis_llm = AgentLLM(llm_config, llm_name=backbone_llm)

# Step 2: Configure actions
research_actions = [
    ResearchPlanDraft(tmp=0.4, llm_provider=backbone_llm),
    ContextVerification(tmp=0.4, llm_provider=backbone_llm),
    IntegrityVerification(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]
analysis_actions = [
    CodeGenerator(tmp=0.4, llm_provider=backbone_llm),
    AnalysisExecution(),
    CodeDebug(tmp=0.4, llm_provider=backbone_llm),
    AnalysisQualityChecker(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]

# Step 3: Create modules
research_planning_module = ResearchPlanning(llm=research_llm, actions=research_actions)
analysis_module = Analysis(llm=analysis_llm, actions=analysis_actions)

# Step 4: Run experiment
plan, result = experiment_analysis(
    query="Identify therapeutic targets for rheumatoid arthritis in CD4+ T cells",
    research_planning_module=research_planning_module,
    analysis_module=analysis_module
)
print(f"Research Plan:\n{plan}\n")
print(f"Experiment Result:\n{result}")
```
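To screen several hypotheses, the same modules can be reused across queries. The helper below is a generic sketch (`run_queries` is not part of Medea) that maps any analysis callable over a list of queries and records failures instead of stopping:

```python
def run_queries(analyze, queries):
    """Run an analysis callable over many queries, collecting outputs and errors."""
    results = {}
    for q in queries:
        try:
            results[q] = {"ok": True, "output": analyze(q)}
        except Exception as exc:
            results[q] = {"ok": False, "error": str(exc)}
    return results

# Usage with the modules built above (each call reuses the same modules):
# outputs = run_queries(
#     lambda q: experiment_analysis(
#         query=q,
#         research_planning_module=research_planning_module,
#         analysis_module=analysis_module,
#     ),
#     ["Identify therapeutic targets for rheumatoid arthritis in CD4+ T cells",
#      "Identify therapeutic targets for rheumatoid arthritis in CD8+ T cells"],
# )
```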
### 📚 Option 3: Literature Reasoning Only
Search papers and synthesize insights without computational experiments:
```python
from medea import literature_reasoning, AgentLLM, LLMConfig
from medea import LiteratureReasoning
from medea import LiteratureSearch, PaperJudge, OpenScholarReasoning

# Step 1: Initialize LLM
backbone_llm = "gpt-4o"
llm_config = LLMConfig({"temperature": 0.4})
literature_llm = AgentLLM(llm_config, llm_name=backbone_llm)

# Step 2: Configure actions
literature_actions = [
    LiteratureSearch(model_name=backbone_llm, verbose=True),
    PaperJudge(model_name=backbone_llm, verbose=True),
    OpenScholarReasoning(tmp=0.4, llm_provider=backbone_llm, verbose=True)
]

# Step 3: Create module
literature_module = LiteratureReasoning(llm=literature_llm, actions=literature_actions)

# Step 4: Search and reason
result = literature_reasoning(
    query="What are validated therapeutic targets for rheumatoid arthritis in CD4+ T cells?",
    literature_module=literature_module
)
print(result)
```