# Medea: An omics AI agent for therapeutic discovery
Medea is an AI agent that accelerates therapeutic discovery through multi-omics analysis. Built on the AgentLite framework, Medea addresses a fundamental challenge in biomedical research: how to effectively integrate diverse data modalities, computational resources, and scientific knowledge to identify therapeutic targets and predict drug responses.
Medea consists of three specialized agentic modules that collaborate:
- Research Planning module - Formulates experimental plans, verifies biological context (diseases, cell types, genes), and ensures analytical feasibility
- Analysis module - Generates and executes Python code for single-cell data analysis, including quality checks and debugging
- Literature Reasoning module - Searches, filters, and synthesizes relevant scientific papers using LLM-based relevance assessment
## 📋 Table of Contents
- Installation
- Configuration
- Using Medea as a Library
- Command-Line Interface (CLI) Usage
- Documentation
## Installation

### Quick Install
```bash
# Clone the repository
git clone https://github.com/mims-harvard/Medea.git
cd Medea

# Create virtual environment with uv (recommended)
pip install uv
uv venv medea --python 3.10
source medea/bin/activate  # On Windows: medea\Scripts\activate

# Install Medea
uv pip install -e .
uv pip install openai==1.82.1  # Ensure correct OpenAI version
```
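To confirm the install succeeded before moving on, you can script a quick check. The helper below is a generic sketch (`missing_packages` is not part of Medea) that reports any packages missing from the active environment:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Check the packages the quick install is expected to provide
missing = missing_packages(["medea", "openai"])
if missing:
    print(f"Missing packages: {missing} -- re-run the install steps above")
else:
    print("All required packages are importable")
```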
### Download MedeaDB
Download required datasets from Hugging Face:
```bash
uv pip install -U huggingface_hub
huggingface-cli login  # Enter your token

brew install git-lfs   # macOS, or: sudo apt-get install git-lfs (Linux)
git lfs install
git clone https://huggingface.co/datasets/mims-harvard/MedeaDB

# OPTIONAL: For machine learning tools (COMPASS, etc.)
# Clone and configure the tool in the Medea directory
git clone https://github.com/mims-harvard/COMPASS.git MedeaDB/compass/COMPASS
```
📚 Detailed guide: See docs/QUICKSTART.md
## Configuration
Create a .env file in the project root:
```bash
cp env_template.txt .env
```
### Required Settings
```bash
# Database path
MEDEADB_PATH=/path/to/MedeaDB

# Model configuration
BACKBONE_LLM=gpt-4o
SEED=42

# API Key (recommended: OpenRouter for access to 100+ models)
OPENROUTER_API_KEY=your-key-here
USE_OPENROUTER=true
```
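A common failure mode is a setting that never made it from `.env` into the process environment. The sketch below (a hypothetical helper, not part of Medea; which variables are strictly required is an assumption based on the list above) flags unset or empty settings:

```python
import os

# Settings from the "Required Settings" section above
REQUIRED = ["MEDEADB_PATH", "BACKBONE_LLM", "OPENROUTER_API_KEY"]

def check_env(env=os.environ):
    """Return the required settings that are unset or empty."""
    return [k for k in REQUIRED if not env.get(k)]

missing = check_env()
if missing:
    print(f"Missing settings: {missing} -- add them to your .env file")
```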
### Alternative API Configurations
Azure OpenAI:

```bash
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-10-21
USE_OPENROUTER=false
```

Google Gemini:

```bash
GEMINI_API_KEY=your-key
GEMINI_MODEL=gemini-2.0-flash-exp
```

Anthropic Claude:

```bash
ANTHROPIC_API_KEY=your-key
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```

NVIDIA DeepSeek:

```bash
NVIDIA_DEEPSEEK_ENDPOINT=https://your-endpoint.com/v1
NVIDIA_DEEPSEEK_API_KEY=your-key
```
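How Medea resolves these variables into a single provider is internal to the package; as a rough illustration of the kind of precedence one might implement (hypothetical `pick_provider` helper, not Medea's actual logic), OpenRouter could be preferred when enabled and other providers fall back on whichever key is present:

```python
def pick_provider(env):
    """Choose an API provider from a dict of environment-style settings."""
    if env.get("USE_OPENROUTER", "").lower() == "true" and env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("AZURE_OPENAI_API_KEY"):
        return "azure"
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("NVIDIA_DEEPSEEK_API_KEY"):
        return "nvidia-deepseek"
    raise ValueError("No API credentials found in environment")

print(pick_provider({"USE_OPENROUTER": "true", "OPENROUTER_API_KEY": "sk-..."}))
```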
📋 Full configuration reference: See env_template.txt
## Using Medea as a Library
Once installed, you can use Medea in your own Python scripts. Here are three simple ways to get started:
### 🚀 Option 1: Full Medea Agent (Recommended)
Run the complete Medea agent with research planning, analysis, and literature reasoning modules:
```python
from medea import medea, AgentLLM, LLMConfig
from medea import ResearchPlanning, Analysis, LiteratureReasoning
from medea import (
    ResearchPlanDraft, ContextVerification, IntegrityVerification,
    CodeGenerator, AnalysisExecution, CodeDebug, AnalysisQualityChecker,
    LiteratureSearch, PaperJudge, OpenScholarReasoning
)

# Step 1: Initialize LLMs
backbone_llm = "gpt-4o"
llm_config = LLMConfig({"temperature": 0.4})
research_llm = AgentLLM(llm_config, llm_name=backbone_llm)
analysis_llm = AgentLLM(llm_config, llm_name=backbone_llm)
literature_llm = AgentLLM(llm_config, llm_name=backbone_llm)

# Step 2: Configure module-specific actions
research_actions = [
    ResearchPlanDraft(tmp=0.4, llm_provider=backbone_llm),
    ContextVerification(tmp=0.4, llm_provider=backbone_llm),
    IntegrityVerification(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]
analysis_actions = [
    CodeGenerator(tmp=0.4, llm_provider=backbone_llm),
    AnalysisExecution(),
    CodeDebug(tmp=0.4, llm_provider=backbone_llm),
    AnalysisQualityChecker(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]
literature_actions = [
    LiteratureSearch(model_name=backbone_llm, verbose=True),
    PaperJudge(model_name=backbone_llm, verbose=True),
    OpenScholarReasoning(tmp=0.4, llm_provider=backbone_llm, verbose=True)
]

# Step 3: Create modules
research_planning_module = ResearchPlanning(llm=research_llm, actions=research_actions)
analysis_module = Analysis(llm=analysis_llm, actions=analysis_actions)
literature_module = LiteratureReasoning(llm=literature_llm, actions=literature_actions)

# Step 4: Run Medea
result = medea(
    user_instruction="Which gene is the best therapeutic target for RA in CD4+ T cells?",
    experiment_instruction=None,  # Optional: additional experiment context
    research_planning_module=research_planning_module,
    analysis_module=analysis_module,
    literature_module=literature_module,
    debate_rounds=2,  # Number of panel discussion rounds
    timeout=800       # Timeout in seconds per process
)

# Step 5: Get your answer
print(result['final'])  # Medea (full agent) output
print(result['P'])      # Research plan from ResearchPlanning module
print(result['PA'])     # ResearchPlanning + Analysis module output
print(result['R'])      # LiteratureReasoning output
```
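Full runs can take a while, so it is often worth persisting the returned dict alongside the query. A minimal sketch (`save_result` is a hypothetical helper, assuming `result` holds the string outputs shown above):

```python
import json
from pathlib import Path

def save_result(result, path):
    """Write a Medea result dict to a JSON file and return the path."""
    out = Path(path)
    out.write_text(json.dumps(result, indent=2, default=str))
    return out

# Example with a placeholder result dict
save_result({"final": "...", "P": "...", "PA": "...", "R": "..."}, "medea_run.json")
```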
### 🔬 Option 2: Research Planning + In-Silico Experiment Only
Run computational experiments without literature search:
```python
from medea import experiment_analysis, AgentLLM, LLMConfig
from medea import ResearchPlanning, Analysis
from medea import (
    ResearchPlanDraft, ContextVerification, IntegrityVerification,
    CodeGenerator, AnalysisExecution, CodeDebug, AnalysisQualityChecker
)

# Step 1: Initialize LLMs
backbone_llm = "gpt-4o"
llm_config = LLMConfig({"temperature": 0.4})
research_llm = AgentLLM(llm_config, llm_name=backbone_llm)
analysis_llm = AgentLLM(llm_config, llm_name=backbone_llm)

# Step 2: Configure actions
research_actions = [
    ResearchPlanDraft(tmp=0.4, llm_provider=backbone_llm),
    ContextVerification(tmp=0.4, llm_provider=backbone_llm),
    IntegrityVerification(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]
analysis_actions = [
    CodeGenerator(tmp=0.4, llm_provider=backbone_llm),
    AnalysisExecution(),
    CodeDebug(tmp=0.4, llm_provider=backbone_llm),
    AnalysisQualityChecker(tmp=0.4, llm_provider=backbone_llm, max_iter=2)
]

# Step 3: Create modules
research_planning_module = ResearchPlanning(llm=research_llm, actions=research_actions)
analysis_module = Analysis(llm=analysis_llm, actions=analysis_actions)

# Step 4: Run experiment
plan, result = experiment_analysis(
    query="Identify therapeutic targets for rheumatoid arthritis in CD4+ T cells",
    research_planning_module=research_planning_module,
    analysis_module=analysis_module
)
print(f"Research Plan:\n{plan}\n")
print(f"Experiment Result:\n{result}")
```
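To screen several hypotheses, the same modules can be reused across queries. The helper below is a generic sketch (`run_queries` is not part of Medea) that maps any analysis callable over a list of queries and records failures instead of stopping:

```python
def run_queries(analyze, queries):
    """Run an analysis callable over many queries, collecting outputs and errors."""
    results = {}
    for q in queries:
        try:
            results[q] = {"ok": True, "output": analyze(q)}
        except Exception as exc:
            results[q] = {"ok": False, "error": str(exc)}
    return results

# Usage with the modules built above (each call reuses the same modules):
# outputs = run_queries(
#     lambda q: experiment_analysis(
#         query=q,
#         research_planning_module=research_planning_module,
#         analysis_module=analysis_module,
#     ),
#     ["Identify therapeutic targets for rheumatoid arthritis in CD4+ T cells",
#      "Identify therapeutic targets for rheumatoid arthritis in CD8+ T cells"],
# )
```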
### 📚 Option 3: Literature Reasoning Only
Search papers and synthesize insights without computational experiments:
```python
from medea import literature_reasoning, AgentLLM, LLMConfig
from medea import LiteratureReasoning
from medea import LiteratureSearch, PaperJudge, OpenScholarReasoning

# Step 1: Initialize LLM
backbone_llm = "gpt-4o"
llm_config = LLMConfig({"temperature": 0.4})
literature_llm = AgentLLM(llm_config, llm_name=backbone_llm)

# Step 2: Configure actions
literature_actions = [
    LiteratureSearch(model_name=backbone_llm, verbose=True),
    PaperJudge(model_name=backbone_llm, verbose=True),
    OpenScholarReasoning(tmp=0.4, llm_provider=backbone_llm, verbose=True)
]

# Step 3: Create module
literature_module = LiteratureReasoning(llm=literature_llm, actions=literature_actions)

# Step 4: Search and reason
result = literature_reasoning(
    query="What are validated therapeutic targets for rheumatoid arthritis in CD4+ T cells?",
    literature_module=literature_module
)
print(result)
```