AetherCell
AetherCell is a hierarchical generative framework designed to predict context-specific transcriptomic responses to drugs and genetic perturbations. By bridging high-resolution RNA-seq contexts with massive L1000 screens, it unifies biological identity and functional state shifts into a shared 256-dimensional manifold.
Install / Use
/learn @Wenyuan-AI4science/AetherCell
AetherCell: A Generative Engine for Virtual Cell Perturbation and Drug Discovery
Repository accompanying a manuscript under peer review. AetherCell is a generative modelling framework for virtual cell perturbation, drug response prediction, and drug repurposing from transcriptomic data.
Manuscript title: AetherCell: A generative engine for virtual cell perturbation and in vivo drug discovery
Resources
- Preprint: bioRxiv
- Model weights: Hugging Face — liwenyuan99/AetherCell
- Web demo: http://101.32.8.25/
- Processed datasets: Zenodo

Overview
AetherCell is designed to support three related tasks from transcriptomic and molecular inputs:
- Virtual perturbation modelling for drug and gene perturbations
- Drug response prediction for cancer cell lines
- Drug repurposing for disease-oriented candidate ranking
The framework aligns context-rich bulk RNA-seq data with perturbation-dense L1000 data in a shared latent space, enabling perturbation modelling across data regimes and downstream screening workflows.

Beyond a static methodology, AetherCell is packaged as an "Agent-Ready" engine. We provide not only the model weights and Python APIs, but also fully executable Skills (e.g., automated perturbation-to-enrichment pipelines) that can be integrated into AI scientist workflows or natural-language interfaces.

This repository is intended as a research-facing entry point rather than a full methodological exposition. For architecture, training objectives, benchmark design, and quantitative results, please refer to the associated manuscript.
Why this repository is structured this way
For a project positioned as a serious biomedical research contribution, the repository should help readers do three things quickly:
- understand the scientific scope
- reproduce the main access pathways
- evaluate the practical usability of the system

AetherCell demonstrates a development pattern we believe is especially useful in biomedicine: pairing a domain model with a task-oriented interaction layer. In this repository, that means the model is accessible through both a programmable API and a natural-language workflow, making it easier for non-specialist users to test core capabilities without rewriting infrastructure.
Access modes
1. Web demo
The fastest way to test the core drug-screening workflow is the online demo: http://101.32.8.25/
The demo exposes AetherCell’s screening pipeline and uses an LLM to generate an end-to-end report with interpretable supporting rationale.
Note: daily LLM API quota on the website is limited.
2. Python API
For local use, batch experiments, and integration into custom research pipelines, AetherCell can be called through Python.
Important: using the Python API requires downloading the model files from Hugging Face first.
3. Agent-Ready Skills (via Claude Code & CLI)
To bridge the gap between static model weights and executable science, AetherCell comes with predefined Skills: automated, end-to-end workflows that can be triggered via natural language (using Claude Code) or by automated AI agents.
Featured Skills:
- 🧬 Perturbation-to-Enrichment Pipeline: Input a drug or gene target, and the skill automatically predicts the transcriptomic changes, calculates fold changes, extracts top Differentially Expressed Genes (DEGs), and runs downstream enrichment analysis.
- 💊 Disease-Centric Drug Screening: Input a disease name, and the skill queries the MoE predictor to generate a ranked list of repurposed drug candidates with actionable insights.
Important: Claude Code mode also requires downloading the model files from Hugging Face first.
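The fold-change and top-DEG steps described for the Perturbation-to-Enrichment Pipeline can be illustrated with a small standalone sketch. It does not call AetherCell and is not the skill's actual implementation: the function names, the pseudocount, and the toy expression values are all assumptions made for illustration, and the sketch assumes non-negative, linear-scale expression values.

```python
import math

def log2_fold_changes(control, perturbed, pseudocount=1.0):
    """Per-gene log2 fold change; assumes non-negative, linear-scale values."""
    return {
        gene: math.log2((perturbed[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in perturbed
    }

def top_degs(fold_changes, n=2):
    """Return the n (gene, log2FC) pairs with the largest absolute log2FC."""
    ranked = sorted(fold_changes.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]

# Toy values, made up for illustration only (not model output).
control = {"GENE_A": 7.0, "GENE_B": 3.0, "GENE_C": 15.0}
perturbed = {"GENE_A": 31.0, "GENE_B": 3.0, "GENE_C": 1.0}

lfc = log2_fold_changes(control, perturbed)
for gene, value in top_degs(lfc):
    print(f"{gene}: {value:+.2f}")
```

In a real run, the ranked gene list would then be passed to an enrichment tool; which tool the skill uses is not stated here.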
Quick Demo via Claude Code:
npm install -g @anthropic-ai/claude-code
cd aethercell-drug-discovery-v1.0.0
claude
Core research tasks
| Task | Input | Output |
|------|------|--------|
| Virtual perturbation | Drug SMILES + cell line, or gene perturbation target | Predicted transcriptomic response, fold changes, top DEGs |
| Drug response prediction | Drug SMILES + cancer cell line | Sensitivity / resistance estimate |
| Drug repurposing | Disease name | Ranked candidate drugs |
Supported perturbation types
- Drug treatment (drug)
- Gene knockdown via shRNA (sh)
- Gene overexpression (oe)
- Gene knockout via CRISPR (xpr)
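When scripting against these four codes, it can help to validate a user-supplied perturbation type before building a predictor. The sketch below is a hypothetical helper, not part of the AetherCell API; only the four codes and their descriptions come from the list above.

```python
# Codes and descriptions as listed above; the helper itself is hypothetical.
PERTURBATION_TYPES = {
    "drug": "Drug treatment",
    "sh": "Gene knockdown via shRNA",
    "oe": "Gene overexpression",
    "xpr": "Gene knockout via CRISPR",
}

def check_perturbation(code):
    """Normalize a perturbation code and reject anything not in the list."""
    normalized = code.strip().lower()
    if normalized not in PERTURBATION_TYPES:
        valid = ", ".join(sorted(PERTURBATION_TYPES))
        raise ValueError(f"Unknown perturbation type {code!r}; expected one of: {valid}")
    return normalized

print(check_perturbation(" XPR "), "->", PERTURBATION_TYPES["xpr"])
```

The returned string could then be passed as the `perturbation` argument shown in the usage examples.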
Quick start
Environment setup
conda env create -f environment.yml
conda activate aethercell
Download model weights
Before using either the Python API or Claude Code workflow, download the required weights from:
https://huggingface.co/liwenyuan99/AetherCell
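One way to fetch the files programmatically is the `huggingface_hub` package (`pip install huggingface_hub`), whose `snapshot_download` function downloads a whole model repository. This is a sketch under assumptions: the `weights` target directory is a placeholder, and where AetherCell expects the files to be placed is not stated here, so check the repository before relying on this layout.

```python
from pathlib import Path

REPO_ID = "liwenyuan99/AetherCell"  # repository named in the Resources section

def download_weights(local_dir="weights"):
    """Download the full model repository; `local_dir` is an assumed location."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Returns the local path of the downloaded snapshot.
    return snapshot_download(repo_id=REPO_ID, local_dir=str(target))

# Example (requires network access):
# path = download_weights()
```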
Minimal usage examples
Transcriptome prediction
from models.transcriptome_prediction.transcriptome_inference import TranscriptomePredictor
predictor = TranscriptomePredictor(
    model_type='l1000',
    perturbation='drug',
    device='cpu'
)
result = predictor.predict(
    drug_smiles='CC(=O)Oc1ccccc1C(=O)O',
    cell_line='MCF7'
)
Drug response prediction
from models.ic50_prediction.ic50_inference import IC50Predictor
predictor = IC50Predictor(device='cpu')
result = predictor.predict(drug_smiles='...', cell_line='A549')
Drug repurposing
from models.moe_repurposing.moe_inference import MoEPredictor
predictor = MoEPredictor()
results = predictor.predict_for_disease('Alzheimer disease', top_n=10)
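For screening several diseases in one run, a thin wrapper around `predict_for_disease` may be convenient. The sketch below is hypothetical: `screen_diseases` is not part of AetherCell, it assumes only the method name and `top_n` argument shown above, and it stores whatever the predictor returns without assuming its structure.

```python
def screen_diseases(predictor, diseases, top_n=10):
    """Call predict_for_disease per disease; keep results and errors separate."""
    results, errors = {}, {}
    for disease in diseases:
        try:
            results[disease] = predictor.predict_for_disease(disease, top_n=top_n)
        except Exception as exc:  # broad on purpose: failure modes are undocumented
            errors[disease] = str(exc)
    return results, errors

# Example (requires the downloaded weights and the AetherCell code):
# from models.moe_repurposing.moe_inference import MoEPredictor
# results, errors = screen_diseases(MoEPredictor(), ["Alzheimer disease"], top_n=10)
```

Collecting errors per disease, rather than stopping at the first failure, keeps a long batch from being lost to a single unrecognized disease name.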
Reproducibility resources
Data
| Dataset | Source | Scale |
|---------|--------|-------|
| Bulk RNA-seq pre-training | TCGA, CCLE, GEO | 519,609 samples |
| L1000 perturbation data | CMap LINCS project | ~1.3M standardized pairs |
Processed datasets: Zenodo
Model assets
Pre-trained weights are available at Hugging Face.
| Model asset | Description |
|-------------|-------------|
| Perturbation predictors | Drug / sh / oe / xpr perturbation modules |
| AC-RP | Drug response prediction |
| PK-MoE | Drug repurposing system |
Limitations
- Performance may vary across perturbation classes, cell lines, and biological contexts.
- Gene perturbation tasks rely on the availability and quality of perturbation-specific representations.
- Drug repurposing outputs are hypothesis-generating and require downstream experimental validation.
- The web demo depends on limited daily LLM API quota and is not guaranteed to provide uninterrupted service.
Responsible use
FOR RESEARCH USE ONLY
This repository and its associated model assets are intended for research use. They are not validated for clinical use, diagnosis, patient stratification, or treatment decision-making. Any biological or therapeutic hypothesis generated by the system should be independently evaluated and experimentally validated.
Citation
If you use AetherCell in your research, please cite:
@article{li2026aethercell,
title = {AetherCell: A Generative Engine for Virtual Cell Perturbation and In Vivo Drug Discovery},
author = {Li, Wenyuan and Chen, Yang and Peng, Zhaoyi and Xiang, Lei and Wang, Dong and Xie, Zhi},
journal = {bioRxiv},
year = {2026},
doi = {10.64898/2026.03.13.710968},
url = {https://www.biorxiv.org/content/10.64898/2026.03.13.710968v1}
}
Use of this repository, model weights, outputs, or derivative models in any publication, preprint, report, benchmark, presentation, or public release requires citation of the above preprint in accordance with the license terms.
License
This project is distributed under the AetherCell Research License v1.0.
Permitted use
- Non-commercial academic research
- Non-commercial scientific evaluation
- Internal reproduction for research purposes
- Fine-tuning, adaptation, or improvement for non-commercial research only
Conditions
- Citation of the AetherCell preprint is mandatory for any use of the repository, model, model weights, outputs, or any derived / fine-tuned / adapted / improved model in a publication, preprint, report, benchmark, presentation, or other public disclosure
- Any redistributed derivative model must retain this attribution and citation notice
See LICENSE for full terms.
