AetherCell: A Generative Engine for Virtual Cell Perturbation and Drug Discovery

Repository accompanying a manuscript under peer review. AetherCell is a generative modelling framework for virtual cell perturbation, drug response prediction, and drug repurposing from transcriptomic data.

Manuscript title: AetherCell: A generative engine for virtual cell perturbation and in vivo drug discovery

Resources

Preprint: bioRxiv
Model weights: Hugging Face — liwenyuan99/AetherCell
Web demo: http://101.32.8.25/
Processed datasets: Zenodo

AetherCell Framework

Overview

AetherCell is designed to support three related tasks from transcriptomic and molecular inputs:

Virtual perturbation modelling for drug and gene perturbations
Drug response prediction for cancer cell lines
Drug repurposing for disease-oriented candidate ranking

The framework aligns context-rich bulk RNA-seq data with perturbation-dense L1000 data in a shared latent space, enabling perturbation modelling across data regimes and downstream screening workflows.

Beyond a static methodology, AetherCell is packaged as an "Agent-Ready" engine. We provide not only the model weights and Python APIs but also executable Skills (e.g., automated perturbation-to-enrichment pipelines) that can be integrated into AI scientist workflows or natural-language interfaces.

This repository is intended as a research-facing entry point rather than a full methodological exposition. For architecture, training objectives, benchmark design, and quantitative results, please refer to the associated manuscript.

Why this repository is structured this way

For a project positioned as a serious biomedical research contribution, the repository should help readers do three things quickly:

understand the scientific scope
reproduce the main access pathways
evaluate the practical usability of the system

AetherCell demonstrates a development pattern we believe is especially useful in biomedicine: pairing a domain model with a task-oriented interaction layer. In this repository, that means the model is accessible through both a programmable API and a natural-language workflow, making it easier for non-specialist users to test core capabilities without rewriting infrastructure.

Access modes

1. Web demo

The fastest way to test the core drug-screening workflow is the online demo:

http://101.32.8.25/

The demo exposes AetherCell’s screening pipeline and uses an LLM to generate an end-to-end report with interpretable supporting rationale.

Note: daily LLM API quota on the website is limited.

2. Python API

For local use, batch experiments, and integration into custom research pipelines, AetherCell can be called through Python.

Important: using the Python API requires downloading the model files from Hugging Face first.

3. Agent-Ready Skills (via Claude Code & CLI)

To bridge the gap between static model weights and executable science, AetherCell comes with predefined Skills: automated, end-to-end workflows that can be triggered via natural language (using Claude Code) or by automated AI agents.

Featured Skills:

🧬 Perturbation-to-Enrichment Pipeline: Input a drug or gene target, and the skill automatically predicts the transcriptomic changes, calculates fold changes, extracts top Differentially Expressed Genes (DEGs), and runs downstream enrichment analysis.
💊 Disease-Centric Drug Screening: Input a disease name, and the skill queries the MoE predictor to generate a ranked list of repurposed drug candidates with actionable insights.
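
The fold-change and top-DEG steps named in the Perturbation-to-Enrichment Pipeline can be illustrated with a minimal sketch. This is not the Skill's actual implementation, which is not described here. It assumes non-negative, linear-scale expression values for a control and a perturbed profile. The function names, the pseudocount, and the toy gene labels are all hypothetical.

```python
import math

def log2_fold_change(control, treated, pseudocount=1.0):
    """Per-gene log2 fold change; the pseudocount avoids division by zero."""
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for c, t in zip(control, treated)
    ]

def top_degs(genes, lfc, k=50):
    """Return the k (gene, log2FC) pairs with the largest absolute change."""
    ranked = sorted(zip(genes, lfc), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]

# Toy data: placeholder gene labels, not real measurements
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
control = [10.0, 50.0, 5.0, 100.0]
treated = [40.0, 50.0, 0.0, 90.0]

lfc = log2_fold_change(control, treated)
ranked = top_degs(genes, lfc, k=2)
for gene, value in ranked:
    print(f"{gene}\t{value:+.3f}")
```

The ranked gene list is the kind of input a downstream enrichment tool would typically consume; ranking by absolute change keeps both up- and down-regulated genes in view.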

Important: Claude Code mode also requires downloading the model files from Hugging Face first.

Quick Demo via Claude Code:

npm install -g @anthropic-ai/claude-code
cd aethercell-drug-discovery-v1.0.0
claude

Core research tasks

| Task | Input | Output |
|------|------|--------|
| Virtual perturbation | Drug SMILES + cell line, or gene perturbation target | Predicted transcriptomic response, fold changes, top DEGs |
| Drug response prediction | Drug SMILES + cancer cell line | Sensitivity / resistance estimate |
| Drug repurposing | Disease name | Ranked candidate drugs |

Supported perturbation types

Drug treatment (drug)
Gene knockdown via shRNA (sh)
Gene overexpression (oe)
Gene knockout via CRISPR (xpr)
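
When scripting over several perturbation types, it can help to validate the short code before constructing a predictor. The sketch below maps the four codes listed above to their descriptions. The helper `check_perturbation` is hypothetical and not part of the AetherCell API, and whether the API accepts every code in exactly this form is an assumption (only `'drug'` appears in the usage examples).

```python
# Codes and descriptions copied from the list of supported perturbation types
PERTURBATION_TYPES = {
    "drug": "Drug treatment",
    "sh": "Gene knockdown via shRNA",
    "oe": "Gene overexpression",
    "xpr": "Gene knockout via CRISPR",
}

def check_perturbation(code):
    """Return the normalized code, or raise ValueError listing valid options."""
    key = code.strip().lower()
    if key not in PERTURBATION_TYPES:
        valid = ", ".join(sorted(PERTURBATION_TYPES))
        raise ValueError(
            f"Unknown perturbation type {code!r}; expected one of: {valid}"
        )
    return key

print(check_perturbation(" XPR "), "->", PERTURBATION_TYPES["xpr"])
```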

Quick start

Environment setup

conda env create -f environment.yml
conda activate aethercell

Download model weights

Before using either the Python API or Claude Code workflow, download the required weights from:

https://huggingface.co/liwenyuan99/AetherCell
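
One way to fetch the files programmatically is the `huggingface_hub` package (`pip install huggingface_hub`). This is a sketch rather than the project's documented procedure: the target directory name `weights` is an assumption, so check the repository for where the code expects the model files to live.

```python
REPO_ID = "liwenyuan99/AetherCell"

def download_weights(local_dir="weights"):
    """Download every file in the model repository and return the local path.

    `local_dir` is a hypothetical default; adjust it to the location the
    AetherCell code expects.
    """
    # Imported lazily so this module loads even without huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)

# Usage (requires network access):
#   path = download_weights()
#   print("Model files stored in", path)
```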

Minimal usage examples

Transcriptome prediction

from models.transcriptome_prediction.transcriptome_inference import TranscriptomePredictor

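# Configure a drug-perturbation predictor using the L1000 model type, on CPU
predictor = TranscriptomePredictor(
    model_type='l1000',
    perturbation='drug',
    device='cpu'
)

# Predict the MCF7 response to the drug given as a SMILES string (here, aspirin)
result = predictor.predict(
    drug_smiles='CC(=O)Oc1ccccc1C(=O)O',
    cell_line='MCF7'
)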

Drug response prediction

from models.ic50_prediction.ic50_inference import IC50Predictor

predictor = IC50Predictor(device='cpu')
# '...' is a placeholder: pass the SMILES string of the drug of interest
result = predictor.predict(drug_smiles='...', cell_line='A549')
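
For batch experiments, a thin wrapper can loop over drug and cell-line combinations. The sketch below is deliberately generic: `screen` is a hypothetical helper that accepts any callable taking `drug_smiles` and `cell_line` keyword arguments (such as `predictor.predict` above) and makes no assumption about the structure of the returned value. A stub stands in for the real predictor so the example runs without model weights.

```python
from itertools import product

def screen(predict_fn, smiles_list, cell_lines):
    """Call predict_fn on every drug/cell-line pair and collect the results."""
    records = []
    for smiles, cell_line in product(smiles_list, cell_lines):
        record = {"drug_smiles": smiles, "cell_line": cell_line,
                  "output": None, "error": None}
        try:
            record["output"] = predict_fn(drug_smiles=smiles, cell_line=cell_line)
        except Exception as exc:
            # Keep going so one failing pair does not abort the whole screen
            record["error"] = repr(exc)
        records.append(record)
    return records

def fake_predict(drug_smiles, cell_line):
    """Stub predictor for illustration only; returns a dummy score."""
    if cell_line == "UNKNOWN":
        raise KeyError(cell_line)
    return {"score": len(drug_smiles)}

records = screen(fake_predict, ["CCO", "CC(=O)O"], ["A549", "UNKNOWN"])
for r in records:
    print(r["drug_smiles"], r["cell_line"], r["output"], r["error"])
```

With the real model, `fake_predict` would be replaced by `predictor.predict`; recording errors per pair makes it easier to spot unsupported cell lines after a long run.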

Drug repurposing

from models.moe_repurposing.moe_inference import MoEPredictor

predictor = MoEPredictor()
results = predictor.predict_for_disease('Alzheimer disease', top_n=10)

Reproducibility resources

Data

| Dataset | Source | Scale |
|---------|--------|-------|
| Bulk RNA-seq pre-training | TCGA, CCLE, GEO | 519,609 samples |
| L1000 perturbation data | CMap LINCS project | ~1.3M standardized pairs |

Processed datasets: Zenodo

Model assets

Pre-trained weights are available at Hugging Face.

| Model asset | Description |
|-------------|-------------|
| Perturbation predictors | Drug / sh / oe / xpr perturbation modules |
| AC-RP | Drug response prediction |
| PK-MoE | Drug repurposing system |

Limitations

Performance may vary across perturbation classes, cell lines, and biological contexts.
Gene perturbation tasks rely on the availability and quality of perturbation-specific representations.
Drug repurposing outputs are hypothesis-generating and require downstream experimental validation.
The web demo depends on limited daily LLM API quota and is not guaranteed to provide uninterrupted service.

Responsible use

FOR RESEARCH USE ONLY

This repository and its associated model assets are intended for research use. They are not validated for clinical use, diagnosis, patient stratification, or treatment decision-making. Any biological or therapeutic hypothesis generated by the system should be independently evaluated and experimentally validated.

Citation

If you use AetherCell in your research, please cite:

@article{li2026aethercell,
  title   = {AetherCell: A Generative Engine for Virtual Cell Perturbation and In Vivo Drug Discovery},
  author  = {Li, Wenyuan and Chen, Yang and Peng, Zhaoyi and Xiang, Lei and Wang, Dong and Xie, Zhi},
  journal = {bioRxiv},
  year    = {2026},
  doi     = {10.64898/2026.03.13.710968},
  url     = {https://www.biorxiv.org/content/10.64898/2026.03.13.710968v1}
}

Use of this repository, model weights, outputs, or derivative models in any publication, preprint, report, benchmark, presentation, or public release requires citation of the above preprint in accordance with the license terms.

License

This project is distributed under the AetherCell Research License v1.0.

Permitted use

Non-commercial academic research
Non-commercial scientific evaluation
Internal reproduction for research purposes
Fine-tuning, adaptation, or improvement for non-commercial research only

Conditions

Citation of the AetherCell preprint is mandatory for any use of the repository, model, model weights, outputs, or any derived / fine-tuned / adapted / improved model in a publication, preprint, report, benchmark, presentation, or other public disclosure
Any redistributed derivative model must retain this attribution and citation notice

See LICENSE for full terms.
