MiT4SL
MiT4SL is the first machine learning model for cross-cell-line prediction of synthetic lethal (SL) gene pairs. It uses a novel triplet representation learning method to encode cell-line information by integrating multi-omics data such as gene expression, the PPI network, and protein sequences.
MiT4SL: Context-aware deep learning enables adaptive synthetic lethality prediction across cancer cell lines
Overview

Overview of MiT4SL. MiT4SL is designed to predict SL interactions across diverse contexts, ranging from well-characterized to unexplored cell lines. To address data sparsity and context-specificity, MiT4SL incorporates cell-line-specific information with effective gene-pair representations. This flexible framework achieves superior performance in both established and unseen cell lines. Beyond its predictive accuracy, the versatility of the triplet representation allows MiT4SL to serve diverse roles. For example, it can identify novel SL partners for a target gene or prioritize optimal cellular contexts for specific gene-pair interactions.
1. Installation
Create a new environment
First, create a new virtual environment for MiT4SL. We recommend Python >= 3.10; our locally verified environment uses Python 3.10.6.
# Create a new environment with Python 3.10
conda create -n mit4sl python=3.10
# Activate the environment
conda activate mit4sl
Install dependencies
We provide two options for installing dependencies.
Option 1: install from pyproject.toml
Please upgrade packaging tools first:
python -m pip install --upgrade pip setuptools
Then install the package in editable mode:
pip install -e .
Option 2: install from requirements.txt
pip install -r requirements.txt
📌 Note: Install PyTorch Geometric-related wheels manually
If the default installation does not resolve the PyTorch Geometric stack correctly on your machine, install the PyTorch Geometric-related extensions against your local PyTorch/CUDA build. For example, for PyTorch 1.12.1 + CUDA 11.3:
pip install torch-scatter==2.1.0 torch-sparse==0.6.16 -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
pip install torch-geometric==1.6.0
Browse data.pyg.org/whl for other CUDA/PyTorch combinations, and adjust the --find-links URL to match your installed PyTorch/CUDA version.
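The wheel index URL follows a predictable pattern. As a convenience, here is a small sketch (not part of the MiT4SL codebase; the helper name and URL pattern are inferred from the example above) that composes the --find-links URL from your local PyTorch and CUDA versions:

```python
# Sketch (assumption: the data.pyg.org/whl URL pattern shown above):
# compose the --find-links index URL from local PyTorch/CUDA versions.
def pyg_wheel_index(torch_version, cuda_version=None):
    # CUDA builds use a "cuXYZ" suffix (dots stripped); CPU builds use "cpu".
    suffix = "cu" + cuda_version.replace(".", "") if cuda_version else "cpu"
    return f"https://data.pyg.org/whl/torch-{torch_version}+{suffix}.html"

print(pyg_wheel_index("1.12.1", "11.3"))  # matches the example above
print(pyg_wheel_index("1.12.1", None))    # CPU-only builds
```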
2. Download data
Due to data size and availability restrictions, the dataset must be downloaded manually. You can access it from the dataset URL: Dataset. Make sure to download the dataset and place it in the data directory before running the program.
Step 1: Download the Dataset.
For the local archive package data.zip, the current release metadata is:
| Field | Value |
| --- | --- |
| Archive name | data.zip |
| Version | v2026.03.31 |
| Release date | 2026-03-31 |
| File size | 1,030,474,683 bytes |
| Checksum | SHA-256: d06124f2de3d6766f222c21800e91dcee10fb959c06a4cbf4189f1a3a9c791a8 |
| License / usage notes | MIT |
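After downloading, it is worth verifying the archive against the published checksum before unpacking. A minimal sketch using the Python standard library (the expected digest is copied from the release metadata above; the helper name is our own):

```python
import hashlib

# SHA-256 digest published in the release metadata table above.
EXPECTED = "d06124f2de3d6766f222c21800e91dcee10fb959c06a4cbf4189f1a3a9c791a8"

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage: assert sha256_of_file("data.zip") == EXPECTED
```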
Step 2: Place everything under data/ at the project root. The expected layout is:
data/
├── MultiOmics_feature/
├── SLbench/
├── SL_partner_recommendation/
├── Cell_line_recommendation/
└── Case_study_TE1/
The main dataset resources are organized as follows:
| Folder | Contents |
| ------------------------------ | ---------------------------------------------------------------------------------------------- |
| MultiOmics_feature/ | PrimeKG-derived graph assets, protein sequence embeddings, and cell-line-specific PPI features |
| SLbench/ | Benchmark splits for specific-cell-line and cross-cell-line SL prediction |
| SL_partner_recommendation/ | Partner recommendation tasks such as A549_KRAS and A549_TP53 |
| Cell_line_recommendation/ | Recommendation benchmarks for the Dede and Ito collections |
| Case_study_TE1/ | TE-1 case-study data |
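Before launching training, you can sanity-check that the expected top-level folders are in place. A sketch using only the standard library (folder names are taken from the layout above; the function itself is not part of the repository):

```python
from pathlib import Path

# Expected top-level subfolders under data/, per the layout above.
EXPECTED_DIRS = [
    "MultiOmics_feature",
    "SLbench",
    "SL_partner_recommendation",
    "Cell_line_recommendation",
    "Case_study_TE1",
]

def check_data_layout(root="data"):
    """Return the expected subfolders that are missing under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]

# Usage: missing = check_data_layout(); warn the user if the list is non-empty.
```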
💡 Additional notes for selected datasets are available in:
data/MultiOmics_feature/README.md
data/Cell_line_recommendation/README.md
data/Case_study_TE1/README.md
3. Run MiT4SL
Quick start
Launch the default experiment (cross-cell-line, target A549):
bash scripts/run_mit4sl.sh
If you need to override the configured runtime device, add --device <id> (for example, --device 0).
Optionally, inspect the default configuration files selected by the launcher and preview the full training command without starting the run:
bash scripts/run_mit4sl.sh --dry-run
By default, the launcher runs the cross-cell-line example with target cell line A549:
python src/train_MiT4SL.py \
--cfg configs/cross_cell_line/protocol.yaml \
--cfg configs/cross_cell_line/Multi_5_to_A549.yaml
List available targets for a config set
bash scripts/run_mit4sl.sh --config-dir cross_cell_line --list-targets
Run other scenarios
- Cell-line-specific random splitting:
bash scripts/run_mit4sl.sh --config-dir cell_line_specific/random --target A549
- Cross-cell-line transfer to another target cell line:
bash scripts/run_mit4sl.sh --config-dir cross_cell_line --target 22Rv1
- SL partner recommendation:
bash scripts/run_mit4sl.sh --config-dir recom_sl_partner --target A549_KRAS
- Cell line recommendation:
bash scripts/run_mit4sl.sh --config-dir recom_sl_cell_line/dede --target A549
Run the training script directly
If you prefer to skip the shell wrapper, you can run the training script directly by providing both the protocol config and the target config:
python src/train_MiT4SL.py \
--cfg <protocol.yaml> \
--cfg <target.yaml>
You can also override the output directory or runtime device:
python src/train_MiT4SL.py \
--cfg configs/cross_cell_line/protocol.yaml \
--cfg configs/cross_cell_line/Multi_5_to_A549.yaml \
--device 0 \
--Save_model_path result/custom_run
Outputs
By default, run outputs are written under the configured RESULT.SAVE_PATH (typically result/).
result/
└── <setting>/
└── <cell_or_target>/
└── <run_tag>/
├── checkpoint
├── train.log
├── <cell_or_target>_results.txt
├── final_result_eval.csv
├── resolved_config.yaml
└── run_metadata.json
The main output files are:
- checkpoint: saved model checkpoint, including the model state and optimizer state.
- train.log: full training log, including setup information and periodic training, validation, and test metrics.
- <cell_or_target>_results.txt: human-readable summary of per-run results, together with the final mean and standard deviation.
- final_result_eval.csv: compact table of the evaluation metrics for each run, plus aggregated average and std rows.
- resolved_config.yaml: the fully merged runtime config after combining protocol.yaml with the target-specific YAML.
- run_metadata.json: structured metadata describing the resolved run, such as config files, repeat mode, selected learning rate, and effective epoch budget.
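If you want to post-process results programmatically, final_result_eval.csv is the easiest entry point. A sketch with the standard csv module (we assume one row per run plus "average" and "std" rows keyed by the first column; the exact column names may differ in your runs):

```python
import csv

# Sketch: load final_result_eval.csv into {row_label: {metric: value}}.
# Assumption: first column labels each row ("1", "2", ..., "average", "std").
def load_summary(path):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return {row[0]: dict(zip(header[1:], row[1:])) for row in reader}

# Usage:
#   summary = load_summary("result/<setting>/<target>/<run_tag>/final_result_eval.csv")
#   print(summary["average"])
```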
4. Configuration system
MiT4SL uses a two-stage configuration pattern:
- protocol.yaml stores the shared settings for one experiment family.
- A target-specific YAML stores the cell-line- or task-specific overrides.
The launcher resolves and merges this pair automatically. For example:
bash scripts/run_mit4sl.sh --config-dir cross_cell_line --target A549
For the full configuration catalog and directory layout, see configs/README.md.
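Conceptually, the merge overlays the target-specific YAML on top of the protocol settings. A minimal sketch of that kind of recursive overlay (the actual merge logic lives in src/ and may differ in detail; the config keys below are illustrative only):

```python
# Sketch of a two-stage config merge: target-specific values override the
# shared protocol, recursing into nested sections rather than replacing them.
def merge_configs(protocol, target):
    merged = dict(protocol)
    for key, value in target.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)  # recurse into sections
        else:
            merged[key] = value  # scalar or new key: target wins
    return merged

protocol = {"TRAIN": {"epochs": 100, "lr": 1e-3}, "RESULT": {"SAVE_PATH": "result/"}}
target = {"TRAIN": {"lr": 5e-4}, "DATA": {"target_cell_line": "A549"}}
print(merge_configs(protocol, target))
```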
5. Project Structure
MiT4SL/
├── configs/ # Experiment YAMLs (protocol + target overrides)
├── data/ # Released datasets and supporting assets
├── result/ # Output directory for runs
├── scripts/ # Shell launcher and script-level docs
├── src/ # Core model, training, utilities, config loading
├── tests/ # Regression and integrity tests
├── tutorials/ # Notebooks for rebuilding contextualized PPI assets and SL benchmark splits
├── fig_overview_mit4sl.png # Overview figure used in the README
├── pyproject.toml # Project metadata and Python requirement
└── requirements.txt # Pinned dependency list
For readers who want to understand or rebuild the processed assets, see tutorials/README.md and the notebooks under tutorials/, including:
- tutorials/contextualized_PPI_construction.ipynb
- tutorials/cell_line_specific_scenario_construction.ipynb
- tutorials/cross_cell_line_scenario_constrcution.ipynb
6. How to cite
If you find MiT4SL useful in your research, please consider citing:
@article{tao2025mit4sl,
title={MiT