MiT4SL
MiT4SL is the first machine learning model for cross-cell-line prediction of synthetic lethal (SL) gene pairs. It uses a novel triplet representation learning method to encode cell-line information by integrating multi-omics data such as gene expression, the PPI network, and protein sequences.
MiT4SL: Context-aware deep learning enables adaptive synthetic lethality prediction across cancer cell lines
Overview

Overview of MiT4SL. MiT4SL is designed to predict SL interactions across diverse contexts, ranging from well-characterized to unexplored cell lines. To address data sparsity and context-specificity, MiT4SL incorporates cell-line-specific information with effective gene-pair representations. This flexible framework achieves superior performance in both established and unseen cell lines. Beyond its predictive accuracy, the versatility of the triplet representation allows MiT4SL to serve diverse roles. For example, it can identify novel SL partners for a target gene or prioritize optimal cellular contexts for specific gene-pair interactions.
1. Installation
Create a new environment
First, create a new virtual environment for MiT4SL. We recommend Python >= 3.10; our locally verified environment uses Python 3.10.6.
# Create a new environment with Python 3.10
conda create -n mit4sl python=3.10
# Activate the environment
conda activate mit4sl
Install dependencies
We provide two options for installing dependencies.
Option 1: install from pyproject.toml
Please upgrade packaging tools first:
python -m pip install --upgrade pip setuptools
Then install the package in editable mode:
pip install -e .
Option 2: install from requirements.txt
pip install -r requirements.txt
📌 Note: Install PyTorch Geometric-related wheels manually
If the default installation does not resolve the PyTorch Geometric stack correctly on your machine, install the PyTorch Geometric-related extensions against your local PyTorch/CUDA build. For example, for PyTorch 1.12.1 + CUDA 11.3:
pip install torch-scatter==2.1.0 torch-sparse==0.6.16 -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
pip install torch-geometric==1.6.0
Browse data.pyg.org/whl for other CUDA/PyTorch combinations, and adjust the --find-links URL to match your installed PyTorch/CUDA version.
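The wheel index URL follows a predictable pattern. As a convenience, here is a small sketch (not part of the MiT4SL codebase; the helper name and URL pattern are inferred from the example above) that composes the --find-links URL from your local PyTorch and CUDA versions:

```python
# Sketch (assumption: the data.pyg.org/whl URL pattern shown above):
# compose the --find-links index URL from local PyTorch/CUDA versions.
def pyg_wheel_index(torch_version, cuda_version=None):
    # CUDA builds use a "cuXYZ" suffix (dots stripped); CPU builds use "cpu".
    suffix = "cu" + cuda_version.replace(".", "") if cuda_version else "cpu"
    return f"https://data.pyg.org/whl/torch-{torch_version}+{suffix}.html"

print(pyg_wheel_index("1.12.1", "11.3"))  # matches the example above
print(pyg_wheel_index("1.12.1", None))    # CPU-only builds
```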
2. Download data
Due to data size and availability restrictions, the dataset must be downloaded manually. You can access it from the dataset URL: Dataset. Make sure to download the dataset and place it in the data directory before running the program.
Step 1: Download the Dataset.
For the local archive package data.zip, the current release metadata is:
| Field | Value |
| --- | --- |
| Archive name | data.zip |
| Version | v2026.03.31 |
| Release date | 2026-03-31 |
| File size | 1,030,474,683 bytes |
| Checksum | SHA-256: d06124f2de3d6766f222c21800e91dcee10fb959c06a4cbf4189f1a3a9c791a8 |
| License / usage notes | MIT |
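After downloading, it is worth verifying the archive against the published checksum before unpacking. A minimal sketch using the Python standard library (the expected digest is copied from the release metadata above; the helper name is our own):

```python
import hashlib

# SHA-256 digest published in the release metadata table above.
EXPECTED = "d06124f2de3d6766f222c21800e91dcee10fb959c06a4cbf4189f1a3a9c791a8"

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage: assert sha256_of_file("data.zip") == EXPECTED
```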
Step 2: Place everything under data/ at the project root. The expected layout is:
data/
├── MultiOmics_feature/
├── SLbench/
├── SL_partner_recommendation/
├── Cell_line_recommendation/
└── Case_study_TE1/
The main dataset resources are organized as follows:
| Folder | Contents |
| ------------------------------ | ---------------------------------------------------------------------------------------------- |
| MultiOmics_feature/ | PrimeKG-derived graph assets, protein sequence embeddings, and cell-line-specific PPI features |
| SLbench/ | Benchmark splits for specific-cell-line and cross-cell-line SL prediction |
| SL_partner_recommendation/ | Partner recommendation tasks such as A549_KRAS and A549_TP53 |
| Cell_line_recommendation/ | Recommendation benchmarks for the Dede and Ito collections |
| Case_study_TE1/ | TE-1 case-study data |
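Before launching training, you can sanity-check that the expected top-level folders are in place. A sketch using only the standard library (folder names are taken from the layout above; the function itself is not part of the repository):

```python
from pathlib import Path

# Expected top-level subfolders under data/, per the layout above.
EXPECTED_DIRS = [
    "MultiOmics_feature",
    "SLbench",
    "SL_partner_recommendation",
    "Cell_line_recommendation",
    "Case_study_TE1",
]

def check_data_layout(root="data"):
    """Return the expected subfolders that are missing under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]

# Usage: missing = check_data_layout(); warn the user if the list is non-empty.
```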
💡 Additional notes for selected datasets are available in:
data/MultiOmics_feature/README.md
data/Cell_line_recommendation/README.md
data/Case_study_TE1/README.md
3. Run MiT4SL
Quick start
Launch the default experiment (cross-cell-line, target A549):
bash scripts/run_mit4sl.sh
If you need to override the configured runtime device, add --device <id> (for example, --device 0).
Optionally, inspect the default configuration files selected by the launcher and preview the full training command without starting the run:
bash scripts/run_mit4sl.sh --dry-run
By default, the launcher runs the cross-cell-line example with target cell line A549:
python src/train_MiT4SL.py \
--cfg configs/cross_cell_line/protocol.yaml \
--cfg configs/cross_cell_line/Multi_5_to_A549.yaml
List available targets for a config set
bash scripts/run_mit4sl.sh --config-dir cross_cell_line --list-targets
Run other scenarios
- Cell-line-specific random splitting:
bash scripts/run_mit4sl.sh --config-dir cell_line_specific/random --target A549
- Cross-cell-line transfer to another target cell line:
bash scripts/run_mit4sl.sh --config-dir cross_cell_line --target 22Rv1
- SL partner recommendation:
bash scripts/run_mit4sl.sh --config-dir recom_sl_partner --target A549_KRAS
- Cell line recommendation:
bash scripts/run_mit4sl.sh --config-dir recom_sl_cell_line/dede --target A549
Run the training script directly
If you prefer to skip the shell wrapper, you can run the training script directly by providing both the protocol config and the target config:
python src/train_MiT4SL.py \
--cfg <protocol.yaml> \
--cfg <target.yaml>
You can also override the output directory or runtime device:
python src/train_MiT4SL.py \
--cfg configs/cross_cell_line/protocol.yaml \
--cfg configs/cross_cell_line/Multi_5_to_A549.yaml \
--device 0 \
--Save_model_path result/custom_run
Outputs
By default, run outputs are written under the configured RESULT.SAVE_PATH (typically result/).
result/
└── <setting>/
└── <cell_or_target>/
└── <run_tag>/
├── checkpoint
├── train.log
├── <cell_or_target>_results.txt
├── final_result_eval.csv
├── resolved_config.yaml
└── run_metadata.json
The main output files are:
- checkpoint: saved model checkpoint, including the model state and optimizer state.
- train.log: full training log, including setup information and periodic training, validation, and test metrics.
- <cell_or_target>_results.txt: human-readable summary of per-run results, together with the final mean and standard deviation.
- final_result_eval.csv: compact table of the evaluation metrics for each run, plus aggregated average and std rows.
- resolved_config.yaml: the fully merged runtime config after combining protocol.yaml with the target-specific YAML.
- run_metadata.json: structured metadata describing the resolved run, such as config files, repeat mode, selected learning rate, and effective epoch budget.
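If you want to post-process results programmatically, final_result_eval.csv is the easiest entry point. A sketch with the standard csv module (we assume one row per run plus "average" and "std" rows keyed by the first column; the exact column names may differ in your runs):

```python
import csv

# Sketch: load final_result_eval.csv into {row_label: {metric: value}}.
# Assumption: first column labels each row ("1", "2", ..., "average", "std").
def load_summary(path):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return {row[0]: dict(zip(header[1:], row[1:])) for row in reader}

# Usage:
#   summary = load_summary("result/<setting>/<target>/<run_tag>/final_result_eval.csv")
#   print(summary["average"])
```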
4. Configuration system
MiT4SL uses a two-stage configuration pattern:
- protocol.yaml stores the shared settings for one experiment family.
- A target-specific YAML stores the cell-line- or task-specific overrides.
The launcher resolves and merges this pair automatically. For example:
bash scripts/run_mit4sl.sh --config-dir cross_cell_line --target A549
For the full configuration catalog and directory layout, see configs/README.md.
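Conceptually, the merge overlays the target-specific YAML on top of the protocol settings. A minimal sketch of that kind of recursive overlay (the actual merge logic lives in src/ and may differ in detail; the config keys below are illustrative only):

```python
# Sketch of a two-stage config merge: target-specific values override the
# shared protocol, recursing into nested sections rather than replacing them.
def merge_configs(protocol, target):
    merged = dict(protocol)
    for key, value in target.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)  # recurse into sections
        else:
            merged[key] = value  # scalar or new key: target wins
    return merged

protocol = {"TRAIN": {"epochs": 100, "lr": 1e-3}, "RESULT": {"SAVE_PATH": "result/"}}
target = {"TRAIN": {"lr": 5e-4}, "DATA": {"target_cell_line": "A549"}}
print(merge_configs(protocol, target))
```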
5. Project Structure
MiT4SL/
├── configs/ # Experiment YAMLs (protocol + target overrides)
├── data/ # Released datasets and supporting assets
├── result/ # Output directory for runs
├── scripts/ # Shell launcher and script-level docs
├── src/ # Core model, training, utilities, config loading
├── tests/ # Regression and integrity tests
├── tutorials/ # Notebooks for rebuilding contextualized PPI assets and SL benchmark splits
├── fig_overview_mit4sl.png # Overview figure used in the README
├── pyproject.toml # Project metadata and Python requirement
└── requirements.txt # Pinned dependency list
For readers who want to understand or rebuild the processed assets, see tutorials/README.md and the notebooks under tutorials/, including:
- tutorials/contextualized_PPI_construction.ipynb
- tutorials/cell_line_specific_scenario_construction.ipynb
- tutorials/cross_cell_line_scenario_constrcution.ipynb
6. How to cite
If you find MiT4SL useful in your research, please consider citing:
@article{tao2025mit4sl,
title={MiT