ScDEAL
Deep Transfer Learning of Drug Sensitivity by Integrating Bulk and Single-cell RNA-seq data
scDEAL documentation
News 2023/03/19
- Added the function of loading checkpoint weights to the model.
- Added the conda environment applied to produce the result.
- Reorganized and uploaded the resources needed for the code.
News 2022/12/04
- Added trained adata.h5ad objects for 6 data examples.
Previous News
- Added the function to use clustering labels to help transfer learning. Details are listed in the usage section.
- Migrated the source of testing data from the FTP to OneDrive.
Installation
Retrieve code from GitHub
The software is a stand-alone Python script package. The home directory of scDEAL can be cloned from the GitHub repository:
# Clone from Github
git clone https://github.com/OSU-BMBL/scDEAL.git
# Go into the directory
cd scDEAL
We recommend installing scDEAL under Linux and setting up the provided conda environment (scdeal.tar.gz, download links below) with conda-pack. We also recommend installing conda-pack in your root conda environment; the conda pack command will then be available in all sub-environments as well.
Click here to download scdeal.tar.gz from OneDrive
Click here to download scdeal.tar.gz from Google Drive
Install with conda:
conda-pack is available from Anaconda as well as from conda-forge:
conda install conda-pack
conda install -c conda-forge conda-pack
Install from PyPI:
While conda-pack requires an existing conda install, it can also be installed from PyPI:
pip install conda-pack
Load the scDEALenv environment
conda-pack is primarily a command line tool; full CLI docs are available in the conda-pack documentation. One common use case is packing an environment on one machine to distribute to other machines that may not have conda or Python installed. Place the downloaded scdeal.tar.gz into your scDEAL folder, then unpack and activate the environment on your target machine:
# Unpack environment into directory `scDEALenv`
mkdir -p scDEALenv
tar -xzf scdeal.tar.gz -C scDEALenv
# Activate the environment. This adds `scDEALenv/bin` to your path
source scDEALenv/bin/activate
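After activation, it can be useful to confirm that the Python interpreter in use really comes from the unpacked environment. The following is a minimal sketch, not part of scDEAL: the helper name `env_is_active` is hypothetical, and it assumes the environment was unpacked into `scDEALenv` relative to the current working directory, as in the commands above.

```python
import sys
from pathlib import Path


def env_is_active(env_dir="scDEALenv", prefix=None):
    """Return True if the interpreter prefix is the unpacked environment.

    `prefix` defaults to sys.prefix, the installation prefix of the
    running interpreter. `env_dir` is the directory that the tarball
    was unpacked into (assumed to be `scDEALenv`).
    """
    current = Path(prefix if prefix is not None else sys.prefix).resolve()
    return current == Path(env_dir).resolve()


if __name__ == "__main__":
    if env_is_active():
        print("scDEALenv is active")
    else:
        print("scDEALenv is NOT active; current prefix:", sys.prefix)
```

Run it from the scDEAL folder after `source scDEALenv/bin/activate`.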
Data Preparation
Data download
After setting up the home directory, you need to download the other resources required for the run. Please download the zipped dataset, scDEAL.zip, from one of the links below:
Click here to download scDEAL.zip from OneDrive
Click here to download scDEAL.zip from Google Drive
The file "scDEAL.zip" includes all the datasets we have tested. Please extract the zip file and place the sub-directory "data" in the root directory of the "scDEAL" folder.

| Dataset | Author | Drug | GEO access | Cells | Species | Cancer type |
|---------------|------------------------|------------------|-------------------|--------------|-----------------------|----------------------------------------|
| Data 1&2 | Sharma, et al. | Cisplatin | GSE117872 | 548 | Homo sapiens | Oral squamous cell carcinomas |
| Data 3 | Kong, et al. | Gefitinib | GSE112274 | 507 | Homo sapiens | Lung cancer |
| Data 4 | Schnepp, et al. | Docetaxel | GSE140440 | 324 | Homo sapiens | Prostate Cancer |
| Data 5 | Aissa, et al. | Erlotinib | GSE149383 | 1496 | Homo sapiens | Lung cancer |
| Data 6 | Bell, et al. | I-BET-762 | GSE110894 | 1419 | Mus musculus | Acute myeloid leukemia |

"scDEAL.zip" also includes model checkpoints in the "save" directory. Extract scDEAL.zip as follows:
# Unpack scDEAL.zip into directory `scDEAL`
unzip scDEAL.zip
# View folder
ls -a
#ls results:
#bulkmodel.py DaNN LICENSE README.md save scDEALenv trainers.py utils.py
#casestudy data models.py sampling.py scanpypip scmodel.py trajectory.py
All resources in the home directory of scDEAL should look as follows:
scDEAL
└───scDEALenv
│   ...
│   README.md
│   bulkmodel.py
│   scmodel.py
│   ...
└───data
│   │   ALL_expression.csv
│   │   ALL_label_binary_wf.csv
│   └───GSE110894
│   └───GSE112274
│   └───GSE117872
│   └───GSE140440
│   └───GSE149383
│   │   ...
└───save
│   └───logs
│   └───figures
│   └───models
│   │   └───bulk_encoder
│   │   └───bulk_pre
│   │   └───sc_encoder
│   │   └───sc_pre
│   └───adata
│   │   ...
└───DaNN
└───scanpypip
│   │   ...
Directory contents
Folders in our package will store the corresponding contents:
- root: Python scripts to run the program and README.md.
- data: datasets required for the learning.
- save/logs: log and error files that record running status.
- save/figures & figures: figures generated through the run.
- save/models: models trained through the run.
- save/adata: results from AnnData outputs.
- DaNN: Python scripts that describe the model.
- scanpypip: Python utility scripts.
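Before running the demo, the layout described above can be checked programmatically. The sketch below is illustrative and not part of scDEAL: the helper name `missing_paths` and the `EXPECTED_PATHS` list are assumptions, and the list only covers entries shown in the directory tree above.

```python
from pathlib import Path

# Paths taken from the directory tree in this README (relative to the
# scDEAL root). This list is an illustrative subset, not an official one.
EXPECTED_PATHS = (
    "README.md",
    "bulkmodel.py",
    "scmodel.py",
    "data/ALL_expression.csv",
    "data/ALL_label_binary_wf.csv",
    "data/GSE110894",
    "data/GSE112274",
    "data/GSE117872",
    "data/GSE140440",
    "data/GSE149383",
    "save/logs",
    "save/figures",
    "save/models",
    "save/adata",
    "DaNN",
    "scanpypip",
)


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from inside the scDEAL folder.
    missing = missing_paths(".")
    if missing:
        print("Missing resources:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected resources are in place.")
```

If anything is reported as missing, re-check that scDEAL.zip was extracted into the root of the scDEAL folder.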
Demo
Pretrained checkpoints
For the scRNA-Seq prediction task, we provide pre-trained checkpoints for the models stored in save/models. The naming rules of the checkpoint are as follows:
bulk dataset+"data"+scRNA-Seq dataset+"drug"+drug+"bottle"+bottleneck+"edim"+encoder dimensions+"pdim"+predictor dimensions+"model"+encoder model+"dropout"+dropout+"gene"+show critical genes+"lr"+learning rate+"mod"+model version+"sam"+sampling method (+"_DaNN.pkl" only in the final single-cell model)
An example can be:
integrate_data_GSE110894_drug_I.BET.762_bottle_512_edim_256,128_pdim_128,64_model_DAE_dropout_0.1_gene_F_lr_0.5_mod_new_sam_upsampling_DaNN.pkl
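The naming rule can be sketched as a small helper that assembles a checkpoint name from the run parameters. This is a minimal illustration, not code from the scDEAL repository: the function name `checkpoint_name` and its argument names are hypothetical, and the underscore separator is inferred from the example name above.

```python
def checkpoint_name(bulk_data, sc_data, drug, bottleneck, encoder_dims,
                    predictor_dims, encoder_model, dropout, print_gene,
                    lr, mod, sampling, final_sc_model=False):
    """Assemble a checkpoint file name following the README naming rule.

    Dimensions are passed as comma-separated strings (e.g. "256,128"),
    exactly as they are given on the command line.
    """
    parts = [
        bulk_data,
        "data", sc_data,
        "drug", drug,
        "bottle", str(bottleneck),
        "edim", encoder_dims,
        "pdim", predictor_dims,
        "model", encoder_model,
        "dropout", str(dropout),
        "gene", print_gene,
        "lr", str(lr),
        "mod", mod,
        "sam", sampling,
    ]
    name = "_".join(parts)
    # Only the final single-cell model carries the "_DaNN.pkl" suffix.
    if final_sc_model:
        name += "_DaNN.pkl"
    return name


if __name__ == "__main__":
    print(checkpoint_name(
        "integrate", "GSE110894", "I.BET.762", 512, "256,128", "128,64",
        "DAE", 0.1, "F", 0.5, "new", "upsampling", final_sc_model=True,
    ))
```

With the arguments shown, the helper reproduces the example name above; dropping `final_sc_model=True` gives the corresponding bulk-level name without the suffix.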
Usage: For resuming training, you can use the --checkpoint option of scmodel.py and bulkmodel.py. For example, run scmodel.py with checkpoints to get the single-cell level prediction results:
source scDEALenv/bin/activate
python scmodel.py --sc_data "GSE110894" --dimreduce "DAE" --drug "I.BET.762" --bulk_h_dims "256,128" --bottleneck 512 --predictor_h_dims "128,64" --dropout 0.1 --printgene "F" -mod "new" --lr 0.5 --sampling "upsampling" --checkpoint "save/sc_pre/integrate_data_GSE110894_drug_I.BET.762_bottle_512_edim_256,128_pdim_128,64_model_DAE_dropout_0.1_gene_F_lr_0.5_mod_new_sam_upsampling_DaNN.pkl"
This step is a built-in test case using acute myeloid leukemia cells from Bell et al. (https://doi.org/10.1038/s41467-019-10652-9), accessed from Gene Expression Omnibus (GEO) under accession GSE110894. It calls the scDEAL model and predicts sensitivity to I.BET.762 for the input scRNA-Seq data from GSE110894. The file name of the single-cell model is "save/sc_pre/integrate_data_GSE110894_drug_I.BET.762_bottle_512_edim_256,128_pdim_128,64_model_DAE_dropout_0.1_gene_F_lr_0.5_mod_new_sam_upsampling_DaNN.pkl".
We also provide the bulk-level checkpoint. Run bulkmodel.py with its checkpoint, then scmodel.py, to get the single-cell level prediction results:
source scDEALenv/bin/activate
python bulkmodel.py --drug "I.BET.762" --dimreduce "DAE" --encoder_h_dims "256,128" --predictor_h_dims "128,64" --bottleneck 512 --data_name "GSE110894" --sampling "upsampling" --dropout 0.1 --lr 0.5 --printgene "F" -mod "new" --checkpoint "save/bulk_pre/integrate_data_GSE110894_drug_I.BET.762_bottle_512_edim_256,128_pdim_128,64_model_DAE_dropout_0.1_gene_F_lr_0.5_mod_new_sam_upsampling"
python scmodel.py --sc_data "GSE110894" --dimreduce "DAE" --drug "I.BET.762" --bulk_h_dims "256,128" --bottleneck 512 --predictor_h_dims "128,64" --dropout 0.1 --printgene "F" -mod "new" --lr 0.5 --sampling "upsampling" --checkpoint "save/sc_pre/integrate_data_GSE110894_drug_I.BET.762_bottle_512_edim_256,128_pdim_128,64_model_DAE_dropout_0.1_gene_F_lr_0.5_mod_new_sam_upsampling_DaNN.pkl"
Remember that the encoder and predictor dimensions must be identical in the two steps (--encoder_h_dims in bulkmodel.py and --bulk_h_dims in scmodel.py "256,128", --predictor_h_dims "128,64", --bottleneck 512). The bulk step takes the expression profile of bulk RNA-Seq and the drug response annotations as input and loads a drug sensitivity predictor for the drug "I.BET.762". The output model is stored in the directory "save/models". In this case, the file name of the bulk model is "save/bulk_pre/integrate_data_GSE110894_drug_I.BET.762_bottle_512_edim_256,128_pdim_128,64_model_DAE_dropout_0.1_gene_F_lr_0.5_mod_new_sam_upsampling".
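One way to keep the dimensions identical across the two steps is to build both command lines from a single shared configuration. The sketch below is an illustration under stated assumptions, not part of scDEAL: the helper `build_commands` and the `cfg` dictionary keys are hypothetical, and only flags that appear in the commands above are used. The optional --checkpoint argument is left out.

```python
import shlex


def build_commands(cfg):
    """Return (bulk_cmd, sc_cmd) argument lists built from one config.

    The encoder dimensions are written once in `cfg` and passed to
    bulkmodel.py as --encoder_h_dims and to scmodel.py as --bulk_h_dims,
    so the two steps cannot drift apart.
    """
    shared = [
        "--drug", cfg["drug"],
        "--dimreduce", cfg["dimreduce"],
        "--predictor_h_dims", cfg["predictor_h_dims"],
        "--bottleneck", str(cfg["bottleneck"]),
        "--sampling", cfg["sampling"],
        "--dropout", str(cfg["dropout"]),
        "--lr", str(cfg["lr"]),
        "--printgene", cfg["printgene"],
        "-mod", cfg["mod"],
    ]
    bulk_cmd = ["python", "bulkmodel.py",
                "--data_name", cfg["sc_data"],
                "--encoder_h_dims", cfg["encoder_h_dims"], *shared]
    sc_cmd = ["python", "scmodel.py",
              "--sc_data", cfg["sc_data"],
              "--bulk_h_dims", cfg["encoder_h_dims"], *shared]
    return bulk_cmd, sc_cmd


# Values copied from the I.BET.762 / GSE110894 example in this README.
EXAMPLE_CFG = {
    "sc_data": "GSE110894",
    "drug": "I.BET.762",
    "dimreduce": "DAE",
    "encoder_h_dims": "256,128",
    "predictor_h_dims": "128,64",
    "bottleneck": 512,
    "sampling": "upsampling",
    "dropout": 0.1,
    "lr": 0.5,
    "printgene": "F",
    "mod": "new",
}

if __name__ == "__main__":
    for cmd in build_commands(EXAMPLE_CFG):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell after activating the environment, or the lists can be passed to `subprocess.run`.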
Train from scratch
Suggested parameters for our selected datasets are as follows:

| data | drug | bottleneck | encoder dimensions | predictor dimensions | encoder model | dropout | learning rate | sampling |
|---------------:|----------:|-----------:|-------------------:|---------------------:|--------------:|--------:|--------------:|-----------:|
| GSE110894 | I.BET.762 | 512 | 256,128 | 128,64 | DAE | 0.1 |
