scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders
Installation
We recommend creating a new conda environment for scButterfly:
conda create -n scButterfly python==3.9
conda activate scButterfly
scButterfly is available on PyPI and can be installed with:
pip install scButterfly
Installation from GitHub is also supported:
git clone https://github.com/Biox-NKU/scButterfly
cd scButterfly
pip install scButterfly-0.0.9-py3-none-any.whl
This process takes approximately 5 to 10 minutes, depending on the user's machine and internet connection.
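To confirm that the installation succeeded, one option is to query the installed distribution with the standard library. This is a minimal sketch; the helper name `installed_version` is ours, and it assumes the distribution is registered under the name used in the `pip install` command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "scButterfly" is the name used with pip in this README.
    version = installed_version("scButterfly")
    if version is None:
        print("scButterfly is not installed in this environment")
    else:
        print(f"scButterfly {version} is installed")
```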
Quick Start
Using the translation between scRNA-seq and scATAC-seq data as an example, scButterfly can be used in three steps: data preprocessing, model training, and predicting and evaluating. More details can be found in the scButterfly documentation.
First, create a scButterfly model object:
from scButterfly.butterfly import Butterfly
butterfly = Butterfly()
1. Data preprocessing
- Before data preprocessing, load the raw count matrices of the scRNA-seq and scATAC-seq data via `butterfly.load_data`:

  butterfly.load_data(RNA_data, ATAC_data, train_id, test_id, validation_id)

  | Parameters    | Description |
  | ------------- | ----------- |
  | RNA_data      | AnnData object of shape `n_obs` × `n_vars`. Rows correspond to cells and columns to genes. |
  | ATAC_data     | AnnData object of shape `n_obs` × `n_vars`. Rows correspond to cells and columns to peaks. |
  | train_id      | A list of cell IDs for training. |
  | test_id       | A list of cell IDs for testing. |
  | validation_id | An optional list of cell IDs for validation. If set to None, butterfly uses a default of 20% of the cells in train_id. |

  An AnnData object is a container for single-cell data provided by the Python package anndata, which is seamlessly integrated with scanpy, a widely used Python library for single-cell data analysis.
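The ID lists passed to `butterfly.load_data` have to be prepared by the user. The sketch below shows one way to build them with a seeded random split. It rests on assumptions not stated in this README: that cell IDs are positional integer indices into the AnnData objects, and that the three lists should be disjoint. The helper `split_cell_ids` and its fractions are hypothetical.

```python
import random


def split_cell_ids(n_cells, test_fraction=0.2, validation_fraction=0.2, seed=0):
    """Split positional cell indices 0..n_cells-1 into train/validation/test lists."""
    ids = list(range(n_cells))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible

    n_test = int(n_cells * test_fraction)
    test_id = ids[:n_test]
    remaining = ids[n_test:]

    # The validation cells are taken from the cells left after removing the test set.
    n_validation = int(len(remaining) * validation_fraction)
    validation_id = remaining[:n_validation]
    train_id = remaining[n_validation:]
    return train_id, validation_id, test_id


train_id, validation_id, test_id = split_cell_ids(n_cells=1000)
# With the objects from this README (RNA_data and ATAC_data are AnnData objects):
# butterfly.load_data(RNA_data, ATAC_data, train_id, test_id, validation_id)
```

Passing `validation_id=None` instead lets scButterfly choose the validation cells itself, as described in the table above.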
- For data preprocessing, use `butterfly.data_preprocessing`:

  butterfly.data_preprocessing()

  You can save the processed data or write the preprocessing log to a file using the following parameters.

  | Parameters   | Description |
  | ------------ | ----------- |
  | save_data    | optional, whether to save the processed data, default False. |
  | file_path    | optional, the path for saving processed data, only used if `save_data` is True, default None. |
  | logging_path | optional, the path for the preprocessing log; set it to None to skip logging, default None. |

  scButterfly also supports refining this process with other parameters (see the scButterfly documentation for details). However, we strongly recommend the default settings for the best model results.
2. Model training
- Before model training, you can choose whether to use a data augmentation strategy. With data augmentation, scButterfly generates synthetic samples using either cell-type labels (if `cell_type` is in `adata.obs`) or cluster labels obtained with the Leiden algorithm and MultiVI, a single-cell multi-omics joint analysis method in the Python package scvi-tools.

  scButterfly provides a data augmentation API:

  butterfly.augmentation(aug_type)

  The parameter `aug_type` can be `cell_type_augmentation` or `MultiVI_augmentation`. Augmentation increases training time but is expected to give better prediction results.

  - If you choose `cell_type_augmentation`, scButterfly-T (Type) looks for `cell_type` in `adata.obs`. If it is not found, scButterfly automatically falls back to `MultiVI_augmentation`.
  - If you choose `MultiVI_augmentation`, scButterfly-C (Cluster) trains a MultiVI model first.
  - If you only want to use the original data for scButterfly-B (Basic) training, set `aug_type = None`.
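The selection rules above can be summarized as a small decision function. This is only an illustration of the behavior described in this README, not code from scButterfly; the helper `resolve_augmentation` and its `obs_columns` argument (the column names of `adata.obs`) are hypothetical.

```python
def resolve_augmentation(aug_type, obs_columns):
    """Return (augmentation actually used, variant name) for a requested aug_type."""
    if aug_type is None:
        # No augmentation: train on the original data only.
        return None, "scButterfly-B"
    if aug_type == "cell_type_augmentation":
        if "cell_type" in obs_columns:
            return "cell_type_augmentation", "scButterfly-T"
        # No cell-type labels available: fall back to cluster-based augmentation.
        return "MultiVI_augmentation", "scButterfly-C"
    if aug_type == "MultiVI_augmentation":
        return "MultiVI_augmentation", "scButterfly-C"
    raise ValueError(f"unknown aug_type: {aug_type!r}")


# Requesting cell-type augmentation without a cell_type column falls back to MultiVI.
used, variant = resolve_augmentation("cell_type_augmentation", ["batch", "n_counts"])
```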
- Construct a scButterfly model as follows:

  butterfly.construct_model(chrom_list)

  scButterfly needs a list of peak counts for each chromosome; remember to sort the peaks by chromosome.

  | Parameters   | Description |
  | ------------ | ----------- |
  | chrom_list   | a list of peak counts for each chromosome; remember to sort the peaks by chromosome. |
  | logging_path | optional, the path for the model structure log; set it to None to skip logging, default None. |
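As an illustration of what `chrom_list` contains, the sketch below counts peaks per chromosome from a list of per-peak chromosome labels. It assumes such labels are available (for example, parsed from peak names) and already in peak order; the helper `peaks_per_chromosome` is hypothetical and not part of scButterfly.

```python
from itertools import groupby


def peaks_per_chromosome(peak_chroms):
    """Count peaks per chromosome for a list of per-peak chromosome labels.

    Raises ValueError if the peaks of one chromosome are not contiguous,
    which indicates the peaks are not sorted by chromosome.
    """
    runs = [(chrom, sum(1 for _ in group)) for chrom, group in groupby(peak_chroms)]
    names = [chrom for chrom, _ in runs]
    if len(names) != len(set(names)):
        raise ValueError("peaks are not sorted by chromosome")
    return [count for _, count in runs]


peak_chroms = ["chr1", "chr1", "chr1", "chr2", "chr2", "chrX"]
chrom_list = peaks_per_chromosome(peak_chroms)  # [3, 2, 1]
# butterfly.construct_model(chrom_list)
```

The contiguity check is there because a list of counts only makes sense if all peaks of a chromosome sit next to each other in the peak order.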
- The scButterfly model can then be trained as follows:

  butterfly.train_model()

  | Parameters   | Description |
  | ------------ | ----------- |
  | output_path  | optional, the path for model checkpoints; if None, './model' is used, default None. |
  | load_model   | optional, the path of a pretrained model to load; set it to None to skip loading, default None. |
  | logging_path | optional, the path for the training log; set it to None to skip logging, default None. |

scButterfly also supports refining the model structure and training process using other parameters of `butterfly.construct_model()` and `butterfly.train_model()` (see the scButterfly documentation for details).
3. Predicting and evaluating
- scButterfly provides a prediction API. You can get the predicted profiles as follows:

  A2R_predict, R2A_predict = butterfly.test_model()

  A series of evaluation methods is also integrated in this function. You can enable them with the following parameters:

  | Parameters   | Description |
  | ------------ | ----------- |
  | output_path  | optional, the path for model evaluation output; if None, './model' is used, default None. |
  | load_model   | optional, whether to load a pretrained model, default False. |
  | model_path   | optional, the path of the pretrained model, only used if `load_model` is True, default None. |
  | test_cluster | optional, whether to compute the clustering evaluation metrics, including AMI, ARI, HOM, NMI, default False. |
  | test_figure  | optional, whether to draw the tSNE visualization of the prediction, default False. |
  | output_data  | optional, whether to write the prediction to file; if True, the prediction is written to `output_path/A2R_predict.h5ad` and `output_path/R2A_predict.h5ad`, default False. |
