INSTINCT
Multi-sample integration of spatial chromatin accessibility sequencing data via stochastic domain translation
Install / Use
/learn @yyLIU12138/INSTINCTREADME
INSTINCT
Multi-sample integration of spatial chromatin accessibility sequencing data via stochastic domain translation
System requirements
The package development version is tested on Windows operating systems. The developmental version of the package has been tested on the following systems:
Linux: Ubuntu 20.04
Windows
Installation
Clone the repository.
git clone https://github.com/yyLIU12138/INSTINCT.git
cd INSTINCT
Create an environment.
conda create -n epi_INSTINCT python=3.10
conda activate epi_INSTINCT
Install the required packages.
pip install -r requirement.txt
Install INSTINCT.
python setup.py build
python setup.py install
Installation takes a few minutes.
Tutorial
Detailed version of tutorials for INSTINCT can be found on the Read the Docs website.
Import the package.
import torch
import anndata as ad
from sklearn.decomposition import PCA
import INSTINCT
import warnings
warnings.filterwarnings("ignore")
Load the anndata type data samples into a list.
data_dir = '../demo_data/EpiTran_MouseBrain_Jiang2023/'
slice_name_list = ['E11_0-S1', 'E13_5-S1', 'E15_5-S1', 'E18_5-S1']
cas_list = [ad.read_h5ad(data_dir + sample + '_atac.h5ad') for sample in slice_name_list]
for j in range(len(cas_list)):
cas_list[j].obs_names = [x + '_' + slice_name_list[j] for x in cas_list[j].obs_names]
Merge the peaks.
cas_list = INSTINCT.peak_sets_alignment(cas_list)
Preprocessing (If the data samples already incorparate fragment count matrices, then set use_fragment_count=False).
adata_concat = ad.concat(cas_list, label="slice_name", keys=slice_name_list)
INSTINCT.preprocess_CAS(cas_list, adata_concat, use_fragment_count=True, min_cells_rate=0.03)
Use PCA to reduce the dimensionality of the concatenated data to 100. The matrix of shape N*100 should be stored in adata_concat.obsm['X_pca'].
pca = PCA(n_components=100, random_state=1234)
input_matrix = pca.fit_transform(adata_concat.X.toarray())
adata_concat.obsm['X_pca'] = input_matrix
Construct the neighbor graph
INSTINCT.create_neighbor_graph(cas_list, adata_concat)
Train the model. The low-dimensional representations for spots are stored in .obsm['INSTINCT_latent'] of each slice in cas_list.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
INSTINCT_model = INSTINCT.INSTINCT_Model(cas_list,
adata_concat,
input_mat_key='X_pca', # the key of the input matrix in adata_concat.obsm
input_dim=100, # the input dimension
hidden_dims_G=[50], # hidden dimensions of the encoder and the decoder
latent_dim=30, # the dimension of latent space
hidden_dims_D=[50], # hidden dimensions of the discriminator
lambda_adv=1, # hyperparameter for the adversarial loss
lambda_cls=10, # hyperparameter for the classification loss
lambda_la=20, # hyperparameter for the latent loss
lambda_rec=10, # hyperparameter for the reconstruction loss
seed=1236, # random seed
learn_rates=[1e-3, 5e-4], # learning rate
training_steps=[500, 500], # training_steps
use_cos=True, # use cosine similarity to find the nearest neighbors
margin=10, # the margin of latent loss
alpha=1, # the hyperparameter for triplet loss
k=50, # the amount of neighbors to find
device=device)
INSTINCT_model.train(report_loss=True, report_interval=100)
INSTINCT_model.eval(cas_list)
Training the model takes about one minute using GPU (RTX 4090D 24GB).
Related Skills
node-connect
350.8kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
110.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
350.8kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
350.8kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
