Deep single-cell RNA-seq data clustering with graph prototypical contrastive learning
<p align="center"> <a href="https://pytorch.org/" alt="PyTorch"> <img src="https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?e&logo=PyTorch&logoColor=white" /></a> <img src ="https://img.shields.io/badge/-Bioinformatics-green"/> <img src="https://img.shields.io/badge/-ICML_WCB_2023-blue" /></p>

The official source code for "Deep single-cell RNA-seq data clustering with graph prototypical contrastive learning", accepted at Bioinformatics (Volume 39, June 2023) and the 2023 ICML Workshop on Computational Biology.
Overview
Single-cell RNA sequencing (scRNA-seq) enables researchers to study cellular heterogeneity by measuring transcriptome-wide gene expression at the single-cell level. To this end, identifying subgroups of cells with clustering techniques becomes an important task for downstream analysis. However, challenges in scRNA-seq data, such as the pervasive dropout phenomenon and high dimensionality, hinder obtaining robust clustering outputs. Although many existing works have been proposed to alleviate these problems, we argue that they fall short of fully leveraging the relational information inherent in the data, and most of them adopt only reconstruction-based losses that depend heavily on the quality of the features. In this paper, we propose a graph-based prototypical contrastive learning method, named scGPCL. Specifically, given a cell-gene bipartite graph that captures the natural relationships inherent in scRNA-seq data, scGPCL encodes cell representations with Graph Neural Networks (GNNs) and uses a prototypical contrastive learning scheme to learn cell representations by pushing apart semantically dissimilar pairs and pulling together similar ones. Through extensive experiments on both simulated and real scRNA-seq data, we demonstrate that scGPCL not only obtains robust cell clustering outputs but also scales to large scRNA-seq data.
<img width=85% src="Img/Architecture.png"></img>
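The contrastive objectives described above can be sketched in a few lines. This is an illustrative NumPy sketch, not the repository's PyTorch implementation: `info_nce` contrasts two augmented views of the same cells instance-wise, while `prototypical_loss` pulls each cell toward its cluster prototype and away from the other prototypes. Function names, shapes, and the default temperature are assumptions for illustration.

```python
import numpy as np

def info_nce(z, pos, temperature=0.25):
    """Instance-wise InfoNCE (sketch): row i of z is pulled toward row i
    of pos and pushed away from every other row of pos."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    pos = pos / np.linalg.norm(pos, axis=1, keepdims=True)
    logits = z @ pos.T / temperature             # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives sit on the diagonal

def prototypical_loss(z, prototypes, labels, temperature=0.25):
    """Prototype-level contrast (sketch): each cell is attracted to the
    prototype of its assigned cluster and repelled from the others."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = z @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(len(labels)), labels])
```

Both losses are softmax cross-entropies over similarities, so each is non-negative and is minimized when positives dominate the similarity rows.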
Requirements
- Python version : 3.9.7
- Pytorch version : 1.10.1
- torch-geometric version : 2.0.3
- scanpy : 1.8.2
Download and pre-processing data (Real single-cell RNA-seq data)
Option 1 : Download preprocessed data
You can download the preprocessed data here
Option 2 : Download the raw data and follow the preprocessing steps
Create the directory to save the raw and preprocessed data.
mkdir raw_data
Download the data from the following references and save it to the raw_data directory.
- Camp
- Mouse Embryonic Stem cells (Mouse ES cells)
- Mouse bladder cells
- Zeisel / Subgroups
- Worm neuron cells
- 10X PBMC
- Human kidney cells
- Baron
- Shekhar mouse retina cells
Follow the preprocessing.ipynb to prepare the input data.
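The preprocessing.ipynb notebook is the authoritative recipe; as a rough, hypothetical NumPy sketch, the usual steps it performs with Scanpy look like this (library-size normalization, log1p transform, variance-based gene filtering). Treating the `--HVG` value as a fraction of genes to keep is an assumption here.

```python
import numpy as np

def preprocess(counts, hvg_frac=0.2, target_sum=1e4):
    """Hypothetical sketch of standard scRNA-seq preprocessing:
    normalize each cell's library size, log-transform, and keep the
    top fraction of high-variance genes."""
    # normalize each cell to the same total count
    size = counts.sum(axis=1, keepdims=True)
    norm = counts / np.maximum(size, 1) * target_sum
    logged = np.log1p(norm)
    # rank genes by variance and keep the top hvg_frac fraction
    var = logged.var(axis=0)
    k = max(1, int(hvg_frac * counts.shape[1]))
    keep = np.argsort(var)[::-1][:k]
    return logged[:, np.sort(keep)]
```

The real notebook operates on AnnData objects and may apply additional quality-control filters; this sketch only conveys the shape of the pipeline.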
How to simulate
To demonstrate the effectiveness of our model, we conduct experiments on challenging simulated datasets.
All simulated datasets are generated with the Splatter package, and you can follow our simulation settings in simulate.ipynb.
Alternatively, the simulated data are also uploaded in the 'data' folder, so you can simply use them.
How to Run
git clone https://github.com/Junseok0207/scGPCL.git
cd scGPCL
- Case 1: Evaluation under Dropout Phenomena
sh scripts/Dropout.sh
- Case 2: Evaluation under Low Signal
sh scripts/Sigma.sh
- Case 3: Evaluation under Imbalanced Subgroups of Cells
sh scripts/Imb.sh
- Real single-cell RNA-seq datasets
sh scripts/Real.sh
- Or you can reproduce our experimental results with the reproduce.ipynb file.
Hyperparameters
--name:
Name of the dataset.
usage example : --name Zeisel
--recon:
Type of reconstruction loss.
usage example : --recon zinb
--n_clusers:
Number of clusters.
usage example : --n_clusers 4
--HVG:
Threshold for variance filtering.
usage example : --HVG 0.2
--lr:
Learning rate to train scGPCL.
usage example : --lr 0.001
--tau:
Temperature for the contrastive loss.
usage example : --tau 0.25
--r:
Threshold to terminate the pre-training phase.
usage example : --r 0.8
--tol:
Tolerance for the change in clustering labels used to terminate the fine-tuning phase.
usage example : --tol 0.0001
--lam1:
Weight for the Node-wise Consistency Regularization loss.
usage example : --lam1 0.5
--lam2:
Weight for the Label-guided Consistency Regularization loss.
usage example : --lam2 0.5
--lam3:
Weight for the Label-guided Consistency Regularization loss.
usage example : --lam3 0.5
Using the above hyperparameters, you can run our model with the following command:
python main.py --recon zinb --name Zeisel --n_clusers 9 --lr 0.0001 --tau 0.25 --r 0.99 --tol 0.0001 --lam1 1.0 --lam2 0.05 --lam3 1.0
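For reference, the flags above could map to an `argparse` parser along the following lines. This is a hypothetical sketch: the actual definitions live in main.py and may differ in defaults and types; the defaults below simply echo the usage examples in this section.

```python
import argparse

# Hypothetical parser mirroring the flags documented above; the real
# definitions in main.py may differ.
parser = argparse.ArgumentParser(description="scGPCL (illustrative)")
parser.add_argument("--name", type=str, default="Zeisel")
parser.add_argument("--recon", type=str, default="zinb")
parser.add_argument("--n_clusers", type=int, default=4)
parser.add_argument("--HVG", type=float, default=0.2)
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--tau", type=float, default=0.25)
parser.add_argument("--r", type=float, default=0.8)
parser.add_argument("--tol", type=float, default=0.0001)
parser.add_argument("--lam1", type=float, default=1.0)
parser.add_argument("--lam2", type=float, default=0.05)
parser.add_argument("--lam3", type=float, default=1.0)

# parse the same flags as the example command above
args = parser.parse_args(
    "--recon zinb --name Zeisel --n_clusers 9 --lr 0.0001".split()
)
```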