DualSimplex algorithm's R package
About the project
This is the implementation of the Dual Simplex method presented in the following paper:
Non-negative matrix factorization and deconvolution as dual simplex problem
Denis Kleverov, Ekaterina Aladyeva, Alexey Serdyukov, Maxim Artyomov
bioRxiv 2024.04.09.588652; doi: https://doi.org/10.1101/2024.04.09.588652
In essence, this is an NMF algorithm that factorizes a nonnegative matrix V into two nonnegative matrices W and H.
Its key feature is that it operates in a lower-dimensional space of the Sinkhorn-transformed original matrix, in which the row and column data points of the original matrix are aligned via two interrelated geometric simplex structures.
Therefore, in this space we only need to search for K solution points, each (K-1)-dimensional (K is the number of components, i.e. the number of columns of W and rows of H).
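To make the idea of a Sinkhorn-transformed matrix more concrete, here is a minimal base R sketch. It assumes a plain alternating row/column normalization and a toy matrix built as V = W H; the function name `sinkhorn_sketch` and the matrix sizes are illustrative and are not taken from the package, whose own routine may differ.

```r
# Illustrative only: plain alternating normalization in base R,
# not the routine used inside the DualSimplex package.
set.seed(1)
K <- 3                                   # number of components
W <- matrix(runif(50 * K), nrow = 50)    # 50 features x K components
H <- matrix(runif(K * 20), nrow = K)     # K components x 20 samples
V <- W %*% H                             # nonnegative matrix of rank K

sinkhorn_sketch <- function(V, n_iter = 50) {
  for (i in seq_len(n_iter)) {
    V <- V / rowSums(V)                  # rescale every row to sum to 1
    V <- sweep(V, 2, colSums(V), "/")    # rescale every column to sum to 1
  }
  V
}

V_scaled <- sinkhorn_sketch(V)
singular_values <- svd(V_scaled)$d

# Columns sum to 1; rows settle at a common value (ncol / nrow).
range(colSums(V_scaled))
range(rowSums(V_scaled))
# Relative singular values: only the first K are non-negligible.
signif(singular_values[1:5] / singular_values[1], 3)
```

Because the scaling only multiplies rows and columns by positive constants, the scaled matrix keeps rank K, which is why a K-dimensional projection is enough to hold all of its points.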

This method can be applied to:
- The general NMF problem, where it outperforms commonly used methods
- Bulk RNAseq deconvolution
- Single cell clustering
Getting Started
Prerequisites
This is an R package, so you need to have R installed. We tested our code using RStudio and RStudio Server as IDE environments. The package relies on Bioconductor and devtools, so you need both to install it.
# in your R environment
install.packages("BiocManager")
install.packages("devtools")
Installation
Install from GitHub
devtools::install_github("artyomovlab/DualSimplex")
Alternatively, load the package from a local copy of this repository
devtools::load_all("path_to_code_directory")
Not available yet. After publication, installation will be:
install.packages("DualSimplex")
Usage
Check our additional paper repository for more examples of NMF, bulk-RNAseq deconvolution and single cell clustering
Read/Generate the data
library("DualSimplex")
library(dplyr)
N <- 100 # number of samples (e.g. mixtures)
M <- 10000 # number of features (e.g. genes)
K <- 3 # Number of pure components
sim <- create_simulation(n_genes = M,
                         n_samples = N,
                         n_cell_types = K,
                         with_marker_genes = FALSE)
sim <- sim %>% add_noise(noise_deviation = 3.5)
data_raw <- sim$data
true_W <- sim$basis
true_H <- sim$proportions
Create a Solver object
This performs Sinkhorn scaling, SVD projection, and data annotation
dso <- DualSimplexSolver$new()
dso$set_data(data_raw) # run Sinkhorn procedure
dso$project(K) # project to SVD space
dso$plot_projected("zero_distance", "zero_distance", with_solution = TRUE, use_dims = list(2:3)) # visualize the projection
dso$set_display_dims(list(2:3)) # remember the use_dims choice, to call just dso$plot_projected()
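The projection step can be pictured with a short base R sketch. This is a generic truncated SVD on a toy rank-K matrix, shown under the assumption that "project to SVD space" means taking coordinates along the first K singular vectors; the variable names are illustrative, and the solver's internal computation may differ.

```r
# Illustrative only: generic truncated SVD projection, not the solver's internals.
set.seed(42)
K <- 3
V <- matrix(runif(200 * K), nrow = 200) %*% matrix(runif(K * 30), nrow = K)

s <- svd(V, nu = K, nv = K)        # keep only the first K singular vectors
row_points <- V %*% s$v            # 200 rows (features) as points in K dimensions
col_points <- t(V) %*% s$u         # 30 columns (samples) as points in K dimensions

# For an exactly rank-K matrix, nothing is lost by the projection.
V_restored <- row_points %*% t(s$v)
max_abs_error <- max(abs(V - V_restored))
max_abs_error
```

Both the rows and the columns get K-dimensional coordinates, which is what allows the two sets of points to be inspected and plotted side by side.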
(Optional) Filter the data/remove outliers
Do this only if you are willing to remove points from your dataset.
plane_distance_threshold <- 0.05 # Try several values: start with a large threshold and lower it gradually
zero_distance_threshold <- 1
dso$distance_filter(plane_d_lt = plane_distance_threshold, zero_d_lt = zero_distance_threshold, genes = TRUE)
dso$project(K)
dso$plot_projection_diagnostics() # See the distribution of point distances
dso$plot_svd_history() # observe changes in SVD variance explained
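One way to think about the two thresholds is sketched below in base R. It assumes that "plane distance" is the distance of a point from the K-dimensional projection subspace and that "zero distance" is the norm of its projected coordinates; these definitions are an interpretation of the parameter names, not something confirmed by the package documentation, and the thresholds used here are arbitrary.

```r
# Illustrative only: an assumed reading of plane/zero distance on toy data.
set.seed(7)
K <- 3
V <- matrix(runif(500 * K), nrow = 500) %*% matrix(runif(K * 50), nrow = K)
V[1, ] <- V[1, ] + rnorm(50, sd = 0.1)       # plant one off-plane row

s <- svd(V, nu = K, nv = K)
projected <- V %*% s$v                       # coordinates inside the subspace
residual <- V - projected %*% t(s$v)         # what the subspace cannot explain

plane_distance <- sqrt(rowSums(residual^2))  # distance from the subspace
zero_distance <- sqrt(rowSums(projected^2))  # distance from the origin

# Keep rows that are close to the plane and not too far from zero.
keep <- plane_distance < 0.05 & zero_distance < 20
V_filtered <- V[keep, , drop = FALSE]
which.max(plane_distance)                    # the planted row stands out
```

After removing rows, the subspace itself changes, which is consistent with the workflow above calling `dso$project(K)` again after filtering.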
Identify simplex corners in the projected space
Initialize solution
dso$init_solution("random")
dso$plot_projected("zero_distance", "zero_distance")
Run optimization
dso$optim_solution(
  5000,
  optim_config(
    coef_hinge_H = 1,
    coef_hinge_W = 1,
    coef_der_X = 0.001,
    coef_der_Omega = 0.001
  )
)
dso$plot_projected("zero_distance", "zero_distance")
dso$plot_error_history()
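As a rough illustration of what a hinge term can look like, the sketch below computes a generic penalty on negative matrix entries. It assumes, based only on the parameter names, that `coef_hinge_H` and `coef_hinge_W` weight penalties of this kind; the helper `hinge_penalty` and the candidate matrices are hypothetical, and the package's actual loss may be defined differently.

```r
# Illustrative only: a generic hinge penalty, not the package's loss function.
hinge_penalty <- function(A) sum(pmax(-A, 0))  # total magnitude of negative entries

H_candidate <- matrix(c(0.6, 0.4, -0.1, 1.1), nrow = 2)  # one negative entry
W_candidate <- matrix(c(2, 0, 1, 3), nrow = 2)           # no negative entries

coef_hinge_H <- 1
coef_hinge_W <- 1
penalty <- coef_hinge_H * hinge_penalty(H_candidate) +
  coef_hinge_W * hinge_penalty(W_candidate)
penalty
```

A penalty like this is zero for any nonnegative matrix and grows with the size of negative entries, so raising a coefficient pushes the corresponding factor harder toward nonnegativity.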
Get solution
solution <- dso$finalize_solution()
result_W <- solution$W
result_H <- solution$H
ptp <- coerce_pred_true_props(solution$H, true_H)
plot_ptp_lines(ptp)
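An NMF solution is only defined up to the order of its components, so predicted proportions have to be matched to the true ones before comparing them. The base R sketch below shows one simple way to do that with correlations on toy data; it is not a description of how `coerce_pred_true_props` works internally, and all names in it are illustrative.

```r
# Illustrative only: correlation-based matching of shuffled components.
set.seed(3)
K <- 3
true_props <- matrix(runif(K * 100), nrow = K)
true_props <- sweep(true_props, 2, colSums(true_props), "/")  # columns sum to 1
pred_props <- true_props[c(3, 1, 2), ]                        # same rows, shuffled

similarity <- cor(t(pred_props), t(true_props))  # K x K: predicted vs true
best_match <- apply(similarity, 1, which.max)    # true index for each predicted row
aligned <- pred_props[order(best_match), ]       # reorder to the true ordering
best_match
```

With noisy predictions, a per-row `which.max` can assign two predicted components to the same true one, so a one-to-one assignment method would be a safer choice in practice.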
Save/Load the results
# Save
dso$save_state("directory_to_save")
# Load
dso <- DualSimplexSolver$from_state("directory_to_save")
Contacts
- Denis Kleverov (@denis_kleverov) (LinkedIn)
- Ekaterina Aladyeva (@AladyevaE)
- Alexey Serdyukov (email)
- prof. Maxim Artyomov (@maxim_artyomov) (email)
For developers
Code structure & Guidelines
The following files in the R/ directory represent different stages of the DualSimplex pipeline:
0. simulation.R
1. annotation.R
2. filtering.R
3. sinkhorn.R
4. projection.R
5. initialization.R
6. optimization.R
7. post_analysis.R
8. benchmarking.R
Ideally, main logic functions in a stage shouldn't use functions from another stage, and a downstream stage should only use the objects generated on the previous stage as its input.
Then, either the user or DualSimplexSolver uses the main functions from those files to implement the whole control flow.
This rule of thumb leads to linear code logic and low code coupling, which makes it simple to debug and introduce changes.
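The guideline above can be sketched as a chain of functions in which each stage takes only the previous stage's output. The stage functions below (`stage_scale`, `stage_project`, `stage_summarise`) and the `run_pipeline` driver are hypothetical and far simpler than the real stages; they only illustrate the linear structure.

```r
# Illustrative only: hypothetical stages showing a linear, loosely coupled flow.
stage_scale <- function(data) {
  list(scaled = data / sum(data))                 # stage 1 output
}
stage_project <- function(prev) {
  v <- svd(prev$scaled, nu = 0, nv = 2)$v         # uses only stage 1 output
  c(prev, list(coords = prev$scaled %*% v))
}
stage_summarise <- function(prev) {
  c(prev, list(n_points = nrow(prev$coords)))     # uses only stage 2 output
}

# The driver plays the role of the user or a solver object: it owns the control flow.
run_pipeline <- function(data, stages) {
  Reduce(function(state, stage) stage(state), stages, data)
}

result <- run_pipeline(
  matrix(1:12, nrow = 4),
  list(stage_scale, stage_project, stage_summarise)
)
result$n_points
```

Because no stage calls into another, each one can be tested or replaced on its own by feeding it a saved output of the stage before it.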
Checking your new functions
Please document your code with roxygen2 comments (as is done for the rest of the package).
- Regenerate NAMESPACE and additional files
devtools::document()
- Ensure the standard devtools check returns 0 errors
devtools::check()
- Ensure the package is installable from your repository
devtools::install_github("your_github_nickname/DualSimplex@your_branch_name")
