SkillAgentSearch skills...

GcimputeR

Missing value imputation using Gaussian copula

Install / Use

/learn @udellgroup/GcimputeR
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

gcimputeR

This package provides a R implemention to impute missing values by fitting a Gaussian copula model, on incomplete dataset thay may contain continuous, ordinal and binary variables. The user could either fit a full rank Gaussian copula model [1] or a low rank Gaussian copula model [2]. The fitted model also provides latent correlation among all variables regardless of types. Future release will support mini-batching training and online data imputation [3], parallelization, and categorical variables. A python implementation is also available.

Install and load the package

The easiest way is to install using install_github from the package devtools:

library(devtools)
install_github("udellgroup/gcimputeR")
library(gcimputeR)

Examples

library(gcimputeR)
set.seed(410)
# generate and mask 15-dim mixed data 
# 5 continuous, 5 ordinal (1-5) and 5 boolean
var_types = list('cont'=1:5, 'ord'=6:10, 'bin'=11:15) 
X = generate_GC(n = 2000, var_types = var_types)
X_mask = mask_MCAR(X, mask_fraction = 0.4)

# model fitting
fit = impute_GC(X_mask, verbose = TRUE)

# Evaluation: compute the scaled-MAE (SMAE) for each data type
# (scaled by MAE of median imputation) 
err_imp = cal_smae(xhat = fit$Ximp, xobs = X_mask, xtrue = X, reduce = FALSE) 
for (t in names(var_types)){
  err = round(mean(err_imp[var_types[[t]]]), 4)
  print(paste0('SMAE of ', var_types[t], ' : ', err))
}

More detailed examples are available under directory doc.

News

06-13-2022 (Version 0.1.1)

Our software was renamed from mixedgcImp to gcimputeR and went through substantial structural changes. We improved the code quality and the user interface. Moreover, this release achieves significant speed improvement when there are many ordinal variables. We strongly recommend an update if you are an early user!

Reference

[1] Zhao, Y. and Udell, M. Missing value imputation for mixed data via Gaussian copula, KDD 2020.

[2] Zhao, Y. and Udell, M. Matrix completion with quantified uncertainty through low rank Gaussian copula, NeurIPS 2020.

[3] Zhao, Y., Landgrebe, E., Shekhtman E., and Udell, M. Online missing value imputation and change point detection using the Gaussian Copula, AAAI 2022.

View on GitHub
GitHub Stars7
CategoryDevelopment
Updated1y ago
Forks0

Languages

R

Security Score

55/100

Audited on Feb 13, 2025

No findings