grur: an R package tailored for RADseq data imputations

grur <a href='https://thierrygosselin.github.io/grur/'><img src='man/figures/logo.png' align="right" height="139" /></a>

<!-- badges: start -->

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

<!-- badges: end -->

https://thierrygosselin.github.io/grur/

The name grur |ɡro͞oˈr| was chosen because the missing genotypes dilemma with RADseq data reminds me of the cheese paradox.

Here, I don’t want to start a war over the controversy of cheese with holes, so choose as you like: the French Gruyère or the Swiss Emmental. The paradox is that the more cheese you have, the more holes you get. But the more holes you have, the less cheese you have… So someone could conclude: more cheese = less cheese? I’ll leave that up to you; back to genomics…

Numerous genomic analyses are vulnerable to missing values; don’t get trapped by missing genotypes in your RADseq dataset.

Use grur to visualize patterns of missingness and perform map-independent imputations of missing genotypes (see features below).

Installation

if (!require("remotes")) install.packages("remotes")
remotes::install_github("thierrygosselin/grur")
library(grur)

Note: not all the packages used for imputations inside grur are installed automatically. Why?

  • Not all methods will be of interest.
  • Some packages used for imputations are more complicated to install and, depending on your OS, will definitely test your R skills and patience.
  • By default, you’ll be able to run grur::missing_visualization to check for patterns of missingness (the first step…).

Installation details for additional imputation options

Please follow additional instructions in the vignette to install the required packages for the imputation options you want to conduct:

| imputation options | package | installation difficulty | install instructions |
| :--------------------------------- | :---------------: | :---------------------: | :------------------------------ |
| imputation.method = "lightgbm" | lightgbm | difficult | vignette |
| imputation.method = "xgboost" | xgboost | moderate | vignette |
| imputation.method = "rf" | randomForestSRC | moderate | vignette |
| imputation.method = "rf_pred" | ranger | easy | install.packages("ranger") |
| if using pmm > 0 | missRanger | easy | install.packages("missRanger") |
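Before choosing an imputation method, it can help to see which of the optional packages from the table are already available on your machine. The following is a minimal base-R sketch, not part of grur itself; the variable names are illustrative, and only the package names come from the table.

```r
# Optional imputation backends listed in the table (package names only).
optional_pkgs <- c("lightgbm", "xgboost", "randomForestSRC", "ranger", "missRanger")

# requireNamespace() reports availability without attaching the package.
installed <- vapply(
  optional_pkgs,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

print(installed)

# List what still needs to be installed before using the matching option.
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Packages reported as not installed only matter if you plan to use the corresponding imputation option.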

web site and additional info: https://thierrygosselin.github.io/grur/

Life cycle

grur is still experimental. To make the package better, changes are inevitable: experimental functions will change and argument names will change. Your code and workflows might break from time to time until grur is stable. Consequently, depending on your tolerance for change, grur might not be for you.

Assumptions before imputing your dataset

  1. Filtered data: Please don’t try grur with raw data consisting of > 100K SNPs; you will generate all sorts of biases and you’ll be disappointed. Filter your data first! radiator was designed for this.

  2. Correlations: Machine learning algorithms will work better and faster if correlations are reduced to a minimum. If you used filter_rad to filter your dataset, you should be OK. If not, check your dataset for short- and long-distance linkage disequilibrium (LD).

  3. Pattern of individual heterozygosity: If you have individual heterozygosity patterns and/or a correlation of individual heterozygosity with missingness, you might want to skip imputation and go back to filtering your data. Please check out radiator::detect_mixed_genomes and radiator::detect_het_outliers.

  4. Patterns of missingness: Look for patterns of missingness (vignette) to better understand the reasons for their presence and tailor the arguments inside grur’s imputation module.

  5. Default arguments: Defaults are there for testing. Please, please, please don’t use grur’s defaults for publications!
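To make "patterns of missingness" concrete, here is a small base-R sketch on toy data. It does not use grur or its data formats; the genotype matrix, the individual and SNP names, and the population labels are all hypothetical. It only illustrates the kind of summary worth inspecting before imputing: the proportion of missing genotypes per individual, per marker, and per population.

```r
set.seed(42)

# Hypothetical toy data: 6 individuals x 8 SNPs, genotypes coded 0/1/2.
geno <- matrix(
  sample(0:2, 48, replace = TRUE),
  nrow = 6,
  dimnames = list(paste0("ind", 1:6), paste0("snp", 1:8))
)
pop <- c("A", "A", "A", "B", "B", "B")  # hypothetical strata

# Introduce non-random missingness: population B lacks the last 3 SNPs.
geno[pop == "B", 6:8] <- NA

miss_ind <- rowMeans(is.na(geno))        # proportion missing per individual
miss_snp <- colMeans(is.na(geno))        # proportion missing per marker
miss_pop <- tapply(miss_ind, pop, mean)  # mean individual missingness per population

print(round(miss_ind, 2))
print(round(miss_snp, 2))
print(round(miss_pop, 2))
```

In this toy case all the missing genotypes sit in population B, which is the sort of structured, non-random pattern that should be understood (and possibly filtered) before any imputation.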

Features

| Characteristics | Description |
| :---------------------- | :---------------------------------------- |
| Simulate RADseq data | simulate_rad: simulate populations of RADseq data following island or stepping stone models. Inside the function, allele frequencies can be created with fastsimcoal2 and then used inside the rmetasim simulation engine. Vignette coming soon. |
| Patterns of missingness | missing_visualization: visualize patterns of missing data associated with different variables of your study (lanes, chips, sequencers, populations, sample sites, reads/samples, homozygosity, etc). Similar to PLINK’s identity-by-missingness analysis (IBM), grur is more powerful because it generat
