cvms

R Package: Cross-validate one or multiple gaussian or binomial regression models at once. Perform repeated cross-validation. Returns results in a tibble for easy comparison, reporting and further analysis.


<!-- README.md is generated from README.Rmd. Please edit that file -->

cvms <a href='https://github.com/LudvigOlsen/cvms'><img src='man/figures/cvms_logo_242x280_250dpi.png' align="right" height="140" /></a>

Cross-Validation for Model Selection
Authors: Ludvig R. Olsen ( r-pkgs@ludvigolsen.dk ), Hugh Benjamin Zachariae <br/> License: MIT <br/> Started: October 2016


Overview

R package for model evaluation and comparison.

  • Cross-validate one or multiple regression or classification models with relevant evaluation metrics in a tidy format.
  • Validate the best model on a test set and compare it to a baseline evaluation.
  • Perform hyperparameter tuning with grid search.
  • Evaluate predictions from an external model.
  • Extract the observations that were the most challenging to predict.

Currently supports regression ('gaussian'), binary classification ('binomial'), and, in some functions only, multiclass classification ('multinomial'). Many of the functions allow parallelization, e.g. through the doParallel package.
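As a minimal sketch of the parallelization mentioned above, the following registers a doParallel backend and then asks cross_validate() to process its formulas in parallel. The core count of 2, the second formula, and the object names are arbitrary choices for illustration, not recommendations from the package authors.

```r
library(cvms)
library(groupdata2)
library(doParallel)

# Register a parallel backend before calling cvms
# (2 workers is an arbitrary choice for this sketch)
registerDoParallel(2)

set.seed(1)

# Create folds on the bundled dataset
folded <- fold(
  data = participant.scores, k = 4,
  cat_col = "diagnosis",
  id_col = "participant"
)

# Cross-validate two models; with parallel = TRUE the models
# are handed to the registered backend
cv_par <- cross_validate(
  data = folded,
  formulas = c("score ~ diagnosis", "score ~ age"),
  fold_cols = ".folds",
  family = "gaussian",
  parallel = TRUE
)

# Release the implicitly created workers again
stopImplicitCluster()
```

Whether this is faster than sequential execution depends on the number of models and the size of the data; for a dataset this small, the overhead of starting workers likely dominates.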

NEW: Our no-code application for plotting confusion matrices with plot_confusion_matrix() is now available on Hugging Face Spaces.

Main functions

| Function | Description |
|:---|:---|
| cross_validate() | Cross-validate linear models with lm()/lmer()/glm()/glmer() |
| cross_validate_fn() | Cross-validate a custom model function |
| validate() | Validate linear models with lm()/lmer()/glm()/glmer() |
| validate_fn() | Validate a custom model function |
| evaluate() | Evaluate predictions with a large set of metrics |
| baseline()<br />baseline_gaussian()<br />baseline_binomial()<br />baseline_multinomial() | Perform baseline evaluations of a dataset |
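To illustrate evaluating predictions from an external model, here is a small sketch using evaluate() on hand-made data. The data frame `external` and its columns `target` and `prob` are hypothetical; the probabilities stand in for the output of whatever binary classifier produced them.

```r
library(cvms)

# Hypothetical targets and predicted probabilities of class "1"
# from some external binary classifier
external <- data.frame(
  target = c(0, 0, 0, 1, 1, 1, 1, 0),
  prob   = c(0.1, 0.4, 0.7, 0.8, 0.6, 0.3, 0.9, 0.2)
)

# Evaluate the predictions as a binary classification task
# (default cutoff of 0.5; the second class is treated as positive)
eval_bin <- evaluate(
  data = external,
  target_col = "target",
  prediction_cols = "prob",
  type = "binomial"
)

# Inspect a single metric and the nested confusion matrix
eval_bin$`Balanced Accuracy`
eval_bin$`Confusion Matrix`[[1]]
```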

Evaluation utilities

| Function | Description |
|:---|:---|
| confusion_matrix() | Create a confusion matrix from predictions and targets |
| evaluate_residuals() | Evaluate residuals from a regression task |
| most_challenging() | Find the observations that were the most challenging to predict |
| summarize_metrics() | Summarize numeric columns with a set of descriptors |

Formula utilities

| Function | Description |
|:---|:---|
| combine_predictors() | Generate model formulas from a list of predictors |
| reconstruct_formulas() | Extract formulas from output tibble |
| simplify_formula() | Remove inline functions (and more) from a formula object |
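As a rough sketch of how combine_predictors() can feed cross-validation, the call below generates candidate formulas for the bundled participant.scores columns. The choice of predictors and of the random effect is only an example.

```r
library(cvms)

# Generate model formulas from two fixed effects and one random effect
formulas <- combine_predictors(
  dependent = "score",
  fixed_effects = c("diagnosis", "age"),
  random_effects = "(1|participant)"
)

# A character vector with one formula string per candidate model
formulas
```

The resulting vector can be passed as the formulas argument of cross_validate() to compare all candidates in one call.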

Plotting utilities

| Function | Description |
|:---|:---|
| plot_confusion_matrix() | Plot a confusion matrix (see also our no-code application) |
| plot_metric_density() | Create a density plot for a metric column |
| font() | Set font settings for plotting functions (currently only plot_confusion_matrix()) |
| sum_tile_settings() | Set settings for sum tiles in plot_confusion_matrix() |

Custom functions

| Function | Description |
|:---|:---|
| model_functions() | Example model functions for cross_validate_fn() |
| predict_functions() | Example predict functions for cross_validate_fn() |
| preprocess_functions() | Example preprocess functions for cross_validate_fn() |
| update_hyperparameters() | Manage hyperparameters in custom model functions |

Other utilities

| Function | Description |
|:---|:---|
| select_metrics() | Select the metric columns from the output |
| select_definitions() | Select the model-defining columns from the output |
| gaussian_metrics()<br />binomial_metrics()<br />multinomial_metrics() | Create list of metrics for the common metrics argument |
| multiclass_probability_tibble() | Generate a multiclass probability tibble |

Datasets

| Name | Description |
|:---|:---|
| participant.scores | Made-up experiment data with 10 participants and two diagnoses |
| wines | A list of wine varieties in an approximately Zipfian distribution |
| musicians | Made-up data on 60 musicians in 4 groups for multiclass classification |
| predicted.musicians | Predictions by 3 classifiers of the 4 classes in the musicians dataset |
| precomputed.formulas | Fixed effect combinations for model formulas with/without two- and three-way interactions |
| compatible.formula.terms | 162,660 pairs of compatible terms for building model formulas with up to 15 fixed effects |

Important News

Check NEWS.md for the full list of changes.

  • Version 1.2.0 contained multiple breaking changes. Please see NEWS.md. (18th of October 2020)

Installation

CRAN:

install.packages("cvms")

Development version:

install.packages("devtools")

devtools::install_github("LudvigOlsen/groupdata2")

devtools::install_github("LudvigOlsen/cvms")

Vignettes

cvms contains a number of vignettes with relevant use cases and descriptions:

vignette(package = "cvms") # for an overview

Examples

Attach packages

library(cvms)
library(groupdata2)   # fold() partition()
library(knitr)        # kable()
library(dplyr)        # %>% arrange()

Load data

The dataset participant.scores comes with cvms:

data <- participant.scores

Fold data

Create a grouping factor for subsetting of folds using groupdata2::fold(). Order the dataset by the folds:

# Set seed for reproducibility
set.seed(7)

# Fold data 
data <- fold(
  data = data, k = 4,
  cat_col = 'diagnosis',
  id_col = 'participant') %>% 
  arrange(.folds)

# Show first 15 rows of data
data %>% head(15) %>% kable()

| participant | age | diagnosis | score | session | .folds |
|:------------|----:|----------:|------:|--------:|:-------|
| 9 | 34 | 0 | 33 | 1 | 1 |
| 9 | 34 | 0 | 53 | 2 | 1 |
| 9 | 34 | 0 | 66 | 3 | 1 |
| 8 | 21 | 1 | 16 | 1 | 1 |
| 8 | 21 | 1 | 32 | 2 | 1 |
| 8 | 21 | 1 | 44 | 3 | 1 |
| 2 | 23 | 0 | 24 | 1 | 2 |
| 2 | 23 | 0 | 40 | 2 | 2 |
| 2 | 23 | 0 | 67 | 3 | 2 |
| 1 | 20 | 1 | 10 | 1 | 2 |
| 1 | 20 | 1 | 24 | 2 | 2 |
| 1 | 20 | 1 | 45 | 3 | 2 |
| 6 | 31 | 1 | 14 | 1 | 2 |
| 6 | 31 | 1 | 25 | 2 | 2 |
| 6 | 31 | 1 | 30 | 3 | 2 |
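The folding step can be sanity-checked with a few lines of dplyr and base R. This sketch repeats the fold() call from above and then checks that each participant ends up in exactly one fold (the effect of id_col) and shows how the diagnoses are spread across folds (the effect of cat_col). The object names `folded` and `folds_per_participant` are just illustrative.

```r
library(cvms)
library(groupdata2)
library(dplyr)

set.seed(7)

# Same folding as above, on a fresh copy of the data
folded <- fold(
  data = participant.scores, k = 4,
  cat_col = "diagnosis",
  id_col = "participant"
)

# Count the number of distinct folds each participant appears in;
# with id_col set, this should be 1 for every participant
folds_per_participant <- folded %>%
  group_by(participant) %>%
  summarise(n_folds = n_distinct(.folds), .groups = "drop")

folds_per_participant

# Cross-tabulate folds and diagnoses to inspect the balance
table(fold = folded$.folds, diagnosis = folded$diagnosis)
```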

Cross-validate a single model

Gaussian

CV1 <- cross_validate(
  data = data,
  formulas = "score ~ diagnosis",
  fold_cols = '.folds',
  family = 'gaussian',
  REML = FALSE
)

# Show results
CV1
#> # A tibble: 1 × 21
#>   Fixed  RMSE   MAE `NRMSE(IQR)`  RRSE   RAE RMSLE   AIC  AICc   BIC Predictions
#>   <chr> <dbl> <dbl>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>     
#> 1 diag…  16.4  13.8        
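Because the printed tibble is wide, it can help to look at parts of it separately. The following sketch reruns the single-model cross-validation and then pulls out the metric columns with select_metrics() and the nested predictions from the Predictions list column. It is one way to inspect the output, not the only one.

```r
library(cvms)
library(groupdata2)
library(dplyr)

set.seed(7)

# Recreate the folded data and the cross-validation from above
data <- fold(
  data = participant.scores, k = 4,
  cat_col = "diagnosis",
  id_col = "participant"
) %>%
  arrange(.folds)

CV1 <- cross_validate(
  data = data,
  formulas = "score ~ diagnosis",
  fold_cols = ".folds",
  family = "gaussian",
  REML = FALSE
)

# Keep only the metric columns (plus the model definition)
metrics_only <- CV1 %>% select_metrics()
metrics_only

# The predictions are nested in a list column, one tibble per model
predictions <- CV1$Predictions[[1]]
head(predictions)
```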
