groupdata2

Author: Ludvig R. Olsen (r-pkgs@ludvigolsen.dk)

License: MIT

Started: October 2016

Overview

R package for dividing data into groups.

Create balanced partitions and cross-validation folds.
Perform time series windowing and general grouping and splitting of data.
Balance existing groups with up- and downsampling.
Collapse existing groups to fewer, balanced groups.
Find values, or indices of values, that differ from the previous value by some threshold(s).
Check if two grouping factors have the same groups, memberwise.

Main functions

| Function | Description |
|:--------------------|:------------------------------------------------------------|
| group_factor() | Divides data into groups by a wide range of methods. |
| group() | Creates grouping factor and adds to the given data frame. |
| splt() | Creates grouping factor and splits the data by these groups. |
| partition() | Splits data into partitions. Balances a given categorical variable and/or numerical variable between partitions and keeps all data points with a shared ID in the same partition. |
| fold() | Creates folds for (repeated) cross-validation. Balances a given categorical variable and/or numerical variable between folds and keeps all data points with a shared ID in the same fold. |
| collapse_groups() | Collapses existing groups into a smaller set of groups with categorical, numerical, ID, and size balancing. |
| balance() | Uses up- and/or downsampling to equalize group sizes. Can balance on ID level. See wrappers: downsample(), upsample(). |

Other tools

| Function | Description |
|:--------------------------|:------------------------------------------------------------|
| all_groups_identical() | Checks whether two grouping factors contain the same groups, memberwise. |
| differs_from_previous() | Finds values, or indices of values, that differ from the previous value by some threshold(s). |
| find_starts() | Finds values or indices of values that are not the same as the previous value. |
| find_missing_starts() | Finds missing starts for the l_starts method. |
| summarize_group_cols() | Calculates summary statistics about group columns (i.e. factors). |
| summarize_balances() | Summarizes the balances of numeric, categorical, and ID columns in and between groups in one or more group columns. |
| ranked_balances() | Extracts the standard deviations from the Summary data frame from the output of summarize_balances(). |
| %primes% | Finds remainder for the primes method. |
| %staircase% | Finds remainder for the staircase method. |

Installation

CRAN version:

install.packages("groupdata2")

Development version:

install.packages("devtools")
devtools::install_github("LudvigOlsen/groupdata2")

Vignettes

groupdata2 contains a number of vignettes with relevant use cases and descriptions:

vignette(package = "groupdata2") # for an overview
vignette("introduction_to_groupdata2") # begin here

Data for examples

# Attach packages
library(groupdata2)
library(dplyr)       # %>% filter() arrange() summarize()
library(knitr)       # kable()

# Create small data frame
df_small <- data.frame(
  "x" = c(1:12),
  "species" = rep(c('cat', 'pig', 'human'), 4),
  # Ages are drawn at random (no seed is set here), so they vary between runs
  "age" = sample(c(1:100), 12),
  stringsAsFactors = FALSE
)

# Create medium data frame
df_medium <- data.frame(
  "participant" = factor(rep(c('1', '2', '3', '4', '5', '6'), 3)),
  "age" = rep(c(20, 33, 27, 21, 32, 25), 3),
  "diagnosis" = factor(rep(c('a', 'b', 'a', 'b', 'b', 'a'), 3)),
  "diagnosis2" = factor(sample(c('x','z','y'), 18, replace = TRUE)),
  "score" = c(10, 24, 15, 35, 24, 14, 24, 40, 30, 
              50, 54, 25, 45, 67, 40, 78, 62, 30))
df_medium <- df_medium %>% arrange(participant)
df_medium$session <- rep(c('1','2', '3'), 6)

Functions

group_factor()

Returns a factor with group numbers, e.g. factor(c(1,1,1,2,2,2,3,3,3)).

This can be used to subset, aggregate, group_by, etc.

Create equally sized groups by setting force_equal = TRUE.

Randomize the grouping factor by setting randomize = TRUE.

# Create grouping factor
group_factor(
  data = df_small, 
  n = 5, 
  method = "n_dist"
)
#>  [1] 1 1 2 2 3 3 3 4 4 5 5 5
#> Levels: 1 2 3 4 5
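As a hedged sketch of the two arguments mentioned above, the calls below pass force_equal = TRUE and randomize = TRUE to group_factor(). The vector `values` and the seed are made up for illustration, and the comments paraphrase the package documentation rather than showing verified output.

```r
library(groupdata2)

# Hypothetical input: 12 values, the same length as df_small
values <- 1:12

# Baseline: 5 groups, excess elements distributed across the groups
base_groups <- group_factor(data = values, n = 5, method = "n_dist")

# force_equal = TRUE: all groups get the same size; according to the
# package documentation, excess data points are discarded to achieve this
equal_groups <- group_factor(
  data = values, n = 5, method = "n_dist", force_equal = TRUE
)
table(equal_groups)

# randomize = TRUE: the same group sizes, but shuffled membership
set.seed(1)  # arbitrary seed, only so the shuffle can be repeated
random_groups <- group_factor(
  data = values, n = 5, method = "n_dist", randomize = TRUE
)
table(random_groups)
```

Comparing the three table() outputs is a quick way to check how each argument changes the group sizes before using the factor in further analysis.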

group()

Creates a grouping factor and adds it to the given data frame. The data frame is grouped by the grouping factor for easy use in magrittr (%>%) pipelines.

# Use group()
group(data = df_small, n = 5, method = 'n_dist') %>%
  kable()

| x | species | age | .groups |
|----:|:--------|----:|:--------|
| 1 | cat | 68 | 1 |
| 2 | pig | 39 | 1 |
| 3 | human | 1 | 2 |
| 4 | cat | 34 | 2 |
| 5 | pig | 87 | 3 |
| 6 | human | 43 | 3 |
| 7 | cat | 14 | 3 |
| 8 | pig | 82 | 4 |
| 9 | human | 59 | 4 |
| 10 | cat | 51 | 5 |
| 11 | pig | 85 | 5 |
| 12 | human | 21 | 5 |

# Use group() in a pipeline 
# Get average age per group
df_small %>%
  group(n = 5, method = 'n_dist') %>% 
  dplyr::summarise(mean_age = mean(age)) %>%
  kable()

| .groups | mean_age |
|:--------|---------:|
| 1 | 53.5 |
| 2 | 17.5 |
| 3 | 48.0 |
| 4 | 70.5 |
| 5 | 52.3 |

# Using group() with 'l_starts' method
# Starts group at the first 'cat', 
# then skips to the second appearance of "pig" after "cat",
# then starts at the following "cat".
df_small %>%
  group(n = list("cat", c("pig", 2), "cat"),
        method = 'l_starts',
        starts_col = "species") %>%
  kable()

| x | species | age | .groups |
|----:|:--------|----:|:--------|
| 1 | cat | 68 | 1 |
| 2 | pig | 39 | 1 |
| 3 | human | 1 | 1 |
| 4 | cat | 34 | 1 |
| 5 | pig | 87 | 2 |
| 6 | human | 43 | 2 |
| 7 | cat | 14 | 3 |
| 8 | pig | 82 | 3 |
| 9 | human | 59 | 3 |
| 10 | cat | 51 | 3 |
| 11 | pig | 85 | 3 |
| 12 | human | 21 | 3 |
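The start-position rule described in the comments above can be illustrated in base R. This is only a sketch of the rule as stated (first "cat", then the second "pig" after it, then the next "cat"); `l_starts_sketch` is a hypothetical helper and not the package's implementation, and it assumes the first start falls on the first row.

```r
# Illustrative base-R sketch of the 'l_starts' rule described above.
# `l_starts_sketch` is a hypothetical helper, not part of groupdata2.
l_starts_sketch <- function(values, starts) {
  start_idx <- integer(0)
  prev <- 0L  # row index of the previous start (0 = before the first row)
  for (s in starts) {
    value <- s[[1]]
    # A second element means "use the k-th appearance after the previous start"
    k <- if (length(s) > 1) as.integer(s[[2]]) else 1L
    candidates <- which(values == value & seq_along(values) > prev)
    prev <- candidates[k]
    start_idx <- c(start_idx, prev)
  }
  # Each row belongs to the most recent start at or before it
  # (assumes the first start is the first row)
  factor(findInterval(seq_along(values), start_idx))
}

species <- rep(c("cat", "pig", "human"), 4)
groups <- l_starts_sketch(species, list("cat", c("pig", 2), "cat"))
groups
# groups: 1 1 1 1 2 2 3 3 3 3 3 3
```

The starts land on rows 1, 5, and 7, which reproduces the .groups column in the table above.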

splt()

Creates the specified groups and splits the data by these groups.
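A minimal sketch of a splt() call, assuming it returns the splits as a list of data frames, as the description in the Main functions table suggests. The data frame `df_demo` is made up for illustration and mirrors the shape of df_small.

```r
library(groupdata2)

# Hypothetical data frame with 12 rows, mirroring df_small
df_demo <- data.frame(
  x = 1:12,
  species = rep(c("cat", "pig", "human"), 4),
  stringsAsFactors = FALSE
)

# Split into 3 groups; the result is assumed to be a list of data frames
splits <- splt(data = df_demo, n = 3, method = "n_dist")

length(splits)        # number of groups
sapply(splits, nrow)  # rows per group
```

Working with a list of data frames is convenient when each group should be processed separately, for example with lapply().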
