Tidytable
Tidy interface to 'data.table'
tidytable <img id="logo" src="man/figures/logo.png" align="right" width="17%" height="17%" />
<!-- badges: start --> <!-- badges: end -->
tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.
Installation
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub with:
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
General syntax
tidytable replicates tidyverse syntax but uses data.table in the
background. In general you can simply use library(tidytable) to
replace your existing dplyr and tidyr code with data.table backed
equivalents.
A full list of implemented functions can be found here.
library(tidytable)
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#> x y z double_x x_plus_y
#> <int> <int> <chr> <dbl> <int>
#> 1 1 4 a 2 5
#> 2 2 5 a 4 7
#> 3 3 6 b 6 9
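The same drop-in idea applies to tidyr-style verbs. The sketch below reshapes a small table with pivot_longer(); the `wide` data and its column names are invented for illustration and are not part of the original README.

```r
library(tidytable)

# Hypothetical wide data, invented for this example
wide <- data.table(id = 1:2, q1 = c(10, 20), q2 = c(30, 40))

# Same call shape as tidyr::pivot_longer(), backed by data.table
long <- wide %>%
  pivot_longer(cols = c(q1, q2), names_to = "quarter", values_to = "sales")

long
```

The result has one row per id/quarter pair. Row order may differ from what tidyr returns, so sort explicitly with arrange() if order matters.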
Applying functions by group
You can use the normal tidyverse group_by()/ungroup() workflow, or
you can use .by syntax to reduce typing. Using .by in a function is
shorthand for df %>% group_by() %>% some_function() %>% ungroup().
- A single column can be passed with .by = z
- Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
All functions that can operate by group (mutate(), filter(), summarize(), etc.) have a .by argument built in.
The above syntax is equivalent to:
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Both options are available for users, so you can use the syntax that you prefer.
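Since .by is also built into mutate() and filter(), as mentioned above, a minimal sketch of group-wise use in those two verbs might look like this. The column names `group_mean`, `with_mean`, and `top_rows` are illustrative only.

```r
library(tidytable)

df <- data.table(x = c("a", "a", "b"), z = 1:3)

# Group-wise mutate: each row receives the mean of z within its x group
with_mean <- df %>%
  mutate(group_mean = mean(z), .by = x)

# Group-wise filter: keep the row(s) holding the largest z in each x group
top_rows <- df %>%
  filter(z == max(z), .by = x)
```

Unlike summarize(), mutate() keeps every input row, so `with_mean` still has three rows while `top_rows` has one row per group.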
tidyselect support
tidytable allows you to select/drop columns just like you would in the
tidyverse by utilizing the tidyselect
package in the background.
Normal selection can be mixed with all tidyselect helpers:
everything(), starts_with(), ends_with(), any_of(), where(),
etc.
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)
df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#> a b1 b2
#> <int> <int> <int>
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
A full overview of selection options can be found here.
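A few of the other helpers listed above, sketched on the same data. The name "not_a_column" is deliberately made up to show how any_of() behaves with a missing column.

```r
library(tidytable)

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

# where(): select by a predicate on column contents
num_cols <- df %>% select(where(is.numeric))

# Negation drops the matched columns instead of keeping them
no_b <- df %>% select(-starts_with("b"))

# any_of(): ignores names that are not present rather than erroring
safe <- df %>% select(any_of(c("a", "not_a_column")))
```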
Using tidyselect in .by
tidyselect helpers also work when using .by:
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Tidy evaluation compatibility
Tidy evaluation can be used to write custom functions with tidytable
functions. The embracing shortcut {{ }} works, or you can use
enquo() with !! if you prefer:
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))
add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}
df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 2
#> 2 1 5 a 2
#> 3 1 6 b 2
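For comparison, the enquo() with !! form mentioned above could be sketched as follows. The function name `add_one_quo` is hypothetical, and enquo() is called through rlang explicitly here so the example does not assume anything about which helpers tidytable re-exports.

```r
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

# Hypothetical helper: capture the column with enquo(), then unquote it with !!
add_one_quo <- function(data, add_col) {
  add_col <- rlang::enquo(add_col)
  data %>%
    mutate(new_col = (!!add_col) + 1)
}

res <- df %>%
  add_one_quo(y)
```

The parentheses around `!!add_col` are there for readability; they make it explicit that the unquoted column is what gets incremented.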
The .data and .env pronouns also work within tidytable functions:
var <- 10
df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 11
#> 2 1 5 a 11
#> 3 1 6 b 11
A full overview of tidy evaluation can be found here.
dt() helper
The dt() function makes regular data.table syntax pipeable, so you
can easily mix tidytable syntax with data.table syntax:
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#> z avg_x
#> <chr> <dbl>
#> 1 a 1.5
#> 2 b 3
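The example above uses dt() at every step. A minimal sketch of actually mixing the two styles in one pipeline, with tidytable verbs on either side of a dt() call, might look like this (the `total` column name is illustrative):

```r
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

res <- df %>%
  filter(x < 3) %>%                          # tidytable verb
  dt(, double_x := x * 2) %>%                # data.table syntax via dt()
  summarize(total = sum(double_x), .by = z)  # tidytable verb again

res
```

Only the rows with x equal to 1 and 2 survive the filter, both in group "a", so the result is a single row.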
Speed Comparisons
For those interested in performance, speed comparisons can be found here.
Acknowledgements
tidytable is only possible because of the great contributions to R by
the data.table and tidyverse teams. data.table is used as the main
data frame engine in the background, while tidyverse packages like
rlang, vctrs, and tidyselect are heavily relied upon to give users
an experience similar to dplyr and tidyr.
