datawizard: Easy Data Wrangling and Statistical Transformations <img src='man/figures/logo.png' align="right" height="139" />
<!-- ***:sparkles: Hockety pockety wockety wack, prepare this data forth and back*** -->
<!-- ***Hockety pockety wockety wock, messy data is in shock*** -->
<!-- ***Hockety pockety wockety woss, you can cite i-it from JOSS*** <sup>(soon)</sup> -->
<!-- ***Hockety pockety wockety wass, datawizard saves your ass! :sparkles:*** -->
{datawizard} is a lightweight package to easily manipulate, clean,
transform, and prepare your data for analysis. It is part of the
easystats ecosystem, a suite
of R packages to deal with your entire statistical analysis, from
cleaning the data to reporting the results.
It covers two aspects of data preparation:
- Data manipulation: {datawizard} offers a very similar set of functions to that of the tidyverse packages, such as {dplyr} and {tidyr}, to select, filter and reshape data, with a few key differences. 1) All data manipulation functions start with the prefix data_* (which makes them easy to identify). 2) Although most functions can be used exactly as their tidyverse equivalents, they are also string-friendly (which makes them easy to program with and use inside functions). Finally, {datawizard} is super lightweight (no dependencies, similar to poorman), which makes it awesome for developers to use in their packages.
- Statistical transformations: {datawizard} also has powerful functions to easily apply common data transformations, including standardization, normalization, rescaling, rank-transformation, scale reversing, recoding, binning, etc.
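As a quick illustration of the statistical transformations listed above, here is a minimal sketch on a small made-up vector. The vector `x` and the result names are purely illustrative; the function calls use their default arguments, apart from the target range passed to `rescale()`.

```r
library(datawizard)

# Small illustrative vector (made up for this sketch)
x <- c(2, 4, 6, 8, 10)

# Standardization: center on the mean and scale by the SD (z-scores)
z <- standardize(x)

# Normalization: map the values onto the 0-1 range
n <- normalize(x)

# Rescaling: map the values onto an arbitrary range, here 0 to 100
r <- rescale(x, to = c(0, 100))

# Rank transformation: replace each value by its rank
rk <- ranktransform(x)

# Collect the results side by side as plain numeric columns
data.frame(
  x,
  z = as.numeric(z),
  n = as.numeric(n),
  r = as.numeric(r),
  rk = as.numeric(rk)
)
```

The same functions also accept data frames, in which case they are applied column by column to the numeric variables.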
Installation
| Type | Source | Command |
|----|----|----|
| Release | CRAN | install.packages("datawizard") |
| Development | r-universe | install.packages("datawizard", repos = "https://easystats.r-universe.dev") |
| Development | GitHub | remotes::install_github("easystats/datawizard") |
Tip
Instead of library(datawizard), use library(easystats). This will make all features of the easystats-ecosystem available.
To stay updated, use easystats::install_latest().
Citation
To cite the package, run the following command:
citation("datawizard")
To cite package 'datawizard' in publications use:
Patil et al., (2022). datawizard: An R Package for Easy Data
Preparation and Statistical Transformations. Journal of Open Source
Software, 7(78), 4684, https://doi.org/10.21105/joss.04684
A BibTeX entry for LaTeX users is
@Article{,
title = {{datawizard}: An {R} Package for Easy Data Preparation and Statistical Transformations},
author = {Indrajeet Patil and Dominique Makowski and Mattan S. Ben-Shachar and Brenton M. Wiernik and Etienne Bacher and Daniel Lüdecke},
journal = {Journal of Open Source Software},
year = {2022},
volume = {7},
number = {78},
pages = {4684},
doi = {10.21105/joss.04684},
}
Features
Most courses and tutorials about statistical modeling assume that you
are working with a clean and tidy dataset. In practice, however, a major
part of doing statistical modeling is preparing your data: cleaning up
values, creating new columns, reshaping the dataset, or transforming
some variables. {datawizard} provides easy-to-use tools to perform
these common, critical, and sometimes tedious data preparation tasks.
Data wrangling
Select, filter and remove variables
The package provides helpers to filter rows meeting certain conditions…
data_match(mtcars, data.frame(vs = 0, am = 1))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
… or logical expressions:
data_filter(mtcars, vs == 0 & am == 1)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
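As a rough cross-check, both calls above should pick the same rows as plain base R subsetting. The sketch below only assumes the mtcars dataset that ships with R; the object names (`filtered`, `matched`, `base_rows`) are illustrative.

```r
library(datawizard)

# Rows selected by a logical expression
filtered <- data_filter(mtcars, vs == 0 & am == 1)

# Rows selected by matching against a template data frame
matched <- data_match(mtcars, data.frame(vs = 0, am = 1))

# Base R equivalent, used here only for comparison
base_rows <- rownames(mtcars)[mtcars$vs == 0 & mtcars$am == 1]

# All three approaches should agree on which cars are kept
identical(rownames(filtered), base_rows)
identical(rownames(matched), base_rows)
```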
Finding columns in a data frame, or retrieving the data of selected
columns, can be achieved using extract_column_names() or
data_select():
# find column names matching a pattern
extract_column_names(iris, starts_with("Sepal"))
#> [1] "Sepal.Length" "Sepal.Width"
# return data columns matching a pattern
data_select(iris, starts_with("Sepal")) |> head()
#> Sepal.Length Sepal.Width
#> 1 5.1 3.5
#> 2 4.9 3.0
#> 3 4.7 3.2
#> 4 4.6 3.1
#> 5 5.0 3.6
#> 6 5.4 3.9
It is also possible to extract one or more variables:
# single variable
data_extract(mtcars, "gear")
#> [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
# more variables
head(data_extract(iris, ends_with("Width")))
#> Sepal.Width Petal.Width
#> 1 3.5 0.2
#> 2 3.0 0.2
#> 3 3.2 0.2
#> 4 3.1 0.2
#> 5 3.6 0.2
#> 6 3.9 0.4
Due to the consistent API, removing variables is just as simple:
head(data_remove(iris, starts_with("Sepal")))
#> Petal.Length Petal.Width Species
#> 1 1.4 0.2 setosa
#> 2 1.4 0.2 setosa
#> 3 1.3 0.2 setosa
#> 4 1.5 0.2 setosa
#> 5 1.4 0.2 setosa
#> 6 1.7 0.4 setosa
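Because the selection functions also accept column names as character vectors, they are easy to wrap in your own functions. The following is a minimal sketch; `keep_columns()` is a hypothetical helper written for this example, not part of {datawizard}.

```r
library(datawizard)

# Hypothetical helper (not part of datawizard): keep only the columns
# whose names are given in the character vector `cols`
keep_columns <- function(data, cols) {
  data_select(data, select = cols)
}

sepal_cols <- c("Sepal.Length", "Sepal.Width")

# Column names that are kept
kept <- names(keep_columns(iris, sepal_cols))
kept

# The same character vector can be reused to drop those columns instead
dropped <- names(data_remove(iris, sepal_cols))
dropped
```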
Reorder or rename
head(data_relocate(iris, select = "Species", before = "Sepal.Length"))
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1 setosa 5.1 3.5 1.4 0.2
#> 2 setosa 4.9 3.0 1.4 0.2
#> 3 setosa 4.7 3.2 1.3 0.2
#> 4 setosa 4.6 3.1 1.5 0.2
#> 5 setosa 5.0 3.6 1.4 0.2
#> 6 setosa 5.4 3.9 1.7 0.4
head(data_rename(iris, c("Sepal.Length", "Sepal.Width"), c("length", "width")))
#> length width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
Merge
x <- data.frame(a = 1:3, b = c("a", "b", "c"), c = 5:7, id = 1:3)
y <- data.frame(c = 6:8, d = c("f", "g", "h"), e = 100:102, id = 2:4)
x
#> a b c id
#> 1 1 a 5 1
#> 2 2 b 6 2
#> 3 3 c 7 3
y
#> c d e id
#> 1 6 f 100 2
#> 2 7 g 101 3
#> 3 8 h 102 4
data_merge(x, y, join = "full")
#> a b c id d e
#> 3 1 a 5 1 <NA> NA
#> 1 2 b 6 2 f 100
#> 2 3 c 7 3 g 101
#> 4 NA <NA> 8 4 h 102
data_merge(x, y, join = "left")
#> a b c id d e
#> 3 1 a 5 1 <NA> NA
#> 1 2 b 6 2 f 100
#> 2 3 c 7 3 g 101
data_merge(x, y, join = "right")
#> a b c id d e
#> 1 2 b 6 2 f 100
#> 2 3 c 7 3 g 101
#> 3 NA <NA> 8 4 h 102
data_merge(x, y, join = "semi", by = "c")
#> a b c id
#> 2 2 b 6 2
#> 3 3 c 7 3
data_merge(x, y, join = "anti", by = "c")
#> a b c id
#> 1 1 a 5 1
data_merge(x, y, join = "inner")
#> a b c id d e
#> 1 2 b 6 2 f 100
#> 2 3 c 7 3 g 101
data_merge(x, y, join = "bind")
#> a b c id d e
#> 1 1 a 5 1 <NA> NA
#> 2 2 b 6 2 <NA> NA
#> 3 3 c 7 3 <NA> NA
#> 4 NA <NA> 6 2 f 100
#> 5 NA <NA> 7 3 g 101
#> 6 NA <NA> 8 4 h 102
Reshape
A common data wrangling task is to reshape data, for example to go
from wide/Cartesian to long/tidy format:
wide_data <- data.frame(replicate(5, rnorm(10)))
head(data_to_long(wide_data))
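To make the effect of the reshaping easier to see, the sketch below checks the shape of the result on a reproducible version of the same data. The seed and the object names (`wide_example`, `long_example`) are assumptions added for this example, and the call relies on the default arguments of `data_to_long()`, which are assumed to pivot all columns into a `name`/`value` pair.

```r
library(datawizard)

# Fixed seed so the example is reproducible (an assumption of this sketch)
set.seed(123)
wide_example <- data.frame(replicate(5, rnorm(10)))

# 10 rows x 5 columns in wide format
dim(wide_example)

# With default arguments, every column is stacked into name/value pairs
long_example <- data_to_long(wide_example)

# One row per original cell: 10 * 5 = 50 rows
nrow(long_example)
names(long_example)
```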
