Skimr
A frictionless, pipeable approach to dealing with summary statistics
skimr provides a frictionless approach to summary statistics which
conforms to the principle of least
surprise,
displaying summary statistics the user can skim quickly to understand
their data. It handles different data types and returns a skim_df
object which can be included in a pipeline or displayed nicely for the
human reader.
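As a minimal sketch of what "included in a pipeline" can look like, the example below treats the result of skim() as a regular data frame with one row per variable. It assumes dplyr is installed and relies on the skimr version 2 convention of naming statistic columns as type.statistic (for example numeric.mean); the object names skimmed and numeric_stats are illustrative only.

```r
library(skimr)

# skim() returns a skim_df with one row per skimmed variable.
skimmed <- skim(iris)

# Keep only the numeric variables and two of their statistics.
# Assumption: skimr v2 column naming (numeric.mean, numeric.sd).
numeric_stats <- skimmed |>
  dplyr::filter(skim_type == "numeric") |>
  dplyr::select(skim_variable, numeric.mean, numeric.sd)

print(numeric_stats)
```

Because the filtering and selecting are ordinary dplyr verbs, the same pattern should extend to other downstream steps such as joining or plotting.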
Note: skimr version 2 has major changes when skimr is used
programmatically. Upgraders should review this document, the release
notes and vignettes carefully.
Installation
The current released version of skimr can be installed from CRAN with
install.packages("skimr"). To install the current build of the next
release instead, use the following:
# install.packages("devtools")
devtools::install_github("ropensci/skimr")
The APIs for this branch should be considered reasonably stable but still subject to change if an issue is discovered.
To install the version with the most recent changes that have not yet been incorporated into the main branch (and may never be):
devtools::install_github("ropensci/skimr", ref = "develop")
Do not rely on APIs from the develop branch, as they are likely to change.
Skim statistics in the console
skimr:
- provides a larger set of statistics than summary(), including missing, complete, n, and sd
- reports each data type separately
- handles dates, logicals, and a variety of other types
- supports spark-bar and spark-line based on the pillar package
Separates variables by class:
skim(chickwts)
## ── Data Summary ────────────────────────
## Values
## Name chickwts
## Number of rows 71
## Number of columns 2
## _______________________
## Column type frequency:
## factor 1
## numeric 1
## ________________________
## Group variables None
##
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate ordered n_unique top_counts
## 1 feed 0 1 FALSE 6 soy: 14, cas: 12, lin: 12, sun: 12
##
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
## 1 weight 0 1 261. 78.1 108 204. 258 324. 423 ▆▆▇▇▃
Presentation is in a compact horizontal format:
skim(iris)
## ── Data Summary ────────────────────────
## Values
## Name iris
## Number of rows 150
## Number of columns 5
## _______________________
## Column type frequency:
## factor 1
## numeric 4
## ________________________
## Group variables None
##
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate ordered n_unique top_counts
## 1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
##
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
## 2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
## 3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
## 4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
Built-in support for strings, lists and other column classes
skim(dplyr::starwars)
## ── Data Summary ────────────────────────
## Values
## Name dplyr::starwars
## Number of rows 87
## Number of columns 14
## _______________________
## Column type frequency:
## character 8
## list 3
## numeric 3
## ________________________
## Group variables None
##
## ── Variable type: character ────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate min max empty n_unique whitespace
## 1 name 0 1 3 21 0 87 0
## 2 hair_color 5 0.943 4 13 0 11 0
## 3 skin_color 0 1 3 19 0 31 0
## 4 eye_color 0 1 3 13 0 15 0
## 5 sex 4 0.954 4 14 0 4 0
## 6 gender 4 0.954 8 9 0 2 0
## 7 homeworld 10 0.885 4 14 0 48 0
## 8 species 4 0.954 3 14 0 37 0
##
## ── Variable type: list ─────────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate n_unique min_length max_length
## 1 films 0 1 24 1 7
## 2 vehicles 0 1 11 0 2
## 3 starships 0 1 16 0 5
##
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
## 1 height 6 0.931 175. 34.8 66 167 180 191 264 ▂▁▇▅▁
## 2 mass 28 0.678 97.3 169. 15 55.6 79 84.5 1358 ▇▁▁▁▁
## 3 birth_year 44 0.494 87.6 155. 8 35 52 72 896 ▇▁▁▁▁
Has a useful summary function
skim(iris) |>
  summary()
## ── Data Summary ────────────────────────
## Values
## Name iris
## Number of rows 150
## Number of columns 5
## _______________________
## Column type frequency:
## factor 1
## numeric 4
## ________________________
## Group variables None
Individual columns can be selected using tidyverse-style selectors
skim(iris, Sepal.Length, Petal.Length)
## ── Data Summary ────────────────────────
## Values
## Name iris
## Number of rows 150
## Number of columns 5
## _______________________
## Column type frequency:
## numeric 2
## ________________________
## Group variables None
##
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
## skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
## 1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
## 2 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
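Selection helpers can be used in place of bare column names. The sketch below assumes dplyr is installed and uses its re-exported starts_with() helper; the object name sepal_only is illustrative.

```r
library(skimr)

# Skim only the columns whose names start with "Sepal".
sepal_only <- skim(iris, dplyr::starts_with("Sepal"))

# The skimmed variable names are stored in the skim_variable column.
print(sepal_only$skim_variable)
```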
Handles grouped data
skim() can handle data that has been grouped using
dplyr::group_by().
iris |>
  dplyr::group_by(Species) |>
  skim()
## ── Data Summary ────────────────────────
## Values
## Name dplyr::group_by(iris, Spe...
## Number of rows 150
## Number of columns 5
## _______________________
## Column type frequency:
## numeric 4
## ________________________
## Group variables Species
##
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
## skim_variable Speci
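A grouped skim can also be inspected programmatically rather than printed. The following is a sketch under the assumption that dplyr is installed and that statistic columns follow the skimr version 2 type.statistic naming; grouped and per_species_means are illustrative names.

```r
library(skimr)

grouped <- iris |>
  dplyr::group_by(Species) |>
  skim()

# The result has one row per variable per group, with the grouping
# column (Species) carried alongside skim_variable.
per_species_means <- grouped |>
  dplyr::select(skim_variable, Species, numeric.mean)

print(per_species_means)
```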
