skimr (https://docs.ropensci.org/skimr/)

skimr provides a frictionless approach to summary statistics which conforms to the principle of least surprise, displaying summary statistics the user can skim quickly to understand their data. It handles different data types and returns a skim_df object which can be included in a pipeline or displayed nicely for the human reader.
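As a minimal sketch of using the result programmatically, the example below skims iris and keeps only the rows for numeric variables. It assumes skimr version 2, where the result carries skim_type and skim_variable columns and the statistic columns are prefixed with their type (for example numeric.mean). The names skimmed, stats and numeric_stats are illustrative only.

```r
library(skimr)

# skim() returns a skim_df with one row per skimmed column.
skimmed <- skim(iris)

# Convert to a plain data frame so ordinary base subsetting applies.
stats <- as.data.frame(skimmed)

# Keep the numeric variables and a couple of their statistics.
numeric_stats <- stats[
  stats$skim_type == "numeric",
  c("skim_variable", "numeric.mean", "numeric.sd")
]
print(numeric_stats)
```

Converting to a data frame first is a conservative choice for the sketch: it avoids relying on how the skim_df class behaves when columns are dropped.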

Note: skimr version 2 has major changes when skimr is used programmatically. Upgraders should review this document, the release notes and vignettes carefully.

Installation

The current released version of skimr can be installed from CRAN. If you wish to install the current build of the next release you can do so using the following:

# install.packages("devtools")
devtools::install_github("ropensci/skimr")

The APIs for this branch should be considered reasonably stable but still subject to change if an issue is discovered.

To install the version with the most recent changes that have not yet been incorporated into the main branch (and may never be):

devtools::install_github("ropensci/skimr", ref = "develop")

Do not rely on APIs from the develop branch, as they are likely to change.

Skim statistics in the console

skimr:

Provides a larger set of statistics than summary(), including missing, complete, n, and sd.
Reports each data type separately.
Handles dates, logicals, and a variety of other types.
Supports spark-bar and spark-line based on the pillar package.
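To illustrate the type handling listed above, here is a small sketch that skims a made-up data frame mixing date, logical and numeric columns. The data frame example_df and its column names are hypothetical, and the sketch assumes skimr version 2, where each row of the result records its skim_type.

```r
library(skimr)

# A small, made-up data frame mixing column classes, each with one missing value.
example_df <- data.frame(
  visit_date = as.Date(c("2024-01-05", "2024-02-10", NA)),
  returned   = c(TRUE, FALSE, NA),
  score      = c(1.5, NA, 4.0)
)

skimmed <- skim(example_df)

# Each column is assigned a skim type based on its class.
print(unique(skimmed$skim_type))

# Missing values are counted per column, whatever the type.
print(as.data.frame(skimmed)[, c("skim_variable", "n_missing")])
```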

Separates variables by class:

skim(chickwts)

## ── Data Summary ────────────────────────
##                            Values  
## Name                       chickwts
## Number of rows             71      
## Number of columns          2       
## _______________________            
## Column type frequency:             
##   factor                   1       
##   numeric                  1       
## ________________________           
## Group variables            None    
## 
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate ordered n_unique top_counts                        
## 1 feed                  0             1 FALSE          6 soy: 14, cas: 12, lin: 12, sun: 12
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate mean   sd  p0  p25 p50  p75 p100 hist 
## 1 weight                0             1 261. 78.1 108 204. 258 324.  423 ▆▆▇▇▃

Presentation is in a compact horizontal format:

skim(iris)

## ── Data Summary ────────────────────────
##                            Values
## Name                       iris  
## Number of rows             150   
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   factor                   1     
##   numeric                  4     
## ________________________         
## Group variables            None  
## 
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate ordered n_unique top_counts               
## 1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist 
## 1 Sepal.Length          0             1 5.84 0.828 4.3 5.1 5.8  6.4  7.9 ▆▇▇▅▂
## 2 Sepal.Width           0             1 3.06 0.436 2   2.8 3    3.3  4.4 ▁▆▇▂▁
## 3 Petal.Length          0             1 3.76 1.77  1   1.6 4.35 5.1  6.9 ▇▁▆▇▂
## 4 Petal.Width           0             1 1.20 0.762 0.1 0.3 1.3  1.8  2.5 ▇▁▇▅▃

Built-in support for strings, lists and other column classes

skim(dplyr::starwars)

## ── Data Summary ────────────────────────
##                            Values         
## Name                       dplyr::starwars
## Number of rows             87             
## Number of columns          14             
## _______________________                   
## Column type frequency:                    
##   character                8              
##   list                     3              
##   numeric                  3              
## ________________________                  
## Group variables            None           
## 
## ── Variable type: character ────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate min max empty n_unique whitespace
## 1 name                  0         1       3  21     0       87          0
## 2 hair_color            5         0.943   4  13     0       11          0
## 3 skin_color            0         1       3  19     0       31          0
## 4 eye_color             0         1       3  13     0       15          0
## 5 sex                   4         0.954   4  14     0        4          0
## 6 gender                4         0.954   8   9     0        2          0
## 7 homeworld            10         0.885   4  14     0       48          0
## 8 species               4         0.954   3  14     0       37          0
## 
## ── Variable type: list ─────────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate n_unique min_length max_length
## 1 films                 0             1       24          1          7
## 2 vehicles              0             1       11          0          2
## 3 starships             0             1       16          0          5
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate  mean    sd p0   p25 p50   p75 p100 hist 
## 1 height                6         0.931 175.   34.8 66 167   180 191    264 ▂▁▇▅▁
## 2 mass                 28         0.678  97.3 169.  15  55.6  79  84.5 1358 ▇▁▁▁▁
## 3 birth_year           44         0.494  87.6 155.   8  35    52  72    896 ▇▁▁▁▁

Has a useful summary function

skim(iris) |>
  summary()

## ── Data Summary ────────────────────────
##                            Values
## Name                       iris  
## Number of rows             150   
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   factor                   1     
##   numeric                  4     
## ________________________         
## Group variables            None

Individual columns can be selected using tidyverse-style selectors

skim(iris, Sepal.Length, Petal.Length)

## ── Data Summary ────────────────────────
##                            Values
## Name                       iris  
## Number of rows             150   
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   numeric                  2     
## ________________________         
## Group variables            None  
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist 
## 1 Sepal.Length          0             1 5.84 0.828 4.3 5.1 5.8  6.4  7.9 ▆▇▇▅▂
## 2 Petal.Length          0             1 3.76 1.77  1   1.6 4.35 5.1  6.9 ▇▁▆▇▂
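Selection helpers can be used in the same position as bare column names. The sketch below assumes dplyr is installed and calls its re-exported starts_with() helper with an explicit namespace, so it does not depend on which packages are attached. The name sepal_stats is illustrative.

```r
library(skimr)

# Select every column whose name begins with "Sepal".
sepal_stats <- skim(iris, dplyr::starts_with("Sepal"))

# Only the two matching columns are skimmed.
print(sepal_stats$skim_variable)
```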

Handles grouped data

skim() can handle data that has been grouped using dplyr::group_by().

iris |>
  dplyr::group_by(Species) |>
  skim()

## ── Data Summary ────────────────────────
##                            Values                      
## Name                       dplyr::group_by(iris, Spe...
## Number of rows             150                         
## Number of columns          5                           
## _______________________                                
## Column type frequency:                                 
##   numeric                  4                           
## ________________________                               
## Group variables            Species                     
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##    skim_variable Speci
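For working with a grouped result programmatically, a minimal sketch follows. It assumes skimr version 2, where the grouping variable appears as its own column next to skim_variable, giving one row per combination of variable and group. The names grouped_stats and by_species are illustrative.

```r
library(skimr)

grouped_stats <- iris |>
  dplyr::group_by(Species) |>
  skim()

# One row per (variable, group) pair: 4 numeric variables x 3 species.
by_species <- as.data.frame(grouped_stats)
print(by_species[, c("skim_variable", "Species", "numeric.mean")])
```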
