<!-- README.md is generated from README.Rmd. Please edit that file -->

skimr <a href='https://docs.ropensci.org/skimr/'>

<img src='https://docs.ropensci.org/skimr/reference/figures/logo.png' align="right" height="139" /></a>

<!-- badges: start -->

Project Status: Active. The project has reached a stable, usable state and is being actively developed. This is an rOpenSci peer-reviewed package.

<!-- badges: end -->

skimr provides a frictionless approach to summary statistics that conforms to the principle of least surprise, displaying summary statistics the user can skim quickly to understand their data. It handles different data types and returns a skim_df object, which can be included in a pipeline or displayed nicely for a human reader.
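As a minimal sketch of what "included in a pipeline" can look like, the example below passes the result of skim() to dplyr verbs. It assumes the skimr version 2 column layout, in which statistics are stored in type-prefixed columns such as numeric.mean; the object names skimmed and numeric_means are illustrative only.

```r
library(skimr)

# skim() returns a skim_df, which is built on a tibble.
skimmed <- skim(iris)

# Keep only the numeric variables and pull out their means.
# Assumes skimr v2 naming: skim_type, skim_variable, numeric.mean.
numeric_means <- skimmed |>
  dplyr::filter(skim_type == "numeric") |>
  dplyr::select(skim_variable, numeric.mean)

numeric_means
```

Because the result is an ordinary data frame underneath, it can be joined, filtered, or written to disk like any other table.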

Note: skimr version 2 has major changes when skimr is used programmatically. Upgraders should review this document, the release notes and vignettes carefully.

Installation

The current released version of skimr can be installed from CRAN. To install the current build of the next release instead, use the following:

# install.packages("devtools")
devtools::install_github("ropensci/skimr")

The APIs for this branch should be considered reasonably stable but still subject to change if an issue is discovered.

To install the version with the most recent changes, which have not yet been incorporated into the main branch (and may never be):

devtools::install_github("ropensci/skimr", ref = "develop")

Do not rely on APIs from the develop branch, as they are likely to change.
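For the CRAN release mentioned at the start of this section, a typical install looks like the sketch below. The requireNamespace() guard is an optional convenience added here, not something the package requires.

```r
# Install the CRAN release only if skimr is not already available.
if (!requireNamespace("skimr", quietly = TRUE)) {
  install.packages("skimr")
}

# Attach the package so that skim() can be called directly.
library(skimr)
```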

Skim statistics in the console

skimr:

  • provides a larger set of statistics than summary(), including missing, complete, n, and sd
  • reports each data type separately
  • handles dates, logicals, and a variety of other types
  • supports spark-bars and spark-lines based on the pillar package

Separates variables by class:

skim(chickwts)

## ── Data Summary ────────────────────────
##                            Values  
## Name                       chickwts
## Number of rows             71      
## Number of columns          2       
## _______________________            
## Column type frequency:             
##   factor                   1       
##   numeric                  1       
## ________________________           
## Group variables            None    
## 
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate ordered n_unique top_counts                        
## 1 feed                  0             1 FALSE          6 soy: 14, cas: 12, lin: 12, sun: 12
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate mean   sd  p0  p25 p50  p75 p100 hist 
## 1 weight                0             1 261. 78.1 108 204. 258 324.  423 ▆▆▇▇▃
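The per-type tables in this output can also be retrieved programmatically. The sketch below uses partition(), which, as far as I know, splits a skim_df into one data frame per column type; the variable name parts is illustrative.

```r
library(skimr)

# Split the skim result into one data frame per column type.
parts <- partition(skim(chickwts))

# One element per type found in the data.
names(parts)

# Each element holds only the statistics relevant to that type.
parts$numeric
parts$factor
```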

Presentation is in a compact horizontal format:

skim(iris)

## ── Data Summary ────────────────────────
##                            Values
## Name                       iris  
## Number of rows             150   
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   factor                   1     
##   numeric                  4     
## ________________________         
## Group variables            None  
## 
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate ordered n_unique top_counts               
## 1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist 
## 1 Sepal.Length          0             1 5.84 0.828 4.3 5.1 5.8  6.4  7.9 ▆▇▇▅▂
## 2 Sepal.Width           0             1 3.06 0.436 2   2.8 3    3.3  4.4 ▁▆▇▂▁
## 3 Petal.Length          0             1 3.76 1.77  1   1.6 4.35 5.1  6.9 ▇▁▆▇▂
## 4 Petal.Width           0             1 1.20 0.762 0.1 0.3 1.3  1.8  2.5 ▇▁▇▅▃

Built-in support for strings, lists, and other column classes

skim(dplyr::starwars)

## ── Data Summary ────────────────────────
##                            Values         
## Name                       dplyr::starwars
## Number of rows             87             
## Number of columns          14             
## _______________________                   
## Column type frequency:                    
##   character                8              
##   list                     3              
##   numeric                  3              
## ________________________                  
## Group variables            None           
## 
## ── Variable type: character ────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate min max empty n_unique whitespace
## 1 name                  0         1       3  21     0       87          0
## 2 hair_color            5         0.943   4  13     0       11          0
## 3 skin_color            0         1       3  19     0       31          0
## 4 eye_color             0         1       3  13     0       15          0
## 5 sex                   4         0.954   4  14     0        4          0
## 6 gender                4         0.954   8   9     0        2          0
## 7 homeworld            10         0.885   4  14     0       48          0
## 8 species               4         0.954   3  14     0       37          0
## 
## ── Variable type: list ─────────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate n_unique min_length max_length
## 1 films                 0             1       24          1          7
## 2 vehicles              0             1       11          0          2
## 3 starships             0             1       16          0          5
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate  mean    sd p0   p25 p50   p75 p100 hist 
## 1 height                6         0.931 175.   34.8 66 167   180 191    264 ▂▁▇▅▁
## 2 mass                 28         0.678  97.3 169.  15  55.6  79  84.5 1358 ▇▁▁▁▁
## 3 birth_year           44         0.494  87.6 155.   8  35    52  72    896 ▇▁▁▁▁

Has a useful summary function

skim(iris) |>
  summary()

## ── Data Summary ────────────────────────
##                            Values
## Name                       iris  
## Number of rows             150   
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   factor                   1     
##   numeric                  4     
## ________________________         
## Group variables            None

Individual columns can be selected using tidyverse-style selectors

skim(iris, Sepal.Length, Petal.Length)

## ── Data Summary ────────────────────────
##                            Values
## Name                       iris  
## Number of rows             150   
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   numeric                  2     
## ________________________         
## Group variables            None  
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist 
## 1 Sepal.Length          0             1 5.84 0.828 4.3 5.1 5.8  6.4  7.9 ▆▇▇▅▂
## 2 Petal.Length          0             1 3.76 1.77  1   1.6 4.35 5.1  6.9 ▇▁▆▇▂
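Selection helpers should work as well as bare column names. The sketch below assumes that skim() forwards its selection arguments to tidyselect, so a helper such as dplyr::starts_with() can pick columns by name pattern; sepal_stats is an illustrative name.

```r
library(skimr)

# Select every column whose name begins with "Sepal".
sepal_stats <- skim(iris, dplyr::starts_with("Sepal"))

# Only the matching variables are summarized.
sepal_stats$skim_variable
```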

Handles grouped data

skim() can handle data that has been grouped using dplyr::group_by().

iris |>
  dplyr::group_by(Species) |>
  skim()

## ── Data Summary ────────────────────────
##                            Values                      
## Name                       dplyr::group_by(iris, Spe...
## Number of rows             150                         
## Number of columns          5                           
## _______________________                                
## Column type frequency:                                 
##   numeric                  4                           
## ________________________                               
## Group variables            Species                     
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────
##    skim_variable Speci
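With grouped input, each variable is summarized once per group, and the grouping column appears in the result. As a hedged sketch (assuming the skimr version 2 column names numeric.mean and numeric.sd), the per-species statistics for a single variable can be extracted with ordinary dplyr verbs:

```r
library(skimr)

# Skim the grouped data: one row per variable per Species.
grouped <- iris |>
  dplyr::group_by(Species) |>
  skim()

# Compare one variable across groups.
sepal_by_species <- grouped |>
  dplyr::filter(skim_variable == "Sepal.Length") |>
  dplyr::select(skim_variable, Species, numeric.mean, numeric.sd)

sepal_by_species
```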
