bigvis

The bigvis package provides tools for exploratory data analysis of large datasets (10-100 million observations). The aim is for most operations to take less than 5 seconds on commodity hardware, even with 100,000,000 data points.

Since bigvis is not currently available on CRAN, the easiest way to try it out is to install it from GitHub:

# install.packages("devtools")
devtools::install_github("hadley/bigvis")

Workflow

The bigvis package is structured around the following workflow:

  • bin() and condense() to get a compact summary of the data

  • if the estimates are rough, you might want to smooth(). See best_h() and rmse_cvs() to figure out a good starting bandwidth

  • if you're working with counts, you might want to standardise()

  • visualise the results with autoplot() (you'll need to load ggplot2 to use this)
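
The first step of the workflow, binning and then condensing, can be pictured with a small base-R sketch. It is a conceptual illustration only and not how bigvis implements bin() or condense(). The simulated data and the names `width`, `origin`, `idx` and `condensed` are assumptions made for the example.

```r
# Conceptual sketch only: fixed-width binning and condensing in base R.
# This is not how bigvis implements bin() or condense().
set.seed(1)
x <- rnorm(1e5)      # simulated data (assumption)
width <- 0.1         # bin width (assumption)
origin <- min(x)     # left edge of the first bin (assumption)

# "Bin": assign every observation a 1-based integer bin index.
idx <- as.integer(floor((x - origin) / width)) + 1L
n_bins <- max(idx)

# "Condense": collapse the raw values to one count per bin.
count <- tabulate(idx, nbins = n_bins)
condensed <- data.frame(
  x = origin + (seq_len(n_bins) - 0.5) * width,  # bin centres
  count = count
)

# Drop empty bins so only the compact summary remains.
condensed <- condensed[condensed$count > 0, ]
head(condensed)
```

The point of the sketch is that every later step (smoothing, standardising, plotting) can work on a table with at most one row per bin instead of the raw observations, which is what keeps the workflow fast on very large inputs.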

Weighted statistics

Bigvis also provides a number of standard statistics efficiently implemented on weighted/binned data: weighted.median, weighted.IQR, weighted.var, weighted.sd, weighted.ecdf and weighted.quantile.
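
To show what a statistic on weighted/binned data means, here is a minimal base-R sketch of a weighted median. The helper `weighted_median_sketch()` is hypothetical and written only for this illustration; it is not the bigvis weighted.median function, which may handle ties, interpolation and missing values differently.

```r
# Hypothetical helper for illustration; not the bigvis implementation.
# x: values (for example bin centres), w: non-negative weights (bin counts).
weighted_median_sketch <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  # The first value whose cumulative weight reaches half of the total weight.
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}

# Three bins, with most of the weight in the last one.
weighted_median_sketch(c(1, 2, 3), c(1, 1, 10))  # 3
```

With equal weights the helper returns the ordinary middle value, so `weighted_median_sketch(c(1, 2, 3), c(1, 1, 1))` gives 2.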

Acknowledgements

This package wouldn't be possible without:

  • the fantastic Rcpp package, which makes it amazingly easy to integrate R and C++

  • JJ Allaire and Carlos Scheidegger, who have indefatigably answered my many C++ questions

  • Revolution Analytics, who generously supported the early development

  • Yue Hu, who implemented a proof of concept showing that it might be possible to work with this much data in R
