SkillAgentSearch skills...

Sqldf

Perform SQL Selects on R Data Frames

Install / Use

/learn @ggrothendieck/Sqldf
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

To write it, it took three months; to conceive it – three minutes; to collect the data in it – all my life. F. Scott Fitzgerald

Introduction

sqldf is an R package for runing SQL statements on R data frames, optimized for convenience. The user simply specifies an SQL statement in R using data frame names in place of table names and a database with appropriate table layouts/schema is automatically created, the data frames are automatically loaded into the database, the specified SQL statement is performed, the result is read back into R and the database is deleted all automatically behind the scenes making the database's existence transparent to the user who only specifies the SQL statement. Surprisingly this can at times be even faster than the corresponding pure R calculation (although the purpose of the project is convenience and not speed). This link suggests that for aggregations over highly granular columns that sqldf is faster than another alternative tried. sqldf is free software published under the GNU General Public License that can be downloaded from CRAN.

sqldf supports (1) the SQLite backend database (by default), (2) the H2 java database, (3) the database and (4) sqldf 0.4-0 onwards also supports MySQL. SQLite, H2, MySQL and PostgreSQL are free software. SQLite and H2 are embedded serverless zero administration databases that are included right in the R driver packages, RSQLite and RH2, so that there is no separate installation for either one. A number of high profile projects use SQLite. H2 is a java database which contains a large collection of SQL functions and supports Date and other data types. PostgreSQL is a client/server database and unlike SQLite and H2 must be separately installed but it has a particularly powerful version of SQL, e.g. its window functions so the extra installation work can be worth it. sqldf supports the RPostgreSQL driver in R. Like PostgreSQL, MySQL is a client server database that must be installed independently so its not as easy to install as SQLite or H2 but its very popular and is widely used as the back end for web sites.

Overview

Citing sqldf

For Those New to R

News

Troubleshooting

FAQ

Examples

Links

Overview

sqldf is an R package for running SQL statements on R data frames, optimized for convenience. sqldf works with the SQLite, H2, PostgreSQL MySQL databases. SQLite has the least prerequisites to install. H2 is just as easy if you have Java installed and also supports Date class and a few additional functions. PostgreSQL notably supports Windowing functions providing the SQL analogue of the R ave function. MySQL is a particularly popular database that drives many web sites.

More information can be found from within R by installing and loading the sqldf package and then entering ?sqldf and ?read.csv.sql. A number of examples are on this page and more examples are accessible from within R in the examples section of the ?sqldf help page.

As seen from this example which uses the built in BOD data frame:

library(sqldf)
sqldf("select * from BOD where Time > 4")

with sqldf the user is freed from having to do the following, all of which are automatically done:

  • database setup
  • writing the create table statement which defines each table
  • importing and exporting to and from the database
  • coercing of the returned columns to the appropriate class in common cases

It can be used for:

  • learning SQL if you know R
  • learning R if you know SQL
  • as an alternate syntax for data frame manipulation, particularly for purposes of speeding these up, since sqldf with SQLite as the underlying database is often faster than performing the same manipulations in straight R
  • reading portions of large files into R without reading the entire file (example 6b and example 13 below show two different ways and examples 6e, 6f below show how to read random portions of a file)

In the case of SQLite it consists of a thin layer over the RSQLite DBI interface to SQLite itself.

In the case of H2 it works on top of the RH2 DBI driver which in turn uses RJDBC and JDBC to interface to H2 itself.

In the case of PostgreSQL it works on top of the RPostgreSQL DBI driver.

There is also some untested code in sqldf for use with the MySQL database using the RMySQL

View on GitHub
GitHub Stars253
CategoryData
Updated1mo ago
Forks79

Languages

R

Security Score

80/100

Audited on Feb 10, 2026

No findings