Nodbi
Document 'NoSQL' Database DBI Connector package for R
Install / Use
/learn @ropensci/NodbiREADME
nodbi
<!-- badges: start -->
<!-- badges: end -->
nodbi is an R package that provides a single interface for several
NoSQL databases and databases with JSON functionality, with the same
function parameters and return values across all database backends. Last
updated 2026-02-21.
| Currently, nodbi supports<br/>as database backends | for an R object of any<br/>of these data types | for these operations |
|:---|:---|:---|
| MongoDB | data.frame | List, Exists |
| SQLite | list | Create |
| PostgreSQL | JSON string | Get |
| DuckDB | file name of NDJSON records | Query |
| Elasticsearch | URL of NDJSON records | Update |
| CouchDB | | Delete |
For speed comparisons of database backends, see benchmark and testing below.
Plans for developing package nodbi in 2026 include to add MariaDB and
MySQL as database backends.
API overview
Parameters for docdb_*() functions are the same across all database
backends. See walk-through below and the canonical
testing in core-nodbi.R. “Container” is
used as term to indicate where conceptually the backend holds the data,
see Database connections below. The key
parameter holds the name of a container.
| Purpose | Function call |
|:---|:---|
| Create database connection (see below) | src <- nodbi::src_{duckdb, postgres, mongo, sqlite, couchdb, elastic}(<see below for parameters>) |
| Load my_data (a data frame, list, JSON string, or file name or URL pointing to NDJSON records) into database, container my_container | nodbi::docdb_create(src = src, key = "my_container", value = my_data) |
| Get all documents back into a data frame | nodbi::docdb_get(src = src, key = "my_container") |
| Get documents selected with query (as MongoDB-compatible JSON) into a data frame | nodbi::docdb_query(src = src, key = "my_container", query = '{"age": 20}') |
| Get selected fields (in MongoDB compatible JSON) from documents selected by query into a data frame | nodbi::docdb_query(src = src, key = "my_container", query = '{"age": {"$gt": 20}}', fields = '{"friends.name": 1, "_id": 0, "age": 1}', limit = 2L) |
| Update (patch) documents selected by query with new data my_data (in a data frame, list, JSON string, or file name or URL pointing to NDJSON records) | nodbi::docdb_update(src = src, key = "my_container", value = my_data, query = '{"age": 20}') |
| Check if container exists | nodbi::docdb_exists(src = src, key = "my_container") |
| List all containers in database | nodbi::docdb_list(src = src) |
| Delete document(s) in container | nodbi::docdb_delete(src = src, key = "my_container", query = '{"age": 20}') |
| Delete container | nodbi::docdb_delete(src = src, key = "my_container") |
| Close and remove database connection manually (when restarting R, connections are automatically closed and removed by nodbi) | rm(src) |
Install
CRAN version
install.packages("nodbi")
Development version
remotes::install_github("ropensci/nodbi")
Load package from library
library("nodbi")
Database connections
The following subsections show those parameters and aspects that are
specific to the database backend. These are only needed once, for
src_*() to create a connection object. Any such connection object is
subsequently used similarly across the docdb_* functions in package
‘nodbi’.
“Container” refers to how conceptually the database backend holds the
data. Users specify the relevant container with parameter
key = <container_name> in docdb_* functions, see
Walk-through below).
Data types are mapped from JSON to R objects by
jsonlite. Any root-level
_id is extracted from the document(s) and used for an index column
_id, otherwise an UUID is created as _id.
DuckDB
See also https://CRAN.R-project.org/package=duckdb. “Container” refers
to a DuckDB table, with columns _id and json created and used by
package nodbi, applying SQL functions and functions as per
https://duckdb.org/docs/extensions/json to the json column. Each row
in the table represents a JSON document.
src <- nodbi::src_duckdb(dbdir = ":memory:", ...)
MongoDB
“Container” refers to a MongoDB collection, in which nodbi creates
JSON documents. See also https://jeroen.github.io/mongolite/. MongoDB
but none of the other databases require to specify the container name
already in the src_*() function; use the collection name for
parameter key in docdb_* functions.
src <- nodbi::src_mongo(
collection = "my_container", db = "my_database",
url = "mongodb://localhost", ...
)
SQLite
“Container” refers to an SQLite table, with columns _id and json
created and used by package nodbi, applying SQL functions and
functions as per https://www.sqlite.org/json1.html to the json
column. Each row in the table represents a JSON document. The table is
indexed on _id. See also https://CRAN.R-project.org/package=RSQLite.
src <- nodbi::src_sqlite(dbname = ":memory:", ...)
PostgreSQL
“Container” refers to a PostgreSQL table, with columns _id and json
created and used by package nodbi, applying SQL functions and
functions as per
https://www.postgresql.org/docs/current/functions-json.html to the
json column. With PostgreSQL, a custom plpgsql function
jsonb_merge_patch()
is used for docdb_update(). The order of variables in data frames
returned by docdb_get() and docdb_query() can differ from their
order the input to docdb_create().
src <- nodbi::src_postgres(
dbname = "my_database", host = "127.0.0.1", port = 5432L, ...
)
CouchDB
“Container” refers to a CouchDB database, in which nodbi creates JSON
documents. See also https://CRAN.R-project.org/package=sofa. With
CouchDB, function docdb_update() uses
jqr to implement patching
JSON, in analogy to functions available for the other databases.
src <- nodbi::src_couchdb(
host = "127.0.0.1", port = 5984L, path = NULL,
transport = "http", user = NULL, pwd = NULL, headers = NULL
)
Elasticsearch
“Container” refers to an Elasticsearch index, in which nodbi creates
JSON documents. Opensearch can equally be used. Only lowercase is
accepted for container names (in parameter key of docdb_*
functions).
src <- nodbi::src_elastic(
host = "127.0.0.1", port = 9200L, path = NULL,
transport_schema = "http", user = NULL, pwd = NULL, ...
)
Walk-through
This example is to show how functional nodbi is at this time: With any
of the six database backends, the functions work in the same way and
return the same values.
# load nodbi
library(nodbi)
# name of container
key <- "my_container"
# connect any of these database backends
src <- src_duckdb()
src <- src_mongo(collection = key)
src <- src_sqlite()
src <- src_postgres()
src <- src_elastic()
src <- src_couchdb(
user = Sys.getenv("COUCHDB_TEST_USER"),
pwd = Sys.getenv("COUCHDB_TEST_PWD")
)
# check if container already exists
docdb_exists(src, key)
# [1] FALSE
# load data (here data frame, alternatively a list, JSON or file with NSJSON)
# into the container "my_container" specified in "key" parameter
docdb_create(src, key, value = mtcars)
# [1] 32
# load additionally 98 NDJSON records
docdb_create(src, key, "https://httpbin.org/stream/98")
# Note: container 'my_container' already exists
# [1] 98
# load additionally contacts JSON data, from package nodbi
docdb_create(src, key, contacts)
# Note: container 'my_container' already exists
# [1] 5
# get all documents, irrespective of schema
dplyr::tibble(docdb_get(src, key))
# # A tibble: 135 × 27
# `_id` isActive balance age eyeColor name email about registered tags friends
# <chr> <lgl> <chr> <int> <chr> <chr> <chr> <chr> <chr> <list> <list>
# 1 5cd6… TRUE $2,412… 20 blue Kris… kris… "Sin… 2017-07-1… <chr> <df>
# 2 5cd6… FALSE $3,400… 20 brown Rae … raec… "Nis… 2018-12-1… <chr> <df>
# 3 5cd6… TRUE $1,161… 22 brown Pace… pace… "Eiu… 2018-08-1… <chr> <df>
# 4 5cd6… FALSE $2,579… 30 brown Will… will… "Nul… 2018-02-1… <chr> <df>
# 5 5cd6… FALSE $3,808… 23 green Lacy… lacy… "Sun… 2014-08-0… <chr> <df>
# 6 69bc… NA NA NA NA NA NA NA NA <NULL> <NULL>
# 7 69bc… NA NA NA NA NA NA NA NA <NULL> <NULL>
# 8 69bc… NA NA NA NA NA NA NA NA <NULL> <NULL>
# 9 69bc… NA NA NA NA NA NA NA NA <NULL> <NULL>
# 10 69bc… NA NA NA NA NA NA NA NA <NULL> <NULL>
# # ℹ 125 more rows
# # ℹ 16 more variables: url <chr>, args <df[,0]>, headers <df[,4]>, origin <chr>,
# # id <int>, mpg <dbl>, cyl <int>, disp <dbl>, hp <int>, drat <dbl>, wt <dbl>,
# # qsec <dbl>, vs <int>, am <int>, gear <int>, carb <int>
# # ℹ Use `print(n = ...)` to see more rows
# query some documents
docdb_query(src, key, query = '{"mpg": {"$gte": 30}}')
# _id mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Fiat 128 32 4 79 66 4.1 2.2 19 1 1 4 1
# 2 Honda Civic 30 4 76 52 4.9 1.6 19 1 1 4 2
# 3 To
