Rds2cpp
Read and write RDS/RDA files in C++
Install / Use
/learn @LTLA/Rds2cppREADME
Read RDS/RDA files in C++
Overview
This repository contains a header-only C++ library for reading and writing RDS or RDA files without the need to link to R's libraries. In this manner, we can use RDS as a flexible data exchange format across different frameworks that have C++ bindings, e.g., Python, Javascript (via Wasm). We currently support most user-visible data structures such as atomic vectors, lists, environments and S4 classes.
Quick start
Each RDS file contains a single R object and is typically created by calling saveRDS() within an R session.
Given a path to an RDS file, the parse_rds() function will return a pointer to an RObject interface:
#include "rds2cpp/rds2cpp.hpp"
// Returns an object containing the file information,
// e.g., R version used to read/write the file.
auto file_info = rds2cpp::parse_rds(fpath, rds2cpp::ParseRdsOptions());
// Get the pointer to the actual R object.
const auto& ptr = file_info->object;
The type of the underlying object can then be queried for further examination. For example, if we wanted to process integer vectors:
if (ptr->type() == rds2cpp::SEXPType::INT) {
auto iptr = static_cast<const rds2cpp::IntegerVector*>(ptr.get());
const std::vector<std::int32_t>& values = iptr->data;
}
See the reference documentation for a list of known representations.
More reading examples
We can extract ordinary lists from an RDS file, examining the attributes to determine if the list is named.
if (ptr->type() == rds2cpp::SEXPType::VEC) {
auto lptr = static_cast<const rds2cpp::GenericVector*>(ptr.get());
const auto& elements = lptr->data; // vector of pointers to list elements.
for (const auto& attr : lptr->attributes) {
// Symbols are referenced by their position in the 'symbols' vector.
const auto& attr_name = file_info.symbols[attr.name.index].name;
if (attr_name == "names") {
if (attr.value->type() != rds2cpp::SEXPType::STR) {
// Just adding some protection for weird objects.
throw std::runtime_error("oops, names should be strings!");
}
auto nptr = static_cast<const rds2cpp::StringVector*>(attr.value.get());
for (const auto& str : nptr->value) {
if (!str.value.has_value()) {
throw std::runtime_error("oops, names should not be missing!");
}
const std::string& str_value = *(str.value); // value of the string.
const auto& str_enc = str.encoding; // encoding of the string.
// Do something with the list names...
}
}
}
}
Slots of S4 instances are similarly encoded in the attributes - except for the class name, which is extracted into its own member.
if (ptr->type() == rds2cpp::SEXPType::S4) {
auto sptr = static_cast<const rds2cpp::S4Object*>(ptr.get());
sptr->class_name;
sptr->package_name;
for (const auto& slot : sptr->attributes) {
const auto& slot_name = file_info.symbols[slot.name.index].name;
const auto& slot_val = *(slot.value); // Do something with the slot value...
}
}
Advanced users can also pull out serialized environments. These should be treated as file-specific globals that may be referenced one or more times inside the R object.
if (ptr->type() == rds2cpp::SEXPType::ENV) {
auto eptr = static_cast<const rds2cpp::EnvironmentIndex*>(ptr.get());
const auto& env = file_info.environments[eptr->index];
for (const auto& var = env.variables) {
const auto& var_name = file_info.symbols[var.name.index].name;
const auto& var_value = *(var.value); // Do something with the variable...
}
}
NULLs are supported but not particularly interesting:
if (ptr->type() == rds2cpp::SEXPType::NIL) {
// Do something.
}
Writing RDS files
The write_rds() function will write RDS files from an rds2cpp::RObject representation:
rds2cpp::RdsFile file_info;
// Setting up an integer vector.
auto vec = new rds2cpp::IntegerVector;
file_info.object.reset(vec);
// Storing data in the integer vector.
vec->data = std::vector<std::int32_t>{ 0, 1, 2, 3, 4, 5 };
rds2cpp::write_rds(file_info, "some_file_path.rds", rds2cpp::WriteRdsOptions());
Here's a more complicated example that saves a sparse matrix (as a dgCMatrix from the Matrix package) to file.
rds2cpp::RdsFile file_info;
auto ptr = std::make_unique<rds2cpp::S4Object>();
auto& obj = *ptr;
obj.class_name = "dgCMatrix";
obj.package_name = "Matrix";
auto ivec = std::make_unique<rds2cpp::IntegerVector>();
ivec->data = std::vector<std::int32_t>{ 6, 8, 0, 3, 5, 6, 0, 1, 3, 7 };
obj.attributes.emplace_back(
rds2cpp::register_symbol("i", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(ivec)
);
auto pvec = std::make_unique<rds2cpp::IntegerVector>();
pvec->data = std::vector<std::int32_t>{ 0, 0, 2, 3, 4, 5, 6, 8, 8, 8, 10 };
obj.attributes.emplace_back(
rds2cpp::register_symbol("p", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(pvec)
);
auto xvec = std::make_unique<rds2cpp::DoubleVector>();
xvec->data = std::vector<double>{ .96, -.34, .82, -2., -.72, .39, .16, .36, -1.5, -.47 };
obj.attributes.emplace_back(
rds2cpp::register_symbol("x", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(xvec)
);
auto dims = std::make_unique<rds2cpp::IntegerVector>();
dims->data = std::vector<int32_t>{ 10, 10 };
obj.attributes.emplace_back(
rds2cpp::register_symbol("Dim", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(dims)
);
auto dimnames = std::make_unique<rds2cpp::GenericVector>();
dimnames->data.emplace_back(new Null);
dimnames->data.emplace_back(new Null);
obj.attributes.emplace_back(
rds2cpp::register_symbol("Dimnames", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(dimnames)
);
obj.attributes.add(
rds2cpp::register_symbol("factors", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::make_unique<rds2cpp::GenericVector>()
);
file_info.object = std::move(ptr);
rds2cpp::write_rds(file_info, "my_matrix.rds", {});
We can also create environments by registering the environment before creating indices to it.
rds2cpp::RdsFile file_info;
// Creating an environment with a 'foo' variable containing c('bar', NA, 'whee')
file_info.environments.emplace_back();
auto& current_env = file_info.environments.back();
auto sptr = std::make_unique<rds2cpp::StringVector>();
sptr->data.emplace_back("bar", rds2cpp::StringEncoding::UTF8);
sptr->data.emplace_back(); // NA string.
sptr->data.emplace_back("whee", rds2cpp::StringEncoding::ASCII);
// The object is just a reference to the first environment:
file_info.object.reset(new rds2cpp::EnvironmentIndex(0));
rds2cpp::write_rds(file_info, "my_env.rds", {});
Reading/writing RDA files
Each RDA file (a.k.a., Rdata) contains multiple R objects, each of which is associated with a unique name.
It is typically created by calling save() within an R session.
We can read all objects into memory with the parse_rda() function:
auto file_info = rds2cpp::parse_rda(fpath, rds2cpp::ParseRdaOptions());
for (const auto& obj : file_info.objects) {
const auto& obj_name = file_info.symbols[obj.name.index].name;
switch (obj.value->type()) {
case rds2cpp::SEXPType::INT:
// This is an integer vector...
break;
case rds2cpp::SEXPType::STR:
// This is a character vector...
break;
default:
// and so on...
}
}
Similarly, we can write the name/object pairs into an RDA file.
#include <numeric>
auto ivec = std::make_unique<rds2cpp::IntegerVector>(5);
std::iota(ivec->data.begin(), ivec->data.end(), 1);
auto list = std::make_unique<rds2cpp::GenericVector>(2);
list->data[0].reset(new Null);
list->data[1].reset(new rds2cpp::LogicalVector(10));
auto svec = std::make_unique<rds2cpp::StringVector>(1);
svec->data[0].value = "FOOBAR";
rds2cpp::RdaFile file_info;
file_info.objects.emplace_back(
rds2cpp::register_symbol("alpha", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(ivec)
);
file_info.objects.emplace_back(
rds2cpp::register_symbol("bravo", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(list)
);
file_info.objects.emplace_back(
rds2cpp::register_symbol("charlie", rds2cpp::StringEncoding::UTF8, file_info.symbols),
std::move(svec)
);
rds2cpp::write_rda(file_info, "my_env.Rda", rds2cpp::WriteRdaOptions());
Building projects
CMake with FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
rds2cpp
GIT_REPOSITORY https://github.com/LTLA/rds2cpp
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(rds2cpp)
Then you can link to rds2cpp to make the headers available during compilation:
# For executables:
target_link_libraries(myexe rds2cpp)
# For libaries
target_link_libraries(mylib INTERFACE rds2cpp)
CMake using find_package()
You can install the library by cloning a suitable version of this repository and running the following commands:
mkdir build && cd build
cmake ..
cmake --build . --target install
Then you can use find_package() as usual:
find_package(ltla_rds2cpp CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::rds2cpp)
Manual
If you're not using CMake, the simple approach is to just copy the files
Related Skills
node-connect
346.4kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
107.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
346.4kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
346.4kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
