Rconfig
Manage R Configuration at the Command Line
Manage R configuration using files (YAML, JSON, INI, TXT), JSON strings, and command line arguments. Command line arguments can be used to override configuration. Period-separated command line flags are parsed as hierarchical lists. Environment variables, R global variables, and configuration values can be substituted.
Try rconfig in your browser: click the Gitpod button, then
cd inst/examples in the VS Code terminal to run the Rscript example
from this README!
Install
# CRAN version
install.packages("rconfig")
# Development version from R-universe
install.packages("rconfig", repos = "https://analythium.r-universe.dev")
Another config package?
There are other R packages to manage configs:
- config has nice inheritance rules, and it even scans parent directories for YAML config files
- configr has nice substitution/interpolation features and supports YAML, JSON, TOML, and INI file formats
These packages are fantastic if you are managing deployments at different stages of the life cycle, i.e. testing/staging/production.
However, when you use Rscript from the command line, you often do not
want to manage too many configuration files, but want a quick way to
override some of the default settings.
The rconfig package provides various ways to override defaults, and instead of changing the active configuration (as in the config package), you can merge lists in order to arrive at a final configuration. These are very similar concepts, but not quite the same.
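To make the idea of merging lists concrete, here is a minimal base R sketch. It uses utils::modifyList() as a stand-in and is not necessarily how rconfig merges internally; the list contents are made up for illustration.

```r
# Illustrative only: recursive list merging with base R.
# rconfig's own merge logic may differ in details.
defaults  <- list(trials = 5L, cores = 1L, user = list(name = "demo"))
from_file <- list(trials = 30L)
from_args <- list(user = list(name = "Jane"), verbose = TRUE)

# Later sources override earlier ones; nested lists are merged key by key
final <- Reduce(utils::modifyList, list(defaults, from_file, from_args))
str(final)
```

Each source only needs to contain the values it changes; everything else falls through from the defaults.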
The rconfig package has the following features:
- uses default configuration file
- file based override with the -f or --file flags (accepts JSON, YAML, INI, and plain text files)
- JSON string based override with the -j or --json flags
- other command line arguments are merged too, e.g. --cores 4
- heuristic rules are used to coerce command line values to the right type
- R expressions starting with !expr are evaluated by default, this behavior can be turned off (same feature can be found in the yaml and config packages, but here it works with plain text and JSON too)
- period-separated command line arguments are parsed as hierarchical lists, e.g. --user.name Joe will be added as user$name to the config list
- nested configurations can also be flattened
- command line flags without a value will evaluate to TRUE, e.g. --verbose
- environment variables (${VALUE}), R global variables (@{VALUE}), and configuration values (#{VALUE}) can be substituted
- differentiates verb/noun syntax, where verbs are sub-commands following the R script file name and preceding the command line flags (starting with - or --)
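The flag handling described in the list above can be sketched in a few lines of base R. The helpers nest() and parse_flags() are hypothetical names invented for this illustration, not part of rconfig, and utils::type.convert() stands in for the package's own coercion heuristics. The sketch only handles long flags and would treat a negative number as the next flag.

```r
# Hypothetical helper: wrap a value in nested lists, innermost key first
nest <- function(keys, value) {
  for (key in rev(keys)) value <- stats::setNames(list(value), key)
  value
}

# Hypothetical helper: turn "--a.b value" style arguments into a nested list
parse_flags <- function(args) {
  out <- list()
  i <- 1L
  while (i <= length(args)) {
    arg <- args[[i]]
    if (!startsWith(arg, "--")) {  # ignore anything that is not a long flag
      i <- i + 1L
      next
    }
    # Split the flag name on periods to get the path into the list
    keys <- strsplit(sub("^--", "", arg), ".", fixed = TRUE)[[1]]
    has_value <- i < length(args) && !startsWith(args[[i + 1L]], "-")
    # A flag without a value becomes TRUE; otherwise guess the type
    value <- if (has_value) {
      utils::type.convert(args[[i + 1L]], as.is = TRUE)
    } else {
      TRUE
    }
    out <- utils::modifyList(out, nest(keys, value))
    i <- i + if (has_value) 2L else 1L
  }
  out
}

cfg <- parse_flags(c("--user.name", "Joe", "--cores", "4", "--verbose"))
str(cfg)
```

Here --user.name Joe ends up as cfg$user$name, --cores 4 is coerced to an integer, and --verbose evaluates to TRUE.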
This looks very similar to what littler, getopt, and optparse are supposed to do. You are right. These packages offer an amazing command line experience once you have a solid interface. In an iterative and evolving research and development situation, however, rconfig gives you agility.
Moreover, the rconfig package offers various ways for substituting
environment variables, R global variables, and even substituting
configuration values. The GetoptLong package has similar functionality
but its focus is on command line interfaces and not configuration.
Other tools, such as sprintf, glue, rprintf, and whisker are aimed at
substituting values from R expressions.
If you are not yet convinced, here is a quick teaser. This is the
content of the default configuration file, rconfig.yml:
trials: 5
dataset: "demo-data.csv"
cores: !expr getOption("mc.cores", 1L)
user:
  name: "demo"
description: |
  This is a multi line
  description.
Let’s use a simple R script to print out the configs:
#!/usr/bin/env Rscript
options("rconfig.debug"=TRUE)
str(rconfig::rconfig())
Now you can override the default configuration using another file, a JSON string, and some other flags. Notice the variable substitution for user name!
export USER=Jane
Rscript --vanilla test.R deploy \
-f rconfig-prod.yml \
-j '{"trials":30,"dataset":"full-data.csv"}' \
--user.name $USER \
--verbose
# List of 6
# $ trials : int 30
# $ dataset : chr "full-data.csv"
# $ cores : int 1
# $ user :List of 1
# ..$ name: chr "Jane"
# $ description: chr "This is a multi line\ndescription."
# $ verbose : logi TRUE
# - attr(*, "trace")=List of 2
# ..$ kind : chr "merged"
# ..$ value:List of 4
# .. ..$ :List of 2
# .. .. ..$ kind : chr "file"
# .. .. ..$ value: chr "/Users/Peter/git/github.com/analythium/rconfig/inst/examples/rconfig.yml"
# .. ..$ :List of 2
# .. .. ..$ kind : chr "file"
# .. .. ..$ value: chr "/Users/Peter/git/github.com/analythium/rconfig/inst/examples/rconfig-prod.yml"
# .. ..$ :List of 2
# .. .. ..$ kind : chr "json"
# .. .. ..$ value: chr "{\"trials\":30,\"dataset\":\"full-data.csv\"}"
# .. ..$ :List of 2
# .. .. ..$ kind : chr "args"
# .. .. ..$ value: chr "deploy --user.name Jane --verbose"
# - attr(*, "command")= chr "deploy"
# - attr(*, "class")= chr "rconfig"
The package was inspired by the config package, docker-compose/kubectl/caddy and other CLI tools, and was motivated by some real world need when managing background processing on cloud instances.
Usage
R command line usage
Open the project in RStudio or set the working directory to the folder root after cloning/downloading the repository.
str(rconfig::rconfig())
# List of 5
# $ trials : int 5
# $ dataset : chr "demo-data.csv"
# $ cores : int 1
# $ user :List of 1
# ..$ name: chr "demo"
# $ description: chr "This is a multi line\ndescription."
# - attr(*, "command")= chr(0)
# - attr(*, "class")= chr "rconfig"
str(rconfig::rconfig(
file = "rconfig-prod.yml"))
# List of 5
# $ trials : int 30
# $ dataset : chr "full-data.csv"
# $ cores : int 1
# $ user :List of 1
# ..$ name: chr "real_We4$#z*="
# $ description: chr "This is a multi line\ndescription."
# - attr(*, "command")= chr(0)
# - attr(*, "class")= chr "rconfig"
str(rconfig::rconfig(
file = c("rconfig.json",
"rconfig-prod.txt"),
list = list(user = list(name = "Jack"))))
# List of 5
# $ trials : int 30
# $ dataset : chr "full-data.csv"
# $ cores : int 1
# $ user :List of 1
# ..$ name: chr "Jack"
# $ description: chr "This is a multi line\ndescription."
# - attr(*, "command")= chr(0)
# - attr(*, "class")= chr "rconfig"
str(rconfig::rconfig(
file = c("rconfig.json",
"rconfig-prod.txt"),
list = list(user = list(name = "Jack")),
flatten = TRUE))
# List of 5
# $ trials : int 30
# $ dataset : chr "full-data.csv"
# $ cores : int 1
# $ user.name : chr "Jack"
# $ description: chr "This is a multi line\ndescription."
# - attr(*, "command")= chr(0)
# - attr(*, "class")= chr "rconfig"
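For intuition about what flattening does, here is a small base R sketch. The function flatten_list() is a hypothetical helper written for this illustration, not rconfig's implementation, and it assumes every list element is named.

```r
# Hypothetical helper: collapse nested lists into period-separated keys.
# Assumes all elements are named; unnamed elements are silently skipped.
flatten_list <- function(x, prefix = NULL) {
  out <- list()
  for (key in names(x)) {
    full <- if (is.null(prefix)) key else paste(prefix, key, sep = ".")
    if (is.list(x[[key]])) {
      # Recurse into nested lists, carrying the key path along
      out <- c(out, flatten_list(x[[key]], full))
    } else {
      out[[full]] <- x[[key]]
    }
  }
  out
}

flat <- flatten_list(list(trials = 30L, user = list(name = "Jack")))
str(flat)
```

The nested user$name entry becomes a single top-level user.name key, while each value keeps its original type.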
Set defaults in case some values are undefined (best to use [[
notation instead of $ to avoid surprises):
CONFIG <- rconfig::rconfig(
file = "rconfig-prod.yml")
rconfig::value(CONFIG[["cores"]], 2L) # set to 1L
# [1] 1
rconfig::value(CONFIG[["test"]]) # unset
# NULL
rconfig::value(CONFIG[["test"]], FALSE) # use default
# [1] FALSE
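The surprise with $ is that it partially matches list names, while [[ requires an exact match by default. A minimal base R illustration, using a made-up plain list rather than an rconfig object:

```r
# 'trials_max' is an invented key used only for this illustration
cfg <- list(trials_max = 100L)

partial <- cfg$trials       # `$` partially matches 'trials_max'
exact <- cfg[["trials"]]    # `[[` matches exactly, so nothing is found

print(partial)
print(exact)
```

With $, a missing key can silently pick up the value of a longer key that starts with the same characters, so a default would never be applied.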
The default values are used to ensure type safety:
str(rconfig::value(CONFIG[["trials"]], 0L)) # integer
# int 30
str(rconfig::value(CONFIG[["trials"]], 0)) # numeric
# num 30
str(rconfig::value(CONFIG[["trials"]], "0")) # character
# chr "30"
str(rconfig::value(CONFIG[["trials"]], FALSE)) # logical
# logi TRUE
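One way to picture this behavior is a helper that returns the default when the value is missing and otherwise coerces the value to the default's type. The function value_or() below is a hypothetical sketch for atomic values, not the actual source of rconfig::value().

```r
# Hypothetical sketch: fall back to a default, or coerce to the default's type
value_or <- function(x, default = NULL) {
  if (is.null(x)) return(default)
  # Match the storage type of the default (integer, double, character, logical)
  if (!is.null(default)) storage.mode(x) <- storage.mode(default)
  x
}

str(value_or(30L, 0L))     # stays integer
str(value_or(30L, 0))      # becomes double
str(value_or(30L, "0"))    # becomes character
str(value_or(30L, FALSE))  # becomes logical
str(value_or(NULL, FALSE)) # unset, so the default is returned
```

The point of the design is that downstream code can rely on the type of the default regardless of how the value was supplied.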
Using rconfig alongside the config package:
conf <- config::get(
config = "production",
file = "config.yml",
use_parent = FALSE)
str(rconfig::rconfig(
file = "rconfig.yml",
list = conf))
# List of 5
# $ trials : int 30
# $ dataset : chr "data.csv"
# $ cores : int 1
# $ user :List of 1
# ..$ name: chr "demo"
# $ description: chr "This is a multi line\ndescription."
# - attr(*, "command")= chr(0)
# - attr(*, "class")= chr "rconfig"
Variable substitution
The rconfig package interprets 3 kinds of substitution patterns:
- environment variables (${VALUE}): these variables are already present when the configuration is read, either from the calling environment or from the .Renviron file in the project-specific or home folder; set variables can be null or not-null
- R global variables (@{VALUE}): the rconfig package looks for variables in the global environment at the time of configuration evaluation; however, expressions are not evaluated (unlike the !expr option for values)
- configuration values (#{VALUE}): the config
