future: Unified Parallel and Distributed Processing in R for Everyone <img border="0" src="man/figures/logo.png" alt="The 'future' hexlogo" align="right"/>
TL;DR
The Futureverse makes it easy to parallelize existing R code, often with only a minor code change. It lowers the barriers so that anyone can safely speed up their existing R code in a worry-free manner. It is a cross-platform solution that requires no additional setup or technical skills. Anyone can be up and running within a few minutes.
At the core of Futureverse is this package, the future package. Most users interact with the future ecosystem using higher-level packages such as [futurize] for its convenience of running map-reduce calls concurrently. Here are some examples of both:
library(future)
plan(multisession)
## Sequential evaluation of an R expression
y <- slow_fcn(X[1])
## Parallel evaluation of an R expression
f <- future(slow_fcn(X[1]))
y <- value(f)
library(futurize)
## Sequential and parallel base R apply
y <- lapply(X, slow_fcn)
y <- lapply(X, slow_fcn) |> futurize()
## Sequential and parallel purrr map
library(purrr)
y <- X |> map(slow_fcn)
y <- X |> map(slow_fcn) |> futurize()
## Sequential and parallel foreach calls
library(foreach)
y <- foreach(x = X) %do% slow_fcn(x)
y <- foreach(x = X) %do% slow_fcn(x) |> futurize()
Introduction
The purpose of the [future] package is to provide a very simple and uniform way of evaluating R expressions asynchronously using various resources available to the user.
In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available instantaneously. If the value is queried while the future is still unresolved, the current process is blocked until the future is resolved. It is possible to check whether a future is resolved or not without blocking. Exactly where and when futures are resolved depends on which future backend is set to evaluate them. For instance, a future can be resolved using the sequential backend, which means it is resolved in the current R session. Other backends may be used for resolving futures asynchronously, for instance, in parallel on the current machine or on a compute cluster.
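As a minimal sketch of the unresolved/resolved life cycle described above, the following uses future(), resolved(), and value() from the future package. The one-second sleep and the choice of two workers are arbitrary values picked for illustration only.

```r
library(future)

# Assumption for this sketch: two background R sessions are enough
plan(multisession, workers = 2)

# Create a future; the expression starts evaluating in the background
f <- future({
  Sys.sleep(1)  # stand-in for a slow computation
  42
})

# Check the state without blocking; this is most likely FALSE at this point
is_done <- resolved(f)

# Querying the value blocks until the future is resolved
v <- value(f)

# Once the value has been collected, the future is resolved
stopifnot(resolved(f))

# Return to the default backend, which also shuts down the workers
plan(sequential)
```

Whether resolved() reports TRUE or FALSE right after creation depends on timing, so code should not rely on a particular answer there.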
Here is an example illustrating how the basics of futures work. First, consider the following code snippet that uses plain R code:
> v <- {
+ cat("Hello world!\n")
+ 3.14
+ }
Hello world!
> v
[1] 3.14
It assigns the value of an expression to the variable v and then prints the value of v. In addition, a message is printed when the expression for v is evaluated.
Here is the same code snippet modified to use futures instead:
> library(future)
> v %<-% {
+ cat("Hello world!\n")
+ 3.14
+ }
> v
Hello world!
[1] 3.14
The difference is in how v is constructed; with plain R we use <- whereas with futures we use %<-%. The other difference is that output is relayed after the future is resolved (not during) and when the value is queried (see Vignette 'Outputting Text').
So why are futures useful? Because we can choose to evaluate the future expression in a separate R process asynchronously by simply switching settings as:
> library(future)
> plan(multisession)
> v %<-% {
+ cat("Hello world!\n")
+ 3.14
+ }
> v
Hello world!
[1] 3.14
With asynchronous futures, the current/main R process does not block, which means it is available for further processing while the futures are being resolved in separate processes that run in the background. In other words, futures provide a simple yet powerful construct for parallel and distributed processing in R.
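One way to see that the main R process stays available is to compare process IDs. The sketch below assumes the multisession backend with two workers (an arbitrary choice) and uses explicit futures so that the blocking point is visible.

```r
library(future)

# Assumption for this sketch: two background R sessions
plan(multisession, workers = 2)

# Launch a future that reports the process ID of the R session evaluating it
f <- future({
  Sys.sleep(1)  # stand-in for a slow computation
  Sys.getpid()
})

# The main process is not blocked here and can do other work meanwhile
main_pid <- Sys.getpid()
partial <- sum(1:10)

# Only this call blocks, and only until the future is resolved
worker_pid <- value(f)

# With multisession, the future was evaluated in a different R process
stopifnot(worker_pid != main_pid)

plan(sequential)
```

With plan(sequential) instead, both process IDs would be identical, because the future is then evaluated in the current R session.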
Now, if you cannot be bothered to read all the nitty-gritty details about futures, but just want to try them out, then skip to the end to play with the Mandelbrot demo using both parallel and non-parallel evaluation.
Implicit or Explicit Futures
Futures can be created either implicitly or explicitly. In the introductory example above we used implicit futures created via the v %<-% { expr } construct. An alternative is explicit futures using the f <- future({ expr }) and v <- value(f) constructs. With these, our example could alternatively be written as:
> library(future)
> f <- future({
+ cat("Hello world!\n")
+ 3.14
+ })
> v <- value(f)
Hello world!
> v
[1] 3.14
Either style of future construct works equally(*) well. The implicit style is most similar to how regular R code is written. In principle, all you have to do is to replace <- with %<-% to turn the assignment into a future assignment. On the other hand, this simplicity can also be deceiving, particularly when asynchronous futures are being used. In contrast, the explicit style makes it much clearer that futures are being used, which lowers the risk of mistakes and better communicates the design to others reading your code.
(*) There are cases where %<-% cannot be used without some (small) modifications. We will return to this in Section 'Constraints when using Implicit Futures' near the end of this document.
To summarize, for explicit futures, we use:
* f <- future({ expr }) - creates a future
* v <- value(f) - gets the value of the future (blocks if not yet resolved)
For implicit futures, we use:
* v %<-% { expr } - creates a future and a promise to its value
To keep it simple, we will use the implicit style in the rest of this document, but everything discussed will also apply to explicit futures.
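For comparison, here is a small sketch that runs the same expression in both styles and then stores explicit futures in a list. The expression 2 * 21 and the squaring function are made up for illustration; the sequential backend is used so the snippet runs anywhere.

```r
library(future)
plan(sequential)

# Explicit style: create the future, then collect its value
f <- future({ 2 * 21 })
v1 <- value(f)

# Implicit style: the future assignment creates a future and a promise to its value
v2 %<-% { 2 * 21 }

# Both styles give the same result
stopifnot(identical(v1, v2))

# Explicit futures are regular R objects, so they can be kept in a list
fs <- lapply(1:3, function(i) future(i^2))
vs <- lapply(fs, value)
squares <- unlist(vs)
```

Keeping futures in a list like this is one situation where the explicit style is convenient, since each future can be created first and collected later.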
Controlling How Futures are Resolved
The future package comes with built-in future backends that leverage the parallel package, which is part of R itself. In addition to these backends, others exist in package extensions, e.g. [future.callr], [future.mirai], and [future.batchtools]. Below is an overview of the most common backends that you as an end-user can choose from.
| Package / Backend | Features | How futures are evaluated
|:----------------|:------------|:-----------------------------------------------------
| future<br> sequential | 📶<br>♻️<br> | sequentially and in the current R process; default
| future<br> multisession | 📶<br>♻️<br> | parallelly via background R sessions on current machine
| future<br> cluster | 📶<br>♻️*<br> | parallelly in external R sessions on current, local, and/or remote machines
| future<br> multicore | 📶<br>♻️<br> | (not recommended) parallelly via forked R processes on current machine; not with GUIs like RStudio; not on Windows
| [future.callr]<br> callr | 📶<br>♻️<br> | parallelly via transient [callr] background R sessions on current machine; all memory is returned as each future is resolved
| [future.mirai]<br> mirai_multisession | 📶<br>♻️<br> | parallelly via [mirai] background R sessions on current machine; low latency
| [future.mirai]<br> mirai_cluster |♻️<br> | parallelly via [mirai] daemons running locally or remotely
| [future.batchtools]<br> batchtools_lsf<br>batchtools_openlava<br>batchtools_sge<br>batchtools_slurm<br>batchtools_torque | 📶(soon)<br> ♻️<br> | parallelly on HPC job schedulers (Load Sharing Facility [LSF], OpenLava, TORQUE/PBS, Son/Sun/Oracle/Univa Grid Engine [SGE], Slurm) via [batchtools]; for long-running tasks; high latency |
📶: futures relay progress updates in real-time, e.g. [progressr]<br> ♻️: futures are interruptible and restartable; * disabled by default<br> (next): next release; (soon): in a near-future release
By default, future expressions are evaluated synchronously in the current R session via the "sequential" backend. In this section, we will go through the other backends and discuss what they have in common and how they differ.
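As a rough sketch of what selecting a backend looks like, the following switches between the sequential and multisession backends with plan() and inspects the number of workers with nbrOfWorkers(). The value workers = 2 is an arbitrary choice for this illustration.

```r
library(future)

# The default backend: futures are resolved in the current R session
plan(sequential)
n_seq <- nbrOfWorkers()

# Switch to background R sessions on the current machine
# (workers = 2 is an assumption made for this sketch)
plan(multisession, workers = 2)
n_par <- nbrOfWorkers()

# The code that creates futures does not change when the backend changes
v %<-% { sqrt(16) }
result <- v

# Switch back to the default, which also shuts down the background sessions
plan(sequential)
```

Note that only the plan() calls differ between backends; the future assignment itself is written the same way in both cases.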
Consistent Behavior Across Futures
Before going through each of the different future backends, it is probably helpful to clarify the objectives of the Future API (as defined by the future package). When programming with futures, it should not matter which future backend is used for executing code. This is because we cannot really know what computational resources the user has access to, so the choice of parallel backend should be in the hands of the user and not the developer. In other words, the code should not make any assumptions about where and when futures are resolved.
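To illustrate this separation of roles, here is a sketch in which a developer-side function creates futures without ever calling plan(), while the user decides on the backend. The helper names slow_square() and compute_squares() are hypothetical and exist only for this example.

```r
library(future)

# Hypothetical stand-in for a slow function
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Developer code: creates futures, makes no assumption about the backend
compute_squares <- function(xs) {
  fs <- lapply(xs, function(x) future(slow_square(x)))
  unlist(lapply(fs, value))
}

# User code: the backend is chosen here, outside the developer's function
plan(sequential)
res_seq <- compute_squares(1:4)

plan(multisession, workers = 2)  # workers = 2 is an arbitrary choice
res_par <- compute_squares(1:4)

plan(sequential)

# The results do not depend on where the futures were resolved
stopifnot(identical(res_seq, res_par))
```

Because compute_squares() never sets a plan, the same function works unchanged on a laptop, a multi-core server, or any other backend the user selects.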
One of the design goals of the Future API is to encapsulate any differences such that all types of futures appear to work the same. This holds even though expressions may be evaluated locally in the current R session or across the world in remote R sessions. An
