> [!CAUTION]
> This package is no longer being actively maintained. We may attempt an update to current spatial R standards in the future, but there is no timeline or funding to plan for it.
# modleR: a workflow for ecological niche models
<!-- badges: start --> <!-- badges: end -->

<img src="./vignettes/modleR.png" align="right" alt="" width="120" />

**modleR** is a workflow based on package **dismo** (Hijmans et al. 2017), designed to automate some of the common steps when performing ecological niche models. Given the occurrence records and a set of environmental predictors, it prepares the data by removing duplicates, dropping occurrences with no environmental information, and applying some geographic and environmental filters. It sets up crossvalidation or bootstrap procedures and then fits ecological niche models using several algorithms, some of which are already implemented in the **dismo** package, while others come from elsewhere in the R environment, such as GLM, Support Vector Machines, and Random Forests.
## Citation
> Andrea Sánchez-Tapia, Sara Ribeiro Mortara, Diogo Souza Bezerra Rocha, Felipe Sodré Mendes Barros, Guilherme Gall, Marinez Ferreira de Siqueira. modleR: a modular workflow to perform ecological niche modeling in R. <https://www.biorxiv.org/content/10.1101/2020.04.01.021105v1>
## Installing
Currently, **modleR** can be installed from GitHub:

```r
# Without the vignette
remotes::install_github("Model-R/modleR", build = TRUE)

# With the vignette
remotes::install_github("Model-R/modleR",
                        build = TRUE,
                        dependencies = TRUE,
                        build_opts = c("--no-resave-data", "--no-manual"),
                        build_vignettes = TRUE)
```
**Note regarding vignette building:** the default parameters in `build_opts` include `--no-build-vignettes`. In theory, removing this option would include the vignette in the installation, but we have found that `build_vignettes = TRUE` is also necessary. During installation, R may ask to install or update some packages. If any of these returns an error, you can install it separately with `install.packages()` and retry. Building the vignette requires package `rJava` and a JDK. Also, make sure that the `maxent.jar` file is available in the `java` folder of package **dismo**; please download it here. Vignette building may take a while during installation.
Packages **kuenm** and **maxnet** should be installed from GitHub:

```r
remotes::install_github("marlonecobos/kuenm")
remotes::install_github("mrmaxent/maxnet")
```
## The workflow
The workflow consists mainly of four functions that should be used sequentially.
<img src="vignettes/fig01_workflow.jpg" width="645" />

- Setup: `setup_sdmdata()` prepares and cleans the data, samples the pseudoabsences, and organizes the experimental design (bootstrap, crossvalidation, or repeated crossvalidation). It creates a metadata file with details for the current round and an sdmdata file with the data used for modeling.
- Model fitting and projecting: `do_any()` fits the ENM for one algorithm and partition; optionally, `do_many()` calls `do_any()` to fit multiple algorithms.
- Partition joining: `final_model()` joins the partition models into one model per species per algorithm.
- Ensemble: `ensemble_model()` joins the different models per algorithm into an ensemble model (algorithmic consensus) using several methods.
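Using the example data shipped with the package (introduced below), the four steps can be sketched roughly as follows. Only the `setup_sdmdata()` signature is shown in this README, so the arguments passed to `do_many()`, `final_model()`, and `ensemble_model()` here are assumptions; check each function's help page for the actual parameters.

```r
library(modleR)

# A minimal sketch of the four sequential steps, using the bundled example
# data; arguments to the last three functions are illustrative assumptions.
sp1  <- names(example_occs)[1]
occs <- example_occs[[1]]

# 1. Setup: clean the data, sample pseudoabsences, define the partitions
setup_sdmdata(species_name = sp1,
              occurrences = occs,
              predictors = example_vars)

# 2. Model fitting: fit one or several algorithms per partition
do_many(species_name = sp1,
        predictors = example_vars,
        bioclim = TRUE)

# 3. Partition joining: one model per species per algorithm
final_model(species_name = sp1)

# 4. Ensemble: algorithmic consensus across the final models
ensemble_model(species_name = sp1,
               occurrences = occs)
```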
## Folder structure created by this package
**modleR** writes its outputs to disk, according to the following folder structure:
```text
models_dir
├── projection1
│   ├── data_setup
│   ├── partitions
│   ├── final_models
│   └── ensemble_models
└── projection2
    ├── data_setup
    ├── partitions
    ├── final_models
    └── ensemble_models
```
- We define a *partition* as an individual modeling round (one training and test data set and one algorithm).
- We define the *final models* as the result of joining the partitions together, obtaining one model per species per algorithm.
- *Ensemble models* join together the results obtained by different algorithms (Araújo and New 2007).
- When projecting models into the present, the projection folder is called `present`; other projections will be named after their environmental variables.
- You can set `models_dir` anywhere on the hard disk, but if you do not modify the default value (`./models`, where the period refers to the working directory), the output will be created under the working directory.
- The names of the final and ensemble folders can be modified, but the nested subfolder structure will remain the same. If you change the `final_models` default value (`"final_model"`), you will need to pass the new value when calling `ensemble_model()` (`final_dir = "[new name]"`) to tell the function where to look for the models. This partial flexibility allows experimenting with final model and ensemble construction (by running the final or ensemble step twice into different output folders, for example).
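After a run, the structure above can be inspected from R with base functions; this sketch assumes the default `models_dir` of `./models`:

```r
# List every file written under the default output directory
list.files("./models", recursive = TRUE)

# Or just the nested subfolders, mirroring the tree shown above
list.dirs("./models", recursive = TRUE)
```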
## The example dataset
**modleR** comes with example data: a list called `example_occs` with occurrence data for four species, and predictor variables called `example_vars`.
```r
library(modleR)
str(example_occs)
#> List of 4
#>  $ Abarema_langsdorffii:'data.frame': 104 obs. of 3 variables:
#>   ..$ sp : chr [1:104] "Abarema_langsdorffii" "Abarema_langsdorffii" "Abarema_langsdorffii" "Abarema_langsdorffii" ...
#>   ..$ lon: num [1:104] -40.6 -40.7 -41.2 -41.7 -42.5 ...
#>   ..$ lat: num [1:104] -19.9 -20 -20.3 -20.5 -20.7 ...
#>  $ Eugenia_florida    :'data.frame': 341 obs. of 3 variables:
#>   ..$ sp : chr [1:341] "Eugenia_florida" "Eugenia_florida" "Eugenia_florida" "Eugenia_florida" ...
#>   ..$ lon: num [1:341] -35 -34.9 -34.9 -36.4 -42.1 ...
#>   ..$ lat: num [1:341] -6.38 -7.78 -8.1 -10.42 -2.72 ...
#>  $ Leandra_carassana  :'data.frame': 82 obs. of 3 variables:
#>   ..$ sp : chr [1:82] "Leandra_carassana" "Leandra_carassana" "Leandra_carassana" "Leandra_carassana" ...
#>   ..$ lon: num [1:82] -39.3 -39.6 -40.7 -41.2 -41.5 ...
#>   ..$ lat: num [1:82] -15.2 -15.4 -20 -20.3 -20.4 ...
#>  $ Ouratea_semiserrata:'data.frame': 90 obs. of 3 variables:
#>   ..$ sp : chr [1:90] "Ouratea_semiserrata" "Ouratea_semiserrata" "Ouratea_semiserrata" "Ouratea_semiserrata" ...
#>   ..$ lon: num [1:90] -40 -42.5 -42.4 -42.9 -42.6 ...
#>   ..$ lat: num [1:90] -16.4 -20.7 -19.5 -19.6 -19.7 ...
species <- names(example_occs)
species
#> [1] "Abarema_langsdorffii" "Eugenia_florida"      "Leandra_carassana"
#> [4] "Ouratea_semiserrata"
```
```r
library(sp)
par(mfrow = c(2, 2), mar = c(2, 2, 3, 1))
for (i in seq_along(example_occs)) {
  # Plot the data mask of the first predictor layer as a background map
  plot(!is.na(example_vars[[1]]),
       legend = FALSE,
       main = species[i],
       col = c("white", "#00A08A"))
  # Overlay the occurrence records for species i
  points(lat ~ lon, data = example_occs[[i]], pch = 19)
}
par(mfrow = c(1, 1))
```
<figure>
<img src="man/figures/README-dataset-1.png"
alt="Figure 1. The example dataset: predictor variables and occurrence for four species." />
<figcaption aria-hidden="true">Figure 1. The example dataset: predictor
variables and occurrence for four species.</figcaption>
</figure>
We will subset the `example_occs` list to select only the data for the first species:

```r
occs <- example_occs[[1]]
```
## Cleaning and setting up the data: `setup_sdmdata()`
The first step of the workflow is to set up the data: to partition it according to each project's needs, to sample background pseudoabsences, and to apply some data-cleaning procedures as well as some filters. This is done by function `setup_sdmdata()`.

`setup_sdmdata()` has a large number of parameters:
```r
args(setup_sdmdata)
#> function (species_name, occurrences, predictors, lon = "lon",
#>     lat = "lat", models_dir = "./models", real_absences = NULL,
#>     buffer_type = NULL, dist_buf = NULL, env_filter = FALSE,
#>     env_distance = "centroid", buffer_shape = NULL, min_env_dist = NULL,
#>     min_geog_dist = NULL, write_buffer = FALSE, seed = NULL,
#>     clean_dupl = FALSE, clean_nas = FALSE, clean_uni = FALSE,
#>     geo_filt = FALSE, geo_filt_dist = NULL, select_variables = FALSE,
#>     cutoff = 0.8, sample_proportion = 0.8, png_sdmdata = TRUE,
#>     n_back = 1000, partition_type = c("bootstrap"), boot_n = 1,
#>     boot_proportion = 0.7, cv_n = NULL, cv_partitions = NULL)
#> NULL
```
- `species_name` is the name of the species to model.
- `occurrences` is the data frame with occurrences; `lat` and `lon` are the names of the columns for latitude and longitude, respectively. If they are already named `lat` and `lon`, they need not be specified.
- `predictors` is the rasterStack of the environmental variables.
There are a couple of options for data cleaning:
- `clean_dupl` will delete exact duplicates in the occurrence data.
- `clean_nas` will delete any occurrence with no environmental data in the predictor set.
- `clean_uni` will leave only one occurrence per pixel.
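For example, to apply all three cleaning steps in one call (a sketch using the `species` and `occs` objects defined above; all other arguments are left at their defaults):

```r
sdmdata <- setup_sdmdata(species_name = species[1],
                         occurrences = occs,
                         predictors = example_vars,
                         clean_dupl = TRUE,  # drop exact duplicate records
                         clean_nas = TRUE,   # drop records with no predictor data
                         clean_uni = TRUE)   # keep only one record per pixel
```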
The function also sets up different experimental designs:
- `partition_type` can be either bootstrap or k-fold crossvalidation.
- `boot_n` and `cv_n` perform repeated bootstraps and repeated k-fold crossvalidations, respectively.
- `boot_proportion` sets the proportion of data to be sampled as the training set (defaults to 0.7).
- `cv_partitions` sets the number of partitions in the k-fold crossvalidation (defaults to 3), but when there are fewer than 10 occurrence records the number of partitions is overwritten and set to the number of records (a jackknife partition).
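A repeated k-fold crossvalidation design could then be requested like this (a sketch; the `"crossvalidation"` value for `partition_type` is an assumption, so check `?setup_sdmdata` for the accepted values):

```r
sdmdata_cv <- setup_sdmdata(species_name = species[1],
                            occurrences = occs,
                            predictors = example_vars,
                            partition_type = "crossvalidation",
                            cv_partitions = 5,  # five folds per run
                            cv_n = 3)           # repeat the crossvalidation 3 times
```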
