kuenm: An R package for detailed development of Maxent Ecological Niche Models

Marlon E. Cobos, A. Townsend Peterson, Luis Osorio-Olvera, and Narayani Barve

Introduction
Getting started
Doing analyses for a single species project
Other functionalities of kuenm
References

Introduction

kuenm is an R package designed to make the process of model calibration and final model creation easier and more reproducible, and at the same time more robust. The aim of this package is to design suites of candidate models to create diverse calibrations of Maxent models and enable selection of optimal parameterizations for each study. Other objectives of this program are to make the task of creating final models and their transfers easier, as well to permit assessing extrapolation risks when model transfers are needed.

This document is a brief tutorial for using the functions of the kuenm R package. The example of a disease vector (a tick) is used in this tutorial to make it more clear and understandable. Functions help can be checked while performing the processes.

Getting started

Directory structure and necessary data

Since this package was designed to perform complex analyses while avoiding excessive demands on the computer (especially related to RAM memory used for R), it needs certain data that are organized carefully in the working directory. Following this structure (Figure 1) will allow working with one or more species in a project, and avoid potential problems during the analyses.

Before starting the analyses, the user must make sure that the working directory (the folder with information for an individual species) has the following components:

A folder containing the distinct sets of environmental variables (i.e., M_variables in Figure 1) to be used (more than one is highly recommended, but not mandatory). These variables must represent environmental variation across the area over which models are calibrated.
A csv file containing training and testing occurrence data together (preferably after cleaning and thinning original data to avoid problems like wrong records and spatial auto-correlation). This data set consists of three fields: species name, longitude, and latitude. See Sp_joint.csv, in Figure 1.
A csv file containing occurrence data for training models. This file and the next file generally represent exclusive subsets of the full set of records. Occurrences can be subsetted in multiple ways (Muscarella et al. 2014), but some degree of independence of training and testing data is desired. See Sp_train.csv in Figure 1.
A csv file containing species occurrence data for testing models as part of the calibration process (i.e., Sp_test.csv in Figure 1).
If available, a csv file containing a completely independent subset of occurrence data—external to training and testing data—for a final, formal model evaluation. This dataset (i.e., for final model evaluation) is given as Sp_ind.csv in Figure 1.

<img src="Structure.png" alt="Figure 1. Directory structure and data for starting processing, as well as directory structure when the processes finish using the kuenm R package. Background colors represent data necessary before starting the analyses (blue) and data generated after the following steps: running the start function (yellow), creating candidate models (lighter green), evaluating candidate models (purple), preparing projection layers (light orange), generating final models and its projections (light gray), evaluating final models with independent data (brown), and analyzing extrapolation risks in projection areas or scenarios (darker green)." width="1371" /> Figure 1. Directory structure and data for starting processing, as well as directory structure when the processes finish using the kuenm R package. Background colors represent data necessary before starting the analyses (blue) and data generated after the following steps: running the start function (yellow), creating candidate models (lighter green), evaluating candidate models (purple), preparing projection layers (light orange), generating final models and its projections (light gray), evaluating final models with independent data (brown), and analyzing extrapolation risks in projection areas or scenarios (darker green).

Installing the package

The kuenm R package is in a GitHub repository and can be installed and/or loaded using the following code (make sure to have Internet connection). To warranty the package functionality, a crucial requirement is to have the maxent.jar application in any user-defined directory (we encourage you to maintain it in a fixed directory). This software is available in the <a href="https://biodiversityinformatics.amnh.org/open_source/maxent/" target="_blank">Maxent repository</a>. Another important requirement for using Maxent and therefore the kuenm package is to have the Java Development Kit installed. The Java Development Kit is available in <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html" target="_blank">this repository</a>. Finally, for Windows users, Rtools needs to be installed in the computer; it is important that this software is added to the PATH. For instructions on how to download and install it see <a href="http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/#1" target="_blank">this guide</a>. Users of other operative systems may need to install other compiling tools.

# Installing and loading packages
if(!require(devtools)){
    install.packages("devtools")
}

if(!require(kuenm)){
    devtools::install_github("marlonecobos/kuenm")
}

library(kuenm)

Downloading the example data

Data used as an example for testing this package correspond to the turkey tick Amblyomma americanum, a vector of various diseases, including human monocytotropic ehrlichiosis, canine and human granulocytic ehrlichiosis, tularemia, and southern tick-associated rash illness. This species is distributed broadly in North America and a complete analysis of the risk of its invasion in other regions is being developed by Raghavan et al. (in review).

These data are already structured as needed for doing analysis with this package, and can be downloaded (from <a href="http://doi.org/10.17161/1808.26376" target="_blank">kuenm example data</a>) and extracted using the code below.

# Change "YOUR/DIRECTORY" by your actual directory.
download.file(url = "https://kuscholarworks.ku.edu/bitstream/handle/1808/26376/ku.enm_example_data.zip?sequence=3&isAllowed=y", 
              destfile = "YOUR/DIRECTORY/ku.enm_example_data.zip", mode = "wb",
              quiet = FALSE) # donwload the zipped example folder in documents

unzip(zipfile = "YOUR/DIRECTORY/ku.enm_example_data.zip",
      exdir = "YOUR/DIRECTORY") # unzip the example folder in documents

unlink("YOUR/DIRECTORY/ku.enm_example_data.zip") # erase zip file

setwd("YOUR/DIRECTORY/ku.enm_example_data/A_americanum") # set the working directory

dir() # check what is in your working directory

# If you have your own data and they are organized as in the first part of Figure 1, change 
# your directory and follow the instructions below.

Your working directory will be structured similarly to that presented for Species_1 in the left part of Figure 1.

Doing analyses for a single species project

Workflow recording

Once the working directory and data are ready, the function kuenm_start will allow generating an R Markdown (.Rmd) file as a guide to performing all the analyses that this package includes (Figure 1, yellow area). By recording all the code chunks used in the process, this file also helps to make analyses more reproducible. This file will be written in the working directory. The usage of this function is optional, but it is recommended if recording individual workflows per each species is desired.

help(kuenm_start)

# Preparing variables to be used in arguments
file_name <- "aame_enm_process"

kuenm_start(file.name = file_name)

Calibration of models

Note that, from this point, the following procedures will be performed in the R Markdown file previously created, but only if the kuenm_start function was used.

Creation of candidate models

The function kuenm_cal creates and executes a batch file for generating Maxent candidate models that will be written in subdirectories, named as the parameterizations selected, inside the output directory (Figure 1, light green area). Calibration models will be created with multiple combinations of regularization multipliers, feature classes, and sets of environmental predictors. For each combination, this function creates one Maxent model with the complete set of occurrences and another with training occurrences only. On some computers, the user will be asked if ruining the batch file is allowed before the modeling process starts in Maxent.

Maxent will run in command-line interface (do not close

Kuenm

Install / Use

README