<h5 align="right"> Latest version: Sept. 9, 2016 </h5>

A Brief Introduction to SpadeR (R package): Species-Richness Prediction and Diversity Estimation
=====

<h4>Anne Chao, K. H. Ma, T. C. Hsieh and Chun-Huo Chiu. Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan 30043</h4>

Overview

SpadeR (Species-Richness Prediction and Diversity Estimation with R) is an updated R version of the original program SPADE. SpadeR provides simple R functions to compute various biodiversity indices and related (dis)similarity measures based on individual-based (abundance) data or sampling-unit-based (incidence) data taken from one or multiple communities/assemblages. The SpadeR package is available on CRAN. We continue to update SpadeR, and you can download the latest version from GitHub (see below) or from Anne Chao's website.

<li> You need basic knowledge of R to use the functions supplied by SpadeR.
<li> For readers without an R background, please try SpadeR Online, an R-based online version available from [Anne Chao's website](http://chao.stat.nthu.edu.tw/wordpress/software_download/) or at https://chao.shinyapps.io/SpadeR/. Users do not need to learn or understand R to run SpadeR Online.

Both SpadeR (R package) and SpadeR Online include nearly all of the important features from the original program SPADE while also having the advantages of expanded output displays and simplified data input formats. See SpadeR Manual for all details of the functions supplied in the package. For numerical examples with proper interpretations, see the detailed Online SpadeR User's Guide.

This package contains six main functions:

ChaoSpecies (estimating species richness for one community).
Diversity (estimating a continuous diversity profile and various diversity indices in one community including species richness, Shannon diversity and Simpson diversity). This function also features plots of empirical and estimated continuous diversity profiles.
ChaoShared (estimating the number of shared species between two communities).
SimilarityPair (estimating various similarity indices between two assemblages). Both richness and abundance-based two-community similarity indices are included.
SimilarityMult (estimating various similarity indices among N communities). Both richness and abundance-based N-community similarity indices are included.
Genetics (estimating allelic dissimilarity/differentiation among sub-populations based on multiple subpopulation genetics data).

Except for the Genetics function, each function supports at least three types of data.

Data Types

It is very important to prepare your data in the correct format. Data are generally classified as abundance data or incidence data, and there are five data input formats (datatype="abundance", "abundance_freq_count", "incidence_freq", "incidence_freq_count", "incidence_raw").

<li> Individual-based abundance data, when a sample of individuals is taken from each community.

Type (1) abundance data (datatype = "abundance"): Input data consist of a species (in rows) by community (in columns) matrix. The entries of each row are the observed abundances of a species in N communities.

Type (1A) abundance-frequency counts data, only for a single community (datatype = "abundance_freq_count"): Input data are arranged as (1 f1 2 f2 ... r fr) (each number must be separated by at least one blank space or by rows), where r denotes the maximum frequency and fk denotes the number of species represented by exactly k individuals/times in the sample. Here the data (f1, f2, ..., fr) are referred to as "abundance-frequency counts".

<li> Sampling-unit-based incidence data, when a number of sampling units are randomly taken from each community. Only the incidence (detection/non-detection) of species is recorded in each sampling unit. There are three data format options.

Type (2) incidence-frequency data (datatype="incidence_freq"): The first row of the input data must be the number of sampling units in each community. Beginning with the second row, input data consist of a species (in rows) by community (in columns) matrix. The entries of each row are the observed incidence frequencies of a species in N communities (the number of detections, i.e. the number of sampling units in which the species is detected).

Type (2A) incidence-frequency counts data, only for a single community (datatype="incidence_freq_count"): Input data are arranged as (T 1 Q1 2 Q2 ... r Qr) (each number must be separated by at least one blank space or by rows), where Qk denotes the number of species that were detected in exactly k sampling units, while r denotes the number of sampling units in which the most frequent species were found. The first entry must be the total number of sampling units, T. The data (Q1, Q2, ..., Qr) are referred to as "incidence frequency counts".

Type (2B) incidence-raw data (datatype="incidence_raw"): Data consist of a species-by-sampling-unit incidence (detection/non-detection) matrix; typically "1" means a detection and "0" means a non-detection. Each row refers to the detection/non-detection record of a species in T sampling units. Users must specify the number of sampling units in the function argument "units". The first T1 columns of the input matrix denote species detection/non-detection data based on the T1 sampling units from Community 1, the next T2 columns denote the detection/non-detection data based on the T2 sampling units from Community 2, and so on; the last TN columns denote the detection/non-detection data based on the TN sampling units from Community N, with T1 + T2 + ... + TN = T.
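The relationships among these formats can be sketched in base R. The block below is only an illustration on made-up toy data: the helper names `to_freq_count` and `raw_to_incidence_freq` are hypothetical and not part of SpadeR. The first helper derives Type (1A) counts from a Type (1) abundance vector (frequencies with fk = 0 are simply omitted here). The second collapses a Type (2B) raw matrix into the Type (2) layout, using `units` to assign columns to communities.

```r
# Hypothetical toy data: abundances of 6 species in one community.
abundance <- c(5, 1, 1, 2, 0, 1)

# Type (1) -> Type (1A): arrange as (1 f1 2 f2 ... r fr), where f_k is the
# number of species observed exactly k times. Unobserved species are dropped.
to_freq_count <- function(x) {
  x <- x[x > 0]
  tab <- table(x)
  k <- as.integer(names(tab))
  # Interleave k and f_k column by column: k1 f_k1 k2 f_k2 ...
  as.vector(rbind(k, as.integer(tab)))
}
to_freq_count(abundance)
# 1 3 2 1 5 1

# Hypothetical raw incidence matrix: 4 species (rows); the first T1 = 3
# columns are sampling units from Community 1, the next T2 = 2 from Community 2.
raw <- matrix(c(1, 0, 1,  1, 1,
                0, 0, 1,  0, 0,
                1, 1, 1,  0, 1,
                0, 0, 0,  1, 0),
              nrow = 4, byrow = TRUE)
units <- c(3, 2)

# Type (2B) -> Type (2): first row holds the number of sampling units per
# community, the remaining rows hold per-species incidence frequencies.
raw_to_incidence_freq <- function(raw, units) {
  stopifnot(sum(units) == ncol(raw))      # T1 + T2 + ... + TN = T
  community <- rep(seq_along(units), times = units)
  freq <- t(rowsum(t(raw), group = community))
  rbind(units, freq)
}
inc_freq <- raw_to_incidence_freq(raw, units)
```

The `stopifnot` line mirrors the requirement stated above that the entries of `units` sum to the total number of columns T.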

Software needed

Required: R
Suggested: RStudio IDE

How to install

Start R(Studio) and copy-and-paste the following commands:

## install the latest version from github
install.packages('devtools')
library(devtools)
install_github('AnneChao/SpadeR')
library(SpadeR)

Note that to install the devtools package, you should update R to the latest version. Also, for install_github to work, the httr package must be installed.
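If you prefer not to reinstall packages that are already present, the steps above can be wrapped as follows. This is a minimal sketch: `missing_packages` and `install_spader_github` are hypothetical helper names, not functions from SpadeR or devtools. The install call is left commented out because it needs network access.

```r
# Return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install devtools and httr only when missing,
# then install the latest SpadeR from GitHub.
install_spader_github <- function() {
  to_install <- missing_packages(c("devtools", "httr"))
  if (length(to_install) > 0) install.packages(to_install)
  devtools::install_github("AnneChao/SpadeR")
}

# install_spader_github()   # uncomment to run
# library(SpadeR)
```

Since the package is also on CRAN, `install.packages("SpadeR")` is the simpler route when the latest GitHub version is not required.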

Run SpadeR by examples

The package includes many demo datasets for illustration. To gain familiarity with the program, we suggest that users first run the demo datasets included in the SpadeR package and check the output against that given in the SpadeR User's Guide. Part of the output for each example is also interpreted in the guide to help users understand the statistical results. The formulas for the estimators featured in SpadeR, with relevant references, are also provided in the SpadeR User's Guide.

Part I: ChaoSpecies (estimating species richness for one community).

# Data for Function ChaoSpecies(data, datatype, k = 10, conf = 0.95)

data(ChaoSpeciesData)

# Type (1) abundance data
ChaoSpecies(ChaoSpeciesData$Abu,"abundance",k=10,conf=0.95)

# Type (1A) abundance frequency counts data
ChaoSpecies(ChaoSpeciesData$Abu_count,"abundance_freq_count",k=10,conf=0.95)

# Type (2) incidence frequency data
ChaoSpecies(ChaoSpeciesData$Inci,"incidence_freq",k=10,conf=0.95)

# Type (2A) incidence frequency counts data
ChaoSpecies(ChaoSpeciesData$Inci_count,"incidence_freq_count",k=10,conf=0.95)

# Type (2B) incidence raw data
ChaoSpecies(ChaoSpeciesData$Inci_raw,"incidence_raw",k=10,conf=0.95)
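The five calls above differ only in the list element and the datatype string, so they can be driven from a small lookup table and the results kept for later comparison with the User's Guide. This is a sketch, assuming the element names shown above; the names `chao_species_inputs` and `results` are introduced here for illustration. The SpadeR calls run only if the package is installed.

```r
# Element of ChaoSpeciesData -> datatype string, as used in the calls above.
chao_species_inputs <- c(
  Abu        = "abundance",
  Abu_count  = "abundance_freq_count",
  Inci       = "incidence_freq",
  Inci_count = "incidence_freq_count",
  Inci_raw   = "incidence_raw"
)

if (requireNamespace("SpadeR", quietly = TRUE)) {
  library(SpadeR)
  data(ChaoSpeciesData)
  # Store each result instead of only printing it.
  results <- lapply(names(chao_species_inputs), function(el) {
    ChaoSpecies(ChaoSpeciesData[[el]], chao_species_inputs[[el]],
                k = 10, conf = 0.95)
  })
  names(results) <- names(chao_species_inputs)
  # print(results$Abu)   # inspect one result at a time
}
```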

Part II: Diversity (estimating a continuous diversity profile and various diversity indices in one community including species richness, Shannon diversity and Simpson diversity). This function also features plots of empirical and estimated continuous diversity profiles.

# Data for Function Diversity(data, datatype, q = NULL)

data(DiversityData)

# Type (1) abundance data
Diversity(DiversityData$Abu,"abundance",q=c(0,0.5,1,1.5,2))

# Type (1A) abundance frequency counts data
Diversity(DiversityData$Abu_count,"abundance_freq_count",q=seq(0,3,by=0.5))

# Type (2) incidence frequency data
Diversity(DiversityData$Inci,"incidence_freq",q=NULL)

# Type (2A) incidence frequency counts data
Diversity(DiversityData$Inci_freq_count,"incidence_freq_count",q=NULL)

# Type (2B) incidence raw data
Diversity(DiversityData$Inci_raw,"incidence_raw",q=NULL)
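To make the role of `q` concrete, the sketch below computes an empirical diversity profile in base R. It assumes the profile is expressed as Hill numbers of order q, the standard formulation in which q = 0, 1 and 2 give species richness, Shannon diversity and Simpson diversity. It uses only the observed relative abundances and does not reproduce the estimators in the package. The function name `hill_number` and the toy data are hypothetical.

```r
# Empirical Hill number of order q from a vector of abundances:
#   (sum_i p_i^q)^(1 / (1 - q)),  with the q -> 1 limit exp(-sum_i p_i log p_i).
hill_number <- function(x, q) {
  p <- x[x > 0] / sum(x)
  if (isTRUE(all.equal(q, 1))) {
    exp(-sum(p * log(p)))          # exponential of Shannon entropy
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

# Hypothetical community with 4 equally common species.
even <- c(10, 10, 10, 10)
q_values <- c(0, 0.5, 1, 1.5, 2)
profile_even <- vapply(q_values, function(q) hill_number(even, q), numeric(1))

# Hypothetical uneven community: the profile decreases as q grows.
uneven <- c(8, 1, 1)
profile_uneven <- vapply(q_values, function(q) hill_number(uneven, q), numeric(1))
```

For the even community the profile is flat at 4 for every q. For the uneven one it starts at the observed richness (3) and declines, because higher orders weight common species more heavily.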

Part III: ChaoShared (estimating the number of shared species between two communities).

# Data for Function ChaoShared(data, datatype, units, se = TRUE, nboot = 200, conf = 0.95)

data(ChaoSharedData)

# Type (1) abundance data
ChaoShared(ChaoSharedData$Abu,"abundance",se=TRUE,nboot=200,conf=0.95)

# Type (2) incidence frequency data
ChaoShared(ChaoSharedData$Inci,"incidence_freq",se=TRUE,nboot=200,conf=0.95)

# Type (2B) incidence raw data
ChaoShared(ChaoSharedData$Inci_raw,"incidence_raw",units=c(16,17),se=TRUE,nboot=200,conf=0.95)
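As a point of reference for the estimates, the observed number of shared species can be counted directly from the input. The sketch below uses made-up toy data and hypothetical helper names (`observed_shared`, `split_raw`). It also shows how a `units` argument such as `c(16, 17)` partitions the columns of an incidence-raw matrix between the two communities.

```r
# Observed shared species: rows with a positive count in both columns of a
# two-community species-by-community matrix.
observed_shared <- function(x) sum(x[, 1] > 0 & x[, 2] > 0)

# Hypothetical abundance data for 5 species in 2 communities.
abu <- cbind(community1 = c(3, 0, 1, 5, 0),
             community2 = c(0, 2, 4, 1, 0))
shared_abu <- observed_shared(abu)

# For incidence-raw data, the first units[1] columns belong to Community 1
# and the remaining units[2] columns to Community 2.
split_raw <- function(raw, units) {
  stopifnot(length(units) == 2, all(units > 0), sum(units) == ncol(raw))
  first <- seq_len(units[1])
  cbind(rowSums(raw[, first, drop = FALSE]),
        rowSums(raw[, -first, drop = FALSE]))
}

# Hypothetical raw matrix: 3 species, 2 + 3 sampling units.
raw <- matrix(c(1, 0,  0, 0, 1,
                0, 0,  1, 1, 0,
                1, 1,  0, 0, 0),
              nrow = 3, byrow = TRUE)
shared_raw <- observed_shared(split_raw(raw, units = c(2, 3)))
```

The observed count ignores shared species that were missed in one or both samples.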

Part IV: SimilarityPair (estimating various similarity indices between two assemblages). Both richness and abundance-based two-community similarity indices are included.

# Data for Function SimilarityPair(data, datatype, units, nboot = 200)

data(SimilarityPairData)

# Type (1) abundance data
SimilarityPair(SimilarityPairData$Abu,"abundance",nboot=200)

# Type (
