---
title: "Introduction to funrar through an example"
author: "Denelle Pierre & Greni\u00e9 Matthias"
date: "`r Sys.Date()`"
header-includes:
  - \usepackage{amsmath}
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to funrar through an example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

The idea of Functional Rarity is to measure two components of the rarity of a species (or an individual). In community ecology, researchers are generally interested in the rarity of a species as measured in terms of abundances. A species with a low abundance will be considered 'rare'. From a functional ecology perspective, this omits important information about the "role" that a given species plays in the ecosystem, its particular niche. A framework is being developed of various indices to characterize how functioning can be rare locally or regionally

In their article, Violle et al. (2017) introduced 4 indices to compute functional rarity indices. `funrar` package enables ecologists to compute these on any dataset.


# Framework

The following figure show the 4 aspects covered by the metrics. Rarity is thus considered at local and regional scales and also from abundance and functional point of views. The name of the package functions are written underneath metrics' names.

![Rarity indices and corresponding functions.](figures/funrar_scheme.png)


# Dataset example

To illustrate the package, we are using a dataset describing the distribution of 82 species of Alpine plants in 75 sites. This dataset was collected by Choler et al.(2005).

Species traits and environmental variables are also available.

```{r}
data("aravo", package = "ade4")

str(aravo)
```


# Functional rarity indices 

## Inputs of functions

For each metric, the package includes two functions, depending on the input type. 
Ecologist usually have to deal with site-species matrix, containing sites names as rows, species in columns and either abundances or presence/absence data (see example datasets in `vegan` or `ade4` packages for example). Thus, this matrix is needed as an input for each simple function.


```{r}
# Site-species matrix stored in aravo$spe
mat = as.matrix(aravo$spe)

mat[1:5, 1:5]
```

`metricname()` functions can be used to compute indices directly on site-species matrices.

Users can also use a stacked (or tidy) data.frame, containing columns for sites, species and abundances. The package contains for each metric another function called `metricname_stack()` function that can take these stacked data.frames as inputs.

`funrar` package includes two functions to go from one format to the other: `matrix_to_stack()` and `stack_to_matrix()`

```{r}
library(funrar)

dat = matrix_to_stack(mat, "value", "site", "species")
head(dat)

# The reverse function is provided and gives the same original matrix
identical(
  as.numeric(stack_to_matrix(dat, "site", "species", "value")), as.numeric(mat)
)

# Removal of empty rows
dat = dat[which(dat$value > 0), ]
```

The package is also able to deal with large matrices with many zeroes by using sparse matrices. For more details, see `sparse_matrices` vignette.


## Indices

### Distance matrix computation

The indices rely on the computation of a distance matrix. It represents the functional distance between two species given a particular set of traits. From a species traits matrix you can easily compute a distance matrix using function `compute_dist_matrix()`.

In order to compute distance matrix, species names must be specified as **rownames** in the traits table. Further, species names must match between the traits table and distance matrix. Species names must be characters and not factors.

The function relies on the `daisy()` function from package `cluster` and the default metric is Gower's distance to take into account ordinal, categorical and binary traits:

```{r}
tra = aravo$traits[, c("Height", "SLA", "N_mass")]
head(tra)

dist_mat = compute_dist_matrix(tra)
dist_mat[1:5, 1:5]
```

Using the `metric` argument in `compute_dist_matrix()` you can specify other distance metrics (`euclidean` for Euclidean distances and `manhattan` for Manhattan distances) for functional distances. You can also compute your own distance matrix with other methods as long as it is a matrix and both row names and column names are species names.


### Functional Distinctiveness computation

#### Formula

The distinctiveness measures how a species is locally functionally rare, in comparison with the other species of community. Each species (or individual) in each community will get a distinctiveness value. Distinctiveness values range from 0 to 1. A value of 0 indicates that the focal species is not locally functionally distant from the other and conversely.

From given functional distances, it is computed as such:
$$
D_i = \frac{\sum_{\substack{j = 1, \\
                            i \neq j
                            }}^{N} d_{ij}}{N-1},
$$
with $D_i$ distinctiveness of species $i$, $d_{ij}$ the functional distance between species $i$ and species $j$, $N$ the number of species present in considered community.

When weighted by abundances the formula becomes:
$$
D_i = \frac{\sum_{\substack{j = 1, \\
                            i \neq j
                            }}^{N} d_{ij}\times A_j}{\sum_{\substack{j = 1, \\
                            i \neq j
                            }}^{N} A_j},
$$
with the same terms as above, and $A_j$ the relative abundance of species $j$ in percent, in the community.


#### Computation with site-species matrix

Using `funrar` functional distinctiveness can be computed using function `distinctiveness()`:

```{r distinctivenessComputation}
# Compute distinctiveness for each species on each site
di = distinctiveness(pres_matrix = mat,  # The site x species matrix
                     dist_matrix = dist_mat)  # Functional Distance matrix

di[1:5, 1:5]
```

The syntax for `distinctiveness()` is first the matrix and then the functional distance matrix. The function returns a site-species matrix filled with `Di` values, i.e. the species functional distinctiveness in each species per site combination.

#### Computation with stacked data.frame

Using `funrar` distinctiveness values can be computed using function `distinctiveness_stack()`:

```{r computeDi}
# Species should be character
dat$species = as.character(dat$species)

# Compute distinctiveness for each species on each site
di_df = distinctiveness_stack(
  com_df = dat,  # The site x species table
  sp_col = "species",  # Name of the species column
  com = "site",  # Name of the community column
  abund = NULL,  # Relative abundances column (facultative)
  dist_matrix = dist_mat  # Functional Distance matrix
)

head(di_df)
```

The syntax for `distinctiveness_stack()` is first the table, then the name of the species column, then the name of the community column, then the name of the relative abundance column and finally the distance matrix. The function add a `Di` column giving the species distinctiveness in each species per site combination.

For abundance weighted distinctiveness you need to have a column containing relative abundances in your data frame as such:

```{r abundDi}
di_df = distinctiveness_stack(dat, "species", "site", "value", dist_mat)

head(di_df)
```


### Scarcity

#### Formula

The scarcity measures how locally abundant a species is, in comparison to the other species of community. Each species (or individual) in each community will get a scarcity value. Scarcity values range from 0 to 1. A 0 value indicates that the focal species is locally abundant and conversely.

$$
S_i = \exp(-NA_i\log2),
$$

with $S_i$ the scarcity of species $i$, $N$ the number of species in the local community, $A_i$ the relative abundance (in percent) of species $i$ in the community.


#### Computation

With `funrar` it can be computed with `scarcity()` function with a syntax analog to `distinctiveness()`

```{r scarcityComp}
si = scarcity(pres_matrix = mat)

si[1:5, 1:5]
```

Scarcity requires relative abundances to be computed. We thus have to convert
our dataset with abundances to relative abundances. To do that we first compute
total abundances per site.

```{r rel-abund-comp}
# Compute relative abundances
site_abundances = aggregate(value ~ site, data = dat, FUN = sum)
colnames(site_abundances)[2] = "total_abundance"

dat = merge(dat, site_abundances, by = "site")

dat$rel_abund = dat[["value"]] / dat[["total_abundance"]]

head(dat)
```

Then we can compute Scarcity using relative abundances:

```{r siComp}
si_df = scarcity_stack(dat, "species", "site", "rel_abund")

head(si_df)
```

### Uniqueness

#### Formula

 Uniqueness measures how functionally rare a species is at the regional scale. Each species (or individual) in the species pool will get a uniqueness value. Uniqueness values range from 0 to 1. A 0 value indicates that the focal species shares the exact same traits as other species in the pool and conversely.

$$
U_i = \text{min}(d_{ij}), \forall j \in [1, N_R], j \neq i,
$$

with $U_i$ the uniqueness of species $i$, $d_{ij}$ the functional distance between species $i$ and species $j$, $N_R$ the number of species in the regional pool.

#### Computation

`uniqueness()` follows a similar syntax, but beware that given community table and distance matrix should represent the **regional species pool**. And the syntax is also similar:

```{r uniqComp}
ui = uniqueness(mat, dist_mat)

head(ui)
```

```{r uiComp}
ui_df = uniqueness_stack(dat, "species", dist_mat)

head(ui_df)
dim(ui_df[!duplicated(ui_df$species), ])
```


### Geographical Restrictedness

#### Formula

Restrictedness measures how regionally rare a species is. The following function represents a species' occupancy for a set of communities. Each species (or individual) in the species pool will have a restrictedness value. Restrictedness values range from 0 to 1. A 0 value indicates that the focal species is present in all the sites.

$$
R_i = 1 - N_i/N_tot
$$

with $R_i$ the restrictedness of species $i$, $N_i$ the number of sites where species $i$ is found and $N_tot$ the total number of sites.

#### Computation

```{r}
ri = restrictedness(mat)

head(ri)
```

```{r}
# Species should be character
dat$site = as.character(dat$site)

ri_df = restrictedness_stack(dat, "species", "site")

head(ri_df)
```

### funrar

This function computes all of the four metrics introduced above.

```{r}
all_ind = funrar(mat, dist_mat, rel_abund = TRUE)

str(all_ind)

identical(all_ind$Ui, ui)
identical(all_ind$Di, di)
identical(all_ind$Ri, ri)
identical(all_ind$Si, si)
```

# Plots

In this section, some ideas to plot your results are presented. For example the density of the various indices or a matrix visualization of the values of distinctiveness per species.

```{r}
library(ggplot2)

# Heatmap with distinctiveness values
ggplot(di_df, aes(species, site)) +
  geom_tile(aes(fill = Di), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")

# Density of distinctiveness values

quant = quantile(di_df$Di, probs = seq(0, 1, 0.10))
labels_quant = paste(names(quant)[-length(quant)], names(quant)[-1], sep = "-")
 
di_density = data.frame(density(di_df$Di)[c("x", "y")])
di_density = subset(di_density, x >= quant[[1]] & x <= quant[[length(quant)]])
di_density$quant = cut(di_density$x, breaks = quant)   
quant = quantile(di_df$Di, probs = seq(0, 1, 0.10))
labels_quant = paste(names(quant)[-length(quant)], names(quant)[-1], sep = "-")

di_dens = ggplot(data = di_density, aes(x = x, y = y)) +
  geom_area(aes(fill = quant)) +
  scale_fill_brewer(palette = "RdYlBu", labels = labels_quant,
                    name = "Quantile") +
  geom_line(size = 1) +
  xlab("Distinctiveness values") +
  ylab("Frequency") +
  ggtitle("Density of distinctiveness values") +
  theme_classic()

di_dens

# Density of scarcity values

quant = quantile(si_df$Si, probs = seq(0, 1, 0.10))
labels_quant = paste(names(quant)[-length(quant)], names(quant)[-1], sep = "-")
 
si_density = data.frame(density(si_df$Si)[c("x", "y")])
si_density = subset(si_density, x >= quant[[1]] & x <= quant[[length(quant)]])
si_density$quant = cut(si_density$x, breaks = quant)   
quant = quantile(si_df$Si, probs = seq(0, 1, 0.10))
labels_quant = paste(names(quant)[-length(quant)], names(quant)[-1], sep = "-")

si_dens = ggplot(data = si_density, aes(x = x, y = y)) +
  geom_area(aes(fill = quant)) +
  scale_fill_brewer(palette = "RdYlBu", labels = labels_quant,
                    name = "Quantile") +
  geom_line(size = 1) +
  xlab("Scarcity values") +
  ylab("Frequency") +
  ggtitle("Density of scarcity values") +
  theme_classic()

si_dens

# Regional rarity: restrictedness versus uniqueness
colnames(ri_df)[colnames(ri_df) == "sp"] = "species"
ri_ui <- merge(ri_df, ui_df, by = "species")

ggplot(ri_ui, aes(Ri, Ui)) +
  geom_point() + 
  theme_classic() 
```


# References

Choler, P. (2005) Consistent shifts in Alpine plant traits along a mesotopographical gradient. Arctic, Antarctic, and Alpine Research, 37,444-453.

Violle C., Thuiller W., Mouquet N., Munoz F., Kraft NJB, Cadotte MW, Livingstone SW, Mouillot D., Functional Rarity: The Ecology of Outliers, *Trends in Ecology & Evolution*, Volume 32, Issue 5, May 2017, Pages 356-367, ISSN 0169-5347, https://doi.org/10.1016/j.tree.2017.02.002.