The aim of the
funbiogeo
package is to help users streamline the workflows
in functional biogeography (Violle et al. 2014). It helps filter sites,
species, and traits based on their trait coverages. It also provide
default diagnostic plots and standard tables summarizing input data.
This vignette aims to be an introduction to the most commonly used
functions.
The vignette is a worked through real world example of a functional
biogeography workflow using the internal dataset provided in
funbiogeo
and derived from the WOODIV database (Monnet et al. 2021).
We are interested in mapping the functional traits of 24 Conifer tree species of Western European Mediterranean regions.
The functions in funbiogeo
mostly leverage three
different objects:
data.frame
, which
describes the traits (in columns) of the species (in rows)
(species_traits
dataset in funbiogeo
);data.frame
, which
describes the presence/absence, the abundance, or the cover of species
(in columns) across sites (in rows) (site_species
dataset
in funbiogeo
);sf
object
(site_locations
dataset in funbiogeo
).You can load the example data using the
data(..., package = "funbiogeo")
call:
data("woodiv_site_species", package = "funbiogeo")
data("woodiv_locations", package = "funbiogeo")
data("woodiv_traits", package = "funbiogeo")
In the following sections, we’ll describe in detail these three provided objects in the package.
This object contains traits values for multiple traits (as columns)
for studied species (as rows). It should be a data.frame
.
The first column should contain species names and the other columns
contain trait values.
Note that we’ll be talking about species throughout
this vignette and in the arguments of funbiogeo
, but the
package doesn’t make any assumption on the biological level. It can be
individuals, populations, strains, species, genera, families, etc. The
important fact is that you should have trait data for the level at which
you want to work.
Let’s examine the species_traits
data included in the
package:
species | plant_height | seed_mass | sla | wood_density | |
---|---|---|---|---|---|
1 | AALB | 49.64162 | 67.86692 | 7.483978 | 0.4490821 |
3 | ACEP | 25.87500 | 64.70375 | NA | NA |
4 | ANEB | 15.00000 | NA | 3.420603 | NA |
5 | APIN | 27.33333 | 55.52000 | 3.420603 | 0.4586508 |
The first column "species"
contains
species names, while the next four columns contain different traits for
all species.
Let’s look at a summary of the trait dataset:
summary(woodiv_traits)
#> species plant_height seed_mass sla
#> Length:24 Min. : 1.368 Min. : 7.608 Min. : 0.508
#> Class :character 1st Qu.: 8.776 1st Qu.: 14.926 1st Qu.: 3.421
#> Mode :character Median :16.908 Median : 30.188 Median : 4.365
#> Mean :18.779 Mean : 62.245 Mean : 4.773
#> 3rd Qu.:25.531 3rd Qu.: 57.558 3rd Qu.: 5.824
#> Max. :49.642 Max. :626.189 Max. :10.600
#> NA's :1 NA's :7
#> wood_density
#> Min. :0.2796
#> 1st Qu.:0.4500
#> Median :0.4642
#> Mean :0.4948
#> 3rd Qu.:0.5300
#> Max. :0.6488
#> NA's :7
Note that to use your own species by traits data.frame
,
it should follow a similar structure with the first column being named
"species"
and the other ones containing
traits.
This object contains species occurrences/abundance/coverage at
sites
of the study area. It is a data.frame
. The first column,
"site"
, contains site names, while the
other columns contain the abundance of each species across sites.
Note that here we are talking about sites in an abstract way. These can be plots, assemblages, of whatever collections of species you’re interested in.
The package funbiogeo
comes with the example dataset on
Conifer tree species of Western European Mediterranean regions
site_species
. Let’s look at it:
site | AALB | ACEP | APIN | CLIB | CSEM |
---|---|---|---|---|---|
41152325 | 1 | 0 | 0 | 0 | 0 |
40852325 | 1 | 0 | 0 | 0 | 0 |
40852345 | 1 | 0 | 0 | 0 | 0 |
42552105 | 1 | 0 | 0 | 0 | 0 |
41152315 | 1 | 0 | 0 | 0 | 0 |
37452365 | 1 | 0 | 0 | 0 | 0 |
36952225 | 1 | 0 | 0 | 0 | 0 |
37152185 | 1 | 0 | 0 | 0 | 0 |
39852335 | 1 | 0 | 0 | 0 | 0 |
39652375 | 1 | 0 | 0 | 0 | 0 |
The example dataset contains the occurrence of the 24 Conifer tree species across 5,366 sites (grid cells of 10 km x 10 km resolution).
Note that to use your own site by species data.frame
, it
should follow a similar structure with the first column being named
"sites"
and the other ones containing
presence information of species across sites.
This object contains the geographical location of the sites. It
should be an sf
object from the sf
package. These are spatial R objects that describe geographical
locations. The sites can have arbitrary shapes: points, regular
polygons, irregular polygons, or even line transects! To make sure that
your data is well plotted you should specify the Coordinate Reference
System (CRS) of this object.
The package funbiogeo
comes with the example dataset
site_locations
defining the location of the 5,366 sites
(grid cells of 10 km x 10 km resolution) as polygons. It contains the
names of the site in its first column
"site"
:
head(woodiv_locations)
#> Simple feature collection with 6 features and 2 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 2630000 ymin: 1750000 xmax: 2650000 ymax: 1970000
#> Projected CRS: ETRS89-extended / LAEA Europe
#> site country geometry
#> 1 26351755 Portugal POLYGON ((2630000 1750000, ...
#> 2 26351765 Portugal POLYGON ((2630000 1760000, ...
#> 4 26351955 Portugal POLYGON ((2630000 1950000, ...
#> 5 26351965 Portugal POLYGON ((2630000 1960000, ...
#> 6 26451755 Portugal POLYGON ((2640000 1750000, ...
#> 7 26451765 Portugal POLYGON ((2640000 1760000, ...
Note that to use your own site locations object, it should follow a
similar structure, being an sf
object with the first column
being named "sites"
.
funbiogeo
provides many functions to display the data to
help the user select specific traits, species, and/or sites. We are
going to detail some of them in this section (see the full list and how
to interpret them in the diagnostic plot
vignette. We call them diagnostic plots because they help
us to have an overview of our dataset prior to the
analyses.
A first way to visualize our data.frame
is to look at
the proportion of species with non-missing traits using the
fb_plot_number_species_by_trait()
function. It takes the
species by trait data.frame
as input.
This plot shows us the number of species (along the x-axis) in function of the trait name (along the y-axis). The number of concerned species is shown at the bottom of the plot while the corresponding proportion of species (compared to all the species included in the trait dataset) is indicated as a secondary x-axis at the top. The proportion of species concerned is shown at the right of each point. For example, in our example dataset, 70.8% species have a non-NA SLA.
The function also includes a way to provide a target proportion of species as the second argument. It will display the proportion as a dark red dashed line.
For example, if we want to visualize which traits cover more than 75% of the species:
The top number shows the corresponding number of species.
Another way to filter the data would be to select certain species
that have at least a certain number of traits. This can be visualized
using the fb_plot_number_traits_by_species()
function.
Similarly to the above-mentioned function, it takes the species x traits
data.frame
as the first argument:
The plot shows the number (bottom x-axis) and the proportion (top x-axis) of species covered by a specific number of traits (0 to 4 in our example).
Now that we displayed the diagnostic plots, we can decide thresholds and filter our data for our following analyses.
We want to select the traits that are available for at least 75% of
the species. To do so we can use the
fb_filter_traits_by_species_coverage()
function. The
function takes the species by traits data.frame
and outputs
the same dataset but with the traits filtered (so with fewer columns).
The second argument threshold_species_proportion
is the
threshold proportion of species covered:
# Initial dimension of the input data
dim(woodiv_traits)
#> [1] 24 5
# Filter traits
red_sp_traits <- fb_filter_traits_by_species_coverage(
woodiv_traits, threshold_species_proportion = 0.75
)
dim(red_sp_traits)
#> [1] 24 3
# The reduced data set now has fewer trait columns
head(red_sp_traits)
#> species plant_height seed_mass
#> 1 AALB 49.64162 67.866923
#> 3 ACEP 25.87500 64.703750
#> 4 ANEB 15.00000 NA
#> 5 APIN 27.33333 55.520000
#> 6 CLIB 35.63636 86.872600
#> 7 CSEM 24.69231 7.608125
The function outputs a filtered species-traits dataset retaining only traits covering at least 75% of the species. In the end this keep two traits: plant height and seed mass.
Similarly you could filter species by their trait coverage. For
example, we would like to make sure that the species we filtered
previously so that they show at least one of the two traits selected
above and thus exclude species for which neither of the traits are
available. We can use the function
fb_filter_species_by_trait_coverage()
with the species x
traits data.frame
as the first argument and the second
argument the proportion of traits covered by species.
# Filter species which have at least 80% of known traits
red_sp_traits_2 <- fb_filter_species_by_trait_coverage(
red_sp_traits, threshold_traits_proportion = 0.8
)
head(red_sp_traits_2)
#> species plant_height seed_mass
#> 1 AALB 49.641622 67.866923
#> 3 ACEP 25.875000 64.703750
#> 5 APIN 27.333333 55.520000
#> 6 CLIB 35.636364 86.872600
#> 7 CSEM 24.692308 7.608125
#> 8 JCOM 6.894711 14.556875
dim(red_sp_traits_2)
#> [1] 23 3
We thus have selected 2 traits that cover at least 80% of the initial species list and 23 species which have known values for these two traits.
Now that we have filtered our traits and species of interest we need
to filter the sites that contain enough species for which the traits are
available. Similarly to above the function is
fb_filter_sites_by_trait_coverage()
it takes as first two
arguments the site x species data.frame
and the species x
traits data.frame
. The third argument is
threshold_traits_proportion
that indicates the percent
coverage of traits to filter each site. Note that this coverage is
weighted by the occurrence, abundance, or cover depending on the content
of the site x species data.frame
.
Let’s say here we’re interested in sites for which our species with available traits represent at least 90% of the species present:
# Initial site x species data
dim(woodiv_site_species)
#> [1] 5366 25
# Filter sites with at least 90% species covered
filt_sites <- fb_filter_sites_by_trait_coverage(
woodiv_site_species, red_sp_traits_2, threshold_traits_proportion = 0.9
)
# Filtered sites
dim(filt_sites)
#> [1] 5364 25
filt_sites[1:4, 1:4]
#> site AALB ACEP APIN
#> 1 41152325 1 0 0
#> 2 40852325 1 0 0
#> 3 40852345 1 0 0
#> 4 42552105 1 0 0
The output of the function is a site x species
data.frame
with selected sites and species. Now we selected
5,364 sites out of 5,366, for our 2 traits and 24 species.
The funbiogeo
functions helped us filter our data
appropriately with enough available trait information for species and
sites. The goal of funbiogeo
is to help you analyzing
functional trait data and computing functional diversity indices. These
indices capture the diversity of trait values in a set of species, if
you’re interested in an introduction to functional diversity indices and
how to analyze them, it’s out of scope of this vignette, but you can
refer to Mammola et al. (2021). The paper
provides a general workflow to work with trait data.
funbiogeo
doesn’t aim to substitute to all these amazing
tools that compute a diversity of indices with different properties and
formulas., however, we can use the filtered datasets to proceed with our
analyses This is where you should use your preferred packages to compute
functional diversity indices like mFD
or funrar
.
If you’re new to the world of functional diversity analyses, we suggest
reading mFD
introductory vignette. It underlines the different steps to compute
functional diversity metrics. If you’re interest in computing functional
rarity indices with funrar
you can also refer to its
tutorial.
For the sake of the example, we included a function in
funbiogeo
to compute Community-Weighted Mean (CWM, Garnier et al. 2004) named
fb_cwm()
. The CWM is the abundance-weighted average trait
per site. We’ll be using it in the following section we’ll then show
another example computing functional diversity indices using the
mFD
package.
We’re interested to look at the spatial distribution of the average
plant height and seed mass of Conifer tree species. To do so, we can
compute the community-weighted mean of both traits. We’ll use the
fb_cwm()
function to do so, it takes the site x species
data.frame
and species x traits data.frame
as
arguments.
# Note that we're reusing our filtered data to compute CWM
cwm <- fb_cwm(filt_sites, red_sp_traits_2)
head(cwm)
#> site trait cwm
#> 1 26351755 plant_height 12.31767
#> 2 26351765 plant_height 4.88150
#> 3 26351955 plant_height 12.31767
#> 4 26351965 plant_height 13.77575
#> 5 26451755 plant_height 4.88150
#> 6 26451765 plant_height 15.76845
It outputs a data.frame
with 3 columns: the first one,
site
, shows the site name as provided in the input site x
species data.frame
, trait
which indicates the
trait name on which the CWM is computed, and cwm
which
shows the value of the CWM.
We can also integrate our filtered datasets in other functional
diversity computation pipeline. We’ll show an example by computing
functional richness with fundiversity
.
## To install 'fundiversity' uncomment the following line
# install.packages("fundiversity")
# Functional richness in 'fundiversity' requires all the traits to be known
# so we need to filter the traits
filt_traits <- subset(
red_sp_traits, !is.na(plant_height) & !is.na(seed_mass)
)
# We need to transform species and site names as row names for species-traits
# and site-species data.frames, as required by 'fundiversity'
rownames(filt_traits) <- filt_traits$species
filt_traits <- filt_traits[, -1]
rownames(filt_sites) <- filt_sites$site
filt_sites <- filt_sites[, -1]
# Scale traits
filt_traits <- scale(filt_traits)
# Compute Functional Richness
fric <- fundiversity::fd_fric(filt_traits, filt_sites)
#> Differing number of species between trait dataset and site-species matrix
#> Taking subset of species
#> Warning: Some sites had less species than traits so returned FRic is 'NA'
head(fric)
#> site FRic
#> 1 41152325 1.387675
#> 2 40852325 1.465982
#> 3 40852345 1.485531
#> 4 42552105 1.067421
#> 5 41152315 1.465982
#> 6 37452365 NA
We now have a table with Functional Richness computed for all of our sites.
If we want to display the environment associated with our sites of
interest, we can leverage environmental raster layers, like the mean
annual temperature. Fortunately, we have access to an example raster of
mean annual temperature through funbiogeo
. The package
provides a helper function display a raster layer easily (without any
assumption about the projection) named fb_map_raster()
:
# Read raster
tavg <- system.file(
"extdata", "annual_mean_temp.tif", package = "funbiogeo"
)
tavg <- terra::rast(tavg)
# Map raster
fb_map_raster(tavg) +
scale_fill_distiller("Temperature", palette = "Spectral") +
labs(title = "Mean annual temperature in Europe") +
theme(legend.position = "bottom")
All of the plots produced by funbiogeo
are
ggplot2
objects and can thus be customized using the
ggplot2
syntax, to learn about it check the ggplot2
introductory vignette.
Leveraging the patchwork
package, that allows to combine
different ggplot2
objects, we can have side-by-side, the
map of mean annual temperature and the one on annual total
precipitation:
library("patchwork")
# Read raster ------------------------------------------------------------------
tavg <- system.file("extdata", "annual_mean_temp.tif", package = "funbiogeo")
tavg <- terra::rast(tavg)
prec <- system.file("extdata", "annual_tot_prec.tif", package = "funbiogeo")
prec <- terra::rast(prec)
# Individual Maps --------------------------------------------------------------
map_temperature <- fb_map_raster(tavg, legend.position = "none") +
scale_fill_distiller("Temperature", palette = "Spectral")
map_precipitation <- fb_map_raster(prec) +
scale_fill_distiller("Precipitation", direction = 1)
# Plot composition -------------------------------------------------------------
(map_temperature / map_precipitation) +
plot_annotation(title = "Europe",
theme = theme(plot.title = element_text(face = "bold"))) &
theme_classic() &
theme(text = element_text(family = "mono"))
This function allows the visualization of a raster in a simple fashion, but it doesn’t tell us anything about the environmental variable at the sites. In the next section we will follow the example of mapping an environmental variable.
To get the average environmental variable in the site we can use the
fb_get_environment()
function, it takes as arguments the
site-locations object and an environmental raster from the
terra
package. By default, it takes the average of the
raster values per site.
site_mat <- fb_get_environment(woodiv_locations, tavg)
head(site_mat)
#> site annual_mean_temp
#> 1 26351755 NaN
#> 2 26351765 16.44746
#> 3 26351955 16.08605
#> 4 26351965 NaN
#> 5 26451755 16.66498
#> 6 26451765 16.39656
The variable names in the columns are based on the names of the
provided raster. To put these values on the map we can use the
fb_map_site_data()
function, which allows mapping arbitrary
site-level variables. It takes three needed arguments: the first of
which, site_locations
, which is the sf
object
describing sites’ geographic locations; the second argument is
site_data
, which is a data.frame
giving
additional data indexed by site; and selected_col
, which is
a character giving the name of the column to plot from the
site_data
argument.
We can thus plot the mean annual temperature of sites through the following commands:
fb_map_site_data(woodiv_locations, site_mat, "annual_mean_temp") +
labs(title = "Mean Annual Temperature per Site")
We can leverage the same functions to map our functional diversity indices. For example, with the body mass CWM:
body_mass_cwm <- subset(cwm, trait == "plant_height")
fb_map_site_data(woodiv_locations, body_mass_cwm, "cwm") +
scale_fill_viridis_c(trans = "log10") +
labs(title = "Tree height CWM")
We can do similarly for functional richness:
fb_map_site_data(woodiv_locations, fric, "FRic") +
scale_fill_viridis_c(option = "magma") +
labs(title = "Tree Functional Richness")
This concludes our tutorial to introduce funbiogeo
. The
package contains many more features, especially diagnostic plots, which
are explained in detail in a dedicated
vignette. There is also a specific vignette about transforming raw data from long to wide
format. Finally, if you’re interested in learning about up-scaling
your sites, which means aggregating your sites at coarser scales,you can
refer to the specific vignette