The funbiogeo package requires that information is structured in three different datasets:
- the species x traits data.frame (species_traits in funbiogeo), which contains trait values for several traits (in columns) for several species (in rows).
- the site x species data.frame (site_species in funbiogeo), which contains the presence/absence, abundance, or cover information for species (in columns) by sites (in rows).
- the site x locations object (site_locations in funbiogeo), which contains the physical locations of the sites of interest.

Optionally, an additional dataset can be provided:
- the species x categories data.frame (species_categories in funbiogeo), which contains two columns: one for species, one for a potential categorization of species (taxonomic classes, specific diets, or any arbitrary classification).

In funbiogeo these datasets must be in a wide format (where one row hosts several variables across columns), but sometimes information is structured in a long format (one observation per row, also called tidy format).
For instance, the following dataset illustrates the wide format (the presence/absence of all species is spread across columns).
site | species_1 | species_2 | species_3 | species_4 |
---|---|---|---|---|
A | 1 | 0 | 1 | 1 |
B | 0 | 0 | 1 | 1 |
C | 1 | 1 | 1 | 0 |
The following dataset illustrates the long format (the column species contains the name of the species and the column occurrence contains the presence/absence of species).
site | species | occurrence |
---|---|---|
A | species_1 | 1 |
B | species_1 | 0 |
C | species_1 | 1 |
A | species_2 | 0 |
B | species_2 | 0 |
C | species_2 | 1 |
A | species_3 | 1 |
B | species_3 | 1 |
C | species_3 | 1 |
A | species_4 | 1 |
B | species_4 | 1 |
C | species_4 | 0 |
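The relationship between the two layouts can be sketched in base R. The snippet below is only an illustration built on the toy table above: the object names long, wide_matrix and wide are made up for this example, and this is not how funbiogeo performs the conversion internally.

```r
# Toy long format table (same values as the example above)
long <- data.frame(
  site       = rep(c("A", "B", "C"), times = 4),
  species    = rep(paste0("species_", 1:4), each = 3),
  occurrence = c(1, 0, 1,  0, 0, 1,  1, 1, 1,  1, 1, 0)
)

# Cross-tabulate: one row per site, one column per species
wide_matrix <- tapply(long$occurrence, list(long$site, long$species), sum)

# Turn the matrix into a data.frame with an explicit site column
wide <- data.frame(site = rownames(wide_matrix), wide_matrix, row.names = NULL)

wide
```

The resulting object has one row per site and one column per species, which is the wide layout shown in the first table.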
fb_format_*() functions

If your data are not split into these wide datasets, you can use the fb_format_*() functions to create these specific objects from a long format dataset.
- fb_format_site_locations() extracts the site x locations information from the long format data
- fb_format_site_species() extracts the site x species information from the long format data
- fb_format_species_traits() extracts the species x traits information from the long format data
- fb_format_species_categories() extracts the species x categories information from the long format data

All these functions take a long dataset as input (argument data), where one row corresponds to the occurrence/abundance/coverage of one species at one site, and output a wider object.
funbiogeo provides a small excerpt of long format data to show how to use the functions. This data sits at system.file("extdata", "woodiv_raw_data.csv", package = "funbiogeo").
Let’s import the long format dataset provided by funbiogeo:
# Define the path to long format dataset ----
file_name <- system.file("extdata", "woodiv_raw_data.csv", package = "funbiogeo")
# Read the file ----
all_data <- read.csv(file_name)
site | country | longitude | latitude | species | count | family | genus | binomial | endemism | cultivated | plant_height | seed_mass | sla | wood_density |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
26351755 | Portugal | 2635000 | 1755000 | JPHO | 1 | Cupressaceae | Juniperus | Juniperus phoenicea | 0 | 0 | 4.88150 | 79.86000 | 4.365246 | 0.6487500 |
26351755 | Portugal | 2635000 | 1755000 | PPIR | 1 | Pinaceae | Pinus | Pinus pinaster | 0 | 0 | 19.75384 | 55.83434 | 3.357539 | 0.4430277 |
26351765 | Portugal | 2635000 | 1765000 | JPHO | 1 | Cupressaceae | Juniperus | Juniperus phoenicea | 0 | 0 | 4.88150 | 79.86000 | 4.365246 | 0.6487500 |
26351955 | Portugal | 2635000 | 1955000 | JPHO | 1 | Cupressaceae | Juniperus | Juniperus phoenicea | 0 | 0 | 4.88150 | 79.86000 | 4.365246 | 0.6487500 |
26351955 | Portugal | 2635000 | 1955000 | PPIR | 1 | Pinaceae | Pinus | Pinus pinaster | 0 | 0 | 19.75384 | 55.83434 | 3.357539 | 0.4430277 |
26351965 | Portugal | 2635000 | 1965000 | JPHO | 1 | Cupressaceae | Juniperus | Juniperus phoenicea | 0 | 0 | 4.88150 | 79.86000 | 4.365246 | 0.6487500 |
26351965 | Portugal | 2635000 | 1965000 | PPIA | 1 | Pinaceae | Pinus | Pinus pinea | 0 | 1 | 22.67000 | 626.18882 | 4.216176 | 0.5178617 |
26451755 | Portugal | 2645000 | 1755000 | JPHO | 1 | Cupressaceae | Juniperus | Juniperus phoenicea | 0 | 0 | 4.88150 | 79.86000 | 4.365246 | 0.6487500 |
26451765 | Portugal | 2645000 | 1765000 | JPHO | 1 | Cupressaceae | Juniperus | Juniperus phoenicea | 0 | 0 | 4.88150 | 79.86000 | 4.365246 | 0.6487500 |
26451765 | Portugal | 2645000 | 1765000 | PPIA | 1 | Pinaceae | Pinus | Pinus pinea | 0 | 1 | 22.67000 | 626.18882 | 4.216176 | 0.5178617 |
The function fb_format_species_traits() extracts species trait values from this long table to create the species x traits dataset. Note that each species must have a single value per trait (no trait variation across sites is allowed).
# Extract species x traits data ----
species_traits <- fb_format_species_traits(
  data    = all_data,
  species = "species",
  traits  = c("plant_height", "seed_mass", "sla", "wood_density")
)
# Preview ----
head(species_traits, 10)
#> species plant_height seed_mass sla wood_density
#> 1 AALB 49.641622 67.866923 7.483978 0.4490821
#> 2 ACEP 25.875000 64.703750 NA NA
#> 3 ANEB 15.000000 NA 3.420603 NA
#> 4 APIN 27.333333 55.520000 3.420603 0.4586508
#> 5 CLIB 35.636364 86.872600 NA 0.4500000
#> 6 CSEM 24.692308 7.608125 5.824112 0.5184729
#> 7 JCOM 6.894711 14.556875 6.877889 0.5805503
#> 8 JDEL 12.000000 22.000000 NA NA
#> 9 JMAC 5.000000 8.550000 NA NA
#> 10 JNAV 1.367750 45.630000 3.890000 NA
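Because each species must carry a single value per trait, it can help to check the long table before formatting it. The following base R sketch runs on a small made-up table (toy, n_values and inconsistent are hypothetical names, and the values are rounded from the preview above). It only illustrates the constraint and is not a funbiogeo function.

```r
# Toy long table: the trait value is repeated on every row of a species
toy <- data.frame(
  site         = c("s1", "s1", "s2", "s2"),
  species      = c("JPHO", "PPIR", "JPHO", "PPIA"),
  plant_height = c(4.88, 19.75, 4.88, 22.67)
)

# Number of distinct trait values recorded for each species
n_values <- tapply(toy$plant_height, toy$species,
                   function(x) length(unique(x)))

# Species whose trait value varies across rows (none are expected)
inconsistent <- names(n_values)[n_values > 1]
inconsistent
```

An empty result means every species has a single plant_height value. Any species listed here would need to be resolved (for example by averaging) before building the species x traits dataset.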
The function fb_format_site_species() extracts species occurrence/abundance/coverage from this long table to create the site x species dataset. Note that each species must be recorded at most once per site (the package funbiogeo does not yet handle temporal surveys).
# Format site x species data ----
site_species <- fb_format_site_species(
  data       = all_data,
  site       = "site",
  species    = "species",
  value      = "count",
  na_to_zero = TRUE
)
# Preview ----
head(site_species[ , 1:8], 10)
#> site JPHO PPIR PPIA JNAV JMAC JOXY JCOM
#> 1 26351755 1 1 0 0 0 0 0
#> 2 26351765 1 0 0 0 0 0 0
#> 3 26351955 1 1 0 0 0 0 0
#> 4 26351965 1 0 1 0 0 0 0
#> 5 26451755 1 0 0 0 0 0 0
#> 6 26451765 1 1 1 0 0 0 0
#> 7 26451775 1 0 1 0 0 0 0
#> 8 26451955 1 1 0 0 0 0 0
#> 9 26451965 1 1 1 0 0 0 0
#> 10 26451975 1 1 1 0 0 0 0
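The one-record-per-site-and-species requirement can be checked on the long table before formatting. Here is a minimal base R sketch on made-up data (toy and is_dup are hypothetical names), in which one site x species pair is deliberately repeated.

```r
# Toy long table where PPIA is recorded twice at site s2
toy <- data.frame(
  site    = c("s1", "s1", "s2", "s2", "s2"),
  species = c("JPHO", "PPIR", "JPHO", "PPIA", "PPIA"),
  count   = c(1, 1, 1, 1, 1)
)

# Flag rows that repeat an earlier site x species combination
is_dup <- duplicated(toy[, c("site", "species")])

# Rows to inspect or aggregate before formatting
toy[is_dup, ]
```

Rows flagged this way typically come from repeated surveys and would have to be aggregated or filtered first.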
The function fb_format_site_locations() extracts site coordinates from this long table to create the site x locations dataset. Note that each site must have a single longitude x latitude pair.
# Format site x locations data ----
site_locations <- fb_format_site_locations(
  data      = all_data,
  site      = "site",
  longitude = "longitude",
  latitude  = "latitude",
  na_rm     = FALSE
)
# Preview ----
head(site_locations)
#> Simple feature collection with 6 features and 1 field
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1755000 ymin: 2635000 xmax: 1965000 ymax: 2645000
#> Geodetic CRS: WGS 84
#> site geometry
#> 1 26351755 POINT (1755000 2635000)
#> 3 26351765 POINT (1765000 2635000)
#> 4 26351955 POINT (1955000 2635000)
#> 6 26351965 POINT (1965000 2635000)
#> 8 26451755 POINT (1755000 2645000)
#> 9 26451765 POINT (1765000 2645000)
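To see what the single-coordinate requirement means in practice, the sketch below reduces a long table to one row per site and looks for sites that end up with more than one coordinate pair. It uses base R on a small excerpt of the values shown earlier. The names toy, coords and ambiguous_sites are made up for this example.

```r
# Toy long table: site coordinates are repeated on every species row
toy <- data.frame(
  site      = c("26351755", "26351755", "26351765"),
  longitude = c(2635000, 2635000, 2635000),
  latitude  = c(1755000, 1755000, 1765000)
)

# Keep one row per distinct site x longitude x latitude combination
coords <- unique(toy[, c("site", "longitude", "latitude")])

# A site appearing more than once here has conflicting coordinates
ambiguous_sites <- coords$site[duplicated(coords$site)]
ambiguous_sites
```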
The function fb_format_species_categories() extracts species values for one supra-category (optional) from this long table to create the species x categories dataset. This category (e.g. order, family, endemism status, conservation status, etc.) can later be used by several functions in funbiogeo to aggregate metrics at this level.
# Extract species x categories data ----
species_categories <- fb_format_species_categories(
  data     = all_data,
  species  = "species",
  category = "genus"
)
# Preview ----
head(species_categories, 10)
#> species genus
#> 1 JPHO Juniperus
#> 2 PPIR Pinus
#> 7 PPIA Pinus
#> 58 JNAV Juniperus
#> 372 JMAC Juniperus
#> 382 JOXY Juniperus
#> 486 JCOM Juniperus
#> 488 TBAC Taxus
#> 573 PSYL Pinus
#> 916 PHAL Pinus
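As a rough idea of what aggregating at the category level can look like, the base R sketch below counts species per genus. It rebuilds the ten rows of the preview above by hand (categories_preview and species_per_genus are made-up names) and does not call any funbiogeo function.

```r
# The ten species x genus rows shown in the preview, re-entered by hand
categories_preview <- data.frame(
  species = c("JPHO", "PPIR", "PPIA", "JNAV", "JMAC",
              "JOXY", "JCOM", "TBAC", "PSYL", "PHAL"),
  genus   = c("Juniperus", "Pinus", "Pinus", "Juniperus", "Juniperus",
              "Juniperus", "Juniperus", "Taxus", "Pinus", "Pinus")
)

# Number of species falling in each genus
species_per_genus <- table(categories_preview$genus)
species_per_genus
```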
Once your data are in the right format, you can get started with funbiogeo.