Formatting your data

The funbiogeo package requires that information is structured in three different datasets:

  • the species x traits data.frame (species_traits in funbiogeo), which contains trait values for several traits (in columns) for several species (in rows).
  • the site x species data.frame (site_species in funbiogeo), which contains the presence/absence, abundance, or cover information for species (in columns) by sites (in rows).
  • the site x locations object (site_locations in funbiogeo), which contains the physical locations of the sites of interest.

Optionally, an additional dataset can be provided:

  • a species x categories data.frame (species_categories in funbiogeo), which contains two columns: one for species and one for a categorization of species (taxonomic classes, specific diets, or any arbitrary classification).

library(funbiogeo)

Wide vs long format

In funbiogeo, these datasets must be in wide format (one row holds several variables spread across columns), but data are often stored in long format (one observation per row, also called tidy format).

For instance, the following dataset illustrates the wide format (the presence/absence of all species is spread across columns).

Wide format dataset (used in funbiogeo)
site  species_1  species_2  species_3  species_4
A     1          0          1          1
B     0          0          1          1
C     1          1          1          0

The following dataset illustrates the long format (the column species contains the name of the species and the column occurrence contains the presence/absence of species).

Long format dataset
site  species    occurrence
A     species_1  1
B     species_1  0
C     species_1  1
A     species_2  0
B     species_2  0
C     species_2  1
A     species_3  1
B     species_3  1
C     species_3  1
A     species_4  1
B     species_4  1
C     species_4  0
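
To make the relationship between the two layouts concrete, here is a minimal sketch that rebuilds the toy long table above and pivots it to the wide layout with base R's reshape(). The object names long and wide are illustrative only, and this is just one possible way to pivot data outside funbiogeo.

```r
# Toy long-format data matching the long table above
long <- data.frame(
  site       = rep(c("A", "B", "C"), times = 4),
  species    = rep(paste0("species_", 1:4), each = 3),
  occurrence = c(1, 0, 1,  0, 0, 1,  1, 1, 1,  1, 1, 0)
)

# Pivot to wide: one row per site, one column per species
wide <- reshape(long, idvar = "site", timevar = "species", direction = "wide")

# reshape() prefixes the new columns with "occurrence."; drop that prefix
names(wide) <- sub("^occurrence\\.", "", names(wide))
rownames(wide) <- NULL

wide
```

The resulting data.frame has one row per site (A, B, C) and one column per species, as in the wide table shown earlier.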

The fb_format_*() functions

If your data are not already split into these wide datasets, you can use the fb_format_*() functions to create them from a long format dataset.

  • fb_format_site_locations() extracts the site x locations information from the long format data
  • fb_format_site_species() extracts the site x species information from the long format data
  • fb_format_species_traits() extracts the species x traits information from the long format data
  • fb_format_species_categories() extracts the species x categories information from the long format data

All these functions take a long dataset as input (argument data), where one row corresponds to the occurrence/abundance/coverage of one species at one site, and return a wide object.

Usage

funbiogeo provides a small excerpt of long format data to show how to use the functions. This data sits at system.file("extdata", "woodiv_raw_data.csv", package = "funbiogeo").

Let’s import the long format dataset provided by funbiogeo:

# Define the path to long format dataset ----
file_name <- system.file("extdata", "woodiv_raw_data.csv", package = "funbiogeo")


# Read the file ----
all_data <- read.csv(file_name)
Long table example
site country longitude latitude species count family genus binomial endemism cultivated plant_height seed_mass sla wood_density
26351755 Portugal 2635000 1755000 JPHO 1 Cupressaceae Juniperus Juniperus phoenicea 0 0 4.88150 79.86000 4.365246 0.6487500
26351755 Portugal 2635000 1755000 PPIR 1 Pinaceae Pinus Pinus pinaster 0 0 19.75384 55.83434 3.357539 0.4430277
26351765 Portugal 2635000 1765000 JPHO 1 Cupressaceae Juniperus Juniperus phoenicea 0 0 4.88150 79.86000 4.365246 0.6487500
26351955 Portugal 2635000 1955000 JPHO 1 Cupressaceae Juniperus Juniperus phoenicea 0 0 4.88150 79.86000 4.365246 0.6487500
26351955 Portugal 2635000 1955000 PPIR 1 Pinaceae Pinus Pinus pinaster 0 0 19.75384 55.83434 3.357539 0.4430277
26351965 Portugal 2635000 1965000 JPHO 1 Cupressaceae Juniperus Juniperus phoenicea 0 0 4.88150 79.86000 4.365246 0.6487500
26351965 Portugal 2635000 1965000 PPIA 1 Pinaceae Pinus Pinus pinea 0 1 22.67000 626.18882 4.216176 0.5178617
26451755 Portugal 2645000 1755000 JPHO 1 Cupressaceae Juniperus Juniperus phoenicea 0 0 4.88150 79.86000 4.365246 0.6487500
26451765 Portugal 2645000 1765000 JPHO 1 Cupressaceae Juniperus Juniperus phoenicea 0 0 4.88150 79.86000 4.365246 0.6487500
26451765 Portugal 2645000 1765000 PPIA 1 Pinaceae Pinus Pinus pinea 0 1 22.67000 626.18882 4.216176 0.5178617

Extracting species x traits data

The function fb_format_species_traits() extracts species trait values from this long table to create the species x traits dataset. Note that each species must have a single value per trait (no trait variation across sites is allowed).

# Extract species x traits data ----
species_traits <- fb_format_species_traits(
  data    = all_data, 
  species = "species", 
  traits  = c("plant_height", "seed_mass", "sla", "wood_density")
)

# Preview ----
head(species_traits, 10)
#>    species plant_height seed_mass      sla wood_density
#> 1     AALB    49.641622 67.866923 7.483978    0.4490821
#> 2     ACEP    25.875000 64.703750       NA           NA
#> 3     ANEB    15.000000        NA 3.420603           NA
#> 4     APIN    27.333333 55.520000 3.420603    0.4586508
#> 5     CLIB    35.636364 86.872600       NA    0.4500000
#> 6     CSEM    24.692308  7.608125 5.824112    0.5184729
#> 7     JCOM     6.894711 14.556875 6.877889    0.5805503
#> 8     JDEL    12.000000 22.000000       NA           NA
#> 9     JMAC     5.000000  8.550000       NA           NA
#> 10    JNAV     1.367750 45.630000 3.890000           NA
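
The one-value-per-trait requirement of fb_format_species_traits() can be checked on a long table beforehand. The sketch below uses a small made-up data.frame (toy_data, not part of funbiogeo) in which one species deliberately has two different plant_height values. The same logic should carry over to all_data with the full list of trait columns.

```r
# Hypothetical long table: species "BBB" has two different plant_height values
toy_data <- data.frame(
  site         = c(1, 2, 1, 2),
  species      = c("AAA", "AAA", "BBB", "BBB"),
  plant_height = c(4.9, 4.9, 19.8, 21.0),
  seed_mass    = c(79.9, 79.9, 55.8, 55.8)
)
trait_cols <- c("plant_height", "seed_mass")

# Keep one row per distinct combination of species and trait values
species_rows <- unique(toy_data[, c("species", trait_cols)])

# A species that still appears more than once has conflicting trait values
conflicting <- unique(species_rows$species[duplicated(species_rows$species)])
conflicting
#> [1] "BBB"
```

An empty result would indicate that every species carries a single set of trait values.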

Extracting site x species data

The function fb_format_site_species() extracts species occurrence/abundance/coverage from this long table to create the site x species dataset. Note that each species may be recorded only once per site (funbiogeo does not yet handle repeated surveys over time).

# Format site x species data ----
site_species <- fb_format_site_species(data       = all_data, 
                                       site       = "site", 
                                       species    = "species", 
                                       value      = "count",
                                       na_to_zero = TRUE
)

# Preview ----
head(site_species[ , 1:8], 10)
#>        site JPHO PPIR PPIA JNAV JMAC JOXY JCOM
#> 1  26351755    1    1    0    0    0    0    0
#> 2  26351765    1    0    0    0    0    0    0
#> 3  26351955    1    1    0    0    0    0    0
#> 4  26351965    1    0    1    0    0    0    0
#> 5  26451755    1    0    0    0    0    0    0
#> 6  26451765    1    1    1    0    0    0    0
#> 7  26451775    1    0    1    0    0    0    0
#> 8  26451955    1    1    0    0    0    0    0
#> 9  26451965    1    1    1    0    0    0    0
#> 10 26451975    1    1    1    0    0    0    0
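
Because each site x species pair is expected to occur only once in the long table, it can help to look for repeated pairs before formatting. This is a minimal base R sketch on a made-up data.frame (toy_data is hypothetical), where species "AAA" is recorded twice at site "s2".

```r
# Hypothetical long table: species "AAA" appears twice at site "s2"
toy_data <- data.frame(
  site    = c("s1", "s1", "s2", "s2"),
  species = c("AAA", "BBB", "AAA", "AAA"),
  count   = c(1, 1, 1, 3)
)

# Build one key per site x species pair
key <- paste(toy_data$site, toy_data$species, sep = "|")

# Keep every row whose key occurs more than once
repeated <- toy_data[key %in% key[duplicated(key)], ]
nrow(repeated)
#> [1] 2
```

How to resolve such rows (summing, averaging, or keeping one survey) depends on the study design and is left to the user.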

Extracting site x locations data

The function fb_format_site_locations() extracts site coordinates from this long table to create the site x locations dataset. Note that each site must have a single longitude x latitude pair.

# Format site x locations data ----
site_locations <- fb_format_site_locations(data       =  all_data, 
                                           site       = "site", 
                                           longitude  = "longitude", 
                                           latitude   = "latitude",
                                           na_rm      = FALSE)

# Preview ----
head(site_locations)
#> Simple feature collection with 6 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 1755000 ymin: 2635000 xmax: 1965000 ymax: 2645000
#> Geodetic CRS:  WGS 84
#>       site                geometry
#> 1 26351755 POINT (1755000 2635000)
#> 3 26351765 POINT (1765000 2635000)
#> 4 26351955 POINT (1955000 2635000)
#> 6 26351965 POINT (1965000 2635000)
#> 8 26451755 POINT (1755000 2645000)
#> 9 26451765 POINT (1765000 2645000)
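
The requirement of one coordinate pair per site can also be verified on the long table. The sketch below counts the distinct longitude x latitude pairs per site in a made-up data.frame (toy_data is hypothetical, with site "s2" deliberately given two longitudes).

```r
# Hypothetical long table: site "s2" has two different longitudes
toy_data <- data.frame(
  site      = c("s1", "s1", "s2", "s2"),
  longitude = c(10.0, 10.0, 11.0, 11.5),
  latitude  = c(45.0, 45.0, 46.0, 46.0)
)

# One row per distinct site x coordinates combination
site_coords <- unique(toy_data[, c("site", "longitude", "latitude")])

# Count coordinate pairs per site; more than one signals a conflict
n_coords <- table(site_coords$site)
ambiguous_sites <- names(n_coords)[n_coords > 1]
ambiguous_sites
#> [1] "s2"
```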

Extracting species x categories data

The function fb_format_species_categories() extracts, for each species, its value for one optional higher-level category from this long table to create the species x categories dataset. This category (e.g. order, family, endemism status, conservation status, etc.) can later be used by several functions in funbiogeo to aggregate metrics at this level.

# Extract species x categories data ----
species_categories <- fb_format_species_categories(data     = all_data, 
                                                   species  = "species",
                                                   category = "genus"
)

# Preview ----
head(species_categories, 10)
#>     species     genus
#> 1      JPHO Juniperus
#> 2      PPIR     Pinus
#> 7      PPIA     Pinus
#> 58     JNAV Juniperus
#> 372    JMAC Juniperus
#> 382    JOXY Juniperus
#> 486    JCOM Juniperus
#> 488    TBAC     Taxus
#> 573    PSYL     Pinus
#> 916    PHAL     Pinus

Once your data are in the right format, you can get started with funbiogeo.