Formatting your data

The funbiogeo package requires your data to be structured as three datasets:

  • the species x traits data.frame (species_traits in funbiogeo), which contains trait values for several traits (in columns) for several species (in rows).
  • the site x species data.frame (site_species in funbiogeo), which contains the presence/absence, abundance, or cover information for species (in columns) by sites (in rows).
  • the site x locations object (site_locations in funbiogeo), which contains the physical locations of the sites of interest.

Optionally, an additional dataset can be provided:

  • a species x categories data.frame (species_categories in funbiogeo), which contains two columns: one for species and one for a categorization of species (e.g. taxonomic classes, specific diets, or any arbitrary classification).
library(funbiogeo)

Wide vs long format

In funbiogeo, these datasets must be in wide format (each row holds several variables spread across columns), but raw data are often structured in long format (one observation per row, also called tidy format).

For instance, the following dataset illustrates the wide format (the presence/absence of all species is spread across columns).

Wide format dataset (used in funbiogeo)
site  species_1  species_2  species_3  species_4
A     1          0          1          1
B     0          0          1          1
C     1          1          1          0

The following dataset illustrates the long format (the column species contains the name of the species and the column occurrence contains the presence/absence of species).

Long format dataset
site  species    occurrence
A     species_1  1
B     species_1  0
C     species_1  1
A     species_2  0
B     species_2  0
C     species_2  1
A     species_3  1
B     species_3  1
C     species_3  1
A     species_4  1
B     species_4  1
C     species_4  0
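To make the link between the two layouts concrete, here is a minimal sketch in base R (not a funbiogeo function) that rebuilds the wide table above from the long one using stats::reshape(). The object names long_data and wide_data are illustrative only.

```r
# Long format: one row per site x species combination (values from the table above)
long_data <- data.frame(
  site       = rep(c("A", "B", "C"), times = 4),
  species    = rep(paste0("species_", 1:4), each = 3),
  occurrence = c(1, 0, 1,   # species_1 at sites A, B, C
                 0, 0, 1,   # species_2
                 1, 1, 1,   # species_3
                 1, 1, 0)   # species_4
)

# Spread species across columns so that each site occupies one row
wide_data <- reshape(long_data,
                     idvar     = "site",
                     timevar   = "species",
                     v.names   = "occurrence",
                     direction = "wide")

# reshape() names the new columns "occurrence.species_1", etc.;
# drop that prefix and reset the row names
names(wide_data) <- sub("^occurrence\\.", "", names(wide_data))
rownames(wide_data) <- NULL

wide_data
```

tidyr::pivot_wider() is a common alternative to reshape() for this kind of long-to-wide conversion.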

The fb_format_*() functions

If your data are not split into these wide datasets, you can use the fb_format_*() functions to create these objects from a long format dataset.

  • fb_format_site_locations() extracts the site x locations information from the long format data
  • fb_format_site_species() extracts the site x species information from the long format data
  • fb_format_species_traits() extracts the species x traits information from the long format data
  • fb_format_species_categories() extracts the species x categories information from the long format data

All these functions take a long format dataset as input (argument data), in which each row corresponds to the occurrence/abundance/cover of one species at one site, and return a wide format object.

Usage

funbiogeo provides a small excerpt of long format data to demonstrate these functions. The file is located at system.file("extdata", "raw_mammals_data.csv", package = "funbiogeo").

Let’s import the long format dataset provided by funbiogeo:

# Define the path to long format dataset ----
file_name <- system.file("extdata", "raw_mammals_data.csv", package = "funbiogeo")


# Read the file ----
all_data <- read.csv(file_name)
Long table example
species  order            site     longitude  latitude  count  adult_body_mass  gestation_length  litter_size  max_longevity  sexual_maturity_age  diet_breadth
sp_001   Cetartiodactyla  fb_103   7.27182    59.09736  1      461900.76        235.00            1.25         324            668.20               1
sp_001   Cetartiodactyla  fb_1001  20.77182   52.59736  1      461900.76        235.00            1.25         324            668.20               1
sp_001   Cetartiodactyla  fb_102   6.77182    59.09736  1      461900.76        235.00            1.25         324            668.20               1
sp_001   Cetartiodactyla  fb_104   7.77182    59.09736  1      461900.76        235.00            1.25         324            668.20               1
sp_001   Cetartiodactyla  fb_101   6.27182    59.09736  1      461900.76        235.00            1.25         324            668.20               1
sp_001   Cetartiodactyla  fb_1000  20.27182   52.59736  1      461900.76        235.00            1.25         324            668.20               1
sp_001   Cetartiodactyla  fb_1002  21.27182   52.59736  1      461900.76        235.00            1.25         324            668.20               1
sp_002   Rodentia         fb_1000  20.27182   52.59736  1      21.11            19.89             5.64         48             76.04                NA
sp_002   Rodentia         fb_1002  21.27182   52.59736  1      21.11            19.89             5.64         48             76.04                NA
sp_002   Rodentia         fb_1001  20.77182   52.59736  1      21.11            19.89             5.64         48             76.04                NA

Extracting species x traits data

The function fb_format_species_traits() extracts species trait values from this long table to create the species x traits dataset. Note that each species must have a single value per trait (trait variation across sites is not allowed).

# Extract species x traits data ----
species_traits <- fb_format_species_traits(
  data    = all_data, 
  species = "species", 
  traits  = c("adult_body_mass", "gestation_length", "litter_size",
              "max_longevity", "sexual_maturity_age", "diet_breadth")
)

# Preview ----
head(species_traits, 10)
#>    species adult_body_mass gestation_length litter_size max_longevity
#> 1   sp_001       461900.76           235.00        1.25         324.0
#> 2   sp_002           21.11            19.89        5.64          48.0
#> 3   sp_005           31.60            24.50        4.94          48.0
#> 4   sp_006           21.90            23.68        5.16          52.8
#> 5   sp_010            8.31               NA        1.73         252.0
#> 6   sp_013        31756.51            63.50        4.98         354.0
#> 7   sp_016        22502.01           196.00        1.79         204.0
#> 8   sp_017       240867.13           235.61        1.09         321.6
#> 9   sp_022            9.89            29.00        4.04          38.4
#> 10  sp_026        57224.61           230.00        1.00         300.0
#>    sexual_maturity_age diet_breadth
#> 1               668.20            1
#> 2                76.04           NA
#> 3                43.27           NA
#> 4                57.93            4
#> 5                   NA            1
#> 6               679.37            1
#> 7               400.97           NA
#> 8               659.91            5
#> 9                66.88            2
#> 10              543.28            2
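Because fb_format_species_traits() expects one value per species and trait, it can be worth checking the long table before formatting it. The sketch below uses a hypothetical helper, find_non_unique() (not part of funbiogeo), on a small made-up table; for each trait, it lists the species that have more than one distinct non-missing value.

```r
# Hypothetical helper (not part of funbiogeo): for each column in `vars`,
# return the ids in column `id` that have more than one distinct
# non-missing value
find_non_unique <- function(data, id, vars) {
  out <- lapply(vars, function(v) {
    n_values <- tapply(data[[v]], data[[id]],
                       function(x) length(unique(x[!is.na(x)])))
    names(n_values)[n_values > 1]
  })
  names(out) <- vars
  out
}

# Made-up long table: sp_b has two different body mass values across sites
toy_traits <- data.frame(
  species         = c("sp_a", "sp_a", "sp_b", "sp_b"),
  site            = c("s1", "s2", "s1", "s2"),
  adult_body_mass = c(10, 10, 5, 6),
  litter_size     = c(2, 2, NA, 3)
)

non_unique <- find_non_unique(toy_traits, id = "species",
                              vars = c("adult_body_mass", "litter_size"))
non_unique
# $adult_body_mass is "sp_b"; $litter_size is empty (NA values are ignored)
```

The same helper could be applied to all_data with the trait names passed to fb_format_species_traits() above.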

Extracting site x species data

The function fb_format_site_species() extracts species occurrence/abundance/cover from this long table to create the site x species dataset. Note that each species must be recorded at most once per site (funbiogeo does not yet handle temporal surveys). With na_to_zero = TRUE, site x species combinations absent from the long table are filled with 0 instead of NA.

# Format site x species data ----
site_species <- fb_format_site_species(data       = all_data, 
                                       site       = "site", 
                                       species    = "species", 
                                       value      = "count",
                                       na_to_zero = TRUE
)

# Preview ----
head(site_species[ , 1:8], 10)
#>       site sp_001 sp_002 sp_005 sp_006 sp_010 sp_013 sp_016
#> 1   fb_103      1      0      0      1      0      1      1
#> 2  fb_1001      1      1      1      1      1      1      1
#> 3   fb_102      1      0      0      1      0      1      1
#> 4   fb_104      1      0      0      1      0      1      1
#> 5   fb_101      1      0      0      1      0      1      1
#> 6  fb_1000      1      1      1      1      1      1      1
#> 7  fb_1002      1      1      1      1      1      1      1
#> 8  fb_1022      0      0      1      1      1      0      1
#> 9  fb_1018      0      0      1      1      1      0      0
#> 10 fb_1024      0      0      1      1      1      0      1
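Since each species should appear only once per site, a quick base R check with duplicated() can flag repeated site x species records before calling fb_format_site_species(). The table below is made up for illustration, and summing repeated counts with aggregate() is only one possible fix, valid only if the repeated records can legitimately be added together.

```r
# Made-up long table with a repeated site x species record (s1 / sp_a)
toy_counts <- data.frame(
  site    = c("s1", "s1", "s1", "s2"),
  species = c("sp_a", "sp_a", "sp_b", "sp_a"),
  count   = c(2, 3, 1, 4)
)

# TRUE for rows whose site x species pair has already appeared above
dup_rows <- duplicated(toy_counts[, c("site", "species")])
toy_counts[dup_rows, ]

# One possible fix (an assumption: only if counts can be added):
# sum repeated records so that each site x species pair appears once
toy_summed <- aggregate(count ~ site + species, data = toy_counts, FUN = sum)
toy_summed
```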

Extracting site x locations data

The function fb_format_site_locations() extracts site coordinates from this long table to create the site x locations dataset. Note that each site must have a single pair of longitude and latitude values.

# Format site x locations data ----
site_locations <- fb_format_site_locations(data       =  all_data, 
                                           site       = "site", 
                                           longitude  = "longitude", 
                                           latitude   = "latitude",
                                           na_rm      = FALSE)

# Preview ----
head(site_locations)
#> Simple feature collection with 6 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 52.59736 ymin: 6.271821 xmax: 59.09736 ymax: 20.77182
#> Geodetic CRS:  WGS 84
#>      site                  geometry
#> 1  fb_103 POINT (59.09736 7.271821)
#> 2 fb_1001 POINT (52.59736 20.77182)
#> 3  fb_102 POINT (59.09736 6.771821)
#> 4  fb_104 POINT (59.09736 7.771821)
#> 5  fb_101 POINT (59.09736 6.271821)
#> 6 fb_1000 POINT (52.59736 20.27182)
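A similar base R check applies to coordinates: after keeping the distinct site/longitude/latitude combinations, any site that still appears more than once has conflicting coordinates. The toy table below is made up for illustration and is not funbiogeo code.

```r
# Made-up long table: site s2 is recorded with two different longitudes
toy_coords <- data.frame(
  site      = c("s1", "s1", "s2", "s2"),
  longitude = c(7.27, 7.27, 20.77, 20.78),
  latitude  = c(59.10, 59.10, 52.60, 52.60)
)

# Keep one row per distinct site x coordinates combination
coords <- unique(toy_coords[, c("site", "longitude", "latitude")])

# Sites that still appear more than once have conflicting coordinates
conflicting_sites <- unique(coords$site[duplicated(coords$site)])
conflicting_sites
# "s2"
```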

Extracting species x categories data

The function fb_format_species_categories() extracts, for each species, its value for one higher-level category (optional) from this long table to create the species x categories dataset. This category (e.g. order, family, endemism status, conservation status) can later be used by several functions in funbiogeo to aggregate metrics at that level.

# Extract species x categories data ----
species_categories <- fb_format_species_categories(data     = all_data, 
                                                   species  = "species",
                                                   category = "order"
)

# Preview ----
head(species_categories, 10)
#>     species           order
#> 1    sp_001 Cetartiodactyla
#> 8    sp_002        Rodentia
#> 11   sp_005        Rodentia
#> 27   sp_006        Rodentia
#> 59   sp_010      Chiroptera
#> 81   sp_013       Carnivora
#> 89   sp_016 Cetartiodactyla
#> 113  sp_017 Cetartiodactyla
#> 132  sp_022    Eulipotyphla
#> 138  sp_026 Cetartiodactyla
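funbiogeo performs the per-category aggregation inside its own functions. As a rough illustration of the idea only, the base R sketch below takes the ten species shown in the traits and categories previews, joins them with merge(), and computes the median adult body mass per order. This is not how funbiogeo computes its metrics.

```r
# Species x categories pairs, copied from the preview above
categories <- data.frame(
  species = c("sp_001", "sp_002", "sp_005", "sp_006", "sp_010",
              "sp_013", "sp_016", "sp_017", "sp_022", "sp_026"),
  order   = c("Cetartiodactyla", "Rodentia", "Rodentia", "Rodentia",
              "Chiroptera", "Carnivora", "Cetartiodactyla",
              "Cetartiodactyla", "Eulipotyphla", "Cetartiodactyla")
)

# Adult body mass of the same species, copied from the species x traits preview
traits <- data.frame(
  species         = categories$species,
  adult_body_mass = c(461900.76, 21.11, 31.60, 21.90, 8.31,
                      31756.51, 22502.01, 240867.13, 9.89, 57224.61)
)

# Join traits and categories by species, then summarise per order
combined    <- merge(traits, categories, by = "species")
median_mass <- aggregate(adult_body_mass ~ order, data = combined, FUN = median)
median_mass
```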

Once your data are in the right format, you can get started with funbiogeo.