--- title: "2. Parallelize Computation of Indices" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{2. Parallelize Computation of Indices} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Note: This vignette presents some performance tests ran between non-parallel and parallel versions of `fundiversity` functions. Note that to avoid the dependency on other packages, this vignette is [**pre-computed**](https://ropensci.org/technotes/2019/12/08/precompute-vignettes/). Within `fundiversity` the computation of most indices can be parallelized using the `future` package. The goal of this vignette is to explain how to toggle and use parallelization in `fundiversity`. The functions that currently support parallelization are summarized in the table below: | Function Name | Index Name | Parallelizable[^1] | Memoizable[^2] | |:----------------------|:---------------|:------------------:|:--------------:| | `fd_fric()` | FRic | ✅ | ✅ | | `fd_fric_intersect()` | FRic_intersect | ✅ | ✅ | | `fd_fdiv()` | FDiv | ✅ | ✅ | | `fd_feve()` | FEve | ✅ | ❌ | | `fd_fdis()` | FDis | ✅ | ❌ | | `fd_raoq()` | Rao's Q | ❌ | ❌ | [^1]: parallelization through the `future` backend please refer to the [parallelization vignette](https://funecology.github.io/fundiversity/articles/fundiversity_1-parallel.html) for details. [^2]: memoization means that the results of the functions calls are cached and not recomputed when recalled, to toggle it off see the `fundiversity::fd_fric()` [Details section](https://funecology.github.io/fundiversity/reference/fd_fric.html#details). Note that **memoization and parallelization cannot be used at the same time**. If the option `fundiversity.memoise` has been set to `TRUE` but the computations are parallelized, `fundiversity` will use unmemoised versions of functions. The `future` package provides a simple and general framework to allow asynchronous computation depending on the resources available for the user. The [first vignette of `future`](https://cran.r-project.org/package=future) gives a general overview of all its features. The main idea being that the user should write the code once and that it would run seamlessly sequentially, or in parallel on a single computer, or on a cluster, or distributed over several computers. `fundiversity` can thus run on all these different backends following the user's choice. ```r library("fundiversity") data("traits_birds", package = "fundiversity") data("site_sp_birds", package = "fundiversity") ``` # Running code in parallel By default the `fundiversity` code will run sequentially on a single core. To trigger parallelization the user needs to define a `future::plan()` object with a parallel backend such as `future::multisession` to split the execution across multiple R sessions. ```r # Sequential execution fric1 <- fd_fric(traits_birds) # Parallel execution future::plan(future::multisession) # Plan definition fric2 <- fd_fric(traits_birds) # The code resolve in similar fashion identical(fric1, fric2) #> [1] TRUE ``` Within the `future::multisession` backend you can specify the number of cores on which the function should be parallelized over through the argument `workers`, you can change it in the `future::plan()` call: ```r future::plan(future::multisession, workers = 2) # Only 2 cores are used fric3 <- fd_fric(traits_birds) identical(fric3, fric2) #> [1] TRUE ``` To learn more about the different backends available and the related arguments needed, please refer to the documentation of `future::plan()` and the [overview vignette of `future`](https://cran.r-project.org/package=future). # Performance comparison We can now compare the difference in performance to see the performance gain thanks to parallelization: ```r future::plan(future::sequential) non_parallel_bench <- microbenchmark::microbenchmark( non_parallel = { fd_fric(traits_birds) }, times = 20 ) future::plan(future::multisession) parallel_bench <- microbenchmark::microbenchmark( parallel = { fd_fric(traits_birds) }, times = 20 ) rbind(non_parallel_bench, parallel_bench) #> Unit: milliseconds #> expr min lq mean median uq max neval cld #> non_parallel 8.9509 9.2691 14.93812 13.32405 18.4841 33.153 20 a #> parallel 224.7037 248.9997 345.59427 274.59615 304.6889 1660.164 20 b ``` The non parallelized code runs faster than the parallelized one! Indeed, the parallelization in `fundiversity` parallelize the computation across different sites. So parallelization should be used when you have many sites on which you want to compute similar indices. ```r # Function to make a bigger site-sp dataset make_more_sites <- function(n) { site_sp <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE)) rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp))) site_sp } ``` For example with a dataset 5000 times bigger: ```r bigger_site <- make_more_sites(5000) microbenchmark::microbenchmark( seq = { future::plan(future::sequential) fd_fric(traits_birds, bigger_site) }, multisession = { future::plan(future::multisession, workers = 4) fd_fric(traits_birds, bigger_site) }, multicore = { future::plan(future::multicore, workers = 4) fd_fric(traits_birds, bigger_site) }, times = 20 ) #> Unit: seconds #> expr min lq mean median uq max neval cld #> seq 78.13766 195.17853 184.92560 196.89360 197.90500 200.56116 20 a #> multisession 34.23402 54.44036 53.39172 54.88206 55.19359 61.83829 20 b #> multicore 75.43857 192.45136 183.07222 196.48277 201.16889 209.39847 20 a ```
Session info of the machine on which the benchmark was ran and time it took to run ``` #> seconds needed to generate this document: 8443.78 sec elapsed #> ─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.1 (2023-06-16 ucrt) #> os Windows 11 x64 (build 22631) #> system x86_64, mingw32 #> ui RStudio #> language (EN) #> collate French_France.utf8 #> ctype fr_FR.UTF-8 #> tz Europe/Paris #> date 2024-03-26 #> rstudio 2023.12.1+402 Ocean Storm (desktop) #> pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── #> ! package * version date (UTC) lib source #> abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0) #> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.2) #> cluster 2.1.6 2023-12-01 [1] CRAN (R 4.3.2) #> codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) #> commonmark 1.9.1 2024-01-30 [1] CRAN (R 4.3.2) #> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) #> curl 5.2.1 2024-03-01 [1] CRAN (R 4.3.3) #> desc 1.4.3 2023-12-10 [1] CRAN (R 4.3.2) #> devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.2) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.3) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1) #> P fundiversity * 1.1.1 2024-01-03 [?] Github (funecology/fundiversity@d11d749) #> future 1.33.1 2023-12-22 [1] CRAN (R 4.3.2) #> future.apply 1.11.1 2023-12-21 [1] CRAN (R 4.3.2) #> geometry 0.4.7 2023-02-03 [1] CRAN (R 4.3.1) #> gh 1.4.0 2023-02-22 [1] CRAN (R 4.3.1) #> gitcreds 0.1.2 2022-09-08 [1] CRAN (R 4.3.1) #> globals 0.16.3 2024-03-08 [1] CRAN (R 4.3.3) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2) #> htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.2) #> httpuv 1.6.14 2024-01-26 [1] CRAN (R 4.3.2) #> httr2 1.0.0 2023-11-14 [1] CRAN (R 4.3.2) #> jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.2) #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2) #> later 1.3.2 2023-12-06 [1] CRAN (R 4.3.2) #> lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.2) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.2) #> listenv 0.9.1 2024-01-29 [1] CRAN (R 4.3.2) #> magic 1.6-1 2022-11-16 [1] CRAN (R 4.3.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1) #> MASS 7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.2) #> Matrix 1.6-5 2024-01-11 [1] CRAN (R 4.3.1) #> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) #> mgcv 1.9-1 2023-12-21 [1] CRAN (R 4.3.2) #> microbenchmark 1.4.10 2023-04-28 [1] CRAN (R 4.3.3) #> mime 0.12 2021-09-28 [1] CRAN (R 4.3.0) #> miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.2) #> multcomp 1.4-25 2023-06-20 [1] CRAN (R 4.3.1) #> mvtnorm 1.2-4 2023-11-27 [1] CRAN (R 4.3.2) #> nlme 3.1-164 2023-11-27 [1] CRAN (R 4.3.2) #> parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.3.3) #> permute 0.9-7 2022-01-27 [1] CRAN (R 4.3.1) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) #> pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.3.3) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) #> pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.3.2) #> profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.2) #> promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1) #> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.3.1) #> Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.2) #> remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.3) #> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.2) #> rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.3.3) #> roxygen2 7.3.1 2024-01-22 [1] CRAN (R 4.3.2) #> rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.2) #> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) #> sandwich 3.1-0 2023-12-11 [1] CRAN (R 4.3.2) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1) #> shiny 1.8.0 2023-11-17 [1] CRAN (R 4.3.2) #> stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2) #> stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.2) #> survival 3.5-8 2024-02-14 [1] CRAN (R 4.3.3) #> TH.data 1.1-2 2023-04-17 [1] CRAN (R 4.3.1) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.1) #> tictoc 1.2.1 2024-03-18 [1] CRAN (R 4.3.3) #> urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.2) #> usethis 2.2.3 2024-02-19 [1] CRAN (R 4.3.3) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2) #> vegan 2.6-4 2022-10-11 [1] CRAN (R 4.3.1) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.2) #> xfun 0.42 2024-02-08 [1] CRAN (R 4.3.3) #> xml2 1.3.6 2023-12-04 [1] CRAN (R 4.3.2) #> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.2) #> zoo 1.8-12 2023-04-13 [1] CRAN (R 4.3.1) #> #> [1] C:/Users/greniem/AppData/Local/R/win-library/4.3 #> [2] C:/Program Files/R/R-4.3.1/library #> #> P ── Loaded and on-disk path mismatch. #> #> ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ```