Ecological Niche Modeling on Docker
Docker image to develop and analyze ecological niche models (ENMs).
These scripts allow the user to download data from the Global Biodiversity Information Facility (GBIF) to generate occurrence files and occurrence maps, and to generate ENMs in batch mode.
Currently available functions:
Download records from the GBIF database and produce .csv files for the query species.
Please read the GBIF data user agreement.
Remove duplicate records and records with missing (NA) data, and generate maps.
Reduce the number of records (less than 1 km apart) using the grid method. It also generates maps for the output records. By default, this script is disabled for species with fewer than 30 records.
Generate pseudoabsence points from record data.
Extract climatic data from rasters based on species records.
Generate correlation coefficients, significance values, and plots from climatic data.
Generate an ecological niche model for the input species. It also generates output data and graphs for model evaluation, using user-provided 'bil' format rasters.
Merge data from other databases.
Spatial rarefaction using the spThin algorithm (Aiello-Lammens et al. 2015).
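To make the idea behind the grid method concrete, here is a minimal sketch that keeps one record per grid cell. It is not the script shipped in the image: grid_thin is a hypothetical helper, the "species,lon,lat" column layout is an assumption, and the default cell size of 0.0083333 degrees (30 arc-seconds, roughly 1 km at the equator) is only an illustration.

```shell
#!/bin/sh
# grid_thin FILE [CELL]
# Hypothetical helper, not part of ENMOD.
# FILE: CSV with a header row and columns species,lon,lat (assumed layout).
# CELL: grid cell size in decimal degrees (default is about 1 km at the equator).
# Prints the header plus the first record found in each grid cell, per species.
grid_thin() {
    awk -F, -v cell="${2:-0.0083333}" '
        # awk has no floor(); int() truncates toward zero, so correct negatives.
        function floor(x) { return (x == int(x)) ? x : (x < 0 ? int(x) - 1 : int(x)) }
        NR == 1 { print; next }          # keep the header row
        {
            key = $1 SUBSEP floor($2 / cell) SUBSEP floor($3 / cell)
            if (!(key in seen)) {        # first record in this cell
                seen[key] = 1
                print
            }
        }' "$1"
}

# Example (file name is a placeholder):
#   grid_thin records.csv 0.01 > records_thinned.csv
```

Keeping the first record per cell is the simplest choice; picking a random record or the one nearest the cell center are equally valid variants.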
The scripts are intended to work as a single pipeline. Future versions will include compatibility with user-provided databases.
Be aware that this is a work in progress, so the scripts may contain bugs and errors.
Generating ENMs is not an easy task: it demands a lot of knowledge of species biology, niche theory, and niche modeling methodology. Please evaluate every step of the pipeline carefully and tweak the scripts to suit your own needs. The current final ENMs are suitable, at best, for exploratory analysis.
The current version was designed as a final project for the course "Introduction to bioinformatics and reproducible research for genetic analyses" by Alicia Yanes Mastretta and Azalea García.
A tutorial on the use of ENMOD is available at Tutorial.md
Docker software installed
Further information can be found on the Docker website:
You will need a working directory containing:
A data_in subdirectory with the following files:
- An input file, "species.csv".
This file must include a column with the names of the species you are interested in. Use the following format:
*Note: the first row is the column label.
- Raster files from the WorldClim database in .asc format, in a directory named "rasters".
Raster files must be clipped to cover your species distribution. If species records fall outside the raster extent, you will get NA data.
An empty subdirectory
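A quick pre-flight check can catch a missing input before any container is started. The sketch below is not part of ENMOD: check_workdir is a hypothetical helper, and it assumes that species.csv and the rasters directory sit side by side in the directory passed as the first argument. The raster extension defaults to asc; pass bil if your rasters use that format. The default count of 19 corresponds to the 19 WorldClim bioclimatic variables.

```shell
#!/bin/sh
# check_workdir DIR [EXT] [COUNT]
# Hypothetical helper, not part of ENMOD.
# Returns 0 when DIR holds species.csv (column label plus at least one
# species) and a rasters/ directory with COUNT files ending in .EXT.
check_workdir() {
    dir=$1
    ext=${2:-asc}       # raster extension; adjust to your files
    expected=${3:-19}   # number of raster layers expected

    if [ ! -f "$dir/species.csv" ]; then
        echo "missing: $dir/species.csv" >&2
        return 1
    fi

    # Count non-blank lines: the first is the column label, the rest are species.
    rows=$(grep -c '[^[:space:]]' "$dir/species.csv" || true)
    if [ "$rows" -lt 2 ]; then
        echo "species.csv needs a column label and at least one species" >&2
        return 1
    fi

    if [ ! -d "$dir/rasters" ]; then
        echo "missing: $dir/rasters" >&2
        return 1
    fi

    # Count regular files with the requested extension.
    found=0
    for f in "$dir/rasters"/*."$ext"; do
        if [ -f "$f" ]; then
            found=$((found + 1))
        fi
    done
    if [ "$found" -ne "$expected" ]; then
        echo "expected $expected .$ext rasters, found $found" >&2
        return 1
    fi

    echo "working directory looks complete"
}

# Example (path is a placeholder):
#   check_workdir /path/to/workdir asc 19
```

The check only confirms that the files are present. It cannot tell whether the rasters actually cover the species records.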
Download the latest image using the following command:
docker pull ghuertaramos/enmod
Running the tests
Once the image has been pulled from Docker Cloud:
- You may set a short name for the path of your working directory.
This directory must contain the species.csv file and the rasters directory with 19 .bil files from the WorldClim database. Available at:
- Run the scripts using the following command:
docker run --rm -v $mydatapath:/ENMOD/data ghuertaramos/enmod Rscript Records.R
The command breaks down as follows:
--rm deletes the container after the script finishes.
-v mounts the local directory stored in mydatapath into a new container.
:/ENMOD/data is the path where the directory is mounted inside the container; this path must be kept as is for the scripts to work.
ghuertaramos/enmod is the name of the image we previously downloaded.
Rscript Records.R is the command that runs the Records.R script; it can be replaced with any of the other scripts available in this image.
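Because only the script name changes between runs, the command can be wrapped in a small function. This is a sketch under assumptions rather than something provided by the image: enmod_cmd and run_enmod are hypothetical names, DRY_RUN is a variable introduced here for illustration, and the order Records.R then Rarf.R in the example is a guess at a sensible sequence using the two script names this document mentions.

```shell
#!/bin/sh
# enmod_cmd DATA_DIR SCRIPT
# Hypothetical helper: prints the docker command for one script without running it.
enmod_cmd() {
    printf 'docker run --rm -v %s:/ENMOD/data ghuertaramos/enmod Rscript %s\n' "$1" "$2"
}

# run_enmod DATA_DIR SCRIPT...
# Hypothetical helper: runs each script in the given order and stops at the
# first failure. With DRY_RUN=1 the commands are printed instead of executed.
run_enmod() {
    data_dir=$1
    shift
    for script in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            enmod_cmd "$data_dir" "$script"
        else
            docker run --rm -v "$data_dir":/ENMOD/data ghuertaramos/enmod Rscript "$script" || return 1
        fi
    done
}

# Example (path is a placeholder; use an absolute path, since docker treats
# a non-absolute -v source as a named volume rather than a local directory):
#   mydatapath=/path/to/workdir
#   DRY_RUN=1
#   run_enmod "$mydatapath" Records.R Rarf.R
```

Stopping at the first failure avoids feeding later scripts an incomplete set of records. The dry run makes it easy to review the exact commands before starting long jobs.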
Records.R: if the query species has no records in the GBIF database, the script will fail. This can also occur if the species name is misspelled.
Rarf.R: may take a long time to finish. Note also that the current script only works for records in the Americas.
The exact ranges are:
- Guillermo Huerta Ramos - Initial work - ghuertaramos
This project is licensed under the MIT License - see the LICENSE.md file for details
Hadley Wickham (2017). tidyr: Easily Tidy Data with 'spread()'
and 'gather()' Functions. R package version 0.6.3.
Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith
(2017). dismo: Species Distribution Modeling. R package version
Robert J. Hijmans (2016). raster: Geographic Data Analysis and
Modeling. R package version 2.5-8.
Roger Bivand and Nicholas Lewin-Koh (2017). maptools: Tools for
Reading and Handling Spatial Objects. R package version 0.9-2.
Roger Bivand and Colin Rundel (2017). rgeos: Interface to
Geometry Engine - Open Source (GEOS). R package version 0.3-23.
Simon Urbanek (2016). rJava: Low-Level R to Java Interface. R
package version 0.9-8.
Steven J. Phillips, Miroslav Dudík, Robert E. Schapire. [Internet] Maxent software for modeling species niches and distributions (Version 3.4.1). Available from url: http://biodiversityinformatics.amnh.org/open_source/maxent/.
Taiyun Wei and Viliam Simko (2016). corrplot: Visualization of a
Correlation Matrix. R package version 0.77.