For this vignette we will create a simple SDM to show how you can incorporate the functions of the sdmvis package into your SDM routine. In fact, you can use sdmvis in any other spatially explicit analysis whose results are displayed similarly to an SDM/ENM.

Important note: we will follow the dismo package SDM vignette (now available on rspatial.org), but in a more simplified way. I will focus on showing the functions, not on following SDM best practices.

We will model the distribution of the sloth Bradypus variegatus. We will use the packages dismo, randomForest and sdmvis.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# install.packages(c("dismo", "randomForest", "sdmvis"))

library(sdmvis)
#> Registered S3 methods overwritten by 'adehabitatMA':
#>   method                       from
#>   print.SpatialPixelsDataFrame sp  
#>   print.SpatialPixels          sp
#> Registered S3 method overwritten by 'geojsonlint':
#>   method         from 
#>   print.location dplyr
library(dismo)
#> Loading required package: raster
#> Loading required package: sp
library(randomForest)
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.

First of all, we will download occurrence data for this species using the gbif function from the dismo package. I will then plot it using the pts_leaflet function from the sdmvis package.

# Download data
sp <- gbif(genus = "Bradypus", species = "variegatus")

# Get only lon/lat columns and remove NAs
sp <- sp[,c("lon", "lat")]

sp <- na.omit(sp)

# Plot using sdmvis package
pts_leaflet(sp)

This plotting mode makes it easier to inspect the occurrence points (we can zoom in!). For example, we see a point on the coast of Argentina that is clearly problematic. If you click on a point you get its row number, which can help you fix the problem or investigate it further.
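As an illustration only (the row identifier "123" below is hypothetical, not a real record from the download): if the popup reported such a row as the suspicious record, you could drop it before proceeding.

# Drop a suspicious record by its row identifier ("123" is hypothetical)
sp <- sp[!(rownames(sp) %in% "123"), ]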

We will use some environmental data that ships with the dismo package. First we will load the files and plot them using the var_leaflet function of the sdmvis package.

# We load the example data shipped with the dismo package
# [1:8] drops the one layer that is categorical
predictors <- stack(list.files(file.path(system.file(package = "dismo"), 'ex'),
                               pattern = 'grd$', full.names = TRUE)[1:8])

# Now we plot using var_leaflet, adding the occurrence points on top
var_leaflet(predictors, pts = sp)

You can explore each one of the layers by clicking on the "stack" button on the right side of the Leaflet map. Note that when you move the mouse over the map, you get the lon/lat of the cursor (thanks to the leafem package).

Now it would be good to also sample some pseudo-absence/background points. I will simply draw random points over the environmental layers. I will also thin the presence points to one point per cell.

# Thin the presences to one point per cell (see the gridSample aside below
# for a simpler alternative)
out.p <- raster::extract(predictors[[1]], sp[,1:2])
sp <- sp[!is.na(out.p),] # Keep only points that fall within the environmental
                         # layers
r <- raster(predictors[[1]])
r[cellFromXY(r, sp[,1:2])] <- 1
sp <- data.frame(rasterToPoints(r))
colnames(sp) <- c("x", "y", "pa")

# Sample random points 2x the quantity of presences
set.seed(2000)
backgr <- randomPoints(predictors, (nrow(sp)*2))
backgr <- data.frame(backgr)
backgr$pa <- 0

# We bind the two together
spdata <- rbind(sp, backgr)

# Plot again to see:
pts_leaflet(spdata)

You can see that it now plots points in two different colors, one for presences and another for absences/background (you can change the colors).
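As an aside, a simpler way to thin points to one per cell is dismo's gridSample function, which keeps at most n points per raster cell. The sketch below assumes occ_lonlat is the original two-column lon/lat data frame (before thinning); note that gridSample returns the original coordinates rather than cell centres, which is why this vignette keeps the manual approach above.

# Keep at most one occurrence per cell of the first predictor layer
# (occ_lonlat is a placeholder for the original lon/lat data frame)
occ_thin <- gridSample(occ_lonlat, predictors[[1]], n = 1)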

Before moving to the SDM part, the var_leaflet function that we used before has a nice feature that lets you explore the environmental variables at the points. It extracts the values of the variables and produces some summary tables and plots in an html file. To get this, you just have to set varsummary = TRUE.

# For now it ONLY works with presence/absence data!
var_leaflet(predictors, pts = spdata, varsummary = TRUE)

You will get something like the following, but in a separate file (here I will just include the tables for the whole area and for the points, and one plot):

[[1]]

              bio1     bio12     bio16     bio17      bio5      bio6      bio7      bio8
Min.      122.0000     375.0  215.0000    4.0000  179.0000   49.0000      67.0  125.0000
1st Qu.   239.5000    1694.0  690.0000   99.0000  299.5000  167.0000     110.0  240.0000
Median    254.0000    2248.0  877.0000  204.0000  315.0000  196.0000     118.0  254.0000
Mean      246.5964    2395.0  923.5455  260.4618  309.2036  184.7818     124.4  247.5964
3rd Qu.   264.0000    2848.5 1093.0000  333.0000  325.0000  211.0000     138.5  263.0000
Max.      284.0000    7682.0 2458.0000 1496.0000  356.0000  228.0000     219.0  282.0000

[[2]]

              bio1     bio12     bio16     bio17      bio5      bio6      bio7      bio8
Min.       -9.0000     2.000    1.0000    0.0000   61.0000 -198.0000     70.00  -24.0000
1st Qu.   163.0000   765.750  316.2500   32.0000  300.2500   24.2500    130.00  196.0000
Median    229.0000  1327.000  539.5000   95.0000  320.0000  130.5000    177.00  247.0000
Mean      204.6018  1417.938  579.0073  151.1073  307.2727  103.4236    203.82  215.7091
3rd Qu.   257.0000  1951.000  863.2500  242.7500  332.7500  187.7500    261.00  261.0000
Max.      285.0000  7024.000 2458.0000 1254.0000  422.0000  236.0000    444.00  312.0000

We will use a random forest to produce an SDM. We will split the data to evaluate the model (setting aside 25% of the data for testing).

## Extract the environmental data for each point
envdata <- extract(predictors, spdata[, 1:2])
envdata <- data.frame(cbind(spdata, envdata))

samp <- sample(nrow(envdata), round(0.75 * nrow(envdata)))

traindata <- envdata[samp,]
testdata <- envdata[-samp,]

## Fit a RandomForest model
# First we define the model formula
model <- pa ~ bio1 + bio5 + bio6 + bio7 + bio8 + bio12 + bio16 + bio17
# Fit a random forest model
rf1 <- randomForest(model, data = traindata)

rf1
#> 
#> Call:
#>  randomForest(formula = model, data = traindata) 
#>                Type of random forest: regression
#>                      Number of trees: 500
#> No. of variables tried at each split: 2
#> 
#>           Mean of squared residuals: 0.1611983
#>                     % Var explained: 29.05

# Evaluate using the test data
erf <- dismo::evaluate(testdata[testdata[,3] == 1, 3:11], 
                       testdata[testdata[,3] == 0, 3:11], rf1)

erf
#> class          : ModelEvaluation 
#> n presences    : 59 
#> n absences     : 147 
#> AUC            : 0.8659057 
#> cor            : 0.5787027 
#> max TPR+TNR at : 0.3179

Now we can predict to the whole study area and plot the result using the sdm_leaflet function.

# Predict to the environment
pr <- predict(predictors, rf1)

# Plot using sdm_leaflet
sdm_leaflet(pr, mode = "continuous", pts = spdata, layernames = "RF")

When we predict to a new environment, some areas may be outside the range of values used for training the model. In that case it's important to assess in which areas the model needs to extrapolate. One way to do that is with a MESS map. In var_leaflet you can easily get a MESS map to visualize by setting mess = TRUE:

var_leaflet(predictors, pts = spdata, mess = TRUE)
#> Obtaining MESS map using ecospat. This may take some minutes...
# Choose MESS on the "stack" button of the leaflet map to see it
# Negative values represent areas that are dissimilar and that need attention

We are interested in getting a binary (i.e. thresholded) map. Before applying a certain threshold we can explore how different threshold values would look. This can be helpful in a number of situations, for example when you have two viable threshold options and want to decide which one is closer to what is expected for the species distribution (you should certainly choose the threshold based on methodological aspects, not on what looks best; but knowledge of the species' biology can help you here).

The sdm_thresh function used below can take a while to execute, depending on the number of thresholds.

# Lets see the model evaluation again
erf
#> class          : ModelEvaluation 
#> n presences    : 59 
#> n absences     : 147 
#> AUC            : 0.8659057 
#> cor            : 0.5787027 
#> max TPR+TNR at : 0.3179

# We get some thresholds here
thrs <- threshold(erf)
thrs
#>             kappa spec_sens no_omission prevalence equal_sens_spec sensitivity
#> thresholds 0.3179    0.3179   0.1166333  0.2860667       0.3447333   0.2225619

# We can try two thresholds, one that maximizes the specificity+sensitivity
# and one for no omission (this second is usually a 'bad' one).

sdm_thresh(pr, thresh = c(thrs$spec_sens, thrs$no_omission),
           tname = c("spec_sens", "no_omission"),
           pts = spdata)

Once we choose a threshold, we can make our binary map and plot it using the sdm_leaflet function, but with mode = "bin". You can choose which palette to use, but here we will use the default one.

binary <- calc(pr, function(x){
  x[x < thrs$spec_sens] <- 0
  x[x >= thrs$spec_sens] <- 1
  x
})

sdm_leaflet(binary, mode = "bin", pts = spdata)
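As a small aside (not part of the original routine), the same binary map could also be built with raster's reclassify function; right = FALSE makes the intervals left-closed, so cells equal to the threshold become 1, matching the calc call above.

# Reclassify: [-Inf, threshold) -> 0, [threshold, Inf) -> 1
binary2 <- reclassify(pr, rcl = c(-Inf, thrs$spec_sens, 0,
                                  thrs$spec_sens, Inf, 1),
                      right = FALSE)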

You may want to see how the distribution of the species could look in the future. Here we will simulate a change in the values of three of the variables (the changes are completely arbitrary) and then predict the distribution for this new environment.

# We will change the values of some predictors (arbitrarily)
predictors.fut <- predictors
predictors.fut[[1]] <- predictors.fut[[1]] * 0.75
predictors.fut[[2]] <- predictors.fut[[2]] + 1.5
predictors.fut[[3]] <- predictors.fut[[3]] * 1.5

# Predict to the environment
pr.fut <- predict(predictors.fut, rf1)

binary.fut <- calc(pr.fut, function(x){
  x[x < thrs$spec_sens] <- 0
  x[x >= thrs$spec_sens] <- 1
  x
})

# I will stack the current and "future" predictions to plot both
predictions <- stack(binary.fut, binary)

sdm_leaflet(predictions, mode = "bin", pts = spdata,
            layernames = c("Future", "Current"))

But we may also be interested in having a map that explicitly shows how the distribution will change. A simple way is to combine the two binary rasters so that each cell is coded by whether it is suitable now, in the future, in both, or in neither, and then plot it using the "quad" mode of the sdm_leaflet function.

# 0 = never suitable, 1 = suitable only now, 2 = suitable only in the future,
# 3 = suitable in both
dif <- (binary.fut * 2) + binary

sdm_leaflet(dif, mode = "quad", pts = spdata,
            layernames = c("Difference"))

Finally, we may want to produce some spatially explicit error metric during our SDM routine. Here we will run a bootstrap analysis to see the coefficient of variation of the predictions when we resample the training data. We will run it only 30 times.

# Start the stack with the original prediction; the loop adds 29 bootstrap runs
bootstrap <- pr

for (i in 2:30) {
  btdata <- traindata[sample(1:nrow(traindata), size = nrow(traindata),
                             replace = T), ]
  rfbt <- randomForest(model, data = btdata)
  prbt <- predict(predictors, rfbt)
  
  bootstrap <- stack(bootstrap, prbt)
}

m.mean <- calc(bootstrap, mean)
      
m.sd <- calc(bootstrap, sd)
      
boot.cv <- m.sd/m.mean

We can plot it using the sdm_highlight function and ask it to highlight the places with the highest (or lowest) values of the coefficient of variation.

sdm_highlight(boot.cv, quantile = 0.75,
              both.sides = TRUE, pts = spdata)
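A caveat on this metric (my addition, not part of the original routine): the coefficient of variation can blow up where the mean prediction is close to zero. If that is a concern, one option is to mask those cells before plotting; the 0.01 cutoff below is arbitrary.

# Mask cells whose mean predicted suitability is below an arbitrary 0.01
boot.cv.masked <- mask(boot.cv, m.mean < 0.01, maskvalue = 1)
sdm_highlight(boot.cv.masked, quantile = 0.75, both.sides = TRUE, pts = spdata)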

sdmvis functions are also helpful for producing interactive reports/pages to show your results. A vignette about this is coming soon!