Sampling bias & environmental niches

Baker, D.J., Maclean, I.M.D., Goodall, M. & Gaston, K.J. 2022. Correlations between spatial sampling biases and environmental niches affect species distribution models. Global Ecology and Biogeography 31, 1038-1050.

Aim: Spatial sampling biases in biodiversity data arise because of complex interactions between geography, species characteristics and human behaviour, including preferences for or against particular species or habitats; biases are therefore not necessarily independent of the environmental niches of species. We evaluate when correlations between spatial sampling biases and environmental niches are likely to affect species distribution models (SDMs) developed both with and without attempts to correct these biases.

Innovation: A virtual species and virtual ecologist framework was used to simulate biodiversity data with either no spatial sampling bias or biases that were correlated (positively or negatively) with one of the environmental variables used to define the environmental niches of the species. The environmental variables used to define the species niche were simulated with spatial autocorrelation operating at multiple spatial scales. Virtual samples were then used to model species distributions, with models evaluated based on their ability to rank the suitability of sites correctly.
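
The virtual species and virtual ecologist idea can be illustrated with a minimal Python sketch. This is not the authors' code, and it simplifies heavily: it assumes a single environmental variable with no spatial autocorrelation, a Gaussian niche, a logistic sampling-effort curve, and a crude histogram-based detection-only "model". All function names and parameters (`average_ranks`, `simulate_rank_correlation`, `bias_slope`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def average_ranks(x):
    """Return 0-based ranks of x, giving tied values their mean rank."""
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)      # one past the last sorted position of each tie group
    starts = ends - counts        # first sorted position of each tie group
    return ((starts + ends - 1) / 2.0)[inverse]


def spearman(a, b):
    """Spearman rank correlation computed as Pearson correlation of ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def simulate_rank_correlation(bias_slope, n_sites=5000, n_samples=500, seed=0):
    """Simulate a virtual species, sample it with (possibly) biased effort,
    fit a naive detection-only model, and score it by rank correlation.

    bias_slope = 0 gives unbiased sampling; positive or negative values make
    sampling effort increase or decrease along the environmental gradient.
    """
    rng = np.random.default_rng(seed)

    # Virtual species: Gaussian niche along one environmental variable (assumed).
    env = rng.uniform(-2.0, 2.0, n_sites)
    suitability = np.exp(-0.5 * ((env - 0.5) / 0.8) ** 2)
    presence = rng.random(n_sites) < suitability

    # Virtual ecologist: probability of visiting a site is logistic in env.
    effort = 1.0 / (1.0 + np.exp(-bias_slope * env))
    effort /= effort.sum()
    visited = rng.choice(n_sites, size=n_samples, replace=False, p=effort)
    detections = visited[presence[visited]]  # detection-only records

    # Naive model: detections per available site within each environmental bin.
    bins = np.linspace(-2.0, 2.0, 21)
    det_counts, _ = np.histogram(env[detections], bins=bins)
    site_counts, _ = np.histogram(env, bins=bins)
    predicted = det_counts / np.maximum(site_counts, 1)

    # Evaluate how well predictions rank the true suitability at bin centres.
    centres = (bins[:-1] + bins[1:]) / 2.0
    truth = np.exp(-0.5 * ((centres - 0.5) / 0.8) ** 2)
    return spearman(predicted, truth)


if __name__ == "__main__":
    for slope in (0.0, 2.0, -2.0):
        print(f"bias_slope={slope:+.1f}  rank correlation="
              f"{simulate_rank_correlation(slope):.3f}")
```

Because the naive model makes no attempt to correct for effort, its predictions reflect the product of suitability and sampling effort; comparing the unbiased run with the positively and negatively biased runs gives a rough feel for how a bias correlated with the niche variable can distort the ranking of sites.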

Main conclusions: Correlations between spatial sampling bias and environmental niches frequently reduced the rank correlation of model predictions, but the relative importance of these effects varied with species type (greater decline in rank correlation as the environmental niche broadens) and data type (models built using detection/non-detection data were less affected than those using detection-only data). Bias-correction effectiveness varied depending on the structure of the spatial bias but was also highly variable across methods and dependent on data type. The implications of these results are that spatial sampling bias is a greater concern for SDMs where: (1) the distribution of effort is non-random with respect to an environmental gradient thought to be correlated with a species’ distribution; (2) the species being modelled has a broad environmental niche; and (3) the data for modelling contain only information on detections (i.e., presence only).