eCite Digital Repository

How to make more from exposure data? An integrated machine learning pipeline to predict pathogen exposure


Fountain-Jones, NM and Machado, G and Carver, S and Packer, C and Recamonde-Mendoza, M and Craft, ME, How to make more from exposure data? An integrated machine learning pipeline to predict pathogen exposure, Journal of Animal Ecology, 88, (10) pp. 1447-1461. ISSN 0021-8790 (2019) [Refereed Article]

Copyright Statement

Copyright 2019 The Authors. Journal of Animal Ecology © 2019 British Ecological Society

DOI: 10.1111/1365-2656.13076


  1. Predicting infectious disease dynamics is a central challenge in disease ecology. Models that can assess which individuals are most at risk of being exposed to a pathogen not only provide valuable insights into disease transmission and dynamics but can also guide management interventions. Constructing such models for wild animal populations, however, is particularly challenging; often only serological data are available on a subset of individuals and nonlinear relationships between variables are common.
  2. Here we provide a guide to the latest advances in statistical machine learning for constructing pathogen‐risk models that automatically incorporate complex nonlinear relationships, with minimal statistical assumptions, from ecological data sets containing missing values. Our approach compares multiple machine learning algorithms in a unified environment to find the model with the best predictive performance and uses game theory to better interpret results. We apply this framework to two major pathogens that infect African lions: canine distemper virus (CDV) and feline parvovirus.
  3. Our modelling approach provided enhanced predictive performance compared to more traditional approaches, as well as new insights into disease risks in a wild population. We were able to efficiently capture and visualize strong nonlinear patterns, as well as model complex interactions between variables in shaping exposure risk from CDV and feline parvovirus. For example, we found that lions were more likely to be exposed to CDV at a young age but only in low rainfall years.
  4. When combined with our data calibration approach, our framework helped us to answer questions about risk of pathogen exposure that are difficult to address with previous methods. Our framework not only has the potential to aid in predicting disease risk in animal populations, but also can be used to build robust predictive models suitable for other ecological applications such as modelling species distribution or diversity patterns.
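
The abstract mentions using game theory to interpret model predictions. One common game-theoretic tool for this purpose is the Shapley value, which splits a single prediction among the input features. The sketch below is only an illustration of that idea and is not the authors' code: the function `toy_exposure_risk`, its age and rainfall thresholds, its risk values, and the baseline individual are all hypothetical, chosen to mimic the kind of age-by-rainfall interaction described in point 3.

```python
from itertools import combinations
from math import factorial


def toy_exposure_risk(age_years, rainfall_mm):
    """Hypothetical risk model (NOT from the paper): risk is elevated
    only for young animals in low-rainfall years."""
    young = age_years < 2        # assumed threshold, for illustration only
    dry = rainfall_mm < 500      # assumed threshold, for illustration only
    return 0.8 if (young and dry) else 0.2


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value;
    features inside it take the value from the instance being explained.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        inputs = {k: (instance[k] if k in coalition else baseline[k])
                  for k in names}
        return predict(**inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(len(others) + 1):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


# Explain a young animal in a dry year relative to an older animal in a wet year.
instance = {"age_years": 1.0, "rainfall_mm": 400.0}
baseline = {"age_years": 6.0, "rainfall_mm": 800.0}
phi = shapley_values(toy_exposure_risk, instance, baseline)
print(phi)
```

In this toy case neither feature raises the risk on its own, so the increase of 0.6 over the baseline prediction is shared equally, with roughly 0.3 attributed to age and 0.3 to rainfall. The contributions always sum to the difference between the explained prediction and the baseline prediction. Exact enumeration scales exponentially with the number of features, so real analyses typically rely on sampling-based approximations.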

Item Details

Item Type: Refereed Article
Keywords: boosted regression trees, disease ecology, gradient boosting models, machine learning, model-agnostic methods, random forests, serology, support vector machines
Research Division: Biological Sciences
Research Group: Ecology
Research Field: Population ecology
Objective Division: Expanding Knowledge
Objective Group: Expanding knowledge
Objective Field: Expanding knowledge in the biological sciences
UTAS Author: Carver, S (Associate Professor Scott Carver)
ID Code: 134840
Year Published: 2019
Web of Science® Times Cited: 19
Deposited By: Zoology
Deposited On: 2019-09-09
Last Modified: 2020-07-22
