File(s) under permanent embargo

How to make more from exposure data? An integrated machine learning pipeline to predict pathogen exposure

journal contribution

posted on 2023-05-20, 07:00 authored by Fountain-Jones, NM, Machado, G, Scott Carver, Packer, C, Recamonde-Mendoza, M, Craft, ME

Predicting infectious disease dynamics is a central challenge in disease ecology. Models that can assess which individuals are most at risk of being exposed to a pathogen not only provide valuable insights into disease transmission and dynamics but can also guide management interventions. Constructing such models for wild animal populations, however, is particularly challenging; often only serological data are available on a subset of individuals and nonlinear relationships between variables are common.
Here we provide a guide to the latest advances in statistical machine learning to construct pathogen-risk models that automatically incorporate complex nonlinear relationships, with minimal statistical assumptions, from ecological data with missing values. Our approach compares multiple machine learning algorithms in a unified environment to find the model with the best predictive performance and uses game theory to better interpret results. We apply this framework to two major pathogens that infect African lions: canine distemper virus (CDV) and feline parvovirus.
Our modelling approach provided enhanced predictive performance compared to more traditional approaches, as well as new insights into disease risks in a wild population. We were able to efficiently capture and visualize strong nonlinear patterns, as well as model complex interactions between variables in shaping exposure risk from CDV and feline parvovirus. For example, we found that lions were more likely to be exposed to CDV at a young age but only in low rainfall years.
When combined with our data calibration approach, our framework helped us to answer questions about risk of pathogen exposure that are difficult to address with previous methods. Our framework not only has the potential to aid in predicting disease risk in animal populations, but also can be used to build robust predictive models suitable for other ecological applications such as modelling species distribution or diversity patterns.
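The abstract mentions using game theory to interpret model results but does not name a method here. As an illustration only, the sketch below assumes this refers to Shapley values, a common model-agnostic attribution technique, and computes them exactly for a toy risk function. The function `toy_risk`, the binary features `young` and `low_rainfall`, and all coefficients are hypothetical; they mimic the kind of age-by-rainfall interaction described above and are not the authors' fitted model or data.

```python
from itertools import combinations
from math import factorial


def toy_risk(features):
    """Hypothetical exposure-risk model: risk rises only when a lion is
    young AND the year has low rainfall (an interaction effect)."""
    return 0.1 + 0.6 * features["young"] * features["low_rainfall"]


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Enumerates every feature coalition, so it is only practical for a
    handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance's values;
        # all other features stay at their baseline values.
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(len(others) + 1):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


instance = {"young": 1, "low_rainfall": 1}   # young lion, dry year
baseline = {"young": 0, "low_rainfall": 0}   # older lion, wet year
phi = shapley_values(toy_risk, instance, baseline)
print(phi)
```

In this toy case the two features split the increase in predicted risk equally, and the attributions sum to the difference between the prediction for the instance and the prediction for the baseline. Exact enumeration grows exponentially with the number of features, so applications with many predictors generally rely on approximations.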

History

Publication title

Journal of Animal Ecology

Volume

88

Issue

10

Pagination

1447-1461

ISSN

0021-8790

Department/School

School of Natural Sciences

Publisher

Blackwell Publishing Ltd

Place of publication

9600 Garsington Rd, Oxford, England, Oxon, OX4 2DG

Rights statement

Copyright 2019 The Authors. Journal of Animal Ecology © 2019 British Ecological Society

Repository Status

Restricted

Socio-economic Objectives

Expanding knowledge in the biological sciences

Keywords

boosted regression trees, disease ecology, gradient boosting models, machine learning, model-agnostic methods, random forests, serology, support vector machines
