eCite Digital Repository

Performance of AIC and BIC in selecting Partition Models and Mixture Models

Citation

Liu, Qin and Charleston, MA and Richards, SA and Holland, BR, Performance of AIC and BIC in selecting Partition Models and Mixture Models, Systematic Biology, pp. 1-29. ISSN 1076-836X (2023) [Refereed Article]



Copyright Statement

© The Author(s) 2022. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. This is an Open Access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0) License, (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

DOI: 10.1093/sysbio/syac081

Abstract

In molecular phylogenetics, partition models and mixture models provide different approaches to accommodating heterogeneity in genomic sequencing data. Both types of models generally give a better fit to data than models that assume the process of sequence evolution is homogeneous across sites and lineages. The Akaike Information Criterion (AIC), an estimator of Kullback-Leibler divergence, and the Bayesian Information Criterion (BIC), are popular tools to select models in phylogenetics. Recent work suggests AIC should not be used for comparing mixture and partition models. In this work, we clarify that this difficulty is not fully explained by AIC misestimating the Kullback-Leibler divergence. We also investigate the performance of the AIC and BIC at comparing amongst mixture models and amongst partition models. We find that under non-standard conditions (i.e. when some edges have a small expected number of changes), AIC underestimates the expected Kullback-Leibler divergence. Under such conditions, AIC preferred the complex mixture models and BIC preferred the simpler mixture models. The mixture models selected by AIC had a better performance in estimating the edge length, while the simpler models selected by BIC performed better in estimating the base frequencies and substitution rate parameters. In contrast, AIC and BIC both prefer simpler partition models over more complex partition models under non-standard conditions, despite the fact that the more complex partition model was the generating model. We also investigated how mispartitioning (i.e. grouping sites that have not evolved under the same process) affects both the performance of partition models compared to mixture models and the model selection process. We found that as the level of mispartitioning increases, the bias of AIC in estimating the expected Kullback-Leibler divergence remains the same, and the branch lengths and evolutionary parameters estimated by partition models become less accurate.
We recommend that researchers be cautious when using AIC and BIC to select among partition and mixture models; other alternatives, such as cross-validation and bootstrapping, should be explored, but may suffer similar limitations.
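
For readers unfamiliar with the two criteria, the following minimal Python sketch shows how AIC and BIC can disagree between a simpler and a more complex model, as the abstract describes for mixture models. It uses the standard textbook definitions (AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L). The log-likelihoods, parameter counts, and alignment length are hypothetical values chosen for illustration, and using the number of alignment sites as the sample size n is an assumption; none of this is taken from the article.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fitted models (illustrative numbers, not results from the article).
n_sites = 1000  # assumption: sample size n is taken to be the alignment length
simple_model = {"lnL": -5230.0, "k": 20}
complex_model = {"lnL": -5200.0, "k": 40}

aic_simple = aic(simple_model["lnL"], simple_model["k"])
aic_complex = aic(complex_model["lnL"], complex_model["k"])
bic_simple = bic(simple_model["lnL"], simple_model["k"], n_sites)
bic_complex = bic(complex_model["lnL"], complex_model["k"], n_sites)

# The lower score wins under each criterion.
aic_choice = "complex" if aic_complex < aic_simple else "simple"
bic_choice = "complex" if bic_complex < bic_simple else "simple"

print(f"AIC: simple={aic_simple:.1f}, complex={aic_complex:.1f} -> {aic_choice}")
print(f"BIC: simple={bic_simple:.1f}, complex={bic_complex:.1f} -> {bic_choice}")
```

With these made-up inputs, AIC favours the more complex model and BIC the simpler one, because BIC's per-parameter penalty (ln n) exceeds AIC's fixed penalty of 2 once n is larger than about 7.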

Item Details

Item Type:Refereed Article
Keywords:AIC, BIC, phylogenetic partition models, phylogenetic mixture models
Research Division:Biological Sciences
Research Group:Evolutionary biology
Research Field:Phylogeny and comparative analysis
Objective Division:Expanding Knowledge
Objective Group:Expanding knowledge
Objective Field:Expanding knowledge in the biological sciences
UTAS Author:Liu, Qin (Ms Qin Liu)
UTAS Author:Charleston, MA (Professor Michael Charleston)
UTAS Author:Richards, SA (Dr Shane Richards)
UTAS Author:Holland, BR (Professor Barbara Holland)
ID Code:155159
Year Published:2023 (online first 2022)
Web of Science® Times Cited:6
Deposited By:Mathematics
Deposited On:2023-01-31
Last Modified:2023-02-08
Downloads:7