Developing a statistically powerful measure for quartet tree inference using phylogenetic identities and Markov invariants

Sumner, Jeremy; Taylor, A; Holland, Barbara; Jarvis, Peter

File(s) under permanent embargo

journal contribution

posted on 2023-05-19, 04:02 authored by Jeremy Sumner, A Taylor, Barbara Holland, Peter Jarvis

Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both approaches give rise to polynomial functions of sequence site patterns that, in expectation, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well-understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, or to what extent they can be used as practical tools for inferring phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our newly proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest because, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants.
A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models), methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent of the phylogenetic invariants.
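
As a rough illustration of what it means for a polynomial in the site-pattern probabilities to vanish on one quartet tree but not another, the sketch below uses a standard textbook family of phylogenetic invariants: the 3x3 minors of the 4x4 "flattening" of the pattern distribution along a split. This is not the measure developed in the paper. The binary symmetric model, the uniform root distribution, the substitution probabilities, and all function names are assumptions chosen only for this example.

```python
from itertools import combinations, product


def flip(p):
    """2x2 symmetric transition matrix with substitution probability p."""
    return [[1 - p, p], [p, 1 - p]]


def quartet_pattern_probs(p1, p2, p3, p4, p_int):
    """Expected probabilities of the 16 binary site patterns on the tree 12|34.

    Assumes a binary symmetric model with a uniform root distribution
    (an illustrative choice, not taken from the paper).
    """
    m1, m2, m3, m4, m5 = (flip(p) for p in (p1, p2, p3, p4, p_int))
    probs = {}
    for x in product((0, 1), repeat=4):
        total = 0.0
        # Sum over the unobserved states a, b at the two internal nodes.
        for a, b in product((0, 1), repeat=2):
            total += (0.5 * m1[a][x[0]] * m2[a][x[1]] * m5[a][b]
                      * m3[b][x[2]] * m4[b][x[3]])
        probs[x] = total
    return probs


def flattening(probs, left, right):
    """4x4 matrix: rows indexed by the states of taxa `left`, columns by `right`."""
    states = list(product((0, 1), repeat=2))
    matrix = []
    for r in states:
        row = []
        for c in states:
            x = [0] * 4
            x[left[0]], x[left[1]] = r
            x[right[0]], x[right[1]] = c
            row.append(probs[tuple(x)])
        matrix.append(row)
    return matrix


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def max_abs_minor3(matrix):
    """Largest absolute value among the sixteen 3x3 minors of a 4x4 matrix."""
    best = 0.0
    for rows in combinations(range(4), 3):
        for cols in combinations(range(4), 3):
            sub = [[matrix[i][j] for j in cols] for i in rows]
            best = max(best, abs(det3(sub)))
    return best


# Hypothetical substitution probabilities for the four pendant edges
# and the internal edge of the generating tree 12|34.
probs = quartet_pattern_probs(0.1, 0.2, 0.15, 0.05, 0.1)
true_split = max_abs_minor3(flattening(probs, (0, 1), (2, 3)))
wrong_split = max_abs_minor3(flattening(probs, (0, 2), (1, 3)))
print(true_split, wrong_split)
```

With exact expected pattern probabilities, the minors for the generating split 12|34 are zero up to floating-point rounding, while at least one minor for the split 13|24 is clearly non-zero. With observed pattern frequencies from a finite alignment the minors would only be approximately zero, which is where the statistical questions raised in the abstract come in.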

Funding

Australian Research Council

History

Publication title

Journal of Mathematical Biology

Volume

75

Issue

6-7

Pagination

1619-1654

ISSN

0303-6812

Department/School

School of Natural Sciences

Publisher

Springer-Verlag

Place of publication

175 Fifth Ave, New York, NY 10010, USA

Repository Status

Restricted

Socio-economic Objectives

Expanding knowledge in the mathematical sciences

Keywords

phylogenetic invariants, quartets, Markov chains, representation theory

Licence

In Copyright
