eCite Digital Repository

To gamma or not to gamma? Testing the fit of rates-across-sites models


Humphries, MA and Holland, BR and Karpievitch, YV and Sumner, JG, To gamma or not to gamma? Testing the fit of rates-across-sites models, Phylomania 2012, 8-9 November 2012, University of Tasmania, Hobart, pp. 7. (2012) [Conference Extract]



Since the introduction of explicitly model-based methods of phylogenetic inference (e.g. maximum likelihood and Bayesian approaches), the complexity and biological realism of models of sequence evolution have increased. An important advance in this regard was the introduction of models that allow rate variation across sites (RAS), i.e. models that capture the fact that some sites in a gene may be more or less likely to accept substitutions than others. The most common way of accomplishing this is to use a discrete approximation to a gamma distribution. This has the computational advantage of allowing several (usually 4 or 8) rate categories while adding only a single extra parameter to the model. However, overly simplistic models of RAS can cause problems for phylogenetic inference and for estimating dates of divergence. In particular, a recent study has shown that if a small number of sites mutate very frequently compared to other sites (so-called hot spots), this can lead to time-dependence of rate estimates (Soubrier et al. 2012).

In this study we used amino-acid data from a study by Grahnen et al. (2011), who simulated data using a biophysical model of protein folding and binding. We extracted the number of mutations at each site and fitted these counts to a variety of models. In particular: (i) constant RAS implies that the frequency distribution of counts of mutations should follow a Poisson distribution; (ii) gamma-distributed RAS implies that the counts should follow a negative binomial distribution; (iii) gamma-distributed RAS with invariant sites implies that the counts should follow a zero-inflated negative binomial distribution. We will discuss the merits of these models and whether or not any of them provides an acceptable fit to data generated under biologically realistic conditions.
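The comparison of the three count models described above can be illustrated with a minimal sketch. It is not the authors' analysis: the function names, the crude grid search, the use of AIC, and the toy counts are all assumptions made for illustration. The negative binomial is parameterised by its mean `mu` and shape `k` (variance `mu + mu**2 / k`), and for the zero-inflated model the mean is tied to the sample mean through `mean = (1 - pi0) * mu`, which is a simplifying constraint rather than a full maximum likelihood fit.

```python
import math
from collections import Counter

def pois_logpmf(x, lam):
    # Poisson log-probability: the model implied by a constant rate across sites.
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def nb_logpmf(x, mu, k):
    # Negative binomial (mean mu, shape k): a Poisson whose rate is gamma distributed.
    return (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
            + k * math.log(k / (k + mu)) + x * math.log(mu / (k + mu)))

def zinb_logpmf(x, mu, k, pi0):
    # Zero-inflated negative binomial: a fraction pi0 of sites is invariant (always 0).
    if x == 0:
        return math.log(pi0 + (1.0 - pi0) * math.exp(nb_logpmf(0, mu, k)))
    return math.log(1.0 - pi0) + nb_logpmf(x, mu, k)

def loglik(freq, logpmf, *params):
    # freq maps each observed count to the number of sites with that count.
    return sum(n * logpmf(x, *params) for x, n in freq.items())

def geom_grid(lo, hi, steps):
    # Geometrically spaced grid from lo to hi (inclusive, up to rounding).
    ratio = (hi / lo) ** (1.0 / (steps - 1))
    return [lo * ratio ** i for i in range(steps)]

def fit_models(counts):
    """Fit the three models by crude grid search; assumes the sample mean is > 0."""
    freq = Counter(counts)
    mean = sum(counts) / len(counts)
    k_grid = geom_grid(0.01, 1000.0, 200)
    pi_grid = [i / 100 for i in range(0, 96)]  # 0.00, 0.01, ..., 0.95

    fits = {}
    # Poisson: the sample mean is the maximum likelihood estimate of the rate.
    fits["Poisson"] = (loglik(freq, pois_logpmf, mean), 1)
    # Negative binomial: mean fixed at the sample mean, shape k searched on a grid.
    fits["NB"] = (max(loglik(freq, nb_logpmf, mean, k) for k in k_grid), 2)
    # ZINB: grid over pi0 and k, with mu chosen so that (1 - pi0) * mu = sample mean.
    fits["ZINB"] = (max(loglik(freq, zinb_logpmf, mean / (1.0 - p), k, p)
                        for p in pi_grid for k in k_grid), 3)
    # AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better.
    return {name: {"loglik": ll, "aic": 2 * p - 2 * ll}
            for name, (ll, p) in fits.items()}

if __name__ == "__main__":
    # Hypothetical per-site mutation counts, invented purely for illustration.
    toy_counts = [0] * 40 + [1] * 15 + [2] * 10 + [3] * 8 + [5] * 6 + [8] * 4 + [12] * 2
    for name, fit in fit_models(toy_counts).items():
        print(f"{name:8s} loglik = {fit['loglik']:9.2f}   AIC = {fit['aic']:9.2f}")
```

A lower AIC only ranks the models against each other. Whether any of them fits acceptably in absolute terms would need a separate goodness-of-fit check, for example comparing observed and expected frequencies of each count.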

Item Details

Item Type:Conference Extract
Keywords:phylogenetic inference, maximum likelihood, Bayesian, sequence evolution, rate variation across sites, RAS, gamma distribution
Research Division:Mathematical Sciences
Research Group:Statistics
Research Field:Biostatistics
Objective Division:Expanding Knowledge
Objective Group:Expanding knowledge
Objective Field:Expanding knowledge in the mathematical sciences
UTAS Author:Humphries, MA (Mrs Melissa Humphries)
UTAS Author:Holland, BR (Professor Barbara Holland)
UTAS Author:Karpievitch, YV (Dr Yuliya Karpievitch)
UTAS Author:Sumner, JG (Associate Professor Jeremy Sumner)
ID Code:81297
Year Published:2012
Deposited By:Mathematics and Physics
Deposited On:2012-11-28
Last Modified:2013-01-11
