eCite Digital Repository

To gamma or not to gamma? Testing the fit of rates-across-sites models

Citation

Humphries, MA and Holland, BR and Karpievitch, YV and Sumner, JG, To gamma or not to gamma? Testing the fit of rates-across-sites models, Phylomania 2012, 8-9 November 2012, University of Tasmania, Hobart, pp. 7. (2012) [Conference Extract]



Abstract

Since the introduction of explicitly model-based methods of phylogenetic inference (e.g. maximum likelihood and Bayesian approaches), the complexity and biological realism of models of sequence evolution have increased. An important advance in this regard was the introduction of models that allow rate variation across sites (RAS), i.e. models that account for the fact that some sites in a gene may be more or less likely to accept substitutions than others. The most common way of accomplishing this is to use a discrete approximation to a gamma distribution. This has the computational advantage of allowing several (usually 4 or 8) different rate categories with the addition of a single extra parameter to the model. However, overly simplistic models of RAS can cause problems for phylogenetic inference and for estimating dates of divergence. In particular, a recent study has shown that if a small number of sites mutate very frequently compared to other sites (so-called hot spots), this can lead to time-dependence of rate estimates (Soubrier et al. 2012). In this study we used amino-acid data from a study by Grahnen et al. (2011), who simulated data using a biophysical model of protein folding and binding. We extracted the number of mutations at each site and fitted these data to a variety of models. In particular: (i) constant RAS implies that the frequency distribution of counts of mutations should follow a Poisson distribution; (ii) gamma-distributed RAS implies that the counts should follow a negative binomial distribution; (iii) gamma-distributed RAS with invariant sites implies that the counts should follow a zero-inflated negative binomial distribution. We will discuss the merits of these models and whether or not any of them provide an acceptable fit to data generated under biologically realistic conditions.
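The three count models named in the abstract can be compared on a vector of per-site mutation counts by maximum likelihood. The following is only a minimal sketch of that idea, not the authors' analysis: the function names (fit_poisson, fit_negbin, fit_zinb, compare_models), the use of AIC as the comparison criterion, and the Nelder-Mead optimiser are all assumptions made for illustration. It also assumes the counts are a NumPy integer array with a positive mean.

```python
import numpy as np
from scipy import optimize, stats


def _nb_logpmf(counts, log_r, log_mean):
    """Negative binomial log-pmf, parameterised by shape r and mean (both on log scale)."""
    # Clipping keeps exp() finite if the optimiser wanders far from the optimum.
    r = np.exp(np.clip(log_r, -20.0, 20.0))
    mean = np.exp(np.clip(log_mean, -20.0, 20.0))
    # scipy's nbinom(n, p) has mean n(1-p)/p, so p = r / (r + mean).
    return stats.nbinom.logpmf(counts, r, r / (r + mean))


def fit_poisson(counts):
    """Constant RAS: the Poisson MLE of the rate is the sample mean."""
    return stats.poisson.logpmf(counts, counts.mean()).sum()


def fit_negbin(counts):
    """Gamma-distributed RAS: negative binomial counts (2 parameters)."""
    def nll(t):
        return -_nb_logpmf(counts, t[0], t[1]).sum()
    start = [0.0, np.log(counts.mean())]
    return -optimize.minimize(nll, start, method="Nelder-Mead").fun


def fit_zinb(counts):
    """Gamma RAS plus invariant sites: zero-inflated negative binomial (3 parameters)."""
    def nll(t):
        log_pi = -np.logaddexp(0.0, -t[0])     # log(pi), pi = proportion of invariant sites
        log_1m_pi = -np.logaddexp(0.0, t[0])   # log(1 - pi)
        nb = _nb_logpmf(counts, t[1], t[2])
        # A zero arises from an invariant site or from the negative binomial part.
        ll = np.where(counts == 0,
                      np.logaddexp(log_pi, log_1m_pi + nb),
                      log_1m_pi + nb)
        return -ll.sum()
    start = [0.0, 0.0, np.log(counts.mean())]
    return -optimize.minimize(nll, start, method="Nelder-Mead").fun


def compare_models(counts):
    """Return the log-likelihood and AIC of each model for the given counts."""
    counts = np.asarray(counts)
    fits = {"poisson": (fit_poisson(counts), 1),
            "negbin": (fit_negbin(counts), 2),
            "zinb": (fit_zinb(counts), 3)}
    return {name: {"loglik": ll, "aic": 2 * k - 2 * ll}
            for name, (ll, k) in fits.items()}
```

Calling compare_models(counts) returns a dictionary keyed by model name; a lower AIC suggests a better trade-off between fit and parameter count. A relative criterion such as AIC only ranks the candidates, so it would not by itself show that any of them fits acceptably; a goodness-of-fit check against the observed count frequencies would still be needed for that question.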

Item Details

Item Type:Conference Extract
Keywords:phylogenetic inference, maximum likelihood, Bayesian, sequence evolution, rate variation across sites, RAS, gamma distribution
Research Division:Mathematical Sciences
Research Group:Statistics
Research Field:Biostatistics
Objective Division:Expanding Knowledge
Objective Group:Expanding Knowledge
Objective Field:Expanding Knowledge in the Mathematical Sciences
Author:Humphries, MA (Mrs Melissa Humphries)
Author:Holland, BR (Associate Professor Barbara Holland)
Author:Karpievitch, YV (Dr Yuliya Karpievitch)
Author:Sumner, JG (Dr Jeremy Sumner)
ID Code:81297
Year Published:2012
Deposited By:Mathematics and Physics
Deposited On:2012-11-28
Last Modified:2013-01-11
