eCite Digital Repository
Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences
Citation
Auch, AF and Henz, SR and Holland, BR and Goker, M, Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences, BMC Bioinformatics, 7, (July) pp. 1-16. ISSN 1471-2105 (2006) [Refereed Article]
Copyright Statement
© 2006 Auch et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
DOI: 10.1186/1471-2105-7-350
Abstract
Background: Phylogenetic methods which do not rely on multiple sequence alignments are
important tools in inferring trees directly from completely sequenced genomes. Here, we extend
the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute
phylogenetic trees from all completely sequenced plastid genomes currently available and from a
selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN,
TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between
two sequences from which pairwise similarities and distances are computed in different ways
resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny
reconstruction is directly estimated by computing a recently described measure of "treelikeness",
the so-called δ value, from the respective distance matrices. Additionally, we compare the trees
inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the
NCBI taxonomy tree of the taxa under study.
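The δ value mentioned above can be illustrated with a minimal sketch. It assumes the usual quartet-based definition of treelikeness: for each set of four taxa, the three pairwise sums of distances are sorted, and the quartet score is the gap between the two largest sums divided by the gap between the largest and smallest. The function name `delta_value` and the nested-list matrix format are assumptions made for this illustration and are not taken from the article or its software.

```python
from itertools import combinations


def delta_value(dist):
    """Mean quartet delta score of a symmetric distance matrix.

    `dist` is a square nested list (hypothetical input format).
    A score of 0 means every quartet fits a tree perfectly;
    values closer to 1 indicate less treelike distances.
    """
    n = len(dist)
    if n < 4:
        raise ValueError("at least four taxa are needed to form a quartet")

    scores = []
    for x, y, u, v in combinations(range(n), 4):
        # The three ways of splitting four taxa into two pairs.
        sums = sorted([
            dist[x][y] + dist[u][v],
            dist[x][u] + dist[y][v],
            dist[x][v] + dist[y][u],
        ])
        denominator = sums[2] - sums[0]
        # A quartet whose three sums are all equal is scored as 0 by convention.
        scores.append(0.0 if denominator == 0 else (sums[2] - sums[1]) / denominator)

    return sum(scores) / len(scores)
```

For distances that fit a tree exactly, the two largest sums of every quartet coincide (the four-point condition), so the mean score is 0.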
Results: Our results indicate that, at this taxonomic level, plastid genomes are much more valuable
for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints
are of little use. Distances based on the proportion of "matched" HSP length to average genome
length were best for tree estimation. Additionally, we found that using TBLASTX instead of
BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant
increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ
algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and
FastME performing not significantly worse, and STC performing as well if applied to high-quality
distance matrices. δ values are found to be a reliable predictor of phylogenetic accuracy.
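As a rough illustration of a distance based on the proportion of matched HSP length to average genome length, the sketch below merges overlapping HSP intervals on each genome, averages the covered lengths, and divides by the average genome length. This is only one plausible reading of the description above and is not claimed to reproduce any of the 96 GBDP variants. The function names, the half-open `(start, end)` interval format, and the choice to average coverage over both genomes are all assumptions.

```python
def covered_length(intervals):
    """Total length covered by a list of half-open (start, end) intervals.

    Overlapping or touching intervals are merged so that no position
    is counted twice.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block, if any, and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def coverage_distance(hsps_x, hsps_y, len_x, len_y):
    """Hypothetical distance: 1 - matched HSP length / average genome length.

    `hsps_x` and `hsps_y` are the HSP intervals on genome x and genome y;
    `len_x` and `len_y` are the genome lengths.
    """
    matched = (covered_length(hsps_x) + covered_length(hsps_y)) / 2
    average_length = (len_x + len_y) / 2
    return 1.0 - matched / average_length
```

Under these assumptions, two genomes fully covered by HSPs get a distance of 0, and genomes sharing no HSPs get a distance of 1.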
Conclusion: Using the most treelike distance matrices, as judged by their δ values, distance
methods are able to recover all major plant lineages, and are more in accordance with Apicomplexa
organelles being derived from "green" plastids than from plastids of the "red" type. GBDP-like
methods can be used to reliably infer phylogenies from different kinds of genomic data. A
framework is established to further develop and improve such methods. δ values are a topology-independent
tool of general use for the development and assessment of distance methods for
phylogenetic inference.
Item Details
Item Type: | Refereed Article |
---|---|
Research Division: | Mathematical Sciences |
Research Group: | Applied mathematics |
Research Field: | Biological mathematics |
Objective Division: | Expanding Knowledge |
Objective Group: | Expanding knowledge |
Objective Field: | Expanding knowledge in the biological sciences |
UTAS Author: | Holland, BR (Professor Barbara Holland) |
ID Code: | 63073 |
Year Published: | 2006 |
Web of Science® Times Cited: | 53 |
Deposited By: | Mathematics |
Deposited On: | 2010-04-13 |
Last Modified: | 2012-03-06 |
Downloads: | 411 |