Results 1-10 of 13
Why neighbor-joining works
2006
Cited by 12 (0 self)
Abstract. We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson’s optimal radius bound as a special case and explains many cases where neighbor-joining is successful even when Atteson’s criterion is not satisfied. We also provide a proof for Atteson’s conjecture on the optimal edge radius of the neighbor-joining algorithm. The strong performance guarantees we provide also hold for the quadratic-time fast neighbor-joining algorithm, thus providing a theoretical basis for inferring very large phylogenies with neighbor-joining.
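For readers who want the algorithm under discussion in front of them, here is a minimal sketch of the classic Saitou and Nei neighbor-joining procedure, using the textbook Q-criterion, branch-length and distance-update formulas. It is an unoptimized O(n³) version, not the quadratic-time fast neighbor-joining variant mentioned above, and the function name `neighbor_joining` and its return format are illustrative choices of ours. The example matrix is an exact additive metric for the tree ((a:2,b:3):3,c:4,(d:2,e:1):2).

```python
from typing import Dict, List, Tuple

Join = Tuple[str, str, float, float]  # (child_a, child_b, length_a, length_b)


def neighbor_joining(names: List[str], dist: List[List[float]]) -> Tuple[List[Join], Tuple[str, str, float]]:
    """Textbook neighbor joining on a symmetric, zero-diagonal distance matrix.

    Returns the cherries joined, in order, with their branch lengths to the new
    internal node, plus the final edge connecting the last two nodes.
    """
    # Store distances by node label so new internal nodes can be added easily.
    d: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            d[(a, b)] = float(dist[i][j])

    active = list(names)
    joins: List[Join] = []
    next_id = 1
    while len(active) > 2:
        n = len(active)
        # Row sums r(a) = sum of d(a, b) over the currently active nodes.
        r = {a: sum(d[(a, b)] for b in active) for a in active}
        # Q-criterion: choose the pair minimizing (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = active[i], active[j]
                q = (n - 2) * d[(a, b)] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to the new internal node u.
        len_a = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[(a, b)] - len_a
        u = f"u{next_id}"
        next_id += 1
        # Distances from u to every remaining active node.
        d[(u, u)] = 0.0
        for c in active:
            if c not in (a, b):
                d[(u, c)] = d[(c, u)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        active = [c for c in active if c not in (a, b)] + [u]
        joins.append((a, b, len_a, len_b))

    x, y = active
    return joins, (x, y, d[(x, y)])


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, last = neighbor_joining(names, dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

On exact additive input like this, the recovered branch lengths sum to the true tree length; on noisy distances the sketch still runs but may return negative branch lengths, which practical implementations usually clamp.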
Neighbor joining algorithms for inferring phylogenies via LCA-distances
Journal of Computational Biology, 2006
Cited by 4 (4 self)
Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of edge-weighted trees by distances to LCAs (Least Common Ancestors). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighbor-joining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou & Nei’s NJ. One member of this family is used to provide a new simple version of the 3-approximation algorithm for the closest additive metric under the ℓ∞ norm. A byproduct of our work is a novel technique which yields a time-optimal O(n²) implementation of common clustering algorithms such as UPGMA.
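To make the LCA-distance idea concrete, here is a small sketch under one stated assumption: that the LCA-distance of taxa i and j relative to a chosen root taxon r is the length of the path from r to their least common ancestor. For an exact tree (additive) metric d this length equals (d(r,i) + d(r,j) - d(i,j)) / 2, and because in a rooted tree two of the three pairwise LCAs of any three taxa coincide, the two smallest LCA-distances in every triple must be equal, which is the ultrametric-like structure the abstract exploits. The names `to_matrix`, `lca_distances` and `two_smallest_equal` are hypothetical, and this is not the authors' neighbor-joining criterion.

```python
from itertools import combinations
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]


def to_matrix(names: List[str], rows: List[List[float]]) -> Matrix:
    """Convert a square list-of-lists distance table into a labelled dict."""
    return {a: {b: float(rows[i][j]) for j, b in enumerate(names)} for i, a in enumerate(names)}


def lca_distances(dist: Matrix, root: str) -> Matrix:
    """Distance from `root` to the LCA of each pair of non-root taxa.

    Assumes `dist` is an additive (tree) metric; on noisy input the values are
    only estimates and need not satisfy the tree conditions.
    """
    taxa = [t for t in dist if t != root]
    return {i: {j: (dist[root][i] + dist[root][j] - dist[i][j]) / 2 for j in taxa} for i in taxa}


def two_smallest_equal(lca: Matrix, tol: float = 1e-9) -> bool:
    """Check that in every triple of taxa the two smallest LCA-distances coincide."""
    for i, j, k in combinations(list(lca), 3):
        lo, mid, _ = sorted((lca[i][j], lca[i][k], lca[j][k]))
        if abs(lo - mid) > tol:
            return False
    return True


# Additive distances of the tree ((a:2,b:3):3,c:4,(d:2,e:1):2).
names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = to_matrix(names, rows)
lca = lca_distances(dist, root="c")
print(lca["a"]["b"], lca["a"]["d"], lca["d"]["e"])  # 7.0 4.0 6.0
print(two_smallest_equal(lca))  # True
```

Choosing a different root taxon changes the individual values but, for exact tree metrics, not the property being checked; with noisy input the result can depend on the root, which is the "pivotal" behavior described in the related abstracts.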
Auto-validating von Neumann rejection sampling from small
2009
Cited by 4 (3 self)
Background: In phylogenetic inference one is interested in obtaining samples from the posterior distribution over the tree space on the basis of some observed DNA sequence data. One of the simplest sampling methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler, via interval analysis, to rigorously draw samples from posterior distributions over small phylogenetic tree spaces. Results: The posterior samples from the auto-validating sampler are used to rigorously (i) estimate posterior probabilities for different rooted topologies based on mitochondrial DNA from human, chimpanzee and gorilla, (ii) conduct a nonparametric test of rate variation between protein-coding and tRNA-coding sites from three primates and (iii) obtain a posterior estimate of the human-Neanderthal divergence time. Conclusion: This solves the open problem of rigorously drawing independent and identically distributed samples from the posterior distribution over rooted and unrooted small tree spaces (3 or 4 taxa) based on any multiply-aligned sequence data.
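The basic von Neumann rejection sampler that this work makes auto-validating can be sketched as follows. Here the envelope constant `bound` on the unnormalized target is simply supplied by hand and only spot-checked at the proposed points; as the abstract describes it, the paper's contribution is to obtain such bounds rigorously via interval analysis, which this sketch does not attempt. The function name, its parameters and the one-dimensional uniform proposal are illustrative assumptions.

```python
import random
from typing import Callable, List


def rejection_sample(
    target: Callable[[float], float],
    lo: float,
    hi: float,
    bound: float,
    n: int,
    rng: random.Random,
) -> List[float]:
    """Draw n i.i.d. samples from the density proportional to `target` on [lo, hi].

    Uses a uniform proposal on [lo, hi]. `bound` is assumed to satisfy
    target(x) <= bound on the whole interval; that is only checked at the
    points that happen to be proposed, not guaranteed.
    """
    samples: List[float] = []
    while len(samples) < n:
        x = rng.uniform(lo, hi)
        fx = target(x)
        if fx > bound:
            raise ValueError(f"envelope violated at x={x}: {fx} > {bound}")
        # Accept x with probability target(x) / bound.
        if rng.uniform(0.0, bound) <= fx:
            samples.append(x)
    return samples


# Unnormalized target f(x) = x on [0, 1]; the normalized density is 2x, with mean 2/3.
rng = random.Random(0)
draws = rejection_sample(lambda x: x, 0.0, 1.0, bound=1.0, n=20000, rng=rng)
print(sum(draws) / len(draws))  # should be close to 2/3
```

Note that the normalizing constant of the target is never needed, only an upper bound; the expected acceptance rate is the target's integral divided by bound × (hi - lo), here 1/2, so a loose bound costs efficiency rather than correctness.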
Can biology lead to new theorems
Annual report of the Clay Mathematics Institute, 2005
Pivotal neighbor joining algorithms for inferring phylogenies via LCA-distances
2006
Cited by 1 (1 self)
Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of weighted trees by distances to LCAs (Least Common Ancestors). This characterization combines the theoretical advantages of ultrametrics with the practical advantages of neighbor joining algorithms. A simple and natural neighbor joining criterion based on the above characterization is used to provide a family of consistent neighbor-joining algorithms which are simpler and more efficient than Saitou & Nei’s NJ. A large subclass of this family is shown to be optimal under Atteson’s robustness criterion for reconstruction of ‘sufficiently long’ edges; in this respect it outperforms NJ. A specific algorithm in this subclass is shown to provide a simpler version of the known 3-approximation algorithm of an arbitrary metric by an additive metric. Our neighbor joining algorithms are pivotal, in the sense that when the input is not consistent with some tree, the output tree may depend on a root-taxon selected by the algorithm. We present experimental results for two variants of our algorithm on simulated data generated according to a well-accepted evolutionary model. These experiments indicate that for the right selection of the root, the tree returned by our algorithm is likely to be topologically closer to the true tree than the one returned by NJ. The experimental results also indicate that selecting a root-taxon closer to the origin of evolution is likely to produce a more accurate tree. An interesting phenomenon demonstrated by our results is that in this evolutionary model, trees which best approximate the input distances are usually not the trees which best approximate the correct topology.
An Auto-validating Rejection Sampler
Cited by 1 (1 self)
Summary. In Bayesian statistical inference and computationally intensive frequentist inference, one is interested in obtaining samples from a high-dimensional, and possibly multimodal, target density. The challenge is to obtain samples from this target without any knowledge of the normalizing constant. Several approaches to this problem rely on Monte Carlo methods. One of the simplest such methods is the rejection sampler due to von Neumann. Here we introduce an auto-validating version of the rejection sampler via interval analysis. We show that our rejection sampler does provide us with independent samples from a large class of target densities in a guaranteed manner. We illustrate the efficiency of the sampler by theory and by examples in up to 10 dimensions. Our sampler is immune to the ‘pathologies’ of some infamous densities including the witch’s hat and can rigorously draw samples from piecewise Euclidean spaces of small phylogenetic trees.
Recognizing treelike k-dissimilarities
2011
Cited by 1 (1 self)
A k-dissimilarity D on a finite set X, |X| ≥ k, is a map from the set of size-k subsets of X to the real numbers. Such maps naturally arise from edge-weighted trees T with leaf-set X: given a subset Y of X of size k, D(Y) is defined to be the total length of the smallest subtree of T with leaf-set Y. In case k = 2, it is well known that 2-dissimilarities arising in this way can be characterized by the so-called “4-point condition”. However, in case k > 2, Pachter and Speyer recently posed the following question: given an arbitrary k-dissimilarity, how do we test whether this map comes from a tree? In this paper, we provide an answer to this question, showing that for k ≥ 3 a k-dissimilarity on a set X arises from a tree if and only if its restriction to every 2k-element subset of X arises from some tree, and that 2k is the least possible subset size to ensure that this is the case. As a corollary, we show that there exists a polynomial-time algorithm to determine when a k-dissimilarity arises from a tree. We also give a 6-point condition for determining when a 3-dissimilarity arises from a tree, that is similar to the aforementioned 4-point condition.
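For the k = 2 case, the classical 4-point condition can be checked by brute force: for all taxa w, x, y, z (not necessarily distinct), the maximum of d(w,x)+d(y,z), d(w,y)+d(x,z) and d(w,z)+d(x,y) must be attained at least twice. The sketch below assumes a symmetric matrix with zero diagonal and checks one ordering per multiset of four indices, which suffices because permuting the four taxa only permutes the three sums. It illustrates the k = 2 criterion only; it does not implement the paper's 2k-point or 6-point tests for k ≥ 3, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List


def satisfies_four_point(d: List[List[float]], tol: float = 1e-9) -> bool:
    """Brute-force O(n^4) 4-point condition check on a symmetric, zero-diagonal matrix."""
    n = len(d)
    for w, x, y, z in combinations_with_replacement(range(n), 4):
        sums = sorted((d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]))
        # The largest of the three pairing sums must be attained at least twice.
        if sums[2] - sums[1] > tol:
            return False
    return True


# Additive distances of the tree ((a:2,b:3):3,c:4,(d:2,e:1):2), taxa a..e as 0..4.
tree_rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
bad_rows = [row[:] for row in tree_rows]
bad_rows[0][2] = bad_rows[2][0] = 11  # perturb d(a, c); no longer a tree metric

print(satisfies_four_point(tree_rows))  # True
print(satisfies_four_point(bad_rows))  # False
```

Allowing repeated indices makes the check also cover the triangle inequality (take w = x), so a matrix passing it is a tree metric.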
On The Hardness of Inferring Phylogenies from Triplet-Dissimilarities
2007
Cited by 1 (0 self)
This work considers the problem of reconstructing a phylogenetic tree from triplet dissimilarities, which are dissimilarities defined over taxon-triplets. Triplet dissimilarities are possibly the simplest generalization of pairwise dissimilarities, and have been used for phylogenetic reconstructions in the past few years. We study the hardness of finding a tree best fitting a given triplet-dissimilarity table under the ℓ∞ norm. We show that the corresponding decision problem is NP-hard and that the corresponding optimization problem cannot be approximated in polynomial time within a constant multiplicative factor smaller than 1.4. On the positive side, we present a polynomial-time constant-rate approximation algorithm for this problem. We also address the issue of best-fit under maximal distortion, which corresponds to the largest ratio between matching entries in two triplet-dissimilarity tables. We show that it is NP-hard to approximate the corresponding optimization problem within any constant multiplicative factor.
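To illustrate the objects involved, the sketch below builds a triplet-dissimilarity table from a tree metric and compares two tables. It relies on the standard fact that the smallest subtree spanning three leaves of an edge-weighted tree has total length (d(i,j) + d(i,k) + d(j,k)) / 2. The ℓ∞ distance is the largest absolute difference between matching entries; for maximal distortion we assume the ratio of each matching pair is taken in whichever direction is at least 1, which may differ in detail from the paper's definition. All function names are hypothetical, and none of this is the paper's approximation algorithm.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Triplet = Tuple[int, int, int]
Table = Dict[Triplet, float]


def triplets_from_tree_metric(d: List[List[float]]) -> Table:
    """D(i, j, k) = length of the smallest subtree spanning leaves i, j, k."""
    return {
        (i, j, k): (d[i][j] + d[i][k] + d[j][k]) / 2
        for i, j, k in combinations(range(len(d)), 3)
    }


def l_inf(t1: Table, t2: Table) -> float:
    """Largest absolute difference between matching entries."""
    return max(abs(t1[key] - t2[key]) for key in t1)


def max_distortion(t1: Table, t2: Table) -> float:
    """Largest ratio between matching (positive) entries, oriented to be >= 1."""
    return max(max(t1[key] / t2[key], t2[key] / t1[key]) for key in t1)


# Additive distances of the tree ((a:2,b:3):3,c:4,(d:2,e:1):2), taxa a..e as 0..4.
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
table = triplets_from_tree_metric(rows)
print(table[(0, 1, 2)])  # 12.0 (taxa a, b, c)
doubled = {key: 2 * value for key, value in table.items()}
print(l_inf(table, doubled) == max(table.values()))  # True
print(max_distortion(table, doubled))  # 2.0
```

The doubled table shows how the two measures differ: uniform scaling leaves the distortion at a constant 2 regardless of size, while the ℓ∞ gap grows with the largest entry.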
Inferring Phylogenies from LCA-Distances
2006
Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology, from both practical and theoretical considerations. We study a representation of trees by distances to LCAs (Least Common Ancestors), which seems to combine the theoretical advantages of the Farris transform of additive distances to ultrametrics with the practical advantages and flexibility of neighbor joining algorithms. We present a characterization of edge-weighted trees using LCA distances. The approach we study preserves many nice properties of ultrametrics, but it bypasses the need to transform additive distances to ultrametric distances (via the Farris transform). This consequently provides a neighbor joining criterion, and a family of algorithms, which are simpler and more efficient than Saitou & Nei’s NJ. One variant of this family is shown to find the unique dominant LCA-matrix for any nonnegative symmetric matrix, and is then used to provide a simpler version of the known 3-approximation of an arbitrary metric by an additive metric. Our family of algorithms, unlike NJ, is pivotal, in the sense that when the input is not consistent with some tree, the output tree may depend on a root-taxon selected by the algorithm. We present experimental results on data generated according to a well-accepted evolutionary model. These experiments indicate that for the right selection of the root, the tree returned by our algorithm is likely to be topologically closer to the true tree than the one returned by NJ. The experimental results also indicate that selecting a root-taxon with smaller evolutionary distance is likely to produce a more accurate tree. A somewhat surprising phenomenon demonstrated by our results is that in this evolutionary model, trees which best approximate the input distances are usually not the trees which best approximate the correct topology.