A few logs suffice to build (almost) all trees (I). Random Structures and Algorithms (1999)

by P. L. Erdős, L. A. Székely, M. A. Steel, T. J. Warnow
Results 1 - 10 of 121

Neighbor-Net: An agglomerative method for the construction of phylogenetic networks

by David Bryant, Vincent Moulton , 2003
"... ..."
Abstract - Cited by 317 (9 self) - Add to MetaCart
Abstract not found

Maximum likelihood of evolutionary trees: hardness and approximation

by Benny Chor, Tamir Tuller , 2005
"... ..."
Abstract - Cited by 64 (5 self) - Add to MetaCart
Abstract not found

Reconstruction of Markov random fields from samples: Some easy observations and algorithms

by Guy Bresler, Elchanan Mossel, Allan Sly , 2008
"... Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the und ..."
Abstract - Cited by 54 (4 self) - Add to MetaCart
Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on n nodes and maximum degree d given observations. We show that under mild non-degeneracy conditions it reconstructs the generating graph with high probability using Θ(d log n) samples, which is optimal up to a multiplicative constant. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide an explicit O(n^(d+2) log n) running time bound. In cases where the measure on the graph has correlation decay, the running time is O(n² log n) for all fixed d. We also discuss the effect of observing noisy samples and show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies non-identifiability even for generic noise and interactions. Finally, we briefly show that in some cases, models with hidden nodes can also be recovered.
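The abstract describes the approach only at a high level. The following is a minimal sketch (not the authors' code) of what a brute-force, conditional-independence-based neighborhood search of this flavor can look like for binary data; the function names, the threshold tau, and the choice of empirical conditional mutual information as the test statistic are all illustrative assumptions.

# Minimal sketch: estimate the neighborhood of node i by the smallest set S
# (|S| <= d) that screens i from every other node. Data is assumed 0/1 valued.
from itertools import combinations
import numpy as np

def cond_mutual_info(x, y, z_cols, eps=1e-12):
    # Empirical conditional mutual information I(X; Y | Z) for binary data.
    keys = [tuple(row) for row in z_cols] if z_cols.size else [()] * len(x)
    total = len(x)
    cmi = 0.0
    for key in set(keys):
        mask = np.array([k == key for k in keys])
        counts = np.zeros((2, 2))
        for a, b in zip(x[mask], y[mask]):
            counts[a, b] += 1
        pz = mask.sum() / total
        pxy = counts / max(mask.sum(), 1)
        px = pxy.sum(axis=1)
        py = pxy.sum(axis=0)
        for a in (0, 1):
            for b in (0, 1):
                if pxy[a, b] > 0:
                    cmi += pz * pxy[a, b] * np.log(pxy[a, b] / (px[a] * py[b] + eps))
    return cmi

def estimate_neighborhood(samples, i, d, tau=0.05):
    # Return the smallest candidate set S (|S| <= d) such that
    # I(X_i; X_j | X_S) is small for all j outside S.
    n = samples.shape[1]
    others = [j for j in range(n) if j != i]
    for size in range(d + 1):
        for S in combinations(others, size):
            z = samples[:, list(S)]
            if all(cond_mutual_info(samples[:, i], samples[:, j], z) < tau
                   for j in others if j not in S):
                return set(S)
    return set(others)  # fallback: no screening set of size <= d was found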

Citation Context

...re are hidden models. For trees, given data that is generated from the model, the tree can be reconstructed efficiently from samples at a subset of the nodes given mild non-degeneracy conditions. See [12, 13, 11] for some of the most recent and tightest results in this setup. The most closely related works are [3] and [5]. These can be compared in terms of sampling complexity, running time as well as the gene...

Constructing a Tree from Homeomorphic Subtrees, with Applications to Computational Evolutionary Biology

by Monika Rauch Henzinger, Valerie King, Tandy Warnow
"... We are given a set T = fT1 ; T2 ; : : : ; Tkg of rooted binary trees, each T i leaf-labeled by a subset L(T i ) ae f1; 2; : : : ; ng. If T is a tree on f1; 2; : : : ; ng, we let TjL denote the minimal subtree of T induced by the nodes of L and all their ancestors. The consensus tree problem asks wh ..."
Abstract - Cited by 47 (2 self) - Add to MetaCart
We are given a set T = {T1, T2, ..., Tk} of rooted binary trees, each Ti leaf-labeled by a subset L(Ti) ⊆ {1, 2, ..., n}. If T is a tree on {1, 2, ..., n}, we let T|L denote the minimal subtree of T induced by the nodes of L and all their ancestors. The consensus tree problem asks whether there exists a tree T such that for every i, T|L(Ti) is homeomorphic to Ti. We present algorithms which test whether a given set of trees has a consensus tree and, if so, construct one. The deterministic algorithm takes time min{O(N·n^(1/2)), O(N + n² log n)}, where N = Σi |Ti|, and uses linear space. The randomized algorithm takes time O(N log³ n) and uses linear space. The previous best for this problem was a 1981 O(Nn) algorithm by Aho et al. Our faster deterministic algorithm uses a new efficient algorithm for the following interesting dynamic graph problem: given a graph G with n nodes and m edges and a sequence of b batches of one or more edge deletions, then a...
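Since the abstract contrasts its algorithms with the 1981 Aho et al. method, a compact sketch of that classical BUILD-style procedure on rooted triples may help fix ideas. This is not the faster deterministic or randomized algorithm of the paper above, and all names are illustrative.

# Minimal sketch of Aho et al.'s BUILD on rooted triples ((a, b), c), meaning
# a and b are closer to each other than to c. Returns a nested-tuple tree,
# or None if the triples are incompatible (no consensus tree exists).
def build(leaves, triples):
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    if len(leaves) == 2:
        return tuple(sorted(leaves))
    # Union-find over leaves, merging a and b for every in-scope triple.
    parent = {x: x for x in leaves}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (a, b), c in triples:
        if a in leaves and b in leaves and c in leaves:
            parent[find(a)] = find(b)
    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) < 2:
        return None  # incompatible triples
    subtrees = []
    for comp in components.values():
        sub = build(comp, [t for t in triples
                           if {t[0][0], t[0][1], t[1]} <= comp])
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

# Example: triples coming from two input trees on {1, 2, 3, 4}.
print(build({1, 2, 3, 4}, [((1, 2), 3), ((3, 4), 1)]))  # ((1, 2), (3, 4)) up to order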

Citation Context

... to reconstruct trees is the observation that very large data sets may simply be hard to analyze using existing approaches, even if the underlying optimization problems were to be solved exactly (see [13], [29], and [5]). One proposal [25] for handling these difficulties is to separate the data set into overlapping subsets, each of which is amenable to analysis, and then to combine the subtrees that r...

Learning Nonsingular Phylogenies and Hidden Markov Models

by Elchanan Mossel, Sébastien Roch - Proceedings of the thirty-seventh annual ACM Symposium on Theory of Computing (STOC '05), Baltimore , 2005
"... In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transtion matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov m ..."
Abstract - Cited by 45 (7 self) - Add to MetaCart
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
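As a small, hedged illustration of the nonsingularity condition quoted above (not code from the paper), one can check that every edge's transition matrix has a determinant bounded away from 0 and 1; the threshold delta below is an arbitrary stand-in.

import numpy as np

def is_nonsingular(transition_matrices, delta=0.05):
    # Each matrix is row-stochastic; require delta <= |det(M)| <= 1 - delta.
    return all(delta <= abs(np.linalg.det(M)) <= 1 - delta
               for M in transition_matrices)

# Example: a CFN-style edge with mutation probability p has det = 1 - 2p.
p = 0.1
M = np.array([[1 - p, p], [p, 1 - p]])
print(is_nonsingular([M]))  # True for delta = 0.05, since det = 0.8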

Learning Latent Tree Graphical Models

by Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, Alan S. Willsky - J. of Machine Learning Research , 2011
"... We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing me ..."
Abstract - Cited by 44 (12 self) - Add to MetaCart
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare
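To make the information-distance idea above concrete, here is a hedged sketch (not the paper's implementation) of a Gaussian information distance together with the constancy test that underlies sibling identification in recursive grouping: by tree additivity, when two leaves i and j share a parent, d(i,k) - d(j,k) takes the same value for every other node k. The tolerance and names below are assumptions.

import numpy as np

def information_distances(samples):
    # samples: (num_samples, num_vars) array of jointly Gaussian observations.
    # For Gaussians, d(i, j) = -log |corr(i, j)| is additive on the latent tree.
    corr = np.corrcoef(samples, rowvar=False)
    return -np.log(np.abs(corr) + 1e-12)

def looks_like_siblings(d, i, j, tol=0.1):
    # Check whether d[i, k] - d[j, k] is (nearly) constant over all k != i, j.
    others = [k for k in range(d.shape[0]) if k not in (i, j)]
    if not others:
        return True
    diffs = [d[i, k] - d[j, k] for k in others]
    return max(diffs) - min(diffs) < tol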

Citation Context

...le and the unknown phylogenetic tree is to be inferred from these sequences. See Durbin et al. (1999) for a thorough overview. Efficient algorithms with provable performance guarantees are available (Erdős et al., 1999; Daskalakis et al., 2006). However, the works in this area mostly assume that only the leaves are observed and each internal node (which is hidden) has the same degree except for the root. The most p...

Optimal phylogenetic reconstruction

by Constantinos Daskalakis, Elchanan Mossel, Sébastien Roch - In STOC ’06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing , 2006
"... One of the major tasks of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. This problem is of critical importance in almost all areas of biology and has a very clear mathematical formulation. The evolutionary model is given by a Markov chain on the true evolution ..."
Abstract - Cited by 43 (9 self) - Add to MetaCart
One of the major tasks of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. This problem is of critical importance in almost all areas of biology and has a very clear mathematical formulation. The evolutionary model is given by a Markov chain on the true evolutionary tree. Given samples from this Markov chain at the leaves of the tree, the goal is to reconstruct the evolutionary tree. It is crucial to minimize the number of samples, i.e., the length of genetic sequences, as it is constrained by the underlying biology, the price of sequencing, etc. It is well known that in order to reconstruct a tree on n leaves, sequences of length Ω(log n) are needed. It was conjectured by M. Steel that for the CFN evolutionary model, if the mutation probability on all edges of the tree is less than p* = (√2 − 1)/2^(3/2), then the tree can be recovered from sequences of length O(log n). This was proven by the second author in the special case where the tree is "balanced". The second author also proved that if all edges have mutation probability larger than p* then the length needed is n^Ω(1). This "phase transition" in the number of samples needed is closely related to the phase transition for the reconstruction problem (or extremality of free measure) studied extensively in statistical physics and probability.
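For orientation, the quoted threshold is consistent with the usual CFN parametrization θ = 1 − 2p and the second-eigenvalue (Kesten-Stigum) condition 2θ² = 1: solving gives θ* = 1/√2 and hence p* = (1 − 1/√2)/2 = (√2 − 1)/2^(3/2) ≈ 0.146.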

Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining

by Katherine St. John, Tandy Warnow, Bernard M. E. Moret, Lisa Vawter , 2003
"... ..."
Abstract - Cited by 41 (9 self) - Add to MetaCart
Abstract not found

Evolutionary Trees can be Learned in Polynomial Time in the Two-State General Markov Model

by Mary Cryan, Leslie Ann Goldberg, Paul W. Goldberg - SIAM Journal on Computing , 1998
"... The j-State General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j . In particular, the TwoState General Markov Model of evolution generalises the well-known Cavender-FarrisNeyman model of evolution by removing the sy ..."
Abstract - Cited by 37 (2 self) - Add to MetaCart
The j-State General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j. In particular, the Two-State General Markov Model of evolution generalises the well-known Cavender-Farris-Neyman model of evolution by removing the symmetry restriction (which requires that the probability that a '0' turns into a '1' along an edge is the same as the probability that a '1' turns into a '0' along the edge). Farach and Kannan showed how to PAC-learn Markov Evolutionary Trees in the Cavender-Farris-Neyman model provided that the target tree satisfies the additional restriction that all pairs of leaves have a sufficiently high probability of being the same. We show how to remove both restrictions and thereby obtain the first polynomial-time PAC-learning algorithm (in the sense of Kearns et al.) for the general class of Two-State Markov Evolutionary Trees. Research Report RR347, Department of Computer Science, University of Wa...
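As a small illustration of the symmetry restriction mentioned above (an assumed sketch, not code from the paper): the CFN model uses one flip probability per edge, whereas the Two-State General Markov Model allows the two flip probabilities to differ.

import numpy as np

def cfn_edge(p):
    # Symmetric edge: P(0->1) = P(1->0) = p.
    return np.array([[1 - p, p], [p, 1 - p]])

def general_edge(p01, p10):
    # General two-state edge: the two flip probabilities may differ.
    return np.array([[1 - p01, p01], [p10, 1 - p10]])

def evolve(state, edge_matrix, rng):
    # Sample the child state given the parent state along one edge.
    return rng.choice(2, p=edge_matrix[state])

rng = np.random.default_rng(0)
print(evolve(0, general_edge(0.1, 0.3), rng))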

Citation Context

...) To avoid getting bogged down in detail, we work with a binary alphabet. Thus, we will consider Two-State Markov Evolutionary Trees. Following Farach and Kannan [9], Erdős, Steel, Székely and Warnow [7, 8] and ...
∗ This was previously Research Report RR347, Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom. A preliminary version of this paper appears in the proceeding...

Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. arXiv:0710.0262v2

by Elchanan Mossel, Sébastien Roch , 2007
"... We introduce a simple algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochast ..."
Abstract - Cited by 36 (0 self) - Add to MetaCart
We introduce a simple algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochastic assumptions, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that it can tolerate moderate estimation errors.