Results 1 - 10 of 121
Neighbor-Net: An agglomerative method for the construction of phylogenetic networks
, 2003
Reconstruction of Markov random fields from samples: Some easy observations and algorithms
, 2008
Cited by 54 (4 self)
Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on $n$ nodes and maximum degree $d$ given observations. We show that under mild non-degeneracy conditions it reconstructs the generating graph with high probability using $\Theta(d \log n)$ samples, which is optimal up to a multiplicative constant. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide an explicit $O(n^{d+2} \log n)$ running time bound. In cases where the measure on the graph has correlation decay, the running time is $O(n^2 \log n)$ for all fixed $d$. We also discuss the effect of observing noisy samples and show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies non-identifiability even for generic noise and interactions. Finally, we briefly show that in some cases, models with hidden nodes can also be recovered.
Constructing a Tree from Homeomorphic Subtrees, with Applications to Computational Evolutionary Biology
Cited by 47 (2 self)
We are given a set $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ of rooted binary trees, each $T_i$ leaf-labeled by a subset $L(T_i) \subset \{1, 2, \ldots, n\}$. If $T$ is a tree on $\{1, 2, \ldots, n\}$, we let $T|L$ denote the minimal subtree of $T$ induced by the nodes of $L$ and all their ancestors. The consensus tree problem asks whether there exists a tree $T$ such that for every $i$, $T|L(T_i)$ is homeomorphic to $T_i$. We present algorithms which test if a given set of trees has a consensus tree and if so, construct one. The deterministic algorithm takes time $\min\{O(N n^{1/2}), O(N + n^2 \log n)\}$, where $N = \sum_i |T_i|$, and uses linear space. The randomized algorithm takes time $O(N \log^3 n)$ and uses linear space. The previous best for this problem was a 1981 $O(Nn)$ algorithm by Aho et al. Our faster deterministic algorithm uses a new efficient algorithm for the following interesting dynamic graph problem: Given a graph $G$ with $n$ nodes and $m$ edges and a sequence of $b$ batches of one or more edge deletions, then a...
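To make the consensus tree problem concrete, here is a minimal Python sketch of the classical component-splitting idea usually attributed to Aho et al. It is not the faster deterministic or randomized algorithm announced in the abstract, and it makes no attempt to match any stated running time. It assumes, purely for illustration, that the input trees have already been broken into rooted triples `(a, b, c)`, read as "a and b are closer to each other than either is to c"; the function names `build` and `clusters` and the nested-tuple output are hypothetical choices.

```python
def build(leaves, triples):
    """Return a nested-tuple tree displaying every triple, or None if none exists.

    Assumptions for this sketch: leaves are hashable non-tuple labels, and each
    triple (a, b, c) means that a and b share an ancestor that excludes c.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Only triples whose three labels all survive in this subproblem matter.
    relevant = [t for t in triples if set(t) <= leaves]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)  # a and b must stay below the same child

    groups = {}
    for x in leaves:
        groups.setdefault(find(x), set()).add(x)
    if len(groups) == 1:
        return None  # everything is forced together: the triples conflict

    children = []
    for group in groups.values():
        subtree = build(group, relevant)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


def clusters(tree):
    """Return (leaf set of tree, set of leaf sets below each internal node)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    below, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = clusters(child)
        below |= child_leaves
        found |= child_found
    found.add(below)
    return below, found


tree = build({1, 2, 3, 4}, [(1, 2, 3), (3, 4, 1)])
print(clusters(tree)[1])                        # leaf sets of the internal nodes
print(build({1, 2, 3}, [(1, 2, 3), (1, 3, 2)]))  # conflicting triples
```

The child order inside the returned tuples depends on set iteration order, so `clusters` is used to compare trees by their leaf sets rather than by tuple layout.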
Learning Nonsingular Phylogenies and Hidden Markov Models
- Proceedings of the thirty-seventh annual ACM Symposium on Theory of computing, Baltimore (STOC05
, 2005
Cited by 45 (7 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
Learning Latent Tree Graphical Models
- J. of Machine Learning Research
, 2011
Cited by 44 (12 self)
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare
Optimal phylogenetic reconstruction
- In STOC ’06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
, 2006
Cited by 43 (9 self)
One of the major tasks of evolutionary biology is the reconstruction of phylogenetic trees from molecular data. This problem is of critical importance in almost all areas of biology and has a very clear mathematical formulation. The evolutionary model is given by a Markov chain on the true evolutionary tree. Given samples from this Markov chain at the leaves of the tree, the goal is to reconstruct the evolutionary tree. It is crucial to minimize the number of samples, i.e., the length of genetic sequences, as it is constrained by the underlying biology, the price of sequencing, etc. It is well known that in order to reconstruct a tree on $n$ leaves, sequences of length $\Omega(\log n)$ are needed. It was conjectured by M. Steel that for the CFN evolutionary model, if the mutation probability on all edges of the tree is less than $p^* = (\sqrt{2} - 1)/2^{3/2}$, then the tree can be recovered from sequences of length $O(\log n)$. This was proven by the second author in the special case where the tree is “balanced”. The second author also proved that if all edges have mutation probability larger than $p^*$ then the length needed is $n^{\Omega(1)}$. This “phase transition” in the number of samples needed is closely related to the phase transition for the reconstruction problem (or extremality of free measure) studied extensively in statistical physics and probability.
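As a quick numerical check on the threshold quoted above, the following Python sketch evaluates $p^*$ and verifies that it satisfies $2(1 - 2p^*)^2 = 1$. The quantity $\theta = 1 - 2p$ used here is the standard edge correlation of a symmetric two-state channel with flip probability $p$; relating $p^*$ to the identity $2\theta^2 = 1$ is an observation added for illustration and is not stated in the abstract itself.

```python
import math


def steel_threshold() -> float:
    """Mutation probability p* = (sqrt(2) - 1) / 2^(3/2) quoted in the abstract."""
    return (math.sqrt(2) - 1) / 2 ** 1.5


p_star = steel_threshold()

# Assumption for illustration: theta = 1 - 2p is the correlation between the
# two endpoints of an edge with symmetric flip probability p.
theta = 1 - 2 * p_star

print(round(p_star, 4))                   # 0.1464
print(math.isclose(2 * theta ** 2, 1.0))  # True
```

Algebraically, $(\sqrt{2} - 1)/2^{3/2} = \tfrac{1}{2}(1 - 1/\sqrt{2})$, so $\theta = 1/\sqrt{2}$ exactly at the threshold.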
Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining
, 2003
Evolutionary Trees can be Learned in Polynomial Time in the Two-State General Markov Model
- SIAM Journal on Computing
, 1998
Cited by 37 (2 self)
The j-State General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j. In particular, the Two-State General Markov Model of evolution generalises the well-known Cavender-Farris-Neyman model of evolution by removing the symmetry restriction (which requires that the probability that a '0' turns into a '1' along an edge is the same as the probability that a '1' turns into a '0' along the edge). Farach and Kannan showed how to PAC-learn Markov Evolutionary Trees in the Cavender-Farris-Neyman model provided that the target tree satisfies the additional restriction that all pairs of leaves have a sufficiently high probability of being the same. We show how to remove both restrictions and thereby obtain the first polynomial-time PAC-learning algorithm (in the sense of Kearns et al.) for the general class of Two-State Markov Evolutionary Trees. Research Report RR347, Department of Computer Science, University of Wa...
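The symmetry restriction described above can be illustrated with 2x2 edge transition matrices. The Python sketch below is only an illustration of the two model classes, not the learning algorithm from the paper; the helper names `edge_matrix`, `compose`, and `is_symmetric` and all numerical probabilities are hypothetical.

```python
def edge_matrix(p01, p10):
    """Row-stochastic 2x2 matrix: entry [i][j] is P(state i becomes state j)."""
    return [[1 - p01, p01], [p10, 1 - p10]]


def compose(first, second):
    """Transition matrix for traversing edge `first` and then edge `second`."""
    return [
        [sum(first[i][k] * second[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def is_symmetric(m, tol=1e-12):
    """True when P(0 -> 1) equals P(1 -> 0), as the CFN model requires."""
    return abs(m[0][1] - m[1][0]) <= tol


# Cavender-Farris-Neyman edges: both flip probabilities on an edge are equal.
cfn_path = compose(edge_matrix(0.1, 0.1), edge_matrix(0.2, 0.2))

# Two-state general Markov edges: the two flip probabilities may differ.
general_path = compose(edge_matrix(0.1, 0.3), edge_matrix(0.2, 0.4))

print(is_symmetric(cfn_path))      # True
print(is_symmetric(general_path))  # False
```

With these example values, a path of two symmetric edges is again symmetric, while the path of asymmetric edges is not.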
Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. arXiv:0710.0262v2
, 2007
Cited by 36 (0 self)
We introduce a simple algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochastic assumptions, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that it can tolerate moderate estimation errors.