Efficient Continuous-Time Markov Chain Estimation
Abstract

Cited by 3 (0 self)
Many problems of practical interest rely on continuous-time Markov chains (CTMCs) defined over combinatorial state spaces, rendering the computation of transition probabilities, and hence probabilistic inference, difficult or impossible with existing methods. For problems with countably infinite state spaces, where classical methods such as matrix exponentiation are not applicable, the main alternative has been particle Markov chain Monte Carlo methods that impute both the holding times and the sequences of visited states. We propose a particle-based Monte Carlo approach in which the holding times are marginalized analytically. We demonstrate that in a range of realistic inferential setups, our scheme dramatically reduces the variance of the Monte Carlo approximation and yields more accurate parameter posterior approximations for a fixed computational budget. These experiments are performed on both synthetic and real datasets, drawing on two important examples of CTMCs with combinatorial state spaces: string-valued mutation models in phylogenetics and nucleic acid folding pathways.
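The quantity that makes marginalizing holding times possible can be illustrated with a standard construction (a sketch, not necessarily the one used in the paper): for a fixed sequence of visited states, the probability of making exactly those jumps in [0, T] is the product of the embedded jump-chain probabilities times one entry of the matrix exponential of a small bidiagonal generator. The `rates(x)` callback returning `{next_state: rate}` is a hypothetical interface, and the hand-rolled matrix exponential only keeps the example dependency-free (`scipy.linalg.expm` would serve the same purpose).

```python
import math


def _matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _expm(a, terms=30):
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    # Scale by 2**s so the matrix norm is at most 1/2 before expanding.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in _matmul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    # Undo the scaling: exp(A) = exp(A / 2**s) ** (2**s).
    for _ in range(s):
        result = _matmul(result, result)
    return result


def marginal_path_probability(path, rates, T):
    """P(a CTMC started in path[0] visits exactly path[0], ..., path[n],
    in that order, during [0, T]), with all holding times integrated out.

    `rates(x)` is a hypothetical callback returning {next_state: rate}.
    """
    n = len(path) - 1
    exit_rates = [sum(rates(x).values()) for x in path]
    # Embedded jump chain: probability of choosing each successive state.
    jump_prob = 1.0
    for i in range(n):
        q = rates(path[i]).get(path[i + 1], 0.0)
        if q == 0.0:
            return 0.0
        jump_prob *= q / exit_rates[i]
    # Bidiagonal generator: phase i moves to phase i + 1 at rate exit_rates[i];
    # leaking out of the last phase means an extra jump happened before T.
    gen = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        gen[i][i] = -exit_rates[i]
        if i < n:
            gen[i][i + 1] = exit_rates[i]
    # Entry (0, n): exactly n jumps completed and no (n + 1)-th jump by time T.
    timing_prob = _expm([[T * x for x in row] for row in gen])[0][n]
    return jump_prob * timing_prob


def poisson(x):
    """A Poisson process with rate 1.5, written as the CTMC x -> x + 1."""
    return {x + 1: 1.5}


p = marginal_path_probability([0, 1, 2, 3], poisson, T=2.0)
# p agrees with the Poisson pmf exp(-3) * 3**3 / 3! up to rounding.
```

Only the state sequence enters the function; the holding times never appear, which is the kind of variance reduction the abstract attributes to analytic marginalization. For very long paths the (n + 1) x (n + 1) exponential becomes the bottleneck, so a practical implementation would exploit the bidiagonal structure.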
A Note on Probabilistic Models over Strings: the Linear Algebra Approach
2013
Abstract
Probabilistic models over strings have played a key role in developing methods that allow indels to be treated as phylogenetically informative events. There is an extensive literature on using automata and transducers on phylogenies to perform inference under these probabilistic models; an important theoretical question in this field is the complexity of computing the normalization constant of a class of string-valued graphical models. This question has been investigated using tools from combinatorics, dynamic programming, and graph theory, and has practical applications in Bayesian phylogenetics. In this work, we revisit this theoretical question from a different point of view, based on linear algebra. The main contribution is a new proof of a known result on the complexity of inference under TKF91, a well-known probabilistic model over strings. Our proof relies on classical results from linear algebra and is, in some cases, easier to extend to other models. The proof technique also has consequences for the implementation and complexity of inference algorithms.
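A classical identity shows how linear algebra turns a sum over infinitely many strings into a finite computation (this is a generic fact about weighted finite automata, not the paper's TKF91 proof): if each symbol s has a weight matrix M_s, then summing alpha^T M_{s1} ... M_{sL} beta over all strings equals alpha^T (I - M)^{-1} beta with M = sum_s M_s, provided the spectral radius of M is below 1. The dictionary-of-matrices encoding and the toy weights below are hypothetical choices for illustration.

```python
from itertools import product


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def normalizer(alpha, transitions, beta):
    """Sum of alpha^T M_{s1} ... M_{sL} beta over all strings (L >= 0),
    computed as alpha^T (I - M)^{-1} beta with M = sum of the M_s."""
    k = len(alpha)
    m_total = [[sum(mat[i][j] for mat in transitions.values()) for j in range(k)]
               for i in range(k)]
    i_minus_m = [[float(i == j) - m_total[i][j] for j in range(k)]
                 for i in range(k)]
    x = _solve(i_minus_m, beta)
    return sum(a_i * x_i for a_i, x_i in zip(alpha, x))


def brute_force(alpha, transitions, beta, max_len):
    """The same sum by explicit enumeration, truncated at length max_len."""
    total = 0.0
    for length in range(max_len + 1):
        for word in product(transitions, repeat=length):
            v = list(alpha)
            for s in word:
                mat = transitions[s]
                v = [sum(v[i] * mat[i][j] for i in range(len(v)))
                     for j in range(len(v))]
            total += sum(v_j * b_j for v_j, b_j in zip(v, beta))
    return total


# Toy two-state automaton over the alphabet {a, b} (hypothetical weights).
toy = {"a": [[0.2, 0.1], [0.0, 0.3]], "b": [[0.1, 0.1], [0.2, 0.1]]}
alpha, beta = [1.0, 0.0], [0.5, 0.3]
z = normalizer(alpha, toy, beta)                    # exactly 0.36 / 0.38
approx = brute_force(alpha, toy, beta, max_len=12)  # slightly below z

# One state, continuation weight r: geometric string lengths.
r = 0.6
geo = {s: [[r * 0.25]] for s in "ACGT"}
z_geo = normalizer([1.0], geo, [1.0 - r])           # 1.0 up to rounding
```

The single-state case produces a geometric length distribution, which is the form of TKF91's equilibrium sequence-length distribution (with r playing the role of the birth-to-death rate ratio); a single linear solve replaces the infinite sum that the brute-force version can only truncate.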
Simultaneous Bayesian Estimation of Alignment and Phylogeny under a Joint Model of Protein Sequence and Structure (Fast Track article)
Abstract
For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence–structure model. We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences. We use the model to investigate the order of divergence of cytoglobins, myoglobins, and hemoglobins and observe a stabilization of phylogenetic inference: although a sequence-based inference assigns significant posterior probability to several different topologies, the structural model strongly favors one of these over the others and is more robust to the choice of data set. Key words: structural alignment, Bayesian phylogenetics, statistical alignment, globin evolution, stochastic processes.
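Analytic integration over unobserved ancestors is easiest to see in a simpler setting than the paper's structural model. The sketch below assumes, purely for illustration, a single continuous coordinate evolving by Brownian motion with variance rate `sigma2` along each branch; under that assumption Felsenstein-style pruning collapses every internal node into a Gaussian message, so ancestral values are integrated out exactly. The nested-tuple tree encoding and all function names are hypothetical, and this is not the joint sequence–structure model described above.

```python
import math


def _log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def prune(node, sigma2):
    """Pruning for one continuous trait under Brownian motion.

    A node is ("leaf", value) or ("node", [(child, branch_length), ...]).
    Returns (mean, var, loglik): the subtree's leaves equal a Gaussian
    message N(mean, var) about this node's value times exp(loglik), with
    all interior values below the node integrated out.
    """
    kind, payload = node
    if kind == "leaf":
        return payload, 0.0, 0.0
    mean, var, loglik = None, None, 0.0
    for child, t in payload:
        m_c, v_c, ll_c = prune(child, sigma2)
        w_c = v_c + sigma2 * t  # uncertainty grows along the branch
        loglik += ll_c
        if mean is None:
            mean, var = m_c, w_c
        else:
            # N(r; m1, w1) N(r; m2, w2) = N(m1 - m2; 0, w1 + w2) N(r; m, w)
            loglik += _log_normal(mean, m_c, var + w_c)
            mean, var = ((mean * w_c + m_c * var) / (var + w_c),
                         var * w_c / (var + w_c))
    return mean, var, loglik


def tree_loglik(root, sigma2, prior_mean, prior_var):
    """Leaf log-likelihood with the root integrated against N(prior_mean, prior_var)."""
    mean, var, loglik = prune(root, sigma2)
    return loglik + _log_normal(mean, prior_mean, var + prior_var)


# Two leaves hanging off the root with branch lengths 0.4 and 1.1.
tree = ("node", [(("leaf", 0.3), 0.4), (("leaf", -0.5), 1.1)])
ll = tree_loglik(tree, sigma2=2.0, prior_mean=0.1, prior_var=0.5)
```

Each internal node costs a constant amount of work, so the whole tree is handled in linear time without ever sampling ancestral values; the result matches the multivariate normal density of the leaves implied by the same Brownian-motion assumption.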
Supplementary Document
Abstract
This Supplement contains the proofs and pseudocode for the methods referenced in the Methodology section of the paper, as well as supplemental figures and tables for the Numerical Examples section. The proofs establish the basic properties of our CTMC approach and provide rigorous justification for the algorithms presented in the paper. The pseudocode outlines the steps of the different methods used in the manuscript. The supplemental figures and tables provide further evidence of the practical applicability of our method in both the phylogenetic and RNA settings.