Results 1-10 of 69
A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study
 Journal of Computational Biology
, 2001
Cited by 151 (15 self)
Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation into the other. Their algorithm (restricted to distance calculation) proceeds in two stages: in the first stage, the overlap graph induced by the permutation is decomposed into connected components; then, in the second stage, certain graph structures (hurdles and others) are identified. Berman and Hannenhalli avoided the explicit computation of the overlap graph and gave an O(n α(n)) algorithm, based on a Union-Find structure, to find its connected components, where α is the inverse Ackermann function. Since for all practical purposes α(n) is a constant no larger than four, this algorithm has been the fastest practical algorithm to date. In this paper, we present a new linear-time algorithm for computing the connected components, which is more efficient than that of Berman and Hannenhalli in both theory and practice. Our algorithm uses only a stack and is very easy to implement. We give the results of computational experiments over a large range of permutation pairs produced through simulated evolution; our experiments show a speedup by a factor of 2 to 5 in the computation of the connected components and by a factor of 1.3 to 2 in the overall distance computation.
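For readers unfamiliar with the Union-Find structure mentioned in the abstract above, the following is a minimal Python sketch of the generic data structure (union by rank with path compression), which is what gives near-linear bounds involving the inverse Ackermann function. It is not the Berman-Hannenhalli procedure, nor the stack-based method of the paper; the class and method names are hypothetical and chosen only for illustration.

```python
class UnionFind:
    """Generic disjoint-set forest (illustrative; names are hypothetical)."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height
        self.components = n           # current number of disjoint sets

    def find(self, x):
        # First pass: locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Merge the sets containing x and y; return False if already merged.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        self.components -= 1
        return True
```

Calling union once per edge of a graph and then reading the components counter gives the number of connected components without storing the graph itself.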
Multiple Genome Rearrangement and Breakpoint Phylogeny
, 1998
Cited by 103 (16 self)
Multiple alignment of macromolecular sequences generalizes from N = 2 to N ≥ 3 the comparison of N sequences which have diverged through the local processes of insertion, deletion and substitution. Gene-order sequences diverge through non-local genome rearrangement processes such as inversion (or reversal) and transposition. In this paper we show which formulations of multiple alignment have counterparts in multiple rearrangement. Based on difficulties inherent in rearrangement edit-distance calculation and interpretation, we argue for the simpler "breakpoint analysis". Consensus-based multiple rearrangement of N ≥ 3 orders can be solved exactly through reduction to instances of the Travelling Salesman Problem (TSP). We propose a branch-and-bound solution to TSP particularly suited to these instances. Simulations show how non-uniqueness of the solution is attenuated with increasing numbers of data genomes. Tree-based multiple alignment can be achieved to a great degree o...
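To make the breakpoint measure in the abstract above concrete, here is a minimal Python sketch that counts breakpoints between two signed gene orders. It assumes linear genomes over the same genes 1..n, framed by 0 and n+1, and treats an adjacency (x, y) as conserved if the other genome contains x y or -y -x consecutively. The function name is hypothetical and the code is not taken from BPAnalysis or any package named in this listing.

```python
def count_breakpoints(a, b):
    """Count adjacencies of signed order `a` that are not conserved in `b`.

    Assumes both are signed permutations of 1..n for linear genomes.
    """
    n = len(a)
    # Frame both orders so the two ends also count as adjacencies.
    framed_a = [0] + list(a) + [n + 1]
    framed_b = [0] + list(b) + [n + 1]

    # Record every adjacency of b in both reading directions.
    adjacencies_b = set()
    for x, y in zip(framed_b, framed_b[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))

    # A breakpoint is an adjacency of a that b does not contain.
    return sum((x, y) not in adjacencies_b
               for x, y in zip(framed_a, framed_a[1:]))
```

For instance, inverting the segment 2 3 of the identity order on four genes gives 1 -3 -2 4, and the sketch reports two breakpoints against the identity, one at each end of the inverted segment.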
Gene Order Breakpoint Evidence in Animal Mitochondrial Phylogeny
 In Proc. of COCOON
, 1999
Cited by 85 (14 self)
Multiple genome rearrangement methodology facilitates the inference of animal phylogeny from gene orders on the mitochondrial genome. Breakpoint distance is preferable to other, highly correlated but computationally more difficult, genomic distances when applied to these data. A number of theories of metazoan evolution are compared to phylogenies reconstructed by ancestral genome optimization, using a minimal total breakpoints criterion. The notion of unambiguously reconstructed segments is introduced as a way of extracting the invariant aspects of multiple solutions for a given ancestral genome; this enables a detailed reconstruction of the evolution of non-tRNA mitochondrial gene order. 1 Introduction. In comparative genomics, the quantitative comparison of gene order differences can be used for phylogenetic inference about a set of organisms. This generally involves methods based on distance matrices (e.g. Sankoff et al., 1992), though it would be of more interest to empl...
An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae
, 2000
Cited by 53 (20 self)
The first heuristic for reconstructing phylogenetic trees from gene order data was introduced by Blanchette et al. It sought to reconstruct the breakpoint phylogeny and was applied to a variety of datasets. We present a new heuristic for estimating the breakpoint phylogeny which, although not polynomial-time, is much faster in practice than BPAnalysis. We use this heuristic to conduct a phylogenetic analysis of chloroplast genomes in the flowering plant family Campanulaceae. We also present and discuss the results of experimentation on this real dataset with three methods: our new method, BPAnalysis, and the neighbor-joining method, using breakpoint distances, inversion distances, and inversion plus transposition distances.
Steps Toward Accurate Reconstructions of Phylogenies from Gene-Order Data
 J. COMPUT. SYST. SCI
, 2002
BioPerf: A benchmark suite to evaluate high-performance computer architecture on bioinformatics applications
 In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC
Cited by 44 (5 self)
The exponential growth in the amount of genomic data has spurred growing interest in large scale analysis of genetic information. Bioinformatics applications, which explore computational methods to allow researchers to sift through the massive biological data and extract useful information, are becoming increasingly important computer workloads. This paper presents BioPerf, a benchmark suite of representative bioinformatics applications to facilitate the design and evaluation of high-performance computer architectures for these emerging workloads. Currently, the BioPerf suite contains codes from 10 highly popular bioinformatics packages and covers the major fields of study in computational biology such as sequence comparison, phylogenetic reconstruction, protein structure prediction, and sequence homology & gene finding. We demonstrate the use of BioPerf by providing simulation points of precompiled Alpha binaries and with a performance study on IBM Power using IBM Mambo simulations cross-compared with Apple G5 executions. The BioPerf suite (available from www.bioperf.org) includes benchmark source code, input datasets of various sizes, and information for compiling and using the benchmarks. Our benchmark suite includes parallel codes where available.
Parameterized Complexity for the Skeptic
 In Proc. 18th IEEE Annual Conference on Computational Complexity
, 2003
Cited by 41 (1 self)
The goal of this article is to provide a tourist guide, with an eye towards structural issues, to what I consider some of the major highlights of parameterized complexity.
A methodological framework for the reconstruction of contiguous regions of ancestral genomes and its application to mammalian genome
 PLoS Comput. Biol
, 1000
Cited by 41 (19 self)
The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a longstanding problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segments reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results come eventually very close to cytogenetics studies. It suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well
Scaling up accurate phylogenetic reconstruction from gene-order data
, 2002
Cited by 36 (14 self)
Motivation: Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods (such as neighbor-joining), parsimony methods using sequence-based encodings, Bayesian approaches, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g., organelles). Results: We report here on our successful efforts to scale up direct optimization through a two-step approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated disk-covering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCM-GRAPPA scales gracefully to at least 1,000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on gene-order data can now be accomplished with high accuracy on datasets of significant size. Availability: All of our software is available in source form under GPL at www.compbio.unm.edu