Results 1 - 10
of
212
On the complexity of multiple sequence alignment
- J. Comp. Biol
, 1994
"... We study the computational complexity oftwo popular problems in multiple sequence alignment: multiple alignment with SP-score and multiple tree alignment. It is shown that the rst problem is NP-complete and the second is MAX SNP-hard. The complexity of tree alignment with a given phylogeny is also c ..."
Abstract
-
Cited by 328 (12 self)
- Add to MetaCart
(Show Context)
We study the computational complexity oftwo popular problems in multiple sequence alignment: multiple alignment with SP-score and multiple tree alignment. It is shown that the rst problem is NP-complete and the second is MAX SNP-hard. The complexity of tree alignment with a given phylogeny is also considered. Key words: multiple sequence alignment, evolutionary tree, SP-score, computational complexity, approximation algorithm. 1 Introduction. Multiple sequence alignment is one of the most important and challenging problems in computational biology [16, 17]. It plays an essential role in two related areas of molecular biology: nding highly conserved subregions among a set of biological sequences, and inferring the evolutionary history of some species from their associated sequences. A huge number of papers
PROBCONS: Probabilistic consistency-based multiple sequence alignment
- Genome Res
, 2005
"... To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objec ..."
Abstract
-
Cited by 256 (10 self)
- Add to MetaCart
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce prob-abilistic consistency, a novel scoring function for multiple sequence comparisons. We present PROBCONS, a practical tool for progressive protein multiple sequence alignment based on prob-abilistic consistency, and evaluate its performance on several standard alignment benchmark datasets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, PROB-CONS achieves statistically significant improvement over other leading methods while maintain-ing practical speed. PROBCONS is publicly available as a web resource. Source code and execu-tables are available under the GNU Public License at
Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm
- BIOINFORMATICS
, 1998
"... Motivation: The discovery of motifs in biological sequences is an important problem. Results: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and able to produce all patterns that appear in at least a (us ..."
Abstract
-
Cited by 231 (14 self)
- Add to MetaCart
Motivation: The discovery of motifs in biological sequences is an important problem. Results: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and able to produce all patterns that appear in at least a (user-defined) minimum number of sequences, yet it manages to be very efficient by avoiding the enumeration of the entire pattern space. Furthermore, the reported patterns are maximal: any reported pattern cannot be made more specific and still keep on appearing at the exact same positions within the input sequences. The effectiveness of the proposed approach is showcased on a number of test cases which aim to: (i) validate the approach through the discovery of previously reported patterns; (ii) demonstrate the capability to identify automatically highly selective patterns particular to the sequences under consideration. Finally, experimental analysis indicates that the algorithm is output sensitive, i.e. its running time is quasi-linear to the size of the generated output.
Web data extraction based on partial tree alignment
, 2005
"... This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items/fields from them and put the data in a database table. This problem has been studied by several researchers. However, ex ..."
Abstract
-
Cited by 141 (5 self)
- Add to MetaCart
(Show Context)
This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items/fields from them and put the data in a database table. This problem has been studied by several researchers. However, existing methods still have some serious limitations. The first class of methods is based on machine learning, which requires human labeling of many examples from each Web site that one is interested in extracting data from. The process is time consuming due to the large number of sites and pages on the Web. The second class of algorithms is based on automatic pattern discovery. These methods are either inaccurate or make many assumptions. This paper proposes a new method to perform the task automatically. It consists of two steps, (1) identifying individual data records in a page, and (2) aligning and extracting data items from the identified data records. For step 1, we propose a method based on visual information to segment data records, which is more accurate than existing methods. For step 2, we propose a novel partial alignment technique based on tree matching. Partial alignment means that we align only those data fields in a pair of data records that can be aligned (or matched) with certainty, and make no commitment on the rest of the data fields. This approach enables very accurate alignment of multiple data records. Experimental results using a large number of Web pages from diverse domains show that the proposed two-step technique is able to segment data records, align and extract data from them very accurately.
Improving the Practical Space and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment
, 1996
"... The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune th ..."
Abstract
-
Cited by 75 (4 self)
- Add to MetaCart
The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune the basic dynamic programming graph. We have made substantial improvements in the time and space usage of MSA. The improvements make feasible a variety of problem instances that were not feasible previously. On some runs we achieve an order of magnitude reduction in space usage and a significant multiplicative factor speedup in running time. To explain that these improvements work, we give a much more detailed description of MSA than has been previously available. In practice, MSA rarely produces a provably optimal alignment and we explain why.
Pure multiple RNA secondary structure alignments: a progressive profile approach
- IEEE/ACM Trans. Comput. Biol. Bioinform
"... Abstract—In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Stru ..."
Abstract
-
Cited by 73 (3 self)
- Add to MetaCart
(Show Context)
Abstract—In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Structure comparison is, in analogy to sequence comparison, the essential technique to infer related function. We provide a method for computing multiple alignments of RNA secondary structures under the tree alignment model, which is suitable to cluster RNA molecules purely on the structural level, i.e., sequence similarity is not required. We give a systematic generalization of the profile alignment method from strings to trees and forests. We introduce a tree profile representation of RNA secondary structure alignments which allows reasonable scoring in structure comparison. Besides the technical aspects, an RNA profile is a useful data structure to represent multiple structures of RNA sequences. Moreover, we propose a visualization of RNA consensus structures that is enriched by the full sequence information. Index Terms—Alignment of trees, RNA secondary structures, noncoding RNAs.
Anytime heuristic search
- Journal of Artificial Intelligence Research (JAIR
, 2007
"... We describe how to convert the heuristic search algorithm A * into an anytime algorithm that finds a sequence of improved solutions and eventually converges to an optimal solution. The approach we adopt uses weighted heuristic search to find an approximate solution quickly, and then continues the we ..."
Abstract
-
Cited by 62 (3 self)
- Add to MetaCart
(Show Context)
We describe how to convert the heuristic search algorithm A * into an anytime algorithm that finds a sequence of improved solutions and eventually converges to an optimal solution. The approach we adopt uses weighted heuristic search to find an approximate solution quickly, and then continues the weighted search to find improved solutions as well as to improve a bound on the suboptimality of the current solution. When the time available to solve a search problem is limited or uncertain, this creates an anytime heuristic search algorithm that allows a flexible tradeoff between search time and solution quality. We analyze the properties of the resulting Anytime A * algorithm, and consider its performance in three domains; sliding-tile puzzles, STRIPS planning, and multiple sequence alignment. To illustrate the generality of this approach, we also describe how to transform the memoryefficient search algorithm Recursive Best-First Search (RBFS) into an anytime algorithm. 1.
Multiple Sequence Alignment in Phylogenetic Analysis
, 2000
"... Multiple sequence alignment is discussed in light of homology assessments in phylogenetic research. Pairwise and multiple alignment methods are reviewed as exact and heuristic procedures. Since the object of alignment is to create the most efficient statement of initial homology, methods that minimi ..."
Abstract
-
Cited by 54 (0 self)
- Add to MetaCart
Multiple sequence alignment is discussed in light of homology assessments in phylogenetic research. Pairwise and multiple alignment methods are reviewed as exact and heuristic procedures. Since the object of alignment is to create the most efficient statement of initial homology, methods that minimize nonhomology are to be favored. Therefore, among all possible alignments, the one that satisfies the phylogenetic optimality criterion the best should be considered the best alignment. Since all homology statements are subject to testing and explanation this way, consistency of optimality criteria is desirable. This consistency is based on the treatment of alignment gaps as character information and the consistent use of a cost function (e.g., insertion–deletion, transversion, and
A Polynomial Time Approximation Scheme for Minimum Routing Cost Spanning Trees
, 1998
"... Given an undirected graph with nonnegative costs on the edges, the routing cost of any of its spanning trees is the sum over all pairs of vertices of the cost of the path between the pair in the tree. Finding a spanning tree of minimum routing cost is NP-hard, even when the costs obey the triangle i ..."
Abstract
-
Cited by 45 (8 self)
- Add to MetaCart
(Show Context)
Given an undirected graph with nonnegative costs on the edges, the routing cost of any of its spanning trees is the sum over all pairs of vertices of the cost of the path between the pair in the tree. Finding a spanning tree of minimum routing cost is NP-hard, even when the costs obey the triangle inequality. We show that the general case is in fact reducible to the metric case and present a polynomial-time approximation scheme valid for both versions of the problem. In particular, we show how to build a spanning tree of an n-vertex weighted graph with routing cost within (1 + ffl) from the minimum in time O(n O( 1 ffl ) ). Besides the obvious connection to network design, trees with small routing cost also find application in the construction of good multiple sequence alignments in computational biology. The communication cost spanning tree problem is a generalization of the minimum routing cost tree problem where the routing costs of different pairs are weighted by different r...
Intrusion Detection: A Bioinformatics Approach
- In : 19th Annual Computer Security Applications Conferences, Las Vegas, Nevada
, 2003
"... This paper addresses the problem of detecting masquerading, a security attack in which an intruder assumes the identity of a legitimate user. Many approaches based on Hidden Markov Models and various forms of Finite State Automata have been proposed to solve this problem. The novelty of our approach ..."
Abstract
-
Cited by 40 (8 self)
- Add to MetaCart
(Show Context)
This paper addresses the problem of detecting masquerading, a security attack in which an intruder assumes the identity of a legitimate user. Many approaches based on Hidden Markov Models and various forms of Finite State Automata have been proposed to solve this problem. The novelty of our approach results from the application of techniques used in bioinformatics for a pair-wise sequence alignment to compare the monitored session with past user behavior. Our algorithm uses a semi-global alignment and a unique scoring system to measure similarity between a sequence of commands produced by a potential intruder and the user signature, which is a sequence of commands collected from a legitimate user. We tested this algorithm on the standard intrusion data collection set. As discussed in the paper, the results of the test showed that the described algorithm yields a promising combination of intrusion detection rate and false positive rate, when compared to published intrusion detection algorithms.