Evaluation Measures of Multiple Sequence Alignments
J Comput Biol, 1996
Cited by 18 (2 self)
Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of structure, functionality and ultimately evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection be...
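The abstract describes the Circular Sum idea only at a high level. The following is a minimal sketch of how a CS-style score could be computed for a given MSA. It assumes that the circular order of the sequences is already known (the paper derives it from a Traveling Salesman Problem solution, which is not reproduced here). It also assumes a toy match/mismatch/gap scheme and the convention of halving the tour total. The names `induced_pair_score` and `circular_sum` are hypothetical.

```python
# Hypothetical scoring scheme (not from the paper): +1 match, -1 mismatch,
# -2 residue against gap. Gap-gap columns are skipped because they vanish
# from the pairwise alignment induced by two MSA rows.
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -2
GAP = "-"


def induced_pair_score(a: str, b: str) -> int:
    """Score the pairwise alignment induced by two rows of an MSA."""
    score = 0
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue
        if x == GAP or y == GAP:
            score += GAP_PENALTY
        elif x == y:
            score += MATCH
        else:
            score += MISMATCH
    return score


def circular_sum(msa: list[str], order: list[int]) -> float:
    """Sum induced pairwise scores between neighbours in a circular order,
    then halve, since a circular tour through a tree crosses every edge twice."""
    k = len(order)
    total = sum(
        induced_pair_score(msa[order[i]], msa[order[(i + 1) % k]]) for i in range(k)
    )
    return total / 2


msa = ["ACGT", "ACGT", "A-GT"]
print(circular_sum(msa, [0, 1, 2]))  # 3.0
```

Because the score depends only on neighbouring pairs in the tour, no evolutionary tree has to be built to evaluate an alignment. Only the order in which the sequences are visited is needed.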
Multiple sequence alignment with evolutionary computation
Genetic Programming and Evolvable Machines, 2004
Cited by 16 (0 self)
In this paper we provide a brief review of current work in the area of multiple sequence alignment (MSA) for DNA and protein sequences using evolutionary computation (EC). We detail the strengths and weaknesses of EC techniques for MSA. In addition, we present two novel approaches for inferring MSAs using genetic algorithms (GAs). Our first novel approach uses a GA to evolve an optimal guide tree in a progressive alignment algorithm and serves as an alternative to more traditional heuristic techniques such as neighbor-joining. The second novel approach optimizes a consensus sequence with a GA using a vertically scalable encoding scheme, in which the number of iterations needed to find the optimal solution is approximately the same regardless of the number of sequences being aligned. We compare both of our novel approaches to the popular progressive alignment program Clustal W. Experiments have confirmed that EC constitutes an attractive and promising alternative to traditional heuristic algorithms for MSA.
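The encoding itself is not described in the abstract. The following is a toy sketch of GA-based consensus optimization in which the genome is a single candidate consensus string, so its size depends on sequence length and not on how many sequences are aligned. It assumes equal-length, ungapped sequences, a summed Hamming-distance cost, and standard truncation selection, one-point crossover, and point mutation. This is not the authors' encoding, and `evolve_consensus` and its parameters are hypothetical.

```python
import random


def total_hamming(candidate: str, seqs: list[str]) -> int:
    """Cost to minimise: summed Hamming distance from candidate to each sequence."""
    return sum(sum(c != s for c, s in zip(candidate, seq)) for seq in seqs)


def evolve_consensus(seqs: list[str], alphabet: str = "ACGT", pop_size: int = 30,
                     generations: int = 200, mutation_rate: float = 0.05,
                     seed: int = 0) -> str:
    """Toy GA: each individual is one candidate consensus string."""
    rng = random.Random(seed)
    length = len(seqs[0])
    pop = ["".join(rng.choice(alphabet) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: total_hamming(c, seqs))
        # Truncation selection: the best half survives, so the best cost never worsens.
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, length) if length > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            # Per-position point mutation.
            child = "".join(rng.choice(alphabet) if rng.random() < mutation_rate else ch
                            for ch in child)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda c: total_hamming(c, seqs))


seqs = ["ACGTAC", "ACGTTC", "ACCTAC", "TCGTAC"]
best = evolve_consensus(seqs)
print(best, total_hamming(best, seqs))
```

Under a Hamming cost, the per-column majority string is optimal. That makes this toy problem convenient for checking what the GA returns. Real MSA consensus optimization must also handle gaps.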
On the minimum common integer partition problem.
In ACM Transactions on Computational Logic, V, 2008
Cited by 8 (0 self)
We introduce a new combinatorial optimization problem in this paper, called the Minimum Common Integer Partition (MCIP) problem, which was inspired by computational biology applications including ortholog assignment and DNA fingerprint assembly. A partition of a positive integer $n$ is a multiset of positive integers that add up to exactly $n$, and an integer partition of a multiset $S$ of integers is defined as the multiset union of partitions of the integers in $S$. Given a sequence of multisets $S_1, \ldots, S_k$ of integers, where $k \ge 2$, we say that a multiset is a common integer partition if it is an integer partition of every multiset $S_i$, $1 \le i \le k$. The MCIP problem is thus defined as finding a common integer partition of $S_1, \ldots, S_k$ with the minimum cardinality. It is easy to see that the MCIP problem is NP-hard, since it generalizes the well-known Set Partition problem. We can in fact show that it is APX-hard. We will also present a -approximation algorithm for $k \ge 3$.
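The following brute-force checker makes the definitions concrete. It tests whether a multiset is an integer partition of each $S_i$ by assigning parts to targets with backtracking. This is a sketch for tiny inputs only (exponential time) and is not the approximation algorithm from the paper. The function names are hypothetical.

```python
def is_integer_partition(parts: list[int], targets: list[int]) -> bool:
    """True if the multiset `parts` can be grouped so that the groups sum
    exactly to the integers in `targets`, one group per target."""
    if sum(parts) != sum(targets):
        return False
    parts = sorted(parts, reverse=True)        # place large parts first to prune early
    remaining = sorted(targets, reverse=True)  # unfilled amount of each target

    def place(i: int) -> bool:
        if i == len(parts):
            # Totals match and no target overflowed, so every target is filled exactly.
            return True
        tried = set()
        for j, r in enumerate(remaining):
            if r >= parts[i] and r not in tried:  # equal leftovers are interchangeable
                tried.add(r)
                remaining[j] -= parts[i]
                if place(i + 1):
                    return True
                remaining[j] += parts[i]
        return False

    return place(0)


def is_common_integer_partition(parts: list[int], multisets: list[list[int]]) -> bool:
    return all(is_integer_partition(parts, s) for s in multisets)


S = [[4, 6], [3, 7]]
print(is_common_integer_partition([3, 1, 6], S))  # True: 4=3+1, 6=6 and 3=3, 7=1+6
print(is_common_integer_partition([4, 6], S))     # False: 4 and 6 cannot form 3 and 7
```

In this example, $\{3, 1, 6\}$ is a minimum common integer partition. A 2-element partition of $\{4, 6\}$ must be $\{4, 6\}$ itself, and a 2-element partition of $\{3, 7\}$ must be $\{3, 7\}$, so no common partition of cardinality 2 exists.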
GESTALT: Genomic Steiner Alignments
In CPM, 1999
Cited by 8 (0 self)
We describe GESTALT (GEnomic sequences STeiner ALignmenT), a public-domain suite of programs for generating multiple alignments of a set of biosequences. We allow the use of either of the two popular objectives, Tree Alignment or Sum-of-Pairs. The main distinguishing feature of our method is that the alignment is obtained via a tree in which the internal nodes (ancestors) are labeled by Steiner sequences for triples of the input sequences. Given lists of candidate labels for the ancestral sequences, we use dynamic programming to choose an optimal labeling under either objective function. Finally, the fully labeled tree of sequences is turned into a multiple alignment. Enhancements in our implementation include the traditional space-saving ideas of Hirschberg as well as new data-packing techniques. The running-time bottleneck of computing exact Steiner sequences is handled by a highly effective but much faster heuristic alternative. Finally, other modules in the suit...
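The labeling step can be sketched as a bottom-up dynamic program over a rooted tree. For each candidate label of a node, it keeps the cheapest cost of the subtree below it. This minimal sketch covers only the Tree Alignment objective. It uses unit-cost edit distance as the edge cost and takes hand-written candidate lists. In GESTALT, the candidates come from Steiner sequences of triples, which are not computed here. `Node` and `best_labeling` are hypothetical names.

```python
from dataclasses import dataclass, field


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance, used here as the cost of one tree edge."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


@dataclass
class Node:
    candidates: list[str]  # a leaf has exactly one candidate: its own sequence
    children: list["Node"] = field(default_factory=list)


def best_labeling(node: Node) -> dict[str, tuple[int, list[str]]]:
    """Map each candidate label of `node` to (minimum tree-alignment cost of the
    subtree, labels chosen for the subtree in preorder)."""
    child_tables = [best_labeling(child) for child in node.children]
    table = {}
    for label in node.candidates:
        cost, labels = 0, [label]
        for ct in child_tables:
            # Pick the child label minimising its subtree cost plus this edge's cost.
            c_label, (c_cost, c_labels) = min(
                ct.items(), key=lambda kv: kv[1][0] + edit_distance(label, kv[0]))
            cost += c_cost + edit_distance(label, c_label)
            labels = labels + c_labels
        table[label] = (cost, labels)
    return table


root = Node(["ACG", "AGG"], [Node(["ACG"]), Node(["ACT"]), Node(["AGG"])])
table = best_labeling(root)
label, (cost, labels) = min(table.items(), key=lambda kv: kv[1][0])
print(label, cost, labels)  # ACG 2 ['ACG', 'ACG', 'ACT', 'AGG']
```

Each child table is computed once and reused for every parent candidate. The work is therefore proportional to the number of edges times the product of the candidate-list sizes, not exponential in the tree size.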
A Simplified Proof of the NP- and MAX SNP-hardness of Multiple Sequence Tree Alignment
Journal of Computational Biology, 1995
Cited by 7 (0 self)
... this paper has one or more columns composed purely of indels. The cost of an alignment is generally computed relative to a given $\Gamma \times \Gamma$ symmetric matrix ...
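The excerpt is only a fragment. The following minimal sketch shows the two ideas it mentions: discarding columns made up purely of indels, and scoring an alignment with a symmetric cost function. It assumes the tree-alignment setting, with one row per tree node (ancestors included) and cost summed over tree edges. The matrix entries and all function names are hypothetical.

```python
GAP = "-"


def delta(x: str, y: str) -> int:
    """Hypothetical symmetric cost over {A, C, G, T, -}: 0 on the diagonal,
    1 for a substitution, 2 for a residue against an indel."""
    if x == y:
        return 0
    return 2 if GAP in (x, y) else 1


def drop_indel_columns(rows: list[str]) -> list[str]:
    """Remove columns made up purely of indels; they contribute nothing."""
    keep = [j for j in range(len(rows[0])) if any(r[j] != GAP for r in rows)]
    return ["".join(r[j] for j in keep) for r in rows]


def tree_alignment_cost(rows: list[str], edges: list[tuple[int, int]]) -> int:
    """Sum, over tree edges (u, v), of the column-wise delta cost of rows u and v."""
    return sum(delta(x, y) for u, v in edges for x, y in zip(rows[u], rows[v]))


rows = drop_indel_columns(["AC-G", "A--G", "AT-G"])
print(rows)                                         # ['ACG', 'A-G', 'ATG']
print(tree_alignment_cost(rows, [(0, 1), (0, 2)]))  # 3
```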
SALSA: Sequence ALignment via Steiner Ancestors
Cited by 3 (0 self)
We describe SALSA (Sequence ALignment via Steiner Ancestors), a public-domain suite of programs for generating multiple alignments of a set of genomic sequences. We allow the use of either of the two popular objectives, Tree Alignment or Sum-of-Pairs. The main distinguishing feature of our method is that the alignment is obtained via a tree in which the internal nodes (ancestors) are labeled by Steiner sequences for triples of the input sequences. Given lists of candidate labels for the ancestral sequences, we use dynamic programming to choose an optimal labeling under either objective function. Finally, the fully labeled tree of sequences is turned into a multiple alignment. Enhancements in our implementation include the traditional space-saving ideas of Hirschberg as well as new data-packing techniques. The running-time bottleneck of computing exact Steiner sequences is handled by a highly effective but much faster heuristic alternative. Finally, other modules ...
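Hirschberg's space-saving idea can be illustrated on plain pairwise alignment. The two halves of one sequence are scored forward and backward in linear memory, the split point of the other sequence is chosen where the costs add up to the optimum, and the procedure recurses. The following is a minimal sketch for unit-cost edit distance only. It is not the SALSA implementation and omits its data-packing techniques.

```python
GAP = "-"


def last_row(a: str, b: str) -> list[int]:
    """Edit distances from all of `a` to every prefix of `b`, computed with two
    rows, i.e. O(len(b)) memory instead of a full table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev


def hirschberg(a: str, b: str) -> tuple[str, str]:
    """Optimal unit-cost global alignment of a and b in linear space."""
    if not a:
        return GAP * len(b), b
    if not b:
        return a, GAP * len(a)
    if len(a) == 1:
        j = b.find(a)
        j = j if j >= 0 else 0  # no exact match: substitute against b[0]
        return GAP * j + a + GAP * (len(b) - j - 1), b
    mid = len(a) // 2
    left = last_row(a[:mid], b)               # forward pass over the top half
    right = last_row(a[mid:][::-1], b[::-1])  # backward pass over the bottom half
    # Split b where the two half-costs add up to the optimal total.
    k = min(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    top_a, top_b = hirschberg(a[:mid], b[:k])
    bot_a, bot_b = hirschberg(a[mid:], b[k:])
    return top_a + bot_a, top_b + bot_b


x, y = hirschberg("kitten", "sitting")
print(x)
print(y)
print(sum(p != q for p, q in zip(x, y)))  # 3, the edit distance of the two words
```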
An Efficient Algorithm for Multiple Sequence Alignment
Cited by 2 (0 self)
Multiple sequence alignment (MSA) is a fundamental technique of molecular biology. Biological sequences are aligned with each other vertically in order to show the similarities and differences among them. In this paper, we first propose an efficient group alignment method to perform the alignment between two groups of sequences. Its time complexity is $O(mnL_1L_2)$, where $m$ and $n$ are the numbers of sequences in the two groups, and $L_1$ and $L_2$ are the lengths of the sequences in the two groups. Then we propose a clustering method, a top-down heuristic, to build the tree topology for merging. The clustering method is based on the idea that the two sequences with the longest distance between them should be split into two clusters. The time complexity of our MSA algorithm is $O(\ldots)$, where $n$ is the number of sequences and $L$ is the maximum length of all sequences. In our experiments, both the alignment quality and the running time of our algorithm are better than those of the Clustal W algorithm (using the neighbor-joining method).
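The $O(mnL_1L_2)$ bound follows from a group-to-group dynamic program. The table has $(L_1+1)(L_2+1)$ cells, and scoring one pair of columns sums over all $m \times n$ cross-group letter pairs. The following minimal sketch covers the group alignment step only, not the top-down clustering. It assumes a hypothetical match/mismatch/indel scheme and is not the authors' code.

```python
GAP = "-"
MATCH, MISMATCH, INDEL = 1, -1, -2  # hypothetical scoring scheme


def pair_score(x: str, y: str) -> int:
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return INDEL
    return MATCH if x == y else MISMATCH


def column_score(col_a: str, col_b: str) -> int:
    """Sum over all m*n cross-group letter pairs, so each call costs O(mn)."""
    return sum(pair_score(x, y) for x in col_a for y in col_b)


def align_groups(group_a: list[str], group_b: list[str]) -> tuple[int, list[str]]:
    """Global alignment of two internally aligned groups: O(mn L1 L2) overall."""
    m, n = len(group_a), len(group_b)
    L1, L2 = len(group_a[0]), len(group_b[0])
    cols_a = ["".join(s[i] for s in group_a) for i in range(L1)]
    cols_b = ["".join(s[j] for s in group_b) for j in range(L2)]
    gaps_a, gaps_b = GAP * m, GAP * n

    def diag(i: int, j: int) -> int:
        return column_score(cols_a[i - 1], cols_b[j - 1])

    def up(i: int) -> int:  # column of A against a gap column in B
        return column_score(cols_a[i - 1], gaps_b)

    def left(j: int) -> int:  # gap column in A against a column of B
        return column_score(gaps_a, cols_b[j - 1])

    F = [[0] * (L2 + 1) for _ in range(L1 + 1)]
    for i in range(1, L1 + 1):
        F[i][0] = F[i - 1][0] + up(i)
    for j in range(1, L2 + 1):
        F[0][j] = F[0][j - 1] + left(j)
    for i in range(1, L1 + 1):
        for j in range(1, L2 + 1):
            F[i][j] = max(F[i - 1][j - 1] + diag(i, j),
                          F[i - 1][j] + up(i),
                          F[i][j - 1] + left(j))

    # Traceback: rebuild the merged columns, then transpose them back into rows.
    out_a, out_b = [], []
    i, j = L1, L2
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + diag(i, j):
            out_a.append(cols_a[i - 1])
            out_b.append(cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + up(i):
            out_a.append(cols_a[i - 1])
            out_b.append(gaps_b)
            i -= 1
        else:
            out_a.append(gaps_a)
            out_b.append(cols_b[j - 1])
            j -= 1
    out_a.reverse()
    out_b.reverse()
    rows = ["".join(c[r] for c in out_a) for r in range(m)]
    rows += ["".join(c[r] for c in out_b) for r in range(n)]
    return F[L1][L2], rows


score, rows = align_groups(["ACGT", "ACGT"], ["AGT"])
print(score, rows)  # 2 ['ACGT', 'ACGT', 'A-GT']
```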
Protein structure prediction based on secondary structure alignment
Proceedings of 2004 Symposium on Digital Life and Internet Technologies (Abstract, full text in CD), 2004
Cited by 1 (1 self)
In molecular biology, sequence alignment is a fundamental yet powerful technique. Biologists use it to find the similarities, the differences, and even the functions of input sequences (DNA, RNA, and protein sequences). Many algorithms, serving various purposes, align two sequences according to different criteria. Although there are various ways to align macromolecular sequences, a sequence alignment algorithm traditionally considers only the primary structure, which is the amino acid chain. In this paper, we present a new algorithm that aligns sequences with consideration of their secondary structures. By making use of protein secondary structure information such as α helices and β sheets, the sensitivity of pairwise alignment can be improved.
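The abstract does not say how secondary structure enters the score. One simple way to make an alignment structure-aware is to add a bonus to the substitution score when two positions share a secondary-structure state. The following is a minimal sketch assuming a Needleman-Wunsch global alignment, three-state labels (H, E, C), and hypothetical weights. It is not the paper's algorithm.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical residue scores
SS_BONUS = 1  # hypothetical reward when secondary-structure states (H, E, C) agree


def sub_score(r1: str, s1: str, r2: str, s2: str) -> int:
    """Residue score plus a bonus if both positions share a secondary-structure state."""
    base = MATCH if r1 == r2 else MISMATCH
    return base + (SS_BONUS if s1 == s2 else 0)


def ss_align_score(seq1: str, ss1: str, seq2: str, ss2: str) -> int:
    """Needleman-Wunsch global alignment score using the combined substitution score."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub_score(seq1[i - 1], ss1[i - 1], seq2[j - 1], ss2[j - 1]),
                F[i - 1][j] + GAP,
                F[i][j - 1] + GAP,
            )
    return F[n][m]


# The lone "A" scores best against the residue of seq1 whose state also is E.
print(ss_align_score("AA", "HE", "A", "E"))  # 1
```

Without the bonus, both placements of the lone residue would tie. The secondary-structure term breaks such ties in favor of structurally consistent columns.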