PROBCONS: Probabilistic consistency-based multiple sequence alignment
Genome Res, 2005
Cited by 84 (5 self)
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present PROBCONS, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark datasets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, PROBCONS achieves statistically significant improvement over other leading methods while maintaining practical speed. PROBCONS is publicly available as a web resource. Source code and executables are available under the GNU Public License at
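The consistency transformation behind probabilistic consistency can be sketched as below. This is a minimal illustration, not PROBCONS source code. It assumes the pairwise posterior matrices P_xy (entry [i][j] is the probability that residue i of sequence x aligns with residue j of sequence y) have already been computed, for example by a pair-HMM. The names `consistency_transform`, `matmul` and `identity` are hypothetical. Each relaxed matrix averages P_xz · P_zy over every sequence z in the set, taking P_zz as the identity so that x and y themselves contribute P_xy unchanged.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x m) matrix and an (m x p) matrix."""
    p = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(p)]
            for row in a]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(lengths: List[int],
                          post: Dict[Tuple[int, int], Matrix]
                          ) -> Dict[Tuple[int, int], Matrix]:
    """One round of (hypothetical) probabilistic-consistency relaxation.

    lengths[s] is the length of sequence s; post[(x, y)] is the
    lengths[x] x lengths[y] posterior matrix for every ordered pair x != y.
    Returns P'_xy = (1/|S|) * sum over z of P_xz @ P_zy, with P_zz = I.
    """
    n = len(lengths)

    def get(x: int, y: int) -> Matrix:
        # A sequence aligned with itself: residue i matches residue i.
        return identity(lengths[x]) if x == y else post[(x, y)]

    relaxed: Dict[Tuple[int, int], Matrix] = {}
    for (x, y) in post:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(n):
            prod = matmul(get(x, z), get(z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    acc[i][j] += prod[i][j] / n
        relaxed[(x, y)] = acc
    return relaxed
```

For three length-1 sequences with posteriors 0.5 between sequences 0 and 1, 1.0 between 0 and 2, and 0.8 between 1 and 2 (in both directions), the relaxed (0, 1) entry is (0.5 + 0.5 + 1.0 × 0.8) / 3 = 0.6, up to floating-point rounding. The strong evidence routed through sequence 2 raises the direct 0.5 estimate.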
Multiple sequence alignment using evolutionary programming
Proceedings of the First Congress of Evolutionary Computation (CEC-1999), 1999
Cited by 11 (1 self)
Multiple sequence alignment can be used as a tool for the identification of common structure in an ordered string of nucleotides (in DNA or RNA) or amino acids (in proteins). Current multiple sequence alignment algorithms work well for sequences with high similarity but do not scale well when either the length or the number of sequences is large, or when the similarity is low. The focus of this paper is to develop an evolutionary programming (EP) algorithm for multiple sequence alignment. An EP method with representation-specific variation operators is proposed and tested on several data sets. Comparisons to other algorithms suggest that this algorithm is well suited to the multiple sequence alignment problem.
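A toy evolutionary-programming loop for alignment can be sketched as follows. The representation (a list of equal-length gapped rows), the gap-shifting mutation, the simple sum-of-pairs fitness and the (μ + μ) truncation selection are all assumptions chosen for illustration. They are not the representation-specific operators of the paper, and `sp_score`, `shift_gap` and `evolve` are hypothetical names.

```python
import random
from typing import List

Alignment = List[str]  # equal-length rows; '-' marks a gap


def sp_score(aln: Alignment, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Sum-of-pairs score over all columns (gap-gap pairs score 0)."""
    score = 0
    for col in zip(*aln):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == '-' and b == '-':
                    continue
                if a == '-' or b == '-':
                    score += gap
                else:
                    score += match if a == b else mismatch
    return score


def shift_gap(aln: Alignment, rng: random.Random) -> Alignment:
    """Mutation: swap one random gap with its left or right neighbour.

    Swapping a gap with an adjacent character never reorders residues,
    so every row still spells the same ungapped sequence.
    """
    child = list(aln)
    r = rng.randrange(len(child))
    gaps = [i for i, c in enumerate(child[r]) if c == '-']
    if not gaps:
        return child
    i = rng.choice(gaps)
    j = i + rng.choice((-1, 1))
    if 0 <= j < len(child[r]):
        chars = list(child[r])
        chars[i], chars[j] = chars[j], chars[i]
        child[r] = ''.join(chars)
    return child


def evolve(seed: Alignment, pop_size: int = 20, generations: int = 200,
           seed_val: int = 0) -> Alignment:
    rng = random.Random(seed_val)
    population = [seed] * pop_size
    for _ in range(generations):
        offspring = [shift_gap(p, rng) for p in population]
        # (mu + mu) selection: keep the best pop_size of parents and offspring.
        population = sorted(population + offspring, key=sp_score, reverse=True)[:pop_size]
    return population[0]
```

Because parents compete with their offspring, the best score never decreases from one generation to the next; a real EP method would typically use richer operators (inserting and deleting gap columns, block moves) and stochastic tournament selection.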
M-Coffee: combining multiple sequence alignment methods with T-Coffee
Nucleic Acids Res, 2006
Cited by 10 (4 self)
We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performance can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and Balibase. We also show that, on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment as any individual method. Given a collection of precomputed MSAs, M-Coffee has CPU requirements similar to those of the original T-Coffee. M-Coffee is a freeware open-source package available from
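The first step of combining several MSAs by consistency, turning every input alignment into votes for residue pairs, can be sketched as below. This is a simplified vote-counting library assumed for illustration. It does not reproduce M-Coffee's actual library weighting or the T-Coffee extension and assembly steps, and `aligned_pairs` and `consensus_library` are hypothetical names. A residue pair aligned by every input method gets weight 1.0; a pair supported by one method in three gets about 0.33.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Alignment = List[str]            # equal-length gapped rows
Pair = Tuple[int, int, int, int]  # (seq_a, residue_a, seq_b, residue_b), 0-based


def aligned_pairs(aln: Alignment) -> Set[Pair]:
    """All residue pairs that one MSA places in the same column."""
    counters = [0] * len(aln)  # next ungapped residue index in each row
    pairs: Set[Pair] = set()
    for col in zip(*aln):
        present = []
        for s, c in enumerate(col):
            if c != '-':
                present.append((s, counters[s]))
                counters[s] += 1
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                (sa, pa), (sb, pb) = present[i], present[j]
                pairs.add((sa, pa, sb, pb))
    return pairs


def consensus_library(msas: List[Alignment]) -> Dict[Pair, float]:
    """Weight each residue pair by the fraction of input MSAs that align it."""
    votes: Counter = Counter()
    for aln in msas:
        votes.update(aligned_pairs(aln))
    return {pair: n / len(msas) for pair, n in votes.items()}
```

Duplicate input MSAs simply add more votes to the same pairs, which gives one intuition for why the combination should tolerate duplicates reasonably well without being immune to them.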
An Eulerian path approach to global multiple alignment for DNA sequences
J. Comput. Biol, 2003
Cited by 9 (1 self)
With the rapid growth of genome sequence datasets, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available methods. Our motivation comes from the Eulerian method for fragment assembly in DNA sequencing, which transforms all DNA fragments into a de Bruijn graph and then reduces sequence assembly to a Eulerian path problem. The paper focuses on global multiple alignment of DNA sequences, where entire sequences are aligned into one configuration. Our main result is an algorithm whose running time is almost linear in the total size (number of letters) of the sequences to be aligned. Five hundred simulated sequences (averaging 500 bases per sequence and with pairwise identity as low as 70%) have been aligned within three minutes on a personal computer, and the quality of alignment is satisfactory. As a result, accurate and simultaneous alignment of thousands of long sequences within a reasonable amount of time becomes possible. Data from an Arabidopsis sequencing project are used to demonstrate the performance. Key words: multiple sequence alignment, de Bruijn graph, Eulerian path.
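The graph-building step of the Eulerian approach, merging all sequences into one de Bruijn graph, can be sketched as follows. Nodes are (k-1)-mers and each k-mer contributes an edge from its prefix to its suffix. Identical k-mers from different sequences collapse onto one edge whose count records how often it occurs, which is how conserved regions line up, while nodes with several distinct successors mark where sequences diverge. The value of k, the function names and the `branch_points` helper are assumptions for illustration; the paper's handling of repeats and its path-finding step are not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def de_bruijn(seqs: List[str], k: int) -> Counter:
    """Map each edge (prefix (k-1)-mer, suffix (k-1)-mer) to its occurrence count."""
    edges: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def branch_points(edges: Counter) -> Set[str]:
    """Nodes with more than one distinct successor, i.e. where sequences diverge."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    for (u, v) in edges:
        succ[u].add(v)
    return {u for u, vs in succ.items() if len(vs) > 1}
```

For example, `de_bruijn(["ACGT", "ACGA"], 3)` shares the edge AC → CG between both sequences and then branches at CG into CG → GT and CG → GA. Building the graph touches each letter once per k-mer, which is consistent in spirit with the near-linear scaling the abstract reports.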
Improvement in the Accuracy of Multiple Sequence Alignment Program MAFFT
Cited by 7 (0 self)
In 2002, we developed and released MAFFT, a rapid multiple sequence alignment program designed to handle large (up to ∼5,000 sequences) and long (∼2,000 aa or ∼5,000 nt) datasets in a reasonable time on a standard desktop PC. In accuracy, however, the previous versions (v.4 and lower) of MAFFT were outperformed in several benchmark tests by ProbCons and TCoffee v.2, both released in 2004. Here we report a recent extension of MAFFT that aims to improve accuracy at as little cost in calculation time as possible. The extended version of MAFFT (v.5) has new iterative refinement options, G-INS-i and L-INS-i (collectively denoted as [GL]-INS-i in this report). These options use a new objective function combining the weighted sum-of-pairs (WSP) score and a COFFEE-like score derived from all pairwise alignments. We discuss the improvement in accuracy brought by this extension, mainly using two recently released benchmark tests, BAliBASE v.3 (for protein alignments) and BRAliBASE (for RNA alignments). According to BAliBASE v.3, the overall average accuracy of L-INS-i was higher than those of the other methods released in 2004, although the difference among the most accurate methods (ProbCons, TCoffee v.2 and the new options of MAFFT) was small. The advantage
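The weighted sum-of-pairs (WSP) part of such an objective can be illustrated with a toy scorer. It assumes per-sequence weights w_i combined as w_i · w_j, a match/mismatch score and a linear gap penalty. MAFFT itself uses substitution matrices and affine gap costs, may weight sequences differently, and combines WSP with a COFFEE-like term; none of that is reproduced here. The names `pair_score` and `wsp` are hypothetical.

```python
from itertools import combinations
from typing import List


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of the pairwise projection of two aligned rows.

    Columns where both rows have a gap are skipped, as they vanish
    when the pair is projected out of the multiple alignment.
    """
    total = 0
    for x, y in zip(a, b):
        if x == '-' and y == '-':
            continue
        if x == '-' or y == '-':
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def wsp(aln: List[str], weights: List[float]) -> float:
    """Weighted sum-of-pairs: sum over i < j of w_i * w_j * S(row_i, row_j)."""
    return sum(weights[i] * weights[j] * pair_score(aln[i], aln[j])
               for i, j in combinations(range(len(aln)), 2))
```

For the rows `["AC-", "ACG", "A-G"]` the three pairwise projections score 0, -3 and 0, so `wsp` returns -3 with unit weights and -1.5 when the third sequence is down-weighted to 0.5. Down-weighting lets clusters of near-identical sequences count less, which is the usual motivation for WSP.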
3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment
Nucleic Acids Res, 2004
Cited by 6 (4 self)
This paper presents 3DCoffee@igs, a web-based tool dedicated to the computation of high-quality multiple sequence alignments (MSAs). 3D-Coffee makes it possible to mix protein sequences and structures in order to increase the accuracy of the alignments. Structures can be either provided as PDB identifiers or directly uploaded into the server. Given a set of sequences and structures, pairs of structures are aligned with SAP while sequence–structure pairs are aligned with Fugue. The resulting collection of pairwise alignments is then combined into an MSA with the T-Coffee algorithm. The server and its documentation are available from
Parallel multiple sequence alignment with decentralized cache support
In Proc. of Euro-Par ’05, 2005
Cited by 4 (0 self)
In this paper we present a new method for aligning large sets of biological sequences. The method performs sequence alignment in parallel and uses a decentralized cache to store intermediate results. It allows alignments to be recomputed efficiently when new sequences are added or when alignments of different precisions are requested. Our method can be used to solve important biological problems such as the adaptive update of a complete evolutionary tree when new sequences are added (without recomputing the whole tree). To validate the method, experiments were performed using up to 512 small-subunit ribosomal RNA sequences, analyzed at different levels of precision.
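The benefit of caching intermediate results when sequences are added can be sketched as follows. In this assumed design, pairwise results are spread over several dictionaries, each standing in for one worker's local cache, by hashing the pair key. A Levenshtein distance stands in for the real pairwise alignment. The class `ShardedCache`, its methods and the sharding rule are hypothetical, and precision levels are not modeled.

```python
import zlib
from itertools import combinations
from typing import Dict, List, Tuple

Key = Tuple[str, str]


def edit_distance(a: str, b: str) -> int:
    """Stand-in pairwise computation: Levenshtein distance in O(len(b)) space."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class ShardedCache:
    """Pairwise results spread over n_nodes dicts (one per simulated worker)."""

    def __init__(self, n_nodes: int) -> None:
        self.nodes: List[Dict[Key, int]] = [{} for _ in range(n_nodes)]
        self.computed = 0  # how many pairwise computations actually ran

    def _node(self, key: Key) -> Dict[Key, int]:
        # Deterministic hash so every worker agrees on where a pair lives.
        return self.nodes[zlib.crc32("|".join(key).encode()) % len(self.nodes)]

    def distance(self, a: str, b: str) -> int:
        key = (a, b) if a <= b else (b, a)  # order-independent key
        node = self._node(key)
        if key not in node:
            node[key] = edit_distance(*key)
            self.computed += 1
        return node[key]


def distance_matrix(seqs: List[str], cache: ShardedCache) -> Dict[Key, int]:
    return {(a, b): cache.distance(a, b) for a, b in combinations(seqs, 2)}
```

With three cached sequences, adding a fourth triggers only the three new pairwise computations; the other three pairs are served from the cache. This is the property that makes incremental updates, such as refreshing a guide tree after new sequences arrive, cheaper than recomputing from scratch.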
A survey of recent work on evolutionary approaches to the protein folding problem
In Proceedings of the Congress of Evolutionary Computation (CEC), 1999
Cited by 2 (0 self)
A problem of immense importance in computational biology is the determination of the functional conformations of protein molecules. With the advent of faster computers, it is now possible to use rules to search conformation space for protein structures that have minimal free energy. This paper surveys work done in the last five years using evolutionary search algorithms to find low-energy protein conformations. In particular, it includes a detailed description of some work recently started at the National Cancer Institute, which uses evolution strategies.
1 Introduction
Biological organisms contain thousands of different types of proteins. Proteins are responsible for transporting small molecules (e.g., hemoglobin transports O2 in the bloodstream), catalyzing biological functions, providing structure to collagen and skin, regulating hormones, and many other functions. Each protein is a sequence of amino acids, bound into linear chains, that adopts a specific folded three-dimensional shape. Each shape provides valuable clues to the protein's function; indeed, this information is vital to the design of new drugs capable of combating disease. Regrettably, ascertaining the shape of a protein is a difficult, expensive task, which explains why few proteins have been categorized in this regard. Virtual protein models, created on computers, may provide a cost-effective route to accurate prediction of protein shapes. Unfortunately, the protein folding problem (predicting the structure of a protein given only its sequence of amino acids) is a combinatorial optimization problem that has so far eluded solution because of the exponential number of potential solutions. The purpose of this paper is two-fold. Previous papers have surveyed the use of evolutionary algorithms (EAs) in protein folding problems, but they have been written primarily from the perspective of biophysicists
Algorithms for Sequence Alignment
2002
Cited by 2 (0 self)
Sequence alignment is an important tool for describing relationships between sequences. Many sequence alignment algorithms exist, differing in efficiency and in their models of the sequences and of the relationship between sequences. The focus of this thesis is on algorithms for the optimal alignment of two or three sequences of biological data, particularly DNA sequences. The algorithms are discussed with particular emphasis on space and time complexity. A divide-and-conquer method is presented for use with a number of different alignment algorithms. This method may be used to reduce the space complexity of an alignment algorithm with little or no effect on the time complexity. The advantages of this divide-and-conquer method include its simplicity and the ease with which it can be applied to many different alignment algorithms. These advantages are demonstrated by using the divide-and-conquer method in conjunction with several known alignment algorithms. An efficient alignment algorithm is presented for the important problem of optimally aligning three sequences using a linear function for costing gaps in the alignment. For sequences
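The classic instance of divide-and-conquer space reduction is Hirschberg's (1975) linear-space algorithm, sketched below for two sequences. This is an assumed illustration, not the thesis's own method or its three-sequence algorithm. It uses unit edit costs (`GAP = 1`, mismatch 1, match 0). It splits the first sequence in half, finds the optimal split point of the second sequence from a forward pass and a reverse pass that each keep only one row of costs, and recurses on the two halves. The result has the same optimal cost as the full dynamic-programming table while storing only O(len(b)) cells per pass, at the price of roughly twice the computation.

```python
from typing import List, Tuple

GAP = 1  # linear gap cost per gapped position (assumed unit costs)


def sub(x: str, y: str) -> int:
    return 0 if x == y else 1


def last_row(a: str, b: str) -> List[int]:
    """Costs of aligning all of a against each prefix of b, keeping one row."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j - 1] + sub(a[i - 1], b[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev


def base_case(a: str, b: str) -> Tuple[str, str]:
    """Optimal alignment when a has at most one letter or b is empty."""
    if not a:
        return '-' * len(b), b
    if not b:
        return a, '-' * len(a)
    # len(a) == 1: with these costs, placing a opposite some letter of b is
    # never worse than gapping it out, so pick an exact match if one exists.
    j = next((k for k in range(len(b)) if b[k] == a), 0)
    return '-' * j + a + '-' * (len(b) - j - 1), b


def hirschberg(a: str, b: str) -> Tuple[str, str]:
    if len(a) <= 1 or not b:
        return base_case(a, b)
    mid = len(a) // 2
    left = last_row(a[:mid], b)                # cost(a[:mid], b[:j])
    right = last_row(a[mid:][::-1], b[::-1])   # cost(a[mid:], b[len(b)-k:])
    split = min(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    a1, b1 = hirschberg(a[:mid], b[:split])
    a2, b2 = hirschberg(a[mid:], b[split:])
    return a1 + a2, b1 + b2


def alignment_cost(a: str, b: str) -> int:
    return sum(GAP if '-' in (x, y) else sub(x, y) for x, y in zip(a, b))
```

Because the split point is chosen to minimize the forward cost plus the reverse cost, the two recursive halves always recombine into an optimal alignment. For "kitten" and "sitting" the returned alignment costs 3, the edit distance.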

