Results 1 - 10
of
54
PROBCONS: Probabilistic consistency-based multiple sequence alignment
- Genome Res
, 2005
"... To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objec ..."
Abstract
-
Cited by 256 (10 self)
- Add to MetaCart
(Show Context)
To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce prob-abilistic consistency, a novel scoring function for multiple sequence comparisons. We present PROBCONS, a practical tool for progressive protein multiple sequence alignment based on prob-abilistic consistency, and evaluate its performance on several standard alignment benchmark datasets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, PROB-CONS achieves statistically significant improvement over other leading methods while maintain-ing practical speed. PROBCONS is publicly available as a web resource. Source code and execu-tables are available under the GNU Public License at
BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark
- Proteins
, 2005
"... ABSTRACT Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing ..."
Abstract
-
Cited by 129 (2 self)
- Add to MetaCart
(Show Context)
ABSTRACT Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have lead to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site
Multiple Sequence Alignment Accuracy and Phylogenetic Inference
"... Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogeneti ..."
Abstract
-
Cited by 54 (1 self)
- Add to MetaCart
Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction. [Bayesian; maximum likelihood; maximum parsimony; multiple sequence alignment; neighbor
Tcoffee@igs: a web server for computing, evaluating and combining multiple sequence alignments. Nucleic Acids Res. 31, 3503–3506 Received for publication June 29, 2011. Accepted for publication October 24
- 881ALLOSTERY DETECTED FROM PROTEIN DYNAMICS
, 2003
"... This paper presents Tcoffee@igs, a new server provided to the community by Hewlet Packard computers and the Centre National de la Recherche Scientifique. This server is a web-based tool dedi-cated to the computation, the evaluation and the combination of multiple sequence alignments. It uses the lat ..."
Abstract
-
Cited by 47 (1 self)
- Add to MetaCart
(Show Context)
This paper presents Tcoffee@igs, a new server provided to the community by Hewlet Packard computers and the Centre National de la Recherche Scientifique. This server is a web-based tool dedi-cated to the computation, the evaluation and the combination of multiple sequence alignments. It uses the latest version of the T-Coffee package. Given a set of unaligned sequences, the server returns an evaluated multiple sequence alignment and the associated phylogenetic tree. This server also makes it possible to evaluate the local reliability of an existing alignment and to combine several alternative multiple alignments into a single new one.
Data exploration in phylogenetic inference: scientific, heuristic, or neither
- CLADISTICS
, 2003
"... ..."
The problem with “the Paleoptera problem”: sense and sensitivity. Cladistics 19:432–442
"... While the monophyly of winged insects (Pterygota) is well supported, phylogenetic relationships among the most basal extant pterygote lineages are problematic. Ephemeroptera (mayflies) and Odonata (dragonflies) represent the two most basal extant lineages of winged insects, and determining their rel ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
(Show Context)
While the monophyly of winged insects (Pterygota) is well supported, phylogenetic relationships among the most basal extant pterygote lineages are problematic. Ephemeroptera (mayflies) and Odonata (dragonflies) represent the two most basal extant lineages of winged insects, and determining their relationship with regard to Neoptera (remaining winged insects) is a critical step toward understanding insect diversification. A recent molecular analysis concluded that Paleoptera (Odonata+Ephemeroptera) is monophyletic. However, we demonstrate that this result is supported only under a narrow range of alignment parameters. We have further tested the monophyly of Paleoptera using additional sequence data from 18SrDNA, 28S rDNA, and Histone 3 for a broader selection of taxa and a wider range of analytical methodologies. Our results suggest that the current suite of molecular data ambiguously resolve the three basal winged insect lineages and do not provide independent confirmation of Odonata+Neoptera as supported via morphological data. 2003 The Willi Hennig Society. Published by Elsevier Inc. All rights reserved. Paleoptera (Palaeoptera) refers to the grouping of extinct paleodictyopteroids, Ephemeroptera, and Odo-
3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment
- Nucleic Acids Res
, 2004
"... This paper presents 3DCoffee@igs, a web-based tool dedicated to the computation of high-quality multiple sequence alignments (MSAs). 3D-Coffee makes it possible to mix protein sequences and structures in order to increase the accuracy of the alignments. Structures can be either provided as PDB ident ..."
Abstract
-
Cited by 17 (7 self)
- Add to MetaCart
(Show Context)
This paper presents 3DCoffee@igs, a web-based tool dedicated to the computation of high-quality multiple sequence alignments (MSAs). 3D-Coffee makes it possible to mix protein sequences and structures in order to increase the accuracy of the alignments. Structures can be either provided as PDB identifiers or directly uploaded into the server. Given a set of sequences and structures, pairs of structures are aligned with SAP while sequence–structure pairs are aligned with Fugue. The resulting collection of pairwise alignments is then combined into an MSA with the T-Coffee algorithm. The server and its documentation are available from
Protein Multiple Sequence Alignment
"... Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the ac ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.
Sequence length variation, indel costs, and congruence in sensitivity analysis
- Cladistics
, 2005
"... The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which the c ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation in the data set. Dominance was easily detected, as the character-based congruence measures approached their optimal value when indel costs were incremented. Dominance of a fragment or data partition was overwhelmed when new sequence length-variable fragments or data partitions were added. The Willi Hennig Society 2005. Since DNA sequence analysis has become a standard technique for phylogenetic inference at all taxonomic levels, there has been a growing demand for phylo-
Alignment and Topological Accuracy of the Direct Optimization approach via POY and Traditional Phylogenetics via ClustalW + PAUP*
, 2007
"... Direct optimization frameworks for simultaneously estimating alignments and phylogenies have recently been developed. One such method, implemented in the program POY, is becoming more common for analyses of variable length sequences (e.g., analyses using ribosomal genes) and for combined evidence a ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
(Show Context)
Direct optimization frameworks for simultaneously estimating alignments and phylogenies have recently been developed. One such method, implemented in the program POY, is becoming more common for analyses of variable length sequences (e.g., analyses using ribosomal genes) and for combined evidence analyses (morphology + multiple genes). Simulation of sequences containing insertion and deletion events was performed in order to directly compare a widely used method of multiple sequence alignment (ClustalW) and subsequent parsimony analysis in PAUP * with direct optimization via POY. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (clocklike, non-clocklike, and ultrametric). Alignment accuracy scores for the implied alignments from POY and the multiple sequence alignments from ClustalW were calculated and compared. In almost all cases (99.95%), ClustalW produced more accurate alignments than POY-implied alignments, judged by the proportion of correctly identified homologous sites. Topological accuracy (distance to the true tree) for POY topologies and topologies generated under parsimony in PAUP * from the ClustalW alignments were also compared. In 44.94 % of the cases, Clustal alignment tree reconstructions via PAUP * were more accurate than POY, whereas in 16.71 % of the cases POY reconstructions were more topologically accurate (38.38 % of the time they were equally accurate). Comparisons between POY hypothesized alignments and the true alignments indicated that, on average, as alignment error increased, topological accuracy decreased. [ClustalW; direct optimization; multiple sequence alignment; parsimony; phylogenetics; POY; sensitivity analysis; simulation; tree reconstruction.]