MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res (2005)

by K Katoh, K Kuma, H Toh, T Miyata

Results 1 - 10 of 801

Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments

by Gerard Talavera, Jose Castresana - SYST BIOL, 2007
Cited by 319 (1 self)
Alignment quality may have as much impact on phylogenetic reconstruction as the phylogenetic methods used. Not only the alignment algorithm, but also the method used to deal with the most problematic alignment regions, may have a critical effect on the final tree. Although some authors remove such problematic regions, either manually or using automatic methods, in order to improve phylogenetic performance, others prefer to keep such regions to avoid losing any information. Our aim in the present work was to examine whether phylogenetic reconstruction improves after alignment cleaning or not. Using simulated protein alignments with gaps, we tested the relative performance in diverse phylogenetic analyses of the whole alignments versus the alignments with problematic regions removed with our previously developed Gblocks program. We also tested the performance of more or less stringent conditions in the selection of blocks. Alignments constructed with different alignment methods (ClustalW, Mafft, and Probcons) were used to estimate phylogenetic trees by maximum likelihood, neighbor joining, and parsimony. We show that, in most alignment conditions, and for alignments that are not too short, removal of blocks leads to better trees. That is, despite losing some information, there is an increase in the actual phylogenetic signal. Overall, the best trees are obtained by maximum-likelihood reconstruction of alignments cleaned by Gblocks. In general, a relaxed selection of blocks is better for short alignment, whereas a stringent selection is more adequate for longer ones. Finally, we show that cleaned alignments produce better topologies although, paradoxically, with lower bootstrap. This indicates that divergent and problematic alignment regions may lead, when present, to apparently

MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability

by Kazutaka Katoh, Daron M. St , 2013
Cited by 280 (4 self)
We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations. Key words: multiple sequence alignment, metagenome, protein structure, progressive alignment, parallel processing.

M-Coffee: combining multiple sequence alignment methods with T-Coffee

by Iain M. Wallace, Desmond G. Higgins, Cedric Notredame - Nucleic Acids Res, 2006
Cited by 60 (13 self)
We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performances can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and Balibase. We also show that on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment than any individual method. Given a collection of pre-computed MSAs, M-Coffee has similar CPU requirements to the original T-Coffee. M-Coffee is a freeware open-source package available from

Citation Context

...in significant accuracy improvement over existing methods. Since then, consistency based objective functions have been used within several new multiple alignment packages, including POA (13), MAFFT 5 (14), Muscle 6 (5), ProbCons (15) and PCMA (16). More than 50 MSA methods have been described over the last 10 years (Medline, January 08, 2006), with no less than 20 new publications in 2005 alone. The c...

Probalign: Multiple sequence alignment using partition function posterior probabilities

by Usman Roshan, Dennis R. Livesay - Bioinformatics, 2006
Cited by 59 (7 self)
Motivation: The maximum expected accuracy optimization criterion for multiple sequence alignment uses pairwise posterior probabilities of residues to align sequences. The partition function methodology is one way of estimating these probabilities. Here, we combine these two ideas for the first time to construct maximal expected accuracy sequence alignments. Results: We bridge the two techniques within the program Probalign. Our results indicate that Probalign alignments are generally more accurate than other leading multiple sequence alignment methods (i.e., Probcons, MAFFT, and MUSCLE) on the BAliBASE 3.0 protein alignment benchmark. Similarly, Probalign also outperforms these methods on the HOMSTRAD and OXBENCH benchmarks. Probalign ranks statistically significantly highest (P-value < 0.005) on all three benchmarks. Deeper scrutiny of the technique indicates that the improvements are largest on datasets containing N/C terminal extensions and on datasets containing long and heterogeneous length proteins. These points are demonstrated on both real and simulated data. Finally, our method also produces accurate alignments on long and heterogeneous length datasets containing protein repeats. There, alignment accuracy scores are at least 10% and 15% higher than the other three methods when standard deviation of length is at least 300 and 400 respectively. Availability: Open source code implementing Probalign as well as for producing the simulated data, and all real and simulated data are freely available from

Citation Context

... with many alignment tools, e.g., ClustalW (Thompson et al., 1994), Dialign (Subramanian et al., 2005), T-Coffee (Notredame et al., 2000), Probcons (Do et al., 2005), MUSCLE (Edgar, 2004), and MAFFT (Katoh et al., 2005). In terms of accuracy, recent comparative studies (Do et al., 2005; Katoh et al., 2005; Edgar, 2004) place MAFFT and Probcons among the very top performing sequence alignment methods. ...

Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet

by Marie Touchon, Claire Hoede, Olivier Tenaillon, Valérie Barbe, Simon Baeriswyl, Edouard Bingen, Stéphane Bonacorsi, Christiane Bouchier, Odile Bouvet, Ra Calteau, Olivier Clermont, Stéphane Cruveiller, Antoine Danchin, Médéric Diard, Carole Dossat, Claude Saint Ruf, Dominique Schneider, Jérôme Tourret, Benoit Vacherie, David Vallenet, Claudine Médigue, Eduardo P. C. Rocha, Erick Denamur - 2009
Cited by 53 (4 self)
Abstract not found

Citation Context

...s a variable segment. To produce the DNA alignment file from the above-mentioned procedure, the coordinates of all backbone segments on each genome were extracted and aligned with MAFFT, version 6.24 [100], using a homemade Perl script. Segments were first aligned with the ‘--globalpair’ option, which is suitable for a suite of globally alignable sequences. When problems occurred (especially for long b...

Identification of multiple distinct Snf2 subfamilies with conserved structural motifs

by Andrew Flaus, David M. A. Martin, Geoffrey J. Barton, Tom Owen-Hughes - Nucleic Acids Res, 2006
Cited by 51 (1 self)

Citation Context

...ere manipulated with the EMBOSS suite version 2.8 (27). Multiple sequence alignments were created with Muscle version 3.0 (28) and MAFFT version 5.667 with parameters retree 2 and maxiterate 1000 (29) and visualized with Jalview version 2 (30). Phylogenetic and pairwise trees were constructed with neighbor, protdist and drawtree from the PHYLIP suite version 3.572 (31) and additionally visualized ...

Extensive gains and losses of olfactory receptor genes in mammalian evolution

by Yoshihito Niimura, Masatoshi Nei - PLoS ONE, 2007
Cited by 51 (6 self)
Odor perception in mammals is mediated by a large multigene family of olfactory receptor (OR) genes. The number of OR genes varies extensively among different species of mammals, and most species have a substantial number of pseudogenes. To gain some insight into the evolutionary dynamics of mammalian OR genes, we identified the entire set of OR genes in platypuses, opossums, cows, dogs, rats, and macaques and studied the evolutionary change of the genes together with those of humans and mice. We found that platypuses and primates have ~400 functional OR genes while the other species have 800–1,200 functional OR genes. We then estimated the numbers of gains and losses of OR genes for each branch of the phylogenetic tree of mammals. This analysis showed that (i) gene expansion occurred in the placental lineage each time after it diverged from monotremes and from marsupials and (ii) hundreds of gains and losses of OR genes have occurred in an order-specific manner, making the gene repertoires highly variable among different orders. It appears that the number of OR genes is determined primarily by the functional requirement for each species, but once the number reaches the required level, it fluctuates by random duplication and deletion of genes. This fluctuation seems to have been aided by the stochastic nature of OR gene expression.

Citation Context

...ns and were located close (,30 base pairs) to the contig end. We then constructed a multiple alignment of these sequences together with functional OR genes by the program E-INS-i in MAFFT version 5.8 [36]. From the alignment, we extracted truncated sequences that meet the following condition. When the C-terminal portion of an OR gene is missing from the genome sequence, the N-terminal portion should c...

TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations

by Federico Abascal, Rafael Zardoya, Maximilian J. Telford, 2010
Cited by 42 (4 self)

Citation Context

...nes: the ancestral Arthropod mitochondrial genetic code (11) and the Hemichordate mitochondrial code (our unpublished data). Several different multiple alignment programs including Muscle (12), Mafft (13), T-Coffee (14), Prank (15) and ClustalW (16) can be chosen to align the amino acids. As an alternative, users are able to upload their own pre-calculated protein alignment. By default, TranslatorX ex...

PROMALS3D: a tool for multiple protein sequence and structure alignments

by Jimin Pei, Bong-hyun Kim, Nick V. Grishin - Nucleic Acids Res, 2008
Cited by 42 (2 self)

Citation Context

...n Expresso (11) automatically combines SAP structural alignments (12) with sequence alignments by using constraints based on structural alignments to derive consistency-based scoring functions. MAFFT (13) server offers an option for the input of alignment constraints, which can be structure-based alignments. In this article, we explore information from available protein 3D structures by PROMALS. The re...

CONTRAlign: discriminative training for protein sequence alignment

by Chuong B. Do, Samuel S. Gross, Serafim Batzoglou - In: International Conference in Research on Computational Molecular Biology (RECOMB), 2006
Cited by 36 (4 self)
In comparative structural biology studies, analyzing or predicting protein three-dimensional structure often begins with identifying patterns of amino acid substitution via protein sequence alignment. While the evolutionary information obtained from alignments can provide insights into protein structure, constructing accurate alignments may be difficult when proteins share significant structural similarity but little sequence similarity. Indeed, for modern alignment tools, alignment quality drops rapidly when the sequences compared have lower than 25% identity, the "twilight zone" of protein alignment [1].

Citation Context

... 3.3 Comparison to modern sequence alignment tools Next, we compared the CONTRAlign HYDROPATHY model to a variety of modern sequence alignment methods, including MAFFT 5.732 (both L-INS-i and G-INS-i) [38, 39], CLUSTALW 1.83 [28], MUSCLE 3.6 [23], T-Coffee 2.66 [40], 3 For most reference databases, with the notable exception of SABmark 1.65, alignment accuracies are roughly consistent. This difference is l...
