A conditional neural fields model for protein threading
, 2012
Cited by 11 (2 self)
Motivation: Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%). Results: We present a novel protein threading method, CNFpred, which achieves much more accurate sequence–template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF), which aligns one protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlation among a variety of protein sequence and structure features, makes use of information in the neighborhood of two residues to be aligned, and is thus much more sensitive than the widely used linear or profile-based scoring function. To train this CNF threading model, we employ a novel quality-sensitive method, instead of the standard maximum-likelihood method, to maximize directly the expected quality of the training set. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as our own large dataset. CNFpred outperforms others regardless of the lengths or classes of proteins, and works particularly well for proteins with sparse sequence profiles due to the effective utilization of structure information. Our methodology can also be adapted to protein sequence alignment.
Iterative refinement of structure-based sequence alignments by Seed Extension
- BMC Bioinformatics
, 2009
Optimal simultaneous superpositioning of multiple structures with missing data
- Bioinformatics
, 2012
Cited by 2 (0 self)
Motivation: Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point-sets in the different structures. However, in practice some points are usually “missing” from several structures, for example when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Results: Here we present a general solution for determining an optimal superposition when some of the data are missing. We use the Expectation-Maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum likelihood solutions and the optimal least-squares solution as a special case. Availability and Implementation: The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from
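The practice this abstract criticizes, keeping only the alignment columns present in every structure, can be illustrated with a minimal sketch. This is not code from THESEUS; the function names `shared_columns` and `discarded_fraction`, and the representation of a gap as `None`, are assumptions made for illustration only.

```python
def shared_columns(alignment):
    """Return indices of columns that are observed in every structure.

    `alignment` is a list of rows, one per structure. Each row holds one
    entry per alignment column: an (x, y, z) tuple, or None for a gap.
    (This data layout is a hypothetical choice for the sketch.)
    """
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same number of columns")
    return [
        j for j in range(n_cols)
        if all(row[j] is not None for row in alignment)
    ]


def discarded_fraction(alignment):
    """Fraction of observed points ignored when only shared columns are used."""
    total = sum(1 for row in alignment for p in row if p is not None)
    if total == 0:
        return 0.0
    kept = len(shared_columns(alignment)) * len(alignment)
    return (total - kept) / total


# Three toy structures over four alignment columns, with two gaps.
example = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), None,            (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), None,            (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(shared_columns(example))
print(discarded_fraction(example))
```

In this toy case only two of the four columns survive, so a noticeable share of the observed coordinates never enters the superposition. When no column is shared by all structures the list is empty, which corresponds to the extreme case the abstract mentions.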
mulPBA: an efficient multiple protein structure alignment method based on a structural alphabet.
Redundancy-aware learning of protein structure-function relationships by ...
, 2012
Simultaneous Alignment and Folding of Protein Sequences
Cited by 1 (0 self)
Abstract. Developing accurate comparative analysis tools for low-homology proteins remains a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm’s complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise sequence alignment tools in the most difficult low sequence homology case and improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families. partiFold-Align is available at
Smolign: A Spatial Motifs Based Protein Multiple Structural Alignment Method
Cited by 1 (0 self)
Abstract—Availability of an effective tool for protein multiple structural alignment (MSTA) is essential for discovery and analysis of biologically significant structural motifs that can help solve functional annotation and drug design problems. Existing MSTA methods collect residue correspondences mostly through pairwise comparison of consecutive fragments, which can lead to suboptimal alignments, especially when the similarity among the proteins is low. We introduce a novel strategy based on building a contact-window based motif library from the protein structural data, discovering and extending common alignment seeds from this library, and optimally superimposing multiple structures according to these alignment seeds by an enhanced partial order curve comparison method. The ability of our strategy to detect multiple correspondences simultaneously, to catch alignments globally, and to support flexible alignments yields a sensitive and robust automated algorithm that can expose similarities among protein structures even under low similarity conditions. Our method yields better alignment results compared to other popular MSTA methods on several protein structure datasets that span various structural folds and represent different protein similarity levels. A web-based alignment tool, a downloadable executable, and detailed alignment results for the datasets used here are available at
Formatt: Correcting Protein Multiple Structural Alignments by Sequence Peeking
Cited by 1 (1 self)
We present Formatt, a multiple structure alignment program based on the Matt purely geometric multiple structural alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt is superior to Matt in alignment quality based on objective measures (most notably Staccato sequence and structure scores) while preserving the same advantages in core length and RMSD that Matt has as a flexible structure aligner, as compared to other multiple structure alignment programs on popular benchmark datasets. Applications include producing better training data for threading methods.
POSA: a user-driven, interactive multiple protein structure alignment server
- Nucleic Acids Res
, 2014
Cited by 1 (1 self)
POSA (Partial Order Structure Alignment), available at
Towards Reliable Automatic Protein Structure Alignment
Cited by 1 (1 self)
Abstract. A variety of methods have been proposed for structure similarity calculation, which are called structure alignment or superposition. One major shortcoming in current structure alignment algorithms is in their inherent design, which is based on local structure similarity. In this work, we propose a method to incorporate global information in obtaining optimal alignments and superpositions. Our method, when applied to optimizing the TM-score and the GDT score, produces significantly better results than current state-of-the-art protein structure alignment tools. Specifically, if the highest TM-score found by TMalign is lower than 0.6 and the highest TM-score found by one of the tested methods is higher than 0.5, there is a probability of 42% that TMalign failed to find TM-scores higher than 0.5, while the same probability is reduced to 2% if our method is used. This could significantly improve the accuracy of fold detection if the cutoff TM-score of 0.5 is used. In addition, existing structure alignment algorithms focus on structure similarity alone and simply ignore other important similarities, such as sequence similarity. Our approach has the capacity to incorporate multiple similarities into the scoring function. Results show that sequence similarity aids in finding high quality protein structure alignments that are more consistent with eye-examined alignments in HOMSTRAD. Even when structure similarity itself fails to find alignments with any consistency with eye-examined alignments, our method remains capable of finding alignments highly similar to, or even identical to, eye-examined alignments.
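For readers unfamiliar with the TM-score that this abstract optimizes, the following minimal sketch evaluates the commonly used TM-score formula for a fixed alignment and superposition. It is not the method proposed in the paper, which searches for the alignment maximizing this score. The function names `tm_d0` and `tm_score`, and the 0.5 Å floor on the distance scale for very short chains, are assumptions made for this illustration.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in Angstroms.

    Uses the usual form d0 = 1.24 * (L - 15)^(1/3) - 1.8. The floor of
    0.5 for short chains is an assumption of this sketch, added so that
    d0 stays positive.
    """
    if target_length <= 15:
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances, target_length):
    """TM-score for a given alignment and superposition.

    `distances` holds one distance (in Angstroms) per aligned residue
    pair after superposition; unaligned target residues contribute 0.
    The sum is normalized by the target length.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A 100-residue target with every residue aligned at zero distance.
print(tm_score([0.0] * 100, 100))
# The same target with only half of its residues aligned.
print(tm_score([0.0] * 50, 100))
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, which is what makes a fixed cutoff such as the 0.5 mentioned in the abstract usable across proteins of different lengths.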