The NOESY Jigsaw: Automated Protein Secondary Structure and Main-Chain Assignment from Sparse, Unassigned NMR Data
- Journal of Computational Biology, 7:537–558, 2000
- Cited by 35 (14 self)
High-throughput, data-directed computational protocols for Structural Genomics (or Proteomics) are required in order to evaluate the protein products of genes for structure and function at rates comparable to current gene-sequencing technology. This paper presents the Jigsaw algorithm, a novel high-throughput, automated approach to protein structure characterization with nuclear magnetic resonance (NMR). Jigsaw applies graph algorithms and probabilistic reasoning techniques, enforcing first-principles consistency rules in order to overcome a 5-10% signal-to-noise ratio. It consists of two main components: (1) graph-based secondary structure pattern identification in unassigned heteronuclear NMR data, and (2) assignment of spectral peaks by probabilistic alignment of identified secondary structure elements against the primary sequence. Deferring assignment eliminates the bottleneck faced by traditional approaches, which begin by correlating peaks among dozens of experiments. Jigsaw utilizes only four experiments, none of which requires 13C-labeled protein, thus dramatically reducing both the amount and expense of wet lab molecular biology and the total spectrometer time. Results for three test proteins demonstrate that Jigsaw correctly identifies 79-100% of α-helical and 46-65% of β-sheet NOE connectivities, and correctly aligns 33-100% of secondary structure elements. Jigsaw is very fast, running in
A random graph approach to NMR sequential assignment
- In Proceedings of the International Conference on Computational Molecular Biology (RECOMB), 2004
- Cited by 15 (5 self)
Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics and interactions in solution. A necessary first step for such applications is determining the resonance assignment, mapping spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on information regarding connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Significant ambiguity exists in both connectivity and amino acid type information. This paper focuses on the information content available in connectivity alone and develops a novel random-graph theoretic framework and algorithm for connectivity-driven NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy, a key source of connectivity ambiguity. We then give a simple and natural randomized algorithm for finding optimal assignments as sets of connected fragments in NMR graphs. The algorithm naturally and efficiently reuses substrings while exploring connectivity choices; it overcomes local ambiguity by enforcing global consistency of all choices. By analyzing our algorithm under our random graph model, we show that it can provably tolerate relatively large ambiguity while still giving expected optimal performance in polynomial time. We present results from practical applications of the algorithm to experimental datasets from a variety of proteins and experimental set-ups. We demonstrate that our approach is able to overcome significant noise and local ambiguity in identifying significant fragments of sequential assignments. Key words: nuclear magnetic resonance (NMR) spectroscopy, automated sequential resonance assignment, random graph model, randomized algorithm, Hamiltonian path.
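The idea of resolving local connectivity ambiguity by demanding one globally consistent chain can be illustrated with a small sketch. This is not the randomized algorithm of the paper: it is a plain exhaustive depth-first search (exponential in the worst case), and the spin-system names and the `candidates` graph are hypothetical.

```python
from typing import Dict, List, Set


def longest_fragment(graph: Dict[str, List[str]]) -> List[str]:
    """Return one longest simple path in a directed connectivity graph.

    Exhaustive search, used here only to illustrate the problem;
    it is an assumption-laden stand-in, not the paper's method.
    """
    best: List[str] = []

    def extend(path: List[str], visited: Set[str]) -> None:
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:  # each spin system is used at most once
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in graph:
        extend([start], {start})
    return best


# Hypothetical spin systems; "b" has two candidate successors (ambiguity).
candidates = {
    "a": ["b"],
    "b": ["c", "e"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
fragment = longest_fragment(candidates)
print(fragment)
```

Taken locally, the edge from "b" to "e" looks as plausible as the edge from "b" to "c"; only the requirement that every spin system appears once in a single chain rules it out.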
3D Structural Homology Detection via Unassigned Residual Dipolar Couplings
- Proc. IEEE Computer Society Bioinformatics Conference (CSB), 2003
- Cited by 10 (5 self)
Recognition of a protein’s fold provides valuable information about its function. While many sequence-based homology prediction methods exist, an important challenge remains: two highly dissimilar sequences can have similar folds. How can we detect this rapidly, in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies the 3D structural models in a protein structural database whose geometries best fit the unassigned experimental NMR data. It does not use sequence information and is thus not limited by sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or sequence homology. The algorithm runs in O(pnk^3) time, where p is the number of proteins in the database, n is the number of residues in the target protein, and k is the resolution of a rotation search. The method requires only uniform 15N-labelling of the protein and processes unassigned HN-15N residual dipolar couplings, which can be acquired in a couple of hours. Our experiments on NMR data from 5 different proteins demonstrate that the method identifies closely related protein folds, despite low sequence homology between the target protein and the computed model.
Model-based assignment and inference of protein backbone nuclear magnetic resonances
- Statistical Applications in Genetics and Molecular Biology, 2004
PREDITOR: a web server for predicting protein torsion angle restraints
- Nucleic Acids Res., 34(Web Server issue), 2006
- Cited by 7 (4 self)
Every year between 500 and 1000 peptide and protein structures are determined by NMR and deposited into the Protein Data Bank. However, the process of NMR structure determination continues to be a manually intensive and time-consuming task. One of the most tedious and error-prone aspects of this process involves the determination of torsion angle restraints including phi, psi, omega and chi angles. Most methods require many days of additional experiments, painstaking measurements or complex calculations. Here we wish to describe a web server, called PREDITOR, which greatly accelerates and simplifies this task. PREDITOR accepts sequence and/or chemical shift data as input and generates torsion angle predictions (with predicted errors) for phi, psi, omega and chi-1 angles. PREDITOR combines sequence alignment methods with advanced chemical shift analysis techniques to generate its torsion angle predictions. The method is fast (<40 s per protein) and accurate, with 88% of phi/psi predictions being within 30° of the correct values, 84% of chi-1 predictions being correct and 99.97% of omega angles being correct. PREDITOR is 35 times faster and up to 20% more accurate than any existing method. PREDITOR also provides accurate assessments of the torsion angle errors so that the torsion angle constraints can be readily fed into standard structure refinement programs, such as CNS, XPLOR, AMBER and CYANA. Other unique features to PREDITOR include dihedral angle prediction via PDB structure mapping, automated chemical shift re-referencing (to improve accuracy), prediction of proline cis/trans states and a simple user interface. The PREDITOR website is located at:
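An accuracy figure of the form "within 30° of the correct values" depends on measuring angular error with wrap-around, since 175° and -170° are only 15° apart. The following is a minimal sketch of such a measure; the function names and the sample angles are hypothetical and are not taken from PREDITOR.

```python
from typing import Sequence


def angular_error(predicted_deg: float, actual_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = (predicted_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def fraction_within(predicted: Sequence[float],
                    actual: Sequence[float],
                    tolerance_deg: float = 30.0) -> float:
    """Fraction of predictions whose wrapped error is within the tolerance."""
    hits = sum(
        1 for p, a in zip(predicted, actual)
        if angular_error(p, a) <= tolerance_deg
    )
    return hits / len(predicted)


# Made-up phi angles for four residues (illustrative only).
predicted_phi = [-60.0, 175.0, -120.0, 50.0]
actual_phi = [-65.0, -170.0, -60.0, 60.0]
score = fraction_within(predicted_phi, actual_phi)
print(score)
```

A naive subtraction would report an error of 345° for the second residue and count it as a miss, which is why the modulo step matters.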
High-throughput 3D structural homology detection via NMR resonance assignment
- In Proc. CSB, 2004
- Cited by 5 (2 self)
One goal of the structural genomics initiative is the identification of new protein folds. Sequence-based structural homology prediction methods are an important means for prioritizing unknown proteins for structure determination. However, an important challenge remains: two highly dissimilar sequences can have similar folds. How can we detect this rapidly, in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure, called HD, for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies 3D models in a protein structural database whose geometries best fit the unassigned experimental NMR data. HD does not use, and is thus not limited by, sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or homology modelling. The algorithm runs in O(pn + pn^(5/2) log(cn) + p log p) time, where p is the number of proteins in the database, n is the number of residues in the target protein and c is the maximum edge weight in an integer-weighted bipartite graph. Our experiments on real NMR data from 3 different proteins against a database of 4,500 representative folds demonstrate that the method identifies closely related protein folds, including sub-domains of larger proteins, with as little as 10-30% sequence homology between the target protein (or sub-domain) and the computed model. In particular, we report no false-negatives or false-positives despite significant percentages of missing experimental data.
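For readers unfamiliar with the term, matching in an integer-weighted bipartite graph can be illustrated on a toy instance. The abstract does not say what the two vertex sets represent or which matching algorithm HD uses, so everything below is an assumption: the weight matrix is invented, and the brute-force search over permutations is only workable for very small inputs, unlike the scaling algorithm implied by the stated running time.

```python
from itertools import permutations
from typing import List, Tuple


def max_weight_matching(weights: List[List[int]]) -> Tuple[int, List[int]]:
    """Brute-force maximum-weight perfect matching on a square matrix.

    weights[i][j] is the non-negative integer weight of pairing left
    vertex i with right vertex j. Returns (total weight, assignment),
    where assignment[i] is the right vertex matched to left vertex i.
    """
    n = len(weights)
    best_score, best_perm = -1, None
    for perm in permutations(range(n)):
        score = sum(weights[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, list(best_perm)


# Invented 3x3 weights; the maximum edge weight here is c = 5.
weights = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = max_weight_matching(weights)
print(total, assignment)
```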
SPINE 2: a system for collaborative structural proteomics within a federated database framework
- Nucleic Acids Res., 2003
- Cited by 5 (3 self)
We present version 2 of the SPINE system for structural proteomics. SPINE is available over the web at
IPASS: error tolerant NMR backbone resonance assignment by linear programming
- 2009
- Cited by 5 (4 self)
The automation of the entire NMR protein structure determination process requires a superior error-tolerant backbone resonance assignment method. Although a variety of assignment approaches have been developed, none works well on noisy, automatically picked peaks. IPASS is proposed as a novel integer linear programming (ILP) based assignment method. In order to reduce the size of the problem, IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and the 15N-edited NOESY peaks, which are then used to fix reliable fragments. The experimental results demonstrate that IPASS significantly outperforms the previous assignment methods on the synthetic data sets. It achieves an average of 99% precision and 96% recall on the synthesized spin systems, and an average of 96% precision and 90% recall on the synthesized peak lists. When applied to automatically picked peaks from experimentally derived data sets, it achieves an average precision and recall of 78% and 67%, respectively. In contrast, the next best method, MARS, achieved an average precision and recall of 50% and 40%, respectively. Availability: IPASS is available upon request, and the web server for IPASS is under construction.
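The precision and recall figures quoted above can be made concrete with a small calculation. The sketch below assumes the common definitions (precision is correct assignments over assignments made; recall is correct assignments over assignable spin systems); the abstract does not spell out IPASS's exact definitions, and the spin-system labels and residue numbers are made up.

```python
from typing import Dict, Tuple


def precision_recall(predicted: Dict[str, int],
                     reference: Dict[str, int]) -> Tuple[float, float]:
    """Score a predicted spin-system -> residue mapping against a reference."""
    correct = sum(1 for key, res in predicted.items()
                  if reference.get(key) == res)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: five assignable spin systems, four of them assigned,
# one of those four placed on the wrong residue.
reference = {"ss1": 10, "ss2": 11, "ss3": 12, "ss4": 13, "ss5": 14}
predicted = {"ss1": 10, "ss2": 11, "ss3": 15, "ss4": 13}
precision, recall = precision_recall(predicted, reference)
print(precision, recall)
```

Leaving an uncertain spin system unassigned lowers recall but not precision, which is why the two numbers are reported separately.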
A Complete Algorithm to Resolve Ambiguity for Inter-subunit NOE Assignment in Structure Determination of Symmetric Homo-oligomers
- Cited by 3 (2 self)
Assignment of nuclear Overhauser effect (NOE) data is a key bottleneck in structure determination by NMR. NOE assignment resolves the ambiguity as to which pair of protons generated the observed NOE peaks, and thus should be restrained in structure determination. In the case of inter-subunit NOEs in symmetric homo-oligomers, the ambiguity includes both the identities of the protons within a subunit, and the identities of the subunits to which they belong. This paper develops an algorithm for simultaneous inter-subunit NOE assignment and Cn symmetric homo-oligomeric structure determination, given the subunit structure. By using a configuration space framework, our algorithm guarantees completeness, in that it identifies structures representing, to within a user-defined similarity level, every structure consistent with the available data (ambiguous or not). However, while our approach is complete in considering all conformations and assignments, it avoids explicit enumeration of the exponential number of combinations of possible assignments. Our algorithm can draw two types of conclusions not possible under previous methods: (1) that different assignments for an NOE would lead to different structural classes, or (2) that it is not necessary to uniquely assign an NOE, since it would have little impact on structural precision. We demonstrate on two test proteins that our method reduces the average number
The protein data bank: Current status and future challenges
- Cited by 2 (0 self)
The Protein Data Bank (PDB) is an archive of experimentally determined, three-dimensional structures of proteins, nucleic acids, and other biological macromolecules. The PDB is now being transformed into 3DB, the Three-Dimensional Database of Biomacromolecular Structures, with significantly enhanced capabilities.

