## Mining overrepresented 3D patterns of secondary structures in proteins (2008)

Venue: | J. Bioinform. Comput. Biol |

Citations: | 2 - 0 self |

### Citations

86 |
The protein structure prediction problem could be solved using the current PDB library.
- Zhang, Skolnick
- 2005
Citation Context ... and ignoring the relative order of angles. [November 24, 2008 14:20 WSPC/185-JBCB 00384 1078 M. Comin, C. Guerra & G. Zanotti] [Fig. 3, panels (a)-(c): 3D histograms of the distributions of angles between triplets of SSEs. Each axis represents an angle, and the frequency of each triplet follows the color coding.] By merging patterns, the discovery procedure selects a set of 785 overrepresented patterns, formed by 485,021 quartets of segments, out of 2,262 patterns and more than 3,000,000 quartets obtained by the exhaustive search. The top overrepresented pattern is composed of the discretized angles (1, 2, 3, 7, 8, 9), corresponding to angles in the ranges (18°–36°, 36°–54°, 54°–72°, 126°–144°, 144°–162°, 162°–180°), and has a frequency of 6,439; the second-ranked pattern has similar angles, (1, 2, 7, 8, 8, 9), and a smaller frequency of 5,780. The frequency count drops sharply after the first few patterns. The overall distribution of patterns of angles, ranked by their frequency, is illustrated in Fig. 4. Notably, the top 11 angular patterns (out of 785) cover about 10% of the quartets; about 20% of the quartets are covered by 29 patterns and 50% by 122 patte... |
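The coverage figures quoted in this context are cumulative sums over the ranked pattern frequencies. A minimal sketch of that arithmetic, assuming a plain list of per-pattern frequencies; the function name `patterns_needed` and the toy numbers are hypothetical and are not taken from the paper:

```python
def patterns_needed(frequencies, fraction):
    """Return how many top-ranked patterns are needed to cover at least
    `fraction` of all counted quartets (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    ranked = sorted(frequencies, reverse=True)  # rank patterns by frequency
    total = sum(ranked)
    running = 0
    for rank, freq in enumerate(ranked, start=1):
        running += freq
        if running >= fraction * total:
            return rank
    return len(ranked)


# Toy frequencies (made up for illustration only).
toy = [50, 30, 10, 5, 5]
print(patterns_needed(toy, 0.5))
print(patterns_needed(toy, 0.75))
```

Applied to the real frequency table, the same loop would be expected to reproduce the quoted 11, 29, and 122 pattern counts, but that table is not part of this page.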

75 |
A model for statistical significance of local similarities in structure,
- Stark, Sunyaev, et al.
- 2003
Citation Context ...sorted according to the order along the backbone. (2) Build a hash table, indexed by the triplets of angles, that stores all triplets of segments. Derive a 3D histogram of the distribution of the triplets of A from the hash table. The histogram has b = 10 bins along each axis, for a total of b³ bins or cells. (3) Build the distribution of triplets of angles of random unit vectors and derive the corresponding 3D histogram. (4) Based on the deviation between the histogram of observed triplets of angles and that of random triplets, determine the subset C ⊂ A of triplets that are overrepresented. (5) Join step: Construct candidate sextuples of angles from triplets of C. (6) Verification step: Prune candidate sextuples to find the overrepresented ones. 3.1. Building the hash table. We build a four-dimensional (4D) hash table with the following index structure: for a given triplet of vectors, three indexes are given by the quantized values of the angles of the triplet, while the fourth index depends on the composition of the triplet in terms of the number and position of he... |
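Step (2) of the quoted procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each SSE is assumed to be given as a 3D direction vector, the table is a plain Python dictionary, and the fourth index (the triplet type based on helix/strand composition) is omitted. The names `build_triplet_table`, `bin_index`, and `histogram` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def bin_index(angle, b=10):
    """Quantize an angle in [0, 180] into one of b equal-width bins.
    Assumption: exactly 180 degrees is folded into the last bin."""
    return min(int(angle // (180.0 / b)), b - 1)


def build_triplet_table(proteins, b=10):
    """proteins: dict mapping a protein name to its list of SSE direction
    vectors, sorted along the backbone. Returns a dict keyed by the
    quantized angle triplet, storing every triplet of segments."""
    table = defaultdict(list)
    for name, sses in proteins.items():
        for i, j, k in combinations(range(len(sses)), 3):
            key = (bin_index(angle_deg(sses[i], sses[j]), b),
                   bin_index(angle_deg(sses[i], sses[k]), b),
                   bin_index(angle_deg(sses[j], sses[k]), b))
            table[key].append((name, i, j, k))
    return table


def histogram(table):
    """3D histogram: number of stored segment triplets per cell."""
    return {key: len(entries) for key, entries in table.items()}


# Toy input (made-up vectors, for illustration only).
toy = {"toy": [(1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 0, 0)]}
print(histogram(build_triplet_table(toy)))
```

A dictionary only stores populated cells, which fits the observation in the paper that not all cells of the table can be populated.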

64 | Helix to helix packing in proteins. - Chothia, Levitt, et al. - 1981 |

62 | MASS: Multiple structural alignment by secondary structures.
- Dror, Benyamini, et al.
- 2003
Citation Context ...into uniform intervals, with every interval represented by an integer. More precisely, the range 0°–180° is divided into 10 intervals, and an angle α is represented by the integer i such that i × 18° ≤ α < (i + 1) × 18°. We experimented with several partition criteria and chose the number of intervals equal to 10. This appears to be a reasonable choice if we consider the approximations introduced in calculating the best-fit segments for strands and helices; furthermore, this ensures a reasonable number of items per interval. A quartet of SSEs is represented by six integer values, each within the range [0, 10]. In the following, we refer to the discretized angles simply as “angles”. 3. Discovery of Overrepresented Patterns. Our approach to finding overrepresented angular patterns is similar to the Apriori algorithm used for data mining applications. Our algorithm finds overrepresented arrangements of quartets of segments from overrepresented triplets of segments. It does so by joining overrepresented triplets of angles to obtain overrepresented sextuplets of angles. However, our approach differs substantially from Apriori in the way the patterns are joined together to obtain patterns of larger size. At... |
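The discretization rule quoted above can be illustrated with a short sketch. The function names are hypothetical, and one detail is an assumption: an angle of exactly 180° is folded into the last interval so that indices run from 0 to 9, whereas the quoted text gives the integer range as [0, 10].

```python
def discretize_angle(angle_deg, n_bins=10):
    """Map an angle in [0, 180] degrees to the integer i with
    i * width <= angle < (i + 1) * width, where width = 180 / n_bins."""
    if not 0.0 <= angle_deg <= 180.0:
        raise ValueError("angle must lie in [0, 180] degrees")
    width = 180.0 / n_bins  # 18 degrees when n_bins = 10
    # Assumption: exactly 180 degrees is folded into the last interval.
    return min(int(angle_deg // width), n_bins - 1)


def quartet_key(angles_deg):
    """Represent a quartet of SSEs by its six discretized pairwise angles."""
    if len(angles_deg) != 6:
        raise ValueError("a quartet of SSEs has exactly six pairwise angles")
    return tuple(discretize_angle(a) for a in angles_deg)


# Six made-up angles, one from each of six different intervals.
print(quartet_key([20, 40, 60, 130, 150, 170]))
```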

53 | Recognition of functional sites in protein structures. - Shulman-Peleg, Nussinov, et al. - 2004 |

28 |
An algorithm for constraint-based structural template matching: Application to 3D templates with statistical analysis,
- Barker, Thornton
- 2003
Citation Context ...steps 5 and 6 are based on a variant of the Apriori method to determine and evaluate the distribution of quartets. Procedure: Pattern Discovery. (1) Initialization: From the given protein dataset, generate the set A of all ordered triplets of angles associated to ordered triplets of SSEs, sorted according to the order along the backbone. (2) Build a hash table, indexed by the triplets of angles, that stores all triplets of segments. Derive a 3D histogram of the distribution of the triplets of A from the hash table. The histogram has b = 10 bins along each axis, for a total of b³ bins or cells. (3) Build the distribution of triplets of angles of random unit vectors and derive the corresponding 3D histogram. (4) Based on the deviation between the histogram of observed triplets of angles and that of random triplets, determine the subset C ⊂ A of triplets that are overrepresented. (5) Join step: Construct candidate sextuples of angles from triplets of C. (6) Verification step: Prune candidate sextuples to find the overrepresented ones. 3.1. Building the hash table. We buil... |
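The join step (5) can be illustrated with a generic Apriori-style sketch. The paper states that its own join differs substantially from Apriori, so the code below is not the authors' procedure; it only shows the downward-closure idea that a sextuple of angles for a quartet (a, b, c, d) is a candidate when all four of its sub-triplets belong to C. The angle ordering (ab, ac, ad, bc, bd, cd) and both function names are assumptions made for this example.

```python
def sub_triplets(sextuple):
    """Four angle triplets contained in a quartet's six pairwise angles,
    assuming the ordering (ab, ac, ad, bc, bd, cd)."""
    ab, ac, ad, bc, bd, cd = sextuple
    return [(ab, ac, bc), (ab, ad, bd), (ac, ad, cd), (bc, bd, cd)]


def candidate_sextuples(overrepresented):
    """Join triplets of C into candidate sextuples whose four
    sub-triplets are all in C (plain Apriori-style closure check)."""
    C = set(overrepresented)
    candidates = set()
    for (ab, ac, bc) in C:                 # triplet for segments a, b, c
        for (ab2, ad, bd) in C:            # triplet for segments a, b, d
            if ab2 != ab:
                continue
            for (ac2, ad2, cd) in C:       # triplet for segments a, c, d
                if (ac2, ad2) == (ac, ad) and (bc, bd, cd) in C:
                    candidates.add((ab, ac, ad, bc, bd, cd))
    return candidates


# Toy set C (made up for illustration only).
print(candidate_sextuples({(5, 5, 5)}))
```

The triple loop is cubic in the size of C, which is acceptable for a sketch; the verification step (6) would then count how many real quartets fall under each candidate.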

25 |
A comprehensive analysis of 40 blind protein structure predictions,
- Samudrala, Levitt
- 2002
Citation Context ...ndexed by the triplets of angles, that stores all triplets of segments. Derive a 3D histogram of the distribution of the triplets of A from the hash table. The histogram has b = 10 bins along each axis, for a total of b³ bins or cells. (3) Build the distribution of triplets of angles of random unit vectors and derive the corresponding 3D histogram. (4) Based on the deviation between the histogram of observed triplets of angles and that of random triplets, determine the subset C ⊂ A of triplets that are overrepresented. (5) Join step: Construct candidate sextuples of angles from triplets of C. (6) Verification step: Prune candidate sextuples to find the overrepresented ones. 3.1. Building the hash table. We build a four-dimensional (4D) hash table with the following index structure: for a given triplet of vectors, three indexes are given by the quantized values of the angles of the triplet, while the fourth index depends on the composition of the triplet in terms of the number and position of helices and strands. This index, called triplet type, is used when a separate... |

24 | Protein structure prediction: Recognition of primary, secondary, and tertiary structural features from amino acid sequence, - Eisenhaber, Persson, et al. - 1995 |

22 | Principles of helix–helix packing in proteins: the helical lattice superposition model. - Walther, Eisenhaber, et al. - 1996 |

13 | The URMS-RMS hybrid algorithm for fast and sensitive local protein structure alignment, - Yona, Kedem - 2005 |

11 | Packing of secondary structural elements in proteins. Analysis and prediction of inter-helix distances, - Reddy, Blundell - 2003 |

9 | Helix–helix packing angle preferences for finite helix axes. - Walther, Springer, et al. - 1998 |

6 | Structure of proteins: Packing of α-helices and pleated sheets, - Chothia, Levitt, et al. - 1977 |

6 | Interhelical angle and distance preferences in globular proteins, - Lee, Chirikjian - 2004 |

4 |
Structural trees for protein superfamilies,
- Efimov
- 1997
Citation Context ...ed by simply generating three angles at random, since the angles in a triplet are highly constrained and therefore their distribution is far from uniform. Instead, we construct feasible triplets of random segments and compute their angles. The random generation of a triplet of angles consists of the generation of three versors. A versor is a vector of unit length that we assume to be in the hemisphere identified by a positive z-coordinate. A versor is now uniquely determined by two parameters: its coordinate z ∈ [0, 1], and its azimuth β ∈ [0, 2π]. Given a dataset of n real proteins, we generate n sets of random vectors, each corresponding to a real protein and containing the same number of SSEs as that protein. Then, for each of the n sets, we compute the angles of all triplets of random vectors and update the hash table accordingly. We have already observed that the triangular inequality holds for any order of the three angles α, β, γ of a triplet of segments; it translates into the following three constraints: α + β ≥ γ, α + γ ≥ β, β + γ ≥ α. This implies that not all cells of the hash table can be popu... |
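The random model described in this context can be sketched as follows, assuming that z and the azimuth β are each drawn uniformly from their stated ranges (which yields directions uniformly distributed over the hemisphere). The function names are hypothetical, and the sketch covers only the generation of one random triplet of angles, not the per-protein bookkeeping.

```python
import math
import random


def random_versor(rng):
    """Unit vector in the hemisphere z >= 0, parameterized by its
    coordinate z in [0, 1] and its azimuth beta in [0, 2*pi].
    Assumption: both parameters are drawn uniformly."""
    z = rng.uniform(0.0, 1.0)
    beta = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)  # radius of the circle at height z
    return (r * math.cos(beta), r * math.sin(beta), z)


def angle_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def random_triplet_angles(rng):
    """Angles (alpha, beta, gamma) of a feasible random triplet of versors."""
    a, b, c = (random_versor(rng) for _ in range(3))
    return angle_deg(a, b), angle_deg(a, c), angle_deg(b, c)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
alpha, beta, gamma = random_triplet_angles(rng)
# The triangular inequality holds because the angles come from real vectors.
print(alpha + beta >= gamma, alpha + gamma >= beta, beta + gamma >= alpha)
```

Generating vectors first and measuring their angles afterwards is what guarantees the three constraints; drawing three angles independently would not.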

4 |
Bioinformatics in protein analysis,
- Persson
- 2000
Citation Context ...27 mostly for object recognition applications, and were later applied to several matching problems arising in computational biology [9, 11, 28]. The next two steps, 3 and 4, test the deviation of the obtained angular distribution from that of random vectors. Finally, steps 5 and 6 are based on a variant of the Apriori method to determine and evaluate the distribution of quartets. Procedure: Pattern Discovery. (1) Initialization: From the given protein dataset, generate the set A of all ordered triplets of angles associated to ordered triplets of SSEs, sorted according to the order along the backbone. (2) Build a hash table, indexed by the triplets of angles, that stores all triplets of segments. Derive a 3D histogram of the distribution of the triplets of A from the hash table. The histogram has b = 10 bins along each axis, for a total of b³ bins or cells. (3) Build the distribution of triplets of angles of random unit vectors and derive the corresponding 3D histogram. (4) Based on the deviation between the histogram of observed triplets of angles and that of random triplets, determine the subset C ⊂ A of triplets that are overrepresented. (5) Join step: Construct candidate sextuples of angle... |

3 | Discovery of a significant, nontopological preference for antiparallel alignment of helices with parallel regions in sheets, - Hespenheide, Kuhn - 2003 |

3 | Global secondary structure packing angle bias in proteins. Proteins - Platt, Guerra, et al. - 2003 |

3 | Geometric hashing: An overview, - Wolfson, Rigoutsos - 1997 |

2 | Predicting the conformation of proteins from sequences. Progress and future progress, - Benner - 1995 |

1 |
Algorithms for structural comparison and statistical analysis of 3D protein motifs, Pac Symp Biocomput,
- Chen, Fofanov, et al.
- 2005
Citation Context ...Procedure: Pattern Discovery. (1) Initialization: From the given protein dataset, generate the set A of all ordered triplets of angles associated to ordered triplets of SSEs, sorted according to the order along the backbone. (2) Build a hash table, indexed by the triplets of angles, that stores all triplets of segments. Derive a 3D histogram of the distribution of the triplets of A from the hash table. The histogram has b = 10 bins along each axis, for a total of b³ bins or cells. (3) Build the distribution of triplets of angles of random unit vectors and derive the corresponding 3D histogram. (4) Based on the deviation between the histogram of observed triplets of angles and that of random triplets, determine the subset C ⊂ A of triplets that are overrepresented. (5) Join step: Construct candidate sextuples of angles from triplets of C. (6) Verification step: Prune candidate sextuples to find the overrepresented ones. 3.1. Building the hash table. We build a four-dimensional (4D) hash table with the following index structure: for a given triplet of vectors, three inde... |

1 | Helix packing angle preferences, - Bowie - 1997 |

1 | Protein folding: Evaluation of some simple rules for the assembly of helices into tertiary structures with myoglobin as an example, - Cohen, Richmond, et al. - 1979 |

1 | Complementary packing of alpha-helices in proteins, - Efimov - 1999 |

1 | 3D matching of proteins based on secondary structures, - Guerra, Lonardi, et al. - 2002 |