by Maricel Kann, Bin Qian, Richard A. Goldstein
Proteins
http://www.umich.edu/~goldgrp/Pub_pdf/kqg00.pdf
Abstract:
The growth in protein sequence data has placed a premium on ways to infer structure and function of the newly sequenced proteins. One of the most effective ways is to identify a homologous relationship with a protein about which more is known. While close evolutionary relationships can be confidently determined with standard methods, the difficulty increases as the relationships become more distant. All of these methods rely on some score function to measure sequence similarity. The choice of score function is especially critical for these distant relationships. We describe a new method of determining a score function, optimizing the ability to discriminate between homologs and non-homologs. We find that this new score function performs better than standard score functions for the identification of distant homologies.
Proteins 2000;41:498–503. © 2000 Wiley-Liss, Inc.
Key words: sequence alignment; dynamic programming; amino acid sequence; molecular evolution
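The abstract states that alignment-based homology detection relies on a score function and lists dynamic programming among its key words. As a rough illustration of where a score function enters such a calculation, the sketch below computes a Smith-Waterman local alignment score in Python. It is not the authors' method or their optimized score function: the names `local_alignment_score` and `toy_score`, the linear gap penalty of -4, and the match/mismatch values are all assumptions chosen for illustration.

```python
def local_alignment_score(a, b, score, gap=-4):
    """Return the best Smith-Waterman local alignment score of a and b.

    `score(x, y)` is the pairwise score function (hypothetical here);
    `gap` is an assumed linear penalty applied per gapped position.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for x in a:
        curr = [0]  # column 0 of the current row
        for j, y in enumerate(b, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(x, y),  # align x with y
                prev[j] + gap,              # gap in b
                curr[j - 1] + gap,          # gap in a
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def toy_score(x, y):
    """Illustrative score function: +2 for identity, -1 otherwise."""
    return 2 if x == y else -1


if __name__ == "__main__":
    print(local_alignment_score("WACDEW", "KACDEK", toy_score))
```

Replacing `toy_score` with a different function (for example, a lookup into a substitution matrix) changes the resulting scores without changing the dynamic programming itself, which is why the choice of score function matters for separating homologs from non-homologs.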