Wu, C.H., Berry, M., Fung, Y., McLarty, J.: Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition. Machine Learning 21 (1995) 177--193
KL or the Karhunen-Loeve transform is also known as Principal Component Analysis (PCA)
.... parametric fitting problems involving linear least squares estimation, can also be effectively resolved with the aid of the SVD [2, 41, 17]. Finally, the latter has also proven useful in signal processing applications [35, 28] and pattern recognition techniques such as neural networks computing [7, 43] and principal components analysis [4]. This paper deals with the problem of computing the Jacobian of the SVD components of a matrix with respect to the elements of this matrix. Knowledge of this Jacobian is important as it is a key ingredient in tasks such as nonlinear optimization and error ....
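As a small illustration of what such a Jacobian looks like, the singular-value part has a well-known closed form. The notation below ($A$, $U$, $D$, $V$, $a_{ij}$, $\sigma_k$) is assumed for this sketch and is not taken from the excerpt, and the formula is the standard result for distinct singular values, not necessarily the derivation used in the citing paper.

```latex
% Assumed notation: A = U D V^T, D = diag(sigma_1, ..., sigma_n),
% u_{ik} and v_{jk} are entries of U and V; singular values assumed distinct.
\[
  A = U D V^{\mathsf T},
  \qquad
  \frac{\partial \sigma_k}{\partial a_{ij}} = u_{ik}\, v_{jk}.
\]
```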
C. Wu, M. Berry, S. Shivakumar, and J. McLarty. Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition. Machine Learning, 21(1-2), 1995.
.... Neural networks Xu et al. [34] Splice sites in DNA Pattern matching Wang et al. [31] Markov chain Salzberg [26] Promoters in DNA Neural networks Opitz et al. [21] Decision tree Hirsh et al. [11] Protein classification rules Hidden Markov model Krogh et al. [13] Neural networks Wu et al. [33] Protein motifs Minimum description length Brazma et al. [2] Table 1: A summary of work performed for biomolecular data mining. In this chapter we propose a two-level approach to recognizing E. coli promoters in DNA sequences. The first level classifiers include Bayesian neural networks [18, ....
Wu, C. H., Berry, M., Fung, Y. S., and McLarty, J. Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition. Machine Learning, 21:177--193, 1995.
....PVKTNVK, the 2-gram amino acid encoding method gives the following result: 1 for PV (indicating PV occurs once), 2 for VK (indicating VK occurs twice), 1 for KT, 1 for TN, and 1 for NV. We also adopt the 6-letter exchange group $\{e_1, e_2, e_3, e_4, e_5, e_6\}$ to represent a protein sequence [37], where $e_1 \in \{H, R, K\}$, $e_2 \in \{D, E, N, Q\}$, $e_3 \in \{C\}$, $e_4 \in \{S, T, P, A, G\}$, $e_5 \in \{M, I, L, V\}$, $e_6 \in \{F, Y, W\}$. Exchange groups represent conservative replacements through evolution. These exchange groups are effectively equivalence classes of amino acids and are derived from PAM [12]. For ....
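The 2-gram counting and exchange-group mapping described in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`ngram_counts`, `amino_acid_2grams`, `exchange_group_2grams`) and the group labels `e1`..`e6` as strings are hypothetical choices, not the citing authors' implementation.

```python
from collections import Counter

# Six-letter exchange groups as listed in the excerpt (labels are illustrative).
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: amino acid letter -> exchange group label.
AA_TO_GROUP = {aa: group for group, members in EXCHANGE_GROUPS.items() for aa in members}


def ngram_counts(symbols, n=2):
    """Count every overlapping n-gram in a sequence of symbols."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def amino_acid_2grams(seq):
    """2-gram counts over the raw amino acid letters."""
    return {"".join(gram): count for gram, count in ngram_counts(seq, 2).items()}


def exchange_group_2grams(seq):
    """2-gram counts after mapping each residue to its exchange group.

    Assumption: seq contains only the 20 standard amino acid letters;
    any other letter raises KeyError.
    """
    groups = [AA_TO_GROUP[aa] for aa in seq]
    return {"".join(gram): count for gram, count in ngram_counts(groups, 2).items()}


if __name__ == "__main__":
    print(amino_acid_2grams("PVKTNVK"))
    print(exchange_group_2grams("PVKTNVK"))
```

For the example sequence PVKTNVK, the amino acid counts match the excerpt: VK occurs twice and PV, KT, TN, and NV once each.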
....it would require many weight parameters and training data. This makes it difficult to train the neural network, a phenomenon called the curse of dimensionality. Different methods have been proposed to solve the problem by careful feature selection and by scaling of the input dimensionality [9, 37]. We propose here to select relevant features (i.e. 2-grams) by employing a distance measure to calculate the relevance of each feature. Let $X$ be a feature and let $x$ be its value. Let $P(x \mid \mathrm{Class} = 1)$ and $P(x \mid \mathrm{Class} = 0)$ denote the class conditional density functions for feature $X$, where Class ....
C. H. Wu, M. Berry, Y. S. Fung, and J. McLarty. Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition. Machine Learning 21, 177--193, 1995.
.... parametric fitting problems involving linear least squares estimation, can also be effectively resolved with the aid of the SVD [2, 37, 16]. Finally, the latter has also proven useful in signal processing applications [31, 26] and pattern recognition techniques such as neural networks computing [7, 39] and principal components analysis [4]. This paper deals with the problem of computing the Jacobian of the SVD components of a matrix with respect to the elements of this matrix. Knowledge of this Jacobian is important as it is a key ingredient in tasks such as nonlinear optimization and error ....
C. Wu, M. Berry, S. Shivakumar, and J. McLarty. Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition. Machine Learning, 21(1-2), 1995.
....Fax: (973) 596 5777. z Bioinformatics Program, Protein Information Resource, National Biomedical Research Foundation (NBRF-PIR), Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007 (wuc@nbrf.georgetown.edu) extremely important in accelerating genome processing [1, 3, 7, 8, 11, 15, 16, 17, 18, 19, 20, 21]. Classification, or supervised learning, is one of the major data mining processes. Classification is to assign a set of data into two or more categories. When there are only two categories, it is called binary classification. In this paper we focus on binary classification of DNA sequences. ....
Wu, C. H., Berry, M., Fung, Y. S., and McLarty, J. Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition. Machine Learning, 21:177--193, 1995.
.... in dimensionality can be achieved when used within IR systems; for example from 5000-7000 terms to about 100 dimensions (Deerwester et al. 1990). SVD has also been successfully applied to the problem of reducing the dimensionality of protein sequence data for presentation to neural networks (Wu, Berry, Shivakumar, McLarty 1995). The size of the input vectors presented to a backward propagation neural network was reduced from 9696 to 100. In addition to this, the predictive accuracy of the neural network improved when SVD was used. 8 Discussion The various models ....
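The reduction mentioned in this excerpt can be written out as a truncated SVD. The sketch below uses assumed notation ($A$, $U_k$, $\Sigma_k$, $V_k$, $x$, $\hat{x}$) and the projection convention common in latent semantic indexing; only the sizes 9696 and 100 come from the excerpt, and the exact scaling used by Wu et al. is not stated here.

```latex
% Assumed notation: A is an m x n matrix whose columns are m-dimensional
% n-gram input vectors (m = 9696 in the excerpt); k = 100 retained dimensions.
\[
  A \approx A_k = U_k \Sigma_k V_k^{\mathsf T},
  \qquad
  U_k \in \mathbb{R}^{m \times k},\;
  \Sigma_k \in \mathbb{R}^{k \times k},\;
  V_k \in \mathbb{R}^{n \times k}
\]
% A new input vector x in R^m is mapped to a reduced vector in R^k
% (LSI-style projection; the scaling by Sigma_k^{-1} is an assumption).
\[
  \hat{x} = \Sigma_k^{-1} U_k^{\mathsf T} x \in \mathbb{R}^{k}
\]
```

Under this convention the neural network would receive $\hat{x}$ (100 values) in place of $x$ (9696 values).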
....been demonstrated to both improve performance of IR and text categorisation systems, and reduce the number of dimensions (i.e. attributes) required. This technique has also been used to reduce the dimensionality of data for other problems, such as within the task of protein sequence classification (Wu, Berry, Shivakumar, McLarty 1995). However, such studies have demonstrated that LSI and the principles behind this method work for specific problems, but have not investigated the applicability of LSI to a broader range of classification tasks. For this reason, we have investigated a similar technique, based on Correspondence ....
Wu, C., Berry, M., Shivakumar, S., and McLarty, J. (1995). Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition. Machine Learning 21, 177--193.
....new space with reduced dimensionality. SVD is an algebraic tool used by correspondence analysis (Greenacre, 1984) to identify the principal components of a given problem. A recent study utilised this property to reduce the dimensionality of protein sequence data for presentation to neural networks (Wu et al. 1995). A k-nearest neighbour learning algorithm that utilises SVD to reduce the number of dimensions when performing classifications is currently under development. The algorithm (CAIL) will be compared to a basic k-nearest neighbour algorithm that uses the Euclidean and overlap distance metrics to ....
Wu, C., Berry, M., Shivakumar, S., and McLarty, J. (1995). Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition. Machine Learning 21, 177--193.
....and information retrieval work. Hull [17] and Yang and Chute [29] have used LSI/SVD as the first step in conjunction with statistical classification (e.g. discriminant analysis). Using the LSI-derived dimensions effectively reduces the number of predictor variables for classification. Wu et al. in [28] also used LSI/SVD to reduce the training set dimension for a neural network protein classification system used in human genome research. 6 Acknowledgements The authors would like to thank Gavin O'Brien at the National Institute of Standards and Technology (NIST) for his help with the ....
C. WU, M. BERRY, S. SHIVAKUMAR, AND J. MCLARTY, Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition, Machine Learning, (1994). To appear.
....related proteins. This includes methods that screen for motif patterns such as those cataloged in the PROSITE database [Bairoch Bucher, 1994], the Profile method [Gribskov et al. 1987], the hidden Markov model [Krogh et al. 1994], and the neural network classification method [Wu et al. 1992; Wu, 1995]. As a database search tool, the family-based (classification) approach has two major advantages over the pairwise comparison methods [Wu, 1993]: (1) speed, because the search time grows linearly with the number of sequence families, instead of the number of sequence entries; and (2) ....
....family uses an individual three-layered, feed-forward, back-propagation network. conserved family information embedded in local motif patterns to improve search accuracy. While we used the first design concept in our previous gene classification artificial neural system (GenCANS) [reviewed in Wu, 1995], we introduced two new designs to implement the second concept, an n-gram term weighting algorithm for extracting local motif patterns, and integrated neural networks for combining global (full length) and local (motif) sequence information. As depicted in Figure 1, the MOTIFIND search involves ....
Wu, C. H., Berry, M., Shivakumar, S. & McLarty, J. (1995) Neural networks for full-scale protein sequence classification:Sequence encoding with singular value decomposition. Machine Learning, 21, 1-17.
....and information retrieval work. Hull [16] and Yang and Chute [28] have used LSI/SVD as the first step in conjunction with statistical classification (e.g. discriminant analysis). Using the LSI-derived dimensions effectively reduces the number of predictor variables for classification. Wu et al. in [27] also used LSI/SVD to reduce the training set dimension for a neural network protein classification system used in human genome research. 6. Acknowledgements. The authors would like to thank the referees for their helpful comments ....
C. Wu, M. Berry, S. Shivakumar, and J. McLarty, Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition, Machine Learning, (1994). To appear.