Yang, Y. (1995). Noise reduction in a statistical approach to text categorization. In Proceedings of the ACM SIGIR on Research and Development in Information Retrieval.
....two adjacent feature values of e in the training data. 5. RELATED WORK The use of automatically extracted features and dimension reduction techniques for text categorization has been investigated before. Most notable is the use of SVD-based truncation to suppress noisy features, as suggested in [12, 13]. A similar idea has also been investigated more recently under the title of semantic kernels and utilized in conjunction with SVMs for text categorization [14]. Yet another approach of deriving document representations that takes semantic similarities of terms into account has been proposed in ....
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th ACM-SIGIR International Conference on Research and Development in Information Retrieval, pages 256-263, 1995.
....performance. Furthermore, LSI can be alternatively viewed as a query expansion method (see sections 2.2 and 5) so that recall is generally improved. Experiments indicate both improved retrieval precision and recall when LSI is adopted [4, 6, 10, 2, 1, 21]. LSI also improves text categorization [7, 20] and word sense disambiguation [17]. Theoretical results [1, 14, 5, 21] have also provided some understanding of the effectiveness of LSI. These LSI studies have, however, mostly used relatively small text collections and simplified document models. In this work we investigate the use of the LSI ....
Y. Yang. Noise Reduction in a Statistical Approach to Text Categorization. Proc. of SIGIR'95 (ACM Press, 1995), pp.256-263.
.... computes a much smaller semantic subspace from the original text collection, which improves recall and precision in information retrieval [Deerwester et al., 1990, Bartell et al., 1995, Zha et al., 1998, Hofmann, 1999, Husbands et al., 2000], information filtering or text classification [Dumais, 1995, Yang, 1995, Baker and McCallum, 1998], and word sense disambiguation [Schütze, 1998]. The effectiveness of LSI in these empirical studies is often attributed to reduction of noise, redundancy, and ambiguity. Synonyms and polysemy problems are somehow reduced in the process. Several recent studies ....
....done in a number of ways. A simple method is to calculate a centroid vector $c_k$ of category $k$, i.e. the average of all documents in the category [Dumais, 1995]. All $m$ centroid vectors for $m$ categories form a $d \times m$ matrix $C = (c_1 \cdots c_m)$. Another method is to solve for mapping vectors $c_k$ so that [Yang, 1995] $C = \arg\min_C \|C^T X - B\|$. In the least squares problem, the $m \times n$ matrix $B$ defines categories for each document. The Frobenius norm of a matrix $J$ is defined as $\|J\|^2 = \sum_{i=1}^{n} \sum_{k=1}^{m} (J_{ki})^2$. A new incoming document is then projected onto these centroids or mapping vectors to ....
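The two constructions described in the excerpt above (category centroids, and mapping vectors obtained from a least squares fit) can be sketched as follows. This is only an illustrative sketch under assumptions not stated in the excerpt: documents are the columns of a dense NumPy array `X`, `B` is a 0/1 category indicator matrix, and a new document is assigned to the category with the largest dot-product score. The function names are hypothetical and not taken from any of the cited papers.

```python
import numpy as np

def centroid_matrix(X, labels, m):
    """Return the d x m matrix whose k-th column is the mean of the
    documents (columns of the d x n matrix X) labelled k."""
    d, _ = X.shape
    C = np.zeros((d, m))
    for k in range(m):
        C[:, k] = X[:, labels == k].mean(axis=1)
    return C

def least_squares_mapping(X, B):
    """Return a d x m matrix C minimising ||C^T X - B|| in the Frobenius
    norm, where B is the m x n category indicator matrix."""
    # C^T X = B is the same system as X^T C = B^T, which lstsq solves
    # column by column in the least squares sense.
    C, *_ = np.linalg.lstsq(X.T, B.T, rcond=None)
    return C

def classify(C, x):
    """Project document x onto the columns of C and pick the best score
    (an assumed decision rule, chosen here for illustration)."""
    return int(np.argmax(C.T @ x))
```

Both functions return a matrix of the same shape, so `classify` can be used with either; the choice between them only changes how the columns of `C` are estimated.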
Y. Yang. Noise Reduction in a Statistical Approach to Text Categorization. Proc. of SIGIR'95 (ACM Press, 1995), pp.256-263.
....that effectively capture the essential associative semantic relationship between terms and documents, consistent with the empirical evidence and the general intuition. LSI/SVD techniques have been used in information filtering (document classification) and computational linguistics (e.g. [4, 14, 15]). Our model applies to these cases too, as long as the semantic structures defined by the dot product similarity remain the essential relationship there. In text classification [4, 15], documents are projected into the LSI subspace; the same semantic relations remain in this new feature space as ....
....have been used in information filtering (document classification) and computational linguistics (e.g. [4, 14, 15]). Our model applies to these cases too, as long as the semantic structures defined by the dot product similarity remain the essential relationship there. In text classification [4, 15], documents are projected into the LSI subspace; the same semantic relations remain in this new feature space as in the retrieval cases. In word sense disambiguation [14], the relevant relationship is the cosine between two vectors in the context space and thus our theory applies here also. In all ....
Y. Yang. Noise Reduction in a Statistical Approach to Text Categorization. In Proc. of 18th Annual ACM SIGIR Conference (SIGIR '95) 1995:256-263.
.... The query is transformed to $q^T U_k$, the documents are represented as $\Sigma_k V_k^T$, and the relevance score is computed as $s = (q^T U_k)(\Sigma_k V_k^T)$. Typically taking $k = 200$ (far less than the original dimensions), LSI increases the precision for retrieval and accuracy for classification [7, 8, 1, 23, 22]. The success of LSI is attributed to the LSI subspace capturing the essential associative semantic relationships better than the original document space, and thus partially resolving the word choice problem. Clearly, a theoretical and quantitative understanding is important. Mathematically, the ....
....the existence of optimal semantic subspace is conclusively demonstrated here for the first time. If we pick the first peak as $k_{opt}$, then $k_{opt} = 377$ for Cranfield, 184 for Medline, 726 for CACM, and 274 for CISI. The $k_{opt}$ values for Medline and CACM are quite close to experimentally determined values [22, 23]. As explained before, the statistical significance of each LSI index vector relates to its eigenvalue. The eigenvalues of the latent semantic vectors for the Cranfield collection are shown in Figure 1b. Within the range of meaningful latent semantic vectors, $1 \le i \le k_{opt}$, they follow a Zipf-like ....
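The LSI scoring rule in the excerpt above can be illustrated with a truncated SVD. The sketch below assumes a dense term-by-document matrix `X` whose SVD is written $X = U \Sigma V^T$, so that the rank-$k$ factors are the leading $k$ singular triplets; the function name and the use of a dense SVD are assumptions made for illustration (a collection of realistic size would need a sparse solver).

```python
import numpy as np

def lsi_scores(X, q, k):
    """Score every document (column of X) against query q in the
    k-dimensional LSI subspace: s = (q^T U_k)(Sigma_k V_k^T)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    q_k = q @ U_k                   # query mapped into the subspace
    docs_k = s_k[:, None] * Vt_k    # Sigma_k V_k^T, one column per document
    return q_k @ docs_k             # one relevance score per document
```

When `k` equals the rank of `X`, these scores coincide with the plain dot products `q @ X`; smaller values of `k` give the reduced-subspace scores the excerpt refers to.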
Y. Yang. Noise Reduction in a Statistical Approach to Text Categorization. Proc. of SIGIR'95 (ACM Press, 1995), pp.256-263.
....request expirations and specify modifications from external sources. The latter significantly increases the amount of data that can be expired. Their efficient algorithms determine what data can be expired by taking into account the types of updates that may occur. Another work [18] in text categorization uses a statistical learning method, Linear Least Square Fit Mapping, to reduce noise for computational efficiency. In this study, multiple noise reducing strategies were used and the results show significant improvements in efficiency without losing ....
Y. Yang. "Noise Reduction in a Statistical Approach to Text Categorization", SIGIR, pp. 256-263, 1995.
....can declaratively request expirations and specify modifications from external sources. The latter significantly increases the amount of data that can be expired. Their efficient algorithms determine what data can be expired by taking into account the types of updates that may occur. Another work [26] in text categorization uses a statistical learning method, Linear Least Square Fit Mapping, to reduce noise for computational efficiency. In this study, multiple noise reducing strategies were used and the results show significant improvements in efficiency without losing categorization accuracy. ....
Y. Yang. "Noise Reduction in a Statistical Approach to Text Categorization", Proceedings of SIGIR, pp. 256-263, 1995.
.... documents for retrieval purposes, and for document filtering (selective dissemination of information). It has been shown that noise reduction prior to categorisation has beneficial effects, both on the accuracy of classification and on the computational efficiency of the classification algorithm [Yan95]. Accordingly a particular focus of this study was to explore the utility of noise reduction prior to K-NN classification. The noise reduction strategy selected involved attribute reduction using a technique based on Rough Set Theory [Paw82, Paw93]. The study uses 107 anonymised reports of ....
Y. Yang. Noise reduction in a statistical approach to text categorisation. In E. A. Fox, P. Ingwersen, and R. Fidel, editors, Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256--263, Seattle, Washington, USA, 1995.
.... been studied in two different communities: machine learning (ML) and information retrieval (IR). Many algorithms for text classification have been proposed and evaluated in the past, for example, Bayesian classifiers, k nearest neighbors, neural networks, rule learning algorithms, and many more [13, 16, 30, 1, 31, 11, 17, 5, 18]. Most studies use Rocchio's method [21], a well known algorithm in the IR community (usually used for relevance feedback and document routing), as a comparison baseline for their classifiers. One aim of this study is to examine the relative merits of a fairly new ML algorithm called boosting, ....
Yiming Yang. Noise reduction in a statistical approach to text categorization. In Edward Fox, Peter Ingwersen, and Raya Fidel, editors, Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256--263. Association for Computing Machinery, New York, July 1995.
....which is the vector sum of all the feature vectors of all the documents in that class. A new document is labelled with the class of the centroid to which its feature vector is closest, as measured by the cosine similarity between the two vectors. The Linear Least Squares Fit (LLSF) method [22] is another classification algorithm based on PCA, which is equivalent to Dumais' use of LSI for classification except that LLSF uses the dot product to compute similarity instead of the cosine and is thus sensitive to the length of the two vectors being compared. 4 Experimental Results This ....
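The length sensitivity mentioned in the excerpt above is easy to see on a toy case. The sketch below only contrasts the two similarity measures; the vectors are made-up values chosen for illustration and do not come from any cited experiment.

```python
import numpy as np

def dot_similarity(a, b):
    """Plain dot product: grows with the lengths of a and b."""
    return float(a @ b)

def cosine_similarity(a, b):
    """Dot product of the length-normalised vectors: depends only on
    the angle between a and b."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical document and two class vectors: one short vector pointing
# in the same direction as the document, one long vector pointing elsewhere.
doc = np.array([1.0, 1.0])
short_aligned = np.array([1.0, 1.0])
long_misaligned = np.array([10.0, 0.0])
```

With these values the cosine prefers `short_aligned`, while the dot product prefers `long_misaligned` purely because of its length, which is the effect the excerpt describes.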
Yiming Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256--263, 1995.
.... has been studied in two different communities: machine learning (ML) and information retrieval (IR). Many algorithms for text filtering have been proposed and evaluated in the past, for example, Bayesian classifiers, k nearest neighbors, neural networks, rule learning algorithms, and many more [17, 20, 40, 2, 41, 14, 22, 8, 24]. Most studies use Rocchio's method [28], a well known algorithm in the IR community (traditionally used for relevance feedback and more recently for document routing [38]), as a comparison baseline for their classifiers. However, most such studies use a weak version of Rocchio's algorithm, not ....
....used for evaluating text filtering effectiveness, and do not use it in this study. Some other measures that have been used to evaluate text filtering are: • Average precision, or precision at a fixed rank cutoff: Many studies have used one of these measures to evaluate filtering effectiveness [2, 40, 41, 22, 1, 7]. These measures are intended to evaluate the ranking effectiveness of a system [31], not its filtering effectiveness. Even though the filtering effectiveness of a system is related to its ranking effectiveness, this relationship is not strong enough to use ranking evaluation measures to evaluate ....
Yiming Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 256--263, July 1995.
....for subsequent processing which are not discussed in detail here. However, the essential consequence is that the resulting decision vector can not be normalized to length 1. The linear classifier is identical to the LLSF (linear least square fit) classifier described by Yang (see [7] and [8]). However, the mathematical principle is different in general if higher order polynomials are used. In this case, a non-linear function (e.g. quadratic polynomial) maps the feature space to the decision space, yielding better separation of categories in the decision space. 4 Experiments The ....
Y. Y. Yang, Noise reduction in a statistical approach to text categorization, Proceedings, 18th Int. ACM-SIGIR Conf. on Research and Development in Information Retrieval, Seattle, WA, 1995.
No context found.
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256--263, 1995.
No context found.
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256-263, 1995.
No context found.
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256-263, 1995.
....requirement is $O(N k_s)$ for storing either $U$ or $V$. The value of $k_s$ can be empirically chosen through validation. In our experiments with LLSF on benchmark collections (Reuters news stories, MEDLINE documents, etc.) we observed the optimal ranges of $k_s$ to be between a few hundred and one thousand [16]. Step 2. Compute the pseudo-inverse $X^{+} = V S^{-1} U^T = X^T U S^{-2} U^T$. The time complexity here is $O(k_s N)$, dominated by the computation of $U S^{-2} U^T$. The space complexity is $O(NV)$ for storing matrix $X$. Step 3. Compute the solution matrix $W = X^{+} Y$. Since the matrix $Y$ ....
....needed for the inverted indexing is also O(NLc). Matrix $X$ would make the space complexity $O(VM)$ if we need to keep it in a dense form. However, our previous work showed that aggressive elimination of non-influential elements from that matrix would not cause any loss of classification accuracy [16]. Among the above steps, the dominating part in the training time of LLSF is the matrix multiplication in Step 2, with a complexity of $O(k_s N)$. As for the space complexity, the dominating part is the storage required for matrix $X$, with a complexity of $O(NV)$. For the testing phase, the ....
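Steps 2 and 3 quoted above (a pseudo-inverse built from the leading singular vectors, then a solution matrix obtained by multiplying it with the category matrix) can be sketched as follows. The sketch assumes the decomposition $X = U S V^T$, a dense array with one training document per row of `X` and one category per column of `Y`, and a value of `k_s` no larger than the rank of `X` so that no zero singular value is inverted. The function names are hypothetical, and a dense SVD is used only for brevity.

```python
import numpy as np

def truncated_pinv(X, k_s):
    """Approximate pseudo-inverse of X from its k_s leading singular
    triplets: X^+ = V S^{-1} U^T. Assumes k_s <= rank(X)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :k_s], s[:k_s], Vt[:k_s, :]
    return Vt.T @ np.diag(1.0 / s) @ U.T

def llsf_fit(X, Y, k_s):
    """Solution matrix W = X^+ Y of the least squares fit, using the
    rank-k_s approximation of the pseudo-inverse."""
    return truncated_pinv(X, k_s) @ Y
```

With `k_s` equal to the full rank of `X` the result matches `np.linalg.pinv(X)`; choosing a smaller `k_s` discards the trailing singular vectors, which is the truncation the excerpt tunes by validation.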
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256--263, 1995.
.... investigations on suitable choices of these parameter values were reported in previous papers where the main observations were that the performance of kNN is relatively stable for a large range of k values[22] and that satisfactory performance of LLSF depends on whether p is sufficiently large[23]. Given the large number of possible combinations of parameter values, exhaustive testing of all the combinations is neither practical nor necessary. We take a greedy search strategy for parameter tuning. That is, we first subjectively decide the order of parameters to be tuned, and then ....
.....004 CPU second per test document on average. LLSF is an eager learning method, and has an off-line training phase and an on-line testing phase. The training phase has a quadratic time complexity, $O(pn')$, where $p$ is the number of singular vectors used for computing an approximated LLSF solution [23], and $n' = \max\{m, n\}$ is the larger number between $n$, the number of training documents, and $m$, the number of unique terms in the training documents. This quadratic complexity is the computational bottleneck for scaling this method to large applications. Once the training is done, the on-line ....
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256--263, 1995.
....l documents. This can be expensive for very large applications. Fortunately, we found it possible to significantly reduce this complexity by aggressively removing non-influential elements from the transformed document vectors without sacrificing retrieval performance, as shown in our previous work [28] and in the empirical results of this study (Section 5.4). 3.3 Latent Semantic Indexing Latent Semantic Indexing [10] (LSI) is a one step extension of GVSM. The claim is that neither terms nor documents are the optimal choice for the orthogonal basis of a semantic space, and that a reduced vector ....
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256--263, 1995.
.... investigations on suitable choices of these parameter values were reported in previous papers where the main observations were that the performance of kNN is relatively stable for a large range of k values [22] and that satisfactory performance of LLSF depends on whether p is sufficiently large [23]. Given the large number of possible combinations of parameter values, exhaustive testing of all the combinations is neither practical nor necessary. We take a greedy search strategy for parameter tuning. That is, we first subjectively ....
....document on average. LLSF is an eager learning method, and has an off-line training phase and an on-line testing phase. The training phase has a quadratic time complexity, $O(pn')$, where $p$ is the number of principal components (singular vectors) used for computing an approximated LLSF solution [23], and $n' = \max\{m, n\}$ is the larger number between $n$, the number of training documents, and $m$, the number of unique terms in the training documents. This quadratic complexity is the computational bottleneck for scaling this method to large applications. Once the training is done, the on-line ....
Y. Yang. Noise reduction in a statistical approach to text categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 256--263, 1995.
....the bilingual training corpus. The time complexity in the second part is O(n) per document, or O(nl) for a test corpus of l documents. It is possible to significantly reduce this complexity in large problems by aggressively removing non-influential elements from the transformed document vectors [Yang, 1995]. 3.3 Latent Semantic Indexing Latent Semantic Indexing [Deerwester et al. 1990] (LSI) is a one step extension of GVSM. The claim is that neither terms nor documents are the optimal choice for the orthogonal basis of a semantic space, and that a reduced vector space consisting of the most ....
Y. Yang. Noise Reduction in a Statistical Approach to Text Categorization. In Proceedings of the 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '95), pages 256--263, 1995.
No context found.
Yang, Y. (1995). Noise reduction in a statistical approach to text categorization. In Proceedings of the ACM SIGIR on Research and Development in Information Retrieval.
No context found.
Yang, Y., Noise Reduction in a Statistical Approach to Text Categorization, In the Proceedings of the 18th International Annual ACM/SIGIR Conference, 1995.
No context found.
Yang, Y., Noise Reduction In A Statistical Approach To Text Categorization, SIGIR-95, 1995.
No context found.
Yang, Y., Noise Reduction in a Statistical Approach to Text Categorization, SIGIR-95, pp. 256-263, 1995.
No context found.
, pages 256--263, Seattle, Washington, USA, July 9-13 1995.