97 citations found.
David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 81-93, Las Vegas, US, 1994.

This paper is cited in the following contexts:

Robustness of Regularized Linear Classification Methods in Text .. - Zhang, Yang (2003)   (2 citations)  (Correct)

....decision trees, Bayesian probabilistic classifiers, neural networks, regression methods, SVM, etc. And lots of empirical studies in text categorization have been done in recent years [9, 6, 15] which investigate different aspects of classification methods. Text categorization problems can be characterized as dealing with high dimensional and sparse data, and usually accompanied by skewly distributed categories. These characteristics together make text categorization different from ....

D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), Nevada, Las Vegas, 1994. University of Nevada, Las Vegas.


Evaluating OCR and Non-OCR Text Representations for Learning.. - Junker, Hoch (1997)   (3 citations)  (Correct)

....[19] An alternative to the classical word based representations are n-grams [2, 4, 7]. The advantage of this technique is its language independency. Classifier construction. Many different learning models have been proposed for the classification of documents, including decision trees [20, 11], rule learning [1, 3, 16], Bayesian classifiers [14, 10, 11, 8], nearest-neighbor methods [15], and neural networks [12, 19]. 3 Flexible tool for generating text representations. Fig. 2 illustrates the raw design of our system for the transformation of text documents into feature representation. ....

....representations are n-grams [2, 4, 7]. The advantage of this technique is its language independency. Classifier construction. Many different learning models have been proposed for the classification of documents, including decision trees [20, 11], rule learning [1, 3, 16], Bayesian classifiers [14, 10, 11, 8], nearest-neighbor methods [15], and neural networks [12, 19]. 3 Flexible tool for generating text representations. Fig. 2 illustrates the raw design of our system for the transformation of text documents into feature representation. In the process of deriving relevant features from the training ....

D. Lewis and M. Ringuette. A Comparison of Two Learning Algorithms for Text Categorization. In Proceedings of the 3rd Symposium on Document Analysis and Information Retrieval (SDAIR 94), pages 81--93, Las Vegas, NV, USA, April 11-13 1994.


Categorizing Photographs for User-Adapted.. - Hidalgo, Quejido, ..   (Correct)

....are artificially constructed using the text surrounding the hyperlinks to the pages to be categorized. 4.3 Machine Learning Linear Classifiers. For training a text categorization system, a number of Machine Learning approaches have been tested, including decision tree and rule based learners [1, 5, 9, 13], probabilistic classifiers like Naive Bayes [9, 11, 12], neural networks [6, 20], instance-based classifiers like kNN [9, 25], etc. See [21] for other approaches. An important subclass of learning approaches are those which learn linear classifiers, like Rocchio, Widrow-Hoff, or Winnow algorithms ....

Lewis, D.D. and Ringuette, M. (1994) A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 81-93.


Information Retrieval Using Statistical Classification - Hull (1994)   (2 citations)  (Correct)

....will be helpful for solving the traditional retrieval problem. If one thinks of each query as generating a distinct category topic, the information retrieval problem is strongly related to text categorization. Several recent papers have presented approaches to this problem. Lewis and Ringuette [49] examine models based on Bayesian estimation of word parameters and decision trees. Apte et al. [4] use rule induction methods to estimate category membership labels. Cutting et al. [18] describe the summarization task defined by question (5) as browsing. One can use the analogy of a textbook. ....

....successful at categorizing new data. The optimal tree is chosen by cross-validation using a cost complexity function that balances the estimated error rate and the size of the tree. Classification trees have been previously applied in information retrieval with mixed success. Lewis and Ringuette [49] apply it to the text categorization problem on the Reuters collection and achieve reasonably good results. Tong et al. [78] apply it to the routing problem on the TREC corpus where it is not nearly as successful. Both implementations use words as features. The different levels of performance can ....

David Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Symposium on Document Analysis and Information Retrieval. University of Nevada, Las Vegas, 1994.


Improving Text Classification by Shrinkage in a.. - McCallum, Rosenfeld, .. (1998)   (55 citations)  (Correct)

....more important. A variety of recent work has demonstrated the success of statistical approaches for learning to classify text documents [Joachims, 1997; Koller and Sahami, 1997; Yang and Pederson, 1997; Nigam et al. 1998]. These approaches, such as TFIDF [Salton, 1991] and naive Bayes [Lewis and Ringuette, 1994], typically represent documents as vectors of words, and learn by gathering statistics from the observed frequencies of these words within documents belonging to the different classes. Because they rely on these learned word statistics, these approaches are data intensive: they often require ....

....of work in the Information Retrieval and Machine Learning communities has demonstrated the success of statistical approaches for learning to classify text documents. Naive Bayes has been used for text classification, and due to its probabilistic foundations, been applied in several extensions [Lewis and Ringuette, 1994; Joachims, 1997; Nigam et al. 1998]. An earlier approach to hierarchical document classification, the Pachinko Machine, has been proposed by Koller and Sahami [Koller and Sahami, 1997]. Their method differs significantly from shrinkage. The Pachinko Machine classifies documents at internal ....

David Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Third Annual Symposium on Document Analysis and Information Retrieval, pages 81--93, 1994.


A Re-Examination of Text Categorization Methods - Yang, Liu (1999)   (148 citations)  (Correct)

....newswire stories, for example, is one of the most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models [9, 32], nearest neighbor classification [17, 29, 33, 31, 14], Bayesian probabilistic approaches [25, 16, 20, 13, 12, 18, 3], decision trees [9, 16, 20, 2, 12], inductive rule learning [1, 5, 6, 21], neural networks [28, 22], on-line learning [6, 15], and Support Vector Machines [12]. While the rich literature provides valuable information about individual methods, clear conclusions about cross-method comparison have been ....

....most commonly investigated application domains in the TC literature. An increasing number of learning approaches have been applied, including regression models [9, 32], nearest neighbor classification [17, 29, 33, 31, 14], Bayesian probabilistic approaches [25, 16, 20, 13, 12, 18, 3], decision trees [9, 16, 20, 2, 12], inductive rule learning [1, 5, 6, 21], neural networks [28, 22], on-line learning [6, 15], and Support Vector Machines [12]. While the rich literature provides valuable information about individual methods, clear conclusions about cross-method comparison have been difficult because often the ....

[Article contains additional citation context not shown here]

D.D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), 1994.


Towards An Adaptive Mail Classifier - Manco, Masciari, Ruffolo, Tagarelli (2002)   (Correct)

....a classifier system capable of minimizing three main measures: error rate, false positive rate, and false negative rate [27]. Among the techniques used for classifying spam messages, we mention the Rocchio approach (and similar approaches based on Support Vector Machines [7]). Papers [4] and [13] describe two rule based systems exploiting text mining techniques for the classification of e-mail correspondence. These approaches differ mainly in the preprocessing phase: in the first approach a simple boolean vector model is used; on the other side, [13] proposes a frequency based vector ....

....Machines [7]). Papers [4] and [13] describe two rule based systems exploiting text mining techniques for the classification of e-mail correspondence. These approaches differ mainly in the preprocessing phase: in the first approach a simple boolean vector model is used; on the other side, [13] proposes a frequency based vector model. Other approaches [21, 17, 2, 22] are mainly based on Bayesian classification. Although very sensitive to inherent preprocessing steps, such as lemmatization and stop lists, Bayesian approaches have been shown more accurate than rule based approaches [17, ....

D. Lewis and W. Gale. A Comparison of Two Learning Algorithms for Text Categorization. In Proc. Symposium on Document Analysis and Information Retrieval, 1994.


Arg: A Tool For Automatic Report Generation - Karakaya, Güvenir (2002)   (Correct)

....may be defined as the problem of assigning free text documents to predefined categories. A growing number of statistical learning methods have been applied to this problem in recent years, including regression models [7,27], nearest neighbor classifiers [11,14,25,26,28], Bayes belief networks [3,9,10,12,16,17,18,23], decision trees [2,9,12,18], neural networks [22,24], and inductive rule learning techniques [1,5,6,19]. Recent studies have proved the success of statistical approaches for learning to classify text documents [3,10,16,20,26]. These approaches commonly represent documents as vectors of words, and ....

....assigning free text documents to predefined categories. A growing number of statistical learning methods have been applied to this problem in recent years, including regression models [7,27], nearest neighbor classifiers [11,14,25,26,28], Bayes belief networks [3,9,10,12,16,17,18,23], decision trees [2,9,12,18], neural networks [22,24], and inductive rule learning techniques [1,5,6,19]. Recent studies have proved the success of statistical approaches for learning to classify text documents [3,10,16,20,26]. These approaches commonly represent documents as vectors of words, and learn by gathering ....

Lewis,D.D. and Ringuette,M., Comparison of two learning algorithms for text categorization, In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), 1994.


Using Clustering to Boost Text Classification - Fang, Parthasarathy, Schwartz   (Correct)

....based on the information learned from a labeled training data. The labeled training data involves manual categorization of the data by subject experts. A large number of learning approaches have been brought to bear on this problem domain including statistical methods [1], Bayesian learning [2, 4, 10, 11, 5, 12, 13], decision trees [3, 14, 4, 10, 5], neural networks [15, 16], and support vector machines [5]. In this paper we evaluate the performance of two such methods for the automatic categorization of articles published by an international journal in the geological sciences. The first method we consider ....

....a labeled training data. The labeled training data involves manual categorization of the data by subject experts. A large number of learning approaches have been brought to bear on this problem domain including statistical methods [1], Bayesian learning [2, 4, 10, 11, 5, 12, 13], decision trees [3, 14, 4, 10, 5], neural networks [15, 16], and support vector machines [5]. In this paper we evaluate the performance of two such methods for the automatic categorization of articles published by an international journal in the geological sciences. The first method we consider is the Naive Bayes method, a popular ....

D. D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), 1994.


Hierarchical Text Categorization Using Neural Networks - Ruiz, Srinivasan (2002)   (6 citations)  (Correct)

....Retrieval (IR) and the Artificial Intelligence (AI) communities. Different approaches such as decision trees (ID3) [25], rule learning [1], neural networks [27, 39], linear classifiers [19], K nearest neighbor (KNN) algorithms [43], support vector machine (SVM) [11], and Naive Bayes methods [17, 21] have been explored. Interestingly it is only recently that researchers [14, 22, 24, 27, 38] have tried to take advantage of the hierarchical structure available in certain classification schemes, e.g. Medical Subject Headings (MeSH), Yahoo topic hierarchy. The hierarchical structure of a ....

Lewis D and Ringuette M. Comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), 1994.


The Power of Word Clusters for Text Classification - Slonim, Tishby (2001)   (8 citations)  (Correct)

....code of this agglomerative procedure. 4 The Naive Bayes Classifier. We now turn to describe the classification algorithm used in this work. The Naive Bayes classifier induces a well known classification framework, with rich empirical support in the context of document classification (e.g. [1] [7] [10] [22]). Moreover, it is a relatively simple procedure, which allows for a detailed analysis of the effect of using word clusters instead of words as features. Therefore, in this work we prefer to concentrate on using the naive Bayes classifier, and leave the combination of more sophisticated ....

D. D. Lewis and M. Ringuette. A comparison of two Learning Algorithms for Text Categorization. In Third Ann. Symp. on Document Analysis and IR, pages 81--93, 1994.


Learning to Construct Knowledge Bases from the World.. - Craven, DiPasquo.. (2000)   (74 citations)  (Correct)

....document given its class, this approach, when used with Bayes Rule is often called naive Bayes. The conditional independence assumption is clearly violated in real world data, however, despite these violations, empirically the naive Bayes classifier does a good job of classifying text documents [25,35,40,67]. This observation is in part explained by the fact that classification estimation is only a function of the sign (in binary cases) of the function estimation [13,22]. The word independence assumption causes naive Bayes to give ....

....[58], logistic regression [58], Widrow-Hoff and the exponentiated gradient (EG) algorithm [34]. Another useful line of research in text classification comes from basic ideas in probability and information theory. Bayes Rule has been the starting point for a number of classification algorithms [1,2,33,35,43,47], and the Minimum Description Length principle has been used as the basis of an algorithm as well [32]. Another line of research has been to use symbolic learning methods for text classification. Numerous studies have used algorithms such as decision trees, Swap-1, Ripper and Charade can be found ....

[Article contains additional citation context not shown here]

D.D. Lewis, M. Ringuette, A comparison of two learning algorithms for text categorization, in: Proc. Third Annual Symposium on Document Analysis and Information Retrieval, 1994, pp. 81--93.


Using Machine Learning To Improve Information Access - Sahami (1999)   (15 citations)  (Correct)

....classifier [64] (which we discuss in detail in subsequent chapters) is often quite effective for use in text domains, even though the strong statistical independence assumptions it makes are often clearly violated in practice. Further support for this point is also seen in Lewis and Ringuette's [110] empirical work comparing such simple probabilistic classifiers with decision trees that are able to capture the probabilistic dependencies between words in a document. Lewis and Gale [109] also explore methods for reducing the amount of data ....

....work [115] has tried to address this issue more directly, arguing in favor of incorporating frequency information in classification. Nevertheless, it has also been observed empirically that applying such symmetric probabilistic models to text classification problems is still quite successful [110, 95]. We discuss the classification problem more fully in the next three chapters. 6.6 Conclusion We have presented a probability based measure for document similarity that is quite effective for clustering. We have also shown how the widely used cosine similarity coefficient can be captured as a ....

Lewis, D. D., and Ringuette, M. A comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (1994), pp. 81--93.


Using Unlabeled Data to Improve Text Classification - Nigam (2001)   (10 citations)  (Correct)

....and is significantly less expensive than manual construction because the algorithm automatically constructs the decision rule itself. Supervised text classification algorithms have been successfully used in a wide variety of practical domains. A few examples are: cataloging news articles (Lewis & Gale, 1994; Joachims, 1998) and web pages (Craven et al. 2000; Shavlik & Eliassi-Rad, 1998), learning the reading interests of users (Pazzani et al. 1996; Lang, 1995), and sorting electronic mail (Lewis & Knowles, 1997; Sahami et al. 1998). However, the supervised learning approach is not as effortless ....

....are violated in real world text data. Documents are often mixtures of multiple topics. Words within a document are not independent of each other; grammar and topicality make this so. Despite these violations, empirically the Naive Bayes classifier does a good job of classifying text documents (Lewis & Ringuette, 1994; Craven et al. 2000; Yang & Pedersen, 1997; Joachims, 1997; McCallum et al. 1998). This observation is explained in part by the fact that classification estimation is only a function of the sign (in binary classification) of the function estimation (Domingos & Pazzani, 1997; Friedman, 1997) ....

[Article contains additional citation context not shown here]

Lewis, D. D., & Ringuette, M. (1994). A comparison of two learning algorithms for text categorization. Third Annual Symposium on Document Analysis and Information Retrieval, pp. 81--93.


Using Compression For Source Based Classification Of Text - Thaper (2001)   (Correct)

....categorization. 1.3 Related Work. Content based text categorization has been investigated by a number of people using different techniques. Dumais et al. [13] have used the Bayes net technique for this purpose. A decision tree approach using the divide and conquer strategy has been described in [27]. Ng et al. have used neural nets for classification in [30]. Frank et al. [17] used compression for this task but concluded that compression was not the best choice for this application because content is usually determined by a few important keywords and compression techniques tend to be ....

D. D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Proc. Annual Symposium on Document Analysis and Information Retrieval, pages 37--50, 1994.


On Integrating Catalogs - Agrawal, Srikant (2001)   (65 citations)  (Correct)

....examples of objects for each category is a much studied topic in the statistics, machine learning, and data mining literature. See, for instance, [13] for a comprehensive review of various classification techniques. Naive Bayes classifiers [7] are competitive with other techniques in accuracy [3] [10] [8] [15] [12]. They are also fast: building the model requires only a single pass over the documents and they quickly classify new documents. Our proposed solution is also Bayesian. The observation that the classification techniques can be used to assign documents to a hierarchy has been ....

D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Third Annual Symposium on Document Analysis and Information Retrieval, pages 81--92, 1994.


A Study on Thresholding Strategies for Text Categorization - Yang (2001)   (11 citations)  (Correct)

....tuned in the same fashion as tuning t for RCut, by varying the value of x until the global performance of the classifier is optimized on the validation set. PCut was used in several published evaluations of probabilistic classifiers, Naive Bayes, DTree, kNN and the LLSF regression methods[8, 9, 16]. SCut Score a validation set of documents for each category and tune the threshold over the local pool of scores until the optimal performance of the classifier is obtained for that category; fix the per category thresholds when applying the classifier to new documents in the test set. ....

D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), Nevada, Las Vegas, 1994. University of Nevada, Las Vegas.


Text Categorization for Multi-page Documents: A Hybrid.. - Frasconi, Soda, Vullo (2001)   (Correct)

....example document belongs to) and attempts to infer a function that will map new documents into their categories. Several algorithms have been proposed within this framework, including regression models [29], inductive logic programming [6], probabilistic classifiers [17, 21, 16], decision trees [18], neural networks [22], and more recently support vector machines [12]. Research on text categorization has been mainly focused on non structured documents. In the typical approach, inherited from information retrieval, each document is represented by a sequence of words, and the sequence itself ....

D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proc. 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.


Automatic Text Categorization by Unsupervised Learning - Ko (2000)   (Correct)

....classifier. The naive Bayes classifier is one of the statistical text classifiers that use word frequencies as features. Other examples include k nearest neighbor (Yang Y. et al. 1994), TFIDF Roccio (Lewis D.D. et al. 1996), support vector machines (Joachims T. et al. 1998), and decision tree (Lewis D.D. et al. 1994). 1 Proposal: A text categorization scheme. The proposed system consists of three modules as shown in Figure 1; a module to preprocess collected documents, a module to create training sentence sets, and a module to extract features and to classify text documents. Figure 1: Architecture for the ....

Lewis D.D. and Ringuette M. (1994) A comparison of Two Learning Algorithms for Text categorization.


Predictive Self-Organizing Networks for Text Categorization - Tan (2001)   (Correct)

....class label. In recent years, there has been an increasing number of statistical and machine learning techniques that automatically generate text categorization knowledge based on training examples. Such techniques include decision trees [2], K nearest neighbor system (KNN) [7,19], rule induction [8], gradient descent neural networks [9, 16], regression models [18], Linear Least Square Fit (LLSF) [17], and support vector machines (SVM) [6, 7]. All these statistical methods adopt a supervised learning paradigm. During the learning phase, a classifier derives categorization knowledge from a set ....

D.D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Proceedings, Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), Las Vegas, pages 81--93, 1994.


Hierarchical Classification of Web Content - Dumais, Chen (2000)   (32 citations)  (Correct)

....methods to supplement human effort in creating structured knowledge hierarchies. A wide range of statistical and machine learning techniques have been applied to text categorization, including multivariate regression models [8,22], nearest neighbor classifiers [26], probabilistic Bayesian models [13, 16], decision trees [16], neural networks [22, 25], symbolic rule learning [1, 4], and support vector machines [7,12]. These approaches all depend on having some initial labeled training data from which category models are learned. Once category models are trained, new items can be added with little ....

....effort in creating structured knowledge hierarchies. A wide range of statistical and machine learning techniques have been applied to text categorization, including multivariate regression models [8,22], nearest neighbor classifiers [26], probabilistic Bayesian models [13, 16], decision trees [16], neural networks [22, 25], symbolic rule learning [1, 4], and support vector machines [7,12]. These approaches all depend on having some initial labeled training data from which category models are learned. Once category models are trained, new items can be added with little or no additional ....

Lewis, D.D. and Ringuette, M.. A comparison of two learning algorithms for text categorization. Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), 81-93, 1994.


Feature Selection, Perceptron Learning, and a Usability Case Study.. - Ng (1997)   (6 citations)  (Correct)

....higher performance (see Section 6) Hence, we only use the local dictionary method on the Reuters corpus. We also require a word to occur at least five times in the training texts to be chosen as a feature. This measure is quite widely used, for example in the work of [Hearst et al. 1996] and [Lewis and Ringuette, 1994]. This is beneficial since infrequent words are not reliable indicators for use as features. After words are chosen according to the local dictionary method, and after eliminating words with infrequent occurrence, we select a set of n features by applying a feature selection metric. The features ....

....the performance of Classi on the Reuters corpus. We experimented with several different number of features as listed in Table 2. We also show the effect of the three feature selection methods used. The accuracy measure in Table 2 is the micro averaged break even points, the same measure as used in [Lewis and Ringuette, 1994]. At a break even point, recall and precision are the same. Micro averaging combines the recall and precision values of all the categories by summing the true positive, true negative, false positive, and false negative counts across all categories. From Table 2, it is evident that our new feature ....
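
The micro-averaging described in the context above can be illustrated with a minimal sketch. It is not taken from either paper: the `Counts` class, the `micro_average` function, and the example numbers are hypothetical and serve only to show how per-category counts are pooled before precision and recall are computed. True negatives are omitted because neither precision nor recall uses them.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-category confusion counts (hypothetical container for illustration)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def micro_average(per_category):
    """Pool counts across all categories, then compute precision and recall once."""
    tp = sum(c.tp for c in per_category)
    fp = sum(c.fp for c in per_category)
    fn = sum(c.fn for c in per_category)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts for two categories: one frequent, one rare.
precision, recall = micro_average([Counts(tp=8, fp=2, fn=2), Counts(tp=1, fp=1, fn=3)])
print(precision, recall)  # pooled tp=9, fp=3, fn=5
```

Because the counts are summed before dividing, frequent categories dominate the result. A break-even point would then be found by varying the decision threshold until the pooled precision and recall coincide; that search is not shown here.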

[Article contains additional citation context not shown here]

David Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Symposium on Document Analysis and Information Retrieval, 1994.


Comparing Feature Sets for Learning Text Categorization - Spitters (2000)   (Correct)

....genetic algorithms; inductive logic programming; maximum entropy learning and explanation based learning. Research aimed at the application of machine learning methods to text categorization has been conducted and reported among others by Masand, Linoff, and Waltz (1992) (memory based learning), Lewis and Ringuette (1994) (naive Bayes, decision trees), Tong and Appelbaum (1994) (decision trees), Apté, Damerau, and Weiss (1994) (rule based induction methods), Schütze, Hull, and Pedersen (1995) (linear discriminant analysis, logistic regression, neural networks), Ng, Goh, and Low (1997) (neural networks) ....

Lewis, D. and M. Ringuette. 1994. A comparison of two learning algorithms for text categorization. Symposium on Document Analysis and Information retrieval.


Text Categorization Using Weight Adjusted k-Nearest.. - Han, Karypis, Kumar (1999)   (7 citations)  (Correct)

....decision tree induction algorithm like C4.5 [Qui93] or rule induction algorithms such as C4.5rules [Qui93] and RIPPER [Coh95] do not work well with large number of attributes. Even though Naive Bayes classification techniques, such as Rainbow [McC96] are popular in text categorization [LG94, LR94, Lew98, MN98] they have a major limitation due to the independence assumption they make while words in document data sets tend to be dependent. k nearest neighbor (k NN) classification is an instance based learning algorithm that has shown to be very effective in text classification [Yan94, ....

D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proc. of the Third Annual Symposium on Document Analysis and Information Retrieval, 1994.


Machine Learning and Natural Language Processing - Marquez (2000)   (1 citation)  (Correct)

....(either the basic version or other variations and hybrids) applied to the following NLP disambiguation tasks: Context sensitive spelling correction [88, 89], PoS tagging [194, 189], PP attachment disambiguation [52], Word sense disambiguation [86, 150, 156, 112, 73], and Text Categorization [117, 190, 119, 142, 196]. Recently, Lau, Rosenfeld and Roukos [110, 187] have proposed a new approach for combining statistical evidence from different sources, that is based on the Maximum Entropy Principle (ME). This work was originated within the speech recognition field [187] but it has also been successfully ....

....Their application to NLP is also noticeable, and we find tree based solutions to address natural language ambiguity problems at several levels: Speech recognition [8, 9], PoS tagging [200, 132, 140, 141, 164, 136, 138, 137], word sense disambiguation [26], parsing [132, 92], text categorization [117, 69, 230], text summarization [134], dialogue act tagging [192], co-reference resolution [5, 143], cue phrase identification [122], and machine translation (verb classification) [221, 209]. In Magerman's approach [132] decision trees are used for a number of simultaneous different decision making ....

[Article contains additional citation context not shown here]

D. Lewis and M. Ringuette. A Comparison of two Learning Algorithms for Text Categorization. In Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.


Chapter 5 : Analysis and Discussion - Overview The Results   Self-citation (Lewis)   (Correct)

No context found.

Lewis D. D. and Ringuette M. A comparison of two learning algorithms for text categorization. Third Annual Symposium on Document Analysis and Information Retrieval, pages 81-93, Las Vegas, NV (1994). ISRI; Univ. of Nevada, Las Vegas. http://www.research.att.com/~lewis/chronobib.html


Information Retrieval and the Statistics of Large Data Sets - Lewis (1996)   (1 citation)  Self-citation (Lewis)   (Correct)

....(e.g. neural nets, decision trees, nearest neighbor classifiers) have been used. Supervised learning is particularly effective in routing (where a user can supply ongoing feedback as the system is used) [7] and in text categorization (where a large body of manually indexed text may be available) [12, 14]. 2 The Future. These are exciting times for IR. Statistical IR methods developed over the past 30 years are suddenly being widely applied in everything from shrinkwrapped personal computer software, up to large online databases (Dialog, Lexis Nexis, and West Publishing all fielded their first ....

David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Third Annual Symposium on Document Analysis and Information Retrieval, pages 81--93, Las Vegas, NV, April 11--13 1994. ISRI; Univ. of Nevada, Las Vegas.


An Improved Boosting Algorithm and Its Application to Text.. - Sebastiani, al. (2000)   (5 citations)  (Correct)

No context found.

David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 81-93, Las Vegas, US, 1994.


Robustness of Regularized Linear Classification Methods - In Text Categorization   (Correct)

No context found.

D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), Nevada, Las Vegas, 1994. University of Nevada, Las Vegas.


Diffusion Kernels on Statistical Manifolds - Lafferty, Lebanon (2005)   (1 citation)  (Correct)

No context found.

David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Symposium on Document Analysis and Information Retrieval, pages 81--93, Las Vegas, NV, April 1994. ISRI; Univ. of Nevada, Las Vegas.


Text Categorization Using a Personalized, Adaptive.. - Cherchi, Manconi.. (2005)   (Correct)

No context found.

D. D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 81--93, Las Vegas, US, 1994.


Multi-Dimensional Text Classification - Theeramunkong, LERTNATTEE (2002)   (Correct)

No context found.

Lewis D. D. and Ringuette M. (1994) A Comparison of Two Learning Algorithms for Text Categorization. In Proc. of Third Annual Symposium on Document Analysis and Information Retrieval, pages 81-93.


Towards Deeper Understanding of the LSA Performance - Nakov, Valchanova, Angelova (2003)   (Correct)

No context found.

D. Lewis, M. Ringuette. A comparison of two learning algorithms for text categorization. Proc. 3rd Ann. Symposium on Document Analysis and IR, pp. 81-93, 1994.


Domain-Specific Web Search - With Keyword Spices (2004)   (Correct)

No context found.

D.D. Lewis and M. Ringuette, "A Comparison of Two Learning Algorithms for Text Categorization," Proc. Third Ann. Symp. Document Analysis and Information Retrieval (SDAIR-94), pp. 81-93, 1994.


Managing Content with Automatic Document Classification - Calvo, Lee, Li   (Correct)

No context found.

David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, Nevada, Las Vegas, 1994.


Combining Machine Learning and Hierarchical Structures for Text.. - Ruiz (2001)   (1 citation)  (Correct)

No context found.

D. D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 81--93, Las Vegas, US, 1994.


A Corpus-Independent Feature Set for Style-Based Text.. - Moshe Koppel Navot (2003)   (2 citations)  (Correct)

No context found.

Lewis, D. and Ringuette, M. 1994. A Comparison of Two Learning Algorithms for Text Categorization. Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (pp. 81-93), Las Vegas, USA.


Parametric Mixture Models for Multi-Labeled Text - Ueda, Saito (2003)   (2 citations)  (Correct)

No context found.

D. Lewis & M. Ringuette. A comparison of two learning algorithms for text categorization. In Third Annual Symposium on Document Analysis and Information Retrieval, 81-93. 1994.


News and Trading Rules - Thomas (2003)   (Correct)

No context found.

David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Proc. Symposium on Document Analysis and Information Retrieval SDAIR-94, pages 81--93, 1994.


Text Categorization - Sebastiani (2005)   (2 citations)  (Correct)

No context found.

Lewis, D.D. & Ringuette, M., A comparison of two learning algorithms for text categorization. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, US, pp. 81--93, 1994.


Using Clustering to Boost Text Classification - Fang Parthasarathy Schwartz   (Correct)

No context found.

D. D. Lewis and M. Ringuette. Comparison of two learning algorithms for text categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), 1994.


Text Classification from Positive and Unlabeled Examples - Denis, Gilleron, Tommasi (2002)   (1 citation)  (Correct)

No context found.

D. D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Proc. 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 81-93, 1994.


A Corpus-Independent Feature Set for Style-Based Text.. - Koppel, Akiva, Dagan (2003)   (2 citations)  (Correct)

No context found.

Lewis, D. and Ringuette, M. 1994. A Comparison of Two Learning Algorithms for Text Categorization. Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (pp. 81-93), Las Vegas, USA.


Advances in Document Classification by Voting of.. - Wenzel, Baumann, Jäger (1997)   (2 citations)  (Correct)

No context found.

D. D. Lewis, M. Ringuette. A Comparison of two Learning Algorithms for Text Categorization. Proc. of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR 94), Las Vegas, Nevada, April 1994, pp. 81-93.


Learning from Web: Review of Approaches - Vitaly Schetin In   (Correct)

No context found.

Lewis D., and Ringuette M. 1994. A comparison of two learning algorithms for text categorization. In Symposium on Document Analysis and Information Retrieval, 81-93.


Hidden Markov Models for Text Categorization in Multi-Page.. - Frasconi, Soda, Vullo (2002)   (2 citations)  (Correct)

No context found.

Lewis, D.D. and Ringuette, M. (1994). Comparison of Two Learning Algorithms for Text Categorization. In Proc. of the 3rd Annual Symposium on Document Analysis and Information Retrieval (pp. 81--93).


Evaluation of Decision Forests on Text Categorization - Chen, Ho (2000)   (1 citation)  (Correct)

No context found.

D. Lewis and M. Ringuette, "Comparison of two learning algorithms for text categorization," in Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94), 1994.


The Effect of Using Hierarchical Classifiers in Text.. - D'Alessio, Murray.. (2000)   (Correct)

No context found.

Lewis D. and Ringuette M. (1994). A Comparison of Two Learning Algorithms for text Categorization. Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, pp. 81-93.



CiteSeer.IST - Copyright Penn State and NEC