32 citations found.
McCallum, A. and Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In Sahami, M., Craven, M., Joachims, T., and McCallum, A., editors, Workshop Notes of the ICML/AAAI-98 Workshop Learning for Text Categorization, pages 41-48, Menlo Park, CA, USA. AAAI Press.

This paper is cited in the following contexts:
Enhanced Word Clustering for Hierarchical Text Classification - Dhillon, Mallela, Kumar   (4 citations)  (Correct)

....extensively studied, especially since the emergence of the internet. Most algorithms are based on the bag-of-words model for text [26]. A simple but effective algorithm is the Naive Bayes method [24]. For text classification, different variants of Naive Bayes have been used, but McCallum and Nigam [21] showed that the variant based on the multinomial model leads to better results. For hierarchical text data, such as the topic hierarchies of Yahoo (www.yahoo.com) and the Open Directory Project (www.dmoz.org), hierarchical classification has been studied in [18, 5]. For more details, see Section ....

...., $c_l\}$ be the set of $l$ classes, and let $W = \{w_1, \ldots, w_m\}$ be the set of words/features contained in these classes. Given a new document $d$, the probability that $d$ belongs to class $c_i$ is given by Bayes rule, $p(c_i \mid d) = p(d \mid c_i)\,p(c_i) / p(d)$. Assuming a generative multinomial model [21] and further assuming class-conditional independence of words yields the well-known Naive Bayes classifier [24], which computes the most probable class for $d$ as $c^*(d) = \arg\max_{c_i} p(c_i \mid d) = \arg\max_{c_i} p(c_i) \prod_{t=1}^{m} p(w_t \mid c_i)^{n(w_t, d)}$ (3), where $n(w_t, d)$ is the number of occurrences of word $w_t$ in document ....

[Article contains additional citation context not shown here]

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
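
The decision rule quoted in the excerpt above (most probable class as prior times per-word probabilities raised to word counts) can be illustrated with a minimal Python sketch. It works in log space to avoid underflow. The function name `classify`, the toy parameters, and the choice to skip words that have no estimated probability are assumptions made for illustration; none of them come from the cited papers.

```python
import math
from collections import Counter

def classify(doc_tokens, priors, word_probs):
    """Return the class maximising log p(c) + sum_t n(w_t, d) * log p(w_t | c).

    priors:     dict class -> p(c)
    word_probs: dict class -> dict word -> p(w | c)
    Words without an estimate for a class are skipped (an assumption of this sketch).
    """
    counts = Counter(doc_tokens)  # n(w_t, d) for every word in the document
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for word, n in counts.items():
            p = word_probs[c].get(word)
            if p is not None:
                score += n * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy parameters, for illustration only.
toy_priors = {"sports": 0.5, "politics": 0.5}
toy_word_probs = {
    "sports": {"ball": 0.7, "vote": 0.1, "game": 0.2},
    "politics": {"ball": 0.1, "vote": 0.7, "game": 0.2},
}
predicted = classify(["ball", "ball", "vote"], toy_priors, toy_word_probs)
```

Because the logarithm is monotonic, maximising the summed log score selects the same class as maximising the product in the quoted rule.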


A Study of Approaches to Hypertext Categorization - Yang, Slattery, Ghani (2002)   (24 citations)  (Correct)

....Set of training documents; $D_j$: set of training documents in class $c_j$; $n(t)$: number of training documents containing $t$; $N(t)$: number of occurrences of $t$; $N(t, d)$: number of occurrences of $t$ in document $d$. 2.3.1. Naive Bayes. Naive Bayes is a simple but effective text classification algorithm [16, 17]. The parameterization given by Naive Bayes defines an underlying generative model assumed by the classifier. In this model, first a class is selected according to class prior probabilities. Then, the generator creates each word in a document by drawing from a multinomial distribution over words ....

Andrew McCallum and Kamal Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998. Tech. rep. WS-98-05, AAAI Press.


Effective Methods for Improving Naive Bayes Text Classifiers - Kim, Rim, Yook, Lim (2002)   (1 citation)  (Correct)

....a fixed user interest. Thus, text classifiers should be able to rank categories given a document and rank documents given a class. A growing number of statistical learning methods have been applied to these problems in recent years, including nearest neighbor classifiers [7], naive Bayes classifiers [5], and support vector machines [3], etc. Among these methods, naive Bayes text classifiers have been widely used because of their simplicity, although they have been reported as one of the poorly performing classifiers in the text categorization task [8, 2]. Since several studies show that naive Bayes performs ....

....$w_k$ is a binary variable representing the occurrence or non-occurrence of the $k$-th word in the vocabulary. The most serious problem in this approach is that there is no way to reflect the information about term frequencies. This weak point of the multivariate approach was experimentally surveyed by [5]. On the other hand, the multinomial approach specifies that a document is represented by the set of word occurrences from the document. The order of the words is lost, but the number of occurrences of each word in the document is captured. When calculating the probability of a document, one ....

[Article contains additional citation context not shown here]

A. K. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In Proceedings of AAAI-98 Workshop on Learning for Text Categorization, pages 137-142, 1998.
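
The contrast drawn in the excerpt above, between a binary occurrence variable per vocabulary word and a representation that keeps per-word counts, can be sketched as two small feature extractors. The function names and the toy vocabulary are hypothetical and chosen only for this illustration.

```python
from collections import Counter

def bernoulli_features(tokens, vocabulary):
    """Binary view: 1 if the word occurs in the document at all, else 0."""
    present = set(tokens)
    return {w: int(w in present) for w in vocabulary}

def multinomial_features(tokens, vocabulary):
    """Count view: number of occurrences of each vocabulary word."""
    counts = Counter(tokens)
    return {w: counts[w] for w in vocabulary}  # Counter yields 0 for absent words

# Hypothetical toy document and vocabulary.
toy_vocab = ["spam", "ham", "eggs"]
toy_doc = ["spam", "spam", "ham"]
binary_view = bernoulli_features(toy_doc, toy_vocab)
count_view = multinomial_features(toy_doc, toy_vocab)
```

In the binary view the repeated word is indistinguishable from a single occurrence, which is the loss of term-frequency information the excerpt describes.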


A New Method of Parameter Estimation for Multinomial Naive.. - Kim, Rim, Lim (2002)   (Correct)

....collection, and our proposed approach obtained a significant improvement in performance over the conventional approach. 1. INTRODUCTION Naive Bayes text classifiers have been widely used because of their simplicity. Among the various versions of classifiers, the multinomial naive Bayes text classifier [1] is mostly used. This classifier has adopted a topic unigram language model approach to estimate the term probabilities given a class as follows [1]: $P(w_k \mid c_j) = \frac{\#\, w_k \text{ in the documents belonging to } c_j}{\#\, \text{tokens in the documents belonging to } c_j}$ (1). Unfortunately, this method can cause ....

....text classifiers have been widely used because of their simplicity. Among the various versions of classifiers, the multinomial naive Bayes text classifier [1] is mostly used. This classifier has adopted a topic unigram language model approach to estimate the term probabilities given a class as follows [1]: $P(w_k \mid c_j) = \frac{\#\, w_k \text{ in the documents belonging to } c_j}{\#\, \text{tokens in the documents belonging to } c_j}$ (1). Unfortunately, this method can cause inappropriate estimation as shown in Figure 1. In this example, $w_1$, $w_2$, and $w_3$ have the same probability given a class $c$ with the topic unigram language ....

[Article contains additional citation context not shown here]

A. K. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In Proceedings of AAAI-98 Workshop on Learning for Text Categorization, pages 137-142, 1998.
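
The estimate quoted as equation (1) in the excerpt above is a ratio of token counts within one class. A minimal sketch, assuming each document is already a list of tokens, follows; the function name `estimate_word_probs` and the toy documents are hypothetical.

```python
from collections import Counter

def estimate_word_probs(class_docs):
    """Unsmoothed estimate of P(w | c) for a single class.

    class_docs: list of token lists, all belonging to the same class.
    Returns count(w in class) / count(all tokens in class) for each seen word.
    """
    counts = Counter(tok for doc in class_docs for tok in doc)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

# Hypothetical toy class with two documents and three tokens in total.
toy_probs = estimate_word_probs([["a", "b"], ["a"]])
```

Words that never occur in the class receive no entry at all, so this estimate pools all tokens of the class together and says nothing about how the counts are spread across documents.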


Information-Theoretic Co-Clustering - Dhillon, Mallela, Modha (2003)   (9 citations)  (Correct)

....[13] and the SMART collection from Cornell (ftp://ftp.cs.cornell.edu/pub/smart). The NG20 data set consists of approximately 20,000 newsgroup articles collected evenly from 20 different usenet newsgroups. This data set has been used for testing several supervised text classification tasks [1, 19, 14, 6] and unsupervised document clustering tasks [18, 8]. Many of the newsgroups share similar topics and about 4.5% of the documents are cross-posted, making the boundaries between some newsgroups rather fuzzy. To make our comparison consistent with previous algorithms we reconstructed various ....

A. McCallum and K. Nigam. A comparison of event models for Naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.


Combining Naive Bayes and n-Gram Language Models for Text.. - Peng, Schuurmans (2003)   (1 citation)  (Correct)

....substantial improvements over standard naive Bayes classification, while also achieving state-of-the-art performance that competes with the best known methods in these cases. 1 Introduction Naive Bayes classifiers have been proven successful in many domains, especially in text classification [12, 14, 20], despite the simplicity of the model and the restrictiveness of the independence assumptions it makes. Domingos and Pazzani [4] point out that naive Bayes classifiers can obtain near-optimal misclassification error even when the independence assumption is strongly violated. Nevertheless, it is ....

.... There are several variants of naive Bayes classifiers, including the binary independence model, the multinomial model, the Poisson model, and the negative binary independence model [6]. It has been shown that for text categorization applications, the multinomial model is most often the best choice [6, 14]; therefore we will only consider the multinomial naive Bayes model in this paper. Fig. 1 gives a graphical representation of the multinomial naive Bayes model, showing that each attribute node is independent of the other attributes given the class label C. Attributes are also called features ....

[Article contains additional citation context not shown here]

A. McCallum and K. Nigam. (1998). A Comparison of Event Models for Naive Bayes Text Classification. In Proceedings of AAAI-98 Workshop on "Learning for Text Categorization", AAAI Press.


Enhanced Word Clustering for Hierarchical Text Classification - Dhillon, Mallela, Kumar (2002)   (4 citations)  (Correct)

....extensively studied, especially since the emergence of the internet. Most algorithms are based on the bag-of-words model for text [26]. A simple but effective algorithm is the Naive Bayes method [24]. For text classification, different variants of Naive Bayes have been used, but McCallum and Nigam [21] showed that the variant based on the multinomial model leads to better results. For hierarchical text data, such as the topic hierarchies of Yahoo (www.yahoo.com) and the Open Directory Project (www.dmoz.org), hierarchical classification has been studied in [18, 5]. For more details, see Section ....

....$= \{c_1, c_2, \ldots, c_l\}$ be the set of $l$ classes, and let $W = \{w_1, \ldots, w_m\}$ be the set of words/features contained in these classes. Given a new document $d$, the probability that $d$ belongs to class $c_i$ is given by Bayes rule, $p(c_i \mid d) = p(d \mid c_i)\,p(c_i) / p(d)$. Assuming a generative multinomial model [21] and further assuming class-conditional independence of words yields the well-known Naive Bayes classifier [24], which computes the most probable class for $d$ as $c^*(d) = \arg\max_{c_i} p(c_i \mid d) = \arg\max_{c_i} p(c_i) \prod_{t=1}^{m} p(w_t \mid c_i)^{n(w_t, d)}$ (3), where $n(w_t, d)$ is the number of occurrences of word $w_t$ in document $d$, and ....

[Article contains additional citation context not shown here]

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.


Information Theoretic Feature Clustering for Text.. - Dhillon, Mallela, Kumar   (4 citations)  (Correct)

....obvious or by $p(X)$, $p(C \mid w_t)$ to make the random variable explicit. 2. Related Work Text classification has been extensively studied, especially since the emergence of the internet. Several methods from simple probabilistic Naive Bayes to the complex SVMs have been used for text categorization [22, 17]. An inherent problem of text data is its high dimensionality. To counter high dimensionality, various methods of feature selection have been proposed in [30, 18, 5]. Distributional clustering of words was first proposed by Pereira, Tishby and Lee in [25], where they used soft distributional ....

...., $c_l\}$ be the set of $l$ classes, and let $W = \{w_1, \ldots, w_m\}$ be the set of words/features contained in these classes. Given a new document $d$, the probability that $d$ belongs to class $c_i$ is given by Bayes rule, $p(c_i \mid d) = p(d \mid c_i)\,p(c_i) / p(d)$. Assuming a generative multinomial model [22] and further assuming class-conditional independence of words yields the well-known Naive Bayes classifier [24], which computes the most probable class for $d$ as $c^*(d) = \arg\max_{c_i} p(c_i \mid d) = \arg\max_{c_i} p(c_i) \prod_{t=1}^{m} p(w_t \mid c_i)^{n(w_t, d)}$ (3), where $n(w_t, d)$ is the number of occurrences of word $w_t$ in document ....

[Article contains additional citation context not shown here]

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.


Text Categorization Based on Regularized Linear Classification.. - Zhang, Oles (2000)   (10 citations)  (Correct)

....formulation (10) denoted as Mod Least Squares, the regularized logistic regression (6) denoted as Logistic Reg, the (linear) support vector machine (8) denoted as SVM, and the modified SVM corresponding to (18) denoted as Mod SVM. For comparison purposes, we also include results of Naive Bayes [14] as a baseline method. 4.1 Some Implementation issues A number of feature selection methods for text categorization were compared in [27]. In our experiments, we employ a method similar to the information gain (IG) approach described in [27]. However, we replace the entropy scoring in the IG ....

....than the same word in the body. More complicated representations such as TFIDF weighting schemes [12] can also be employed. In this work, we didn't compare the effectiveness of these different approaches. Our implementation of Naive Bayes is based on the multinomial model formulation described in [14]. However, we replace the Laplacian smoothing where each probability count is increased by 1, by a smoothing where each probability count is increased by . This formulation can be considered as a regularization method to avoid the zero denominator problem. In our experience, the replacement of ....

[Article contains additional citation context not shown here]

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998.
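
The smoothing described in the excerpt above, where every count is increased by a fixed amount before normalising, can be sketched as follows. The value the authors used instead of 1 is not recoverable from the excerpt, so `increment` is a hypothetical parameter of this sketch, as are the function name and the toy counts.

```python
def smoothed_word_probs(counts, vocabulary, increment=1.0):
    """Additive smoothing: P(w | c) = (count(w) + increment) / (total + increment * |V|).

    counts:     dict word -> raw count within one class
    vocabulary: iterable of all words that should receive a probability
    increment:  amount added to each count (1.0 gives Laplace smoothing)
    """
    vocab = list(vocabulary)
    total = sum(counts.get(w, 0) for w in vocab)
    denom = total + increment * len(vocab)
    return {w: (counts.get(w, 0) + increment) / denom for w in vocab}

# Hypothetical toy counts; "z" never occurs in the class.
toy_counts = {"a": 2, "b": 1}
laplace = smoothed_word_probs(toy_counts, ["a", "b", "z"], increment=1.0)
lighter = smoothed_word_probs(toy_counts, ["a", "b", "z"], increment=0.1)
```

A smaller increment moves less probability mass to unseen words while still keeping every probability strictly positive.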


A Decision-Tree-Based Symbolic Rule Induction System.. - Johnson, Oles, Zhang..   (Correct)

....from many sources, including the output of voice recognition software, collections of documents (e.g. news stories, patents, or case summaries) and the contents of web pages. Previous text categorization methods have used decision trees (with or without boosting) [8], naive Bayes classifiers [6], nearest neighbor methods [11], support vector machines [5, 4], and various kinds of direct symbolic rule induction [1]. Among all these methods, we are particularly interested in systems that can produce symbolic rules since human-comprehensible rules often provide valuable insights in many ....

Andrew McCallum and Kamal Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41-48, 1998.


Enhanced Word Clustering for Hierarchical Text Classification - Dhillon, Mallela, Kumar (2002)   (4 citations)  (Correct)

....extensively studied, especially since the emergence of the internet. Most algorithms are based on the bag-of-words model for text [26]. A simple but effective algorithm is the Naive Bayes method [24]. For text classification, different variants of Naive Bayes have been used, but McCallum and Nigam [21] showed that the variant based on the multinomial model leads to better results. For hierarchical text data, such as the topic hierarchies of Yahoo (www.yahoo.com) and the Open Directory Project (www.dmoz.org), hierarchical classification has been studied in [17, 4, 10]. For some more details, ....

...., $c_l\}$ be the set of $l$ classes, and let $W = \{w_1, \ldots, w_m\}$ be the set of words/features contained in these classes. Given a new document $d$, the probability that $d$ belongs to class $c_i$ is given by Bayes rule, $p(c_i \mid d) = p(d \mid c_i)\,p(c_i) / p(d)$. Assuming a generative multinomial model [21] and further assuming class-conditional independence of words yields the Naive Bayes classifier, which computes the most probable class for $d$ as $c^*(d) = \arg\max_{c_i} p(c_i \mid d) = \arg\max_{c_i} p(c_i) \prod_{t=1}^{m} p(w_t \mid c_i)^{n(w_t, d)}$ (3), where $n(w_t, d)$ is the number of occurrences of word $w_t$ in document $d$, and the ....

[Article contains additional citation context not shown here]

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.


Hierarchical Text Categorization Using Neural Networks - Ruiz, Srinivasan (2002)   (6 citations)  (Correct)

....Retrieval (IR) and the Artificial Intelligence (AI) communities. Different approaches such as decision trees (ID3) [25], rule learning [1], neural networks [27, 39], linear classifiers [19], K-nearest neighbor (KNN) algorithms [43], support vector machine (SVM) [11], and Naive Bayes methods [17, 21] have been explored. Interestingly, it is only recently that researchers [14, 22, 24, 27, 38] have tried to take advantage of the hierarchical structure available in certain classification schemes, e.g. Medical Subject Headings (MeSH), Yahoo topic hierarchy. The hierarchical structure of a ....

McCallum A and Nigam K. A comparison of event models for naive Bayes text classification. In Learning for Text Categorization: Papers from the


A Probabilistic Framework for the Hierarchic Organisation.. - Vinokourov, Girolami (2002)   (2 citations)  (Correct)

....in IR [9], for example the automatic hierarchic organisation of web search results [4]. This paper presents a probabilistic mixture model with hierarchic structure for the unsupervised organisation of a collection of documents. The mixtures are based on both standard multinomial event models [18] and probabilistic latent semantic analysers (PLSA) [11], [25]. In addition to providing a hierarchic partitioned organization of a document collection, the associated generative model allows the derivation of the Fisher kernel for the hierarchy. The Fisher kernel [14] engenders a similarity measure ....

....$p(c_l \mid c_{l-1}, d) = \frac{p(c_l \mid c_{l-1})\,p(d \mid c_l)}{\sum_{c'_l} p(c'_l \mid c_{l-1})\,p(d \mid c'_l)}$, $p_{new}(w \mid c_l) = \frac{1 + \sum_d n_{dw}\,p(c_l \mid d)}{M + \sum_d \sum_w n_{dw}\,p(c_l \mid d)}$, $p_{new}(c_l \mid c_{l-1}) = \frac{1}{N} \sum_d p(c_l \mid c_{l-1}, d)$. In the equation for updating the class means $p(w \mid c_l)$ we used Laplace smoothing [18] due to the sparseness of the data. The above equations define the EM$_l$ step for layer $l$, and this depends on the previous $l-1$ steps. This determines the following order of calculations: first the parameters for the $l = 1$ layer are estimated using EM, the parameters are frozen, and inherited by ....

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.


Text Categorization Using Adaptive Context Trees - Vert (2001)   (1 citation)  (Correct)

....for example the problem of text categorization, that is the automatic assignment of natural language texts to predefined classes or categories. This problem received much attention recently and many algorithms have been proposed and evaluated, including but not limited to Bayesian classifiers ([1], [2], [3]), k-nearest neighbors ([4]), rule learning algorithms ([5], [6]), maximum entropy models ([7]), boosting ([8]) or support vector machines ([9], [10], [11]). All these algorithms share in common the way the initial text is processed from a long ASCII string into a series of words or word ....

McCallum, A. K., Nigam, K.: A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI/ICML-98 Workshop on Learning for Text Categorization, (1998), 41-48.


Text Classification and Segmentation Using Minimum Cross-Entropy - Teahan (2000)   (2 citations)  (Correct)

....are also surprising in that only a relatively small amount of training text (a few hundred kilobytes) was required to train the models. 4.3 Classification by genre A preliminary investigation into the effectiveness of using PPM to classify by genre was performed using the Newsgroups data set (McCallum & Nigam, 1998). This data set contains 20,000 articles evenly divided among 20 Usenet discussion groups. This data set is particularly interesting as many of the newsgroup categories fall into confusable clusters: for example, three discuss religion (soc.religion.christian, talk.religion.misc, alt.atheism) ....

....PPM models are significantly smaller, and take less time to train. Hierarchical classification is also possible. The advantages of this are that classification is much faster (since categories in other branches of the tree do not have to be searched) and less memory is required during classification. McCallum et al. (1998) have also shown that text classification can be significantly improved by taking advantage of a hierarchy of classes. For the hierarchical PPM-based classifier, a decision tree of models is used. For higher nodes in the tree, PPM models are built from a concatenation of the training text used for ....

McCallum, A. & Nigam, K. (1998) "A comparison of event models for Naive Bayes text classification" in AAAI-98: Workshop on learning for text categorization.


Using Information Extraction to Aid the Discovery of.. - Nahm, Mooney (2000)   (2 citations)  (Correct)

....Therefore, before constructing a database using an IE system, we filtered out irrelevant documents from the newsgroup using a trained text categorizer. First, 1,000 postings were collected and classified by a human expert as relevant or irrelevant. Next, a bag-of-words Naive Bayes text categorizer [21, 19] was trained on this data to identify relevant documents (using the Rainbow package [20]). The resulting categorizer has an accuracy of over 99% and is used to filter irrelevant documents from the original postings. Rapier [4], a machine learning system for inducing rules for extracting ....

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In Papers from the AAAI 1998 Workshop on Text Categorization, pages 41-48, Madison, WI, 1998.


ifile: An Application of Machine Learning to E-Mail Filtering - Rennie (2000)   (15 citations)  (Correct)

....used in ifile and is found to significantly improve efficiency without noticeably affecting classification performance. 2.4 Naive Bayes Naive Bayes is a simple statistical classification model often utilized in the problem of text categorization. McCallum and Nigam give a good discussion of this model [12] and the two event models that are most frequently used. For this paper, I use the multinomial event model, where a document is assumed to be generated by a number of rolls of a weighted die, one roll for each word in the document. There is a unique die for each class and each face of each die ....

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
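
The die-rolling picture in the excerpt above, one weighted die per class and one roll per word position, can be simulated in a few lines. This is a sketch of the generative story only. The function name `generate_document`, the class labels, and the face weights are hypothetical and chosen for illustration.

```python
import random

def generate_document(class_dice, label, length, rng):
    """Roll the weighted die of class `label` once per word position.

    class_dice: dict class -> dict word -> weight (one die face per word)
    rng:        a random.Random instance, passed in for reproducibility
    """
    die = class_dice[label]
    faces = list(die.keys())
    weights = list(die.values())
    # random.Random.choices samples with replacement according to the weights.
    return rng.choices(faces, weights=weights, k=length)

# Hypothetical dice for two mail folders.
toy_dice = {
    "work": {"meeting": 0.6, "report": 0.3, "lunch": 0.1},
    "personal": {"meeting": 0.1, "report": 0.1, "lunch": 0.8},
}
toy_document = generate_document(toy_dice, "work", 5, random.Random(0))
```

Because each roll is independent of the others, word order carries no information in this model; only the per-word counts matter when the document is later classified.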


Instructable and Adaptive Web-Agents which Learn to Categorize.. - Eliassi-Rad (1999)   (Correct)

....examples and is able to accept and refine user's advice. CORA's reinforcement learner is unable to perform either of these two actions. To classify text, CORA uses naive Bayes in combination with the EM algorithm [Dempster, Laird, and Rubin 1977] and the statistical technique shrinkage [McCallum and Nigam 1998; McCallum, Rosenfeld, Mitchell, and Ng 1998]. Again, unlike Wawa, CORA's text classifier learns only through training examples and cannot accept and refine advice. 7.2 Instructable Software Wawa is closely related to RATLE [Maclin 1995]. In RATLE, a teacher continuously gives advice to an agent ....

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. In Proceedings of the Fifteenth National Conference on Artificial Intelligence: Workshop on Learning for Text Categorization, Madison, WI, 41-48.


Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)   (38 citations)  Self-citation (Mccallum)   (Correct)

....classes: $P(c_j) = \frac{1 + \sum_{d_i \in D} P(c_j \mid d_i)}{|C| + |D|}$ (5). Empirically, when given a large number of training documents, naive Bayes does a good job of classifying text documents [Lewis, 1998]. More complete presentations of naive Bayes for text classification are provided by Mitchell [1997] and McCallum and Nigam [1998]. 3 EXPERIMENTAL SETUP In August 1998 we completely mapped the documents and hyperlinks of the web sites of computer science departments at Brown University, Cornell University, University of Pittsburgh and University of Texas. This comprises our Cora data set; it includes 53,012 documents ....

....accuracy. Experiments with a five-bin classifier result in worse performance, roughly equivalent to the RL Immediate spider, following an average of 27% of available hyperlinks before locating the target page. Better features and other methods for improving classifier accuracy (such as shrinkage [McCallum et al. 1998]) should allow the more sensitive multi-bin classifier to work better. In order to provide a window into successful four-bin classification for the value function, Table 2 shows the ten most predictive words per class (ranked by weighted log-odds ratio) for the four-bin RL Future ....

Andrew McCallum and Kamal Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998. http://www.cs.cmu.edu/mccallum.


Using Reinforcement Learning to Spider the Web Efficiently - Rennie, McCallum (1999)   (38 citations)  Self-citation (Mccallum)   (Correct)

....$P(c_j) = \frac{1 + \sum_{d_i \in D} P(c_j \mid d_i)}{|C| + |D|}$ (5). Empirically, when given a large number of training documents, naive Bayes does a good job of classifying text documents [Lewis, 1998]. More complete presentations of naive Bayes for text classification are provided by Mitchell [1997] and McCallum and Nigam [1998]. 3 EXPERIMENTAL SETUP In August 1998 we completely mapped the documents and hyperlinks of the web sites of computer science departments at Brown University, Cornell University, University of Pittsburgh and University of Texas. This comprises our Cora data set; it includes 53,012 documents ....

....accuracy. Experiments with a five-bin classifier result in worse performance, roughly equivalent to the RL Immediate spider, following an average of 27% of available hyperlinks before locating the target page. Better features and other methods for improving classifier accuracy (such as shrinkage [McCallum et al. 1998]) should allow the more sensitive multi-bin classifier to work better. In order to provide a window into successful four-bin classification for the value function, Table 2 shows the ten most predictive words per class (ranked by weighted log-odds ratio) for the four-bin RL Future spider ....

Andrew McCallum and Kamal Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998. http://www.cs.cmu.edu/mccallum.


Semi-supervised Clustering with User Feedback - Cohn, Caruana, McCallum (2003)   (8 citations)  Self-citation (Mccallum)   (Correct)

....between some example and a cluster prototype, or increasing the distance between a cluster prototype and all the examples assigned to it. Semi-supervised document clustering We adopt a statistical approach to document clustering resulting from the naive Bayes model of document generation (McCallum & Nigam, 1998). Given a vocabulary V, a document is assumed to be a bag of words ....

....we augment the standard KL divergence D(d 1 jjd 2 ) with a weighting function D 0 (d 1 jjd 2 ) Y w j 2V j p(w j j d1 ) log p(w j j d2 ) p(w j j d1 ) 2 The estimates for word probabilities are derived from the relative word frequencies in the documents. Following McCallum Nigam (1998), we smooth with a LaPlacean prior to avoid zero word probabilities. where j may be interpreted as indicating the importance of w j for distinguishing d 1 and d 2 . Then, given a constraint that d 1 and d 2 must be in separate clusters, we can warp the metric by computing D 0 M (d 1 jjd 2 ....

McCallum, A., & Nigam, K. (1998) A Comparison of Event Models for Naive Bayes Text Classification, AAAI-98 Workshop on "Learning for Text Categorization".


Text Classification by Bootstrapping with Keywords, EM and.. - McCallum, Nigam (1999)   (10 citations)  Self-citation (Mccallum Nigam)   (Correct)

....other given the class. If we denote $w_{d_i,k}$ to be the $k$th word in document $d_i$, then classification becomes: $P(c_j \mid d_i) \propto P(c_j)\,P(d_i \mid c_j) \propto P(c_j) \prod_{k=1}^{|d_i|} P(w_{d_i,k} \mid c_j)$ (3). Empirically, when given a large number of training documents, naive Bayes does a good job of classifying text documents (Lewis, 1998). More complete presentations of naive Bayes for text classification are provided by Mitchell (1997) and McCallum and Nigam (1998). 3.2 Adding unlabeled data with EM In the standard supervised setting, each document comes with a label. In our bootstrapping scenario, the preliminary keyword labels ....

A. McCallum and K. Nigam. 1998. A comparison of event models for naive Bayes text classification.


Multi-Label Text Classification with a Mixture Model Trained by EM - McCallum (1999)   (13 citations)  Self-citation (Mccallum)   (Correct)

....scores across different documents. 4) We have the advantages of a formal probabilistic approach, with a well-defined generative model, making available future enhancements based on the large tool chest of powerful statistical parameter estimation techniques, such as shrinkage and unlabeled data [McCallum et al. 1998; Nigam et al. 1999]. This paper presents preliminary experimental results on a subset of the Reuters-21578 data set. We find that the mixture model outperforms the approach based on a collection of binary classifiers, reducing classification error on almost all labels, reducing error by more than ....

....from the parameter estimates used to perform the E step. Because the parameters are based on (probabilistically weighted) counts, this removal is quite easy. The leave-one-out E step is also used in deleted interpolation [Jelinek and Mercer, 1980] and in shrinkage for document classification [McCallum et al. 1998]. In addition to the classes in C, we also add an extra class to which all documents belong. This can be thought of as the English class. Due to the leave-one-out E step, this class gathers the words that are common to all classes, in essence automatically finding the task-specific stop list. ....

[Article contains additional citation context not shown here]

Andrew McCallum and Kamal Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998. http://www.cs.cmu.edu/mccallum.


A Hierarchical Probabilistic Model for Novelty Detection in Text - Baker, Hofmann (1999)   (4 citations)  Self-citation (Mccallum)   (Correct)

....$t \in V$ from a vocabulary. We assume the data is generated by a parametric mixture model with one mixture component per class. We use a multinomial naive Bayes model, parameterized by $\theta$. It is straightforward to use Bayes rule to derive the probability of a class given a document (cf. for example [4]). Classifying a new document $d$ is done by selecting the most likely class given the document according to $P(c_j \mid d; \theta) = \frac{P(c_j) \prod_{t=1}^{|V|} P(w_t \mid c_j)^{N(w_t, d)}}{\sum_{k=1}^{|C|} P(c_k) \prod_{t=1}^{|V|} P(w_t \mid c_k)^{N(w_t, d)}}$ (1). 1 Refer to Section 4 for details on how these results were ....

McCallum, A., and Nigam, K. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization (1998).


The Maximum-Margin Approach to Learning Text Classifiers -.. - Joachims (2000)   (17 citations)  (Correct)

No context found.

McCallum, A. and Nigam, K. (1998). A comparison of event models for naive Bayes text classification. In Sahami, M., Craven, M., Joachims, T., and McCallum, A., editors, Workshop Notes of the ICML/AAAI-98 Workshop Learning for Text Categorization, pages 41-48, Menlo Park, CA, USA. AAAI Press.


Stability Behavior of Fuzzy - Clustering Methods For   (Correct)

No context found.

A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.


Probabilistic Score Estimation with Piecewise Logistic.. - Jian Zhang Jian   (Correct)

No context found.

McCallum, A., & Nigam, K. (1998). A comparison of event models for naive Bayes text classification. AAAI'98 Workshop on Learning for Text Categorization.


Asymmetric Missing-Data Problems: Overcoming the Lack of.. - Aleksander Kocz And (2002)   (Correct)

No context found.

McCallum, A. K. and Nigam, K.: 1998, A comparison of event models for naive Bayes text classification, AAAI-98 Workshop on Learning for Text Categorization.


Document Preprocessing for Naive Bayes.. - Pavlov.. (2004)   (Correct)

No context found.

A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.


Text Categorisation Using Document Profiling - Sauban, Pfahringer   (Correct)

No context found.

McCallum, A., Nigam, K.: A comparison of event models for naive Bayes text classification. In: AAAI-98 Workshop on Learning for Text Categorization. (1998)

CiteSeer.IST - Copyright Penn State and NEC