61 citations found.
McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the 15th International Conference on Machine Learning ICML98, Morgan Kaufmann (1998)

This paper is cited in the following contexts:

Probabilistic Hierarchical Clustering for Biological Data - Segal, Koller (2001)   (3 citations)  (Correct)

....that make it particularly suitable for biological data sets. The global optimization steps help avoid local maxima. Furthermore, the abstraction hierarchy tends to pull the parameters of one CPM closer to those of nearby ones, which naturally leads to a form of parameter smoothing or shrinkage [9]. Both of these increase the robustness of the model, making it less sensitive both to noise in the data and to the particular choice of data set used to construct the hierarchy. The robustness and reproducibility of the hierarchy make conclusions derived from the hierarchy more valid from a ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc. ICML, 1998.


Foundations of Assisted Cognition Systems - Kautz, Etzioni, Fox, Weld (2003)   (8 citations)  (Correct)

....associated with each type [7] In an RMM, a set of similar states is represented by a predicate or relation, with the state s variables corresponding to the arguments of the predicate. The domain of each argument can in turn have a hierarchical structure, over which shrinkage is carried out [99]. RMMs compute the probability of a transition as a function of the source and destination predicates and their arguments; they excel in large state spaces where training data is, by necessity, sparse. Preliminary experiments show that RMMs require four orders of magnitude less training data than ....

Andrew K. McCallum, Ronald Rosenfeld, Tom M. Mitchell, and Andrew Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Jude W. Shavlik, editor, Proceedings of ICML-98, 15th International Conference on Machine Learning, pages 359--367, Madison, US, 1998. Morgan Kaufmann Publishers, San Francisco, US.


Limited Hierarchical Fusion of Multiple Classifiers for.. - Kumar, Ghosh, Crawford (2002)   (Correct)

....the classes, and N c is the number of samples in the class c. The BHC framework built in a top down fashion provides an opportunity to fine tune the RDA approach into a hierarchical regularised discriminant analysis. Such an approach, referred to as Shrinkage , has been applied by McCallum et al. [44] for improving text classification in a hierarchy of classes. Another regularisation of covariance matrices is proposed by Tadjudin and Landgrebe [45] This method is based on inverted Wishart distributions with leave one out average log likelihood criteria to estimate covariances with limited ....

McCallum A, Rosenfeld R, Mitchell T, Ng AY. Improving text classification by shrinkage in a hierarchy of classes. Proceedings 15th International Conference on Machine Learning, San Francisco, CA, 1998; 359--367


Novelty and Redundancy Detection in Adaptive Filtering - Zhang, Callan, Minka (2002)   (11 citations)  (Correct)

....to adjust the maximum likelihood estimation so that the KL based measure is more appropriate. Prior research shows that retrieval performance is highly sensitive to smoothing parameters. Several smoothing methods have been applied to ad hoc information retrieval and text classification (e.g. [17, 9]) Based on this prior research, we selected two methods: Bayesian smoothing using Dirichlet priors, and shrinkage. 4.3.1 Bayesian Smoothing Using Dirichlet Priors This approach to smoothing uses the conjugate prior for a multinomial distribution, which is the Dirichlet distribution [17] For a ....

....is P# (w i tf(w i , d) #p(w i ) tf(w j , d) #p(w j ) 6) In our experiments, if w j is in d t , we set #p(w j ) 0.5, otherwise #p(w j ) 0. 4.3. 2 Smoothing Using Shrinkage This approach smooths by shrinking parameter estimates in sparse data towards the estimates in rich data [9]. This is a special case of the more general Jelinek Mercer smoothing method, which involves deleted interpolation estimation of linearly interpolated n gram models [17] For estimating the language model of document d, we can shrink its MLE estimator #d MLE with the MLE estimator of a language ....

[Article contains additional citation context not shown here]

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of The Fifteenth International Conference on Machine Learning, 1998.
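
The shrinkage smoothing described in the excerpt above can be sketched as a linear interpolation between a sparse document estimate and a richer collection estimate (the Jelinek-Mercer form). This is a minimal sketch, not the cited authors' implementation: the function names, the toy texts, and the mixing weight `lam` are illustrative assumptions.

```python
from collections import Counter

def mle(tokens):
    """Maximum-likelihood unigram estimate: relative word frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def shrink(doc_tokens, collection_tokens, lam=0.7):
    """Shrink the sparse document estimate toward the collection estimate.

    lam is a hypothetical mixing weight; in practice it would be tuned,
    for example on held-out data.
    """
    p_doc = mle(doc_tokens)
    p_col = mle(collection_tokens)
    vocab = set(p_doc) | set(p_col)
    return {w: lam * p_doc.get(w, 0.0) + (1 - lam) * p_col.get(w, 0.0)
            for w in vocab}

# Toy data (assumed for illustration only).
doc = "shrinkage improves sparse estimates".split()
collection = "shrinkage improves sparse estimates of word probabilities in text".split()
smoothed = shrink(doc, collection)
```

Words absent from the document, such as "text" here, receive nonzero probability from the collection estimate, while the interpolated values still sum to one.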


Research on Statistical Relational Learning at the University of .. - Domingos (2003)   (Correct)

....and subcategories of products) We consider all the abstractions of a page that can be obtained by climbing these hierarchies, and compute transition probabilities for the most informative abstractions. These probabilities are then combined into a ground level prediction using shrinkage [McCallum et al. 1998] . Useful predictions can thus be made for previously unvisited pages, by shrinking to abstractions of them that have been visited before (e.g. Science Fiction Books ) RMMs are an example of a statistical relational model for a sequential domain. See also [Friedman et al. 1998; Kersting et ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 359--367, Madison, WI, 1998. Morgan Kaufmann.


Learning to invoke Web forms and services - Kushmerick (2003)   (Correct)

....multiple items simultaneously ) One important direction of future work concerns the hierarchical structure of the domain and datatype taxonomies. We have explored using such structure in our evaluation, but it may be useful to integrate these hierarchies into the classification process itself [11]. A second open issue is whether the EM algorithm would be effective in enabling semi supervised learning. Finally, as depicted in Fig. 8, we are currently applying these ideas to Web Services, not just Web forms [5] The primary complication is that while a form corresponds to a single operation, ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc 15th Int. Conf. Machine Learning, pages 359--367, 1998.


Learning to invoke Web forms and services - Kushmerick (2003)   (Correct)

....multiple items simultaneously ) One important direction of future work concerns the hierarchical structure of the domain and datatype taxonomies. We have explored using such structure in our evaluation, but it may be useful to integrate these hierarchies into the classification process itself [McCallum et al. 1998] . A second open issue is whether the EM algorithm would be effective in enabling semi supervised learning. Finally, we note that our algorithm ignores two valuable sources of evidence: the data passed to a service in previous invocations, and the output data obtained from the service. We are ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc 15th Int. Conf. Machine Learning, pages 359--367, 1998.


Incremental Context Mining for Adaptive Document Classification - Liu, Lu (2002)   (1 citation)  (Correct)

.... linear classifiers [15] the k Nearest Neighbor (kNN) method [6] the Bayesian independence classifier [7] the support vector machine method [3] and the Perceptron-based method [11] There were also studies relying on a given text hierarchy to cluster documents [4, 16] and classify documents [3, 5, 9, 10]. They often preset a feature set (vocabulary) on which their classifiers were built (a feature often corresponded to a term or a phrase) Obviously, since the vocabulary may evolve in ADC, no feature set may be presumed. Even the feature set may evolve by covering all features currently seen in ....

....of inefficiency [18] and errors (over fitting) [10] in DC. Better performance (in terms of efficiency and precision of DC) is often achieved by semi automatic and/or trial and error feature selection [11] The number of features selected was thus often treated as an experimental issue (e.g. [9, 10, 18]) The construction of an optimum feature set (if any) thus consists of a series of tuning processes, which may be re triggered by the addition of a new document. The third challenge is efficient DC, which should be supported by the result of mining. Obviously, classification is often triggered ....

[Article contains additional citation context not shown here]

A. McCallum, R. Rosenfeld, T. Mitchell, A. Y. Ng (1998), Improving Text Classification by Shrinkage in a Hierarchy of Classes, Proc. of ICML '98.


A hierarchical text categorization approach and its.. - Tikk, Biró, Yang   (Correct)

....based on their sample training documents. We report on our experience with our approach on three document corpora: the Reuters 21578 newswire benchmark used widely in the information retrieval (IR) community, the 20 newsgroups data set, also having been studied by many authors (see e.g. [3,17,24]) and on the TV closed caption data set [6] courtesy of W. Chuang) The effectiveness of our classifier is 67 94 depending on the corpus and the percentage of the training document used. These results are superior to the best known numbers in the literature. We also present another possible ....

....They used Bayesian classifier and allowed dependencies between features. Their results experimented on two small subsets of the Reuters collection shows that hierarchical classifiers outperform flat ones when the number of features is small (less than 100) Their approach was criticized in e.g. [17], because it did not show improvement with larger dictionaries, although in many domains it has been established that large dictionary sizes often perform best [11,17,20] Hierarchical text categorization has been combined in many works with feature subset selection that improves classification ....

[Article contains additional citation context not shown here]

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc. of ICML-98, 1998. http://www-2.cs.cmu.edu/~mccallum/papers/hier-icml98.ps.gz.


Hierarchical Text Categorization Using Fuzzy Relational.. - Tikk, Yang, Bang   (Correct)

....dependencies between features. Their results experimented on two small subsets of the Reuters collection (see also Section 4 and Tables 1 and 2) shows that hierarchical classifiers outperform flat ones when the number of features is small (less than 100) Their approach was criticized in e.g. [16], because it did not show improvement with larger dictionaries, although in many domains it has been established that large dictionary sizes often perform best [11, 16, 20] As alternative approaches, hierarchical text categorization was combined in many works with feature subset selection. The ....

.... classifiers outperform flat ones when the number of features is small (less than 100) Their approach was criticized in e.g. [16] because it did not show improvement with larger dictionaries, although in many domains it has been established that large dictionary sizes often perform best [11, 16, 20]. As alternative approaches, hierarchical text categorization was combined in many works with feature subset selection. The feature subset selection improved classification accuracy, reduced measurement cost, storage and computational overhead by finding the best subset of features [6] As ....

[Article contains additional citation context not shown here]

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc. of ICML-98, 1998. http://www-2.cs.cmu.edu/~mccallum/papers/hier-icml98.ps.gz.


Relational Markov Models and their Application to Adaptive.. - Anderson, Domingos (2002)   (18 citations)  (Correct)

....variables associated with each type. In an RMM, a set of similar states is represented by a predicate or relation, with the state s variables corresponding to the arguments of the predicate. The domain of each argument can in turn have a hierarchical structure, over which shrinkage is carried out [19]. RMMs compute the probability of a transition as a function of the source and destination predicates and their arguments. RMMs are an example of a relational probabilistic representation, combining elements of probability and predicate calculus. Other representations of this type include ....

....qd with probability a#,# P (qd #) Effectively, this model performs shrinkage between the estimates at all levels of abstraction. Shrinkage is a statistical technique for reducing the variance of an estimate by averaging it with estimates for larger populations that include the target one [19]. Equation 1 applies shrinkage across an entire abstraction lattice, rather than over a single abstraction path (as is more usual) For example, a forecast of the number of Apple iMacs sold at a given store can be shrunk toward a more reliable forecast for the average of this quantity at all ....

[Article contains additional citation context not shown here]

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the Fifteenth International Conference on Machine Learning, 1998.


Categorizing Web Documents in Hierarchical Catalogues - Frommholz (2001)   (2 citations)  (Correct)

....another, adapted set of terms for discrimination, filtering shared jargon) This approach has the advantage that subcategories of rejected top level categories will not be considered. The drawback is that once a wrong decision on the top level is made, this can not be corrected. McCallum et al. [9]) use a technique called shrinkage for achieving better classification results. Hierarchy information is used for the estimation of the probability # ## # ## # # of word # # given the class # # . On a path from a parent node # to # # in the hierarchy, all nodes on this path are involved in the ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), pages 359--367, 1998.


Knowledge Discovery in Multi-Label Phenotype Data - Clare, King (2001)   (8 citations)  (Correct)

....is YPR145w (gene name ASN1, product asparagine synthetase ) In neither machine learning nor statistics has much work been done on classification problems where there is a class hierarchy. However, such problems are relatively common in the real world, particularly in text classification [16, 24, 21]. We deal with the class hierarchy by learning separate classifiers for each level. This simple approach has the unfortunate side effect of fragmenting the class structure and producing many classes with few members e.g. there are 99 potential classes represented in the data for level 2 in the ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In ICML 98, 1998.


Probabilistic Abstraction Hierarchies - Segal, Koller, Ormoneit (2001)   (3 citations)  (Correct)

....example, creating a taxonomy of the instances is often one of the first steps in understanding the system. In particular, much of the work on analyzing gene expression data [3] has focused on creating gene hierarchies. Similarly, in text domains, creating a hierarchy of documents is a common task [12, 7]. In many of these applications, the hierarchy is unknown; indeed, discovering the hierarchy is often a key part of the analysis. The standard algorithms applied to the problem typically use an agglomerative bottom up approach [3] or a divide and conquer top down approach [8] Although these ....

....(3) It utilizes global optimization steps, which tend to avoid local maxima and help make the model less sensitive to noise. (4) The abstraction hierarchy tends to pull the parameters of one model closer to those of nearby ones, which naturally leads to a form of parameter smoothing or shrinkage [12]. We present experiments for PAH on synthetic data and on two real data sets: gene expression and text. Our results show that the PAH approach produces hierarchies that are more robust to noise in the data, and that the learned hierarchies generalize better to test data than those produced by ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc. ICML, 1998.


Learning Hierarchical Object Maps Of.. - Anguelov, Biswas, .. (2002)   (6 citations)  (Correct)

....model of environments with non stationary objects. b) Representation as a graphical model. dividual objects our approach is able to generalize across different object models, as long as they model objects of the same type. This approach follows the hierarchical Bayesian framework (see [1, 6, 10]) As our experimental results demonstrate, this approach leads to significantly more accurate models in environments with multiple objects of the same type. The specific learning algorithm proposed here is an instance of the popular EM algorithm [11] We develop a closed form solution for ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A.Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the International Conference on Machine Learning (ICML), 1998.


Probabilistic Abstraction Hierarchies - Segal, Koller, Ormoneit (2001)   (3 citations)  (Correct)

....example, creating a taxonomy of the instances is one of the first steps in understanding the system. Thus, in particular, much of the work on analyzing gene expression data [3] has focused on creating gene hierarchies. Similarly, in text domains, creating a hierarchy of documents is a common task [11]. In many of these applications, the hierarchy is unknown; indeed, discovering the hierarchy is often a key part of the analysis. The standard algorithms applied to the problem typically use an agglomerative bottom up approach [3] or a divide and conquer top down approach [7] Although these ....

....(3) It utilizes global optimization steps, which tend to avoid local maxima and help make the model less sensitive to noise. (4) The abstraction hierarchy tends to pull the parameters of one model closer to those of nearby ones, which naturally leads to a form of parameter smoothing or shrinkage [11]. We present experiments for PAH on synthetic data and on two real data sets: gene expression and text. Our results show that the PAH approach produces hierarchies that are more robust to noise in the data, and that the learned hierarchies generalize better to test data than those produced by ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc. ICML, 1998.


Using Machine Learning To Improve Information Access - Sahami (1999)   (15 citations)  (Correct)

....$P_{AM}(Y = w_j \mid d; M) = \frac{1}{2} P_{ML}(Y = w_j \mid d; M) + \frac{1}{2} P_{ML}(Y = w_j \mid M)$ (6.8) Note that, for simplicity, we compute the unweighted mean using coefficients of 1/2. However, methods for fitting these coefficients to obtain improved results in language modeling [24] and text classification [116] have been explored. A novel form of smoothing that we introduce involves taking the geometric mean (GM) of these two ML distributions: $P_{GM}(Y = w_j \mid d; M) = P_{ML}(Y = w_j \mid d; M)^{1/2} \cdot P_{ML}(Y = w_j \mid M)^{1/2}$ (6.9) The GM estimate in Eq. 6.9 does not define a true probability ....

....characteristics. CHAPTER 9. HIERARCHICAL CLASSIFICATION 166 Several research issues specific to hierarchical classification still remain. In particular, we have already mentioned the problem of recovering from classification errors early in the hierarchy. Moreover, recent work by McCallum et al. [116] on improved methods for estimating probabilities for classification in text hierarchies are directly applicable in our work and may even further improve our results. We would also like to investigate the problem of discovering new classes in the hierarchy, when we have multiple documents that do ....

McCallum, A., Rosenfeld, R., Mitchell, T., and Ng, A. Improving text classification by shrinkage in a hierarchy of classes. In Machine Learning: Proceedings of the Fifteenth International Conference (1998).
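
The two smoothing forms in the first excerpt above, the unweighted arithmetic mean and the geometric mean of two maximum-likelihood distributions, can be compared with a small sketch. The function names and the toy distributions are assumptions chosen for illustration; they do not come from the cited thesis.

```python
import math

def arithmetic_mean(p_doc, p_model):
    """Unweighted mean of two distributions (coefficients of 1/2 each)."""
    vocab = set(p_doc) | set(p_model)
    return {w: 0.5 * p_doc.get(w, 0.0) + 0.5 * p_model.get(w, 0.0)
            for w in vocab}

def geometric_mean(p_doc, p_model):
    """Per-word geometric mean; the result is generally unnormalized."""
    vocab = set(p_doc) | set(p_model)
    return {w: math.sqrt(p_doc.get(w, 0.0) * p_model.get(w, 0.0))
            for w in vocab}

# Hypothetical document-level and model-level ML estimates.
p_doc = {"a": 0.5, "b": 0.5}
p_model = {"a": 0.2, "b": 0.3, "c": 0.5}

am_total = sum(arithmetic_mean(p_doc, p_model).values())
gm_total = sum(geometric_mean(p_doc, p_model).values())
```

On these toy inputs the arithmetic mean sums to one, while the geometric mean sums to less than one, which is consistent with the excerpt's remark that the GM estimate does not define a true probability distribution without further normalization.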


Modular Preference Moore Machines in News Mining Agents - Wermter, Arevian   (Correct)

....shown to perform successfully in the classification of language [5] When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. For instance, a naive Bayes classifier is significantly improved by taking advantage of a hierarchy of classes [18] . However, these statistical methods require assumptions about the distribution. Furthermore, self organizing maps (SOMs) [14] have been used. A SOM forms a non linear projection from a high-dimensional space onto low dimensional space and has been used in the WEBSOM project [13; 15] The SOM ....

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning, pages 359--367, San Francisco, CA, 1998.


The Effect of Using Hierarchical Classifiers in Text.. - D'Alessio, Murray.. (2000)   (Correct)

....as a hierarchy (Chakrabarti, et al. 1997; Koller Sahami, 1997; Ng et al. 1997; Yang 1996) have also studied this same corpus. Yang (1996, 1997) examines the OHSUMED corpus of medical abstracts. Still others examine the categories as a hierarchy for other corpora, namely the Yahoo Web hierarchy (McCallum et al. 1998; Mladenic Grobelnik, 1998) Precision, recall, and F measure are used by most authors as measures of the effectiveness of algorithms. Most simpler methods achieved values for these near 80 for the Reuters (Apte et al. 1994; Cohen Singer, 1996) More computationally expensive methods using ....

McCallum A., Rosenfeld R., Mitchell T., Ng A. (1998). Improving Text Classification by Shrinkage in a Hierarchy of Classes, International Conference on Machine Learning.


Hierarchical Classification of Web Content - Dumais, Chen (2000)   (32 citations)  (Correct)

....with a flat approach. Larkey [15] compared hierarchical and flat approaches for classifying patents in the Speech Signal Processing subcategory. She found no multilevel algorithms that performed significantly better than a flat one which chooses among all the speech classes. Web McCallum, et al. [17] describe some interesting experiments on three hierarchical collections Usenet news, the Science sub category of Yahoo , and company web pages. The Yahoo sub collection is most closely related to our experiments, although it is less diverse since all items come from the same top level ....

....(Baseline) The overall F 1 value for the 150 second level categories, treated as a flat non hierarchical problem, is .476. There is a drop in performance in going from 13 to 150 categories. Performance for the 150 categories is better than the .364 value reported by McCallum et al. [17], but it is difficult to compare precisely because they use average precision not F 1 , the categories themselves are different, and they use the full text of pages. The .476 non hierarchical value will serve as the baseline for looking at the use of hierarchical models. Performance varies widely ....

McCallum, A., Rosenfeld, R., Mitchell, T. and Ng, A. Improving text classification by shrinkage in a hierarchy of classes. Proceedings of the Fifteenth International Conference on Machine Learning, (ICML-98), 359-367, 1998.


Feature Engineering for Text Classification - Scott, Matwin (1999)   (8 citations)  (Correct)

....and personal information agents, text classification is an active and important area of research where machine learning and information retrieval (IR) intersect. New feature selection techniques [Yang and Pederson, 1997; Ng et al. 1997] and learning algorithms [Joachims, 1998; Lam and Ho, 1998; McCallum et al. 1998] have produced good results on a number of standard test collections. But the vast majority of this work uses a simple bag of words representation of text in which each feature corresponds to a single word or stem. The aim of this paper is to examine some alternative ways to represent text for ....

McCallum, Andrew; Rosenfeld, Ronald; Mitchell, Tom; and Ng, Andrew Y. 1998. Improving text classification by shrinkage in a hierarchy of classes. ICML-98. 359-367.


Recurrent Neural Network Learning for Text Routing - Wermter, Arevian, Panchev (1999)   (Correct)

....been shown to perform successfully in the classification of text [2] When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. For instance, a naive Bayes classifier is significantly improved by taking advantage of a hierarchy of classes [1]; however, these statistical methods are very data intensive. A self organizing map forms a non linear projection from a high dimensional space onto low dimensional space and has been used in the WEBSOM project [8] The SOM algorithm computes an optimal collection of models that approximates the ....

T. Mitchell A. McCallum, R. Rosenfeld and A. Y. Ng. Improving text classification by shrinkage in a hierarchy


Feature Engineering for Text Classification - Scott, Matwin (1999)   (8 citations)  (Correct)

....personal information agents, text classification is an active and important area of research where machine learning and information retrieval research intersect. New feature selection techniques [Yang and Pederson, 1997; Ng et al. 1997] and learning algorithms [Joachims, 1998; Lam and Ho, 1998; McCallum et al. 1998] have produced good results on a number of standard test collections, but the vast majority of this research uses the bag of words representation of text, where each feature corresponds to a single word or stem. The aim of this paper is to examine as exhaustively as possible some alternative ....

McCallum, Andrew; Rosenfeld, Ronald; Mitchell, Tom; and Ng, Andrew Y. 1998. Improving Text Classification by Shrinkage in a Hierarchy of Classes. ICML-98. 359-367.


Extracting Social Networks and Contact Information.. - Culotta, Bekkerman.. (2004)   (3 citations)  Self-citation (Mccallum)   (Correct)

No context found.

Andrew K. McCallum, Ronald Rosenfeld, Tom M. Mitchell, and Andrew Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Jude W. Shavlik, editor, Proceedings of ICML-98, 15th International Conference on Machine Learning, pages 359--367, Madison, US, 1998. Morgan Kaufmann Publishers, San Francisco, US.


Learning to Construct Knowledge Bases from the World.. - Craven, DiPasquo.. (2000)   (74 citations)  Self-citation (Mccallum Mitchell)   (Correct)

....The other approach, the multinomial model, is a unigram language model with integer word counts; the words are considered to be events and the document is comprised of a collection of these events. We use the second approach, since it has been found to outperform the first on several data sets [41]. We formulate naive Bayes for text classification as follows. Given a set of classes C = {c_1, ..., c_N} and a document consisting of n words, (w_1, w_2, ..., w_n), we classify the document as a member of the class, c*, that is most probable, given the words in the document: c* = ....

A. McCallum, R. Rosenfeld, T. Mitchell, A. Ng, Improving text classification by shrinkage in a hierarchy of classes, in: Proc. 15th International Conference on Machine Learning, Madison, WI, Morgan Kaufmann, San Mateo, CA, 1998, pp. 359--367.
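
The classification rule that the excerpt above begins to state (choose the class that is most probable given the words of the document) can be sketched as follows. Because the excerpt is truncated before the formula, this shows the standard multinomial naive Bayes decision rule rather than the authors' exact equation; the class names, priors, and word probabilities are hypothetical values.

```python
import math

def classify(words, priors, word_probs):
    """Return the class c maximizing log P(c) + sum_i log P(w_i | c).

    Log probabilities are summed instead of multiplying raw probabilities
    to avoid numerical underflow on long documents.
    """
    def log_score(c):
        return math.log(priors[c]) + sum(math.log(word_probs[c][w]) for w in words)
    return max(priors, key=log_score)

# Hypothetical parameters; real ones would be estimated from training data
# (and smoothed so that no word has zero probability).
priors = {"sports": 0.5, "science": 0.5}
word_probs = {
    "sports": {"game": 0.6, "gene": 0.1, "team": 0.3},
    "science": {"game": 0.1, "gene": 0.7, "team": 0.2},
}

label = classify(["gene", "gene", "team"], priors, word_probs)
```

This sketch assumes every word in the document has a nonzero probability under every class; shrinkage toward ancestor classes in a hierarchy is one way the cited paper addresses sparse estimates.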


Using Unlabeled Data to Improve Text Classification - Nigam (2001)   (10 citations)  Self-citation (Mccallum Mitchell)   (Correct)

....will note that the multinomial coefficients are missing and this is not truly a multinomial distribution. This is because we have put a word order into our generative model. A real mixture of multinomials uses no order, but gives exactly the same classifiers, as the coefficients cancel out (McCallum Nigam, 1998a) 13 Several criteria have been described in the literature, all with a reasonable amount of success when given a large set of labeled examples. The first potential criteria is to maximize the classification accuracy on the labeled examples at hand. This has been done with various gradient ....

....Words within a document are not independent of each other grammar and topicality make this so. Despite these violations, empirically the Naive Bayes classifier does a good job of classifying text documents (Lewis Ringuette, 1994; Craven et al. 2000; Yang Pedersen, 1997; Joachims, 1997; McCallum et al. 1998). This observation is explained in part by the fact that classification estimation is only a function of the sign (in binary classification) of the function estimation (Domingos Pazzani, 1997; Friedman, 1997) The word independence assumption causes naive Bayes to give extreme (almost 0 or 1) ....

[Article contains additional citation context not shown here]

McCallum, A., Rosenfeld, R., Mitchell, T., & Ng, A. (1998). Improving text classification by shrinkage in a hierarchy of classes. Machine Learning: Proceedings of the Fifteenth International Conference, pp. 359--367.


A Roadmap for Web Mining: - From Web To   (Correct)

No context found.

McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the 15th International Conference on Machine Learning ICML98, Morgan Kaufmann (1998)


A Roadmap for Web Mining: - From Web To   (Correct)

No context found.

McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the 15th International Conference on Machine Learning (ICML-98), Morgan Kaufmann, San Francisco, CA (1998)


The Trajectory Mixture Model for Learning Collections of .. - Shon, Baker, Grimes, Rao (2003)   (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In J. W. Shavlik, editor, Proc. 15th International Conf. on Machine Learning, pages 359--367. Morgan Kaufmann, San Francisco, CA, 1998.


Predicting Library of Congress Classifications from Library.. - Frank, Paynter (2004)   (Correct)

No context found.

McCallum, A. K., Rosenfeld, R., Mitchell, T. M. & Ng, A. Y. (1998). Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning (pp. 359-367). Morgan Kaufmann.


Multi-Dimensional Text Classification - Theeramunkong, LERTNATTEE (2002)   (Correct)

No context found.

McCallum A. et al. (1998) Improving Text Classification by Shrinkage in a Hierarchy of Classes, In Proc. of the 15th International Conference on Machine Learning, pp. 359-367.


A Hierarchical Model for Clustering and - Gaussier (2002)   (Correct)

No context found.

Andrew McCallum, Ronald Rosenfeld, Tom Mitchell, and Andrew Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 359--367, 1998.


Probabilistic Models for Hierarchical Clustering and.. - Society Eric Gaussier (2002)   (Correct)

No context found.

Andrew McCallum, Ronald Rosenfeld, Tom Mitchell, and Andrew Y. Ng, "Improving text classification by shrinkage in a hierarchy of classes," in Proceedings of the Fifteenth International Conference on Machine Learning, 1998, pp. 359--367.


Using URLs and Table Layout for Web Classification Tasks - Lawrence Kai Shih (2004)   (Correct)

No context found.

A. K. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In J. W. Shavlik, editor, Proceedings of the 15th Intl. Conference on Machine Learning, pages 359--367, Madison, US, 1998. Morgan Kaufmann Publishers, San Francisco, US.


Clustering with Propagation for Hierarchical Document.. - Sona, al.   (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In ICML98 - Proc. of 15th Int. Conf. on Machine Learning, pages 359--367, 1998.


Combining Machine Learning and Hierarchical Structures for Text.. - Ruiz (2001)   (1 citation)  (Correct)

No context found.

A. K. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In J. W. Shavlik, editor, Proceedings of ICML-98, 15th International Conference on Machine Learning, pages 359--367, Madison, US, 1998. Morgan Kaufmann Publishers, San Francisco, US.


Combining Machine Learning and Hierarchical Structures for Text.. - Ruiz (2001)   (1 citation)  (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 15th International Conference on Machine Learning. AAAI, Morgan Kaufmann, July 1998.


Extending Ontology Tree Using NLP Technique - Sabrina, Rosni, Enyakong   (Correct)

No context found.

McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.Y.: Improving Text Classification by Shrinkage in a Hierarchy of Classes. Proceedings of the 15th International Conference on Machine Learning (ICML-98) (1998)


Automatic Topic Identification - Using Ontology Hierarchy   (Correct)

No context found.

McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.Y.: Improving Text Classification by Shrinkage in a Hierarchy of Classes. Proceedings of the 15th International Conference on Machine Learning (ICML-98) (1998)


Predicting Customer Shopping Lists from Point-of-Sale.. - Cumby, Fano, Ghani, Krema (2003)   (Correct)

No context found.

A. K. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In J. W. Shavlik, editor, Proceedings of ICML-98, 15th International Conference on Machine Learning, pages 359--367, Madison, US, 1998. Morgan Kaufmann Publishers, San Francisco, US.


Learning Browsing Behavior Model for Web Recommendation - Zhu (2003)   (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the 1998.


Experiments With Multi-Label Text Classifier on the - Reuters Collection Domonkos   (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng, "Improving text classification by shrinkage in a hierarchy of classes," in Proc. of ICML-98, 1998, http://www2.cs.cmu.edu/~mccallum/papers/hier-icml98.ps.gz.


When one Sample is not Enough: Improving Text Database.. - Ipeirotis, Gravano (2004)   (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. M. Mitchell, and A. Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In ICML'98, 1998.


Experiment with a hierarchical text categorization.. - Domonkos Tikk Department   (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc. of ICML-98, 1998. http://www2.cs.cmu.edu/~mccallum/papers/hier-icml98.ps.gz.


Using Support Vector Machines for Classifying Large .. - Kriegel, Kröger.. (2004)   (1 citation)  (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. "Improving Text Classification by Shrinkage in a Hierarchy of Classes". In Proc. 15th Int. Conf. on Machine Learning (ICML'98), Madison, WI, pages 359--367, 1998.


Iterative Cross-Training: An Algorithm for Learning.. - Soonthornphisaj.. (2000)   (1 citation)  (Correct)

No context found.

McCallum, A., Rosenfeld, R., Mitchell, T. & Ng, A., Improving text classification by shrinkage in a hierarchy of classes, Proceedings of the Fifteenth International Conference on Machine Learning, 359-367, Morgan Kaufmann, 1998.


Text Categorization - Sebastiani (2005)   (2 citations)  (Correct)

No context found.

McCallum, A.K., Rosenfeld, R., Mitchell, T.M. & Ng, A.Y., Improving text classification by shrinkage in a hierarchy of classes. Proceedings of ICML-98, 15th International Conference on Machine Learning, ed. J.W. Shavlik, Morgan Kaufmann Publishers, San Francisco, US: Madison, US, pp. 359--367, 1998.


Learning from Partially Labeled Data - Szummer (2002)   (Correct)

No context found.

A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proc. 15th Intl. Conf. on Machine Learning (ICML) [ICM98], pages 359--367.


Automatic Labeling of Document Clusters - Popescul, Ungar (2000)   (1 citation)  (Correct)

No context found.

McCallum, A., Rosenfeld, R., Mitchell, T., and Ng, A.Y., Improving Text Classification by Shrinkage in a Hierarchy of Classes, In Proceedings of the Fifteenth International Conference on Machine Learning, Morgan Kaufmann (1998).

CiteSeer.IST - Copyright Penn State and NEC