11 citations found.
Hinrich Schutze, David A. Hull, and Jan O. Pedersen. A Comparison of Classifiers and Document Representations for the Routing Problem. In Research and Development in Information Retrieval, pages 229--237, 1995.

This paper is cited in the following contexts:
Automatic Translation to Controlled Medical Vocabularies - Kornai, Stone

....the success of hybrid systems [27]. Because of (2), feature selection can matter a great deal. A widely used technique is Singular Value Decomposition (SVD) [8,20], which provides good dimension reduction, but the nonlinear computational cost is almost impossible to absorb for very large problems [48]. Since the same drug is likely to have the same adverse effect on different people, we also benefit from (4) with the proper statistical techniques both (3) and (4) can be highly leveraged to mitigate the effects of (1) and (2). 3 Conclusions Automatic translation to controlled medical ....

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. 1995. A comparison of classifiers and document representations for the routing problem. In Proceedings of SIGIR95, pages 229--237.


Text Classification Using ESC-based Stochastic Decision Lists - Li, Yamanishi (2002)   (3 citations)

....growing and pruning rather than the Minimum ESC principle. We refer to this algorithm as DL SC. An algorithm for learning stochastic decision lists on the basis of SC (equivalently MDL) was first proposed in [29]. 4 Related Work Many methods have been proposed for text classification (e.g. [22, 21, 15, 6, 11, 12, 5, 31, 25, 28, 13, 3, 9, 14, 16, 24, 10, 27, 32, 17, 23]). We describe here two typical non rule based methods and two typical rule based methods. We also experimentally compare these methods with our own. Naive Bayes In this method (e.g. [8]) it is assumed that each category retains a multinomial distribution over a set of words, and that any text ....

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. A comparison of classifiers and document representations for the routing problem. Proc. of SIGIR'95, 1995.


Linear Discriminant Analysis in Document Classification - Torkkola (2001)   (2 citations)

....how LDA can be made feasible with high dimensional data. 6 Linear Discriminant Analysis for document term data To our knowledge, LDA feature transforms have not been applied earlier to document classification tasks, although LDA has been used before in the sense of designing a linear classifier [15, 27]. In contrast, we suggest LDA as a means of deriving efficient features, which can be classified by any, possibly nonlinear, classifier. If LDA is such a well known and well behaved method why is it not in wider use in the document analysis community? LDA is, of course, only applicable when ....

Hinrich Schutze, David Hull, and Jan O. Pedersen. A comparison of classifiers and document representations for the routing problem. In Proc. SIGIR'95, 1995.


Linear Discriminant Text Classification in High Dimension - Kornai, Richards

....of 4 5 . This improvement is not nearly as good as that found in [16] for the case of short queries and short documents, but is consistent with our observation that stemming and other string normalization steps are weak in the sense that they leave most of the unseen data unclassified. For TREC, [21] argued that learning algorithms based on error minimization and numerical optimization are computationally intensive and prone to overfitting in a high dimensional feature space. For autocoding, both the computational feasibility of linear classification and its theoretical justification depend ....

....overfitting, the experimental results presented in Tables 2 and 3, with no overlap between test and train data, demonstrate quite clearly that this is not a valid concern here. While we have not run any experiments to specifically demonstrate this, we hypothesize that the overfitting observed in [21] is due to the fact that SVD actually makes the space less sparse than it is required for good linear separation. The most salient property of autocoding data is that a single class (topic) rarely has more than a few hundred terms (n grams) altogether, which obviates the need for feature ....

[Article contains additional citation context not shown here]

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. 1995. A comparison of classifiers and document representations for the routing problem. In Proceedings of SIGIR95, pages 229--237.


Interactive Visualisation Techniques for Ontology Development - Ng (2000)

....semantic zooming in [173]. 3.2.1.4 Visualising Document Corpus The relationships between documents are indirect and ambiguous, and are derived from similarity comparison using text analysis. Text analysis is not new, with much research within the information retrieval community. See Schutze et al. [141] for a review of text analysis techniques. An overview of visualisation techniques for information retrieval can also be found in Lin [97]. The vector space model is a common representation of a document for analysis. A document is represented as a vector of terms (keywords) or a frequency vector ....

SCHÜTZE, H., HULL, D., AND PEDERSEN, J. O. A Comparison of Classifiers and Document Representations for the Routing Problem. In Proceedings of SIGIR'95, the International Conference on Research and Development in Information Retrieval (1995), ACM Press, pp. 229--237.


Translingual Topic Tracking with PRISE - Levow, Oard (2000)   (1 citation)

....strategy. We describe each of these approaches in turn. For query formulation, we constructed a vector of the 180 terms that best distinguish the query exemplars from other contemporaneous (and hopefully not relevant) stories. We used a $\chi^2$ test in a manner similar to that used by Schutze et al. [5] to select these terms. The pure $\chi^2$ statistic is symmetric, assigning equal value to terms that help to recognize known relevant stories and those that help to reject the other contemporaneous stories. Because the simplest way to use PRISE is to search for terms that appear in the query, we ....

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. A comparison of classifiers and document representations for the routing problem. In Edward A. Fox, Peter Ingwersen, and Raya Fidel, editors, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 229--237, July 1995.
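
The term selection described in the context above can be illustrated with a minimal sketch. It assumes the standard 2x2 contingency form of the $\chi^2$ statistic and treats each story as a set of terms; the function names `chi_square` and `select_terms` and the toy data are hypothetical and are not taken from PRISE or from Schutze et al.

```python
def chi_square(rel_with, non_with, rel_without, non_without):
    """Chi-square statistic for a 2x2 term/relevance contingency table.

    rel_with / non_with: relevant / non-relevant stories containing the term.
    rel_without / non_without: relevant / non-relevant stories lacking it.
    """
    n = rel_with + non_with + rel_without + non_without
    denominator = (
        (rel_with + non_with)
        * (rel_without + non_without)
        * (rel_with + rel_without)
        * (non_with + non_without)
    )
    if denominator == 0:
        return 0.0  # degenerate table: the term carries no information
    difference = rel_with * non_without - non_with * rel_without
    return n * difference ** 2 / denominator


def select_terms(relevant, other, k):
    """Return the k terms with the highest chi-square score.

    relevant, other: lists of stories, each story given as a set of terms.
    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*relevant, *other)
    scores = {}
    for term in vocabulary:
        rel_with = sum(term in story for story in relevant)
        non_with = sum(term in story for story in other)
        scores[term] = chi_square(
            rel_with, non_with, len(relevant) - rel_with, len(other) - non_with
        )
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy data (hypothetical): two exemplar stories and two contemporaneous ones.
relevant = [{"merger", "bank"}, {"merger", "stock"}]
other = [{"weather", "bank"}, {"weather", "rain"}]
top_terms = select_terms(relevant, other, 2)
```

On this toy data "merger" (which occurs only in the exemplars) and "weather" (which occurs only in the other stories) receive the same score, which is the symmetry the excerpt points out: the unmodified statistic rewards a term for rejecting stories just as much as for recognizing them.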


Feature Selection, Perceptron Learning, and a Usability Case Study.. - Ng (1997)   (6 citations)

.... N_{n+})(N_{r-} + N_{n-}) where $N_{r+}$ ($N_{n+}$) is the number of relevant (non relevant) texts in which the word w occurs, and $N_{r-}$ ($N_{n-}$) is the number of relevant (non relevant) texts in which the word w does not occur. Our correlation coefficient is a variant of the $\chi^2$ metric used in [Schutze et al. 1995], where $C^2 = \chi^2$. C can be viewed as a one sided $\chi^2$ metric. The rationale behind the use of our new correlation coefficient C is related to the finding that local dictionary yields a better set of features as reported in [Apte et al. 1994] (and confirmed in our own work). That is, we are ....

....of newswire articles, averaging about 2,000 texts and more than 500,000 words (more than 3.5MB) takes only about 40 minutes to be categorized on a Pentium PC. Hence, Classi is a practical system that runs efficiently. 7 Related Work Many learning methods have been applied to text categorization [Schutze et al. 1995], including decision rule induction [Apte et al. 1994], decision tree induction [Lewis and Ringuette, 1994], nearest neighbor algorithms [Masand et al. 1992], Bayesian classifiers [Lewis and Ringuette, 1994], discriminant analysis [Hull, 1994], neural networks [Schutze et al. 1995; Wiener et ....

[Article contains additional citation context not shown here]

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. A comparison of classifiers and document representations for the routing problem. In 18th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995.


Translingual Topic Tracking: Applying Lessons from the MEI.. - Gina-Anne Levow And (2000)

....our revised normalization strategy. For query formulation, we constructed a vector of the 180 terms that best distinguish the query exemplars from other contemporaneous (and hopefully not relevant) stories. As we did last year, we used a $\chi^2$ test in a manner similar to that used by Schutze et al. [4] to select these terms. The pure $\chi^2$ statistic is symmetric, assigning equal value to terms that help to recognize known relevant stories and those that help to reject the other contemporaneous stories. Because the simplest way to use PRISE is to search for terms that appear in the query, we ....

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. A comparison of classifiers and document representations for the routing problem. In Edward A. Fox, Peter Ingwersen, and Raya Fidel, editors, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 229--237, July 1995.


Translingual Topic Tracking with PRISE - Gina-Anne Levow And (2000)   (1 citation)

....In this section we describe each of those approaches. For query formulation, we constructed a vector of the 180 terms that best distinguish the query exemplars from other contemporaneous (and hopefully not relevant) stories. We used a $\chi^2$ test in a manner similar to that used by Schutze et al. [7] to select these terms. The pure $\chi^2$ statistic is symmetric, assigning equal value to terms that help to recognize known relevant stories and those that help to reject the other contemporaneous stories. Because the simplest way to use PRISE is to search for terms that appear in the query, we ....

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. A comparison of classifiers and document representations for the routing problem. In Edward A. Fox, Peter Ingwersen, and Raya Fidel, editors, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 229--237, July 1995.


Fast Logistic Regression for Data Mining, Text Classification .. - Komarek, Moore (2003)

No context found.

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. A Comparison of Classifiers and Document Representations for the Routing Problem. In Research and Development in Information Retrieval, pages 229--237, 1995.


Evaluating Lexicon Coverage for Cross-Language Information.. - Levow, Oard (1999)

No context found.

Hinrich Schutze, David A. Hull, and Jan O. Pedersen. 1995. A comparison of classifiers and document representations for the routing problem. In Edward A. Fox, Peter Ingwersen, and Raya Fidel, editors, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 229--237, July.
