9 citations found.
F. Damerau C. Apte and S. Weiss. 1994. Toward language independent automated learning of text categorization models. In Proceedings SIGIR-94.


This paper is cited in the following contexts:
Text-Learning and Related Intelligent Agents - Mladenic (1999)   (6 citations)

.... in the document). Many current systems that learn on text use the bag of words representation, using either Boolean features indicating if a specific word occurred in a document (e.g. [6, 39, 40, 41, 42, 43, 44, 26, 45, 46, 10, 11, 12, 47, 48]) or the frequency of a word in a given document (e.g. [49, 50, 7, 51, 52, 53, 54, 55, 56, 4, 37, 48, 57]). There is also some work that uses additional information such as word position [40, 58, 12] or word tuples called n-grams [59, 37, 38, 60] (e.g. "machine learning" is a 2-gram and "World Wide Web" is a 3-gram). Some recent work [47] indicates that the usage of hypertext structure and graph ....
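The representations named in this context can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems; the whitespace tokenizer, the function names, and the sample sentence are assumptions made for illustration only.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real systems tokenize more carefully.
    return text.lower().split()


def boolean_features(tokens):
    # Boolean bag of words: 1 for every word that occurs at least once.
    return {word: 1 for word in set(tokens)}


def frequency_features(tokens):
    # Frequency bag of words: number of occurrences of each word.
    return dict(Counter(tokens))


def ngrams(tokens, n):
    # Word tuples of length n, joined with spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


doc = "World Wide Web pages about machine learning and machine translation"
tokens = tokenize(doc)
bool_vec = boolean_features(tokens)
freq_vec = frequency_features(tokens)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Here "machine" has Boolean value 1 but frequency 2, "machine learning" appears among the 2-grams, and "world wide web" among the 3-grams.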

....different document representations over several domains showing clear advantages of some representations. 2) One of the frequently used approaches to reduce the number of different words is to remove words that occur in the stop list containing common English words like "a", "the", "with" (e.g. [49, 7, 56, 43, 37, 10, 11, 12, 61]) or pruning the infrequent words (word frequency min.frequency) (e.g. [40, 53, 54, 37]). Connected to the particular language is also word stemming, used for example in [50, 7, 12, 61], that reduces the number of different words using a language-specific stemming algorithm (e.g. works, ....

[Article contains additional citation context not shown here]

Apté, C., Damerau, F., Weiss, S.M., Toward Language Independent Automated Learning of Text Categorization Models, Proc. of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.


Text-Learning and Intelligent Agents - Mladenic (1998)

....which learning algorithm is used. Table 2 summarizes them over some related papers in order to give an idea about the current trends. Systems given in Table 1 are included in this more detailed analysis, if there was

Paper reference | Document Representation | Feature Selection | Learning
Apté et al. [2] | bag of words (freq) | stop list, frequency weight | Decision Rules
Armstrong et al. [3] | bag of words | informativity | TFIDF, Winnow, WordStat
Balabanović and Shoham [4] | bag of words (freq) | stop list, stemming, keep 10 best words | TFIDF
Bartell et al. [6] | bag of words (freq) | latent semantic indexing | ....

.... (the number of times it occurs in the document). Many current systems that learn on text use the bag of words representation using either Boolean features indicating if a specific word occurred in a document (e.g. [3, 11, 32, 34, 40, 41, 47]) or frequency of a word in a given document (e.g. [2, 4, 6, 7, 21, 29, 47]). There is also some work that uses additional information such as word position [11] or word tuples called n-grams [13, 45] (e.g. "machine learning" is a 2-gram and "World Wide Web" is a 3-gram). 2) One of the frequently used approaches to reduce the number of different words is to use ....

[Article contains additional citation context not shown here]

Apté, C., Damerau, F., Weiss, S.M., Toward Language Independent Automated Learning of Text Categorization Models, Proc. of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Dublin, 1994.


Mistake-Driven Learning in Text Categorization - Dagan (1997)   (44 citations)

....test sets. In addition, we have tested our final version of the classifier on two common partitions of the complete Reuters collection, and compare the results with those of other works. The two partitions used are those of Lewis (Lewis, 1992) (14704 documents for training, 6746 for testing) and Apte (Apte, Damerau, and Weiss, 1994) (10645 training, 3672 testing, omitting documents with no topical category). To evaluate performance, the usual measures of recall and precision were used. Specifically, we measured the effectiveness of the classification by keeping track of the following four numbers: • p_1 = number of ....

.... | 83.3 | 74.7
Experts unigram (Cohen and Singer, 1996) | 64.7 | 65.6
Neural Network (Wiener, Pedersen, and Weigend, 1995) | 77.5 | NA
Rocchio (Rocchio, 1971) | 74.5 | 66.0
Ripper (Cohen and Singer, 1996) | 79.6 | 71.9
Decision trees (Lewis and Ringuette, 1994) | NA | 67.0
Bayes (Lewis and Ringuette, 1994) | NA | 65.0
SWAP (Apte, Damerau, and Weiss, 1994) | 78.9 | NA

Table 2: Break even points comparison. The data is split into training set and test set based on Lewis's split (Lewis, 1992) (14704 documents for training, 6746 for testing), and Apte's split (Apte, Damerau, and Weiss, 1994) (10645 training, 3672 testing, omitting documents with no ....

[Article contains additional citation context not shown here]

Apte, C., F. Damerau, and S. Weiss. 1994. Towards language independent automated learning of text categorization models. In Proceedings of ACM-SIGIR Conference on Information Retrieval.


Feature Subset Selection in Text-Learning - Mladenic (1998)   (15 citations)

....selected feature subset. The usual way of learning on text defines a feature for each word that occurred in training documents. This can easily result in several tens of thousands of features. Most methods for feature subset selection that are used in information retrieval and text learning (e.g. [1], [3], [11]) are very simple compared to the methods developed in machine learning. Basically, some scoring measure that is used on a single feature is selected, a score is assigned to each feature independently, features are sorted according to the assigned score and a predefined number of the ....
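The score-sort-truncate procedure described in this context can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `select_features`, the default scoring measure (document frequency), and the toy documents are hypothetical and are not taken from the cited papers, which use a variety of scoring measures.

```python
from collections import Counter


def select_features(documents, k, score=None):
    """Keep the k best word features, each scored independently of the others."""
    # One candidate feature per distinct word in the training documents.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    if score is None:
        # Assumed default measure: number of documents containing the word.
        def score(word):
            return doc_freq[word]

    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(doc_freq, key=lambda word: (-score(word), word))
    return ranked[:k]


docs = [
    ["text", "learning", "agent"],
    ["text", "mining"],
    ["text", "learning"],
]
selected = select_features(docs, 2)
```

With these toy documents, "text" occurs in three documents and "learning" in two, so those two words are kept. Any other per-feature measure can be passed through the `score` argument.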

Apté, C., Damerau, F., Weiss, S.M., Toward Language Independent Automated Learning of Text Categorization Models, Proc. of the 17th Annual Int. ACM-SIGIR Conference on Research and Development in Information Retrieval, 1994.


Text Mining with Decision Trees and Decision Rules - Apte, Damerau, Weiss (1998)   Self-citation (Damerau Weiss)

....for maximizing their predictive performance. 1 Background Our initial methodology for automatic text categorization was built around the use of rule induction, coupled with a new approach to constructing feature vectors, that emphasized the use of local dictionaries and numerical features [Apté et al. 1994a, Apté et al. 1994b]. More recently, we have begun exploring methods for maximizing the predictive accuracy of the models constructed from the mining process. This is an important requirement, particularly in real world applications, where noisy and limited samples are a pervasive problem. One ....

....predictive performance. 1 Background Our initial methodology for automatic text categorization was built around the use of rule induction, coupled with a new approach to constructing feature vectors, that emphasized the use of local dictionaries and numerical features [Apté et al. 1994a, Apté et al. 1994b]. More recently, we have begun exploring methods for maximizing the predictive accuracy of the models constructed from the mining process. This is an important requirement, particularly in real world applications, where noisy and limited samples are a pervasive problem. One particular approach that ....

[Article contains additional citation context not shown here]

C. Apté, F. Damerau, and S. Weiss. Towards Language Independent Automated Learning of Text Categorization Methods. pages 23--30, 1994.


N-Gram-Based Author Profiles For Authorship Attribution - Keselj, Peng, Cercone, Thomas

No context found.

F. Damerau C. Apte and S. Weiss. 1994. Toward language independent automated learning of text categorization models. In Proceedings SIGIR-94.


Advances in Document Classification by Voting of.. - Wenzel, Baumann, Jäger (1997)   (2 citations)

No context found.

C. Apté, F. Damerau, S. M. Weiss. Towards Language Independent Automated Learning of Text Categorization Models. Proc. of 17th International Conference on Research and Development in Information Retrieval (SIGIR'94). Dublin City, Ireland, July 3 - 6, 1994, pp. 23-30.


Text Passage Classification Using Supervised Learning - Bi, Murtagh, McClean, Anderson (1999)   (1 citation)

No context found.

C. Apté, F. Damerau, and S. M. Weiss. Towards Language Independent Automated Learning of Text Categorization Models. SIGIR'94. Annual ACM SIGIR Conference on Research and Development in Information Retrieval. P24-30, 1994.


Exploiting Thesaurus Knowledge in Rule Induction for Text.. - Junker, Abecker (1997)   (6 citations)

No context found.

, pages 23--30, Dublin, Ireland, July 3-6 1994.


CiteSeer.IST - Copyright Penn State and NEC