Apte, C. and Damerau, F. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251.
....and a novel scoring measure, appropriate for feature selection in multi-class text categorization problems, is presented. The last issue concerns the construction of classifiers. Once again, several solutions have been proposed in the literature: Bayesian classifiers [14], decision trees [1], some adaptations of Rocchio's algorithm to text categorization [7], and k-nearest neighbor [11]. In this study four classifiers are considered. They are based on three different views of classes: k-nearest neighbor (extensional view), decision trees and Bayesian classifiers (classical ....
C. Apté, F. Damerau, & S.M. Weiss (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 233-251.
....are artificially constructed using the text surrounding the hyperlinks to the pages to be categorized. 4.3 Machine Learning Linear Classifiers. For training a text categorization system, a number of Machine Learning approaches have been tested, including decision tree and rule-based learners [1, 5, 9, 13], probabilistic classifiers like Naive Bayes [9, 11, 12], neural networks [6, 20], instance-based classifiers like kNN [9, 25], etc. See [21] for other approaches. An important subclass of learning approaches are those which learn linear classifiers, like Rocchio, Widrow-Hoff, or Winnow algorithms ....
Apté, C., Damerau, F.J. and Weiss, S.M. (1994) Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), pp. 233-251.
....average 54.10 57.41 57.52 62.00 62.90 Table 7: Accuracy of WHIRL used for text categorization. Used in this manner, then, WHIRL is a sort of nearest-neighbor text classifier. Text classification is currently an active area of research in information retrieval and artificial intelligence (e.g. [29, 1, 23, 30, 13, 38]). Algorithmically, this use of WHIRL is similar to a vector space distance-weighted K-NN method investigated by Yang and Chute [44]. However, WHIRL uses a different scheme to combine the weights associated with the K closest neighbors of an unclassified instance; Yang and Chute's method uses the ....
Chidanand Apte, Fred Damerau, and Sholom M. Weiss. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251, 1994.
....patents, or case summaries) and the contents of web pages. Previous text categorization methods have used decision trees (with or without boosting) [8], naive Bayes classifiers [6], nearest neighbor methods [11], support vector machines [5, 4], and various kinds of direct symbolic rule induction [1]. Among all these methods, we are particularly interested in systems that can produce symbolic rules, since human-comprehensible rules often provide valuable insights in many practical problems. In a symbolic rule system, text is represented as a vector in which the components are the number of ....
Chidanand Apte, Fred Damerau, and Sholom M. Weiss. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12:233-251, 1994.
....soon, the classification had to be done semi-automatically or automatically. Some of the 2 approaches according to [2] are text categorization based statistical and machine learning algorithms like K-Nearest Neighbor approach [3], Bayesian probabilistic models [4], [6], [7], inductive rule learning [5], support vector machines [9], neural networks [8] and decision trees [7]. Very few learning methods exploit the hierarchical structure, and an effort was made by [10] to classify web content based on hierarchical structure for classification. Besides the text content of the web page, the images, ....
Chidanand Apte and Fred Damerau, Automated Learning of Decision rules for Text Categorization, ACM Transactions on Information Systems, Vol 12, No.3, pp.233-251, 1994.
....are artificially constructed using the text surrounding the hyperlinks to the pages to be categorized. 3.1.2 Machine Learning Linear Classifiers. For training a text categorization system, a number of Machine Learning approaches have been tested, including decision tree and rule-based learners [1, 8], probabilistic classifiers like Naive Bayes [13, 14], neural networks [9, 18], instance-based classifiers like kNN [22], etc. See [19] for other approaches. An important subclass of learning approaches is that which learns linear classifiers, like Rocchio, Widrow-Hoff, or Winnow algorithms [6, ....
Apté, C., Damerau, F.J. and Weiss, S.M. (1994) Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), pp. 233-251.
....[58], logistic regression [58], Widrow-Hoff and the exponentiated gradient (EG) algorithm [34]. Another useful line of research in text classification comes from basic ideas in probability and information theory. Bayes' Rule has been the starting point for a number of classification algorithms [1,2,33,35,43,47], and the Minimum Description Length principle has been used as the basis of an algorithm as well [32]. Another line of research has been to use symbolic learning methods for text classification. Numerous studies have used algorithms such as decision trees, Swap-1, Ripper and Charade can be found ....
....the Minimum Description Length principle has been used as the basis of an algorithm as well [32]. Another line of research has been to use symbolic learning methods for text classification. Numerous studies have used algorithms such as decision trees, Swap-1, Ripper and Charade can be found in [1,2,4,9,34,35,43,44,47,65]. These studies indicate that these algorithms are quite competitive with statistical-based methods. 8.2. Information extraction. The problem that we are addressing is related to the traditional information extraction task, such as the research done in the Message Understanding (MUC) [50,51] ....
C. Apté, F. Damerau, S.M. Weiss, Automated learning of decision rules for text categorization, ACM Trans. Inform. Systems 12 (3) (1994) 233-251.
....derivation of a classification tree. The Dtree model allows one to select relevant words (i.e. features) according to an information gain criterion, and then to predict categories according to the occurrence of word combinations in documents. CHARADE (I. Moulinier and Ganascia, 1996) and SWAP1 (Apté et al. 1994) use machine learning algorithms to inductively extract Disjunctive Normal Form rules from training documents. Sleeping Experts (EXPERTS) (Cohen and Singer, 1996) are learning algorithms that work on-line. They reduce the computation complexity of the training phase for large applications ....
....and LLSF (85 ) (Yang, 1994). However, their higher training and classification complexity makes their design and use more difficult within real operational domains. Other classifiers based on complex learning (e.g. Ripper (Cohen and Singer, 1996)) or knowledge-based algorithms (e.g. SWAP-1 (Apté et al. 1994)) show a similarly complex design, but their performances are comparable if not lower with respect to LSTC. For the relatively simple nature and applicability of the LSTC model, it has been successfully adopted within the TREVI real application scenarios (Reuters and HOS). Its good performances ....
Apté C., Damerau F., and Weiss S. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251.
....after preprocessing are potential candidates for use as features, with each word as one feature in the feature vectors. Feature selection refers to the process of choosing a subset of these remaining words to use as features to form the training examples. Previous research on text categorization [Apte et al. 1994] suggests two possible ways in which the words to be used as features can originate: from the relevant texts only (local dictionary) or from both the relevant and irrelevant texts (universal dictionary). Apte et al. reported results indicating that local dictionary gives better performance. In ....
....is a variant of the $\chi^2$ metric used in [Schutze et al. 1995], where $C^2 = \chi^2$. C can be viewed as a one-sided $\chi^2$ metric. The rationale behind the use of our new correlation coefficient C is related to the finding that local dictionary yields a better set of features as reported in [Apte et al. 1994] (and confirmed in our own work). That is, we are looking for words that only come from the relevant texts of a category C and are indicative of membership in C. Words that come from the irrelevant texts or are highly indicative of non-membership in C are not as useful. The correlation coefficient ....
Chidanand Apte, Fred Damerau, and Sholom M. Weiss. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233--251, July 1994.
.... Research aimed at the application of machine learning methods to text categorization has been conducted and reported among others by Masand, Linoff, and Waltz (1992) (memory-based learning), Lewis and Ringuette (1994) (naive Bayes, decision trees), Tong and Appelbaum (1994) (decision trees), Apté, Damerau, and Weiss (1994) (rule-based induction methods), Schütze, Hull, and Pedersen (1995) (linear discriminant analysis, logistic regression, neural networks), Ng, Goh, and Low (1997) (neural networks), Mladenić (1998) (kNN, naive Bayes), Wolters and Kirsten (1999) (kNN), Nigam, McCallum, and Lafferty (1999) (Maximum ....
Apte, C., F. Damerau, and S.M. Weiss. 1994. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251.
....context; Joachims (1999) uses transducing Support Vector Machines on word stem vectors. Word features figure in a number of systems (Papka & Allen, 1998; Larkey & Croft, 1996). Liddy et al. (1994) categorize texts exclusively by semantic codes assigned to words from a machine-readable dictionary. Apté et al. (1994) use topic-specific dictionaries, Scott & Matwin (1998) use WordNet hypernyms. Statistical approaches are also popular (Wilbur, 1996; see Yang, 1999 for an overview). Common elements in these systems are the primacy of the word in some form, a presumption of complete automation, and (in most ....
APTÉ, C., F. DAMERAU & S. WEISS (1994). Automated Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems, 12 (3), pp.233-251.
....[58], logistic regression [58], Widrow-Hoff and the exponentiated gradient (EG) algorithm [36]. Another useful line of research in text classification comes from basic ideas in probability and information theory. Bayes' Rule has been the starting point for a number of classification algorithms [5,6,35,37,45,49], and the Minimum Description Length principle has been used as the basis of an algorithm as well [34]. Another line of research has been to use symbolic learning methods for text classification. Numerous studies have used algorithms such as decision trees, Swap-1, Ripper and Charade can be found ....
....the Minimum Description Length principle has been used as the basis of an algorithm as well [34]. Another line of research has been to use symbolic learning methods for text classification. Numerous studies have used algorithms such as decision trees, Swap-1, Ripper and Charade can be found in [5,6,8,13,36,37,45,46,49,65]. These studies indicate that these algorithms are quite competitive with statistical-based methods. 8.2 Information Extraction. The problem that we are addressing is related to the traditional information extraction task, such as the research done in the Message Understanding (MUC) [1,2] ....
C. Apté, F. Damerau, and S. M. Weiss. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251, July 1994.
....to the full training set but rather to the set of documents in the local region for each query. The local region for a query was defined as the 2000 documents nearest to the query, where similarity was measured using the inner product score to the query expansion of the initial query. Also, in [1] the rules for text categorization were obtained by creating local dictionaries for each classification topic. Only single words found in documents on the given topic were entered in the local dictionary. [Figure: axis label "Negative proportion"] ....
Apté C., Damerau F., and Weiss S. M. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251, July 1994.
....as the visually impaired). System maintainers must be able to develop and test new user interfaces while existing ones are still running. Also, other applications may use the same index search engine subsystem, for example alerting services (for example [11]) or data mining schemes (for example [1]). Finally, a mechanism is necessary for learning about other collections, so that user interfaces can draw the user's attention to the existence of new collections as they come up, without requiring any special effort by the system maintainers. 4 Design. An architecture that satisfies these ....
Apté, C., Damerau, F. and Weiss, S. (1994) "Automated learning of decision rules for text categorization." ACM Trans. Information Systems 12(3), 233-251.
C. Apté, F. Damerau, and S. Weiss, "Automated Learning of Decision Rules for Text Categorization," ACM Trans. Information Systems, Vol. 12, No. 3, 1994, pp. 233-251.
....for maximizing their predictive performance. 1 Background. Our initial methodology for automatic text categorization was built around the use of rule induction, coupled with a new approach to constructing feature vectors that emphasized the use of local dictionaries and numerical features [Apté et al. 1994a, Apté et al. 1994b]. More recently, we have begun exploring methods for maximizing the predictive accuracy of the models constructed from the mining process. This is an important requirement, particularly in real-world applications, where noisy and limited samples are a pervasive problem. One ....
....predictive performance. 1 Background. Our initial methodology for automatic text categorization was built around the use of rule induction, coupled with a new approach to constructing feature vectors that emphasized the use of local dictionaries and numerical features [Apté et al. 1994a, Apté et al. 1994b]. More recently, we have begun exploring methods for maximizing the predictive accuracy of the models constructed from the mining process. This is an important requirement, particularly in real-world applications, where noisy and limited samples are a pervasive problem. One particular approach ....
C. Apté, F. Damerau, and S. Weiss. Automated Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems, 12(3):233-251, July 1994.
....compares performance for text categorization. These tasks allow for training on labeled documents, and new documents are assigned to one of a set of predefined topics. The literature on this task is extensive; many different methods have been developed and their predictive performances measured [Apté, Damerau, Weiss, 1994], [Weiss et al. 1999]. The lightweight document matcher's scoring procedures were formally evaluated on the well-known Reuters-21578 benchmark for text categorization [Lewis, 1995]. We used the ModApte variation of the benchmark with 9603 training documents and 3299 test documents. These ....
Apté, C.; Damerau, F.; and Weiss, S. 1994. Automated Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems 12(3):233-251.
....solutions that are also competitive in predictive accuracy when compared to more non-intuitive or quantitative techniques, such as neural networks. This is an important reason for the increased attention to and use of decision rule modeling techniques that generate rules directly from data [9,1,2,8]. Classification modeling algorithms are designed with several objectives. Perhaps the most well-known criteria by which these algorithms are evaluated are accuracy, speed, and interpretability. Solutions derived using different approaches can thus be compared in terms of their ....
C. Apté, F. Damerau, and S. Weiss. Automated Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems, 12(3):233-251, July 1994.
....are also competitive in predictive accuracy when compared to more non-intuitive or quantitative techniques, such as neural networks. This is an important reason for the increased attention to and use of decision rule modeling techniques that generate rules directly from data [Fayyad et al. 1995a, Apté et al. 1994, Apté and Hong, 1995, Craven and Shavlik, 1997]. Two examples will be presented here of how classical symbolic modeling techniques, with appropriate extensions, are effectively applied to real-world problems. The first application is in the area of property casualty insurance risk ....
....All topics with two or more training cases in the Reuters collection were examined. The data presented to the learning programs, i.e. decision tree procedures, consisted solely of unnormalized frequency counts of unstemmed words. Words occurring in a headline are given one extra count [Apté et al. 1994]. In one additional experiment, a universal dictionary was generated for all stemmed words with a global frequency greater than four. Method, Breakeven: NaiveBayes (linear) 73.4; Rocchio (linear) 78.7; Decision Tree C4.5 78.9; K-nearest neighbor 82.0; Rule Induction 82.0; Support Vector ....
C. Apté, F. Damerau, and S. Weiss. Automated Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems, 12(3):233-251, July 1994.
Apte, C. and Damerau, F. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251.
APTÉ, C., F. DAMERAU, and S.M. WEISS, Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 1994. 12(3): p. 233-251.
Chidanand Apte and Fred Damerau, Automated Learning of Decision rules for Text Categorization, ACM Transactions on Information Systems, Vol 12, No.3, pp.233-251, 1994.
C. Apté, F. Damerau, S.M. Weiss, Automated learning of decision rules for text categorization, ACM Transactions on Information Systems 12 (3) (1994) 233-251.
Apte, C., Damerau, F., Weiss, S.M.: Automated learning of decision rules for text categorization. ACM Transactions on Information Systems 12 (1994) 233-251
Chidanand Apte, Fred Damerau, and Sholom M. Weiss. 1994. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233-251, July.