Maron, M. E. (1961). Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8:404-417.
....representations are n-grams [2, 4, 7]. The advantage of this technique is its language independence. Classifier construction Many different learning models have been proposed for the classification of documents, including decision trees [20, 11], rule learning [1, 3, 16], Bayesian classifiers [14, 10, 11, 8], nearest-neighbor methods [15], and neural networks [12, 19]. 3 Flexible tool for generating text representations Fig. 2 illustrates the raw design of our system for the transformation of text documents into feature representation. In the process of deriving relevant features from the training ....
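As an illustration of the n-gram representation mentioned in this excerpt, here is a minimal sketch. It assumes character n-grams (the excerpt does not say whether character or word n-grams are meant; character n-grams are the usual reading when language independence is the stated advantage). The function name `char_ngrams` and the default `n=3` are hypothetical choices, not taken from the cited paper.

```python
def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a text.

    Assumption: lowercasing and whitespace collapsing are applied first;
    the cited system may normalize differently.
    """
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("Gold ore"))
```

No tokenizer, stemmer, or stop-word list is needed, which is why this kind of feature carries over between languages without change.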
M. Maron. Automatic Indexing: An Experimental Inquiry. Journal of the Association for Computing Machinery, 8:404--417, 1961.
....a separate model for each class, and those where knowledge about the individual classes is distributed over the whole system. All handcrafted classifiers, based on (combinations of) keywords and rules, fall in the first type, where we also find automatically generated models such as Naive Bayes [38] and linear machines [25, 36]. The second type includes many variants of Artificial Neural Nets (ANNs) [44] and k-nearest-neighbor classifiers (kNN). The string normalization techniques discussed in 2.1 are for the most part very conservative, assuring only that near-identical strings receive the ....
ME Maron 1961 Automatic Indexing: an experimental inquiry. JACM 8 404--417
....a Bayesian classifier. Bayesian classifiers use the probabilities of the input vector elements for classification. The assumption of the naive Bayes classifier is that the input vector elements are independent of each other. Naive Bayes classifiers are one of the oldest text classification methods [Maron, 1961]. Because of their simplicity, they became a kind of reference method for estimating the difficulty of a classification problem. Decision trees use a tree structure in which each node is labeled with a decision, and each link with a potential answer. Leaf nodes are labeled with categories. ....
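The independence assumption described in this excerpt can be sketched as follows. This is a minimal multinomial-style naive Bayes with add-one smoothing, offered as an assumption-laden illustration rather than the method of the citing paper or of Maron (1961); the names `train_naive_bayes`, `classify`, and `TOY_DOCS` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (tokens, label) pairs. Returns (log_prior, log_like, vocab)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_like = {}
    for c in label_counts:
        # Add-one (Laplace) smoothing is an assumption made for this sketch.
        total = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_like, vocab


def classify(tokens, log_prior, log_like, vocab):
    """Pick the class with the highest log posterior (up to a constant)."""
    scores = {}
    for c in log_prior:
        # Independence assumption: per-word log probabilities simply add up.
        scores[c] = log_prior[c] + sum(log_like[c][w] for w in tokens if w in vocab)
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
TOY_DOCS = [
    ("gold ore mine".split(), "gold"),
    ("gold price rises".split(), "gold"),
    ("wheat harvest grain".split(), "grain"),
    ("grain export wheat".split(), "grain"),
]

if __name__ == "__main__":
    model = train_naive_bayes(TOY_DOCS)
    print(classify("gold mine".split(), *model))
```

Because each word contributes one additive term, training and classification are both linear in the text length, which is part of why the method serves as a cheap baseline.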
Maron, M. 1961. Automatic indexing: An experimental inquiry. Journal of the ACM (JACM) 8:404--417.
....has recently emerged as a focus of research itself in machine learning. Machine learning researchers tend to be aware of the large pattern recognition literature on naive Bayes, but may be less aware of an equally large information retrieval (IR) literature dating back almost forty years [37,38]. In fact, naive Bayes methods, along with prototype formation methods [44, 45, 24] accounted for most applications of supervised learning to information retrieval until quite recently. In this paper we briefly review the naive Bayes classifier and its use in information retrieval. We ....
....has come to be known in information retrieval as the binary independence model. Its use in the form of Equation 11 was promoted by Robertson and Sparck Jones in a paper [41] that did much to clarify and unify a number of related and partially ad hoc applications of naive Bayes dating back to Maron [37]. Robertson and Sparck Jones' particular interest in the binary independence model was its use in relevance feedback [20, 45]. In relevance feedback, a user query is given to a search engine, which produces an initial ranking of its document collection by some means. The user examines the initial ....
M. E. Maron. Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8:404-417, 1961.
....has recently emerged as a focus of research itself in machine learning. Machine learning researchers tend to be aware of the large pattern recognition literature on naive Bayes, but may be less aware of an equally large information retrieval (IR) literature dating back almost forty years [37, 38]. In fact, naive Bayes methods, along with prototype formation methods [44, 45, 24] accounted for most applications of supervised learning to information retrieval until quite recently. In this paper we briefly review the naive Bayes classifier and its use in information retrieval. We ....
....has come to be known in information retrieval as the binary independence model. Its use in the form of Equation 11 was promoted by Robertson and Sparck Jones in a paper [41] that did much to clarify and unify a number of related and partially ad hoc applications of naive Bayes dating back to Maron [37]. Robertson and Sparck Jones' particular interest in the binary independence model was its use in relevance feedback [20, 45]. In relevance feedback, a user query is given to a search engine, which produces an initial ranking of its document collection by some means. The user examines the initial ....
M. E. Maron. Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8:404--417, 1961.
.... than deriving this function from a natural language query, it is typically constructed directly by experts [28], perhaps using a complex pattern matching language [12]. Alternately, the function may be induced by machine learning techniques from large numbers of previously categorized documents [17, 11, 2]. 1.1 Text Representation and The Concept Learning Model Any text classification function assumes a particular representation of documents. With the exception of a few experimental knowledge-based IR systems [15], these text representations map documents into vectors of attribute values, usually ....
M. E. Maron. Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8:404--417, 1961.
....set of human-indexed catalog records, we train our algorithm to predict which subject headings have a high likelihood of being associated with new titles (and abstracts) when they are presented to an automated system. Such an approach is not conceptually without precedent (Maron & Kuhns 1960; Maron 1961; Kar & White 1978), but the computational resources and statistical methods have limited the size and effectiveness of such research. For the current research, we implement this scheme using the authors, titles, abstracts and controlled vocabulary subject headings in 4,626 catalog records from the ....
Maron, M. E. (1961). Automatic indexing: An experimental inquiry. Journal of the ACM, 8(3):404--417.
[Citation context garbled in extraction; legible fragments: "Rocchio", "TFIDF", "(originally in (Maron, 1961))"]
Maron, M. 1961. Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery 8:404--417.
....determination problem instance, as a test case for the general author identification problem. For these particular data sets we selected two methods that, in a sense, represent two extremes (although both can be viewed as linear classifiers): the classical Bayesian Probability Ratio Test (PRT) [35, 28] and the recently proposed sleeping experts (SE) algorithm [6, 11]. The PRT is a well known and well explored statistical method that is often referred to and used as a benchmark method, both because it yields reasonable results on various data sets [28, 33] and because it yields stable algorithms ....
.... Bayesian Probability Ratio Test (PRT) [35, 28] and the recently proposed sleeping experts (SE) algorithm [6, 11]. The PRT is a well known and well explored statistical method that is often referred to and used as a benchmark method, both because it yields reasonable results on various data sets [28, 33] and because it yields stable algorithms that can be used for testing and evaluating other learning issues, such as sampling techniques [24] and feature selection [21]. The SE algorithm represents a new powerful and promising approach for learning linear classifiers. SE algorithms combine the ....
[Article contains additional citation context not shown here]
M.E. Maron. Automatic indexing: An experimental inquiry. Journal of the ACM, 8:404-417, 1961.
....the exception than the rule: most of the techniques we will discuss in this paper allow the construction of classifiers capable of working in either mode. 2 Applications of document categorisation Automatic text categorisation goes back at least to the early '60s, with the seminal work by Maron [29]. Since then, it has been used in a number of different applications. In the following, we briefly review the most important ones. 1 This is exemplified by the well known phenomenon of inter-indexer inconsistency [5]: when two different humans must take a decision on whether to categorise document ....
....noted that this might result in assigning all the test docs to category c_i while not even assigning a single test document to category c_j. He then considered using different thresholds τ_i for different categories c_i, established by normalising probability estimates (following a suggestion from [29]). His experimental results did not show, however, a considerable difference in effectiveness between the two variants. Yang [50] uses different thresholds τ_i for the different categories c_i. Each threshold is optimised by testing different values for it on the validation set and choosing the value ....
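The per-category threshold tuning described in this excerpt might look like the sketch below. The excerpt is cut off before it names the selection criterion, so maximising F1 on the validation set is an assumption made here; the function names and the candidate grid are likewise hypothetical.

```python
def f1(predicted, actual):
    """F1 of boolean predictions against boolean gold labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the best validation F1 (assumed criterion)."""
    return max(candidates, key=lambda t: f1([s >= t for s in scores], labels))


def tune_all(val_scores, val_labels, candidates):
    """One independently tuned threshold per category."""
    return {
        c: tune_threshold(val_scores[c], val_labels[c], candidates)
        for c in val_scores
    }


if __name__ == "__main__":
    # Invented validation scores and labels for two categories.
    val_scores = {"gold": [0.9, 0.8, 0.4, 0.2], "grain": [0.3, 0.6, 0.7]}
    val_labels = {"gold": [True, True, False, False], "grain": [True, False, True]}
    print(tune_all(val_scores, val_labels, [0.1, 0.5, 0.95]))
```

Tuning each category separately lets a rare category keep a low threshold without forcing the same cut-off on frequent ones.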
M. Maron. Automatic indexing: an experimental inquiry. Journal of the Association for Computing Machinery, 8(3):404--417, 1961.
....set of human-indexed catalog records, we train our algorithm to predict which subject headings have a high likelihood of being associated with new titles (and abstracts) when they are presented to an automated system. Such an approach is not conceptually without precedent (Maron & Kuhns 1960; Maron 1961; Kar & White 1978), but the computational resources and statistical methods have limited the size and effectiveness of such research. For the current research, we implement this scheme using the authors, titles, abstracts and controlled vocabulary subject headings in 4,626 catalog records from ....
Maron, M. E. (1961). Automatic indexing: An experimental inquiry. Journal of the ACM, 8(3):404--417.
....shows that there is a substantial increase in accuracy when using lexical information when there are not many training examples. As the number of examples increases, the effect is less substantial. 6. Related work The Bayesian classifier has a long history in text categorization tasks (e.g. Maron, 1961; Lewis 1992). Our work differs from previous work in this area by focusing on incorporating additional knowledge into the classifier, as well as being focused on the World Wide Web. There are several other agents designed to perform tasks similar to ours. The WebWatcher (Armstrong et al. 1995) ....
Maron, M. (1961). Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8:404--417.
....similarities, IF and IR differ greatly in other respects, as was pointed out in [6]. The probabilistic models of IR combine frequency values with relevance assessments using the Bayes theorem. In [28], relevance as well as the set of terms are taken as elementary events. On the contrary, in [22] the absolute probability of a document is given by the number of its uses divided by the number of total uses, while relevance is a subjective weight attached to each term-document couple and interpreted as the conditional probability of a term given a document. In relevance feedback models of ....
M.E. Maron. Automatic indexing: an experimental inquiry. Journal of the ACM, 8:404--417, 1961.
....class. Most applications of the multivariate Bernoulli formulation of naïve Bayes consider both the presence and absence of words in text documents as evidence in the probability computation. We restrict the evidence used to the presence of words, similar to a naïve Bayes model proposed by Maron (1961). This results in a more conservative classifier that requires examples classified as class c to be similar to other examples in class c. Finally, we would like to prevent the long-term model from classifying stories that do not contain a sufficient number of features that are indicators for ....
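The difference between the full multivariate Bernoulli score and the presence-only variant described in this excerpt can be sketched as below. The function `bernoulli_log_score`, its `use_absence` flag, and the example probabilities are hypothetical; the excerpt does not give the citing paper's actual estimation or smoothing details.

```python
import math


def bernoulli_log_score(doc_words, word_probs, prior, use_absence=True):
    """Log of prior times per-word evidence for one class.

    word_probs maps each vocabulary word to P(word present | class).
    With use_absence=False, absent words contribute no evidence,
    which mirrors the presence-only restriction in the excerpt.
    """
    score = math.log(prior)
    for word, p in word_probs.items():
        if word in doc_words:
            score += math.log(p)
        elif use_absence:
            score += math.log(1.0 - p)
    return score


if __name__ == "__main__":
    probs = {"gold": 0.8, "ore": 0.5}  # invented values
    doc = {"gold"}
    print(bernoulli_log_score(doc, probs, prior=0.5))
    print(bernoulli_log_score(doc, probs, prior=0.5, use_absence=False))
```

In the presence-only form a class can only gain support from words that actually occur in the document, so a document with few indicative words accumulates little evidence for any class.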
Maron, M. (1961). Automatic indexing: an experimental inquiry. Journal of the Association for Computing Machinery, 8:404-417.
....class. Most applications of the multivariate Bernoulli formulation of naïve Bayes consider both the presence and absence of words in text documents as evidence in the probability computation. We restrict the evidence used to the presence of words, similar to a naïve Bayes model proposed by Maron [8]. This results in a more conservative classifier that requires examples classified as class c to be similar to other examples in class c. Finally, we would like to prevent the long-term model from classifying stories that do not contain a sufficient number of features that are indicators for ....
Maron, M. Automatic Indexing: An Experimental Inquiry. Journal of the Association for Computing Machinery, 1961, 8:404-417.
....an intermediate concept between the gold ore feature and the gold category. Several learning techniques have been applied to the text categorization task: they use existing sets of categorized documents in order to construct classifiers by induction. These methods include Bayesian classification [6], neural networks [7], decision tree construction [4, 8], memory-based learning [9], and optimization techniques [5]. Most of these approaches have been proposed by the Information Retrieval community. Experiments on the Reuters corpus, composed of financial news stories from Reuters newswire, have ....
M. E. Maron. Automatic indexing: an experimental inquiry. Journal of the Association for Computing Machinery, 8:404--417, 1961.
....4.1 A Probabilistic Classifier Classifiers which estimate the posterior probability via Bayes Rule: $P(C_i \mid w) = \frac{P(w \mid C_i) \times P(C_i)}{\sum_{j=1}^{q} P(w \mid C_j) \times P(C_j)}$ (1) have been applied to a variety of text classification tasks, including text retrieval [18], text categorization [19, 20], and word sense identification [5]. Here the $C_i$ are a disjoint and exhaustive set of classes to which an example may belong, and $w = (w_1, \ldots, w_d)$ is an observed pattern. $P(w \mid C_i)$ is the conditional probability that an example has pattern $w$ given that it belongs to class $C_i$, while P ....
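The posterior computation via Bayes Rule described in this excerpt can be illustrated with a short numeric sketch. The function name `posteriors` and the example likelihoods and priors are invented for illustration; only the normalisation over a disjoint and exhaustive set of classes comes from the excerpt.

```python
def posteriors(likelihoods, priors):
    """Posterior P(C_i | w) for each class, given P(w | C_i) and P(C_i).

    Assumes the classes are disjoint and exhaustive, so the evidence P(w)
    is the sum of the joint terms over all classes.
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


if __name__ == "__main__":
    # Invented numbers: two classes with equal priors.
    print(posteriors({"a": 0.2, "b": 0.1}, {"a": 0.5, "b": 0.5}))
```

With equal priors the posterior is just the likelihoods rescaled to sum to one, so class "a" ends up with two thirds of the probability mass in this example.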
M. E. Maron. Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8:404--417, 1961.
....from other fragments. Although it is hard to exploit contextual regularities by memorizing, statistical approaches are well suited for this. We base our bag-of-words learner, which we call Bayes, on the Naive Bayes algorithm, as used in document classification and elsewhere (originally in (Maron, 1961)). Each fragment of text in a document (of appropriate size) is regarded as a competing hypothesis. Given a document, we want to find the most likely hypothesis (the fragment most likely to be a field instance). Bayes Rule tells us how to maintain our belief in a set of disjoint hypotheses (H_i) ....
Maron, M. 1961. Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery 8:404--417.
....defined as well, particularly if the behavior of the system on texts of different kinds is of interest. Maron grouped documents on the basis of the amount of evidence they provided for making a categorization decision, and showed that effectiveness increased in proportion to the amount of evidence [Mar61]. Finally, it is sometimes appropriate to partition results by degree of correctness of a category document pair. While the contingency table model assumes that an assignment decision is either correct or incorrect, the standard they are being tested against may actually have gradations of ....
....should be used for any background data presented on the testset and task. One Category or Many Evaluations of systems which assign multiple categories to a document have often been flawed, particularly for categorizers which use statistical techniques. For instance, some of the results in [Mar61] and [KW75] were obtained under assumptions equivalent to the categorizer knowing in advance how many correct categories each test document has. This knowledge is not available in an operational setting. Better attempts to both produce and evaluate multiple category assignments are found in work ....
[Article contains additional citation context not shown here]
M. E. Maron. Automatic indexing: An experimental inquiry. J. of the Association for Computing Machinery, 8:404--417, 1961.
No context found.
Maron, M. E. (1961). Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8:404-417.
No context found.
M. E. Maron. Automatic indexing: An experimental inquiry. Journal of the Association for Computing Machinery, 8(3):404--417, 1961.
No context found.
M. E. Maron. Automatic indexing: An experimental inquiry. Journal of the ACM, 8:404-417, 1961.
No context found.
M. Maron. Automatic indexing: an experimental inquiry. Journal of the Association for Computing Machinery, 8(3):404--417, 1961.
CiteSeer.IST - Copyright Penn State and NEC