Lang, K. 1995. NewsWeeder, Learning to Filter Netnews. In Proceedings of the 12th International Machine Learning Conference (ML95), pp. 331-339, San Francisco, CA: Morgan Kaufmann.
....have chosen to apply it to a labelled dataset of documents, in order to be able to measure how its results agree with manual classification. Labels are not used by our algorithms during learning and serve only to quantify the performance. We therefore used the 20 Newsgroups database collected by [6], preprocessed as described in [12]. This database consists of 20 equal-sized groups of documents, hierarchically organized into groups according to their content. We aimed to cluster documents that belong to two newsgroups from the supergroup of computer documents and have very similar subjects ....
K. Lang. Learning to filter netnews. In Proc. of 12th Int. Conf. on Machine Learning, 1995.
....have chosen to apply it to a labelled dataset of documents, in order to be able to measure how its results agree with manual classification. Labels are not used by our algorithms during learning and serve only to quantify the performance. We therefore used the 20 Newsgroups database collected by [6], preprocessed as described in [11]. This database consists of 20 equal-sized groups of documents, hierarchically organized into groups according to their content. We aimed to cluster documents that belong to two newsgroups from the supergroup of computer documents and have very similar subjects ....
K. Lang. Learning to filter netnews. In Proc. of 12th Int. Conf. on Machine Learning, 1995.
....(up to 18 ) over the performance using the words directly. 1 Introduction Automatic classification of documents is an increasingly important tool for handling the exponential growth in available online texts. Many algorithms have been suggested for this task in the past few years (e.g. [8], [9], [11], [15]). The most common approaches start by evaluating the co-occurrence matrix of words versus documents, given document training data. It is well known, however, that such count matrices tend to be highly sparse and noisy, especially when the training data is relatively small. As a ....
....of p(w; c) 5 The Experimental Design In this section we describe our experiments and the datasets used in these experiments. All of the datasets used in this work are based on a standard IR corpus, the 20Newsgroups corpus. 5.1 The datasets The 20Newsgroups corpus collected by Lang [9] contains about 20,000 articles evenly distributed among 20 UseNet discussion groups. This natural language corpus is usually employed for evaluating text classification techniques (e.g. [1], [15], [17]). Many of these groups have similar topics (e.g. five groups discuss different issues concerning ....
K. Lang. Learning to filter netnews. In Proc. of the 12th Int. Conf. on Machine Learning, pages 331--339, 1995.
....text classification algorithms. In this way we circumvent the bias caused by the use of a specific IR system. In addition, we view the correct labels of the documents as objective knowledge on the inherent structure of the dataset. Specifically, we used the 20Newsgroups dataset, collected by Lang [12], which contains about 20,000 articles evenly distributed over 20 UseNet discussion groups. From this corpus we generated several subsets and measured clustering performance via the correlation between the obtained document clusters and the original newsgroups. We compared several clustering ....
....algorithms to our needs. Specifically, we measure clustering performance by the accuracy given by the contingency table of the obtained clusters and the real document categories. 6.2 The datasets We constructed 10 different document subsets of the 20Newsgroups corpus collected by Lang [12]. This corpus contains about 20,000 articles evenly distributed among 20 UseNet discussion groups, and is usually employed for evaluating supervised text classification techniques (e.g. [2], [21]). Many of these groups have similar topics (e.g. five groups discuss different issues concerning ....
K. Lang. Learning to filter netnews. In Proc. of the 12th Int. Conf. on Machine Learning, pages 331--339, 1995.
....l A ; t r A ) p( t A ) JS (p(T B j t l A ) p(TB j t r A ) T 1)H( 6) while for merging in TB we will get an analogous expression. 6 Applications We examine a few applications of the examples presented above. As one data set we used a subset of the 20 newsgroups corpus [7] where we randomly choose 2000 documents evenly distributed among the 4 science discussion groups (sci.crypt, sci.electronics, sci.med and sci.space). Our pre-processing included ignoring file headers, lowering upper case and ignoring words that contained non a-z characters. Given this ....
K. Lang. Learning to filter netnews. In ICML'95.
....Figure 6: Log loss performance on test data of the FindHidden algorithm with and without agglomeration on synthetic and real-life data. Baseline is the performance of the Original network given as an input to FindHidden. ....messages from 20 newsgroups [13]. We represent each message as a vector containing one attribute for the newsgroup and attributes for each word in the vocabulary. We removed common stop words, and then sorted words based on their frequency in the whole data set. The data set used here included the group designator and the 99 ....
K. Lang. Learning to filter netnews. In ML '95, pp. 331-- 339, 1995.
....are practically overfitting effects and accordingly there is no real information gain due to these splits. As a more realistic example we used the standard 20 newsgroups corpus. This natural language corpus contains about 20,000 articles evenly distributed among 20 USENET discussion groups [5] and has been employed for evaluating text classification techniques (e.g. [12]). Many of these groups have similar topics. Five groups discuss different issues concerning computers, three groups discuss religion issues, etc. Thus, there is an inherent hierarchy among these groups. To model this ....
K. Lang. Learning to filter netnews. In 12th Int. Conf. on Machine Learning, pages 331--339. 1995.
....information, as well as interleaving information retrieval and problem solving has become a very critical task, due to the increasing amount of distributed, dynamically changing information. Most work in intelligent software agents that gather information from Internet-based sources, e.g. [7, 8, 1], focussed on a single agent with simple knowledge and problem solving capabilities whose main task is .... [Footnote 1: For hundreds of agents, coalition formation methods are usually too complex. However, several cooperation methods were developed for such cases (e.g. market-oriented solutions [16]).]
K. Lang. Learning to filter netnews. In Proceedings of the Machine Learning Conference 1995, 1995.
....performance results were obtained and reported. Introduction We describe a reusable Learning Personal Agent (LPA) that learns a user's research interests and filters Web-based conference announcements and RFPs. Learning Personal Agents have been used for information filtering from the WWW (Lang 1995), (Armstrong et al. 1995), (Pazzani, Nguyen, Mantik 1995). In contrast to the environments of these systems where the probability of finding relevant links or documents is relatively high, so that the agent's task is to avoid user information overload, in our environment the probability of ....
Lang, K. 1995. Learning to filter netnews. In Proceedings of the Machine Learning Conference 1995.
....our agent is learning profiles for notifying users about conference announcements and requests for proposals that match their research interests. In contrast to environments where the goal is to avoid information overload (e.g. the Learning Interface Agent [Maes and Kozierok 1993] and NewsWeeder [Lang 1995]), the filtering task for our agent involves judging whether an article is relevant or irrelevant to the user based on the user profile, in an environment where the prior probability of encountering a relevant document is very low compared to the probability of encountering an irrelevant document. ....
....can be used to model a priori preferences. This ensures that when the LPA starts interacting with the user it correctly classifies a large proportion of the conference announcements and requests for proposals. Learning Personal Agents have been used for information filtering from the WWW [Lang 1995], [Armstrong et al. 1995], [Pazzani et al. 1995]. In the case of the WebWatcher [Armstrong et al. 1995] and the agent described in [Pazzani et al. 1995], the agent tries to find an interesting link in a Web page that has already been preselected by a user, which means that the probability of finding ....
K. Lang. Learning to Filter Netnews. In Proceedings of the Machine Learning Conference 1995, 1995.
....can then be used either to suggest which links of the current Web page a user would be interested in exploring or to autonomously browse the Web in order to find pages that would interest a user. Several systems that learn Internet user interests have been presented in the literature. NewsWeeder [5] is a netnews filtering system that addresses this problem by letting the user rate her/his interest level for each article being read, and then by learning a user profile based on these ratings. WebWatcher [2] interactively helps users locate desired information while browsing on the World Wide ....
Lang, K., Learning to Filter Netnews. Proc. of the 12th International Conference on Machine Learning, pp. 331-339, 1995
....by updating the weights according to the user's browsing behavior. In machine learning research, the main focus in text learning has been on filtering relevant documents, in particular Web pages (Pazzani, Muramatsu, Billsus 1996; Craven et al. 1997), email messages, and usenet news messages (Lang 1995). These approaches do not consider any domain-specific knowledge or semantic structure, and rather represent documents as high-dimensional vectors over the words. Statistical, probabilistic and information theoretical methods which use word frequency counts have been employed for deriving the ....
Lang, K. 1995. Learning to Filter Netnews. In Proceedings of the Twelfth International Conference on Machine Learning.
....in theoretical investigations for learning problems where the input consists of a large number of tokens, many of which are not relevant for the classification task. The algorithms are Rocchio using IR relevance feedback methods (Lewis et al. 1996; Frakes and Baeza-Yates 1992), TF-IDF Prototype (Lang 1995; Joachims 1996), Winnow (Blum 1995; Golding and Roth 1996), and Widrow-Hoff and Exponentiated Gradient (Papka, Callan, Barto 1996; Lewis et al. 1996). Since results from IR (Turtle 1991) and ML (Craven et al. 1997; Joachims 1996; Lang 1995) suggest that probabilistic methods are superior to ....
.... et al. 1996; Frakes and Baeza-Yates 1992), TF-IDF Prototype (Lang 1995; Joachims 1996), Winnow (Blum 1995; Golding and Roth 1996), and Widrow-Hoff and Exponentiated Gradient (Papka, Callan, Barto 1996; Lewis et al. 1996). Since results from IR (Turtle 1991) and ML (Craven et al. 1997; Joachims 1996; Lang 1995) suggest that probabilistic methods are superior to conventional approaches in document retrieval and classification, we are currently including a probabilistic classifier. The opinions were converted into feature value vectors which could be used as an input to the algorithms, as described in ....
Lang, K. 1995. Learning to Filter Netnews. In Proceedings of the Twelfth International Conference on Machine Learning.
No context found.
Lang, K. 1995. NewsWeeder, Learning to Filter Netnews. In Proceedings of the 12th International Machine Learning Conference (ML95), pp. 331-339, San Francisco, CA: Morgan Kaufmann.
No context found.
K. Lang. Learning to filter netnews. In Proc. of 12th Int. Conf. on Machine Learning, 1995.
No context found.
K. Lang. Learning to filter netnews. In Proc. of the 12th Int. Conf. on Machine Learning, 1995.
No context found.
K. Lang. Learning to filter netnews. In Proc. of the 12th Int. Conf. on Machine Learning, pages 331--339, 1995.
No context found.
Kenneth Lang. Learning to filter netnews. In Proc. 12th Intl. Conf. on Machine Learning (ICML), pages 331--339, July 1995.
No context found.
K. Lang. Learning to filter netnews. In Proc. of the 12th Int. Conf. on Machine Learning, 1995.