| K. Lang. Learning to filter netnews. In Proc. of ICML, 1995. |
....SVM classifier trained with a small training set. 6. THE EXPERIMENTAL DESIGN 6.1 The datasets Following [15, 5] we used several standard labeled data sets to evaluate the different clustering methods described above. As our first data set we used the 20Newsgroup corpus collected by Lang [7]. This corpus contains about 20,000 articles evenly distributed among 20 UseNet discussion groups, some of which are of very similar topics. After removing all file headers our preprocessing included lowering upper case characters, uniting all digits into one symbol and ignoring non ....
K. Lang. Learning to filter netnews. In Proc. of the 12th Int. Conf. on Machine Learning, pages 331-339, 1995.
....our method performance, we choose this low level representation. We leave for future work to repeat the same tests, where we take to be the set of words occurred in the texts. 4 4.1.1 Experimental design We used several datasets based on the 20Newsgroup corpus. This corpus collected by Lang [11] contains about 20,000 documents evenly distributed among 20 UseNet discussion groups, and is usually employed for evaluating text classification techniques (e.g. [15]). We used 3 different subsets chosen from this corpus: the two newsgroups dealing with sport (rec.sport.baseball, ....
K. Lang. Learning to filter netnews. In Proc. of ICML, 1995.
No context found.
K. Lang. Learning to filter netnews. In Proc. of ICML, 1995.
No context found.
K. Lang. Learning to filter netnews. In Proc. of ICML, 1995.