  Predicting Content from Hyperlinks (1999)

by Marko Grobelnik
Proceedings of the ICML-99 Workshop on Machine Learning in Text Data Analysis, J. Stefan Institute
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-4/text-learning/www/pww/papers/PWW/ICMLWshFinal.ps

Abstract:

This paper describes an approach to predicting the content of a document from the hyperlink that points to it. The k-Nearest Neighbor algorithm is used to predict a set of words that appear in the document. Experiments are performed on real-world data obtained from the Web. The proposed approach gives promising results. On the tested data, on average about 33% of the document words are correctly predicted, while about 15% of all predicted words actually appear in the document. The predicted words are chosen from about 4,000 to 8,000 different words and word pairs that appear in the training examples.
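
The abstract does not give the feature representation, similarity measure, or voting rule, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes hyperlinks and documents are represented as plain word sets, uses Jaccard similarity to find neighbours, and predicts a word when enough neighbours' documents contain it. The names `predict_words`, `recall_precision`, `k`, and `min_votes` are hypothetical. The two returned measures correspond to the two percentages quoted above: the share of document words that were predicted, and the share of predicted words that appear in the document.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two word sets (an assumed choice of measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_words(link_words, training, k=3, min_votes=2):
    """Predict document words from hyperlink words with k nearest neighbours.

    training is a list of (hyperlink_word_set, document_word_set) pairs.
    A word is predicted if it occurs in the documents of at least
    min_votes of the k most similar training hyperlinks (assumed rule).
    """
    neighbours = sorted(
        training, key=lambda ex: jaccard(link_words, ex[0]), reverse=True
    )[:k]
    votes = Counter(w for _, doc_words in neighbours for w in doc_words)
    return {w for w, n in votes.items() if n >= min_votes}


def recall_precision(predicted, actual):
    """Return (share of document words predicted, share of predictions correct)."""
    hits = len(predicted & actual)
    recall = hits / len(actual) if actual else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


# Toy data, invented purely for illustration.
training = [
    ({"machine", "learning"}, {"learning", "algorithm", "data", "model"}),
    ({"learning", "course"}, {"learning", "course", "students", "data"}),
    ({"football", "scores"}, {"football", "team", "league"}),
    ({"weather", "forecast"}, {"rain", "weather", "forecast"}),
]
link = {"machine", "learning", "course"}
actual = {"learning", "data", "course", "notes"}

predicted = predict_words(link, training, k=2, min_votes=2)
print(sorted(predicted))
print(recall_precision(predicted, actual))
```

Raising `min_votes` or lowering `k` in this sketch tends to produce fewer predicted words, which trades coverage of the document's words for a higher share of correct predictions.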
