Predicting Content from Hyperlinks (1999) [1 citation, 0 self]
Abstract:
This paper describes an approach to predicting the content of a document from the hyperlink that points to it. The k-Nearest Neighbor algorithm is used to predict a set of words that appear in the document. Experiments are performed on real-world data obtained from the Web, and the proposed approach gives promising results: on the tested data, on average about 33% of the document's words are correctly predicted (recall), while about 15% of the predicted words actually appear in the document (precision). The predicted words are chosen from about 4,000 to 8,000 different words and word pairs that appear in the training examples.
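To make the prediction task and the two reported percentages concrete, here is a minimal sketch of k-Nearest Neighbor word prediction from hyperlink text. It is not the paper's implementation: the abstract does not state the similarity measure, the value of k, or the rule for turning neighbors into predicted words, so this sketch assumes whitespace tokenization, cosine similarity between bag-of-words vectors of hyperlink texts, and a simple vote threshold. It also uses single words only, whereas the paper's feature set includes word pairs. The names `predict_words`, `min_votes`, and the toy training data are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Assumed preprocessing: lowercase whitespace tokens -> word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_words(hyperlink, training, k=3, min_votes=2):
    """Hypothetical k-NN predictor.

    training: list of (hyperlink_text, set_of_words_in_target_document).
    A word is predicted if it occurs in the target documents of at least
    `min_votes` of the k training hyperlinks most similar to `hyperlink`.
    """
    query = bag_of_words(hyperlink)
    neighbours = sorted(
        training,
        key=lambda example: cosine(query, bag_of_words(example[0])),
        reverse=True,
    )[:k]
    votes = Counter(word for _, words in neighbours for word in words)
    return {word for word, count in votes.items() if count >= min_votes}


def precision_recall(predicted, actual):
    """Precision: share of predicted words in the document.
    Recall: share of document words that were predicted."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Toy data (invented for illustration only, not from the paper's experiments).
training = [
    ("machine learning course", {"learning", "algorithm", "lecture", "exam"}),
    ("machine learning tutorial", {"learning", "algorithm", "example"}),
    ("football results", {"match", "goal", "league"}),
    ("learning resources", {"learning", "lecture", "book"}),
]

predicted = predict_words("machine learning lecture notes", training, k=3, min_votes=2)
actual = {"learning", "lecture", "syllabus", "exam"}
precision, recall = precision_recall(predicted, actual)
print(sorted(predicted), precision, recall)
# prints: ['algorithm', 'learning', 'lecture'] 0.6666666666666666 0.5
```

In this toy run, two of the three predicted words occur in the target document (precision 2/3), and two of its four words were predicted (recall 1/2). The abstract's figures are averages of the same two ratios over the tested hyperlinks. Raising `min_votes` would typically trade recall for precision.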
Citations
| Cited by | Title | Authors | Year |
|---|---|---|---|
| 792 | Instance-Based Learning Algorithms | Kibler | 1991 |
| 256 | A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for text categorization | Joachims | 1997 |
| 193 | Learning and revising user profiles: the identification of interesting web sites | Pazzani, Billsus | |
| 35 | Turning Yahoo into an Automatic Web-Page Classifier | Mladenić | 1998 |
| 32 | Word sequences as features in text-learning | Mladenić, Grobelnik | 1998 |
| 4 | Evaluating and optimizing autonomous text classification systems | Lewis | 1995 |
| 3 | Learning Machine: design and implementation | Grobelnik, Mladenić | 1998 |
| 1 | Browsing-Assistenten, Tour Guides und adaptive WWW-Server | Joachims, Mladenić | 1998 |

