Using navigation data to improve IR functions in the context of Web search (2001)

by Mark H. Hansen, Elizabeth Shriver
Proceedings for the Tenth International Conference on Information and Knowledge Management (ACM CIKM 2001) (Atlanta, GA)
http://www.bell-labs.com/project/websearch/cikm01.ps

Abstract:

As part of the process of delivering content, devices like proxies and gateways log valuable information about the activities and navigation patterns of users on the Web. In this study, we consider how this navigation data can be used to improve Web search. A query posted to a search engine together with the set of pages accessed during a search task is known as a search session. We develop a mixture model for the observed set of search sessions, and propose variants of the classical EM algorithm for training. The model itself yields a type of navigation-based query clustering. By implicitly borrowing strength between related queries, the mixture formulation allows us to identify the "highly relevant" URLs for each query cluster. Next, we explore methods for incorporating existing labeled data (the Yahoo! directory, for example) to speed convergence and help resolve low-traffic clusters. Finally, the mixture formulation also provides for a simple, hierarchical display of search results based on the query clusters. The effectiveness of our approach is evaluated using proxy access logs for the outgoing Lucent proxy.
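
To make the idea of clustering search sessions with a mixture model concrete, here is a minimal sketch. It is not the authors' model: the abstract does not specify the component distributions, so this assumes a plain mixture of multinomials over URL indices fitted with textbook EM. The function name `em_multinomial_mixture`, its parameters, and the toy sessions are all hypothetical.

```python
import math
import random


def em_multinomial_mixture(sessions, n_urls, n_clusters,
                           n_iters=50, seed=0, smooth=1e-2):
    """Fit an assumed mixture of multinomials to search sessions.

    sessions: list of sessions, each a list of visited URL indices.
    Returns (weights, url_probs, resp), where resp holds the cluster
    responsibilities computed in the final E-step.
    """
    rng = random.Random(seed)
    # Mixing weights start uniform; URL distributions start random.
    weights = [1.0 / n_clusters] * n_clusters
    url_probs = []
    for _ in range(n_clusters):
        row = [rng.random() + 0.5 for _ in range(n_urls)]
        total = sum(row)
        url_probs.append([v / total for v in row])

    resp = []
    for _ in range(n_iters):
        # E-step: posterior over clusters for each session, in log space.
        resp = []
        for urls in sessions:
            log_post = [
                math.log(weights[k])
                + sum(math.log(url_probs[k][u]) for u in urls)
                for k in range(n_clusters)
            ]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            total = sum(unnorm)
            resp.append([v / total for v in unnorm])

        # M-step: smoothed re-estimates keep every probability positive.
        for k in range(n_clusters):
            mass = sum(r[k] for r in resp)
            weights[k] = (mass + smooth) / (len(sessions) + n_clusters * smooth)
            counts = [smooth] * n_urls
            for r, urls in zip(resp, sessions):
                for u in urls:
                    counts[u] += r[k]
            total = sum(counts)
            url_probs[k] = [c / total for c in counts]
    return weights, url_probs, resp


# Toy data (made up): four sessions over six URL indices.
sessions = [[0, 1, 2], [0, 1, 2], [3, 4, 5], [3, 4, 5]]
weights, url_probs, resp = em_multinomial_mixture(sessions, n_urls=6, n_clusters=2)
```

Under this assumed model, the URLs with the largest entries in `url_probs[k]` play the role of the "highly relevant" URLs for cluster `k`, and each row of `resp` gives a soft assignment of a session to the clusters. The labeled-data and hierarchical-display extensions mentioned in the abstract are not covered by this sketch.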
