by Mark H. Hansen, Elizabeth Shriver
Proceedings for the Tenth International Conference on Information and Knowledge Management (ACM CIKM 2001) (Atlanta, GA
http://www.bell-labs.com/project/websearch/cikm01.ps
Abstract:
As part of the process of delivering content, devices like proxies and gateways log valuable information about the activities and navigation patterns of users on the Web. In this study, we consider how this navigation data can be used to improve Web search. A query posted to a search engine, together with the set of pages accessed during a search task, is known as a search session. We develop a mixture model for the observed set of search sessions, and propose variants of the classical EM algorithm for training. The model itself yields a type of navigation-based query clustering. By implicitly borrowing strength between related queries, the mixture formulation allows us to identify the "highly relevant" URLs for each query cluster. Next, we explore methods for incorporating existing labeled data (the Yahoo! directory, for example) to speed convergence and help resolve low-traffic clusters. Finally, the mixture formulation also provides for a simple, hierarchical display of search results based on the query clusters. The effectiveness of our approach is evaluated using proxy access logs for the outgoing Lucent proxy.
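The abstract does not give the exact form of the mixture model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each search session is reduced to a list of integer URL ids and is generated by one of K cluster-specific multinomial distributions over URLs, and it fits that model with plain EM. The function names `fit_session_mixture` and `top_urls`, the smoothing constant, and the toy data are all hypothetical; the query text, the labeled-data extension, and the EM variants mentioned in the abstract are not modeled here.

```python
import math
import random


def fit_session_mixture(sessions, n_urls, n_clusters, n_iter=50, smooth=0.01, seed=0):
    """Fit a mixture of multinomials over URL ids with plain EM.

    sessions: list of sessions, each a list of URL ids in range(n_urls).
    Returns (weights, url_probs, resp), where resp[i][k] is the posterior
    probability that session i belongs to cluster k.
    Assumption: this simplified model stands in for the paper's unspecified one.
    """
    rng = random.Random(seed)
    weights = [1.0 / n_clusters] * n_clusters
    # Random, strictly positive initial URL distributions, one per cluster.
    url_probs = []
    for _ in range(n_clusters):
        row = [rng.random() + 0.5 for _ in range(n_urls)]
        total = sum(row)
        url_probs.append([p / total for p in row])

    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster membership for each session (in log space).
        resp = []
        for s in sessions:
            logp = [
                math.log(weights[k]) + sum(math.log(url_probs[k][u]) for u in s)
                for k in range(n_clusters)
            ]
            m = max(logp)
            unnorm = [math.exp(lp - m) for lp in logp]
            z = sum(unnorm)
            resp.append([x / z for x in unnorm])

        # M-step: re-estimate mixing weights and per-cluster URL probabilities.
        for k in range(n_clusters):
            nk = sum(r[k] for r in resp)
            # Additive smoothing keeps weights and probabilities away from zero.
            weights[k] = (nk + smooth) / (len(sessions) + n_clusters * smooth)
            counts = [smooth] * n_urls
            for r, s in zip(resp, sessions):
                for u in s:
                    counts[u] += r[k]
            total = sum(counts)
            url_probs[k] = [c / total for c in counts]
    return weights, url_probs, resp


def top_urls(url_probs, k, n=3):
    """Return the n URL ids with the highest probability in cluster k."""
    ranked = sorted(range(len(url_probs[k])), key=lambda u: url_probs[k][u], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Toy sessions (hypothetical): URL ids 0-2 and 3-5 form two navigation groups.
    toy = [[0, 1, 2], [0, 1, 2], [3, 4, 5], [3, 4, 5]]
    w, p, r = fit_session_mixture(toy, n_urls=6, n_clusters=2)
    for k in range(2):
        print("cluster", k, "weight", round(w[k], 3), "top URLs", top_urls(p, k))
```

In this sketch, the per-cluster URL probabilities play the role of the "highly relevant" URLs for a cluster: because every session assigned to a cluster contributes to the same distribution, low-traffic queries share evidence with related ones. Running the E-step in log space avoids underflow on long sessions.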