Optimizing Search Engines using Clickthrough Data (2002)

by Thorsten Joachims
Citations: 568 (20 self-citations)
BibTeX

@MISC{Joachims02optimizingsearch,
    author = {Thorsten Joachims},
    title = {Optimizing Search Engines using Clickthrough Data},
    year = {2002}
}

Abstract

This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
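The abstract does not spell out how clicks become training data or how the SVM is optimized, so the following is only a minimal sketch under stated assumptions. It assumes that a clicked result is preferred over every non-clicked result shown above it, and it swaps in plain subgradient descent on a regularized pairwise hinge loss as a simplified stand-in for a full SVM solver. The function names, document ids, and two-dimensional feature vectors are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def preferences_from_clicks(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) document pairs.

    Assumption: a clicked result is preferred over every non-clicked
    result that was presented above it in the ranking.
    """
    pairs = []
    for position, doc in enumerate(ranking):
        if doc in clicked:
            for above in ranking[:position]:
                if above not in clicked:
                    pairs.append((doc, above))
    return pairs


def train_pairwise_ranker(pairs: List[Tuple[Vector, Vector]], dim: int,
                          epochs: int = 200, lr: float = 0.1,
                          reg: float = 0.01) -> List[float]:
    """Learn linear weights w so that w.better exceeds w.worse by a margin.

    Simplified stand-in for an SVM solver: subgradient descent on
    reg/2 * |w|^2 + max(0, 1 - w.(better - worse)) for each pair.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                hinge_grad = -diff[k] if margin < 1.0 else 0.0
                w[k] -= lr * (reg * w[k] + hinge_grad)
    return w


def score(w: Vector, x: Vector) -> float:
    """Retrieval score of a document with feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical log entry: four results were shown, the user clicked d1 and d3.
ranking = ["d1", "d2", "d3", "d4"]
clicked = {"d1", "d3"}
doc_pairs = preferences_from_clicks(ranking, clicked)

# Hypothetical query-document feature vectors.
features: Dict[str, List[float]] = {
    "d1": [1.0, 0.2],
    "d2": [0.1, 0.9],
    "d3": [0.8, 0.1],
    "d4": [0.2, 0.3],
}
feature_pairs = [(features[a], features[b]) for a, b in doc_pairs]
weights = train_pairwise_ranker(feature_pairs, dim=2)
reranked = sorted(ranking, key=lambda d: score(weights, features[d]), reverse=True)
```

Here the only preference extracted is that d3 should rank above the skipped d2; nothing is inferred about d4, which was shown below the last click. Reducing ranking to constraints on feature-vector differences is what lets a linear large-margin learner be reused for the ranking problem.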
