IO-Top-k: Index-access Optimized Top-k Query Processing (2006)

by Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Martin Theobald, Gerhard Weikum
Venue: VLDB
Citations: 43 (3 self)

BibTeX

@INPROCEEDINGS{Bast06io-top-k:index-access,
    author = {Holger Bast and Debapriyo Majumdar and Ralf Schenkel and Martin Theobald and Gerhard Weikum},
    title = {IO-Top-k: Index-access Optimized Top-k Query Processing},
    booktitle = {VLDB},
    year = {2006},
    pages = {475--486}
}


Abstract

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k queries operate on index lists for a query’s elementary conditions and aggregate scores for result candidates. One of the best implementation methods in this setting is the family of threshold algorithms, which aim to terminate the index scans as early as possible based on lower and upper bounds for the final scores of result candidates. This procedure performs sequential disk accesses for sorted index scans, but also has the option of performing random accesses to resolve score uncertainty. This entails scheduling for the two kinds of accesses: 1) the prioritization of different index lists in the sequential accesses, and 2) the decision on when to perform random accesses and for which candidates. The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation. The current paper takes an integrated view of the scheduling issues and develops novel strategies that outperform prior proposals by a large margin. Our main contributions are new, principled scheduling methods based on a Knapsack-related optimization for sequential accesses and a cost model for random accesses. The methods can be further boosted by harnessing probabilistic estimators for scores, selectivities, and index list correlations. In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB), our methods achieved significant performance gains compared to the best previously known methods.
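
For readers unfamiliar with the threshold-algorithm family the abstract refers to, the following is a minimal sketch of the classic baseline variant: round-robin sorted accesses, an immediate random access for every newly seen candidate, and early termination once the k-th best score reaches the upper bound for unseen candidates. It is not the scheduling method proposed in the paper. The function name `threshold_top_k`, the in-memory list representation, summation as the aggregation function, and non-negative scores are all assumptions made for illustration.

```python
import heapq

def threshold_top_k(index_lists, k):
    """Baseline threshold-algorithm sketch (hypothetical helper, not from the paper).

    index_lists: one list per query condition, each holding (doc_id, score)
    pairs sorted by descending score. Scores are assumed non-negative and
    are aggregated by summation. Returns (top-k results, scan depth reached).
    """
    # Dictionaries stand in for random accesses: doc_id -> score in that list.
    lookup = [dict(lst) for lst in index_lists]
    seen = set()
    top = []  # min-heap of (total_score, doc_id) holding at most k entries
    depth = 0
    max_depth = max(len(lst) for lst in index_lists)

    while depth < max_depth:
        # Upper bound on the total score of any document not yet seen.
        threshold = 0.0
        for lst in index_lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen documents score 0 here
            doc, score = lst[depth]  # one sorted (sequential) access
            threshold += score
            if doc in seen:
                continue
            seen.add(doc)
            # Resolve the candidate's full score via random accesses.
            total = sum(table.get(doc, 0.0) for table in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, doc))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, doc))
        depth += 1
        # Stop once no unseen document can beat the current k-th best score.
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(top, key=lambda entry: (-entry[0], entry[1])), depth


if __name__ == "__main__":
    list_a = [("a", 0.75), ("b", 0.5), ("c", 0.125)]
    list_b = [("b", 0.75), ("a", 0.25), ("d", 0.125)]
    print(threshold_top_k([list_a, list_b], k=1))
```

On the two toy lists above, the scan stops after two of the three possible rounds, because the best total found so far already exceeds the sum of the scores at the current scan positions. This sketch treats every list equally and performs a random access for every new candidate; the abstract's two scheduling questions concern exactly those choices, namely which lists to advance and when a random access is worth its cost.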

Keyphrases

index-access optimized top-k query processing, random access, result candidate, sequential access, aggregate score, top-k query, score uncertainty, performance experiment, sensor data, prior proposal, access type, novel strategy, knapsack-related optimization, trec terabyte, index list correlation, implementation method, ranked retrieval, sequential disk access, threshold algorithm, scheduling issue, main contribution, top-k query processing, important building block, cost model, integrated view, network log, different index list, different datasets, index list, query elementary condition, current paper, large margin, probabilistic estimator, prior literature, upper bound, significant performance gain, sorted index scan, data integration, final score
