• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Top-k selection queries over relational databases: Mapping strategies and performance evaluation (0)

by S Chaudhuri N Bruno, Luis Gravano
Venue:ACM TODS
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 54
Next 10 →

Evaluating Top-k Queries over Web-Accessible Databases

by Amélie Marian, Nicolas Bruno, Luis Gravano - ACM TRANS. ON DATABASE SYSTEMS , 2004
"... ... In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but ..."
Abstract - Cited by 172 (11 self) - Add to MetaCart
... In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes sourceaccess parallelism to minimize query response time, while satisfying source-access constraints.

Supporting top-k join queries in relational databases

by Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid - In VLDB , 2003
"... Abstract. Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retri ..."
Abstract - Cited by 94 (13 self) - Add to MetaCart
Abstract. Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance. Keywords: Ranking – Top-k queries – Rank aggregarion – Query operators

Top-k Query Evaluation with Probabilistic Guarantees

by Martin Theobald, Gerhard Weikum, Ralf Schenkel - In VLDB , 2004
"... Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin’s threshold algorithm (TA). Since the user’s goal behind top-k queries is to i ..."
Abstract - Cited by 73 (15 self) - Add to MetaCart
Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin’s threshold algorithm (TA). Since the user’s goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximate variants of TA to reduce run-time costs. This paper introduces a family of approximate top-k algorithms based on probabilistic arguments. When scanning index lists of the underlying multidimensional data space in descending order of local scores, various forms of convolution and derived bounds are employed to predict when it is safe, with high probability, to drop candidate items and to prune the index scans. The precision and the efficiency of the developed methods are experimentally evaluated based on a large Web corpus and a structured data collection.

Automated ranking of database query results

by Surajit Chaudhuri, Gautam Das - In CIDR , 2003
"... We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlatio ..."
Abstract - Cited by 67 (8 self) - Add to MetaCart
We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system. 1.

KLEE: A Framework for Distributed Top-K Query Algorithms

by Sebastian Michel - In VLDB , 2005
"... This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption ..."
Abstract - Cited by 53 (11 self) - Add to MetaCart
This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We present KLEE, a novel algorithmic framework for distributed top-k queries, designed for high performance and flexibility. KLEE makes a strong case for approximate top-k algorithms over widely distributed data sources. It shows how great gains in efficiency can be enjoyed at low result-quality penalties. Further, KLEE affords the query-initiating peer the flexibility to trade-off result quality and expected performance and to trade-off the number of communication phases engaged during query execution versus network bandwidth performance. We have implemented KLEE and related algorithms and conducted a comprehensive performance evaluation. Our evaluation employed real-world and synthetic large, web-data collections, and query benchmarks. Our experimental results show that KLEE can achieve major performance gains in terms of network bandwidth, query response times, and much lighter peer loads, all with small errors in result precision and other result-quality measures.

A Survey of Top-k Query Processing Techniques in Relational Database Systems

by Ihab F. Ilyas, George Beskales, Mohamed A. Soliman
"... Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search and distributed systems has shown a great impact on performance. In this surve ..."
Abstract - Cited by 49 (5 self) - Add to MetaCart
Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search and distributed systems has shown a great impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss different design dimensions in the current techniques including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions. We show the implications of each dimension on the design of the underlying techniques. We also discuss top-k queries in XML domain, and show their connections to relational approaches.

Rank-aware query optimization

by Ihab F. Ilyas, Rahul Shah, Walid G. Aref, Jeffrey Scott, Vitter Ahmed, K. Elmagarmid - In SIGMOD Conference , 2004
"... Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The ..."
Abstract - Cited by 36 (6 self) - Add to MetaCart
Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The new operators have a direct impact on traditional query processing and optimization. We introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting property that triggers the generation of rank-aware query plans. Unlike traditional join operators, optimizing for rank-join operators depends on estimating the input cardinality of these operators. We introduce a probabilistic model for estimating the input cardinality, and hence the cost of a rank-join operator. To our knowledge, this paper is the first effort in estimating the needed input size for optimal rank aggregation algorithms. Costing ranking plans, although challenging, is key to the full integration of rank-join operators in real-world query processing engines. We experimentally evaluate our framework by modifying the query optimizer of an open-source database management system. The experiments show the validity of our framework and the accuracy of the proposed estimation model. 1.

Continuous monitoring of top-k queries over sliding windows

by Kyriakos Mouratidis, Spiridon Bakiras, Dimitris Papadias - In SIGMOD , 2006
"... Given a dataset P and a preference function f, atop-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous longrunning queri ..."
Abstract - Cited by 35 (3 self) - Add to MetaCart
Given a dataset P and a preference function f, atop-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous longrunning queries. This paper studies continuous monitoring of top-k queries over a fixed-size window W of the most recent data. The window size can be expressed either in terms of the number of active tuples or time units. We propose a general methodology for top-k monitoring that restricts processing to the sub-domains of the workspace that influence the result of some query. To cope with high stream rates and provide fast answers in an on-line fashion, the data in W reside in main memory. The valid records are indexed by a grid structure, which also maintains book-keeping information. We present two processing techniques: the first one computes the new answer of a query whenever some of the current top-k points expire; the second one partially precomputes the future changes in the result, achieving better running time at the expense of slightly higher space requirements. We analyze the performance of both algorithms and evaluate their efficiency through extensive experiments. Finally, we extend the proposed framework to other query types and a different data stream model. 1.

Integrating DB and IR technologies: What is the sound of one hand clapping

by Surajit Chaudhuri, Raghu Ramakrishnan, Gerhard Weikum - In CIDR , 2005
"... Databases (DB) and information retrieval (IR) have evolved as separate fields. However, modern applications such as customer support, health care, and digital libraries require capabilities for both data and text management. In such settings, traditional DB queries, in SQL or XQuery, are not flexibl ..."
Abstract - Cited by 29 (0 self) - Add to MetaCart
Databases (DB) and information retrieval (IR) have evolved as separate fields. However, modern applications such as customer support, health care, and digital libraries require capabilities for both data and text management. In such settings, traditional DB queries, in SQL or XQuery, are not flexible enough to handle applicationspecific scoring and ranking. IR systems, on the other hand, lack efficient support for handling structured parts of the data and metadata, and do not give the application developer adequate control over the ranking function. This paper analyzes the requirements of advanced text- and data-rich applications for an integrated platform. The core functionality must be manageable, and the API should be easy to program against. A particularly important issue that we highlight is how to reconcile flexibility in scoring and ranking models with optimizability, in order to accommodate a wide variety of target applications efficiently. We discuss whether such a system needs to be designed from scratch, or can be incrementally built on top of existing architectures. The results of our analyses are cast into a series of challenges to the DB and IR communities.

Efficient maintenance of materialized top-k views

by Ke Yi, Hai Yu, Jun Yang, Gangqiang Xia, Yuguo Chen - In ICDE , 2003
"... We tackle the problem of maintaining materialized top-k views in this paper. Top-k queries, includingMIN andMAX as important special cases, occur frequently in common database workloads. A top-k view can be materialized to improve query performance, but in general it is not self-maintainable unless ..."
Abstract - Cited by 23 (1 self) - Add to MetaCart
We tackle the problem of maintaining materialized top-k views in this paper. Top-k queries, includingMIN andMAX as important special cases, occur frequently in common database workloads. A top-k view can be materialized to improve query performance, but in general it is not self-maintainable unless it contains all tuples in the base table. Deletions and updates on the base table may cause tuples to leave the top-k view, resulting in expensive queries over the base table to “refill ” the view. In this paper, we propose an algorithm that reduces the frequency of refills by maintaining a top-k ′ view instead of a top-k view, where k ′ changes at runtime between k and some kmax ≥ k. We show that in most practical cases, our algorithm can reduce the expected amortized cost of refill queries to O(1) while still keeping the view small. The optimal value of kmax depends on the update pattern and the costs of querying the base table and updating the view. Compared with the simple approach of maintaining either the top-k view itself or a copy of the base table, our algorithm can provide orders-of-magnitude improvements in performance with appropriate kmax values. We show how to choose kmax dynamically to adapt to the actual system workload and performance at runtime, without requiring accurate prior knowledge.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University