Results 1  10
of
52
Probabilistic skylines on uncertain data
 In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Viena
, 2007
"... Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this pap ..."
Abstract

Cited by 103 (19 self)
 Add to MetaCart
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a pskyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottomup algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The topdown algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our two algorithms are efficient on large data sets, and complementary to each other in performance. 1.
Selecting Stars: The k Most Representative Skyline Operator
 In Proc. of the Int. IEEE Conf. on Data Engineering (ICDE
, 2007
"... Skyline computation has many applications including multicriteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic progr ..."
Abstract

Cited by 93 (3 self)
 Add to MetaCart
(Show Context)
Skyline computation has many applications including multicriteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic programming based exact algorithm in a 2dspace. Then, we show that the problem is NPhard when the dimensionality is 3 or more and it can be approximately solved by a polynomial time algorithm with the guaranteed approximation ratio 1 − 1 e. To speedup the computation, an efficient, scalable, indexbased randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable. 1.
Finding kdominant skylines in high dimensional space
 SIGMOD
"... Given a ddimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exists any point that can dominate it. Skyline queries, which return skyline points, are ..."
Abstract

Cited by 76 (9 self)
 Add to MetaCart
(Show Context)
Given a ddimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exists any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision making applications. Unfortunately, as the number of dimensions increases, the chance of one point dominating another point is very low. As such, the number of skyline points become too numerous to offer any interesting insights. To find more important and meaningful skyline points in high dimensional space, we propose a new concept, called kdominant skyline which relaxes the idea of dominance to kdominance. A point p is said to kdominate another point q if there are k ( ≤ d) dimensions in which p is better than or equal to q and is better in at least one of these k dimensions. A point that is not kdominated by any other points is in the kdominant skyline. We prove various properties of kdominant skyline. In particular, because kdominant skyline points are not transitive, existing skyline algorithms cannot be adapted for kdominant skyline. We then present several new algorithms for finding kdominant skyline and its variants. Extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently.
Efficient Skyline Computation over LowCardinality Domains
, 2007
"... Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution (i.e. whether the dataset ..."
Abstract

Cited by 49 (1 self)
 Add to MetaCart
(Show Context)
Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution (i.e. whether the dataset attributes are correlated, independent, or anticorrelated). In this paper, we propose the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from lowcardinality domains. LS continues to apply even if one attribute has high cardinality. Many skyline applications naturally have such data characteristics, and previous skyline methods have not exploited this property. We show that for typical dimensionalities, the complexity of LS is linear in the number of input tuples. Furthermore, we show that the performance of LS is independent of the input data distribution. Finally, we demonstrate through extensive experimentation on both real and synthetic datasets that LS can result in a significant performance advantage over existing techniques.
Efficient skyline query processing on peertopeer networks
 In IEEE International Conference on Data Engineering (ICDE) (2007
, 2007
"... Skyline query has been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peertopeer (P2P) network is still an emerging topic. The desiderata of efficien ..."
Abstract

Cited by 41 (6 self)
 Add to MetaCart
(Show Context)
Skyline query has been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peertopeer (P2P) network is still an emerging topic. The desiderata of efficient skyline querying in P2P environment include: 1) progressive returning of answers, 2) low processing cost in terms of number of peers accessed and search messages, 3) balanced query loads among the peers. In this paper, we propose a solution that satisfies the three desiderata. Our solution is based on a balanced tree structured P2P network. By partitioning the skyline search space adaptively based on query accessing patterns, we are able to alleviate the problem of “hot ” spots present in the skyline query processing. By being able to estimate the peer nodes within the query subspaces, we are able to control the amount of query forwarding, limiting the number of peers involved and the amount of messages transmitted in the network. Load balancing is achieved in query load conscious data space splitting/merging during the joining/departure of nodes and through dynamic load migration. Experiments on real and synthetic datasets confirm the effectiveness and scalability of our algorithm on P2P networks. 1.
Efficient Processing of Topk Dominating Queries on MultiDimensional Data
, 2007
"... The topk dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of topk and sky ..."
Abstract

Cited by 40 (2 self)
 Add to MetaCart
The topk dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of topk and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, topk dominating queries have not received adequate attention from the research community. In this paper, we design specialized algorithms that apply on indexed multidimensional data and fully exploit the characteristics of the problem. Experiments on synthetic datasets demonstrate that our algorithms significantly outperform a previous skylinebased approach, while our results on real datasets show the meaningfulness of topk dominating queries.
Scalable Skyline Computation Using Objectbased Space Partitioning
"... The skyline operator returns from a set of multidimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multiobjective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority o ..."
Abstract

Cited by 26 (3 self)
 Add to MetaCart
(Show Context)
The skyline operator returns from a set of multidimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multiobjective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority of them focuses on minimizing the I/O cost. However, in high dimensional spaces, the problem can easily become CPUbound due to the large number of computations required for comparing objects with current skyline points while scanning the database. Based on this observation, we propose a dynamic indexing technique for skyline points that can be integrated into stateoftheart sortbased skyline algorithms to boost their computational performance. The new indexing and dominance checking approach is supported by a theoretical analysis, while our experiments show that it scales well with the input size and dimensionality not only because unnecessary dominance checks are avoided but also because it allows efficient dominance checking with the help of bitwise operations.
Probabilistic Skyline Operators over Sliding Windows Windows
 ICDE 2009 2009
"... Abstract — Skyline computation has many applications including multicriteria decision making. In this paper, we study the problem of efficient processing of continuous skyline queries over sliding windows on uncertain data elements regarding given probability thresholds. We first characterize what ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
(Show Context)
Abstract — Skyline computation has many applications including multicriteria decision making. In this paper, we study the problem of efficient processing of continuous skyline queries over sliding windows on uncertain data elements regarding given probability thresholds. We first characterize what kind of elements we need to keep in our query computation. Then we show the size of dynamically maintained candidate set and the size of skyline. We develop novel, efficient techniques to process a continuous, probabilistic skyline query. Finally, we extend our techniques to the applications where multiple probability thresholds are given or we want to retrieve “topk ” skyline data objects. Our extensive experiments demonstrate that the proposed techniques are very efficient and handle a highspeed data stream in real time. I.
MultiDimensional Topk Dominating Queries
"... The topk dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of topk and sky ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
The topk dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of topk and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, topk dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of topk dominating queries. First, we propose a set of algorithms that apply on indexed multidimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skylinebased approach. We also illustrate the applicability of this multidimensional analysis query by studying the meaningfulness of its results on real data.
Deltasky: Optimal maintenance of skyline deletions without exclusive dominance region generation
 In UCSB Tech Report, 2006. http://www.cs.ucsb.edu/ ∼ pingwu/ deltasky.pdf
, 2007
"... This paper addresses the problem of efficient maintenance of a materialized skyline view in response to skyline removals. While there has been significant progress on skyline query computation, an equally important but largely unanswered issue is on the incremental maintenance for skyline deletions. ..."
Abstract

Cited by 21 (1 self)
 Add to MetaCart
(Show Context)
This paper addresses the problem of efficient maintenance of a materialized skyline view in response to skyline removals. While there has been significant progress on skyline query computation, an equally important but largely unanswered issue is on the incremental maintenance for skyline deletions. Previous work suggested the use of the so called exclusive dominance region (EDR) to achieve optimal I/O performance for deletion maintenance. However, the shape of an EDR becomes extremely complex in higher dimensions, and algorithms for its computation have not been developed. We derive a systematic way to decompose a ddimensional EDR into a collection of hyperrectangles. We show that the number of such hyperrectangles is O(m d), where m is the current skyline result size. We then propose a novel algorithm DeltaSky which determines whether an intermediate Rtree MBR intersects with the EDR without explicitly calculating the EDR itself. This reduces the worse case complexity of the EDR intersection check from O(m d) to O(md). Thus DeltaSky helps the branch and bound skyline algorithm achieve I/O optimality for deletion maintenance by finding only the newly appeared skyline points after the deletion. We discuss implementation issues and show that DeltaSky can be efficiently implemented using one extra BTree. Moreover, we propose two optimization techniques which further reduce the average cost in practice. Extensive experiments demonstrate that DeltaSky achieves orders of magnitude performance gain over alternative solutions. 1