Results 1-10 of 75
Probabilistic skylines on uncertain data
In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Vienna, 2007
Cited by 103 (19 self)
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains largely an open problem. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object has a probability of being in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our two algorithms are efficient on large data sets and complementary to each other in performance.
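To make the threshold definition in this abstract concrete, the following is a minimal brute-force sketch, not the bottom-up or top-down algorithm of the paper. It assumes that smaller values are better in every dimension, that each uncertain object is a finite list of equally likely instances, and that objects are independent of one another. The function names and the `objects` dictionary layout are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.
    Assumption: smaller values are preferred."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instance_skyline_prob(u: Point, owner: str, objects: Dict[str, List[Point]]) -> float:
    """Probability that instance u is dominated by no other object.
    Assumes independent objects with uniformly weighted instances."""
    prob = 1.0
    for name, instances in objects.items():
        if name == owner:
            continue  # an object never competes with its own instances
        dominating = sum(1 for v in instances if dominates(v, u))
        prob *= 1.0 - dominating / len(instances)
    return prob


def object_skyline_prob(owner: str, objects: Dict[str, List[Point]]) -> float:
    """Average of the instance probabilities of one uncertain object."""
    instances = objects[owner]
    return sum(instance_skyline_prob(u, owner, objects) for u in instances) / len(instances)


def p_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """Names of all objects whose skyline probability is at least p."""
    return [name for name in objects if object_skyline_prob(name, objects) >= p]
```

With the hypothetical input `{"A": [(1, 1), (4, 4)], "B": [(2, 2), (3, 3)]}`, both objects have skyline probability 0.5, so both appear in the 0.5-skyline and neither appears in the 0.6-skyline. This exhaustive version compares every instance with every other instance, which is the cost the pruning algorithms in the abstract aim to avoid.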
Efficient Computation of Reverse Skyline Queries
2007
Cited by 63 (0 self)
In this paper, for the first time, we introduce the concept of Reverse Skyline Queries. First, we consider for a multidimensional data set P the problem of dynamic skyline queries according to a query point q. This kind of dynamic skyline corresponds to the skyline of a transformed data space where point q becomes the origin and all points of P are represented by their distance vector to q. The reverse skyline query returns the objects whose dynamic skyline contains the query object q. In order to compute the reverse skyline of an arbitrary query point, we first propose a Branch and Bound algorithm (called BBRS), which is an improved customization of the original BBS algorithm. Furthermore, we identify a superset of the reverse skyline that is used to bound the search space while computing the reverse skyline. To further reduce the computational cost of determining whether a point belongs to the reverse skyline, we propose an enhanced algorithm (called RSSA) that is based on accurate precomputed approximations of the skylines. These approximations are used to identify whether a point belongs to the reverse skyline or not. Through extensive experiments with both real-world and synthetic datasets, we show that our algorithms can efficiently support reverse skyline queries. Our enhanced approach improves reverse skyline processing by up to an order of magnitude compared to the algorithm without precomputed approximations.
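The two query types defined in this abstract can be illustrated with a brute-force sketch. It is not BBRS or RSSA; it only follows the definitions, using the per-dimension absolute difference as the distance vector and treating smaller distances as better. The function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(a: Point, b: Point, ref: Point) -> bool:
    """a dynamically dominates b with respect to ref: a is at least as
    close to ref in every dimension and strictly closer in at least one."""
    da = [abs(x - r) for x, r in zip(a, ref)]
    db = [abs(x - r) for x, r in zip(b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def dynamic_skyline(points: Sequence[Point], q: Point) -> List[Point]:
    """Points of the data set not dynamically dominated with respect to q."""
    return [p for p in points if not any(dyn_dominates(o, p, q) for o in points)]


def reverse_skyline(points: Sequence[Point], q: Point) -> List[Point]:
    """Points p for which q would belong to the dynamic skyline of p,
    i.e. no other data point dynamically dominates q with respect to p."""
    result = []
    for i, p in enumerate(points):
        others = (o for j, o in enumerate(points) if j != i)
        if not any(dyn_dominates(o, q, p) for o in others):
            result.append(p)
    return result
```

For the hypothetical data set `[(1, 1), (2, 2), (5, 5)]` and query `(3, 3)`, the dynamic skyline is `[(2, 2)]` and the reverse skyline is `[(2, 2), (5, 5)]`. The reverse skyline loop costs quadratic time in the number of points, which motivates the bounding and approximation techniques the abstract describes.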
Distance-based Representative Skyline
Cited by 42 (2 self)
Given an integer k, a representative skyline contains the k skyline points that best describe the tradeoffs among different dimensions offered by the full skyline. Although this topic has been previously studied, the existing solution may sometimes produce k points that appear in an arbitrarily tiny cluster, and therefore fail to be representative. Motivated by this, we propose a new definition of representative skyline that minimizes the distance between a non-representative skyline point and its nearest representative. We also study algorithms for computing distance-based representative skylines. In 2D space, there is a dynamic programming algorithm that guarantees the optimal solution. For dimensionality at least 3, we prove that the problem is NP-hard, and give a 2-approximate polynomial time algorithm. Using a multidimensional access method, our algorithm can directly report the representative skyline, without retrieving the full skyline. We show that our representative skyline not only better captures the contour of the entire skyline than the previous method, but also can be computed much faster.
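The objective in this abstract, minimizing the largest distance from a skyline point to its nearest representative, has the shape of the k-center problem. The sketch below uses the classic farthest-first greedy heuristic, which is known to be a 2-approximation for k-center. Whether it matches the paper's procedure in detail is an assumption, and unlike the index-based method in the abstract it takes the full skyline as input. Function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def representation_error(skyline: Sequence[Point], reps: Sequence[Point]) -> float:
    """Largest distance from any skyline point to its nearest representative."""
    return max(min(math.dist(p, r) for r in reps) for p in skyline)


def greedy_representatives(skyline: Sequence[Point], k: int) -> List[Point]:
    """Farthest-first traversal: start from an arbitrary skyline point, then
    repeatedly add the point farthest from the representatives chosen so far."""
    if not skyline or k <= 0:
        return []
    reps = [skyline[0]]
    while len(reps) < min(k, len(skyline)):
        farthest = max(skyline, key=lambda p: min(math.dist(p, r) for r in reps))
        reps.append(farthest)
    return reps
```

On the hypothetical skyline `[(0, 10), (1, 9), (5, 5), (9, 1), (10, 0)]` with k = 2, the sketch selects the two extreme points `(0, 10)` and `(10, 0)`, leaving `(5, 5)` as the worst-represented point at distance sqrt(50). The spread-out choice is exactly what avoids the "tiny cluster" failure the abstract mentions.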
VoR-Tree: R-trees with Voronoi Diagrams for Efficient Processing of Spatial Nearest Neighbor Queries
Cited by 25 (2 self)
A very important class of spatial queries consists of the nearest-neighbor (NN) query and its variations. Many studies in the past decade utilize R-trees as their underlying index structures to address NN queries efficiently. The general approach is to use the R-tree in two phases. First, the R-tree’s hierarchical structure is used to quickly arrive at the neighborhood of the result set. Second, the R-tree nodes intersecting with the local neighborhood (Search Region) of an initial answer are investigated to find all the members of the result set. While R-trees are very efficient for the first phase, they usually result in the unnecessary investigation of many nodes in which none, or only a small subset, of the contained points belong to the actual result set. On the other hand, several recent studies showed that the
Efficient Evaluation of Probabilistic Advanced Spatial Queries on Existentially Uncertain Data
Cited by 19 (2 self)
We study the problem of answering spatial queries in databases where objects exist with some uncertainty and are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that satisfy the spatial predicates with a probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities of satisfying the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors, and conduct an extensive experimental study, which evaluates the effectiveness of the proposed solutions. Index Terms: H.2.4.h Query processing, H.2.4.k Spatial databases
Distributed Skyline Retrieval with Low Bandwidth Consumption
Cited by 13 (0 self)
We consider skyline computation when the underlying dataset is horizontally partitioned onto geographically distant servers that are connected to the Internet. The existing solutions are not suitable for our problem, because they have at least one of the following drawbacks: (i) applicable only to distributed systems adopting vertical partitioning or restricted horizontal partitioning, (ii) effective only when each server has limited computing and communication abilities, and (iii) optimized only for skyline search in subspaces but inefficient in the full space. This paper proposes an algorithm, called feedback-based distributed skyline (FDS), to support arbitrary horizontal partitioning. FDS aims at minimizing the network bandwidth, measured in the number of tuples transmitted over the network. The core of FDS is a novel feedback-driven mechanism, where the coordinator iteratively transmits certain feedback to each participant. Participants can leverage such information to prune a large amount of local data, which otherwise would need to be sent to the coordinator. Extensive experimentation confirms that FDS significantly outperforms alternative approaches in both effectiveness and progressiveness.
Kernel-Based Skyline Cardinality Estimation
Cited by 13 (1 self)
The skyline of a d-dimensional dataset consists of all points not dominated by others. The incorporation of the skyline operator into practical database systems necessitates an efficient and effective cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state-of-the-art Log Sampling (LS) technique simply applies theoretical results for independent dimensions to non-independent data anyway, sometimes leading to large estimation errors. To solve this problem, we propose a novel Kernel-Based (KB) approach that approximates the skyline cardinality with nonparametric methods. Extensive experiments with various real datasets demonstrate that KB achieves high accuracy, even in cases where LS fails. At the same time, despite its numerical nature, the efficiency of KB is comparable to that of LS. Furthermore, we extend both LS and KB to the k-dominant skyline, which is commonly used instead of the conventional skyline for high-dimensional data.
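The quantity that LS and KB estimate is the size of the skyline defined in the first sentence of this abstract. As a point of reference, the sketch below computes that cardinality exactly by exhaustive comparison. It implements neither LS nor KB, and it assumes that smaller values are preferred in every dimension. Function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.
    Assumption: smaller values are preferred."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """All points that no other point dominates (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_cardinality(points: Sequence[Point]) -> int:
    """Exact skyline size, the value a cardinality estimator approximates."""
    return len(skyline(points))
```

For the hypothetical input `[(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]` the skyline is `[(1, 4), (2, 2), (4, 1)]`, so the cardinality is 3. An estimator is useful precisely because a query optimizer cannot afford this exact computation at planning time.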
Skyline query processing for incomplete data
In Proc. 24th Int. Conf. on Data Engineering, 2008
Cited by 12 (0 self)
Recently, there has been much interest in processing skyline queries for various applications that include decision making, personalized services, and search pruning. Skyline queries aim to prune a search space of large numbers of multidimensional data items to a small set of interesting items by eliminating items that are dominated by others. Existing skyline algorithms assume that all dimensions are available for all data items. This paper goes beyond this restrictive assumption as we address the more practical case of incomplete data items (i.e., data items missing values in some of their dimensions). In contrast to the case of complete data, where the dominance relation is transitive, incomplete data suffer from a non-transitive dominance relation, which may lead to cyclic dominance behavior. We first propose two algorithms, namely “Replacement” and “Bucket”, that use traditional skyline algorithms for incomplete data. Then, we propose the “ISkyline” algorithm, which is designed specifically for the case of incomplete data. The “ISkyline” algorithm employs two optimization techniques, namely virtual points and shadow skylines, to tolerate cyclic dominance relations. Experimental evidence shows that the “ISkyline” algorithm significantly outperforms variations of traditional skyline algorithms.
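The cyclic dominance mentioned in this abstract can be reproduced in a few lines. The sketch assumes that missing values are encoded as `None`, that two items are compared only on the dimensions both of them have, and that smaller values are preferred. It illustrates the problem only and is not the Replacement, Bucket, or ISkyline algorithm.

```python
from typing import Optional, Tuple

Item = Tuple[Optional[float], ...]


def dominates_incomplete(a: Item, b: Item) -> bool:
    """a dominates b on their commonly known dimensions: no worse in all of
    them and strictly better in at least one. Assumption: smaller is better
    and None marks a missing value."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return (
        bool(common)
        and all(x <= y for x, y in common)
        and any(x < y for x, y in common)
    )


# Three hypothetical items, each missing a different dimension.
a: Item = (1, None, 10)
b: Item = (3, 2, None)
c: Item = (None, 5, 3)
```

Here `a` dominates `b` on the first dimension, `b` dominates `c` on the second, and `c` dominates `a` on the third. Every item is dominated by another one, so a skyline algorithm that relies on transitivity to discard dominated items early would return nothing or an order-dependent result.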
Dynamic Skyline Queries in Metric Spaces
Cited by 12 (1 self)
The skyline query is of great importance in many applications, such as multi-criteria decision making and business planning. In particular, a skyline point is a data object in the database whose attribute vector is not dominated by that of any other object. Previous methods to retrieve skyline points usually assume static data objects in the database (i.e., their attribute vectors are fixed), whereas several recent works focus on skyline queries with dynamic attributes. In this paper, we propose a novel variant of skyline queries, namely the metric skyline, whose dynamic attributes are defined in a metric space (i.e., not limited to the Euclidean space). We illustrate an efficient and effective pruning mechanism to answer metric skyline queries through a metric index. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed pruning techniques over the metric index in answering metric skyline queries.
Processing spatial skyline queries in both vector spaces and spatial network databases
TODS
Cited by 12 (1 self)
In this article, we first introduce the concept of Spatial Skyline Queries (SSQ). Given a set of data points P and a set of query points Q, each data point has a number of derived spatial attributes each of which is the point’s distance to a query point. An SSQ retrieves those points of P which are not dominated by any other point in P considering their derived spatial attributes. The main difference with the regular skyline query is that this spatial domination depends on the location of the query points Q. SSQ has application in several domains such as emergency response and online maps. The main intuition and novelty behind our approaches is that we exploit the geometric properties of the SSQ problem space to avoid the exhaustive examination of all the point pairs in P and Q. Consequently, we reduce the complexity of SSQ search from O(|P|^2 |Q|) to O(|S|^2 |C| + √|P|), where |S| and |C| are the solution size and the number of vertices of the convex hull of Q, respectively. Considering Euclidean distance, we propose two algorithms, B2S2 and VS2, for static query points and one algorithm, VCS2, for streaming Q whose points change location over time (e.g., are mobile). VCS2 exploits the pattern of change in Q to avoid unnecessary recomputation of the skyline and hence efficiently perform updates. We also propose two algorithms, SNS2 and VSNS2, that compute the spatial skyline with respect to the network distance in a spatial network database. Our