Probabilistic skylines on uncertain data
In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Vienna, 2007
Cited by 103 (19 self)
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our two algorithms are efficient on large data sets, and complementary to each other in performance.
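To make the p-skyline definition concrete, here is a minimal brute-force sketch in Python. It is not the bottom-up or top-down algorithm from the abstract. It assumes that each uncertain object is a list of equally likely instances, that objects are independent, and that smaller values are better on every dimension. The names `dominates`, `skyline_probability`, and `p_skyline`, and the toy data, are hypothetical.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse on every dimension, strictly better on one.

    Assumption: smaller values are better.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(name: str, objects: Dict[str, List[Point]]) -> float:
    """Probability that the named object is in the skyline.

    Assumption: instances of an object are equally likely and objects are
    independent, so an instance survives another object with probability
    1 - (fraction of that object's instances that dominate it).
    """
    instances = objects[name]
    total = 0.0
    for u in instances:
        prob = 1.0
        for other, other_instances in objects.items():
            if other == name:
                continue
            dominating = sum(dominates(v, u) for v in other_instances)
            prob *= 1.0 - dominating / len(other_instances)
        total += prob
    return total / len(instances)


def p_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """All objects whose skyline probability is at least p."""
    return [name for name in objects if skyline_probability(name, objects) >= p]


# Toy data: three uncertain objects with two instances each.
objects = {
    "A": [(1, 4), (2, 2)],
    "B": [(3, 3), (1, 5)],
    "C": [(4, 4), (5, 5)],
}

print(p_skyline(objects, 0.5))
```

On this toy input, every instance of B is dominated by exactly one of A's two instances, so B sits right at the 0.5 threshold, while both instances of C are dominated by every instance of A. The brute-force cost is quadratic in the total number of instances, which is the cost that pruning is meant to avoid.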
Scalable Skyline Computation Using Object-based Space Partitioning
Cited by 26 (3 self)
The skyline operator returns from a set of multidimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multi-objective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority of them focus on minimizing the I/O cost. However, in high-dimensional spaces, the problem can easily become CPU-bound due to the large number of computations required for comparing objects with current skyline points while scanning the database. Based on this observation, we propose a dynamic indexing technique for skyline points that can be integrated into state-of-the-art sort-based skyline algorithms to boost their computational performance. The new indexing and dominance checking approach is supported by a theoretical analysis, while our experiments show that it scales well with the input size and dimensionality, not only because unnecessary dominance checks are avoided but also because it allows efficient dominance checking with the help of bitwise operations.
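As a rough illustration of the sort-based scan that the abstract builds on, the sketch below sorts points by the sum of their coordinates and compares each point only against the skyline points found so far. It does not implement the indexing technique or the bitwise dominance checks described above. It assumes smaller values are better; the function names and the (price, distance) data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sort_based_skyline(points: List[Point]) -> Tuple[List[Point], int]:
    """Return the skyline and the number of dominance checks performed.

    Sorting by coordinate sum guarantees that a point can only be dominated
    by points that appear earlier, so one pass over the sorted list suffices.
    """
    ordered = sorted(points, key=sum)
    skyline: List[Point] = []
    checks = 0
    for p in ordered:
        is_dominated = False
        for s in skyline:
            checks += 1
            if dominates(s, p):
                is_dominated = True
                break
        if not is_dominated:
            skyline.append(p)
    return skyline, checks


# Hypothetical hotels as (price, distance to beach).
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 4), (55, 9)]
skyline, checks = sort_based_skyline(hotels)
print(skyline, checks)
```

The inner loop over `skyline` is the part that becomes CPU-bound in high dimensions, since the skyline itself can grow large. The returned `checks` counter makes that cost visible.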
On Nonmetric Similarity Search Problems in Complex Domains
2010
Cited by 10 (3 self)
The task of similarity search is widely used in various areas of computing, including multimedia databases, data mining, bioinformatics, social networks, etc. In fact, retrieval of semantically unstructured data entities requires a form of aggregated qualification that selects entities relevant to a query. A popular type of such a mechanism is similarity querying. For a long time, the database-oriented applications of similarity search employed the definition of similarity restricted to metric distances. Due to its topological properties, metric similarity can be effectively used to index a database, which can then be queried efficiently by so-called metric access methods. However, together with the increasing complexity of data entities across various domains, in recent years there appeared many similarities that were not metrics; we call them nonmetric similarity functions. In this paper we survey domains employing nonmetric functions for effective similarity search, and methods for efficient nonmetric similarity search. First, we show that the ongoing research in many of these domains requires complex representations of data entities. Simultaneously, such complex representations also allow us to model complex and computationally expensive similarity functions (often represented by various matching algorithms). However, the more complex the similarity function one develops, the more likely it is to be nonmetric. Second, we review the state-of-the-art techniques for efficient (fast) nonmetric similarity search, concerning both exact and approximate search. Finally, we discuss some open problems and possible future research trends.
Preference Query Evaluation over Expensive Attributes
In CIKM, 2010
Cited by 7 (1 self)
Most database systems allow query processing over attributes that are derived at query runtime (e.g., user-defined functions and remote data calls to web services), making them expensive to compute relative to relational data stored in a heap or index. In addition, core support for efficient preference query processing has become an important objective in database systems. This paper addresses an important problem at the intersection of these two query processing objectives: efficient preference query evaluation involving expensive attributes. We explore an efficient framework for processing skyline and multi-objective queries in a database when the data involves a mix of “cheap” and “expensive” attributes. Our solution involves a three-phase approach that evaluates a correct final preference answer while aiming to minimize the number of expensive attribute computations. Unlike previous works for distributed preference algorithms that assume sorted access over each attribute, our framework assumes expensive attribute requests are stateless, i.e., know nothing about previous requests. Thus, the proposed approach is more in line with realistic system architectures. Our framework is implemented inside the query processor of PostgreSQL, and evaluated over both synthetic and real data sets involving computation of expensive attributes over real web-service data (e.g., Microsoft MapPoint).
Efficient Skyline Evaluation over Partially Ordered Domains
2010
Cited by 4 (0 self)
Although there has been a considerable body of work on skyline evaluation in multidimensional data with totally ordered attribute domains, there are only a few methods that consider attributes with partially ordered domains. Existing work maps each partially ordered domain to a total order and then adapts algorithms for totally ordered domains to solve the problem. Nevertheless, these methods either use stronger notions of dominance, which generate false positives, or require expensive dominance checks. In this paper, we propose two new methods, which do not have these drawbacks. The first method uses an appropriate mapping of a partial order to a total order, inspired by the lattice theorem and an off-the-shelf skyline algorithm. The second technique uses an appropriate storage and indexing approach, inspired by column stores, which enables efficient verification of whether a pair of objects are incompatible. We demonstrate that both our methods are up to an order of magnitude more efficient than previous work and scale well with different problem parameters, such as complexity of partial orders.
Dynamic Skyline Queries in Large Graphs
Cited by 3 (0 self)
Given a set of query points, a dynamic skyline query reports all data points that are not dominated by other data points according to the distances between data points and query points. In this paper, we study dynamic skyline queries in a large graph (DSG-query for short). Although dynamic skylines have been studied in Euclidean space [16], road networks [6], and metric space [4, 7], there is no previous work on dynamic skylines over large graphs. We employ a filter-and-refine framework to speed up query processing so that a DSG-query can be answered efficiently. We propose a novel pruning rule based on graph properties to derive the candidates for a DSG-query, which are guaranteed not to introduce false negatives. In the refinement step, with a carefully designed index structure, we compute short path distances between vertices in O(H), where H is the number of maximal hops between any two vertices. Extensive experiments demonstrate that our methods outperform existing algorithms by orders of magnitude.
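The query semantics can be illustrated with a naive baseline in Python. This is not the filter-and-refine method from the abstract. It assumes an unweighted, undirected graph stored as an adjacency dictionary, uses breadth-first search hop counts as the distance, and treats smaller distances as better. The function names and the small graph are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no farther from any query vertex, strictly closer to one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def bfs_distances(graph: Dict[Hashable, List[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def dynamic_skyline(graph, data_vertices, query_vertices):
    """Data vertices whose distance vectors are not dominated by another data vertex."""
    per_query = [bfs_distances(graph, q) for q in query_vertices]
    inf = float("inf")  # unreachable vertices are infinitely far away
    vectors = {v: tuple(d.get(v, inf) for d in per_query) for v in data_vertices}
    return [
        v
        for v in data_vertices
        if not any(dominates(vectors[w], vectors[v]) for w in data_vertices if w != v)
    ]


# Hypothetical graph: q1 - a - b - c - q2 - d
graph = {
    "q1": ["a"],
    "a": ["q1", "b"],
    "b": ["a", "c"],
    "c": ["b", "q2"],
    "q2": ["c", "d"],
    "d": ["q2"],
}
result = dynamic_skyline(graph, ["a", "b", "c", "d"], ["q1", "q2"])
print(result)
```

Here vertex d is dropped because c is equally close to q2 and closer to q1. The baseline runs one full traversal per query vertex and compares all pairs of data vertices, which is the kind of work a pruning rule and a distance index aim to cut down.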
Route Skyline Queries: A Multi-Preference Path Planning Approach
Cited by 3 (1 self)
In recent years, the research community introduced various methods for processing skyline queries in multidimensional databases. The skyline operator retrieves all objects being optimal w.r.t. an arbitrary linear weighting of the underlying criteria. The most prominent example query is to find a reasonable set of hotels which are cheap but close to the beach. In this paper, we propose a new approach for computing skylines on routes (paths) in a road network considering multiple preferences like distance, driving time, the number of traffic lights, gas consumption, etc. Since the consideration of different preferences usually involves different routes, a skyline-fashioned answer with relevant route candidates is highly useful. In our work, we employ graph embedding techniques to enable a best-first based graph exploration considering route preferences based on arbitrary road attributes. The core of our skyline query processor is a route iterator which iteratively computes the top routes according to (at least one) preference in an efficient way, so that route computations need not be issued from scratch in each iteration. Furthermore, we propose pruning techniques in order to reduce the search space. Our pruning strategies aim at pruning as many route candidates as possible during the graph exploration. Therefore, we are able to prune candidates which are only partially explored. Finally, we show in our experimental evaluation that our approach is able to reduce the search space significantly and that the skyline can be computed efficiently.
Metric-Based Top-k Dominating Queries
Cited by 1 (1 self)
Top-k dominating queries combine the natural idea of selecting the k best items with a comprehensive "goodness" criterion based on dominance. A point p1 dominates p2 if p1 is as good as p2 in all attributes and is strictly better in at least one. Existing works address the problem in settings where data objects are multidimensional points. However, there are domains where we only have access to the distance between two objects. In cases like these, attributes reflect distances from a set of input objects and are dynamically generated as the input objects change. Consequently, prior works from the literature cannot be applied, despite the fact that the dominance relation is still meaningful and valid. For this reason, in this work, we present the first study for processing top-k dominating queries over distance-based dynamic attribute vectors, defined over a metric space. We propose four progressive algorithms that utilize the properties of the underlying metric space to efficiently solve the problem, and present an extensive, comparative evaluation on both synthetic and real-world data sets.
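The dominance definition and the distance-based attribute vectors can be sketched with a brute-force scorer in Python. It is not one of the four progressive algorithms mentioned above. It assumes that an object's score is the number of other objects it dominates, that smaller distances are better, and that ties keep input order. The names `top_k_dominating` and `distance`, and the numeric data, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: as good in all attributes, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def top_k_dominating(
    objects: List[T],
    queries: List[T],
    distance: Callable[[T, T], float],
    k: int,
) -> List[Tuple[T, int]]:
    """Return the k objects that dominate the most other objects, with their scores.

    Each object's attribute vector is generated on the fly as its distances
    to the query objects, so only the distance function is needed.
    """
    vectors = [tuple(distance(o, q) for q in queries) for o in objects]
    scored = []
    for i, obj in enumerate(objects):
        score = sum(
            dominates(vectors[i], vectors[j]) for j in range(len(objects)) if j != i
        )
        scored.append((obj, score))
    # Python's sort is stable, so equal scores keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 1-D objects with absolute difference as the metric.
objects = [1, 4, 5, 9, 12]
queries = [3, 6]
best = top_k_dominating(objects, queries, lambda a, b: abs(a - b), k=2)
print(best)
```

Because the vectors depend on `queries`, changing the query objects changes every score, which is why methods that precompute an index over fixed coordinates do not carry over directly.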
A Framework for Ranking and KNN Queries in a Probabilistic Skyline Model
Skyline computation has gained a lot of attention in recent years. According to the definition of skyline, objects that belong to the skyline cannot be ranked among themselves because they are incomparable. This constraint limits the application of skyline. Fortunately, due to the recently proposed probabilistic skyline model, skyline objects which contain multiple elements can now be compared with each other. Different from the traditional skyline model where each object can either be a skyline object or not, in the probabilistic skyline model, each object is assigned a skyline probability to denote its likelihood of being a skyline object. Under this model, two simple but important questions will naturally be asked: (1) Given an object, which of the objects are the K nearest neighbors to it based on their skyline probabilities? (2) Given an object, what is the ranking of the objects which have skyline probabilities greater than the given object? To the best of our knowledge, no existing work can effectively answer these two questions. Yet, answering them is not trivial. For a medium-size dataset (e.g., 10,000 objects), it may take more than an hour to compute the skyline probabilities of all objects. In this paper, we propose a novel framework for answering the above two questions on the fly efficiently. Our proposed work is based on the idea of a bounding-pruning-refining strategy. We first compute the skyline probabilities of the target object and
Subspace Global Skyline Query Processing
Global skyline, as an important variant of skyline, has been widely applied in multiple criteria decision making, business planning and data mining, while there are no previous studies on the global skyline query in the subspace. Hence in this paper we propose the subspace global skyline (SGS) query, which is concerned with the global skyline in an ad hoc subspace. Firstly, we propose an appropriate index structure, the RB-tree, to rapidly find the initial scan positions of a query. Secondly, by analyzing the basic properties of SGS, we propose a single SGS algorithm based on the RB-tree (SSRB) to compute SGS points. Then an optimized single SGS algorithm based on the RB-tree (OSSRB) is proposed, which can reduce the scan space and improve the computation efficiency in contrast to SSRB. Next, by sharing the scan space of different queries, a multiple SGS algorithm based on the RB-tree (MSRB) is proposed to compute multiple SGS (MSGS). Finally, the performance of our proposed algorithms is verified through a large number of simulation experiments.