Results 11 – 20 of 76
Toward Context- and Preference-Aware Location-based Services
, 2009
Abstract

Cited by 14 (1 self)
The explosive growth of location-detection devices, wireless communications, and mobile databases has resulted in the realization of location-based services as commercial products and research prototypes. Unfortunately, current location-based applications (e.g., store finders) are rigid, as they are completely isolated from any notion of user “preferences” or “context”. Such rigidness results in unsuitable services (e.g., a vegetarian user may be directed to a restaurant with a non-vegetarian menu). In this paper, we introduce the system architecture of a Context- and Preference-Aware Location-based Database Server (CareDB, for short), currently under development at the University of Minnesota, that delivers personalized services to its customers based on the surrounding context. CareDB goes beyond the traditional “one size fits all” scheme of existing location-aware database systems. Instead, CareDB tailors its functionality and services to the preferences and context of each customer. Examples of services provided by CareDB include a restaurant finder application in which CareDB does not base its choice of restaurants solely on the user's location; instead, it considers both the user's location and the surrounding context (e.g., dietary restrictions, user preferences, and road traffic conditions). Within the framework of CareDB, we discuss research challenges and directions toward an efficient and practical realization of context-aware location-based query processing. Namely, we discuss the challenges of designing user profiles, multi-objective query processing, context-aware query optimizers, context-aware query operators, and continuous queries.
Distributed Skyline Retrieval with Low Bandwidth Consumption
Abstract

Cited by 13 (0 self)
We consider skyline computation when the underlying dataset is horizontally partitioned across geographically distant servers connected to the Internet. Existing solutions are not suitable for this problem because they have at least one of the following drawbacks: (i) they are applicable only to distributed systems adopting vertical partitioning or restricted horizontal partitioning, (ii) they are effective only when each server has limited computing and communication abilities, and (iii) they are optimized only for skyline search in subspaces but inefficient in the full space. This paper proposes an algorithm, called feedback-based distributed skyline (FDS), to support arbitrary horizontal partitioning. FDS aims at minimizing the network bandwidth, measured as the number of tuples transmitted over the network. The core of FDS is a novel feedback-driven mechanism in which the coordinator iteratively transmits feedback to each participant. Participants can leverage this information to prune a large amount of local data that would otherwise need to be sent to the coordinator. Extensive experimentation confirms that FDS significantly outperforms alternative approaches in both effectiveness and progressiveness.
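For readers unfamiliar with the operator, the (centralized) skyline that FDS computes in a distributed setting can be sketched in a few lines; the smaller-is-better convention and the function names below are illustrative choices, not taken from the paper:

```python
def dominates(a, b):
    """a dominates b: no worse in every dimension and strictly better
    in at least one (smaller-is-better convention)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the points dominated by no other point.
    Note dominates(p, p) is False, so a point never excludes itself."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 4), (2, 2), (3, 1), (3, 3), (4, 4)]
print(skyline(pts))  # [(1, 4), (2, 2), (3, 1)]
```

FDS's contribution is computing this same result when `points` is split across distant servers, while minimizing the tuples shipped over the network.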
On domination game analysis for microeconomic data mining
 TKDD
Abstract

Cited by 11 (3 self)
Game theory is a powerful tool for analyzing competition among manufacturers in a market. In this paper, we present a study on combining game theory and data mining by introducing the concept of domination game analysis. We present a multidimensional market model in which every dimension represents one attribute of a commodity. Every product or customer is represented by a point in the multidimensional space, and a product is said to “dominate” a customer if all of its attributes satisfy the requirements of that customer. The expected market share of a product is measured by the expected number of buyers among the customers, each of whom is equally likely to buy any product dominating him. A Nash equilibrium is a configuration of the products that achieves stable expected market shares for all products. We prove that a Nash equilibrium in such a model can be computed in polynomial time if every manufacturer modifies its product in a round-robin manner. To further improve the efficiency of the computation, we also design two algorithms that allow manufacturers to efficiently find their best response to the other products in the market.
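The expected-market-share measure described in the abstract can be sketched directly; the `satisfies` predicate below (every product attribute meets or exceeds the customer's requirement) is an illustrative reading of the model, not the paper's formal definition:

```python
def expected_market_shares(products, customers):
    """Expected market shares under the model sketched in the abstract:
    each customer buys, uniformly at random, one of the products that
    dominate (satisfy) him, so a product earns 1/d from every customer
    dominated by d products in total."""
    def satisfies(prod, cust):
        # illustrative convention: a product dominates a customer when every
        # attribute value meets or exceeds the customer's requirement
        return all(p >= c for p, c in zip(prod, cust))

    shares = [0.0] * len(products)
    for cust in customers:
        dominating = [i for i, prod in enumerate(products) if satisfies(prod, cust)]
        for i in dominating:
            shares[i] += 1.0 / len(dominating)
    return shares

prods = [(3, 3), (5, 1)]
custs = [(2, 2), (4, 1), (1, 1)]
print(expected_market_shares(prods, custs))  # [1.5, 1.5]
```

In the paper's game, each manufacturer repeatedly adjusts its product's attributes to maximize this share against the other products, until no manufacturer can improve (a Nash equilibrium).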
Minimizing the communication cost for continuous skyline maintenance
 In SIGMOD Conference
, 2009
Abstract

Cited by 11 (2 self)
Existing work in the skyline literature focuses on optimizing the processing cost. This paper aims at minimizing the communication overhead in client-server architectures, where a server continuously maintains the skyline of dynamic objects. Our first contribution is a Filter method that avoids transmission of updates from objects that cannot influence the skyline. Specifically, each object is assigned a filter, so that it needs to issue an update only if it violates its filter. Filter achieves significant savings over the naive approach of transmitting all updates. Going one step further, we introduce the concept of the frequent skyline query over a sliding window (FSQW). The motivation is that snapshot skylines are not very useful in streaming environments because they keep changing over time. Instead, FSQW reports the objects that appear in the skylines of at least θ·s of the s most recent timestamps (0 < θ ≤ 1). Filter can easily be adapted to FSQW processing, but with potentially high overhead for large and frequently updated datasets. To further reduce the communication cost, we propose a Sampling method that returns approximate FSQW results without computing each snapshot skyline. Finally, we integrate Filter and Sampling in a Hybrid approach that combines their individual advantages.
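The FSQW definition lends itself to a direct sketch once the snapshot skylines are known; this naive counter illustrates only the query semantics, not the paper's Filter, Sampling, or Hybrid methods:

```python
from collections import Counter

def fsqw(snapshot_skylines, theta):
    """FSQW by definition: given the skylines of the s most recent
    timestamps, report the objects appearing in at least theta * s of them."""
    s = len(snapshot_skylines)
    counts = Counter(obj for sky in snapshot_skylines for obj in sky)
    return {obj for obj, c in counts.items() if c >= theta * s}

# s = 4 snapshot skylines; 'a' appears in 4, 'b' in 2, 'c' in 1
snapshots = [{'a', 'b'}, {'a'}, {'a', 'b', 'c'}, {'a'}]
print(fsqw(snapshots, 0.5))  # {'a', 'b'}
```

The paper's point is that materializing every snapshot skyline is exactly what a bandwidth-conscious server wants to avoid, hence the Sampling approximation.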
Online interval skyline queries on time series
 In Proceedings of the 25th International Conference on Data Engineering (ICDE’09)
, 2009
Abstract

Cited by 10 (3 self)
In many applications, we need to analyze a large number of time series. Segments of time series demonstrating dominating advantages over others are often of particular interest. In this paper, we advocate interval skyline queries, a novel type of time series analysis query. For a set of time series and a given time interval [i : j], an interval skyline query returns the time series that are not dominated by any other time series in the interval. We illustrate the usefulness of interval skyline queries in applications. Moreover, we develop an on-the-fly method and a view-materialization method to answer interval skyline queries on time series online. The on-the-fly method keeps the minimum and maximum values of the time series using radix priority search trees and sketches, and computes the skyline at query time. The view-materialization method maintains the skylines over all intervals in a compact data structure. Through theoretical analysis and extensive experiments, we show that both methods require only linear space and are efficient in query answering as well as in incremental maintenance.
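The interval skyline semantics can be sketched naively (quadratic scan, larger-is-better convention assumed; the names are illustrative); the paper's on-the-fly and view-materialization methods answer the same query far more efficiently:

```python
def dominates_in(x, y, i, j):
    """x dominates y over [i : j]: at least as large at every timestamp and
    strictly larger at least once (larger-is-better convention)."""
    sx, sy = x[i:j + 1], y[i:j + 1]
    return all(a >= b for a, b in zip(sx, sy)) and any(a > b for a, b in zip(sx, sy))

def interval_skyline(series, i, j):
    """Naive interval skyline over {name: list of values}: keep the series
    not dominated by any other series within the interval."""
    return {name for name in series
            if not any(dominates_in(series[other], series[name], i, j)
                       for other in series)}

ts = {'s1': [5, 6, 7, 8], 's2': [4, 5, 6, 7], 's3': [9, 1, 9, 1]}
print(interval_skyline(ts, 0, 3))  # {'s1', 's3'}: s2 is dominated by s1
```

The min/max summaries mentioned in the abstract prune such comparisons early: if one series' minimum over the interval exceeds another's maximum, dominance is decided without scanning the timestamps.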
A Unified Approach for Computing Top-k Pairs in Multidimensional Space
Abstract

Cited by 10 (5 self)
Top-k pairs queries have many real applications. k closest pairs queries, k furthest pairs queries, and their bichromatic variants are some examples of top-k pairs queries that rank the pairs on distance functions. While these queries have received significant research attention, there is no unified approach that can efficiently answer all of them. Moreover, no existing work supports top-k pairs queries based on generic scoring functions. In this paper, we present a unified approach that supports a broad class of top-k pairs queries, including the queries mentioned above. Our proposed approach allows users to define a local scoring function for each attribute involved in the query and a global scoring function that computes the final score of each pair by combining its scores on different attributes. We propose efficient internal- and external-memory algorithms, and our theoretical analysis shows that the expected performance of the algorithms is optimal when two or fewer attributes are involved. Our approach does not require any pre-built indexes, is easy to implement, and has low memory requirements. We conduct extensive experiments to demonstrate the efficiency of our proposed approach.
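A brute-force sketch of the generic top-k pairs query with user-defined local and global scoring functions (the function names are illustrative; the paper's algorithms avoid enumerating all pairs):

```python
import heapq
from itertools import combinations

def top_k_pairs(points, k, local_fns, global_fn):
    """Brute-force top-k pairs: score every pair attribute by attribute with
    the local scoring functions, combine the scores with the global function,
    and keep the k pairs with the smallest final scores."""
    scored = []
    for a, b in combinations(points, 2):
        local = [f(a[d], b[d]) for d, f in enumerate(local_fns)]
        scored.append((global_fn(local), (a, b)))
    return heapq.nsmallest(k, scored)

# local score = per-attribute absolute difference, global = sum, which turns
# the generic query into a k-closest-pairs query under Manhattan distance
pts = [(0, 0), (1, 1), (5, 5)]
result = top_k_pairs(pts, 1, [lambda u, v: abs(u - v)] * 2, sum)
print(result)  # [(2, ((0, 0), (1, 1)))]
```

Swapping `sum` for `max`, or negating the local functions, yields other members of the query class (e.g., furthest pairs) without changing the framework.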
(Approximate) uncertain skylines
 In ICDT
, 2011
Abstract

Cited by 8 (1 self)
Given a set of points with uncertain locations, we consider the problem of computing the probability of each point lying on the skyline, that is, the probability that it is not dominated by any other input point. If each point’s uncertainty is described as a probability distribution over a discrete set of locations, we improve the best known exact solution. We also suggest why we believe our solution might be optimal. Next, we describe simple, near-linear time approximation algorithms for computing the probability of each point lying on the skyline. In addition, some of our methods can be adapted to construct data structures that can efficiently determine the probability of a query point lying on the skyline.
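Assuming the points' distributions are independent, the exact skyline probabilities admit a brute-force computation straight from the definition; this is the kind of baseline the paper improves upon (names and conventions below are illustrative):

```python
def dominates(a, b):
    """Smaller-is-better dominance between two concrete locations."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probabilities(points):
    """Exact skyline probabilities for independent uncertain points, each
    given as a list of (location, probability) pairs over a discrete support:
    P(p on skyline) = sum over locations l of
        P(p at l) * prod over q != p of P(q is not at a location dominating l)."""
    result = []
    for i, p in enumerate(points):
        total = 0.0
        for loc, pr in p:
            not_dominated = 1.0
            for j, q in enumerate(points):
                if j != i:
                    not_dominated *= 1.0 - sum(pq for m, pq in q if dominates(m, loc))
            total += pr * not_dominated
        result.append(total)
    return result

# two uncertain 2-D points, each with two equally likely locations
pts = [[((1, 1), 0.5), ((4, 4), 0.5)],
       [((2, 2), 0.5), ((3, 3), 0.5)]]
print(skyline_probabilities(pts))  # [0.5, 0.5]
```

With n points and m locations each, this costs O(n² m²) per query; the approximation algorithms in the paper bring the total down to near-linear time.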
Representative Skylines using Threshold-based Preference Distributions
Abstract

Cited by 8 (1 self)
The study of computing skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of the most interesting (undominated) tuples in a database. However, since the number of tuples in a skyline is often too large to be useful to potential users, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions/formulations of representative skylines have been considered in the literature. Most of these formulations (i.e., objective functions) are intuitive in the sense that they try to achieve some kind of clustering “spread” over the entire skyline with k representative points. In this work, we take a more principled approach to defining the representative skyline objective. One of our major contributions is to formulate and solve the problem of displaying k representative skyline points such that the probability that a random user would
Regret-Minimizing Representative Databases
, 2010
Abstract

Cited by 8 (1 self)
We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring function; however, it never asks the users to provide such a function. Like skyline, it filters out a set of interesting points from a potentially large database based on the users’ criteria; however, it never overwhelms the users by outputting too many tuples. In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the maximum regret ratio. This captures how disappointed a user could be had she seen k representative tuples instead of the whole database. We focus on the class of linear utility functions, which is widely applicable. The first challenge of this approach is that it is not clear whether the maximum regret ratio would be small, or even bounded. We answer this question affirmatively. Theoretically, we prove that the maximum regret ratio can be bounded, and this bound is independent of the database size. Moreover, our extensive experiments on real and synthetic datasets suggest that in practice the maximum regret ratio is reasonably small. Additionally, the algorithms developed in this paper are practical, as they run in time linear in the size of the database; the experiments show that their running time is small when they run on top of the skyline operation, which means that these algorithms could be integrated into current database systems.
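The maximum regret ratio for linear utilities can be estimated by sampling utility vectors, which illustrates the definition (the paper derives analytical bounds instead; all names below are illustrative):

```python
import random

def estimate_max_regret_ratio(database, subset, dims, trials=10000, seed=0):
    """Monte-Carlo estimate of the maximum regret ratio of a representative
    subset under non-negative linear utilities. For a utility f:
    regret ratio(f) = (max f over database - max f over subset) / max f over database.
    Sampling only yields a lower estimate of the true maximum."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        w = [rng.random() for _ in range(dims)]  # random non-negative weights
        util = lambda p: sum(wi * pi for wi, pi in zip(w, p))
        best_db = max(util(p) for p in database)
        best_sub = max(util(p) for p in subset)
        if best_db > 0:
            worst = max(worst, (best_db - best_sub) / best_db)
    return worst

db = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
rep = [(0.7, 0.7)]
# k = 1 representative; the worst utilities weight a single attribute, where
# (0.7, 0.7) scores 30% below the best tuple, so the estimate approaches
# (but never exceeds) 0.3
print(estimate_max_regret_ratio(db, rep, dims=2))
```

A regret ratio of 0.3 means the user's best visible tuple is at worst 30% less valuable to her than the best tuple in the full database, regardless of which linear utility she holds.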
Threshold-based Probabilistic Top-k Dominating Queries
 VLDB Journal
Abstract

Cited by 7 (1 self)
Recently, due to intrinsic characteristics of many underlying data sets, a number of probabilistic queries on uncertain data have been investigated. Top-k dominating queries are very important in many applications, including decision making in a multidimensional space. In this paper, we study the problem of efficiently computing top-k dominating queries on uncertain data. We first formally define the problem. Then, we develop an efficient, threshold-based algorithm to compute the exact solution. To overcome the inherent computational cost of an exact computation, we develop an efficient randomized algorithm with an accuracy guarantee. Our extensive experiments demonstrate that both algorithms are quite efficient, while the randomized algorithm is highly scalable with respect to data set size, object area, k value, etc. The randomized algorithm is also highly accurate in practice.
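On certain (non-probabilistic) data, the top-k dominating query reduces to counting dominated points; this sketch shows that baseline, while the paper's contribution is extending the score to uncertain objects (names and conventions are illustrative):

```python
def dominates(a, b):
    """Smaller-is-better dominance."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def top_k_dominating(points, k):
    """Certain-data top-k dominating query: rank each point by the number of
    points it dominates and return the k best (score, point) pairs."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda t: -t[0])
    return scored[:k]

pts = [(1, 1), (2, 2), (3, 3), (2, 4)]
print(top_k_dominating(pts, 2))  # [(3, (1, 1)), (2, (2, 2))]
```

With uncertain objects, each point's dominance count becomes a random variable, and the threshold-based algorithm in the paper prunes objects whose score cannot reach the top k with sufficient probability.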