Results 1 - 10 of 12
Using Trees to Depict a Forest
 PVLDB
Cited by 15 (0 self)
When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the “best” results appear first. However, standard database query results comprise a set of tuples, with no associated ranking. It is typical to allow users the ability to sort results on selected attributes, but no actual ranking is defined. An alternative approach to the first page is not to try to show the best results, but instead to help users learn what is available in the whole result set and direct them to finding what they need. In this paper, we demonstrate through a user study that a page comprising one representative from each of k clusters (generated through a k-medoid clustering) is superior to multiple alternative candidate methods for generating representatives of a data set. Users often refine query specifications based on returned results. Traditional clustering may lead to completely new representatives after a refinement step. Furthermore, clustering can be computationally expensive. We propose a tree-based method for efficiently generating the representatives, and smoothly adapting them with query refinement. Experiments show that our algorithms outperform the state-of-the-art in both result quality and efficiency.
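To make the idea of "one representative from each of k clusters" concrete, here is a minimal sketch of a simple alternating k-medoid heuristic over numeric tuples. It is not the tree-based method the abstract proposes; the function name `k_medoid_representatives`, the Euclidean distance, and the deterministic initialization are assumptions made only for illustration.

```python
import math

def k_medoid_representatives(points, k, max_iter=100):
    """Return k representative points (medoids) via a simple alternating heuristic."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Assumption: deterministic initialization with the first k points.
    medoids = list(points[:k])
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: in each cluster, pick the member with the smallest
        # total distance to the other members.
        new_medoids = []
        for i, cluster in enumerate(clusters):
            if not cluster:
                new_medoids.append(medoids[i])  # keep a medoid that attracted no points
                continue
            best = min(cluster, key=lambda c: sum(math.dist(c, q) for q in cluster))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: the representatives no longer change
        medoids = new_medoids
    return medoids
```

Each returned medoid is an actual tuple from the result set, which is what makes it usable as a displayed representative.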
Group Enclosing Queries
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Cited by 6 (1 self)
Given a set of points P and a query set Q, a group enclosing query (GEQ) fetches the point p* ∈ P such that the maximum distance of p* to all points in Q is minimized. This problem is equivalent to the MinMax case (minimizing the maximum distance) of aggregate nearest neighbor queries for spatial databases [27]. This work first designs a new exact solution by exploring new geometric insights, such as the minimum enclosing ball, the convex hull and the farthest Voronoi diagram of the query group. To further reduce the query cost, especially when the dimensionality increases, we turn to approximation algorithms. Our main approximation algorithm has a worst-case √2-approximation ratio if one can find the exact nearest neighbor of a point. In practice, its approximation ratio never exceeds 1.05 for a large number of data sets of up to six dimensions. We also discuss how to extend it to higher dimensions (up to 74 in our experiment) and show that it still maintains a very good approximation quality (still close to 1) and low query cost. In fixed dimensions, we extend the √2-approximation algorithm to get a (1 + ε)-approximate solution for the GEQ problem. Both approximation algorithms have O(log N + M) query cost in any fixed dimension, where N and M are the sizes of the data set P and query group Q. Extensive experiments on both synthetic and real data sets, up to 10 million points and 74 dimensions, confirm the efficiency, effectiveness and scalability of the proposed algorithms, especially their significant improvement over the state-of-the-art method.
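The GEQ definition can be stated directly as a linear scan. The sketch below is only a brute-force baseline with O(N·M) cost, not the exact or approximate algorithms the abstract describes; the function name `group_enclosing_query` and the use of Euclidean distance are assumptions for illustration.

```python
import math

def group_enclosing_query(P, Q):
    """Exact GEQ by linear scan: the p in P that minimizes the
    maximum distance from p to any point q in Q."""
    if not P or not Q:
        raise ValueError("P and Q must be non-empty")
    return min(P, key=lambda p: max(math.dist(p, q) for q in Q))
```

For example, with P = [(0, 0), (5, 5), (10, 0)] and Q = [(4, 4), (6, 6), (4, 6)], the scan returns (5, 5), whose farthest query point is at distance √2.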
Continuous k-Means Monitoring over Moving Objects
Cited by 5 (1 self)
Given a dataset P, a k-means query returns k points in space (called centers), such that the average squared distance between each point in P and its nearest center is minimized. Since this problem is NP-hard, several approximate algorithms have been proposed and used in practice. In this paper, we study continuous k-means computation at a server that monitors a set of moving objects. Reevaluating k-means every time there is an object update imposes a heavy burden on the server (for computing the centers from scratch) and the clients (for continuously sending location updates). We overcome these problems with a novel approach that significantly reduces the computation and communication costs, while guaranteeing that the quality of the solution, with respect to the reevaluation approach, is bounded by a user-defined tolerance. The proposed method assigns each moving object a threshold (i.e., range) such that the object sends a location update only when it crosses the range boundary. First, we develop an efficient technique for maintaining the k-means. Then, we present mathematical formulae and algorithms for deriving the individual thresholds. Finally, we justify our performance claims with extensive experiments.
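The objective in the first sentence, and one step of a common approximate heuristic for it, can be sketched as follows. This uses the standard Lloyd iteration as a stand-in; it is not the maintenance technique or the threshold derivation of the paper, and the function names `kmeans_cost` and `lloyd_step` are hypothetical.

```python
import math

def kmeans_cost(points, centers):
    """Average squared distance from each point to its nearest center."""
    total = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return total / len(points)

def lloyd_step(points, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then move every center to the mean of its assigned points."""
    groups = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[idx].append(p)
    new_centers = []
    for center, group in zip(centers, groups):
        if group:
            mean = tuple(sum(coord) / len(group) for coord in zip(*group))
            new_centers.append(mean)
        else:
            new_centers.append(center)  # keep a center that attracted no points
    return new_centers
```

Repeating `lloyd_step` until the centers stop moving gives a local optimum of `kmeans_cost`, which is the kind of from-scratch reevaluation the abstract aims to avoid on every update.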
Continuous Medoid Queries over Moving Objects. SSTD, 2007
Cited by 5 (2 self)
In the k-medoid problem, given a dataset P, we are asked to choose k points in P as the medoids. The optimal medoid set minimizes the average Euclidean distance between the points in P and their closest medoid. Finding the optimal k medoids is NP-hard, and existing algorithms aim at approximate answers, i.e., they compute medoids that achieve a small, yet not minimal, average distance. Similarly, in this paper we also aim at approximate solutions. We consider, however, the continuous version of the problem, where the points in P move and our task is to maintain the medoid set on-the-fly (trying to keep the average distance small). To the best of our knowledge, this work constitutes the first attempt at continuous medoid queries. First, we consider centralized monitoring, where the points issue location updates whenever they move. A server processes the stream of generated updates and constantly reports the current medoid set. Next, we address distributed monitoring, where we assume that the data points have some computational capabilities, and they take over part of the monitoring task. In particular, the server installs adaptive filters (i.e., permissible spatial ranges, called safe regions) to the points, which report their location only when they move outside their filters. The distributed techniques reduce the frequency of location updates (and, thus, the network overhead and the server load), at the cost of a slightly higher average distance, compared to the centralized methods. Neither our centralized nor our distributed methods make any assumption about the data moving patterns (e.g., velocity vectors, trajectories, etc.), and both can be applied to an arbitrary number of medoids k. We demonstrate the efficiency and efficacy of our techniques through extensive experiments.
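The filtering idea in the distributed setting can be illustrated with a small sketch. It assumes a circular safe region of fixed radius that is re-centered on each reported location; the abstract does not specify the shape of the regions or how they are adapted, so both choices, and the function name `reported_locations`, are hypothetical.

```python
import math

def reported_locations(start, radius, trajectory):
    """Return the locations a moving point would report to the server.

    The point stays silent while it remains inside its safe region and
    reports only when it moves outside of it.
    """
    center = start
    reports = []
    for location in trajectory:
        if math.dist(location, center) > radius:
            reports.append(location)
            # Assumption: the server installs a new region centered on the report.
            center = location
    return reports
```

With start (0, 0), radius 2 and the trajectory (1, 0), (2, 0), (3, 0), (4, 0), (6, 0), only (3, 0) and (6, 0) are reported, so two of five movements generate network traffic.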
Spatial Cohesion Queries
Given a set of attractors and repellers, the cohesion query returns the point in the database that is as close to the attractors and as far from the repellers as possible. Cohesion queries find applications in various settings, such as facility location problems and location-based services. For example, when attractors represent favorable places, e.g., tourist attractions, and repellers denote undesirable locations, e.g., competitor stores, the cohesion query would return the ideal location, among a database of possible options, to open a new store. These queries are not trivial to process as the best location, unlike aggregate nearest or farthest neighbor queries, may be far from the optimal point in space. Therefore, to achieve sublinear performance in practice, we employ novel best-first search and branch-and-bound paradigms that take advantage of the geometrical interpretation of the problem. Our methods are up to orders of magnitude faster than linear scan and adaptations of existing aggregate nearest/farthest neighbor algorithms.
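The linear-scan baseline mentioned at the end of the abstract can be sketched as below. The abstract does not give the cohesion formula, so the score used here (total distance to attractors minus total distance to repellers, lower is better) is an assumption chosen only to make the example concrete; the function names are hypothetical as well.

```python
import math

def cohesion_score(p, attractors, repellers):
    """Assumed score, lower is better: close to attractors, far from repellers."""
    to_attractors = sum(math.dist(p, a) for a in attractors)
    to_repellers = sum(math.dist(p, r) for r in repellers)
    return to_attractors - to_repellers

def cohesion_query_linear_scan(candidates, attractors, repellers):
    """Evaluate every candidate location and return the best-scoring one."""
    return min(candidates, key=lambda p: cohesion_score(p, attractors, repellers))
```

A scan like this touches every candidate, which is the cost that best-first and branch-and-bound search over an index try to avoid.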
KITS, Warangal.
With the rapid growth of wireless communication technology, it has become quite common for people to view maps or get related services from handheld devices, such as mobile phones and PDAs. Spatial databases have witnessed an increasing number of applications recently, due to the fast advance in the fields of mobile computing and embedded systems and the spread of the Internet. Range queries are often posed by users to retrieve useful information from a spatial database. We present a novel idea: a concise representation of a specified size for the range query results, incurring minimal information loss, is computed and returned to the user. Such a concise range query not only reduces communication costs, but also offers better usability to the users, providing an opportunity for interactive exploration. The usefulness of concise range queries is confirmed by comparing them with other possible alternatives, such as sampling and clustering. In the proposed system, we include entities and their associated attributes, such as restaurants and shopping places, each represented as a point on a Hilbert curve, which helps reduce the search space for spatial data, and we provide a range for each attribute such that the information is retrieved with minimal loss. The proposed system also includes a peer-to-peer system through which multiple spatial databases can be accessed efficiently.
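To show how a Hilbert curve turns a two-dimensional location into a one-dimensional key, here is a minimal sketch of the well-known iterative grid-cell-to-index conversion. It assumes a square grid whose side `n` is a power of two and integer cell coordinates; the function name `hilbert_index` is hypothetical, and the abstract does not say how the proposed system actually builds or uses its curve.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve."""
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Add the offset of the quadrant that contains the cell at this scale.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Cells that are consecutive along the curve are always neighbors in the grid, so a range query over a small area tends to map to a few short intervals of keys.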
Research Statement, 2007
My research focuses on Spatiotemporal Databases and Location Based Services, and their bridging with Mobile Computing and Data Stream Processing. Below, I briefly describe the most representative aspects of my work. Continuous Nearest Neighbor Monitoring: A k nearest neighbor (kNN) query retrieves the k objects in a dataset that lie closest to a given query point. There exist numerous approaches for efficient kNN processing over static datasets. Recently, however, the research focus has shifted towards dynamic environments where (i) the data objects and the query points move in an unpredictable manner, and (ii) the queries request continuous monitoring of their k nearest neighbors for long periods of time. In [1], [2], [3] and [4] we consider kNN monitoring for various settings and different optimization goals. [1] describes a method targeted to Euclidean spaces that aims at minimizing the computational overhead for centrally processing multiple queries. It achieves low running time by handling location updates only from objects that fall in the vicinity of some query. In [2] we tackle the same problem, but aim at reducing the communication cost between the central query processor and the data objects. We present a threshold-based algorithm that exploits the computational capabilities of the objects to achieve this goal. [1] and [2] assume the ...
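The kNN query defined in this statement can be written as a one-shot scan over a static dataset. The sketch below illustrates only that definition, not the continuous monitoring methods of [1] to [4]; the function name `knn` and the Euclidean distance are assumptions.

```python
import heapq
import math

def knn(dataset, query, k):
    """Return the k objects of the dataset that lie closest to the query point."""
    return heapq.nsmallest(k, dataset, key=lambda p: math.dist(p, query))
```

For the dataset [(0, 0), (1, 1), (5, 5), (2, 2)] and query point (0, 0), `knn(..., k=2)` returns [(0, 0), (1, 1)]. In a monitoring setting this scan would have to be repeated after every movement, which is the overhead the described methods reduce.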
The World in a Nutshell: Concise Range Queries
With the advance of wireless communication technology, it is quite common for people to view maps or get related services from handheld devices, such as mobile phones and PDAs. Range queries, as one of the most commonly used tools, are often posed by users to retrieve needed information from a spatial database. However, due to the limits of communication bandwidth and hardware power of handheld devices, displaying all the results of a range query on a handheld device is neither communication-efficient nor informative to the users. This is simply because there are often too many results returned from a range query. In view of this problem, we present a novel idea: a concise representation of a specified size for the range query results, incurring minimal information loss, is computed and returned to the user. Such a concise range query not only reduces communication costs, but also offers better usability to the users, providing an opportunity for interactive exploration. The usefulness of concise range queries is confirmed by comparing them with other possible alternatives, such as sampling and clustering. Unfortunately, we prove that finding the optimal representation with minimum information loss is an NP-hard problem. Therefore, we propose several effective and nontrivial algorithms to find a good approximate result. Extensive experiments on real-world data have demonstrated the effectiveness and efficiency of the proposed techniques. Index Terms: Spatial databases, range queries, algorithms.