End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems
Cited by 14 (1 self)
We address the problem of supporting a large number of select-join subscriptions in a wide-area publish/subscribe system. Subscriptions are interested in joins over different data sources (tables), with varying interests expressed as range selection predicates over table attributes. Naive schemes, such as computing and sending join results from a server, are inefficient because they produce redundant data and are unable to share dissemination costs across subscribers and events. We propose a novel, scalable scheme that group-processes and disseminates a general mix of multi-way select-join subscriptions. We also propose a simple and application-agnostic extension to content-driven networks (CN), which further improves sharing of dissemination costs across events and subscribers. We develop and experimentally evaluate our scheme, and show that it can generate an order of magnitude lower network traffic at very low processing cost. Our extension to CN can further reduce traffic by another order of magnitude, with almost no increase in notification latency.
Online function tracking with generalized penalties
In Proc. 12th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT), 2010
Cited by 7 (7 self)
We attend to the classic setting where an observer needs to inform a tracker about an arbitrary time-varying function f: ℕ₀ → ℤ. This is an optimization problem, where both wrong values at the tracker and sending updates entail a certain cost. We consider an online variant of this problem, i.e., at time t, the observer only knows f(t′) for all t′ ≤ t. In this paper, we generalize existing cost models (with an emphasis on concave and convex penalties) and present two online algorithms. Our analysis shows that these algorithms perform well in a large class of models, and are even optimal in some settings.
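The cost trade-off described in this abstract can be illustrated with a minimal sketch. It is not either of the paper's two algorithms: it assumes a naive fixed-threshold policy, and the names `track`, `total_cost`, `threshold`, `update_cost` and `penalty` are hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def track(values: Sequence[int], threshold: int) -> Tuple[List[int], int]:
    """Naive online policy (an assumption, not the paper's algorithm).

    The observer sees values one at a time and sends an update only when
    the new value differs from the tracker's copy by more than `threshold`.
    Returns the tracker's view at each time step and the update count.
    """
    tracker_view: List[int] = []
    updates = 0
    current = None  # value currently stored at the tracker
    for v in values:
        if current is None or abs(v - current) > threshold:
            current = v  # send an update
            updates += 1
        tracker_view.append(current)
    return tracker_view, updates


def total_cost(
    values: Sequence[int],
    tracker_view: Sequence[int],
    updates: int,
    update_cost: int,
    penalty: Callable[[int], int],
) -> int:
    """Sum of update costs plus a per-step penalty on the tracker's error."""
    error_cost = sum(penalty(abs(v - s)) for v, s in zip(values, tracker_view))
    return updates * update_cost + error_cost


values = [0, 1, 2, 5, 5, 4, 9]
view, n_updates = track(values, threshold=2)
# A convex penalty (squared error); a concave one could be passed instead.
cost = total_cost(values, view, n_updates, update_cost=10, penalty=lambda d: d * d)
print(view, n_updates, cost)  # [0, 0, 0, 5, 5, 5, 9] 3 36
```

Raising the threshold lowers the number of updates but increases the error penalty; the penalty function is a parameter so that either a concave or a convex model can be plugged in.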
An Efficient Publish/Subscribe Index for E-Commerce Databases
Cited by 4 (2 self)
Many of today’s publish/subscribe (pub/sub) systems have been designed to cope with a large volume of subscriptions and high event arrival rate (velocity). However, in many novel applications (such as e-commerce), there is an increasing variety of items, each with different attributes. This leads to a very high-dimensional and sparse database that existing pub/sub systems can no longer support effectively. In this paper, we propose an efficient in-memory index that is scalable to the volume and update of subscriptions, the arrival rate of events and the variety of subscribable attributes. The index is also extensible to support complex scenarios such as prefix/suffix filtering and regular expression matching. We conduct extensive experiments on synthetic datasets and two real datasets (AOL query log and eBay products). The results demonstrate the superiority of our index over state-of-the-art methods: our index incurs orders of magnitude less index construction time, consumes a small amount of memory and performs event matching efficiently.
Value-based predicate filtering of XML documents
Data & Knowledge Engineering, 2008
Small and Stable Descriptors of Distributions for Geometric Statistical Problems
2009
Cited by 2 (2 self)
This thesis explores how to sparsely represent distributions of points for geometric statistical problems. A coreset C is a small summary of a point set P such that if a certain statistic is computed on P and C, then the difference in the results is guaranteed to be bounded by a parameter ε. Two examples of coresets are ε-samples and ε-kernels. An ε-sample can estimate the density of a point set in any range from a geometric family of ranges (e.g., disks, axis-aligned rectangles). An ε-kernel approximates the width of a point set in all directions. Both coresets have size that depends only on ε, the error parameter, not the size of the original data set. We demonstrate several improvements to these coresets and how they are useful for geometric statistical problems. We reduce the size of ε-samples for density queries in axis-aligned rectangles to nearly a square root of the size when the queries are with respect to more general families of shapes, such as disks. We also show how to construct ε-samples of probability distributions. We show how to maintain “stable” ε-kernels, that is, if the point set P changes by
Processing and Notifying Range Top-k Subscriptions
Cited by 1 (1 self)
We consider how to support a large number of users over a wide-area network whose interests are characterized by range top-k continuous queries. Given an object update, we need to notify users whose top-k results are affected. Simple solutions include using a content-driven network to notify all users whose interest ranges contain the update (ignoring top-k), or using a server to compute only the affected queries and notifying them individually. The former solution generates too much network traffic, while the latter overwhelms the server. We present a geometric framework for the problem that allows us to describe the set of affected queries succinctly with messages that can be efficiently disseminated using content-driven networks. We give fast algorithms to reformulate each update into a set of messages whose number is provably optimal, with or without knowing all user interests. We also present extensions to our solution, including an approximate algorithm that trades off between the cost of server-side reformulation and that of user-side post-processing, as well as efficient techniques for batch updates.
Advanced Institutes of Convergence Technology
XML-enabled publish-subscribe (pub-sub) systems have emerged as an increasingly important tool for e-commerce and Internet applications. In a typical pub-sub system, subscribed users specify their interests in a profile expressed in the XPath language. Each new data content is then matched against the user profiles so that the content is delivered only to the interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the service is critical to the success of pub-sub systems. In this article, we propose a novel scalable filtering system called iFiST that transforms user profiles of a twig pattern expressed in XPath into sequences using Prüfer’s method. Consequently, instead of breaking a twig pattern into multiple linear paths and matching them separately, iFiST performs holistic matching of twig patterns with each incoming document in a bottom-up fashion. iFiST organizes the sequences into a dynamic hash-based index for efficient filtering, and exploits the commonality among user profiles to enable shared processing during the filtering phase. We demonstrate that the holistic matching approach reduces filtering cost and memory consumption, thereby improving the scalability of iFiST.
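As background for the tree-to-sequence step mentioned in this abstract, here is a minimal sketch of the classic Prüfer sequence construction for a tree with distinct integer node labels. It is an illustration only: the exact encoding iFiST applies to XPath twig patterns is not given in the abstract and may differ, and the function name `prufer_sequence` and the edge-list input format are assumptions.

```python
import heapq
from typing import Dict, List, Set, Tuple


def prufer_sequence(edges: List[Tuple[int, int]]) -> List[int]:
    """Classic Prüfer sequence of a tree given as an undirected edge list.

    Repeatedly removes the leaf with the smallest label and records the
    label of its single remaining neighbour, until two nodes are left.
    """
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Min-heap of current leaves (nodes with exactly one neighbour).
    leaves = [node for node, nbrs in adj.items() if len(nbrs) == 1]
    heapq.heapify(leaves)

    sequence: List[int] = []
    for _ in range(len(adj) - 2):
        leaf = heapq.heappop(leaves)
        (neighbour,) = adj[leaf]  # a leaf has exactly one neighbour left
        sequence.append(neighbour)
        adj[neighbour].remove(leaf)
        if len(adj[neighbour]) == 1:  # the neighbour has become a leaf
            heapq.heappush(leaves, neighbour)
    return sequence


# Node 4 has children 1, 2, 3 and is attached to 5, which is attached to 6.
print(prufer_sequence([(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]))  # [4, 4, 4, 5]
```

A tree with n nodes yields a sequence of length n - 2, so a whole twig can be compared as one sequence instead of being split into separate root-to-leaf paths.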
End-to-End Support for Joins in Large-Scale Publish/Subscribe Systems
We address the problem of supporting a large number of select-join subscriptions for wide-area publish/subscribe. Subscriptions are joins over different tables, with varying interests expressed as range selection conditions over table attributes. Naive schemes, such as computing and sending join results from a server, are inefficient because they produce redundant data and are unable to share dissemination costs across subscribers and events. We propose a novel, scalable scheme that group-processes and disseminates a general mix of multi-way select-join subscriptions. We also propose a simple and application-agnostic extension to content-driven networks (CN), which further improves sharing of dissemination costs. Experimental evaluations show that our schemes can generate orders of magnitude lower network traffic at very low processing cost. Our extension to CN can further reduce traffic by another order of magnitude, with almost no increase in notification latency.
Distributed Online Tracking
In online tracking, an observer S receives a sequence of values, one per time instance, from a data source that is described by a function f. A tracker T wants to continuously maintain an approximation that is within an error threshold of the value f(t) at any time instance t, with small communication overhead. This problem was recently formalized and studied in [32, 34], and a principled approach with optimal competitive ratio was proposed. This work extends the study of online tracking to a distributed setting, where a tracker T wants to track a function f that is computed from a set of functions {f1, ..., fm} from m distributed observers and respective data sources. This formulation finds numerous important and natural applications, e.g., sensor networks, distributed systems, measurement networks, and pub-sub systems. We formalize this problem and present effective online algorithms for various topologies of a distributed system/network for different aggregate functions. Experiments on large real data sets demonstrate the excellent performance of our methods in practice.