Communication-efficient distributed monitoring of thresholded counts
 In Proc. of SIGMOD’06
, 2006

Cited by 78 (11 self)
Monitoring is an issue of primary concern in current and next-generation networked systems. For example, the objective of sensor networks is to monitor their surroundings for a variety of different applications, such as atmospheric conditions, wildlife behavior, and troop movements. Similarly, monitoring in data networks is critical not only for accounting and management, but also for detecting anomalies and attacks. Such monitoring applications are inherently continuous and distributed, and must be designed to minimize the communication overhead that they introduce. In this context we introduce and study a fundamental class of problems called “thresholded counts”, where we must return the aggregate frequency count of an event that is continuously monitored by distributed nodes, with a user-specified accuracy, whenever the actual count exceeds a given threshold value. In this paper we propose to address the problem of thresholded counts by setting local thresholds at each monitoring node and initiating communication only when the locally observed data exceeds these local thresholds. We explore algorithms in two categories: static thresholds and adaptive thresholds. In the static case, we consider thresholds based on a linear combination of two alternate strategies, and show that there exists an optimal blend of the two strategies that results in minimum communication overhead. We further show that this optimal blend can be found using a steepest descent search. In the adaptive case, we propose algorithms that adjust the local thresholds based on the observed distributions of updated information in the distributed monitoring system. We use extensive simulations not only to verify the accuracy of our algorithms and validate our theoretical results, but also to evaluate the performance of the two approaches. We find that both approaches yield significant savings over the naive approach of performing processing at a centralized location.
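The local-threshold idea can be illustrated with a small simulation. This is a simplified sketch under our own assumptions, not the paper's static or adaptive algorithm: counts are exact (no accuracy slack), the global threshold τ is split evenly across k nodes, and after each poll the remaining slack is re-split evenly (a hypothetical rule). Because the local thresholds always sum to τ, the global count cannot exceed τ while every node stays at or below its own threshold, so a node contacts the coordinator only on a local violation.

```python
from typing import List, Optional, Tuple

def split_evenly(budget: int, k: int) -> List[int]:
    """Divide a non-negative integer budget into k nearly equal shares."""
    base, extra = divmod(budget, k)
    return [base + (1 if i < extra else 0) for i in range(k)]

def detect_crossing(events: List[int], k: int, tau: int) -> Tuple[Optional[int], int, int]:
    """Return (index of the first event where the global count exceeds tau, global count, polls).

    events[t] is the id (0..k-1) of the node observing the t-th event.
    A poll is one round in which the coordinator collects all k local counts.
    """
    thresholds = split_evenly(tau, k)  # static initial split (hypothetical choice)
    counts = [0] * k
    polls = 0
    for t, node in enumerate(events):
        counts[node] += 1
        if counts[node] <= thresholds[node]:
            continue  # local constraint holds: no communication at all
        polls += 1
        total = sum(counts)  # coordinator learns every local count
        if total > tau:
            return t, total, polls
        # Re-split the remaining slack so the thresholds again sum to tau.
        shares = split_evenly(tau - total, k)
        thresholds = [c + s for c, s in zip(counts, shares)]
    return None, sum(counts), polls
```

For example, `detect_crossing([0, 1, 2, 3] * 3, 4, 8)` returns `(8, 9, 1)`: each node sits exactly at its local threshold of 2 for the first eight events, and the ninth event triggers the only poll, which finds a global count of 9. A naive centralized scheme would instead forward every event to the coordinator.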
Algorithms for Distributed Functional Monitoring
, 2008

Cited by 60 (12 self)
We study what we call functional monitoring problems. We have k players each tracking their inputs, say player i tracking a multiset Aᵢ(t) up until time t, and communicating with a central coordinator. The coordinator’s task is to monitor a given function f computed over the union of the inputs ∪ᵢAᵢ(t), continuously at all times t. The goal is to minimize the number of bits communicated between the players and the coordinator. A simple example is when f is the sum, and the coordinator is required to alert when the sum of a distributed set of values exceeds a given threshold τ. Of interest is the approximate version where the coordinator outputs 1 if f ≥ τ and 0 if f ≤ (1 − ε)τ. This defines the (k, f, τ, ε) distributed functional monitoring problem. Functional monitoring problems are fundamental in distributed systems, in particular sensor networks, where we must minimize communication; they also connect to problems in communication complexity, communication theory, and signal processing. Yet few formal bounds are known for functional monitoring. We give upper and lower bounds for the (k, f, τ, ε) problem for some of the basic f’s. In particular, we study frequency moments (F0, F1, F2). For F0 and F1, we obtain continuous monitoring algorithms with costs almost the same as their one-shot computation algorithms. However, for F2 the monitoring problem seems much harder. We give a carefully constructed multi-round algorithm that uses “sketch summaries” at multiple levels of detail and solves the (k, F2, τ, ε) problem with communication Õ(k²/ε + (√k/ε)³). Since frequency moment estimation is central to other problems, our results have immediate applications to histograms, wavelet computations, and others. Our algorithmic techniques are likely to be useful for other functional monitoring problems as well.
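For the simplest case, f = F1 (the total item count), a deterministic threshold monitor can be sketched as follows. This is a straightforward illustration under our own assumptions, not the paper's F1 algorithm: each site reports its unreported increment once it reaches ετ/(2k), so the coordinator's estimate undercounts F1 by less than ετ/2. Outputting 1 once the estimate reaches (1 − ε/2)τ therefore fires no later than the moment F1 ≥ τ and never while F1 ≤ (1 − ε)τ, using at most about 2k/ε messages before the alert. The function name `monitor_f1` is hypothetical.

```python
from typing import List, Optional, Tuple

def monitor_f1(events: List[int], k: int, tau: float, eps: float) -> Tuple[Optional[int], int]:
    """Return (index of the event at which the coordinator first outputs 1, messages sent).

    events[t] is the site (0..k-1) that receives the t-th item; F1 is the total item count.
    """
    step = eps * tau / (2 * k)   # per-site reporting granularity
    unreported = [0] * k         # increments not yet sent to the coordinator
    estimate = 0                 # coordinator's estimate; F1 - estimate < k * step
    messages = 0
    for t, site in enumerate(events):
        unreported[site] += 1
        if unreported[site] >= step:
            estimate += unreported[site]  # one message carrying the increment
            unreported[site] = 0
            messages += 1
            if estimate >= (1 - eps / 2) * tau:
                return t, messages
    return None, messages
```

With k = 5 sites receiving items round-robin, τ = 1000 and ε = 0.1, each site reports every 10 items and `monitor_f1([i % 5 for i in range(2000)], 5, 1000, 0.1)` returns `(949, 95)`: the alert is raised when F1 = 950, which lies between (1 − ε)τ = 900 and τ.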
Communication-efficient online detection of network-wide anomalies
 In IEEE Conference on Computer Communications (INFOCOM)
, 2007

Cited by 50 (10 self)
There has been growing interest in building large-scale distributed monitoring systems for sensor, enterprise, and ISP networks. Recent work has proposed using Principal Component Analysis (PCA) over global traffic matrix statistics to effectively isolate network-wide anomalies. To allow such a PCA-based anomaly detection scheme to scale, we propose a novel approximation scheme that dramatically reduces the burden on the production network. Our scheme avoids the expensive step of centralizing all the data by performing intelligent filtering at the distributed monitors. This filtering reduces monitoring bandwidth overheads, but can result in the anomaly detector making incorrect decisions based on a perturbed view of the global data set. We employ stochastic matrix perturbation theory to bound such errors. Our algorithm selects the filtering parameters at local monitors such that the errors made by the detector are guaranteed to lie below a user-specified upper bound. Our algorithm thus allows network operators to explicitly balance the trade-off between detection accuracy and the amount of data communicated over the network. In addition, our approach enables real-time detection because we exploit continuous monitoring at the distributed monitors. Experiments with traffic data from the Abilene backbone network demonstrate that our methods yield significant communication benefits while simultaneously achieving high detection accuracy.
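The local filtering step can be illustrated with a simple deadband filter: a monitor sends a new value only when it drifts more than δ from the last value it sent, so the coordinator's copy of each entry stays within δ of the truth. This is an assumed stand-in for intuition only; the paper selects its filtering parameters using stochastic matrix perturbation theory so that the PCA detector's errors stay below a user-specified bound, which this sketch does not attempt. The name `deadband_filter` and the parameter `delta` are hypothetical.

```python
from typing import List, Optional, Tuple

def deadband_filter(values: List[float], delta: float) -> Tuple[List[float], int]:
    """Simulate one monitor that suppresses small changes.

    Returns the coordinator's view of the series (the last value actually sent)
    and the number of values sent over the network.
    """
    last_sent: Optional[float] = None
    view: List[float] = []
    sent = 0
    for v in values:
        if last_sent is None or abs(v - last_sent) > delta:
            last_sent = v  # change too large to hide: send the new value
            sent += 1
        view.append(last_sent)  # coordinator keeps using the last value it received
    return view, sent
```

Every entry of the coordinator's perturbed view is then within δ of the true value; how such per-entry perturbations affect the PCA-based decision is the question the paper answers with matrix perturbation theory.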
Functional Monitoring Without Monotonicity
, 2008

Cited by 31 (4 self)
The notion of distributed functional monitoring was recently introduced by Cormode, Muthukrishnan and Yi [CMY08] to initiate a formal study of the communication cost of certain fundamental problems arising in distributed systems, especially sensor networks. In this model, each of k sites reads a stream of tokens and is in communication with a central coordinator, who wishes to continuously monitor some function f of σ, the union of the k streams. The goal is to minimize the number of bits communicated by a protocol that correctly monitors f(σ), to within some small error. As in previous work, we focus on a threshold version of the problem, where the coordinator’s task is simply to maintain a single output bit, which is 0 whenever f(σ) ≤ τ(1 − ε) and 1 whenever f(σ) ≥ τ. Following Cormode et al., we term this the (k, f, τ, ε) functional monitoring problem. In previous work, some upper and lower bounds were obtained for this problem, with f being a frequency moment function, e.g., F0, F1, F2. Importantly, these functions are monotone. Here, we further advance the study of such problems, proving three new classes of results. First, we prove new lower bounds on this problem when f = Fp, for several values of p. Second, we study the effect of non-monotonicity of f on our ability to give non-trivial monitoring protocols, by considering f = Fp with deletions allowed, as well as f = H, the empirical Shannon entropy of a stream. Third, we provide non-trivial monitoring protocols when f is either H, or any of a related class of entropy functions (Tsallis entropies). These are the first non-trivial algorithms for distributed monitoring of non-monotone functions.
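Non-monotonicity is easy to see for the empirical Shannon entropy H: appending items to a stream can lower H, so a threshold crossing can later be undone, which rules out arguments that rely on f only growing over time (as with F1). The helper below, whose name is hypothetical, computes H in bits from the item frequencies.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def empirical_entropy(stream: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy (in bits) of the item frequencies in stream."""
    m = len(stream)
    if m == 0:
        return 0.0
    return -sum((c / m) * math.log2(c / m) for c in Counter(stream).values())

before = empirical_entropy(["a", "b"])           # two equally frequent items: 1 bit
after = empirical_entropy(["a", "b", "a", "a"])  # two more "a"s make the stream more skewed
print(before, after)  # entropy decreased although the stream only grew
```

Here `before` is exactly 1 bit, while `after` is about 0.811 bits, so H fell even though no item was deleted.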
Shape sensitive geometric monitoring
 In Proc. ACM Symposium on Principles of Database Systems
, 2008

Cited by 31 (15 self)
A fundamental problem in distributed computation is the distributed evaluation of functions. The goal is to determine the value of a function over a set of distributed inputs, in a communication-efficient manner. Specifically, we assume that each node holds a time-varying input vector, and we are interested in determining, at any given time, whether the value of an arbitrary function on the average of these vectors crosses a predetermined threshold. In this paper, we introduce a new method for monitoring distributed data, which we term shape sensitive geometric monitoring. It is based on a geometric interpretation of the problem, which makes it possible to define local constraints on the data received at the nodes. It is guaranteed that as long as none of these constraints has been violated, the value of the function does not cross the threshold. We generalize previous work on geometric monitoring, and solve two problems which seriously hampered its performance: as opposed to the constraints used so far, which depend only on the current values of the local input vectors, here we incorporate their temporal behavior into the constraints. Also, the new constraints are tailored to the geometric properties of the specific function which is being monitored, while the previous constraints were generic. Experimental results on real-world data reveal that using the new geometric constraints reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between existing results and a newly defined lower bound on the communication complexity.
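For context, the generic constraint of earlier geometric monitoring work can be sketched for one concrete function, f(x) = ‖x‖² with threshold T. Each node tracks a drift vector u = e + Δv, where e is the global average at the last synchronization and Δv is the node's local change since then. The current global average lies in the convex hull of the drift vectors, which is covered by the balls whose diameters are the segments [e, uᵢ]; if every node's ball stays on the same side of T as e, the average cannot have crossed the threshold. This sketch assumes that ball construction and this particular f; it shows the generic constraint the paper generalizes, not its shape-sensitive constraints, and all names are hypothetical.

```python
import math
from typing import Sequence

def f(x: Sequence[float]) -> float:
    """Monitored function (an assumption for this sketch): squared Euclidean norm."""
    return sum(c * c for c in x)

def ball_is_safe(e: Sequence[float], u: Sequence[float], T: float) -> bool:
    """True if f stays on e's side of T over the whole ball with diameter [e, u]."""
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = math.dist(e, u) / 2
    c_norm = math.hypot(*center)
    if f(e) <= T:
        return (c_norm + radius) ** 2 <= T           # largest norm inside the ball
    return max(0.0, c_norm - radius) ** 2 > T        # smallest norm inside the ball

def all_nodes_silent(e: Sequence[float], drifts: Sequence[Sequence[float]], T: float) -> bool:
    """Each node checks its own ball locally; no messages are needed if all pass."""
    return all(ball_is_safe(e, [a + b for a, b in zip(e, d)], T) for d in drifts)
```

With e = (1, 0) and T = 4, a node whose drift vector is (1.2, 0) stays silent, whereas one at (2.5, 0) must report, because its ball reaches points with ‖x‖² = 6.25 > 4.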
Optimal sampling from distributed streams
 Proc. ACM Symposium on Principles of Database Systems
, 2009

Cited by 23 (7 self)
A fundamental problem in data management is to draw a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streaming data sets, this problem becomes particularly difficult when the data is shared across multiple distributed sites. The challenge is to ensure that a sample is drawn uniformly across the union of the data while minimizing the communication needed to run the protocol and track parameters of the evolving data. At the same time, it is also necessary to make the protocol lightweight, by keeping the space and time costs low for each participant. In this paper, we present communication-efficient protocols for sampling (both with and without replacement) from k distributed streams. These apply to the case when we want a sample from the full streams, and to the sliding window cases of only the W most recent items, or arrivals within the last w time units. We show that our protocols are optimal, not just in terms of the communication used, but also in that they use minimal or near-minimal (up to logarithmic factors) time to process each new item, and space to operate.
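A common way to draw a uniform sample without replacement from the union of k streams is to give every item an independent random tag and keep the s items with the smallest tags; a site then forwards an item only if its tag beats the threshold the coordinator last announced. The sketch below illustrates that idea under our own simplifications (one global arrival order, the threshold updated immediately whenever it changes, no sliding windows) and is not necessarily the paper's exact protocol, whose optimal bounds rely on more careful threshold management. `distributed_sample` is a hypothetical name.

```python
import heapq
import random
from typing import List, Tuple

def distributed_sample(arrivals: List[Tuple[int, int]], s: int, seed: int = 0):
    """Sample s items uniformly without replacement from the union of all streams.

    arrivals: (site_id, item) pairs in global arrival order.
    Returns (sample, items_sent_to_coordinator, tags); tags is kept only so the
    result can be checked and is not part of the protocol.
    """
    rng = random.Random(seed)
    heap: List[Tuple[float, int]] = []  # coordinator: (-tag, item) for the s smallest tags
    threshold = 1.0                      # last value broadcast to every site
    sent = 0
    tags: List[Tuple[float, int]] = []
    for _site, item in arrivals:
        tag = rng.random()               # drawn locally by the receiving site
        tags.append((tag, item))
        if tag >= threshold:
            continue                     # cannot enter the sample: filtered locally
        sent += 1
        heapq.heappush(heap, (-tag, item))
        if len(heap) > s:
            heapq.heappop(heap)          # evict the item with the largest tag
        if len(heap) == s:
            threshold = -heap[0][0]      # new threshold: s-th smallest tag so far
    sample = sorted(item for _, item in heap)
    return sample, sent, tags
```

Because the tags are independent and uniform, every s-subset is equally likely to carry the smallest tags, which makes the sample uniform. The i-th arrival is forwarded with probability min(1, s/i), so about s(1 + ln(n/s)) of n items reach the coordinator in expectation, rather than all n.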
Efficient Detection of Distributed Constraint Violations
, 2006

Cited by 17 (3 self)
In many distributed environments, the primary function of monitoring software is to detect anomalies, that is, instances when system behavior deviates substantially from the norm. Existing approaches for detecting such abnormal behavior record system state at all times, even during normal operation, and thus incur wasteful communication overhead. In this paper, we propose communication-efficient schemes for the anomaly detection problem, which we model as one of detecting the violation of global constraints defined over distributed system variables. Our approach eliminates the need to continuously track the global system state by decomposing global constraints into local constraints that can be checked efficiently at each site. Only in the occasional event that a local constraint is violated do we resort to more expensive global constraint checking. We formulate the problem of selecting local constraints as an optimization problem that takes into account the frequency distribution of individual system variables, and whose objective is to minimize communication costs. After showing the problem to be NP-hard, we propose approximation algorithms for computing provably near-optimal (in terms of the number of messages) local constraints. In our experiments with real-life network traffic data sets, we found that our techniques for detecting global constraint violations can reduce message communication overhead by as much as 70% compared to existing data-distribution-agnostic approaches.
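The decomposition can be illustrated for a global constraint of the form x₁ + ... + xₙ ≤ T: if each site enforces a local threshold Tᵢ with T₁ + ... + Tₙ = T, the global constraint can only be violated when some local one is. The sketch compares an even, distribution-agnostic split with a hypothetical heuristic that splits T in proportion to each variable's historical mean. That heuristic is our own assumption for illustration, not the paper's approximation algorithm, and it assumes non-negative variables with a positive total mean.

```python
from typing import List, Sequence

def local_violations(rows: Sequence[Sequence[float]], thresholds: Sequence[float]) -> int:
    """Count time steps where some site breaks its local constraint (each forces a global check)."""
    return sum(any(x > t for x, t in zip(row, thresholds)) for row in rows)

def even_split(T: float, n: int) -> List[float]:
    """Distribution-agnostic split of the global bound."""
    return [T / n] * n

def proportional_split(history: Sequence[Sequence[float]], T: float) -> List[float]:
    """Hypothetical heuristic: give each site a share of T proportional to its mean value."""
    means = [sum(col) / len(col) for col in zip(*history)]
    total = sum(means)
    return [T * m / total for m in means]
```

On the rows `[[80, 10, 10], [85, 12, 9], [78, 11, 12], [90, 8, 10]]` (one large variable, two small ones) with T = 150, the even split of 50 per site triggers a global check at all four steps, while the proportional split triggers none. In practice the thresholds would be fitted on past data and evaluated on new data; the same rows are reused here only to keep the example short.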
Lower Bounds for Number-in-Hand Multiparty Communication Complexity, Made Easy

Cited by 17 (5 self)
In this paper we prove lower bounds on randomized multiparty communication complexity, both in the blackboard model (where each message is written on a blackboard for all players to see) and (mainly) in the message-passing model, where messages are sent player-to-player. We introduce a new technique for proving such bounds, called symmetrization, which is natural, intuitive, and often easy to use. For example, for the problem where each of k players gets a bit-vector of length n, and the goal is to compute the coordinate-wise XOR of these vectors, we prove a tight lower bound of Ω(nk) in the blackboard model. For the same problem with AND instead of XOR, we prove a lower bound of roughly Ω(nk) in the message-passing model (assuming k ≤ n/3200) and Ω(n log k) in the blackboard model. We also prove lower bounds for bitwise majority, for a graph-connectivity problem, and for other problems; the technique seems applicable to a wide range of other problems as well. The obtained communication lower bounds imply new lower bounds in the functional monitoring model [11] (also called the distributed streaming model). All of our lower bounds allow randomized communication protocols with two-sided error. We also use the symmetrization technique to prove several direct-sum-like results for multiparty communication.
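The Ω(nk) blackboard bound for coordinate-wise XOR is tight because of a trivial protocol: every player writes its whole n-bit vector, nk bits in total, after which anyone can XOR the columns. The sketch below (hypothetical function name) simulates that matching upper bound only; it does not illustrate the symmetrization technique used for the lower bound.

```python
from typing import List, Sequence, Tuple

def blackboard_xor(vectors: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Trivial blackboard protocol: each of the k players writes its n bits.

    Returns the coordinate-wise XOR (computed only from the board) and the bits written.
    """
    k, n = len(vectors), len(vectors[0])
    board: List[int] = []
    for v in vectors:  # player i writes its whole bit-vector
        board.extend(v)
    result = [0] * n
    for i in range(k):
        for j in range(n):
            result[j] ^= board[i * n + j]
    return result, len(board)
```

For instance, `blackboard_xor([[1, 0, 1, 1], [0, 1, 1, 0]])` returns `([1, 1, 0, 1], 8)`: two players, four bits each, eight bits on the board.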
Streaming in a Connected World: Querying and Tracking Distributed Data Streams
 SIGMOD'07
, 2007

Cited by 16 (7 self)
Today, a majority of data is fundamentally distributed in nature. Data for almost any task is collected over a broad area, and streams in at a much greater rate than ever before. In particular, advances in sensor technology