Results 1 - 10 of 445
Gossip-Based Computation of Aggregate Information
, 2003
Cited by 472 (2 self)
... between computers, and a resulting paradigm shift from centralized to highly distributed systems. With massive scale also comes massive instability, as node and link failures become the norm rather than the exception. For such highly volatile systems, decentralized gossip-based protocols are emerging as an approach to maintaining simplicity and scalability while achieving fault-tolerant information dissemination.
Automated worm fingerprinting
- In OSDI
, 2004
Cited by 317 (9 self)
Network worms are a clear and growing threat to the security of today’s Internet-connected hosts and networks. The combination of the Internet’s unrestricted connectivity and widespread software homogeneity allows network pathogens to exploit tremendous parallelism in their propagation. In fact, modern worms can spread so quickly, and so widely, that no human-mediated reaction can hope to contain an outbreak. In this paper, we propose an automated approach for quickly detecting previously unknown worms and viruses based on two key behavioral characteristics – a common exploit sequence together with a range of unique sources generating infections and destinations being targeted. More importantly, our approach – called “content sifting” – automatically generates precise signatures that can then be used to filter or moderate the spread of the worm elsewhere in the network. Using a combination of existing and novel algorithms, we have developed a scalable content sifting implementation with low memory and CPU requirements. Over months of active use at UCSD, our Earlybird prototype system has automatically detected and generated signatures for all pathogens known to be active on our network as well as for several new worms and viruses which were unknown at the time our system identified them. Our initial experience suggests that, for a wide range of network pathogens, it may be practical to construct fully automated defenses – even against so-called “zero-day” epidemics.
Approximate aggregation techniques for sensor databases
- In ICDE
, 2004
Cited by 301 (6 self)
In the emerging area of sensor-based systems, a significant challenge is to develop scalable, fault-tolerant methods to extract useful information from the data the sensors collect. An approach to this data management problem is the use of sensor database systems, exemplified by TinyDB and Cougar, which allow users to perform aggregation queries such as MIN, COUNT and AVG on a sensor network. Due to power and range constraints, centralized approaches are generally impractical, so most systems use in-network aggregation to reduce network traffic. Also, aggregation strategies must provide fault tolerance to address the issues of packet loss and node failures inherent in such a system. An unfortunate consequence of standard methods is that they typically introduce duplicate values, which must be accounted for to compute aggregates correctly. Another consequence of loss in the network is that exact aggregation is not possible in general. With this in mind, we investigate the use of approximate in-network aggregation using small sketches. Our contributions are as follows: 1) we generalize well-known duplicate-insensitive sketches for approximating COUNT to handle SUM (and by extension, AVG and other aggregates), 2) we present and analyze methods for using sketches to produce accurate results with low communication and computation overhead (even on low-powered CPUs with little storage and no floating point operations), and 3) we present an extensive experimental validation of our methods.
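To make "duplicate-insensitive" concrete, here is a minimal sketch of a Flajolet-Martin-style bitmap for COUNT. It is an illustration under assumptions, not the method of the paper above: the choice of this particular sketch, the names `fm_sketch`, `merge`, and `estimate_count`, the 32-bit SHA-256 prefix used as a hash, and the single-bitmap estimator are all hypothetical choices made for the example. The SUM generalization described in the abstract is not shown.

```python
import hashlib

NUM_BITS = 32          # assumed bitmap width for this illustration
PHI = 0.77351          # classical Flajolet-Martin correction factor


def fm_sketch(items) -> int:
    """Build a bitmap: each item sets bit r, where r is the number of
    trailing zero bits in a 32-bit hash of the item."""
    bitmap = 0
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "big")
        # (h & -h) isolates the lowest set bit; bit_length() - 1 is its index.
        r = (h & -h).bit_length() - 1 if h else NUM_BITS - 1
        bitmap |= 1 << r
    return bitmap


def merge(a: int, b: int) -> int:
    """Combine two sketches. Bitwise OR is idempotent, commutative, and
    associative, so re-delivered or duplicated partial results are harmless."""
    return a | b


def estimate_count(bitmap: int) -> float:
    """Rough estimate from one bitmap: 2^R / PHI, where R is the index of
    the lowest unset bit. A single bitmap has high variance; practical
    schemes average many bitmaps."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / PHI


if __name__ == "__main__":
    node_a = fm_sketch(range(0, 600))
    node_b = fm_sketch(range(400, 1000))   # overlaps node_a on 400..599
    combined = merge(merge(node_a, node_b), node_b)  # node_b delivered twice
    print(estimate_count(combined))
```

Because merging is a bitwise OR, a partial result that reaches the sink along several paths contributes exactly once, which is the property that lets multi-path routing tolerate loss without double counting.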
Synopsis diffusion for robust aggregation in sensor networks
- In SenSys
, 2004
Wavelet-Based Histograms for Selectivity Estimation
Cited by 245 (16 self)
Query optimization is an integral part of relational database management systems. One important task in query optimization is selectivity estimation, that is, given a query P, we need to estimate the fraction of records in the database that satisfy P. Many commercial database systems maintain histograms to approximate the frequency distribution of values in the attributes of relations. In this paper, we present a technique based upon a multiresolution wavelet decomposition for building histograms on the underlying data distributions, with applications to databases, statistics, and simulation. Histograms built on the cumulative data values give very good approximations with limited space usage. We give fast algorithms for constructing histograms and using
Mellin Transforms And Asymptotics: Harmonic Sums
- Theoretical Computer Science
, 1995
Cited by 202 (12 self)
This survey presents a unified and essentially self-contained approach to the asymptotic analysis of a large class of sums that arise in combinatorial mathematics, discrete probabilistic models, and the average-case analysis of algorithms. It relies on the Mellin transform, a close relative of the integral transforms of Laplace and Fourier. The method applies to harmonic sums that are superpositions of rather arbitrary "harmonics" of a common base function. Its principle is a precise correspondence between individual terms in the asymptotic expansion of an original function and singularities of the transformed function. The main applications are in the area of digital data structures, probabilistic algorithms, and communication theory.
Counting Distinct Elements in a Data Stream
, 2002
Cited by 191 (4 self)
We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± epsilon. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.
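As a rough illustration of the problem this abstract addresses, the following is a minimal sketch of one well-known small-space approach, a k-minimum-values estimator. It is not claimed to be any of the three algorithms of the paper above. The names `hash01` and `estimate_distinct`, the SHA-256-based hash, and the default `k = 256` are assumptions made for the example. Larger `k` uses more space and tightens the estimate, which is one instance of the space/accuracy tradeoff the abstract mentions.

```python
import hashlib
import heapq


def hash01(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) via SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


def estimate_distinct(stream, k: int = 256) -> float:
    """Estimate the number of distinct items in one pass, keeping only
    the k smallest distinct hash values seen so far."""
    smallest = set()  # the retained hash values, for duplicate checks
    heap = []         # the same values negated, so heap[0] is the largest
    for item in stream:
        h = hash01(item)
        if h in smallest:
            continue  # repeated item (or already retained hash): no change
        if len(heap) < k:
            smallest.add(h)
            heapq.heappush(heap, -h)
        elif h < -heap[0]:
            # New value beats the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            smallest.discard(evicted)
            smallest.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes: exact count
    kth_smallest = -heap[0]
    # If n distinct hashes are uniform on [0, 1), the k-th smallest is
    # about k / n, which gives the estimator (k - 1) / kth_smallest.
    return (k - 1) / kth_smallest


if __name__ == "__main__":
    # 30,000 stream elements drawn from 10,000 distinct items.
    stream = [f"item-{i % 10000}" for i in range(30000)]
    print(estimate_distinct(stream, k=256))
```

Memory use depends on `k` rather than on the stream length, and repeated items never change the retained set, so the estimate is a function of the set of distinct items only.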
Clustering data streams: Theory and practice
- IEEE TKDE
, 2003
Cited by 157 (5 self)
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm’s performance on synthetic and real data streams.
Computing iceberg queries efficiently
- In Proc. of the 24th VLDB Conf
, 1998
"... Many applications compute aggregate functions... ..."