A Brief History of Generative Models for Power Law and Lognormal Distributions
- Internet Mathematics
Cited by 192 (7 self)
Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying ...
Efficient erasure correcting codes
- IEEE Transactions on Information Theory, 2001
Cited by 186 (19 self)
We introduce a simple erasure recovery algorithm for codes derived from cascades of sparse bipartite graphs and analyze the algorithm by analyzing a corresponding discrete-time random process. As a result, we obtain a simple criterion involving the fractions of nodes of different degrees on both sides of the graph which is necessary and sufficient for the decoding process to finish successfully with high probability. By carefully designing these graphs we can construct for any given rate $R$ and any given real number $\epsilon$ a family of linear codes of rate $R$ which can be encoded in time proportional to $\ln(1/\epsilon)$ times their block length $n$. Furthermore, a codeword can be recovered with high probability from a portion of its entries of length $(1+\epsilon)Rn$ or more. The recovery algorithm also runs in time proportional to $n \ln(1/\epsilon)$. Our algorithms have been implemented and work well in practice; various implementation issues are discussed. Index Terms: erasure channel, large deviation analysis, low-density parity-check codes.
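The abstract above gives no pseudocode, so the following is only a minimal sketch of an erasure recovery step of this general kind, under stated assumptions: a single bipartite layer rather than a cascade, check nodes that store the XOR of their neighbouring message bits, and check values that arrive intact. The function name `peel_decode` and its argument layout are hypothetical, and this naive loop rescans all checks on every pass, so it does not reproduce the running time claimed in the paper.

```python
def peel_decode(received, checks, check_values):
    """Recover erased bits by repeatedly using checks with one unknown.

    received:     list of 0, 1, or None (None marks an erased bit)
    checks:       list of index lists; check j is the XOR of those bits
    check_values: list of 0/1 parity values, assumed received intact
    Returns (bits, success), where success is True if nothing stays erased.
    """
    bits = list(received)
    progress = True
    while progress:
        progress = False
        for neighbours, value in zip(checks, check_values):
            erased = [i for i in neighbours if bits[i] is None]
            if len(erased) == 1:
                # The missing bit equals the check value XORed with
                # every known neighbour of this check node.
                acc = value
                for i in neighbours:
                    if bits[i] is not None:
                        acc ^= bits[i]
                bits[erased[0]] = acc
                progress = True
    return bits, all(b is not None for b in bits)


if __name__ == "__main__":
    # Message 1,0,1,1 with bits 1 and 2 erased; four parity checks.
    checks = [[0, 1], [1, 2], [2, 3], [0, 3]]
    print(peel_decode([1, None, None, 1], checks, [1, 1, 0, 0]))
```

Decoding stalls when every remaining check touches two or more erased bits, which is the failure case that a degree criterion like the one in the abstract is meant to rule out with high probability.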
The Power of Two Choices in Randomized Load Balancing
- IEEE Transactions on Parallel and Distributed Systems, 1996
Cited by 159 (22 self)
Suppose that $n$ balls are placed into $n$ bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately $\log n / \log \log n$. Suppose instead that each ball is placed sequentially into the least full of $d$ bins chosen independently and uniformly at random. It has recently been shown that the maximum load is then only $\log \log n / \log d + O(1)$ with high probability. Thus giving each ball two choices instead of just one leads to an exponential improvement in the maximum load. This result demonstrates the power of two choices, and it has several applications to load balancing in distributed systems. In this thesis, we expand upon this result by examining related models and by developing techniques for stu...
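The placement process in this abstract is easy to try empirically. The sketch below is a small simulation, not code from the thesis: the function name `max_load`, the `seed` parameter, and the bin count used in the usage lines are assumptions chosen for illustration. Ties between equally loaded candidate bins are broken by sampling order, which the abstract does not specify.

```python
import random


def max_load(n, d, seed=0):
    """Place n balls into n bins; each ball goes to the least full of
    d bins sampled independently and uniformly at random."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        # Sample d candidate bins with replacement.
        candidates = [rng.randrange(n) for _ in range(d)]
        # Pick the candidate holding the fewest balls so far.
        target = min(candidates, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins)


if __name__ == "__main__":
    n = 100_000
    for d in (1, 2, 3):
        print(f"d={d}: maximum load {max_load(n, d)}")
```

With d = 1 this reduces to plain uniform placement, so a single run per value of d gives a rough feel for the gap between the two bounds quoted above.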
How Useful Is Old Information
- IEEE Transactions on Parallel and Distributed Systems, 2000
Cited by 72 (10 self)
We consider the problem of load balancing in dynamic distributed systems in cases where new incoming tasks can make use of old information. For example, consider a multiprocessor system where incoming tasks with exponentially distributed service requirements arrive as a Poisson process, the tasks must choose a processor for service, and a task knows when making this choice the processor queue lengths from T seconds ago. What is a good strategy for choosing a processor in order for tasks to minimize their expected time in the system? Such models can also be used to describe settings where there is a transfer delay between the time a task enters a system and the time it reaches a processor for service. Our models are based on considering the behavior of limiting systems where the number of processors goes to infinity. The limiting systems can be shown to accurately describe the behavior of sufficiently large systems and simulations demonstrate that they are reasonably accurate even for systems with a small number of processors. Our studies of specific models demonstrate the importance of using randomness to break symmetry in these systems and yield important rules of thumb for system design. The most significant result is that only small amounts of queue length information can be extremely useful in these settings; for example, having incoming tasks choose the least loaded of two randomly chosen processors is extremely effective over a large range of possible system parameters. In contrast, using global information can actually degrade performance unless used carefully; for example, unlike most settings where the load information is current, having tasks go to the apparently least loaded server can significantly hurt performance. Index Terms: load balancing, stale information, old information, queuing theory, large deviations.
On the Analysis of Randomized Load Balancing Schemes
- In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, 1998
Cited by 48 (7 self)
It is well known that simple randomized load balancing schemes can balance load effectively while incurring only a small overhead, making such schemes appealing for practical systems. In this paper, we provide new analyses for several such dynamic randomized load balancing schemes. Our work extends a previous analysis of the supermarket model, a model that abstracts a simple, efficient load balancing scheme in the setting where jobs arrive at a large system of parallel processors. In this model, customers arrive at a system of $n$ servers as a Poisson stream of rate $\lambda n$, $\lambda < 1$, with service requirements exponentially distributed with mean 1. Each customer chooses $d$ servers independently and uniformly at random from the $n$ servers, and is served according to the First In First Out (FIFO) protocol at the choice with the fewest customers. For the supermarket model, it has been shown that using $d = 2$ choices yields an exponential improvement in the expected time a customer spends in the syst...
Setting 2 variables at a time yields a new lower bound for random 3-SAT (Extended Abstract)
- 1999
Cited by 28 (4 self)
Let $X$ be a set of $n$ Boolean variables and denote by $C(X)$ the set of all 3-clauses over $X$, i.e. the set of all $8\binom{n}{3}$ possible disjunctions of three distinct, non-complementary literals from variables in $X$. Let $F(n, m)$ be a random 3-SAT formula formed by selecting, with replacement, $m$ clauses uniformly at random from $C(X)$ and taking their conjunction. The satisfiability threshold conjecture asserts that there exists a constant $r_3$ such that as $n \to \infty$, $F(n, rn)$ is satisfiable with probability that tends to 1 if $r < r_3$ and to 0 if $r > r_3$. Experimental evidence suggests $r_3 \approx 4.2$. We prove $r_3 > 3.145$, improving over the previous best lower bound $r_3 > 3.003$ due to Frieze and Suen. For this, we introduce a satisfiability heuristic that works iteratively, permanently setting the value of a pair of variables in each round. The framework we...
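The random formula model defined in this abstract can be sketched in a few lines. The code below is only an illustration of the sampling model, not of the paper's pair-setting heuristic, which the truncated abstract does not describe in enough detail to reproduce. The names `num_clauses`, `random_3sat`, and `satisfies`, and the signed-integer encoding of literals, are assumptions of this sketch.

```python
import random
from math import comb


def num_clauses(n):
    """Size of C(X): choose 3 of n variables, then one of 8 sign patterns."""
    return 8 * comb(n, 3)


def random_3sat(n, m, seed=0):
    """Draw m clauses uniformly at random, with replacement, from C(X).

    A literal is a signed integer: +v is variable v, -v is its negation.
    """
    rng = random.Random(seed)
    formula = []
    for _ in range(m):
        # Three distinct variables, so no repeated or complementary literals.
        variables = sorted(rng.sample(range(1, n + 1), 3))
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula


def satisfies(assignment, formula):
    """True if the dict assignment (variable -> bool) satisfies every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


if __name__ == "__main__":
    n, r = 20, 3.0
    formula = random_3sat(n, int(r * n))
    print(num_clauses(n), len(formula), formula[:3])
```

Sampling a variable triple uniformly and then a sign pattern uniformly gives each of the $8\binom{n}{3}$ clauses the same probability, which matches the definition of $F(n, m)$ above.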
Analyses of Load Stealing Models Based on Differential Equations
- In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures, 1998
Cited by 17 (0 self)
In this paper we develop models for and analyze several randomized work stealing algorithms in a dynamic setting. Our models represent the limiting behavior of systems as the number of processors grows to infinity using differential equations. The advantages of this approach include the ability to model a large variety of systems and to provide accurate numerical approximations of system behavior even when the number of processors is relatively small. We show how this approach can yield significant intuition about the behavior of work stealing algorithms in realistic settings.
Allocating Weighted Jobs in Parallel
- 1997
Cited by 12 (4 self)
It is well known that after placing $m \ge n$ balls independently and uniformly at random (i.u.r.) into $n$ bins, the fullest bin contains $\Theta(\log n / \log \log n + m/n)$ balls, with high probability. It is also known (see [Ste96]) that a maximum load of $O(m/n)$ can be obtained for all $m \ge n$ if a ball is allocated in one (suitably chosen) of two (i.u.r.) bins. Stemann ([Ste96]) shows that $r$ communication rounds suffice to guarantee a maximum load of $\max\{\sqrt[r]{\log n}, O(m/n)\}$, with high probability. Adler et al. have shown in [ACMR95] that Stemann's protocol is optimal for constant $r$. In this paper we extend the above results in two directions: We generalize the lower bound to arbitrary $r \le \log \log n$. This implies that the result of Stemann's protocol is optimal for all $r$. Our main result is a generalization of Stemann's upper bound to weighted jobs: Let $W_A$ ($W_M$) denote the average (maximum) weight of the balls. Further let $\Delta = W_A / W_M$. Note that...
Analyzing an Infinite Parallel Job Allocation Process
Cited by 11 (6 self)
In recent years the task of allocating jobs to servers has been studied with the "balls and bins" abstraction. Results in this area exploit the large decrease in maximum load that can be achieved by allowing each job (ball) a very small amount of choice in choosing its destination server (bin). The scenarios considered can be divided into two categories: sequential, where each job can be placed at a server before the next job arrives, and parallel, where the jobs arrive in large batches that must be dealt with simultaneously. Another, orthogonal, classification of load balancing scenarios is into fixed time and infinite. Fixed time processes are only analyzed for an interval of time that is known in advance, and for all such results thus far either the number of rounds or the total expected number of arrivals at each server is a constant. In the infinite case, there is an arrival process and a deletion process that are both defined over an infinite time line. In this pape...
On Large-Scale Peer-to-Peer Streaming Systems with Network Coding
Cited by 9 (1 self)
Live peer-to-peer (P2P) streaming has recently received much research attention, with successful commercial systems showing its viability in the Internet. Nevertheless, existing analytical studies of P2P streaming systems have failed to mathematically investigate and understand their critical properties, especially with a large scale and under extreme dynamics such as a flash crowd scenario. Even more importantly, there exists no prior analytical work that focuses on an entirely new way of designing streaming protocols, with the help of network coding. In this paper, we seek to show an in-depth analytical understanding of fundamental properties of P2P streaming systems, with a particular spotlight on the benefits of network coding. We show that, if network coding is used according to certain design principles, provably good performance can be guaranteed, with respect to high playback qualities, short initial buffering delays, resilience to peer dynamics, as well as minimal bandwidth costs on dedicated streaming servers. Our results are obtained with mathematical rigor, but without sacrificing realistic assumptions of system scale, peer dynamics, and upload capacities. For further insights, streaming systems using network coding are compared with traditional pull-based streaming in large-scale simulations, with a focus on fundamentals, rather than protocol details. The scale of our simulations throughout this paper exceeds 200,000 peers at times, which is in sharp contrast with existing empirical studies, typically with a few hundred peers involved.

