Improving the reliability of Internet paths with one-hop source routing
 In OSDI
, 2004
Cited by 175 (9 self)
Recent work has focused on increasing availability in the face of Internet path failures. To date, proposed solutions have relied on complex routing and path-monitoring schemes, trading scalability for availability among a relatively small set of hosts. This paper proposes a simple, scalable approach to recover from Internet path failures. Our contributions are threefold. First, we conduct a broad measurement study of Internet path failures on a collection of 3,153 Internet destinations consisting of popular Web servers, broadband hosts, and randomly selected nodes. We monitored these destinations from 67 PlanetLab vantage points over a period of seven days, and found availabilities ranging from 99.6% for servers to 94.4% for broadband hosts. When failures do occur, many appear too close to the destination (e.g., last-hop and end-host failures) to be mitigated through alternative routing techniques of any kind. Second, we show that for the failures that can be addressed through routing, a simple, scalable technique, called one-hop source routing, can achieve close to the maximum benefit available with very low overhead. When a path failure occurs, our scheme attempts to recover from it by routing indirectly through a small set of randomly chosen intermediaries. Third, we implemented and deployed a prototype one-hop source routing infrastructure on PlanetLab. Over a three-day period, we repeatedly fetched documents from 982 popular Internet Web servers and used one-hop source routing to attempt to route around the failures we observed. Our results show that our prototype successfully recovered from 56% of network failures. However, we also found a large number of server failures that cannot be addressed through alternative routing.
Our research demonstrates that one-hop source routing is easy to implement, adds negligible overhead, and achieves close to the maximum benefit available to indirect routing schemes, without the need for path monitoring, history, or a priori knowledge of any kind.
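The recovery step this abstract describes (on a path failure, retry through a few randomly chosen intermediaries) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `can_reach` is a hypothetical reachability probe supplied by the caller, and the default `k=4` is an arbitrary illustrative value.

```python
import random

def one_hop_recover(src, dst, intermediaries, can_reach, k=4, rng=random):
    """Return a working path as a list of nodes, or None if none is found.

    can_reach(a, b) is a hypothetical caller-supplied probe that reports
    whether a can currently reach b directly. k is an assumed parameter:
    how many random intermediaries to try after the direct path fails.
    """
    # Use the direct path whenever it works; no monitoring or history needed.
    if can_reach(src, dst):
        return [src, dst]
    # Otherwise try up to k intermediaries chosen uniformly at random.
    for hop in rng.sample(intermediaries, min(k, len(intermediaries))):
        if can_reach(src, hop) and can_reach(hop, dst):
            return [src, hop, dst]
    # Failures at the destination itself cannot be routed around.
    return None

if __name__ == "__main__":
    working_links = {("client", "relay1"), ("relay1", "server")}
    probe = lambda a, b: (a, b) in working_links
    print(one_hop_recover("client", "server", ["relay1", "relay2"], probe, k=2))
```

The sketch keeps no per-path state, which matches the abstract's point that the scheme needs no path monitoring or history.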
The Power of Two Random Choices: A Survey of Techniques and Results
 in Handbook of Randomized Computing
, 2000
Cited by 137 (6 self)
To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d ≥ 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99]. The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e.,...
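The balls-and-bins process in this abstract is easy to simulate. The following is a minimal sketch for comparing one random choice (d = 1) with two (d = 2); the function name `max_load` and the choice of n are illustrative assumptions, not taken from the survey.

```python
import random

def max_load(n, d, rng):
    """Place n balls sequentially into n bins and return the maximum load.

    Each ball samples d bins independently and uniformly at random (with
    replacement) and goes into the least loaded of them.
    """
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        target = min(candidates, key=lambda i: bins[i])
        bins[target] += 1
    return max(bins)

if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    print("d=1 max load:", max_load(n, 1, rng))
    print("d=2 max load:", max_load(n, 2, rng))
```

For large n the d = 2 run should show a visibly smaller maximum load than the d = 1 run, in line with the log log n / log d behaviour quoted above.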
Performance Modeling and System Management for Multi-component Online Services
, 2005
Cited by 117 (13 self)
Many dynamic-content online services are comprised of multiple interacting components and data partitions distributed across server clusters. Understanding the performance of these services is crucial for efficient system management. This paper presents a profile-driven performance model for cluster-based multi-component online services. Our offline constructed application profiles characterize component resource needs and inter-component communications. With a given component placement strategy, the application profile can be used to predict system throughput and average response time for the online service. Our model differentiates remote invocations from fast-path calls between co-located components and we measure the network delay caused by blocking inter-component communications. Validation with two J2EE-based online applications show that our model can predict application performance with small errors (less than 13% for throughput and less than 14% for the average response time). We also explore how this performance model can be used to assist system management functions for multi-component online services, with case examinations on optimized component placement, capacity planning, and cost-effectiveness analysis.
How Useful Is Old Information?
 IEEE Transactions on Parallel and Distributed Systems
, 2000
Cited by 108 (9 self)
We consider the problem of load balancing in dynamic distributed systems in cases where new incoming tasks can make use of old information. For example, consider a multiprocessor system where incoming tasks with exponentially distributed service requirements arrive as a Poisson process, the tasks must choose a processor for service, and a task knows when making this choice the processor queue lengths from T seconds ago. What is a good strategy for choosing a processor in order for tasks to minimize their expected time in the system? Such models can also be used to describe settings where there is a transfer delay between the time a task enters a system and the time it reaches a processor for service. Our models are based on considering the behavior of limiting systems where the number of processors goes to infinity. The limiting systems can be shown to accurately describe the behavior of sufficiently large systems and simulations demonstrate that they are reasonably accurate even for systems with a small number of processors. Our studies of specific models demonstrate the importance of using randomness to break symmetry in these systems and yield important rules of thumb for system design. The most significant result is that only small amounts of queue length information can be extremely useful in these settings; for example, having incoming tasks choose the least loaded of two randomly chosen processors is extremely effective over a large range of possible system parameters. In contrast, using global information can actually degrade performance unless used carefully; for example, unlike most settings where the load information is current, having tasks go to the apparently least loaded server can significantly hurt performance. Index Terms: Load balancing, stale information, old information, queuing theory, large deviations.
The natural work-stealing algorithm is stable
 In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS)
, 2001
Cited by 36 (1 self)
In this paper we analyse a very simple dynamic work-stealing algorithm. In the work-generation model, there are n (work) generators. A generator-allocation function is simply a function from the n generators to the n processors. We consider a fixed, but arbitrary, distribution D over generator-allocation functions. During each time step of our process, a generator-allocation function h is chosen from D, and the generators are allocated to the processors according to h. Each generator may then generate a unit-time task which it inserts into the queue of its host processor. It generates such a task independently with probability λ. After the new tasks are generated, each processor removes one task from its queue and services it. For many choices of D, the work-generation model allows the load to become arbitrarily imbalanced, even when λ < 1. For example, D could be the point distribution containing a single function h which allocates all of the generators to just one processor. For this choice of D, the chosen processor receives around λn units of work at each step and services one. The natural work-stealing algorithm that we analyse is widely used in practical applications and works as follows. During each time step, each empty
Cluster Load Balancing for Fine-grain Network Services
 In Proc. of International Parallel & Distributed Processing Symposium, Fort Lauderdale, FL
, 2002
Cited by 30 (7 self)
This paper studies cluster load balancing policies and system support for fine-grain network services. Load balancing on a cluster of machines has been studied extensively in the literature, mainly focusing on coarse-grain distributed computation. Fine-grain services introduce additional challenges because system states fluctuate rapidly for those services and system performance is highly sensitive to various overhead. The main contribution of our work is to identify effective load balancing schemes for fine-grain services through simulations and empirical evaluations on synthetic workload and real traces. Another contribution is the design and implementation of a load balancing system in a Linux cluster that strikes a balance between acquiring enough load information and minimizing system overhead. Our study concludes that: 1) Random polling based load-balancing policies are well-suited for fine-grain network services; 2) A small poll size provides sufficient information for load balancing, while an excessively large poll size may in fact degrade the performance due to polling overhead; 3) Discarding slow-responding polls can further improve system performance.
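The three conclusions above (random polling, a small poll size, discarding slow polls) suggest a dispatch rule along the following lines. This is a hedged sketch, not the system described in the paper: `poll` is a hypothetical callback returning `(load, latency_seconds)` for a server, and the defaults `poll_size=3` and `deadline_s=0.01` are arbitrary assumptions.

```python
import random

def pick_server(servers, poll, poll_size=3, deadline_s=0.01, rng=random):
    """Choose a server by polling a small random subset for its load.

    poll(server) is a hypothetical probe returning (load, latency_seconds).
    Replies slower than deadline_s are discarded before comparing loads.
    """
    polled = rng.sample(servers, min(poll_size, len(servers)))
    replies = []
    for server in polled:
        load, latency = poll(server)
        if latency <= deadline_s:  # drop slow-responding polls
            replies.append((load, server))
    if not replies:
        # No timely reply: fall back to a random polled server.
        return rng.choice(polled)
    # Pick the least loaded server among the timely replies.
    return min(replies, key=lambda reply: reply[0])[1]

if __name__ == "__main__":
    state = {"s1": (4, 0.002), "s2": (1, 0.050), "s3": (2, 0.003)}
    print(pick_server(list(state), lambda s: state[s]))
```

In the usage example, `s2` has the lowest load but answers too slowly, so the sketch ignores it and chooses among the timely replies.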
Analyses of Load Stealing Models Based on Differential Equations
 In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures
, 1998
Cited by 24 (0 self)
In this paper we develop models for and analyze several randomized work stealing algorithms in a dynamic setting. Our models represent the limiting behavior of systems as the number of processors grows to infinity using differential equations. The advantages of this approach include the ability to model a large variety of systems and to provide accurate numerical approximations of system behavior even when the number of processors is relatively small. We show how this approach can yield significant intuition about the behavior of work stealing algorithms in realistic settings.
Stability of load balancing algorithms in dynamic adversarial systems
 In Proc. of the 34th ACM Symp. on Theory of Computing (STOC)
, 2002
Cited by 23 (2 self)
In the dynamic load balancing problem, we seek to keep the job load roughly evenly distributed among the processors of a given network. The arrival and departure of jobs is modeled by an adversary restricted in its power. Muthukrishnan and Rajaraman (1998) gave a clean characterization of a restriction on the adversary that can be considered the natural analogue of a cut condition. They proved that a simple local balancing algorithm proposed by Aiello et al. (1993) is stable against such an adversary if the insertion rate is restricted to a (1 − ε) fraction of the cut size. They left as an open question whether the algorithm is stable at rate 1. In this paper, we resolve this question positively, by proving stability of the local algorithm at rate 1. Our proof techniques are very different from the ones used by Muthukrishnan and Rajaraman, and yield a simpler proof and tighter bounds on the difference in loads. In addition, we introduce a multicommodity version of this load balancing model, and show how to extend the result to the case of balancing two different kinds of loads at once (obtaining as a corollary a new proof of the 2-commodity Max-Flow Min-Cut Theorem). We also show how to apply the proof techniques to the problem of routing packets in adversarial systems. Awerbuch et al. (2001) showed that the same load balancing algorithm is stable against an adversary inserting
Locality-aware and churn-resilient load balancing algorithms in structured peer-to-peer networks
 IEEE Trans. Parallel and Distributed Systems
, 2007
Cited by 22 (4 self)
Structured peer-to-peer overlay networks, like distributed hash tables (DHTs), map data items to the network based on a consistent hashing function. Such mapping for data distribution has an inherent load balance problem. Data redistribution algorithms based on randomized matching of heavily loaded nodes with light ones can deal with the dynamics of DHTs. However, they are unable to consider the proximity of the nodes simultaneously. There are other methods that rely on auxiliary networks to facilitate locality-aware load redistribution. Due to the cost of network construction and maintenance, the locality-aware algorithms can hardly work for DHTs with churn. This paper presents a locality-aware randomized load-balancing algorithm to deal with both the proximity and network churn at the same time. We introduce a factor of randomness in the probing of lightly loaded nodes in a range of proximity. We further improve the efficiency by allowing the probing of multiple candidates (d-way) at a time. Simulation results show the superiority of the locality-aware two-way randomized algorithm in comparison with other random or locality-aware algorithms. In DHTs with churn, it performs no worse than the best churn-resilient algorithm. It takes advantage of node capacity heterogeneity and achieves good load balance effectively even in a skewed distribution of items. Index Terms: Cycloid, distributed hash table, peer-to-peer, load balancing, heterogeneity, proximity.
On Balls and Bins with Deletions
 In Proc. of the RANDOM'98
, 1998
Cited by 20 (1 self)
Microsystems. The views and conclusions contained here are those of the authors and should not be interpreted as necessarily representing the official policies or