Results 1  10
of
521
Cilk: An Efficient Multithreaded Runtime System
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1995
"... Cilk (pronounced "silk") is a Cbased runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk workstealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "cri ..."
Abstract

Cited by 750 (40 self)
 Add to MetaCart
(Show Context)
Cilk (pronounced "silk") is a Cbased runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk workstealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the "work" and "criticalpath length" of a Cilk computation can be used to model performance accurately. Consequently, a Cilk programmer can focus on reducing the computation's work and criticalpath length, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of "fully strict" (wellstructured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal. The Cilk
Scheduling Multithreaded Computations by Work Stealing
, 1994
"... This paper studies the problem of efficiently scheduling fully strict (i.e., wellstructured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMDstyle computation is “work stealing," in which processors needing work steal com ..."
Abstract

Cited by 572 (43 self)
 Add to MetaCart
This paper studies the problem of efficiently scheduling fully strict (i.e., wellstructured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMDstyle computation is “work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good workstealing scheduler for multithreaded computations with dependencies. Specifically, our analysis shows that the ezpected time Tp to execute a fully strict computation on P processors using our workstealing scheduler is Tp = O(TI/P + Tm), where TI is the minimum serial ezecution time of the multithreaded computation and T, is the minimum ezecution time with an infinite number of processors. Moreover, the space Sp required by the execution satisfies Sp 5 SIP. We also show that the ezpected total communication of the algorithm is at most O(TmS,,,P), where S, is the site of the largest activation record of any thread, thereby justifying the folk wisdom that workstealing schedulers are more communication eficient than their worksharing counterparts. All three of these bounds are existentially optimal to within a constant factor.
The implementation of the cilk5 multithreaded language
 In PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
, 1998
"... The fth release of the multithreaded language Cilk uses a provably good \workstealing " scheduling algorithm similar to the rst system, but the language has been completely redesigned and the runtime system completely reengineered. The eciency of the new implementation was aided by a clear st ..."
Abstract

Cited by 493 (30 self)
 Add to MetaCart
(Show Context)
The fth release of the multithreaded language Cilk uses a provably good \workstealing " scheduling algorithm similar to the rst system, but the language has been completely redesigned and the runtime system completely reengineered. The eciency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this \workrst " principle has led to a portable Cilk5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the workrst principle was exploited in the design of Cilk5's compiler and its runtime system. In particular, we present Cilk5's novel \twoclone " compilation strategy and its Dijkstralike mutualexclusion protocol for implementing the ready deque in the workstealing scheduler.
The Merge/Purge Problem for Large Databases
 In Proceedings of the 1995 ACM SIGMOD
, 1995
"... Many commercial organizations routinely gather large numbers of databases for various marketing and business analysis functions. The task is to correlate information from different databases by identifying distinct individuals that appear in a number of different databases typically in an inconsiste ..."
Abstract

Cited by 352 (3 self)
 Add to MetaCart
Many commercial organizations routinely gather large numbers of databases for various marketing and business analysis functions. The task is to correlate information from different databases by identifying distinct individuals that appear in a number of different databases typically in an inconsistent and often incorrect fashion. The problem we study here is the task of merging data from multiple sources in as efficient manner as possible, while maximizing the accuracy of the result. We call this the merge/purge problem. In this paper we detail the sorted neighborhood method that is used by some to solve merge/purge and present experimental results that demonstrates this approach may work well in practice but at great expense. An alternative method based upon clustering is also presented with a comparative evaluation to the sorted neighborhood method. We show a means of improving the accuracy of the results based upon a multipass approach that succeeds by computing the Transitive Clos...
APPROXIMATION ALGORITHMS FOR SCHEDULING UNRELATED PARALLEL MACHINES
, 1990
"... We consider the following scheduling problem. There are m parallel machines and n independent.jobs. Each job is to be assigned to one of the machines. The processing of.job j on machine i requires time Pip The objective is to lind a schedule that minimizes the makespan. Our main result is a polynomi ..."
Abstract

Cited by 266 (6 self)
 Add to MetaCart
We consider the following scheduling problem. There are m parallel machines and n independent.jobs. Each job is to be assigned to one of the machines. The processing of.job j on machine i requires time Pip The objective is to lind a schedule that minimizes the makespan. Our main result is a polynomial algorithm which constructs a schedule that is guaranteed to be no longer than twice the optimum. We also present a polynomial approximation scheme for the case that the number of machines is fixed. Both approximation results are corollaries of a theorem about the relationship of a class of integer programming problems and their linear programming relaxations. In particular, we give a polynomial method to round the fractional extreme points of the linear program to integral points that nearly satisfy the constraints. In contrast to our main result, we prove that no polynomial algorithm can achieve a worstcase ratio less than ~ unless P = NP. We finally obtain a complexity classification for all special cases with a fixed number of processing times.
Project scheduling under uncertainty: Survey and research potentials
 European Journal of Operational Research
, 2005
"... 2 The vast majority of the research efforts in project scheduling assume complete information about the scheduling problem to be solved and a static deterministic environment within which the precomputed baseline schedule will be executed. However, in the real world, project activities are subject ..."
Abstract

Cited by 133 (20 self)
 Add to MetaCart
2 The vast majority of the research efforts in project scheduling assume complete information about the scheduling problem to be solved and a static deterministic environment within which the precomputed baseline schedule will be executed. However, in the real world, project activities are subject to considerable uncertainty, that is gradually resolved during project execution. In this survey we review the fundamental approaches for scheduling under uncertainty: reactive scheduling, stochastic project scheduling, stochastic GERT network scheduling; fuzzy project scheduling, robust (proactive) scheduling and sensitivity analysis. We discuss the potentials of these approaches for scheduling projects under uncertainty.
Practical Skew Handling in Parallel Joins
 IN PROCEEDINGS OF THE 18TH VLDB CONFERENCE
, 1992
"... We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on nonskewed data. The main idea is to use multiple algorithms, each ..."
Abstract

Cited by 126 (10 self)
 Add to MetaCart
(Show Context)
We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on nonskewed data. The main idea is to use multiple algorithms, each specialized for a di erent degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skewhandling parallel join algorithms; one, which wecall virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash join was the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the rst reported skewhandling numbers from an actual implementation.
The Structure and Complexity of Nash Equilibria for a Selfish Routing Game
, 2002
"... In this work, we study the combinatorial structure and the computational complexity of Nash equilibria for a certain game that models sel sh routing over a network consisting of m parallel links. We assume a collection of n users, each employing a mixed strategy, which is a probability distribu ..."
Abstract

Cited by 122 (28 self)
 Add to MetaCart
(Show Context)
In this work, we study the combinatorial structure and the computational complexity of Nash equilibria for a certain game that models sel sh routing over a network consisting of m parallel links. We assume a collection of n users, each employing a mixed strategy, which is a probability distribution over links, to control the routing of its own assigned trac. In a Nash equilibrium, each user sel shly routes its trac on those links that minimize its expected latency cost, given the network congestion caused by the other users. The social cost of a Nash equilibrium is the expectation, over all random choices of the users, of the maximum, over all links, latency through a link.
On The Granularity And Clustering Of Directed Acyclic Task Graphs
 IEEE Transactions on Parallel and Distributed Systems
, 1990
"... Clustering has been used as a compile time preprocessing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definiti ..."
Abstract

Cited by 110 (22 self)
 Add to MetaCart
(Show Context)
Clustering has been used as a compile time preprocessing step in the scheduling of task graphs on parallel architectures. A special case of the clustering problem arises in scheduling an unbounded number of completely connected processors. Using a generalization of Stone's granularity definition, the impact of the granularity on clustering strategies is analyzed. A clustering is called linear if every cluster is one simple directed path in the task graph; otherwise is called nonlinear. For coarse grain directed acyclic task graphs (DAGs), a completely connected architecture with unbounded number of processors and under the assumption that task duplication is not allowed, the following property is shown: For every nonlinear clustering there exists a linear clustering with less or equal parallel time. This property, along with a performance bound for linear clustering algorithms, shows that linear clustering is the best choice for coarse grain DAGs. It provides a theoretical justificati...
Achieving NearOptimal Traffic Engineering Solutions for Current OSPF/ISIS Networks
 IEEE/ACM Transactions on Networking
, 2002
"... Traffic engineering is aimed at distributing traffic so as to "optimize" a given performance criterion. The ability to carry out such an optimal distribution depends on both the routing protocol and the forwarding mechanisms in use in the network. In IP networks running the OSPF or ISIS p ..."
Abstract

Cited by 105 (4 self)
 Add to MetaCart
(Show Context)
Traffic engineering is aimed at distributing traffic so as to "optimize" a given performance criterion. The ability to carry out such an optimal distribution depends on both the routing protocol and the forwarding mechanisms in use in the network. In IP networks running the OSPF or ISIS protocols, routing is over shortest paths, and forwarding mechanisms are constrained to distributing traffic uniformly over equal cost shortest paths. These constraints often make achieving an optimal distribution of traffic impossible. In this paper, we propose and evaluate an approach, based on manipulating the set of next hops for routing prefixes, that is capable of realizing near optimal traffic distribution without any change to existing routing protocols and forwarding mechanisms. In addition, we explore the tradeoff that exists between performance and the overhead associated with the additional configuration steps that our solution requires. The paper's contributions are in formulating and evaluating an approach to traffic engineering for existing IP networks that achieves performance levels comparable to that offered when deploying other forwarding technologies such as MPLS.