Eager, Derek L. and Zahorjan, John. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1), 1993.


This paper is cited in the following contexts:
The Mercury User's Manual - Kontothanassis (1993)   (1 citation)

....structure is turned into a thread structure that has the ability to yield a processor and be resumed any number of times. Templates are most efficient when the programming style involves run-to-completion tasks, in which case several optimizations can be performed [Fowler and Kontothanassis, 1992; Eager and Zahorjan, 1991]. Such optimizations include stack hand-off to subsequent threads, thus avoiding the need for queue operations to reuse the stack, and elimination of context switching to the scheduler thread that is to determine the next runnable user thread. Mercury employs both optimizations, making templates an ....

Derek L. Eager and John Zahorjan, "Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing," Technical Report 91-08-05, Department of Computer Science and Engineering, University of Washington, August 1991.


The Performance Impact of Granularity Control - And Functional Parallelism

....finish. The Psyche [16] system has facilities for user level threads in which many tasks normally performed by the kernel, such as interrupt handling and preemptions, are handled at the user level. Like nanoThreads, it relies on multiple virtual processors sharing the same address space. Chores [6] is a paradigm for the exploitation of loop and functional parallelism. It allows dynamic scheduling at the user level and the expression of dependences between tasks without explicit synchronization. 11 Conclusion In this paper we have demonstrated the benefits of exploiting functional ....

Derek Eager and John Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1), February 1993.


Dynamic Partitioning in Different Distributed-Memory.. - Islam, Prodromidis.. (1996)   (9 citations)

....them reconfigurable. We present one such approach. We assume that each application can use all of the nodes allocated to it during its lifetime. We also assume that these applications can be decomposed into the structure depicted in Fig. 1. This structure has been variously called bag of tasks [1, 10], master slave parallelism [25] and task queue model [15, 5]. Each application consists of a coordinator process along with a set of worker processes as shown in Fig. 1. When an application starts it spawns a set of worker processes and the logically centralized coordinator. Each worker process is ....

D. L. Eager and J. Zahorjan. Chores: Enhanced Run-Time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, Feb. 1993.


Software Support for Distributed and Parallel Computing - Freeh (1996)

....to the same pool. In such cases, the package at run time switches to code that iterates over the filaments, generating the arguments in registers rather than reading the filament descriptors from memory. Filaments currently recognizes a few common patterns that support 3 Systems such as Chores [22] and the Uniform System [53] have a fine grain specification and a coarse grain execution model, but use preprocessor support. Filaments generates different codes at compile time and chooses among them at run time. 34 Generator: i : 0 to 2; j : 0 to 4 Matcher Pattern Pool: 15 Filament ....

....parallelism. TAM defines an abstract machine of self scheduling parallel threads, which is used as an intermediate language that is mapped to existing processors, whereas Filaments defines a portable subroutine library, which is used to specify parallelism in a traditional, imperative way. Chores [22] is similar to both Filaments and the Uniform System. Chores runs on top of Presto on a Sequent Symmetry. It uses a central ready queue, but servers take jobs in chunks. This amortizes the lock overhead of the central ready queue. Like Filaments, Chores has no private stacks per thread, no context ....

[Article contains additional citation context not shown here]

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.
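
As a rough illustration of the chunked dequeue described in the Freeh excerpt above (a central ready queue from which servers take jobs in chunks, amortizing the lock overhead), the following Python sketch may help. It is not the Chores implementation: the class ChunkedReadyQueue, the function run_workers, the chunk_size parameter, and the lock_acquisitions counter are hypothetical names introduced here, and Python threads stand in for the servers.

```python
import threading
from collections import deque


class ChunkedReadyQueue:
    """Hypothetical central ready queue; workers remove tasks in chunks."""

    def __init__(self, tasks, chunk_size):
        if chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")
        self._tasks = deque(tasks)
        self._lock = threading.Lock()
        self._chunk_size = chunk_size
        self.lock_acquisitions = 0  # counts how often the shared lock is taken

    def take_chunk(self):
        """Remove up to chunk_size tasks under a single lock acquisition."""
        with self._lock:
            self.lock_acquisitions += 1
            n = min(self._chunk_size, len(self._tasks))
            return [self._tasks.popleft() for _ in range(n)]


def run_workers(tasks, num_workers, chunk_size):
    """Run zero-argument callables on num_workers threads.

    Returns (results, lock_acquisitions). Result order is not guaranteed.
    """
    queue = ChunkedReadyQueue(tasks, chunk_size)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            chunk = queue.take_chunk()
            if not chunk:  # queue drained: this worker is done
                return
            # Tasks run to completion, one after another, without touching
            # the shared queue again until the whole chunk is finished.
            local = [task() for task in chunk]
            with results_lock:
                results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, queue.lock_acquisitions
```

With 100 tasks, 4 workers and chunk_size=10, this sketch takes the queue lock 14 times (10 non-empty chunks plus one empty probe per worker), against 104 times with chunk_size=1.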


Parallel Programming with Parallel Sets in C - Michael Kilian Commercial

....compute time than other slices (in this case 81 of the longest running thread) it may indicate that objects should be moved out of the longer running slices to slice 4. If the skew is more pronounced, it may indicate that a more opportunistic method than slicewise decomposition should be used ([EZ93] explores different mechanisms for scheduling parallel tasks). 8 7 Conclusion This paper has attempted to outline what parallel sets are and how they can be used to construct parallel programs, especially programs that lack a regular structure. The key elements of the model are: ParaSet ....

Derek L. Eager and John Zahorjan. Chores: Enhanced Run-Time Support for Shared Memory Parallel Computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


The Advantages of Multiple Parallelizations in.. - Crowl, Crovella.. (1994)   (4 citations)

....small vertex) they can be used individually or in tandem. In this paper, we focus on the tradeoffs between tree parallelism and loop parallelism. These two kinds of parallelism are particularly important because they occur in many problems that have a similar structure: expression evaluation [Eager and Zahorjan, 1993], parallel quicksort [Chen et al. 1984] and the Barnes Hut multibody algorithm [Singh et al. 1992]. Although the presence of both tree parallelism and loop parallelism in backtracking search has been noted by other researchers [Wah et al. 1985; Natarajan, 1987; Finkel and Manber, 1987; Rao and ....

....efficiently. The VCODE compiler for the Encore Multimax [Chatterjee, 1993] and the NESL compiler for the Connection Machine CM 2 [Blelloch et al. 1993] support nested data parallelism, which can be used to express both types of parallelism in subgraph isomorphism. The Chores runtime system [Eager and Zahorjan, 1993] and the parallelizing Fortran compiler for the iWarp [Subhlok et al. 1993] are recent examples of programming systems designed to support both data parallelism (e.g. parallel loops) and task (or functional) parallelism (e.g. tree search). Our work with control abstraction in parallel ....

Derek L. Eager and John Zahorjan, "Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing," ACM Transactions on Computer Systems, 11:1--32, February 1993.


Obtaining Efficient Single-Processor Performance From.. - Lowenthal, Greene (1999)

....generates multiple versions of a threaded function at compile time through macro substitution, including a coarse grain version. At run time, if it determines that the threads are generated in a certain pattern, it switches from the fine grain to the coarse grain version. A related idea is Chores [EZ93] which uses a preprocessor that can statically convert a fine grain program to a coarse grain program. The Problem Each of the above approaches has drawbacks. Run time code selection has the basic drawback that the resulting program is coarse grain, which defeats the advantages of fine grain ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


Affinity Scheduling of Unbalanced Workloads - Saskatoon (1993)

....not needed for each unit of sequential execution. In the lowest cost implementations, units of sequential execution that share address spaces (often termed lightweight processes or threads) are implemented outside the operating system, in a user level thread library (user level threads) [9] [20] [21] [33], although they may also be implemented in the operating system (kernel threads). Ideally, the availability of cheap threads would allow the programmer to structure a program in a very natural way, representing each logically separate unit of sequential computation by a separate thread. In ....

....in light of the overhead of thread management in the system in use. Thus the one process per processor program structure is common in the shared memory model as well (particularly in a form in which the processes are workers retrieving units of work from a queue maintained in shared memory) [21], ensuring negligible process management overhead. Communication overhead is introduced by interaction between processes. Communication manifests itself as cache misses in multiprocessors with caches, as nonlocal memory accesses (and also possibly cache misses) in machines that support a shared ....

D. L. Eager, J. Zahorjan, "Chores: Enhanced Run-Time Support for Shared Memory Parallel Computing", ACM Transactions on Computer Systems, Vol. 11, No. 1 (February 1993), pp. 1-32.


Usability of Parallel I/O Templates - Parsons, Unrau, al.

....a lack of portability between different operating systems, architectures, and even changes in the physical layout of the files. This paper proposes a design for high level parallel I/O templates within the auspices of a parallel programming system (PPS). Examples of these systems can be found in [1-3, 7, 12, 13, 15, 23, 31]. A PPS could use these parallel I/O templates along with templates for parallel computation to implement the desired parallel behaviour. The PPS integrates all components of developing, compiling, running, debugging, and evaluating the performance of a parallel application. That is, the ....

D. L. Eager and J. Zahorjan, "Chores: Enhanced Run-time Support for Shared Memory Parallel Computing," ACM Transactions on Computer Systems, 11(1), pp. 1-32, 1993.


Loop Re-ordering and Pre-fetching at Run-time - Suvas Vajracharya (1997)   (2 citations)

....first class objects which can be put together to describe complex loops. By putting together and specializing these objects, the user specializes the system to create a software systolic array for the application at hand. This object oriented model is based on AWESIME [7] and the Chores [6] run time systems. The following is a list of objects in Dude: Data Descriptor: Data Descriptors describe a subsection of the data space. For example, a matrix can be divided into sub matrices with each sub matrix being defined by a data descriptor. The methods on this object, SX( EX( ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Trans. on Computer Systems, 11(1):1--32, February 1993.


Analyzing the Behavior and Performance of Parallel Programs - Adve (1993)   (20 citations)

....or run time support (and perhaps also programmer annotation) will be required for creating the taskgraph of a given program. Some requisite infrastructure for program analysis is already available in parallelizing compilers such as Jade [RSL93] and others, and in run time systems such as Chores [EaZ93]. In parallelizing compilers, the compiler automatically detects and enforces (a superset of) the data dependencies in a program, and implements the partitioning and scheduling of work. The task graph defined by these data dependencies can be directly constructed, together with the scheduling ....

D. L. EAGER and J. ZAHORJAN, Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing, ACM Trans. on Computer Systems 11, 1 (February 1993), 1-32.


Thread Optimizations in Concurrent Object Oriented Languages - Wang (1998)   (1 citation)

....discuss these two approaches in more detail. Some benchmark tests for them can be found in the next chapter. 6.2 Related Work Many efficient optimization schemas have been proposed to improve the performance of fine grained multiple threaded systems. Some of them, like Filament [32, 31], Chores [12], Cilk [6] and Leapfrogging [38], use stack less threads. Some of them, like Concert [30, 9, 22], ABCL [35] and TAM [10], use a compiler to generate the code which supports the frame management and the context switching. In this section, we will study some of these thread schemas. 6.2.1 Stack less ....

....than the coarse grain one. Because state less threads do not have private stacks, they can not save their current states during the execution. The problem with this kind of threads is that they cannot make a blocking call. This restricts the usage of threads in a general system, like Kan. Chores[12] also uses stack less threads. Chores runtime system puts threads into a central ready queue as tasks. The server takes tasks in chunks so it can reduce the lock overhead on a central ready queue. A server can be a system thread that has a stack. Chores is more flexible than Filament because user ....

Derek L. Eager and John Zahorjan. Chores: enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, February 1993.


On the Implementation and Effectiveness of Autoscheduling - Moreira (1995)   (16 citations)

....model for computation and communication. Substantial effort has been put into the development of lightweight threads to reduce management overhead, allowing processors to switch between tasks in a very short time, and to exploit more levels of parallelism. Examples of such work can be found in [3, 2, 5, 10, 19, 11]. In particular, [3] presents data structure and algorithm alternatives for thread management in shared memory multiprocessors, including centralized and distributed ready queues. Some memory management issues for parallel programs are addressed in [40]. Issues on the partitioning of the physical ....

Derek Eager and John Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1), February 1993.


Autoscheduling in a Shared Memory Multiprocessor - Moreira, Polychronopoulos

....to processors is done statically, at compile time, based on a cost model for computation and communication. Substantial effort has been put into the development of lightweight threads, which allow processors to switch between tasks in a very short time. Examples of such work can be found in [3, 10, 9, 14, 2]. 8 Conclusion We have shown that autoscheduling can be efficiently implemented in a shared memory multiprocessor based on commercial microprocessors. We have demonstrated the feasibility of an autoscheduling compiler, capable of generating scalable binary code, that executes correctly and ....

Derek Eager and John Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1), February 1993.


Filaments: Efficient Support for Fine-Grain Parallelism - Engler, Andrews, Lowenthal (1994)   (21 citations)

....For example, Lin and Snyder [Lin90] found a fine grain implementation of Jacobi iteration using the Sequent Symmetry's parallel programming library to be 8 to 23 times slower than a coarse grain one, depending on the problem size and number of processors. More recent work, such as Chores [Eage93], has been able to get efficient performance by automatically clustering fine grain tasks into larger units, but executing fine grain tasks independently is still 10 to 20 times slower, again for Jacobi iteration on a Sequent Symmetry. This paper describes a threads package, Filaments, that ....

....program was around 10 slower than the coarse grain one. Of course, no user would ever want to eliminate common subexpressions manually. It just shows that thread packages are not inherently a lot less efficient when executing fine grain programs, as has been previously claimed, e.g. in [Lin90, Eage93]. For fine grain programs in which threads are statically assigned to servers, a good compiler might be able to eliminate common subexpressions. In any event, constants can always be saved across thread executions. Also, because at least some loops are eliminated in fine grain programs, more ....

[Article contains additional citation context not shown here]

Eager, Derek L., and Zahorjan, John. Chores: enhanced run-time support for shared-memory parallel computing. ACM Trans. on Computer Systems 11, 1 (Feb. 1993), 1-32.


An Object-Oriented Run-time System for Parallel Programming - MacDonald (1996)

....has an outgoing arc to every node in the application. Since a node can only fire when data is on all incoming arcs, the controller can be used to precisely control the application. 3.2. 7 Chores Chores is an object oriented system that was written to be used by programmers or as a compiler target [5]. It was targeted for uniform memory access shared memory machines, although the authors believe the system generalizes. It attempts to handle granularity, scheduling, and synchronization at run time rather than leaving these issues to the programmer or compiler. Chores express both functional ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run--time support for shared--memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


PI/OT, Parallel I/O Templates - Parsons, Unrau, Schaeffer, Szafron (1997)

....parallel programming system (PPS) to help the user implement the parallel I/O requirements. One of the advantages of using a PPS is to shield the user from the low level details of implementing parallel requirements. Many examples of a PPS (with varying degrees of sophistication) can be found in [1-3, 7, 12, 13, 15, 24, 32]. A PPS could use these parallel I/O templates along with their model for parallel computation to implement the desired parallel behaviour. The PPS integrates all components of developing, compiling, running, debugging, and evaluating the performance of a parallel application. That is, the ....

D. L. Eager and J. Zahorjan, "Chores: Enhanced Run-time Support for Shared Memory Parallel Computing," ACM Transactions on Computer Systems, 11(1), pp. 1-32, 1993.


Dynamic Partitioning in Different Distributed-Memory.. - Islam, Prodromidis, al. (1996)   (9 citations)

....them reconfigurable. We present one such approach. We assume that each application can use all of the nodes allocated to it during its lifetime. We also assume that these applications can be decomposed into the structure depicted in Figure 1. This structure has been variously called bag of tasks [1, 10], master slave parallelism [25] and task queue model [5, 14]. Each application consists of a coordinator process along with a set of worker processes as shown in Figure 1. When an application starts it spawns a set of worker processes and the logically centralized coordinator. Each worker process ....

D. L. Eager and J. Zahorjan. Chores: Enhanced Run-Time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, Feb. 1993.


Portable, Efficient Futures - Wagner, Calder (1992)

....final attack on this problem, we demonstrate how to block deal the work to the workers. The precedence graph shown in Figure 19 gives no clue how to do this. Our solution is shown in Figure 23. The block deal algorithm is tricky because it requires the dependences of the program to be known, as in [3]. A deadlock will result if an iteration that is currently being executed is dependent on an iteration later in the same worker's block of work. The most straightforward way to insure the correct ordering is to embed the original iteration spawning code in every BlockFuture. BF[i] simply ....

Eager, D. L., and Zahorjan, J. Chores: Enhanced run-time support for shared-memory parallel computing. Tech. rep., University of Washington, 1991.


Migrant Threads on Process Farms: Parallel Programming with.. - Mascarenhas, Rego (1995)

....stack overflow and increases computation granularity. A related technique for increasing the granularity of fine grained applications is found in Lazy Task Creation [26]. It is also possible to enhance stateful threads with support for low cost fine grained operations as in the Chores system [12]. An interesting feature of programming with Ariadne, as demonstrated above, is the relative ease with which one can move from sequential code to parallel code. This occurs because sequential and parallel versions of code do not differ much. Differences occur primarily in function main( see ....

D. L. Eager and J. Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


Mercury: Object-Affinity Scheduling and Continuation.. - Fowler, Kontothanassis (1994)   (1 citation)

....items ( atomic tasks ) that run to completion once they are started. Other abstractions consistent with the underlying mechanisms include coarse grain dataflow computations, actors [1] remote (interprocessor) object invocation [7] and constructs that reduce overhead by scheduling atoms in groups [10, 22]. In order to minimize the overhead of creating and scheduling tasks Mercury provides two kinds of schedulable entities, both of which can use O AS. A Thread 1 These techniques are now common to many well tuned thread packages. Faust and Levy [11] applied a similar set of modifications to their ....

Derek L. Eager and John Zahorjan. "Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing". Technical Report 91-08-05, Department of Computer Science and Engineering, University of Washington, August 1991.


Application of an Object-Oriented Parallel Run-Time System to.. - Clive Baillie

....the concept of dependence and use specification in the runtime system, we can also execute multiple parallel operations concurrently. The QGMG program must solve two multigrid problems to advance a single time step. A traditional runtime system, or even advanced systems such as the Chores model [17], must sequentially schedule the computation in each doall or loop nesting. By allowing all operations to be evaluated in parallel, we increase the scheduling opportunities, allowing the runtime system to select a better schedule. The iteration space is initially subdivided into fixed sized ....

D.L. Eager and J. Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Trans. on Computer Systems, 11(1):1--32, February 1993.


Compilation of Scientific Programs into Multithreaded and.. - Holm, Lain, Banerjee (1994)   (14 citations)

....as well using a hybrid approach [27]. Most of these packages provide a flexibility that we do not need in our basic threads model. Due to its simplicity, CSIM [25] was chosen as our threads package. Other software methods for multithreading include Active Messages [14], Chare Kernel [18], Chores [28], and NewThreads [13]. Active messages are the most similar to our message driven scheme described in this paper because it uses existing hardware and keeps some information about firing the thread within the message. The Chare Kernel [18] is a new language for dynamically creating and scheduling ....

D. L. Eager and J. Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11:1--32, February 1993.


The Search for Lost Cycles: A New Approach to Parallel.. - Crovella, LeBlanc (1993)   (11 citations)

....cases is the best parallelization fixed; the choice of which parallelization of subgraph isomorphism performs best varies in all cases by significant margins. Other researchers have also noted that the best parallelization for a given problem can vary depending on the input, machine, or problem [Eager and Zahorjan, 1993; Rao and Kumar, 1989; Subhlok et al. 1993]. Examples of these effects are shown in Table 1. This table shows the best running time in seconds for loop and tree parallel implementations, while varying one component of the environment. The underlined entries in the table are the better performing ....

Derek L. Eager and John Zahorjan, "Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing," ACM Transactions on Computer Systems, 11:1--32, February 1993.


Architecture-Independent Parallelism for Both Shared and.. - Lowenthal, Freeh

....of filaments assigned to the same pool. In such cases, the package at run time switches to code that iterates over the filaments, generating the arguments in registers rather than reading the filament descriptors from memory. Filaments currently recognizes a few 2 Systems such as Chores [EZ93] and the Uniform System [TC88] have a fine grain specification and a coarse grain execution model, but use preprocessor support to create a machine specific executable at compile time. Filaments generates different codes at compile time and chooses among them at run time depending on the ....

....idea to pruning. Markatos et al. [MB92] present a thorough study of the tradeoffs between load balancing and locality in shared memory machines with respect to thread scheduling. The most related threads packages that provide efficient fine grain parallelism are the Uniform System [TC88], Chores [EZ93] and TAM [CDG 93]. The first two do not support fork join parallelism or a distributed memory machine, and the latter is oriented towards functional programming and a distributed memory machine. Two other subsequent thread packages use an approach similar to Filaments: uThread [Shu95] and ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


Experimental Evaluation of Blocking and Non-Blocking.. - Annavaram, Najjar, Roh (1997)

....An extensive discussion of these approaches is beyond the scope of this section. In the absence of architecture support, lightweight threads that support efficient inter thread communication provide an efficient and attractive platform for parallel processing. Further information can be found in [15, 17, 12, 16, 11, 23, 27, 29, 13, 6, 25, 28, 4]. 7 Conclusions This article compares the performance of two data driven multithreaded execution models and their associated synchronization schemes. The blocking model relies on frame based data storage. A frame contains all the data related to a code block. All the thread instances that belong ....

D.L. Eager and J. Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Trans. on Computer Systems, 11(1):1--32, February 1993.


The Performance Impact of Granularity Control and Functional.. - Jos Moreiray

....finish. The Psyche [16] system has facilities for user level threads in which many tasks normally performed by the kernel, such as interrupt handling and preemptions, are handled at the user level. Like nanoThreads, it relies on multiple virtual processors sharing the same address space. Chores [6] is a paradigm for the exploitation of loop and functional parallelism. It allows dynamic scheduling at the user level and the expression of dependences between tasks without explicit synchronization. 11 Conclusion In this paper we have demonstrated the benefits of exploiting functional ....

Derek Eager and John Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1), February 1993.


Efficient Scheduling Of Parallel Tasks In A Multiprogramming.. - Schouten (1995)   (1 citation)

....in which threads are the basic unit of simulated parallelism. It is useful for simulating parallel systems and taking detailed measurements of such a system. For such threads systems, kernel support is not required as there is, at any given point, only one processor being used. Chores [EZ93] allow sets of threads (for example, multiple iterations of a given loop) to be grouped together, eliminating the overhead involved in creating each thread separately. This allows the structures containing the threads to be reused. It is especially useful for loop parallelism as it allows elegant ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 93.


Efficient Support for Fine-Grain Parallelism on.. - Lowenthal, Freeh.. (1999)   (2 citations)

....pool, and at run time switches 3 In the Filaments package, the only situation where preemption provides a gain occurs when filaments have vastly different workloads. Even then, the performance gain is likely to be outweighed by the overhead of implementing preemption. 4 Systems such as Chores [EZ93] and the Uniform System [TC88] have a fine grain specification and a coarse grain execution model, but use preprocessor support. Filaments generates different codes at compile time and chooses among them at run time. to code that iterates over the filaments, generating the arguments in registers ....

....parallelism. TAM defines an abstract machine of self scheduling parallel threads, which is used as an intermediate language that is mapped to existing processors, whereas Filaments defines a portable subroutine library, which is used to specify parallelism in a traditional, imperative way. Chores [EZ93] is similar to both Filaments and the Uniform System. Chores runs on top of Presto on a Sequent Symmetry. It uses a central ready queue, but servers take jobs in chunks. This amortizes the lock overhead of the central ready queue. Like Filaments, Chores has no private stacks per thread, no context ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


Speculative Execution in Real-Time Systems - Ghosh (1995)

....Mohr et al. examine a technique for creating tasks only when resources favor task creation in a fine grain MIMD programming model. Eager and Zahorjan consider the merits and demerits of thread based models of user level concurrency, and propose an extension of the work heap model, called Chores [EZ93]. Freeh et al. forward another extension of a lightweight threads model with minimal context, called Distributed Filaments [FLA94]. The threads model used in our framework has similarities with these efforts. The up down calls between the various levels of our multi level scheduler is similar ....

Derek L. Eager and John Zahorjan. Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


Towards a Model for Portable Parallel Performance: Exposing.. - Alpern, Carter (1993)   (12 citations)

....as divide and conquer and dynamic programming are tools of good algorithm design. In particular, an effective way to accommodate the memory hierarchy is to recursively partition the computation into a collection of chores [Eag93], where each chore is a set of computations that share a substantial amount of data. The data relevant to each chore should sit nicely within the blocks (e.g. pages and cache lines) that are moved as an indivisible unit by the hardware. The entire computation is a sequence of chores. During the ....

D. Eager and J. Zahorjan. Chores: enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11, pages 1--32, February 1993.


Distributed Filaments: Efficient Fine-Grain Parallelism.. - Freeh, Lowenthal.. (1994)   (46 citations)

....This not only increases the number of messages but increases the likelihood of a load balance denial (because only two have sufficient work). 4.4 Binary Expression Trees. The fork join paradigm can also be used to compute the value of a binary expression tree, an application described in [EZ93]. The leaves are matrices and interior operators are matrix multiplication; the tree is traversed in parallel and the matrices are multiplied sequentially. Figure 7 contains the results of running the matrix expression program with 70 by 70 matrices and a balanced binary tree of height 7. The ....

....in Jacobi iteration. Hence, general purpose threads packages are most useful for providing coarse grain parallelism or for structuring a large concurrent system. A few threads packages support efficient fine grain parallelism, e.g. the Uniform System [TC88], Shared Filaments [EAL93], and Chores [EZ93]. The first two restrict the generality of the threads model. In particular, the Uniform System uses task generators to provide parallelism, much in the way a parallelizing compiler works, and in Shared Filaments a thread (filament) cannot block. On the other hand, Chores uses PRESTO threads as ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


Loop Re-Ordering and Pre-Fetching at Run-time - Vajracharya, Grunwald (1997)   (2 citations)  (Correct)

....first class objects which can be put together to describe complex loops. By putting together and specializing these objects, the user specializes the system to create a software systolic array for the application at hand. This object oriented model is based on AWESIME [7] and the Chores [6] run time systems. The following is a list of objects in DUDE: Data Descriptor: Data Descriptors describe a subsection of the data space. For example, a matrix can be divided into sub matrices with each sub matrix being defined by a data descriptor. The methods on this object, SX( EX( SY( ....

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Trans. on Computer Systems, 11(1):1--32, February 1993.


Using Fine-Grain Threads and Run-Time Decision Making in .. - Lowenthal, Freeh.. (1996)   (8 citations)  (Correct)

....consists of inlining the body of each filament rather than making a procedure call. In particular, when processing a pool, a server thread executes a loop, the body of which is the code specified by filaments in the pool. This eliminates a function call for each filament, but the server thread still has to traverse the list of filament .... [Footnote 1: Systems such as Chores [EZ93] and the Uniform System [TC88] have a fine grain specification and a coarse grain execution model, but use preprocessor support. Filaments generates different codes at compile time, but chooses among them at run time.]

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1):1--32, February 1993.


Using Runtime Measured Workload Characteristics in.. - Nguyen, Vaswani.. (1996)   (31 citations)  Self-citation (Zahorjan)   (Correct)

....iteration boundaries; the applications examine and adjust to the number of available processors each time they begin an iteration, but do not do so while executing any one iteration. It is clearly possible to do much more dynamic scheduling, e.g. [20, 1, 6, 14]; we did not do so because of the very large incremental implementation cost relative to our more restrictive change, and because we expect that ST EQUI would perform even better when jobs are more responsive to changes in their allocations. Of the three policies, ST EQUI reallocates processors ....

D. L. Eager and J. Zahorjan. Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing. ACM Transactions on Computer Systems, 11(1):1--32, Feb. 1993.


Using Runtime Measured Workload Characteristics in Parallel.. - Thu Nguyen Raj (1996)   (31 citations)  Self-citation (Zahorjan)   (Correct)

....scheduling only at the level of iteration boundaries: the applications examine and adjust to the number of available processors each time they begin an iteration, but do not do so while executing any one iteration. It is clearly possible to do much more dynamic scheduling (see, for example, [19, 6, 13]); we did not do so because of the very large incremental implementation cost relative to our more restrictive change, and because we anticipated that the additional benefits of this added flexibility at the application level would be quite modest. 3.2 Using Runtime Measurements: ST EQUI. The ....

D. L. Eager and J. Zahorjan. Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing. ACM Transactions on Computer Systems, 11(1):1--32, Feb. 1993.


Using Runtime Measured Workload Characteristics in Parallel.. - Thu Nguyen Raj (1996)   (31 citations)  Self-citation (Zahorjan)   (Correct)

....application level dynamic scheduling only at iteration boundaries; the applications examine and adjust to the number of available processors each time they begin an iteration, but do not do so while executing any one iteration. It is clearly possible to do much more dynamic scheduling, e.g. [20, 1, 6, 14]; we did not do so because of the very large incremental implementation cost relative to our more restrictive change, and because we expect that ST EQUI would perform even better when jobs are more responsive in responding to allocation changes. Of the three policies, ST EQUI reallocates ....

D. L. Eager and J. Zahorjan. Chores: Enhanced Run-Time Support for Shared-Memory Parallel Computing. ACM Transactions on Computer Systems, 11(1):1--32, Feb. 1993.


Fair Threads in C - Boussinot (2003)   (Correct)

No context found.

Eager, Derek L. and Zahorjan, John. Chores: Enhanced run-time support for shared memory parallel computing. ACM Transactions on Computer Systems, 11(1), 1993.


A Scalable Multi-Discipline, Multiple-Processor Scheduling.. - James Barton Nawaf (1995)   (11 citations)  (Correct)

No context found.

D. Eager, J. Zahorjan, "Chores: Enhanced Run-time Support for Shared-Memory Parallel Computing", ACM Transactions on Computer Systems, February 1993.


Parallel Performance Prediction Using Lost Cycles Analysis - Crovella, LeBlanc (1994)   (41 citations)  (Correct)

No context found.

Derek L. Eager and John Zahorjan. Chores: Enhanced run-time support for shared-memory parallel computing. ACM Transactions on Computer Systems, 11:1--32, February 1993.

