Robert H. Thomas and Will Crowther. The Uniform System: An approach to runtime support for large scale shared memory parallel processors. In Proceedings of the 1988.
....remote accesses or access forwarding; this removes the choice of the amount of data to send in response to a request, which is critical to our work. Most research projects in the area of NUMA architectures have implemented a shared memory programming model; the best known is BBN's Uniform System [49], and it is typical in that it directly exports the non-uniform memory structure to users. Our work supports automatic management mechanisms that free users from some of the details involved in managing non-uniform memory, and should make these machines easier to program. Other researchers have ....
Robert Thomas and Will Crowther. The Uniform System: An approach to runtime support for large scale shared memory multiprocessors. In Proc. of 1988.
....process which may share its processor with a worker process. Our application of a NUMA performance model to PPF systems assumes the following points: • There is no global operating system on distributed memory machines as there is on traditional NUMA machines with a global address space [44]. However, a central farmer process for each farm can take the place of the operating system with respect to scheduling. • PPF applications are medium-grained, without the comparatively fine granularity of NUMA loop iterations. This is because the access latency to remote memory is an order of ....
R. H. Thomas and W. Crowther. The Uniform system: An approach to runtime support for large scale shared memory parallel processors. In International Conference on Parallel Processing, volume 2, pages 245--254, August 1988.
....of work. To avoid an uneven assignment of units of work to processors, many shared-memory programming systems use a central work queue from which either idle processors remove lightweight threads (the central queue is a ready queue in this case) or idle worker processes remove units of work [14, 15]. A central queue facilitates a dynamic, even distribution of load among processors, and ensures that no processor remains idle while there is work to be done. Although both load balancing and locality management policies attempt to improve the performance of the system, conflicts can arise ....
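The central-work-queue scheme in the snippet above can be illustrated with a minimal Python sketch. It assumes that units of work are plain callables, that all of them are enqueued before the workers start, and that Python threads stand in for processors; the function name `run_central_queue` is hypothetical and is not taken from the Uniform System or any cited paper.

```python
import queue
import threading
from collections import Counter


def run_central_queue(work_units, num_workers):
    """Sketch (hypothetical API): idle workers repeatedly take the next unit
    of work from one shared queue until the queue is empty."""
    work = queue.Queue()
    for unit in work_units:          # all work is enqueued before workers start
        work.put(unit)

    results = []
    done_by = Counter()              # how many units each worker executed
    lock = threading.Lock()          # protects results and done_by

    def worker(wid):
        while True:
            try:
                unit = work.get_nowait()   # an idle worker grabs the next unit
            except queue.Empty:
                return                     # no work left, so the worker exits
            value = unit()                 # run the unit to completion
            with lock:
                results.append(value)
                done_by[wid] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, done_by
```

Because every worker draws from the same queue, a worker that finishes early simply takes more units, which is the dynamic, even distribution of load the snippet describes; the per-worker counts in `done_by` vary from run to run.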
R. H. Thomas, W. Crowther, "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors", Proceedings of the
....cases, the package at run time switches to code that iterates over the filaments, generating the arguments in registers rather than reading the filament descriptors from memory. Filaments currently recognizes a few common patterns that support .... [Footnote 3: Systems such as Chores [22] and the Uniform System [53] have a fine-grain specification and a coarse-grain execution model, but use preprocessor support. Filaments generates different codes at compile time and chooses among them at run time.] [Figure: Generator (i: 0 to 2; j: 0 to 4), Matcher, Pattern; Pool of 15 filament descriptors (i, j): (0,0), (0,1), ....]
....a thorough study of the tradeoffs between load balancing and locality in shared memory machines. Keppel [34] describes a portable threads package that supports efficient barrier synchronization and non-preemptive threads. Some threads packages support fine-grain parallelism. The Uniform System [53], built for the BBN Butterfly, has several things in common with Filaments: there are no private stacks per thread, no context switches, and threads are not preemptable. The Uniform System's synchronous mode supports a simple form of iterative threads, and their finalization code is equivalent to ....
Robert H. Thomas and Will Crowther. The Uniform system: An approach to runtime support for large scale shared memory parallel processors. In 1988 Conference on Parallel Processing, pages 245--254, August 1988.
....g.clear; When a thread succeeds, the result is enqueued in g. After retrieving the result, g.clear is called. A thread that finds that CONFIG: clear request is true can then self-abort. We have so far only assumed a uniform shared memory model. In such a model, various runtime systems [215, 33] have demonstrated the feasibility of good load balancing techniques. Therefore the deferred assignment statement only provides the functionality of thread creation and leaves load balancing to the runtime system. But when we generalize the model to a more sophisticated one (Section 2.4.3) in ....
....load balancing among clusters, our current approach, programmer-managed inter-cluster load balancing, is practical and efficient, but does not offer a completely location-transparent model. On the other hand, for intra-cluster load balancing, there are various runtime systems (e.g., Uniform [215], Presto [94], and any state-of-the-art operating system for symmetric multiprocessors, such as Solaris or Mach) which perform well for shared memory machines. In a machine with 1 processor per cluster, such systems can be used to provide automatic load balancing within a cluster. Our model restricts ....
Robert H. Thomas and Will Crowther. The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors. In Proceedings of the 1988 International Conference on Parallel Processing, pages 245--254, August 1988.
....as virtual PEs that are executed by whatever number of physical PEs are assigned to the processor set [65]. A variant of this approach is used in the Uniform System, also on the BBN Butterfly. In this system, the shared workpile is only used if multiple instantiations of the same chore are required [573]. None of these systems support automatic changes in PE allocation, but they could do so without changing their user interfaces. On the other hand, using the workpile model presents two problems. First, the shared data structures needed to distribute chores to workers may become a serial ....
....because there is no reason to expect the different threads to be homogeneous. However, it is possible to apply this optimization in the special case when a large number of threads are spawned at once and they are all clones of each other. This is done by the Uniform System on the BBN Butterfly [573], by Chameleon [17], by the Chores system [178], and by the MAXI system on the Makbilan [527]. [Section 4.4.2, Variations on the Thread Model] A totally different type of optimization for two-level scheduling involves changes to the thread model. This is based on the observation that both user-level ....
R. H. Thomas and W. Crowther, "The Uniform System: an approach to runtime support for large scale shared memory parallel processors". In Intl. Conf. Parallel Processing, vol. II, pp. 245--254, Aug 1988.
....of work. To avoid an uneven assignment of units of work to processors, most shared-memory programming systems use a central work queue from which either idle processors remove lightweight threads (the central queue is a ready queue in this case) or idle worker processes remove units of work [59, 61]. A central queue facilitates a dynamic, even distribution of load among processors, and ensures that no processor remains idle while there is work to be done. This dynamic scheduling scheme is effective because, in a shared-memory program on a shared-memory multiprocessor, each unit of work ....
....that each processor is initially assigned the same amount of work. More complicated policies use run-time assignment (or reassignment, as in migration policies) of work to dynamically adjust the load throughout execution. As described in the previous chapter, the central work queue model [59, 61] is a commonly used technique in shared memory multiprocessors. In this model, units of work are placed on a central ready queue accessible to every processor. When a processor finishes some work (its unit of work completes, blocks, or is preempted), it removes the next entry from the ready ....
R. H. Thomas, W. Crowther, "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors", Proceedings of the 1988 International Conference on Parallel Processing, August 1988, pp. 245-254.
....algorithms can offer comparable performance to decreasing-block-size algorithms. Fixed-block-size self-scheduling offers a number of other advantages. It is easier to implement and easier to analyze theoretically. It has already been implemented in runtime systems on a number of multiprocessors [14, 15, 16]. Commercial compilation systems, such as KSR's PRESTO runtime library [5], often use fixed-size blocking. Finally, it is much easier to combine fixed-block-size self-scheduling algorithms with data locality than decreasing-block-size schemes. No commercial compiler of which we are aware can ....
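Fixed-block-size self-scheduling, as discussed in this snippet, can be sketched in Python as follows. This is an illustrative version under assumed conditions: the loop body is a callable that takes an iteration index, a lock-protected counter stands in for an atomic fetch-and-add, and the names `self_schedule` and `block` are hypothetical rather than taken from any cited runtime.

```python
import threading


def self_schedule(n_iters, block, num_workers, body):
    """Sketch (hypothetical API): each idle worker claims the next `block`
    iterations from a shared counter until all n_iters have been taken."""
    next_iter = 0
    lock = threading.Lock()
    claimed = [[] for _ in range(num_workers)]   # (start, stop) chunks per worker

    def worker(wid):
        nonlocal next_iter
        while True:
            with lock:                   # stands in for an atomic fetch-and-add
                start = next_iter
                next_iter += block
            if start >= n_iters:
                return                   # every iteration has been handed out
            stop = min(start + block, n_iters)   # last chunk may be short
            claimed[wid].append((start, stop))
            for i in range(start, stop):
                body(i)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Each claim costs one synchronized counter update, so a larger `block` lowers scheduling overhead, while a smaller one evens out the load at the end of the loop. Because chunk boundaries are fixed in advance, it is also easy to reason about which data a chunk touches, which is the locality advantage the snippet mentions.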
R. H. Thomas and W. Crowther, "The Uniform System: An approach to runtime support for large scale shared memory parallel processors," in Proceedings of the 1988 International Conference on Parallel Processing, pp. 245--254, Aug. 1988.
....threads package, Filaments, that employs a unique combination of techniques to implement fine-grain parallelism directly, efficiently, and portably on shared-memory multiprocessors. In particular, Filaments synthesizes and extends previous work such as WorkCrews [Vand88], Chores, the Uniform System [Thom88], and TAM [Cull93]. The most important technique is stateless threads, i.e., threads do not have a private stack. Thread descriptors are also small, so hundreds of thousands of threads can be supported without exhausting memory. Other techniques include control of thread placement for data ....
....and locality in shared memory machines. Keppel [Kepp93] describes a portable threads package that supports efficient barrier synchronization and non-preemptive threads. Threads packages that support finer-grain parallelism include the Uniform System, WorkCrews, TAM, and Chores. The Uniform System [Thom88], built for the BBN Butterfly, has several things in common with Filaments: there are no private stacks per thread, no context switches, and threads are not preemptable. The Uniform System's synchronous mode supports a simple form of barrier threads, and their finalization code is equivalent to ....
Thomas, Robert H. and Crowther, Will. The Uniform System: an approach to runtime support for large scale shared memory parallel processors. Proceedings of the 1988 Conference on Parallel Processing, pp. 245-254, August 1988.
....for shared memory by user- or compiler-generated code performing explicit block or page moves [194, 262]. As a result, NUMA architectures implementing a shared memory programming model typically expose the existing memory access hierarchy to the application program, as done in BBN's Uniform System [247]. Motivations for exposing such information include: (1) giving programmers the ability to minimize relatively expensive remote memory references in favor of less expensive local ones (i.e., to maximize program locality [120]), and (2) permitting programmers to avoid several forms of potential contention (switch or ....
R. Thomas and W. Crowther. The uniform system: An approach to runtime support for large scale shared memory parallel processors. In Proceedings of the 1988 International Conference on Parallel Processing, V. II -- Software, pages 245--254, August 1988.
....to the same pool. In such cases, the package at run time switches to code that iterates over the filaments, generating the arguments in registers rather than reading the filament descriptors from memory. Filaments currently recognizes a few common patterns that .... [Footnote 2: Systems such as Chores [EZ93] and the Uniform System [TC88] have a fine-grain specification and a coarse-grain execution model, but use preprocessor support to create a machine-specific executable at compile time. Filaments generates different codes at compile time and chooses among them at run time depending on the machine.]
....is a similar idea to pruning. Markatos et al. [MB92] present a thorough study of the tradeoffs between load balancing and locality in shared memory machines with respect to thread scheduling. The most closely related threads packages that provide efficient fine-grain parallelism are the Uniform System [TC88], Chores [EZ93], and TAM [CDG+93]. The first two do not support fork-join parallelism or a distributed memory machine, and the latter is oriented towards functional programming and a distributed memory machine. Two other subsequent thread packages use an approach similar to Filaments: uThread ....
Robert H. Thomas and Will Crowther. The Uniform system: An approach to runtime support for large scale shared memory parallel processors. In 1988 Conference on Parallel Processing, pages 245--254, August 1988.
....analogous coarse-grain implementation. As a result, programmers avoid using fine-grain threads, despite the many benefits of doing so. Previous work on thread scheduling has focused on the goal of load balancing. For example, in the process control scheme [Tucker and Gupta, 1989], the Uniform System [Thomas and Crowther, 1988], Brown Threads [Doeppner Jr., 1987], and Presto [Bershad et al., 1988], all threads of the same application are placed in a FIFO central work queue. Processors take threads from this queue and run them to completion. The load is evenly balanced in that no processor remains idle as long as there ....
....threads to processors based on the location of data. This is the approach we used in our implementation of memory-conscious scheduling within an existing thread package running under the Psyche multiprocessor operating system [Marsh et al., 1991] on the Butterfly. BBN's Uniform System library [Thomas and Crowther, 1988] suggests an alternative approach to implementing memory-conscious scheduling on a distributed shared memory machine. The Uniform System is a shared memory, data-parallel programming environment. Within a Uniform System program, task generators are used to create parallel tasks (threads). Each ....
R.H. Thomas and W. Crowther, "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors," In Proceedings of the 1988 International Conference on Parallel Processing, pages 245--254, August 1988.
....[Footnote 3: In the Filaments package, the only situation where preemption provides a gain occurs when filaments have vastly different workloads. Even then, the performance gain is likely to be outweighed by the overhead of implementing preemption.] [Footnote 4: Systems such as Chores [EZ93] and the Uniform System [TC88] have a fine-grain specification and a coarse-grain execution model, but use preprocessor support. Filaments generates different codes at compile time and chooses among them at run time.] .... to code that iterates over the filaments, generating the arguments in registers rather than loading from ....
....study of the tradeoffs between load balancing and locality in shared memory machines. Keppel [Kep93] describes a portable threads package that supports efficient barrier synchronization and non-preemptive threads. Many threads packages support finer-grain parallelism. The Uniform System [TC88], built for the BBN Butterfly, has several things in common with Filaments: there are no private stacks per thread, no context switches, and threads are not preemptable. The Uniform System's synchronous mode supports a simple form of iterative threads, and their finalization code is equivalent to ....
Robert H. Thomas and Will Crowther. The Uniform system: an approach to runtime support for large scale shared memory parallel processors. In 1988 Conference on Parallel Processing, pages 245--254, August 1988.
....execute a small number of instructions, as in Jacobi iteration. Hence, general-purpose threads packages are most useful for providing coarse-grain parallelism or for structuring a large concurrent system. A few threads packages support efficient fine-grain parallelism, e.g., the Uniform System [TC88], Shared Filaments [EAL93], and Chores [EZ93]. The first two restrict the generality of the threads model. In particular, the Uniform System uses task generators to provide parallelism, much in the way a parallelizing compiler works, and in Shared Filaments a thread (filament) cannot block. On the ....
Robert H. Thomas and Will Crowther. The Uniform system: an approach to runtime support for large scale shared memory parallel processors. In 1988 Conference on Parallel Processing, pages 245--254, August 1988.
....bag-of-tasks model, work is divided up into reasonably sized pieces [of the problem] and placed in a central repository. Each process removes a piece from the bag, processes it, and possibly updates shared information with the result. Examples of this model of parallelism are the Uniform System [Thomas and Crowther, 1988] and the Problem Heap Paradigm [Cok, 1991; Moller-Nielsen and Staunstrup, 1987]. This paradigm provides tremendous flexibility, since the runtime can choose to run any number of processes to work on this technique. An example from the shepherding application is a parallel planner we designed. This ....
R.H. Thomas and W. Crowther. The uniform system: An approach to runtime support for large scale shared memory parallel processors. In Proceedings of the 1988 International Conference on Parallel Processing, pages 245--254, St. Charles, IL, August 1988.
....body of each filament rather than making a procedure call. In particular, when processing a pool, a server thread executes a loop, the body of which is the code specified by filaments in the pool. This eliminates a function call for each filament, but the server thread still has to traverse the list of filament descriptors and load the .... [Footnote 1: Systems such as Chores [EZ93] and the Uniform System [TC88] have a fine-grain specification and a coarse-grain execution model, but use preprocessor support. Filaments generates different codes at compile time, but chooses among them at run time.]
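The contrast drawn in the Filaments snippets, between a server thread that traverses a pool of filament descriptors and one that generates each filament's arguments directly once it recognizes a regular pattern, can be sketched in Python. This is only an analogy under stated assumptions: a descriptor is reduced to its (i, j) arguments, the function names are hypothetical, and Python cannot show the register-level savings or the inlining of the filament body that the text describes. The sketch shows only that both paths run the same filaments in the same order, using the 3 x 5 pattern (i: 0 to 2; j: 0 to 4) from the Filaments figure.

```python
from typing import Callable, List, Tuple

# Hypothetical filament descriptor: reduced here to just its arguments (i, j).
Filament = Tuple[int, int]


def run_pool_descriptors(pool: List[Filament], body: Callable[[int, int], None]) -> None:
    """Generic path: the server loops over the pool, loading each descriptor."""
    for i, j in pool:            # one descriptor read per filament
        body(i, j)


def run_pool_pattern(ni: int, nj: int, body: Callable[[int, int], None]) -> None:
    """Pattern path: if the pool is known to hold every (i, j) in a rectangular
    range, the server generates the arguments instead of reading descriptors."""
    for i in range(ni):
        for j in range(nj):
            body(i, j)
```

For example, a pool built as `[(i, j) for i in range(3) for j in range(5)]` holds 15 descriptors, and `run_pool_pattern(3, 5, body)` visits exactly the same (i, j) pairs without storing or reading them.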
Robert H. Thomas and Will Crowther. The Uniform system: an approach to runtime support for large scale shared memory parallel processors. In 1988 Conference on Parallel Processing, pages 245--254, August 1988.
....is not yet clear to us whether all uses of address overloading should be considered anachronisms. We adopted a single translation for in-core code and data in Psyche, but were not entirely happy with the result. We were forced, for example, to change the semantics of BBN's Uniform System library [58] in order to port it to Psyche.² We fear that other programs and programming environments (e.g., Lisp interpreters that use address bits for tags) may also insist on the ability to overload addresses. It is tempting to argue that all such programs are poorly designed, but we are hesitant to do ....
R. H. Thomas and W. Crowther, "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors," Proceedings of the 1988 International Conference on Parallel Processing, V. II - Software, 15-19 August 1988, pp. 245-254.
....The following three sections illustrate the advantages of the Psyche approach through a series of examples. 4. Lightweight Processes in a Shared Address Space Programming models in which a single address space is shared by many lightweight processes, such as Presto [6] and the Uniform System [28], can be implemented easily and efficiently on Psyche. A lightweight process scheduler can be implemented as a library package that is linked into an application, creating a single realm and protection domain whose virtual processors share both the scheduling code and a central ready list. Virtual ....
R. H. Thomas and W. Crowther, "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors," Proceedings of the 1988 International Conference on Parallel Processing, V. II - Software, 15-19 August 1988, pp. 245-254.
....in user space. To ensure the integrity of scheduling within the thread package, the kernel provides software interrupts at every point where a scheduler action might be required. In our experiments with Psyche, we have successfully ported or implemented Multilisp futures [11], Uniform System tasks [25], Lynx threads [22], heavyweight single-threaded programs, and two different thread libraries. Performance. As in all user-level thread packages, the ability to create, destroy, schedule, and synchronize threads without the assistance of the kernel keeps the cost of these operations low. Shared ....
....performance degradation increases to 57%. Similar effects can occur in programs with condition synchronization. One of the most common models of parallel programming employs a collection of worker processes, one per processor, which repeatedly dequeue and execute tasks from a central work queue [11, 25, 26]. One of the things that a task may do is generate more tasks. It will often do so only if it is the last task of a certain kind to finish. Central work queue programs can thus be considered a generalization of barriers; parallel execution continues as long as the queue remains non-empty, and ....
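The worker-process model in this snippet, where tasks dequeued from a central work queue may themselves generate more tasks, can be sketched in Python. This is a simplified, hypothetical version: any task may spawn children (not only the last task of a kind), Python threads stand in for worker processes, and `run_work_queue` and `make_task` are illustrative names. It relies on the standard library's `queue.Queue.task_done()` and `queue.Queue.join()` to detect the point at which the queue is empty and no task is still running.

```python
import queue
import threading


def run_work_queue(initial_tasks, num_workers):
    """Sketch (hypothetical API): workers repeatedly dequeue and execute tasks.
    A task returns (name, children); children are enqueued as new work.
    Execution ends when the queue is empty and no task is in progress."""
    q = queue.Queue()
    for task in initial_tasks:
        q.put(task)
    executed = []
    lock = threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:              # sentinel: shut this worker down
                q.task_done()
                return
            try:
                name, children = task()
                with lock:
                    executed.append(name)
                for child in children:    # enqueue children before marking the
                    q.put(child)          # parent done, so join() cannot return early
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    q.join()                              # blocks until every queued task has finished
    for _ in threads:
        q.put(None)
    for th in threads:
        th.join()
    return executed


def make_task(depth):
    """Hypothetical task that spawns two children until depth reaches 0."""
    def task():
        children = [make_task(depth - 1), make_task(depth - 1)] if depth > 0 else []
        return depth, children
    return task
```

Starting from `make_task(3)`, the workers execute 1 + 2 + 4 + 8 = 15 tasks. As the snippet notes, this behaves like a generalized barrier: the call to `q.join()` returns only after the last outstanding task, including any it generated, has completed.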
R. H. Thomas and W. Crowther, "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors," Proceedings of the 1988 International Conference on Parallel Processing, August 1988, pp. 245-254.
....scheduling is concerned with many of the same issues involved in loop scheduling, including concerns about load imbalance, synchronization overhead, and communication overhead. In many shared memory multiprocessor systems, a single ready queue is the primary mechanism for process scheduling [30, 29, 10, 3]. The attractive load balancing properties of a central ready queue address an important concern in these systems that a processor not remain idle while there is work to be performed. Recent work [1] has shown that a central ready queue can become a bottleneck in these systems, and that local ....
R.H. Thomas and W. Crowther. The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors. In Proceedings of the 1988 International Conference on Parallel Processing, pages 245--254, August 1988.
No context found.
Robert H. Thomas and Will Crowther. The Uniform System: An approach to runtime support for large scale shared memory parallel processors. In Proceedings of the 1988.
No context found.
R.H. Thomas and W. Crowther. "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors". In Proceedings of the 1988.
No context found.
R.H. Thomas and W. Crowther. "The Uniform System: An Approach to Runtime Support for Large Scale Shared Memory Parallel Processors". In Proceedings of the 1988.
No context found.
R. H. Thomas and W. Crowther. The Uniform system: An approach to runtime support for large scale shared memory parallel processors. In International Conference on Parallel Processing, volume 2, pages 245--254, August 1988.
CiteSeer.IST - Copyright Penn State and NEC