36 citations found.
FW. Burton and MR. Sleep. Executing Functional Programs on a Virtual Tree of Processors. In FPCA '82, Int. Conference on Functional Programming Languages and Computer Architecture, pages 187 -- 194, Portsmouth, New Hampshire, October 1982.

This paper is cited in the following contexts:

Multithreaded Constraint Programming and Applications - Zabatta   (Correct)

....anomalies [LS,84] which can adversely affect load balance and speedup. Thus, another option is to divide the search space dynamically so that a thread with no work can obtain work from a busy thread. This is often referred to as dynamic load balancing. Several dynamic load balancing schemes exist [BS,81] [WLY,85] [RK,87] [PB,90] [MD,92]. Some dynamic load balancing schemes are receiver initiated, i.e. when a processor runs out of work, it makes a request to another processor for work. Other schemes use a farming model, where one processor is used to manage the search space while other ....

Burton, F.W.; Sleep, M.R.: Executing functional programs on a virtual tree of processors, Proceedings of the ACM Conference on Functional Programming Languages and Computer Architectures, 1981.


Multiprocessor Support for Event-Driven Programs - Zeldovich, Yip, Dabek.. (2003)   (2 citations)  (Correct)

....threads. This simple rule distributes callbacks approximately evenly among the worker threads. It also preserves the order of activation of callbacks with the same color and may improve cache locality. If a worker thread's task queue is empty, it attempts to steal work from another thread's queue [9]. Work must be stolen at the granularity of all callbacks of the same color, and the color to be stolen must not be executing currently, to preserve guarantees on ordering of callbacks within the same color. libasync-smp consults a per-thread field containing the currently running color to guarantee ....

BURTON, F., AND SLEEP, M. Executing functional programs on a virtual tree of processors. In Proceedings of the 1981.
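
The stealing rule quoted in the excerpt above (steal every callback of one color, and never the color that is currently running) can be sketched as follows. This is a minimal illustration, not the libasync-smp API: the function name `steal_color`, the `(color, callback)` tuple layout, and the `running_color` parameter are all assumptions made for the example.

```python
from collections import deque


def steal_color(victim_queue, running_color):
    """Remove and return all callbacks of one stealable color, in queue order.

    victim_queue holds (color, callback) pairs (hypothetical layout).
    The color currently running on the victim is never stolen, so callbacks
    of the same color keep their relative order.
    """
    # Pick the first queued color that is not currently executing.
    for color, _ in victim_queue:
        if color != running_color:
            target = color
            break
    else:
        return []  # empty queue, or only the running color is queued

    # Take every callback of that color at once; leave the rest untouched.
    stolen = [item for item in victim_queue if item[0] == target]
    remaining = [item for item in victim_queue if item[0] != target]
    victim_queue.clear()
    victim_queue.extend(remaining)
    return stolen


victim = deque([(1, "a"), (2, "b"), (1, "c"), (2, "d"), (3, "e")])
stolen = steal_color(victim, running_color=1)
print(stolen)        # callbacks of color 2, in their original order
print(list(victim))  # colors 1 and 3 stay with the victim
```

Stealing the whole color in one step is what keeps two callbacks of the same color from ever running on different threads at the same time.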


Executing Multithreaded Programs Efficiently - Blumofe (1995)   (12 citations)  (Correct)

....but until our results, studies of work stealing have been based on heuristic notions and the algorithmic work has focused on particularly simple types of computations, such as the backtrack search already discussed. The work stealing idea dates back at least as far as Burton and Sleep's research [20] on parallel execution of functional programs and Halstead's implementation of Multilisp [51, 52]. These researchers observed that heuristically, by having each processor work as if it is the only one (i.e. in serial, depth first order) and having idle processors steal threads from others, ....

....one heuristic and one algorithmic. First, to lower communication costs, we would like to steal large amounts of work, and in a tree structured computation, shallow threads are likely to spawn more work than deep ones. This heuristic notion is the justification cited by earlier researchers [20, 42, 52, 77, 103] who proposed stealing work that is shallow in the spawn tree. We cannot, however, prove that shallow threads are more likely to spawn work than deep ones. What we prove in Section 5.3 is the following algorithmic property. The threads that are on the critical path in the dag, are always at the ....

F. Warren Burton and M. Ronan Sleep. Executing functional programs on a virtual tree of processors. In Proceedings of the 1981.


Balancing Load When Service Times Are Heavy-Tailed - Carroll (2000)   (Correct)

....most log log n / log d + O(1) balls in the fullest bin with high probability. Previously, Gonnet [13] had shown that purely random placement resulted in a maximum of approximately log n / log log n balls per bin with high probability. Work stealing has been around at least since Burton and Sleep [6] and Halstead [14] used it to schedule functional programs on multiprocessors. More recently, Blumofe and Leiserson have adapted it for the Cilk multithreaded runtime environment ([4] and [5]). Mitzenmacher [27] presents an extension of his work in [29] that develops tools to study the behavior ....

F. Burton and R. Sleep. Executing functional programs on a virtual tree of processors. In Proceedings of the


Dynamic Thread Creation: An Asynchronous Load Balancing Scheme .. - Zabatta, Ying (1998)   (Correct)

....to the number of processors being used. However, many factors that stem from work division can adversely affect the speedup of a parallel search. This has created a demand for load balancing which often shifts the focus away from solving the actual problem to resource allocation. Many researchers [4] [5] [6] [7] have developed schemes for resource allocation with an emphasis on obtaining a theoretical speedup. Although good results have been achieved, the schemes are either message based or require a management component. This further complicates the programming model and makes the task of ....

....anomalies [10] can lead to load imbalances and consequently a poor speedup. Another option is to divide the search space dynamically so that a thread with no work can obtain work from a busy thread. This is often referred to as dynamic load balancing. Several dynamic load balancing schemes exist [4] [5] [6] [11], for example receiver initiated schemes like Nearest Neighbor, Global Round Robin and Random Polling. Most of these schemes were originally designed for distributed systems and are message based. The schemes have been successfully ported to tightly coupled systems by using shared memory ....

F.W. Burton and M.R. Sleep, Executing functional programs on a virtual tree of processors, Conference on Functional Programming Languages and Computer Architectures, 1981


The Data Locality of Work Stealing - Acar, Blelloch, Blumofe (2000)   (2 citations)  (Correct)

....very difficult when the program has complicated data access patterns. Perhaps the earliest class of techniques was to attempt to execute threads that are close in the computation graph on the same processor [1, 9, 20, 23, 26, 28]. The work stealing algorithm is the most studied of these techniques [9, 11, 19, 20, 24, 36, 37]. Blumofe et al. showed that fully strict computations achieve a provably good data locality [7] when executed with the work stealing algorithm on dag consistent distributed shared memory systems. In recent work, Narlikar showed that work stealing improves the performance of space efficient ....

....that do not have any RW or WW sharing as race free computations. In this paper we consider only race free computations. The work stealing algorithm is a thread scheduling algorithm for multithreaded computations. The idea of work stealing dates back to the research of Burton and Sleep [11] and has been studied extensively since then [2, 9, 19, 20, 24, 36, 37]. In the work stealing algorithm, each process maintains a pool of ready threads and obtains work from its pool. When a process spawns a new thread the process adds the thread into its pool. When a process runs out of work and ....

F. Warren Burton and M. Ronan Sleep. Executing functional programs on a virtual tree of processors. In Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture, pages 187--194, Portsmouth, New Hampshire, October 1981.


Scheduling Threads for Low Space Requirement and Good Locality - Narlikar (1999)   (6 citations)  (Correct)

....thread creation and scheduling are typically local operations, they incur low overhead and contention. Further, threads close together in the computation graph are often scheduled on the same processor, resulting in good locality. Several systems have used work stealing to provide high performance [11, 17, 18, 20, 26, 39, 42, 44]. When each processor treats its own ready queue as a LIFO stack (that is, adds or removes threads from the top of the stack) and steals from the bottom of another processor's stack, the scheduler successfully throttles the excess parallelism [8, 39, 41, 44]. For fully strict computations, such a ....

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Proc. ACM Conf. on Functional Programming Languages and Computer Architecture, pages 187--194, 1981.
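
The queue discipline described in the excerpt above (each processor pushes and pops at the top of its own ready queue, while a thief takes from the bottom of another processor's queue) can be sketched with a double-ended queue. This is a single-threaded illustration under stated assumptions, not any cited system's implementation: the `Worker` class and its method names are hypothetical, and a real scheduler would also need synchronization between owner and thief.

```python
from collections import deque


class Worker:
    """One processor with its own ready queue (hypothetical name)."""

    def __init__(self, name):
        self.name = name
        # Right end of the deque is the top of the stack, left end the bottom.
        self.ready = deque()

    def spawn(self, task):
        """Owner pushes newly created work on top of its own stack."""
        self.ready.append(task)

    def next_local(self):
        """Owner pops from the top (LIFO), so it runs the newest work first."""
        return self.ready.pop() if self.ready else None

    def steal_from(self, victim):
        """An idle worker takes the oldest task from the bottom of the victim's stack."""
        return victim.ready.popleft() if victim.ready else None


busy, idle = Worker("busy"), Worker("idle")
for depth in range(4):
    busy.spawn(f"task-depth-{depth}")  # later spawns sit deeper in the spawn tree

local = busy.next_local()      # newest (deepest) task
stolen = idle.steal_from(busy)  # oldest (shallowest) task
print(local, stolen)
```

In this sketch the owner keeps working on the most recently spawned task, while the thief receives the oldest one, which matches the excerpts' remark that shallow work is the preferred target for stealing.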


A Cost Analysis for a Higher-order Parallel Programming Model - Rangaswami (1996)   (19 citations)  (Correct)

....Speed ups have been obtained on example programs, as compared to their corresponding sequential implementations on the G machine. • Other Systems: There are several other parallel graph reduction machines that have been built. These include ZAPP (Zero Assignment Parallel Processor) [BS81], MaRS (Machine à Réduction Symbolique) [C 89], HDG (Highly Distributed Graph) Machine [LKB91], etc. Some functional language implementations employ parallel graph reduction techniques. Two such systems are Alfalfa and Buckwheat [Gol89, Gol88] which are ....

F W Burton and M R Sleep. Executing functional programs on a virtual tree of processors. In ACM Conference on Functional Programming Languages and Computer Architecture, pages 187--194. ACM, 1981.


Efficient Scheduling of Strict Multithreaded Computations - Fatourou, Spirakis (1999)   (Correct)

....Designing a scheduler to achieve all of the above goals is not a trivial task. There are three main performance parameters for scheduling algorithms for multithreaded computations, namely their space complexity, their execution time and the communication cost incurred by them. Several systems [16, 23, 27, 30, 35] have used work stealing to achieve the scheduling goals described above and to provide high performance. Work stealing is a technique in which underutilized processors try to steal work from other, hopefully overutilized processors. Indeed, work stealing has been proved (see e.g. [1, 8, 12, ....

....been proposed for parallel computing. The model of parallelism supported by a language determines the style in which threads may be created or synchronized in the language. Imposing no restrictions on the programming model leads to very complicated computations, which are difficult (or impossible [13, 16]) to be scheduled efficiently. On the other hand, such restrictions are undesirable, since they decrease the flexibility and expressiveness of parallel programming languages. Narlikar [42] describes five models of parallelism in increasing order of flexibility and expressiveness. In the flat model ....

[Article contains additional citation context not shown here]

F. W. Burton and M. R. Sleep, "Executing functional programs on a virtual tree of processors," Proceedings of the Conference on Functional Programming Languages and Computer Architecture, pp. 187--194, Portsmouth, New Hampshire, October 1981.


A New Scheduling Algorithm for General Strict Multithreaded .. - Fatourou, Spirakis (1999)   (1 citation)  (Correct)

....overutilized processors try to migrate some threads to other (hopefully underutilized) processors. On the contrary, in the work stealing paradigm, underutilized processors steal work from other processors. The work stealing paradigm dates back at least as far as Burton and Sleep's research [11] on parallel execution of functional programs and Halstead's implementation of Multilisp [18]. Since then a lot of work has been done in this direction (see e.g. [1, 4, 5, 6, 7, 8, 15]). Three significant performance parameters of any scheduling algorithm for multithreaded computations are the ....

F. W. Burton and M. R. Sleep, "Executing functional programs on a virtual tree of processors," Proceedings of the Conference on Functional Programming Languages and Computer Architecture, pp. 187--194, Portsmouth, New Hampshire, October 1981.


Space-Efficient Scheduling of Multithreaded Computations - Blumofe, Leiserson (1993)   (30 citations)  (Correct)

....foundation that manages storage by allowing programmers to leverage their knowledge of storage requirements for serially executed programs. Other researchers have also addressed the storage issue by attempting to relate parallel storage requirements to serial storage requirements. Burton and Sleep [12] and Halstead [22], for example, considered unfair scheduling policies based on thread stealing. In these thread stealing strategies, each processor works depth first just like a serial execution, but when a processor runs out of ready threads, it steals threads from other processors. In many ....

F. W. Burton and M. R. Sleep, Executing functional programs on a virtual tree of processors, in Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture, Portsmouth, New Hampshire, Oct. 1981, pp. 187--194.


The Performance of Work Stealing in Multiprogrammed.. - Blumofe, Papadopoulos (1998)   (5 citations)  (Correct)

....mechanisms are not yet available in any commercial operating system of which we are aware. Finally, we point out that our use of work stealing and non-blocking synchronization builds upon a long history in both areas, though they did not meet until now. The idea of work stealing goes back to 1981 [16] and has been used in many systems and applications since [20, 21, 27, 42, 46]. The first provably efficient work stealing algorithm [15] and implementation [14] is fairly recent, however. The idea of non-blocking and wait-free synchronization was developed by Herlihy [29]. There has been a long ....

F. Warren Burton and M. Ronan Sleep. Executing functional programs on a virtual tree of processors. In Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture, pages 187--194, Portsmouth, New Hampshire, October 1981.


Space-Efficient Scheduling of Parallelism with.. - Blelloch, Gibbons, .. (1997)   (13 citations)  (Correct)

....of languages with dynamic parallelism, and by the fact that parallel computations are often memory limited. A poor schedule can require exponentially more space than a good schedule [11]. Early solutions to the space problem considered various heuristics to reduce the number of active threads [10, 23, 34, 16, 26]. More recent work has considered provable bounds on space usage. The idea is to relate the space required by the parallel execution to the space s1 required by the sequential execution. Burton [11] first showed that for a certain class of computations the space required by a parallel ....

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Proc. Conf. on Functional Programming Languages and Computer Architecture, pages 187--194, October 1981.


Une Taxonomie Des Algorithmes D'allocation Dynamique De Processus.. - Talbi   (Correct)

....It should be noted that there are dynamic allocation algorithms that were designed for particular languages or systems: • data-parallel languages [JBV90] [Fon94] • logic and dataflow languages [RS87] • functional languages [LKID89] [BS82] [HG84] [KW91] [Mel97] • object-oriented languages [AH88] [Ath87] • real-time systems [SRC85] [ZRS87] • distributed database management systems [Tho87]. 4 Process migration in distributed systems. The process migration mechanism has been ....

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Proc. Conf. Functional Programming Languages and Computer Architecture, Portsmouth, USA, pages 187--194, Oct 1982.


Hood: A User-Level Thread Library for Multiprogramming.. - Papadopoulos (1998)   (Correct)

....mechanisms are not yet available in any commercial operating system of which we are aware. Finally, we point out that our use of work stealing and non-blocking synchronization builds upon a long history in both areas, though they did not meet until now. The idea of work stealing goes back to 1981 [16] and has been used in many systems and applications since [21, 22, 28, 44, 50]. The first provably efficient work stealing algorithm [15] and implementation [14] is fairly recent, however. The idea of non-blocking and wait-free synchronization was developed by Herlihy [30]. There has been a long ....

F. Warren Burton and M. Ronan Sleep. Executing functional programs on a virtual tree of processors. In Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture, pages 187--194, Portsmouth, New Hampshire, October 1981.


A Framework for Space and Time Efficient Scheduling of.. - Narlikar, Blelloch (1996)   (Correct)

....are typically used to run big problem sizes, reducing memory usage is often as important as reducing running time. Many researchers have addressed this problem in the past. Early attempts to reduce the memory usage of parallel computations were based on heuristics that limited the parallelism [10, 15, 28, 32], and are not guaranteed to be space efficient in general. These were followed by scheduling techniques that provide proven space bounds for parallel programs [6, 8, 9]. If S1 is the space required by the serial execution, these techniques generate executions for a multithreaded computation on p ....

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Conference on Functional Programming Languages and Computer Architecture, October 1981.


Space-Efficient Implementation of Nested Parallelism - Narlikar, Blelloch (1996)   (8 citations)  (Correct)

....are typically used to run big problem sizes, reducing memory usage is often as important as reducing running time. Many researchers have addressed this problem in the past. Early attempts to reduce the memory usage of parallel computations were based on heuristics that limited the parallelism [10, 16, 32, 36], and are not guaranteed to be space efficient in general. These were followed by scheduling techniques that provide proven space bounds for parallel programs [6, 7, 8, 9]. If S1 is the space required by the serial execution, these techniques generate schedules for a multithreaded computation on p ....

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Conference on Functional Programming Languages and Computer Architecture, October 1981.


Space-Efficient Scheduling of Nested Parallelism - Narlikar, Blelloch (1999)   (4 citations)  (Correct)

....are typically used to run big problem sizes, reducing memory usage is often as important as reducing running time. Many researchers have addressed this problem in the past. Early attempts to reduce the memory usage of parallel computations were based on heuristics that limited the parallelism [Burton and Sleep 1981; Culler and Arvind 1988; Halstead 1985; Rugguero and Sargeant 1987] and are not guaranteed to be space efficient in general. These were followed by scheduling techniques that provide proven space bounds for parallel programs [Blumofe and Leiserson 1993; 1994; Burton 1988; Burton and Simpson ....

Burton, F. W. and Sleep, M. R. 1981. Executing functional programs on a virtual tree of processors. In Conference on Functional Programming Languages and Computer Architecture.


Scheduling Threads for Low Space Requirement and Good Locality - Girija Narlikar (1999)   (6 citations)  (Correct)

....thread creation and scheduling are typically local operations, they incur low overhead and contention. Further, threads close together in the computation graph are often scheduled on the same processor, resulting in good locality. Several systems have used work stealing to provide high performance [12, 18, 19, 21, 27, 41, 43, 45]. When each processor treats its own ready queue as a LIFO stack (that is, adds or removes threads from the top of the stack) and steals from the bottom of another processor's stack, the scheduler successfully throttles the excess parallelism [8, 11, 41, 45]. For fully strict computations, such a ....

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Proc. ACM Conf. on Functional Programming Languages and Computer Architecture, pages 187--194, 1981.


Provably Efficient Scheduling for Languages with Fine-Grained.. - Guy Blelloch (1995)   (28 citations)  (Correct)

....multiplication would require Θ(n^3) space, whereas a sequential computation requires only Θ(n^2) space. See Figure 1. In order to obtain the same bounds for a parallel implementation, heuristic techniques that limit the amount of parallelism in the implementation have been used [BS81, Hal85, RS87, CA88], but these are not guaranteed to be space efficient in general. In this paper we are interested in specifying universal implementations that guarantee performance bounds, both in terms of time and space. These are specified by placing upper bounds on the running time and the ....

F. W. Burton and M. R. Sleep. Executing functional programs on a virtual tree of processors. In Proc. Conf. on Functional Programming Languages and Computer Architecture, pages 187--194, October 1981.


A Simple Implementation of Divide and Conquer Parallelism.. - Roe   (Correct)

....designing, writing and porting parallel programs is difficult. Data parallel programming is widely recognised as the simplest paradigm for parallel programming. Divide and conquer (D C) parallelism represents a particularly expressive form of data parallelism, which has been advocated by many [2, 3, 7 9]. This is more powerful than the simple data parallelism offered by Fortran 90 and C ; in particular it supports recursive parallel computation over arrays as advocated by Mou [8, 9] This enables basic operations such as scan and aggregation to be expressed. It also enables many other parallel ....

....not the easiest or most natural method in which to express parallel programs. In particular data parallelism, where applicable, is a much simpler programming paradigm. Divide and conquer (D&C) parallelism has been advocated by many, and represents a particularly powerful form of data parallelism [2, 3, 7-9]. Essentially data parallelism involves performing the same operation, in parallel, over a collection of data items. For a multicomputer this means distributing data items over processors, and running the same program on each processor, over local data items (SPMD parallelism). Consider the scan data ....

[Article contains additional citation context not shown here]

F W Burton and M R Sleep. Executing functional programs on a virtual tree of processors. In Conference on Functional Programming Languages and Computer Architecture, pages 187--194, Portsmouth, New Hampshire, October 1982.


GRAPH for PVM: Graph Reduction for Distributed Hardware.. - Kevin Hammond Department (1994)   (Correct)

No context found.

FW. Burton and MR. Sleep. Executing Functional Programs on a Virtual Tree of Processors. In FPCA '82, Int. Conference on Functional Programming Languages and Computer Architecture, pages 187 -- 194, Portsmouth, New Hampshire, October 1982.


Load Balancing in a Parallel Graph Reducer - Loidl (2002)   (Correct)

No context found.

F.W. Burton and M.R. Sleep. Executing Functional Programs on a Virtual Tree of Processors. In FPCA'81 --- Conf. on Functional Programming Languages and Computer Architecture, pages 187--194, Portsmouth, USA, 1981.


The HDG-Machine: A Highly Distributed Graph-Reducer for a.. - Kingdon (1991)   (22 citations)  (Correct)

No context found.

F.W. Burton and M.R. Sleep. Executing functional programs on a virtual tree of processors. In Proceedings of the First Conference on Functional Programming and Computer Architecture, pages 187--194, Portsmouth, New Hampshire, October 1982.

CiteSeer.IST - Copyright Penn State and NEC