G. J. Narlikar. Scheduling threads for low space requirement and good locality. Theory of Computing Systems, 35(2):151--187, 2002.

This paper is cited in the following contexts:
The Data Locality of Work Stealing - Acar, Blelloch, Blumofe (2000)   (2 citations)

....achieve a provably good data locality [7] when executed with the work stealing algorithm on a dag consistent distributed shared memory systems. In recent work, Narlikar showed that work stealing improves the performance of space efficient multithreaded applications by increasing the data locality [29]. None of this previous work, however, has studied upper or lower bounds on the data locality of multithreaded computations executed on existing hardware controlled shared memory systems. In this paper, we present theoretical and experimental results on the data locality of work stealing on ....

Girija J. Narlikar. Scheduling threads for low space requirement and good locality. In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), June 1999.


Low-Contention Depth-First Scheduling of Parallel Computations.. - Fatourou (2001)   (1 citation)

....improving upon a previous depth-first scheduling algorithm [6] published in SPAA 97. Moreover, it is provably efficient for the general class of multithreaded computations with write-once synchronization variables (as studied in [6]), improving upon algorithm DFDeques (published in SPAA 99 [24]), which is only for the more restricted class of nested parallel computations. More specifically, consider such a computation with work T1, depth T∞ and σ synchronizations, and suppose that space S1 suffices to execute the computation on a single-processor computer. Then, on a P-processor ....

....number IST 1999 14186 (ALCOM FT) dynamic, unstructured parallelism. During the execution of a multithreaded computation, a thread may spawn child threads which can be executed in parallel, and it can synchronize with other currently executing threads. In most of the work in the literature [1, 4, 5, 6, 7, 9, 15, 16, 24, 25, 26, 27], a multithreaded computation is modeled as a directed acyclic graph (see Figure 1(a)). Of much concern is how a multithreaded computation can be executed efficiently on a parallel computer. A parallel execution of a multithreaded computation specifies which processor executes each thread and ....

[Article contains additional citation context not shown here]

G. Narlikar. Scheduling threads for low space requirement and good locality. In Proceedings of the 11th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 83--95. Saint-Malo, France, June 1999.


Achieving High Performance for Parallel Programs that Contain.. - Oyama (2000)

....model for a given program. In particular, tools for measuring the critical path of a given program are essential. Determining the critical path length is a well known and extremely effective way of understanding the performance of parallel programs [BL94, BJK 95, BJK 96, BGM95, NB97, Nar99]. 1.4 Contributions. The principal contributions of our work are as follows. • We propose a technique for achieving efficient execution in bottlenecks. It reduces the number of mutual exclusion operations that accompany bottleneck modules and enhances the cache efficiency in the updating of ....

Girija J. Narlikar. Scheduling Threads for Low Space Requirement and Good Locality. In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '99), pages 83--95, Saint Malo, France, June 1999.


Efficient Scheduling of Strict Multithreaded Computations - Fatourou, Spirakis (1999)

....27, 30, 35] have used work stealing to achieve the scheduling goals described above and to provide high performance. Work stealing is a technique in which underutilized processors try to steal work from other, hopefully overutilized processors. Indeed, work stealing has been proved (see e.g. [1, 8, 12, 41]) to achieve a fair combination of the above objectives and to be quite efficient in terms of these performance parameters. Several programming models have been proposed for parallel computing. The model of parallelism supported by a language determines the style in which threads may be created or ....

....of fully strict multithreaded computations, where, additionally, each thread must synchronize with its parent before termination, and they have presented in [12] the first provably good, fully distributed, work stealing scheduler for such computations. Blelloch et al. [6, 7] as well as Narlikar [41, 42] (and Narlikar et al. [44]) have concentrated on the problem of efficiently managing the storage requirements of parallel computations. They have proved that for particular nested parallel computations a better bound than the one provided in [12] on the space complexity can be achieved. However, ....

[Article contains additional citation context not shown here]

G. Narlikar, "Scheduling Threads for Low Space Requirement and Good Locality," Proceedings of the 11th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 83-95, Saint-Malo, France, June 1999.


A Parallel, Multithreaded Decision Tree Builder - Narlikar (1998)   (1 citation)  Self-citation (Narlikar)

....32, 31, 26]. Increased memory requirements limit the size of the largest dataset that can be used to construct the tree without paging to disk. In general, the running time can be affected even before the physical memory is exhausted [26]. Therefore, we use a provably efficient thread scheduler [27] that provides good space and time performance for the multithreaded decision tree builder. In effect, this thread scheduler partitions the work involved with the upper part of the tree across all the processors. However, when the working set size decreases below a user defined threshold (e.g. ....

....scheduler could potentially be extended to other decision tree building techniques. The decision tree building program uses a lightweight, user level implementation of Posix standard threads or Pthreads [17]; the Pthreads implementation has been modified to use the space efficient scheduler [27]. The results of executing the program on an 8 processor SMP indicate that the implementation scales well with both the number of processors and the number of data instances (records). For example, for an input dataset with 1.6M instances on 8 processors, we obtain a speedup of 6.32 with respect ....

[Article contains additional citation context not shown here]

G.J. Narlikar. Scheduling threads for low space requirement and good locality, October 1998. Submitted for publication.


Scheduling Threads for Low Space Requirement and Good Locality - Girija Narlikar (1999)   (6 citations)  Self-citation (Narlikar)

....from top to bottom. 2. A thread currently executing on a processor has higher priority than all other threads on the processor's deque. 3. The threads in any given deque have higher priorities than threads in all the deques to its right in R. The proof uses induction on the timesteps (see [34] for details). The base case is the start of the computation. We can show that the ordering is maintained when a deque is deleted, or when a thread (a) forks a child thread, (b) terminates, (c) is preempted, or (d) is stolen by a processor. Work stealing as a special case of algorithm DFDeques. ....

.... Θ(log m). 4.2 Space bound. We now analyze the space bound for a parallel computation executed by algorithm DFDeques. The analysis uses several ideas from previous work [2, 6, 36]. Due to space limitations, we only present the outline of the proofs; detailed analysis can be found elsewhere [34]. Let G be the dag that represents the parallel computation being executed. Depending on the resulting parallel schedule, we classify its nodes (actions) into one of two types: heavy and light. Every time a processor performs a steal, the first node it executes from the stolen thread is called a ....

[Article contains additional citation context not shown here]

G. J. Narlikar. Scheduling threads for low space requirement and good locality. Technical Report CMU-CS-99-121, Computer Science Department, Carnegie Mellon University, 1999.


Effectively Sharing a Cache Among Threads - Guy Blelloch Carnegie

No context found.

G. J. Narlikar. Scheduling threads for low space requirement and good locality. Theory of Computing Systems, 35(2):151--187, 2002.
