15 citations found.
Chatterjee Siddhartha. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, October 1991.

This paper is cited in the following contexts:
P³T+: A Performance Estimator for Distributed and Parallel.. - Pozgaj, Fahringer (2000)

....The key functionality of P³T is devoted to compute a set of performance parameters at compile time: • Work Distribution: The work distribution parameter describes how well the computations of a program are distributed over the set of available processors. As shown by numerous researchers [15, 13, 80, 68, 49, 62, 22, 67, 53, 34], work distribution has a strong influence on the cost performance ratio of a multiprocessor system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore, providing both programmer and compiler with a work distribution ....

Chatterjee Siddhartha. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, October 1991.


On Estimating the Useful Work Distribution of Parallel Programs.. - Fahringer (1996)   (4 citations)

....instantiations for which they own the corresponding sub domain. This naturally specifies the amount of work to be done by each processor and consequently the overall work distribution of a parallel program. Therefore domain decomposition inherently implies a work distribution. It is well known ([8, 5, 3, 28, 26, 18, 22, 6, 25, 19, 13]) that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore providing both programmer and parallelizing compiler with a work distribution parameter ....

Chatterjee Siddhartha. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, October 1991.


Modeling the Benefits of Mixed Data and Task Parallelism - Chakrabarti, Demmel, Yelick (1995)   (24 citations)

....to take advantage of this mixed parallelism. In the theory area, the best known on-line scheduling algorithm for mixed parallelism is 2.62-optimal [4, 11] and the best off-line algorithm is 2-optimal [25, 17]. In the systems area, the Paradigm compiler [20], iWarp compiler [24], and NESL compiler [6] all support limited forms of mixed task and data parallelism, and there are plans to merge data Fortran D with Fortran M [12] and pC++ with CC++ [15] to support mixed parallelism. In this paper, we step back from these algorithmic and systems issues and address the question of how much benefit ....

S. Chatterjee. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. Technical Report CMU-CS-91-189, CMU, Pittsburgh, PA 15213, October 1991.


On Using Volume Computation to Estimate the Work Distribution .. - Fahringer, Hong (1995)

....instantiations for which they own the corresponding sub domain. This naturally specifies the amount of work to be done by each processor and consequently the overall work distribution of a parallel program. Therefore domain decomposition inherently implies a work distribution. It is well known ([8, 5, 3, 28, 26, 17, 21, 6, 25, 18, 12]) that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore providing both programmer and parallelizing compiler with a work distribution parameter for ....

Chatterjee Siddhartha. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, October 1991.


An Efficient Implementation of Nested Data Parallelism for.. - Hardwick (1996)   (4 citations)

....choice of team sizes until run time, allowing for some approximate load balancing. Again, there is no dynamic load balancing, and no use of serial code. Previous work on improving the performance of NESL has involved the compilation of VCODE into multi threaded C for a shared memory multiprocessor [14]. The compiler used extensive symbolic loop analysis and program graph clustering to improve locality and reduce synchronization, but retained a flat, implicitly load balanced model. There are now several other nested data parallel languages besides NESL. Proteus is a high level ....

S. Chatterjee. Compiling Data-Parallel Programs for Efficient Execution on Shared-Memory Multiprocessors. PhD thesis, School of Computer Science, Carnegie Mellon University, Oct. 1991.


Efficient Resource Scheduling in Multiprocessors - Chakrabarti (1996)   (1 citation)

....In the theory area, the best known on-line scheduling algorithm for mixed parallelism is 2.62-optimal [16, 52] and the best off-line algorithm is 2-optimal [113, 87]. But these are worst case guarantees. In the systems area, the Paradigm compiler [95], iWarp compiler [110], and NESL compiler [32] all support forms of mixed task and data parallelism, and there are plans to merge data Fortran D with Fortran M [56] and pC++ with CC++ [84] to support mixed parallelism. The compiler efforts all depend on static task graph and profile data and perform extensive optimizations to allocate ....

S. Chatterjee. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. Technical Report CMU-CS-91-189, CMU, Pittsburgh, PA 15213, October 1991.


Efficient Parallel Algorithms for Closest Point Problems - Peter Su (1994)   (7 citations)

....of parallel architectures. Algorithms for closest point problems have a more dynamic and irregular flavor than the benchmark programs normally used to analyze machine performance. In addition, the algorithms are more sophisticated than some irregular benchmarks that are already in use [Cha91]. The dissertation does not address any of the following concerns: Full applications. The algorithms that I study fall into the class of kernel programs. Thus, my results will only reflect the performance of a small percentage of any large application. However, experience from numerical ....

.... have shown that algorithms using these primitives can be efficiently implemented on SIMD and vector architectures [CBZ90]. In addition, Chatterjee has shown that with suitable compilers, programs based on these primitives can be efficiently translated to run on shared memory MIMD machines [Cha91]. Similar work has also been done for languages that don't explicitly use vector models. Projects like Fortran D [HKT91], Dino [RSW90], Kali [KMR90], Crystal [CCL88], Proteus [MNP+91] .... [Footnote 1: That is, RAM algorithms that are both asymptotically efficient and simple enough to have low constant ....]

[Article contains additional citation context not shown here]

S. Chatterjee. Compiling Data-parallel Programs for Efficient Execution on Shared-Memory Multiprocessors. PhD thesis, School of Computer Science, CMU, 1991.


Segmented Operations for Sparse Matrix Computation on.. - Blelloch, Heroux, Zagha (1993)   (4 citations)

....pack operation can be used to create a new sparse matrix consisting of a subset of elements from another sparse matrix. A segmented copy operation can be used to distribute a different value to the elements of each row. These operations have been efficiently implemented for a variety of machines [6, 7, 10]. 5.4 CSC SEGMV and Symmetric Matrices Segmented vector operations can also be used to implement a column oriented version of sparse matrix multiplication. This could be used along with a row oriented version to process symmetric matrices directly, rather than expanding them into a full ....

S. Chatterjee. Compiling Data-Parallel Programs for Efficient Execution on Shared-Memory Multiprocessors. PhD thesis, School of Computer Science, Carnegie Mellon University, Oct. 1991.


PHANTOM: Parallelization of Hierarchical Applications usiNg.. - Goil (1996)

....Data Parallelism. Of course, it is possible to use a combination of these strategies for optimal scheduling, and such a strategy is referred to as Mixed Parallelism. Several researchers have worked on exploiting mixed parallelism, both in theory [BB90, FST92, LT94, TWY92] and in practice [CDY95, Cha91, RSB94, SSOG93]. In a number of problems, all the tasks may not be known in advance but may be generated dynamically as existing tasks are processed. This is the case with problems whose efficient solutions use the divide and conquer strategy. The execution of an instance of such a problem can be ....

S. Chatterjee. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. Technical Report CMU-CS-91-189, Carnegie Mellon University, 1991.


Concatenated Parallelism: A Technique for Efficient.. - Aluru, Goil, Ranka (1996)   (1 citation)

....the other is called Data Parallelism. Of course, it is possible to use a combination of these strategies for optimal scheduling, and such a strategy is referred to as Mixed Parallelism. Several researchers have worked on exploiting mixed parallelism, both in theory [3, 6, 11, 17] and in practice [4, 5, 13, 16]. In a number of problems, all the tasks may not be known in advance but may be generated dynamically as existing tasks are processed. This is the case with problems whose efficient solutions use the divide and conquer strategy. The execution of an instance of such a problem can be represented by ....

S. Chatterjee, Compiling data-parallel programs for efficient execution on shared-memory multiprocessors, Technical Report No. CMU-CS-91-189, Carnegie Mellon University, Pittsburgh, PA, 1991.


Implementation of a Portable Nested Data-Parallel.. - Blelloch, Hardwick.. (1993)   (97 citations)  Self-citation (Chatterjee)

No context found.

Chatterjee, S. Compiling Data-Parallel Programs for Efficient Execution on Shared-Memory Multiprocessors. PhD thesis, School of Computer Science, Carnegie Mellon University, Oct. 1991.


An Object-Oriented Approach to Nested Data Parallelism - Sheffler, Chatterjee (1995)   (3 citations)  Self-citation (Chatterjee)

....that flattens the nested parallelism in the foreach construct. This new method differentiates our system from others supporting nested parallelism [5, 12]. Our project is a direct outgrowth of NESL, a language fully supporting nested data parallelism [4]. We owe many of our ideas to it [8, 15]. However, we believe that our programming system offers many practical features over NESL. For one, NESL's runtime system is incompatible with other languages. In contrast, we translate our code into C++, and many compilers support mixing C, C++ and Fortran modules and data. It is currently not ....

....written (sequentially) just like any other type. Note that string initialization is actual compiled into stream input, so that it incurs a runtime overhead proportional to the length of the string. Int x = 3; v Float vx(size(5) v Int vz = 1 2 3 4 5] v v Int vv = 1 2] 3 4 5] [6 7 8]] v Int vy; vy cin; The foreach construct is the basis for parallelism. Its argument list specifies the element type of each vector and gives a new name that refers to an individual element in the body of the foreach. Inside the body, the statements are executed on the corresponding ....

[Article contains additional citation context not shown here]

S. Chatterjee. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. PhD thesis, Carnegie Mellon University, 1991.


Compiling Nested Data-Parallel Programs for.. - Siddhartha.. (1993)   (3 citations)  Self-citation (Chatterjee)

No context found.

CHATTERJEE, S. Compiling Data-Parallel Programs for Efficient Execution on Shared-Memory Multiprocessors. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Oct. 1991. Available as Technical Report CMU-CS-91-189.


Implementation of a Portable Nested Data-Parallel Language - Blelloch (1994)   (97 citations)  Self-citation (Chatterjee)

....interact with each other through side effects. This problem can probably be overcome with aggressive compiler analysis to determine places where there is the possibility of interaction. 3 System Overview Our implementation of Nesl consists of an intermediate language called Vcode [9], a compiler [15, 16] and interpreter for Vcode, and a portable library of parallel routines called Cvl [10]. Figure 5 illustrates how the implementation is organized. We split our system along these lines so that we could concentrate on different aspects of the system in isolation. This section gives an overview of ....

....memory. The interpreter uses free lists of varying sized blocks of memory and reference counting on vectors in order to create and destroy vectors of differing and dynamically determined sizes. The algorithm used is fully described in [11]. VCODE compiler: Chatterjee's doctoral dissertation [15] discusses the design, implementation, and evaluation of an optimizing Vcode compiler for shared-memory MIMD machines. There is, of course, a trivial implementation of Vcode on a MIMD machine: each Vcode instruction is written as a parallel loop, and the processors synchronize between ....

S. Chatterjee. Compiling Data-Parallel Programs for Efficient Execution on Shared-Memory Multiprocessors. PhD thesis, School of Computer Science, Carnegie Mellon University, Oct. 1991.


P³T+: A Performance Estimator for Distributed and.. - Pozgaj, Fahringer (2000)

No context found.

Chatterjee Siddhartha. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, October 1991.

CiteSeer.IST - Copyright Penn State and NEC