79 citations found.
Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71(1):3--28, March 1990.

This paper is cited in the following contexts:

Parallel Systems in Symbolic and Algebraic Computation - Matooane (2002)   (2 citations)  (Correct)

....synchronization primitive for communication between them. Computation is initiated by a single processor called the root, and data is distributed through task splitting with synchronous communication. The standard PRAM requires shared memory between processors. A modified model is the LPRAM model [1], which describes a non-uniform memory access (NUMA) computer consisting of: an unlimited number of processors; a global memory pool accessible to all processors; and local, separately addressable memory for each processor. A processor can have at most one communication request outstanding at ....

....Similarly, optimizing for memory locality will often starve some processors by clustering work on a few processors. Thus achieving good time and space balance is a challenge [60]. Several researchers in parallel algorithm analysis consider the time and communication complexity of algorithms [1, 95] where the memory model is fixed. 2.2.5 Parameters for parallel algorithms. Parallel applications incur overheads adding to the total computation time. Several parameters may be used to classify and compare parallel implementations. Granularity: The algorithms implemented in this research are ....

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71(1):3--28, March 1990.


Communication-Efficient Parallel Gaussian Elimination - Tiskin (2003)   (Correct)

.... Gaussian elimination. Figure 2: Recursive block Gaussian elimination. A lower communication cost for LU decomposition can be achieved by applying the block algorithm recursively (see e.g. [8, 7]). This standard method was suggested as a means of reducing the communication cost in [1] (for the transitive closure problem). The BSP cost of block Gauss-Jordan elimination was analysed in [17]; we summarise the results here for completeness. Given a nonsingular matrix A, the algorithm produces the LU decomposition A = LU, together with the inverse matrices L^{-1} and U^{-1}. ....

A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71(1):3-28, March 1990.


Scheduling tree-structured programs in the LogP model - Verriet (1997)   (1 citation)  (Correct)

....computer programs: a communication step takes the same amount of time as a local computation step, whereas, in a real parallel computer architecture, a communication step is far more time consuming. There are several PRAM-based models that include aspects of real parallel machines, such as latency [2, 3, 21], memory contention [14, 13] and asynchrony [6, 12]. The BSP model [22] and the LogP model [7, 9] are models of parallel computation that consist of a collection of processors that communicate using message passing. These models are more realistic, because they include several aspects of real ....

A. Aggarwal, A.K. Chandra and M. Snir. Communication complexity in PRAMs. Theoretical Computer Science, 71:3--28, 1990.


A General-Purpose Model for Heterogeneous Computation - Williams (2000)   (Correct)

....to bring it closer to practical parallel computers. Goodrich [Goo93] and McColl [McC93] survey the PRAM model and its extensions. A brief overview of machine characteristics that have been the focus of efforts to improve the PRAM is given below. 1. Memory Access. The LPRAM (Local memory PRAM) [ACS90] augments the CREW PRAM by associating with each processor an unlimited amount of local private memory. The QRQW (Queue read, queue write) PRAM [GMR94] assumes that simultaneous accesses to the same memory block will be inserted into a request queue and served in a FIFO manner. The cost of a ....

A. Aggarwal, A. K. Chandra, and M. Snir. "Communication complexity of PRAMs." In J. Theoretical Computer Science, March 1990.


Communication Performance of Wormhole Interconnection Networks - Petrini (1997)   (Correct)

....is the Module Parallel Computer. In this model shared global memory is broken into m modules and only one memory access can occur within each module per time step. The LPRAM, or Local memory PRAM, augments the CREW PRAM by associating with each processor an unlimited amount of local private memory [5]. The standard PRAM forces a rigid execution pattern in which all processors are synchronized by a global clock. There are several variants that allow asynchronous execution with irregular synchronization points, such as the Asynchronous PRAM. Periodic synchronization between intervals of asynchronous ....

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Communication Complexity of PRAMs. Theoretical Computer Science, pages 3--28, March 1990.


Communication Lower Bounds for Distributed-Memory Matrix.. - Irony, Toledo   (Correct)

....that must be sent and received by at least one of the nodes. The bounds apply even if each processor/memory node includes several processors, which is fairly common today in machines ranging from clusters of dual-processor workstations to SGI Origin 2000s and IBM SPs. Aggarwal, Chandra, and Snir [3] presented lower bounds on the amount of communication in matrix multiplication and a number of other computations. Their bounds, however, assume a shared-memory computational model that does not model existing computers well. Their LPRAM model assumes that P processors, each with a private cache, ....

....words per processor). We show that the amount of communication that they perform per processor, O(n^2 / P^{1/2}), is asymptotically optimal for this amount of memory. Section 5 uses a more sophisticated argument to show that so-called 3-dimensional (3D) algorithms are also optimal. 3D algorithms [3, 4, 8, 10, 13] replicate the input matrices Θ(P^{1/3}) times, so they need Θ(n^2 / P^{2/3}) words of memory per processor. But this allows them to reduce the amount of communication to only Θ(n^2 / P^{2/3}) per processor. We show that this amount of communication is optimal for the amount of memory that is ....
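The per-processor bounds quoted in the excerpt above can be compared numerically. The following is only a rough sketch: it reads the bounds as n^2 / P^{1/2} (2D communication) and n^2 / P^{2/3} (3D communication and memory), drops all constant factors, and uses a hypothetical function name, `per_processor_costs`, that does not come from the cited paper. The 2D memory figure is truncated in the excerpt, so it is deliberately left out.

```python
def per_processor_costs(n: int, p: int) -> dict:
    """Leading-order per-processor costs for n x n matrix multiplication on p processors.

    Assumption: every hidden constant in the O/Theta bounds is taken to be 1,
    so only ratios between the returned values are meaningful.
    """
    return {
        # 2D algorithms: O(n^2 / P^(1/2)) words communicated per processor.
        "comm_2d": n**2 / p**0.5,
        # 3D algorithms: inputs replicated Theta(P^(1/3)) times.
        "replication_3d": p ** (1 / 3),
        # 3D algorithms: Theta(n^2 / P^(2/3)) words of memory per processor.
        "mem_3d": n**2 / p ** (2 / 3),
        # 3D algorithms: Theta(n^2 / P^(2/3)) words communicated per processor.
        "comm_3d": n**2 / p ** (2 / 3),
    }


if __name__ == "__main__":
    costs = per_processor_costs(n=1024, p=64)
    # Under these assumptions the 2D/3D communication ratio is P^(1/6).
    print(costs["comm_2d"] / costs["comm_3d"])
```

Under these assumptions the communication saving of the 3D approach grows only as P^{1/6}, while the replication factor grows as P^{1/3}, which is the memory-for-communication trade the excerpt describes.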

[Article contains additional citation context not shown here]

Alok Aggarwal, Ashok Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3--28, 1990.


Trading Replication For Communication In Parallel.. - Irony, Toledo (2002)   (2 citations)  (Correct)

....9060 99) and by the University Research Fund of Tel Aviv University. ....contiguous in the i, j, or k dimensions). Prior to our work, this 3D framework was only applied to matrix multiplication algorithms. Such algorithms were first proposed by Aggarwal, Chandra, and Snir [2] and independently by Berntsen [3]. Both of these papers were purely theoretical and showed that the total amount of communication in the 3D algorithms is Θ(n^2 P^{1/3}), where n is the dimension of the matrices and P is the number of processors. In comparison, 2D algorithms transfer a total of ....

Alok Aggarwal, Ashok Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3--28, 1990.


Programming Research Group - Direct Bulk-Synchronous Parallel   (Correct)

.... This phenomenon may be attributed to the fact that most of the complexity of parallel computing is due to the difficulty in communication and synchronization rather than in computation (and obviously this problem will proliferate as the number of available processors increases). As noted in [5], researchers have recognized this issue and have investigated the communication complexity of special purpose parallel machines and VLSI chips [1, 7, 95, 130, 161, 162, 175]. Communication complexity is an equally important issue for general purpose parallel machines [115]. However, except for a ....

.... recognized this issue and have investigated the communication complexity of special purpose parallel machines and VLSI chips [1, 7, 95, 130, 161, 162, 175]. Communication complexity is an equally important issue for general purpose parallel machines [115]. However, except for a few papers (i.e. [5, 131, 132]) which analyze communication complexity issues in rather abstract terms, a limited amount of research was directed towards the architecture independent analysis of the communication complexity of parallel algorithms. The purpose of this report is a unified investigation of a wide range of ....

[Article contains additional citation context not shown here]

Aggarwal, A., Chandra, A.K., and Snir, M. "Communication complexity of PRAMs." Theoretical Computer Science, 71:3-28, 1990.


Communication-Efficient Parallel Dense LU Using a.. - Irony, Toledo (2001)   (1 citation)  (Correct)

....perform less communication but use more temporary storage than existing algorithms, which all use a 2-dimensional (2D) approach. Until now, the 3D approach has only been used for parallel matrix multiplication in algorithms that were proposed by Berntsen [3], by Aggarwal, Chandra, and Snir [2], by Gupta and Kumar [5], by Johnsson [7], and by Agarwal, Balle, Gustavson, Joshi, and Palkar [1]. 3D algorithms work by distributing the 3D iteration space of the computation among processors. Matrix-matrix computations that can be implemented using three nested loops have a natural ....
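The idea of distributing the 3D iteration space among processors, as described in the excerpt above, can be illustrated with a small sequential sketch. This is not the algorithm of any of the cited papers: the function name `matmul_3d`, the q x q x q grid of simulated processors, and the requirement that q divides n are all assumptions made for illustration. Each (bi, bj, bk) cell of the grid stands in for one processor that owns a cube of the (i, j, k) loop nest.

```python
from itertools import product


def matmul_3d(a, b, q):
    """Multiply two n x n matrices by splitting the (i, j, k) space into q^3 cubes.

    Assumptions: a and b are lists of lists, q divides n, and each cube is
    handled by one of P = q^3 hypothetical processors (simulated in sequence).
    """
    n = len(a)
    s = n // q  # side length of each cube of the iteration space

    # Step 1: every simulated processor computes the partial product of its cube.
    partial = {}
    for bi, bj, bk in product(range(q), repeat=3):
        block = [[0] * s for _ in range(s)]
        for i in range(s):
            for j in range(s):
                block[i][j] = sum(
                    a[bi * s + i][bk * s + k] * b[bk * s + k][bj * s + j]
                    for k in range(s)
                )
        partial[(bi, bj, bk)] = block

    # Step 2: partial products that share (bi, bj) are summed over bk.
    # On a real machine this reduction is where communication takes place.
    c = [[0] * n for _ in range(n)]
    for (bi, bj, _bk), block in partial.items():
        for i in range(s):
            for j in range(s):
                c[bi * s + i][bj * s + j] += block[i][j]
    return c


if __name__ == "__main__":
    print(matmul_3d([[1, 2], [3, 4]], [[5, 6], [7, 8]], q=2))
    # prints [[19, 22], [43, 50]]
```

In this sketch each simulated processor touches only an s x s block of each input, so the inputs are effectively used by q different cubes, which mirrors the replication that 3D algorithms trade for lower communication.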

Alok Aggarwal, Ashok Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3--28, 1990.


Optimum Binary Search Trees On The Hierarchical Memory Model - Thite (2001)   (2 citations)  (Correct)

....is a significant bottleneck in multiprocessor architectures, and it becomes more severe as the number of processors increases. In fact, depending on the degree of parallelism of the problem itself, the communication time between processors frequently limits the execution speed. Aggarwal et al. [ACS90] proposed the LPRAM model for parallel random access machines that incorporates both the computational power and communication delay of parallel architectures. For this model, they proved upper bounds on both computation time and communication steps using p processors for a number of ....

A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3-28, 1990.


Parallel Pointer-Based Join Algorithms in Memory Mapped .. - Buhr, Goel, Nishimura, .. (1996)   (1 citation)  (Correct)

....the I/O bottleneck, and spatial and temporal locality from within the theoretical framework. These models build on the framework of the sequential RAM [5] and its parallel variant, the PRAM [14]. The first step towards a more realistic memory model is distinguishing between local and global memory [27, 2], yielding a two-level memory scheme. More recently, there have been attempts to model multi-level memory [1, 3, 6], both in sequential and parallel settings. The notions of block transfer and hierarchy are developed further in a parallel model in which memory consists of a tree of modules, where ....

Aggarwal, A. and Chandra, A. K. Communication Complexity of PRAMs. In ICALP, pp. 1--17, 1988.


A 3D Parallel Communication-Efficient Dense Linear Solver - Irony (2000)   (Correct)

....to use large numbers of processors to multiply small matrices, because of the communication overhead. Three-dimensional matrix multiplication algorithms reduce communication over 2D algorithms using replication. Such algorithms were first proposed by Berntsen [3] and by Aggarwal, Chandra and Snir [2] at about the same time. Essentially the same algorithm that Aggarwal et al. proposed was later proposed, independently, by Gupta and Kumar [13] and by Johnsson [17]. Berntsen described a somewhat more complex algorithm, while Aggarwal, Chandra and Snir also prove that the 3D algorithm is optimal ....

Alok Aggarwal, Ashok Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3--28, 1990.


Problem Space Promotion and Its Evaluation as a.. - Chamberlain, Lewis..   (Correct)

....gather operation needed in this paper is the transpose, which uses ZPL's constant arrays 1 and 2 to specify dimensional interchange. In general, the value of array k at index (i_1, ..., i_d) is i_k. Thus, the following statement performs the transpose on V needed to compute Vt: 1. n,1] Vt : V#[ 2, 1]. Performance Model. ZPL supports a performance model that permits programmers to reason about the parallelism and communication overhead in their codes [5]. In ZPL, all arrays are aligned. 1 There are two main implications of this: i) the data elements of two different arrays at index (i, j) ....

....structure of this computation. PSP-like solutions to specific problems have appeared in the literature. For example, Aggarwal et al. analyze the communication complexity of a 3-dimensional matrix multiply algorithm in the context of the LPRAM (local memory parallel random access machine) model [2], and Agarwal et al. evaluate a 3-dimensional matrix multiply algorithm on the IBM SP2 [1]. They do not explore the work as a general solution technique. Researchers have proposed parallel machine models in an effort to understand and predict performance. The LogP machine model is a well studied ....

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71(1):3--28, March 1990.


Cost study of different pivoting strategies on the BSP Model - Calomardo, Marí   (Correct)

....since it assumes that all processors work synchronously and that interprocessor communication is free. Different variations to the basic PRAM model have been proposed to overcome these limitations in an attempt to obtain a more practical model while preserving great part of its simplicity [1, 2, 7, 9]. Another approach which is being seriously considered as the basis of a general purpose parallel computation is the BSP model (Bulk Synchronous Parallel). It was proposed by Leslie G. Valiant in 1990 [17] as a bridge between theory and practice. The BSP model views a parallel machine as a set of ....

A. Aggarwal, A. K. Chandra and M. Snir. Communication Complexity of PRAMs. Theoretical Computer Science, 71:3--28, March 1990.


Models and Resource Metrics for Parallel and Distributed.. - Li, Mills, Reif (1989)   (12 citations)  (Correct)

....make the model more practical while still preserving much of its simplicity. The variations extend the PRAM to incorporate realistic aspects such as asynchrony of processes (e.g. the Phase PRAM [Gib89] and APRAM [CZ89]), communication costs such as network latency and bandwidth (e.g. the LPRAM [ACS90], Postal Model [BNK92], BSP [Val90], and LogP [CKP+93]), and memory hierarchy, reflecting the effects of multileveled memory such as differing access times for registers, local cache, main memory, and disk I/O (e.g. the P-HMM [VS94], PMH [AC94], and P-UMH [NV91]). The approach followed by ....

....latency models such as the LPRAM and BPRAM which add the notion of latency into the PRAM model, and models such as the BSP and LogP which not only incorporate asynchrony and latency but also address the issue of bandwidth limitation. 4.2.1 LPRAM. The Local Memory PRAM (LPRAM) model [ACS90] consists of a shared global memory and a set of processors with unbounded local memory executing in lock step. The access protocol to global memory is CREW. At every time step, each processor can perform either a communication step, in which it can write and then read a word from the global ....
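To make the lock-step accounting in the excerpt above concrete, here is a toy sketch that sums a list on a simulated LPRAM and counts the two kinds of steps separately. It is an illustration only, not code from any cited paper. The excerpt is truncated before it defines the alternative to a communication step, so the sketch assumes that the other kind of step is a computation on local memory. The function name `lpram_sum`, the mailbox layout, and the restrictions that p is a power of two and divides the input length are likewise assumptions.

```python
def lpram_sum(values, p):
    """Sum `values` on a toy LPRAM; return (total, comm_steps, comp_steps).

    Steps are counted per lock-step round (parallel time), not per processor.
    Assumptions: p is a power of two, len(values) is a multiple of p, and a
    "computation step" means one operation on words in local memory.
    """
    chunk = len(values) // p
    global_mem = list(values)   # the inputs start in shared global memory
    mailbox = [0] * p           # extra global cells, one writer per cell
    local = [0] * p             # one local accumulator per processor
    comm_steps = comp_steps = 0

    # Phase 1: each processor reads its own chunk, one word per communication
    # step, then adds that word to its accumulator in one computation step.
    for t in range(chunk):
        words = [global_mem[r * chunk + t] for r in range(p)]
        comm_steps += 1
        local = [acc + w for acc, w in zip(local, words)]
        comp_steps += 1

    # Phase 2: binary-tree reduction through global memory. In each round the
    # senders write their accumulator (exclusive write), then the receivers
    # read it, all within a single write-then-read communication step.
    stride = 1
    while stride < p:
        for r in range(stride, p, 2 * stride):
            mailbox[r] = local[r]
        received = {r: mailbox[r + stride] for r in range(0, p, 2 * stride)}
        comm_steps += 1
        for r, word in received.items():
            local[r] += word
        comp_steps += 1
        stride *= 2

    return local[0], comm_steps, comp_steps


if __name__ == "__main__":
    print(lpram_sum(list(range(16)), p=4))
    # prints (120, 6, 6)
```

Counting communication and computation steps separately, rather than as one undifferentiated time unit, is the distinction the LPRAM adds to the plain PRAM in this reading.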

A. Aggarwal, A. K. Chandra, and M. Snir, "Communication complexity of PRAMs," J. Theoretical Computer Science, Mar. 1990.


Models of Embedded Computation - Axel Jantsch Royal (2005)   (Correct)

No context found.

Alok Aggarwal, Ashok K. Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71(1):3--28, March 1990.


Optimum Binary Search Trees On The Hierarchical Memory Model - Shripad Thite University (2001)   (2 citations)  (Correct)

No context found.

A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3-28, 1990.


Synchronisation-Efficient Parallel All-Pairs Shortest Paths.. - Tiskin (2004)   (Correct)

No context found.

A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71(1):3-28, March 1990.


Some Models for Scheduling Parallel Programs with.. - Bampis, Guinand.. (1997)   (Correct)

No context found.

A. Aggarwal, A.K. Chandra, M. Snir, Communication complexity of PRAMs, Theoretical Computer Science 71 (1990) 3-28.


Communication-Efficient Parallel Gaussian Elimination - Tiskin (2004)   (Correct)

No context found.

A. Aggarwal, A. K. Chandra, M. Snir, Communication complexity of PRAMs, Theoretical Computer Science 71 (1) (1990) 3-28.


Accurate Performance Models of Parallel Low Level Image.. - Seinstra, Koelma (2000)   (Correct)

No context found.

A. Aggarwal, A. Chandra, and M. Snir. Communication Complexity of PRAMs. Theoretical Computer Science, 71(1):3--28, March 1990.


Communication Lower Bounds for Distributed-Memory Matrix.. - Irony, Toledo (2004)   (Correct)

No context found.

Alok Aggarwal, Ashok Chandra, and Marc Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71:3-28, 1990.


Minimizing the Overhead for Some Tree-Scheduling Problems - Bampis, Guinand, Trystram (1996)   (Correct)

No context found.

A. Aggarwal, A. K. Chandra, M. Snir, Communication Complexity of PRAMs, Theoretical Computer Science 71 3-28 (1990).


The Design and Analysis of Bulk-Synchronous Parallel Algorithms - Tiskin (1998)   (7 citations)  (Correct)

No context found.

A. Aggarwal, A. K. Chandra, and M. Snir. Communication complexity of PRAMs. Theoretical Computer Science, 71(1):3--28, March 1990.


CiteSeer.IST - Copyright Penn State and NEC