Results 1-10 of 183
LogP: Towards a Realistic Model of Parallel Computation
, 1993
Cited by 562 (15 self)
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
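As a rough illustration of how a four-parameter model of this kind is used, the sketch below estimates message times from L (latency), o (overhead), g (gap between consecutive sends) and P (number of processors). The parameter names follow the usual LogP notation; the class `LogP`, the helper functions and the numeric values are hypothetical, and the formulas are the commonly quoted back-of-the-envelope costs, not results taken from this listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """Hypothetical container for the four LogP parameters (arbitrary time units)."""
    L: float  # network latency for one short message
    o: float  # processor overhead to send or to receive a message
    g: float  # minimum gap between consecutive sends at one processor
    P: int    # number of processors


def one_message_time(m: LogP) -> float:
    """End-to-end time for one short message: send overhead + latency + receive overhead."""
    return m.o + m.L + m.o


def pipelined_messages_time(m: LogP, n: int) -> float:
    """Time until the last of n back-to-back short messages has been received.

    Assumes one sender and one receiver; the sender can start a new send
    every max(g, o) time units.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    interval = max(m.g, m.o)
    return (n - 1) * interval + one_message_time(m)


# Made-up parameter values, for illustration only.
machine = LogP(L=6.0, o=2.0, g=4.0, P=8)
single = one_message_time(machine)
burst = pipelined_messages_time(machine, 3)
```

With these made-up values the gap dominates the per-message overhead, so sending more messages back to back adds g per message rather than the full end-to-end time.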
LogGP: Incorporating Long Messages into the LogP Model - One step closer towards a realistic model for parallel computation
, 1995
Cited by 288 (1 self)
We present a new model of parallel computation, the LogGP model, and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP+93] which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5) [CKP+93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
Evaluating MapReduce for multi-core and multiprocessor systems
 In HPCA '07: Proceedings of the 13th International Symposium on High-Performance Computer Architecture
, 2007
Cited by 248 (3 self)
This paper evaluates the suitability of the MapReduce model for multi-core and multiprocessor systems. MapReduce was created by Google for application development on data-centers with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system. We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes. We study Phoenix with multi-core and symmetric multiprocessor systems and evaluate its performance potential and error recovery features. We also compare MapReduce code to code written in lower-level APIs such as Pthreads. Overall, we establish that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.
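To make the functional style concrete, here is a minimal sequential sketch of the map, group and reduce phases, using word counting as the example. It is not the Phoenix API: `map_reduce`, `word_count_map` and `word_count_reduce` are hypothetical names, and a real runtime would run the map and reduce calls in parallel and handle scheduling and fault recovery.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def map_reduce(
    inputs: Iterable[str],
    map_fn: Callable[[str], Iterable[Tuple[Hashable, int]]],
    reduce_fn: Callable[[Hashable, List[int]], int],
) -> Dict[Hashable, int]:
    """Run map, group-by-key and reduce one after another (no parallelism)."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    # Map phase: each input item emits zero or more (key, value) pairs.
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    # Reduce phase: each key's values are combined independently of other keys.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line: str) -> Iterable[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    """Sum the partial counts collected for one word."""
    return sum(counts)


counts = map_reduce(["a b a", "b c"], word_count_map, word_count_reduce)
```

Because each map call touches only its own input and each reduce call only its own key, a runtime is free to distribute both phases across threads without changing the user-supplied functions.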
Implementation of a Portable Nested Data-Parallel Language
 Journal of Parallel and Distributed Computing
, 1994
Cited by 203 (28 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
Models and Languages for Parallel Computation
 ACM COMPUTING SURVEYS
, 1998
Cited by 169 (4 self)
We survey parallel programming models and languages using 6 criteria [:] should be easy to program, have a software development methodology, be architecture-independent, be easy to understand, guarantee performance, and provide info about the cost of programs. ... We consider programming models in 6 categories, depending on the level of abstraction they provide.
The J-Machine multicomputer: An architectural evaluation
 In Proceedings of the 20th International Symposium on Computer Architecture
, 1993
Cited by 155 (5 self)
The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing. Each J-Machine node consists of an integrated multicomputer component, the Message-Driven Processor (MDP), and 1 MByte of DRAM. The MDP provides mechanisms to support efficient communication, synchronization, and naming. A 512-node J-Machine is operational and is due to be expanded to 1024 nodes in March 1993. In this paper we discuss the design of the J-Machine and evaluate the effectiveness of the mechanisms incorporated into the MDP. We measure the performance of the communication and synchronization mechanisms directly and investigate the behavior of four complete applications.
NESL: A Nested Data-Parallel Language
 CARNEGIE MELLON UNIVERSITY
, 1992
Cited by 154 (4 self)
This report describes NESL, a strongly-typed, applicative, data-parallel language. NESL is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on vectors, including a mechanism for applying any function over the elements of a vector in parallel, and a broad set of parallel functions that manipulate vectors. NESL fully supports nested vectors and nested parallelism: the ability to take a parallel function and then apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with complex and dynamically changing data structures, such as required in many graph or sparse matrix algorithms. NESL also provides a mechanism for calculating the asymptotic running time for a program on various parallel machine models, including the parallel random access machine (PRAM).
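The shape of nested parallelism can be sketched with a sparse matrix-vector product, where each row is itself a sequence of variable length. The code below is plain Python, not NESL, and it runs sequentially; it only shows where the two levels of parallelism would sit. The row representation (a list of `(column, value)` pairs) and the name `sparse_matvec` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Row = List[Tuple[int, float]]  # assumed layout: (column index, value) pairs


def sparse_matvec(rows: Sequence[Row], x: Sequence[float]) -> List[float]:
    """Multiply a sparse matrix, given row by row, with a dense vector x.

    Outer level: one independent task per row.
    Inner level: within a row, the products are independent and are then
    combined by a sum, which is itself a parallel reduction in a
    data-parallel language.
    """
    return [sum(value * x[col] for col, value in row) for row in rows]


# Rows have different lengths, including an empty one (irregular data).
rows = [[(0, 2.0), (2, 1.0)], [], [(1, 3.0)]]
x = [1.0, 2.0, 3.0]
y = sparse_matvec(rows, x)
```

The rows differ in length, so a flat data-parallel loop over rows alone would balance poorly; expressing the inner sum as a parallel operation as well is what the nested model is meant to allow.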
Prefix Sums and Their Applications
Cited by 128 (2 self)
Experienced algorithm designers rely heavily on a set of building blocks and on the tools needed to put the blocks together into an algorithm. The understanding of these basic blocks and tools is therefore critical to the understanding of algorithms. Many of the blocks and tools needed for parallel
Methods and problems of communication in usual networks
, 1994
Cited by 118 (12 self)
This paper is a survey of existing methods of communication in usual networks. We particularly study the complete network, the ring, the torus, the grid, the hypercube, the cube-connected cycles, the undirected de Bruijn graph, the star graph, the shuffle-exchange graph, and the butterfly graph. Two different models of communication time are analysed, namely the constant model and the linear model. Other constraints like full-duplex or half-duplex links, processor-bound, DMA-bound or link-bound possibilities are separately studied. For each case we give references, upper bound (algorithms) and lower bounds. We have also proposed improvements or new results when possible. Hopefully, optimal results are not always known and we present a list of open problems.
NESL: A nested data-parallel language (version 2.6)
, 1993
Cited by 110 (8 self)
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U. S. Government. Keywords: Data-parallel, parallel algorithms, supercomputers, nested parallelism, This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector computers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences, including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. Nesl fully supports nested sequences and nested parallelism: the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with irregular nested loops (where the inner loop lengths depend on the outer iteration) and for divide-and-conquer algorithms. Nesl also provides a performance model for calculating the asymptotic performance of a program on