Results 1 - 10
of
47
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Abstract
-
Cited by 471 (14 self)
- Add to MetaCart
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
LogGP: Incorporating Long Messages into the LogP Model - One step closer towards a realistic model for parallel computation
, 1995
"... We present a new model of parallel computation---the LogGP model---and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the comm ..."
Abstract
-
Cited by 204 (1 self)
- Add to MetaCart
We present a new model of parallel computation---the LogGP model---and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P ). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5) [CKP + 93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
The Uniform Memory Hierarchy Model of Computation
- Algorithmica
, 1992
"... The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use im ..."
Abstract
-
Cited by 108 (9 self)
- Add to MetaCart
The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved data-movement strategies. A sequential computer's memory is modelled as a sequence hM 0 ; M 1 ; :::i of increasingly large memory modules. Computation takes place in M 0 . Thus, M 0 might model a computer's central processor, while M 1 might be cache memory, M 2 main memory, and so on. For each module M U , a bus B U connects it with the next larger module M U+1 . All buses may be active simultaneously. Data is transferred along a bus in fixed-sized blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parameterized by the rate at which the blocksizes i...
A cost calculus for parallel functional programming
- Journal of Parallel and Distributed Computing
, 1995
"... Abstract Building a cost calculus for a parallel program development environment is difficult because of the many degrees of freedom available in parallel implementations, and because of difficulties with compositionality. We present a strategy for building cost calculi for skeleton-based programmin ..."
Abstract
-
Cited by 57 (6 self)
- Add to MetaCart
Abstract Building a cost calculus for a parallel program development environment is difficult because of the many degrees of freedom available in parallel implementations, and because of difficulties with compositionality. We present a strategy for building cost calculi for skeleton-based programming languages which can be used for derivational software development and which deals in a pragmatic way with the difficulties of composition. The approach is illustrated for the Bird-Meertens theory of lists, a parallel functional language with an associated equational transformation system. Keywords: functional programming, parallel programming, program transformation, cost calculus, equational theories, architecture independence, Bird-Meertens formalism.
Efficient parallel graph algorithms for coarse grained multicomputers and BSP (Extended Abstract)
- in Proc. 24th International Colloquium on Automata, Languages and Programming (ICALP'97
, 1997
"... In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and s ..."
Abstract
-
Cited by 55 (23 self)
- Add to MetaCart
In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2-edge connectivity and biconnectivity (testing and component computation), and (8) cordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 1-7 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2, i.e.they are fully scalable, and for Problems hold for arbitrary ratios n p 3-8 it is assumed that n p,>0, which is true for all commercially
A Randomized Parallel 3D Convex Hull Algorithm For Coarse Grained Multicomputers
- In Proc. ACM Symp. on Parallel Algorithms and Architectures
, 1995
"... We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconection network and n=p local memory per processor, where n=p p 2+ffl (for some arbitrarily small ffl ? 0). For any given set of n points in ..."
Abstract
-
Cited by 49 (11 self)
- Add to MetaCart
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconection network and n=p local memory per processor, where n=p p 2+ffl (for some arbitrarily small ffl ? 0). For any given set of n points in 3-space, the algorithm computes the 3D convex hull, with high probaility, in O( n log n p ) local computation time and O(1) communication phases with at most O(n=p) data sent/received by each processor. That is, with high probability, the algorithm computes the 3D convex hull of an arbitrary point set in time O( n logn p + \Gamma n;p ), where \Gamma n;p denotes the time complexity of one communication phase. The assumption n p p 2+ffl implies a coarse grained, limited parallelism, model which is applicable to most commercially available multiprocessors. In the terminology of the BSP model, our algorithm requires, with high probability, O(1) supersteps, synchronization period L = \Th...
Modeling Parallel Computers as Memory Hierarchies
- In Proc. Programming Models for Massively Parallel Computers
, 1993
"... A parameterized generic model that captures the features of diverse computer architectures would facilitate the development of portable programs. Specific models appropriate to particular computers are obtained by specifying parameters of the generic model. A generic model should be simple, and for ..."
Abstract
-
Cited by 41 (6 self)
- Add to MetaCart
A parameterized generic model that captures the features of diverse computer architectures would facilitate the development of portable programs. Specific models appropriate to particular computers are obtained by specifying parameters of the generic model. A generic model should be simple, and for each machine that it is intended to represent, it should have a reasonably accurate specific model. The Parallel Memory Hierarchy (PMH) model of computation uses a single mechanism to model the costs of both interprocessor communication and memory hierarchy traffic. A computer is modeled as a tree of memory modules with processors at the leaves. All data movement takes the form of block transfers between children and their parents. This paper assesses the strengths and weaknesses of the PMH model as a generic model. 1 Introduction The raw computing power of multiprocessor computers is exploding. The challenge is to create software that can take advantage of this computing power. The diversit...
Horizons of Parallel Computation
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1993
"... This paper considers the ultimate impact of fundamental physical limitations---notably, speed of light and device size---on parallel computing machines. Although we fully expect an innovative and very gradual evolution to the limiting situation, we take here the provocative view of exploring the ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
This paper considers the ultimate impact of fundamental physical limitations---notably, speed of light and device size---on parallel computing machines. Although we fully expect an innovative and very gradual evolution to the limiting situation, we take here the provocative view of exploring the consequences of the accomplished attainment of the physical bounds. The main result is that scalability holds only for neighborly interconnections, such as the square mesh, of bounded-size synchronous modules, presumably of the area-universal type. We also discuss the ultimate infeasibility of latencyhiding, the violation of intuitive maximal speedups, and the emerging novel processor-time tradeoffs.
A Timestamp-based Cache Coherence Scheme
, 1989
"... this paper, we propose a software-assisted cache coherence scheme which overcomes some of the inefficiencies of previous approaches by using a combination of a compile-time marking of references and a hardware-based local incoherence detection scheme. In section 2, we give the notation used througho ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
this paper, we propose a software-assisted cache coherence scheme which overcomes some of the inefficiencies of previous approaches by using a combination of a compile-time marking of references and a hardware-based local incoherence detection scheme. In section 2, we give the notation used throughout the paper. Section 3 reviews previous software-assisted methods to enforcing cache coherence. In section 4, a complete description of our approach is given along with a correctness proof. Section 5 gives a qualitative comparison of our scheme and the directory-based approaches. Section 6 provides some concluding remarks. Definitions In conventional programs, there are four kinds of data dependences : flow-dependence, antidependence, output-dependence and input-dependence [14]. Let r and r

