Results 1 - 10 of 24
Efficient parallel graph algorithms for coarse grained multicomputers and BSP (Extended Abstract)
 In Proc. 24th International Colloquium on Automata, Languages and Programming (ICALP'97)
, 1997
Abstract

Cited by 62 (22 self)
In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well-known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2-edge connectivity and biconnectivity (testing and component computation), and (8) chordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 1-7 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2 hold for arbitrary ratios n/p, i.e., they are fully scalable, and for Problems 3-8 it is assumed that n/p ≥ p^ε, ε > 0, which is true for all commercially available multiprocessors.
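To make Problem (1) concrete, here is a minimal sequential simulation of pointer-jumping list ranking, the classic PRAM-style baseline. This only sketches the problem itself; the paper's CGM/BSP algorithm is different (it achieves O(log p) communication rounds). The function name and representation are illustrative.

```python
import math

# succ[i] is the successor of node i; the terminal node has succ[i] == i.
# rank[i] ends up as the number of links from i to the end of the list.
def list_rank(succ):
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    # O(log n) pointer-jumping rounds, each doubling the distance covered.
    for _ in range(max(1, math.ceil(math.log2(n)))):
        rank = [rank[i] + (rank[nxt[i]] if nxt[i] != i else 0)
                for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For the list 0 → 1 → 2 → 3, `list_rank([1, 2, 3, 3])` returns `[3, 2, 1, 0]`.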
Modeling Parallel Computers as Memory Hierarchies
 In Proc. Programming Models for Massively Parallel Computers
, 1993
Abstract

Cited by 52 (6 self)
A parameterized generic model that captures the features of diverse computer architectures would facilitate the development of portable programs. Specific models appropriate to particular computers are obtained by specifying parameters of the generic model. A generic model should be simple, and for each machine that it is intended to represent, it should have a reasonably accurate specific model. The Parallel Memory Hierarchy (PMH) model of computation uses a single mechanism to model the costs of both interprocessor communication and memory hierarchy traffic. A computer is modeled as a tree of memory modules with processors at the leaves. All data movement takes the form of block transfers between children and their parents. This paper assesses the strengths and weaknesses of the PMH model as a generic model.
1 Introduction
The raw computing power of multiprocessor computers is exploding. The challenge is to create software that can take advantage of this computing power. The diversit...
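A toy cost calculation in the spirit of the PMH idea: a machine is a tree of memory modules, and every data movement is a block transfer between a child and its parent. The class, parameter names, and numbers below are illustrative, not the model's formal definition.

```python
# Each module transfers fixed-size blocks to its parent at a fixed cost.
class Module:
    def __init__(self, name, block_size, cost_per_block, parent=None):
        self.name = name
        self.block_size = block_size        # words per block transfer
        self.cost_per_block = cost_per_block
        self.parent = parent

def move_cost(leaf, root, words):
    """Cost of moving `words` from `leaf` up the tree to `root`."""
    total, m = 0, leaf
    while m is not root:
        blocks = -(-words // m.block_size)  # ceiling division
        total += blocks * m.cost_per_block
        m = m.parent
    return total

ram = Module("ram", block_size=64, cost_per_block=100)
cache = Module("cache", block_size=8, cost_per_block=10, parent=ram)
```

With these illustrative parameters, `move_cost(cache, ram, 100)` charges ceil(100/8) = 13 blocks at cost 10 each, i.e. 130.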
A Randomized Parallel 3D Convex Hull Algorithm For Coarse Grained Multicomputers
 In Proc. ACM Symp. on Parallel Algorithms and Architectures
, 1995
Abstract

Cited by 50 (10 self)
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p ≥ p^(2+ε) (for some arbitrarily small ε > 0). For any given set of n points in 3-space, the algorithm computes the 3D convex hull, with high probability, in O((n log n)/p) local computation time and O(1) communication phases with at most O(n/p) data sent/received by each processor. That is, with high probability, the algorithm computes the 3D convex hull of an arbitrary point set in time O((n log n)/p + Γ(n,p)), where Γ(n,p) denotes the time complexity of one communication phase. The assumption n/p ≥ p^(2+ε) implies a coarse grained, limited parallelism, model which is applicable to most commercially available multiprocessors. In the terminology of the BSP model, our algorithm requires, with high probability, O(1) supersteps, synchronization period L = Θ...
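As a sequential reference point for the hull problem (not the paper's randomized 3D algorithm), the standard 2D monotone-chain construction runs in O(n log n) after sorting:

```python
# Andrew's monotone chain: build lower and upper hulls over sorted points.
def convex_hull_2d(points):
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):  # z-component of (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()          # drop points making a non-left turn
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # endpoints shared, listed once
```

For a unit square with an interior point, the hull is the four corners in counterclockwise order.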
ZPL: A Machine Independent Programming Language for Parallel Computers
 IEEE Transactions on Software Engineering
, 2000
Abstract

Cited by 34 (3 self)
The goal of producing architecture-independent parallel programs is complicated by the competing need for high performance. The ZPL programming language achieves both goals by building upon an abstract parallel machine and by providing programming constructs that allow the programmer to "see" this underlying machine. This paper describes ZPL and provides a comprehensive evaluation of the language with respect to its goals of performance, portability, and programming convenience. In particular, we describe ZPL's machine-independent performance model, describe the programming benefits of ZPL's region-based constructs, summarize the compilation benefits of the language's high-level semantics, and summarize empirical evidence that ZPL has achieved both high performance and portability on diverse machines such as the IBM SP2, Cray T3E, and SGI Power Challenge. Index Terms: portable, efficient, parallel programming language. This research was supported by DARPA Grant F306029710152, a grant of HPC time from the Arctic Region Supercomputing Center, NSF Grant CCR9707056, and ONR grant N000149910402.
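A Python analogue (not ZPL syntax) of the region idea: computations are expressed over a declared index set rather than explicit loops, which is what lets a compiler map the region onto processors. All names here are illustrative.

```python
# Apply a function elementwise over every index in a declared region.
def over_region(region, f, *arrays):
    return {idx: f(*(a[idx] for a in arrays)) for idx in region}

R = [(i, j) for i in range(3) for j in range(3)]   # a 3x3 "region"
A = {idx: idx[0] + idx[1] for idx in R}            # an array over R
B = over_region(R, lambda a: 2 * a, A)             # B = 2*A over R
```

Here `B[(2, 2)]` is 8, and the loop structure never appears in user code.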
Parallel RAMs with Owned Global Memory and Deterministic Context-Free Language Recognition
, 1997
Abstract

Cited by 26 (0 self)
We identify and study a natural and frequently occurring subclass of Concurrent-Read, Exclusive-Write Parallel Random Access Machines (CREW PRAMs). Called Concurrent-Read, Owner-Write, or CROW PRAMs, these are machines in which each global memory location is assigned a unique "owner" processor, which is the only processor allowed to write into it. Considering the difficulties that would be involved in physically realizing a full CREW PRAM model, it is interesting to observe that in fact, most known CREW PRAM algorithms satisfy the CROW restriction or can be easily modified to do so. This paper makes three main contributions. First, we formally define the CROW PRAM model and demonstrate its stability ...
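The owner-write restriction described above is easy to state operationally. The following is a minimal sketch, with illustrative class and method names, of a global memory in which any processor may read any cell but only the cell's assigned owner may write it:

```python
class CrowMemory:
    def __init__(self, owners):
        self.owners = list(owners)      # owners[addr] = owning processor id
        self.cells = [0] * len(self.owners)

    def read(self, pid, addr):
        return self.cells[addr]         # concurrent reads by any processor

    def write(self, pid, addr, value):
        if self.owners[addr] != pid:    # the CROW restriction
            raise PermissionError(f"processor {pid} does not own cell {addr}")
        self.cells[addr] = value
```

A write by any processor other than the owner is rejected, while reads remain unrestricted.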
A Comparison of Three Programming Models for Adaptive Applications on the Origin2000
 Journal of Parallel and Distributed Computing
, 2000
Abstract

Cited by 24 (6 self)
Adaptive applications have computational workloads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementation of such adaptive applications is therefore a challenging task. In this paper, we compare the performance of, and the programming effort required for, two major classes of adaptive applications under three leading parallel programming models on an SGI Origin2000 system, a machine which supports all three models efficiently. Results indicate that the three models deliver comparable performance; however, the implementations differ significantly beyond merely using explicit messages versus implicit loads/stores, even though the basic parallel algorithms are similar. Compared with the message-passing (using MPI) and SHMEM programming models, the cache-coherent shared address space (CC-SAS) model provides substantial ease of programming at both the conceptual ...
HighLevel Programming Language Abstractions for Advanced and Dynamic Parallel Computations
 DISSERTATION
, 2005
Abstract

Cited by 18 (4 self)
This thesis presents a combination of p-independent and p-dependent extensions to ZPL.
Randomized Parallel List Ranking For Distributed Memory Multiprocessors
, 1996
Abstract

Cited by 16 (6 self)
We present a randomized parallel list ranking algorithm for distributed memory multiprocessors, using a BSP-like model. We first describe a simple version which requires, with high probability, log(3p) + log ln(n) = Õ(log p + log log n) communication rounds (h-relations with h = Õ(n/p)) and Õ(n/p) local computation. We then outline an improved version which requires, with high probability, only r ≤ (4k + 6) log((2/3)p) + 8 = Õ(k log p) communication rounds, where k = min{ i ≥ 0 : ln^(i+1)(n) ≤ ((2/3)p)^(2^(i+1)) }. Note that k < ln*(n) is an extremely small number. For n ≤ 10^(10^100) and p ≥ 4, the value of k is at most 2. Hence, for a given number of processors, p, the number of communication rounds required is, for all practical purposes, independent of n. For n ≤ 1,500,000 and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 78, but the actual number of communication rounds observed so far is 25 in the worst case. Fo...
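To show the flavor of randomized list contraction (this is the standard coin-flipping device, not the paper's exact algorithm, which bounds communication rounds on a BSP-like model), one round can splice out an independent set of nodes chosen by coin flips:

```python
import random

def contract_round(succ, rng):
    """succ maps node -> successor (None for the last node).
    Splice out every node that flips heads while its successor flips
    tails; two adjacent nodes can never both be spliced (the successor
    would need to flip both tails and heads), so skipping one suffices.
    A full ranking algorithm would also record splice data to undo this."""
    coin = {v: rng.random() < 0.5 for v in succ}
    removed = {v for v, s in succ.items()
               if s is not None and coin[v] and not coin[s]}
    new_succ = {}
    for v, s in succ.items():
        if v in removed:
            continue
        if s in removed:
            s = succ[s]          # bypass the spliced-out successor
        new_succ[v] = s
    return new_succ, removed
```

Each round removes a constant fraction of the nodes in expectation, and the surviving nodes still form a single linked list.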
A Hybrid Shared Memory/Message Passing Parallel Machine
 In Proceedings of the 1993 International Conference on Parallel Processing
, 1993
Abstract

Cited by 15 (0 self)
Current and emerging high-performance parallel computer architectures generally implement one of two types of communication mechanisms: shared memory (SM) or message passing (MP). In this paper we propose a hybrid SM/MP architecture, together with a hybrid SM/MP programming model, that we believe effectively combines the advantages of each system. The SM/MP architecture contains both a high-performance coherence protocol for shared memory, and message-passing primitives that coexist with the coherence protocol but have no coherence overhead. The SM/MP programming model provides a framework for safely and effectively using the SM/MP communication primitives. We illustrate the use of the model and primitives to reduce communication overhead in SM systems.
1. Introduction
Two data communication mechanisms currently dominate in existing and emerging large-scale parallel computers. In the Message Passing (or Distributed Memory) model, each processor has its own private main memory and comm...
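A toy thread-level contrast of the two communication styles the paper combines: a shared-memory write that the consumer reads directly, and an explicit message that doubles as the synchronization. Purely illustrative; the paper's primitives are hardware-level.

```python
import queue
import threading

def run():
    shared_cell = [None]         # "shared memory": commonly visible location
    mailbox = queue.Queue()      # "message passing": explicit send/receive

    def producer():
        shared_cell[0] = 42      # SM-style: plain store, no explicit send
        mailbox.put("ready")     # MP-style: the message carries notification

    t = threading.Thread(target=producer)
    t.start()
    msg = mailbox.get()          # blocking receive orders the two accesses
    t.join()
    return shared_cell[0], msg
```

Because the receive blocks until the message is sent, the shared-memory store is guaranteed visible by then; `run()` returns `(42, "ready")`.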
Towards a Model for Portable Parallel Performance: Exposing the Memory Hierarchy
, 1992
Abstract

Cited by 12 (2 self)
The challenge of building a program that attains high performance on a variety of parallel computers is formidable. Actually, attaining high performance on a variety of sequential computers is challenging. Indeed, it's hard enough to get high performance on a single sequential computer. Constructing a high-performance program requires detailed knowledge of the computer's architectural features, its memory hierarchy in particular. This knowledge constitutes a detailed, albeit informal, model of computation against which the performance program is written. Similar characteristics must be considered in building a portable high-performance program, but the appropriate details are elusive and often unavailable when the program is written. In order to support this type of programming, we call for a generic model. Such a model is parameterized by machine parameters. Judicious specification of these parameters results in a specific model that should capture the performance-relevant features...