115 citations found.
G. Blelloch, "Scans as primitive parallel operations", Proc. 1987 Int'l. Conf. Parallel Proc., pp. 355--362.

This paper is cited in the following contexts:

Empirical Study of Parallel LRU Simulation Algorithms - David Nicol College (1994)

....variants of these algorithms, in order to provide a better understanding of the tradeoffs inherent in choosing an architecture, and algorithm, for parallel cache simulation. Three of the algorithms upon which we report are SIMD algorithms, and rely upon massively parallel operations such as scans [2], and sorting. The first is the level-by-level algorithm described in [6]. This algorithm computes the stack distances for all references whose stack distances are C or smaller, C being the maximum number of references the cache holds. The performance results previously reported were from the ....

....[Table 1: Wallclock processing time per reference, in microseconds, for the level-by-level algorithm, as a function of cache size C.] The copy step is accomplished in parallel using a segmented copy scan [2]. Techniques like this are standard in SIMD programming; indeed, the MasPar library contains numerous variations on scan operations. The level-by-level algorithm computes stack distances a level at a time. First, all references with stack distance 1 are determined. Next, all references with stack ....

G.E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, November 1989.


Shape-based Cost Analysis of Skeletal Parallel Programs - Hayashi

....result of fold $\oplus$ up to the ith element is returned as the ith element of the resulting vector: $\mathit{scan}\ (\oplus)\ [x_1, \ldots, x_n] = [x_1,\ x_1 \oplus x_2,\ \ldots,\ x_1 \oplus \cdots \oplus x_n]$. It is known that the scan operation is useful for describing various data parallel algorithms, and leads to efficient run time codes. For example, [12] describes five algorithms that illustrate how the scan can be used in algorithm design: a radix sort, a quick sort, a minimum spanning tree algorithm, a line drawing algorithm and a merging algorithm. In some parallel computation models such as the Scan Vector Model [13] simple operations are ....

G. E. Blelloch. Scans as Primitive Parallel Operations. IEEE Transactions on Computers, 38(11):1526--1538, 1989.


Automated Parallelization of Discrete State-space Generation - Nicol, Ciardo (2002)   (12 citations)

....and directly. Each processor has a picture of the global load distribution, and when implementing a remapping, a piece of workload or data goes directly from its source processor to its target processor. Direct methods have been considered in the context of large scale parallel processing systems [7, 1, 15], but these do not take a global view. While our approach is pragmatic and driven by available technology, we will argue, at the end of Section 7, that it scales up to moderately large parallel systems and so has some future value as well. 3 Automated State Mapping. To help understand the ....

....the lexicographical ordering. Using synchronized random number generator seeds, each processor generates the same permutation vector idx[] to randomize the ordering of state components. Thus, idx[0] holds the index of the first state component examined in the lexicographical comparison, idx[1] holds the second index examined, and so on. We found this a valuable tool to protect us from the potential problems arising from correlation of states components (consider hashing on a vector of components whose values are highly correlated effectively diminishes the spreading of the hashing ....

G.E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, November 1989.


High-Performance All-Software Distributed Shared Memory - Johnson (1995)   (9 citations)

....improvement for many applications. On Alewife, the global synchronization primitives described in Section 3.3 are implemented entirely in software using message passing. Broadcasts are implemented using a binary broadcast tree rooted at each node. Barriers and reductions are implemented in a scan [6] style: For an processor system, messages are sent between nodes according to a butterfly network pattern requiring log 2 stages of messages each. On the CM-5, the baseline CRL implementation takes advantage of the CM-5's hardware support for global synchronization and communication (the ....
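
The butterfly pattern mentioned in this context can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the name `butterfly_allreduce`, the list-based stand-in for message passing, and the power-of-two restriction are all hypothetical choices made here and are not taken from the cited work.

```python
from operator import add


def butterfly_allreduce(values, op=add):
    """Simulate a butterfly-pattern all-reduce over len(values) nodes.

    Hypothetical illustration: each list slot stands for one node, and a
    list rebuild stands for one stage of pairwise message exchange.
    Returns (per-node results, number of stages).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two node count")
    vals = list(values)
    stages = 0
    dist = 1
    while dist < n:
        # In this stage node i exchanges its partial result with node i XOR dist.
        # The lower-numbered partner's value goes first, so the operand order
        # is preserved for associative but non-commutative operators.
        vals = [op(vals[min(i, i ^ dist)], vals[max(i, i ^ dist)])
                for i in range(n)]
        dist *= 2
        stages += 1
    return vals, stages
```

With eight simulated nodes the loop runs three stages, and every node ends up holding the full reduction.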

G. E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, pages 1526--1538, November 1989.


Practical Structured Parallelism Using BMF - Crooke (1998)

.... ematrix.c[3] 2] ec1 ec2 ec12) bj cj ematrix.c[3] 3] ec1 cj cj ec12 bj bj ematrix.c[4] 0] ec1 bi bm ec12 ci cm ematrix.c[4] 1] ec1 ec2 bm ci ec12 bi cm ematrix.c[4] 2] ec1 bj bm ec12 cj cm ematrix.c[4] 3] ec1 ec2 bm cj ec12 bj cm ematrix.c[4] 4] ec1 bm bm ec12 cm cm ematrix.c[5][0] ec1 ec2 bi cm ec12 bm ci ematrix.c[5] 1] ec1 ci cm ec12 bi bm ematrix.c[5] 2] ec1 ec2 bj cm ec12 bm cj ematrix.c[5] 3] ec1 cj cm ec12 bj bm ematrix.c[5] 4] ec1 ec2 ec12) bm cm ematrix.c[5] 5] ec1 cm cm ec12 bm bm for (row=0; row 5; row ) for (col=row 1; col 6; col ) ....

.... ec12 bj bj ematrix.c[4] 0] ec1 bi bm ec12 ci cm ematrix.c[4] 1] ec1 ec2 bm ci ec12 bi cm ematrix.c[4] 2] ec1 bj bm ec12 cj cm ematrix.c[4] 3] ec1 ec2 bm cj ec12 bj cm ematrix.c[4] 4] ec1 bm bm ec12 cm cm ematrix.c[5] 0] ec1 ec2 bi cm ec12 bm ci ematrix.c[5][1] ec1 ci cm ec12 bi bm ematrix.c[5] 2] ec1 ec2 bj cm ec12 bm cj ematrix.c[5] 3] ec1 cj cm ec12 bj bm ematrix.c[5] 4] ec1 ec2 ec12) bm cm ematrix.c[5] 5] ec1 cm cm ec12 bm bm for (row=0; row 5; row ) for (col=row 1; col 6; col ) ematrix.c[row] col] ematrix.c[col] row] ....

[Article contains additional citation context not shown here]

G. E. Blelloch. "Scans as Primitive Parallel Operations". Proceedings of the International Conference on Parallel Processing, pages 355--362, Aug 1987.


Scalable Hardware-Algorithms for Binary Prefix Sums - Lin Nakano Olariu (2000)

....using (k 2)w k Gamma2 O(kw k Gamma3 ) blocks. 1 Introduction Recent advances in VLSI have made it possible to implement algorithm structured chips as building blocks for high performance computing systems. Since computing binary prefix sums (BPS) is a fundamental computing problem [1, 4], it makes sense to endow general purpose computer systems with a special purpose parallel BPS device, invoked whenever its services are needed. Recently, Blelloch [1] argued convincingly that scan operations that boil down to parallel prefix computation should be considered primitive ....

....blocks for high performance computing systems. Since computing binary prefix sums (BPS) is a fundamental computing problem [1, 4], it makes sense to endow general purpose computer systems with a special purpose parallel BPS device, invoked whenever its services are needed. Recently, Blelloch [1] argued convincingly that scan operations that boil down to parallel prefix computation should be considered primitive parallel operations and, whenever possible, implemented in hardware. In fact, scans have been implemented in microcode in the Thinking Machines CM-5 [10]. Given an n-bit ....

G. Blelloch, Scans as primitive parallel operations, IEEE Transactions on Computers, C-38, (1989), 1526--1538.


NPSI Adaptive Synchronization Algorithms for Parallel Discrete.. - Srinivasan (1995)

....proposed here. Chapter 6 presents the results of this performance analysis. It is important to note that reduction networks have been proposed and constructed in practice, to support global computations such as barrier synchronization, summation, determining maxima and parallel prefix computation [Ston90, Hosh89, CrKn28, Blel89]. One such network is the control network in the CM-5 [Ponn93]. It is used to perform nonlocal data distribution operations such as broadcasting, combining (reduction and parallel prefix), bitwise operations and barrier synchronizations, very rapidly. For bitwise logical OR operations, it can ....

Blelloch, G., "Scans as primitive parallel operations", IEEE Transactions on Computers, Vol. 38, November 1989, 1526-1538.


Systolic Combining Switch Designs - Dickey (1994)

....using this operator to each processor. Scan operations take a binary associative operator $\otimes$ with identity $I$ and an ordered set $[a_0, a_1, \ldots, a_{n-1}]$ and return the ordered set $[I,\ a_0,\ (a_0 \otimes a_1),\ \ldots,\ (a_0 \otimes a_1 \otimes \cdots \otimes a_{n-2})]$ [18]. Such operations could be used to simulate many of the actions of combinable fetch-and-$\phi$ operations. In the CM-5, forward and reverse scans can be implemented for any of the reduction operations, and scans may be segmented. Combining networks on sorted traffic streams If all references to ....
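
The scan definition quoted in this context (identity first, last input element excluded) can be written sequentially as follows. This is a minimal sketch for illustration only; the name `exclusive_scan` and its argument order are assumptions made here, and nothing about it is parallel.

```python
from operator import add


def exclusive_scan(op, identity, xs):
    """Return [I, a0, a0 op a1, ..., a0 op ... op a(n-2)] for xs = [a0, ..., a(n-1)]."""
    out = []
    acc = identity
    for x in xs:
        out.append(acc)   # emit the combination of everything strictly before x
        acc = op(acc, x)  # then fold x into the running value
    return out
```

For example, `exclusive_scan(add, 0, [3, 1, 7, 0, 4])` yields a list of the same length whose first element is the identity `0`.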

Guy Blelloch. Scans as primitive parallel operations. International Conference on Parallel Processing, pages 355--362, August 1987.


Parallel Line: a Unified Solution - Ivo Povazan Tom'as

....is presented in [9] where a processing element is assigned to each pixel in raster space. The Connection Machine library [1] implements an algorithm that assigns a processing element to every line segment, then all line segments are rendered in parallel. The scan operation as is defined in [2] can be also used for parallel line interpolation. The line interpolation problem in raster space can be split into two parts. Firstly, there is a problem how to determine quickly the integer coordinates of the points Ivo Povazan, Institute of Control Theory and Robotics, Slovak Academy of ....

G. E. Blelloch, Scans as Primitive Parallel Operations. IEEE Transactions on Computers, Vol. 38, No. 11, p. 1526-1538.


Merging on the BSP Model - Gerbessiotis, Siniolakis (1999)   (1 citation)

....over the associative operator $\otimes$ in parallel prefix computations and unit cost for every comparison performed. 2 Fundamental Operations. The parallel prefix operation [12, 13] and its variant, segmented parallel prefix, is a powerful primitive operation with many parallel applications [3]. BSP algorithms for parallel prefix are described in [7]. They will become auxiliary routines for the merging algorithm described in a later section. Definition 1. Let $\langle X_1, X_2, \ldots, X_p \rangle$ be a sequence of $p$ vectors each of size $n$, such that $X_i = (x_{1,i}, \ldots, x_{n,i})$, $1 \le i \le p$. The prefix ....

G. E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, C-38(11):1526-1538, 1989.


Merging on the BSP Model - Gerbessiotis, Siniolakis (2001)   (1 citation)

....for operations over the associative operator $\otimes$ in parallel prefix computations and unit cost for every comparison performed. 2 Fundamental Operations. The parallel prefix operation [15, 16] and its variant, segmented parallel prefix, is a powerful primitive operation with many parallel applications [3]. BSP algorithms for parallel prefix are described in [10]. They will become auxiliary routines for the merging algorithm described in a later section. Definition 1. Let $\langle X_1, X_2, \ldots, X_p \rangle$ be a sequence of $p$ vectors each of size $n$, such that $X_i = (x_{1,i}, \ldots, x_{n,i})$, $1 \le i \le p$. ....

G. E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, C-38(11):1526-1538, 1989.


Data Structures for Parallel Recursion - Kornerup (1997)   (2 citations)

.... operation. Through a similar derivation as given for rev we get H: O id As was the case for rev, we can prove that P: O one 2.4 Prefix Sum. The prefix sum algorithm is one of the most fundamental parallel algorithms. It is often used as a building block for other parallel algorithms [Ble89, Ble90, Ble93]. We will see its use in the specification of the Carry lookahead adder in Chapter 3. Given a PowerList of scalars and an associative, binary operator $\oplus$ on these scalars, the prefix sum ps returns a PowerList of the same length where each element is the result of applying the ....

Guy E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, C-38(11):1526--1538, November 1989.


Pipelined Parallel Prefix Computations, and Sorting on a.. - Mayr, Plaxton (1993)   (1 citation)

.... ) Dropping the inner parentheses and simplifying, this amounts to $(a_0; b_0; x_0) \oplus' (a_1; b_1; x_1) = (a_0 \text{ or } a_1;\ \text{if } a_1 \text{ then } b_1 \text{ else } b_0 \text{ or } b_1;\ \text{if } (a_1 \text{ or not } b_0) \text{ then } x_1 \text{ else } x_0)$. Note that the above formulation allows bit pipelining in the sense described by Blelloch [6]. In other words, as each bit of the two operands is received, the next output bit can be computed. This holds not only for the Copy operator, but also for any other single pass operator, as defined in [6]. Finally, we observe that the data distribution operation defined by Ullman [16] is ....

....) Note that the above formulation allows bit pipelining in the sense described by Blelloch [6]. In other words, as each bit of the two operands is received, the next output bit can be computed. This holds not only for the Copy operator, but also for any other single pass operator, as defined in [6]. Finally, we observe that the data distribution operation defined by Ullman [16] is equivalent to a segmented Prefix operation with the Copy operator. Thus, the techniques outlined in this paper immediately lead to efficient pipelined implementations of this primitive for the complete inorder ....

G. E. Blelloch. Scans as primitive parallel operations. In Proceedings of the 1987 IEEE International Conference on Parallel Processing, pages 355--362, 1987.


Parallel Integer Sorting With Medium And Fine-Scale.. - Leonardo Dagum Computer   (2 citations)

....to the number of equal keys in a sequence. The prefix sum of a sequence is the sequence obtained as the running sum of the original sequence elements. The $j$th element of the prefix sum of sequence $K_i$ is given by $K_j = \sum_{i=0}^{j} K_i$. Prefix operations are also referred to as scan operations [5]. A scan operation with binary operator $\oplus$ across an ordered set $[a_0, a_1, \ldots, a_{n-1}]$ returns the ordered set $[a_0,\ (a_0 \oplus a_1),\ \ldots,\ (a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1})]$. All logarithms are to base 2 unless otherwise indicated. 2. Machine Models. The ....
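
The running-sum form of scan described in this context, where element j includes the jth input, maps directly onto Python's standard `itertools.accumulate`. The wrapper below is only an illustrative sketch; the name `inclusive_scan` is an assumption made here.

```python
from itertools import accumulate
from operator import add


def inclusive_scan(op, xs):
    """Return [a0, a0 op a1, ..., a0 op ... op a(n-1)] for xs = [a0, ..., a(n-1)]."""
    # accumulate() folds left to right and yields every intermediate result.
    return list(accumulate(xs, op))
```

Unlike the identity-first form quoted in other contexts on this page, the last output element here is the reduction of the whole input.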

....the computed ranks to the appropriate processors in VP1 as given by the addresses stored in each queue. Processors in VP1 which receive ranks are marked and do not participate in the next iteration. Iterations are repeated until all the keys have been ranked. 3.2. Theoretical Analysis. Blelloch [5] describes a scan model of computation for the Connection Machine (that is, the Exclusive Read Exclusive Write (EREW) model but including prefix or scan operations as unit time primitives). This model is assumed in the following analysis. The performance of this algorithm depends on the key ....

Blelloch, G., Scans as Primitive Parallel Operations, Proceedings of 1987 Int. Conf. on Parallel Processing, University Park, PA, 1987.


Pipelined Parallel Prefix Computations, and Sorting on a.. - Mayr, Plaxton (1993)   (1 citation)

....data. All of our algorithms are SIMD. We will assume that the $x_i$'s, as well as all partial sums of the $x_i$'s, are word sized quantities. Binary tree. The first implementation of Prefix that we consider is the standard two-pass algorithm for the inorder complete binary tree (see, for example, [3, 6]). Assume that we are given a tree of size $p = 2^d - 1$, with processors numbered inorder (i.e. numbered according to an inorder traversal of the tree) from $0$ to $2^d - 2$. The first pass of the algorithm is upward, from the leaves to the root, and the second pass is downward. For every ....
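
The upward-then-downward structure described in this context can be sketched on a flat array. Note the swap: this is the array-based up-sweep/down-sweep formulation of an exclusive scan with an implicit balanced tree, not the inorder-numbered processor tree of the cited paper. The function name, the power-of-two restriction, and the sequential inner loops (which stand in for one parallel step each) are assumptions made for illustration.

```python
from operator import add


def tree_exclusive_scan(xs, op=add, identity=0):
    """Exclusive scan via an upward pass followed by a downward pass."""
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    a = list(xs)

    # Upward pass: each internal node stores the combination of its subtree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] = op(a[i - d], a[i])
        d *= 2

    # Downward pass: the root receives the identity; each node hands its
    # incoming prefix to the left child and (prefix op left-subtree total)
    # to the right child.
    a[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            left_total = a[i - d]
            a[i - d] = a[i]
            a[i] = op(a[i], left_total)
        d //= 2
    return a
```

Each `while` loop runs once per tree level, so a machine that executes every inner loop as a single parallel step would need a number of steps logarithmic in the input length.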

.... 6 Dropping the inner parentheses and simplifying, this amounts to $(a_0; b_0; x_0) \oplus' (a_1; b_1; x_1) = (a_0 \text{ or } a_1;\ \text{if } a_1 \text{ then } b_1 \text{ else } b_0 \text{ or } b_1;\ \text{if } (a_1 \text{ or not } b_0) \text{ then } x_1 \text{ else } x_0)$. Note that the above formulation allows bit pipelining in the sense described by Blelloch [3]. Finally, we observe that the data distribution operation defined by Ullman [9] is equivalent to a segmented Prefix operation with the Copy operator. Thus, the techniques outlined in this paper immediately lead to efficient pipelined implementations of this primitive for the complete inorder ....

G. E. Blelloch. Scans as primitive parallel operations. In Proceedings of the 1987 IEEE International Conference on Parallel Processing, pages 355--362, 1987.


Synthesizing Divide-and-Conquer Algorithms via Induction - Chin, Tan, Teo

.... g in (a u,b v) The parallel characteristics of tup are not apparent since the z parameter of tup(xs,z) actually depends on an output from the other recursive call tup(xr,no). However, function tup has a similar structure to the highly versatile scan function, as popularised by Blelloch [Ble89]. Like scan, it can be implemented efficiently in a multiprocessor system which supports bidirectional tree-like communications using parallel computation time proportional to O(log n), where n is the length of the list. An upsweep phase in the computation can be used to compute the second ....

Guy E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, November 1989.


Enhanced Parallelization via Constraints - Wei-Ngan Chin National (1997)

....[Figure 1: Steps for Example-Based Parallelization] We highlight ascan because its versatility for parallel computation is well known [Ble89, BCH+93]. Our method could automatically derive its parallel implementation. Its main steps are illustrated in Figure 1. The key idea is to obtain two normalised sequential equations (see Fig. 1(b)) for ascan, whose contexts for the recursive call and accumulative argument(s) are identical. ....

Guy E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, November 1989.


Catamorphism-Based Transformation of Functional Programs - Hu, Iwasaki, Takeichi (1994)

....computation on each element of the object keeping some intermediate results. Accumulations have gained a wide interest in the design of both sequential programs [1, 2] and parallel programs [4]. Especially in parallel programming, accumulations are considered as one of the basic parallel operators [3], and special hardware for scan accumulations has been installed in CM5 [5] recently. The purpose of this paper is to deal with the transformation on accumulations so that more efficient programs can be derived. We have two problems here: one is how to formulate accumulations and the other is how ....

....[10] suggested that we use functions as accumulation parameters in our higher order catamorphisms. One item of future work is to apply our method to the derivation of correct parallel programs. It has been shown that accumulations are becoming more and more important in parallel programs. Blelloch [3] and many others argued that accumulations could be regarded as basic parallel operators and that many useful parallel programs can be constructed from them. We hope that our study will be useful for the development of efficient parallel programs based on accumulations. Another future work that ....

G. Blelloch, Scans as primitive parallel operations, Proceedings of the International Conference on Parallel Processing, pp. 355--362.


Abstraction and Performance in the Design of Parallel Programs - Gorlatch (1997)   (1 citation)

....pieces of the list. Since the computations of h x and h y are independent of each other, they can be carried out in parallel. Note that $\odot$ in (4) is necessarily associative on the range of h, because $+\!\!+$ is associative. As a running example, we take the scan function, also called parallel prefix [8]. This simple function has been extremely useful in many applications; at the same time it is well parallelizable. These, at first glance, surprising properties can be explained by the fact that scan expresses a general pattern of linear recursion and, thus, inherits its expressive power and the ....

....of values is a result of the systematic adjustment to the DH skeleton, rather than an ad hoc trick. At the same time, a comparatively laborious process of adjustment indicates that the DH skeleton is too general for scan. This suggests that scan should be viewed as a primitive skeleton itself [8], which is the case, e.g. in the MPI standard [32]. 6.5 Architecture Independent Implementation. In this subsection, based on joint work with Holger Bischof [H7], we derive an architecture independent, generic implementation of DH, which makes a considerable use of global communication ....

G. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, November 1989.


Optimizing Compositions of Scans and Reductions in Parallel.. - Gorlatch (1997)   (3 citations)

....parallel prefix, etc. and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher order combinators are adequate and useful for a broad class of applications [4], second, they encourage well structured, coarse grained parallel programming and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel architectures with predictable performance. Our contributions are as follows: We formally prove ....

G. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, November 1989.


Using Dynamic Programming to Benchmark Communications on.. - Gregory Wilson And (1992)   (1 citation)

....execution of the algorithm. Gathering, which is best implemented by incrementally combining messages, is not provided as a primitive by any commercially available message passing system for MIMD computers, although it is available in a number of experimental systems [4] and on some SIMD machines [1]. Incremental combining is an efficient way to implement reduction operators, which includes global summation, determination of global minima, and the like, and could be used effectively in this program to combine signals from worker processes to the barrier synchronisation manager in order to ....

Guy E. Blelloch. Scans as Primitive Parallel Operations. IEEE Trans. Computers, 38(11), November 1989.


The use of the CAPE Environment in the Simulation of.. - Norman, Henderson.. (1991)   (3 citations)

....to other interprocess communications is non-deterministic and does not necessarily interrupt all processes in the same state. This breaks the SPMD abstraction. Finally, to complete the distributed data abstraction, Cape provides a single example of the class of scan operations (see Blelloch [16]) in the form of a global sum procedure which adds up an array of values at each process and returns, at every process, the sum of all such sums. In contrast to the above, Cape provides some functionality which allows the programmer to think in global data space rather than in terms of the local ....

G.E. Blelloch. Scans as primitive parallel operations. IEEE Trans. Comput., C-38(11):1526--1538, 1989.


Active Disks - Remote Execution for Network-Attached Storage - Riedel (1999)   (18 citations)

....in the NESL parallel programming language and allow the programmer to express explicit parallelism in their computation. These primitives were specified in the context of a functional language, thereby simplifying a number of the problems using legacy code written in procedural languages such as C [Blelloch89]. 9.4.2 Data and Task Parallelism. A considerable body of work has explored the parallelization of applications across the nodes of both massively parallel machines [JaJa92] and networks of workstations [Subhlok93]. This has been done both by parallelizing compilers [HPF93] and by application ....

Blelloch, G.E. "Scans as Primitive Parallel Operations" IEEE Transactions on Computers 38 (11), November 1989.


A Cost Analysis for a Higher-order Parallel Programming Model - Rangaswami (1996)   (19 citations)

....[Equation (6.53)] 6.2.3.4 scan on the s-ary Tree. The parallel implementation on an s-ary tree is a modification of the version described in [Ble89] for binary trees. Assumption 4 requires modification for this algorithm. The elements of the input list are initially at the leaves of the tree. Any communication costs incurred in achieving this distribution will also be accounted for when the cost for the entire problem is computed. xs = xs ....

Guy Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, 38(11):1526--1538, November 1989.


Using Emulations to Enhance the Performance of Parallel.. - Bojana Obreni Martin (1999)   (1 citation)

....and maximization, among others. The repertoire of efficiently realizable operations grows significantly when one uses a leaves-to-root-to-leaves up-down sweep of the tree to compute the parallel prefix (or, scan) of a binary associative operation ; a sampler of such operations appears in [4]. Notably, for our purposes, carry-lookahead addition can be computed efficiently using scan [13]. Implementing Tree-Sweep Algorithms. Importantly, one can implement (up and/or down) sweeps on k-leaf binary trees efficiently via a sequence of lg2k 1 simple routing operations, provided that each ....

G.E. Blelloch, Scans as Primitive Parallel Operations, IEEE Trans. Computers, vol. 38, pp. 1,526-1,538, 1989.


ParaDict, a Data Parallel Library for Dictionaries - Gabarro, Silvestre

....transformation into C # give us a code where only the more external loop remains sequential. The other loops corresponding to enumerations, copy and send operations have been vectorized by the Convex C compiler. Segmented scans. Segmented scans have been extensively studied by G. Blelloch in [1, 2]. Assume we have an array v and another array f of flags. Each flag specifies the start of a new segment. For instance, if we consider the following array v = [1 1 1 7 1 4 1 5 6 8 4] and f = [1 0 0 1 0 1 0 0 0 1 0], the segmented array is represented as s = [1 1 1 | 7 1 | 4 1 5 6 | 8 4] and the ....

....have been vectorized by the Convex C compiler. Segmented scans. Segmented scans have been extensively studied by G. Blelloch in [1, 2]. Assume we have an array v and another array f of flags. Each flag specifies the start of a new segment. For instance, if we consider the following array v = [1 1 1 7 1 4 1 5 6 8 4] and f = [1 0 0 1 0 1 0 0 0 1 0], the segmented array is represented as s = [1 1 1 | 7 1 | 4 1 5 6 | 8 4] and the segmented prefix sum of s will be [1 2 3 | 7 8 | 4 5 10 16 | 8 12]. The segmented version of an $\oplus$ operation can be implemented as $(v_a; f_a) \otimes (v_b; f_b) = (v_r; f_r)$ ....
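
The quoted context is cut off before it defines the combined operator on (value, flag) pairs, so the sketch below assumes the usual construction: the right operand restarts the running value whenever its flag is set, and flags are OR-ed together. The function name `segmented_scan`, the helper `combine`, and that operator definition are assumptions made for illustration, not the cited paper's code.

```python
from itertools import accumulate
from operator import add


def segmented_scan(values, flags, op=add):
    """Inclusive scan of `values` that restarts wherever `flags` is truthy."""

    def combine(a, b):
        # Assumed pair operator: (va, fa) (x) (vb, fb) = (vr, fr)
        (va, fa), (vb, fb) = a, b
        vr = vb if fb else op(va, vb)  # a set flag on b starts a new segment
        fr = fa or fb
        return (vr, fr)

    return [v for v, _ in accumulate(zip(values, flags), combine)]
```

Applied to the arrays v and f quoted above, this reproduces the segmented prefix sum given in the context. Because `combine` is associative, the same operator could in principle be handed to any parallel scan routine.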

[Article contains additional citation context not shown here]

G. Blelloch. Scans as primitive parallel operations. IEEE Trans. Comp., 38(11):1526--1538, 1989.


Mob - A Parallel Heuristic for Graph-Embedding - Savage, Wloka (1993)   (2 citations)

....as a 12 dimensional hypercube with sixteen nodes at each corner of the hypercube. We use edge and vertex data structures to support the exchange of vertices and the computation of gains and costs with minimal communication overhead. Our edge data structure is based on that described by Blelloch [3] and implemented for the graph partitioning Mob heuristic [21]. The Connection Machine supports the concept of virtual processor. Our implementation assigns one virtual processor per record in each vertex and edge data structure. Edge Data Structure. The edge data structure is constructed of ....

....associative operation. Segmented scan operations are scan operations performed over contiguous segments of the full vector. If a segment begins at the $i$th position, $y_i = x_i$; otherwise, $y_i = y_{i-1} \oplus x_i$. Scans and segmented scans can be implemented efficiently on most parallel machines [3]. 3 Experimental Results for Random Graphs. The performance of the Mob hypercube and grid embedding algorithms were evaluated by conducting experiments on the CM-2. 1-to-1 and 16-to-1 mappings of source to target graphs were studied. The random source graphs used in these experiments are more ....

G. E. Blelloch, "Scans as Primitive Parallel Operations," IEEE Transactions on Computers 38 (1989), 1526--1538.


Efficient Functional Programming Communication Functions on the.. - Loke (1994)

.... $\oplus$ which takes an associative operator $\oplus$ and a list and returns a list of values computed in the following way: $[a_1, a_2, \ldots, a_n] \mapsto [a_1,\ a_1 \oplus a_2,\ \ldots,\ a_1 \oplus a_2 \oplus \cdots \oplus a_n]$. Its type is: $(\alpha \times \alpha \to \alpha) \times [\alpha] \to [\alpha]$. Prefix is sometimes called scan [27]. Here the definition is slightly different from Blelloch's scan, which computes the following list of values: $[a_1, a_2, \ldots, a_n] \mapsto [\mathit{id}_\oplus,\ a_1,\ a_1 \oplus a_2,\ \ldots,\ a_1 \oplus a_2 \oplus \cdots \oplus a_{n-1}]$. prefix can be defined by: Delta [ Delta] where u v = u ( last ....

....A recent use of the formalism to derive the fast Fourier (Cooley Tukey) algorithm is found in [33]. However, the derivation makes use of other operations on lists that are not in the set defined earlier. Also, much program development using Blelloch's scan (comparable to prefix) has also been done [27]. An example that uses scan for lexical scanning which effectively emulates the computation of a finite state automaton is given in [42]. In this thesis, several examples have already been seen in Chapters 2 and 4. In the following subsections, several programming examples are presented and the ....

G. Blelloch, "Scans as Primitive Parallel Operations," in Proceedings of the International Conference on Parallel Processing, pp. 355--362, August 1987.


Repartition dynamique de donnees regulieres pour des machines .. - Juganaru, Sakho

....initial distribution of the load to be balanced. They consist first of evaluating the quantities of data that the processors must exchange, then of actually exchanging the corresponding loads. To do so, they use a prefix-computation technique developed by Blelloch [BLE89]. Used so far for topologies such as the linear array [MIG92, GER93] or the hypercube [JAJ92], the strategies of this class exploit the strong regularity of the processor interconnection network. The strategy we propose in this paper is also one with information ....

....the total load of the network, charge($T_v$), such that each processor has approximately the same amount of load. More precisely, every processor should have $\lfloor \mathrm{charge}(T_v) / |V| \rfloor$ or $\lceil \mathrm{charge}(T_v) / |V| \rceil$ units of load at the end of the distribution. To this end, a generalized prefix computation [BLE89] is initiated at each node. After such a computation, each node $v$ knows charge($T_v$) and taille($T_v$), and charge($T_u$) and taille($T_u$) for every child node $u$. In particular, the root $v$ is able to evaluate the average load charge of the whole network and to broadcast it ....

Guy E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, 38(11):1526--1538, November 1989.


Detection of Recurrences in Sequential Programs with Loops - Redon, Feautrier (1993)   (9 citations)

....n) 2; 1; m)g; i1 i 2x:x a i 1 ;i 2 ; i1 :0) i; j) if j 1 i 1 6.3 Comparison to Other Recurrence Operators. Some other recurrence operators already exist, namely the reduction operator in the Alpha language (see [6]) and the scan primitives also known as parallel prefix operations (see [3]). However, we have introduced our own recurrence operator for the following motives. The Alpha operator is an operator on an unordered set of values, which is thus restricted to reduction by associative and commutative operators. It only gives the final result of the reduction, while we need the ....

G.E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1539, 1989.


Assessing the Usability of Parallel Programming Systems: The.. - Wilson (1993)   (16 citations)  (Correct)

....A more sophisticated strategy, in which a process only relinquished the boundary cell with the lower population, was more successful. However, this required a complex multi-iteration protocol. A very different data-parallel approach to the problem uses the parallel prefix (or scan) operator [Ble89]. This technique maintains several arrays, corresponding entries of which represent a single predator or prey. Populations are sorted by location index. In each iteration, one scan calculates and distributes the populations of each cell. Local calculations then determine which predators or prey ....

Guy E. Blelloch. Scans as Primitive Parallel Operations. IEEE Transactions on Computers, 38(11), November 1989.


Models and Languages for Parallel Computation - Skillicorn, Talia (1996)   (51 citations)  (Correct)

....inspired by the architecture of the Connection Machine 2. These often included a map operation, some form of reduction, perhaps using only a fixed set of operators, and later scans (parallel prefixes) and permutation operations. In approximately chronological order, these models are: scan [32], multiprefix [170], paralations [100, 171], the C* data parallel language [111, 165], the scan vector model and NESL [33-38], and CamlFlight [109]. As for other data parallel languages, these models are simple and fairly abstract. For instance, C* is an extension of the C language that ....

G. Blelloch. Scans as primitive parallel operations. In Proceedings of the International Conference on Parallel Processing, pages 355--362, August 1987.


Hardware Support For Parallel Discrete Event Simulations - Reynolds, Jr., Pancerella (1992)   (Correct)

....can be viewed as a synchronous network that continually produces results of m global reduction operations, where each physical processor has m input registers to the PRN. The tree structure of the PRN facilitates the pipelining since a tree circuit is easier to synchronize than other structures [Blel89]. The PRN, however, operates asynchronously to the PPs, and it does not block if an input register has not been updated during the previous major cycle. Each node of the PRN operates asynchronously. As input values and their corresponding operation code are available, the ALU computes a ....

....Researchers at IBM have constructed a configuration of barrier synchronization modules [HeRS89] as a low cost device for barrier synchronization. The hardware that we propose, on the other hand, provides support for a larger class of algorithms than barrier synchronization algorithms. Blelloch [Blel89] proposed a tree structured hardware implementation of parallel prefix operations. One of our future goals is to enhance our hardware design to calculate and disseminate target specific synchronization information in a PDES. Parallel prefix computations may be useful in realizing that goal. 11. ....

Blelloch, G. E., "Scans as Primitive Parallel Operations", IEEE Transactions on Computers, Vol. 38, No. 11, pp. 1526-1538, (November 1989).


The Subspace Model: Shape-based Compilation for Parallel Systems - Knobe (1997)   (Correct)

....S References that are not cyclic are parallel: For each object, obj: For each index, index, in the subspace of obj: If obj is not expanded cyclically over index, annotate it as parallel. The distinction between serial and parallel prefix expansions depends on the detection of recurrences. See [11, 9, 22] for more on parallel prefix computations. We now address the location of SCCs. The analysis of the SCCs with respect to a given loop, say i, begins with the graph whose nodes are the program nodes within the i loop. The edges among these nodes include • the expression tree edges • the ....

G. Blelloch. Scans as primitive parallel operations. In IEEE Transactions on Computers, November 1989.


Constructing List Homomorphisms for Parallelism - Gorlatch   (Correct)

.... (BMF) [1, 2]. Examples of homomorphisms are simple functions, such as length, which computes the length of the list, or sum, which, for a list of numbers, yields the sum of all its elements, and also functions that are more complicated and important in practice, such as scan (prefix sums) [3]. An important property of homomorphisms, which is one of the reasons for the interest in them, is Bird's First Homomorphism Theorem: Theorem. Function h is a homomorphism iff it can be factored into the following composition of two functions: $h = \mathrm{red}(\oplus) \circ \mathrm{map}(f)$ (2). Both map and red are ....

G. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, November 1989.
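
The factorization stated in the theorem excerpt above, a reduction composed with a map, can be made concrete with a minimal sketch. The helper name `hom` and the explicit `identity` argument are assumptions introduced for illustration; they do not come from the cited paper.

```python
from functools import reduce
from operator import add


def hom(op, f, identity):
    """Build h = red(op) . map(f), where `identity` is the unit of `op`."""
    def h(xs):
        return reduce(op, map(f, xs), identity)
    return h


# The two simple homomorphisms named in the excerpt.
length = hom(add, lambda _: 1, 0)   # map every element to 1, then sum
total = hom(add, lambda x: x, 0)    # identity map, then sum

xs, ys = [3, 1, 4], [1, 5]
# Homomorphism property: h(xs ++ ys) == h(xs) op h(ys).
assert length(xs + ys) == length(xs) + length(ys)
assert total(xs + ys) == total(xs) + total(ys)
```

Because the combining operator is associative, the two halves of a list can be processed independently and then combined, which is what makes this form attractive for parallel execution.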


Data-Parallel Primitives for Spatial Operations Using PM Quadtrees - Hoel, Samet   (Correct)

....allowed by the bucket PMR quadtree. For example, consider Figure 1c where the regions corresponding to the endpoints of line i subdivide until the maximal depth of the quadtree (three in this case) is reached. 3 Scan Model of Parallel Computation The scan model of parallel computation [2, 3] is defined in terms of a collection of primitive operations that can operate on arbitrarily long vectors (single dimensional arrays) of data. Three types of primitives (elementwise, permutation, and scan) are used to produce result vectors of equal length. A scan operation [14] takes an ....

....curved arrows in the figure). 4.2 Unshuffling. Unshuffling is the process of physically separating two arbitrary, mutually exclusive and collectively exhaustive subsets of an original group. This operation, when applied without monotonic mappings, has also been termed packing [8] or splitting [2]. Unshuffling can be accomplished using two inclusive scans (one upward and one downward), two elementwise operations (an addition and a subtraction), and a permutation operator. An example unshuffling operation is shown in Figure 7. [Figure 7: example unshuffling operation] ....

G. E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, 38(11):1526--1538, Nov. 1989.
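
The unshuffling recipe in the excerpt above (an upward scan, a downward scan, elementwise arithmetic, and a permutation) can be sketched as follows. This is one plausible arrangement under stated assumptions, not necessarily the exact formulation in the cited paper: the function name `unshuffle`, the convention that flag 0 goes to the front, and the use of `itertools.accumulate` as a sequential stand-in for a parallel scan are all choices made for illustration.

```python
from itertools import accumulate


def unshuffle(values, flags):
    """Stable split: elements flagged 0 move to the front, flagged 1 to the back."""
    n = len(values)
    keep = [1 - f for f in flags]                    # elementwise: 1 where flag is 0
    up = list(accumulate(keep))                      # upward inclusive plus-scan
    down = list(accumulate(reversed(flags)))[::-1]   # downward inclusive plus-scan

    # Elementwise destination index: front group counts from the left,
    # back group counts from the right end of the vector.
    dest = [up[i] - 1 if flags[i] == 0 else n - down[i] for i in range(n)]

    out = [None] * n
    for i, d in enumerate(dest):                     # permutation step
        out[d] = values[i]
    return out


result = unshuffle(list("abcdefg"), [1, 0, 0, 1, 0, 1, 0])
```

Each element's destination depends only on its own flag and two scan results, so every step other than the scans is purely elementwise.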


The Reconfigurable Ring of Processors: Fine-Grain.. - Arnold Rosenberg (1997)   (2 citations)  (Correct)

....L log N) log L N = log N log log L log 2 N= log L: The second step here uses bound (3) log L N times to bound the T (N=N 0 ; L) term. 2.2 Specific LTS Algorithms. We describe here simple LTS algorithms for three fundamental operations: broadcasting, accumulation, and parallel prefix [2], [4]. One-to-all broadcasting. In the operation of broadcasting, one PE (with no loss of generality, P_0) sends a single-packet message M to all other PEs. Let the PEs of the architecture be mapped to the nodes of a complete binary tree in any way that places PE P_0 at the root of the tree. ....

G.E. Blelloch (1989): Scans as primitive parallel operations. IEEE Trans. Comp. 38, 1526-1538.


Some Topics in Parallel Computation and Branching Programs - Sinha (1995)   (Correct)

....against the basic philosophy that the most important function of a model is to provide a high level view of the machine. The other approach is to augment the PRAM model with a set of primitives that have the same hardware complexity as reads and writes. A variety of theoretical and empirical work [Ble89, Ble90, CBZ90, KRS86, RBJ88, PS88, KRS88] has suggested that parallel prefix computations for certain associative operations can be done in time comparable to implementing reads or writes. Providing these extra primitives makes many algorithms simpler and/or efficient. We call these models ....

....As we argued in Chapter 1, the solution is to augment the model with a set of primitives that can be implemented in time comparable to implementing reads and writes, in the hope that a richer instruction set will help algorithm designers. Some practical and theoretical works for parallel machines [Ble89, Ble90, CBZ90, KRS86, RBJ88, PS88, KRS88] have suggested that multiprefix operations for certain multiary operators be allowed at unit cost. We will call all such models multiprefix PRAMs. Later we will give a precise definition of multiprefix operations and multiprefix PRAMs. Informally, the ....

[Article contains additional citation context not shown here]

Guy E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, 38(11):1526--1538, 1989.
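
The excerpts above mention multiprefix operations without defining them. As a rough illustration only, the sketch below uses the commonly stated definition, in which each element receives the combination of earlier values sharing its key; this definition is an assumption here and may differ from the precise one given in the cited thesis. The function name `multiprefix` is hypothetical, and the loop is a sequential reference rather than a unit-cost parallel primitive.

```python
from collections import defaultdict


def multiprefix(keys, values):
    """Per-key exclusive prefix sums plus per-key totals (sequential reference)."""
    running = defaultdict(int)
    prefixes = []
    for k, v in zip(keys, values):
        prefixes.append(running[k])   # sum of earlier values with the same key
        running[k] += v
    return prefixes, dict(running)


prefixes, totals = multiprefix(["a", "b", "a", "a", "b"], [1, 2, 3, 4, 5])
```

With a single key this reduces to an ordinary exclusive plus-scan over the whole input.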


Implementing the Recursive Definition of Abelian Group-Based.. - Giavitto, Michel   (Correct)

.... is one of the most popular: vectors (e.g. in LISP), nested vectors (in NESL [6], 8½ [15, 30]) and multidimensional arrays (HPF [34], MOA [20], Indexical Lucid [1]). Typical operations on collections structured as arrays are maps (point-wise application of functions), reductions, scans [5] and various permutations or rearranging operations. Managing an array as a whole offers an abstract and semantically clean programming model, which can be embedded more easily into a declarative language. They are more problem oriented than traditional arrays because they relieve the programmer ....

....induces a topological structure on the defined collection. Let us introduce this notion through an example. In the 8½ language [15, 30], a vector iota of size 5, with value i for the i-th element, can be defined recursively, without referring to any element, by: iota = 0 # (1 + iota) : [5] (1), where # is the concatenation of vectors and x : [5] takes the 5 first elements of x. A type inference system [16, 18] coerces the scalar 0 to a single-element vector and the scalar 1 to a 5-element vector. The operation + is overloaded and the type system infers the signature [5] × [5] ....

[Article contains additional citation context not shown here]

Blelloch, G. Scans as primitive parallel operations. IEEE Transactions on Computers 38, 11 (Nov. 1989), 1526--1538.


Practical Parallel Divide-and-Conquer Algorithms - Hardwick (1997)   (1 citation)  Self-citation (Blelloch)   (Correct)

No context found.

Guy E. Blelloch. Scans as primitive parallel operations. In Proceedings of the 16th International Conference on Parallel Processing, pages 355--362, August 1987.


Pipelining with Futures - Blelloch, Reid-Miller (1997)   Self-citation (Blelloch)   (Correct)

.... implies time bounds of $O(gw/p + d(T_s(p) + L))$ on the BSP [30], where g is the BSP gap parameter and is inversely related to bandwidth and L is the BSP periodicity parameter and is related to latency, $O(w/p + d \lg p)$ on an asynchronous EREW PRAM [20], and $O(w/p + d)$ on the EREW scan model [6]. The conversion to linear code is a simple manipulation that can be done by a compiler. Although this conversion can potentially increase the work and/or depth of a computation, it does not for any of the algorithms described in this paper. In fact, linear code seems to be a natural way to define ....

....be executed. The trees L_1 and R_1 appear twice in both then and else parts, but one case is simply defining them (lines 7 and 11) while the other actually references them (lines 8 and 12). We now consider the main result of this section. Here we state the bounds in terms of the EREW scan model [6], which is the EREW extended with a unit-time plus-scan (all-prefix-sums) operation. [Footnote 4: Note that to copy the structure, the copy must be strict on the full structure: all futures must be written before they can be copied.] The bounds we prove on the scan ....

[Article contains additional citation context not shown here]

G. E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, C-38(11):1526--1538, Nov. 1989.
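
The plus-scan (all-prefix-sums) operation that the excerpt above treats as a unit-time primitive can be illustrated with a sequential emulation of the familiar up-sweep / down-sweep tree schedule. This is a sketch under stated assumptions: the function name `plus_scan`, the zero-padding to a power of two, and the choice of the exclusive form (element i receives the sum of elements before it) are illustrative and not taken from the citing paper. Each inner `for` loop corresponds to one parallel step.

```python
def plus_scan(xs):
    """Exclusive all-prefix-sums using an up-sweep and a down-sweep."""
    n = len(xs)
    if n == 0:
        return []
    size = 1
    while size < n:                      # pad to a power of two (assumption)
        size *= 2
    a = list(xs) + [0] * (size - n)

    # Up-sweep: build partial sums toward the root of an implicit binary tree.
    step = 1
    while step < size:
        for i in range(2 * step - 1, size, 2 * step):
            a[i] += a[i - step]
        step *= 2

    # Down-sweep: push prefixes back down from the root.
    a[size - 1] = 0
    step = size // 2
    while step >= 1:
        for i in range(2 * step - 1, size, 2 * step):
            left = a[i - step]
            a[i - step] = a[i]
            a[i] += left
        step //= 2
    return a[:n]


scanned = plus_scan([3, 1, 7, 0, 4, 1, 6, 3])
```

The two sweeps together take a number of parallel steps logarithmic in the padded length, while the total number of additions stays linear.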


Portable Parallel Algorithms for Geometric Problems - Miller, Stout (1988)   (Correct)

No context found.

G. Blelloch, "Scans as primitive parallel operations", Proc. 1987 Int'l. Conf. Parallel Proc., pp. 355--362.


Reinstatement - Of Parent Regular   (Correct)

No context found.

G. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers, 38:1526-1538, 1989.


Descriptive Simplicity in Parallel Computing - Marr (1997)   (Correct)

No context found.

G.E. Blelloch. Scans as primitive parallel operations. In Proc. of the International Conference on Parallel Processing, pages 355--362, August 1987.


Random Sampling Techniques in Parallel Computation - Raman (1998)   (Correct)

No context found.

G. E. Blelloch. Scans as primitive parallel operations. IEEE Transactions on Computers C-38 (1989), pp. 1526--1538.


The Queue-Read Queue-Write PRAM Model: Accounting for.. - Gibbons, al. (1996)   (6 citations)  (Correct)

No context found.

G. E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, C-38(11):1526--1538, 1989.


Reduction Operations in Parallel Discrete Event Simulations - Pancerella (1994)   (Correct)

No context found.

Blelloch, G. E., "Scans as Primitive Parallel Operations", IEEE Transactions on Computers, Vol. 38, No. 11, pp.1526-1538, (November 1989).


The Queue-Read Queue-Write PRAM Model: Accounting for Contention. .. - Gibbons (1996)   (6 citations)  (Correct)

No context found.

G. E. Blelloch. Scans as primitive parallel operations. IEEE Trans. on Computers, C-38(11):1526--1538, 1989.


Parallel Algorithms for Line Generation - Rok Sosic   (Correct)

No context found.

G.E. Blelloch, Scans as Primitive Parallel Operations, IEEE Transactions on Computers, Vol. 38, No. 11, November 1989, pp. 1526-1538.


Concurrent Processing of Linearly Ordered Data Structures on.. - Ghosh, Das, John (1993)   (Correct)

No context found.

G. Blelloch. Scans as primitive parallel operations. IEEE Trans. Comput., 38(11):1526--1537, Nov. 1989.


CiteSeer.IST - Copyright Penn State and NEC