54 citations found.
G. E. Blelloch, S. Chatterjee, J. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: a C vector library manual, version 2. Technical Report CMU-CS-93-114, Carnegie Mellon University, 1993.


This paper is cited in the following contexts:


One Flip per Clock Cycle - Henz, Tan, Yap   (Correct)

....the O(1) implementation of flipping given size assumptions on clause length and variable occurrences. 2 Notation In order to analyse the parallel complexity of GSAT algorithms, we adapt the notation used in [BM99] which in turn adopts central constructs of the parallel functional language NESL [BHSZ95]. We adapt the work-depth model of [BM99] so that we can asymptotically determine the two factors that determine the cost of running a program on an FPGA. The number of gates needed for running the program P is denoted by g(P), which reflects the total size of the FPGA. The depth of a program P ....

Guy Blelloch, Jonathan Hardwick, Jay Sipelstein, and Marco Zagha. NESL user's manual, version 3.1. Technical Report CMU-CS-95-169, Carnegie Mellon University, Pittsburgh, PA, 1995.


Quantitative Performance Modeling of Scientific Computations and.. - Toledo (1995)   (2 citations)  (Correct)

....computer systems requires a runtime system that is implemented on several computers. Since PERFSIM models the performance of the CM Fortran runtime system, which is implemented only on the Connection Machines CM-2 and CM-5, I have chosen to implement a new system that models the performance of CVL [18], the runtime system supporting the NESL programming language [17]. Runtime subroutines in CVL operate on one-dimensional vectors. Implementations of CVL exist for workstations, Cray vector computers, Connection Machines CM-2 and CM-5, Maspar computers, and for other parallel and distributed ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Template Based Structured Collections - Nolte, Sato, Ishikawa (2000)   (6 citations)  (Correct)

....operate only on specific subsets of a collection. Furthermore a combination of reduction patterns is found in so-called scan() operations that compute for each member of the collection partial reductions amongst all members with either higher (suffix operations) or lower ranks (prefix operations [3]). It should be noted that all patterns can in principle be supported by most remote method invocation systems. (a) can easily be introduced by means of repeater objects that act as representatives of object groups and spread a request message to all group members. (b), (c) and (d) can be ....

....MPI implementation. 6.3 Scan Operations In a scan operation all members of a collection compute a partial reduction amongst all those members either with lower equal ranks (prefix) or higher equal ranks (suffix) [11]. Scan operations are not only very important for many numerical algorithms [3], but they are suitable to study pipelining effects and the impact of topologies on collective operations. We measured therefore several prefix calculations over various balanced tree topologies. First we implemented a partial reduction by means of a conditional method (section 4.4) such that we ....

G. E. Blelloch. Prefix Sums and Their Applications. Technical Report CMU-CS-90-190, Carnegie Mellon University, Pittsburgh, PA 15213, 1990.


TACO - Template Based Collections for Distributed.. - Nolte, Sato, Ishikawa   (Correct)

....operate only on specific subsets of a collection. Furthermore a combination of reduction patterns is found in so-called scan() operations that compute for each member of the collection partial reductions amongst all members with either higher (suffix operations) or lower ranks (prefix operations [3]). We implemented only those operations that are fundamental to all collective operation patterns. An asynchronous global map() operation is provided to initiate data parallel computations. The asynchronous map() operation is complemented with a synchronous reduce() operation that executes a ....

....MPI implementation. 5.2 Scan Operations In a scan operation all members of a collection compute a partial reduction amongst all those members either with lower equal ranks (prefix) or higher equal ranks (suffix) [10]. Scan operations are not only very important for many numerical algorithms [3], but they are suitable to study pipelining effects and the impact of topologies on collective operations. We measured therefore several prefix calculations over various balanced tree topologies. First we implemented a partial reduction by means of a conditional method such that we could ....

G. E. Blelloch. Prefix Sums and Their Applications. Technical Report CMU-CS-90-190, Carnegie Mellon University, Pittsburgh, PA 15213, 1990.


Functional Array Fusion - Chakravarty, Keller (2001)   (4 citations)  (Correct)

....which given a data vector and corresponding segment descriptor sums up the subarrays determined by the segment descriptor individually, resulting in an array of sums. This operation is also known as a segmented sum and known to be useful for the high performance implementation of array algorithms [4, 6]. Finally, backpermuteP is a permutation operation where the permutation vector gives the source rather than the destination index of each value. There is a list of these array combinators that appear in flattened array code. The most common ones are defined in Appendix A. The efficient implementation ....

G. E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, Nov. 1990.
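
The segmented sum described in the excerpt above can be illustrated with a short sketch. It assumes the segment descriptor is simply a list of segment lengths; that encoding and the name `segmented_sum` are hypothetical choices made for illustration and are not taken from the cited paper or from CVL.

```python
from typing import List


def segmented_sum(data: List[int], segd: List[int]) -> List[int]:
    """Sum each sub-array of `data` independently.

    `segd` is assumed to hold the length of each segment, in order
    (a hypothetical encoding of the segment descriptor).
    """
    if sum(segd) != len(data):
        raise ValueError("segment lengths must cover the data vector exactly")
    sums = []
    start = 0
    for length in segd:
        # One result per segment; an empty segment contributes 0.
        sums.append(sum(data[start:start + length]))
        start += length
    return sums
```

For instance, a data vector of six elements with segment lengths 2, 3 and 1 yields three sums, one per segment. A data-parallel implementation would compute all segments at once; the loop here only fixes the meaning of the operation.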


Nepal - Nested Data-Parallelism in Haskell - Chakravarty, Keller, al. (2001)   (1 citation)  (Correct)

.... such as foldP : scanP : The order in which individual array elements are processed is unspecified and the binary operation is required to be associative, thus permitting a tree-like evaluation strategy with logarithmic depth (cf. [4]). Other parallel reductions are defined in terms of these basic operations, e.g. .... Figure 1: GHC with NDP ....

G. E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, Nov. 1990.


On the Distributed Implementation of Aggregate Data.. - Keller, Chakravarty (1999)   (3 citations)  (Correct)

....of operations on simple and segmented vectors, which are always uniformly distributed over the available processing elements. In fact, CMU's implementation does not generate an executable, but instead emits VCODE and interprets it at runtime by an interpreter linked to their C Vector Library (CVL) [3]. Unfortunately, this approach, while working fine for vector computers, is not satisfying for shared memory [7] and distributed memory machines [14]. The problems with this approach are mainly for three reasons: (1) Processor caches are badly utilized, (2) communication operations cannot be ....

....code intact. 5 Benchmarks We summarize results collected by applying our method to the implementation of Nesl-like [1] nested data parallelism [17]. However, we do not directly compare our code and that of CMU's implementation of Nesl [4] because the implementation of CMU's vector library CVL [3] is already an order of magnitude slower and scales worse than our vector primitives on the Cray T3E, which we used for the experiments. Our code is hand generated using the compilation and transformation rules of [17]; we are currently implementing a full compiler. Local optimizations. In ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, Carnegie Mellon University, 1993.


Provably Correct Vectorization of Nested-Parallel Programs - Riely, Prins, Iyer (1996)   (3 citations)  (Correct)

....Section 3. This paper is devoted almost entirely to the first part; we address the second in Sections 4-6, but only informally. While we would like to formalize this second step, we believe that it has already been well established by implementation, experimentation, and some less formal proofs [4, 2, 10]. Because of space limitations, we have cut quite a bit from this extended abstract. We assume that the reader has a familiarity with the basic notions of operational semantics [15]. For pointers to the literature on high-level cost models, see the excellent summaries in articles by Blelloch and ....

.... 1 ; A ) O Gamma ( P i SA i ) ffi W p (A 1 ; A ) Delta D ffi E p (A 1 ; A ) O Gamma (max i DA i ) ffi T p (A 1 ; A ) Delta The primitives that we use are implemented by the Data Parallel Library (dpl) [9], an extension of the C Vector Library [4]. The implementation is non magical and meets the other constraints given above. Theorem 4. Assume the construct parameters semantics and a primitive specification that meets the constraints outlined in this section. Then a b implies a C b. 6 The construct result semantics The ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, Carnegie Mellon University, 1993.


A Data-Parallel Declarative Language for the Simulation .. - Michel, Giavitto.. (1994)   (1 citation)  (Correct)

....It is still necessary to compute the dependency between the tasks to determine their relative order of activation. IV.2. The code generation Three different kinds of code generation have been thought of and are to be generated: code for a virtual SIMD machine written in C very close to CVL [25] and adapted to the execution on a SIMD or vectorial architecture; a sequential standalone C code using no dynamic memory allocation nor function call stack; a full MIMD code. At the moment, the compiler written in C [26] and in an ML dialect [27] generates a code for a virtual SIMD ....

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, M. Zagha, CVL: A C Vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, 1993.


Special Purpose Parallel Computing - McColl (1993)   (9 citations)  (Correct)

....enormously from parallel computing. The use of parallelism in the simulation of multiple paths of interest rates is relatively straightforward. To exploit parallelism in performing a path-dependent calculation along each of the paths one can use techniques such as parallel prefix computation [44, 45, 218]. Using advanced parallel systems, Hutchinson and Zenios [179] have shown that it is possible to perform the valuation of a single mortgage-backed security in real time (1-2 seconds) and that the analysis of a portfolio of such securities can be performed very rapidly. In the future, special ....

G E Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


Practical Issues in the Flattening of Nested.. - Faith, Palmer, Prins, .. (1995)   (Correct)

....parallel programming language, all data parallelism is expressed using an apply-to-all iterator construct. Our transformation system compiles Proteus iterator constructs into C with calls to the Data Parallel Library (DPL) [5], which is built on the portable vector model library, Cvl [2], as outlined in Figure 1. The key step in this process is the elimination of all iterators and the introduction of semantically equivalent vector operations. As an example of how several small transformations are applied to produce C code from Proteus, consider Example 1. In this example, the ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. Cvl: a C vector library manual, version 2. Technical report CMU-CS-93-114. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, February 1993.


Porting a Vector Library: a Comparison of MPI, Paris, CMMD and PVM - Hardwick (1994)   (Correct)

.... different machine architectures can be broken down into two categories: Scans and Reductions: These apply an associative combining operator such as addition or maximum across a vector, returning either a single value (reduction) or a vector containing the running total (scan, or parallel prefix [2]). Their implementation on a parallel machine is normally via a binary tree combining network, either in hardware (CM-2, CM-5) or in software (MPI). Permutations: CVL has an extensive set of functions which permute the elements of a vector into a new vector. They are specialized by type, ....

Guy E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


Semantics and compilation of sequential streams into a static.. - De Vito (1996)   (Correct)

....scheme, it is possible to generate code for a sequential, vectorial or SIMD parallel architecture. In the first case, any collection oriented data parallel operator used in 8 1/2 can be translated into a loop [45]. For vectorial or SIMD architectures, an ad hoc library can be used (like CVL [46] or the Paris Interface [47]). 3 Optimizations of the generated code The code generation can be optimized using several high level techniques. They do not interfere with low level architecture dependent optimizations possibly carried out by the final compiler. 3.1 Control expressions sharing In ....

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Enlarging the Scope of Vector-Based Computations.. - Au, Chakravarty.. (1997)   (Correct)

....nested data parallelism. Adopting a new programming language, however, has proved to be very costly and difficult throughout the history of computing. Moreover, special libraries that realise basic vector operations are being developed specifically to implement these new languages, e.g. CVL [7] for NESL. These libraries are usually machine dependent, often consisting of a number of vector operations implemented in assembler language to obtain the desired efficiency; see, for example, the CRAY Y-MP implementation of CVL. Such a machine dependent implementation technique constitutes ....

....An apply-to-each consists of a body, bindings and an optional filter. In the following example, the body sum(v) is evaluated for all subvectors bound to v. An optional filter expression can restrict the values for which the body is evaluated. Hence, the expression {sum(v) : v in [[2,6], [7,4,7], [6]]} is evaluated to [8, 18, 6]. An apply-to-each (realising the outer parallelism) can have a body which itself specifies a parallel computation (the inner parallelism) either by calling a parallel primitive function, as in the example above, or by employing a second (nested) apply-to-each, ....

[Article contains additional citation context not shown here]

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: A C vector library manual (Version 2). Technical Report CMU-CS-93-114, Carnegie Mellon University, Feb 1993.
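
The apply-to-each construct quoted above (a body, bindings and an optional filter) maps fairly directly onto a list comprehension. The sketch below is only an analogy in Python, not NESL; the helper name `apply_to_each` and its `keep` parameter are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def apply_to_each(
    body: Callable[[A], B],
    values: Iterable[A],
    keep: Optional[Callable[[A], bool]] = None,
) -> List[B]:
    """Evaluate `body` for every bound value, optionally filtered by `keep`."""
    return [body(v) for v in values if keep is None or keep(v)]


# Outer parallelism: one body evaluation per subvector.
# Inner parallelism: `sum` is itself a (conceptually parallel) primitive.
nested = [[2, 6], [7, 4, 7], [6]]
totals = apply_to_each(sum, nested)
```

Passing a predicate such as `lambda v: len(v) > 1` as `keep` restricts the subvectors for which the body is evaluated, mirroring the optional filter expression.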


Parallel Algorithms for Image Enhancement and.. - Bader, JaJa, Harwood.. (1996)   (5 citations)  (Correct)

....(n; p) O i n p p j : 3) A second data movement needed for SNF is the reduction operation. Each processor i has a data value, $Z_i$, and we need the value of $Z_0 \oplus Z_1 \oplus \cdots \oplus Z_{p-1}$, where $\oplus$ is any associative operator. Parallel computers can handle this efficiently [7], and Split-C implements this as a primitive library function. A simple algorithm consists of $p-1$ rounds that can be pipelined [25]. Each processor $P_i$ initializes a local sum to $Z_i$. During round $r$, each processor then reads $Z_{(i+r) \bmod p}$, for $1 \le r \le p-1$, and adds this value to the ....

G.E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.
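
The round-based reduction in the excerpt above can be simulated sequentially. The sketch below assumes that in round r processor i reads the value held by processor (i + r) mod p (the operator in that index was lost in extraction, so the "+" is an assumption), and that the combining operator is commutative as well as associative so that every processor ends with the same total. The name `ring_reduce` is hypothetical.

```python
import operator
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ring_reduce(z: Sequence[T], op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Simulate p processors reducing their values in p - 1 rounds.

    Returns the local result held by each processor after the last round.
    """
    p = len(z)
    local = list(z)  # processor i initializes its local sum to Z_i
    for r in range(1, p):
        # Round r: processor i reads Z_{(i + r) mod p} and combines it.
        local = [op(local[i], z[(i + r) % p]) for i in range(p)]
    return local
```

Each round touches every processor exactly once, which is what allows the rounds to be pipelined on a real machine; the simulation only checks that the arithmetic works out.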


General Purpose Parallel Computing - McColl (1993)   (64 citations)  (Correct)

....depth two for the operation $\circ$. We have shown that n-bit addition can be realised by a Boolean circuit of size O(n) and depth O(log n). In functional programming [39] the second order function scan corresponds to the prefix sums computation. The above mentioned results, and the work of Blelloch [41, 42, 43] and others, have shown it to be a parallel primitive of extremely wide applicability. 2.7. Matrix Multiplication Let $A, B$ be two $n \times n$ matrices of rational numbers. Then the product of $A, B$ is an $n \times n$ matrix $C$, where $c_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j}$. The exact determination of the ....

....paper, the case for the BSP PRAM approach has been presented. In this section we will briefly mention some of the other approaches. Perhaps the most conservative of the alternatives is SIMD or data parallelism. Although a number of interesting algorithms have been developed for such architectures [41, 42, 43, 123, 246], the model does not appear to be sufficiently general, even when extended to its SPMD form. Another conservative approach is simply to continue with architectures based on message passing across a fixed set of channels [124, 125, 135]. Although such a model is adequate ....

G E Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


Practical Algorithms for Selection on Coarse-Grained Parallel.. - Al-Furiah   (Correct)

....sets, parallel graph partitioning and parallel construction of multidimensional binary search trees. Many parallel algorithms for selection have been designed for the PRAM model [2, 3, 4, 9, 14] and for various network models including trees, meshes, hypercubes and reconfigurable architectures [6, 7, 13, 16, 22]. More recently, Bader et al. [5] implement a parallel deterministic selection algorithm on several distributed memory machines including CM-5, IBM SP-2 and INTEL Paragon. In this paper, we consider and evaluate parallel selection algorithms for coarse-grained distributed memory parallel ....

G.E. Blelloch, Prefix sums and their applications, Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


Practical Parallel Algorithms for Dynamic Data Redistribution.. - Bader, JaJa (1995)   (17 citations)  (Correct)

.... with one element per processor, the PREFIX Communication Library Primitive coalesces the data such that each processor k contains a single element $PS[k] = A[0] \oplus A[1] \oplus \cdots \oplus A[k]$. Parallel computers can handle this efficiently when the element PS[k] is assumed to reside on processor k [10], and Split-C implements this as a primitive library function. An analysis for this operation on the BDM model is given in [4]. Since these rounds can be realized with a CONCAT primitive operation followed by O(p) local computation of the prefix sums, the resulting complexity is ( T comm (n; ....

G.E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


A Case Study of Code Generator Generation for Embedded.. - Persson, Ringström..   (Correct)

....for front end generation, and the BEG system for code generator generation. The DML P [14] compiler generation system currently is able to produce code generators for two data-parallel architectures: the MasPar MP-1 SIMD system and the RVIP system. Data parallel calls are generated to the CVL [1] library which is implemented in full on the MasPar, but for which special code for each used operator was needed on the RVIP. The DML P system has so far been used to produce compilers for the Predula Nouveau language, which is a Pascal-like programming language with data parallel extensions. ....

Guy Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library manual (version 2). Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, February 1993.


V - Nested Parallelism in C - Chakravarty, Schröer, Simons (1995)   (Correct)

....have neither implemented structure lifting nor advanced control structures. The objective is to gain some experience and performance results and, then, to improve the compiler, which translates V programs into ANSI C programs that make calls to a special runtime system and the vector library CVL [16]. The V compiler itself is written in the Gentle compiler description language [17]. Future work will cover general control structures and optimizations of the basic transformations developed so far. Acknowledgments. We are grateful to Phil Bacon and Gabriele Keller for their help in preparing ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, Carnegie Mellon University, 1993.


V - Nested Parallelism in C - Chakravarty, Schröer, Simons (1995)   (Correct)

....there may be multiple generators per apply-to-each; in this case, all the vectors must have the same length since the ith elements of each vector are processed at the same time. 2.2 Nested vectors and divide and conquer algorithms Prefix scan operations are often used in parallel programming [7, 12]. They are provided as primitives in V, e.g. the function plusscan folds addition into a given vector while preserving all intermediate results. For example, the result of plusscan([5, 3, 4, 1]) is [0, 5, 8, 12]. It is well known that a prefix scan for a vector of length n can be implemented in ....

....algorithms Prefix scan operations are often used in parallel programming [7, 12]. They are provided as primitives in V, e.g. the function plusscan folds addition into a given vector while preserving all intermediate results. For example, the result of plusscan([5, 3, 4, 1]) is [0, 5, 8, 12]. It is well known that a prefix scan for a vector of length n can be implemented in parallel with O(log n) time complexity [7, 12]. To get a taste of the expressiveness of nested data parallelism and an idea of the principle of flattening nested data-parallel programs, consider the assignment v ....

[Article contains additional citation context not shown here]

Guy E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.
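
The plusscan example in the excerpts above is an exclusive prefix sum. As a rough illustration, the sketch below gives a sequential version and a recursive-doubling version whose number of rounds grows logarithmically with the vector length. The doubling scheme is a standard textbook technique swapped in here for illustration; it is not claimed to be the algorithm used by V or CVL, and the Python function names are hypothetical.

```python
from typing import List


def plus_scan(v: List[int]) -> List[int]:
    """Exclusive prefix sum: element i is the sum of v[0], ..., v[i-1]."""
    out, running = [], 0
    for x in v:
        out.append(running)
        running += x
    return out


def plus_scan_doubling(v: List[int]) -> List[int]:
    """Same result in about log2(n) rounds of recursive doubling.

    Each round is one data-parallel step, simulated here with a
    list comprehension over all positions.
    """
    n = len(v)
    # Shift right by one so the inclusive doubling scheme yields an exclusive scan.
    acc = [0] + v[:-1] if n else []
    step = 1
    while step < n:
        acc = [acc[i] + acc[i - step] if i >= step else acc[i] for i in range(n)]
        step *= 2
    return acc
```

The doubling version performs more additions in total than the sequential loop, trading extra work for shorter depth; work-efficient variants exist but are longer to write down.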


Design and Implementation of 8 1/2 , a Declarative Data-Parallel.. - Michel (1996)   (Correct)

....3 Implementation of the 8 1/2 compiler The compiler described hereafter is restricted to programmes defining webs with a static structure. A high level block diagram of the compiler is shown in figure 4. The output can either be sequential C code or code for a virtual SIMD machine (similar to CVL [36]). Figure 4: Block diagram ....

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Performance Prediction with Benchmaps - Toledo   (1 citation)  (Correct)

....the benchmap's parameters. Addressing these issues requires a data parallel programming system that has been ported to several computer systems. We chose NESL [5], an experimental data parallel programming language developed at CMU. The NESL programming system uses a runtime system called CVL [6] to execute NESL programs. CVL has been ported to several computers. Our performance prediction system, called BENCHCVL, predicts the performance of NESL programs on several computer systems by modeling the performance of CVL on each of them. Several related systems have been described in the ....

....across the communication network, the sizes of caches, and the cost of random and sequential data transfers between levels of the local memory hierarchy. We begin the discussion with a brief description of CVL, and then turn to a description of the models themselves. The CVL runtime library [6], which is the runtime system for NESL programs, implements operations on entire and segmented one-dimensional vectors. A segmented vector is partitioned into segments of arbitrary lengths. Vector operations in CVL include element-wise operations, such as adding two vectors, scans, or parallel ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Piecewise Execution of Nested Data-Parallel Programs - Palmer, Prins, Chatterjee.. (1995)   (2 citations)  Self-citation (Chatterjee)   (Correct)

No context found.

G. E. Blelloch, S. Chatterjee, J. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: a C vector library manual, version 2. Technical Report CMU-CS-93-114, Carnegie Mellon University, 1993.


Class Notes : Programming Parallel Algorithms - Cs Fall Guy (1993)   (1 citation)  Self-citation (Blelloch)   (Correct)

No context found.

Guy E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


Cvl: A C Vector Library - Manual Version Guy   Self-citation (Blelloch Chatterjee Sipelstein Zagha Manual)   (Correct)

No context found.

Guy E. Blelloch, Siddhartha Chatterjee, Fritz Knabe, Jay Sipelstein, and Marco Zagha. VCODE reference manual (version 1.1). Technical Report CMU-CS-90-146, School of Computer Science, Carnegie Mellon University, July 1990.


Practical Parallel Divide-and-Conquer Algorithms - Hardwick (1997)   (1 citation)  Self-citation (Blelloch Hardwick)   (Correct)

No context found.

Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, and Marco Zagha. NESL user's manual (for NESL version 3.1). Technical Report CMU-CS-95-169, School of Computer Science, Carnegie Mellon University, July 1995.


Practical Parallel Divide-and-Conquer Algorithms - Hardwick (1997)   (1 citation)  Self-citation (Blelloch Hardwick)   (Correct)

No context found.

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Piecewise Execution of Nested Data-Parallel Programs - Palmer, Prins, Chatterjee.. (1995)   (2 citations)  Self-citation (Chatterjee)   (Correct)

....operations in nested apply-to-all constructs into large data parallel operations. Both NESL [5] and Proteus [13] are high level, nested data parallel languages that use this technique to provide architecture independence by implementing the data parallel operations with portable vector operations [6]. 1.2 Excessive memory requirements of flattened programs The flattening technique fully parallelizes every apply-to-all construct, providing large amounts of fine grained potential parallelism, but introduces temporaries whose sizes are proportional to the potential parallelism. The generality ....

....and have set D = [5, 2, 7, 3]. For the serialized outer iterator code, if the size of T exceeds the number of processors, we must use multiple steps to complete the computation using virtual processors (see Fig. 1b). Table 1. Comparison of Approaches to Partial Serialization of Factorial Program. For comparison, executing the flattened program ....

[Article contains additional citation context not shown here]

G.E. Blelloch, S. Chatterjee, J. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: a C vector library manual, version 2. Technical Report CMU-CS-93-114, Carnegie Mellon University, 1993.


Porting a Vector Library: a Comparison of MPI, Paris, CMMD and PVM - Hardwick (1994)   Self-citation (Hardwick)   (Correct)

....for certain primitives. Finally, we discuss the design limitations of CVL when implemented on current RISC-based MPP architectures, and outline our plans to overcome this by using MPI as a compiler target. CVL and associated languages are available via FTP. 1 CVL overview CVL (C Vector Library [6]) is a library of over 220 low-level vector functions callable from C. It provides an abstract vector memory model that is independent of the underlying architecture, and was designed so that efficient implementations could be developed for a wide variety of parallel machines. Machine specific ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Interactive Simulations on the Web: Compiling NESL into.. - Hardwick, Narlikar.. (1997)   (1 citation)  Self-citation (Hardwick Sipelstein)   (Correct)

.... of three layers, as shown in Figure 1 (see [7] for full details). The interactive front end of the system compiles Nesl programs into a machine independent intermediate language called Vcode [5]. The front end then invokes a Vcode interpreter, which in turn calls the low level Cvl vector library [6]. The primary advantage of using an interpreted intermediate language for Nesl is that it allows users to switch transparently between running their programs on workstations (for development) and supercomputers (for performance). ....

....a stack based execution model, and a design that allows easy interpretation. The interpreter's main tasks are to manage the stack and vector memory, and to implement the vector operations via calls to Cvl (C Vector Library), a machine-specific library that implements an abstract vector machine [6]. 3 Compiling Nesl into Java We considered three different ways to improve Nesl's portability using Java: write a Vcode interpreter in Java, rewrite the Nesl compiler so that it generates Java, or write a translator from Vcode into Java. The first approach would impose an additional layer of ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


UnCvL: The University of North Carolina C Vector Library - Faith, Hoffman, Stahl (1993)   (6 citations)  Self-citation (Library)   (Correct)

....Chapel Hill, North Carolina 27599-3175. 1 Introduction This paper describes our efforts in implementing a version of Cvl (C Vector Library) for the MasPar MP-1 SIMD computer. Cvl is a library of rudimentary vector routines, callable from C, as described by G. Blelloch, et al. [2, 6]. UnCvl is an implementation of the Cvl library routines written in mpl, MasPar's parallel version of C. The main motivation for implementing Cvl on the MP-1 is to provide support for Nesl, a high level, portable parallel programming interface [3]. Nesl compiles to an intermediate language called ....

....programs on the MasPar MP-1. 2 Problem Description. The obvious goals of this project were to correctly implement all of the Cvl functions and to have the coded routines run as efficiently as possible on the MP-1 architecture. Working from the rather vague descriptions of the Cvl functions in [2, 6], and checking the results of our work against test programs known to run on the sequential version of Cvl, we have written code for all the documented functions and for many of the undocumented functions. The question of efficiency is a bit more complicated since the efficiency of the library ....

J. Sipelstein, G. Blelloch, S. Chatterjee, J. Hardwick, and M. Zagha. Cvl: C vector library manual, version 2. Technical Report CMU-CS-93-114, Carnegie Mellon University, School of Computer Science, 1993.


Solving Linear Recurrences with Loop Raking - Blelloch, Chatterjee, Zagha (1992)   (2 citations)  Self-citation (Blelloch)   (Correct)

....= 0; i < s; i++) Va = a[i:k+i:s]; Vb = b[i:k+i:s]; Vprod = Vsum * Va; Vsum = Vprod + Vb; x[i:k+i:s] = Vsum; 4.2 Algorithm for R2. We now consider the second-order linear recurrence x[i] = a[i] x[i-1] + b[i] x[i-2]. The computations in this recurrence involve multiplying 2 × 2 matrices [2]. The registers V11, V12, V21, and V22 hold the elements from the corresponding positions in the matrices. The first phase of this algorithm is as follows. V11 = 1.0; V12 = 0.0; V21 = 0.0; V22 = 1.0; k = (l - 1) * s; for (i = 0; i < s; i++) Vb = b[i:k+i:s]; Va = a[i:k+i:s]; Vtemp2 = V12 * Vb; ....
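
The link between a second-order linear recurrence and 2 × 2 matrix products, mentioned in this excerpt, can be checked with a small serial sketch. It assumes the recurrence has the form x[i] = a[i]·x[i-1] + b[i]·x[i-2] with given starting values x[0] and x[1]; the function names and the starting-value convention are assumptions made for illustration, and the code does not reproduce the vectorised loop-raking algorithm itself.

```python
def recurrence_direct(a, b, x0, x1):
    """Evaluate x[i] = a[i]*x[i-1] + b[i]*x[i-2] one term at a time."""
    x = [x0, x1]
    for i in range(2, len(a)):
        x.append(a[i] * x[i - 1] + b[i] * x[i - 2])
    return x


def recurrence_matrix(a, b, x0, x1):
    """Evaluate the same recurrence via a running 2 x 2 matrix product.

    Each step left-multiplies the running product by [[a[i], b[i]], [1, 0]].
    The product starts as the identity matrix, so that the top row applied
    to (x1, x0) yields x[i].
    """
    m11, m12, m21, m22 = 1, 0, 0, 1
    x = [x0, x1]
    for i in range(2, len(a)):
        m11, m12, m21, m22 = (
            a[i] * m11 + b[i] * m21,
            a[i] * m12 + b[i] * m22,
            m11,
            m12,
        )
        x.append(m11 * x1 + m12 * x0)
    return x


# With all coefficients equal to 1 the recurrence is the Fibonacci sequence.
ones = [1] * 8
fib_direct = recurrence_direct(ones, ones, 0, 1)
fib_matrix = recurrence_matrix(ones, ones, 0, 1)
```

Because matrix multiplication is associative, the running product can in principle be regrouped and computed in parallel, which is the property a scan-based formulation relies on.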

Guy E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


An Efficient Implementation of Nested Data Parallelism for.. - Hardwick (1996)   (4 citations)  Self-citation (Hardwick)   (Correct)

....some examples of which are listed in Table 1. On two sample algorithms (quicksort and a two-dimensional convex hull algorithm) the code produced is significantly faster than the equivalent algorithm expressed in the current NESL system (a high-level sequence-based nested data-parallel language [7]) and shows the potential for further speedup. The rest of this paper is arranged as follows. Section 2 uses a simple irregular divide-and-conquer algorithm to il.... [Table 1 (Algorithm, Reference): Barnes-Hut n-body [9]; Delaunay triangulation [2]; Geometric graph separators [26]; Two-dimensional convex hull ....]

....[7] and shows the potential for further speedup. The rest of this paper is arranged as follows. Section 2 uses a simple irregular divide-and-conquer algorithm to illustrate the theoretical performance advantages of the nested data-parallel model, and the practical difficulties of writing an efficient implementation. Section 3 contains a brief summary of the nested data-parallel language NESL, and .... [Table 1. Examples of irregular divide-and-conquer algorithms (Algorithm, Reference): Barnes-Hut n-body [9]; Delaunay triangulation [2]; Geometric graph separators [26]; Two-dimensional convex hull [7].]

[Article contains additional citation context not shown here]

G. E. Blelloch, J. C. Hardwick, J. Sipelstein, and M. Zagha. NESL user's manual (for NESL version 3.1). Technical Report CMU-CS-95-169, School of Computer Science, Carnegie Mellon University, July 1995.


An Efficient Implementation of Nested Data Parallelism for.. - Hardwick (1996)   (4 citations)  Self-citation (Hardwick)   (Correct)

....final layer is CVL, a run-time library that is the only part of the system that needs to be rewritten for a new machine. CVL provides an abstract segmented vector machine model that is independent of the underlying architecture, and a variety of data-parallel functions that operate on the vectors [6]. This three-layer model is a good match for the vector and SIMD machines originally targeted by NESL. Although their compilers could not directly generate efficient code for segmented operations expressed in a language such as C, it was easy to implement the operations in CVL in terms of the ....

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, Feb. 1993.


Implementation of a Portable Nested Data-Parallel.. - Blelloch, Chatterjee.. (1994)   (97 citations)  Self-citation (Blelloch Chatterjee Sipelstein Zagha)   (Correct)

....all polymorphic functions to specific types and generates code for each type. The type system is such that the compiler can determine all types to which a particular function is going to be applied at compile time. The compiler must, however, have access to the whole program. 3.2 VCODE. VCODE [12, 14] was designed as a testbed for a systematic study of compiler and implementation issues that arise in data-parallel languages. Accordingly, its design concentrates on data parallelism to the exclusion of other issues more commonly seen in language designs, such as data structures and advanced ....

Guy E. Blelloch, Siddhartha Chatterjee, Fritz Knabe, Jay Sipelstein, and Marco Zagha. VCODE reference manual (version 1.1). Technical Report CMU-CS-90-146, School of Computer Science, Carnegie Mellon University, July 1990.


Implementation of a Portable Nested Data-Parallel.. - Blelloch, Chatterjee.. (1994)   (97 citations)  Self-citation (Blelloch Chatterjee Hardwick Sipelstein Zagha)   (Correct)

....language are so delicately balanced that permitting nested structures would likely topple it. 3 System Overview. The full implementation of NESL consists of a NESL compiler, an intermediate language called VCODE [12], an interpreter for VCODE, and a portable library of parallel routines called CVL [13]. We also have an experimental VCODE compiler for shared-memory MIMD machines [19, 20]. The roles of the different components are shown in Figure 5. This section gives an overview of each of these components. The NESL execution times reported in this paper are for interpreted VCODE. Use of an ....

....multiprocessors. The native C compiler is then invoked to produce the final machine code. This makes the compiler portable and allows the use of the best compiler technology available for the machine. 3.5 CVL. To enable rapid porting of VCODE to new machines, we designed CVL (C Vector Library) [13], a library of low-level segmented vector routines callable from C. These are used by a VCODE interpreter, described in Section 3.3. The purpose of CVL is to provide a portable segmented vector abstraction that can be efficiently implemented on a wide range of machines. This library can then be ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Implementation of a Portable Nested Data-Parallel.. - Blelloch, Chatterjee.. (1994)   (97 citations)  Self-citation (Blelloch)   (Correct)

....parallel implementations. Nested parallelism is achieved through the ability to apply functions in parallel to each element of a sequence. NESL's apply-to-each form is specified using a set-like notation similar to set formers in SETL [52]. For example, the NESL expression {negate(a) : a in [3, -4, -9, 5] | a < 4} is read as "in parallel, for each a in the sequence [3, -4, -9, 5] such that a is less than 4, negate a". The expression returns [-3, 4, 9]. Parallelism is available both in the evaluation of the expression to the left of the colon (:) and in the subselection to the right of the pipe ....

....ability to apply functions in parallel to each element of a sequence. NESL's apply-to-each form is specified using a set-like notation similar to set formers in SETL [52]. For example, the NESL expression {negate(a) : a in [3, -4, -9, 5] | a < 4} is read as "in parallel, for each a in the sequence [3, -4, -9, 5] such that a is less than 4, negate a". The expression returns [-3, 4, 9]. Parallelism is available both in the evaluation of the expression to the left of the colon (:) and in the subselection to the right of the pipe (|). This parallel subselection can be implemented with packing techniques ....
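
The apply-to-each form quoted in this excerpt has a close sequential analogue in a Python list comprehension. The sketch below assumes that the input sequence is [3, -4, -9, 5] (the minus signs appear to have been lost in extraction, since only that input is consistent with a three-element result); the helper name `apply_to_each` is hypothetical and the code says nothing about how NESL evaluates the form in parallel.

```python
def apply_to_each(func, seq, pred=lambda _: True):
    """Apply func to every element of seq that satisfies pred.

    Sequential analogue of {func(a) : a in seq | pred(a)}.
    """
    return [func(a) for a in seq if pred(a)]


# {negate(a) : a in [3, -4, -9, 5] | a < 4}
result = apply_to_each(lambda a: -a, [3, -4, -9, 5], lambda a: a < 4)
```

Each application of `func` and each evaluation of `pred` is independent of the others, which is what makes the form a natural source of parallelism.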

[Article contains additional citation context not shown here]

Guy E. Blelloch. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990.


Porting a Vector Library: a Comparison of MPI, Paris, CMMD and.. - Hardwick (1994)   Self-citation (Hardwick Library)   (Correct)

....as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the United States Government. Keywords: CVL, MPI, NESL, parallel language, vector library, nested data parallelism, segmented operations. 1 CVL overview. CVL (C Vector Library [5]) is a library of vector functions callable from C. It provides an abstract vector memory model that is independent of the underlying architecture, and is designed so that efficient CVL implementations can be developed for a wide variety of parallel machines. Machine-specific versions currently ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


NESL: A Nested Data-Parallel Language (Version 3.1) - Blelloch (1995)   Self-citation (Blelloch)   (Correct)

....to optimized machine-specific code for these machines. Note: This report is an updated version of CMU-CS-92-103, which described version 2.4 of the language, and of CMU-CS-93-129, which described version 2.6 of the language. Some other documents that describe Nesl are: the user's manual [11]; an overview of the implementation with some timing results [8]; a formal definition of the Nesl cost model [23]. Contents: 1 Introduction; 1.1 Parallel Operations on Sequences; 1.2 Nested Parallelism ....

....the apply-to-each construct, a second way to take advantage of parallelism in Nesl is through a set of sequence functions. The sequence functions operate on whole sequences and all have relatively simple parallel implementations. For example, the function sum sums the elements of a sequence: sum([2, 1, -3, 11, 5]) => 16 : int. Since addition is associative, this can be implemented on a parallel machine in logarithmic time using a tree. Another common sequence function is the permute function, which permutes a sequence based on a second sequence of indices. For example: permute("nesl", [2,1,3,0]) ....
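
The two sequence functions named in this excerpt can be sketched sequentially as follows. The sketch assumes the sum example reads sum([2, 1, -3, 11, 5]) (a minus sign appears to have been lost in extraction, since only that input gives 16) and assumes that permute places element k of the input at position indices[k] of the output. The names `tree_sum` and `permute` are illustrative, and the loop only mimics the order of a tree reduction; it does not run in parallel.

```python
def tree_sum(xs):
    """Sum a sequence by repeatedly adding adjacent pairs.

    Each pass halves the number of partial sums, so a parallel machine
    could finish in a logarithmic number of passes.
    """
    level = list(xs)
    if not level:
        return 0
    while len(level) > 1:
        pairs = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An odd element out is carried to the next pass unchanged.
            pairs.append(level[-1])
        level = pairs
    return level[0]


def permute(values, indices):
    """Place values[k] at position indices[k] of the result."""
    out = [None] * len(values)
    for value, position in zip(values, indices):
        out[position] = value
    return out


total = tree_sum([2, 1, -3, 11, 5])
word = "".join(permute("nesl", [2, 1, 3, 0]))
```

The pairwise regrouping in `tree_sum` is valid only because addition is associative, which is the condition the excerpt states.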

[Article contains additional citation context not shown here]

Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, and Marco Zagha. NESL user's manual (for NESL version 3.1). Technical Report CMU-CS-95-169, Carnegie Mellon University, July 1995.


NESL: A Nested Data-Parallel Language (Version 3.1) - Blelloch (1995)   Self-citation (Blelloch)   (Correct)

....both divide-and-conquer algorithms and algorithms with nested data structures [7]. 3. To generate efficient code for a variety of architectures, including both SIMD and MIMD machines, with both shared and distributed memory. Nesl currently generates a portable intermediate code called Vcode [9], which runs on vector multiprocessors (the CRAY C90 and J90) as well as distributed-memory machines (the IBM SP2, Intel Paragon, and Connection Machine CM-5). Various benchmark algorithms achieve very good running times on these machines [16, 8]. 4. To be well suited for describing parallel ....

.... rules can be used to combine the two complexities across expressions and, based on Brent's scheduling principle [14], the two complexities place an upper bound on the asymptotic running times for the parallel random access machine (PRAM) [19]. The current compiler translates Nesl to Vcode [9], a portable intermediate language. The compiler uses a technique called flattening nested parallelism [13] to translate Nesl into the simpler flat data-parallel model supplied by Vcode. Vcode is a small stack-based language with about 100 functions, all of which operate on sequences of atomic ....
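
The way two complexities combine into a running-time bound can be made concrete with the usual statement of Brent's scheduling principle: a computation with total work W and depth D can be scheduled on p processors in at most W/p + D steps. The sketch below assumes the "two complexities" of the excerpt are work and depth; the function name `brent_bound` and the example figures are illustrative only.

```python
import math


def brent_bound(work, depth, processors):
    """Upper bound on parallel steps: work / processors + depth."""
    return work / processors + depth


# Tree summation of n values: n - 1 additions arranged in ceil(log2 n) levels.
n = 1024
work = n - 1
depth = math.ceil(math.log2(n))
bound = brent_bound(work, depth, processors=8)
```

With few processors the work term dominates; with many processors the bound approaches the depth, which is why both quantities are tracked.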

[Article contains additional citation context not shown here]

Guy E. Blelloch, Siddhartha Chatterjee, Fritz Knabe, Jay Sipelstein, and Marco Zagha. VCODE reference manual (version 1.1). Technical Report CMU-CS-90-146, School of Computer Science, Carnegie Mellon University, July 1990.


Programming Parallel Algorithms - Blelloch (1996)   (80 citations)  Self-citation (Blelloch)   (Correct)

.... syntax, and allows the user to run the environment, including the compiler, on a local workstation while executing interactive calls to Nesl programs on the CRAY Y-MP or CM-2 (or any other workstation, if so desired). Nesl currently generates a portable intermediate code called Vcode [7]. Control-parallel languages that have some features that are similar to Nesl include ID [22, 1], Sisal [17], and Proteus [20]. ID and Sisal are both side-effect free and supply operations on collections of values. 2 The Vanilla PRAM model. This section considers a few of the problems with trying ....

Guy E. Blelloch, Siddhartha Chatterjee, Fritz Knabe, Jay Sipelstein, and Marco Zagha. VCODE reference manual (version 1.1). Technical Report CMU-CS-90-146, School of Computer Science, Carnegie Mellon University, July 1990.


Implementation of a Portable Nested Data-Parallel Language - Blelloch (1994)   (97 citations)  Self-citation (Blelloch Chatterjee Hardwick Sipelstein Zagha)   (Correct)

....compiler analysis to determine places where there is the possibility of interaction. 3 System Overview. Our implementation of Nesl consists of an intermediate language called Vcode [9], a compiler [15, 16] and interpreter for Vcode, and a portable library of parallel routines called Cvl [10]. Figure 5 illustrates how the implementation is organized. We split our system along these lines so that we could concentrate on different aspects of the system in isolation. This section gives an overview of the various components of the system: a more detailed description can be found in the ....

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, J. Sipelstein, and M. Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, Feb. 1993.


Java as an Intermediate Language - Hardwick, Sipelstein (1996)   (10 citations)  Self-citation (Hardwick Sipelstein)   (Correct)

....Note that Vcode shares several properties with Java bytecode [10]: portability, strong typing, a stack-based execution model, and a design allowing for easy interpretation. At the bottom of the system is Cvl (C Vector Library), a machine-specific library that implements an abstract vector machine [6]. An example of a Cvl function is addwuz, which adds the corresponding elements of two integer vectors together and returns the results in a third vector. Cvl is the only part of the system that must be rewritten for a new architecture [11]. 3 Implementing Vcode in Java. To use Java as an ....

Guy E. Blelloch, Siddhartha Chatterjee, Jonathan C. Hardwick, Margaret Reid-Miller, Jay Sipelstein, and Marco Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Segmented Operations for Sparse Matrix Computation on.. - Blelloch, Heroux, Zagha (1993)   (4 citations)  Self-citation (Blelloch Zagha)   (Correct)

....pack operation can be used to create a new sparse matrix consisting of a subset of elements from another sparse matrix. A segmented copy operation can be used to distribute a different value to the elements of each row. These operations have been efficiently implemented for a variety of machines [6, 7, 10]. 5.4 CSC SEGMV and Symmetric Matrices. Segmented vector operations can also be used to implement a column-oriented version of sparse matrix multiplication. This could be used along with a row-oriented version to process symmetric matrices directly, rather than expanding them into a full ....
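
The pack and segmented copy operations mentioned in this excerpt, together with a segmented sum, can be sketched sequentially as below. The representation is an assumption made for illustration: a sparse matrix is stored as a flat list of nonzero values, a parallel list of column indices, and a list of row lengths that marks the segments. All function names are hypothetical and are not the routines of any particular library.

```python
def pack(values, flags):
    """Keep only the values whose flag is true."""
    return [v for v, keep in zip(values, flags) if keep]


def segmented_copy(values, segment_lengths):
    """Distribute one value across every element of its segment."""
    out = []
    for value, length in zip(values, segment_lengths):
        out.extend([value] * length)
    return out


def segmented_sum(values, segment_lengths):
    """Sum the elements of each segment independently."""
    sums, start = [], 0
    for length in segment_lengths:
        sums.append(sum(values[start:start + length]))
        start += length
    return sums


def sparse_matvec(values, columns, row_lengths, x):
    """Row-oriented sparse matrix-vector product via a segmented sum."""
    products = [v * x[c] for v, c in zip(values, columns)]
    return segmented_sum(products, row_lengths)


# The matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] with one segment per row.
values = [2, 1, 3, 4, 5]
columns = [0, 2, 1, 0, 2]
row_lengths = [2, 1, 2]
y = sparse_matvec(values, columns, row_lengths, [1, 2, 3])
```

Treating each row as a segment lets rows of very different lengths be processed by the same flat vector operations.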

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, Feb. 1993.


Nesl: A Nested Data-Parallel Language - Blelloch (1990)   (152 citations)  Self-citation (Blelloch)   (Correct)

....describing both divide-and-conquer algorithms and algorithms with nested data structures. 3. To generate efficient code for a variety of architectures, including both SIMD and MIMD machines, with both shared and distributed memory. Nesl currently generates a portable intermediate code called Vcode [5], which runs on the CRAY Y-MP, the Connection Machine CM-2, and the Encore Multimax. Various benchmark algorithms achieve very good running times on these machines [10, 4]. 4. To be well suited for describing parallel algorithms, and to supply a mechanism to derive the theoretical running time ....

....a = b 0 i = 3 (rep v a i) a 0 a 1 a 2 b 0 a 4 ] (length v) {int v.alpha : alpha in any} Returns the length of a vector. (index l) {int int} Given an integer, index returns a vector of that length with consecutive integers starting at 0 in the elements. For example: l = 8 (index l) [0 1 2 3 4 5 6 7] 4.2.2 Scans and Reduces. ( scan a) {v.alpha v.alpha : alpha in number} Given a vector of integers, scan returns to each position of a new equal-length vector the sum of all previous positions in the source. For example: a = [1 3 5 7 9 11 13 15] ( scan a) [0 1 4 9 16 25 36 49] ....
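
The index and scan functions described in this excerpt correspond to an integer range and an exclusive prefix sum. The following is a minimal sequential sketch under that reading; the Python function names simply mirror the names in the excerpt and are not an existing API.

```python
from itertools import accumulate


def index(length):
    """Return [0, 1, ..., length - 1]."""
    return list(range(length))


def scan(a):
    """Exclusive prefix sum: position i holds the sum of a[0..i-1]."""
    if not a:
        return []
    running = list(accumulate(a))  # inclusive sums
    return [0] + running[:-1]      # shift right by one position


indices = index(8)
prefix = scan([1, 3, 5, 7, 9, 11, 13, 15])
```

The example input is the sequence of odd numbers, so the output is the sequence of perfect squares, matching the values shown in the excerpt.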

[Article contains additional citation context not shown here]

Guy E. Blelloch, Siddhartha Chatterjee, Fritz Knabe, Jay Sipelstein, and Marco Zagha. VCODE reference manual (version 1.1). Technical Report CMU-CS-90-146, School of Computer Science, Carnegie Mellon University, July 1990.


NESL: A Nested Data-Parallel Language (Version 2.6) - Blelloch (1993)   Self-citation (Blelloch)   (Correct)

....both divide-and-conquer algorithms and algorithms with nested data structures [5]. 3. To generate efficient code for a variety of architectures, including both SIMD and MIMD machines, with both shared and distributed memory. Nesl currently generates a portable intermediate code called Vcode [7], which runs on the CRAY Y-MP, the Connection Machine CM-2, and the Encore Multimax. Various benchmark algorithms achieve very good running times on these machines [12, 6]. 4. To be well suited for describing parallel algorithms, and to supply a mechanism for deriving the theoretical running time ....

.... rules can be used to combine the two complexities across expressions and, based on Brent's scheduling principle [10], the two complexities place an upper bound on the asymptotic running times for the parallel random access machine (PRAM) [16]. The current compiler translates Nesl to Vcode [7], a portable intermediate language. The compiler uses a technique called flattening nested parallelism [9] to translate Nesl into the much simpler flat data-parallel model supplied by Vcode. Vcode is a small stack-based language with about 100 functions, all of which operate on sequences of atomic ....

[Article contains additional citation context not shown here]

Guy E. Blelloch, Siddhartha Chatterjee, Fritz Knabe, Jay Sipelstein, and Marco Zagha. VCODE reference manual (version 1.1). Technical Report CMU-CS-90-146, School of Computer Science, Carnegie Mellon University, July 1990.


Design and Implementation of a Declarative Data-Parallel.. - Michel, Giavitto (1994)   (Correct)

No context found.

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, M. Zagha, CVL: A C Vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


Design and Implementation of 8½, a Declarative.. - Michel (1995)   (Correct)

No context found.

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.


8½: Data-Parallelism and Data-Flow - Michel, De Vito, Sansonnet (1996)   (1 citation)  (Correct)

No context found.

G. E. Blelloch, S. Chatterjee, J. C. Hardwick, M. Reid-Miller, J. Sipelstein, and M. Zagha. CVL: A C vector library. Technical Report CMU-CS-93-114, School of Computer Science, Carnegie Mellon University, February 1993.
