Results 1 - 10 of 152
Models and Languages for Parallel Computation
- ACM Computing Surveys, 1998
"... We survey parallel programming models and languages using 6 criteria [:] should be easy to program, have a software development methodology, be architecture-independent, be easy to understand, guranatee performance, and provide info about the cost of programs. ... We consider programming models in ..."
Abstract - Cited by 168 (4 self)
We survey parallel programming models and languages using six criteria: they should be easy to program, should have a software development methodology, should be architecture-independent, should be easy to understand, should guarantee performance, and should provide information about the cost of programs. ... We consider programming models in six categories, depending on the level of abstraction they provide.
Region streams: functional macroprogramming for sensor networks
- In Proceedings of the 1st International Workshop on Data Management for Sensor Networks (in conjunction with VLDB 2004), 2004
"... Sensor networks present a number of novel pro-gramming challenges for application develop-ers. Their inherent limitations of computational power, communication bandwidth, and energy de-mand new approaches to programming that shield the developer from low-level details of resource management, concurr ..."
Abstract - Cited by 138 (8 self)
Sensor networks present a number of novel programming challenges for application developers. Their inherent limitations of computational power, communication bandwidth, and energy demand new approaches to programming that shield the developer from low-level details of resource management, concurrency, and in-network processing. We argue that sensor networks should be programmed at the global level, allowing the compiler to automatically generate nodal behaviors from a high-level specification of the network's global behavior. This paper presents the design of a functional macroprogramming language for sensor networks, called Regiment. The essential data model in Regiment is based on region streams, which represent spatially distributed, time-varying collections of node state. A region stream might represent the set of sensor values across all nodes in an area or the aggregation of sensor values within that area. Regiment is a purely functional language, which gives the compiler considerable leeway in terms of realizing region stream operations across sensor nodes and exploiting redundancy within the network. We describe the initial design and implementation of Regiment, including a compiler that transforms a macroprogram into an efficient nodal program based on a token machine. We present a progression of simple programs that illustrate the power of Regiment to succinctly represent robust, adaptive sensor network applications.
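To make the region-stream data model concrete, here is a minimal Haskell sketch, not Regiment's actual syntax or API: a region stream is modelled as a time-indexed collection of per-node readings, and aggregation folds those readings at each time step. The names NodeId, RegionStream, rfold, and avgTemp are illustrative inventions.

    -- A hypothetical model of region streams: per-node state evolving over time.
    import qualified Data.Map as Map

    type NodeId       = Int
    type Region       = [NodeId]                        -- nodes in a spatial region
    type Stream a     = [a]                             -- one element per time step
    type RegionStream a = Stream (Map.Map NodeId a)     -- per-node readings over time

    -- Restrict a region stream to the nodes inside a region.
    restrict :: Region -> RegionStream a -> RegionStream a
    restrict region = map (Map.filterWithKey (\n _ -> n `elem` region))

    -- Apply a function to every node's reading at every time step.
    rmap :: (a -> b) -> RegionStream a -> RegionStream b
    rmap f = map (Map.map f)

    -- Aggregate the readings inside the region at each time step.
    rfold :: (b -> a -> b) -> b -> RegionStream a -> Stream b
    rfold step z = map (Map.foldl' step z)

    -- Example: average sensor value over a region, one value per time step.
    avgTemp :: Region -> RegionStream Double -> Stream Double
    avgTemp region rs =
      [ if Map.null m then 0 else sum (Map.elems m) / fromIntegral (Map.size m)
      | m <- restrict region rs ]

    main :: IO ()
    main = print (avgTemp [1, 2] [Map.fromList [(1, 20.0), (2, 22.0), (3, 30.0)]])
    -- prints [21.0]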
Accelerator: using data parallelism to program GPUs for general-purpose uses
- In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006
"... GPUs are difficult to program for general-purpose uses. Programmers can either learn graphics APIs and convert their applications to use graphics pipeline operations or they can use stream programming abstractions of GPUs. We describe Accelerator, a system that uses data parallelism to program GPUs ..."
Abstract - Cited by 117 (0 self)
GPUs are difficult to program for general-purpose uses. Programmers can either learn graphics APIs and convert their applications to use graphics pipeline operations or they can use stream programming abstractions of GPUs. We describe Accelerator, a system that uses data parallelism to program GPUs for general-purpose uses instead. Programmers use a conventional imperative programming language and a library that provides only high-level data-parallel operations. No aspects of GPUs are exposed to programmers. The library implementation compiles the data-parallel operations on the fly to optimized GPU pixel shader code and API calls. We describe the compilation techniques used to do this. We evaluate the effectiveness of using data parallelism to program GPUs by providing results for a set of compute-intensive benchmarks. We compare the performance of Accelerator versions of the benchmarks against hand-written pixel shaders. The speeds of the Accelerator versions are typically within 50% of the speeds of hand-written pixel shader code. Some benchmarks significantly outperform C versions on a CPU: they are up to 18 times faster than C code running on a CPU.
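The programming style described above can be illustrated with a plain Haskell sketch (Accelerator itself is a library for a conventional imperative language; DPArray, dpMap, dpZipWith, and dpSum below are hypothetical names): the programmer writes only whole-array, data-parallel operations, and a library is then free to compile them to GPU code behind the scenes.

    -- A data-parallel array is kept abstract; here it is simply backed by a list.
    newtype DPArray a = DPArray [a]

    -- High-level operations a hypothetical data-parallel library might expose.
    dpMap :: (a -> b) -> DPArray a -> DPArray b
    dpMap f (DPArray xs) = DPArray (map f xs)

    dpZipWith :: (a -> b -> c) -> DPArray a -> DPArray b -> DPArray c
    dpZipWith f (DPArray xs) (DPArray ys) = DPArray (zipWith f xs ys)

    dpSum :: Num a => DPArray a -> a
    dpSum (DPArray xs) = sum xs

    -- A benchmark-style kernel written without any per-element loops:
    -- compute a*x + y, multiply elementwise by x, then reduce to a scalar.
    saxpyDot :: Double -> DPArray Double -> DPArray Double -> Double
    saxpyDot a x y = dpSum (dpZipWith (*) x (dpZipWith (+) (dpMap (a *) x) y))

    main :: IO ()
    main = print (saxpyDot 2 (DPArray [1, 2, 3]) (DPArray [4, 5, 6]))
    -- prints 60.0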
Powerlist: a structure for parallel recursion
- ACM Transactions on Programming Languages and Systems, 1994
"... Many data parallel algorithms – Fast Fourier Transform, Batcher’s sorting schemes and prefixsum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic pro ..."
Abstract - Cited by 66 (2 self)
Many data-parallel algorithms – the Fast Fourier Transform, Batcher's sorting schemes, and prefix sum – exhibit recursive structure. We propose a data structure, powerlist, that permits succinct descriptions of such algorithms, highlighting the roles of both parallelism and recursion. Simple algebraic properties of this data structure can be exploited to derive properties of these algorithms and establish equivalence of different algorithms that solve the same problem.
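A minimal Haskell sketch of the idea, assuming only the informal description above: a powerlist holds 2^n elements and is built by tying together two powerlists of equal length (the paper also uses a second, zip-like constructor, omitted here); prefix sum then falls out as a balanced divide-and-conquer recursion.

    -- A powerlist-like structure: a single element, or two equal-length halves.
    data PowerList a = Singleton a
                     | Tie (PowerList a) (PowerList a)   -- concatenation of equal halves
      deriving Show

    toList :: PowerList a -> [a]
    toList (Singleton x) = [x]
    toList (Tie p q)     = toList p ++ toList q

    plMap :: (a -> b) -> PowerList a -> PowerList b
    plMap f (Singleton x) = Singleton (f x)
    plMap f (Tie p q)     = Tie (plMap f p) (plMap f q)

    plLast :: PowerList a -> a
    plLast (Singleton x) = x
    plLast (Tie _ q)     = plLast q

    -- Divide-and-conquer prefix sum: scan each half independently (in parallel,
    -- conceptually), then offset the right half by the total of the left half.
    prefixSum :: Num a => PowerList a -> PowerList a
    prefixSum (Singleton x) = Singleton x
    prefixSum (Tie p q)     =
      let p' = prefixSum p
          q' = prefixSum q
      in Tie p' (plMap (+ plLast p') q')

    main :: IO ()
    main = print (toList (prefixSum (Tie (Tie (Singleton 1) (Singleton 2))
                                         (Tie (Singleton 3) (Singleton 4)))))
    -- prints [1,3,6,10]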
Kaapi: A thread scheduling runtime system for data flow computations on cluster of multi-processors
- In PASCO '07: Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, 2007
"... The high availability of multiprocessor clusters for com-puter science seems to be very attractive to the engineer because, at a first level, such computers aggregate high per-formances. Nevertheless, obtaining peak performances on irregular applications such as computer algebra problems re-mains a ..."
Abstract - Cited by 64 (14 self)
The high availability of multiprocessor clusters for computer science makes them very attractive to engineers because, at a first level, such computers aggregate high performance. Nevertheless, obtaining peak performance on irregular applications such as computer algebra problems remains a challenging problem. Memory access delays are non-uniform, and the irregularity of the computations requires scheduling algorithms that automatically balance the workload among the processors. This paper focuses on the runtime support implementation needed to exploit the computation resources of a multiprocessor cluster with great efficiency. The originality of our approach lies in the implementation of an efficient work-stealing algorithm for a macro data flow computation, based on a minor extension of the POSIX thread interface.
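The work-stealing idea referred to above can be sketched conceptually in Haskell (this is not Kaapi's implementation, which extends the POSIX thread interface): each worker owns a deque of tasks, pushes and pops work at one end, and idle workers steal from the opposite end of a victim's deque, which keeps owner and thieves mostly out of each other's way.

    -- A purely conceptual, non-concurrent model of a work-stealing deque.
    import           Data.Sequence (Seq, ViewL (..), ViewR (..), (|>))
    import qualified Data.Sequence as Seq

    type Deque task = Seq task

    -- The owner adds and removes work at the "bottom" of its own deque.
    pushBottom :: task -> Deque task -> Deque task
    pushBottom t d = d |> t

    popBottom :: Deque task -> Maybe (task, Deque task)
    popBottom d = case Seq.viewr d of
      EmptyR  -> Nothing
      d' :> t -> Just (t, d')

    -- A thief removes work from the "top", away from the owner's end.
    stealTop :: Deque task -> Maybe (task, Deque task)
    stealTop d = case Seq.viewl d of
      EmptyL  -> Nothing
      t :< d' -> Just (t, d')

    main :: IO ()
    main = do
      let victim = foldr pushBottom Seq.empty [("taskC" :: String), "taskB", "taskA"]
      print (stealTop victim)   -- the thief gets the oldest task, "taskA"
      print (popBottom victim)  -- the owner gets the newest task, "taskC"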
Transforming High-Level Data-Parallel Programs into Vector Operations
- In Proceedings of Principles and Practice of Parallel Programming '93, ACM, 1993
"... Fully-parallel execution of a high-level data-parallel language based on nested sequences, higher order functions and generalized iterators can be realized in the vector model using a suitable representation of nested sequences and a small set of transformational rules to distribute iterators throug ..."
Abstract - Cited by 54 (21 self)
Fully-parallel execution of a high-level data-parallel language based on nested sequences, higher-order functions, and generalized iterators can be realized in the vector model using a suitable representation of nested sequences and a small set of transformational rules to distribute iterators through the constructs of the language.
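The general flattening idea behind such vector-model implementations (not this paper's specific transformation rules) can be sketched in Haskell: a nested sequence is stored as a flat data vector plus a segment descriptor, so operations on the outer sequence become segmented operations on flat vectors. The type Nested and the functions below are illustrative.

    -- [[1,2,3],[],[4,5]]  is represented as  Nested [3,0,2] [1,2,3,4,5]
    data Nested a = Nested { segLens :: [Int], flatData :: [a] }
      deriving Show

    fromLists :: [[a]] -> Nested a
    fromLists xss = Nested (map length xss) (concat xss)

    toLists :: Nested a -> [[a]]
    toLists (Nested lens xs) = go lens xs
      where
        go []       _  = []
        go (l : ls) ys = let (seg, rest) = splitAt l ys in seg : go ls rest

    -- Mapping over every inner element touches only the flat data vector.
    nestedMap :: (a -> b) -> Nested a -> Nested b
    nestedMap f (Nested lens xs) = Nested lens (map f xs)

    -- A segmented reduction: one sum per inner sequence, recovered from the
    -- flat vector and the segment descriptor.
    segSum :: Num a => Nested a -> [a]
    segSum = map sum . toLists

    main :: IO ()
    main = do
      let n = fromLists [[1, 2, 3], [], [4, 5 :: Int]]
      print (segSum (nestedMap (* 10) n))   -- prints [60,0,90]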
Optimal Evaluation of Array Expressions on Massively Parallel Machines
- ACM Transactions on Programming Languages and Systems, 1992
"... ..."
C**: A Large-Grain, Object-Oriented, Data-Parallel Programming Language
- In Languages and Compilers for Parallel Computing (5th International Workshop), 1992
"... C** is a new data-parallel programming language based on a new computation model called large-grain data parallelism. C** overcomes many disadvantages of existing data-parallel languages, yet retains their distinctive and advantageous programming style and deterministic behavior. This style makes da ..."
Abstract - Cited by 47 (3 self)
C** is a new data-parallel programming language based on a new computation model called large-grain data parallelism. C** overcomes many disadvantages of existing data-parallel languages, yet retains their distinctive and advantageous programming style and deterministic behavior. This style makes data parallelism well suited to massively parallel computation. Large-grain data parallelism enhances data parallelism by permitting a wider range of algorithms to be expressed naturally. C** is an object-oriented programming language that inherits data abstraction features from C++. Existing scientific programming languages do not provide modern programming facilities such as operator extensibility, abstract datatypes, or object-oriented programming. C**, together with its sequential subset C++, supports modern programming practices and enables a single language to be used for all parts of large, complex programs and libraries. This technical report consists of three parts. The body of t...
The Cilk System for Parallel Multithreaded Computing
, 1996
"... Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still diiticult to build eiticient implementations of parallel applications ..."
Abstract - Cited by 43 (2 self)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel programming. In particular, it is still difficult to build efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent upon dynamic information. Multithreading has become an increasingly popular way to implement these dynamic, asynchronous, concurrent programs. Cilk (pronounced "silk") is our C-based multithreaded computing system that provides provably good performance guarantees. This thesis describes the evolution of the Cilk language and runtime system, and describes applications which affected the evolution of the system.
Parallelization in Calculational Forms
- In 25th ACM Symposium on Principles of Programming Languages, 1998
"... The problems involved in developing efficient parallel programs have proved harder than those in developing efficient sequential ones, both for programmers and for compilers. Although program calculation has been found to be a promising way to solve these problems in the sequential world, we believe ..."
Abstract - Cited by 40 (27 self)
The problems involved in developing efficient parallel programs have proved harder than those in developing efficient sequential ones, both for programmers and for compilers. Although program calculation has been found to be a promising way to solve these problems in the sequential world, we believe that much more effort is needed to study its effective use in the parallel world. In this paper, we propose a calculational framework for the derivation of efficient parallel programs with two main innovations: (1) we propose a novel inductive synthesis lemma, based on which an elementary but powerful parallelization theorem is developed; (2) we make the first attempt to construct a calculational algorithm for parallelization, deriving associative operators from data type definitions and making full use of existing fusion and tupling calculations. Being more constructive, our method is not only helpful in the design of efficient parallel programs in general but also promising in the construc...
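A hand-worked Haskell instance of the kind of derivation such calculational methods automate (a standard example in this line of work, not necessarily the paper's own algorithm): maximum prefix sum is not a list homomorphism by itself, but tupling it with the ordinary sum yields an associative combining operator, so chunks of the list can be summarized independently and then combined.

    import Data.List (foldl')

    -- Specification: maximum over all prefix sums (the empty prefix gives 0).
    mpsSpec :: [Int] -> Int
    mpsSpec xs = maximum (scanl (+) 0 xs)

    -- Tupled value for one chunk: (maximum prefix sum, total sum).
    chunkSummary :: [Int] -> (Int, Int)
    chunkSummary xs = (mpsSpec xs, sum xs)

    -- Derived associative operator on summaries:
    --   mps (xs ++ ys) = mps xs `max` (sum xs + mps ys)
    --   sum (xs ++ ys) = sum xs + sum ys
    combine :: (Int, Int) -> (Int, Int) -> (Int, Int)
    combine (m1, s1) (m2, s2) = (m1 `max` (s1 + m2), s1 + s2)

    -- "Parallel" evaluation: summarize chunks independently, then reduce with
    -- the associative operator (done sequentially here, for illustration).
    mpsChunked :: [[Int]] -> Int
    mpsChunked chunks = fst (foldl' combine (0, 0) (map chunkSummary chunks))

    main :: IO ()
    main = do
      let xs = [3, -4, 2, 5, -8, 6]
      print (mpsSpec xs)                             -- 6
      print (mpsChunked [[3, -4], [2, 5], [-8, 6]])  -- 6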