The emergence of semiconductor fabrication technology that allows tight coupling between high-density DRAM and CMOS logic on the same chip has led to an important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the ability to load wide words in a single memory access and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model built on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of the bindings between threads and data. Both data and work distributions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required to fully exploit the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver
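The following is a minimal sketch, in Python, of how the concepts named above might fit together. It is an illustration under stated assumptions, not the report's notation or runtime: the class and method names (`BlockDistribution`, `Macroserver`, `sum_async`) are hypothetical, ordinary Python threads stand in for PIM-node threads, and a simple block distribution stands in for the model's data distributions. Condition variables are left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class BlockDistribution:
    """Hypothetical data distribution: maps n elements onto p memory nodes in contiguous blocks."""

    def __init__(self, n, p):
        self.n, self.p = n, p
        self.block = -(-n // p)  # ceiling of n / p

    def owner(self, i):
        """Node that holds element i."""
        return i // self.block

    def local_indices(self, node):
        """Indices of the elements stored on the given node."""
        lo = node * self.block
        return range(lo, min(lo + self.block, self.n))


class Macroserver:
    """Hypothetical macroserver: encapsulates variables and the methods that act on them."""

    def __init__(self, data, nodes):
        self.data = list(data)                                # distributed variable
        self.dist = BlockDistribution(len(self.data), nodes)  # first-class distribution object
        self.total = 0                                        # shared state
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=nodes)

    def _add_to_total(self, value):
        """Atomic method: only one thread at a time may update the shared total."""
        with self._lock:
            self.total += value

    def _sum_block(self, node):
        """Runs in a thread bound to one node and touches only that node's elements."""
        partial = sum(self.data[i] for i in self.dist.local_indices(node))
        self._add_to_total(partial)
        return partial

    def sum_async(self):
        """Work distribution: spawn one thread per node; each call returns a future."""
        return [self._pool.submit(self._sum_block, node) for node in range(self.dist.p)]

    def close(self):
        self._pool.shutdown(wait=True)
```

In this sketch a caller would create `Macroserver(range(10), nodes=4)`, call `sum_async()`, and later call `result()` on each returned future to wait for the per-node partial sums. Binding each thread to the node that owns its data is the point of the work distribution here: every thread reads only its local block.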