The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like
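The interplay of data distributions, work distributions, atomic methods, and futures described above can be illustrated with a minimal sketch. It is not the notation or API of the report: the names `BlockDistribution`, `Macroserver`, `partial_sum`, and `sum_all` are hypothetical, ordinary Python threads stand in for threads running on PIM nodes, and a simple block mapping is assumed as the data distribution.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class BlockDistribution:
    """Hypothetical block data distribution: contiguous index blocks per node."""

    def __init__(self, n_elements, n_nodes):
        self.n_elements = n_elements
        self.n_nodes = n_nodes
        # Ceiling division; at least 1 so owner() never divides by zero.
        self.block = max(1, -(-n_elements // n_nodes))

    def owner(self, i):
        """Node that holds global index i."""
        return i // self.block

    def local_indices(self, node):
        """Global indices stored on the given node (may be empty)."""
        start = node * self.block
        return range(start, min(start + self.block, self.n_elements))


class Macroserver:
    """Hypothetical macroserver: encapsulated state plus methods that spawn threads."""

    def __init__(self, data, n_nodes):
        self.data = list(data)
        self.dist = BlockDistribution(len(self.data), n_nodes)
        self.total = 0
        self._lock = threading.Lock()  # stands in for an atomic method

    def _add_atomic(self, value):
        # Only one thread at a time may update the shared total.
        with self._lock:
            self.total += value

    def partial_sum(self, node):
        # Work distribution: this thread touches only the data its node owns.
        s = sum(self.data[i] for i in self.dist.local_indices(node))
        self._add_atomic(s)
        return s

    def sum_all(self):
        # One asynchronous thread per node; each submit() returns a future.
        with ThreadPoolExecutor(max_workers=self.dist.n_nodes) as pool:
            futures = [pool.submit(self.partial_sum, node)
                       for node in range(self.dist.n_nodes)]
            # result() blocks until the corresponding thread has finished.
            return [f.result() for f in futures]
```

Under these assumptions, `Macroserver(range(10), n_nodes=4).sum_all()` spawns four threads, each summing the block its node owns, and the futures deliver the per-node partial sums while `total` accumulates the overall sum under the lock.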