51 citations found.
ARVIND, AND IANNUCCI, R. A. 1987. Two Fundamental Issues in Multiprocessing.

This paper is cited in the following contexts:

A Streaming Multi-Threaded Model - Caspi, DeHon, Wawrzynek (2001)   (6 citations)

....control. The SCORE model is more stylized, making it amenable to virtualization and use of a strong hardware software interface. SCORE emphasizes a design that preserves deterministic behavior regardless of target hardware size and scheduling. Data flow processing from Dennis [10] and Arvind [3] introduced parallel models and architectures with more flexible scheduling. Later work in Culler's Threaded Abstract Machine (TAM) [8] and Active Messages (AM) [11] was an important attempt to capture the essence of a parallel programming model at the software hardware boundary and to make ....

Arvind and R. A. Iannucci. Two fundamental issues in multiprocessing. In Proceedings of DFVLR Conference on Parallel Processing in Science and Engineering, pages 61--88, West Germany, June 1987.


Compiling Dataflow into Threads - Efficient Compiler-Controlled.. - Schauser (1991)   (2 citations)

....Multithreaded execution appears to be a key ingredient in general purpose parallel computing systems. Many researchers suggest that processors should support multiple instruction streams and switch very rapidly between them in response to remote memory reference latencies or synchronization [AI87, Smi90, HF88, ALKK90, ACC 90]. However, the proposed architectural solutions make thread scheduling invisible to the compiler, preventing it from applying optimizations that might reduce the cost of thread switching or improve scheduling based on analysis of the program. Inherently parallel ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR - Conf. 1987 on Par. Proc. in Science and Eng., Bonn-Bad Godesberg, W. Germany, June 1987.


Multithreading: A Revisionist View of Dataflow Architectures - Papadopoulos, Traub (1991)   (44 citations)

....and join mechanisms. 3.3 Split Phase Transactions The addressing modes described earlier can only access local memory, and only when the address (or offset from the current frame pointer) are compile time constants. Computed, global memory references are performed with split-phase transactions [4], in which a thread issues a memory request and continues while the request is processed concurrently. This is the means by which the Monsoon architecture tolerates long memory latency. To illustrate, here is a statement from the inner loop of DAXPY: y[i] = a * x[i] + y[i] Assuming that ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proceedings of DFVLR - Conference 1987 on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, W. Germany, June 1987.
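As a rough illustration of the split-phase idea in the Papadopoulos and Traub excerpt above, the following Python sketch issues the loads for x[i] and y[i] without waiting and synchronizes only when the values are consumed. It is an analogy under stated assumptions, not Monsoon's mechanism: the thread pool stands in for the memory system, and the names daxpy_split_phase and remote_load are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_load(array, i):
    # Hypothetical stand-in for a global memory read that may take a long time.
    return array[i]


def daxpy_split_phase(a, x, y):
    # Assumption: a thread pool plays the role of the memory system.
    with ThreadPoolExecutor(max_workers=4) as memory:
        # Phase 1: issue every request and continue without blocking.
        requests = [
            (memory.submit(remote_load, x, i), memory.submit(remote_load, y, i))
            for i in range(len(x))
        ]
        # Phase 2: synchronize on each reply only when its value is needed.
        return [a * fx.result() + fy.result() for fx, fy in requests]
```

Under these assumptions, daxpy_split_phase(2.0, [1.0, 2.0], [10.0, 20.0]) returns [12.0, 24.0]; the point of the sketch is only that issuing a request and consuming its reply are separate steps.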


Compiling for Hierarchical Shared-Memory Multiprocessors - Martens, Jayasimha (1994)

....use of caches, since caches form part of the physical hierarchy; in particular, the issue of cache coherence is orthogonal to the issues discussed in this paper. HSMAs are appealing since they reduce latency and synchronization overheads often associated with large scale multiprocessing [2] (see [15] for a detailed discussion on how different types of communication and synchronization costs are reduced). These architectures form a bridge between the shared memory and distributed memory approaches; for example, HSMAs support for consumer initiated communication is analogous to that ....

Arvind and Robert A. Iannucci. Two Fundamental Issues in Multiprocessing. Technical Report 226-6, Laboratory for Computer Science, MIT, May 1987.


Advances in the Dataflow Computational Model - Najjar, Lee, Gao (1999)   (1 citation)

....more data parallelism. Because of the inherently parallel nature of dataflow execution, dataflow computers provide an efficient and elegant solution to the two fundamental problems of von Neumann computers: the memory latency and synchronization overhead, as described by Arvind and Iannucci [6]. The ability of the dataflow model to tolerate latency, by switching dynamically between ready computation threads, and to support low overhead distributed synchronization in hardware, has made it the candidate of choice for what has later been called latency tolerant architectures. 1 A ....

Arvind and R. A. Iannucci. Two fundamental issues in multiprocessing. CSG Memo 226, Computation Structures Group, MIT Lab. for Comp. Sci., 1987.


Thal: An Actor System For Efficient And Scalable Concurrent.. - Kim (1997)   (8 citations)

....time makes split phase allocation (Figure 4.12.a) desirable on platforms with hardware support for context switch; context switch to another from a thread requesting remote creation while the latter waits for the mail address to be delivered, effectively hides the latency and saves idle cycles [9, 34]. However, it is less desirable in stock hardware multicomputers where context switch is very costly (e.g. 52 sec in the TMC CM-5). We use aliases instead of relying on context switch. An alias is a locally allocated clone of a mail address, which is equally capable of uniquely identifying an ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In 4th International DFVLR Seminar on Foundations of Engineering Sciences, pages 61--88, 1987. LNCS 295.


Compiling For Multithreaded Architectures - Tang (1999)   (1 citation)

....to reduce latency caused by data accesses. However, no matter how advanced the technology is and how extensively we apply the compiler optimization we cannot eliminate all latencies. Certain latencies such as those due to communication and synchronization are inherent in parallel applications [11]. Therefore tolerating latency may be the only available solution under certain circumstances. Among many techniques to tolerate latency, prefetching and multithreading are two representatives. Prefetching is very effective if the latency length can be predicted accurately at compile time [20, ....

Arvind and Robert A. Iannucci. Two fundamental issues in multiprocessing. In Parallel Computing in Science and Engineering, number 295 in Lecture Notes in Computer Science, pages 61--88. Springer-Verlag, 1987. Proceedings of the 4th International DFVLR Seminar on Foundations of Engineering Sciences, Bonn, West Germany, June 25--26, 1987.


Robust, High-Speed Network Design for Large-Scale Multiprocessing - DeHon (1993)   (1 citation)

....of a data value and the time when the value can actually be used. Data parallel operations are limited by the rate at which processors can obtain access to the data on which they need to operate. Multithreaded ([Smi78] [Jor83] [ALKK90] [SBCvE90] [CSS 91] [NPA92]) and dataflow ([ACM88] [AI87] [PC90]) architectures have been developed to mitigate communication latency by hiding its effects. These techniques all rely on an abundance of parallelism to provide useful processing to perform while waiting on slow communications. The limit to the usable parallelism then, can be determined by ....

....models using a parallelism profile which [footnote 1: The basic argument presented here is drawn from an unpublished manuscript by Professor Michael Dertouzos.] shows the number of operations which may be executed simultaneously at each time step assuming an unbounded number of processors (e.g. [AI87]). The available parallelism will be a function of the compiler and run time system in addition to being dependent on the problem being solved and the algorithm used to solve it. Communication latency is also, generally, not constant. Section 2.4 looks at the factors that affect latency in a ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proceedings of DFVLR Conference on Parallel Processing in Science and Engineering, pages 61--88, West Germany, June 1987.
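The parallelism profile mentioned in the DeHon excerpt above (operations executable at each time step, given unbounded processors) can be sketched as follows. This is a minimal illustration, assuming unit-time operations and an acyclic dependence graph; the function name parallelism_profile and the dictionary input format are hypothetical and not taken from any of the cited papers.

```python
def parallelism_profile(deps):
    """Count how many operations can run at each time step.

    deps maps each operation to the operations it depends on.
    Assumes unit-time operations, unbounded processors, and no cycles.
    """
    level = {}

    def depth(op):
        # An operation runs one step after its latest-finishing dependency.
        if op not in level:
            level[op] = 1 + max((depth(d) for d in deps.get(op, ())), default=-1)
        return level[op]

    for op in deps:
        depth(op)

    profile = [0] * (max(level.values()) + 1) if level else []
    for step in level.values():
        profile[step] += 1
    return profile
```

For the small graph {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"]}, the sketch returns [2, 2, 1]: a and b run first, then c and e, then d.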


Superscalar Performance in a Multithreaded Microprocessor - Gunther (1993)   (3 citations)

....is that multithreading permits control-driven scheduling in addition to the data driven scheduling of dataflow; the more flexible scheduling policy of multithreading offers practical advantages by way of optimizing hardware utilization and allowing extended computational state. Arvind and Iannucci [ArIa87] argued that traditional von Neumann processors offer a poor base for scalable general purpose multiprocessors due to the high synchronization and context switch costs associated with von Neumann processors. Arvind and Iannucci identified long memory latencies and synchronization delays as the ....

Arvind and R.A. Iannucci, "Two fundamental issues in multiprocessing," Proc. DFVLR Conf. 1987 on Parallel Processing in Science and Engineering, Springer-Verlag LNCS 295, June 1987.


A Design Principle of Massively Parallel Distributed-Memory.. - Amamiya, Kawano (1994)

....it scalable, that is, to achieve higher performance when more processors are added to the computer. In order to achieve this goal, processing element should be designed to exploit program parallelism as much as possible. For this purpose, we have to solve fundamental problems of multiprocessing [4], long memory access latency and remote process procedure call latency and waiting time for events synchronization. These problems are strongly related each other, and conventional control flow multiprocessor architectures attempt to make a compromise between them. Processor idling time, which is ....

Arvind, R. Iannucci, "Two Fundamental Issues in Multiprocessing," Proc. of DFVLR Conf. on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, Germany (1987).


Computation Structures Group Progress Report 1990-91 - (ed.) (1991)

....All communications should be split transactions, in which an issuing processor does not block to await a response, and a receiving processor can efficiently identify and enable the thread that awaits an incoming communication. For a more thorough explication of this argument, please refer to [4]. [Footnote 1: The term is apparently due to Valiant [20].] Previous dataflow architectures (TTDA, Monsoon) have always had these capabilities; however, they were deficient in certain other respects: • Poor single thread performance: The interleaved pipelines of TTDA and Monsoon meant that instructions ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proceedings of DFVLR - Conference 1987 on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, W. Germany, Springer-Verlag LNCS 295, June 25-29 1987.


A Parallel Functional Language Compiler for Message-Passing.. - Junaidu (1998)   (1 citation)

....Therefore the architectural requirements for a machine to support our model efficiently are that, it should support fast dynamic thread scheduling, provide tolerance to long range communication latencies and support cheap and rapid switching between multiple executable threads (cf. [NiAr89, ArIa87, CSS 91]). 1.6 Thesis structure The remainder of this thesis is structured as follows: Chapter 2 Presents a short history of computers and the evolution of methodologies for programming them leading to discussion of functional languages: their computational basis, sequential ....

....is linked, its execution triggered and the value of the program printed. 5.2 Runtime design framework The design theme underlying our implementation is efficient medium grained, data driven execution, which is widely believed to be essential for parallel processing on large scale MIMD machines [ArIa87, Nikh89, CSS 91, Osth93]. The architectural requirements needed to support our model efficiently is that of multi threaded, distributed memory machines consisting of a network of (conventional) processors which communicate with each other via asynchronous message passing. Each of the ....

[Article contains additional citation context not shown here]

Arvind and RA Iannucci, Two Fundamental Issues in Multiprocessing. In Proc. of 4th Int. DFVLR Seminar on Parallel Processing in Science and Engineering, Bonn, FRG, Springer Verlag LNCS 295, June 1987.


Loop Unfolding and Its Degree Determination for Multithreaded.. - Ha, Kim

....over several benchmarks, and compare simulation results with the equations. Keywords: multithreading, loop unfolding, loop unfolding degree, synchronization, communication 1 Introduction Multithreading is attractive because it can address two issues of memory latency and synchronization [2], which have been recognized as obstacles to scalable parallel architectures, by effectively overlapping computation with communication. In general, multithreaded architectures allow split phase memory operations and support fast context switching between computations without blocking processors. ....

Arvind, R. A. Iannucci, Two Fundamental Issues in Multiprocessing, CSG Memo 226-5, Lab. for Computer Science, MIT, Cambridge, MA 02139, 1986.


Hybrid Multithreaded Architecture with Symmetric Multiprocessors - Junghwan Kim

....and also high variance of utilization among units is reduced and overall utilization is increased. Keywords: multithreading, hybrid, symmetric multiprocessor, message passing, locality 1. Introduction Communication latency and synchronization have been traditional issues of parallel processing [1]. Today, many researchers become interested in multithreading idea to solve those two problems. Multithreading is very effective where remote memory references or synchronization is required, since it continues to utilize processor resource by thread switching. Multithreaded architecture is ....

Arvind, R.A. Iannucci, "Two Fundamental Issues in Multiprocessing," MIT CSG Memo 226-5, 1986.


SMALL: A Scalable Multithreaded Architecture to Exploit.. - Govindarajan, Nemawarkar (1992)   (1 citation)

....appropriate synchronization and scheduling of these three levels are provided to achieve high performance. Our architecture was influenced by Culler's Threaded Abstract Machine which also realizes the three levels of program hierarchy. Addressing the fundamental issues in multiprocessor design [3] using appropriate compilation technique rather than requiring elaborate hardware support has been the design philosophy of TAM. On the other hand these issues are addressed using a combination of compilation technique and appropriate hardware support in SMALL. Furthermore, a processor in ....

Arvind and R. A. Iannucci. Two fundamental issues in multiprocessing. Computation Structures Group Memo 226, MIT Laboratory for Computer Science, 1987.


Synchronized MIMD Computing - Kuszmaul (1994)   (3 citations)

....hardware for split phase global barriers, and described a scheme to broadcast the same program to every processor which would then run it locally. Split phase barriers were described for the FMP, and were also described by C. Polychronopoulos [Pol88] and R. Gupta [Gup89]. Arvind and R. Iannucci [AI87] present general justification for split phase operations. B. H. Lim [Lim91] performs a study to determine when it is better to simply wait and when it is better to try to use split phase transactions. The CM-5 does not include any message combining mechanisms in the data network. G. Pfister and ....

Arvind and Robert A. Iannucci. Two fundamental issues in multiprocessing. In Proceedings of DFVLR - Conference 1987 on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, W. Germany, Springer-Verlag LNCS 295, June 25-29 1987.


Analysis of Multithreaded Architectures for Parallel.. - Saavedra-Barrera..

....of such requests or perform other useful work while requests are outstanding. In uniprocessors, the former is realized by caches [Smit82] and the latter by dynamic hazard resolution logic [Ande67, Russ78]. Unfortunately, neither of these approaches extends trivially to large scale multiprocessors [Arvi87]. The latency avoidance strategy embodied in caches is attractive for multiprocessors because local copies of shared data are produced on demand. However, this replication leads to a thorny problem of maintaining a coherent image of the memory [Good83, Arga88]. Concrete solutions exist for small ....

....deepens our understanding of the behavior of this new class of machines. 2. Multithreaded Processor Model In this section, we present the architectural model and definitions used throughout the paper. We focus on one processor in a multiprocessor configuration and ignore issues of synchronization [Arvi87]. In addition, we assume the latency in processing a request is determined primarily by the machine structure and minimally affected by program behavior, so it is considered constant. We also assume that many outstanding requests can be issued and will be processed in a pipelined fashion. We say ....

Arvind, and Iannucci, R.A., "Two Fundamental Issues in Multiprocessing". Proc. of DFVLR - Conf. 1987 on Parallel Proc. in Sc. and Eng., West Germany, June 1987, pp. 61-88.


Supporting a Dynamic SPMD Model in a Multi-Threaded Architecture - Hum, Gao (1993)   (6 citations)

....A thread of control is a sequence of instructions which may be executed with some other threads of control in parallel. Two fundamental issues in multiprocessing using von Neumann style architectures are well known [3]: memory latency which is unavoidable in parallel machines; and the cost of synchronization which is high on a von Neumann machine which generally keeps a large processor state for currently executing tasks. Access to information held in memories of remote nodes can lead to high and unpredictable ....

Arvind and Robert A. Iannucci. Two fundamental issues in multiprocessing. In Parallel Computing in Science and Engineering, pages 61--88. Springer-Verlag, LNCS-295, 1987. Proceedings of the 4th International DFVLR Seminar on Foundations of Engineering Sciences, Bonn, June 1987.


Datarol: A Parallel Machine Architecture for Fine-Grain.. - Makoto Amamiya

....it scalable, that is, to achieve higher performance when more processors are added to the computer. In order to achieve this goal, processing element should be designed to exploit program parallelism as much as possible. For this purpose, we have to solve fundamental problems of multiprocessing [5] such as long memory access latency and remote process procedure call latency and waiting time for events synchronization. These problems are strongly related each other, and conventional control flow multiprocessor architectures attempt to make a compromise between them. Processor idling time, ....

Arvind, R. Iannucci, "Two Fundamental Issues in Multiprocessing," Proc. of DFVLR Conf. on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, Germany (1987).


The Named-State Register File - Nuth (1993)   (2 citations)

....this reason, processors for parallel computers must be able to efficiently run both parallel and sequential code. 1.2.3 Context Switching In spite of the large amount of parallelism available in many applications, there are several problems in running programs on large scale multicomputer systems [8]. The first problem is that most applications must pass data between physically separate components of a parallel computer system. As ever larger systems are built, the time required to communicate across the computer network increases. This communication latency has not kept pace with decreasing ....

Arvind and Robert A. Iannucci. "Two fundamental issues in multiprocessing." Technical report, Massachusetts Institute of Technology Laboratory for Computer Science, Cambridge, Massachusetts, May 1987.


Efficient Support of Location Transparency in Concurrent.. - Wooyoung Kim (1995)   (6 citations)

....overhead. The runtime system implements dynamic load balancing using location transparency and migration. • latency hiding in remote creation: A common way to reduce the inefficiency caused by the unpredictable remote creation time in fine grained multicomputers is split phase allocation [5]. An object is context switched to another when it requests a remote creation. However, split-phase allocation is not desirable in stock hardware multicomputers because of their high context switching cost. We propose an efficient scheme which masks latency in remote object creation without context ....

Arvind and Robert A. Iannucci. Two Fundamental Issues in Multiprocessing. In 4th International DFVLR Seminar on Foundations of Engineering Sciences, pages 61--88, 1987. LNCS 295.


Implementing a Programming Model Integrating Functional and.. - Juno Chang

....may suffer from inefficient synchronization mechanism and communication latencies. In contrast, dataflow model causes problems of excessive synchronization costs and inefficient execution of sequential programs while it offers the ability to exploit massively parallelism inherent in programs [1]. Therefore, hybrid multithreaded architectures are proposed to take advantages of two computational models. They not only preserves good single thread performance but also tolerates latency and synchronization costs. T[2] is a typical approach trying to achieve good single thread ....

Arvind and R.A.Iannucci, "Two Fundamental Issues in Multiprocessing," MIT CSG Memo 226-5, July, 1986.


Pace: A Prototype Design - Reynolds, Waite, Ieromnimon

....an unwelcome complication to the programmer who is required to think at two completely different levels of concern. We therefore advocate a fine grained approach to parallelism and special purpose hardware to support it. This has long been the position of workers on various dataflow projects [ArI87] [Kell86] [Traub91]. We further specialise our work to the support of functional languages implemented using a combinator graph reduction model. There were some successful early attempts at the production of serial hardware for the support of SK-combinator graph reduction in [Sto85] [Sch86] ....

....the evaluation model, a description of the PACE processor, and some initial results from simulation. We shall not be discussing connection topologies or the behaviour of large programs on large PACE set ups, this will have to await further work. 2 The PACE Evaluation Model It has been argued in [ArI87] that dataflow models can use fine grained multi tasking to hide network communication latency. In the PACE model we combine this potential with the synchronisation properties enjoyed by graph reduction models. Essentially, we treat the combinator graph as a dynamic form of dataflow graph, and ....

Arvind and R.A. Iannucci, "Two Fundamental Issues in Multiprocessing", CSG Memo 226-6, Lab for Computer Science, MIT, 1987.


An Analytical Solution for a Markov Chain Modeling.. - Saavedra-Barrera, Culler (1991)

....between the CPUs and the memory modules increases with the number of processors and eventually limits processor utilization. The interval between sending a request to memory and receiving the result is called memory latency, and it is considered one of the fundamental problems in parallel computing [Arvi87]. The use of cache memories, that have proved to be effective in eliminating latency in uniprocessors, do not appear to provide a satisfactory solution in systems containing a large number of processors. In addition, caches in multiprocessors have to be kept coherent and it is not clear that ....

Arvind, and Iannucci, R.A., "Two Fundamental Issues in Multiprocessing". Proc. of DFVLR - Conf. 1987 on Parallel Proc. in Sc. and Eng., West Germany, June 1987, pp. 61-88.


*T: A Multithreaded Massively Parallel Architecture - Nikhil, Papadopoulos, Arvind (1992)   (27 citations)

....size the best case scenario is that remote access time will vary with log N, where N is the number of nodes. We now present two problems which, we hope the reader will agree, a general purpose MPA must solve efficiently in order to have good performance over a wide variety of applications [6]. 2.1 The Latency of Remote Loads The remote load situation is illustrated in Figure 2. Variables A and B are located on nodes N2 and N3, respectively, and need .... [Figure 2: The remote load problem: Node N1 to compute difference of variables in nodes N2 and N3]

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. DFVLR - 1987 Conf. on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, W. Germany (Springer-Verlag LNCS 295), June 1987.


Partitioning a Lenient Parallel Language into Sequential Threads - Sangho Ha (1995)

....precedence relation. The simulation results show that our scheme reduces control and branch instructions effectively. 1 Introduction Parallel architectures with a number of off the shelf microprocessors should address performance degradation due to increased latency and synchronization overheads [2]. Currently, most architecture designers are trying to conquer these problems by aids of caching and multithreading. Multithreading is attractive in a large scale parallel system since it allows split phase memory operations and fast context switching between computations without blocking the ....

Arvind, R.A. Iannucci, "Two Fundamental Issues in Multiprocessing", MIT CSG Memo 226-5, 1986.


GUM: a portable parallel implementation of Haskell - Trinder, Hammond, Mattson.. (1996)   (39 citations)

....value, a PE packs (a copy of) nearby data into the reply, on the grounds that the requesting PE is likely to need it soon (Section 2.4). Since the sending PE retains its copy, locality is not lost. • All messages are asynchronous. The idea which is standard in the multithreading community [1] is that once a processor has sent a message it can forget all about it and schedule further threads or messages without waiting for a reply (Section 2.3.4). Notably, when a processor wishes to fetch data from another processor it sends a message whose reply can be arbitrarily delayed for ....

....instead it simply continues, or returns to the main scheduler. Indeed, sometimes the reply may be delayed a long time, if (for example) it requests the value of a remote thunk that is being evaluated by some other thread. These techniques are standard practice in the multithreading community [1]. 2.1 Thread Management A thread is a virtual processor. It is represented by a (heap-allocated) Thread State Object (TSO) containing slots for the thread's registers. The TSO in turn points to the thread's (heap-allocated) Stack Object (SO). As the thread's stack grows, further Stack Objects are ....

Arvind and Iannucci RA, "Two Fundamental Issues in Multiprocessing", Proc DFVLR Conference on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg (June 1987).


GUM: a portable parallel implementation of Haskell.. - Trinder, Hammond.. (1996)   (39 citations)

....value, a PE packs (a copy of) nearby data into the reply, on the grounds that the requesting PE is likely to need it soon (Section 2.4). Since the sending PE retains its copy, locality is not lost. • All messages are asynchronous. The idea which is standard in the multithreading community [1] is that once a processor has sent a message it can forget all about it and schedule further threads or messages without waiting for a reply (Section 2.3.4). Notably, when a processor wishes to fetch data from another processor it sends a message whose reply can be arbitrarily delayed for ....

....not await a reply; instead it simply continues, or returns to the main scheduler. Indeed, sometimes the reply may be delayed a long time, if (for example) it requests the value of a remote thunk that is being evaluated by some other thread. All of this is standard in the multithreading community [1]. 2.1 Thread Management A thread is a virtual processor. It is represented by a (heap-allocated) Thread State Object (TSO) containing slots for the thread's registers. The TSO in turn points to the thread's (heap-allocated) Stack Object (SO). As the thread's stack grows, further Stack Objects are ....

Arvind and Iannucci RA, "Two Fundamental Issues in Multiprocessing", Proc DFVLR Conference on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg (June 1987).


Compiling for Parallel Multithreaded Computation on Symmetric.. - Shaw (1998)

....the compilation to the linking phase. 1 The Id S benchmarks that we used to evaluate the language, compiler and run time system can be retrieved at http: www.csg.lcs.mit.edu shaw shaw phd id.tar.gz.

Year | System | Researchers and References
1987 | Id TTDA | Arvind, Nikhil, Iannucci, Traub, etc. [9] [10]
1989 | Id Monsoon | Papadopoulos and Culler [62] Hicks, Chiou, Ang, Arvind [43]
1988-1995 | Id partitioning | Traub [81] [82] Schauser [72] Coorg [22]
1991 | Id TAM | Culler, Goldstein, Schauser, von Eiken [25] [32]
1992 | Id P RISC | Nikhil [56]
1992 | EM C EM 4 | Sato, Sakai, etc. [66] [68]
1993 | Id ....

....support towards more conventional languages running on standard commercial hardware. 1.4.1 Direct ancestors to this research Figure 1.5 shows some of the systems which are direct ancestors to the Id97 system. The original implementation of Id was on the Tagged Token Dataflow Architecture (TTDA) [9] [10] [80] which was a simulated dataflow architecture. Each TTDA instruction synchronized on the arrival of its input values, and direct support for element wise data structure synchronization existed in the form of I-structure boards. Monsoon [62] was the direct successor of TTDA, and was ....

Arvind and Robert A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proceedings of DFVLR - Conference 1987 on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, W. Germany, Springer-Verlag LNCS 295, June 25-29 1987. TTDA, parallel processing, synchronization, latency, also CSG Memo 226-6.


GUM: a portable parallel implementation of Haskell - Hammond, Mattson, Jr.. (1995)   (39 citations)

....and the name of the game is seeing how far it extends to more realistic programs. Nevertheless, such tests provide an important sanity check: if the system does badly here then all is lost. • All messages are asynchronous. The idea which is standard in the multithreading community [2] is that once a processor has sent a message it can forget all about it and schedule further threads or messages without waiting for a reply (Section 2.3.2). Notably, when a processor wishes to fetch data from another processor it sends a message whose reply can be arbitrarily delayed for ....

Arvind and Iannucci RA, "Two Fundamental Issues in Multiprocessing", Proc DFVLR Conference on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg (June 1987).


Run-Time Parallelization: A Framework For Parallel Computation - Lawrence Rauchwerger (1995)   (8 citations)

....operations, were implemented in the hardware of the CDC 6600 and the IBM 360/91 [Tho71, Tom67]. Later more aggressive designs have incorporated a full empty bit [Smi87] to enforce data dependences on the fly. This type of memory structure has been exploited by the data flow computing model [AC86, AI86, AN86] in quite an original way. It is probably also the most fundamental model of exploiting parallelism. Another hardware approach that is useful for run time parallelization was taken by the Cedar project with its hardware synchronization primitives [ZYL83, ZY84, TZY84]. Today, microprocessors ....

Arvind and R. Iannucci. Two fundamental issues in multiprocessing. Technical report, Laboratory for Computer Science, M.I.T, Cambridge, MA, July 1986. Memo 226-5.


A Massively Parallel Multithreaded Architecture: DAVRID - Sangho Ha (1994)   (Correct)

....One of the problems which have bothered dataflow machine developers is that a dataflow machine needs special hardware and is radical to the common run of people. In addition, dataflow architectures cause problems of excessive synchronization costs and inefficient execution of sequential programs[1]. To make dataflow machines practical we cannot help combining dataflow computing rule with conventional, von Neu ....

Arvind and R.A. Iannucci, "Two Fundamental Issues in Multiprocessing," MIT CSG Memo 226-5, July 1986.


A System for Parallel Media Processing - Watlington, Bove, Jr. (1997)   (6 citations)  (Correct)

....to such tasks as matrix multiplication, convolution, and vector distance calculations (e.g. vector quantization or motion estimation). Another example is a reconfigurable processor, which may be rapidly configured to provide application-specific functionality [23] [8]. 1 According to Arvind [1], the two fundamental issues encountered in building a parallel processor computer system are: 1. The non-deterministic latency associated with accessing shared memory in a multiprocessor, and 2. Obtaining efficient synchronization of a process across multiple processors. Algorithm tasks are ....

Arvind and Robert A. Iannucci. Two fundamental issues in multiprocessing. In Proc. of DFVLR Conf. on Parallel Processing in Science and Eng., 1987. Also in Architectural


APRIL: A Processor Architecture for Multiprocessing - Agarwal (1990)   (186 citations)  (Correct)

....of 55 cycles. 1 Introduction The requirements placed on a processor in a large scale multiprocessing environment are different from those in a uniprocessing setting. A processor in a parallel machine must be able to tolerate high memory latencies and handle process synchronization efficiently [2]. This need increases as more processors are added to the system. Parallel applications impose processing and communication bandwidth demands on the parallel machine. An efficient and cost effective machine design achieves a balance between the processing power and the communication bandwidth ....

Arvind and Robert A. Iannucci. Two Fundamental Issues in Multiprocessing. Technical Report TM 330, MIT, Laboratory for Computer Science, October 1987.


Implementing Data-Parallel Software on Dataflow Hardware - Shaw (1993)   (2 citations)  (Correct)

....Multiple Data (SPMD) style, where data is distributed and the identical program is run on each processor. In a study by Geoffrey Fox, 83 out of 84 scientific problems addressed at the California Institute of Technology could be expressed in the SPMD style [22]. Dataflow computer architectures [4] [5] [26] [30] [43] have been primarily associated with functional programming and exploitation of control parallelism. However, the advantages of dataflow computers are not specific to efficient implementation of functional languages, and recent research has pointed towards a synthesis of dataflow ....

....low latency, high-bandwidth, fine grained network, and a fast processor network interface ability provide more general purpose, effective and sufficient mechanisms for the implementation of the data parallel communications on MIMD computers. 1.2 What is dataflow Dataflow computer architectures [4] [5] [25] [26] [30] [32] [40] [43] [58] began their existence as hardware to directly execute dataflow graphs. As such, much of the architectural work was concerned with how to handle token matching and instruction firing in dataflow graphs. Language work was primarily concerned with compilation of ....

Arvind and Robert A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proceedings of DFVLR - Conference 1987 on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, W. Germany, Springer-Verlag LNCS 295, June 25-29 1987. TTDA, parallel processing, synchronization, latency, also CSG Memo 226-6.


Active Messages: a Mechanism for Integrated.. - von Eicken.. (1992)   (540 citations)  (Correct)

....are not buffered except as required for network transport. Much like a traditional pipeline, the sender blocks until the message can be injected into the network and the handler executes immediately on arrival. Tolerating communication latency has been raised as a fundamental architectural issue[1]; this is not quite correct. The real architectural issue is to provide the ability to overlap communication and computation, which, in turn, requires low overhead asynchronous communication. Tolerating latency then becomes a programming problem: a communication must be initiated sufficiently in ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR - Conf. 1987 on Par. Proc. in Science and Eng., Bonn-Bad Godesberg, W. Germany, June 1987.


Fine-grain Parallelism with Minimal Hardware.. - Culler, Sah.. (1991)   (125 citations)  (Correct)

....hardware support or merely an appropriate compilation strategy and program representation. 1 Introduction Multithreading at the instruction level may provide the key to general purpose parallel computing[26] because it allows the processor to tolerate long, unpredictable communication latency [2, 4, 17, 24, 29]. In addition, this level of multithreading is required to support certain modern parallel programming languages[28] such as Id[20] and Multilisp[18] and extensions of more conventional languages with synchronizing data structures, e.g. I-structures[6]. On the other hand, asynchronous transfer ....

....Invoking a code block involves allocating a frame for local variables and for the continuation vector holding the list of enabled threads. That is, the instruction that issues a fetch request does not wait for the data value to be returned, instead the response will initiate a new execution thread[4, 6]. This allows the processor to be well utilized while remote requests are outstanding. In addition, a synchronization event may take place at the site of the accessed object, so the request latency is unbounded. 2.2 Activations An executing code block may invoke several code blocks concurrently, ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR - Conf. 1987 on Par. Proc. in Science and Eng., Bonn-Bad Godesberg, W. Germany, June 1987.


Analysis of Multithreaded Architectures for Parallel.. - Saavedra-Barrera..   (Correct)

....of such requests or perform other useful work while requests are outstanding. In uniprocessors, the former is realized by caches [Smit82] and the latter by dynamic hazard resolution logic [Ande67, Russ78]. Unfortunately, neither of these approaches extends trivially to large scale multiprocessors [Arvi87]. The latency avoidance strategy embodied in caches is attractive for multiprocessors because local copies of shared data are produced on demand. However, this replication leads to a thorny problem of maintaining a coherent image of the memory [Good83, Arga88]. Concrete solutions exist for small ....

....deepens our understanding of the behavior of this new class of machines. 2. Multithreaded Processor Model In this section, we present the architectural model and definitions used throughout the paper. We focus on one processor in a multiprocessor configuration and ignore issues of synchronization[Arvi87]. In addition, we assume the latency in processing a request is determined primarily by the machine structure and minimally affected by program behavior, so it is considered constant. We also assume that many outstanding requests can be issued and will be processed in a pipelined fashion. We say ....

Arvind, and Iannucci, R.A., "Two Fundamental Issues in Multiprocessing". Proc. of DFVLR - Conf. 1987 on Parallel Proc. in Sc. and Eng., West Germany, June 1987, pp. 61-88.


Two Fundamental Limits on Dataflow Multiprocessing - Culler, Schauser, von Eicken (1993)   (13 citations)  (Correct)

....Limits on Dataflow Multiprocessing David E. Culler Klaus Erik Schauser Thorsten von Eicken Report No. UCB CSD 92 716 Computer Science Division University of California, Berkeley Abstract: This paper examines the argument for dataflow architectures in Two Fundamental Issues in Multiprocessing[5]. We observe two key problems. First, the justification of extensive multithreading is based on an overly simplistic view of the storage hierarchy. Second, the local greedy scheduling policy embodied in dataflow is inadequate in many circumstances. A more realistic model of the storage hierarchy ....

....multithreading, latency tolerance, storage hierarchy, scheduling hierarchy. 1 Introduction The advantages of dataflow architectures were argued persuasively in a seminal 1983 paper by Arvind and Iannucci[4] and in a 1987 revision entitled Two Fundamental Issues in Multiprocessing [5]. However, reality has proved less favorable to this approach than their arguments would suggest. This motivates us to examine the line of reasoning that has driven dataflow architectures and fine grain multithreading to understand where the argument went awry. We observe two key problems. First, ....

[Article contains additional citation context not shown here]

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR - Conf. 1987 on Par. Proc. in Science and Eng., Bonn-Bad Godesberg, W. Germany, June 1987.


Performance Analysis of Multithreaded Architectures using an.. - Nemawarkar, Gao (1996)   (2 citations)  (Correct)

....multithreaded processing nodes. A processing node consists of the processor, a part of distributed shared memory, and a network interface. The latency to communicate with other nodes through the interconnection network subsystem significantly affects the performance of the multiprocessor system [8]. The latency of a subsystem is the response time of the subsystem to a request for an access. The longer the latency, the larger the waiting time for the computation requesting the access. This may decrease the processor utilization. Multithreading technique has been proposed to improve the ....

....the pipelining does not increase the capacity. Also, a higher locality in accesses increases the capacity. A higher capacity permits to achieve a higher U_p. 7 Network Latency The communication latency is considered as a fundamental cause for a decrease in performance of multiprocessor systems [8]. So, we analyze the variations of observed network latency S_obs with workload and architectural parameters. We also investigate how the network latency affects processor utilization of an MMS. 7.1 Parameter Characterization Given the input workload parameters and the architectural ....

Arvind and R.A. Iannucci. Two fundamental issues in multiprocessing. Computation Structures Group Memo 226, Laboratory for Computer Science, MIT, 1987.


JUNE-BUG: A Debugger for Parallel Programs on the ALEWIFE.. - Smoot (1990)   (Correct)

....[32] 3.1.2 The April Processor The April processor is designed to minimize the effects of communication and synchronization delays found in multiprocessors. It is very important that a processor in a multiprocessing environment tolerate high memory latencies and efficient process synchronization [28]. However, in scalable multiprocessor designs, high memory latency is unavoidable. APRIL hides the effects of both memory latency and synchronization waits by switching contexts. APRIL has a low context switch overhead to permit coarse grained multithreading. Coarse grained multithreading defeats ....

Robert A. Iannucci and Arvind. Two fundamental issues in multiprocessing. In Proceedings of DFVLR - Conference 1987 on Parallel Processing in Science and Engineering, Springer-Verlag LNCS 295, Bonn-Bad Godesberg, West Germany, June 1987. Also MIT LCS Computation Group Memo 226-6.


Compiler-Controlled Multithreading for Lenient Parallel.. - Schauser, Culler, von.. (1991)   (36 citations)  (Correct)

....Multithreaded execution appears to be a key ingredient in general purpose parallel computing systems. Many researchers suggest that processors should support multiple instruction streams and switch very rapidly between them in response to remote memory reference latencies or synchronization [AI87, Smi90, HF88, ALKK90, ACC 90]. However, the proposed architectural solutions make thread scheduling invisible to the compiler, preventing it from applying optimizations that might reduce the cost of thread switching or improve scheduling based on analysis of the program. Inherently parallel ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR - Conf. 1987 on Par. Proc. in Science and Eng., Bonn-Bad Godesberg, W. Germany, June 1987.


An Efficient Implementation Scheme of Concurrent Object-Oriented.. - Taura (1993)   (28 citations)  (Correct)

....mail address. Since the latency of remote communication is unpredictable, it is not acceptable to wait until a pointer to the allocated memory (chunk) is returned. Split phase allocation is preferable on fine grain machines where context switching to another thread effectively hides the latency[1]. On conventional multicomputers, however, split phase allocation is undesirable because of high context-switching overhead. Rather, a scheme whereby the object can continue execution even in the presence of remote allocation request is preferable. For this purpose, we employ a prefetch scheme ....

Arvind and Robert A. Iannucci. Two fundamental issues in multiprocessing. In 4th International DFVLR Seminar on Foundations of Engineering Sciences, volume 295 of Lecture Notes in Computer Science, pages 61--88, 1987.


Interprocessor Communications and the McGill Multiprocessor. . . - Monti (1991)   (Correct)

....from conventional processors. Long memory communication latencies, unavoidable in parallel machines, considerably degrade their performance. Furthermore, conventional processors have failed to provide inexpensive synchronization mechanisms for task switching, also frequent in a parallel machine [1, 2]. The design of a parallel system must be based on a sound model of parallel computation, from its programming model down to its architecture. This conviction has recently led to the introduction of novel forms of computer architecture, in an attempt to eliminate the crucial von Neumann ....

....the reader is familiar with the basics of dataflow computing. Objective of the Research In 1987, Arvind and Iannucci write: The two most important characteristics of the dataflow processor are split phase memory operations and the ability to put aside computations without blocking the processor [1]. These are some of the main reasons why dataflow architectures, static or dynamic, show great potential in multiprocessor applications: they can support the implementation of interprocessor data communications and instruction synchronization without the overhead of context switching and ....

Arvind and R. A. Iannucci. Two fundamental issues in multiprocessing. Computation Structures Group Memo 226, Laboratory for Computer Science, MIT, 1987.


Performance Studies of Id on the Monsoon Dataflow System - Hicks, Chiou, Ang, Arvind (1994)   (21 citations)  Self-citation (Arvind)   (Correct)

....frame is accessed only locally, it is straightforward to cache frames. Caching of heap store, however, raises the usual multiprocessor cache coherence issues. To exploit parallelism effectively in this model, two architectural issues memory latency and synchronization have to be addressed [5, 27]. Dataflow architectures offer a solution to these problems as will be discussed in the next section. Whenever a procedure is invoked, a frame or a set of frames (in case it is a loop procedure) needs to be allocated. Since the frame store is tied to processors, distribution of work depends upon ....

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proceedings of 4th International DFVLR Seminar on "Parallel Processing in Science and Engineering", Bonn, FRG, volume 295 of Lecture Notes in Computer Science. Springer-Verlag, June 1987.


Sparse Matrix Solvers on the GPU: Conjugate Gradients and.. - Jeff Bolz Ian   (Correct)

No context found.

ARVIND, AND IANNUCCI, R. A. 1987. Two Fundamental Issues in Multiprocessing.


Mitsubishi Electric Research Laboratories - Cambridge Research Center   (Correct)

No context found.

Arvind and R. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR - Conf. on Parallel Processing in Science and Eng., June 1987.


Unknown - Wul Wulf Compilers   (Correct)

No context found.

Arvind and R. A. Iannucci. Two Fundamental Issues in Multiprocessing. In Proc. of DFVLR Conf. 1987 on Parallel Processing in Science and Engineering, Bonn-Bad Godesberg, W. Germany, June 1987.


Building Multithreaded Architectures with Off-the-Shelf.. - Hum, al. (1993)   (15 citations)  (Correct)

No context found.

Arvind and Robert A. Iannucci. Two fundamental issues in multiprocessing. In Parallel Computing in Science and Engineering, number 295 in Lecture Notes in Computer Science, pages 61--88. Springer-Verlag, 1987. Proceedings of the 4th International DFVLR Seminar on Foundations of Engineering Sciences, Bonn, West Germany, June 25--26, 1987.
