Abstract: The technology to implement a single-chip node composed of four high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple ALUs to exploit both instruction-level and inter-thread parallelism by using compile-time and runtime scheduling. The compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting ...
Context of citations to this paper:
...level to the process level. The 12 function units in a single M Machine node are controlled using a form of Processor Coupling [18] to exploit instruction level parallelism by executing 12 operations from the same thread, or to exploit thread level parallelism by...
...one thread to be filled by instructions from another. An extreme model of multithreading, variously proposed as processor coupling [Keckler92], parallel multithreading [Hirata92] and simultaneous multithreading [Tullsen95] allows multiple threads to issue instructions during...
Cited by:
Proceedings of 12th Intl Conference on Parallel.. - Initial Observations Of
Efficient Remapping Mechanisms for an Adaptable Memory System - Zhang (2002)
Exploiting Thread-Level Parallelism On . . . - Lo (1998)
Similar documents (at the sentence level):
39.7%: A Coupled Multi-ALU Processing Node for a Highly Parallel Computer - Keckler (1992)
Active bibliography (related documents):
0.5: Partitioning Non-strict Functional Languages for Multi-threaded.. - Coorg (1995)
0.2: HPF-2 Scope of Activities and Motivating Applications - Forum (1994)
0.2: Parallel Model Valuation for Circuit Simulation on the.. - Agrawal, Goil, Liu..
Similar documents based on text:
0.3: No. 2, April 1992, pp. 23-39. - Alverson Callahan Cummings
0.2: Register Organization for Media Processing - Rixner, Dally, Khailany.. (2000)
0.2: The Verification of a Bit-slice ALU - Hunt, Jr., Brock (1989)
Related documents from co-citation:
23: The Tera computer system - Alverson, Callahan et al. - 1990
22: Simultaneous multithreading: Maximizing on-chip parallelism - Tullsen, Eggers et al. - 1995
20: Multiscalar processors - Sohi, Breach et al. - 1995
BibTeX entry:
Stephen W. Keckler and William J. Dally. Processor coupling: Integrating compile time and runtime scheduling for parallelism. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 202--213, May 19--21, 1992. http://citeseer.ist.psu.edu/keckler92processor.html
@inproceedings{ keckler92processor,
author = "Stephen W. Keckler and William J. Dally",
title = "Processor coupling: integrating compile time and runtime scheduling for parallelism",
booktitle = "Proceedings of the 19th Annual International Symposium on Computer Architecture",
address = "Gold Coast, Australia",
pages = "202--213",
year = "1992",
url = "citeseer.ist.psu.edu/keckler92processor.html" }
Citations (may not include all citations):
358: The Tera computer system - ALVERSON, CALLAHAN et al. - 1990
353: Software pipelining: An effective scheduling technique for V.. - LAM - 1988
193: Superscalar Microprocessor Design - JOHNSON - 1991
173: Bulldog: A Compiler for VLIW Architectures - ELLIS - 1986
157: Architecture and applications of the HEP multiprocessor comp.. - SMITH - 1981
150: An efficient algorithm for exploiting multiple arithmetic un.. - TOMASULO - 1967
130: A VLIW architecture for a trace scheduling compiler - COLWELL, NIX et al. - 1988
110: Available instruction level parallelism for superscalar and .. - JOUPPI, WALL - 1989
72: MASA: A multithreaded processor architecture for parallel sy.. - HALSTEAD, FUJITA - 1988
55: Exploring the benefits of multiple hardware contexts in a mu.. - GUPTA, WEBER - 1989
32: A variable instruction stream extension to the VLIW architec.. - WOLFE, SHEN - 1991
25: Instruction-level parallel processing - FISHER, RAU - 1991
22: The Horizon supercomputing system: Architecture and software - KUEHN, SMITH - 1988
20: Architecture and implementation of a VLIW supercomputer - COLWELL, HALL et al. - 1990
16: A mechanism for efficient context switching - NUTH, DALLY - 1991
13: Annual Reviews in Computer Science - CULLER, architectures - 1986
4: Circuit simulation on shared memory multiprocessors - SADAYAPPAN, VISVANATHAN - 1988
2: Von Neumann hybrid architecture - IANUCCI, Toward - 1988
1: A coupled multi-ALU processing node for a highly parallel co.. - KECKLER - 1992
Documents on the same site (http://cva.stanford.edu/cva_publications.html):
An Assembler and Linker System for the M-Machine Software Project - Gurevich (1994)
Fast Thread Communication and Synchronization Mechanisms for a.. - Keckler (1998)
Efficient, Protected Message Interface in the MIT.. - Lee, Dally, Keckler..
CiteSeer.IST - Copyright Penn State and NEC