Abstract: Much of the improvement in computer performance over the last twenty years has come from
faster transistors and architectural advances that increase parallelism. Smaller feature sizes have
decreased the transistor switching time but at the same time increased the resistance of interconnect
wires, resulting in slower signal transmission in on-chip wiring. Since future chips will have
more silicon area and include more execution units, a much larger demand for parallelism is emerging.
However,...
Context of citations to this paper:
...exotic, huge size problems. Few existing architectures are able to take advantage of fine grain parallelism, although a recent study [8] shows that it is readily available even in common benchmark programs. Figure 1 1 (reproduced from pp.113 [8] courtesy of Stephen Keckler)...
...the inter-cluster communication mechanisms to reduce the communication and synchronization overhead. This approach has been shown [18] [16] to be very effective, achieving noticeably higher speedups than memory-based communication on a number of fine-grained applications. Figure...
Cited by:
Processor Mechanisms for Software Shared Memory - Carter
Mechanisms for Efficient, Protected Messaging - Lee
Similar documents (at the sentence level):
7.4%: Exploiting Fine-Grain Thread Level Parallelism on.. - Keckler, Dally.. (1998)
Active bibliography (related documents):
0.5: Exploiting Load Latency Tolerance in Dynamically.. - Srikanth Srinivasan.. (1998)
0.5: Which Algorithms Are Feasible? Maxent Approach - Cooke, Kreinovich.. (1998)
0.3: Reducing The Impact Of Register Pressure On Software Pipelined Loops - Llosa (1996)
Similar documents based on text:
0.2: ADAM: A Decentralized Parallel Computer Architecture Featuring.. - Huang (2002)
0.2: Building Grounded Abstractions for Artificial Intelligence.. - Hearn (2001)
Related documents from co-citation:
2: The DASH prototype: Implementation and performance - Lenoski, Laudon et al. - 1992
2: Hardware Support for Fast Capability-based Addressing - Carter, Keckler et al. - 1994
2: Integration of Message Passing and Shared Memory in the Stanford FLASH Multiproc.. - Heinlein, Gharachorloo et al. - 1994
BibTeX entry:
Steve Keckler, "Fast Thread Communication and Synchronization Mechanisms for a Scalable Single Chip Multiprocessor", PhD. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1998. http://citeseer.ist.psu.edu/keckler98fast.html
@misc{ keckler98fast,
author = "S. Keckler",
title = "Fast Thread Communication and Synchronization Mechanisms for a Scalable
Single Chip Multiprocessor",
text = "Steve Keckler, Fast Thread Communication and Synchronization Mechanisms
for a Scalable Single Chip Multiprocessor, PhD. Thesis, Department of Electrical
Engineering and Computer Science, Massachusetts Institute of Technology,
1998.",
year = "1998",
url = "citeseer.ist.psu.edu/keckler98fast.html" }
Citations (may not include all citations):
353: Software pipelining: An effective scheduling technique for V.. - Lam - 1988
269: Multiscalar processors - Sohi, Breach et al. - 1995
251: Simultaneous multithreading: Maximizing on-chip parallelism - Tullsen, Eggers et al. - 1995
230: Limits of instruction-level parallelism - Wall - 1991
222: The SGI Origin: a ccNUMA highly scalable server - Laudon, Lenoski - 1997
164: The network architecture of the connection machine CM - Leiserson, Abuhamdeh et al. - 1996
158: Effective compiler support for predicated execution using th.. - Mahlke, Lin et al. - 1992
156: The multiflow trace scheduling compiler - Lowney, Freudenberger et al. - 1993
150: An efficient algorithm for exploiting multiple arithmetic un.. - Tomasulo - 1967
134: The Verilog Hardware Description Language - Thomas, Moorby - 1991
127: A multithreaded massively parallel architecture - Nikhil, Papadopoulos - 1992
121: Monsoon: an explicit token-store architecture - Papadopoulos, Culler - 1990
104: for semiconductors. Semiconductor Industry Association - technology - 1997
80: Machine multicomputer: An architectural evaluation - Noakes, Wallach et al. - 1993
66: Boosting beyond static scheduling in a superscalar processor - Smith, Lam et al. - 1990
60: and bandwidth in a cluster architecture - Martin, Vahdat et al. - 1997
38: Evaluation of design alternatives for a multiprocessor micro.. - Nayfeh, Hammond et al. - 1996
6: The mercury interconnect architecture: A cost-effective infr.. - Weber, Gold et al. - 1997
3: LIW microprocessor for multicomputers - Peterson, Sutton et al. - 1991
2: protected message interface in the MIT M-Machine - Lee, Dally et al. - 1998
2: Limitation of superscalar microprocessor performance - Tran, Wu - 1992
2: Lithography and the future of Moore's law - Moore - 1995
1: Robbins and Steven Robbins - Kay - 1987
Documents on the same site (http://cva.stanford.edu/cva_publications.html):
An Assembler and Linker System for the M-Machine Software Project - Gurevich (1994)
Efficient, Protected Message Interface in the MIT.. - Lee, Dally, Keckler..
A Coupled Multi-ALU Processing Node for a Highly Parallel Computer - Keckler (1992)
CiteSeer.IST - Copyright Penn State and NEC