A Multiprocessor Memory Processor for Efficient Sharing And Access Coordination

by David M. Koppelman
"... The growing disparity between instruction issue rates and memory access speed impacts multiprocessors especially hard under certain circumstances. To alleviate the problem a system is described here in which smart memory chips can execute simple operations so that certain tasks can be completed with ..."

Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors

by John M. Mellor-Crummey, Michael L. Scott - ACM Transactions on Computer Systems, 1991
"... Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally ..."
Cited by 573 (32 self)

Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors

by Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, John Hennessy - In Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990
"... Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the f ..."
Cited by 730 (17 self)

Multiscalar Processors

by Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar - In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
Cited by 589 (30 self)

Memory Coherence in Shared Virtual Memory Systems

by Kai Li, Paul Hudak, 1989
"... This paper studies the memory coherence problem in designing and implementing a shared virtual memory on loosely coupled multiprocessors. Two classes of algorithms for solving the problem are presented. A prototype shared virtual memory on an Apollo ring has been implemented based on these a ..."
Cited by 957 (17 self)

Software Transactional Memory

by Nir Shavit, Dan Touitou, 1995
"... As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load Linked/Store Conditional operation on a single word. Building on the hardware based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a ..."
Cited by 695 (14 self)

Composable memory transactions

by Tim Harris, Mark Plesko, Avraham Shinnar, David Tarditi - In Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005
"... Atomic blocks allow programmers to delimit sections of code as ‘atomic’, leaving the language’s implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multiprocessor performance without requiring changes ..."
Cited by 509 (43 self)

The Case for a Single-Chip Multiprocessor

by Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, Kunyung Chang - IEEE Computer, 1996
"... Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we look for new ways to use their capabilities effectively. This paper shows that in advanced technologies it is possible to implement a single-chip multiprocessor in the same area as a wide issue superscalar processor. We find that for applications with little parallelism the performance of the two microarchitectures is comparable. For applications with large amounts of parallelism at both the fine and coarse grained levels ..."
Cited by 440 (6 self)

The Stanford FLASH multiprocessor

by Jeffrey Kuskin, David Ofelt, Mark Heinrich, John Heinlein, Richard Simoni, Kourosh Gharachorloo, John Chapin, David Nakahira, Joel Baxter, Mark Horowitz, Anoop Gupta, Mendel Rosenblum, John Hennessy - In Proceedings of the 21st International Symposium on Computer Architecture, 1994
"... The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine’s global memory, a port to the interconnection n ..."
Cited by 349 (20 self)