Results 1 - 10 of 18,284
A Multiprocessor Memory Processor for Efficient Sharing And Access Coordination
"... The growing disparity between instruction issue rates and memory access speed impacts multiprocessors especially hard under certain circumstances. To alleviate the problem a system is described here in which smart memory chips can execute simple operations so that certain tasks can be completed with ..."
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
- ACM Transactions on Computer Systems, 1991
"... Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally ..."
- Cited by 573 (32 self)
Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
- In Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990
"... Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the f ..."
- Cited by 730 (17 self)
Multiscalar Processors
- In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
- Cited by 589 (30 self)
Memory Coherence in Shared Virtual Memory Systems
- 1989
"... This paper studies the memory coherence problem in designing and implementing a shared virtual memory on loosely coupled multiprocessors. Two classes of algorithms for solving the problem are presented. A prototype shared virtual memory on an Apollo ring has been implemented based on these a ..."
- Cited by 957 (17 self)
Software Transactional Memory
- 1995
"... As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load Linked/Store Conditional operation on a single word. Building on the hardware-based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a ..."
- Cited by 695 (14 self)
Composable memory transactions
- In Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005
"... Atomic blocks allow programmers to delimit sections of code as ‘atomic’, leaving the language’s implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multiprocessor performance without requiring changes ..."
- Cited by 509 (43 self)
The Case for a Single-Chip Multiprocessor
- IEEE Computer, 1996
"... Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we look for new ways to use their capabilities effectively. This paper shows that in advanced technologies it is possible to implement a single-chip multiprocessor in the same area as a wide issue superscalar processor. We find that for applications with little parallelism the performance of the two microarchitectures is comparable. For applications with large amounts of parallelism at both the fine and coarse grained levels ..."
- Cited by 440 (6 self)
The Stanford FLASH multiprocessor
- In Proceedings of the 21st International Symposium on Computer Architecture, 1994
"... The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine’s global memory, a port to the interconnection n ..."
- Cited by 349 (20 self)