Results 1 - 10
of
19,712
Memory access buffering in multiprocessors
- In Proceedings of the 13th Annual International Symposium on Computer Architecture
, 1986
"... In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access la-tency and to take advantage of memory interleaving. Lock-up free caches are designed to avoid processor blocking on a cache miss. W ..."
Abstract
-
Cited by 254 (4 self)
- Add to MetaCart
In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access la-tency and to take advantage of memory interleaving. Lock-up free caches are designed to avoid processor blocking on a cache miss
Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
- In Proceedings of the 17th Annual International Symposium on Computer Architecture
, 1990
"... Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the f ..."
Abstract
-
Cited by 730 (17 self)
- Add to MetaCart
Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory
Cache memories
- ACM Computing Surveys
, 1982
"... Cache memories are used in modern, medium and high-speed CPUs to hold temporarily those portions of the contents of main memory which are {believed to be) currently in use. Since instructions and data in cache memories can usually be referenced in 10 to 25 percent of the time required to access main ..."
Abstract
-
Cited by 688 (11 self)
- Add to MetaCart
Cache memories are used in modern, medium and high-speed CPUs to hold temporarily those portions of the contents of main memory which are {believed to be) currently in use. Since instructions and data in cache memories can usually be referenced in 10 to 25 percent of the time required to access
Composable memory transactions
- In Symposium on Principles and Practice of Parallel Programming (PPoPP
, 2005
"... Atomic blocks allow programmers to delimit sections of code as ‘atomic’, leaving the language’s implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multiprocessor performance without requiring changes ..."
Abstract
-
Cited by 509 (43 self)
- Add to MetaCart
changes to the basic structure of objects in the heap. However, these implementations perform poorly because they interpose on all accesses to shared memory in the atomic block, redirecting updates to a thread-private log which must be searched by reads in the block and later reconciled with the heap when
A theory of memory retrieval
- PSYCHOL. REV
, 1978
"... A theory of memory retrieval is developed and is shown to apply over a range of experimental paradigms. Access to memory traces is viewed in terms of a resonance metaphor. The probe item evokes the search set on the basis of probe-memory item relatedness, just as a ringing tuning fork evokes sympath ..."
Abstract
-
Cited by 769 (83 self)
- Add to MetaCart
A theory of memory retrieval is developed and is shown to apply over a range of experimental paradigms. Access to memory traces is viewed in terms of a resonance metaphor. The probe item evokes the search set on the basis of probe-memory item relatedness, just as a ringing tuning fork evokes
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
- ACM Transactions on Computer Systems
, 1991
"... Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become marke ..."
Abstract
-
Cited by 573 (32 self)
- Add to MetaCart
markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible
Implementation and performance of Munin
- IN PROCEEDINGS OF THE 13TH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES
, 1991
"... Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, sha ..."
Abstract
-
Cited by 587 (22 self)
- Add to MetaCart
to keep memory consistent. Munin's multiprotocol release consistency is implemented in software using a delayed update queue that buffers and merges pending outgoing writes. A sixteen-processor prototype of Munin is currently operational. We evaluate its implementation and describe the execution
Long Short-term Memory
, 1995
"... "Recurrent backprop" for learning to store information over extended time intervals takes too long. The main reason is insufficient, decaying error back flow. We briefly review Hochreiter's 1991 analysis of this problem. Then we overcome it by introducing a novel, efficient method c ..."
Abstract
-
Cited by 454 (58 self)
- Add to MetaCart
called "Long Short Term Memory" (LSTM). LSTM can learn to bridge minimal time lags in excess of 1000 time steps by enforcing constant error flow through internal states of special units. Multiplicative gate units learn to open and close access to constant error flow. LSTM's update
An integrated theory of the mind
- PSYCHOLOGICAL REVIEW
, 2004
"... There has been a proliferation of proposed mental modules in an attempt to account for different cognitive functions but so far there has been no successful account of their integration. ACT-R (Anderson & Lebiere, 1998) has evolved into a theory that consists of multiple modules but also explain ..."
Abstract
-
Cited by 780 (73 self)
- Add to MetaCart
explains how they are integrated to produce coherent cognition. The perceptual-motor modules, the goal module, and the declarative memory module are presented as examples of specialized systems in ACT-R. These modules are associated with distinct cortical regions. These modules place chunks in buffers
Vogels, U-Net: a user-level network interface for parallel and distributed computing, in:
- Proceedings of the 15th ACM Symposium on Operating System Principles, ACM,
, 1995
"... Abstract The U-Net communication architecture provides processes with a virtual view of a network device to enable user-level access to high-speed communication devices. The architecture, implemented on standard workstations using off-the-shelf ATM communication hardware, removes the kernel from th ..."
Abstract
-
Cited by 597 (17 self)
- Add to MetaCart
Abstract The U-Net communication architecture provides processes with a virtual view of a network device to enable user-level access to high-speed communication devices. The architecture, implemented on standard workstations using off-the-shelf ATM communication hardware, removes the kernel from
Results 1 - 10
of
19,712