
CiteSeerX

Results 1 - 10 of 1,152

Memory access buffering in multiprocessors

by Michel Dubois, Christoph Scheurich, Faye Briggs - In Proceedings of the 13th Annual International Symposium on Computer Architecture, 1986
"... In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access latency and to take advantage of memory interleaving. Lock-up free caches are designed to avoid processor blocking on a cache miss ..."
Abstract - Cited by 254 (4 self)
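The lock-up-free caches the abstract mentions are usually built around miss-status holding registers (MSHRs), which track outstanding misses so the processor can keep issuing requests instead of stalling. A minimal sketch of that bookkeeping (the structure, return values, and MSHR count are illustrative, not from the paper):

```python
# Illustrative sketch: a lock-up-free cache keeps accepting requests during a
# miss by recording each outstanding miss in an MSHR rather than blocking.

class LockupFreeCache:
    def __init__(self, num_mshrs=4):
        self.lines = set()          # block addresses currently cached
        self.mshrs = {}             # block -> list of requests waiting on it
        self.num_mshrs = num_mshrs

    def access(self, block, req_id):
        """Returns 'hit', 'miss-issued', 'miss-merged', or 'stall'."""
        if block in self.lines:
            return "hit"
        if block in self.mshrs:              # secondary miss: merge, no new fetch
            self.mshrs[block].append(req_id)
            return "miss-merged"
        if len(self.mshrs) >= self.num_mshrs:
            return "stall"                   # all MSHRs busy: must block after all
        self.mshrs[block] = [req_id]         # primary miss: issue the fetch
        return "miss-issued"

    def fill(self, block):
        """Memory returned the block: wake every merged request."""
        waiting = self.mshrs.pop(block, [])
        self.lines.add(block)
        return waiting

cache = LockupFreeCache(num_mshrs=2)
print(cache.access(0x40, 1))   # miss-issued
print(cache.access(0x40, 2))   # miss-merged (same block, one fetch)
print(cache.access(0x80, 3))   # miss-issued
print(cache.access(0xC0, 4))   # stall (MSHRs exhausted)
print(cache.fill(0x40))        # [1, 2]
print(cache.access(0x40, 5))   # hit
```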

The Cache Performance and Optimizations of Blocked Algorithms

by Monica S. Lam, Edward E. Rothberg, Michael E. Wolf - In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 1991
"... Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchies. Instead of operating on entire rows or columns of an array, blocked algorithms operate on submatrices or blocks, so that data loaded into the faster levels of the memory hierarchy are reused ..."
Abstract - Cited by 574 (5 self)
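The transformation the abstract describes can be sketched for matrix multiplication: instead of streaming whole rows, the loops walk B x B submatrices so each block stays cache-resident while it is reused. The matrix size N and block size B here are illustrative:

```python
# A minimal loop-blocking (tiling) sketch for C = A * B on N x N matrices.
N, B = 8, 4

def matmul_blocked(A, Bm, C):
    for ii in range(0, N, B):
        for jj in range(0, N, B):
            for kk in range(0, N, B):
                # Everything below touches only B x B blocks of A, Bm, and C,
                # so each block is reused many times before being evicted.
                for i in range(ii, ii + B):
                    for j in range(jj, jj + B):
                        s = C[i][j]
                        for k in range(kk, kk + B):
                            s += A[i][k] * Bm[k][j]
                        C[i][j] = s

A = [[1.0] * N for _ in range(N)]
Bm = [[1.0] * N for _ in range(N)]
C = [[0.0] * N for _ in range(N)]
matmul_blocked(A, Bm, C)
print(C[0][0])   # 8.0: each entry is a dot product of length N
```

The arithmetic is identical to the unblocked triple loop; only the traversal order changes, which is exactly why blocking is a pure locality optimization.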

Hitting the Memory Wall: Implications of the Obvious

by Wm. A. Wulf, Sally A. McKee - Computer Architecture News, 1995
"... This brief note points out something obvious --- something the authors "knew" without really understanding. With apologies to those who did understand, we offer it to those others who, like us, missed the point. We all know that the rate of improvement in microprocessor speed exceeds the rate ... an issue, downstream someplace it will be a much bigger one. How big and how soon? The answers to these questions are what the authors had failed to appreciate. To get a handle on the answers, consider an old friend --- the equation for the average time to access memory, where t_c and t_m are the cache ..."
Abstract - Cited by 393 (1 self)
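The "old friend" the snippet refers to is the standard average-memory-access-time equation, t_avg = p * t_c + (1 - p) * t_m, with p the cache hit rate. A sketch with illustrative latencies:

```python
# Average memory access time: hit rate p, cache latency t_c, memory latency t_m.
def avg_access_time(p, t_c, t_m):
    return p * t_c + (1 - p) * t_m

# Even a 99%-hit cache is dominated by the miss term as DRAM falls further
# behind the processor -- the note's "memory wall" in one line of arithmetic.
print(round(avg_access_time(0.99, 1, 100), 2))    # 1.99 cycles
print(round(avg_access_time(0.99, 1, 1000), 2))   # 10.99 cycles
```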

Geometric Compression through Topological Surgery

by Gabriel Taubin, Jarek Rossignac - ACM Transactions on Graphics, 1998
"... this article introduces a new compressed representation for complex triangulated models and simple, yet efficient, compression and decompression algorithms. In this scheme, vertex positions are quantized within the desired accuracy, a vertex spanning tree is used to predict the position of each vertex from 2, 3, or 4 of its ancestors in the tree, and the correction vectors are entropy encoded. Properties, such as normals, colors, and texture coordinates, are compressed in a similar manner. The connectivity is encoded with no loss of information to an average of less than two bits per triangle ..."
Abstract - Cited by 283 (28 self)
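The predict-and-correct step can be sketched as follows. The quantization step, the two-ancestor linear predictor, and the coordinates are illustrative choices, not the paper's exact coefficients:

```python
# Sketch: quantize vertex positions, predict each vertex from its spanning-tree
# ancestors, and keep only the small integer correction vectors (which the
# real scheme then entropy-codes).

STEP = 0.01   # quantization accuracy (illustrative)

def quantize(v):
    return tuple(round(c / STEP) for c in v)

def predict(a, b):
    # Linear extrapolation from two ancestors (one possible predictor).
    return tuple(2 * b[i] - a[i] for i in range(3))

# Vertex positions along one path of the vertex spanning tree.
path = [(0.0, 0.0, 0.0), (0.11, 0.0, 0.0), (0.19, 0.02, 0.0), (0.31, 0.01, 0.0)]
q = [quantize(v) for v in path]

corrections = []
for k in range(2, len(q)):
    pred = predict(q[k - 2], q[k - 1])
    corrections.append(tuple(q[k][i] - pred[i] for i in range(3)))

print(corrections)   # small residuals near zero, cheap to entropy-code
```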

LimitLESS Directories: A Scalable Cache Coherence Scheme

by David Chaiken, John Kubiatowicz, Anant Agarwal, 1991
"... Caches enhance the performance of multiprocessors by reducing network traffic and average memory access latency. However, cache-based systems must address the problem of cache coherence. We propose the LimitLESS directory protocol to solve this problem. The LimitLESS scheme uses a combination of hardware ..."
Abstract - Cited by 224 (29 self)
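A minimal sketch of the limited-directory idea, assuming the LimitLESS combination is a few hardware sharer pointers per block backed by a software-maintained list once a block has more sharers than the pointers can hold (the pointer count and API here are illustrative):

```python
# Sketch: a directory entry tracks sharers in hardware pointers for the common
# case, and traps to software only when a block is widely shared.

HW_POINTERS = 2   # illustrative; real designs pick this from sharing statistics

class DirectoryEntry:
    def __init__(self):
        self.pointers = []        # hardware sharer pointers
        self.sw_list = None       # software-maintained full sharer set

    def add_sharer(self, node):
        if self.sw_list is not None:
            self.sw_list.add(node)               # already overflowed: software path
            return "software"
        if len(self.pointers) < HW_POINTERS:
            self.pointers.append(node)           # common case: pure hardware
            return "hardware"
        # Overflow: trap and move the hardware pointers into a software list.
        self.sw_list = set(self.pointers) | {node}
        self.pointers = []
        return "trap-to-software"

entry = DirectoryEntry()
print(entry.add_sharer(0))   # hardware
print(entry.add_sharer(1))   # hardware
print(entry.add_sharer(2))   # trap-to-software
print(entry.add_sharer(3))   # software
```

The point of the design is that the trap path is rare, so the directory gets the scalability of a full map at close to the hardware cost of a small limited directory.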

List Processing in Real Time on a Serial Computer

by Henry G. Baker - Comm. ACM, 1977
"... A real-time list processing system is one in which the time required by the elementary list operations (e.g. CONS, CAR, CDR, RPLACA, RPLACD, EQ, and ATOM in LISP) is bounded by a (small) constant. Classical implementations of list processing systems lack this property because allocating a list cell from the heap may cause a garbage collection, a process that requires time proportional to the heap size to finish. A real-time list processing system is presented which continuously reclaims garbage, including directed cycles, while linearizing and compacting the accessible cells into contiguous ..."
Abstract - Cited by 228 (14 self)
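The problem the abstract describes (not Baker's solution) can be sketched with a toy mark-sweep heap: a single CONS that finds the free list empty must trace and sweep every cell, so its cost grows with the heap rather than staying constant:

```python
# Sketch of why classical allocation is not real-time: one CONS can trigger
# a collection that scans the entire heap. Sizes are illustrative.

HEAP = 1000

class Heap:
    def __init__(self):
        self.cells = [None] * HEAP        # each live cell is a (value, next) pair
        self.free = list(range(HEAP))
        self.roots = []                   # cell indices reachable by the program

    def cons(self, value, next_cell):
        if not self.free:
            self.collect()                # the O(heap-size) pause hides here
        i = self.free.pop()
        self.cells[i] = (value, next_cell)
        return i

    def collect(self):
        marked = set()
        stack = list(self.roots)
        while stack:                      # mark phase: trace from the roots
            i = stack.pop()
            if i in marked:
                continue
            marked.add(i)
            nxt = self.cells[i][1]
            if nxt is not None:
                stack.append(nxt)
        for i in range(HEAP):             # sweep phase: scan EVERY cell
            if i not in marked and self.cells[i] is not None:
                self.cells[i] = None
                self.free.append(i)

h = Heap()
lst = None
for v in range(HEAP):                    # fill the heap entirely...
    lst = h.cons(v, lst)
h.roots = []                             # ...then drop every reference
lst = h.cons(0, None)                    # this one CONS sweeps all 1000 cells
print(len(h.free))                       # 999: whole heap reclaimed in one pause
```

Baker's system avoids exactly this by doing a bounded amount of incremental copying work on every operation instead of one large pause.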

Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

by Onur Mutlu, et al.
"... DRAM memory is a major resource shared among cores in a chip multiprocessor (CMP) system. Memory requests from different threads can interfere with each other. Existing memory access scheduling techniques try to optimize the overall data throughput obtained from the DRAM and thus do not take into account ... .26X to 1.4X, while the average system throughput improves by 7.6%. We qualitatively and quantitatively compare STFM to one new and three previously proposed memory access scheduling algorithms, including network fair queueing. Our results show that STFM provides the best fairness, system throughput ..."
Abstract - Cited by 139 (50 self)
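A hedged sketch of the stall-time-fair idea, assuming the scheduler estimates each thread's slowdown as its memory stall time when sharing DRAM divided by its stall time running alone, and prioritizes the most-slowed thread once unfairness crosses a threshold. The numbers, threshold, and function shape are illustrative, not the paper's implementation:

```python
# Sketch: pick which thread's requests to prioritize based on slowdown fairness.

ALPHA = 1.1   # illustrative unfairness threshold

def pick_thread(stall_shared, stall_alone):
    slowdown = {t: stall_shared[t] / stall_alone[t] for t in stall_shared}
    unfairness = max(slowdown.values()) / min(slowdown.values())
    if unfairness > ALPHA:
        # Fairness rule: service the thread suffering the largest slowdown.
        return max(slowdown, key=slowdown.get), unfairness
    # Otherwise fall back to a throughput-oriented policy (not modeled here).
    return None, unfairness

thread, unfairness = pick_thread(
    stall_shared={"A": 900, "B": 400},   # stall cycles measured while sharing
    stall_alone={"A": 300, "B": 350},    # estimated stall cycles if run alone
)
print(thread, round(unfairness, 3))      # A 2.625: thread A is 3x slowed, B ~1.14x
```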

Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors

by Chi-Keung Luk - In Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001
"... Hardly predictable data addresses in many irregular applications have rendered prefetching ineffective. In many cases, the only accurate way to predict these addresses is to directly execute the code that generates them. As multithreaded architectures become increasingly popular, one attractive approach ... software to control pre-execution, we are able to handle some of the most important access patterns that are typically difficult to prefetch. Compared with existing work on pre-execution, our technique is significantly simpler to implement (e.g., no integration of pre-execution results, no need ..."
Abstract - Cited by 174 (0 self)

Speculative Precomputation: Long-range Prefetching of Delinquent Loads

by Jamison D. Collins, Yong-Fong Lee, Hong Wang, Dean M. Tullsen, Christopher Hughes, John P. Shen, Dan Lavery, 2001
"... This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts ..."
Abstract - Cited by 180 (23 self)
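The precomputation idea (shared with the pre-execution entry above) can be sketched with a toy pointer chase: a helper thread executes only the address-generating slice of the program, warming the cache before the main thread arrives at its delinquent loads. The data structure and the cache model here are illustrative:

```python
# Sketch: a speculative slice prefetches a linked structure ahead of the
# main computation, turning would-be DRAM stalls into cache hits.

NODES = {i: {"next": i + 1 if i < 9 else None, "payload": i * i} for i in range(10)}
cache = set()   # toy cache: the set of node ids currently resident

def precompute_slice(head):
    """Helper thread: run only the pointer chase, touching each node."""
    i = head
    while i is not None:
        cache.add(i)              # the touch acts as a prefetch
        i = NODES[i]["next"]

def main_thread(head):
    """Main thread: the full computation, counting would-be misses."""
    misses, total = 0, 0
    i = head
    while i is not None:
        if i not in cache:
            misses += 1           # delinquent load: would stall for DRAM
        total += NODES[i]["payload"]
        i = NODES[i]["next"]
    return misses, total

precompute_slice(0)               # the p-slice runs ahead of the main thread
print(main_thread(0))             # (0, 285): every load now hits
```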

Concurrent Average Memory Access Time

by Xian-he Sun, Dawei Wang
"... Concurrency is a common technique used in modern memory systems. However, the effectiveness of memory concurrency is application dependent. It varies largely from application to application and from implementation to implementation. Understanding and utilizing memory concurrency is a vital and timely task for data intensive applications. Traditional memory performance metrics, such as Average Memory Access Time (AMAT), are designed for sequential data accesses, and have inherent limitations in characterizing concurrency. In this study, we propose Concurrent Average Memory Access Time (C ..."
Abstract - Cited by 1 (0 self)
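The limitation the abstract points at can be illustrated numerically: AMAT charges every access its full latency, while overlapped misses cost far less in wall-clock time. This is only an illustration of that gap, not the paper's C-AMAT formula; all parameters are made up:

```python
# AMAT assumes sequential accesses; with memory-level parallelism the same
# access stream finishes in far fewer cycles than AMAT predicts.

T_CACHE, T_MEM, HIT_RATE, ACCESSES, OVERLAP = 1, 100, 0.9, 1000, 4

amat = HIT_RATE * T_CACHE + (1 - HIT_RATE) * T_MEM
sequential_cycles = ACCESSES * amat           # what AMAT implies

# If up to OVERLAP misses are outstanding at once, their latencies overlap
# and the effective miss time divides by the concurrency.
misses = ACCESSES * (1 - HIT_RATE)
concurrent_cycles = ACCESSES * HIT_RATE * T_CACHE + (misses / OVERLAP) * T_MEM

print(round(amat, 1))                                         # 10.9 cycles/access
print(round(sequential_cycles), round(concurrent_cycles))     # 10900 3400
```

The 3x gap between the two totals is exactly the behavior a concurrency-aware metric is meant to capture.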

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University