45 citations found.
S. Przybylski, "Cache and Memory Hierarchy Design: A Performance-Directed Approach," Morgan Kaufmann Publishers, 1990.

This paper is cited in the following contexts:

A Code Transformation-Based Methodology for.. - Liveris, Zervas.. (2002)   (2 citations)  (Correct)

....that the code enclosed by a loop nest is also distributed after the application of the transformation. With function insertion, rarely executed code segments (e.g. code in the scope of a condition) that are contained in loop nests are replaced by function calls. In this way, fewer capacity misses [5] occur in the I-Cache, since only the most frequently executed code remains in the scope of the loop nests. Furthermore, an insightful analysis, which results in the formulation of an analytical equation that can be used to predict the number of I-cache misses, is presented. This analysis ....

....an application code, namely the code contained in loop nests. Specifically, three different cases of I-Cache behaviour are identified according to the relation between the code size contained in a loop nest and the I-cache size: 2.1 $\mathit{size}_L \le \mathit{ICache}_{size}$. In this case there are no capacity misses [5], since the whole code of the loop can be placed in the cache. Therefore, the only misses that occur are the compulsory misses [5] during the first iteration of the loop (Fig. 1). So, in this case the number of instruction cache misses is: size Block size L size Block size ICache size ICache ....

[Article contains additional citation context not shown here]

S. A. Przybylski, CACHE AND MEMORY HIERARCHY DESIGN -- A Performance Directed Approach, Morgan Kaufmann Publishers, 1990.


High-Level Cache Modeling For 2-D Discrete Wavelet.. - Andreopoulos..   (Correct)

....L2 Cache the cache of level 2, which can be a separable configuration of instruction and data caches or a joint configuration for both. The data transfer and storage organization is left to hardware. Thus, the instructions or data enter the processor core after passing through the cache hierarchy [16]. The typical path followed is: Main Memory ⇒ L2 Cache ⇒ I-Cache or D-Cache ⇒ Processor. When the instruction or data do not exist at the cache of a certain level, a miss occurs, in which case the processor waits until a block of instructions or data is fetched from the upper level cache (or the ....

....or data is fetched from the upper level cache (or the main memory), causing a delay to the execution not related to the computational complexity of the executed program. Following the classical 3C model of the cache misses, the latter can be separated into capacity, compulsory and conflict misses [16]. This paper first proposes single-processor software designs for all the transform production methods in section II. Based on them, analytical equations are proposed in section III that allow for the prediction of the expected number of data cache misses in a generic memory hierarchy. The ....

[Article contains additional citation context not shown here]

Steven K. Przybylski, Cache and Memory Hierarchy Design - A Performance-Directed Approach, San Francisco, Morgan-Kaufmann, 1990.


The Feasibility of Using Compression to Increase Memory System.. - Wang, Quong (1994)   (1 citation)  (Correct)

....high performance workstations of today, compression already shows promise; as miss penalties increase in the future, compression will only become more feasible. Keywords: Memory system performance, multilevel memory system, cache, data compression. 1 Introduction Multi-level memory hierarchies [3][5][7] are the standard way to reduce average memory access time in a cost-effective manner. The average access time of a cache is a function of its hit time, miss rate, and miss penalty. We can reduce the miss rate of a cache either by making the cache bigger or by making the program smaller. The ....

Przybylski, S., Cache and Memory Hierarchy Design: a Performance-Directed Approach. Morgan Kaufmann Publishers, 1990


Cache Misses And Energy-Dissipation Results For.. - Andreopoulos..   (Correct)

....the execution not related to the computational complexity of the executed program. If the instruction (data) is not found in the L2 Cache either, then it is transferred from the main memory. In every case where a replacement of a cache block occurs, the LRU replacement strategy is typically used [18], where the cache block that was least recently used (LRU) is replaced with the new input block. With respect to the misses, the most severe ones (in terms of delay and energy dissipation) are those of the L2 Cache, because in this case off-chip accesses occur. Off-chip accesses are slower and more ....

S. K. Przybylski, "Cache and Memory Hierarchy Design - A Performance-Directed Approach," Morgan Kaufmann, 1990.


Coming Challenges in Microarchitecture and Architecture - Ronen, Mendelson, Lai.. (2001)   (4 citations)  (Correct)

....unit (ALU) instructions, so the structure of memory hierarchy has a major impact on performance. Much work has been done in improving cache performance. Caches are made bigger and heuristics are used to make sure the cache contains those portions of memory that are most likely to be used [8] [9]. Change in the control flow can cause a stall. The length of the stall is proportional to the length of the pipe. In a super pipelined machine, this stall can be quite long. Modern microprocessors partially eliminate these stalls by employing a technique called branch prediction. When a branch ....

S. A. Przybylski, Cache and Memory Hierarchy Design: A Performance-Directed Approach. San Mateo, CA: Morgan Kaufmann, 1990.


Analysis Of Wavelet Transform Implementations For.. - Andreopoulos.. (2001)   (Correct)

....platform. For simplicity in the presented analysis, the case of a fully associative instruction and data cache will be selected. In this way, however, the conflict misses of an n-way set-associative cache are excluded from the description [13]. In the case of a miss due to lack of space in the cache (capacity miss), the LRU replacement strategy is assumed [13], where the cache block that was least recently used (LRU) is replaced with the new input block. For each of the different approaches of the wavelet transform decomposition, the ....

....of a fully associative instruction and data cache will be selected. In this way, however, the conflict misses of an n-way set-associative cache are excluded from the description [13]. In the case of a miss due to lack of space in the cache (capacity miss), the LRU replacement strategy is assumed [13], where the cache block that was least recently used (LRU) is replaced with the new input block. For each of the different approaches of the wavelet transform decomposition, the boundary effects (initialization and finalization phenomena) are ignored so as to facilitate the description. In ....

[Article contains additional citation context not shown here]

Steven K. Przybylski, "Cache and Memory Hierarchy Design -- A Performance-Directed Approach," Morgan Kaufmann, 1990.


Towards a Theory of Cache-Efficient Algorithms - Sen, Chatterjee, Dumir (1999)   (13 citations)  (Correct)

....a balance between abstraction and fidelity, so as not to make the model unwieldy for theoretical analysis or simplistic to the point of lack of predictiveness. Memory hierarchy models used by computer architects to design caches have numerous parameters and suffer from the first shortcoming [1, 26]. Early algorithmic work in this area focussed on a two-layered memory model [21]: a very large capacity memory with slow access time (secondary memory) and a limited-size faster memory (internal memory). All computation is performed on elements in the internal memory and there is no restriction ....

S. A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, CA, 1990.


Techniques for Cache and Memory Simulation Using Address.. - Holliday (1990)   (9 citations)  (Correct)

....of caches and memories, this paper surveys the current techniques. There are important related subjects which we do not attempt to cover; namely, the vast literature on the performance results found using trace-driven simulation and the analytical models mentioned above. The book by Przybylski [3] and the bibliography by Smith [4] are good starting points for the former. An important class of the latter models predicts cache miss ratios in either the transient [5, 6] or steady-state cases [7, 8]. Another class applies Mean Value Analysis techniques [9] to study the overhead due to cache ....

S. Przybylski, Cache and Memory Hierarchy Design: A Performance-Directed Approach. San Mateo, CA: Morgan Kaufmann, 1990.


Tradeoffs in Two-Level On-Chip Caching - Jouppi, Wilton (1993)   (60 citations)  (Correct)

....issues for on-chip memory system performance modeling. Previous work by Hill [3] has studied access times and miss rates, and recommended that first-level caches should be direct mapped. However, he did not study on-chip RAM area, and studied only single-level caching organizations. Przybylski [7] has studied execution times of multi-level cache systems as a function of many parameters. However, no mapping of configuration to chip area was done, nor was access time computed from cache parameters. Mulder [5] modeled the area of on-chip caches, but did not consider the access time ....

....Caching Performance In this section we consider the performance of two-level on-chip cache configurations. Although direct-mapped caches usually provide the best performance for first-level caches [3], Przybylski points out that associativity is useful in lower levels of the cache hierarchy [7]. In this section we assume the second-level cache is four-way set associative, while the first-level cache is direct mapped. Set-associative caches tend to result in lower miss rates, but their access and cycle times are larger than those of same-sized direct-mapped caches, since the tag must be read ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, Inc., 1990.


Measuring Data Cache And Tlb Parameters Under Linux - Yuanhua   (Correct)

....especially in high performance computer systems. Cache memories help bridge the cycle time gap between microprocessors and relatively slower main memories, by taking advantage of data locality in programs. Compilers and application programmers are increasingly designed with knowledge of caches [1, 2, 5, 7, 8]. In recent studies, cache-conscious algorithmic design improved the performance of an operating system by more than 30 [11], some standard search algorithms by a factor of 2 to 5 [3], and full application programs by 27 to 42 [3]. Any person (or compiler) who optimises memory operations for ....

Przybylski, S.A., Cache and Memory Hierarchy Design: A Performance-directed Approach, Morgan Kaufmann Publishers, Inc., 1990.


Memory System Architecture for Real-Time Multitasking Systems - Rixner (1995)   (Correct)

....$(725.6\,\mathrm{s} - 51.2\,\mathrm{s}) / 725.6\,\mathrm{s} = 0.929$ ... task must be stalled waiting for the memory. This is actually an important point in any cache design, and a thorough discussion of the art is given by Przybylski in [18]. As the cache in a system using EQ is essentially a regular set-associative cache with minor modifications, almost all of the considerations that Przybylski raises are applicable in this case. Since those issues are not fundamental to real-time caching, they will be largely ignored here, but in ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach, San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1990.


UTLB: A Mechanism for Address Translation on Network Interfaces - Angelos (1998)   (8 citations)  (Correct)

....translation entries over the I/O bus. The miss penalty is therefore several times the hit cost, which is simply a memory reference on the network interface. A miss in the Shared UTLB Cache will have a perceivable impact on small message latency. We apply existing techniques in processor cache design [21, 37] to reduce the miss rates in the Shared UTLB Cache. Misses fall into three categories: capacity misses, conflict misses, and compulsory misses [23]. When only one process is using the network interface, both capacity misses and conflict misses may occur in the Shared UTLB Cache. Multi-programming ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, 1990.


Performance Aspects Of Computers With Graphical User Interfaces - Gupta (1993)   (Correct)

....used to design profiles for any general client-server system. Information gained from such profiles would be of great help in designing strategies for task partitioning and load balancing. 5. FACTORS IN MEMORY SYSTEM DESIGN Cache memory is frequently used to improve memory access performance [65, 66, 67, 68, 69, 70, 71, 72]. Graphical displays make use of a specialized frame buffer memory to maintain the bit map image of the display. However, the benefit of using conventional caches is not clear for frame buffer accesses since these accesses have large working sets and are mainly writes. As computers with graphical ....

S. Przybylski, Cache and Memory Hierarchy Design: A Performance Directed Approach. San Mateo, CA: Morgan Kaufmann, 1990.


Local Area Network Traffic Locality: Characteristics and.. - Gulati (1992)   (3 citations)  (Correct)

....simulation offers the advantages of flexibility, accuracy, and ease of use. The experiments are fully repeatable and the same data can be used to compare different cache strategies. For the last several years trace-driven simulation has been the mainstay of cache performance estimation [5, 11, 12, 17, 31, 35, 36]. The usefulness of trace-driven simulation depends on the representativeness of the traces used to drive the simulations [31]. It is important to obtain traces that provide accurate cache performance predictions. Given the general advantages of using trace-driven simulation and the ....

....used to compare different cache strategies. For the last several years trace-driven simulation has been the mainstay of cache performance estimation [5, 11, 12, 17, 31, 35, 36]. The usefulness of trace-driven simulation depends on the representativeness of the traces used to drive the simulations [31]. It is important to obtain traces that provide accurate cache performance predictions. Given the general advantages of using trace-driven simulation and the availability of a tracing device, the HP 4972A LAN Protocol Analyzer, trace-driven simulation was chosen as the methodology for studying ....

Przybylski S. A., Cache and Memory Hierarchy Design: A Performance-Directed Approach, Morgan Kaufmann Publishers, San Mateo, CA, 1990.


SPAID: Software Prefetching in Pointer- and.. - Lipasti, Schmidt.. (1995)   (38 citations)  (Correct)

....at the call sites for the data referenced by the pointers. The fundamental premise of this heuristic is that pointer arguments passed on procedure calls are highly likely to be dereferenced within the scope of the called procedure. 2. Related work Since the introduction of cache memories [Smi82, Prz90], researchers have continually sought to improve their performance. Some investigators have concentrated on improving the caches themselves, by such techniques as placing caches on the same chip as the processor [ACH 87], inventing nonblocking caches that can tolerate multiple outstanding ....

....to cache sizes between 4K and 32K, primarily because the working sets of the benchmarks used do not sufficiently exercise caches larger than 32K. Other parameters that were varied were line size (between 16 and 256 bytes) and associativity (both direct mapped and 4-way set associative) [Smi82, Prz90]. 4.3. Cache work analysis To quantify the performance gains due to the SPAID heuristic, we compute a measure called CW (cache work) that approximates processor cycles spent executing each benchmark program. CW is defined as follows: In the above equation, the first term is the product of I ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, San Mateo, CA, 1990.


Pollution Control Caching - Walsh, Jr. (1995)   (3 citations)  (Correct)

....caches can match the miss rate performance and E[CPI] of direct-mapped caches that are greater than four times their size. Keywords: Cache memory, SPECmark, Petri Nets, ANOVA (Analysis of Variance). 1 Introduction Cache misses can be divided into three types: compulsory, conflict and capacity [12]. Compulsory misses occur when instructions or data are referenced for the first time. Conflict misses occur when two or more references map to the same location in the cache. Capacity misses occur because the cache is too small to hold needed data. Capacity and conflict misses can be further ....

....systems references. They were gathered on a system having an i486 processor running UNIX System V R4. We chose to run 60 million references from each workload and we primed the caches with 500,000 references before taking statistics. This allowed us to avoid any problem with cold start issues [5] [12] and trace length versus cache size issues [13]. 4 Discussion of results 4.1 Trace Driven Simulation 4.1.1 PCC In Figure 4 we see the geometric mean of the miss ratio for the SPECint92 portion of our workloads. Here we see the PCC cache (PCC(256): 256-word Pollution Control Cache) compared ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.


Memory Subsystem Performance of Programs with Intensive.. - Diwan, Tarditi, Moss (1993)   (13 citations)  (Correct)

....2 Background The following sections describe memory subsystems, copying garbage collection, SML, and the SML/NJ compiler. 2.1 Memory subsystems This section reviews the organization of memory subsystems. Terminology for memory subsystems is not standardized; we use Przybylski's terminology [39]. It is well known that CPUs are getting faster relative to DRAM memory chips [37]; main memory cannot supply the CPU with instructions and data fast enough. A solution to this problem is to use a cache, a small fast memory placed between the CPU and main memory that holds a small subset of ....

....is the smallest part of a cache with which a valid bit is associated. In this paper, subblock placement implies a subblock of one word, i.e. valid bits are associated with each word. Moreover, on a read miss, the whole block is brought into the cache not just the subblock that missed. Przybylski [39] notes that this is a good choice. A memory access to a location which is resident in the cache is called a hit. Otherwise, the memory access is a miss. A read request for memory location m causes m to be mapped to a set. All the tags and valid bits (if any) in the set are checked to see if any ....

[Article contains additional citation context not shown here]

Przybylski, S. A. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, California, 1990.


A Technique for Collecting Simultaneous Multithreaded.. - Vega, Hamkalo.. (2006)   (Correct)

No context found.

S. Przybylski, "Cache and Memory Hierarchy Design: A Performance-Directed Approach," Morgan Kaufmann Publishers, 1990.


Sector Cache Design and Performance - Rothman, Smith (1999)   (Correct)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. San Mateo, CA. Morgan Kaufmann Publishers, 1990.


Analysis Of Wavelet Transform - Implementations For Image (2001)   (Correct)

No context found.

Steven K. Przybylski, "Cache and Memory Hierarchy Design -- A Performance-Directed Approach," Morgan Kaufmann, 1990.


A Local Wavelet Transform Implementation Versus An Optimal - Row-Column Algorithm For (2001)   (Correct)

No context found.

Steven K. Przybylski, Cache and Memory Hierarchy Design -- A Performance-Directed Approach, Morgan Kaufmann, 1990.


Using Locality Surfaces to Determine Cache Miss Ratios - Sorenson (2000)   (Correct)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach, Morgan Kaufmann Publishers, Inc., 1990. This book presents a detailed study of caches, specifically in hierarchies. Again, much detailed cache simulation is presented.


Towards a Theory of Cache-Efficient Algorithms - Sen, Chatterjee (1999)   (13 citations)  (Correct)

No context found.

S. A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, CA, 1990.


the Garbage Collection Bibliography - Richard Jones (2003)   (Correct)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, Palo Alto, CA, 1990.


Level Two Translation Lookaside Buffers - Callaghan, Hoque, Rotenberg (1995)   (Correct)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: a Performance-Directed Approach, Morgan Kaufmann Publishers, 1990.

CiteSeer.IST - Copyright Penn State and NEC