S. Przybylski, "Cache and Memory Hierarchy Design: A Performance-Directed Approach," Morgan Kaufmann Publishers, 1990.


This paper is cited in the following contexts:
A Code Transformation-Based Methodology for.. - Liveris, Zervas.. (2002)   (2 citations)  (Correct)

....that the code enclosed by a loop nest is also distributed after the application of the transformation. With function insertion, rarely executed code segments (e.g. code in the scope of a condition) that are contained in loop nests, are replaced by function calls. In this way, less capacity misses [5] occur in the I Cache, since only the size of the most frequently executed code remains in the scope of the loop nests. Furthermore, an insight analysis, which results to the formulation of analytical equation that can be used to predict the number of I cache misses, is presented. This analysis ....

....an application code, namely the code contained in loop nests. Specifically, three different cases of I Cache behaviour are identified according to the relation among the code size contained in a loop nest and the I cache size: 2.1 $Size_L \le ICache_{size}$. In this case there are no capacity misses [5] since the whole code of the loop can be placed in the cache. Therefore, the only misses that occur are the compulsory misses during [5] the first iteration of the loop (Fig. 1). So, in this case the number of instruction cache misses is: $Size_L / Block_{size}$ ....

[Article contains additional citation context not shown here]

S. A. Przybylski, CACHE AND MEMORY HIERARCHY DESIGN -- A Performance Directed Approach, Morgan Kaufmann Publishers, 1990.


High-Level Cache Modeling For 2-D Discrete Wavelet.. - Andreopoulos..   (Correct)

....L2 Cache the cache of level 2, which can be a separable configuration of instruction and data caches or a joint configuration for both. The data transfer and storage organization is left to hardware. Thus, the instructions or data enter the processor core after passing through the cache hierarchy [16]. The typical path followed is: Main Memory → L2 Cache → I Cache or D Cache → Processor. When the instruction or data do not exist at the cache of a certain level, a miss occurs, in which case the processor waits until a block of instructions or data is fetched from the upper level cache (or the ....

....or data is fetched from the upper level cache (or the main memory) causing a delay to the execution not related to the computational complexity of the executed program. Following the classical 3 C model of the cache misses, the latter can be separated into capacity, compulsory and conflict misses [16]. This paper first proposes single processor software designs for all the transform production methods in section II. Based on them, analytical equations are proposed in section III that allow for the prediction of the expected number of data cache misses in a generic memory hierarchy. The ....

[Article contains additional citation context not shown here]

Steven K. Przybylski, Cache and Memory Hierarchy Design - A Performance-Directed Approach, San Francisco, Morgan Kaufmann, 1990.


The Feasibility of Using Compression to Increase Memory System.. - Wang, Quong (1994)   (1 citation)  (Correct)

....high performance workstations of today, compression already shows promise; as miss penalties increase in future, compression will only become more feasible. Keywords: Memory system performance, multilevel memory system, cache, data compression. 1 Introduction Multi level memory hierarchies [3][5][7] are the standard way to reduce average memory access time in a cost effective manner. The average access time of a cache is function of its hit time, miss rate, and miss penalty. We can reduce the miss rate of a cache either by making the cache bigger or by making the program smaller. The ....

Przybylski, S., Cache and Memory Hierarchy Design: a Performance-Directed Approach. Morgan Kaufmann Publishers, 1990
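
The excerpt above states that average access time depends on hit time, miss rate, and miss penalty. A minimal sketch of that relationship follows, assuming the common textbook form (hit time + miss rate × miss penalty); the function names and the two-level extension are illustrative assumptions, not taken from the cited paper.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Assumed textbook form: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


def two_level_access_time(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, memory_penalty):
    """Hypothetical two-level case: the L1 miss penalty is the L2 average access time.

    l2_miss_rate is the local L2 miss rate (misses per L2 access).
    """
    l1_miss_penalty = average_access_time(l2_hit, l2_miss_rate, memory_penalty)
    return average_access_time(l1_hit, l1_miss_rate, l1_miss_penalty)


if __name__ == "__main__":
    # Illustrative numbers only (cycles); not measurements from any cited work.
    print(average_access_time(1, 0.05, 20))
    print(two_level_access_time(1, 0.05, 10, 0.2, 100))
```

Under this form, lowering the miss rate (a bigger cache or a smaller program, as the excerpt puts it) shrinks only the second term, which is why the miss penalty determines how much such a reduction is worth.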


Cache Misses And Energy-Dissipation Results For.. - Andreopoulos..   (Correct)

....the execution not related to the computational complexity of the executed program. If the instruction (data) is not found in the L2 Cache either, then it is transferred from the main memory. In every case where a replacement of a cache block occurs, the LRU replacement strategy is typically used [18], where the cache block that was least recently used (LRU) is replaced with the new input block. With respect to the misses, the most severe ones (in terms of delay and energy dissipation) are those of L2 Cache, because in this case off chip accesses occur. Off chip accesses are slower and more ....

S. K. Przybylski, "Cache and Memory Hierarchy Design - A Performance-Directed Approach," Morgan Kaufmann, 1990.


Coming Challenges in Microarchitecture and Architecture - Ronen, Mendelson, Lai.. (2001)   (4 citations)  (Correct)

....unit (ALU) instructions, so the structure of memory hierarchy has a major impact on performance. Much work has been done in improving cache performance. Caches are made bigger and heuristics are used to make sure the cache contains those portions of memory that are most likely to be used [8] [9]. Change in the control flow can cause a stall. The length of the stall is proportional to the length of the pipe. In a super pipelined machine, this stall can be quite long. Modern microprocessors partially eliminate these stalls by employing a technique called branch prediction. When a branch ....

S. A. Przybylski, Cache and Memory Hierarchy Design: A Performance -Directed Approach. San Mateo, CA: Morgan Kaufmann, 1990.


Analysis Of Wavelet Transform Implementations For.. - Andreopoulos.. (2001)   (Correct)

....platform. For simplicity in the presented analysis, the case of a fully associative instruction and data cache will be selected. In this way however, the conflict misses of an n way set associative cache are excluded from the description [13]. In the case of a miss due to lack of space in the cache (capacity miss) the LRU replacement strategy is assumed [13] where the cache block that was least recently used (LRU) is replaced with the new input block. For each of the different approaches of the wavelet transform decomposition, the ....

....of a fully associative instruction and data cache will be selected. In this way however, the conflict misses of an n way set associative cache are excluded from the description [13]. In the case of a miss due to lack of space in the cache (capacity miss) the LRU replacement strategy is assumed [13], where the cache block that was least recently used (LRU) is replaced with the new input block. For each of the different approaches of the wavelet transform decomposition, the boundary effects (initialization and finalization phenomena) are ignored so as to facilitate the description. In ....

[Article contains additional citation context not shown here]

Steven K. Przybylski, "Cache and Memory Hierarchy Design -- A Performance-Directed Approach," Morgan Kaufmann, 1990.


Towards a Theory of Cache-Efficient Algorithms - Sen, Chatterjee, Dumir (1999)   (13 citations)  (Correct)

....a balance between abstraction and fidelity, so as not to make the model unwieldy for theoretical analysis or simplistic to the point of lack of predictiveness. Memory hierarchy models used by computer architects to design caches have numerous parameters and suffer from the first shortcoming [1, 26]. Early algorithmic work in this area focussed on a two layered memory model[21] a very large capacity memory with slow access time (secondary memory) and a limited size faster memory (internal memory) All computation is performed on elements in the internal memory and there is no restriction ....

S. A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, CA, 1990.


Techniques for Cache and Memory Simulation Using Address.. - Holliday (1990)   (9 citations)  (Correct)

....of caches and memories, this paper surveys the current techniques. There are important related subjects which we do not attempt to cover; namely, the vast literature on the performance results found using trace-driven simulation and the analytical models mentioned above. The book by Przybylski [3] and the bibliography by Smith [4] are good starting points for the former. An important class of the latter models predicts cache miss ratios in either the transient [5, 6] or steady state cases [7, 8]. Another class applies Mean Value Analysis techniques [9] to study the overhead due to cache ....

S. Przybylski, Cache and Memory Hierarchy Design: A Performance-Directed Approach. San Mateo, CA: Morgan Kaufmann, 1990.


Tradeoffs in Two-Level On-Chip Caching - Jouppi, Wilton (1993)   (60 citations)  (Correct)

....issues for on chip memory system performance modeling. Previous work by Hill [3] has studied access times and miss rates, and recommended that first-level caches should be direct mapped. However, he did not study on chip RAM area, and studied only single level caching organizations. Przybylski [7] has studied execution times of multi level cache systems as a function of many parameters. However, no mapping of configuration to chip area was done, nor was access time computed from cache parameters. Mulder [5] modeled the area of on chip caches, but did not consider the access time ....

....Caching Performance In this section we consider the performance of two level on chip cache configurations. Although direct mapped caches usually provide the best performance for first level caches [3] Przybylski points out that associativity is useful in lower levels of the cache hierarchy [7]. In this section we assume the second level cache is four way set associative, while the first level cache is direct mapped. Set associative caches tend to result in lower miss rates, but their access and cycle times are larger than the same sized direct mapped caches, since the tag must be read ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, Inc., 1990.


Measuring Data Cache And Tlb Parameters Under Linux - Yuanhua   (Correct)

....especially in high performance computer systems. Cache memories help bridge the cycle time gap between microprocessors and relatively slower main memories, by taking advantage of data locality in programs. Compilers and application programmers are increasingly designed with knowledge of caches [1, 2, 5, 7, 8]. In recent studies, cache conscious algorithmic design improved the performance of an operating system by more than 30 [11] some standard search algorithms by a factor of 2 to 5 [3] and full application programs by 27 to 42 [3] Any person (or compiler) who optimises memory operations for ....

Przybylski, S.A., Cache and Memory Hierarchy Design: A Performance-directed Approach, Morgan Kaufmann Publishers, Inc., 1990.


Memory System Architecture for Real-Time Multitasking Systems - Rixner (1995)   (Correct)

....$(725.6\,\mathrm{s} - 51.2\,\mathrm{s}) / 725.6\,\mathrm{s} = 0.929$ .... task must be stalled waiting for the memory. This is actually an important point in any cache design, and a thorough discussion of the art is given by Przybylski in [18]. As the cache in a system using EQ is essentially a regular set associative cache with minor modifications, most all of the considerations that Przybylski raises are applicable in this case. Since those issues are not fundamental to real time caching, they will be largely ignored here, but in ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach, San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1990.


UTLB: A Mechanism for Address Translation on Network Interfaces - Angelos (1998)   (8 citations)  (Correct)

....translation entries over I O bus. The miss penalty is therefore several times the hit cost which is simply a memory reference on the network interface. A miss in the Shared UTLB Cache will have a perceivable impact on small message latency. We apply existing techniques in processor cache design [21, 37] to reduce the miss rates in the Shared UTLB Cache. Misses fall into three categories: capacity misses, conflict misses, and compulsory misses [23] When only one process is using the network interface, both capacity misses and conflict misses may occur in the Shared UTLB Cache. Multi programming ....

Steven A. Przybylski. Cache and memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, 1990.


Performance Aspects Of Computers With Graphical User Interfaces - Gupta (1993)   (Correct)

....used to design profiles for any general client server system. Information gained from such profiles would be of great help in designing strategies for task partitioning and load balancing. 60 5. FACTORS IN MEMORY SYSTEM DESIGN Cache memory is frequently used to improve memory access performance [65, 66, 67, 68, 69, 70, 71, 72]. Graphical displays make use of a specialized frame buffer memory to maintain the bit map image of the display. However, the benefit of using conventional caches is not clear for frame buffer accesses since these accesses have large working sets and are mainly writes. As computers with graphical ....

S. Przybylski, Cache and Memory Hierarchy Design: A Performance Directed Approach. San Mateo, CA: Morgan Kaufmann, 1990.


Local Area Network Traffic Locality: Characteristics and.. - Gulati (1992)   (3 citations)  (Correct)

....simulation offers the advantages of flexibility, accuracy, and ease of use. The experiments are fully repeatable and the same data can be used to compare different cache strategies. For the last several years trace driven simulation has been the mainstay of cache performance estimation [5, 11, 12, 17, 31, 35, 36]. The usefulness of trace driven simulation depends on the representativeness of the traces used to drive the simulations [31] It is important to obtain traces that provide accurate cache performance predictions. 59 Given the general advantages of using trace driven simulation and the ....

....used to compare different cache strategies. For the last several years trace driven simulation has been the mainstay of cache performance estimation [5, 11, 12, 17, 31, 35, 36] The usefulness of trace driven simulation depends on the representativeness of the traces used to drive the simulations [31]. It is important to obtain traces that provide accurate cache performance predictions. 59 Given the general advantages of using trace driven simulation and the availability of a tracing device, the HP 4972A LAN Protocol Analyzer, trace driven simulation was chosen as the methodology for studying ....

Przybylski S. A., Cache and Memory Hierarchy Design : A Performance-Directed Approach, Morgan Kaufmann Publishers, San Mateo, CA, 1990.


SPAID: Software Prefetching in Pointer- and.. - Lipasti, Schmidt.. (1995)   (38 citations)  (Correct)

....at the call sites for the data referenced by the pointers. The fundamental premise of this heuristic is that pointer arguments passed on procedure calls are highly likely to be dereferenced within the scope of the called procedure. 2. Related work Since the introduction of cache memories [Smi82, Prz90] researchers have continually sought to improve their performance. Some investigators have concentrated on improving the caches themselves, by such techniques as placing caches on the same chip as the processor [ACH 87] inventing nonblocking caches that can tolerate multiple outstanding ....

....to cache sizes between 4K and 32K, primarily because the working sets of the benchmarks used do not sufficiently exercise caches larger than 32K. Other parameters that were varied were line size (between 16 and 256 bytes) and associativity (both direct mapped and 4 way set associative) [Smi82, Prz90]. 4.3. Cache work analysis To quantify the performance gains due to the SPAID heuristic, we compute a measure called CW (cache work) that approximates processor cycles spent executing each benchmark program. CW is defined as follows: In the above equation, the first term is the product of I ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, San Mateo, CA, 1990.


Pollution Control Caching - Walsh, Jr. (1995)   (3 citations)  (Correct)

....caches can match the miss rate performance and E[CPI] of direct mapped caches that are greater than four times their size. Keywords: Cache memory, SPECmark, Petri Nets, ANOVA (Analysis of Variance). 1 Introduction Cache misses can be divided into three types: compulsory, conflict and capacity [12]. Compulsory misses occur when instructions or data are referenced for the first time. Conflict misses occur when two or more references map to the same location in the cache. Capacity misses occur because the cache is too small to hold needed data. Capacity and conflict misses can be further ....

....systems references. They were gathered on a system having an i486 processor running UNIX System V R4. We chose to run 60 million references from each workload and we primed the caches with 500,000 references before taking statistics. This allowed us to avoid any problem with cold start issues [5] [12] and trace length versus cache size issues [13]. 4 Discussion of results 4.1 Trace Driven Simulation 4.1.1 PCC In Figure 4 we see the geometric mean of the miss ratio for the SPECint92 portion of our workloads. Here we see the PCC cache (PCC(256) 256 word Pollution Control Cache) compared ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.
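
The three miss types named in the excerpt above (compulsory, conflict, capacity) can be separated mechanically on a block trace. The sketch below is one common way to do so and is an assumption, not the cited paper's method: a miss is compulsory if the block was never referenced before, a capacity miss if a fully associative LRU cache of the same total size would also miss, and a conflict miss otherwise. The function name, the LRU policy, and the modulo set mapping are all illustrative choices.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_blocks, num_sets=1):
    """Count hits and 3C misses for a cache of num_blocks blocks in num_sets LRU sets.

    Assumes num_blocks is a multiple of num_sets and that a block maps to
    set (block % num_sets); both are assumptions made for this sketch.
    """
    ways = num_blocks // num_sets
    sets = [OrderedDict() for _ in range(num_sets)]  # the cache under study
    reference = OrderedDict()  # fully associative LRU cache of the same capacity
    seen = set()               # blocks referenced at least once
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Update the fully associative reference cache first.
        reference_hit = block in reference
        if reference_hit:
            reference.move_to_end(block)
        else:
            if len(reference) >= num_blocks:
                reference.popitem(last=False)  # evict least recently used
            reference[block] = True

        # Then look the block up in the cache under study.
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)
            counts["hit"] += 1
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)
            cache_set[block] = True
            if block not in seen:
                counts["compulsory"] += 1
            elif not reference_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(block)
    return counts


if __name__ == "__main__":
    # Direct-mapped cache, 4 blocks: blocks 0 and 4 collide in set 0.
    print(classify_misses([0, 4, 0, 4], num_blocks=4, num_sets=4))
```

With num_sets=1 the cache under study is itself fully associative, so no miss can be counted as a conflict miss.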


Memory Subsystem Performance of Programs with Intensive.. - Diwan, Tarditi, Moss (1993)   (13 citations)  (Correct)

....2 Background The following sections describe memory subsystems, copying garbage collection, SML, and the SML/NJ compiler. 2.1 Memory subsystems This section reviews the organization of memory subsystems. Terminology for memory subsystems is not standardized; we use Przybylski's terminology [39]. It is well known that CPUs are getting faster relative to DRAM memory chips [37]; main memory cannot supply the CPU with instructions and data fast enough. A solution to this problem is to use a cache, a small fast memory placed between the CPU and main memory that holds a small subset of ....

....is the smallest part of a cache with which a valid bit is associated. In this paper, subblock placement implies a subblock of one word, i.e. valid bits are associated with each word. Moreover, on a read miss, the whole block is brought into the cache not just the subblock that missed. Przybylski [39] notes that this is a good choice. A memory access to a location which is resident in the cache is called a hit. Otherwise, the memory access is a miss. A read request for memory location m causes m to be mapped to a set. All the tags and valid bits (if any) in the set are checked to see if any ....

[Article contains additional citation context not shown here]

Przybylski, S. A. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, California, 1990.
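
The lookup described in the excerpt above (a read of location m is mapped to a set, then the tags and valid bits in that set are checked) can be sketched as follows. This is a minimal illustration under stated assumptions: byte addresses, modulo set indexing, LRU replacement, and whole-block fills on a miss. The names split_address and SetAssociativeCache are hypothetical and do not come from the cited paper.

```python
def split_address(address, block_size, num_sets):
    """Return (tag, set_index, block_offset) for a byte address."""
    block_offset = address % block_size
    block_number = address // block_size
    return block_number // num_sets, block_number % num_sets, block_offset


class SetAssociativeCache:
    """Tag store only; no data is held. LRU replacement is an assumption."""

    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # Each set lists its resident tags from least to most recently used.
        # Presence in the list stands in for a set valid bit.
        self.sets = [[] for _ in range(num_sets)]

    def read(self, address):
        """Return True on a hit, False on a miss (the block is then filled)."""
        tag, index, _ = split_address(address, self.block_size, self.num_sets)
        lines = self.sets[index]
        if tag in lines:
            lines.remove(tag)
            lines.append(tag)  # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.pop(0)       # evict the least recently used tag
        lines.append(tag)
        return False


if __name__ == "__main__":
    cache = SetAssociativeCache(num_sets=2, ways=1, block_size=16)
    for address in (0, 4, 32, 0):
        print(address, "hit" if cache.read(address) else "miss")
```

Setting ways=1 gives a direct-mapped cache and num_sets=1 a fully associative one, so the same lookup covers both ends of the associativity range.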


Towards a Theory of Cache-Efficient Algorithms (Extended Abstract) - Sen, al.   (Correct)

....a balance between abstraction and fidelity, so as not to make the model unwieldy for theoretical analysis or simplistic to the point of lack of predictiveness. Memory hierarchy models used by computer architects to design caches have numerous parameters and suffer from the first shortcoming [1, 24]. Existing memory hierarchy models [2 5] do not model certain salient features of caches, notably the lack of full associativity in address mapping and the lack of explicit control over data movement and replacement. Unfortunately, these small differences are malign in the effect. 1 They ....

S. A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, CA, 1990.


Cache Behaviour of Lazy Functional Programs - Langendoen, Agterkamp (1992)   (2 citations)  (Correct)

....the block size (i.e. line size) the update policy, the associativity, and the fetch policy of the standard cache as defined in Figure 4. 3. 1 Cache size In our first experiment, we look at the total size of the cache since it has the strongest influence on performance of all cache parameters [Przybylski90] We have varied the cache size in the standard configuration from 1 Kbyte to 256 Kbyte. The results of the benchmark programs in Figure 5 show that the miss rate and traffic ratios strictly decrease when the cache is enlarged, but that large differences exist between the individual ....

....policy, however, are the same: copy back reduces traffic ratios and a 2 way associative cache marginally outperforms a direct mapped cache. 4. 2 Imperative languages The cache behaviour of imperative programs has been thoroughly studied, both analytically and empirically [Smith82, Smith87, Hill89, Przybylski90, Hennessy90] It is, however, rather difficult to compare the results of imperative and functional languages because of the strong dependence on the memory reference behaviour of the application benchmark (i.e. workload) Nevertheless, we can compare the trends in both areas. In general ....

[Article contains additional citation context not shown here]

S. A. Przybylski. Cache and memory hierarchy design: a performance-directed approach. Morgan Kaufmann Publishers, Inc., Palo Alto, California, USA, 1990.


Standard Memory Hierarchy Does Not Fit Simultaneous.. - Hily, Seznec (1998)   (3 citations)  (Correct)

....carefully the problem of contention on the second-level (L2) cache. Due to large L2 cache size (often 1 MBytes or more) contention on the memory (or a third level cache) is far less critical. We built a configurable simulator for a classical two level memory hierarchy (Figure 1) as described in [5] and [3]. [Figure 1: simulated architecture] We simulated a write back scheme for the data cache and the L2 ....

....for this is that the memory references generated by the simultaneous execution of independent threads exhibit less spatial locality than that of a single thread. Conflict misses have then a higher probability to happen than for a single thread. Increasing the associativity limits conflict misses [5] (but extends the access time). As illustrated in Figure 8, this appears far more critical when the L1 cache size is small. [Figure 8: average IPC ....]

S. A. Przybylski. Cache and Memory Hierarchy Design : A Performance-Directed Approach. Morgan Kaufmann, 1990.


Contention on 2nd Level Cache May Limit the Effectiveness of.. - Hily, Seznec (1997)   (2 citations)  (Correct)

....simulations. This has the advantage of providing reproducible results. As our work focuses on cache organization behavior, we therefore first present the simulated memory hierarchy. 2.1 Simulated Memory hierarchy The design of memory hierarchies for microprocessors has been extensively studied [6, 7, 8]. However, very few studies have discussed the problem of cache bandwidth [9] or L2 cache contention [10]. All these studies were devoted to single-threaded architectures. To our knowledge, the only study available on caches and simultaneous multithreading is part of a work done by Tullsen et al. ....

.... (which however may not be the case for the DEC21164 which has a small 96KB L2 cache [11]). We built a configurable simulator for a classical two level memory hierarchy (Figure 2) as described in [6]. [Figure 2: Two Level memory hierarchy] We simulated a write back scheme for the data cache and the L2 cache [12]. The write policy on miss is write allocate. Write back was preferred to write through, because ....

[Article contains additional citation context not shown here]

S. A. Przybylski. Cache and Memory Hierarchy Design : A Performance-Directed Approach. Morgan Kaufmann, 1990.


Compiler Support for Software Prefetching - McIntosh (1998)   (10 citations)  (Correct)

....Multi level memory hierarchies typically enforce the inclusion principle, that is, the hardware requires that the contents of the cache at level K be a subset of the contents at K + 1. The inclusion requirement is essential for efficient implementation of hardware cache coherence in multiprocessors [81]. In keeping with the widespread acceptance of multi level hierarchies in modern workstation and multiprocessor designs, we assume a 2 level cache hierarchy for all of our simulation studies. A write buffer is a small queue (2-32 entries) that stores pending write operations, typically placed ....

S. A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, San Mateo, CA, 1990.


Memory-System Performance of Programs with Intensive Heap.. - Diwan, Tarditi, Moss (1995)   (17 citations)  (Correct)

....is the smallest part of a cache with which a valid bit is associated. In this paper, subblock placement implies a subblock of one word, i.e. valid bits are associated with each word. Moreover, on a read miss, the whole block is brought into the cache, not just the subblock that missed. Przybylski [Przybylski 1990] notes that this is a good choice. A memory access to a location which is resident in the cache is called a hit. Otherwise, the memory access is a miss. A miss is a compulsory miss if it is due to a memory block being accessed for the first time. A miss is a capacity miss if it results from the ....

Przybylski, S. A. 1990. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, California.


Memory Subsystem Performance of Programs Using Copying.. - Diwan, Tarditi, Moss (1994)   (27 citations)  (Correct)

....2 Background The following sections describe memory subsystems, copying garbage collection, SML, and the SML/NJ compiler. 2.1 Memory subsystems This section reviews the organization of memory subsystems. Since terminology for memory subsystems is not standardized we use Przybylski's terminology [31]. It is well known that CPUs are getting faster relative to DRAM memory chips; main memory cannot supply the CPU with instructions and data fast enough. A solution to this problem is to use a cache, a small fast memory placed between the CPU and main memory that holds a subset of memory. If the ....

....smallest part of a cache with which a valid bit is associated. In this paper, subblock placement implies a subblock size of one word, i.e. valid bits are associated with each word. Moreover, on a read miss, the whole block is brought into the cache not just the subblock that missed. Przybylski [31] notes that this is a good choice. A memory access for which a block is resident in the cache is called a hit. Otherwise, the memory access is a miss. A read request for memory location m causes m to be mapped to a set. All the tags and valid bits (if any) in the set are checked to see if any ....

[Article contains additional citation context not shown here]

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, California, 1990.


Operating System Support For High-Speed Networking - Druschel (1994)   (16 citations)  (Correct)

....increasingly dominate the transfer time for a fixed sized cache block. Increases in data transfer rate must be combined with an increase in cache block size to be effective. Unfortunately, the cache block size cannot be increased arbitrarily without affecting the hit rate of the cache systems [Prz90] Several recently announced components integrate some form of a cache with a dynamic RAM to reduce the average access latency [CW92] These integrated second level caches use large cache lines and are connected to the DRAM using wide data paths. As for any cache, the hit rate of these components ....

....interrupts and the events they signal trigger processor rescheduling. Cache Size: Cache memories, particularly fast on chip caches, are limited in size. In practice, their effective size is further reduced due to the limited associativity of direct-mapped and set associative organizations [Prz90]. For data to remain cached during a data operation that involves loading every word and storing it in a different location, the cache must be at least twice the size of the data unit. In practice, cache size requirements are further increased by accesses to program variables during and between ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, San Mateo, CA, 1990. TK7895.M4P79.


Novel Caches for Predictable Computing - Muller, May, Irwin, Page (1998)   (Correct)

....of caches. Over the years, many attempts have been made to understand cache performance. One approach has been the phenomenological studies, characterised by the seminal paper of Smith [1] This approach ultimately leads to analytical models of cache behaviour that are based on empirical data [2]. Even the more complicated of these models are based on the idea of an average application. They are therefore not useful when predicting the performance of specific applications. We propose to fundamentally change the way we view caches. Instead of seeing a cache as an addition to an existing ....

S. A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers Inc, San Mateo, California, 1990.


Design of Cache Memories for Dataflow Architecture - Kavi, Hurson (1997)   (Correct)

....of a cache is subject to more constraints and trade offs than that of the main memory. Issues such as the placement replacement policy, the fetch update policy, homogeneity, the addressing scheme, block size, and the cache bandwidth are among those which should be taken into consideration ([9], [15], [17], [20]). Optimizing the design of a cache memory generally has four aspects: maximizing the probability of finding a memory reference's target in the cache (the hit ratio); minimizing the time to access information that is residing in the cache (access time); minimizing the delay ....

S. Przybylski. (1990). Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, San Mateo,CA.


Competitive Algorithms for Multilevel Caching and Relaxed List .. - Chrobak, Noga (1998)   (3 citations)  (Correct)

....achieve improved memory performance by implementing a hierarchy of caches. Such a multilevel cache system consists of m caches of increasing size and access time, where the level 1 cache is smallest and fastest, and the main memory is viewed as the largest and slowest cache at level m. See [15, 16, 17]. Aggarwal et al. [1] introduced RLUP as a model for the management of hierarchical memory 1 . They also proved that a version of LRU is C competitive, for some C, if there is an ff such that c 2i ffc i for all i. In addition to online algorithms, they investigated offline algorithms for ....

S. Przybylski, Cache and Memory Hierarchy Design: A Performance-Directed Approach, Morgan Kaufmann (1990).


Address Reference Generation in a Memory Hierarchy Simulator .. - Niessen, Wijshoff (1995)   (Correct)

....LU decomposition, SpM x SpM) multiplication, with (SpM x V) sparse matrix vector multiply as a special case, and triangular solve. This report demonstrates the use of this simulator for one of these applications. 1 Introduction A hierarchical memory system consists of several storage levels [7, 8, 13, 16, 17]. Each of these levels is faster and smaller than the level below. This way, a large virtual memory can be created with a low average access time. In most systems, the lower part of the hierarchy is made up of main memory (DRAM) while the middle part consists of one or more (hardware managed) ....

Steven A. Przybylski. Cache and memory hierarchy design: a performance-directed approach. Morgan Kaufmann Publishers, 1990.


A Persistent Distributed Store for Cooperative Applications - Long-Term Research

....related data together. In recent years, evaluation of cache performance has received considerable attention [23]. Experience with large caches shows that simple stochastic simulators do not represent program activity faithfully enough. Recent work has therefore used traces taken from real programs [37]. Garbage collectors and allocators were often evaluated on the basis of simple stochastic simulation or simple synthetic benchmarks. Wilson [52] shows that this approach is flawed, because real programs exhibit highly non-random behaviour. He therefore advocates using real program traces. We ....

Steven A. Przybylski. Cache and memory hierarchy design: a performance-directed approach. Morgan Kaufmann, San Mateo, CA (USA), 1990.


Cache Performance of Garbage-Collected Programs - Reinhold (1994)   (21 citations)

....to future work. 4. Cache design parameters The portion of the cache design space considered in this paper is constrained in several ways. Only direct-mapped caches are considered. Because they are the simplest to implement, direct-mapped caches have faster access times than other types of caches [15, 27]; they are the most common type of cache in current high-performance computers. The caches are assumed to be virtually, rather than physically, indexed. A wide range of cache sizes is considered, from 32 KB to 4 MB. This range includes current typical sizes for single-level off-chip caches ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, Palo Alto, California, 1990.


Memory Subsystem Performance of Programs Using Copying.. - Diwan, Tarditi, Moss (1994)   (27 citations)

....2 Background The following sections describe memory subsystems, copying garbage collection, SML, and the SML/NJ compiler. 2.1 Memory subsystems This section reviews the organization of memory subsystems. Since terminology for memory subsystems is not standardized, we use Przybylski's terminology [32]. It is well known that CPUs are getting faster relative to DRAM memory chips; main memory cannot supply the CPU with instructions and data fast enough. A solution to this problem is to use a cache, a small fast memory placed between the CPU and main memory that holds a subset of memory. If the ....

....smallest part of a cache with which a valid bit is associated. In this paper, subblock placement implies a subblock size of one word, i.e. valid bits are associated with each word. Moreover, on a read miss, the whole block is brought into the cache not just the subblock that missed. Przybylski [32] notes that this is a good choice. A memory access for which a block is resident in the cache is called a hit. Otherwise, the memory access is a miss. A read request for memory location m causes m to be mapped to a set. All the tags and valid bits (if any) in the set are checked to see if any ....

[Article contains additional citation context not shown here]

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, first edition, 1990.


An Analytical Model for Designing Memory Hierarchies - Jacob, Chen, Silverman, Mudge (1996)   (26 citations)

....that the disk technology is useless in the hierarchy. Since the model takes only a moment to recommend a configuration, we can easily use it to choose a subset of devices from a larger pool of technologies. This is similar to Przybylski's dynamic programming approach to hierarchy optimization [12], but it is much simpler because we can quickly search through all possible subsets. This process will find the best organization of the best subset of technologies at a given budget point. V. Verification The analysis in Section III makes the following simplifications: • The polynomial ....

S. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, Inc., 1990.


Multithreaded Architectures: Principles, Projects and Issues - Dennis, Gao (1994)   (4 citations)

....the cache access time is often the dominating factor in determining processor cycle time, much effort has been invested in the design of cache memories to achieve high performance through compromise in choices of architecture parameters including total cache size, associativity, and block size [113, 69, 106, 66]. The challenge facing architects is how to make the cache both faster and larger, goals generally in conflict. Although the current trend is to devote a substantial proportion of the chip area of a RISC microprocessor to cache memory, other architectural choices may prove more cost-effective. 2.2 ....

Steven A. Przybylski, Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, 1990.


An Evaluation Study of a Link-Based Data Diffusion Machine - Muller, al. (1994)   (9 citations)

....number of active processors. Both the number of levels and the fanout are irrelevant for the miss ratio at the memory level (ignoring subtle effects related to the execution time). 3.2.2 Item size The effects of the item size on cache performance in uniprocessors have been studied in much detail [12, 13]. In a sequential system a larger item size makes better use of the locality of the application and hence reduces the miss ratio up until the pollution point [14], although with an increasing miss penalty. Beyond the pollution point the miss ratio begins to rise again as unused data in the large ....

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers Inc, San Mateo, California, 1990.


A Technique for Collecting Simultaneous Multithreaded.. - Vega, Hamkalo.. (2006)

No context found.

S. Przybylski, "Cache and Memory Hierarchy Design: A Performance-Directed Approach," Morgan Kaufmann Publishers, 1990.


Sector Cache Design and Performance - Rothman, Smith (1999)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. San Mateo, CA. Morgan Kaufmann Publishers, 1990.


Analysis Of Wavelet Transform - Implementations For Image (2001)

No context found.

Steven A. Przybylski, "Cache and Memory Hierarchy Design -- A Performance-Directed Approach," Morgan Kaufmann, 1990.


A Local Wavelet Transform Implementation Versus An Optimal - Row-Column Algorithm For (2001)

No context found.

Steven A. Przybylski, Cache and Memory Hierarchy Design -- A Performance-Directed Approach, Morgan Kaufmann, 1990.


Using Locality Surfaces to Determine Cache Miss Ratios - Sorenson (2000)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach, Morgan Kaufmann Publishers, Inc., 1990. This book presents a detailed study of caches, specifically in hierarchies. Again, much detailed cache simulation is presented.


Towards a Theory of Cache-Efficient Algorithms - Sen, Chatterjee (1999)   (13 citations)

No context found.

S. A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann Publishers, San Mateo, CA, 1990.


the Garbage Collection Bibliography - Richard Jones (2003)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: A Performance-Directed Approach. Morgan Kaufmann, Palo Alto, CA, 1990.


Level Two Translation Lookaside Buffers - Callaghan, Hoque, Rotenberg (1995)

No context found.

Steven A. Przybylski. Cache and Memory Hierarchy Design: a Performance-Directed Approach, Morgan Kaufmann Publishers, 1990.


VaWiRAM: A Variable Width Random Access Memory Module - John (1996)   (1 citation)

No context found.

S. Przybylski, Cache and Memory Hierarchy Design: A Performance-Directed Approach, Morgan Kaufmann, 1990.

CiteSeer.IST - Copyright Penn State and NEC