16 citations found.
D. F. Zucker, M. J. Flynn, R. B. Lee, "A Comparison of Hardware Prefetching Techniques for Multimedia Benchmarks." In Proceedings of the International Conference on Multimedia Computing and Systems, Hiroshima, Japan, June 1996

This paper is cited in the following contexts:
Memory Arbitration and Cache Management in Stream-Based.. - Harmsze, Timmer, van.. (2000)   (1 citation)

....will be more advantageous. In a typical CPU cache the data and instructions of one task can occupy the whole cache. In a multiprocessor architecture with many independent streams, data from different tasks can concurrently occupy a central cache. Therefore we require the notion of stream caching [7]. Section 4 describes a method to overcome the disadvantages of cache fragmentation and all methods will be illustrated in section 5 using the CPA architecture. 2 Background memory arbitration and response time calculations. In [2] a simple and efficient memory arbitration scheme is presented ....

D. Zucker, M. Flynn, and R. Lee. A comparison of hardware prefetching techniques for multimedia benchmarks. In Proceedings of the International Conference on Multimedia Computing and Systems, June 1996.


Cache Prefetching - Berg (2002)

....effective. The only prefetchers that have been evaluated in the context of multimedia applications are the consecutive and stride prefetchers. Zucker et al. evaluate the effectiveness of several prefetching techniques on MPEG encoding and decoding applications running on a general-purpose processor [33]. They found that a stream buffer could remove on the order of 50% of the cache misses. A stride prefetcher was able to remove around 80% of the misses for cache sizes larger than 16 kbyte; however, with smaller cache sizes the stride prefetcher was no better than the consecutive prefetcher. The ....

....eliminating most of the cache miss stall times. On the other hand, JPEG and MPEG coding applications showed only little benefit because for these applications only a small component of execution time was spent waiting on L1 cache misses. This illustrates a shortcoming of the study by Zucker et al. [33], who only model the memory latency but not the instruction execution. In comparison, Ranganathan et al. model an out-of-order processor, which shows that for these applications memory latency does not yet impact performance severely. 5. CONCLUSIONS AND FUTURE WORK. To improve performance from ....

D. F. Zucker, M. J. Flynn, and R. B. Lee, "A comparison of hardware prefetching techniques for multimedia benchmarks," Proceedings of the International Conference on Multimedia Computing and Systems, pp. 236-244, June 1996.
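Berg's summary above contrasts stream buffers with stride prefetchers. To make that contrast concrete, here is a minimal Python sketch of a single sequential stream buffer in the style of Jouppi's original proposal.

It rests on several assumptions that are not the configuration Zucker et al. simulated:
- a buffer depth of 4 blocks and a 64-byte block size;
- an input trace that contains L1 miss addresses only;
- the hypothetical names `StreamBuffer` and `removed_fraction`.

```python
from collections import deque

BLOCK_SIZE = 64  # bytes per cache block (assumed for this sketch)


class StreamBuffer:
    """Single sequential stream buffer (Jouppi-style), hypothetical sketch.

    Only the head entry is compared with each incoming miss. A head hit
    serves the miss, pops the block, and prefetches one more sequential
    block at the tail. A head miss flushes the buffer and restarts the
    stream at the block after the missing one.
    """

    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()

    def _refill(self, start_block):
        # Prefetch `depth` consecutive blocks starting at start_block.
        self.fifo.clear()
        for i in range(self.depth):
            self.fifo.append(start_block + i)

    def access_miss(self, addr):
        """Handle an L1 miss at byte address addr; True if the buffer serves it."""
        block = addr // BLOCK_SIZE
        if self.fifo and self.fifo[0] == block:
            self.fifo.popleft()
            # Keep the buffer full: fetch the block after the current tail.
            self.fifo.append(self.fifo[-1] + 1 if self.fifo else block + 1)
            return True
        self._refill(block + 1)
        return False


def removed_fraction(miss_trace, depth=4):
    """Fraction of misses in miss_trace that the stream buffer would remove."""
    sb = StreamBuffer(depth)
    hits = sum(sb.access_miss(a) for a in miss_trace)
    return hits / len(miss_trace)
```

On a purely sequential miss trace of eight blocks, the buffer serves every miss after the first (7 of 8). A constant stride of two blocks defeats it entirely, which illustrates why stride-based schemes are evaluated alongside stream buffers.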


A Combined Hardware/Software Solution for Stream.. - Struik, van der.. (1998)   (1 citation)

....be supported. The table can also be used for synchronization purposes. When each memory operation results in the issuing of at most one prefetch request, the number of pending prefetches is limited. We call this synchronization on instruction address. Such a scheme is applied in the stream cache [3], where a separate cache is introduced to store prefetched data blocks. In a pure software solution, the programmer is responsible for inserting prefetch instructions in the code. Which data streams are prefetched is a design decision of the programmer. A simple implementation of stream ....

Daniel F. Zucker, Michael J. Flynn, and Ruby B. Lee, "A comparison of hardware prefetching techniques for multimedia benchmarks", Proceedings of the International Conference on Multimedia Computing and Systems, Hiroshima, Japan, June '96.


DSTRIDE: Data-cache miss-address-based stride.. - Govindarajalu.. (2001)

....At the end of three consecutive strided references, a stream is allocated and the entry in the history buffer is freed. A unit stride filter is also used. Setting the czone bit at runtime requires a software bitmask that must be individually adjusted for a given application and architecture [11, 13]. This seems to be impractical for even a small set of multimedia workloads [14]. Our stride detection logic, discussed in Section 3.3, can be used with stream buffers to detect both unit and non-unit strides but does not have these drawbacks. 2.2.3. Markov predictor. A Markov predictor uses the ....

D. F. Zucker, M. J. Flynn and R. B. Lee, "A Comparison of Hardware Prefetching Techniques for MultiMedia Benchmarks," TR CSL-TR-95-683, Dec. 1995.
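The DSTRIDE context above summarizes a czone-style stride filter. A history buffer watches L1 miss addresses. After three consecutive references separated by the same stride, a stream is allocated and the history entry is freed. The Python sketch below is a hedged illustration of that allocation rule only.

The following details are assumptions made for the example:
- the partition size (`czone_bits`);
- the eight-entry history with oldest-inserted eviction;
- the hypothetical class name `CzoneStrideFilter`.

The separate unit-stride filter is not modelled.

```python
class CzoneStrideFilter:
    """Hypothetical czone-style non-unit stride filter (illustrative sketch).

    Memory is partitioned into czones by the address bits above czone_bits.
    Each czone gets one history entry holding (last miss address, stride).
    A third miss that repeats the recorded stride allocates a stream and
    frees the entry.
    """

    def __init__(self, czone_bits=16, max_entries=8):
        self.czone_bits = czone_bits      # assumed partition size (2**16 bytes)
        self.max_entries = max_entries    # assumed history-buffer capacity
        self.history = {}                 # czone -> (last_addr, stride or None)

    def on_miss(self, addr):
        """Record an L1 miss; return (next_addr, stride) to allocate, or None."""
        zone = addr >> self.czone_bits
        entry = self.history.get(zone)
        if entry is None:
            if len(self.history) >= self.max_entries:
                # Evict the oldest-inserted entry (dicts keep insertion order).
                self.history.pop(next(iter(self.history)))
            self.history[zone] = (addr, None)
            return None
        last_addr, stride = entry
        new_stride = addr - last_addr
        if stride is not None and stride != 0 and new_stride == stride:
            # Three consecutive strided misses: allocate a stream, free the entry.
            del self.history[zone]
            return (addr + stride, stride)
        self.history[zone] = (addr, new_stride)
        return None
```

Feeding the misses 0x1000, 0x1100, and 0x1200 allocates a stream that starts at 0x1300 with stride 0x100. A third miss at 0x1300 instead of 0x1200 only updates the recorded stride.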


Multi-Level Cache Hierarchy Evaluation for Programmable Media.. - Fritts, Wolf (2000)

....on video application traces [5] have found that hybrid memory architectures with stream buffers provide better performance than cache-only memory hierarchies. However, these studies are based on trace-driven simulations assuming perfect branch prediction and memory disambiguation. Zucker et al. [6][7] also examined the impact of streaming memory structures such as stream buffers, stride prediction tables, and stream caches, and also found considerable benefit with these structures in multimedia applications. Consequently, we expect some hybrid of cache and prefetching will provide the best ....

D. Zucker, M. Flynn, and R. Lee, "A comparison of hardware prefetching techniques for multimedia benchmarks", Technical Report CSL-TR-95-683, Stanford University, Dec. 1995.


Parallel Media Processors for the Billion-Transistor Era - Jason Fritts Zhao (1999)   (4 citations)

....with hybrid memory architectures that combine a stream buffer with cache. However, these studies are based on trace-driven simulations that assume perfect branch prediction and memory disambiguation. Previous research in multimedia memory hierarchies includes a study performed by Zucker, et al. [10] at Stanford. Using the JPEG and MPEG multimedia applications, the study examined three prefetching techniques for streaming data: stream buffer, stride prediction table, and a hybrid of the two called a stream cache. Compared to a cache system without any prefetching, all the techniques were ....

....memory model. Evaluations of other memory types require corresponding changes in the compilers and simulators, which are not addressed in this paper due to space limitations. Comparisons of different memory architectures and their impact on multimedia applications can be found in the literature [4][10]. In all cache simulations, a block size of 64 bytes and two-way set associativity are used, as this configuration outperforms other cache configurations of similar area [9]. The datapath model is that presented in Section 2.1, i.e. 8 clusters, each containing 128 registers, 4 ALUs, 2 memory ....

D. Zucker, M. Flynn, and R. Lee, "A comparison of hardware prefetching techniques for multimedia benchmarks", Technical Report CSL-TR-95-683, Stanford Univ. Dec. 1995.


Exploiting Cache in Multimedia - Cucchiara, al. (1999)   (2 citations)

....be used in the next future [9]. In order to justify the adoption of pre-fetching, a precise performance evaluation and comparison on different pre-fetch techniques for a given class of application or data type is mandatory. Some works explore pre-fetch techniques for image processing or multimedia [4, 5, 12]. In particular, [4] states that standard cache pre-fetch mechanisms are not suitable for handling images characterized by 2D spatial locality. Accordingly, we propose new schemes that explicitly exploit 2D cache locality in multimedia applications and compare them to other proposed pre-fetching ....

....locality. Accordingly, we propose new schemes that explicitly exploit 2D cache locality in multimedia applications and compare them to other proposed pre-fetching methods. 3. The working set: algorithms and data types. A great effort is now oriented to the definition of benchmarks for multimedia [2, 12], in order to create a common working set for performance evaluation: as well, we believe that these benchmarks are currently too limited since they include only coding-decoding algorithms such as some tasks of the JPEG and MPEG standards [1]. Instead, we believe that a complete benchmark suite, at ....

D. Zucker, M. J. Flynn, R. Lee, "A comparison of hardware prefetching techniques for multimedia benchmarks", Proc. of IEEE Multimedia 96, pp. 236-244.


Optimizing the Data Cache Performance of a Software MPEG-2.. - Soderquist, Leeser (1997)   (9 citations)

....overall memory system bandwidth needs. The undiminished requirement for extremely high-bandwidth memory is reflected in Intel's support of the Rambus architecture for future PC memory systems. 4.4 Prefetching. Hardware prefetching has been suggested as a remedy for inadequate MPEG performance [11]. Yet it is primarily promoted as a means to reduce miss rates, without much consideration of cache memory traffic, which tends to increase with prefetching, especially with the more aggressive schemes. Prefetching is essentially a form of latency hiding; for memory-bound problems, it merely exposes ....

Daniel F. Zucker et al. A comparison of hardware prefetching techniques for multimedia. Technical Report CSL-TR-95-683, Stanford University Depts. of Electrical Engineering and Computer Science, Stanford, CA, Dec. 1995.


REMARC: Reconfigurable Multimedia Array Coprocessor - Miyamori, Olukotun (1998)   (7 citations)

....the total execution cycles are reduced by half, the relative percentage of cache miss stall cycles becomes much larger in the REMARC processor. This suggests that memory systems including cache organizations are more important in the REMARC processor than the original processor. Some studies [17] show hardware prefetching schemes for data caches can reduce cache misses. Combination of the REMARC architecture and memory systems must be an interesting topic especially for stream based multimedia applications. 4.3 MPEG2 Encoding. We used the mpeg2encode [20] program also distributed by MPEG ....

Daniel F. Zucker, Michael J. Flynn, and Ruby B. Lee, "A Comparison of Hardware Prefetching Techniques for Multimedia Benchmarks", International Conference on Multimedia Computing and Systems, 1996.


Memory Traffic and Data Cache Behavior of an MPEG-2 Software .. - Soderquist, Leeser (1997)

....[Figure 2: Typical sequence of MPEG frames (I B B P B B P B B ...), showing interframe dependencies] Figure 2 shows the interframe dependencies of the different frame types, superimposed on the displayed frame order. For decoding, these frames must be processed in the non-temporal order [1,4,2,3,7,5,6,10,8,9], which is a result of these dependencies. The dependencies between frames and the properties and sequence of frame types determine in critical ways the flow pattern of MPEG data and the nature of hardware support required. [Figure residue: sequence header, GOP, GOP header, picture] ....

....very large cache but a large cache is not always feasible, and simulations show that improving MPEG2 decoding performance requires very big caches. In any case, one would prefer to extract better performance from smaller, lower cost resources. Hardware prefetching has been suggested as a remedy [9], yet primarily as a means to reduce miss rates and without much consideration of cache memory traffic, which tends to increase with prefetching. We advocate looking more closely at the internal dynamics of the decoder itself. Exploiting knowledge of individual data types, their sizes, access ....

Daniel F. Zucker et al. A comparison of hardware prefetching techniques for multimedia. Technical Report CSL-TR-95-683, Stanford University Depts. of Electrical Engineering and Computer Science, Stanford, CA, Dec. 1995.
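The non-temporal decode order quoted by Soderquist and Leeser follows mechanically from the frame dependencies. A B frame cannot be decoded until the reference frame (I or P) displayed after it is available. A minimal Python sketch of that reordering follows.

It rests on these assumptions:
- the sequence is simple and closed, so every B frame is eventually followed by a reference frame;
- the tenth frame is a reference frame, because the figure residue does not show its type;
- `decode_order` is a hypothetical helper, not code from the cited paper.

```python
def decode_order(frame_types):
    """Map display order to decode order for an MPEG-style I/P/B sequence.

    frame_types[i] is the type ("I", "P" or "B") of the frame displayed at
    position i + 1. B frames are held back until the next reference frame
    has been decoded, then emitted in display order.
    """
    order = []
    pending_b = []  # display positions of B frames waiting for a reference
    for display_pos, ftype in enumerate(frame_types, start=1):
        if ftype == "B":
            pending_b.append(display_pos)
        else:
            # Reference frame: decode it first, then the B frames before it.
            order.append(display_pos)
            order.extend(pending_b)
            pending_b.clear()
    # Simplification: trailing B frames with no later reference are appended.
    order.extend(pending_b)
    return order
```

With either an I or a P frame in position 10, `decode_order(list("IBBPBBPBBI"))` yields `[1, 4, 2, 3, 7, 5, 6, 10, 8, 9]`, matching the order in the quoted context.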


An Automated Method for Software Controlled Cache Prefetching - Zucker, Lee, Flynn (1998)   (7 citations)  Self-citation (Zucker Flynn)

....[Fig. 1. Stride prediction table architecture.] The profile step simulates a hardware SPT prefetching into a series stream cache. The SPT architecture is shown in figure 1 and the series stream cache architecture is shown in figure 2. These architectures are described in detail in [13]. The SPT is used to determine what data will be needed by a given instruction based on what data it has accessed previously. An attempt is made to calculate a stride value as if the memory access is made in a regular stride through the data. A table, indexed by instruction address, is maintained ....

Daniel F. Zucker, Michael J. Flynn, and Ruby B. Lee, "A comparison of hardware prefetching techniques for multimedia benchmarks," in Proceedings of the International Conference on Multimedia Computing and Systems, Hiroshima, Japan, June 1996, pp. 236-244.
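The Zucker, Lee, and Flynn context describes the stride prediction table (SPT) only in outline. It is a table indexed by instruction address that tries to compute a stride from each instruction's previous accesses. The Python sketch that follows shows one common way to realize that idea.

All of the following are assumptions for illustration:
- a 64-entry direct-mapped table with `(pc >> 2)` indexing;
- a rule that a stride must repeat once before a prefetch is issued;
- the hypothetical class name `StridePredictionTable`.

The actual design is the one described in the 1996 paper.

```python
class StridePredictionTable:
    """Hypothetical stride prediction table indexed by instruction address.

    Each entry stores (pc tag, last data address, last stride). A prefetch
    for addr + stride is issued when the same instruction repeats the
    stride it showed on its previous access.
    """

    def __init__(self, entries=64):
        self.entries = entries  # assumed table size (direct-mapped)
        self.table = {}         # index -> (pc, last_addr, stride)

    def access(self, pc, addr):
        """Record a memory access by instruction pc; return a prefetch address or None."""
        index = (pc >> 2) % self.entries  # assumes 4-byte-aligned instructions
        entry = self.table.get(index)
        prefetch = None
        if entry is not None and entry[0] == pc:
            _, last_addr, old_stride = entry
            stride = addr - last_addr
            if stride != 0 and stride == old_stride:
                # Stride confirmed twice in a row: predict the next access.
                prefetch = addr + stride
        else:
            # New instruction or conflicting entry: no stride known yet.
            stride = 0
        self.table[index] = (pc, addr, stride)
        return prefetch
```

Consider a single load at `pc=0x400` walking through memory with a 16-byte stride. The first two accesses train the entry, and the third and fourth return prefetch addresses 0x1030 and 0x1040.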


Architecture And Arithmetic For Multimedia Enhanced Processors - Zucker (1997)   (5 citations)  Self-citation (Zucker Flynn)

No context found.

Daniel F. Zucker, Michael J. Flynn, and Ruby B. Lee. A comparison of hardware prefetching techniques for multimedia benchmarks. In Proceedings of the International Conference on Multimedia Computing and Systems, pages 236--244, Hiroshima, Japan, June 1996.


Architecture And Arithmetic For Multimedia Enhanced Processors - Zucker (1997)   (5 citations)  Self-citation (Zucker Flynn)

No context found.

Daniel F. Zucker, Michael J. Flynn, and Ruby B. Lee. A comparison of hardware prefetching techniques for multimedia benchmarks. Technical Report No. CSL-TR-95-683, Computer Systems Laboratory, Stanford University, December 1995.


FSRAM: Flexible Sequential and Random Access Memory for.. - Ying Chen Karthik (2004)

No context found.

D. F. Zucker, M. J. Flynn, R. B. Lee, "A Comparison of Hardware Prefetching Techniques for Multimedia Benchmarks." In Proceedings of the International Conference on Multimedia Computing and Systems, Hiroshima, Japan, June 1996


Enhancing the Memory Performance of Embedded.. - Chen..

No context found.

D. F. Zucker, M. J. Flynn, R. B. Lee, "A Comparison of Hardware Prefetching Techniques for Multimedia Benchmarks." In Proceedings of the International Conference on Multimedia Computing and Systems, Hiroshima, Japan, June 1996

CiteSeer.IST - Copyright Penn State and NEC