Parallel Media Processors for the Billion-Transistor Era
In International Conference on Parallel Processing, 1999
Architecture and Compiler Design Issues in Programmable Media Processors
2000
Cited by 10 (6 self)
Abstract
The processing demands for multimedia applications are rapidly escalating. Many current applications are pushing the limits of existing microprocessors, and the next generation of multimedia promises considerably greater demands. Adequate support for future multimedia requires the flexibility and computing power of high-level language (HLL) programmable media processors. This thesis examines the architecture and compiler design issues for programmable media processors. Design of the architecture requires an accurate understanding of multimedia characteristics. Using the MediaBench benchmark suite and the Impact compiler, workload and architecture evaluations were performed to define the essential architecture for programmable media processors. The workload evaluation examines various processing aspects, including functional necessities, data types and sizes, branch performance, loop characteristics, memory statistics, and instruction level parallelism. The architecture evaluation exam...
Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors
In IEEE Workshop on Signal Processing Systems, 2000
Cited by 3 (1 self)
Abstract
This paper presents the results of a multi-level cache memory hierarchy evaluation for programmable media processors. With the continuing advances in VLSI technology, it becomes possible to support larger memory hierarchies on-chip, but the question remains of how to most effectively use these additional silicon resources for optimizing memory performance. This paper explores that issue by evaluating the various levels of the memory hierarchy using a cache-based memory system. This evaluation examines the change in performance from varying cache parameters including the L2 cache parameters of cache size, line size, and latency, and the external memory parameters of bandwidth and latency. Examining the performance impact of these parameters, we have identified external memory latency and bandwidth as the primary memory bottlenecks in media processors.
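As a rough illustration of how the parameters listed above interact, the standard first-order average memory access time (AMAT) model for a two-level cache hierarchy is shown below. It is a textbook approximation, not the evaluation method used in the paper, and the symbols are introduced here for illustration: t_L1 and t_L2 are the L1 and L2 hit times, m_L1 and m_L2 the local miss rates, lambda_mem the external memory latency, S_line the L2 line size, and beta_mem the external memory bandwidth.

```latex
\mathrm{AMAT} = t_{L1} + m_{L1}\left(t_{L2} + m_{L2}\, t_{\text{mem}}\right),
\qquad
t_{\text{mem}} = \lambda_{\text{mem}} + \frac{S_{\text{line}}}{\beta_{\text{mem}}}
```

Read this way, every L2 miss pays the full external latency plus a line transfer, so once the L2 is large enough that m_L2 stops falling, memory latency and bandwidth are what remain. Line size appears twice: larger lines can lower m_L2 but lengthen each transfer. This is one way to interpret the abstract's conclusion, not a result from the paper.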
True Motion Estimation: Theory, Application, and Implementation
1998
Cited by 1 (0 self)
Abstract
This thesis offers an integrated perspective on the theory, applications, and implementation of true motion estimation. Capturing a 3D real-world scene with a camera produces a sequence of video images. When an object in the three-dimensional real world moves, there are corresponding changes in the brightness (luminance intensity) of its two-dimensional image. The physical three-dimensional motion projected onto the two-dimensional image space is referred to as "true motion." The ability to track true motion by observing changes in luminance intensity is critical to many video applications. This thesis explores techniques that track such motion and shows how these techniques can be used in many important applications.
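To make the idea of tracking motion from luminance changes concrete, here is a minimal sketch of conventional full-search block matching, which picks the displacement that minimizes the sum of absolute differences (SAD) between a block in the current frame and candidate blocks in the previous frame. This is a common baseline, not the thesis's true-motion tracker: a minimum-SAD vector is not guaranteed to be the true motion, which is the gap the thesis addresses. Frames are assumed to be lists of lists of luminance values, and the function names, block size, and search range are hypothetical choices.

```python
def make_frame(size, top, left, side=4, value=255):
    """Hypothetical test frame: a bright square on a black background."""
    return [[value if top <= r < top + side and left <= c < left + side else 0
             for c in range(size)]
            for r in range(size)]


def sad(prev, curr, py, px, cy, cx, block):
    """Sum of absolute differences between a block of curr and a block of prev."""
    total = 0
    for r in range(block):
        for c in range(block):
            total += abs(curr[cy + r][cx + c] - prev[py + r][px + c])
    return total


def block_match(prev, curr, top, left, block=4, search=2):
    """Full-search block matching for the block of curr at (top, left).

    Returns ((dy, dx), cost): the offset into prev of the best-matching block
    and its SAD. Candidates that fall outside prev are skipped; ties keep the
    first candidate found.
    """
    height, width = len(prev), len(prev[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = top + dy, left + dx
            if py < 0 or px < 0 or py + block > height or px + block > width:
                continue
            cost = sad(prev, curr, py, px, top, left, block)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_vec, best_cost


if __name__ == "__main__":
    prev = make_frame(12, 4, 4)   # square at columns 4..7
    curr = make_frame(12, 4, 6)   # same square moved 2 pixels right
    vec, cost = block_match(prev, curr, top=4, left=6)
    # vec is (0, -2) with cost 0: the block came from 2 pixels to the left.
    print(vec, cost)
```

The returned vector points from the current block back to its source in the previous frame, so a square that moved two pixels to the right yields (0, -2). On real video, noise and repetitive texture can give a lower SAD at a displacement that does not correspond to the object's actual motion.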
GOP-Interlaced Coding Scheme for Enhancing Coarse-Grain Parallelism and Error Concealment
1998
Abstract
The main novelty of this work is a redefinition of the structure of the group of pictures (GOP). In the current MPEG and H.263 video coding standards, a GOP is a group of consecutive frames, and two GOPs never overlap in time. In our scheme, a GOP is a group of nonconsecutive frames, and the frames of two GOPs interleave with each other. The proposed encoding scheme, which interlaces the GOPs, has several advantages. First, it is easily parallelized at a coarse grain. No current state-of-the-art microprocessor provides enough computational power for most real-time video compression standards, so bringing real-time video compression into microprocessor-based consumer products requires redesigning both the algorithms and the architecture. The proposed GOP-interlaced coding scheme enables a linear speedup of two on a dual-CPU system for real-time video compression. Second, it offers improved error-concealment capability. Because compressed image data are sensitive to channel errors, error-concealment techniques are used to minimize the impact of these errors. The proposed GOP-interlaced coding scheme yields a 5 dB SNR improvement in our motion-interpolated error concealment over the original coding scheme.
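The interleaved GOP structure can be sketched as follows, assuming two GOPs and a simple round-robin rule (frame i goes to GOP i mod 2). The abstract does not give the paper's actual frame assignment, encoder, or interpolation method, so `interlace_gops`, `encode_gop`, `encode_parallel`, and `concealment_neighbors` are hypothetical. Each GOP is handed to its own worker, and a frame lost from one GOP still has display-order neighbors in the other GOP that a motion-interpolating concealer could use.

```python
from concurrent.futures import ThreadPoolExecutor


def interlace_gops(num_frames, num_gops=2):
    """Hypothetical round-robin assignment: frame i goes to GOP i % num_gops."""
    gops = [[] for _ in range(num_gops)]
    for i in range(num_frames):
        gops[i % num_gops].append(i)
    return gops


def encode_gop(frames):
    """Placeholder encoder: a real one would run MPEG/H.263-style coding.

    Here it only records which frames this worker handled.
    """
    return [f"enc({i})" for i in frames]


def encode_parallel(num_frames, workers=2):
    """Encode each interlaced GOP on its own worker (coarse-grain parallelism)."""
    gops = interlace_gops(num_frames, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input GOPs.
        return list(pool.map(encode_gop, gops))


def concealment_neighbors(lost, num_frames, num_gops=2):
    """Display-order neighbors of a lost frame that belong to a different GOP.

    A motion-interpolating concealer could rebuild the lost frame from these.
    """
    return [n for n in (lost - 1, lost + 1)
            if 0 <= n < num_frames and n % num_gops != lost % num_gops]


if __name__ == "__main__":
    print(interlace_gops(8))
    print(encode_parallel(8))
    print(concealment_neighbors(4, 8))
```

Python threads are used here only to show the coarse-grain split; because of the global interpreter lock, a CPU-bound encoder written in Python would need processes or native threads to approach the two-CPU speedup.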
Cache Write Policy for Streaming Output Data
Abstract
This paper presents the results of a cache memory hierarchy evaluation demonstrating that multimedia applications are significantly influenced by the write policy of the L1 data cache. As part of an initial investigation of the memory characteristics of multimedia applications, we have been exploring multi-level cache memory hierarchies to evaluate the memory bottlenecks in media processing. The existence of streaming data in multimedia, and the benefits of streaming memory prefetch structures such as stream buffers for supporting streaming input data, have been well studied. However, support for streaming output data has been largely ignored. This study targets streaming output data by evaluating various L1 cache configurations of write policies and write buffers. We found that in cache memory hierarchies the write-allocate policy provides much better performance than no-write-allocate for memory-intensive applications. In alternative memory hierarchies, memory structures for streaming data will need to support significant amounts of write storage and keep temporary output data streams near the processor for when the program next needs them.
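The write-policy effect can be reproduced in miniature with a toy direct-mapped cache. The sketch below assumes a hypothetical 2 KB cache (64 lines of 32 bytes) and a trace that writes a 256-byte temporary output buffer and then reads it back, the produce-then-consume pattern of temporary output streams. It counts hits and misses only; write-back traffic, write buffers, and latencies are not modeled, and it is not the simulator used in the paper.

```python
def simulate(trace, write_allocate, num_lines=64, line_size=32):
    """Toy direct-mapped cache; counts hits and misses only.

    trace: iterable of (byte_address, is_write).
    write_allocate=True: a write miss fills the line (write-allocate).
    write_allocate=False: a write miss bypasses the cache (no-write-allocate).
    Write-back traffic, write buffers, and latency are not modeled.
    """
    tags = [None] * num_lines
    stats = {"hits": 0, "read_misses": 0, "write_misses": 0}
    for addr, is_write in trace:
        line = addr // line_size
        index, tag = line % num_lines, line // num_lines
        if tags[index] == tag:
            stats["hits"] += 1
        elif is_write:
            stats["write_misses"] += 1
            if write_allocate:
                tags[index] = tag
        else:
            stats["read_misses"] += 1
            tags[index] = tag  # read misses always fill the line
    return stats


def produce_then_consume(base=0x1000, size=256, word=4):
    """Hypothetical trace: write a temporary output buffer, then read it back."""
    offsets = range(0, size, word)
    return ([(base + off, True) for off in offsets] +
            [(base + off, False) for off in offsets])


if __name__ == "__main__":
    trace = produce_then_consume()
    print("write-allocate:   ", simulate(trace, write_allocate=True))
    print("no-write-allocate:", simulate(trace, write_allocate=False))
```

For this trace, write-allocate takes 8 write misses (one per line) and then hits on every read, while no-write-allocate misses on all 64 writes and still takes 8 read misses when the buffer is consumed.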
Digital Signal Processing on MMX™ Technology
Abstract
Algorithmic-level optimization and programming-level optimization are tightly coupled. Many programmers can optimize the implementation of a specific algorithm using MMX™ technology; however, without algorithmic-level optimization, the speedup will be limited. On the other hand, many algorithm developers can optimize a DSP algorithm in terms of the number of operations (multiplications or additions). Without implementation details, however, the number of operations cannot be translated directly into the number of CPU clock cycles. Moreover, many different algorithms can accomplish the same task. For the best performance of DSP and multimedia applications on personal computers, we should consider algorithm-MMX co-optimization. One way to increase the performance of digital signal processing is to execute several computations in parallel. MMX is one of the techniques that speed up software by performing the same operation on multiple data elements in parallel with a single instruction. However, MMX programming and the design of DSP algorithms for MMX are full of twists and turns; implementing digital signal processing with MMX technology is a mix of science and art. Matching the algorithms to MMX instruction capabilities is the key to extracting the best performance. This chapter covers algorithm design and algorithmic-level optimization for MMX. Besides showing you how to optimize your code and algorithms from a scientific point of view, we will show you how we go about optimizing ours from an artistic perspective.
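As a small illustration of matching an algorithm to MMX instruction capabilities, the sketch below models two real MMX instructions in Python on 64-bit register images: PADDSW (lane-wise signed 16-bit add with saturation) and PMADDWD (multiply four signed 16-bit pairs and add adjacent products into two 32-bit results). A dot product, the inner loop of an FIR filter, maps naturally onto PMADDWD at four elements per instruction. The helper names are hypothetical, and this is a behavioral model for exploration, not the chapter's code or real MMX assembly.

```python
def pack_words(values):
    """Pack four signed 16-bit values into a 64-bit register image (lane 0 lowest)."""
    assert len(values) == 4
    reg = 0
    for lane, v in enumerate(values):
        reg |= (v & 0xFFFF) << (16 * lane)
    return reg


def unpack_words(reg):
    """Split a 64-bit register image into four signed 16-bit lanes."""
    lanes = []
    for lane in range(4):
        w = (reg >> (16 * lane)) & 0xFFFF
        lanes.append(w - 0x10000 if w & 0x8000 else w)
    return lanes


def to_s32(x):
    """Wrap an integer to signed 32 bits, as a 32-bit register would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def paddsw(a, b):
    """Model of PADDSW: lane-wise signed 16-bit add with saturation."""
    return pack_words([max(-32768, min(32767, p + q))
                       for p, q in zip(unpack_words(a), unpack_words(b))])


def pmaddwd(a, b):
    """Model of PMADDWD: four 16x16 products, adjacent pairs summed to two 32-bit results."""
    x, y = unpack_words(a), unpack_words(b)
    return to_s32(x[0] * y[0] + x[1] * y[1]), to_s32(x[2] * y[2] + x[3] * y[3])


def dot_product_mmx_style(xs, ys):
    """Dot product of 16-bit sequences, four elements per modeled PMADDWD.

    Partial sums accumulate with 32-bit wraparound (like PADDD); the length
    must be a multiple of 4 (a real kernel would pad or handle the tail).
    """
    assert len(xs) == len(ys) and len(xs) % 4 == 0
    acc_lo = acc_hi = 0
    for i in range(0, len(xs), 4):
        lo, hi = pmaddwd(pack_words(xs[i:i + 4]), pack_words(ys[i:i + 4]))
        acc_lo, acc_hi = to_s32(acc_lo + lo), to_s32(acc_hi + hi)
    return to_s32(acc_lo + acc_hi)
```

Under this model, an 8-element dot product needs two PMADDWD operations instead of eight separate multiplies, and the saturation in PADDSW removes explicit clamping from pixel arithmetic. Whether such a mapping pays off still depends on data layout and 16-bit precision, which is where the algorithm-level choices the chapter describes come in.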