Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He, "VIS Speeds New Media Processing", IEEE Micro, Vol. 16, No. 4, 1996, pp. 35-42
....of the particular application at hand. Thus, processors will have to evolve to accommodate the tremendous bandwidth and computation needs of these types of applications. Today, we already see in the major microprocessor families a set of extensions targeting the multimedia market (MMX [25], VIS [26], etc.). These multimedia extensions are simple vector-like instructions that operate on parts of a 64-bit word. Extending these limited vector instructions into more general ones, like those found in modern vector ISAs, is relatively simple. Research performed on traditional vector architectures ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, pages 10--20, August 1996.
....Set (VIS): This is a set of multimedia instruction enhancements similar in concept to MAX, but it includes complete subword multiply instructions instead of multiply primitives. Application of these techniques to an MPEG decoder was described in [ZKK 95], and additional reports of VIS were made in [TONH96]. Finally, Intel described their multimedia instructions, known as MMX, in [PW96]. In addition to a number of additional multimedia enhancements, MMX supports arithmetic on packed data formats. 5.3 Algorithm 5.3.1 Overview The basic idea is to use simple arithmetic operations to pack two ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS speeds new media processing. IEEE Micro, 16(4):51--59, August 1996.
....data type and manipulate it in an application-dependent manner. This requires significant flexibility. Fortunately, most modern CPU architectures have been extended with Single Instruction Multiple Data (SIMD) instructions to speed up digital signal processing. Examples include VIS in Sun's UltraSPARC [17], MMX in Intel's Pentium(II) [12], MAX-2 in Hewlett-Packard's PA-RISC, MDMX in Silicon Graphics' MIPS, MVI in Digital's Alpha, and, recently, AltiVec in Motorola's PowerPC CPUs. In this paper we show a high performance implementation of a layered codec capable of constructing a large set of layers ....
....is to allow functional testing and performance measurements. Our focus is on an efficient implementation, and since the elements of digital signal processing in our codec design are well suited for SIMD processing, we have chosen to implement a 16 prototype using VIS (visual instruction set) [17] on SUN UltraSPARC CPUs. 3.1 The Wavelet Transform The presentation of our implementation of the 1 3 3 1 wavelet transform has two parts: First, we explore the optimizations possible by utilizing the symmetries in the filter matrices. Second, we describe the actual implementation of the filtering ....
Marc Tremblay, J. Michael O'Connor, V. Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, 16(4):10-20, August 1996.
....on scientific and multimedia benchmarks have yielded average performance improvements of 84 , and range as high as 253 . 1 Introduction The recent shift towards computation intensive multimedia workloads has resulted in a flourish of new multimedia extensions to current microprocessors [12, 15, 20, 23, 25, 30]. Many new designs are targeted specifically toward the multimedia domain [5, 13, 16]. This trend is likely to continue as it has been projected that multimedia processing will soon become the main focus of microprocessor design [14]. While different processors vary in the type and number of ....
....potential provided by SLP analysis. We list some of these below: • Many multimedia instructions are designed for a specific high level operation. For example, HP's MAX-2 extensions offer matrix transform instructions [20] and SUN's VIS extensions include instructions to compute pixel distances [23]. The complex CISC-like semantics of these instructions make automatic code generation difficult. • Multimedia hardware has typically been viewed as a coprocessor and has not been designed for general purpose computation [25]. Floating point capabilities, for example, have only recently been ....
Marc Tremblay and Michael O'Connor and Venkatesh Narayanan and Liang He. VIS Speeds New Media Processing. IEEE Micro, 16(4):10--20, Aug 1996.
....methods. 41 7.5 Speedup on an MPC7400 processor using SLP compilation. 43 7 Chapter 1 Introduction The recent shift toward computation intensive multimedia workloads has resulted in a variety of new multimedia extensions to current microprocessors [8, 12, 18, 20, 22]. Many new designs are targeted specifically at the multimedia domain [3, 9, 13]. This trend is likely to continue as it has been projected that multimedia processing will soon become the main focus of microprocessor design [10]. While different processors vary in the type and number of ....
....by SLP analysis. We list some of these limitations below: • Many multimedia instructions are designed for a specific high level operation. For example, HP's MAX-2 extensions offer matrix transform instructions [18] and SUN's VIS extensions include instructions to compute pixel distances [20]. The complex CISC-like semantics of these instructions make automatic code generation difficult. • SLP hardware is typically viewed as a multimedia engine alone and is not designed for general purpose computation. Floating point capabilities, for example, have only recently been added to some ....
Marc Tremblay and Michael O'Connor and Venkatesh Narayanan and Liang He. VIS Speeds New Media Processing. IEEE Micro, 16(4):10--20, Aug 1996.
....use with these workloads [31, 38, 69]. A recent step in this direction has been the recently announced media instruction set architecture (ISA) extensions for most commodity general purpose processors (e.g., 3DNow [87], AltiVec [95], MAX [68], MDMX and MIPS V [58], MMX [93], MVI [19], VIS [123]). At the high end server market, database and information processing applications such as online transaction processing (OLTP) [121] and decision support systems (DSS) [122] have emerged as the largest and fastest growing market segment for shared memory servers [115]. While database workloads ....
....characterization. the Alpha 21264 processor. All functional units are fully pipelined except the floating point divide (non pipelined). 2.2.2 VIS Media ISA Extensions The VIS media ISA extensions to the SPARC V9 architecture are a set of instructions targeted at accelerating media processing [60, 123]. Both our in-order and out-of-order processor models include support for VIS. The VIS extensions define the packed byte, packed word and packed double data types which allow concurrent operations on eight bytes, four words (16 bits each) or two double ....
Marc Tremblay et al. VIS Speeds New Media Processing. In IEEE Micro, volume 16(4), pages 51--59, Aug 1996.
....no modifications to the architecture, except for the ALU. A few special purpose 8 bit registers with bit level access are also needed. 6.3. 1 Reference Architecture For comparison purposes, a general purpose processor with multimedia extensions such as the UltraSPARC with the VIS instruction set [35] or the Intel MMX Architecture [36] Since different architectures have various types of ISA multimedia extensions, we assume a 64 bit architecture that supports 8 bit multiplications and additions. Specifically, we assume that we can perform eight 8 bit additions with 8 bit results and four 8 bit ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He, "VIS Speeds New Media Processing," IEEE Micro, August 1996, pp. 10-20.
....loop with independent operations. In the incremental algorithm this loop corresponds to the loop over parallel rays. 3.2 Data Transformations So far, the algorithm uses floating point numbers to represent the densities of the image and distances during image traversal. The VISual instruction set [3,8] on the UltraSPARC, like most architectural multimedia extensions, on the other hand operates on fixed point data. We found that we could fit the densities of the pixels in bytes and the distances in halfwords (16 bit) without losing much quality in the resulting image. To achieve this, we had to ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS speeds new media processing. IEEE Micro, 16(4):10--20, August 1996.
....embedded domain evolution towards media processing Media processing has motivated strong changes in the focus and design of mid 90s processors. In the general purpose domain, these changes have been very straightforward with the inclusion of SIMD-like multimedia extensions such as MMX [3], VIS [4], or MDMX [5]. These extensions have become the most important change to the basic ISA since the inclusion of the FP units inside the processor core. On the other hand, the changes in the embedded design have been strongly influenced by different domains such as the general purpose or the ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, pages 10--20, August 1996.
....by explicitly caching the working set of a graphics application, thereby minimizing transfers to memory. Programmable media processors, however, have largely been replaced by multimedia extensions to general purpose processors. These extensions, including MMX [PW96], MAX-2 [Lee96], and VIS [TONH96], perform SIMD operations on multiple narrow integer data values stored within the wide registers of general purpose processors. More recently, SSE extends MMX to add support for prefetching stream data from memory and for SIMD floating point operations [Die99] [TH99]. Multimedia extensions such ....
....a subset of the integer operations. These arithmetic operations are well matched to the low precision data types commonly found in media processing applications. These parallel subword instructions, similar to the multimedia instructions commonly added to general purpose processors [Lee96] [PW96] [TONH96], exploit the fine grained data parallelism inherent in media applications. The adders, multipliers, scratch pad, and communication unit are fully pipelined, allowing a new operation to issue every cycle. The divide square root unit has two SRT cores, so no more than two divide or square root ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro (August, 1996), pp. 10-20.
....embedded domain evolution towards media processing Media processing has motivated strong changes in the focus and design of mid 90s processors. In the general purpose domain, these changes have been very straightforward with the inclusion of SIMD-like multimedia extensions such as MMX [3], VIS [4], or MDMX [5]. These extensions have become the most important change to the basic ISA since the inclusion of the FP units inside the processor core. On the other hand, the changes in the embedded processor design have been strongly influenced by different domains such as the general purpose or the ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, pages 10--20, August 1996.
....Spatial locality is exploited by fetching more than a single cache block. The SPARC V9 [3] specification includes several types of prefetch instructions. For example, the prefetch read once instruction indicates the specified cache block could be replaced after a single access. The ULTRA SPARC [21] processor also provides a block load instruction that loads several floating point registers while bypassing the first level cache. Finally, examples of proposed memory access annotations to reduce coherence overhead in cache coherent shared memory multiprocessors include read for ownership [5] ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, 16(4):10--20, August 1996.
....the discussion. Section 5 presents experimental results for the optimized parallel addition example. Section 6 comments on applicability to other multimedia instruction sets. Section 7 gives a conclusion and discusses future works. 2. 0 The VIS Multimedia Extension The VIS instruction set [1] works with 4x8 bit, 2x16 bit, 8x8 bit, 4x16 bit and 2x32 bit data words. The VIS instructions perform computation and arrangements on these subwords, as well as conversion between the different formats. 1. This research is supported in part by the ARPA and Air Force Material Command Contract ....
Marc Tremblay et al., VIS Speeds New Media Processing, IEEE Micro August 1996
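The 4x16 bit format mentioned in the context above can be illustrated with a small software sketch. The code below is not VIS code and does not use any VIS intrinsics; it only emulates, in plain Python integers, the general idea of a partitioned add on four 16-bit subwords packed into one 64-bit word. The helper names (`pack16`, `unpack16`, `padd16`), the lane ordering, and the wraparound (modulo 2^16) overflow behavior are assumptions made for this illustration.

```python
# Software emulation of a partitioned (subword-parallel) add.
# Assumption: four 16-bit lanes per 64-bit word, lane 0 in the low bits,
# each lane wrapping modulo 2**16 on overflow.

WORD_MASK = (1 << 64) - 1
HIGH_BITS = 0x8000_8000_8000_8000      # top bit of every 16-bit lane
LOW_BITS = ~HIGH_BITS & WORD_MASK      # lower 15 bits of every lane


def pack16(values):
    """Pack four integers into one 64-bit word, 16 bits per lane."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFFFF) << (16 * lane)
    return word


def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * lane)) & 0xFFFF for lane in range(4)]


def padd16(a, b):
    """Add four 16-bit lanes at once without carries crossing lanes."""
    # Adding only the low 15 bits of each lane cannot overflow the lane,
    # so the carry into bit 15 lands in the lane's own top bit.
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    # The true top bit is a15 XOR b15 XOR carry-in; the carry-in is
    # already in `partial`, so XOR in the operands' top bits.
    return (partial ^ ((a ^ b) & HIGH_BITS)) & WORD_MASK


x = pack16([1, 0xFFFF, 0x8000, 1234])
y = pack16([2, 1, 0x8000, 4321])
print(unpack16(padd16(x, y)))
```

A single integer addition thus stands in for four independent 16-bit additions, which is the kind of subword parallelism the citing papers attribute to multimedia extensions; real hardware performs the lane separation directly rather than through masking.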
....several application domains. In our experiments, dynamic instruction counts were reduced by 46 . Speedups ranged from 1.24 to 6.70. 1 Introduction The recent shift toward computation intensive multimedia workloads has resulted in a variety of new multimedia extensions to current microprocessors [6, 10, 16, 18, 20]. Many new designs are targeted specifically at the multimedia domain [3, 7, 11] This trend is likely to continue as it has been projected that multimedia processing will soon become the main focus of microprocessor design [8] While different processors vary in the type and number of multimedia ....
....by SLP analysis. We list some of these limitations below: ffl Many multimedia instructions are designed for a specific high level operation. For example, HP s MAX 2 extensions offer matrix transform instructions [16] and SUN s VIS extensions include instructions to compute pixel distances [18]. The complex CISC like semantics of these instructions make automatic code generation difficult. ffl SLP hardware is typically viewed as a multimedia engine alone and is not designed for general purpose computation. Floating point capabilities, for example, have only recently been added to some ....
Marc Tremblay and Michael O'Connor and Venkatesh Narayanan and Liang He. VIS Speeds New Media Processing. IEEE Micro, 16(4):10--20, Aug 1996.
....primitives in hardware. SIMD extensions to microprocessor instruction sets are used to exploit data level parallelism and provide the option of more, lower precision operations, which are useful in a number of multimedia applications. Sun Microsystems' VIS is described by Tremblay et al. [33] and Intel's MMX is described by Peleg and Weiser [24]. However, adding operations on partitioned data does not address the producer consumer locality of graphics applications. Imagine inherits many ideas from vector processor architectures. An early vector machine is the CRAY 1 [29] a more ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS speeds new media processing. IEEE Micro, 16(4):10--20, August 1996.
....one of the most significant computing workloads in the next years [1]. In reaction to this trend, major vendors of general purpose microprocessors have included SIMD extensions to their instruction set architectures to tackle these types of applications. Examples are Intel's MMX [2], SUN's VIS [3], and MIPS' MDMX [4]. All these ISA extensions offer new packed data types, fixed point arithmetic and, typically, 64-bit multimedia vector registers. The goal is to execute between 4 and 8 parallel fixed point operations over small data. Although This work has been supported by Direccio General de ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, August 1996.
....one of the most significant computing workloads in the next years [1, 2]. In reaction to this trend, major vendors of general purpose microprocessors have included SIMD extensions to their instruction set architectures to tackle these types of applications. Examples are Intel's MMX [3], SUN's VIS [4], and MIPS' MDMX [5]. These multimedia extensions are aimed at exploiting the subword level parallelism available in most multimedia kernels. All these ISA extensions offer new packed data types, fixed point arithmetic and, typically, 64-bit multimedia vector registers. The goal is to execute ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, pages 10--20, August 1996.
....computing workloads in the next years [1, 2]. Realizing this trend, the major vendors of general purpose processors have included multimedia extensions based on the SIMD (Single Instruction Multiple Data) paradigm. Examples of this are: Intel's MMX [3], Motorola's AltiVec [4], SUN's VIS [5], or MIPS' MDMX [6]. Also, more recently, as 3-D graphic applications have become more important, additional extensions have been included to deal with FP SIMD parallelism (AltiVec, 3DNow [7], and KNI [8]). The goal of these architectural extensions was to take advantage of the high levels of data ....
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, pages 10--20, August 1996.
....Work Most of the work of integrating digital video with microprocessors has focused on multimedia instruction set extensions. Almost every major architectural family has its own version of these instructions, which focus on SIMD style processing of small data types packed into a standard word [21, 28, 16]. These efforts are a critical step in the evolution of microprocessors into viable multimedia platforms. Nevertheless, by increasing the efficiency of computation, these instructions expose underlying memory bandwidth limitations and the well known, growing gap between logic and memory ....
....rather than 512 KBytes, but the rapid drop down at 2 MBytes persists. There are several distinct ways of implementing this feature, including software locality hints as in the PA-RISC architecture [16] or low latency, cache bypassing block loads and stores as in the SPARC V9 architecture [28]. The most efficient approach, used in the PowerPC architecture [26], the Intel Pentium and P6 [10], and the QED RM7000 MIPS V processor [22], provides for marking memory as non-cacheable on a page by page basis. Once marked, data is transparently and automatically transferred between registers ....
Marc Tremblay et al. VIS speeds new media processing. IEEE Micro, 16(4):10--20, August 1996.
....Spatial locality is exploited by fetching more than a single cache block. The SPARC V9 [4] specification includes several types of prefetch instructions. For example, the prefetch read once instruction indicates the specified cache block could be replaced after a single access. The ULTRA SPARC [26] processor also provides a block load instruction that loads several floating point registers while bypassing the first level cache. Memory access annotations have also been proposed for reducing coherence overhead in cache coherent shared memory multiprocessors. For example, read for ownership ....
Marc Tremblay, Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS Speeds New Media Processing. IEEE Micro, 16(4):10--20, August 1996.
No context found.
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He, "VIS Speeds New Media Processing", IEEE Micro, Vol. 16, No. 4, 1996, pp. 35-42
No context found.
Marc Tremblay et al. VIS speeds new media processing. IEEE Micro, August 1996.
No context found.
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS speeds new media processing. IEEE Micro, pages 10--20, August 1996.
No context found.
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan, and Liang He. VIS speeds new media processing. IEEE Micro, pages 10--20, August 1996.
No context found.
Marc Tremblay, J. Michael O'Connor, Venkatesh Narayanan and Liang He. VIS Speeds New Media Processing. In IEEE Micro, Vol. 16, No 4, August 1996, pp. 10-20.
No context found.
Marc Tremblay, J. O'Connor, V. Narayanan, and L. He, "VIS Speeds New Media Processing", IEEE Micro, Vol. 16 No. 4, August 1996, pp. 10-20.