E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Trans. Signal Processing, vol. 40, pp. 2174--2193, Sept. 1992.
....Publisher Item Identifier S 1053-587X(01)10495-2. the sparse factorizations of the DCT matrix [12] 17] and many of them are recursive [12] 14] 16] 17]. Besides one-dimensional (1-D) algorithms, two-dimensional (2-D) DCT algorithms have also been investigated extensively [6] 18] [21], generally leading to less computational complexity than the row-column application of the 1-D methods. However, the implementation of the direct 2-D DCT requires much more effort than that of the separable 2-D DCT. The theoretical lower bound on the number of multiplications required for the ....
....is often required to compress the data. In these circumstances, significant algorithmic savings can be achieved if some operations of the DCT are incorporated into the quantization step. This leads to a class of fast 1-D and 2-D DCTs that are generally referred to as the scaled DCT [5] 8] [21], 23] 25]. For example, Arai's method needs only five multiplications [3] 8]. All of the aforementioned fast algorithms still need floating-point multiplications, which are slow in both hardware and software implementations. To achieve faster implementation, coefficients in many ....
E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform, " IEEE Trans. Signal Processing, vol. 40, pp. 2174--2193, Sept. 1992.
....operation. 2. Fast DCT algorithms via polynomial arithmetic: All components of $C_n x$ can be interpreted as values of one polynomial at $n$ nodes. Reducing the degree of this polynomial by a divide-and-conquer technique, one can get a real fast DCT algorithm with low arithmetical complexity (see [9, 20, 21, 16]). The best DCT algorithms require about $2n \log_2 n$ ops. A polynomial DCT algorithm generates a factorization (1.1) of $C_n$ with sparse, non-orthogonal matrix factors $M_n$, i.e. the factorization (1.1) does not preserve the orthogonality of $C_n$ (see e.g. [19, 2, 23]). This fact leads to a bad ....
E. Feig and S. Winograd, Fast algorithms for the discrete cosine transform, IEEE Trans. Signal Process. 40 (1992), 2174 - 2193.
....the discrete cosine transforms (DCTs) of type II and III. The discovery of this class and its derivation is made possible through an approach using polynomial algebras. Algebraic Approach to the DCTs. There is a large number of publications on fast DCT algorithms. With few exceptions (including [5, 6, 7, 8]) these algorithms have been found by clever manipulation of matrix entries, which provides little insight into their structure or the reason for their existence. In [9, 10] we developed an algebraic approach to the 16 DCTs and DSTs to remedy this situation. We associate to each DCT (or DST) a ....
E. Feig and S. Winograd, "Fast Algorithms for the Discrete Cosine Transform," IEEE Trans. on Signal Processing, vol. 40, no. 9, pp. 2174--2193, 1992.
....than previously understood. 1. INTRODUCTION There is a large number (several hundred, e.g. [1, 2]) of publications on fast algorithms for the family of 16 discrete trigonometric transforms (DTTs) comprising 8 cosine and 8 sine transforms (DCTs and DSTs). With very few exceptions (including [3, 4, 5]) each of these algorithms has been found by insightful manipulation of the transform matrix entries. We address in this paper two important theoretical questions: (1) Why do these algorithms exist? and (2) How to explain the structure of these algorithms? To answer these questions, we associate ....
E. Feig and S. Winograd, "Fast Algorithms for the Discrete Cosine Transform," IEEE Trans. on Signal Processing, vol. 40, no. 9, pp. 2174--2193, 1992.
....DCT Since DCT is an orthogonal discrete transform, its transform kernel matrix must be a linearly independent matrix. That is, the pruning algorithm presented in Section II can be directly applied to derive efficient pruning DCT algorithms. Moreover, all well-known DCT algorithms (such as [4] [6]) and pruning DCT algorithms (such as [1] 3]) can be modeled as a matrix-vector multiplication with known decompositions of the DCT transform kernel matrix. Since the optimism of the proposed pruning algorithm is decomposition-dependent, we can not only derive effective pruning DCT algorithms ....
....to each existing fast algorithm by checking the complexities of the so-obtained pruning algorithms. The following data are obtained by applying the proposed output pruning algorithm to derive efficient pruning DCT algorithms, based on the matrix decompositions presented in [1] 5] and [6]. For the 1-D DCT of length 64, Table I lists the numbers of required multiplications and additions for the corresponding pruning DCT algorithms with respect to different pruning patterns. The most well-known pruning DCT algorithm presented in [1] gives the same complexities as listed in the ....
E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform, " IEEE Trans. Signal Processing, vol. 40, pp. 2174--2193, Sept. 1992.
....Püschel is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: pueschel@ece.cmu.edu). Publisher Item Identifier S 1053-587X(01)07061-1. 8] Lee [15] Feig [16] Chan and Ho [17] Steidl and Tasche [18] and Feig and Winograd [19]. Most of the algorithms cited above are given as a factorization of the respective transform matrix into a product of highly structured, sparse matrices. If an algorithm is given another way, e.g. by equations, it is possible to rewrite the algorithm in the form of a sparse matrix product. All ....
....lines give the matrix from Algorithm 1, the last line contains the permutation matrix (which makes the block structure of and explicit) and the fifth line gives the matrix . The algorithms for DCT and DCT have the same arithmetic cost as the best known algorithms [8] 11] 12] 15] 17] [19], 39] Note that those who use only 12 multiplications do not normalize the first row of the DCT , which saves one multiplication. The only algorithm that claims 11 multiplications [40] considers a scaled version of the DCT matrix DCT DCT Multiplying by scalars conserves the perm irred symmetry ....
E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform, " IEEE Trans. Signal Processing, vol. 40, pp. 2174--2193, Sept. 1992.
....(5) The number of multiplications can be reduced further and a number of previously published fast algorithms [3 5] have been presented to achieve this. However, some of these manipulations result in irregular architectures that are dominated by complex routing requirements. In addition, such algorithms exhibit many stages of multiplication and accumulation, which has a detrimental effect on the numerical ....
E. Feig and S. Winograd, "Fast Algorithms for the Discrete Cosine Transform", IEEE Trans. on Signal Proc., Vol. 40, No. 9, September 1992, pp. 2174-2193.
....Then, the higher spatial frequencies can be discarded, generating a high compression rate and a small perceptible loss in the image quality. The 2-D DCT calculation has a high degree of computational complexity. Since many authors have proposed simplifications to this calculation, as [ARA 88] FRI 92] and others, this complexity can be minimized according to the application needs. Specifically for image compression applications there are many algorithms to compute the 2-D DCT coefficients and the algorithm chosen in this paper was proposed in [ARA 88] and modified in [KOV 95]. This algorithm ....
FEIG, E.; WINOGRAD, S. "Fast Algorithms for the Discrete Cosine Transform". IEEE Transactions on Signal Processing, v. 40, n. 9, 1992, p. 2174-2193.
....proposed and the VHDL synthesis results after full completion of the design in Altera VHDL. 2. Algorithm Used for the DCT Calculation The 2-D DCT calculation has a high degree of computational complexity. Since many authors have proposed simplifications to this calculation, as [6] 7] [8] and others, this complexity can be minimized according to the application needs. Specifically for image compression applications there are many algorithms to compute the 2-D DCT coefficients and the algorithm chosen in this paper was proposed in [7] and modified in [9]. The algorithm chosen ....
E. Feig, S. Winograd. "Fast Algorithms for the Discrete Cosine Transform". IEEE Transactions on Signal Processing, v. 40, n. 9, p. 2174-2193, 1992.
....gets a fast algorithm as a result of the ART approach. For the DCT 8 the system automatically produced an algorithm which uses 14 multiplications instead of 64. In view of the regularity, this compares preferably in gate count and chip size with the 13 multiplications of hand-optimized algorithms [12], which economize on multiplications at the cost of additions. The algorithm is stated as a list of matrices in a computer algebra system. The IDEAS system translates this representation to the high-level Hardware Description Language ELLA. Using the LOCAM synthesis tools a gate-level description ....
E. Feig and S. Winograd. Fast algorithms for the Discrete Cosine Transform. IEEE Trans. on Signal Processing, pages 2174--2193, 1992.
....; $g_p = f_{(p-1)/2} - f_{N-1-(p-1)/2}$. Note that for a specific N, if we only consider the number of operations used in an algorithm and don't care about the computational structure of the post-addition stage, then we may apply algorithms by Feig and Winograd [6] or McGovern, Woods and Yan [14] for computing 1-D DCT's and the post-addition stage. Feig and Winograd applied matrix factorization techniques to the DCT kernel matrix, the structure of tensor products of their matrices can be viewed as elements in certain fields defined by polynomial ....
E. Feig and S. Winograd, "Fast Algorithms for the Discrete Cosine Transform", IEEE Trans. Signal Processing, Vol. 40, No. 9, Sep. 1992, pp. 2174--2193.
....elements. 3.1 Implementation of the IDCT with the D30V The discrete cosine transform (dct) was first introduced by Ahmed et al. (1974). Since then, many algorithms have been proposed to compute it efficiently; these are based on a direct two-dimensional computation of the data (Kuroda, 1995; Feig and Winograd, 1992) or on decomposing the two-dimensional array into rows and columns on which 1-d idcts are computed independently. Equation 1 with N = 8 is the basic definition of the 2-d idct for video processing (MPEG-2, 1994, Appendix A). For the row-column decomposition method, the 2-d idct is calculated by ....
Feig, Ephraim and Shmuel Winograd (1992). "Fast Algorithms for the Discrete Cosine Transform." Trans. on ASSP, vol. 40, no. 9, pp. 2174--2193, September.
....3.2. Implementation of the Inverse Discrete Cosine Transform with the D30V The discrete cosine transform (dct) was first introduced by Ahmed et al. [17]. Since then, many algorithms have been proposed to compute it efficiently. These are based on a direct two-dimensional computation of the data [18, 19, 20] or on decomposing the two-dimensional array into rows and columns on which 1-d idcts are computed independently. Equation 1 with N = 8 is the basic definition for the 2-d idct for video processing [16, Appendix A]. For the row-column decomposition method, the 2-d idct is calculated by ....
Ephraim Feig and Shmuel Winograd. Fast algorithms for the discrete cosine transform. Trans. on Acoustics, Speech, and Signal Processing, 40(9):2174--2193, Sep. 1992.
.... transforms [1] 13] 23] 26] or the fast Hartley transforms [16] 25] or convert the DCT's into circular convolutions which can be computed very efficiently using distributed arithmetic [22]. On the other hand, direct computation algorithms use techniques such as matrix factorizations [3] [9] [29] 30], the divide-and-conquer method [20], recursive decomposition [7] 17], and small odd-length DCT modules which are derived from Winograd's small modules of real-valued discrete Fourier transforms (DFT's) [14]. In general, indirect computation algorithms took advantage of using existing fast ....
E. Feig and S. Winograd. Fast algorithms for the discrete Cosine transform. IEEE Trans. Signal Processing, 40(9):2174--2193, September 1992.
....the image and then performs convolution in the spatial domain. In these processes, the computation is concentrated in the inverse DCT (IDCT) and the convolution processes. Experiments show that in image/video decoding, around 40% of CPU time is spent in IDCT even using available fast DCT algorithms [1, 9, 15]. Convolution is no doubt another computation-intensive process, especially when a large mask has to be used. As shown in Fig. 1, the brute-force method performs each procedure (shaded block) sequentially. Is there a way to merge some procedures so that some operations can be absorbed? Since IDCT is ....
....such as platform architecture, hardware support. In the above complexity analysis, we did not count in the multiplications required by decoding for brute force method because, in terms of the number of multiplications required per pixel, less than 2 is needed by using some fast IDCT algorithms [1, 9]. However, when large number of compressed images or videos with lots of (key) frames have to be processed, the accumulation can not be ignored, and hence, more efficiency of our block DCT convolution approach is expected. K b 4 N 2 4 ( 2 = N mult 64 P K b 64 P K b = 4P N ....
E. Feig and S. Winograd, "Fast Algorithms for the Discrete Cosine Transform," IEEE Trans. Signal Processing, vol. 40, no. 9, Sept. 1992.
....the coded coefficient value and then added to the image. Thus two operations per pixel are required, with the number of pixels depending on the extent of the atom. This cost is summed over all coded atoms in the given frame. 3.2. Inverse DCT Complexity The optimized IDCT algorithm presented in [4] is capable of performing a full 8 × 8 inverse DCT in 522 operations. Even better performance may be achieved for DCT blocks with few nonzero coefficients, since direct basis summation may be less expensive than a full IDCT algorithm. The basis summation method is equivalent to the method ....
E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Trans. on Sig. Proc., September 1992, vol.40, no.9, pp.2174-2193.
....that the optimization for throughput is very different than optimization for power. Now consider a bigger and more widely used example, the DCT (Discrete Cosine Transform), to illustrate this point further. We will compare two implementations of the DCT: one proposed by Feig and Winograd [20] and the other, the direct maximally fast form. Feig's DCT algorithm can be derived from the direct form using the exceptionally sophisticated application of common subexpression elimination, replication and algebraic rules. The direct form can be derived from Feig's DCT by the simple ....
E. Feig, S. Winograd: "Fast Algorithms for Discrete Cosine Transform", IEEE Trans. on Signal Processing, Vol. 40, No. 9, pp. 2174-2193, 1992.
No context found.
E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Trans. Signal Processing, vol. 40, pp. 2174--2193, Sept. 1992.
No context found.
E. Feig, S. Winograd, "Fast Algorithms for the Discrete Cosine Transform", Research Report, IBM RC 16148, 1990.
No context found.
E. Feig and S. Winograd. Fast Algorithms for the Discrete Cosine Transform, IEEE Trans. Signal Processing, vol. 40, no. 9, pp. 2174-2193, Sep. 1992.
No context found.
E. Feig and S. Winograd, Fast algorithms for the discrete cosine transform, IEEE Trans. Signal Process. 40 (1992), 2174 - 2193.
No context found.
E. Feig and S. Winograd, Fast algorithms for the discrete cosine transform, IEEE Trans. Signal Process. 40 (1992), 2174 - 2193.
No context found.
E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Transactions on Signal Processing, vol. 40, pp. 2174-2193, Sep. 1992.
No context found.
E. Feig and S. Winograd, "Fast Algorithms for Discrete Cosine Transform," IEEE Transactions on Signal Processing, vol. 40, no. 9 (1992): 2174--2193.
CiteSeer.IST - Copyright Penn State and NEC