M. Benes, A. Wolfe, S. M. Nowick, "A High-Speed Asynchronous Decompression Circuit for Embedded Processors", Proceedings of the 17th Conference on Advanced Research in VLSI, pp. 219-236, September 1997.
....at the same address in main memory. CCRP uses a Line Address Table (LAT) to map missed instruction cache addresses to main memory addresses where the compressed code is located. The authors report a 73% compression ratio for MIPS instructions. A working demonstration of CCRP has been completed [Benes97, Benes98]. Implemented in 0.8 µm CMOS, it occupies 0.75 mm², and can decompress 560 Mbit/s. Lekatsas and Wolf explore other compression algorithms for CCRP [Lekatsas98]. Their SAMC (Semi-adaptive Markov Compression) algorithm combines a Markov model with an arithmetic coder. Each instruction is ....
M. Benes, A. Wolfe, S. M. Nowick, "A High-Speed Asynchronous Decompression Circuit for Embedded Processors", Proceedings of the 17th Conference on Advanced Research in VLSI, pp. 219-236, September 1997.
....the VM or the compression of the virtual program. Moreover, no timing of the execution of compressed programs is reported, although they show that the intermediate form can be compiled efficiently. Several works compress native programs, using Huffman codes, doing decompression at the hardware level [12, 16, 2]. Decompression occurs between memory and cache and is mostly transparent to the processor. The advantage of this approach is the use of hardware to decompress, but this advantage comes with an increase in hardware complexity. The compression factors are somewhere around 80% on native code. Cooper ....
Martin Benes, Andrew Wolfe, and Steven M. Nowick. A high-speed asynchronous decompression circuit for embedded processors. In Proc. Conf. on Advanced Research in VLSI, September 1997.
....the VM or the compression of the virtual program. Moreover, no timing of the execution of compressed programs is reported, although they show that the intermediate form can be compiled efficiently. Several works compress native programs, using Huffman codes, doing decompression at the hardware level [13, 17, 2]. Decompression occurs between memory and cache and is mostly transparent to the processor. The advantage of this approach is the use of hardware to decompress, but this advantage comes with an increase in hardware complexity. The compression factors are somewhere around 80% on native code. Cooper ....
Martin Benes, Andrew Wolfe, and Steven M. Nowick. A high-speed asynchronous decompression circuit for embedded processors. In Proc. Conf. on Advanced Research in VLSI, September 1997.
....is increasingly important as system on a chip designs become popular in the embedded world. Code compression is one technique to reduce program size by applying compression algorithms to native instruction sets. There are many recent publications suggesting new compressed code representations [Araujo98, Benes97, Benes98, Bunda92, Ernst97, Fraser95, Kozuch94, Lefurgy97, Lekatsas98, Liao96, Wolfe92]. However, the increased instruction density has an accompanying performance cost because the instructions must be decompressed before execution. Although some work has addressed the issue of performance for decompression, on the whole, it remains much less studied than size optimizations for the ....
M. Benes, A. Wolfe, S. M. Nowick, "A High-Speed Asynchronous Decompression Circuit for Embedded Processors", Proceedings of the 17th Conference on Advanced Research in VLSI, September 1997.
....parallelism of VLIW is much larger code sizes. Beyond the classical optimizations used to achieve smaller programs, compression can shrink program size by utilizing repetition found at the instruction level. Several compression techniques have been proposed for general purpose architectures [Wolfe92, Kozuch94, Fraser95, Liao95, Benes97, Ernst97, Kirovski97, Lefurgy97, Wolf97, Aranjo98]. Previous work focused on using short variable length codewords and increasing the meaning of codes by allowing them to decode to a list of instructions. It is not known if such compression methods can be used on DSP architectures. DSP instructions can hold multiple independent operations which ....
....method in section 3. Our experimental results are presented in section 4. In section 5, we discuss some implications of the results. Finally, section 6 contains our conclusions. 2 Previous work. There have been several recent works on code compression. The Compressed Code RISC Processor (CCRP) [Wolfe92, Kozuch94, Benes97] is a MIPS processor that compresses instruction cache lines using Huffman coding. Dictionary compression methods [Bell90] have been studied for several processors [Liao95, Lefurgy97]. A software managed compression cache that decompresses functions on a cache miss has been proposed [Kirovski97] ....
M. Benes, A. Wolfe, S. M. Nowick, "A High-Speed Asynchronous Decompression Circuit for Embedded Processors", Proceedings of the 17th Conference on Advanced Research in VLSI, September 1997.
....for decompression. The main issue is the complexity, and therefore size, of the decoder. There are several potential compression algorithms to choose from. For this environment, the Huffman algorithm [2] produces near-optimal results. It also allows fast decompression at a reasonable real estate price [17,18]. For similar reasons, it was used in several previous studies [1,9]. [Figure residue: instruction formats for the Integer ALU Operation and the Integer Compare to Predicate Operation, with fields T, S, OPT, OPCODE, PREDICATE, Src1, Src2, BHWX, D1, Reserved, Dest, L1] ....
....scheme, Stream, achieves approximately 75% of the original image size, yet has a significant decoder complexity. The reason is the limited input width and dictionary size of byte-wise compression. Practical implementations of the Huffman decoder in hardware have been proposed in two studies [17,18]. Both are strongly dependent on implementation (MPEG-2 decompression for example) but generally can achieve 300-600 Mbit/sec for a table with 114 dictionary entries and codes in the range of 1 to 16 bits. The real estate budget ranges from 10,000 to 28,000 transistors. This data allows an ....
M. Benes, A. Wolfe, S.M. Nowick, "A High-Speed Asynchronous Decompression Circuit for Embedded Processors", in Proc. of the 17th Conference on Advanced Research in VLSI, 1997.
....[7] and uncompressed program symbols are one byte long. Using this approach a 73% compression ratio has been reported for the MIPS instruction set [6, 8]. The decompression engine proposed in the CCRP is also the only implemented real-time RISC decompression engine ever reported in the literature [9]. Lefurgy et al. [1] proposed an interesting code compression technique based on dictionary encoding. In [1] object code is parsed and common sequences of instructions are replaced by a single codeword. Only frequent sequences are compressed. Escape bits are used to distinguish between a codeword ....
Martin Benes, Andrew Wolfe, and Steven M. Nowick. A high-speed asynchronous decompression circuit for embedded processors. In Proceedings of the 17th Conference on Advanced Research in VLSI, Los Alamitos, CA, 1997. IEEE Computer Society Press.
....circuits because they have the potential advantages of higher average case performance [12, 13, 6], lower power consumption, and freedom from clock skew problems. Recent emerging asynchronous designs have shown impressive results for digital signal processing [9, 22, 19] and microprocessors [7, 10, 4], but the lack of CAD support is still limiting their advances in some areas. This paper focuses on a CAD tool for a specific type of design, combinational circuits that convert data signals into control signals. These circuits typically perform instruction decoding of some type and, due to their ....
....style for these decoders which applies a combination of domino logic, dual rail signaling, and one hot encoded outputs. Chris Myers initially conceived of this design [11] and Benes et al. independently developed a similar technique that they used in a decompression circuit for embedded processors [4]. Domino logic is used for its well known speed advantage over static logic and because it guarantees that the outputs are hazard free. However, domino logic can only realize functions that are monotonic. Thus, to implement all functions, some dual rail inputs and some dual rail internal signals ....
[Article contains additional citation context not shown here]
M. Benes, A. Wolfe, and S.M. Nowick. A high-speed asynchronous decompression circuit for embedded processors. In Proceedings of the 17th Conference on Advanced Research in VLSI, Los Alamitos, CA, September 1997. IEEE Computer Society Press.
....memory addresses where the compressed code is located. The LAT limits compressed programs to only execute on processors that have the same line size for which they were compiled. The authors report a 73% compression ratio for MIPS instructions. A working demonstration of CCRP has been completed [Benes97]. Implemented in 0.8 µm CMOS, it occupies 0.75 mm², and can decompress 560 Mbit/s. 2.4 Procedurization. 2.4.1 Procedure abstraction. Procedure abstraction [Standish76] is a program optimization for procedure oriented languages that replaces repeated sequences of common code with function calls to ....
M. Benes, A. Wolfe, S. M. Nowick, "A High-Speed Asynchronous Decompression Circuit for Embedded Processors", Proceedings of the 17th Conference on Advanced Research in VLSI, September 1997.
....circuits because they have the potential advantages of higher average case performance [12, 13, 6], lower power consumption, and freedom from clock skew problems. Recent emerging asynchronous designs have shown impressive results for digital signal processing [9, 23, 19] and microprocessors [7, 10, 4], but the lack of CAD support is still limiting their advances in some areas. This paper focuses on a CAD tool for a specific type of design, combinational circuits that convert data signals into control signals. This research is funded in part by a gift from Intel Corporation and a NSF CAREER ....
....style for these decoders which applies a combination of domino logic, dual rail signaling, and one hot encoded outputs. Chris Myers initially conceived of this design [11] and Benes et al. independently developed a similar technique that they used in a decompression circuit for embedded processors [4]. Domino logic is used for its well known speed advantage over static logic and because it guarantees that the outputs are hazard free. However, a single stage of domino logic can only realize functions that are monotonic. Thus, to implement all functions, some dual rail inputs and some dual rail ....
[Article contains additional citation context not shown here]
M. Benes, A. Wolfe, and S.M. Nowick. A high-speed asynchronous decompression circuit for embedded processors. In Proceedings of the 17th Conference on Advanced Research in VLSI, Los Alamitos, CA, September 1997. IEEE Computer Society Press.
.... fast (since it is on the critical path during cache refill) but also very small (otherwise the savings in instruction memory will be lost to the area increase due to the decompression circuit). Very recently, we introduced the first prototype design for such an asynchronous decompression circuit [1]. Instructions are compressed in memory, using a Huffman encoding scheme [10]. Huffman codes are variable length, where the shortest codes are assigned to the most frequent symbols. In principle, a Huffman decoder is an excellent match for asynchronous design: an asynchronous decoder can be ....
....Several of these designs have demonstrated the benefits of asynchronous design for average case operation. In this paper, we propose a new architecture and implementation of an asynchronous Huffman decoder. The design uses an entirely new organization, and is 83% faster than our earlier design [1], using the same 0.8 µm CMOS technology, with no increase in area. The circuit is nonpipelined, and is implemented as an iterative self-timed ring. It is largely implemented using dynamic domino dual-rail logic. It achieves a high speed decode rate with very low area overhead. Simulations using Lsim ....
[Article contains additional citation context not shown here]
M. Benes, A. Wolfe, and S.M. Nowick. A high-speed asynchronous decompression circuit for embedded processors. In Proceedings of the 17th Conference on Advanced Research in VLSI. IEEE Computer Society Press, Los Alamitos, CA, 1997.
....[11]. In some specific applications the large data dependent variations in delays naturally lead to elegant and efficient asynchronous solutions. For example, Yun et al. [8] describe a differential equation solver based on adders and multipliers with superior average case delays. Benes et al. [12] describe a high speed software decompression engine for embedded processors. The engine exploits the large variations in delays so typical for Huffman decoders. For similar reasons Intel is investigating asynchronous instruction decoders [13]. In [14, this issue] performance benefits are pursued ....
Martin Benes, Andrew Wolfe, and Steven M. Nowick, "A high-speed asynchronous decompression circuit for embedded processors," in Advanced Research in VLSI, Sept. 1997.
....The presented work is an extended version of two recent conference papers [32, 31]. 1 Introduction. Asynchronous design has been the focus of much recent research activity. In fact, asynchronous designs have been applied to several large scale control and datapath circuits and processors [11, 18, 12, 19, 2, 30, 34, 15, 1]. A number of methods have been developed for the design of hazard free controllers [22, 20, 37, 13, 27]. These methods have been applied to several large and realistic design examples, including a low power infrared communications chip [14], a second level cache controller [21], a SCSI controller ....
M. Benes, A. Wolfe, and S.M. Nowick. A high-speed asynchronous decompression circuit for embedded processors. In Proceedings of the 17th Conference on Advanced Research in VLSI. IEEE Computer Society Press, Los Alamitos, CA, 1997.