W. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. S. Stevens, and K. Y. Yun, "Average-case optimized technology mapping of one-hot domino circuits," in Proc. Int. Symp. Advanced Research in Asynchronous Circuits and Systems, 1998, pp. 80--91.
....does not appear to have been studied in the main literature on percolation networks. We obtained an estimate of 0.72 through Monte Carlo simulation, and a lower bound of 0.6663 by analytical methods. The asynchronous community has had a longstanding fondness for average-case performance (e.g., [12, 10, 1, 8]). As we have shown, actual performance often corresponds much more closely to the worst-case performance of the components than to their average-case performance. In fact, it is possible to optimize a component in ways that decrease its average-case delay while degrading system performance. ....
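The claim that an average-case-optimized component can slow the whole system down can be illustrated with a small calculation. This is a hypothetical model, not one taken from the cited work: it assumes n identical components with independent delays whose results are joined, so each system step waits for the slowest of them, and it uses made-up delay distributions. The names `mean_delay`, `expected_join_delay`, `FIXED`, and `AVG_OPTIMIZED` are illustrative only.

```python
# Hypothetical model: n identical, independent components whose outputs are
# joined (a system step completes only when all n have finished), so the
# step delay is the maximum of the n component delays.

def mean_delay(pmf):
    """Mean of a discrete delay distribution given as {delay: probability}."""
    return sum(d * p for d, p in pmf.items())

def expected_join_delay(pmf, n):
    """Exact E[max of n i.i.d. delays], using P(max <= v) = F(v) ** n."""
    total, cdf, prev = 0.0, 0.0, 0.0
    for d in sorted(pmf):
        cdf += pmf[d]
        cur = cdf ** n
        total += d * (cur - prev)  # probability that the maximum equals d
        prev = cur
    return total

# Illustrative numbers only (not data from the cited work).
FIXED = {2.0: 1.0}                    # worst case equals average case
AVG_OPTIMIZED = {1.0: 0.8, 5.0: 0.2}  # lower mean (1.8), rare slow case

if __name__ == "__main__":
    for name, pmf in [("fixed", FIXED), ("avg-optimized", AVG_OPTIMIZED)]:
        print(name, mean_delay(pmf), expected_join_delay(pmf, 4))
```

With n = 4, the average-optimized component has the lower mean (1.8 versus 2), yet the joined step's expected delay is about 3.36 versus 2: the maximum of several delays is dominated by the rare slow case.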
W.-C. Chou, P. A. Beerel, et al. Average-case optimized technology mapping of one-hot domino circuits. In Proceedings of the Fourth International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 80--91, San Diego, California, Apr. 1998. IEEE.
....(e.g., length-changing prefixes) and branches. The central driving units for the model are the actual length decoding logic in the columns. These units are optimized for the common instruction lengths and consequently have longer delay for less common instructions, as previously described in [6]. In our Petri net model, depicted in Figure 5, this logic is modeled with a free-choice place with a probability mass function that matches the relative frequency of instruction lengths given in [6]. Even for a given instruction length the decoding time can vary, thereby motivating the use of ....
.... lengths and consequently have longer delay for less common instructions, as previously described in [6]. In our Petri net model, depicted in Figure 5, this logic is modeled with a free-choice place with a probability mass function that matches the relative frequency of instruction lengths given in [6]. Even for a given instruction length the decoding time can vary, thereby motivating the use of stochastic delay models for the decoding of each length. Once decoded, the column broadcasts a signal to multiple tag units associated with its column indicating ....
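A free-choice place whose outgoing transitions are selected according to a probability mass function, followed by a stochastic delay for the chosen length, can be sketched in a few lines. This is an assumption-laden illustration: the length frequencies and the uniform delay ranges below are hypothetical placeholders, not the statistics reported in [6], and all names are invented for the example.

```python
import random

# Hypothetical placeholder values: NOT the instruction-length statistics of [6].
LENGTH_PMF = {1: 0.15, 2: 0.25, 3: 0.30, 4: 0.15, 5: 0.10, 7: 0.05}
# Hypothetical (min, max) decode delay per length, in arbitrary time units;
# common lengths get shorter delays, mirroring the text's description.
DECODE_DELAY = {1: (1.0, 1.5), 2: (1.0, 1.5), 3: (1.0, 1.8),
                4: (1.5, 2.5), 5: (2.0, 3.0), 7: (3.0, 4.5)}

def fire_free_choice(rng):
    """Resolve the free-choice place: pick an instruction length by its pmf."""
    lengths = list(LENGTH_PMF)
    return rng.choices(lengths, weights=[LENGTH_PMF[n] for n in lengths])[0]

def decode_delay(length, rng):
    """Stochastic decode delay for one length (uniform is an assumption)."""
    lo, hi = DECODE_DELAY[length]
    return rng.uniform(lo, hi)

def expected_decode_delay():
    """Closed form: sum over lengths of p(length) * mean of its uniform delay."""
    return sum(p * sum(DECODE_DELAY[n]) / 2 for n, p in LENGTH_PMF.items())

def simulate(samples, seed=0):
    """Monte Carlo estimate of the mean decode delay."""
    rng = random.Random(seed)
    total = sum(decode_delay(fire_free_choice(rng), rng) for _ in range(samples))
    return total / samples
```

Comparing the Monte Carlo estimate from `simulate` with the closed-form `expected_decode_delay` is a quick consistency check on such a model.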
W.-C. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun. Average-case optimized technology mapping of one-hot domino circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC), pages 80--91. IEEE Computer Society Press, March 1998.
....with superior average-case delays. Benes et al. [12] describe a high-speed software decompression engine for embedded processors. The engine exploits the large variations in delay that are typical of Huffman decoders. For similar reasons, Intel is investigating asynchronous instruction decoders [13]. In [14, this issue], performance benefits are pursued for microprogrammed control organizations. Elastic pipelines. In general, it is not easy to translate a local asynchronous advantage in average-case performance into a system-level performance advantage. Today's synchronous circuits are ....
....in practice are often easily met) but allow greater flexibility in the synthesis path. A number of tools have been developed for both burst-mode [57], [65], [66], [67], [59], [68] and STG [69], [60], [70], [71], [72], [73] synthesis; these have been applied to a number of real-world designs [74], [13], [8], [75], [73], [76]. An alternative approach has also been proposed, called timed circuits [77], which incorporates user-specified timing information to optimize the circuits. Compiling asynchronous circuits from higher-level programming languages has been extensively explored in [78], [79] ....
[Article contains additional citation context not shown here]
W. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun, "Average-case optimized technology mapping of one-hot domino circuits," in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1998, pp. 80--91.
....noise on control lines, timing verification, and potential area increase. Some of these risks have been assessed and are presented. The design was motivated by the observation that instruction length decoding could pose a bottleneck in variable-length instruction set architectures. As reported in [1], our analysis of the Pentium variable-length instruction set revealed two principal findings (Figure 1). First, the average instruction length is about three bytes, and instructions longer than seven bytes are rare. Second, very few instruction types are used frequently. The RAPPID design exploits ....
.... consists of a Byte Unit (BU), comprising the Byte Latch, Byte Control, and Length Decoder, and four identical Tag Units (TUs) and Crossbar Switches (XBs). The Length Decoder implementation is optimized for common instructions, such that length decoding for common opcodes is faster than for rare ones [1]. The TUs and XBs are arranged in 16 columns and four rows, wrapped around in a torus. Each XB in the four rows is connected to an output buffer. Each column receives a byte from the instruction line at the head of the IF, latches it in the Byte Latch, and performs a speculative length decoding ....
[Article contains additional citation context not shown here]
W. Chou et al., Average-case optimized technology mapping of one-hot domino circuits, Proc. 4th Int. Symp. Advanced Research in Asynchronous Circuits and Systems, San Diego, CA, March 1998, pp. 80-91.
.... now widely used in several other burst-mode CAD packages, including the 3D method [38] and ACK [16]. It has also been used as part of the asynchronous tool suite at Intel Corporation, where it has been applied in the design of a high-speed experimental asynchronous Instruction Length Decoder (see [5]). 4.4 Espresso-HF: For very large problems that Hfmin is unable to solve in reasonable time, Minimalist offers Espresso-HF [33], a new fast heuristic two-level logic minimizer. Espresso-HF uses an algorithm loosely based on Espresso (but substantially different from it) to solve problems with ....
W.-C. Chou, P.A. Beerel, R. Ginosar, R. Kol, C.J. Myers, S. Rotem, K. Stevens, and K.Y. Yun. Average-case optimized technology mapping of one-hot domino circuits. In Proc. Int. Symp. Adv. Research in Async. Ckts. and Sys., pages 80--91, March 1998.
....design uses static and domino gates from a standard synchronous library, with a few custom circuits, such as C elements [5]. The design was motivated by the observation that instruction length decoding could pose a bottleneck in variable-length instruction set architectures. As reported in [6], our analysis of the variable-length instruction set revealed two principal findings (Fig. 1). First, the average instruction length is about three bytes, and instructions longer than seven bytes are rare. Second, very few instruction .... [Fig. 2. Microarchitecture.]
....buffers. Each column consists of a BU, comprising the byte latch, byte control, and length decoder, and four identical tag units and steering switches. The length decoder implementation is optimized for common instructions, such that length decoding for common opcodes is faster than for rare ones [6]. The TUs and SSs are arranged in 16 columns and four rows, wrapped around in a torus. The horizontal toroidal wrap ensures that instructions from different cache lines are correctly packed into the output buffers. Each SS in the four rows is connected to an output buffer. Each .... [Fig. 3. One cell of ....]
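Assuming column c holds byte c of a 16-byte instruction line and that instructions occupy consecutive bytes (both assumptions made for illustration), the horizontal wrap amounts to modular arithmetic on column indices: an instruction that starts near the end of one line continues in the first columns of the next. The helpers below are hypothetical and do not model the row (output buffer) assignment.

```python
COLUMNS = 16  # 16 columns, one byte per column (assumed byte-to-column mapping)

def occupied_columns(start_col, length):
    """Columns covered by one instruction, wrapping around the torus."""
    return [(start_col + i) % COLUMNS for i in range(length)]

def next_start(start_col, length):
    """Column where the next instruction begins, and whether that position
    lies in the following instruction line (i.e., the wrap was taken)."""
    end = start_col + length
    return end % COLUMNS, end >= COLUMNS

def tag_starts(lengths, start_col=0):
    """Start columns of a sequence of instructions with the given lengths."""
    starts, col = [], start_col
    for n in lengths:
        starts.append(col)
        col, _ = next_start(col, n)
    return starts
```

For instance, a 5-byte instruction starting in column 14 occupies columns 14, 15, 0, 1, and 2, and the next instruction starts in column 3 of the following line.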
[Article contains additional citation context not shown here]
W. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. S. Stevens, and K. Y. Yun, "Average-case optimized technology mapping of one-hot domino circuits," in Proc. Int. Symp. Advanced Research in Asynchronous Circuits and Systems, 1998, pp. 80--91.
No context found.
W.-C. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun. Average-case optimized technology mapping of one-hot domino circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC), pages 80--91. IEEE Computer Society Press, 1998.
....[Fig. 2. Microarchitecture.] ....from a standard synchronous library, with a few custom circuits, such as C elements [5]. The design was motivated by the observation that instruction length decoding could pose a bottleneck in variable-length instruction set architectures. As reported in [6], our analysis of the variable-length instruction set revealed two principal findings (Figure 1). First, the average instruction length is about three bytes, and instructions longer than seven bytes are rare. Second, very few instruction types are used frequently. The asynchronous design exploits ....
....Each column consists of a Byte Unit, comprising the Byte Latch, Byte Control, and Length Decoder, and four identical Tag Units and Steering Switches. The Length Decoder implementation is optimized for common instructions, such that length decoding for common opcodes is faster than for rare ones [6]. The TUs and SSs are arranged in 16 columns and four rows, wrapped around in a torus. The horizontal toroidal wrap ensures that instructions from different cache lines are correctly packed into the output buffers. Each SS in the four rows is connected to an output buffer. Each line is ....
[Article contains additional citation context not shown here]
W. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. S. Stevens, and K. Y. Yun, "Average-case optimized technology mapping of one-hot domino circuits," in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1998, pp. 80--91.
....design uses static and domino gates from a standard synchronous library, with a few custom circuits, such as C elements [5]. The design was motivated by the observation that instruction length decoding could pose a bottleneck in variable-length instruction set architectures. As reported in [6], our analysis of the variable-length instruction set revealed two principal findings (Figure 1). First, the average instruction length is about three bytes, and instructions longer than seven bytes are rare. Second, very few instruction types are used frequently. The asynchronous design exploits ....
....that requires a weak keeper. .... comprising the Byte Latch, Byte Control, and Length Decoder, and four identical Tag Units and Steering Switches. The Length Decoder implementation is optimized for common instructions, such that length decoding for common opcodes is faster than for rare ones [6]. The TUs and SSs are arranged in 16 columns and four rows, wrapped around in a torus. The horizontal toroidal wrap ensures that instructions from different cache lines are correctly packed into the output buffers. Each SS in the four rows is connected to an output buffer. Each line is ....
[Article contains additional citation context not shown here]
W. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. S. Stevens, and K. Y. Yun, "Average-case optimized technology mapping of one-hot domino circuits," in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1998, pp. 80--91.
No context found.
W.-C. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun. Average-case optimized technology mapping of one-hot domino circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC), pages 80--91. IEEE Computer Society Press, March 1998.
....analysis computationally intractable. In practice, one can often obtain useful performance models by abstracting the delay of a data path component to a random variable with an estimated probability distribution that takes into account input statistics (e.g., common vs. rare input combinations) [35, 14], obtained through high-level algorithmic simulation. Consequently, the actual data signals need not be modeled, which dramatically reduces the complexity of the system model. As an example, Fig. 4(c) plots the distribution of the delay of the adder shown in Fig. 4(a). In some cases, such ....
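As one hedged example of deriving such a delay distribution by simulation, the sketch below assumes a ripple-carry adder with completion detection whose delay is proportional to its longest carry-propagation chain, and feeds it uniformly random operands. Both the adder model and the input statistics are assumptions; the adder of Fig. 4(a) may differ, and the function names are hypothetical.

```python
import random
from collections import Counter

def longest_carry_chain(a, b, width):
    """Length of the longest carry-propagation chain when adding a and b.

    A chain starts at a generate bit (a_i & b_i) and extends through
    consecutive propagate bits (a_i ^ b_i); a kill bit ends it."""
    longest = run = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:              # generate: a new carry starts here
            run = 1
        elif ai ^ bi and run:    # propagate: the incoming carry moves on
            run += 1
        else:                    # kill, or propagate with no incoming carry
            run = 0
        longest = max(longest, run)
    return longest

def delay_distribution(width=32, samples=20_000, seed=1):
    """Empirical pmf of the longest chain (a proxy for adder delay) under
    uniformly random operands, an assumed input distribution."""
    rng = random.Random(seed)
    counts = Counter(
        longest_carry_chain(rng.getrandbits(width), rng.getrandbits(width), width)
        for _ in range(samples)
    )
    return {k: v / samples for k, v in sorted(counts.items())}
```

Real input statistics from an algorithmic simulation would replace the uniform operands; the resulting histogram can then be attached to a single delay transition in the performance model.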
....In addition, data path components often generate signals that dictate subsequent control operations, such as the result of a comparator. These signals can also be modeled as random Boolean variables whose distributions can be estimated using, for example, high-level algorithmic simulation (e.g., [14]). Modeling asynchronous controllers is further complicated because there is no universal specification language for asynchronous controllers. Regardless of the specification language, however, to model performance .... [Invited paper, 2nd Workshop on Hardware Design and Petri ....]
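A comparator output can be abstracted in the same way: estimate the probability that it is true from simulated operands, then replace the comparator with a Bernoulli random variable. The operand sampler below is a hypothetical stand-in for a high-level simulation trace, and the function names are illustrative.

```python
import math
import random

def estimate_comparator_probability(sample_operands, trials, seed=0):
    """Estimate p = P(a < b) from simulated operand pairs; returns (p, stderr)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = sample_operands(rng)
        hits += a < b  # True counts as 1
    p = hits / trials
    return p, math.sqrt(p * (1 - p) / trials)

def comparator_signal_model(p):
    """Replace the comparator by a random Boolean that is True with probability p."""
    return lambda rng: rng.random() < p

def uniform_bytes(rng):
    """Hypothetical workload: two independent, uniformly distributed 8-bit operands."""
    return rng.randrange(256), rng.randrange(256)
```

For this particular workload the exact value is 255/512 (about 0.498), since P(a < b) = (1 - 1/256)/2 for independent uniform bytes, which gives a check on the estimate.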
[Article contains additional citation context not shown here]
W.-C. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun. Average-case optimized technology mapping of one-hot domino circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC), pages 80--91. IEEE Computer Society Press, March 1998.
....computationally intractable. In practice, one can often obtain useful performance models by abstracting the delay of a data path component to a random variable with an estimated probability distribution that takes into account input statistics (e.g., common vs. rare input combinations) [6], [31]. Consequently, the actual data signals need not be modeled, which dramatically reduces the complexity of the system model. As an example, Fig. 2(b) plots the distribution of the delay of the adder shown in Fig. 2(a). In some cases, such an abstraction may lead to significant absolute ....
....In addition, data path components often generate signals that dictate subsequent control operations, such as the result of a comparator. These signals can also be modeled as random Boolean variables whose distributions can be estimated using, for example, high-level architectural simulation (e.g., [31]). Modeling asynchronous controllers is further complicated because there is no universal specification language for asynchronous controllers. In fact, there are many different languages with varying expressiveness, including asynchronous finite state machines [32], burst-mode ....
W.-C. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun, "Average-case optimized technology mapping of one-hot domino circuits," in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC), March 1998, pp. 80--91. IEEE Computer Society Press.
....inter-controller protocol timing verification and performance analysis of the chip using the proposed technique. I. INTRODUCTION. There is mounting evidence that asynchronous circuits are finding a niche in high-performance applications, such as Intel's asynchronous instruction length decoder [1], [2] and the asynchronous differential equation solver benchmark circuit [3]. Common traits of these systems are a high degree of concurrency, distributed control, and implicit timing assumptions that hide the control overhead. In order to guarantee correct operation of these systems as well as to ....
....events. Tightly coupled systems are characterized, and the time separations problem is then formalized. A. Cyclic timing constraint graphs. A cyclic timing constraint graph is a directed, labeled graph G = (V, E). In general, the graph has two components: an acyclic component modeling the behavior of the system immediately after it is powered up or reset, and a cyclic .... [Fig. 1. A cyclic timing constraint graph. Square vertices represent max events, circles represent min events. Edge delays are shown in the legend.]
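To make the max/min event semantics concrete, here is a minimal sketch that propagates occurrence-time intervals through the acyclic component only: a max event waits for its latest predecessor, and a min event fires with its earliest one. It is a simple interval bound, not the time-separation analysis the paper formalizes; separations derived from per-event intervals ignore correlations between events and can be loose. The example graph is hypothetical (not the graph of Fig. 1), every non-source event is assumed to have at least one predecessor, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def occurrence_bounds(events, edges, source):
    """Interval bounds on occurrence times in an acyclic timing constraint graph.

    events: {name: "max" | "min"}; edges: {(u, v): (lo, hi)} delay bounds.
    A max event occurs once all predecessors have fired (latest arrival);
    a min event occurs as soon as any predecessor has fired (earliest arrival).
    """
    preds = {v: [] for v in events}
    for (u, v), delay in edges.items():
        preds[v].append((u, delay))
    graph = {v: [u for u, _ in preds[v]] for v in events}
    bounds = {}
    for v in TopologicalSorter(graph).static_order():
        if v == source:
            bounds[v] = (0, 0)
            continue
        combine = max if events[v] == "max" else min
        lows = [bounds[u][0] + lo for u, (lo, hi) in preds[v]]
        highs = [bounds[u][1] + hi for u, (lo, hi) in preds[v]]
        bounds[v] = (combine(lows), combine(highs))
    return bounds

# Hypothetical example graph (not the graph of Fig. 1).
EVENTS = {"reset": "max", "a": "max", "b": "max", "c": "min", "d": "max"}
EDGES = {("reset", "a"): (2, 3), ("reset", "b"): (0, 1),
         ("a", "c"): (1, 2), ("b", "c"): (4, 5),
         ("a", "d"): (0, 1), ("b", "d"): (2, 2)}
```

In this example, the min event c can occur as early as time 3 (via a) and no later than time 5, while the max event d lies in [2, 4] because it must wait for both a and b.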
[Article contains additional citation context not shown here]
W. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun, "Average-case optimized technology mapping of one-hot domino circuits," in Proceedings of the 4th International Symposium on Advanced Research in Asynchronous Circuits and Systems, Apr. 1998, pp. 80--91.
....(e.g., length-changing prefixes) and branches. The central driving units for the model are the actual length decoding logic in the columns. These units are optimized for the common instruction lengths and consequently have longer delay for less common instructions, as previously described in [12]. In our Petri net model, depicted in Figure 7, this logic is modeled with a free-choice place with a probability mass function that matches the relative frequency of instruction lengths given in [12]. Even for a given instruction length the decoding time can vary, thereby motivating the use of ....
.... lengths and consequently have longer delay for less common instructions, as previously described in [12]. In our Petri net model, depicted in Figure 7, this logic is modeled with a free-choice place with a probability mass function that matches the relative frequency of instruction lengths given in [12]. Even for a given instruction length the decoding time can vary, thereby motivating the use of stochastic delay models for the decoding of each length. Once decoded, the column broadcasts a signal to multiple tag units associated with its column indicating its instruction is ready to be ....
W.-C. Chou, P. A. Beerel, R. Ginosar, R. Kol, C. J. Myers, S. Rotem, K. Stevens, and K. Y. Yun. Average-case optimized technology mapping of one-hot domino circuits. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC), pages 80--91. IEEE Computer Society Press, March 1998.
CiteSeer.IST - Copyright Penn State and NEC