Gary S. Tyson and Todd M. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. In Proceedings of the International Symposium on Microarchitecture, 1997.
....compiler to statically identify true dependences, which are then forwarded using a separate, fast, communication path. SUDS and other systems in this class essentially statically predict that all memory references that the compiler can not analyze are in fact independent. Several recent systems [87, 119, 25] have proposed hardware prediction mechanisms, for finding, and explicitly forwarding, additional dependences that the compiler can not analyze. Memory dependence speculation has also been examined in the context of fine grain instruction level parallel processing on VLIW processors. The point of ....
....to be in short supply. In nested transaction systems even seemingly simple problems, like efficient timestamp implementation, seem to require baroque solutions (see, for example, [116]). A second problem has to do with how one could extend the existing work on dynamic memory dependence prediction [87, 119, 25] to nested transaction systems. Perhaps then, this dissertation, in the end, raises more questions than it answers. In the introduction I stated that the SUDS system was built on three techniques. They were dynamic scalar renaming, control dependence analysis, and speculation. I believe that ....
Gary S. Tyson and Todd M. Austin. Improving the accuracy and performance of memory communication through renaming. In 30th Annual International Symposium on Microarchitecture (MICRO), Research Triangle Park, NC, December 1997.
....instructions are accurately identifiable at runtime. This memory dependence predictability can be utilized to reduce the misprediction penalty of the optimistic static prediction described above or to bypass the store data early to the dependent load(s) to shorten the store to load latency [21] [34]. The above three types of locality predictability are, like the access region locality studied in this paper, based on per instruction runtime information. We expect more studies on per instruction memory access behaviors to come and cooperate with other types of locality, including traditional ....
....forwarded to a later load. The MIPS R10000 processor forwards data to address matching loads in the LSQ on a refill [36]. By tracking previously manifested dependences and keeping store data in a separate hardware table, data forwarding from a store to a load can be performed speculatively [21] [34]. There is another opportunity to perform fast forwarding in the LVAQ without keeping dependence tables (i.e. no speculation). Accesses to the stack region in a procedure are usually based on the same value of sp, i.e. sp is not updated within a procedure. The dependence checking hardware can ....
G. Tyson and T.M. Austin, "Improving the Accuracy and Performance of Memory Communication Through Renaming," Proc. 30th Int'l Symp. Microarchitecture, pp. 218-227, Dec. 1997.
....al. [29] and Tyson [Figure 12: The structure of the memory renaming architecture [30]. Panel "Finding Store Load Relationships": Store Cache (store addr, vf index), indexed by addr. Panel "Finding the Value File Entry": Store Load Cache (value file index), indexed by pc, and the Value File.] The Store Cache is used to find the relationships between store and load instructions. The Store Load cache is used to keep track of which Value File entries to use for store and load instructions. Store instructions use the value file entry to store their last value or a pointer to the ....
....track of which Value File entries to use for store and load instructions. Store instructions use the value file entry to store their last value or a pointer to the instruction producing the value. Load instructions use the value file entry to predict the value to use for the load. Tyson and Austin [30] found that memory communication between store and load instructions can be accurately predicted in hardware. Memory renaming keeps track of store load dependencies in order to directly communicate a predicted value from a store to a load, bypassing memory. The approach uses a store cache to keep ....
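The table structure described in this excerpt can be illustrated with a minimal Python sketch. It is not the hardware design from the cited paper: the class and method names are hypothetical, the tables are unbounded dictionaries rather than fixed-size caches, and confidence estimation and misprediction recovery are left out.

```python
class MemoryRenamer:
    """Toy model of the three tables named in the excerpt (names are assumptions)."""

    def __init__(self):
        self.store_cache = {}       # data address -> value file index of the last store
        self.store_load_cache = {}  # instruction PC -> value file index
        self.value_file = []        # one slot per entry, holding the last stored value

    def _entry_for(self, pc):
        # Allocate a value file entry the first time this PC is seen.
        if pc not in self.store_load_cache:
            self.value_file.append(None)
            self.store_load_cache[pc] = len(self.value_file) - 1
        return self.store_load_cache[pc]

    def store(self, pc, addr, value):
        # A store records its value and remembers which entry wrote this address.
        idx = self._entry_for(pc)
        self.value_file[idx] = value
        self.store_cache[addr] = idx

    def predict_load(self, pc):
        # A load predicts its value from the entry its PC maps to, if any.
        idx = self.store_load_cache.get(pc)
        return None if idx is None else self.value_file[idx]

    def resolve_load(self, pc, addr):
        # Once the load address is known, link the load PC to the entry of
        # the store that last wrote that address.
        idx = self.store_cache.get(addr)
        if idx is not None:
            self.store_load_cache[pc] = idx
```

In this sketch a load has no prediction until `resolve_load` has linked its PC to a store's value file entry; after that, each new value written by the store is visible to `predict_load` without consulting memory.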
[Article contains additional citation context not shown here]
G. Tyson and T. M. Austin, "Improving the accuracy and performance of memory communication through renaming," in 30th Annual International Symposium on Microarchitecture, pp. 218--227, Dec. 1997.
....Memory renaming presents a tougher problem than that of register renaming. Although it is impossible for the processor to implement a deterministic memory renaming technique, various techniques have been proposed that implement speculative memory renaming through registers to increase performance [4, 14, 21]. Renaming is basically a dynamic mechanism, implemented in hardware to overcome static code limitations. Very little work [19, 16] has been done on how the compiler can help the hardware in this process. In the mechanism that we present, the compiler becomes a vital part in the process of ....
G. Tyson and T. Austin. Improving the accuracy and performance of memory communication through renaming. 30th Annual International Symposium on Microarchitecture, December 1997.
....more predictor space since the number of true memory dependencies is far higher than the number of tracking errors. On the other hand, predicting the true memory dependencies allows for speculative forwardings from stores to loads through the memory renamer entries as described in [Mosh97b] and [Tyso97]. Finally, the Speculative Memory Forwarding scheme introduced in [Mosh97b] proposed to bypass the memory renamer by processing speculative forwardings directly in the register renamer. 1.2 Simulation Methodology Results provided in this paper were collected from an IA 32 trace driven ....
....control mis-speculations, the validity of the instructions is not questioned. However, all the instructions belonging to the subgraph starting from the faulting instructions misrepresent the data flow graph. On a misprediction, we cannot just re-execute the faulting instruction as proposed in [Tyso97]. The faulting instruction needs a different physical register, which must be broadcast to all dependent instructions. However, all dependency arcs are marked by means of physical registers in the wake-up logic of the instruction scheduler. Instructions dependent on the faulting one cannot ....
G. S. Tyson and T. M. Austin, "Improving the Accuracy and Performance of Memory Communication Through Renaming", in Proceedings of the 30th Annual International Symposium on Microarchitecture, December 1997, pp. 218-227.
....compiler to statically identify true dependences, which are then forwarded using a separate, fast, communication path. SUDS and other systems in this class essentially statically predict that all memory references that the compiler can not analyze are in fact independent. Several recent systems [38, 50, 11] have proposed hardware prediction mechanisms, for finding, and explicitly forwarding, additional dependences that the compiler can not analyze. Memory dependence speculation has also been examined in the context of fine grain instruction level parallel processing on VLIW processors. The point of ....
G. S. Tyson and T. M. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. In 30th Annual International Symposium on Microarchitecture (MICRO), Research Triangle Park, NC, Dec. 1997.
....cannot be statically determined. Whether or not two memory instructions are data dependent may vary depending on the dynamic values of their addresses. The resolution of dynamic memory dependencies is referred to as memory disambiguation, and several hardware mechanisms have been proposed [4, 16, 24] and even implemented in commercial processors [8, 12]. Many of the proposed mechanisms allow memory operations with unresolved dependencies to speculatively execute. [Footnote 1: A wider range of processor configurations was simulated, but the results do not differ by very much.] The memory ....
Gary S. Tyson and Todd M. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. Proceedings of the 30th Annual International Symposium on Microarchitecture, December 1997.
....here: Case 1: Both the store and load addresses are available: In this case, the linking of the store and the load can be done in an unambiguous manner. Case 2: The store address and/or the load address is not available: In this case, we have to do speculative store load linking [1] [6] [9]. In these mechanisms, store load bypassing enables the direct communication of the store's signature to a potentially data dependent load. We can also do this linking by predicting the addresses of all stores and loads having unknown addresses, and by using the predicted addresses to ....
....addresses, and by using the predicted addresses to speculatively pass signatures. The hardware structure we use for store load linking is called a signature memory buffer (SMB). There have been a plethora of schemes recently to successfully link the dependent stores and loads [1] [6] [9]. An SMB implementation could use one of the store load linking techniques similar to store caches, store sets, or other store load memory bypassing techniques. An efficient implementation is heavily dependent on the specific microarchitecture chosen, and will have to be tuned based on the hardware ....
[Article contains additional citation context not shown here]
G. S. Tyson and T. M. Austin, "Improving the Accuracy and Performance of Memory Communication Through Renaming," Proc. 30th International Symposium on Microarchitecture (MICRO30) , 1997.
....[Figure, panels (a) and (b); surviving labels include TIME, address, DDT, synonym, TAG1, TAG2.] Moshovos, Breach, Vijaykumar and Sohi introduced RAW memory dependence prediction for scheduling loads [14]. Tyson and Austin [21] and Moshovos and Sohi [15,17] introduced RAW-based cloaking. The memory renaming proposal of Tyson and Austin combines cloaking with value prediction. Lipasti's Alias prediction [10] is also similar to cloaking. Moshovos and Sohi proposed RAW-based speculative memory bypassing [15]. Jourdan, ....
G. S. Tyson and T. M. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. In Proc. Annual International Symposium on Microarchitecture, Dec. 1997.
....and to use this information to either bypass loads beyond stores or to forward the store's data value to the matching load. To perform this load store pairing, they use a set of fully associative tables holding pairs that were wrongly reordered in the past. Similar work was done by Austin and Tyson [Aust97]. Moshovos and Sohi [Mosh97b] also proposed to improve the communication of data between the producer and consumer by eliminating or bypassing store load pairs. A more thorough examination of such communication improvements is presented in [Jour98]. Chrysos and Emer proposed a simplification of ....
T. Austin and G. Tyson -- "Improving the Accuracy and Performance of Memory Communication Through Renaming" -- MICRO-30, Dec. 1997.
....22 of the data cache misses on average for the SPEC C programs. 7. Memory Renaming. Memory renaming is the process of finding dependencies between store and load instructions, and communicating a predicted value from the store to the load. Research by Moshovos et al. [29] and Tyson and Austin [30] found that memory communication between store and load instructions can be accurately predicted in hardware. Memory renaming keeps track of store load dependencies in order to directly communicate a predicted value from a store to a load, bypassing memory. The approach uses a store cache to keep ....
....[Figure 12: The structure of the memory renaming architecture [30]. Panel "Finding Store Load Relationships": Store Cache, indexed by addr for stores and loads. Panel "Finding the Value File Entry": Store Load Cache (value file index), indexed by pc for stores and loads, and the Value File.] The Store Cache is used to find the relationships between store and load instructions. The Store Load cache is used to keep track of which Value File entries to use for store and load instructions. Store instructions use the value file entry to store their last value or a pointer to the ....
[Article contains additional citation context not shown here]
G. Tyson and T. M. Austin, "Improving the accuracy and performance of memory communication through renaming," in 30th Annual International Symposium on Microarchitecture, pp. 218--227, Dec. 1997.
No context found.
G. S. Tyson and T. M. Austin. Improving the accuracy and performance of memory communications through renaming. In 30th International Symposium on Microarchitecture, pages 218--227, Dec. 1997.
....that store to the same memory location before the address of the load has been calculated. A load that speculates correctly frees dependent instructions early instead of waiting to access the memory subsystem. The SMF Tint accomplishes something similar to previous memory forwarding hardware [24, 30, 39]. The SMF Tint is different from the VP Tint because it uses accompanying tables as shown in Figure 5. Load instructions speculatively obtain values from the value file, which holds the data or data tag from a previous store instruction. The store cache identifies store load dependencies ....
....non speculatively and organizes the speculative indexing into the value file. The functionality of the store load cache from memory renaming is replaced by the SMF Tint, so only one table needs to be accessed at prediction time. The tables for the memory renaming hardware function as described in [31, 39]. While the use of a value file table does not allow this particular Traveling Speculation to take full advantage of the prediction bandwidth and latency properties of the framework, the framework still offers performance boosts through an increase in prediction accuracy and instruction ....
[Article contains additional citation context not shown here]
G. S. Tyson and T. M. Austin. Improving the accuracy and performance of memory communications through renaming. In 30th International Symposium on Microarchitecture, pages 218--227, Dec. 1997.
....data flow restrictions [20, 32, 33, 50]. Through confidence mechanisms, history tables and value buffers, many of these techniques are able to increase the performance of out-of-order processors. Similar to some forms of value prediction, memory renaming has also been shown to improve performance [30, 40, 57]. However, to achieve significant performance, large, complex data structures and quick recovery from misspeculation are required. As an additional improvement to the VSQ, loads can access the VSQ speculatively. Unlike other speculation based memory techniques, the decisions are governed by ....
....only instructions that are in the dependency chain of the misspeculated load should be re-executed. Tyson and Austin found that only one third of all instructions following a misspeculated load depend on that value. The tradeoffs and implementation of these two methods are further discussed in [32, 57]. 6.2 Analyzing the Load Tokens. 6.2.1 Wide Issue Microprocessor Model. The trace cache implementation is based largely on the discussions in [43]. A fill unit collects instructions at issue time as in [43, 49]. Upon completing a trace cache line (also called a trace cache entry) the fill unit can ....
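The idea of re-executing only the dependency chain of a misspeculated load can be sketched as a graph traversal. This is an illustration under assumptions, not the recovery hardware discussed in the cited work: the function name and the `consumers` map (producer to the instructions that read its result) are hypothetical.

```python
from collections import deque


def dependents_of(faulting, consumers):
    """Return every instruction reachable from `faulting` through data dependences.

    `consumers` maps an instruction to the instructions that read its result
    (a hypothetical representation chosen for this sketch).
    """
    seen = set()
    queue = deque([faulting])
    while queue:
        insn = queue.popleft()
        for user in consumers.get(insn, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


# Four instructions follow the load, but only two depend on its value.
consumers = {"load": ["add"], "add": ["mul"], "sub": ["or"]}
to_reexecute = dependents_of("load", consumers)
```

Here `to_reexecute` contains `add` and `mul`, while `sub` and `or` keep their results. Squashing everything after the load would instead discard all four.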
[Article contains additional citation context not shown here]
G. S. Tyson and T. M. Austin. Improving the accuracy and performance of memory communications through renaming. In 30th International Symposium on Microarchitecture, pages 218-227, Dec. 1997.
....is set on a register (say, because of a page fault), all subsequent instructions which use that register essentially become NOPs and set their output register's NaT bit. 2.4.7. Memory Renaming. Tyson and Austin proposed memory renaming, which allows loads to execute early in out-of-order processors [Tyso97]. This optimization is done entirely in hardware with no modification to the binary. This is achieved by tracking the loads and stores that frequently communicate with each other. Once a stable relationship has developed between a load and a store, the load's data can be accurately predicted to be ....
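The excerpt does not say how a "stable relationship" is detected. One common way to model it, assumed here purely for illustration, is a small saturating counter per store/load pair; the class name, threshold, and counter width below are all hypothetical.

```python
class PairConfidence:
    """Saturating-counter model of store/load pair stability (an assumption)."""

    def __init__(self, threshold=2, maximum=3):
        self.threshold = threshold  # counter value at which a pair counts as stable
        self.maximum = maximum      # saturation limit of the counter
        self.counters = {}          # (store_pc, load_pc) -> current count

    def observe(self, store_pc, load_pc, communicated):
        # Strengthen the pair when the load really read the store's data,
        # weaken it otherwise; the count stays within [0, maximum].
        key = (store_pc, load_pc)
        count = self.counters.get(key, 0)
        if communicated:
            count = min(count + 1, self.maximum)
        else:
            count = max(count - 1, 0)
        self.counters[key] = count

    def is_stable(self, store_pc, load_pc):
        # Only pairs at or above the threshold would be used for prediction.
        return self.counters.get((store_pc, load_pc), 0) >= self.threshold
```

With the default parameters, a pair becomes eligible for prediction after two observed communications and loses eligibility again once a mismatch drops the counter below the threshold.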
Gary S. Tyson and Todd M. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. Proc. 30th Intl. Symp. Microarchitecture, pp. 218-227, Dec, 1997.
....studied in [Wal93]. Memory renaming was considered in a limit study in [AS92]. IPC as high as several thousand was reported. Recent developments that employ relatively small value files to rename locally live memory addresses have shown that memory renaming is not as impractical as was thought [TA97]. Such studies have shown promise in reducing the effect of false memory dependencies and allowing more memory references to execute out of order. We extend previous work in memory renaming to the limit case to explore the possible gains of an unrestricted memory renaming model. These limit ....
Gary S. Tyson and Todd M. Austin. Improving the accuracy and performance of memory communication through renaming. In Proc. Micro-30, pages 218-227, December 1997.
No context found.
Gary S. Tyson and Todd M. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. In Proceedings of the International Symposium on Microarchitecture, 1997.
No context found.
G. Tyson, T. Austin. "Improving the Accuracy and Performance of Memory Communication Through Renaming". In the Proceedings of the 30th Annual Int'l Symposium on Microarchitecture (MICRO'30), December, 1997, Pages 218-227.
No context found.
G. Tyson and T. Austin. "Improving the Accuracy and Performance of Memory Communication Through Renaming." In Proc. 30th International Symposium on Microarchitecture, pages 218--227, Dec. 1997.
No context found.
G.S. Tyson and T.M. Austin. (1997). Improving the Accuracy and Performance of Memory Communication Through Renaming. In Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture, pp 218-227.
No context found.
Gary S. Tyson and Todd M. Austin. Improving the accuracy and performance of memory communication through renaming. In International Symposium on Microarchitecture, pages 218--227, 1997.
No context found.
G. S. Tyson and T. M. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. In Proc. MICRO-30, December 1997.
No context found.
G. Tyson, T. Austin. Improving the Accuracy and Performance of Memory Communication Through Renaming. In Proc. MICRO-30, Dec. 1997.
No context found.
G. Tyson and T. Austin, "Improving the Accuracy and Performance of Memory Communication Through Renaming," Proc. of 30th annual international symposium on Microarchitecture, Triangle Park, NC, 1997, pp. 218--227.
No context found.
G. Tyson, T. Austin. "Improving the Accuracy and Performance of Memory Communication Through Renaming". In the Proceedings of the 30th Annual Int'l Symposium on Microarchitecture (MICRO'30), December, 1997, Pages 218-227.