S. Owicki and A. Agarwal. Evaluating the performance of software cache coherency. In Proc. of ASPLOS III, April 1989.
....circumstances, the compiler must conservatively invalidate the potentially stale data to ensure coherence. This conservatism results in over invalidation, which increases both the miss ratio and the network traffic compared to mechanisms that utilize run time interprocessor dataflow information [7, 21, 22, 23, 25, 30]. With extra cost, multiprocessors may employ dedicated hardware for cache coherence maintenance by allowing processors to communicate with each other about the data reference status, and to invalidate or update cached copies. Snoopy buses [12, 15, 29, 35, 38] and memory directories [2, 4, 5, 14, ....
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherence. International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, 1989.
....shared bus. Hardware solutions to the cache coherency problem for multiprocessors with point to point connections more commonly employ a directory based scheme [27, 2, 6, 18, 16]. Due to the increased complexity of hardware solutions to the cache coherency problem, software assisted schemes [7, 10, 13, 26, 29, 25, 21, 8] have been proposed, which are under supervision of the compiler (static schemes) or supported by the operating system kernel (dynamic schemes). To support scalability to a large number of processors, overheads such as storage requirements and run time related to cache coherency scheme should be ....
S. Owicki and A. Agarwal. Evaluating the Performance of Software Cache Coherence. ACM, pages 230--242, 1989.
....alternative cache management strategies have also been proposed. For example, the cache management for the VMP multiprocessor being developed at Stanford is controlled by software [CSB86, CGBG88]. Owicki and Agarwal compare software controlled cache coherency mechanisms to hardware mechanisms in [OA89]. Their results show that software schemes scale well, but their performance is more sensitive to the sharing patterns of the workload than hardware schemes. For workloads in which a processor makes numerous references to a cache line before it is invalidated, performance is quite good. Reference ....
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherence. In Proceedings, Architectural Support for Programming Languages and Operating Systems, pages 230--242, April 1989.
....for the former. An important class of the latter models predicts cache miss ratios in either the transient [5, 6] or steady state cases [7, 8]. Another class applies Mean Value Analysis techniques [9] to study the overhead due to cache coherence protocols in different multiprocessor architectures [10, 11, 12, 13]. [Figure 1: Structure of a set associative cache.] 1.1 Cache Terminology The early work ....
S. Owicki and A. Agarwal, "Evaluating the performance of software cache coherence," in Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, (Boston, MA), pp. 230--242, April 1989.
....the access pattern, we will provide a pair of best case and worst case models whose predictions will contain the actual execution time (a prediction interval [3]). 4.2 Low level models Significant work has been done in low level analytical models of computer architectures and applications [33, 1, 32, 22]. While such analytical models had fallen out of favor, being replaced by comprehensive simulations, they have recently been enjoying a resurgence due to the need to model large scale NUMA machines and the availability of hardware performance counters [18, 7]. However, these models have mainly been ....
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherency. In Proc. of ASPLOS III, April 1989.
....for example, the memory system architecture of shared memory systems with complex modern processors can be analyzed with these computationally efficient methods. 1. Introduction Approximate Mean Value Analysis (AMVA) is a widely used approach to evaluating key computer system performance questions [1, 2, 5, 8, 11, 12, 16, 17, 18, 19, 20, 25, 26, 27, 28, 29]. The wide applicability of the AMVA technique is due to both its very low computational expense and its high degree of accuracy in producing performance estimates that agree with detailed system simulation or system measurement. These capabilities are achieved through the use of heuristic ....
S. S. Owicki and A. Agarwal. Evaluating the Performance of Software Cache Coherence. In Proc. 3rd Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 230--242, April 1989.
....for validity at access time. If the region has been modified since the last access by this processor, the region must be invalidated. The SRp algorithm is the simplest SR cache management algorithm to implement, and behaves similarly to the user based coherence approach described in earlier work [56] and the compiler based approach that selectively flushes data from the cache after each computational unit (as defined in chapter 2). As with all pessimistic strategies, SRp may perform poorly for many applications because a region may be unnecessarily invalidated from the cache after every ....
....Table 6.3: Algorithm dependent derived parameters. Discussion Earlier analytic studies of hardware and software coherence strategies have assumed a software model in which caches are managed statically through the compiler. For instance, Owicki and Agarwal [56] use analytic techniques to compare a static software scheme to a bus based cache coherence protocol, and Adve et al. [2] use Mean Value Analysis to compare a static software scheme to a directory based hardware scheme. Static coherence strategies are limited by the fact that a program's run time ....
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherency. In Proc. 3rd Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, Apr 1989.
....iterative and synchronous behavior of the application in these studies helps reduce the degrees of freedom in the parallel system for tractability. Analytical models have also helped study the performance and scalability of specific system artifacts such as interconnection network [18, 1], caches [45, 44], scheduling [50] and synchronization [64]. 5.2 Input Model Several theoretical models have been proposed in literature to abstract parallel machine artifacts. The PRAM [26] has been an extremely popular vehicle for algorithm development. A PRAM consists of a set of identical sequential ....
S. Owicki and A. Agarwal. Evaluating the Performance of Software Cache Coherence. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, Boston, Massachusetts, April 1989.
....of the application. In the future we intend to do a more detailed analysis of our protocols and strategies, using a large set of user applications. Also, we will look at the differences and resemblances between protocols for replication and coherence protocols for CPU caches [Eggers and Katz 1989; Owicki and Agarwal 1989], file caches [Noe et al. 1985; Morris et al. 1986; Ousterhout et al. 1988] and distributed database systems [Bernstein and Goodman 1981]. Based on this analysis, we will try to improve our implementations. Our model has several advantages over other models based on logically shared data, such as ....
Owicki, S. and Agarwal, A., Evaluating the Performance of Software Cache Coherence, Proc. 3rd Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 230-242, Boston, MA, Apr. 1989.
....modification the decision is made. Relief may come from advances in the design of multiprocessors. Recent studies show that, for a wide variety of workloads, software controlled caches are nearly as effective as fully coherent hardware caches and much easier to build, as they require no hardware [23] [2]. Further extensions to this idea stem from the observation that full coherency is often not necessary, and that it is beneficial to rely on the compiler to maintain coherency in software only when required [2]. This line of thinking leads to cache designs that have the necessary control to ....
Susan Owicki and Anant Agarwal. Evaluating the Performance of Software Cache Coherence. In Proceedings of the 3rd Symposium on Programming Languages and Operating Systems. ACM, 1989.
....compiler directed mechanism uses compile time information to determine which cached memory locations may become stale, and then inserts special instructions into the generated code. These instructions are executed by each of the processors to prevent them from using these potentially stale data [8, 9, 19]. This mechanism produces less network traffic compared to a directory scheme, but its performance is dependent on the compiler technology. To examine the range of performance that can be realized by different run time and compile time coherence schemes, we compare a compiler directed scheme ....
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherence. International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, 1989.
....on ways to deal with the scalability issue with directories [11, 25, 31, 58, 81]. However, these proposals still result in false sharing and at least some increased hardware complexity and directory memory. All these problems are dealt with by using software controlled cache coherence (SCCC) [2, 36, 38, 82]. In an SCCC system the hardware is much simpler than in an HWCC system. There is neither a hardware directory nor a state machine at the memory or cache controllers for processing the sending and receiving of coherence messages to and from processors and memory modules. An SCCC system maintains ....
....for SCCC is difficult. Address traces do not normally have the information necessary for a simulation of SCCC; for example, when does a location need to be invalidated or written back. Therefore most studies have either used numerical models or probabilistic simulations. Owicki and Agarwal [82] compared HWCC to SCCC using an analytic model. They only compared SCCC to HWCC in a bus based system which is really not comparable to a large scale multiprocessor with a multistage interconnection network. Adve et al. [2] compared SCCC to HWCC using mean value analysis [90]. Using various ....
Susan Owicki and Anant Agarwal. Evaluating the performance of software cache coherence. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, 1989.
....the read write pattern of the application. In the future we intend to do a more detailed analysis of our protocols and strategies, using a large set of user applications. Also, we will look at the differences and resemblances between protocols for replication and coherence protocols for CPU caches [18, 27], non uniform memory access (NUMA) architectures [28, 29], file caches [30, 31, 32] and distributed database systems [14]. Based on this analysis, we will try to improve our implementations. Our model has several advantages over other models based on logically shared data. It provides a higher ....
S. Owicki and A. Agarwal, "Evaluating the Performance of Software Cache Coherence," Proceedings 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA, pp. 230-242 (April 1989).
....techniques are used by the NYU Ultracomputer [17] and the RP3 [8] multiprocessor. An important advantage of these software based methods is that they are economical in terms of hardware; only a few instructions used by the compilers to invalidate or flush the cache are needed. Previous research [37, 2] shows that the performance of compiler based software schemes is comparable to hardware approaches for programs that do not suffer a significant cache hit reduction caused by inaccurate static predictions of memory access conflicts. However, since these schemes must be conservative when ....
Susan Owicki and Anant Agarwal. Evaluating the Performance of Software Cache Coherence. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, April 1989.
.... e.g. [2, 4, 18]. We are, however, unaware of any study comparing the performance of any of these systems to the performance of a hardware implementation of shared memory on a dedicated interconnect, e.g. [10, 17]. Several studies have compared software to hardware cache coherence mechanisms [20, 22], but these systems still rely on hardware initiated data movement and a dedicated interconnect. In this paper, we compare a shared memory implementation that runs entirely in software on a general purpose network of computers to a hardware implementation on a dedicated interconnect. Up to eight ....
....by the compiler or the programmer at the end of critical sections. Cytron et al. [8] and Cheong and Veidenbaum [6] describe algorithms for compiler based software cache coherence. Owicki and Agarwal compare analytically the performance of such a scheme to snoopy cache coherence hardware [20]. Petersen, on the other hand, describes a software cache coherence scheme using the virtual memory management hardware [22]. This scheme is transparent to the programmer. It does not require the programmer or compiler to insert cache flush instructions. Using trace driven simulation, she compared ....
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherence. In Proceedings of the 3rd Symposium on Architectural Support for Programming Languages and Operating Systems, pages 230--242, May 1989.
....schemes based on compiler support have been proposed [Vei86, LYL87, CV88, Che92, DMCK92]. An important advantage of these software based methods is that they need little hardware support: only a few instructions used by the compilers to invalidate and flush the cache. Previous research [OA89, AAHV91] shows that the performance of compiler based software schemes is comparable to hardware approaches for programs that do not suffer a significant cache hit reduction caused by inaccurate static predictions of memory access conflicts. However, compiler based solutions have to be ....
Susan Owicki and Anant Agarwal. Evaluating the Performance of Software Cache Coherence. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, April 1989.
....to mention that it increases contention for memory. A study by Owicki and Agarwal confirmed that the performance of this no cache solution is worse than any other scheme in most cases and is abysmal if there are a large number of references to shared data (17 of the instructions in the study) [23][24]. The second problem is that shared read write data must be identified and made non cacheable. This process should be done by the compiler in cooperation with the linker and runtime library, but most commercial real time system development is done with existing languages and development systems, ....
Owicki, S. and Agarwal, A. Evaluating the Performance of Software Cache Coherence. In ASPLOS-III: Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, April 3--6 1989, pp. 230--242.
....not to mention that it increases contention for memory. A study by Owicki and Agarwal confirmed that the performance of this no cache solution is worse than any other scheme in most cases and is abysmal if there are a large number of references to shared data (17 of the instructions in the study) [23][24]. The second problem is that shared read write data must be identified and made non cacheable. This process should be done by the compiler in cooperation with the linker and runtime library, but most commercial real time system development is done with existing languages and development ....
Owicki, S. and Agarwal, A., Evaluating the Performance of Software Cache Coherence, report MIT/LCS/TM-395, Laboratory for Computer Science, Massachusetts Institute of Technology, June 1989.
....for validity at access time. If the region has been modified since the last access by this processor, the region must be invalidated. The Cache Flush algorithm is the simplest SR cache management algorithm to implement, and behaves similarly to the user based coherence approach described in [OA89, MOR86] and the compiler based approach described in [CKM88]. However, the Cache Flush algorithm may perform poorly for many applications [Table: access control primitives (ReadAccess, WriteAccess) for the Cache Flush and Cache Validate SR cache management algorithms] ....
....version). The alternative of leaving shared data uncached was also initially considered, but for each application, the performance on 16 processors is worse than the 1 processor case, so this alternative will not be discussed further. The algorithm for user controlled caches described in [OA89, MOR86], in which the user explicitly flushes data before leaving a critical section, is similar to the SR Cache Flush algorithm, so we consider the performance of the Cache Flush algorithm to be representative of this class of techniques. Unlike user controlled caches, however, the task of ....
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherence. ACM, pages 230--242, 1989.
....invalidations [Smi85] and to reduce unnecessary invalidations [MB89]. An important advantage of these software based methods is that they are economical in terms of hardware: the hardware need only support a few instructions used by compilers to invalidate and flush the cache. Previous research [AAHV91, OA89] shows that the performance of compiler based software schemes is comparable to hardware approaches for programs that do not suffer a significant cache hit reduction caused by inaccurate static predictions of memory access conflicts. On the other hand, compiler based solutions are expensive in ....
Susan Owicki and Anant Agarwal. Evaluating the Performance of Software Cache Coherence. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, April 1989.
....efficient invalidations [Smi85] and to reduce unnecessary invalidations [MB89]. An important advantage of these software based methods is that they are economical in terms of hardware: only a few instructions used by the compilers to invalidate and flush the cache are needed. Previous research [OA89, AAHV91] shows that the performance of compiler based software schemes is comparable to hardware approaches for programs that do not suffer a significant cache hit reduction caused by inaccurate static predictions of memory access conflicts. On the other hand, compiler based solutions are ....
Susan Owicki and Anant Agarwal. Evaluating the Performance of Software Cache Coherence. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pages 230--242, April 1989.
No context found.
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherency. In Proc. of ASPLOS III, April 1989.
No context found.
Susan Owicki and Anant Agarwal. Evaluating the Performance of Software Cache Coherence. Technical Report, DEC System Research Centre, 1989. Report Number 41. (p 4)
No context found.
Susan Owicki and Anant Agarwal. Evaluating the performance of software cache coherence. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Systems, pages 230--242, May 1989.
No context found.
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherency. In Proc. 3rd Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, Apr. 1989.