96 citations found.
Susan J. Eggers and Randy H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.

This paper is cited in the following contexts:

Sharing Speculation: A Mechanism for Low-Latency.. - Desikan, Huh, Burger, .. (2003)   (Correct)

....benchmarks, the authors measure the number of bytes transferred as a function of their false sharing measure, and conclude that their mathematical model gives a relatively accurate measure of false sharing. A number of researchers have looked at the coherence overhead due to false sharing [3, 4]. Researchers have also focused on program restructuring techniques to reduce false sharing [5] Chow et al. propose a runtime scheduling technique for eliminating false sharing in parallel loops [2] More recently, modified sectored caches have been proposed to reduce false sharing in ....

Susan J. Eggers and Randy H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, ISCA 88, pages 373--382, May 1988.


CICO: A Practical Shared-Memory Programming Performance Model - Larus, Chandra, Wood (1993)   (4 citations)  (Correct)

....and the increased data reuse reduced interprocessor communication. Shared memory communication is the interprocessor message traffic caused by cache misses and invalidations. One part is the coherence traffic when processors exchange values or appear to exchange values because of false sharing [15]. The other part is the conflict and capacity misses caused by caches of finite size and associativity [23] A programmer or compiler can reduce both aspects of shared memory communication by modifying a program and its data structures to use caches more effectively. However, to make such a ....

....i in array A, even if the value spans more than one cache block. 2.2 Adding Annotations This section describes an approach for annotating cache block race free programs with CICO primitives. A program is cache block race free if it contains no data races and no unsynchronized false sharing [15]. Consequently, when two processors access the same cache block, they must execute a synchronization event between their accesses. These rules place CICO annotations in race free programs so as to capture the interprocessor communication caused by a cachecoherence protocol. The rules can also be ....

Susan J. Eggers and Randy H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373-382, 1988.


Reconciling Sharing and Spatial Locality Using Adjustable.. - Dubnicki, LeBlanc   (Correct)

....of writes to shared data for the 3 benchmarks on 16 processors discussed in [12] are 31 (MP3D) 33 (LU) and 11 (PTHOR) In the applications discussed in [13] shared writes constituted 17 (Maxflow) 5.6 (SA TSP) 19 (MP3D) 6.8 (PTHOR) and 5 (LocusRoute) of all shared references. In [9] the figures are 7 (PLOVER) 22 (PSPICE) 10 (PUPPY) and 2 (TOPOPT) For Splash applications executed with 32 processors the figures are 18 (Ocean) 12 (Water) 40 (MP3D) 12 (LocusRoute) 7 (PTHOR) and 14 (Cholesky) Obviously, the fraction of shared writes varies wildly, but in many ....

S.J. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proc. 15th International Symposium on Computer Architecture, pages 373-383, May 1988.


Dynamic Pointer Allocation for Scalable Cache Coherence.. - Simoni, Horowitz (1991)   (14 citations)  (Correct)

....so incur traffic and or latency penalties on most directory oper ations. 3 Dynamic Pointer Allocation Directory In several studies of small scale multiprocessing (4 to 32 processors) simulation results indicate that most blocks written into are present in only a small number of other caches [1, 5, 19]. For instance, in the most common case exactly one invalidation is sent. The explanation is that most data is written frequently enough that it is not accessed by many processors before one of them performs a write. This reference behavior has led to the development of limited pointers ....

Susan J. Eggers and Randy H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In Proc. of the 15th Int. Sym. on Computer Architecture, pages 373-382, May 1988.


Design And Analysis Of Update-Based Cache Coherence Protocols For .. - Glasco (1995)   (1 citation)  (Correct)

....and update based protocols. The invalidate based protocols include Goodman s write once [37] the Synapse [30] the Illinois [58] and the Berkeley [29] protocols. The update based protocols include the Firefly [67] and the Dragon [54] protocols. These protocols have been well studied [46, 26, 8, 7, 54, 67]. The next two sections give a brief description of how the Berkeley invalidatebased protocol and the Dragon update based broadcast protocols operate for typical processor reads and writes. Berkeley Invalidate Based Protocol In the Berkeley invalidate based protocol, a cache line may be in one ....

.... case, a directory cache could be used to cache this smaller set of directory entries [39] Also, the bits for each directory entry can be dynamically allocated out of a pool of directory bits [62] Several studies have suggested that the average number of shared copies of a memory line is small [13, 57, 71, 4, 26]. The results presented by the researchers demonstrate that the limited directory schemes result in a minimal performance loss [4, 14, 39] and similarly, the cached directory has also been shown to have a minimal affect on performance [39] These results are very dependent on the number of shared ....

Susan J. Eggers and Randy H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In Proceedings of the 15th International Symposium on Computer Architecture, pages 373--382, May 1988.


Bus And Cache Memory Organizations For Multiprocessors - Winsor (1989)   (2 citations)  (Correct)

....in the sequence of references to a particular shared block is not addressed. Thus, there is insufficient information to judge the applicability of this model to workloads in which such locality is present. The issue of locality of reference to a particular shared line is considered in detail in [EK88]. This paper also discusses the phenomenon of passive sharing which can cause significant inefficiency in write broadcast protocols. Passive sharing occurs when shared lines that were once accessed by a processor but are no longer being referenced by that processor remain in the processor s cache. ....

....on a simulated workload model. A simulation model showed that EDWP performed better than write broadcast protocols for some workloads, and the performance was about the same for other workloads. A detailed comparison with write invalidate protocols was not presented, but based on the results in [EK88], the EDWP protocol can be expected to perform significantly better than write invalidate protocols for short average write run lengths, while performing only slightly worse for long average write run lengths. The major limitation of all of the snooping cache schemes is that they require all ....

SUSAN J. EGGERS AND RANDY H. KATZ. "A Characterization of Sharing in Parallel Programs and Its Application to Coherency Protocol Evaluation". The 15th Annual International Symposium on Computer Architecture Conference Proceedings, Honolulu, Hawaii, IEEE Computer Society Press, May 30--June 2, 1988, pages 373--382.


ORION: An Adaptive Home-based Software Distributed Shared Memory .. - Ng, Wong (2000)   (1 citation)  (Correct)

....of performance loss. To these we can add a third the timeliness of data. Software DSMs rely on the use of signal handling mechanisms that are costly. If the data is available when it is needed, this can be reduced. This probably explains why in certain applications, the write update protocol [11] 3 will perform better than the write invalidate protocol [11] even if more data is transferred. We shall see how timeliness of data can be improved in the later part of the paper. One of the important variant of RC is the home based release consistency model (HRC) Unlike in the traditional RC ....

....of data. Software DSMs rely on the use of signal handling mechanisms that are costly. If the data is available when it is needed, this can be reduced. This probably explains why in certain applications, the write update protocol [11] 3 will perform better than the write invalidate protocol [11] even if more data is transferred. We shall see how timeliness of data can be improved in the later part of the paper. One of the important variant of RC is the home based release consistency model (HRC) Unlike in the traditional RC model, each page in a HRC has an assigned home. It should be ....

S.J. Eggers, and R.H. Katz. "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation." In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373-383, May 1998.


ORION: An Adaptive Home-based Software DSM - Ng School Of   (Correct)

....for a page, which varies from application to application. This points to the need for the system to dynamically adapt to the needs of the applications in order to achieve better performances. The idea of automatic adaptive protocol is not new. As a matter of fact, many automatic adaptive schemes [5,6,7,8,9,10,11] have been proposed. These include adaptation between the single writer and multiple writer protocols, adaptation between the write invalidate and write update protocols, and even process thread migration. In this paper, besides introducing our software DSM system, we have also proposed 2 adaptive ....

S.J. Eggers, and R.H. Katz. "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation." In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373-383, May 1998.


Evaluation of Design Alternatives for a Directory-Based Cache.. - Grahn (1995)   (Correct)

....policy. One of the objectives in I is to evaluate if the write update policy actually can reduce the read stall time as compared to the write invalidate policy despite the larger amount of network traffic. This issue has been addressed earlier in the context of bus based multiprocessors [10, 11] but not for multiprocessors with a general interconnection network and a directory based protocol. Using program driven simulations of a detailed multiprocessor model [3] I find in I that for two of four studied applications, the write update policy reduced the read stall time significantly; up ....

S. Eggers and R. Katz, "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation," In Proceedings of the 15th International Symposium on Computer Architecture, pages 373-382, May 1988.


Evaluating the Effect of Coherence Protocols on the.. - Bianchini, E..   (Correct)

....is the first to relate these constructs and techniques to their communication behavior under invalidate, update, or competitive protocols. Some related pieces of work are listed next. The impact of coherence protocols on application performance is an active area of research. Early work by Eggers [10] studied the relative performance of invalidate and update protocols on small bus based cache coherent multiprocessors. More recent work [6; 22] has looked at the impact of update based protocols on overall program performance on larger machines. Other researchers have taken an ....

S. J. Eggers and R. H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In Proceedings of the 15th International Symposium on Computer Architecture, pages 373--383, May 1988.


Eliminating Useless Messages in Write-Update Protocols .. - Bianchini, LeBlanc.. (1994)   (3 citations)  (Correct)

....the block. The advantage of WU is that each processor receives and stores the update as it occurs, thus preventing future cache misses when the new value is needed. This property is particularly helpful when many processors read the updated values between successive write operations to the data [Eggers and Katz, 1988]. The disadvantage of WU is that every write operation to shared data requires that updates be sent over the network, even if no processor accesses the data between successive writes. WI achieves superior performance when cache blocks are written many times by a single processor before being ....

....that every write operation to shared data requires that updates be sent over the network, even if no processor accesses the data between successive writes. WI achieves superior performance when cache blocks are written many times by a single processor before being accessed by any other processor [Eggers and Katz, 1988]. In most cases, WI results in higher miss rates, but fewer communication operations. Previous studies comparing WI and WU protocols on bus based machines have offered mixed results [Archibald and Baer, 1986; Eggers and Katz, 1988; Veenstra and Fowler, 1994] In general, the comparison depends on ....

S. J. Eggers and R. H. Katz, "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation," In Proceedings of the 15th International Symposium on Computer Architecture, pages 373--383, May 1988.


A Superassociative Tagged Cache Coherence Directory - Lilja, Ambalavanan (1994)   (1 citation)  (Correct)

....the block. When more than n processors attempt to share the same block, an invalidate on overflow policy randomly selects one of the n pointers to the given block (i.e. P0 P3) for replacement. The processor to which it points then is sent a message to invalidate its cached copy. Several studies [1, 7, 11, 14] have shown that relatively few processors simultaneously share a block so that n=2 to 4 typically is sufficient. When more than a active blocks map to the same set in an a way set associative implementation, one of the tag entries is randomly selected for replacement. Invalidation messages must ....
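The invalidate-on-overflow policy described in the Lilja and Ambalavanan excerpt above can be illustrated with a minimal Python sketch. It models a single directory entry holding at most n pointers. The class name `LimitedPointerEntry`, the method `add_sharer`, and the injectable random generator are hypothetical names chosen for illustration, not taken from the cited paper, and the sketch ignores writes and the set-associative tag replacement that the excerpt also mentions.

```python
import random


class LimitedPointerEntry:
    """One directory entry that can track at most n sharers of a block.

    Hypothetical illustration of an invalidate-on-overflow policy:
    when a new sharer arrives and all n pointers are in use, one
    pointer is picked at random and the processor it names must
    invalidate its cached copy.
    """

    def __init__(self, n, rng=None):
        self.n = n
        self.pointers = []                  # processors currently sharing the block
        self.rng = rng or random.Random()   # injectable for reproducible runs

    def add_sharer(self, proc):
        """Record proc as a sharer.

        Returns the processor that must be sent an invalidation,
        or None if no pointer had to be replaced.
        """
        if proc in self.pointers:
            return None                     # already tracked, nothing to do
        if len(self.pointers) < self.n:
            self.pointers.append(proc)      # a free pointer is available
            return None
        # Overflow: replace a randomly selected pointer.
        victim_index = self.rng.randrange(self.n)
        victim = self.pointers[victim_index]
        self.pointers[victim_index] = proc
        return victim
```

With n = 2, the first two distinct sharers fill the entry, and a third sharer causes one of them to be returned as the invalidation target.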

S. J. Eggers and R. H. Katz, "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation," Intl. Symp. Computer Architecture, pp. 373-382, 1988.


The Effect of "Seance Communication" on Multiprocessing Systems - Avi Mendelson And   (Correct)

....enforcing any modification of shared, or potentially shared data, either by invalidating all the corresponding copies of the data present in other caches, or by updating them. The performance of multiprocessors depends on the sharing patterns of the application being run. Various works, such as [Egge88] and [Egge89] have widely studied and analyzed the impact of the sharing patterns on different cache coherency protocols. Their results indicate that write update protocols can outperform write invalidate protocols when the application is characterized with finegrain sharing, but may perform ....

....design parameters can affect the system performance: 1) Large cache lines can cause false sharing and so reduce their benefits. False sharing may cause an overhead in both write invalidate and write update protocols ([Egge89]). 2) When large caches are used, it was reported by Eggers and Katz ([Egge88] and [Egge89]) that the number of coherency related activities increases as well. This phenomenon was not explained, maybe because the relationship between the amount of coherency related activities and the cache size was not clear enough. A possible explanation may be found by examining the ....

S. J. Eggers and R. H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In the 15th Int'l Symp. on Computer Architecture, pp. 373 - 383, 1988.


Page Placement For Non-Uniform Memory Access Time (NUMA) Shared .. - LaRowe, Jr. (1991)   (Correct)

....are actively shared by multiple processors. Fortunately, they report significant amounts of read only sharing, indicating that caching (and likely page replication as well) should be effective. Studies of memory reference patterns have been reported by others as well (e.g. Eggers and Katz in [EK88] and [EK89a], Smith in [Smi85], Gallivan, Gannon, Jalby, Malony, and Wijshoff in [GGJ 89] and [GGJ 90], Baylor and Rathi in [BR89], Weber and Gupta in [WG89], and Darema-Rogers, Pfister, and So in [DRPS87]). These studies have reported similar results, suggesting that caching can prove ....

....used and the simulator that is an instance of that model are discussed in [Las88] This program is an early version of the hh3d application used in the experimental studies of Chapters 6 and 7. The third program (lpresim) is a preprocessing step to a parallel switch level timing simulator, ldv [BEK88] The program is a parallel version of presim [Ter83] The program reads in a list of transistors and nodes, building a hash table of nodes. The nodes are then traversed, in the order determined by the hash table, to determine characteristics of the transistor network. Because the ordering of ....

S. Eggers and R. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


An Architecture-Independent Analysis of False Sharing - Khera, LaRowe, Jr., Ellis (1993)   (4 citations)  (Correct)

.... are employed to take advantage of the faster local memory access times [4, 2, 12] False sharing has been blamed as a cause of increased coherency overhead in multiprocessor hardware caches with increasing line sizes in workload characterization studies of shared memory reference patterns [11, 10]. Techniques for ameliorating the false sharing problem have also been proposed in [6, 8, 1] The solution provided by Munin [1, 5] addresses only the most conservative form of false sharing. Other proposals [6, 8] deal with the granularity of coherency which addresses one contributing factor ....

Susan J. Eggers and Randy H. Katz. A Characterization of Sharing in Parallel Programs and its Applicability to Coherency Protocol Evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


Trade-offs Between False Sharing and Aggregation in.. - Amza, Cox, Rajamani, .. (1997)   (28 citations)  (Correct)

....presents compile time techniques to analyze the sharing behavior of explicitly parallel programs [11, 10] His analysis determines which data structures may be susceptible to false sharing. Heuristics are then applied to determine if it s profitable to pad the data structures. Eggers and Katz [6, 7] showed that the performance of coherent caches for bus based shared memory multiprocessors depends on the relationship between the cache block size, the granularity of sharing, and the locality exhibited by a program. They showed that the optimal cache block size varies for different sets of ....

S.J. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


Experimental Comparison of Memory Management Policies for.. - LaRowe, Jr., Ellis (1991)   (49 citations)  (Correct)

....problems. The management of caches in shared memory systems has received much attention. The hardware caching literature is the primary source of information on memory coherence strategies for ensuring the consistency of cached shared data The workload models discussed in this literature [1, 2, 21, 42] are relevant to our work, since one goal is to make it possible to efficiently run UMA programs on a NUMA architecture. Since the local memory of a NUMA machine can be thought of as cache space for some global (distributed) shared memory, this body of work is clearly related. Key differences lie ....

S. Eggers and R. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


A Hybrid Approach to Trace Generation for Performance.. - Giorgi, Prete, Ricciardi (1996)   (Correct)

....traces with a variable number of processors. Indeed, actual traces, captured by traditional hardware based techniques, prove to be particularly useful in the validation phase of a new architecture. Software tracing methodologies include program instrumentation [4,6,20] single step execution [2] and microcode modification. The tracing technique based on microcode modification (ATUM) uses processor microcode to record addresses in a reserved part of main memory as a side effect of normal execution [17] Compared with other techniques, this one leads to fewer distortions and a very fast ....

S. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. Proc. 15th Int. Symp. Comput. Architecture, 1988, pp. 373-382.


Two Adaptive Hybrid Cache Coherency Protocols - Anderson, Karlin (1996)   (6 citations)  (Correct)

....reads and only one subsequent invalidate. A key program characteristic underlying these examples is the block write run, which is defined as a sequence of write references to a shared cache block, uninterrupted by either an access to that block by another processor, or replacement of the block [8]. In the first example above, there were many write runs of length one; in the second, a single write run of length n. In general, short write runs favor writeupdate protocols, while long write runs favor invalidate based protocols. On the other hand, short write runs can lead to poor performance ....
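The write-run definition quoted in the excerpt above lends itself to a small trace-analysis sketch. The Python function below is an illustration only: the trace format of `(processor, op, block)` tuples, the function name `write_run_lengths`, and the choice to ignore block replacement (as if caches were infinite) are assumptions made here, not details from the cited papers.

```python
from collections import defaultdict


def write_run_lengths(trace):
    """Return {block: [write-run lengths]} for a memory reference trace.

    trace is an iterable of (processor, op, block) tuples with op in
    {"R", "W"} (an assumed, hypothetical format). A write run is a
    sequence of writes to a block by one processor that ends when
    another processor accesses the block. Replacement is not modeled.
    """
    finished = defaultdict(list)
    open_runs = {}  # block -> [writer, number of writes so far]
    for proc, op, block in trace:
        run = open_runs.get(block)
        if run is not None and run[0] != proc:
            # An access by a different processor terminates the run.
            finished[block].append(run[1])
            del open_runs[block]
            run = None
        if op == "W":
            if run is None:
                open_runs[block] = [proc, 1]   # start a new run
            else:
                run[1] += 1                    # same writer extends its run
        # A read by the current writer leaves its run open.
    # Runs still open at the end of the trace are reported as-is.
    for block, (_, length) in open_runs.items():
        finished[block].append(length)
    return dict(finished)
```

Under these assumptions, a trace in which processor 0 writes a block three times, processor 1 then reads and writes it once, and processor 0 writes it again yields run lengths of 3, 1, and 1 for that block.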

....which is an expensive operation that is done only once. In our algorithms, updating is analogous to spinning, while invalidating is analogous to blocking (since it entails the subsequent large reread cost) 2 SR stands for snoopy reading , the name given to the protocol by Eggers and Katz [8] These protocols are dynamic versions of the competitive protocol SR. They maintain a counter and an invalidation threshold T b for each cache block b. The counter associated to a block is initialized to the value of T b . On an update to an actively shared block, the counter in the writer s ....

S. Eggers and R. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proc. of 15th Int. Symp. on Computer Architecture, pages 373--382, 1988.


Memory Servers for Multicomputers - Iftode, Li, Petersen (1993)   (31 citations)  (Correct)

....referenced page frames may be reaccessed. Using only the last reference time as the replacement criterion is not a good strategy either, because page frames whose pages are no access may have been referenced recently. Since parallel programs exhibit a high degree of spatial locality of reference [12], a better strategy is to combine both parameters to decide the priorities in which page frames are reclaimed. A practical approach is to use a list data structure to represent an ordered set for each type of page frames. Each set is ordered by its last reference time, and, each set has thresholds ....

S.J. Eggers and R.H. Katz. A Characterization of Sharing in Parallel Programs and Its Applications to Coherence Protocol Evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, June 1988.


Adaptive Schemes for Home-based DSM Systems - Ng, Wong (1999)   (1 citation)  (Correct)

....applications, the reverse is true [2] The main cause for this difference is the differences in the memory access patterns (MAP) exhibited by the applications in question. This points to the need to adapt the protocol to better suit the needs of the applications. Many automatic adaptive schemes [5,6,7,8,9,10,11] have been proposed. These include adaptation between single writer and multiple writer protocols, adaptation between write invalidate and writeupdate protocols, and even process thread migration. HLRC exhibited an important feature that aids adaptation. The homes of pages serve as a natural ....

S.J. Eggers, and R.H. Katz. "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation." In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373-383, May 1998.


Identification And Optimization Of Sharing Patterns For Scalable.. - Kaxiras (1998)   (4 citations)  (Correct)

....simultaneously. To make this point clear I use the concept of read runs as a tool to investigate sharing behavior. In Section 4.5 of Chapter 4 I am making extensive use of read run analysis to explain the performance of various GLOW schemes. Analogous to a write run defined by Eggers and Katz [29], I define a read run for a data block as a sequence of reads (from any processor) between two writes (from any writer) The size of a read run is thus related to the number of simultaneous cache copies in the system (if we ignore for a moment multiple reads because of replacements) Figure 3.1 ....
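The read-run notion defined in the Kaxiras excerpt above can be sketched in the same style. This is a hedged illustration: the `(processor, op, block)` trace format and the name `read_run_sizes` are hypothetical, the size of a run is taken to be the number of reads (not the number of distinct readers), and only runs bounded by a write on both sides are reported. Each of these is an assumption made here rather than something stated in the excerpt.

```python
from collections import defaultdict


def read_run_sizes(trace):
    """Return {block: [read-run sizes]} for a memory reference trace.

    trace is an iterable of (processor, op, block) tuples with op in
    {"R", "W"} (an assumed format). A read run is counted as the reads
    to a block, from any processor, between two consecutive writes to
    that block, from any writer. Empty runs are skipped.
    """
    sizes = defaultdict(list)
    reads_since_write = {}  # block -> reads seen since its last write
    for _proc, op, block in trace:
        if op == "W":
            count = reads_since_write.get(block)
            if count:
                # At least one read separated this write from the last one.
                sizes[block].append(count)
            reads_since_write[block] = 0
        elif block in reads_since_write:
            # Reads before the first write are ignored by construction.
            reads_since_write[block] += 1
    return dict(sizes)
```

Counting distinct readers instead of reads would be a closer proxy for the number of simultaneous cache copies; that variant only needs a set per block in place of the counter.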

Susan J. Eggers and Randy H. Katz, "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation." In Proceedings of the 15th Annual International Symposium on Computer Architecture, pp. 373-382, Jun. 1988.


Highly Concurrent Cache Coherence Protocols - Williams, Reynolds, Jr. (1990)   (Correct)

.... and write invalidate protocols, at least for bus based protocols involving a small number of PE s, depend on several factors: block size, cache size, the probability of an access being a write, and the extent to which accesses by different processes to the same block are interleaved [EgK88, EgK89]. Write update protocols can waste bandwidth by sending updates to PE s that no longer need to access the updated block. On the other hand, write invalidate protocols may invalidate actively accessed copies of a block. Write invalidate protocols tend to perform better than write update protocols ....

....Protocol 26 Since every read is executed on a cache copy, the memory copy can be eliminated. Eliminating the memory copy allows an optimization for the special case in which only the owner has a cache copy. Studies of parallel programs suggest this case occurs frequently in actual applications [BaR89, EgK88]. The owner can detect whether it has the only cache copy because it maintains DIR. If DIR is empty, the owner does not use a update policy, but instead executes all operations on V locally. The memory copy is also not required in the dynamic version of this protocol, but, because it is useful in ....

S. Eggers and R. Katz, A Characterization of Sharing in Parallel Programs and Its Application to Coherency Protocol Evaluation, Proc. 15th International Symp. Computer Architecture, May 1988, 373-382.


TLB Performance in Multiprocessors - Teller, Gottlieb (1991)   (1 citation)  (Correct)

....Due to space limitations, we do not present all the relevant data produced by these simulations, which can be found in [Teller, 1991]. We believe that, as with other caches, TLB performance can be studied well by trace driven simulations. See [Agarwal and Cherian, 1989], [Chaiken, et al. 1990], [Eggers and Katz, 1988], and references therein for other multiprocessor cache studies. We follow Knuth [1976] in our use of asymptotic orders and write $f = O(g)$ if there exist $N$ and $C > 0$ such that, for all $n \ge N$, $f(n) \le C g(n)$. We write $f = \Omega(g)$ if $g = O(f)$ and write $f = \Theta(g)$ if $f = O(g)$ and $f = \Omega(g)$. Roughly speaking, f ....

Eggers, S., and R. Katz, "A Characterization of sharing in parallel programs and its application to coherency protocol evaluation," Proceedings of the 15th Annual International Symposium on Computer Architecture, IEEE Catalog No. 88CH2545-2, pp. 373-382, June 1988.


Replication Techniques For Speeding Up Parallel.. - Bal, Kaashoek, Tanenbaum (1992)   (24 citations)  (Correct)

....the read write pattern of the application. In the future we intend to do a more detailed analysis of our protocols and strategies, using a large set of user applications. Also, we will look at the differences and resemblances between protocols for replication and coherence protocols for CPU caches [Eggers and Katz 1989; Owicki and Agarwal 1989] file caches [Noe et al. 1985; Morris et al. 1986; Ousterhout et al. 1988] and distributed database systems [Bernstein and Goodman 1981] Based on this analysis, we will try to improve our implementations. Our model has several advantages over other models based on ....

Eggers, S. J. and Katz, R. H., A Characterization of Sharing in Parallel Programs and Its Application to Coherency Protocol Evaluation, 15th Int. Symp. on Computer Architecture, pp. 373-382, Israel, June 1989.


Evaluating the Potential of Programmable Multiprocessor.. - John Carter Mike (1994)   (Correct)

....consistency protocol and do not provide any reasonable hooks with which the compiler or runtime system can guide the hardware s behavior. Using traces of shared memory parallel programs, researchers have found there are a small number of characteristic ways in which shared memory is accessed [4, 15, 17, 29]. These characteristic patterns are sufficiently different from one another that any protocol designed to optimize one will not perform particularly well for the others. Since all existing and announced commercial multiprocessors implement a single hardware consistency mechanism, they will ....

S.J. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


Static Cache Simulation and its Applications - Mueller (1994)   (10 citations)  (Correct)

.... Single Stepping is a processor mode that interrupts the execution of a program after each instruction. The interrupt handler can be used to gather the trace data. This technique is just slightly faster the hardware simulation (100x 1000x slow down) and works only for existing architectures [70, 21], though sometimes traces from an existing architecture are used to project the speed of prototyped architectures [56] Inline Tracing is a technique where the program is instrumented before execution such that the trace data is generated by the instrumentation code as a side effect of the ....

S. J. Eggers and R. H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In International Symposium on Computer Architecture, pages 373--382, 1988.


Automatic Data Aggregation for Software Distributed Shared.. - Rajamani   (Correct)

....parameters. While considering the domain for forming page groups, we explored two options a) consider only adjacent pages for forming page groups, or b) have any set of pages form a group. The first option is a natural extension of the idea of large coherence units. Since contemporary research [6] [19] [15] has found an application dependent performance variation with the size of coherence units we explored the benefit of dynamically aggregating requests for adjacent pages, under the first option. As a group formed from adjacent pages grows or shrinks in size, depending on program ....

....improvement in performance due to communication aggregation for three of the four applications in their test suite. The improvements in program speedup range from 3% to 45% due to communication aggregation. 4.2 Variation in Performance with Size of Coherence Unit Eggers and Katz showed in [6] and [7] that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by a program, for bus based shared memory multiprocessors. They show that large cache blocks improve performance for applications with ....

S.J. Eggers and R.H. Katz. A Characterization of Sharing in Parallel Programs And Its Application to Coherency Protocol Evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


Data Replication for Mobile Computers - Huang, Sistla, Wolfson (1994)   (42 citations)  (Correct)

....However, their optimization objective is energy, and ours is communication. 8.2 Caching and virtual memory In the computer architecture and operating systems literature there are studies of two subjects related to dynamic data allocation, namely caching and distributed virtual memory (see [1, 2, 6, 7, 11, 14, 15, 23, 26, 27, 24, 25, 28, 30, 35]). However, there are several important differences between Caching and Distributed Virtual Memory (CDVM) on one hand, and replicated data in distributed systems on the other. Therefore, our results have not been obtained previously. First, the CDVM methods do not focus on the communication cost, ....

....of writes, but also as a result of limited storage. One may argue whether or not limited storage is a major issue in distributed databases, however, in this paper we assumed that storage at the mobile computer is abundant. Third, the architecture assumed in most CDVM methods is bus based (e.g. [1, 14, 15, 25]). This architecture supports broadcast at the same cost as a single cast, and on the other hand incurs contention. In contrast, in this paper we assumed point to point communication. 9 Conclusions In this paper we have considered several data allocation algorithms for mobile computers. In ....

S. J. Eggers and R. H. Katz, "A Characterization of Sharing in Parallel Programs and Its Application to Coherency Protocol Evaluation", Proc. of the 15-th Int'l Symp. on Comp. Architecture, Pages 373-382, June 1988


Adaptive Protocols for Software Distributed Shared Memory - Amza, Cox, Dwarkadas.. (1999)   (32 citations)  (Correct)

....a single- or multiple-writer protocol is in use, either the whole page or the diffs are fetched. In an update protocol, instead, the modifications to the page are sent with the synchronization message. Pages are never invalidated. The tradeoffs between invalidate and update protocols are well known [12]. Update protocols send substantially more data, including data that the processor may never access or that may be overwritten by newer data before the processor accesses the data originally sent. Invalidate protocols only retrieve the data for the pages the processor accesses, but they pay the ....

S.J. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


CC-NUMA Page Table Management and Redundant Linked List Based.. - Vlaovic   (Correct)

....the only cache capable of writing is the head. Once the line is written the rest of the list is purged, and if the members wish to obtain the new copy, they must join the list again. This invalidation scheme follows the current trend of invalidation of cache lines rather than updating them [7]. Updates are useful when cached copies are frequently reread after an update, and when temporally close updates to the same cache line are gathered before the update is enacted. However, this assumes that every node that has that cache line requires the most recent copy, which is not always the ....

S.J. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, May 1988.


Toward Large-Scale Shared Memory Multiprocessing - Bennett, Carter, Zwaenepoel (1991)   (Correct)

....a suite of consistency protocols so that individual data objects are kept consistent by a protocol tailored to the way in which that object is accessed. Several studies of shared memory parallel programs have indicated that no single consistency protocol is best suited for all parallel programs [Ben90a, Egg88, Egg89]. Furthermore, within a single program, different shared data objects often are accessed in fundamentally different ways [Ben90a] and a particular object's access pattern can change during the execution of a program. Existing DSM systems have not taken advantage of these observations, and have ....

....object. For example, the log entry for a read of an element of a matrix object indicates only that the matrix was read at object granularity, but indicates the specific element that was read at element granularity. Our study of sharing in parallel programs distinguishes itself from similar work [Egg88, Sit88, Web89] in that it studies sharing at the programming language level, and hence is relatively architecture-independent, and in that our selection of parallel programs embodies a wider variation in programming and synchronization styles. An important difference between our approach and previous methods ....

[Article contains additional citation context not shown here]

Susan J. Eggers and Randy H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


Eager Combining: A Coherency Protocol for Increasing.. - Ricardo Bianchini (1994)   (6 citations)  (Correct)

....an issue on machines that use low order interleaving of addresses. In this paper we focus on a different type of hot spot, which is caused by producer consumer sharing of data. Previous studies have considered the relationship between a program's sharing patterns and contention. Eggers and Katz [Eggers and Katz, 1988] compared the coherency overhead of write-update and write-invalidate protocols on small scale multiprocessors, and observed that contention for locks and data was not significant in their applications. Gupta and Weber [Gupta and Weber, 1992] classified data objects according to their expected ....

S. J. Eggers and R. H. Katz, "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation," In Proceedings of the 15th International Symposium on Computer Architecture, pages 373--383, May 1988.


The Interaction of Parallel Programming Constructs and.. - Ricardo Bianchini   (Correct)

....is the first to relate these constructs and techniques to their communication behavior under invalidate, update, or competitive protocols. Some related pieces of work are listed next. The impact of coherence protocols on application performance is an active area of research. Early work by Eggers [8] studied the relative performance of invalidate and update protocols on small bus based cache coherent multiprocessors. More recent work [4, 18] has looked at the impact of update based protocols on overall program performance on larger scale machines. Other researchers have taken an ....

S. J. Eggers and R. H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In Proceedings of the 15th International Symposium on Computer Architecture, pages 373--383, May 1988.


Object Allocation in Distributed Databases and Mobile Computers - Huang, Wolfson   (7 citations)  (Correct)

....penalty if the read write pattern is not known. In contrast, in this paper we do so. 5.2 Caching and virtual memory In the computer architecture and operating systems literature there are studies of two subjects related to dynamic allocation, namely caching and distributed virtual memory (see [2, 1, 4, 5, 8, 10, 11, 19, 16, 18, 20, 21, 22, 24, 29]). In these methods, when a processor issues a read to a shared page that is not in its memory, a read page fault is triggered and the page fault interrupt handler requests the page from another processor; when received, the page is cached locally. There are various ways of handling writes, but ....

....a page that is read is found in the cache, no I/O cost is incurred. On the other hand, even when an object is replicated at a processor, it may reside in secondary storage, leading to an I/O cost incurred at the time of read. Fourth, the architecture assumed in most CDVM methods is bus based (e.g. [2, 10, 11, 21]). This architecture supports broadcast at the same cost as a single cast, and on the other hand incurs contention. In contrast, in this paper we assumed point to point communication. The present work is also related to replicated file systems, such as CODA (in [18, 24]). However, these works ....

S. J. Eggers and R. H. Katz, "A Characterization of Sharing in Parallel Programs and Its Application to Coherency Protocol Evaluation", Proc. of the 15-th Int'l Symp. on Comp. Architecture, Pages 373-382, June 1988


Improving Performance of Bus-Based Multiprocessors - Anderson (1995)   (1 citation)  (Correct)

....6 cases; it came in a close second in the other 4 cases. Chapter 4 ADAPTIVE HYBRID CACHE PROTOCOLS 4.1 Introduction Applications exhibit a wide variety of reference patterns. It is well known that depending on the application, either write invalidate (WI) or write update (WU) performs best [EK88]. Indeed, even within applications, various forms of sharing behavior exist: for example, widely write shared data (like locks), largely read-only data, and migratory data [CF93, SBS93]. Because of these different sharing patterns, varying the protocol as the application executes has the ....

....the potential to increase application performance by decreasing both the number of bus transactions and the number of bytes transferred over the bus, leading to decreased bus contention and reduced latency for memory operations. A key concept in choosing between WI and WU is that of the write run [EK88]. Eggers defines a write run as a sequence of write references by one processor to a shared address, uninterrupted by any accesses by other processors. We will use a slightly different definition of a write run in our discussions. Our changes are motivated by the observation that in practice, ....

Susan Eggers and Randy Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proc. of 15th Int. Symp. on Computer Architecture, pages 373--382, 1988.
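
The write-run definition quoted in the Anderson (1995) context above (a sequence of writes by one processor to a shared address, uninterrupted by any accesses by other processors) can be illustrated with a minimal sketch. The trace format `(processor_id, address, op)` and the function name `write_run_lengths` are assumptions made for this illustration and do not come from Eggers and Katz or from Anderson. The sketch also assumes that reads by the writing processor itself do not end a run, which is one reading of the quoted wording.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical trace record: (processor_id, address, op), op is "R" or "W".
Access = Tuple[int, int, str]


def write_run_lengths(trace: Iterable[Access], address: int) -> List[int]:
    """Return the length (number of writes) of each write run to `address`.

    A run is a maximal sequence of writes by one processor to the address.
    It ends when a different processor reads or writes that address.
    """
    runs: List[int] = []
    writer: Optional[int] = None  # processor owning the current run, if any
    length = 0                    # writes counted in the current run
    for proc, addr, op in trace:
        if addr != address:
            continue  # accesses to other addresses do not affect the run
        if writer is not None and proc != writer:
            # Another processor touched the address: close the run.
            runs.append(length)
            writer, length = None, 0
        if op == "W":
            writer = proc
            length += 1
    if writer is not None:
        runs.append(length)  # close the run still open at end of trace
    return runs
```

Longer runs tend to favor invalidation, since one invalidation covers many writes, while short runs followed by remote reads tend to favor updates; a histogram of the returned lengths is one simple way to look at that tradeoff for a given trace.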


Synchronization, Coherence, and Consistency for High Performance .. - Dwarkadas (1992)   (Correct)

.... (decoding each instruction to determine its instruction addresses and timing information) at compile time (only once for each piece of code). Several approaches to the application of execution driven simulation to cache and shared memory performance evaluation have evolved in the last few years [15, 75, 32, 25]. We have adapted this technique for cache and shared memory simulation in our testbed. We describe our approach and compare it to other existing approaches in Chapter 3. 2.4.3 Analytical Modeling Analytical models provide a quick first cut estimate of cache performance. They also provide insight ....

....address trace analysis, and does not handle shared memory systems. TRAPEDS also does not model communication accurately, since it does not evaluate the effects of contention. For these reasons, TRAPEDS is not an appropriate tool to study trade offs in computation and communication. MPtrace [32] uses the execution-driven approach to generate traces of information that allow an address trace to be reconstructed. MPtrace achieves an overhead of 2 to 3 times the execution time of the program being simulated (excluding trace storage overhead) by transferring the work of trace reconstruction ....

[Article contains additional citation context not shown here]

S. J. Eggers and R. H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In Proceedings of the 15th Annual Symposium on Computer Architecture, pages 373--383. IEEE, May 1988.


Tradeoffs between False Sharing and Aggregation in.. - Amza, Cox, Rajamani, .. (1997)   (28 citations)  (Correct)

....presents compile-time techniques to analyze the sharing behavior of explicitly parallel programs [12]. His analysis determines which data structures may be susceptible to false sharing. Heuristics are then applied to determine if it is profitable to pad the data structures. Eggers and Katz [6, 7] showed that the performance of coherent caches for bus based shared memory multiprocessors depends on the relationship between the cache block size, the granularity of sharing, and the locality exhibited by a program. They showed that the optimal cache block size varies for different sets of ....

S.J. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


Timepatch: A Novel Technique for the Parallel Simulation.. - Umakishore Ramachandran   (Correct)

....Thus cache simulations play a very important role in the design cycle of building shared memory multiprocessors by aiding the choice of appropriate parameter values for a specific cache protocol and estimating the performance of the system. Various simulation techniques including trace driven [EK88, ASHH88] and execution driven [Fuj83, CMM 88, DGH91] methods have been used for this purpose. Most of the known approaches to cache simulation are sequential. Such simulations impose a heavy burden on system resources both in terms of space and time. The elapsed time for the simulation is ....

S. J. Eggers and R. H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In 15th Annual International Symposium on Computer Architecture, pages 373--82, June 1988.


Evaluating Distributed Shared Memory for Parallel Numerical.. - Larry Wittie (1993)   (1 citation)  (Correct)

....of processors and avoiding the severe bus-contention or network expense problems of existing multiprocessors. DSM systems pass short messages, but hide the underlying mechanism. DSM systems are feasible because only a small fraction of write accesses, much less than 3% of all memory references [5], in parallel programs are to variables used by more than one processor. DSM systems maintain local memory, or cache, copies of shared data. One of the main classifications of DSM systems depends on the way they handle remote memory accesses. It defines a spectrum from on demand to eager sharing. ....

S.J. Eggers and R.H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. The 15th Ann. Int. Symp. on Comp. Arch., pages 373--382, May 1988.


Extending The Scalable Coherent Interface For Large-Scale.. - Johnson (1993)   (10 citations)  (Correct)

....our results are relevant to any system that must manage data coherence, such as message passing operating systems, compilers, and/or application software. Coherence protocols can update or invalidate stale copies to maintain coherence. Invalidate protocols appear to be in widespread favor and some [EgKa88, Scot92] have put forth data and/or arguments in favor of invalidate over update. Updates perform well when 1) a cached copy is reread after an update and 2) temporally close updates to the same cache line are collected before sending. The problem with updates is that all caches with stale copies are ....

....this is implemented in the Stanford DASH [LLGG90]. Directory protocols have a performance problem in that each directory serializes insertions and deletions and the network connection serializes the messages for purges. This problem has not been highlighted in current performance studies [ASHH88, EgKa88, OKNe90, CFKA90, GHGM91, GuWe92] because they have focused on the performance of small machines with at least one study [GHGM91] admitting the assumption that shared instructions hit in the cache. With thousands of processors, anything sequential is unacceptable, unless the case is extremely infrequent. Matloff [Matl91] claims ....

[Article contains additional citation context not shown here]

Susan J. Eggers and Randy H. Katz, "A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation," Proceedings of the Fifteenth Annual International Symposium on Computer Architecture 16, 2 (May 1988), 373-382.


Adaptive Software Cache Management for Distributed Shared Memory   (Correct)

No context found.

Susan J. Eggers and Randy H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.


Minerva: An Adaptive Subblock Coherence Protocol for Improved .. - Rothman, Smith   (Correct)

No context found.

Susan J. Eggers and Randy H. Katz. A Characterization of Sharing in Parallel Programs And Its Applicibility to Coherency Protocol Evaluation. In Proc. 15th Annual International Symposium on Computer Architecture, pages 373-382, Honolulu, HI, May 30-June 2, 1988.


Adjustable Block Size Coherent Caches - Dubnicki, LeBlanc (1992)   (40 citations)  (Correct)

No context found.

S.J. Eggers and R.H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In 15th International Symposium on Computer Architecture, pages 373-383, May 1988.


Trap-driven Memory Simulation - Uhlig (1995)   (2 citations)  (Correct)

No context found.

Eggers, S. J. and Katz, R. H. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Conference on Computer Architecture, Honolulu, HI, 373-383, 1988.


Application Performance on the MIT Alewife Multiprocessor - Frederic Chong Beng-Hong (1996)   (Correct)

No context found.

S. J. Eggers and R. H. Katz. A Characterization of Sharing in Parallel Programs and Its Application to Coherency Protocol Evaluation. In Proceedings of the 15th International Symposium on Computer Architecture, New York, June 1988. IEEE.


Techniques and Tools for Distributed Shared Memory.. - Callaghan, Tamches   (Correct)

No context found.

S. J. Eggers and R. H. Katz, "A characterization of sharing in parallel programs and its application to coherency protocol evaluation," Proceedings of the 15th Annual International Symposium on Computer Architecture, May 1988.


Shared Regions: A strategy for efficient cache management in.. - Sandhu (1995)   (2 citations)  (Correct)

No context found.

S. J. Eggers and R. H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In 15th Int'l. Symp. on Computer Architecture, May 1988.


Willow: A Scalable Shared-Memory Multiprocessor - Bennett, Dwarkadas.. (1992)   (7 citations)  (Correct)

No context found.

S. J. Eggers and R. H. Katz. A Characterization of Sharing in Parallel Programs and its Application to Coherency Protocol Evaluation. In Proceedings of the 15th Annual Symposium on Computer Architecture, pages 373--383, May 1988.


Munin: Distributed Shared Memory Based on Type-Specific .. - Bennett, Carter.. (1990)   (187 citations)  (Correct)

No context found.

Susan J. Eggers and Randy H. Katz. A characterization of sharing in parallel programs and its application to coherency protocol evaluation. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 373--383, May 1988.

