12 citations found.
E. Rothberg, J. P. Singh, and A. Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. In ISCA, 1993.


This paper is cited in the following contexts:
I/O Limitations in Parallel Molecular Dynamics - Terry Clark Ridgway

....data production rates that can be attained with the Intel Paragon and EulerGROMOS.¹ (Footnote 1: For a review of parallel molecular dynamics algorithms see [8]. The machine can of course be made to perform more efficiently by manipulating machine parameters such as cache size, instruction set, and so forth [9].) 2. The AChE and tacrine study. The enzyme acetylcholinesterase is responsible for degrading the neurotransmitter acetylcholine in species from man on down to insects. AChE is a target for many commonly used drugs and toxins. For example, clinical studies suggest that acetylcholinesterase ....

Edward Rothberg, Jaswinder Pal Singh, and Anoop Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. Comp. Arch. News, 21(2):14-25, 1993.


Attributes of Molecular Dynamics Calculations: Accounting for CPU.. - Clark (1997)

.... application of importance, molecular dynamics is often used to evaluate computing systems, placing it in benchmark suites such as SPEC (System Performance and Evaluation Cooperative) and SPLASH [28]. As a benchmark, MD is often used alongside other applications to evaluate computer architectures [18, 26, 32], languages and compilers [7, 13, 17, 25], and software systems [10]. Efforts to improve molecular dynamics performance include sequential algorithms addressing the pairlist calculation [1, 22, 30] and numerous vectorization [3, 14, 15, 29] and parallelization efforts [4, 5, 6, 11, 12, 23]. A ....

Edward Rothberg, Jaswinder Pal Singh, and Anoop Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. Comp. Arch. News, 21(2):14-25, 1993.


Abstracting Network Characteristics and Locality.. - Sivasubramaniam.. (1993)   (2 citations)

....hardware facilities that would help exploit locality in applications, and have clearly illustrated the use of caches in reducing network traffic. There have also been application-driven studies which try to synthesize cache requirements from the application viewpoint. For instance, Gupta et al. [21] show that a small cache of around 64 KB can accommodate the important working set of many applications. Similarly, Wood et al. [30] show that the performance of a suite of applications is not very sensitive to different cache coherence protocols. But from the performance evaluation viewpoint, ....

E. Rothberg, J. P. Singh, and A. Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 14-25, May 1993.


An Analytical Model of the Working-Set Sizes in.. - Karlsson, Dahlgren.. (2000)   (5 citations)

....instructions or data. Reuse of such chunks often results in distinct working sets, and a typical workload consists of many such working sets. Based on multiprocessor executions of scientific and technical workloads, Rothberg et al. found that the performance-critical working sets form a hierarchy [15]. In fact, they noticed that the most performance-critical working set easily fits in moderately sized caches and grows very slowly with the problem size, thanks to the compute-intensive nature of the workloads. It is presently an open issue whether these observations extend to data-intensive ....

....presents the analytical model for the cache miss ratio as a function of the size of a fully associative cache. We first present the overall approach in Section 3.1 before we present the model equations in Section 3.2. 3.1 Overview of the Model. Based on scientific applications, Rothberg et al. [15] found that all working sets in an execution typically form a hierarchy like the one sketched in Figure 2. This diagram shows the miss ratio versus the cache size, assuming a fully associative cache. While a distinct drop in the miss ratio rarely shows up in practice, it is useful for the continued ....

[Article contains additional citation context not shown here]

E. Rothberg, J. P. Singh, and A. Gupta. Working Sets, Cache Sizes and Node Granularity Issues for Large-Scale Multiprocessors. In Proceedings of the 20th International Symposium on Computer Architecture, pages 14-25, May 1993.
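The Karlsson et al. context above describes the miss ratio plotted against the size of a fully associative cache. In that plot, each working set shows up as a drop, or knee, in the curve. The Python sketch below is an illustration only and is not the method of Rothberg et al. or Karlsson et al. It computes such a curve for a fully associative LRU cache using LRU stack-distance analysis. The function names and the synthetic two-level trace are hypothetical. It also assumes the trace is a list of block addresses and that cache size is counted in blocks.

```python
"""Sketch: miss ratio vs. size of a fully associative LRU cache.

Assumptions (hypothetical, for illustration only): the trace is a list of
block addresses, replacement is LRU, and cache size is counted in blocks.
"""


def stack_distances(trace):
    """Return the LRU stack distance of each reference (None = cold miss).

    Distance 1 means the block was the most recently used one.
    """
    stack = []  # LRU stack; the most recently used block is at the end
    distances = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            distances.append(len(stack) - pos)
            stack.pop(pos)
        else:
            distances.append(None)  # first reference: infinite distance
        stack.append(block)  # block becomes the most recently used
    return distances


def miss_ratio_curve(trace, cache_sizes):
    """Map each cache size (in blocks) to its miss ratio.

    By the LRU inclusion property, a reference hits in a cache of C blocks
    exactly when its stack distance is at most C.
    """
    distances = stack_distances(trace)
    total = len(distances)
    return {
        c: sum(1 for d in distances if d is None or d > c) / total
        for c in cache_sizes
    }


def two_level_trace(inner=4, outer=32, inner_reps=10, outer_reps=5):
    """Hypothetical trace with two nested working sets (inner and outer)."""
    trace = []
    for _ in range(outer_reps):
        for base in range(0, outer, inner):
            chunk = list(range(base, base + inner))
            for _ in range(inner_reps):
                trace.extend(chunk)  # reuse a small chunk repeatedly
    return trace


if __name__ == "__main__":
    curve = miss_ratio_curve(two_level_trace(), [2, 4, 16, 32, 64])
    for size, ratio in curve.items():
        print(f"{size:3d} blocks: miss ratio {ratio:.2f}")
```

For this synthetic trace, the miss ratio is 1.00 at 2 blocks. It falls to 0.10 once the cache holds the 4-block inner working set and to 0.02 once it holds the 32-block outer one. This mimics the stepwise hierarchy sketched in the excerpt. Real workloads rarely show such sharp drops, as the excerpt itself notes.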


An Approach to Scalability Study of Shared Memory.. - Sivasubramaniam.. (1994)   (3 citations)

....performance [11] and the impact of synchronization and task granularity on parallel system performance [6]. Cypher et al. [10] identify the architectural requirements, such as floating-point operations, communication, and input/output, for message-passing scientific applications. Rothberg et al. [24] conduct a similar study towards identifying the cache and memory size requirements for several applications. However, there have been very few attempts at quantifying the effects of algorithmic and architectural interactions in a parallel system. The work we present in this paper is part of a ....

....algorithms with regular communication structures that can be predetermined before the execution of the algorithm. Madala and Sinclair [17] confine their studies to synchronous algorithms while [31] and [9] develop models for regular iterative algorithms. However, there exist several applications [24] with irregular data access, communication, and synchronization characteristics which cannot always be captured by such simple parameters. Further, an application may be structured to hide a particular overhead such as latency by overlapping computation with communication. It may be difficult to ....

Edward Rothberg, Jaswinder Pal Singh, and Anoop Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 14-25, May 1993.


I/O Limitations in Parallel Molecular Dynamics - Clark, Scott, Wlodek (1995)

.... (AChE) system described in this paper can require a month or more of a 512-node Paragon,¹ far short of a desirable turnaround time. (Footnote 1: For a review of parallel molecular dynamics algorithms see [8]. The machine can of course be made to perform more efficiently by manipulating machine parameters such as cache size, instruction set, and so forth [9].) Beyond this state-of-the-art machine, one of the largest typically in use, a straightforward extrapolation of our experimental performance data reveals that the output rates eventually dominate the total cost of the simulation. In this ....

Edward Rothberg, Jaswinder Pal Singh, and Anoop Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. Comp. Arch. News, 21(2):14-25, 1993.


Performance Implications of Communication.. - Lim, Chang.. (1997)

....index calculations are required during the local computation phases.² (Footnote 2: We use the notation application system to refer to the implementation of an application on a particular system.) Blocked LU decomposition: This application implements in situ factorization of a dense matrix as described in [15]. The communication and computation structure of this application is as follows. The matrix is divided up into blocks distributed among processors. Every step comprises three substeps, between which processors synchronize with a barrier. First, the pivot block (I, I) is factored by its ....

E. Rothberg, J. P. Singh, and A. Gupta. Working Sets, Cache Sizes and Node Granularity Issues for Large-Scale Multiprocessors. In Proceedings of the 20th International Symposium on Computer Architecture, San Diego, CA, May 1993.


A Performance Study of Cosmological Simulations on.. - Marios Dikaiakos (1996)   (4 citations)

....node-to-node communication does not become a bottleneck. Furthermore, even under the more restrictive assumption of random message traffic, the per-processor communication rate does not exceed the sustainable per-processor bandwidth, which is determined by the bisection width of the Paragon mesh [19, 27]. Another interesting remark from Figure 5 (right) is that, in most cases, the .... [Figure 6: Communication profile: 32 processors, 2 time steps, 125,000-particle simulation. Axes: time (sec) vs. number of messages per second and number of messages dispatched.]

E. Rothberg, J. P. Singh, and A. Gupta. Working Sets, Cache Sizes and Node Granularity Issues for Large-Scale Multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 14-25, May 1993.


Machine Abstractions and Locality Issues in.. - Sivasubramaniam.. (1993)

....approach to conduct our study. A similar approach has also been used by other researchers in studying the impact of application characteristics on architectural requirements [14, 22].² (Footnote 2: We do not distinguish between the terms process, processor, and thread, and use them synonymously in this paper.) 3.1 Approaches for Measuring Overheads. Experimentation, simulation, and analytical models are techniques that can be used for measuring overheads, but each has its own limitations. Experimentation is useful for understanding and evaluating existing architectures, but the underlying hardware is ....

....structures that can be predetermined before the execution of the algorithm. Madala and Sinclair [18] confine their studies to synchronous algorithms, while Vrsalovic et al. [26] and Cvetanovic [13] develop models for regular iterative algorithms. However, there exist several applications [22] with irregular data access, communication, and synchronization characteristics which cannot always be captured by such simple parameters. Further, an application may be structured to hide a particular overhead such as latency by overlapping computation with communication. It may be difficult to ....

[Article contains additional citation context not shown here]

Edward Rothberg, Jaswinder Pal Singh, and Anoop Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 14-25, May 1993.


Synthesizing Network Requirements Using Parallel.. - Sivasubramaniam.. (1994)

....sensitive to the workload, it is necessary to study them in the context of real applications. The RISC ideology clearly illustrates the importance of using real applications in synthesizing architectural requirements. Several researchers have used this approach for parallel architectural studies [22, 9, 15]. Cypher et al. [9] use a range of scientific applications in quantifying the processing, memory, communication, and I/O requirements. They present the communication requirements in terms of the number of messages exchanged between processors and the volume (size) of these messages. As identified in ....

....to be varied. Each node in the system has a piece of the globally shared memory and a 2-way set-associative private cache (64 KBytes with 32-byte blocks). The cache is kept sequentially consistent using an invalidation-based, fully mapped, directory-based cache coherence scheme. Rothberg et al. [22] show that a cache of moderate size (64 KBytes) suffices to capture the working set in many applications, and Wood et al. [29] show that the network traffic generated is .... (Footnote 2: Efficiency is defined as speedup(p)/p, where p is the number of processors. Speedup(p) is the ratio of the time taken to ....)

E. Rothberg, J. P. Singh, and A. Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 14-25, May 1993.


Load Balancing and Data Locality in Adaptive.. - Singh, Holt.. (1995)   (30 citations)  Self-citation (Singh Gupta)

....caches on the machine are large enough to accommodate the working sets of these schemes but not the working sets of schemes with poor locality. Infinite caches do not capture this effect. However, this is not a very significant issue in our applications, since the important working sets are small [22]. Besides, infinite caches are better at measuring inherent communication, which is what we want to compare using the simulator. 7.2 Organization of Experiments. For each application, we first examine approaches that are conceptually obvious and very easy for a programmer to implement. These ....

Ed Rothberg, Jaswinder Pal Singh, and Anoop Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, May 1993. To appear.


Dynamic Tracking of Page Miss Ratio Curve for Memory.. - Zhou, Pandey.. (2004)   (Correct)

No context found.

E. Rothberg, J. P. Singh, and A. Gupta. Working sets, cache sizes and node granularity issues for large-scale multiprocessors. In ISCA, 1993.


CiteSeer.IST - Copyright Penn State and NEC