19 citations found.
M. Holliday and M. Stumm, "Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors," IEEE Transactions on Computers, 1993.

This paper is cited in the following contexts:
Modeling Communication Locality in Multiprocessors - Salisbury, Chen, Melhem (1999)   (2 citations)  (Correct)

....has high locality. This is reflected not only in the reduced number of communicating pairs, but in the reduction in average number of links traversed by each message. Multiprocessor performance models have been developed to accommodate these kinds of assumptions about communication locality [1, 10]. Other researchers have recognized the importance of locality by designing multi hop networks that reduce cost by providing fewer communication links between processors that are unlikely to communicate [4, 17]. We are interested in networks where the delay between each pair of processors is ....

M. Holliday and M. Stumm. Performance evaluation of hierarchical ring based shared memory multiprocessors. IEEE Transactions on Computers, 43(1):52--67, January 1994.


Multistage Ring Network: An Interconnection Network for.. - Yoo, Park, Maeng   (Correct)

....the system, they used the corner rings, having branching factor n to replace each vertex of the topology. The corner rings have the potential to bottleneck the system. On the other hand, the hierarchical ring networks have been found to be cost effective interconnection networks [9], [12], [10], [17], [2]. This topology allows simple switches, which can be modeled as a two by two crossbar switch. The hierarchical structure can exploit the spatial locality of memory access, often exhibited in many parallel programs, in order to reduce network traffic and memory latency. However, the ....

.... [Table 1: Hierarchical ring network topologies for local ring sizes of 8 and 16. Recoverable entries: (8, 8, 4, 2, 2); TH16: (16, 4, 4, 2), (16, 4, 4, 2, 2).] [Figure 5: The network latency of a packet delivery.] Table 1 shows the considered hierarchical ring network topologies, the best topologies in the study of Holliday [10]. The branching factor at each level of the hierarchy specifies the topology. In the table, TH16 for the system size 512 refers to a hierarchical ring network topology with 16 processing modules per local ring, 4 local rings per second level ring, 4 second level rings per third level ring, and 2 ....
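
As a quick check on the topology notation in the excerpt above, the following minimal Python sketch multiplies the branching factors of a hierarchical ring to obtain the number of processing modules. The function name `system_size` is hypothetical and does not come from the cited paper; the tuples are the ones that appear in the excerpt.

```python
from math import prod


def system_size(branching_factors):
    """Return the number of processing modules in a hierarchical ring.

    branching_factors lists the fan-out at each level, starting with the
    number of processing modules per local ring. This is an illustrative
    helper, not code from the cited paper.
    """
    if not branching_factors or any(b < 1 for b in branching_factors):
        raise ValueError("each level needs a positive branching factor")
    return prod(branching_factors)


# TH16 at system size 512: 16 modules per local ring, then 4, 4, and 2.
th16_512 = system_size((16, 4, 4, 2))
```

Under this reading, the five-level tuples in the excerpt, (8, 8, 4, 2, 2) and (16, 4, 4, 2, 2), would both describe 1024-module systems.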

[Article contains additional citation context not shown here]

M. Holliday and M. Stumm, "Performance evaluation of hierarchical ring-based shared memory multiprocessors," IEEE Tr. on Computers, vol. 43, pp. 52--67, Jan. 1994.


Analysis of A Scalable, All-Optical Interconnection Network For.. - Jones (1998)   (Correct)

.... especially scalability to a large number of PNs, while still maintaining a small diameter and low cost [12, 14]. Recently there has been strong interest in hierarchical interconnection networks that can provide a high degree of scalability while still maintaining a low network latency [15, 16, 17, 18, 19, 20, 21, 22, 23]. The rationale behind hierarchical networks is based on the locality of reference found in the communication profiles of many parallel processing applications. Therefore, it is desirable to have cluster based interconnection networks where a cluster is comprised of a relatively small number of ....

M. Holliday and M. Stumm, "Performance Evaluation of Hierarchical Ring Based Shared Memory Multiprocessors," IEEE Transactions on Computers, vol. C-43, pp. 52 -- 67, January 1994.


Bidirectional Ring: An Alternative to the Hierarchy of.. - Muhammad Jaseemuddin And (1995)   (Correct)

....sizes to assess their relative performance using a simple but real communication protocol and synthetic workloads that are tuned by carefully selecting the parameters to reflect the actual workloads. A similar approach is used in evaluating the performance of a hierarchical ring based system in [5]. The use of synthetic workloads allows us to study the behavior of each network under many possible interesting situations, and it also makes the simulation of large scale networks manageable. In [1] real applications are used to analyze medium scale systems using a slotted ring and running ....

....of all requests generated by the processor. The request rate is a parameter which indicates the average number of requests generated by the processor per cycle. It is fixed at 0.05 which corresponds to 20 cycles between two consecutive cache misses. This rate is supported by real workloads [5]. The ring cycle time and the processor clock cycle time are considered to be the same. The ratio of the ring cycle time to the memory access time is 10:1. For instance, a word read takes 10 ring cycles. However, for a cache line read, the first word takes 10 cycles, but each subsequent word is ....
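
The relation between request rate and inter-miss interval quoted above is a simple reciprocal. A minimal sketch, assuming a constant average rate; the function name `cycles_between_requests` is hypothetical and not from the cited paper:

```python
def cycles_between_requests(request_rate):
    """Average number of cycles between consecutive requests.

    request_rate is the average number of requests a processor issues
    per cycle (0.05 in the excerpt). Illustrative helper only.
    """
    if request_rate <= 0:
        raise ValueError("request rate must be positive")
    return 1.0 / request_rate


# A rate of 0.05 requests per cycle gives roughly 20 cycles between misses.
interval = cycles_between_requests(0.05)
```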

[Article contains additional citation context not shown here]

M. Holliday and M. Stumm, Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors, IEEE Trans. Computer, Vol. 43, No. 1, pp. 52-67, January 1994.


Hierarchical Optical Ring INterconnection (HORN): A Scalable.. - Louri, Gupta   (Correct)

....PEs. It is proving to be very difficult for flat interconnection networks to satisfy the above requirements, especially scalability to a large number of PEs, while still maintaining a small diameter and low cost. Recently there has been strong interest in hierarchical interconnection networks [3, 4] that can provide a high degree of scalability while still maintaining a low network latency. The rationale behind hierarchical networks is based on the locality of reference found in the communication profiles of many parallel processing applications. Therefore, it is desirable to have ....

M. Holliday and M. Stumm, "Performance evaluation of hierarchical ring based shared memory multiprocessors", IEEE Transactions on Computers, C-43, pp. 52-67, January 1994.


Performance Issues in the Design of Hierarchical-ring and Direct .. - Ravindran (1998)   (Correct)

....already been done are marked by √. Those areas that have not been studied much are addressed in this dissertation (with the exception of distributed memory hybrid networks). There have been only a few performance studies of hierarchical ring and direct networks for shared memory multiprocessors [28, 38, 39, 42, 47, 54, 66, 95]. Holliday and Stumm [42] analyzed hierarchical ring multiprocessor networks under cell .... [Table 1.2: Categorizing multiprocessor interconnection networks. Columns: Direct Networks, Indirect Networks, Hybrid Networks. Rows: Shared-memory, Distributed Memory.] .... into direct, hybrid, and indirect networks ....

.... have not been studied much are addressed in this dissertation (with the exception of distributed memory hybrid networks). There have been only a few performance studies of hierarchical ring and direct networks for shared memory multiprocessors [28, 38, 39, 42, 47, 54, 66, 95]. Holliday and Stumm [42] analyzed hierarchical ring multiprocessor networks under cell .... [Table 1.2: Categorizing multiprocessor interconnection networks. Columns: Direct Networks, Indirect Networks, Hybrid Networks. Rows: Shared-memory, Distributed Memory.] .... into direct, hybrid, and indirect networks for shared memory and ....

[Article contains additional citation context not shown here]

M. Holliday and M. Stumm, "Performance evaluation of hierarchical ring-based shared memory multiprocessors", IEEE Trans. on Computers, Vol. 43, No. 1, pp. 52-67, Jan 1994.


Predicting Application Behavior in Large Scale Shared-memory .. - Harzallah, Sevcik (1995)   (4 citations)  (Correct)

.... of remote memory latency (determined by the physical structure of the architecture and the locality properties of the application) and contention delay (wasted cycles due to queueing for resource availability in the network or at a memory module). Several studies of shared memory systems [SRG94, HS94, NSI94, SSRV94] have shown that the cost of communication for a remote memory reference is the key determinant of application performance. Hence, it is this aspect of the application that needs to be analyzed in detail. It follows that predicting the performance of parallel applications running ....

M. A. Holliday and M. Stumm. Performance evaluation of hierarchical ring-based shared memory multiprocessors. IEEE Transactions on Computers, 43(1):52--67, January 1994.


The Performance of SCI Multiprocessor Rings - Hexsel, Topham   (Correct)

....employed here. It is likely the results would show the same broad tendencies as those of DASH since the two machines are built from similar technologies; SCI's faster network would provide a performance advantage. The Hector multiprocessor [20] using a hierarchical snooping protocol [6, 11] should have a performance comparable to that of the Express Ring. Holliday and Stumm report in [11] that Hector's hierarchical protocol scales well to a large number of processors (1024) if the applications possess good locality characteristics. 5 Conclusion This paper contains a detailed ....

....since the two machines are built from similar technologies; SCI's faster network would provide a performance advantage. The Hector multiprocessor [20] using a hierarchical snooping protocol [6, 11] should have a performance comparable to that of the Express Ring. Holliday and Stumm report in [11] that Hector's hierarchical protocol scales well to a large number of processors (1024) if the applications possess good locality characteristics. 5 Conclusion This paper contains a detailed performance evaluation study of SCI based shared memory multiprocessors. Previous studies of SCI have ....

Mark Holliday and Michael Stumm. Performance evaluation of hierarchical ring-based shared memory multiprocessors. IEEE Trans. on Computers, C-43(1):52--67, January 1994.


Hot Spot Analysis in Large Scale Shared Memory Multiprocessors - Karim Harzallah (1993)   (4 citations)  (Correct)

....priority of resource acquisition among the different classes of requests, and optimization of negatively acknowledged packets. We then validated our model with a detailed simulator proven to be quite accurate when compared against measurements obtained from experimentation on the prototype [HS92]. Our results [HS93] from simulation and analysis showed good agreement despite the assumptions made in deriving the model. We believe that the model captures the key features of the hierarchical structure, while making it possible to quickly study the effects of varying system parameters. 4 ....

M. Holliday and M. Stumm. Performance evaluation of hierarchical ring-based shared memory multiprocessors. Technical Report 1992--18, Duke University, Computer Science Department, 1992.


Performance Benefits and Limitations of Large NUMA.. - Sevcik, Zhou (1992)   (5 citations)  (Correct)

....in mesh connected multiprocessors (such as Alewife) [14]. He obtained upper bounds on the performance in light of the communication requirements of applications. Holliday and Stumm have used detailed simulation to assess the performance of applications on large versions of the Hector structure [13]. Zhou and Brecht have studied processor allocation policies that are appropriate in NUMA systems [16]. 2.1. The System Model The general NUMA architecture that we examine is shown in Figure 1. We assume that memory modules are paired one to one with processors, but that the memory modules ....

M. Holliday and M. Stumm, Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors, Technical Report CS-1992-18, Duke University (1992).


Loop and Data Transformations: A Tutorial - Kulkarni, Stumm (1993)   (5 citations)  Self-citation (Stumm)   (Correct)

....be possible to estimate the run times quite accurately. In order to predict the run time of a nested loop, it is necessary to have a good model of target architectures and the nested loops themselves. In particular, the existing architectural models for non uniform memory access machines [SZ91, HS92, KKB92], with possible modifications, might be able to serve this purpose. Dependences and data mapping, together with the mapping of iterations on the processor space can give us an estimate of the number of remote accesses. An analysis of the self input output dependences can give us a ....

M. Holliday and M. Stumm. Performance evaluation of hierarchical ring-based shared memory multiprocessors. Technical Report CS-1992-18, Computer Science Department, Duke University, 1992.


On Topology and Bisection Bandwidth of Hierarchical-ring.. - Ravindran, Stumm (1998)   Self-citation (Stumm)   (Correct)

....networks. In a related earlier work, we presented the results of a simulation study of the scalability and bisection bandwidth constraints of hierarchical ring networks that considered only two request rates and did not take into account system throughput [4]. In another study, Holliday and Stumm [2] studied the performance and scalability of hierarchical ring networks, but they assumed a high degree of locality in their memory accesses. Hamacher and Jiang [1] used analytical models to derive optimal hierarchical ring topologies. In contrast to the above work, our study is much more ....

....a single cluster that has a probability 1.0 of containing the target memory. This workload models poor communication locality. The system and workload parameters used in our study are summarized in Table 1. We define the cycle ratio as the relative speed of the processor, network, and memory [2]. It is specified as NXMY which means that each network cycle is X times as slow as a processor cycle and the memory requires Y processor cycles to service one memory request. We define network cycle time as the time required for a packet to move from the input of one node to the input of the next ....
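
The NXMY cycle-ratio notation described above can be decoded mechanically. The sketch below is a minimal illustration; the names `CycleRatio` and `parse_cycle_ratio` and the sample label `N2M20` are hypothetical and are not taken from the citing paper.

```python
import re
from typing import NamedTuple


class CycleRatio(NamedTuple):
    network_slowdown: int  # X: network cycle length, in processor cycles
    memory_cycles: int     # Y: processor cycles to service one memory request


def parse_cycle_ratio(label):
    """Split a label of the form NXMY into its X and Y components."""
    match = re.fullmatch(r"N(\d+)M(\d+)", label)
    if match is None:
        raise ValueError(f"not an NXMY cycle ratio: {label!r}")
    return CycleRatio(int(match.group(1)), int(match.group(2)))


# Hypothetical label: network cycle twice as slow as a processor cycle,
# memory needing 20 processor cycles per request.
ratio = parse_cycle_ratio("N2M20")
```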

M. Holliday and M. Stumm, "Performance evaluation of hierarchical ring-based shared memory multiprocessors," IEEE Trans. on Computers, Vol. 43, No. 1, pp. 52-67, Jan 1994.


Issues in the Design of Direct Multiprocessor Networks - Ravindran, Stumm (1997)   Self-citation (Stumm)   (Correct)

No context found.

M. Holliday and M. Stumm, "Performance evaluation of hierarchical ring-based shared memory multiprocessors", IEEE Trans. on Computers, vol. 43, no. 1, pp. 52-67, Jan 1994.


A Performance Comparison of Hierarchical Ring- and.. - Ravindran, Stumm (1997)   (7 citations)  Self-citation (Stumm)   (Correct)

....better than meshes only for small system sizes (by 10-30%); for larger systems the performance of hierarchical rings is severely constrained due to bisection bandwidth limitations and meshes perform significantly better. Although some previous work on the performance of hierarchical ring networks [4, 13, 16, 20, 21] and on mesh networks [1, 2, 8, 12, 23] has been published, we are aware of only one study that compares the performance of both types of networks [15]. That study uses analytical models to conclude that three level hierarchical systems perform somewhat better than mesh systems. The rest of the ....

....the smpl simulation library [19]. The batch means method of output analysis was used, with the first batch discarded to account for initialization bias. A hierarchical ring base simulator was validated against measurements taken from the Hector prototype, a hierarchical slotted ring architecture [16, 26]. The base simulator was then extended to model other switching techniques, such as wormhole routing. For 2D meshes, the processor and memory modules are essentially the same as in the ring simulator with new NIC modules that incorporate switching, routing and flow control in meshes. Our primary ....

M. Holliday and M. Stumm, "Performance evaluation of hierarchical ring-based shared memory multiprocessors", IEEE Trans. on Computers, vol. 43, no. 1, pp. 52-67, Jan 1994.


A Comparison of Blocking and Non-Blocking Packet Switching.. - Ravindran, Stumm (1996)   Self-citation (Stumm)   (Correct)

.... using the smpl simulation library and uses the batch means method of output analysis with the first batch discarded to account for initialization bias [11]. A base simulator was validated against measurements taken from the Hector prototype, a non blocking hierarchical slotted ring architecture [8], [16]. The base simulator was then extended to model other switching techniques, such as wormhole and non blocking virtual cut through. Our measure of performance is average memory access latency, the time between when a request is first issued and the corresponding response is received. This ....

M. Holliday and M. Stumm, "Performance evaluation of hierarchical ring-based shared memory multiprocessors", IEEE Trans. on Computers, 43(1), pp. 52-67, 1994.


Hierarchical Ring Topologies and the Effect of their Bisection .. - Ravindran And (1995)   (2 citations)  Self-citation (Stumm)   (Correct)

....patterns. Thus we are able to emulate a wide spectrum of memory access behaviors, from high to low cache miss rates and from high degrees of memory locality to almost no locality. There have been only few studies on performance of scalable hierarchical ring networks so far. Holliday and Stumm [5] studied the performance of large scale hierarchical slotted ring architectures. Throughout their study they assumed very large degrees of locality in their workloads which makes their results applicable only for well behaved applications. Hamacher and Jiang [4] used analytical models and compared ....

....to account for initialization bias. The batch termination criterion was that each processor had to complete at least some minimum number of requests (in our simulations it is 200 requests per processor). The base simulator was validated against measurements taken from the Hector prototype [5]. It was then extended to model features not present in Hector, such as the insertion ring interface, flit level simulation, wormhole routing and flow control. 2.3 Benchmark Description In order to evaluate the performance of the interconnection network under controlled conditions, we used ....

M. Holliday, M. Stumm, "Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors," IEEE Transactions on Computers, (Jan 1994), pp. 52-67


A Survey and Comparison of Multi-Ring Techniques for Scalable.. - Wang, Yurcik   (Correct)

No context found.

M. Holliday and M. Stumm, "Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors," IEEE Transactions on Computers, 1993.


Modeling Communication Locality in Multiprocessors - Salisbury, Chen, Melhem (1999)   (2 citations)  (Correct)

No context found.

M. Holliday and M. Stumm, Performance evaluation of hierarchical ring based shared memory multiprocessors, IEEE Trans. Comput. 43 (1994), 52-67.

CiteSeer.IST - Copyright Penn State and NEC