22 citations found.
Douglas C. Burger and David A. Wood. Accuracy vs. performance in parallel simulation of interconnection networks. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.

This paper is cited in the following contexts:
A Comparison of Wormhole-Routed Interconnection Networks - Petrini, Vanneschi (1997)   (Correct)

....of the communication performance of these machines is not an easy task because they have widely different technological characteristics. On the other hand, theoretical models of the interconnection network often prove overly simplistic and are not able to capture important performance aspects [BW95, Mou96]. In this paper we try to face this problem with a detailed simulation model and a real application, the transpose FFT algorithm. Our experiments are conducted on the quaternary fat trees and the bidimensional tori, whose communication performance is properly equalized. The remainder of ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In International Symposium on Parallel Processing, April 1995.


Communication Performance of Wormhole Interconnection Networks - Petrini (1997)   (Correct)

....the existing prototypes have different technological characteristics. In practice, it is difficult to compare apples with apples. On the other hand, theoretical models of the interconnection network often prove overly simplistic and are not able to capture important performance aspects [111] [23] [3]. In this Section we compare the two interconnection networks using the cost model introduced in Section 6.1. The raw data already shown in Sections 6.5 and 6.4.5 are filtered to take into account the router complexity and the wire delay. In Figure 6.10 a) we can see that the bidimensional ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In International Symposium on Parallel Processing, April 1995.


Communication Performance of Wormhole Interconnection Networks - Petrini (1997)   (Correct)

....less sensitive to congestion. Unfortunately, accurate studies on the network performance proved that these models are not robust in the presence of high network loads or non-uniform traffic patterns. Burger and Wood compared several approximate models using cache coherent multiprocessors [22]. In practice, it is very difficult to find the actual channel utilization given a user program. But there is another important point: even if we know this value in advance, the results remain disappointing for many programs, in particular those that execute several global communication ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the 9th International Parallel Processing Symposium, IPPS'95, April 1995.


Reducing Synchronization Overhead in Parallel Simulation - Ulana Legedza William (1996)   (15 citations)  (Correct)

....[21, 14]. They synchronize using periodic global barriers. They achieve good performance by decreasing network simulation accuracy, which allows them to synchronize infrequently. The WWT researchers have explored a range of network simulation models (from very accurate to not very accurate) [7]. All of these could easily be incorporated into Parallel Proteus, but would not solve the problem of reducing synchronization overhead while maintaining accurate network simulation. However, WWT and PTL cannot exploit lookahead in the same way as Parallel Proteus can, because they simulate shared ....

Douglas C. Burger and David A. Wood. Accuracy vs. performance in parallel simulation of interconnection networks. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.


How Much Does Network Contention Affect Distributed Shared.. - Dai, Panda (1997)   (Correct)

....model and the average latency model, as used in the WWT [7] and the FLASH [5, 8] projects. As mentioned before, these models do not provide useful insights into the effect of network contention on DSM system performance. A set of network simulation models for DSM systems have been proposed in [1] to show the tradeoff between accuracy and efficiency of network simulation. However, in this paper, our focus has been to isolate and quantify various types of network contention and study their impact on the overall DSM system performance under a set of design choices. Since network contention ....

D. C. Burger and D. A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In IPPS'95, April 1995.


Scheduling with Global Information in Distributed Systems - Petrini, Feng (2000)   (2 citations)  (Correct)

....in bytes, that can be sent or received by a single processor is hmax = ffi g . Furthermore, by globally scheduling a communication pattern, as described in Section 3. 2, we can derive an accurate estimate of the communication time with simple analytical models already developed for the BSP model [4]. Another important benefit of the BSP model is higher resource utilization over the parallel machine, irrespective of the computational and communication patterns. For example, a sparse communication pattern (where a single pro 4 h denotes the maximum amount of information sent or received by ....

D. C. Burger and D. A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the 9th International Parallel Processing Symposium, IPPS'95, Santa Barbara, CA, April 1995.


Scheduling with Global Information in Distributed Systems - Petrini, Feng (2000)   (2 citations)  (Correct)

....bytes, that can be sent or received by a single processor is h max = T g : Furthermore, by globally scheduling a communication pattern, as described in Section 3.2, we can derive an accurate estimate of the communication time with simple analytical models already developed for the BSP model [28, 4, 27]. Another important benefit of the BSP model is higher resource utilization over the parallel machine, irrespective of the computational and communication patterns. For example, a sparse communication pattern (where a single processor receives h max bytes) or a more dense communication pattern ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the 9th International Parallel Processing Symposium, IPPS'95, Santa Barbara, CA, April 1995.


Latency and Bandwidth Requirements of Massively Parallel.. - Petrini (1999)   (Correct)

....models, but also some research results [12] are based on the assumption that latency can be modeled with a constant value. In the presence of bandwidth bound patterns this parameter is not easy to predict. Also, it can be some orders of magnitude bigger than the base latency without conflicts [11]. If we switch from the latency of the single packet to the global behavior of the network, we can note that the accepted bandwidth remains stable during the AAPB. After an initial period of 10000 cycles of deterministic routing, the processing nodes receive a constant amount of data per unit of ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In International Symposium on Parallel Processing, April 1995.


Improved Resource Utilization with Buffered Coscheduling - Petrini, Feng   (Correct)

....in bytes, that can be sent or received by a single processor is hmax = g : Furthermore, by globally scheduling a communication pattern, as described in Section 3.2, we can derive an accurate estimate of the communication time with simple analytical models already developed for the BSP model [24, 5, 23]. Another important benefit of the BSP model is higher resource utilization over the parallel machine, irrespective of the computational and communication patterns. For example, a sparse communication pattern (where a single processor receives hmax bytes) or a more dense communication pattern ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the 9th International Parallel Processing Symposium, IPPS'95, Santa Barbara, CA, April 1995.


Mechanisms for Efficient Shared-Memory, Lock-Based Synchronization - Kagi (1999)   (2 citations)  (Correct)

....latency provides sufficient lookahead for efficient parallel simulation, as host nodes stop and synchronize only once every C cycles, where C is the constant network latency. Using a small C (or variable-length messages) reduces the node lookahead, which causes severe increases in simulation time [BW95]. Although I model contention at the target node interfaces, memory, and memory directories, using a constant network latency ignores contention in the network itself. To account for network contention, I use an analytical model ([SGV92], which takes the network load as a parameter) to derive ....

.... latencies for the benchmarks ranged from 85 to 112 processor cycles for current technology parameters, and from 165 to 188 processor cycles for future technology parameters). To validate this process, I use a detailed, event-driven network simulator (based on the original WWT network simulator [BW95]) that accurately simulates message buffering, message retransmission, and flow control [BG95]. The implementation serializes the network simulation at a central host node, making simulation performance suffer by roughly a factor of 15. The target network used for the validation is an 84 mesh of ....

Doug Burger and David A. Wood. Accuracy vs. performance in parallel simulation of interconnection networks. In Proceedings of the Ninth International Parallel Processing Symposium, pages 22--31, April 1995.


Latency and Bandwidth Requirements of Massively Parallel.. - Petrini, Vanneschi (1999)   (Correct)

....but also some research results [CLR94] are based on the assumption that latency can be modeled with a constant value. In the presence of bandwidth bound patterns this parameter is not easy to predict. Also, it can be some orders of magnitude bigger than the base latency without conflicts [BW95]. If we switch from the latency of the single packet to the global behavior of the network, we can note that the accepted bandwidth remains stable during the AAPB. After an initial period of 10000 cycles of deterministic routing, the processing nodes receive a constant amount of data per unit of ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In International Symposium on Parallel Processing, April 1995.


Time-Sharing Parallel Jobs in the Presence of Multiple.. - Petrini, Feng (2000)   (3 citations)  (Correct)

.... bytes, that can be sent or received by a single processor is hmax = T g : Furthermore, by globally scheduling a communication pattern, as described in Section 2, we can derive an accurate estimate of the communication time with simple analytical models already developed for the BSP model [21] [2] [20]. Unfortunately, BSP computations are overly restrictive, and many important applications cannot be efficiently expressed using this model. With BCS, we can inherit the nice mathematical framework of BSP, without forcing the user to write BSP programs. 8 Conclusion and Future Work In this ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the 9th International Parallel Processing Symposium, IPPS'95, Santa Barbara, CA, April 1995.


Simulating Architecture Adaptive Algorithms with MISS-PVM - Kvasnicka, Ueberhuber (1997)   (Correct)

....Their tool Paps is used before any actual coding on real hardware. Simulation Environments Some simulators are embedded in whole parallel programming environments like Dimemas, which is described in Labarta et al. [14]. Another simulation project is the Wisconsin Wind Tunnel (Burger, Wood [6], Reinhardt et al. [18]). This simulator predicts the performance of shared memory machines using several different network simulation models. A two-fold technique is used: the program is executed (cache hits remain unchanged) and cache misses are simulated by trapping to the WWT. The authors ....

D. C. Burger, D. A. Wood, Accuracy vs. Performance in Parallel Simulation of Interconnection Networks, Proceedings 9th International Parallel Processing Symposium, IEEE Press, Los Alamitos, 1995, pp. 22--31.


How Much Does Network Contention Affect Distributed Shared.. - Dai, Panda (1997)   (Correct)

....latency. Representative examples include various memory consistency models [11], data prefetching [25], data forwarding updating [21], remote get put operations [27], integrated or decoupled protocol controllers [8, 10, 12, 31], estimating accuracy vs. performance in simulating DSM systems [2], software DSM systems [23, 1, 19], and explicit communication primitives [29]. Research towards reducing network latency has been largely left to the (interconnection) network community. However, most recently, several papers [18, 9, 31] have reported that network latency is becoming a key ....

D. C. Burger and D. A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the International Symposium on Parallel Processing, April 1995.


Prototype-Oriented Development of High-performance Systems - Justo, al. (1997)   (Correct)

....give general and accurate results [10, 17]. The approach in the case of simulation is to apply a form of symbolic execution of an abstract view of the design. The main advantage of simulation is that, in general, the accuracy of the results is better than those obtained by analytical modelling [8, 2]. For this reason, simulation has been widely used as a technique for studying the performance of HPS. Measurements can also be obtained after the program has been implemented. In this case a form of monitoring is used [4, 24]. These results are then used to modify the design or in a less costly way ....

D. C. Burger and D. A. Wood. Accuracy vs. performance in parallel simulation of interconnection networks. In International Symposium on Parallel Processing, pages 1--12. IEEE Press, April 1995.


Optimistic Simulation of Parallel Architectures Using.. - Sashikanth.. (1996)   (2 citations)  (Correct)

....and 100 cycles for optimistic algorithms. Depending on the desired accuracy of simulation of the target network contention and topology, the quantum length must be less than or equal to the message latency. The fixed network latency assumption results in an error of over 20 in several cases [2]. Reducing the quantum length results in a further increase in the synchronization cost. Since optimistic techniques improve the lookahead and reduce the frequency of synchronization, they may perform better if the network simulation messages are not rolled back frequently. Greater Host Message ....

Doug Burger and David A. Wood, Accuracy vs. Performance in Parallel Simulation of Interconnection Networks, International Symposium on Parallel Processing, April 1995.


Simulation of the SCI Transport Layer on the Wisconsin Wind.. - Douglas Burger And (1995)   (2 citations)  Self-citation (Burger)   (Correct)

....in between quanta of target execution. This centralization is a severe limitation, which inflates simulation time substantially. The simulator is nevertheless critical for validating whatever less expensive network models are used (a detailed discussion of relevant trade-offs appears elsewhere [1]). It also allows design parameters of SCI based networks to be evaluated. Finally, the network simulator enables the measurement effects produced by architectural and protocol optimizations that change the network load or contention distributions. The rest of this report is structured as follows: ....

....quantum latency is too large, the network simulator may not send the header unit to the intended recipient before the target receipt event was to occur, causing an error. Smaller quanta can greatly inflate simulation time, as the number of inter quanta synchronizations per target cycle increases [1]. 4.3 Resource latencies The physical times that symbols take to move across network resources are listed in Table 1. is a base factor that accounts for the speed of the hardware. The relations shown in Table 1 were arrived at by estimating the amount of hardware needed to perform each function, ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.


Efficient Synchronization: Let Them Eat QOLB - Kägi, Burger, Goodman (1997)   Self-citation (Burger)   (Correct)

....constant latency provides sufficient lookahead for efficient parallel simulation, as nodes stop and synchronize only once every C cycles, where C is the constant network latency. Using a small C (or variable-length messages) reduces the node lookahead, which causes severe increases in simulation time [6]. Although we model contention at the node interfaces, memory, and memory directories, using a constant network latency ignores contention in the network itself. To account for network contention, we used an analytical model ([41], which takes the network load as a parameter) to derive a different ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the Ninth International Parallel Processing Symposium, pages 22--31, April 1995.


A Programming Tutorial for the Wisconsin Wind Tunnel - Or How   Self-citation (Burger)   (Correct)

....The table does not list the switches for the network simulators described in the next section. 10 Network Simulation This section describes the range of options for simulating interconnection networks with the Wisconsin Wind Tunnel. Burger and Wood describe the network simulator in more detail in [3]. Our simulated networks assume unidirectional virtual channels with one virtual channel per physical channel. The network is wormhole routed and the routers are loosely based on Dally's Torus router. Routing is statically determined; adaptive routing schemes are currently under development. 10.1 ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In To Appear in the Proceedings of the 9th International Parallel Processing Symposium, April 1995.


An Analysis of the Interactions of Overhead-Reducing .. - Kagi, Aboulenein.. (1995)   Self-citation (Burger)   (Correct)

....take a constant number of cycles for traversal. This assumption of constant latency provides sufficient lookahead at each node to allow efficient parallel simulation. Reducing the minimum end to end network latency reduces the node lookahead, which causes severe increases in simulation time [6]. The constant latency assumption ignores network contention, which can play a pivotal role in evaluating various optimizations. Optimizations that reduce target execution time without a corresponding reduction in communication raise the effective load on the network. Other optimizations that ....

....iterated this process until the difference between the network latency constant and the value produced by the model for that run converged to within one cycle per message. To validate this process, we used a detailed, event-driven SCI network simulator (based on the original WWT network simulator [6]) that accurately simulates message buffering, message retransmission, and flow control [5]. The implementation serializes the network simulation at a central node, making simulation performance suffer by roughly a factor of 15. The target network that we used to derive the validation was an mesh ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proc. of the 9th International Parallel Processing Symposium, April 1995.


Parallel Computer Research in the Wisconsin Wind Tunnel Project - Hill, Larus, Wood (1996)   (2 citations)  Self-citation (Wood)   (Correct)

....most realistic and concrete academic machine remains but a wind tunnel model of a commercial product. The Wisconsin Wind Tunnel project employed a mixture of complementary methods: Micro-architectural level simulation (e.g. of Typhoon, Typhoon 1, and Typhoon 2) using the Wisconsin Wind Tunnel [37,36,15,2,8] and other simulators. New tools for performance measurement and modelling [30, 29, 28]. User and system software prototyping and development on existing commercial platforms (e.g. Blizzard CM-5 and Blizzard COW). Surgical hardware prototyping, as exemplified by the MBus card we ....

Douglas C. Burger and David A. Wood. Accuracy vs. Performance in Parallel Simulation of Interconnection Networks. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.


Reducing Synchronization Overhead in Parallel Simulation - Legedza (1995)   (15 citations)  (Correct)

No context found.

Douglas C. Burger and David A. Wood. Accuracy vs. performance in parallel simulation of interconnection networks. In Proceedings of the 9th International Parallel Processing Symposium, April 1995.
