S. Chittor and R. J. Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. In Proceedings of the 1990.
....represent spheres of spatial locality, but these still do not capture the communication structure of specific parallel algorithms or applications. In particular, many scientific programs generate permutation patterns such as matrix transpose (dimension reversal), bit complement, and bit reversal [8 12]. In this paper, we focus on characterizing network performance over a range of destination distributions. 2.2 Routing The routing algorithm determines which path a packet takes to reach its destination. Each time a packet enters a node, the routing algorithm generates a list of candidate links ....
S. Chittor and R. Enbody, "Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers," in Supercomputing, pp. 647--656, November 1990.
....DMA hardware is also needed to transfer messages between these channels and system memory. An abstract n-port communication model has been assumed in [10] to develop efficient communication algorithms for regular communication patterns in n-cubes. Preliminary experimentation on the Symult 2010 [3] has led to the observation that the processor-to-router channel is a bottleneck for several communication patterns. Scientific applications typically generate non-uniform traffic [8]. For such traffic, there exists a consumption bottleneck due to the messages queuing at the hot destination nodes. ....
S. Chittor and R. Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. In Proc. of Supercomputing, pp. 647--656, Nov. 1990.
....because performing packetization and reassembly in software not only degrades processor performance, it also increases communication latency. For example, Chittor and Enbody's studies showed packetization overheads (24.6 sec) for each packet, much larger than the typical routing delays (14.8 sec) [3]. 3 Second, packetization increases the network load; each packet must contain routing and sequencing information in its header. For smaller packet sizes, the overhead increases proportionally. For example, the Thinking Machines CM5 [8] uses five-word packets, so one word of routing and ....
S. Chittor and R. Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. In Proceedings of Supercomputing, 1990.
....message. Third, typical message-passing libraries implement some form of packetization, breaking long messages down into some maximum physical transfer size (256 or 512 bytes, typically). For instance, in the Symult 2010, messages longer than 256 bytes are sent as a series of 256-byte packets [10]. From the network's point of view, packetization eliminates the extremely long messages and dramatically increases the number of messages at the maximum physical transfer size. Messages below the maximum transfer size are transmitted without resizing. Fourth, a variety of novel programming models ....
....most packetization systems use fairly long packets. More significantly, packetization introduces additional overhead for message-to-packet conversion and reassembly and resequencing at the destination. If these tasks are performed in software, they can dramatically increase message-passing latency [10]. The obvious alternative is to implement them in hardware, but to date no commercial multicomputers have done so. Perhaps this is because the cost of packetization hardware can be significant. In addition, the performance benefits of packetization are unclear. Before hardware designs include such ....
S. Chittor and R. Enbody. Performance Evaluation of Mesh-connected Wormhole-routed Networks for Interprocessor Communication in Multicomputers. In Proceedings of Supercomputing, 1990.
....message. Third, typical message-passing libraries implement some form of packetization, breaking long messages down into some maximum physical transfer size (256 or 512 bytes, typically). For instance, in the Symult 2010, messages longer than 256 bytes are sent as a series of 256-byte packets [13]. From the network's point of view, packetization eliminates the extremely long messages and dramatically increases the number of messages at the maximum physical transfer size. Messages below the maximum transfer size are transmitted without resizing. Fourth, a variety of novel programming models ....
....packetization systems use fairly large packets. More significantly, packetization introduces additional overhead for message-to-packet conversion and reassembly and resequencing at the destination. If these tasks are performed in software, they can dramatically increase message-passing latency [13]. The obvious alternative is to implement them in hardware, but to date no commercial multicomputers have done so. Perhaps this is because the cost of packetization hardware can be significant. In addition, the performance benefits of packetization are unclear. Before hardware designs include such ....
S. Chittor and R. Enbody. Performance Evaluation of Mesh-connected Wormhole-routed Networks for Interprocessor Communication in Multicomputers. In Proceedings of Supercomputing, 1990.
....interface, which can be characterized by number of consumption channels for message consumption. If the worm gets blocked or slowed down due to limited consumption channels, then the subsequent flits of the worm remain in the network for additional time leading to wastage of network bandwidth. In [10], it was first reported that wormhole systems can undergo severe performance degradation due to limited number of router-to-processor consumption channels. The results were based on experimentation on the deterministic wormhole-routed Symult 2010 system and suggested using multiple consumption ....
S. Chittor and R. Enbody. Performance Evaluation of Mesh-Connected Wormhole-Routed Networks for Interprocessor Communication in Multicomputers. In Proceedings of the Supercomputing '90, New York, pages 647--656, Nov 1990.
....represent spheres of spatial locality, but these still do not capture the communication structure of specific parallel algorithms or applications. In particular, many scientific programs generate permutation patterns such as matrix transpose (dimension reversal), bit complement, and bit reversal [13 16]. Other application constructs, such as synchronization or multicast operations, may induce hot spots of heavily utilized nodes and links [16 18]. Finally, dynamic models [1] can produce variation in target destinations during the course of application execution. ....
S. Chittor and R. Enbody, "Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers," in Supercomputing, pp. 647--656, November 1990.
....are possible with a lower annex setup cost. This study is a beginning; more work needs to be done to determine the right set of network interfaces which can provide robust messaging performance. 7 Related work While several studies have measured communication performance on parallel machines [7, 21, 15], they have not explored the relationship between network interface architecture and messaging performance. A great deal of attention has also been focused separately on specialized hardware support for messaging, the design of efficient messaging layers, and network management for improving ....
S. Chittor and R. Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. In Proceedings of Supercomputing, pages 647--56, 1990.
....Because both of these effects reduce communication latencies when random mappings are used without changing the situation when ideal mappings are used, the impact of exploiting physical locality on end performance will be lower when higher dimensional networks (n 2) are used. 5 Related Work In [5], Chittor and Enbody present data obtained from running experiments similar to those described in Section 3 on the Ametek 2010, a distributed-memory, mesh-connected multiprocessor somewhat similar to that used in this paper. For the sizes of machines measured (up to 144 nodes) they note that the ....
Suresh Chittor and Richard Enbody. Performance Evaluation of Mesh-Connected Wormhole-Routed Networks for Interprocessor Communication in Multicomputers. In Proceedings of Supercomputing '90, pages 647--656, November 1990.
....wormhole routing the network latency is almost independent of the length of a path. In order to reduce the number of channels used for a given multicast, the subpath between the source and one of the destinations in a multicast path is not necessarily a shortest path. The simulation study shown in [13] indicates that channel congestion becomes an important issue in affecting the network performance when the traffic is high. The simulation results presented in [10] also show that the path-like model provides much better performance than the tree-like model when there is contention in the network. ....
....buffer cannot be released until the last flit is transmitted. Furthermore, for simultaneous transmission of multiple multicast paths, the bandwidth between the local processor and router may become a bottleneck. This router-to-processor channel bottleneck has already been observed in Symult 2010 [13]. The performance of the proposed multicast routing algorithms is dependent on the efficient implementation of the routing function R. The complexity analysis shown in Section 4 is based on a software approach. It has been shown in [16] that with some special hardware design tricks, the ....
S. Chittor and R. Enbody, "Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers," in Proceedings of Supercomputing '90, pp. 647--656, Nov. 1990.
.... Third, typical message-passing libraries implement some form of packetization, breaking long messages down into some maximum physical transfer size (256, 512, or 1024 bytes, typically). For instance, in the Symult 2010, messages longer than 256 bytes are sent as a series of 256-byte packets [11]. Such packetization eliminates extremely long messages and dramatically increases the number of messages at the maximum physical transfer size. Messages below the maximum transfer size are transmitted without fragmenting. Fourth, a variety of novel programming models seek to exploit parallelism ....
....performing packetization and reassembly in software not only degrades processor performance, it also increases communication latency. For example, Chittor and Enbody's studies showed packetization overheads (24.6 sec) for each packet, much larger than the typical routing delays (14.8 sec) [11]. 2 Second, packetization increases the network load; each packet must contain routing and sequencing information in its header. For smaller packet sizes, the overhead increases proportionally. For example, the Thinking Machines CM 5 [21] uses five-word packets, so one word of routing and ....
S. Chittor and R. Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. In Proceedings of Supercomputing, pages 647--56, 1990.
....represent spheres of spatial locality, but these still do not capture the communication structure of specific parallel algorithms or applications. In particular, many scientific programs generate permutation patterns such as matrix transpose (dimension reversal), bit complement, and bit reversal [6 9]. In this paper, we focus on characterizing network performance over a range of destination distributions. 2.2 Routing The routing algorithm determines the path a packet takes in order to reach its destination. Each time a packet enters a node, the routing algorithm generates a list of candidate ....
S. Chittor and R. Enbody, "Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers," in Supercomputing, pp. 647--656, November 1990.
....communications. However, the network we consider here is different from those mentioned above. The network with wormhole routing features limited message buffering and pipeline transmission. We need to develop new techniques to support delay sensitive traffic. For wormhole routing, Chittor [7] reported their experimental results showing that contention for network channels can scale up with the system size and will become the primary concern in ensuring fast communication in future multicomputers. Different routing algorithms are studied [8, 9] in order to avoid deadlocks in wormhole ....
S. Chittor and R. Enbody, "Performance Evaluation of Mesh-Connected Wormhole-Routed Networks for Interprocessor Communication in Multicomputers," Supercomputing, 1990.
....number of consumption channels and b) available local memory bandwidth for message consumption. If the worm gets blocked or slowed down due to limited consumption capacity, then the subsequent flits of the worm remain in the network for additional time leading to wastage of network bandwidth. In [7], it was first reported that wormhole systems can undergo severe performance degradation due to limited number of router-to-processor consumption channels. The results were based on experimentation on the deterministic wormhole-routed Symult 2010 system and suggested using multiple consumption ....
S. Chittor and R. Enbody. Performance Evaluation of Mesh-Connected Wormhole-Routed Networks for Interprocessor Communication in Multicomputers. In Proceedings of the Supercomputing '90, New York, pages 647--656, Nov 1990.
....performance implications. In [11] the first issue is studied. Many of the papers on adaptive routing address the latter two issues in the generation of non-uniform traffic. Traffic patterns generated from real applications would be ideal in any evaluation, but to date, we know of only one study [12] which has examined this. Topology: The behavior of routing algorithms over a wide range of topologies and sizes is an area that has also been understudied. The influence that the connectivity and the size of networks have on traffic characteristics such as contention would be interesting. ....
S. Chittor and R. Enbody, "Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers," in Supercomputing, pp. 647--656, November 1990.
....described in Section 3. reduce communication latencies when random mappings are used without changing the situation when ideal mappings are used, the impact of exploiting physical locality on end performance is lower when higher dimensional networks (n 2) are used. 5 Related Work In [5], Chittor and Enbody present data obtained from running experiments similar to those described in Section 3 on the Ametek 2010, a distributed memory, mesh connected multiprocessor somewhat similar to that used in this paper. For the sizes of machines measured (up to 144 nodes) they note that the ....
Suresh Chittor and Richard Enbody. Performance Evaluation of Mesh-Connected Wormhole-Routed Networks for Interprocessor Communication in Multicomputers. In Proceedings of Supercomputing '90, pages 647--656, November 1990.
....the communication. Efficient communication is essential for good overall performance. In this paper we model communication on the recently introduced wormhole routed multicomputers. On these computers, contention rather than path length is the limiting factor in achieving efficient communication [3]. In [10] we introduced a metric for contention and demonstrated that it was accurate in predicting the performance of homogeneous tasks mapped onto wormhole routed computers. In this paper, we extend that work to consider the more realistic case of inhomogeneous tasks. A metric that predicts ....
....to path lengths. At first the mapping problem was thought to be insignificant, and random mapping was found to be good enough [4]. Later the increase in communication time due to contention was found to be significant on larger machines, and it was found that contention scaled with system size [3]. Contention can cause a network to become saturated, leading to an exponential increase in communication delay. Path contention level, introduced in [2, 10], characterizes the level of contention. It was shown to be useful for predicting the point at which the communication network saturates, ....
Suresh Chittor and Richard Enbody. "Performance Evaluation Of Mesh-connected Wormhole-routed Networks For Interprocessor Communication In Multicomputers". Proceedings of Supercomputing '90, pages 647--656, November 1990.
....communication. Efficient communication is essential for good overall performance. In this paper we model communication on the recently introduced wormhole routed mesh multicomputers. On these computers, contention rather than path length is the limiting factor in achieving efficient communication [3]. In [10] we introduced a metric for contention and demonstrated that it was accurate in predicting the performance of homogeneous tasks mapped onto wormhole routed mesh computers. In this paper, we extend that work to consider the more realistic case of inhomogeneous tasks. A metric that predicts ....
....In wormhole-routed mesh multicomputers communication time is insensitive to path lengths. The mapping problem was therefore thought to be insignificant, and random mapping was claimed to be good enough [4]. The increase in communication time due to contention is significant, however, and contention scales with system size [3]. With contention, the communication network cannot accept all the traffic injected. This inevitably leads to congestion (near lock-up) of the communication network, practically preventing the network from operating. Path contention level, introduced in [2, 10], characterizes the level ....
Suresh Chittor and Richard Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. Proceedings of Supercomputing '90, pages 647--656, November 1990.
....Intel DELTA 2 routine easier. In addition, an analysis of path contention combined with knowledge about message lengths can indicate that any mapping is acceptable for a particular problem. Contention for communication channels on wormhole-routed meshes has been studied by Dally [7] and Chittor [4, 5, 6]. Dally showed that contention could increase communication time dramatically. Chittor demonstrated contention to be visible on a first-generation, wormhole-routed mesh, the Symult 2010. He also developed a metric for predicting contention and through simulation predicted that contention could be ....
S. Chittor and R. Enbody, "Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers," in Proc. Supercomputing Conference, pp. 647--656, Nov. 1990.
No context found.
S. Chittor and R. J. Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. In Proceedings of the 1990.
No context found.
S. Chittor and R. Enbody, "Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers," in Supercomputing, pp. 647--656, November 1990.