M. Lin, R. Tsang, D. H. C. Du, A. E. Klietz, and S. Saroff. Performance Evaluation of the CM-5 Interconnection Network. Technical Report AHPCRC Preprint 92-111, University of Minnesota AHPCRC, October 1992.
....tree-leaf graph of height 3. Processors are drawn in black and routers in grey. .... to represent multi-stage machines with constant bandwidth, such as the CM5 [33], for which experiments have shown that bandwidth is constant between every pair of processors and hardly depends on network congestion [35], or the SP2 with a power-of-two number of nodes. The two additional parameters cluster and weight serve to model heterogeneous architectures for which multiprocessor nodes having several highly interconnected processors (typically by means of shared memory) are linked by means of networks of ....
M. Lin, R. Tsang, D. H. C. Du, A. E. Klietz, and S. Saroff. Performance evaluation of the CM-5 interconnection network. In Proceedings of CompCon Spring'93, 1993.
....by several parallel computers as the Connection Machine CM-5 [10], the Data Diffusion Machine [15], and the Meiko CS-2 [14]. Unfortunately, not much is known on the communication performance of the fat trees. Most of the literature deals with the CM-5 and focuses on raw network performance [7], [12], [13]. Typical communication patterns include simple sends and ping-pong between pairs of nodes. Block permutations of data and grid shifts have been shown to have little or no contention on the CM-5. This makes the data network very efficient for regular communication patterns commonly used in ....
M. Lin, R. Tsang, D. H. C. Du, A. E. Klietz, and S. Saroff. Performance Evaluation of the CM-5 Interconnection Network. Technical Report AHPCRC Preprint 92-111, University of Minnesota AHPCRC, October 1992.
....processors to route the message in the ascending and descending phases. Other references to fat trees include [70], [88]. Unfortunately, not much is known on the communication performance of the fat trees. Most of the literature deals with the CM-5 and focuses on raw network performance [94], [103], [107]. Typical communication patterns include simple sends and ping-pong between pairs of nodes. Block permutations of data and grid shifts have been shown to have little or no contention on the CM-5. This makes the data network very efficient for regular communication patterns commonly used in ....
Mengjou Lin, Rose Tsang, David H. C. Du, Alan E. Klietz, and Stephen Saroff. Performance Evaluation of the CM-5 Interconnection Network. Technical Report AHPCRC Preprint 92-111, University of Minnesota AHPCRC, October 1992.
....libraries, and parallel file systems that exploit disk arrays. However, not much is known about the achievable performance, the software overhead involved, and the interactions among the new features. Previous performance measurements by Bozkus et al. [7], Ponnusamy et al. [26, 27], and Lin et al. [22] exposed the performance limitations of an early version of the CM-5 communication library. Since then, the new communication library has been re-implemented using active messages [34] as the base layer. Thus, the first goal of this thesis is to re-evaluate the CM-5's computation and communication ....
....that the peak bandwidth per node is 20 megabytes/second. However, Table 5.3 shows that, for simple send, the observed data rate is only 8.3 megabytes/second. The effective data rate is limited not by the link bandwidth, but by the speed the SPARC processor can inject data into the data network [22]. The cost of receiving sixteen bytes of data (the payload of a data network packet) is about 60 cycles [9]. Because the SPARC processor has a 32 megahertz clock, the receiving bandwidth is at most 16 × 32 / 60 = 8.5 megabytes/second. The second test, send reply, is identical to simple send ....
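The receive-bandwidth bound quoted in this context can be checked with a short calculation. The following is a minimal sketch, not code from the cited thesis: the function name and parameters are hypothetical, the figures (16-byte payload, about 60 cycles per packet, 32 MHz clock) are those quoted in the excerpt, and 1 megabyte is assumed to mean 10^6 bytes.

```python
def receive_bandwidth_mb_per_s(payload_bytes: int,
                               clock_mhz: float,
                               cycles_per_packet: float) -> float:
    """Upper bound on per-node receive bandwidth in MB/s (1 MB = 10**6 bytes).

    Hypothetical helper: assumes the processor spends `cycles_per_packet`
    cycles on every packet and does nothing else.
    """
    packets_per_second = clock_mhz * 1e6 / cycles_per_packet
    return payload_bytes * packets_per_second / 1e6


# Figures quoted in the excerpt: 16-byte payload, ~60 cycles, 32 MHz clock.
bound = receive_bandwidth_mb_per_s(payload_bytes=16, clock_mhz=32, cycles_per_packet=60)
observed = 8.3  # MB/s, the simple-send rate quoted from Table 5.3
print(f"bound = {bound:.2f} MB/s, observed = {observed} MB/s")
```

Under these assumptions the bound sits slightly above the observed simple-send rate and well below the 20 megabytes/second peak, which is consistent with the excerpt's point that processor overhead, not link bandwidth, is the limit.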
Lin, M. J., Tsang, R., Du, D. H. C., Klietz, A. E., and Saroff, S. Performance Evaluation of the CM-5 Interconnection Network. In Proceedings of Spring COMPCON 93 (Feb 1993), pp. 189--198.
....read and write data from and to the NI's buffers [15]. For this fat tree test, we used the CM-5 message passing library, namely CMMD 3.1. It is designed to manage interprocessor communication and provides simple routines for message passing as well as global synchronization and global operations [8]. See Appendix A for the CMMD v. 3.1 code used in this experiment. • Rating: 20 MB/sec for raw communications within groups of four nodes, 10 MB/sec for within groups of 8 nodes with the same grandparent, and 5 MB/sec elsewhere [7] due to message packet overheads, the data communication rates ....
Mengjou Lin, Rose Tsang, David H. C. Du, Alan E. Klietz, and Stephen Saroff. Performance Evaluation of the CM-5 Interconnection Network. Technical Report AHPCRC Preprint 92-111, University of Minnesota AHPCRC, October 1992.
....are taken into account in existing models of communication delay, but these models characterize only point-to-point communication in a contention-free network. A general methodology to measure performance of communications under contention does not exist. Benchmarks dedicated to specific machines [7, 13, 4] generally fill in this gap. Results of these benchmarks are difficult to use in order to compare different machines. In this paper a general methodology is proposed to evaluate performance under light to average contention of point-to-point communications. Very high contention should not appear ....
....very useful to evaluate parameters described in paragraph 2.1, but it does not evaluate communications in the case of contention or of global communications. For these two cases, specific benchmarks have been developed for specific machines (for instance, evaluation of communications on the CM5 [13, 4, 16]). Boyd et al. developed a benchmark in which the user can control some communication parameters like the average number of point-to-point data communications per processor, the degree of sharing (the number of variables read but not owned by a processor), the computation-to-communication ratio ....
M. Lin, R. Tsang, D. H. C. Du, A. E. Klietz, and S. Saroff. Performance Evaluation of the CM-5 Interconnection Network. Technical Report AHPCRC Preprint 92-111, University of Minnesota AHPCRC, October 1992.
....elimination code and give the corresponding real and estimated execution times in order to show the accuracy of the estimated performance figures. Related Work There are numerous articles in the literature about benchmarking different aspects of recent parallel architectures or supercomputers [3, 4, 11, 12, 13, 14, 16]. There are also several benchmark suites specially developed to provide a common ground to test the performance of different high-performance computers [1, 2, 10, 15]. Some of them investigate the use of real application programs, while others employ short kernel codes to evaluate the performance, ....
....1/t_send) for an aligned buffer is around 8.5 MB/sec. This bandwidth is significantly lower than the theoretical peak bandwidth of 20 MB/sec. In the current CMMD implementation, a node's ability to inject data into the network is much less than the network's capacity to accept the data [14]. Assembler codes can achieve close to 18 MB/sec moving data from one node's registers to another's [18]. However, C codes with calls to the CMMD library tend to run slower, partly because the C compiler's output is never as efficient as a hand-crafted assembler code. 5.2 Effect of Distance on ....
M. Lin, R. Tsang, D.H.C. Du, A. E. Klietz, and S. Saroff. Performance Evaluation of the CM-5 Interconnection Network. In Proc. of Spring COMPCON 93 (1993).
....the assignment of processes to processors can still affect performance by influencing the communication overhead. On recent distributed-memory machines, such as the Intel Delta and CM-5, the time to send a single message between two processors is largely independent of their physical location [27, 42, 43], and hence the assignment of processes to processors does not have much direct effect on performance. However, when a collective communication task, such as a broadcast, is being done, contention for physical resources can degrade performance. Thus, the way in which processes are assigned to ....
M. Lin, D. Du, A. E. Klietz, and S. Saroff. Performance evaluation of the CM-5 interconnection network. Technical report, Department of Computer Science, University of Minnesota, 1992.
....between them are computed by considering the whole tree. This graph is used to represent multi-stage machines with constant bandwidth, such as the CM-5 [17], for which experiments have shown that bandwidth is constant between every pair of processors and hardly depends on network congestion [18], or the SP2 with a power-of-two number of nodes. Figure 7: The tree-leaf graph of height 3. Processors are drawn in black and routers in grey. 4.4 Mapping files Mapping files, which usually end in .map, contain the result of the mapping of source graphs onto target architectures. They ....
M. Lin, R. Tsang, D. H. C. Du, A. E. Klietz, and S. Saroff. Performance evaluation of the CM-5 interconnection network. In Proceedings of CompCon Spring'93, 1993.
....the assignment of processes to processors can still affect performance by influencing the communication overhead. On recent distributed-memory machines, such as the Intel Delta and CM-5, the time to send a single message between two processors is largely independent of their physical location [29, 48, 49], and hence the assignment of processes to processors does not have much direct effect on performance. However, when a collective communication task, such as a broadcast, is being done, contention for physical resources can degrade performance. Thus, the way in which processes are assigned to ....
M. Lin, D. Du, A. E. Klietz, and S. Saroff. Performance evaluation of the CM-5 interconnection network. Technical report, Department of Computer Science, University of Minnesota, 1992.
....assignment of processes to processors can still affect performance by influencing the communication overhead. On recent distributed-memory machines, such as the Intel Delta and CM-5, the time to send a single message between two processors is largely independent of their physical location [28, 43, 44], and hence the assignment of processes to processors does not have much direct effect on performance. However, when a collective communication task, such as a broadcast, is being done, contention for physical resources can degrade performance. Thus, the way in which processes are assigned to ....
M. Lin, D. Du, A. E. Klietz, and S. Saroff. Performance evaluation of the CM-5 interconnection network. Technical report, Department of Computer Science, University of Minnesota, 1992.
....Projects Agency under Contract DABT63 91C 0004, and by a research agreement with the Intel Supercomputer Systems Division. .... these features and the achievable processor and communication performance. Preliminary performance measurements by Bozkus et al. [1], Ponnusamy et al. [8, 9], and Lin et al. [7] exposed the limitations of an early version of the CM5 communication library. Since then, the library has been re-implemented using active messages as the base layer. The goal of our work is to re-evaluate the CM5's computation and communication performance and the interaction of the two. 1.1 ....
....data rate is limited not by the link bandwidth, but by the speed the SPARC processor can inject data into the network [7]. The same is true for the Paragon XP/S; software overheads dominate. In contrast to the somewhat modest increases in communication performance, processor performance has improved dramatically. Thus, the CM-5 might well be imbalanced even if it could fully exploit the 20 Mbytes/second bandwidth of .... [Footnote text from the citing page: ....subject to change. The performance data reported here were captured on a 32-node Paragon XP/S system with R11 node boards running OSF/1 R1.0C with compiler icc Release 4.1.2.]
Lin, M. J., Tsang, R., Du, D. H. C., Klietz, A. E., and Saroff, S. Performance Evaluation of the CM-5 Interconnection Network. In Proceedings of Spring COMPCON 93 (Feb 1993).
....between them are computed by considering the whole tree. This graph is used to represent multi-stage machines with constant bandwidth, such as the CM-5 [20], for which experiments have shown that bandwidth is constant between every pair of processors and hardly depends on network congestion [21], or the SP2 with a power-of-two number of nodes. Figure 7: The tree-leaf graph of height 3. Processors are drawn in black and routers in grey. 4.4 Mapping files Mapping files, which usually end in .map, contain the result of the mapping of source graphs onto target architectures. They ....
M. Lin, R. Tsang, D. H. C. Du, A. E. Klietz, and S. Saroff. Performance evaluation of the CM-5 interconnection network. In Proceedings of CompCon Spring'93, 1993.