U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....[BBB 91] It is used in particle-in-cell method applications to implement a bucket sort. In addition to an array of locks, used to allow portions of data to be locked at a time, it uses four barriers to synchronize its computation phases. The details of the parallel algorithm appear in [RSRM93]. TSP: The Traveling Sales Person application is implemented using the LMSK algorithm [JLK63], a branch-and-bound algorithm that proceeds by dynamic construction of a search tree, at the root of which is a description of the initial problem. Independent subproblems are generated by selection of ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the twenty second International Conference on Parallel Processing, pages I--237--240, August 1993.
....the algorithm implements a bucket sort, which reads each key and then increments the count of the bucket to which the key belongs. A prefix sum operation is then performed on the bucket counts. Last, the keys are read a second time and assigned ranks using the prefix sums. The parallel algorithm [43] consists of seven phases. The first two phases involve the initialization of processor-specific data; they include each processor reading its portion of the keys, updating its bucket count, and reading appropriate portions of the local bucket count from all processors. The next three phases ....
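The sequential ranking step described in this excerpt can be sketched as follows. This is a minimal illustration, not the implementation from the cited paper: the function name `rank_keys`, the assumption that each key is an integer in the range [0, num_buckets), and the choice of one bucket per key value are all hypothetical.

```python
def rank_keys(keys, num_buckets):
    """Return the rank of every key using a bucket (counting) sort.

    Assumption for this sketch: each key is an int in [0, num_buckets),
    so a key's value doubles as its bucket index.
    """
    # Pass 1: read each key and increment the count of its bucket.
    counts = [0] * num_buckets
    for k in keys:
        counts[k] += 1

    # Exclusive prefix sum: starts[b] is the number of keys in buckets < b.
    starts = [0] * num_buckets
    running = 0
    for b in range(num_buckets):
        starts[b] = running
        running += counts[b]

    # Pass 2: read the keys again and assign ranks from the prefix sums.
    next_slot = starts[:]
    ranks = []
    for k in keys:
        ranks.append(next_slot[k])
        next_slot[k] += 1
    return ranks
```

For example, `rank_keys([3, 1, 3, 0], 4)` yields `[2, 1, 3, 0]`: equal keys receive consecutive ranks in the order they were read.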
....[Table caption fragment: (number of lock operations between two successive adaptation operations) for Applications (15-processor runs on the KSR2)] ....the global prefix sum on the bucket counts. The last two phases involve each processor reading the keys and assigning ranks. The details of the parallel algorithm appear in [43]. In order to avoid excessive synchronization, the bucket count data structure is replicated on each processor. Maintaining such local counts requires accumulating them once they have been computed, in order to compute the prefix sums. Computation of the ranks also requires synchronization, since ....
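The idea of replicating the bucket counts per processor and then accumulating them can be sketched as below. This is a sequential simulation under stated assumptions, not the seven-phase algorithm of [43]: the name `parallel_rank`, the contiguous partitioning of the keys, and the bucket-major accumulation order are illustrative choices, and real code would separate the stages with barriers.

```python
def parallel_rank(keys, num_buckets, num_procs):
    """Simulate ranking with one private bucket-count array per processor.

    Assumptions for this sketch: keys are ints in [0, num_buckets) and the
    key list is split into contiguous chunks, one per (simulated) processor.
    """
    n = len(keys)
    chunk = (n + num_procs - 1) // num_procs  # ceiling division
    parts = [keys[p * chunk:(p + 1) * chunk] for p in range(num_procs)]

    # Each processor counts only its own keys, so no locking is needed here.
    local = [[0] * num_buckets for _ in range(num_procs)]
    for p, part in enumerate(parts):
        for k in part:
            local[p][k] += 1

    # Accumulate the local counts into a global prefix sum.
    # offsets[p][b] = keys in buckets < b, plus keys in bucket b held by
    # processors with a lower index than p.
    offsets = [[0] * num_buckets for _ in range(num_procs)]
    running = 0
    for b in range(num_buckets):
        for p in range(num_procs):
            offsets[p][b] = running
            running += local[p][b]

    # Each processor re-reads its keys and assigns ranks from its offsets.
    ranks = []
    for p, part in enumerate(parts):
        next_slot = offsets[p][:]
        for k in part:
            ranks.append(next_slot[k])
            next_slot[k] += 1
    return ranks
```

Because every processor writes only to its own count array, the only shared step in this sketch is the accumulation loop, which is where the synchronization mentioned in the excerpt would be needed.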
Ramachandran, U., Shah, G., Ravikumar, S., and Muthukumarasamy, J. Scalability study of the KSR-1. In Proceedings of the twenty second International Conference on Parallel Processing (August 1993), pp. I--237--240.
....referenced by any processor, whereas in a NUMA architecture, memory is physically partitioned into a distributed shared memory structure in which memory local to a processor provides a lower access latency than does memory that is remote to a processor. A Cache Only Memory Architecture (COMA) [21, 22, 23, 24, 25, 26] provides an extension to the NUMA architecture. In a COMA architecture, the hard binding between a memory address and the physical location of the contents at that address has been removed. Hence, all main memory is managed as a cache with directories to provide address mapping. A Cluster [15] is ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability Study of the KSR-1. GIT-CC 93/03.
....summarizes the characteristics of EP. IS: IS is the Integer Sort kernel that uses bucket sort to rank a list of integers, which is an important operation in particle method codes. A list of 64K integers with 2K buckets is chosen for this study. An implementation of the algorithm is described in [22] and Table 2 summarizes its characteristics. The input list is equally partitioned among the processors. Each processor maintains two sets of buckets. One set of buckets (of size 2K) is used to maintain the information for the portion of the list local to it. The Phase Description Comp. Gran. ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....for invalidation protocols. 6.2 IS: Integer Sort is a kernel that occurs in Numerical Aerodynamic Simulation applications and is part of the NAS benchmark suite [3]. The kernel uses a parallel bucket sort to rank a list of integers. The parallel implementation that we study has been described in [28]. The problem size is 32K and the bucket size is 2K. It should be noted that this implementation has very good scalability on KSR-2 for large problem sizes (1M). For our simulation we used a smaller problem size (which amplifies the overheads relative to the overall execution time) to keep the ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....and/or the hardware would also incur the cost of redesigning the input models. 4 Experimentation 4.1 Overview: The experimentation technique for evaluating parallel systems uses real or synthetic workloads and measures their performance on actual hardware. For instance, several studies [22, 11, 47, 49] experiment with the KSR-1 hardware for evaluating its computation, communication and scalability properties. The scalability of the KSR-1 is studied in [47] using applications drawn from the NAS benchmark suite [9]. Similarly, an experimental evaluation of the computation and communication ....
....systems uses real or synthetic workloads and measures their performance on actual hardware. For instance, several studies [22, 11, 47, 49] experiment with the KSR-1 hardware for evaluating its computation, communication and scalability properties. The scalability of the KSR-1 is studied in [47] using applications drawn from the NAS benchmark suite [9]. Similarly, an experimental evaluation of the computation and communication capabilities of the CM-5 is conducted in [46]. Lenoski et al. [36] evaluate the scalability of the Stanford DASH multiprocessor prototype using a set of ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....one would expect that the network would be able to handle typical loads generated by applications. For example, experimental results on a state-of-the-art machine such as the KSR-2 have shown that the latencies for remote accesses do not vary significantly for a wide variety of network loads [RSRM93]. In [SSRV94] it was reported that the contention overheads observed in several applications were quite small. Recent studies [CKP 93] have also shown that parameterized models of the network may be adequate from the point of view of developing performance-conscious parallel programs. In any ....
....that the speedup for IS is not as good beyond 4 processors purely because we chose a fairly small data size (64K elements, and 2K keys) to keep the run times manageable. The algorithm we used for IS has been shown to scale quite well for larger problem sizes on parallel machines such as the KSR-1 [RSRM93]. Figures 10 and 12 show the breakdown of the simulation overheads for Matmul and IS respectively. Each bar in these Figures gives the following breakdown: [Figure 9: 64x64 Matrix Multiply: Speedup Comparison. Axes: Speedup vs. Number of Processors; series: Base, Simulator.] ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In International Conference on Parallel Processing, pages I--237--240, August 1993.
....to the application, the performance characteristics of the systems are significantly different. More importantly, SVM-based systems, especially those that use hardware support, tend to be more scalable than SMP systems because they have no/fewer centralized resources that limit scalability [HLH92, RSRM93]. However, in order to be scalable on SVM systems, applications must be cognizant that the cost of memory access is not uniform across the entire global address space. (Footnote: This work was supported in part by an NSF PYI Award MIP-9058430, and matching grants from DEC, SYSTRAN, and IBM.) In recent ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability Study of the KSR-1. In Proceedings of the International Conference on Parallel Processing, 1993.
....a suite of parallel applications drawn from several different domains including scientific, image understanding, and combinatorial optimization, on a variety of parallel architectures including KSR-2, MasPar MP-2, Intel iPSC/2, and PVM-based workstation clusters, to enable this scalability study [65, 66]. This framework will be useful in this evaluation process for the system abstractions for state sharing. We have also investigated protocols that can be used to maintain the consistency of replicated data such as files. Our work in replicated data management has ranged from development of new ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....messages sent by a processor 6.2 IS: Integer Sort is a kernel that occurs in Numerical Aerodynamic Simulation applications and is part of the NAS benchmark suite [4]. The kernel uses a parallel bucket sort to rank a list of integers. The parallel implementation that we study has been described in [31]. The problem size is 32K and the bucket size is 1K. It should be noted that this implementation has very good scalability on KSR-2 for large problem sizes (1M). For our simulation we used a smaller problem size (which amplifies the overheads relative to the overall execution time) to keep the ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....are interconnected through a ring. The architecture is scaled, not by adding more nodes to a single ring, but by hierarchically coupling such rings, i.e., rings of nodes are connected through a higher level ring (see Figure 1). The scalability of such an architecture has been studied in [23]. Before we discuss how to build a scalable database system given the proposed hardware architecture, we have to recognize that the SN mode of functioning is acceptable as long as all the nodes are busy processing assigned tasks that access data local to the nodes. However, the scheduling ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability Study of the KSR-1. In Proceedings of the International Conference on Parallel Processing, 1993.
....N/A [Table 2: Characteristics of IS] IS is the Integer Sort application that uses bucket sort to rank a list of integers, which is an important operation in particle method codes. A list of 64K integers with 2K buckets is chosen for this study. An implementation of the algorithm is described in [20] and Table 2 summarizes its characteristics. The input list is equally partitioned among the processors. Each processor maintains two sets of buckets. One set of buckets (of size 2K) is used to maintain the information for the portion of the list local to it. The other set (of size chunk = 2K p ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....problem is scaled (the number of buckets is scaled linearly with the size of the input list to be sorted). On the other hand, if the number of buckets is maintained constant, it may be possible to sustain bandwidth requirements by increasing the problem size linearly with the processing speed. In [20], the authors show that the applications EP, IS, and CG scale well on a 32-node KSR-1. In our study, we use the same implementations of these applications to synthesize the network requirements. Although our results suggest that these applications may incur overheads affecting their scalability, ....
....and CG scale well on a 32-node KSR-1. In our study, we use the same implementations of these applications to synthesize the network requirements. Although our results suggest that these applications may incur overheads affecting their scalability, this does not contradict the results presented in [20] since the implications of our study are for larger systems built with much faster processors. All of the above link bandwidth results have been presented for the binary hypercube network topology. The cube represents a highly scalable network where the bisection bandwidth grows linearly with the ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In Proceedings of the 1993 International Conference on Parallel Processing, pages I--237--240, August 1993.
....one would expect that the network would be able to handle typical loads generated by applications. For example, experimental results on a state-of-the-art machine such as the KSR-2 have shown that the latencies for remote accesses do not vary significantly for a wide variety of network loads [RSRM93]. In [SSRV94] it was reported that the contention overheads observed in several applications were quite small. Recent studies [CKP 93] have also shown that parameterized models of the network may be adequate from the point of view of developing performance-conscious parallel programs. In any ....
U. Ramachandran, G. Shah, S. Ravikumar, and J. Muthukumarasamy. Scalability study of the KSR-1. In International Conference on Parallel Processing, pages I--237--240, August 1993.