Steven S. Lumetta, Arvind Krishnamurthy, and David E. Culler. Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines. In Proceedings of Supercomputing '95, 1995.
.... 1 Introduction and Motivations. Although a significant number of parallel machines have been built and many parallel graph algorithms have been written, only a few implementations of those algorithms have been carried out on existing parallel platforms [HRD, KLCY94, KGP94, LKC95, RM94, HRD95]. Being interested in graph problems, we would like not only to design parallel graph algorithms that process very large data as fast as possible, but also to have the implementations of these algorithms on parallel machines be as efficient as in theory, and this for a vast ....
S. S. Lumetta, A. Krishnamurthy, and D. E. Culler. Towards modeling the performance of a fast connected components algorithm on parallel machines. In SC'95, 1995.
....Coscheduling. One real parallel application and one to three simulated parallel jobs are scheduled in a round-robin fashion with a 100 ms time quantum on a 64-node CM-5. Slowdown is the ratio of the measured locally scheduled execution time to the ideal coscheduled time. components of a graph [107]. Figure 3.2 shows the slowdown of scheduling each application in a round-robin fashion while varying the number of simulated competing jobs between one and three. The reported slowdown is relative to the execution time of the applications when ideally coscheduled. Our results show that the ....
Steven S. Lumetta, Arvind Krishnamurthy, and David E. Culler. Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines. In Proceedings of Supercomputing '95, 1995.
....44]. The program em3d simulates the propagation of electromagnetic waves through objects in three dimensions on an unstructured mesh [35]. Another sorting algorithm [13, 44] is implemented in sample sort. Finally, connect uses a randomized algorithm to find the connected components of a graph [103]. Figure 3.2 shows the slowdown of scheduling each application in a round-robin fashion while varying the number of simulated competing jobs between one and three. The reported slowdown is relative to the execution time of the applications when ideally coscheduled. Our results show that the ....
Steven S. Lumetta, Arvind Krishnamurthy, and David E. Culler. Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines. In Proceedings of Supercomputing '95, 1995.
....Our aim was to understand the problems involved in implementing parallel pointer-based algorithms and their possible solutions, and to see how well these kinds of algorithms perform in practice. Implementations of parallel pointer-based algorithms have been conspicuously sparse. Lumetta et al. [22] implemented a hybrid parallel serial connected components algorithm for distributed memory machines. Their study showed that for certain classes of probabilistic meshes their algorithm performs reasonably well. But its performance was quite poor on other meshes and is likely to be worse on ....
S. S. Lumetta, A. Krishnamurthy, and D. E. Culler. Towards modeling the performance of a fast connected components algorithm on parallel machines. In Proceedings Supercomputing '95, 1995.
....messages with three parameters representing software overhead, network latency, and communication bandwidth. The LogP model has been used as a performance model for active messages [60] and the Split-C language [18], and it has been applied to the analysis of several application programs [19, 46]. A theoretical comparison of the BSP and LogP models can be found in [8]. This study concludes that the two models are substantially equivalent in terms of asymptotic analysis. While the LogP model may be very valuable for modeling the behavior of current asynchronous message-passing layers and ....
S. Lumetta, A. Krishnamurthy, and D. E. Culler, "Towards modeling the performance of a fast connected components algorithm on parallel machines," in Supercomputing '95, November 1995.
....of the vertices? 2. What is the size of the largest component after m edges have been included? The second largest component? 3. What is the expected number of isolated vertices after m edges have been included? 4. What is the behavior on other types of graphs, such as lattices? (See, for example, [8].) 5. How well does the disjoint set data structure perform on this problem? [Figure 2: Estimated Density Function for #Edges for Connectedness: 1000 Trials (500 vertices; horizontal axis: edges).] Students may be encouraged to ....
S. Lumetta, A. Krishnamurthy, and D. Culler. Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines. Available at http://www.cs.berkeley.edu/ stevel. Conference version at http://www.supercomp.org/sc95/proceedings/465 SLUM/SC95.HTM.
....the proper processor. On reception of a state description, a processor first checks if the state has been reached before. If the state is new, the processor adds it to the work queue to be validated against an assertion list. • Connected Components: First, a graph is spread across all processors [33]. Each processor then performs connected components on its local subgraph to collapse portions of its components into representative nodes. Next, the graph is globally adjusted to point remote edges (crossing processor boundaries) at the respective representative nodes. Finally, a global ....
S. Lumetta, A. Krishnamurthy, and D. E. Culler. Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines. In Proceedings of Supercomputing '95, 1995.
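The connected components excerpt above outlines three phases: a local pass on each processor's subgraph, re-pointing of remote edges at representative nodes, and a global pass. The following is a minimal single-process sketch of that structure, not the cited implementation. The block partition (`owner`), the union-find helpers, and the name `hybrid_components` are all assumptions introduced for illustration.

```python
def find(parent, v):
    # Follow parent pointers to the root, halving the path as we go.
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def union(parent, a, b):
    # Merge two sets; the smaller root id becomes the representative.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def hybrid_components(num_vertices, edges, num_procs):
    # Hypothetical block partition: vertex v lives on processor owner(v).
    def owner(v):
        return v * num_procs // num_vertices

    parent = list(range(num_vertices))
    remote = []

    # Phase 1: each "processor" collapses its local subgraph.
    for u, v in edges:
        if owner(u) == owner(v):
            union(parent, u, v)
        else:
            remote.append((u, v))

    # Phase 2: point remote edges at the local representatives.
    collapsed = {(find(parent, u), find(parent, v)) for u, v in remote}

    # Phase 3: global pass over the (smaller) graph of representatives.
    for ru, rv in collapsed:
        union(parent, ru, rv)

    return [find(parent, v) for v in range(num_vertices)]


labels = hybrid_components(8, [(0, 1), (1, 2), (4, 5), (2, 4), (6, 7)], 2)
print(labels)
```

In this sketch the global phase only touches one edge per distinct pair of representatives, which is the point of collapsing locally first; a real distributed version would replace the shared `parent` list with per-processor state and explicit communication.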
....The algorithm maintains a forest of trees, and makes progress either by decreasing the number of trees in the forest or by decreasing the height of the trees. The algorithm terminates when no two trees in the forest share an edge and all trees in the forest are of height one. (Footnote 2: A separate paper [9] presents more detailed results for the algorithm on several different platforms and demonstrates the best connected components performance seen to date.) [Running head: A. Krishnamurthy, S. S. Lumetta, D. E. Culler, and K. Yelick.] Figure 2. Pointer doubling operation. The parent of each vertex ....
....by treating local and global subgraphs separately, by paying attention to locality, and by tolerating remote memory access latencies. The resulting implementation is very efficient and obtains speedups on the order of 20 on a 32-processor CM-5 and 238 on a 256-processor CM-5. In related work [9], we demonstrate that our algorithm is the fastest in the world. ....
S. S. Lumetta, A. Krishnamurthy, D. E. Culler, "Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines," Proceedings of Supercomputing '95, San Diego, California, December 1995, available at http://www.supercomp.org/sc95/proceedings/465 SLUM/SC95.HTM.
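One of the excerpts above mentions a pointer doubling operation that reduces tree height until every tree has height one. The excerpt's own description is cut off, so the sketch below uses the standard textbook form of pointer doubling (each vertex replaces its parent with its grandparent, all vertices updating from the same snapshot) as an assumption; the function names are hypothetical.

```python
def pointer_double(parent):
    # One synchronous round: every vertex jumps to its grandparent.
    # Roots satisfy parent[r] == r and therefore stay fixed.
    return [parent[parent[v]] for v in range(len(parent))]


def flatten(parent):
    # Repeat rounds until nothing changes, i.e. every vertex points
    # directly at its root. Returns the final forest and the number
    # of rounds that changed at least one pointer.
    rounds = 0
    while True:
        nxt = pointer_double(parent)
        if nxt == parent:
            return parent, rounds
        parent, rounds = nxt, rounds + 1


# A chain 7 -> 6 -> ... -> 1 -> 0 with root 0.
final, rounds = flatten([0, 0, 1, 2, 3, 4, 5, 6])
print(final, rounds)
```

Because each round roughly halves a vertex's distance to its root, a chain of depth d flattens in about log2(d) rounds, which is why the operation suits parallel settings where all vertices can update at once.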
No context found.
Steven S. Lumetta, Arvind Krishnamurthy, and David E. Culler. Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines. In Proceedings of Supercomputing '95, 1995.
No context found.
LUMETTA, S. S., KRISHNAMURTHY, A., AND CULLER, D. E. Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines. In Proceedings of Supercomputing '95 (1995).