  Mathematics and communications libraries

by S. Lennart Johnsson, Kapil K. Mathur
http://www.cs.uh.edu/~johnsson/rpt/tr-01-93.ps.gz

Abstract:

Massively parallel computing holds the promise of extreme performance. Until compilation and run-time system technology matures to the level that is common today on most uniprocessor systems, the utility of these systems will depend heavily on the availability of libraries. Critical for performance are the ability to exploit locality of reference and the effective management of communication resources. We discuss some techniques for preserving locality of reference in distributed memory architectures. In particular, we discuss the benefits of multidimensional address spaces over conventional linearized address spaces, the partitioning of irregular grids, and the placement of partitions among nodes. Some of these techniques are supported as language directives, others as run-time system functions, and others still are part of the Connection Machine Scientific Software Library (CMSSL). We briefly discuss some of the unique design issues in this library for distributed memory architectures, and some of the novel ideas for managing data allocation and for automatically selecting algorithms based on performance. The CMSSL also includes a set of communication primitives that we have found very useful in implementing scientific and engineering applications on the Connection Machine systems. We briefly review some of the techniques used to fully utilize the bandwidth of the binary cube network of the CM-2 and CM-200 Connection Machine systems.
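The locality benefit of a multidimensional address space over a linearized one can be illustrated with a small counting experiment. The sketch below is not taken from the paper or from CMSSL; the function names (`offnode_refs`, `linear_owner`, `block_owner`) and the choice of a nearest-neighbor stencil on a square grid are assumptions made purely for illustration. It counts how many nearest-neighbor pairs of grid points end up on different nodes under each layout.

```python
def offnode_refs(n, owner):
    """Count nearest-neighbor pairs on an n x n grid whose two points
    are owned by different nodes (each pair is counted once)."""
    count = 0
    for i in range(n):
        for j in range(n):
            # Look only at the "down" and "right" neighbors so that
            # every neighboring pair is visited exactly once.
            for di, dj in ((1, 0), (0, 1)):
                ni, nj = i + di, j + dj
                if ni < n and nj < n and owner(i, j) != owner(ni, nj):
                    count += 1
    return count


def linear_owner(n, p):
    """Hypothetical linearized layout: the row-major index is cut into
    p contiguous chunks of equal size (assumes p divides n * n)."""
    chunk = n * n // p
    return lambda i, j: (i * n + j) // chunk


def block_owner(n, q):
    """Hypothetical two-dimensional layout: a q x q arrangement of nodes,
    each holding an (n/q) x (n/q) block (assumes q divides n)."""
    b = n // q
    return lambda i, j: (i // b, j // b)


if __name__ == "__main__":
    n, nodes_per_side = 64, 4          # 16 nodes in both layouts
    p = nodes_per_side * nodes_per_side
    print("linearized:", offnode_refs(n, linear_owner(n, p)))
    print("2-D blocks:", offnode_refs(n, block_owner(n, nodes_per_side)))
```

With a 64 x 64 grid on 16 nodes, the linearized layout produces 4-row strips with 15 internal boundaries of 64 pairs each (960 off-node pairs), while the 4 x 4 block layout has 6 internal cut lines of 64 pairs each (384 off-node pairs). The gap widens as the node count grows, since the perimeter of a square block grows more slowly than that of a thin strip.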
