Results 1 - 8 of 8
Parallel Numerical Linear Algebra
, 1993
Abstract

Cited by 773 (23 self)
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
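The survey's opening example is matrix multiplication as a vehicle for parallel-algorithm design principles. As a minimal sketch of the data-decomposition idea it discusses, the following simulates a row-block partitioning of C = A * B across workers; the function name and the sequential simulation of the "processors" are illustrative, not the survey's actual formulation.

```python
# Hedged sketch: row-block decomposition of C = A * B, the kind of data
# partitioning discussed for parallel matrix multiplication. Each
# "processor" is simulated here as one iteration over its row block.

def matmul_row_blocks(A, B, n_procs):
    """Multiply square matrices A and B, assigning contiguous row
    blocks of A (and of the result C) to each of n_procs workers."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    block = (n + n_procs - 1) // n_procs  # rows per processor (last may be short)
    for p in range(n_procs):
        lo, hi = p * block, min((p + 1) * block, n)
        for i in range(lo, hi):           # rows owned by processor p
            for k in range(n):
                aik = A[i][k]
                for j in range(n):
                    C[i][j] += aik * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_row_blocks(A, B, 2))  # [[19.0, 22.0], [43.0, 50.0]]
```

Because each worker owns disjoint rows of C, the blocks could be computed independently with no write conflicts, which is what makes this decomposition attractive on distributed-memory machines.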
Fast Parallel Algorithms for Short-Range Molecular Dynamics
 JOURNAL OF COMPUTATIONAL PHYSICS
, 1995
Abstract

Cited by 653 (7 self)
Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of interatomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventi...
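The third strategy in the abstract, spatial decomposition, assigns each processor a fixed region of the simulation box and the atoms currently inside it. A minimal sketch of the ownership step, assuming a 1-D periodic box split into equal slabs (the function name and slab scheme are illustrative, not the paper's):

```python
# Hedged sketch of spatial decomposition: each processor owns one slab
# of a 1-D periodic box and the atoms whose coordinates fall inside it.

def assign_atoms_to_regions(positions, box_len, n_regions):
    """Partition a periodic box of length box_len into n_regions equal
    slabs; return, per region, the indices of its resident atoms."""
    owners = [[] for _ in range(n_regions)]
    slab = box_len / n_regions
    for idx, x in enumerate(positions):
        r = int((x % box_len) // slab)
        owners[min(r, n_regions - 1)].append(idx)  # guard float rounding at the edge
    return owners

# Four atoms in a box of length 10 split into 2 slabs:
print(assign_atoms_to_regions([0.5, 9.9, 4.9, 5.1], 10.0, 2))
# [[0, 2], [1, 3]]
```

With short-range forces, a processor then only needs to exchange boundary atoms with neighboring regions, which is why this decomposition scales well when neighbor lists change rapidly.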
L.: Communication Primitives for Unstructured Finite Element Simulations on Data Parallel Architectures
, 1992
All-to-all communication algorithms for distributed BLAS
 Harvard University
, 1993
All-to-all Broadcast and Applications on the Connection Machine
, 1991
Abstract

Cited by 3 (0 self)
An all-to-all broadcast algorithm that exploits concurrent communication on all channels of the Connection Machine system CM-200 binary cube network is described. Issues in integrating a physical all-to-all broadcast between processing nodes into a language environment using a global address space are discussed. Timings for the physical broadcast between nodes, and for the virtual broadcast, are given for the Connection Machine system CM-200. The peak data transfer rate for the physical broadcast on a CM-200 is 5.9 Gbytes/sec, and the peak rate for the virtual broadcast is 31 Gbytes/sec. Array reshaping is an effective performance optimization technique. An example is given where reshaping improved performance by a factor of seven by reducing the amount of local data motion. We also show how to exploit symmetry for computation of an interaction matrix using the all-to-all broadcast function. Further optimizations are suggested for N-body type calculations. Using the all-to-all broa...
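The symmetry optimization the abstract mentions amounts to computing each symmetric interaction f(i, j) = f(j, i) once and mirroring it, halving the pairwise work that an all-to-all broadcast delivers data for. A minimal sequential sketch (the function names are illustrative, not the paper's):

```python
# Hedged sketch of exploiting symmetry in an interaction matrix:
# compute only the upper triangle and mirror each entry.

def interaction_matrix(values, f):
    """Build the symmetric matrix M[i][j] = f(values[i], values[j]),
    evaluating f once per unordered pair."""
    n = len(values)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):   # upper triangle only
            v = f(values[i], values[j])
            M[i][j] = v
            M[j][i] = v         # mirror by symmetry
    return M

print(interaction_matrix([1.0, 2.0, 3.0], lambda a, b: a * b))
# [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
```

In an N-body setting this halves the force evaluations, at the cost of extra communication to return the mirrored contributions to their owners.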
Language and compiler issues in scalable high performance scientific libraries
 PROCEEDINGS OF THE THIRD WORKSHOP ON COMPILERS FOR PARALLEL COMPUTERS
, 1992
Abstract

Cited by 1 (0 self)
Library functions for scalable architectures must be designed to correctly and efficiently support any distributed data structure that can be created with the supported languages and associated compiler directives. Libraries must also be designed to support concurrency in each function evaluation, as well as the concurrent application of the functions to disjoint array segments, known as multiple-instance computation. Control over the data distribution is often critical for locality of reference, and so is control over the interprocessor data motion. Scalability, while preserving efficiency, implies that the data distribution, the data motion, and the scheduling are adapted to the object shapes, the machine configuration, and the size of the objects relative to the machine size. The Connection Machine Scientific Software Library is a scalable library for distributed data structures. The library is designed for languages with an array syntax. It is accessible from all supported languages (Lisp, C, CM Fortran, and Paris (PARallel Instruction Set) in combination with Lisp, C, and Fortran 77). Single library calls can manage both concurrent application of a function to disjoint array segments, as well as concurrency in
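"Multiple-instance computation" as the abstract describes it means one library call applies the same function independently to disjoint segments of an array. A minimal sketch of that calling pattern, with an illustrative segmenting scheme and function name (not the CMSSL API):

```python
# Hedged sketch of multiple-instance computation: a single call applies
# fn independently to each disjoint, fixed-length segment of data.

def multiple_instance_apply(data, seg_len, fn):
    """Split data into disjoint segments of length seg_len and apply
    fn to each segment, returning one result per segment."""
    return [fn(data[i:i + seg_len]) for i in range(0, len(data), seg_len)]

# e.g. per-segment sums over four disjoint length-2 segments:
print(multiple_instance_apply([1, 2, 3, 4, 5, 6, 7, 8], 2, sum))
# [3, 7, 11, 15]
```

Because the segments are disjoint, each instance could run on its own set of processors with no synchronization between instances, which is the concurrency the library call is meant to manage.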
PRELIMINARY DOCUMENTATION
, 1993
"... The information in this document is subject to change without notice and should not be construed as a commitment by Thinking Machines Corporation. Thinking Machines assumes no liability for errors in this document. This document does not describe any product that is currently available from Thinking ..."
Abstract
 Add to MetaCart
The information in this document is subject to change without notice and should not be construed as a commitment by Thinking Machines Corporation. Thinking Machines assumes no liability for errors in this document. This document does not describe any product that is currently available from Thinking Machines Corporation, and Thinking Machines does not commit to implement the contents of this document in any product. Connection Machine® is a registered trademark of Thinking Machines Corporation. CM, CM-2, CM-200, CM-5, CM-5 Scale 3, and DataVault are trademarks of Thinking Machines Corporation. CMost, CMAX, and Prism are trademarks of Thinking Machines Corporation. C*® is a registered trademark of Thinking Machines Corporation. FastGraph is a trademark of Thinking Machines Corporation. Paris, *Lisp, and CM Fortran are trademarks of Thinking Machines Corporation. CMMD, CMSSL, and CMX11 are trademarks of Thinking Machines Corporation. Scalable Computing (SC) is a trademark of Thinking Machines Corporation.