Results 1–10 of 19
Parallel Numerical Linear Algebra, 1993
Cited by 773 (23 self)
Abstract:
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.
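The survey's running example of matrix multiplication can be sketched in blocked form. This is a generic illustration (the function name and blocking scheme are ours, not the paper's) of how the computation decomposes into block operations whose per-block communication and arithmetic costs a parallel implementation would then weigh:

```python
def blocked_matmul(A, B, n, b):
    """Multiply n x n matrices A and B (lists of lists) in b x b blocks.

    Serial sketch: each (i0, j0) block of C is a unit of work that a
    parallel algorithm would assign to one processor, with the k0 loop
    becoming a sequence of block receives and local multiply-adds.
    """
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):          # block row of C
        for j0 in range(0, n, b):      # block column of C
            for k0 in range(0, n, b):  # inner (contraction) block
                for i in range(i0, min(i0 + b, n)):
                    for j in range(j0, min(j0 + b, n)):
                        s = 0.0
                        for k in range(k0, min(k0 + b, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C
```

Blocking matters even serially (cache reuse); in the parallel setting it is what turns the O(n^3) arithmetic into coarse tasks with bounded communication per task.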
Special Purpose Parallel Computing
Lectures on Parallel Computation, 1993
Cited by 82 (6 self)
Abstract:
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given.
1. Introduction
Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
The Torus-Wrap Mapping for Dense Matrix Calculations on Massively Parallel Computers
SIAM J. SCI. STAT. COMPUT., 1994
Cited by 71 (5 self)
Abstract:
Dense linear systems of equations are quite common in science and engineering, arising in boundary element methods, least squares problems and other settings. Massively parallel computers will be necessary to solve the large systems required by scientists and engineers, and scalable parallel algorithms for the linear algebra applications must be devised for these machines. A critical step in these algorithms is the mapping of matrix elements to processors. In this paper, we study the use of the torus-wrap mapping in general dense matrix algorithms, from both theoretical and practical viewpoints. We prove that, under reasonable assumptions, this assignment scheme leads to dense matrix algorithms that achieve (to within a constant factor) the lower bound on interprocessor communication. We also show that the torus-wrap mapping allows algorithms to exhibit less idle time, better load balancing and less memory overhead than the more common row and column mappings. Finally, we discuss ...
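The torus-wrap assignment itself is a cyclic ("wrapped") mapping of row and column indices onto a two-dimensional processor grid. A minimal sketch, with hypothetical names and assuming a pr x pc grid:

```python
def torus_wrap_owner(i, j, pr, pc):
    """Grid coordinates of the processor owning matrix element (i, j)
    under a torus-wrap (cyclic) mapping onto a pr x pc processor grid.

    Illustrative sketch, not code from the paper: row indices wrap
    around the grid rows and column indices around the grid columns.
    """
    return (i % pr, j % pc)
```

Because consecutive matrix rows and columns land on different processors, the shrinking trailing submatrix in an elimination stays spread over the whole grid, which is the source of the idle-time and load-balance advantages the abstract describes over plain row or column mappings.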
Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence
International Journal of Supercomputer Applications, 1994
Cited by 40 (2 self)
Abstract:
This paper surveys the current state of applications of large dense numerical linear algebra and the influence of parallel computing. Furthermore, we attempt to crystallize many important ideas that we feel have sometimes been misunderstood in the rush to write fast programs.
1 Introduction
This paper represents my continuing efforts to track the status of large dense linear algebra problems. The goal is to shatter the barriers that separate the various interested communities while commenting on the influence of parallel computing. A secondary goal is to crystallize the most important ideas that have all too often been obscured by the details of machines and algorithms. Parallel supercomputing is in the spotlight. In the race towards the proliferation of papers on person X's experiences with machine Y (and why his algorithm runs faster than person Z's), we have sometimes lost sight of the applications for which these algorithms are meant to be useful. This paper concentrates on la...
Computational and numerical methods for bioelectric field problems
Critical Reviews in Biomedical Engineering, 1997
Cited by 25 (7 self)
Abstract:
Fundamental problems in electrophysiology can be studied by computationally modeling and simulating the associated microscopic and macroscopic bioelectric fields. To study such fields computationally, researchers have developed a number of numerical and computational techniques. Advances in computer architectures have allowed researchers to model increasingly complex biophysical systems. Modeling such systems requires a researcher to apply a wide variety of computational and numerical methods to describe the underlying physics and physiology of the associated three-dimensional geometries. Issues naturally arise as to the accuracy and efficiency of such methods. In this paper we review computational and numerical methods for solving bioelectric field problems. The motivating applications represent a class of bioelectric field problems that arise in electrocardiography and ...
Future Research Directions in Problem Solving Environments for Computational Science
Proceedings of a Workshop on Research Directions in Integrating Numerical Analysis, Symbolic Computing, Computational Geometry, and Artificial Intelligence for Computational Science, Washington, DC
On the Future of Problem Solving Environments, 2000
Cited by 23 (2 self)
Abstract:
In this paper we review the current state of the problem solving environment (PSE) field and make projections for the future. First we describe the computing context, the definition of a PSE and the goals of a PSE. The state of the art is summarized along with sources (books, bibliographies, web sites) of more detailed information. The principal components and paradigms for building PSEs are presented. The discussion of the future is given in three parts: future trends, scenarios for 2010/2025, and research ...
Decomposing Matrices into Blocks, 1997
Cited by 19 (3 self)
Abstract:
In this paper we investigate whether matrices arising from linear or integer programming problems can be decomposed into so-called bordered block diagonal form. More precisely, given some matrix A, we try to assign as many rows as possible to some number κ of blocks of given size such that no two rows assigned to different blocks intersect in a common column. Bordered block diagonal form is desirable because it can guide and speed up the solution process for linear and integer programming problems. We show that various matrices from the LP and MIP libraries Netlib and Miplib can indeed be decomposed into this form by computing optimal decompositions or decompositions with proven quality. These computations are done with a branch-and-cut algorithm based on polyhedral investigations of the matrix decomposition problem. In practice, however, one would use heuristics to find a good decomposition. We present several heuristic ideas and test their performance. Finally, we investigate the usef...
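One simple heuristic of the kind the authors contrast with their exact branch-and-cut approach can be sketched as follows. The greedy rule used here (a row whose columns touch more than one existing block goes to the border) is our illustration, not an algorithm from the paper:

```python
def greedy_block_assign(rows, kappa):
    """Greedy heuristic sketch for bordered block diagonal form.

    rows  : list of sets, row i's nonzero column indices
    kappa : number of diagonal blocks
    A row joins the unique block whose columns it already touches
    (or the smallest block if it touches none); rows that would
    couple two blocks go to the border. Returns (blocks, border)
    as lists of row indices.
    """
    block_cols = [set() for _ in range(kappa)]
    blocks = [[] for _ in range(kappa)]
    border = []
    for r, cols in enumerate(rows):
        hits = [b for b in range(kappa) if cols & block_cols[b]]
        if len(hits) > 1:
            border.append(r)  # row would merge two blocks: push to border
        else:
            b = hits[0] if hits else min(range(kappa),
                                         key=lambda x: len(blocks[x]))
            blocks[b].append(r)
            block_cols[b] |= cols
    return blocks, border
```

Unlike the paper's exact method, this gives no quality guarantee; it only illustrates why heuristics are cheap here: one pass over the rows with set intersections suffices.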
Pattern-Driven Automatic Parallelization
Cited by 6 (1 self)
Abstract:
This paper describes a knowledge-based system for automatic parallelization of a wide class of sequential numeric codes operating on vectors and dense matrices, and for execution on distributed-memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been `encrypted' by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations, including local algorithm replacement, such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge of useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.
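As an illustration only (the paper's tool targets dusty deck codes, not Python), a crude matcher for one frequently occurring concept, a dot-product-style reduction loop, might look like this; every name here is ours:

```python
import ast

def is_dot_product_loop(src):
    """Sketch of idiom recognition: return True if src contains a loop
    of the shape 'for ...: acc = acc + x[i] * y[i]'.

    A real pattern-driven parallelizer would, on a match, replace the
    loop by a parallel reduction or a library call; this toy only
    checks the syntactic shape (a single Assign whose value is an Add
    whose right operand is a Mult).
    """
    tree = ast.parse(src)
    for node in ast.walk(tree):
        if isinstance(node, ast.For) and len(node.body) == 1:
            stmt = node.body[0]
            if (isinstance(stmt, ast.Assign)
                    and isinstance(stmt.value, ast.BinOp)
                    and isinstance(stmt.value.op, ast.Add)
                    and isinstance(stmt.value.right, ast.BinOp)
                    and isinstance(stmt.value.right.op, ast.Mult)):
                return True
    return False
```

The point of the sketch is the division of labor the abstract describes: recognition is local and syntactic, while the knowledge base decides what parallel replacement a recognized concept maps to.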
FPGA Implementation of a Cholesky Algorithm for a Shared-Memory ...
Cited by 6 (1 self)
Abstract:
Solving a system of linear equations is a key problem in engineering and science. Matrix factorization is a key component of many methods used to solve such equations. However, the factorization process is very time consuming, so these problems have often been targeted for parallel machines rather than sequential ones. Nevertheless, commercially available supercomputers are expensive and only large institutions have the resources to purchase them. Hence, efforts are underway to develop more affordable alternatives. In this paper, we propose such an approach. We present an implementation of a parallel version of the Cholesky matrix factorization algorithm on a single-chip multiprocessor built inside an APEX20K-series FPGA (Field-Programmable Gate Array) developed by Altera. Our multiprocessor system uses an asymmetric, shared-memory MIMD architecture and was built using the configurable Nios processor core, which was also developed by Altera. Our system was developed using Altera's SOPC (System-On-a-Programmable-Chip) Quartus II development environment. Our Cholesky implementation is based on an algorithm described in George et al. [6]. This algorithm is scalable and uses a "queue of tasks" approach to ensure dynamic load balancing among the processing elements. Our implementation assumes dense matrices in the input. We present performance results for uniprocessor and multiprocessor implementations. Our results show that the implementation of multiprocessors inside FPGAs can benefit matrix operations, such as matrix factorization. Further benefits result from good dynamic load-balancing techniques.
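The per-column work that a "queue of tasks" scheduler would distribute can be sketched serially. This is a generic column-oriented Cholesky, not the George et al. formulation; in the parallel version, each outer-loop iteration would be a task pulled from a shared queue by an idle processing element once its dependencies (the columns to its left) are complete:

```python
import math

def cholesky_columns(A):
    """Column-oriented Cholesky: return L with L L^T = A.

    A is a symmetric positive definite matrix as a list of lists.
    Each outer iteration ('finish column j') is the unit of work a
    queue-of-tasks scheduler would hand out for dynamic load balancing;
    here the tasks simply run in order.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):  # task: compute column j of L
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j]
                       - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

Because later columns cost less than earlier ones, static column assignment leaves processors idle near the end; pulling column tasks from a queue, as the abstract describes, balances that shrinking workload dynamically.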