Results 1 - 7 of 7
Multifrontal Parallel Distributed Symmetric and Unsymmetric Solvers, 1998
Abstract

Cited by 187 (30 self)
We consider the solution of both symmetric and unsymmetric systems of sparse linear equations. A new parallel distributed-memory multifrontal approach is described. To handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been developed. We discuss some of the main algorithmic choices and compare both implementation issues and the performance of the LDL^T and LU factorizations. Performance analysis on an IBM SP2 shows the efficiency and the potential of the method. The test problems used are from the Rutherford-Boeing collection and from the PARASOL end users.
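As an illustration of the symmetric factorization the abstract compares against LU, here is a minimal pure-Python sketch of a dense, unpivoted LDL^T factorization. This is a toy stand-in: the function name `ldlt` and the dense list-of-lists layout are assumptions for illustration, whereas the paper's solver factorizes sparse frontal matrices with numerical pivoting.

```python
def ldlt(A):
    """Dense, unpivoted LDL^T factorization: A = L * diag(D) * L^T,
    with L unit lower triangular. Illustrative sketch only; a real
    multifrontal solver works on sparse frontal matrices and pivots."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        # diagonal entry of D, with contributions of earlier columns removed
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * D[k]
                                     for k in range(j))) / D[j]
    return L, D
```

Compared with LU, only the lower triangle is accessed and roughly half the floating-point work is needed, which is why the paper weighs the two factorizations' implementation trade-offs against each other.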
A Combined Unifrontal/Multifrontal Method for Unsymmetric Sparse Matrices
ACM Transactions on Mathematical Software, 1995
Abstract

Cited by 110 (14 self)
We discuss the organization of frontal matrices in multifrontal methods for the solution of large sparse sets of unsymmetric linear equations. In the multifrontal method, work on a frontal matrix can be suspended, the frontal matrix can be stored for later reuse, and a new frontal matrix can be generated. Several frontal matrices are thus stored during the factorization, and one or more of these are assembled (summed) when creating a new frontal matrix. Although this means that arbitrary sparsity patterns can be handled efficiently, extra work is required to sum the frontal matrices together, and this can be costly because indirect addressing is required. The (uni)frontal method avoids this extra work by factorizing the matrix with a single frontal matrix. Rows and columns are added to the frontal matrix, and pivot rows and columns are removed. Data movement is simpler, but higher fill-in can result if the matrix cannot be permuted into a variable-band form with small profile...
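The "summing" of frontal matrices mentioned above is usually called the extend-add operation. A minimal sketch of it, showing where the indirect addressing comes from (the function name `extend_add` and the dense list-of-lists representation of a front are assumptions for illustration):

```python
def extend_add(parent, parent_vars, child, child_vars):
    """Assemble (sum) a child frontal matrix into its parent front.
    parent_vars/child_vars list the global variables each dense front
    covers; child_vars must be a subset of parent_vars. The scatter
    through `pos` is the indirect addressing the paper notes is costly."""
    pos = {v: i for i, v in enumerate(parent_vars)}
    for i, vi in enumerate(child_vars):
        for j, vj in enumerate(child_vars):
            parent[pos[vi]][pos[vj]] += child[i][j]
    return parent
```

A unifrontal method avoids this scatter entirely by keeping a single growing front, at the price of the possible extra fill-in the abstract describes.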
Communication Lower Bounds for Distributed-Memory Matrix Multiplication, 2004
R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks, 2007
Abstract

Cited by 17 (0 self)
We propose a novel divide-and-conquer algorithm for the solution of the all-pair shortest-path problem for directed and dense graphs with no negative cycles. We propose R-Kleene, a compact and in-place recursive algorithm inspired by Kleene's algorithm. R-Kleene delivers better performance than previous algorithms for randomly generated graphs represented by highly dense adjacency matrices, in which the matrix components can have any integer value. We show that R-Kleene, unchanged and without any machine tuning, consistently achieves between 1/7 and 1/2 of peak performance on five very different uniprocessor systems.
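The recursive structure behind Kleene-style all-pair shortest-path algorithms can be sketched as follows. This is the generic textbook divide-and-conquer closure over the (min, +) semiring, not the authors' tuned, in-place R-Kleene implementation; the function names are illustrative.

```python
INF = float("inf")

def minplus(X, Y):
    # (min, +) matrix "product": concatenate paths and keep the shortest
    return [[min(X[i][k] + Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def emin(X, Y):
    # elementwise min: keep the shorter of two candidate path lengths
    return [[min(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def kleene(A):
    """All-pair shortest paths via recursive closure over the
    (min, +) semiring; assumes no negative cycles."""
    n = len(A)
    if n == 1:
        return [[min(0.0, A[0][0])]]  # a node's distance to itself
    h = n // 2
    A11 = [row[:h] for row in A[:h]]
    A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]
    A22 = [row[h:] for row in A[h:]]
    A11 = kleene(A11)                   # close the first vertex block
    A12 = minplus(A11, A12)
    A21 = minplus(A21, A11)
    A22 = emin(A22, minplus(A21, A12))  # detours through block 1
    A22 = kleene(A22)                   # close the second vertex block
    A12 = minplus(A12, A22)
    A21 = minplus(A22, A21)
    A11 = emin(A11, minplus(A12, A21))  # detours through block 2
    top = [r1 + r2 for r1, r2 in zip(A11, A12)]
    bot = [r1 + r2 for r1, r2 in zip(A21, A22)]
    return top + bot
```

Because every step is either a recursive closure or a dense (min, +) matrix product, the work is dominated by matrix-multiply-shaped kernels, which is what lets such algorithms approach peak performance on dense adjacency matrices.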
Use of Computational Kernels in Full and Sparse Linear Solvers, Efficient Code Design on High-Performance RISC Processors, 1997
In Vector and Parallel Processing - VECPAR'96
Abstract

Cited by 3 (0 self)
We believe that the availability of portable and efficient serial and parallel numerical libraries that can be used as building blocks is extremely important, both for simplifying application software development and for improving reliability. This is illustrated by considering the solution of full and sparse linear systems. We describe successive layers of computational kernels: the BLAS, the sparse BLAS, blocked algorithms for factorizing full systems, and direct and iterative methods for sparse linear systems. We also show how the architecture of today's powerful RISC processors may influence efficient code design.

1 Introduction. One of the common problems for application scientists is to exploit the hardware of high-performance computers (either serial or parallel) as efficiently as possible without totally rewriting or redesigning existing codes and algorithms. We believe that the availability of portable and efficient serial and parallel numerical libraries that ca...
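As a sketch of the "blocked algorithms for factorizing full systems" layer, here is a right-looking blocked LU factorization without pivoting: the panel step is BLAS-2-like, while the trailing-submatrix update has exactly the GEMM shape that BLAS-3 kernels accelerate. This is a pure-Python stand-in under simplifying assumptions (no pivoting, in-place list-of-lists storage); a production code would call LAPACK's `dgetrf` and the BLAS `dgemm` instead.

```python
def lu_blocked(A, nb=2):
    """In-place right-looking blocked LU without pivoting.
    After the call, A holds U in its upper triangle and the
    unit-lower-triangular L (diagonal implicit) below it."""
    n = len(A)
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # 1) panel factorization (unblocked, BLAS-2-like)
        for j in range(k, k + kb):
            for i in range(j + 1, n):
                A[i][j] /= A[j][j]
                for c in range(j + 1, k + kb):
                    A[i][c] -= A[i][j] * A[j][c]
        # 2) triangular solve for the block row of U (TRSM-shaped)
        for j in range(k + kb, n):
            for i in range(k, k + kb):
                for l in range(k, i):
                    A[i][j] -= A[i][l] * A[l][j]
        # 3) trailing update A22 -= L21 * U12 (GEMM-shaped, BLAS-3)
        for i in range(k + kb, n):
            for j in range(k + kb, n):
                for l in range(k, k + kb):
                    A[i][j] -= A[i][l] * A[l][j]
    return A
```

The point of the blocking is that step 3 touches O(n^3) data reuse-friendly work, so routing it through a tuned GEMM kernel captures most of the processor's peak, exactly the layering argument the abstract makes.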
Multifrontal Methods: Parallelism, Memory Usage and Numerical Aspects, 2012
Abstract

Cited by 1 (0 self)
The solution of sparse linear systems is critical in many fields of numerical simulation. Many applications, notably industrial ones, use direct methods because of their accuracy and robustness. The quality of the result, the numerical functionality, and the computation time are critical for applications. Furthermore, hardware resources (number of processors, memory) must be used optimally. In this habilitation, we describe work pursuing these goals within the MUMPS software platform, developed in Toulouse, Lyon-Grenoble and Bordeaux over the past fifteen years. The core of the approach relies on an original parallelization of the multifrontal method: asynchronous management of parallelism, combined with distributed schedulers, makes it possible to handle dynamic data structures and thereby allows numerical pivoting. We focus on task scheduling, memory optimization, and various numerical functionalities. Ongoing work and future goals aim at solving ever larger problems efficiently, without loss on the numerical side, while adapting our approaches to the rapid evolution of computers. In this context, the engineering aspects ...