Results 11 – 20 of 35
Towards a parallel out-of-core multifrontal solver: Preliminary study. Research Report 6120
 INRIA
Locality of reference in sparse Cholesky factorization methods
 Submitted to the Electronic Transactions on Numerical Analysis
, 2005
Abstract

Cited by 3 (1 self)
Abstract. This paper analyzes the cache efficiency of two high-performance sparse Cholesky factorization algorithms: the multifrontal algorithm and the left-looking algorithm. These two are essentially the only two algorithms that are used in current codes; generalizations of these algorithms are used in general-symmetric and general-unsymmetric sparse triangular factorization codes. Our theoretical analysis shows that while both algorithms sometimes enjoy a high level of data reuse in the cache, they are incomparable: there are matrices on which one is cache efficient and the other is not, and vice versa. The theoretical analysis is backed up by detailed experimental evidence, which shows that our theoretical analyses do predict cache-miss rates and performance in practice, even though the theory uses a fairly simple cache model. We also show, experimentally, that on matrices arising from finite-element structural analysis, the left-looking algorithm consistently outperforms the multifrontal algorithm. Direct cache-miss measurements indicate that the difference in performance is largely due to differences in the number of level-2 cache misses that the two algorithms generate. Finally, we also show that there are matrices where the multifrontal algorithm may require significantly more memory than the left-looking algorithm. On the other hand, the left-looking algorithm never uses more memory than the multifrontal one.
Key words. Cholesky factorization, sparse Cholesky, multifrontal methods, cache-efficiency, locality of reference
AMS subject classifications. 15A23, 65F05, 65F50, 65Y10, 65Y20
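The left-looking access pattern this abstract analyzes can be illustrated with a dense toy version (the sparse algorithms restrict the same updates to the nonzero structure; this sketch is purely illustrative and is not either paper's code):

```python
import numpy as np

def left_looking_cholesky(A):
    """Dense left-looking Cholesky: column j is finalized only after
    gathering updates from every previously computed column, so the
    active column stays hot in cache while earlier columns are read."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        # subtract the contributions of columns 0..j-1 to column j
        s = A[j:, j] - L[j:, :j] @ L[j, :j]
        L[j, j] = np.sqrt(s[0])
        L[j + 1:, j] = s[1:] / L[j, j]
    return L

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)     # symmetric positive definite test matrix
L = left_looking_cholesky(A)
print(np.allclose(L @ L.T, A))  # True
```

Each column j reads all previously computed columns exactly once; the multifrontal algorithm instead accumulates updates in dense frontal matrices, which is the contrast in data reuse the paper quantifies.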
Algebraic analysis of high-pass quantization
 ACM Transactions on Graphics
Abstract

Cited by 3 (2 self)
This paper presents an algebraic analysis of a mesh-compression technique called high-pass quantization [Sorkine et al. 2003]. In high-pass quantization, a rectangular matrix based on the mesh topological Laplacian is applied to the vectors of the Cartesian coordinates of a polygonal mesh. The resulting vectors, called δ-coordinates, are then quantized. The applied matrix is a function of the topology of the mesh and the indices of a small set of mesh vertices (anchors), but not of the locations of the vertices. An approximation of the geometry can be reconstructed from the quantized δ-coordinates and the spatial locations of the anchors. In this paper we show how to algebraically bound the reconstruction error that this method generates. We show that the smallest singular value of the transformation matrix can be used to bound both the quantization error and the rounding error, which is due to the use of floating-point arithmetic. Furthermore, we prove a bound on this singular value. The bound is a function of the topology of the mesh and of the selected anchors. We also propose a new anchor-selection algorithm, inspired by this bound. We show experimentally that the method is effective and that the computed upper bound on the error is not too pessimistic.
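A minimal numerical sketch of the scheme described above, using a path graph in place of a mesh and one coordinate channel (the anchor set and quantization step are illustrative assumptions, not the paper's choices):

```python
import numpy as np

# Toy "mesh": a path graph on 8 vertices; topological Laplacian = D - A.
n = 8
Adj = np.zeros((n, n))
for i in range(n - 1):
    Adj[i, i + 1] = Adj[i + 1, i] = 1.0
Lap = np.diag(Adj.sum(axis=1)) - Adj

anchors = [0, n - 1]                       # hypothetical anchor choice
T = np.vstack([Lap] + [np.eye(n)[a:a + 1] for a in anchors])

x = np.linspace(0.0, 1.0, n) ** 2          # one coordinate channel
delta = T @ x                              # δ-coordinates
step = 0.05                                # assumed uniform quantization step
dq = np.round(delta / step) * step

x_rec = np.linalg.lstsq(T, dq, rcond=None)[0]   # least-squares reconstruction
sigma_min = np.linalg.svd(T, compute_uv=False)[-1]
err = np.linalg.norm(x_rec - x)
bound = np.linalg.norm(dq - delta) / sigma_min  # smallest-singular-value bound
print(err <= bound + 1e-12)                     # True
```

Because T has full column rank (Laplacian plus anchor rows), the least-squares reconstruction error is at most the quantization error divided by the smallest singular value, which is the kind of bound the abstract refers to.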
Efficient Harmonic Simulation of a Trabecular Bone Finite Element Model by means of Model Reduction. 12th Workshop "The Finite Element Method
 … Fields", University of Ulm
, 2005
Abstract

Cited by 3 (2 self)
Three-dimensional serial reconstruction techniques allow us to develop very detailed micro-finite element (micro-FE) models of bones that can very accurately represent the porous bone micro-architecture. However, such models are of very high dimension and, at present, simulation is limited to linear elastic analysis only. In the present paper, we suggest using model reduction to enable harmonic simulation for micro-FE models. We take two bone models of dimensions 130 000 and 900 000 and report results for implicit moment matching via the Arnoldi process. We demonstrate that for the first model a low-dimensional subspace of dimension 10 allows us to accurately describe the frequency response up to 190 Hz. For the second model, a low-dimensional subspace of dimension 25 is enough to accurately describe the frequency response up to 30 Hz. We show that the time to perform model reduction and then to simulate the low-dimensional model is orders of magnitude less than that needed for harmonic simulation of the original model.
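The reduction procedure described above can be sketched on a toy second-order system (the tridiagonal stiffness matrix, lumped mass, load vector, and subspace dimension are assumptions for illustration; the micro-FE models in the paper are far larger):

```python
import numpy as np

def arnoldi(Aop, b, k):
    """Arnoldi process: orthonormal basis of span{b, A b, ..., A^{k-1} b}."""
    V = np.zeros((b.size, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        w = Aop(V[:, j - 1])
        w -= V[:, :j] @ (V[:, :j].T @ w)   # Gram-Schmidt
        w -= V[:, :j] @ (V[:, :j].T @ w)   # reorthogonalize for stability
        V[:, j] = w / np.linalg.norm(w)
    return V

# Toy stand-in for a bone model: stiffness K, lumped mass M, load f.
n = 200
K = 3 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.eye(n)
rng = np.random.default_rng(0)
f = rng.standard_normal(n)

# Implicit moment matching at s = 0: Krylov subspace of (K^{-1}M, K^{-1}f).
V = arnoldi(lambda v: np.linalg.solve(K, M @ v), np.linalg.solve(K, f), 10)
Kr, Mr, fr = V.T @ K @ V, V.T @ M @ V, V.T @ f

def H_full(w):   # harmonic response f^T (K - w^2 M)^{-1} f
    return f @ np.linalg.solve(K - w**2 * M, f)

def H_red(w):    # same response from the 10-dimensional reduced model
    return fr @ np.linalg.solve(Kr - w**2 * Mr, fr)

w = 0.5
print(abs(H_full(w) - H_red(w)) / abs(H_full(w)))  # small relative error
```

Solving the 10×10 reduced system per frequency point is far cheaper than the full n×n solve, which is the speedup the abstract reports at micro-FE scale.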
Analysis of the Solution Phase of a Parallel Multifrontal Approach
, 2008
Abstract

Cited by 2 (1 self)
We study the forward and backward substitution phases of a sparse multifrontal factorization. These phases are often neglected in papers on sparse direct factorization but, in many applications, they can be the bottleneck, so it is crucial to implement them efficiently. In this work, we assume that the factors have been written to disk during the factorization phase, and we discuss the design of an efficient solution phase. We look at the issues involved in solving the sparse systems on parallel computers and consider in particular their solution in a limited-memory environment where out-of-core working is required. Two different approaches to reading data from the disk are presented, with a discussion of the advantages and drawbacks of each. We present some experiments on realistic test problems using an out-of-core version of a sparse multifrontal solver.
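A hedged sketch of the out-of-core idea for the forward-substitution phase, with a dense triangular factor stored column-major on disk and read back one panel at a time (the panel layout and file format are illustrative assumptions, not the solver's actual scheme):

```python
import numpy as np, os, tempfile

n, panel = 12, 4
rng = np.random.default_rng(1)
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)

# "Factorization phase": write L to disk column-major, so each column
# panel occupies a contiguous region of the file.
path = os.path.join(tempfile.mkdtemp(), "factor.bin")
L.T.copy().tofile(path)
Ld = np.memmap(path, dtype=np.float64, mode="r", shape=(n, n), order="F")

# "Solution phase": forward substitution L y = b, reading one panel
# of the factor from disk per step.
b = rng.standard_normal(n)
y = b.copy()
for j0 in range(0, n, panel):
    j1 = j0 + panel
    P = np.array(Ld[:, j0:j1])            # one contiguous disk read
    y[j0:j1] = np.linalg.solve(P[j0:j1], y[j0:j1])  # diagonal block solve
    y[j1:] -= P[j1:] @ y[j0:j1]           # update trailing right-hand side
print(np.allclose(L @ y, b))              # True
```

Only one panel of the factor is ever resident in memory; the backward substitution would stream the panels in the reverse order, which is one reason the read strategy matters.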
Scaling and Pivoting in an Out-of-Core Sparse Direct Solver
Abstract

Cited by 2 (1 self)
Out-of-core sparse direct solvers reduce the amount of main memory needed to factorize and solve large sparse linear systems of equations by holding the matrix data, the computed factors, and some of the work arrays in files on disk. The efficiency of the factorization and solution phases depends on the number of entries in the factors. For a given pivot sequence, the level of fill in the factors beyond that predicted from the sparsity pattern alone depends on the number of pivots that are delayed (i.e., the number of pivots that are used later than expected because of numerical stability considerations). Our aim is to limit the number of delayed pivots while maintaining robustness and accuracy. In this article, we consider a new out-of-core multifrontal solver HSL_MA78 from the HSL mathematical software library that is designed to solve the unsymmetric sparse linear systems that arise from finite element applications. We consider how equilibration can be built into the solver without requiring the system matrix to be held in main memory. We also examine the effects of different pivoting strategies, including threshold partial pivoting, threshold rook pivoting, and static pivoting. Numerical experiments on problems arising from a range of practical applications illustrate the importance of scaling and show that, in some cases, rook pivoting can be more efficient than partial pivoting in terms of both the factorization …
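Equilibration of the kind mentioned above can be sketched with a simple iterative row/column scaling in the spirit of Ruiz's algorithm (this is an illustrative stand-in, not HSL_MA78's actual scaling procedure):

```python
import numpy as np

def equilibrate(A, sweeps=10):
    """Iteratively scale A -> Dr A Dc so every row and column approaches
    unit infinity norm; balanced entries tend to need fewer delayed pivots."""
    n = A.shape[0]
    Dr, Dc = np.ones(n), np.ones(n)
    S = A.copy()
    for _ in range(sweeps):
        r = 1.0 / np.sqrt(np.abs(S).max(axis=1))   # row scaling factors
        S = r[:, None] * S
        Dr *= r
        c = 1.0 / np.sqrt(np.abs(S).max(axis=0))   # column scaling factors
        S = S * c[None, :]
        Dc *= c
    return S, Dr, Dc

rng = np.random.default_rng(2)
# Matrix with entries spanning many orders of magnitude.
A = rng.standard_normal((6, 6)) * 10.0 ** rng.integers(-6, 6, size=(6, 6))
S, Dr, Dc = equilibrate(A)
print(np.abs(S).max(axis=1))   # row infinity norms, all close to 1
```

Because the scaling needs only row/column maxima, it can in principle be computed in passes over matrix data held on disk, which matches the constraint in the abstract that the system matrix never be held in main memory.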
An Out-of-Core Dataflow Middleware to Reduce the Cost of Large Scale Iterative Solvers
 P2S2
, 2012
Abstract

Cited by 2 (2 self)
The emergence of high-performance computing (HPC) platforms equipped with solid-state drives (SSDs) presents an opportunity to dramatically increase the efficiency of out-of-core numerical linear algebra computations. In this paper, we explore the advantages and challenges associated with performing sparse matrix-vector multiplications (SpMV) on a small SSD testbed. Such an endeavor requires programming abstractions that ease implementation while enabling efficient usage of the resources in the testbed. For this purpose, we adopt a task-based out-of-core programming model on top of a dataflow middleware based on the filter-stream programming model. We compare the performance of the resulting out-of-core iterated SpMV procedure running on the SSD testbed to the performance of an in-core implementation on a multicore cluster for solving large-scale eigenvalue problems. Preliminary experiments indicate that the out-of-core implementation on the SSD testbed can compete with an in-core implementation in terms of total CPU-hour cost. We conclude with some architectural design suggestions that can enable numerical linear algebra computations in general to be carried out with high efficiency on SSD-equipped platforms.
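A minimal sketch of an out-of-core SpMV in the spirit described above, with the CSR arrays memory-mapped from files standing in for SSD-resident data (the blocking scheme and file layout are illustrative assumptions, not the paper's middleware):

```python
import numpy as np, os, tempfile

n, block = 1000, 100
rng = np.random.default_rng(3)

# Build a tridiagonal-ish toy matrix in CSR form (rows in order).
rows, cols, vals = [], [], []
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            rows.append(i); cols.append(j); vals.append(rng.standard_normal())
indptr = np.zeros(n + 1, dtype=np.int64)
np.add.at(indptr, np.array(rows) + 1, 1)
indptr = np.cumsum(indptr)
data = np.array(vals)
indices = np.array(cols, dtype=np.int64)

# Write the CSR arrays to files (stand-in for SSD-resident data).
d = tempfile.mkdtemp()
for name, arr in [("data", data), ("indices", indices), ("indptr", indptr)]:
    arr.tofile(os.path.join(d, name + ".bin"))

# Out-of-core pass: memory-map the arrays and stream row blocks, so only
# one block's worth of matrix data is touched at a time.
mdata = np.memmap(os.path.join(d, "data.bin"), dtype=np.float64, mode="r")
midx = np.memmap(os.path.join(d, "indices.bin"), dtype=np.int64, mode="r")
mptr = np.memmap(os.path.join(d, "indptr.bin"), dtype=np.int64, mode="r")

x = rng.standard_normal(n)
y = np.zeros(n)
for i0 in range(0, n, block):
    i1 = min(i0 + block, n)
    lo, hi = mptr[i0], mptr[i1]
    dblk, jblk = np.array(mdata[lo:hi]), np.array(midx[lo:hi])
    ptr = np.array(mptr[i0:i1 + 1]) - lo
    for k in range(i1 - i0):              # dot product for each row in block
        y[i0 + k] = dblk[ptr[k]:ptr[k + 1]] @ x[jblk[ptr[k]:ptr[k + 1]]]
# y now holds A @ x, computed with one row block resident at a time
```

An iterative eigensolver would repeat this streamed product many times, which is why per-iteration read cost on the SSD dominates the trade-off the paper studies.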
Multifrontal Methods: Parallelism, Memory Usage and Numerical Aspects
, 2012
Abstract

Cited by 1 (0 self)
The solution of sparse linear systems is critical in many areas of numerical simulation. Many applications, notably industrial ones, use direct methods because of their accuracy and robustness. The quality of the result, the numerical features, and the computation time are critical for these applications. Furthermore, the hardware resources (number of processors, memory) must be used optimally. In this habilitation, we describe work pursuing these objectives within the MUMPS software platform, developed in Toulouse, Lyon-Grenoble, and Bordeaux over the past fifteen years. The core of the approach relies on an original parallelization of the multifrontal method: asynchronous management of the parallelism, combined with distributed schedulers, handles dynamic data structures and thereby allows numerical pivoting. We address task scheduling, memory optimization, and various numerical features. Ongoing work and future objectives aim at efficiently solving ever larger problems, without sacrificing numerical capabilities, while adapting our approaches to the rapid evolution of computers. In this context, the software-engineering aspects …