Results 11–20 of 23
An out-of-core sparse symmetric-indefinite factorization method
 CODEN ACMSCU. ISSN 0098-3500 (print), 1557-7295 (electronic).
, 2006
"... We present a new outofcore sparse symmetricindefinite factorization algorithm. The most significant innovation of the new algorithm is a dynamic partitioning method for the sparse factor. This partitioning method results in very low I/O traffic and allows the algorithm to run at high computationa ..."
Abstract

Cited by 5 (0 self)
We present a new out-of-core sparse symmetric-indefinite factorization algorithm. The most significant innovation of the new algorithm is a dynamic partitioning method for the sparse factor. This partitioning method results in very low I/O traffic and allows the algorithm to run at high computational rates, even though the factor is stored on a slow disk. Our implementation compares well with both high-performance in-core sparse symmetric-indefinite codes and a high-performance out-of-core sparse Cholesky code.
A Parallel Interior-Point Algorithm for Linear Programming on a Shared Memory Machine
, 1998
"... The XPRESS 1 interior point optimizer is an "industrial strength" code for solution of largescale sparse linear programs. The purpose of the present paper is to discuss how the XPRESS interior point optimizer has been parallelized for a Silicon Graphics multi processor computer. The ..."
Abstract

Cited by 4 (0 self)
The XPRESS interior point optimizer is an "industrial strength" code for the solution of large-scale sparse linear programs. The purpose of the present paper is to discuss how the XPRESS interior point optimizer has been parallelized for a Silicon Graphics multiprocessor computer. The major computational task, performed in each iteration of the interior-point method implemented in the XPRESS interior point optimizer, is the solution of a symmetric and positive definite system of linear equations. Therefore, parallelization of the Cholesky decomposition and the triangular solve procedure are discussed in detail. Finally, computational results are presented to demonstrate the parallel efficiency of the optimizer. It should be emphasized that the methods discussed can be applied to the solution of large-scale sparse linear least squares problems. Acknowledgment: We appreciate the comments made by an anonymous referee appointed by CORE which helped us to improve the manuscript...
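The per-iteration solve the abstract describes, a Cholesky decomposition followed by forward and backward triangular substitutions, can be sketched as follows. This is a dense, sequential illustration only; the optimizer described above works on sparse matrices and parallelizes both steps.

```python
import math

def cholesky(A):
    """Dense Cholesky factorization A = L L^T of a symmetric
    positive-definite matrix, returned as a lower-triangular L."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def solve(L, b):
    """Solve L L^T x = b: forward substitution with L,
    then backward substitution with L^T."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```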
Locality of reference in sparse Cholesky factorization methods
 SUBMITTED TO THE ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS
, 2005
"... Abstract. This paper analyzes the cache efficiency of two highperformance sparse Cholesky factorization algorithms: the multifrontal algorithm and the leftlooking algorithm. These two are essentially the only two algorithms that are used in current codes; generalizations of these algorithms are us ..."
Abstract

Cited by 3 (1 self)
Abstract. This paper analyzes the cache efficiency of two high-performance sparse Cholesky factorization algorithms: the multifrontal algorithm and the left-looking algorithm. These two are essentially the only two algorithms that are used in current codes; generalizations of these algorithms are used in general-symmetric and general-unsymmetric sparse triangular factorization codes. Our theoretical analysis shows that while both algorithms sometimes enjoy a high level of data reuse in the cache, they are incomparable: there are matrices on which one is cache efficient and the other is not, and vice versa. The theoretical analysis is backed up by detailed experimental evidence, which shows that our theoretical analyses do predict cache-miss rates and performance in practice, even though the theory uses a fairly simple cache model. We also show, experimentally, that on matrices arising from finite-element structural analysis, the left-looking algorithm consistently outperforms the multifrontal algorithm. Direct cache-miss measurements indicate that the difference in performance is largely due to differences in the number of level-2 cache misses that the two algorithms generate. Finally, we also show that there are matrices where the multifrontal algorithm may require significantly more memory than the left-looking algorithm. On the other hand, the left-looking algorithm never uses more memory than the multifrontal one. Key words. Cholesky factorization, sparse Cholesky, multifrontal methods, cache efficiency, locality of reference. AMS subject classifications. 15A23, 65F05, 65F50, 65Y10, 65Y20.
Advances in design and implementation of optimization software
 European Journal of Operational Research
, 1999
"... ..."
J.J.: Reducing overhead in sparse hypermatrix Cholesky factorization
 In: IFIP TC5 Workshop on High Performance Computational Science and Engineering (HPCSE), World Computer Congress
, 2004
"... The sparse hypermatrix storage scheme produces a recursive 2D partitioning of a sparse matrix. Data subblocks are stored as dense matrices. Since we are dealing with sparse matrices some zeros can be stored in those dense blocks. The overhead introduced by the operations on zeros can become really l ..."
Abstract

Cited by 2 (1 self)
The sparse hypermatrix storage scheme produces a recursive 2D partitioning of a sparse matrix. Data sub-blocks are stored as dense matrices. Since we are dealing with sparse matrices, some zeros can be stored in those dense blocks. The overhead introduced by the operations on zeros can become really large and considerably degrade performance. In this paper, we present several techniques for reducing the operations on zeros in a sparse hypermatrix Cholesky factorization. By associating a bit with each column within a data submatrix we create a bit vector. We can avoid computations when the bitwise AND of their bit vectors is null. By keeping information about the actual space within a data submatrix which stores nonzeros (dense window) we can reduce both storage and computation.
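The bit-vector test described in the abstract can be sketched as follows. The helper names are hypothetical, and this is a minimal illustration of the idea, not the paper's hypermatrix code: each dense submatrix carries one bit per column marking whether that column holds any nonzero, and a block operation is skipped whenever the bitwise AND of the relevant bit vectors is zero.

```python
def column_bits(block):
    """Pack one bit per column of a dense sub-block: bit j is set
    iff column j contains at least one nonzero entry."""
    bits = 0
    for row in block:
        for j, v in enumerate(row):
            if v != 0.0:
                bits |= 1 << j
    return bits

def must_compute(bits_a, bits_b):
    """A block update contributes nothing unless the two nonzero-column
    patterns intersect; a single bitwise AND decides it."""
    return (bits_a & bits_b) != 0
```

The point of the technique is that the AND is a single integer operation, so entirely-zero block products are rejected without touching the dense data at all.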
BOS is Boss: A Case for Bulk-Synchronous Object Systems
 Proc. ACM Symposium on Parallel Algorithms and Architectures (SPAA)
, 1999
"... A key issue for parallel systems is the development of useful programming abstractions that can coexist with good performance. We describe a communication library that supports an objectbased abstraction with a bulksynchronous communication style; this is the first time such a library has been pro ..."
Abstract

Cited by 2 (0 self)
A key issue for parallel systems is the development of useful programming abstractions that can coexist with good performance. We describe a communication library that supports an object-based abstraction with a bulk-synchronous communication style; this is the first time such a library has been proposed and implemented. By restricting the library to the exclusive use of barrier synchronization, we are able to design a simple and easy-to-use object system. By exploiting established techniques based on the bulk-synchronous parallel (BSP) model, we are able to design algorithms and library implementations that work well across platforms. Portable parallel programming systems should provide useful abstractions without precluding efficient execution. This paper describes a step towards this goal through the use of a communication library called the BSP Object System (BOS). BOS provides the convenience of efficient shared objects in a system optimized for (and restricted to...
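The bulk-synchronous structure the abstract relies on, where every process runs a superstep and then all meet at a barrier before the next superstep begins, can be sketched with threads. This is only an illustration of the BSP execution model, not the BOS API; the function and parameter names are made up for this sketch.

```python
import threading

def bsp_run(num_procs, superstep, num_steps):
    """Run `num_steps` supersteps over `num_procs` threads.
    Each thread computes superstep(pid, step), then waits at a
    barrier -- the only synchronization primitive used, mirroring
    the BSP restriction described above."""
    barrier = threading.Barrier(num_procs)
    results = [[None] * num_steps for _ in range(num_procs)]

    def worker(pid):
        for s in range(num_steps):
            results[pid][s] = superstep(pid, s)
            barrier.wait()  # all processes synchronize here

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```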
User Guide for LDL, a concise sparse Cholesky package
, 2012
"... The LDL software package is a set of short, concise routines for factorizing symmetric positivedefinite sparse matrices, with some applicability to symmetric indefinite matrices. Its primary purpose is to illustrate much of the basic theory of sparse matrix algorithms in as concise a code as poss ..."
Abstract

Cited by 1 (0 self)
The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices. Its primary purpose is to illustrate much of the basic theory of sparse matrix algorithms in as concise a code as possible, including an elegant method of sparse symmetric factorization that computes the factorization row by row but stores it column by column. The entire symbolic and numeric factorization consists of fewer than 50 lines of code. The package is written in C, and includes a MATLAB interface.
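The LDL^T factorization the package computes (unit-lower-triangular L, diagonal D, so A = L D L^T) can be sketched densely in a few lines. This is an illustration of the factorization itself, not of the package's sparse row-by-row method or its C interface.

```python
def ldl(A):
    """Dense LDL^T factorization of a symmetric matrix.
    Returns unit-lower-triangular L and the diagonal D as a list.
    Assumes no pivoting is needed (e.g. A positive definite)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D
```

Unlike Cholesky, no square roots are taken, which is what gives LDL^T its partial applicability to symmetric indefinite matrices (negative entries may simply appear in D).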
Interior Point Methods: A Survey, Short Survey of Applications to Power Systems, and Research Opportunities
, 2004
"... 1 ..."
PARALLEL UNSYMMETRIC-PATTERN MULTIFRONTAL SPARSE LU WITH COLUMN PREORDERING
"... Abstract. We present a new parallel sparse LU factorization algorithm and code. The algorithm uses a columnpreordering partialpivoting unsymmetricpattern multifrontal approach. Our baseline sequential algorithm is based on umfpack 4 but is somewhat simpler and is often somewhat faster than umfpac ..."
Abstract
Abstract. We present a new parallel sparse LU factorization algorithm and code. The algorithm uses a column-preordering partial-pivoting unsymmetric-pattern multifrontal approach. Our baseline sequential algorithm is based on umfpack 4 but is somewhat simpler and is often somewhat faster than umfpack version 4.0. Our parallel algorithm is designed for shared-memory machines with a small or moderate number of processors (we tested it on up to 32 processors). We experimentally compare our algorithm with SuperLU MT, an existing shared-memory sparse LU factorization with partial pivoting. SuperLU MT scales better than our new algorithm, but our algorithm is more reliable and is usually faster in absolute terms (on up to 16 processors; we were not able to run SuperLU MT on 32). More specifically, on large matrices our algorithm is always faster on up to 4 processors, and is usually faster on 8 and 16. The main contribution of this paper is showing that the column-preordering partial-pivoting unsymmetric-pattern multifrontal approach, developed as a sequential algorithm by Davis in several recent versions of umfpack, can be effectively parallelized.