The design and implementation of a new out-of-core sparse Cholesky factorization method
 ACM Transactions on Mathematical Software
Cited by 34 (4 self)
We describe a new out-of-core sparse Cholesky factorization method. The new method uses the elimination tree to partition the matrix, an advanced subtree-scheduling algorithm, and both right-looking and left-looking updates. The implementation of the new method is efficient and robust. On a 2 GHz personal computer with 768 MB of main memory, the code can easily factor matrices with factors of up to 48 GB, usually at rates above 1 Gflop/s. For example, the code can factor AUDIKW, currently the largest matrix in any matrix collection (factor size over 10 GB), in a little over an hour, and can factor a matrix whose graph is a 140-by-140-by-140 mesh in about 12 hours (factor size around 27 GB).
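The abstract above relies on the elimination tree to partition the matrix. As a rough illustration only (this is not the paper's code), the tree can be computed from the nonzero pattern of a symmetric matrix with the well-known ancestor-compression approach. The function name `elimination_tree` and the input layout `upper_cols` (for each column j, the row indices i < j of its nonzeros) are assumptions made for this sketch.

```python
def elimination_tree(n, upper_cols):
    """Return parent[] of the elimination tree; -1 marks a root.

    upper_cols[j] lists the rows i < j with a nonzero A[i, j]
    (hypothetical input layout chosen for this sketch).
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current root
    for j in range(n):
        for i in upper_cols[j]:
            # Climb from i toward its root, redirecting shortcuts to j.
            while i != -1 and i < j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt == -1:
                    parent[i] = j  # i had no parent yet, so j becomes it
                i = nxt
    return parent


# Tridiagonal 4-by-4 pattern: the tree is a chain 0 -> 1 -> 2 -> 3.
print(elimination_tree(4, [[], [0], [1], [2]]))  # [1, 2, 3, -1]
```

Subtrees of the resulting tree are natural candidates for units of work that fit in main memory, which is the general idea behind tree-based partitioning; the scheduling details of the paper are not reproduced here.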
Parallel and fully recursive multifrontal supernodal sparse Cholesky
 Future Generation Computer Systems
, 2004
Cited by 19 (3 self)
We describe the design, implementation, and performance of a new parallel sparse Cholesky factorization code. The code uses a multifrontal factorization strategy. Operations on small dense submatrices are performed using new dense-matrix subroutines that are part of the code, although the code can also use the BLAS and LAPACK. The new code is recursive at both the sparse and the dense levels; it uses a novel recursive data layout for dense submatrices, and it is parallelized using Cilk, an extension of C specifically designed to parallelize recursive codes. We demonstrate that the new code performs well and scales well on SMPs. In particular, on up to 16 processors, the code outperforms two state-of-the-art message-passing codes. The scalability and high performance that the code achieves imply that recursive schedules, blocked data layouts, and dynamic scheduling are effective in the implementation of sparse factorization codes.
On the Measurement of Preferences in the Analytic Hierarchy Process
 Journal of Multi-Criteria Decision Analysis
, 1997
Cited by 12 (0 self)
An enterprise is categorized as an SME if it has fewer than 200 employees and fixed capital of less than 200 million baht, excluding land and buildings. It was reported that there are more than 80,000 SMEs in Thailand. Thus, SMEs are the main blood vessels of the Thai economy. Since the Thai economic collapse in 1997, large numbers of Thai SMEs have gone bankrupt and been wiped out of their industries. This has resulted in SMEs that are tolerant of the changing economy and the new environment. The objectives of this study are to investigate the status of intellectual capital (IC) of SMEs in Thailand and to enhance the awareness of SME entrepreneurs regarding the value of IC in their companies. The findings from this study report the recent status of IC in Thai SMEs. They should be helpful for enterprises that want to improve their management and maximize their IC assets.
Parallel Application Software on High Performance Computers: Parallel Diagonalisation Routines.
, 1996
Cited by 7 (1 self)
In this report we list diagonalisation routines available for parallel computers. The methodology of each routine is outlined together with benchmark results on a typical matrix where available. Storage requirements and the advantages and disadvantages of each method are also compared. The vast majority of these routines are available for real dense symmetric matrices only, although there is a known requirement for other data types, such as Hermitian or structured sparse matrices. We will report on new codes as they become available. This report is available from http://www.dl.ac.uk/TCSC/HPCI/ © 1996, Daresbury Laboratory. We do not accept any responsibility for loss or damage arising from the use of information contained in any of our reports or in any communication about our tests or investigations.
WSMP: A High-Performance Shared- and Distributed-Memory Parallel Sparse Linear Equation Solver
, 2001
Cited by 6 (1 self)
The Watson Sparse Matrix Package, WSMP, is a high-performance, robust, and easy-to-use software package for solving large sparse systems of linear equations. It can be used as a serial package, in a shared-memory multiprocessor environment, or as a scalable parallel solver in a message-passing environment, where each node can be either a uniprocessor or a shared-memory multiprocessor. A unique aspect of WSMP is that it exploits both SMP and MPP parallelism using Pthreads and MPI, respectively, while mostly shielding the user from the details of the architecture. Sparse symmetric factorization in WSMP has been clocked at up to 1.2 Gigaflops on RS6000 workstations with two 200 MHz Power3 CPUs and in excess of 90 Gigaflops on a 128-node (256-processor) SP with two-way SMP 200 MHz Power3 nodes. This paper gives an overview of the algorithms, implementation aspects, performance results, and the user interface of WSMP for solving symmetric sparse systems of linear equations. Key words. Parallel software, Scientific computing, Sparse linear systems, Sparse matrix factorization, High-performance computing
An out-of-core sparse symmetric-indefinite factorization method
 CODEN ACMSCU. ISSN 0098-3500 (print), 1557-7295 (electronic).
, 2006
Cited by 5 (0 self)
We present a new out-of-core sparse symmetric-indefinite factorization algorithm. The most significant innovation of the new algorithm is a dynamic partitioning method for the sparse factor. This partitioning method results in very low I/O traffic and allows the algorithm to run at high computational rates, even though the factor is stored on a slow disk. Our implementation of the new code compares well with both high-performance in-core sparse symmetric-indefinite codes and a high-performance out-of-core sparse Cholesky code.
Locality of reference in sparse Cholesky factorization methods
 SUBMITTED TO THE ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS
, 2005
Cited by 3 (1 self)
This paper analyzes the cache efficiency of two high-performance sparse Cholesky factorization algorithms: the multifrontal algorithm and the left-looking algorithm. These are essentially the only two algorithms used in current codes; generalizations of these algorithms are used in general-symmetric and general-unsymmetric sparse triangular factorization codes. Our theoretical analysis shows that while both algorithms sometimes enjoy a high level of data reuse in the cache, they are incomparable: there are matrices on which one is cache efficient and the other is not, and vice versa. The theoretical analysis is backed up by detailed experimental evidence, which shows that our theoretical analyses do predict cache-miss rates and performance in practice, even though the theory uses a fairly simple cache model. We also show, experimentally, that on matrices arising from finite-element structural analysis, the left-looking algorithm consistently outperforms the multifrontal algorithm. Direct cache-miss measurements indicate that the difference in performance is largely due to differences in the number of level-2 cache misses that the two algorithms generate. Finally, we also show that there are matrices where the multifrontal algorithm may require significantly more memory than the left-looking algorithm. On the other hand, the left-looking algorithm never uses more memory than the multifrontal one. Key words. Cholesky factorization, sparse Cholesky, multifrontal methods, cache-efficiency, locality of reference. AMS subject classifications. 15A23, 65F05, 65F50, 65Y10, 65Y20
Using postordering and static symbolic factorization for parallel sparse LU
 In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS)
, 2000
Cited by 3 (0 self)
In this paper we present several improvements to widely used parallel LU factorization methods for sparse matrices. First we introduce the LU elimination forest, and then we characterize the L and U factors in terms of their corresponding LU elimination forest. This characterization can be used as a compact storage scheme for the matrix as well as for the task dependence graph. To improve the use of BLAS in the numerical factorization, we perform a postorder traversal of the LU elimination forest, thus obtaining larger supernodes. To expose more task parallelism for a sparse matrix, we build a more accurate task dependence graph that includes only the necessary dependences. In experiments on the SGI Origin2000 multiprocessor, our methods compared favorably against methods implemented in the S* environment.
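The postorder traversal mentioned in this abstract can be illustrated on any forest stored as a parent array. The sketch below is a generic, iterative postorder and is not the paper's LU elimination forest code; the function name `postorder` and the convention that `-1` marks a root are assumptions for illustration.

```python
def postorder(parent):
    """Return the nodes of a forest in postorder (children before parents).

    parent[v] is the parent of node v, or -1 if v is a root
    (hypothetical convention chosen for this sketch).
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for v, p in enumerate(parent):
        if p == -1:
            roots.append(v)
        else:
            children[p].append(v)

    order = []
    for root in roots:
        # Each stack entry is (node, index of the next child to visit).
        stack = [(root, 0)]
        while stack:
            v, k = stack.pop()
            if k < len(children[v]):
                stack.append((v, k + 1))
                stack.append((children[v][k], 0))
            else:
                order.append(v)  # all children done, emit the node
    return order


# Node 2 is the root, 3 is its child, and 0 and 1 are children of 3.
print(postorder([3, 3, -1, 2]))  # [0, 1, 3, 2]
```

Renumbering columns in such an order places each subtree in a contiguous index range, which is one reason a postorder tends to make adjacent columns with related structure easier to merge into supernodes.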
Running an Operational Baltic Sea Model on the T3E
, 1999
Cited by 2 (1 self)
The Swedish Meteorological and Hydrological Institute (SMHI) makes daily forecasts of currents, temperature, salinity, water level, and ice conditions in the Baltic Sea. These forecasts are based on data from a High Resolution Operational Model for the Baltic (HIROMB) running on a CRAY T3E. Up to three grids with different resolutions and coverage are involved in the calculation. The more extended but coarser grids provide boundary conditions to the finer grids. Our parallelization strategy is based on a subdivision of the computational grids into a set of smaller rectangular grid blocks, which are distributed onto the parallel processors. The linear equation systems for water level and ice dynamics are solved with a distributed multifrontal solver. Here we present new performance results for a grid resolution of 1 nautical mile. We have introduced a new domain decomposition method and compare it with the former one. Load balance is now much improved. Most of the time, ice covers only a small fraction of the whole grid. Hence, with the water decomposition, the ice dynamics computation would be severely ill-balanced. We now use a separate ice decomposition, which also allows a good load balance for ice dynamics.
Finding exact and approximate block structures for ILU preconditioning
, 2001
Cited by 2 (1 self)
Sparse matrices that arise in many applications often possess a block structure that can be exploited in iterative and direct solution methods. These block matrices have as their entries small dense blocks with constant or variable dimensions. Block versions of incomplete LU factorizations, which have been developed to take advantage of such structures, give rise to a class of preconditioners that are among the most effective available. This paper presents general techniques for automatically determining block structures in sparse matrices. A standard 'graph compression' algorithm used in direct sparse matrix methods is considered, along with two other algorithms that are also capable of unraveling approximate block structures.
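The idea of exact block detection can be sketched by grouping rows whose nonzero column patterns are identical. This is only a simplified illustration using hashing; it is not the graph compression algorithm or the approximate methods of the paper, and the function name `group_identical_rows` and the list-of-lists input format are assumptions.

```python
def group_identical_rows(patterns):
    """Group row indices that share exactly the same set of nonzero columns.

    patterns[r] lists the column indices of the nonzeros in row r
    (hypothetical input layout chosen for this sketch).
    """
    groups = {}
    for row, cols in enumerate(patterns):
        key = frozenset(cols)  # order-independent signature of the pattern
        groups.setdefault(key, []).append(row)
    # Sort groups by their first row so the output order is deterministic.
    return sorted(groups.values())


# Rows 0-1 and rows 2-3 share patterns, so they form 2-by-2 blocks.
rows = [[0, 1], [1, 0], [2, 3], [2, 3], [4]]
print(group_identical_rows(rows))  # [[0, 1], [2, 3], [4]]
```

Approximate block structures would need a similarity measure between patterns rather than exact equality, which this sketch does not attempt.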