Results 1 - 7 of 7
Developments and Trends in the Parallel Solution of Linear Systems
, 1999
Cited by 7 (0 self)
In this review paper, we consider some important developments and trends in algorithm design for the solution of linear systems, concentrating on aspects that involve the exploitation of parallelism. We briefly discuss the solution of dense linear systems, before studying the solution of sparse equations by direct and iterative methods. We consider preconditioning techniques for iterative solvers and discuss some of the present research issues in this field.
The impact of high performance computing in the solution of linear systems: trends and problems
, 1999
Cited by 6 (0 self)
We review the influence of the advent of high performance computing on the solution of linear equations. We will concentrate on direct methods of solution and consider both the case when the coefficient matrix is dense and when it is sparse. We will examine the current performance of software in this area and speculate on what advances we might expect in the early years of the next century.
Keywords: sparse matrices, direct methods, parallelism, matrix factorization, multifrontal methods.
AMS(MOS) subject classifications: 65F05, 65F50.
Current reports available at http://www.cerfacs.fr/algor/algo reports.html. Also appeared as Technical Report RAL-TR-1999-072 from Rutherford Appleton Laboratory, Oxfordshire.
duff@cerfacs.fr. Also at Atlas Centre, RAL, Oxon OX11 0QX, England.
Contents: 1 Introduction; 2 Building blocks; 3 Factorization of dense matrices; 4 Factorization of sparse matrices; 5 Parallel computation; 6 Current situation; 7 F...
XKAAPI: a Multi Paradigm Runtime for Multicore Architectures
, 2013
Cited by 3 (0 self)
The paper presents XKAAPI, a compact runtime for multicore architectures that brings multiple parallel paradigms (parallel independent loops, fork-join tasks and dataflow tasks) together in a unified framework without performance penalty. Comparisons on independent loops with OpenMP and on dense linear algebra with QUARK/PLASMA confirm our design decisions. Applied to EUROPLEXUS, an industrial simulation code for fast transient dynamics, we show that XKAAPI achieves high speedups on multicore architectures by efficiently parallelizing both independent loops and dataflow tasks.
Task Scheduling Using a Block Dependency DAG for Block-Oriented Sparse Cholesky Factorization
 in: Proceedings of 14th ACM Symposium on Applied Computing
, 2000
Cited by 1 (0 self)
Block-oriented sparse Cholesky factorization decomposes a sparse matrix into rectangular subblocks; each block can then be handled as a computational unit in order to increase data reuse in a hierarchical memory system. Also, the factorization method increases the degree of concurrency with the reduction of communication volumes so that it performs more efficiently on a distributed-memory multiprocessor system than the customary column-oriented factorization method. But until now, mapping of blocks to processors has been designed for load balance with restricted communication patterns. In this paper, we represent tasks using a block dependency DAG that shows the execution behavior of block sparse Cholesky factorization in a distributed-memory system. Since the characteristics of tasks for the block Cholesky factorization are different from those of the conventional parallel task model, we propose a new task scheduling algorithm using a block dependency DAG. The proposed algorithm consi...
Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor
, 2013
Cited by 1 (0 self)
This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus and XKaapi on the same benchmark suite and we provide comparisons between an Intel Xeon Phi coprocessor and a Sandy Bridge Xeon-based machine. Our benchmark suite is composed of three computing kernels: a Fibonacci computation that allows us to study the overhead and the scalability of the runtime system, an N-Queens application generating irregular and dynamic tasks, and a Cholesky factorization algorithm. We also compare the Cholesky factorization with the parallel algorithm provided by the Intel MKL library for Intel Xeon Phi. Performance evaluation shows our XKaapi dataflow parallel programming environment exposes the lowest overhead of all and is highly competitive with native OpenMP and CilkPlus environments on Xeon Phi. Moreover, the efficient handling of dataflow dependencies between tasks makes our XKaapi environment exhibit more parallelism for some applications such as the Cholesky factorization. In that case, we observe substantial gains with up to 180 hardware threads over the state-of-the-art MKL, with a 47% performance increase for 60 hardware threads.
Malleable Tasks: An Efficient Model For Solving Actual Parallel Applications
, 1999
The purpose of this paper is to promote the model of Malleable Tasks for efficiently solving actual parallel applications. Malleable Tasks are presented and discussed in regard to other classical models. We show how this approach has been applied to implement two actual applications, namely, the simulation of the circulations in the Atlantic Ocean, and large sparse Cholesky factorization.
"... 6. For tridiagonals T replace T with LDL ..."