Results 1–10 of 53,289
Linear Algebra Operators for GPU Implementation of Numerical Algorithms
ACM Transactions on Graphics, 2003
"... In this work, the emphasis is on the development of strategies to realize techniques of numerical computing on the graphics chip. In particular, the focus is on the acceleration of techniques for solving sets of algebraic equations as they occur in numerical simulation. We introduce a framework for the implementation of linear algebra operators on programmable graphics processors (GPUs), thus providing the building blocks for the design of more complex numerical algorithms. In particular, we propose a stream model for arithmetic operations on vectors and matrices that exploits the intrinsic parallelism ..."
Cited by 324 (9 self)
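The stream model this snippet alludes to treats vectors and matrices as data streams over which one arithmetic kernel runs element-wise in parallel. As a rough illustration only (the paper itself maps operands to GPU textures and runs fragment programs; NumPy's data-parallel kernels merely play an analogous role here):

```python
import numpy as np

# Illustrative analogy, not the paper's GPU implementation: each
# expression below applies one kernel uniformly across a whole
# vector or matrix, which is the essence of the stream model.
n = 8
x = np.arange(n, dtype=np.float64)
y = np.ones(n)
A = np.eye(n) * 2.0

saxpy = 2.0 * x + y   # one "stream" operation over every element
matvec = A @ x        # per-row dot products, also data-parallel

print(saxpy[:4])      # first few entries of 2*x + 1
print(matvec[:4])     # first few entries of A @ x
```

On a GPU each of these expressions would become a single pass of a fragment program over the texture holding the operand, rather than an explicit loop.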
Data Prefetching and Multilevel Blocking for Linear Algebra Operations
In Proceedings of ICS'96, 1996
"... Much effort has been directed towards obtaining near-peak performance for linear algebra operations on current high-performance workstations. The large number of data accesses, however, makes performance highly dependent on the behavior of the memory hierarchy. Techniques such as multilevel blocking ..."
Cited by 8 (3 self)
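Multilevel blocking, named in the snippet above, tiles the loops of a dense kernel so that each small working set stays resident in a given level of the memory hierarchy. A minimal single-level sketch (illustrative, not the paper's code; the paper applies further levels for registers, L1, and L2):

```python
import numpy as np

def blocked_matmul(A, B, bs=4):
    """One level of cache blocking for C = A @ B: loops are tiled so
    each bs x bs sub-block of A, B, and C is reused while it is
    still resident in cache."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, bs):
        for k0 in range(0, n, bs):
            for j0 in range(0, n, bs):
                C[i0:i0+bs, j0:j0+bs] += (
                    A[i0:i0+bs, k0:k0+bs] @ B[k0:k0+bs, j0:j0+bs]
                )
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
print(np.allclose(blocked_matmul(A, B), A @ B))  # same result as A @ B
```

Multilevel versions nest further tilings of the same three loops, one per cache level, with the prefetching the title mentions overlapping the loads of the next tile.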
Basic Linear Algebra Operations in SLI Arithmetic
"... Symmetric level-index arithmetic was introduced to overcome recognized limitations of floating-point systems, most notably overflow and underflow. The original recursive algorithms for arithmetic operations could be parallelized to some extent, particularly when applied to extended sums or products, and a SIMD software implementation of some of these algorithms is described. The main purpose of this paper is to present parallel SLI algorithms for arithmetic and basic linear algebra operations ..."
High Performance Linear Algebra Operations on Reconfigurable Systems
2005
"... Field-Programmable Gate Arrays (FPGAs) have become an attractive option for scientific computing. Several vendors have developed high-performance reconfigurable systems which employ FPGAs for application acceleration. In this paper, we propose a BLAS (Basic Linear Algebra Subprograms) library for s ..."
Cited by 29 (4 self)
Scheduling dense linear algebra operations on multicore processors
Concurrency Computat.: Pract. Exper., 2010
"... ..."
Scheduling linear algebra operations on multicore processors
"... State-of-the-art dense linear algebra software, such as the LAPACK and ScaLAPACK libraries, suffers performance losses on multicore processors due to its inability to fully exploit thread-level parallelism. At the same time, the coarse-grain dataflow model gains popularity as a paradigm for programm ..."
Cited by 5 (2 self)
A domain-specific compiler for linear algebra operations
In High Performance Computing for Computational Science – VECPAR 2010, 2013
"... We present a prototypical linear algebra compiler that automatically exploits domain-specific knowledge to generate high-performance algorithms. The input to the compiler is a target equation together with knowledge of both the structure of the problem and the properties of the operands. T ..."
Cited by 2 (0 self)
Basic Concepts for Distributed Sparse Linear Algebra Operations
1994
"... We introduce basic concepts for describing the communication patterns in common operations such as the matrix times vector and matrix transpose times vector product, where the matrix is sparse and stored on distributed processors. At first we will describe a simple one-dimensional part ..."
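The one-dimensional partitioning this snippet begins to describe assigns each processor a contiguous block of matrix rows plus the matching slice of the vector; the communication pattern is then exactly the set of remote vector entries indexed by off-block columns of the local nonzeros. A hypothetical sketch of computing that pattern (all names and the toy matrix are illustrative, not from the paper):

```python
from collections import defaultdict

# Toy setup: n entries of x split evenly over nprocs "processors",
# each owning a block of rows and the matching slice of x.
n, nprocs = 8, 2
rows_per_proc = n // nprocs
# Sparse matrix stored as (row, col, value) triples.
entries = [(0, 0, 2.0), (0, 5, 1.0), (3, 2, 4.0), (5, 1, 3.0), (7, 7, 1.0)]

def owner(index):
    """Processor owning a given row of A (and the same index of x)."""
    return index // rows_per_proc

# For each processor, collect the x indices it must receive from
# other processors before it can form its rows of A @ x.
needed = defaultdict(set)
for r, c, _ in entries:
    p = owner(r)
    if owner(c) != p:          # this column's x entry lives elsewhere
        needed[p].add(c)

print(dict(needed))
```

For the transpose product the same index sets reverse roles, describing what each processor must send rather than receive.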