Results 1 - 5 of 5
Sparse matrix-vector multiplication on FPGAs
In Proceedings of the ACM International Symposium on Field Programmable Gate Arrays, 2005

Cited by 60 (7 self)
Sparse matrix-vector multiplication (SpMXV) is a key computational kernel widely used in scientific and signal processing applications. However, the performance of SpMXV on most modern processors is poor due to the irregular sparsity structure of the matrices. Application-specific processors, including FPGA-based systems, have become promising alternatives for realizing high-performance sparse matrix computations. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of this kernel. In this paper, we propose an FPGA-based algorithm for floating-point SpMXV. Our algorithm employs a circular linear array architecture which can perform multiple floating-point operations in parallel. Our algorithm accepts sparse matrices in column-compressed format (CCF) as inputs, and does not have any requirements on the sparsity structure of the matrices. The architecture overlaps memory I/O time and computation time to improve the performance. The performance of our algorithm is evaluated using various sparse matrices used by the scientific computing community. Our target device is the Xilinx Virtex-II Pro XC2VP100. Experimental results show that for 32-bit floating-point SpMXV, our algorithm achieves 1.2 GFLOPS with 6 PEs at a memory bandwidth of 5.6 GB/s. For 64-bit floating-point SpMXV, our algorithm is able to achieve 950 MFLOPS at a memory bandwidth of 6.5 GB/s. These implementations use less than 15% and 30% of the slices on the XC2VP100, respectively.
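For readers unfamiliar with the column-compressed input format named in this abstract, the following is a minimal software sketch of y = A x over that layout. It is a plain sequential reference and not the paper's circular linear array architecture; the function name `spmv_csc` and the array names `col_ptr`, `row_idx`, and `values` are assumptions chosen for illustration.

```python
def spmv_csc(n_rows, col_ptr, row_idx, values, x):
    """Compute y = A @ x for a matrix A stored in column-compressed form.

    col_ptr[j]..col_ptr[j+1] delimits the nonzeros of column j;
    row_idx[k] and values[k] give the row and value of nonzero k.
    (All names here are hypothetical, not taken from the paper.)
    """
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        xj = x[j]
        # Every nonzero in column j scatters a contribution into y.
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * xj
    return y


# Example matrix:
# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
col_ptr = [0, 2, 3, 5]
row_idx = [0, 2, 1, 0, 2]
values = [1.0, 4.0, 3.0, 2.0, 5.0]

y = spmv_csc(3, col_ptr, row_idx, values, [1.0, 1.0, 1.0])
```

Because each column scatters into arbitrary rows of y, the access pattern depends on the sparsity structure, which is the irregularity the abstract refers to.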
Mapping Sparse Matrix-Vector Multiplication on FPGAs

Cited by 2 (0 self)
Higher peak performance on Field Programmable Gate Arrays (FPGAs) than on microprocessors has been shown for sparse matrix-vector multiplication (SpMxV) accelerator designs. However, due to the frequent memory movement in SpMxV, system performance is heavily affected by memory bandwidth and overheads in real applications. In this paper, we introduce SSF, an innovative SpMxV solver designed for FPGAs. Besides high computational throughput, system performance is optimized by minimizing and overlapping I/O operations, reducing initialization time and overhead, and increasing scalability. The potential of using mixed (64-bit, 32-bit) data formats to increase system performance is also explored. SSF accepts any matrix size and easily adapts to different data formats. SSF minimizes resource costs and uses concise control logic by taking advantage of the data flow via innovative floating-point accumulation logic. To analyze the performance, a performance model is defined for SpMxV on FPGAs. Compared to microprocessors, SSF has speedups of up to 20x and depends less on the sparsity structure.
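The abstract does not state its performance model, so the following is only a generic back-of-the-envelope sketch of why memory bandwidth and data width matter for SpMxV. It assumes that each nonzero costs one multiply and one add, that only the matrix value and one index per nonzero are streamed from memory, and that the vectors stay on chip. The function name, the 8 GB/s bandwidth, and the 4-byte index width are all hypothetical.

```python
def bandwidth_bound_flops(bandwidth_bytes_per_s, value_bytes, index_bytes):
    """Upper bound on SpMxV FLOP/s when streaming the matrix is the bottleneck.

    Assumption (not from the paper): 2 floating-point operations per
    nonzero, and (value_bytes + index_bytes) bytes fetched per nonzero.
    """
    bytes_per_nonzero = value_bytes + index_bytes
    flops_per_nonzero = 2  # one multiply, one add
    return flops_per_nonzero * bandwidth_bytes_per_s / bytes_per_nonzero


bandwidth = 8e9  # hypothetical 8 GB/s memory interface

bound_64bit = bandwidth_bound_flops(bandwidth, value_bytes=8, index_bytes=4)
bound_32bit = bandwidth_bound_flops(bandwidth, value_bytes=4, index_bytes=4)

print(f"64-bit values: {bound_64bit / 1e9:.2f} GFLOP/s bound")
print(f"32-bit values: {bound_32bit / 1e9:.2f} GFLOP/s bound")
```

Under these assumptions, halving the value width raises the bound by a factor of 1.5 rather than 2, because the index traffic is unchanged. This is one plausible reason a mixed 64-bit/32-bit format could be attractive.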
Sparse Matrix Vector Multiplication Kernel on a Reconfigurable Computer, Proc. 9th Ann. High-Performance Embedded Computing Workshop, MIT Lincoln Laboratory, 2005; www.ll.mit.edu/HPEC/agendas/proc05/HPEC05_Open.pdf.
THROUGHPUT IMPROVEMENT OF MOLECULAR DYNAMICS SIMULATIONS USING RECONFIGURABLE COMPUTING
© 2007 SWPS
Abstract. A number of grand-challenge scientific applications are unable to harness the teraflops-scale computing capabilities of massively parallel processing (MPP) systems due to their inherent scaling limits. For these applications, multi-paradigm computing systems that provide additional computing capability per processing node using accelerators are a viable solution. Among various generic and custom-designed accelerators that represent a data-parallel programming paradigm, FPGA devices provide a number of performance-enhancing features including concurrency, deep pipelining and streaming in a flexible manner. We demonstrate acceleration of a production-level biomolecular simulation, in which typical speedups are less than 20 on even the most powerful supercomputing systems, on an FPGA-enabled system with a high-level programming interface. Using accurate models of our FPGA implementation and parallel efficiency results obtained on the Cray XT3 system, we project that the time-to-solution is reduced significantly as compared to the microprocessor-only execution times. A further advantage of computing with FPGA-enabled systems over microprocessor-only implementations is performance sustainability for large-scale problems. The computational complexity of a biomolecular simulation is proportional to its problem size, hence the runtime on a microprocessor increases at a much faster rate as compared to FPGA-enabled systems, which are capable of providing very high throughput for compute-intensive operations, thereby sustaining performance for large-scale problems. Key words. field programmable gate arrays, molecular modeling, performance modeling and projections
Sparse Matrix-Vector Multiplication on FPGAs
Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientific and engineering applications. The poor data locality of sparse matrices significantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of SpMXV. In this paper, we propose an FPGA-based design for SpMXV. Our design accepts sparse matrices in Compressed Row Storage format, and makes no assumptions about the sparsity structure of the input matrix. The design employs IEEE-754 format double-precision floating-point multipliers/adders, and performs multiple floating-point operations as well as I/O operations in parallel. The performance of our design for SpMXV is evaluated using various sparse matrices from the scientific computing community, with the Xilinx Virtex-II Pro XC2VP70 as the target device. The MFLOPS performance increases with the hardware resources on the device as well as the available memory bandwidth. For example, when the memory bandwidth is 8 GB/s, our design achieves over 350 MFLOPS for all the test matrices. It demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure. Besides solving the SpMXV problem, our design provides a parameterized and flexible tree-based design for floating-point applications on FPGAs.
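As a rough software illustration of the Compressed Row Storage input format mentioned in this abstract, the sketch below computes each row of y = A x as a dot product and sums the products pairwise. The pairwise sum is only one way to picture a tree-shaped reduction; the abstract does not describe the actual reduction circuit, and the names `tree_sum`, `spmv_crs`, `row_ptr`, and `col_idx` are assumptions made for this example.

```python
def tree_sum(terms):
    """Sum a list by pairwise (tree-shaped) reduction; returns 0.0 if empty."""
    if not terms:
        return 0.0
    while len(terms) > 1:
        # Add neighbours two at a time, as one level of a binary adder tree.
        paired = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            paired.append(terms[-1])  # odd element passes through this level
        terms = paired
    return terms[0]


def spmv_crs(row_ptr, col_idx, values, x):
    """Compute y = A @ x for A in Compressed Row Storage form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i.
    (Hypothetical names; a sequential sketch, not the FPGA design.)
    """
    y = []
    for i in range(len(row_ptr) - 1):
        products = [values[k] * x[col_idx[k]]
                    for k in range(row_ptr[i], row_ptr[i + 1])]
        y.append(tree_sum(products))
    return y


# Example matrix:
# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

y = spmv_crs(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

In this row-oriented layout the products within a row are independent, so they can be computed in parallel and then combined; the gathers from x are where the poor data locality noted in the abstract shows up.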