The design and implementation of FFTW3
- PROCEEDINGS OF THE IEEE
, 2005
Cited by 726 (3 self)
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm.
A survey of general-purpose computation on graphics hardware
, 2007
Cited by 554 (15 self)
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware.
The Landscape of Parallel Computing Research: A View from Berkeley
- TECHNICAL REPORT, UC BERKELEY
, 2006
A Fast Fourier Transform Compiler
, 1999
Cited by 199 (5 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
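The idea of generating a size-specialized transform from a program can be illustrated with a small sketch. This is not genfft and makes no attempt to reproduce its algorithms or its C output: it simply emits Python source for an unrolled O(n²) DFT of a fixed length with the twiddle constants folded in, then compiles it. The names `generate_dft_source`, `compile_dft`, and `reference_dft` are hypothetical helpers introduced only for this illustration.

```python
import cmath


def generate_dft_source(n):
    """Emit Python source for an unrolled size-n DFT (illustrative only)."""
    lines = [f"def dft_{n}(x):", "    return ["]
    for k in range(n):
        terms = []
        for j in range(n):
            # Fold the twiddle factor exp(-2*pi*i*j*k/n) into a literal constant.
            w = cmath.exp(-2j * cmath.pi * j * k / n)
            terms.append(f"x[{j}] * complex({w.real!r}, {w.imag!r})")
        lines.append("        " + " + ".join(terms) + ",")
    lines.append("    ]")
    return "\n".join(lines)


def compile_dft(n):
    """Compile the generated source and return the specialized function."""
    namespace = {}
    exec(generate_dft_source(n), namespace)
    return namespace[f"dft_{n}"]


def reference_dft(x):
    """Direct evaluation of the DFT definition, used to check the generated code."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


dft_4 = compile_dft(4)
spectrum = dft_4([1, 2, 3, 4])
```

A real generator in this style would go further, for example by simplifying the expression graph and exploiting symmetries before emitting code; the sketch only shows the generate-then-compile structure.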
OSKI: A library of automatically tuned sparse matrix kernels
- Institute of Physics Publishing
, 2005
Fourier slice photography
- In Proceedings of the International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’05)
, 2005
Cited by 127 (4 self)
SPL: A Language and Compiler for DSP Algorithms
, 2001
Cited by 102 (10 self)
We discuss the design and implementation of a compiler that translates formulas representing signal processing transforms into efficient C or Fortran programs. The formulas are represented in a language that we call SPL, an acronym for Signal Processing Language. The compiler is a component of the SPIRAL system, which makes use of formula transformations and intelligent search strategies to automatically generate optimized digital signal processing (DSP) libraries. After a discussion of the translation and optimization techniques implemented in the compiler, we use SPL formulations of the fast Fourier transform (FFT) to evaluate the compiler. Our results show that SPIRAL, which can be used to implement many classes of algorithms, produces programs that perform as well as "hard-wired" systems like FFTW.
A comparison of empirical and model-driven optimization
- In ACM Symp. on Programming Language Design and Implementation (PLDI’03)
, 2003
Cited by 99 (12 self)
Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance. In contrast, conventional compilers use models of programs and machines to choose these parameters. It is widely believed that empirical optimization is more effective than model-driven optimization, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the empirical optimization engine in ATLAS (a system for generating dense numerical linear algebra libraries) with a model-based optimization engine that used detailed models to estimate values for optimization parameters, and then measured the relative performance of the two systems on three different hardware platforms. Our experiments show that although model-based optimization can be surprisingly effective, useful models may have to consider not only hardware parameters but also the ability of back-end compilers to exploit hardware resources.
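The empirical approach described in this abstract (run several versions, keep the fastest) can be sketched in a few lines. This is a toy illustration, not the ATLAS search engine: `blocked_sum` is a hypothetical stand-in for a tunable kernel, `empirical_search` is a hypothetical helper, and the candidate block sizes are arbitrary assumptions.

```python
import timeit


def blocked_sum(data, block):
    """Sum a list in chunks of `block` elements (stand-in for a tunable kernel)."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def empirical_search(kernel, data, candidates, repeats=3):
    """Time `kernel` for each candidate parameter value and return the fastest."""
    timings = {}
    for block in candidates:
        timer = timeit.Timer(lambda: kernel(data, block))
        # Take the minimum over several repeats to reduce timing noise.
        timings[block] = min(timer.repeat(repeat=repeats, number=5))
    best = min(timings, key=timings.get)
    return best, timings


data = list(range(20_000))
best_block, timings = empirical_search(blocked_sum, data, [16, 64, 256, 1024])
```

A model-driven optimizer would instead compute the parameter from a description of the machine (for example, cache size) without running any candidates; which value wins in the sketch above depends on the machine it runs on.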
Self adapting linear algebra algorithms and software
, 2004
Cited by 93 (23 self)
One of the main obstacles to the efficient solution of scientific problems is the problem of tuning software, both to the available architecture and to the user problem at hand. We describe approaches for obtaining tuned high-performance kernels, and for automatically choosing suitable algorithms. Specifically, we describe the generation of dense and sparse BLAS kernels, and the selection of linear solver algorithms. However, the ideas presented here extend beyond these areas, which can be considered proof of concept.