Results 1–10 of 72
A Fast Fourier Transform Compiler
, 1999
Cited by 193 (5 self)
The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft “discovered” algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
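The compilation strategy can be illustrated with a toy sketch (hypothetical code, not genfft itself): build a symbolic expression DAG for a fixed-size DFT via the radix-2 Cooley–Tukey recursion, then emit straight-line C assignments with common subexpressions shared. genfft additionally folds constants, applies algebraic simplification, and schedules for registers; none of that is shown here.

```python
def dft_exprs(n, inputs):
    """Symbolic expressions for an n-point DFT (n a power of 2),
    using the radix-2 Cooley-Tukey recursion."""
    if n == 1:
        return [inputs[0]]
    evens = dft_exprs(n // 2, inputs[0::2])
    odds = dft_exprs(n // 2, inputs[1::2])
    out = [None] * n
    for k in range(n // 2):
        t = ('mul', ('w', n, k), odds[k])   # ('w', n, k): twiddle e^{-2*pi*i*k/n}, kept symbolic
        out[k] = ('add', evens[k], t)
        out[k + n // 2] = ('sub', evens[k], t)
    return out

def emit_c(exprs):
    """Emit straight-line C-like assignments, one temporary per DAG node,
    reusing temporaries for common subexpressions."""
    code, names, ctr = [], {}, [0]
    def name(e):
        if isinstance(e, str):
            return e                          # an input variable
        if e in names:
            return names[e]                   # common-subexpression reuse
        if e[0] == 'w':
            return "W%d_%d" % (e[1], e[2])    # precomputed twiddle constant
        a, b = name(e[1]), name(e[2])
        t = "t%d" % ctr[0]; ctr[0] += 1
        sym = {'add': '+', 'sub': '-', 'mul': '*'}[e[0]]
        code.append("E %s = %s %s %s;" % (t, a, sym, b))
        names[e] = t
        return t
    outs = [name(e) for e in exprs]
    return code, outs
```

Calling `emit_c(dft_exprs(4, ["x0", "x1", "x2", "x3"]))` yields an unrolled size-4 DFT as a list of assignments, the shape of code a codelet generator produces.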
Similarity search over time series data using wavelets
 In ICDE
, 2002
Cited by 81 (0 self)
We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this application is the Haar wavelet. In this work, we observe that a large class of wavelet transformations (not only orthonormal wavelets but also bi-orthonormal wavelets) can be used to support similarity search. This class includes the most popular and most effective wavelets being used in image compression. We present a detailed performance study of the effects of using different wavelets on the performance of similarity search for time-series data. We include several wavelets that outperform both the Haar wavelet and the best known non-wavelet transformations for this application. To ensure our results are usable by an application engineer, we also show how to configure an indexing strategy for the best performing transformations. Finally, we identify classes of data that can be indexed efficiently using these wavelet transformations.
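The key property is easy to sketch: the orthonormal Haar transform preserves Euclidean distance, so truncating to the first few coefficients can only underestimate it, which is exactly the no-false-dismissal guarantee an index needs (illustrative code, not from the paper):

```python
import math

def haar(x):
    """Orthonormal Haar transform of a sequence whose length is a power
    of 2, returned coarse-to-fine: overall average first, finest details last."""
    x = list(x)
    out = []
    while len(x) > 1:
        avgs = [(a + b) / math.sqrt(2) for a, b in zip(x[0::2], x[1::2])]
        diffs = [(a - b) / math.sqrt(2) for a, b in zip(x[0::2], x[1::2])]
        out = diffs + out          # prepend, so coarser levels come first
        x = avgs
    return x + out

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

Because the transform is orthonormal, `dist(haar(x), haar(y)) == dist(x, y)`, and keeping only the first k coefficients gives a lower bound on the true distance.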
Virtual Radios
, 1998
Cited by 49 (5 self)
Conventional software radios take advantage of vastly improved A/D converters and DSP hardware. Our approach, which we refer to as virtual radios, also depends upon high performance A/D converters. However, rather than use DSPs, we have chosen to ride the curve of rapidly improving workstation hardware. We use wideband digitization and then perform all of the digital signal processing in user space on a general purpose workstation. This approach allows us to experiment with new approaches to signal processing that exploit the hardware and software resources of the workstation. Furthermore, it allows us to experiment with different ways of structuring systems in which the radio components of communication devices are integrated with higher-level applications. This paper describes the design and performance of an environment we have constructed that facilitates building virtual radios, and of two applications built using that environment. The environment consists of an I/O subsystem that p...
Architecture-Cognizant Divide and Conquer Algorithms
, 1999
Cited by 27 (5 self)
Divide and conquer programs can achieve good performance on parallel computers and computers with deep memory hierarchies. We introduce architecture-cognizant divide and conquer algorithms, and explore how they can achieve even better performance. An architecture-cognizant algorithm has functionally equivalent variants of the divide and/or combine functions, and a variant policy that specifies which variant to use at each level of recursion. An optimal variant policy is chosen for each target computer via experimentation. With h levels of recursion, an exhaustive search requires O(v^h) experiments (where v is the number of variants). We present a method based on dynamic programming that reduces this to O(h^c) experiments (where c is typically a small constant) for a class of architecture-cognizant programs. We verify our technique on two kernels (matrix multiply and 2D Point Jacobi) using three architectures. Our technique improves performance by up to a factor of two, compared...
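A minimal sketch of variant selection by dynamic programming, under the (assumed, simplified) model that the measured cost of a variant at one recursion level depends only on the variant chosen at the next level. With that locality assumption, an optimal policy follows from O(h · v²) measured costs rather than v^h whole-program experiments. All names here are invented:

```python
def best_policy(h, v, cost):
    """Pick one of v variants for each of h recursion levels.
    cost(i, a, b) is the measured cost of variant a at level i when
    level i+1 uses variant b (b is None at the deepest level)."""
    # best[i][a]: minimal total cost of levels i..h-1 with variant a at level i
    best = [[0.0] * v for _ in range(h)]
    choice = [[0] * v for _ in range(h)]
    for a in range(v):
        best[h - 1][a] = cost(h - 1, a, None)
    for i in range(h - 2, -1, -1):          # sweep bottom-up
        for a in range(v):
            cands = [(cost(i, a, b) + best[i + 1][b], b) for b in range(v)]
            best[i][a], choice[i][a] = min(cands)
    a = min(range(v), key=lambda a: best[0][a])
    policy = [a]
    for i in range(h - 1):                  # follow the recorded choices down
        a = choice[i][a]
        policy.append(a)
    return policy
```

With a synthetic cost function whose optimum is known (e.g. level i prefers variant i mod v), the recovered policy matches the optimum.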
Scheduling Threads for Low Space Requirement and Good Locality
 In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA
, 1999
Cited by 26 (1 self)
The running time and memory requirement of a parallel program with dynamic, lightweight threads depend heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S_1, we show that the expected space requirement is S_1 + O(K · p · D) on p processors. Here, K is a user-adjustable runtime parameter, which provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of POSIX standard threads (Pthreads), and evaluated its p...
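For context, the work-stealing discipline the paper starts from can be caricatured in a few lines: each worker serves its own deque LIFO (depth-first order, hence good locality) while idle workers steal from the opposite, FIFO end. This single-threaded simulation is purely illustrative, with invented names, and omits the paper's actual contribution (the K-controlled, space-bounded scheduler):

```python
import random
from collections import deque

def simulate(tasks, workers=2, seed=0):
    """Toy round-based simulation of plain work stealing."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(workers)]
    deques[0].extend(tasks)                    # all work starts on worker 0
    done = [[] for _ in range(workers)]
    while any(deques):
        for w in range(workers):
            if deques[w]:
                done[w].append(deques[w].pop())          # own end: LIFO
            else:
                victim = rng.randrange(workers)
                if victim != w and deques[victim]:
                    # steal the oldest (largest) task from the far end
                    deques[w].append(deques[victim].popleft())
    return done
```

Every task is executed exactly once, and the LIFO/FIFO split is what gives work stealing its locality and low contention.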
Memory Characteristics of Iterative Methods
, 1999
Cited by 24 (9 self)
Conventional implementations of iterative numerical algorithms, especially multigrid methods, reach merely a disappointingly small percentage of the theoretically available CPU performance when applied to representative large problems. One of the most important reasons for this phenomenon is that current DRAM technology cannot provide the data fast enough to keep the CPU busy. Although the fundamentals of cache optimizations are quite simple, current compilers cannot optimize even elementary iterative schemes. In this paper, we analyze the memory and cache behavior of iterative methods with extensive profiling and describe program transformation techniques to improve the cache performance of two- and three-dimensional multigrid algorithms.

1 Introduction

Multigrid methods [11, 5] are among the most attractive algorithms for the solution of large sparse systems of equations that arise in the solution of elliptic partial differential equations (PDEs). However, even simple multi...
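The flavor of such transformations can be shown with simple loop blocking of a 2D Jacobi sweep. The blocked variant performs identical arithmetic in a tile-by-tile order that, in compiled code on a real memory hierarchy, improves cache reuse; the Python version (a hypothetical example, not the paper's code) only demonstrates that the transformation preserves results:

```python
def jacobi2d(u, f, h2):
    """One naive Jacobi sweep over the interior of an n-by-n grid."""
    n = len(u)
    v = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                              + u[i][j-1] + u[i][j+1] + h2 * f[i][j])
    return v

def jacobi2d_blocked(u, f, h2, B=2):
    """The same sweep, traversed in B-by-B tiles for cache reuse."""
    n = len(u)
    v = [row[:] for row in u]
    for ii in range(1, n - 1, B):
        for jj in range(1, n - 1, B):
            for i in range(ii, min(ii + B, n - 1)):
                for j in range(jj, min(jj + B, n - 1)):
                    v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                      + u[i][j-1] + u[i][j+1] + h2 * f[i][j])
    return v
```

Because a Jacobi sweep reads only the old grid, any traversal order is valid, and the two routines produce identical output on the same input.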
An Adaptive Software Library for Fast Fourier Transforms
 In Proceedings of the International Conference on Supercomputing
, 2000
Cited by 24 (3 self)
In this paper we present an adaptive and portable software library for the fast Fourier transform (FFT). The library consists of a number of composable blocks of code called codelets, each computing a part of the transform. The actual FFT algorithm used by the code is determined at runtime by selecting the fastest strategy among all possible strategies, given the available codelets, for a given transform size. We also present an efficient automatic method of generating the library modules by using a special-purpose compiler. The code generator is written in C, and it generates a library of C codelets. The code generator is shown to be flexible and extensible, and the entire library can be generated in a matter of seconds. We have evaluated the library for performance on IBM SP2, SGI 2000, HP Exemplar, and Intel Pentium systems. We use the results from these evaluations to build performance models for the FFT library on different platforms. The library is shown to be portable, adaptive, and efficient.
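The runtime strategy selection described above can be sketched as a planner that times each candidate implementation once per problem size and caches the winner (a toy illustration; the library's real search is over compositions of codelets, and all names here are invented):

```python
import time

def make_planner(strategies):
    """Return a callable that, per input size, times each candidate
    strategy once, caches the fastest, and thereafter reuses it."""
    plans = {}                     # size -> fastest strategy found so far
    def run(x):
        n = len(x)
        if n not in plans:
            best, best_t = None, float("inf")
            for s in strategies:
                t0 = time.perf_counter()
                s(x)               # one timing experiment per strategy
                dt = time.perf_counter() - t0
                if dt < best_t:
                    best, best_t = s, dt
            plans[n] = best
        return plans[n](x)
    return run
```

After the first call of a given size, subsequent calls skip the measurement phase entirely, which is the point of caching plans per transform size.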
Adaptive Use of Iterative Methods in Predictor-Corrector Interior Point Methods for Linear Programming
 NUMERICAL ALGORITHMS
, 1999
Determining modes for continuous data assimilation in 2D turbulence
 J. Statist. Phys
Cited by 17 (5 self)
We study the number of determining modes necessary for continuous data assimilation in the two-dimensional incompressible Navier–Stokes equations. Our focus is on how the spatial structure of the body forcing affects the rate of continuous data assimilation and the number of determining modes. We treat this problem analytically by proving a convergence result depending on the H^{-1} norm of f, and computationally by considering a family of forcing functions with identical Grashof numbers that are supported on different annuli in Fourier space. The rate of continuous data assimilation and the number of determining modes are shown to depend strongly on the length scales present in the forcing.

Key words: determining modes; continuous data assimilation.
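The underlying notion can be stated precisely. In the standard definition (not specific to this paper), with P_m the projection onto the span of the first m eigenfunctions of the Stokes operator and u, v two solutions driven by the same forcing f, the first m modes are determining if agreement of the projected solutions forces agreement of the full solutions:

```latex
% Standard definition of determining modes for the 2D Navier--Stokes
% equations: P_m projects onto the first m Stokes eigenfunctions, and
% u, v solve the equations with the same forcing f.
\[
  \lim_{t \to \infty} \lVert P_m u(t) - P_m v(t) \rVert_{L^2} = 0
  \quad \Longrightarrow \quad
  \lim_{t \to \infty} \lVert u(t) - v(t) \rVert_{L^2} = 0 .
\]
```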