Results 1–10 of 25
Randomized Matrix Computations
, 2012
Abstract

Cited by 52 (6 self)
We propose new effective randomized algorithms for some fundamental matrix computations such as preconditioning of an ill-conditioned matrix that has a small numerical nullity or rank, its 2-by-2 block triangulation, numerical stabilization of Gaussian elimination with no pivoting, and approximation of a matrix by low-rank matrices and by structured matrices. Our technical advances include estimating the condition number of a random Toeplitz matrix, novel techniques of randomized preprocessing, a proof of their preconditioning power, and a dual version of the Sherman–Morrison–Woodbury formula. According to both our formal study and numerical tests, we significantly accelerate the known algorithms and improve their output accuracy.
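One ingredient of this abstract, the good conditioning of random Toeplitz matrices, is easy to observe empirically. The sketch below (illustrative only, not the paper's estimator) builds random Toeplitz matrices with numpy and checks their condition numbers:

```python
import numpy as np

# Empirical sketch: random Toeplitz matrices tend to be well conditioned,
# which is what makes them attractive as randomized preprocessors.
rng = np.random.default_rng(0)
n = 200
conds = []
for _ in range(20):
    t = rng.standard_normal(2 * n - 1)           # diagonal values t[-(n-1)..n-1]
    idx = np.arange(n)
    T = t[idx[:, None] - idx[None, :] + n - 1]   # T[i, j] = t[i - j]
    conds.append(np.linalg.cond(T))
print(f"median condition number over 20 trials: {np.median(conds):.1f}")
```

The indexing trick materializes the Toeplitz structure directly; each trial is determined only by the 2n-1 entries of `t`.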
Adaptive Winograd’s Matrix Multiplications
, 2008
Abstract

Cited by 8 (3 self)
Modern architectures have complex memory hierarchies and increasing parallelism (e.g., multicores). These features make achieving and maintaining good performance across rapidly changing architectures increasingly difficult. Performance has become a complex tradeoff, not just a simple matter of counting the cost of simple CPU operations. We present a novel, hybrid, and adaptive recursive Strassen-Winograd matrix multiplication (MM) that uses automatically tuned linear algebra software (ATLAS) or GotoBLAS. Our algorithm applies to matrices of any size and shape stored in either row- or column-major layout (in double precision in this work) and is thus efficiently applicable to both C and FORTRAN implementations. In addition, our algorithm divides the computation into sub-MMs of equivalent complexity and does not require any extra computation to combine the intermediary sub-MM results. We achieve up to 22% execution-time reduction versus GotoBLAS/ATLAS alone for a single-core system and up to 19% for a 2 dual-core processor system. Most importantly, even for small matrices such as 1500×1500, our approach already attains a 10% execution-time reduction, and for MM of matrices larger than 3000×3000 it delivers performance that would correspond, for a classic O(n^3) algorithm, to faster-than-processor-peak performance (i.e., our algorithm delivers the equivalent of 5 GFLOPS on a system with 4.4 GFLOPS peak performance, where GotoBLAS achieves only 4 GFLOPS). This is a result of the savings in operations (and thus FLOPS). Therefore, our algorithm is faster than any classic MM algorithm could ever be for matrices of this size. Furthermore, we present experimental evidence, based on established methodologies found in the literature, that our algorithm is, for a family of matrices, as accurate as the classic algorithms.
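The recursive Strassen-Winograd scheme this abstract builds on can be sketched minimally as follows (seven recursive sub-multiplies and fifteen additions per level; `winograd_mm` and the cutoff value are illustrative, not the paper's adaptive, layout-aware implementation):

```python
import numpy as np

def winograd_mm(A, B, cutoff=64):
    """Recursive Strassen-Winograd multiply of square matrices.
    Simplified sketch: falls back to the tuned kernel (here numpy's @,
    standing in for ATLAS/GotoBLAS) below the cutoff or for odd sizes."""
    n = A.shape[0]
    if n <= cutoff or n % 2:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    S1 = A21 + A22; S2 = S1 - A11; S3 = A11 - A21; S4 = A12 - S2
    T1 = B12 - B11; T2 = B22 - T1; T3 = B22 - B12; T4 = T2 - B21
    P1 = winograd_mm(A11, B11, cutoff)   # seven recursive products
    P2 = winograd_mm(A12, B21, cutoff)
    P3 = winograd_mm(S1, T1, cutoff)
    P4 = winograd_mm(S2, T2, cutoff)
    P5 = winograd_mm(S3, T3, cutoff)
    P6 = winograd_mm(S4, B22, cutoff)
    P7 = winograd_mm(A22, T4, cutoff)
    U2 = P1 + P4; U3 = U2 + P5
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = P1 + P2                  # C11
    C[:h, h:] = U2 + P3 + P6             # C12
    C[h:, :h] = U3 - P7                  # C21
    C[h:, h:] = U3 + P3                  # C22
    return C
```

The operation savings the abstract quantifies come from replacing eight half-size products with seven at each recursion level.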
Additive Preconditioning, Eigenspaces, and the Inverse Iteration
, 2007
Abstract

Cited by 7 (7 self)
We incorporate our recent preconditioning techniques into the classical inverse power (Rayleigh quotient) iteration for computing matrix eigenvectors. Every loop of this iteration essentially amounts to solving an ill-conditioned linear system of equations. Due to our modification we solve a well-conditioned linear system instead. We prove that this modification preserves local quadratic convergence, show experimentally that fast global convergence is preserved as well, and obtain similar results for higher-order inverse iteration, covering the cases of multiple and clustered eigenvalues.
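The textbook scheme the paper modifies looks as follows (a minimal sketch of classical inverse iteration only; the paper's contribution is to replace the ill-conditioned shifted solve inside the loop with an additively preconditioned, well-conditioned one, which is not shown here):

```python
import numpy as np

def inverse_iteration(A, mu, x0, iters=20):
    """Classical inverse power iteration.  Each step solves the shifted
    system (A - mu*I) y = x, which is ill conditioned precisely when mu
    is close to an eigenvalue -- the situation the paper addresses."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        y = np.linalg.solve(A - mu * np.eye(n), x)
        x = y / np.linalg.norm(y)
    # Rayleigh quotient approximates the eigenvalue nearest mu
    return x, x @ A @ x
```

For a symmetric matrix and a shift near an isolated eigenvalue, the iterate converges to the corresponding eigenvector and the Rayleigh quotient to the eigenvalue.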
Products of Ordinary Differential Operators by Evaluation and Interpolation
, 2008
Abstract

Cited by 7 (5 self)
It is known that multiplication of linear differential operators over ground fields of characteristic zero can be reduced to a constant number of matrix products. We give a new algorithm by evaluation and interpolation which is faster than the previously known one by a constant factor, and prove that in characteristic zero, multiplication of differential operators and of matrices are computationally equivalent problems. In positive characteristic, we show that differential operators can be multiplied in nearly optimal time. The theoretical results are validated by intensive experiments.
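The evaluation-interpolation paradigm the paper lifts to differential operators is easiest to see in its commutative toy case, polynomial multiplication via the FFT (an illustrative sketch only; `poly_mul_eval_interp` is a hypothetical name, and the paper's algorithm operates on operators, not plain polynomials):

```python
import numpy as np

def poly_mul_eval_interp(f, g):
    """Multiply integer polynomials (coefficient lists, ascending degree)
    by evaluation-interpolation: evaluate both at roots of unity (FFT),
    multiply the values pointwise, and interpolate back (inverse FFT)."""
    n = len(f) + len(g) - 1           # number of coefficients of the product
    size = 1 << (n - 1).bit_length()  # FFT length: next power of two >= n
    F = np.fft.rfft(f, size)          # evaluation of f
    G = np.fft.rfft(g, size)          # evaluation of g
    h = np.fft.irfft(F * G, size)     # pointwise product, then interpolation
    return np.rint(h[:n]).astype(int)
```

For example, (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3.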
Toeplitz and Hankel Meet Hensel and Newton: Nearly Optimal Algorithms and Their Practical Acceleration with Saturated Initialization
 Program in Computer Science, The Graduate
, 2004
Abstract

Cited by 6 (5 self)
We extend Hensel lifting for solving general and structured linear systems of equations to the rings of integers modulo non-primes, e.g., modulo a power of two. This enables significant savings in word operations. We elaborate upon this approach in the case of Toeplitz linear systems. In this case, we initialize lifting with the MBA superfast algorithm, estimate that the overall bit-operation (Boolean) cost of the solution is optimal up to roughly a logarithmic factor, and prove that degeneration is unlikely even where the basic prime is fixed but the input matrix is random. We also comment on the extension of our algorithm to some other fundamental computations with (possibly singular) general and structured matrices and univariate polynomials, as well as to the computation of the sign and the value of the determinant of an integer matrix.
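The smallest instance of lifting modulo a power of two is inverting a single odd integer, which conveys the flavor of the approach (a scalar sketch only; the paper lifts solutions of whole Toeplitz systems, and `lift_inverse_mod_2k` is a hypothetical name):

```python
def lift_inverse_mod_2k(a, k):
    """Hensel/Newton lifting over Z/2^k: given odd a, compute a^{-1} mod 2^k.
    Start from the inverse mod 2 and double the number of correct bits each
    step via x <- x * (2 - a*x): if a*x = 1 (mod 2^m), the update is
    correct mod 2^(2m), since a*x' = 1 - (a*x - 1)^2."""
    assert a % 2 == 1, "a must be a unit modulo 2^k"
    x, bits = 1, 1                   # a^{-1} = 1 (mod 2) for any odd a
    while bits < k:
        bits *= 2
        mod = 1 << min(bits, k)
        x = x * (2 - a * x) % mod    # Newton step doubles the precision
    return x % (1 << k)
```

Working modulo 2^k keeps every reduction a cheap bit mask, which is the source of the word-operation savings the abstract mentions.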
Solving Toeplitz- and Vandermonde-like Linear Systems with Large Displacement Rank
, 2007
Abstract

Cited by 4 (2 self)
Linear systems with structure such as Toeplitz-, Vandermonde-, or Cauchy-likeness can be solved in Õ(α^2 n) operations, where n is the matrix size, α is its displacement rank, and Õ denotes the omission of logarithmic factors. We show that for Toeplitz-like and Vandermonde-like matrices, this cost can be reduced to Õ(α^(ω−1) n), where ω is a feasible exponent for matrix multiplication over the base field. The best known estimate for ω is ω < 2.38, resulting in costs of order Õ(α^1.38 n). We also present consequences for Hermite-Padé approximation and bivariate interpolation.
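The displacement rank α underlying this abstract can be demonstrated concretely (a small numpy illustration, not the paper's algorithm): for a Toeplitz matrix T and the down-shift matrix Z, the Stein displacement T - Z T Z^T has rank at most 2 regardless of n, and "Toeplitz-like" means this rank stays bounded by α:

```python
import numpy as np

# Build a random n x n Toeplitz matrix and check its displacement rank.
rng = np.random.default_rng(1)
n = 8
t = rng.standard_normal(2 * n - 1)
idx = np.arange(n)
T = t[idx[:, None] - idx[None, :] + n - 1]   # T[i, j] = t[i - j]
Z = np.eye(n, k=-1)                          # down-shift matrix
D = T - Z @ T @ Z.T                          # Stein displacement of T
print("displacement rank:", np.linalg.matrix_rank(D))
```

Since (Z T Z^T)[i, j] = T[i-1, j-1], the displacement vanishes off the first row and column, so all the information in T is compressed into roughly 2n numbers; structured solvers exploit exactly this compression.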
Algebraic algorithms
Abstract

Cited by 4 (0 self)
This article, along with [Elkadi and Mourrain 1996], explains the correlation between residue theory and the Dixon matrix, which yields an alternative method for studying and approximating all common solutions. In 1916, Macaulay [1916] constructed a matrix whose determinant is a multiple of the classical resultant for n homogeneous polynomials in n variables. The Macaulay matrix simultaneously generalizes the Sylvester matrix and the coefficient matrix of a system of linear equations [Kapur and Lakshman Y. N. 1992]. As in the Dixon formulation, the Macaulay determinant is a multiple of the resultant. Macaulay, however, proved that a certain minor of his matrix divides the matrix determinant so as to yield the exact resultant in the case of generic homogeneous polynomials. Canny [1990] invented a general method that perturbs any polynomial system and extracts a nontrivial projection operator. Using recent results pertaining to sparse polynomial systems [Gelfand et al. 1994, Sturmfels 1991], a matrix formula for computing the sparse resultant of n + 1 polynomials in n variables was given by Canny and Emiris [1993] and subsequently improved in [Canny and Pedersen 1993, Emiris and Canny 1995]. The determinant of the sparse resultant matrix, like those of the Macaulay and Dixon matrices, only yields a projection operator, not the exact resultant. Here, sparsity means that only certain monomials in each of the n + 1 polynomials have nonzero coefficients. Sparsity is measured in geometric terms, namely, by the Newton polytope
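The Sylvester matrix that the Macaulay construction generalizes can be built in a few lines (an illustrative sketch; `sylvester_matrix` is a hypothetical name):

```python
import numpy as np

def sylvester_matrix(f, g):
    """Sylvester matrix of univariate polynomials f, g given as coefficient
    lists in decreasing degree; its determinant is the classical resultant
    of f and g, which vanishes iff they share a common root."""
    m, n = len(f) - 1, len(g) - 1        # degrees of f and g
    S = np.zeros((m + n, m + n))
    for i in range(n):                   # n shifted copies of f's coefficients
        S[i, i:i + m + 1] = f
    for i in range(m):                   # m shifted copies of g's coefficients
        S[n + i, i:i + n + 1] = g
    return S
```

For example, for f = x^2 - 1 and g = x - 2 the determinant is 3, matching Res(f, g) = g(1) * g(-1) = (-1)(-3).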
Solving Linear Systems with Randomized Augmentation
Abstract

Cited by 4 (4 self)
Our randomized preprocessing of a matrix by means of augmentation counters its degeneracy and ill-conditioning, uses neither pivoting nor orthogonalization, readily preserves matrix structure and sparseness, and leads to a dramatic speedup of the solution of general and structured linear systems of equations in terms of both estimated arithmetic time and observed CPU time.
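The core effect of augmentation can be shown in a toy form (a numpy demo of the idea only, not the paper's full scheme for recovering the original solution): bordering a rank-deficient matrix with random blocks makes it nonsingular with probability 1:

```python
import numpy as np

# A has nullity d; the random bordered matrix K is generically full rank,
# so the augmented system can be solved without pivoting.
rng = np.random.default_rng(2)
n, d = 6, 2
A = rng.standard_normal((n, n - d)) @ rng.standard_normal((n - d, n))
K = np.block([[A,                            rng.standard_normal((n, d))],
              [rng.standard_normal((d, n)),  rng.standard_normal((d, d))]])
print(np.linalg.matrix_rank(A), "->", np.linalg.matrix_rank(K))
```

Because the border blocks can themselves be chosen structured (e.g., Toeplitz), the augmentation preserves the structure and sparseness the abstract emphasizes.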
Exploiting Parallelism in Matrix-Computation Kernels for Symmetric Multiprocessor Systems: Matrix-Multiplication and Matrix-Addition Algorithm Optimizations by Software Pipelining and Threads Allocation
, 2011
Abstract

Cited by 4 (1 self)
We present a simple and efficient methodology for the development, tuning, and installation of matrix algorithms such as the hybrid Strassen's and Winograd's fast matrix multiply or their combination with the 3M algorithm for complex matrices (i.e., hybrid: a recursive algorithm such as Strassen's until a highly tuned BLAS matrix multiplication allows performance advantages). We investigate how modern symmetric multiprocessor (SMP) architectures present old and new challenges that can be addressed by combining algorithm design with careful and natural parallelism exploitation at the function level (optimizations) such as function-call parallelism, function percolation, and function software pipelining. We have three contributions: first, we present a performance overview for double-precision and double-complex-precision matrices on state-of-the-art SMP systems; second, we introduce new algorithm implementations: a variant of the 3M algorithm and two new different schedules of Winograd's matrix multiplication (achieving up to 20% speedup w.r.t. regular matrix multiplication), one designed to minimize the number of matrix additions and the other to minimize the computation latency of matrix additions; third, we apply software pipelining and threads allocation to all the algorithms and we …
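The 3M algorithm mentioned in this abstract can be sketched compactly (a minimal numpy illustration; `mm_3m` is a hypothetical name, and the paper's variant additionally addresses scheduling and threading):

```python
import numpy as np

def mm_3m(A, B):
    """3M complex matrix multiply: three real products instead of four.
    With Ar, Ai the real/imaginary parts of A (likewise B):
      P1 = Ar@Br, P2 = Ai@Bi, P3 = (Ar+Ai)@(Br+Bi)
      real part = P1 - P2,  imaginary part = P3 - P1 - P2."""
    Ar, Ai = A.real, A.imag
    Br, Bi = B.real, B.imag
    P1 = Ar @ Br
    P2 = Ai @ Bi
    P3 = (Ar + Ai) @ (Br + Bi)
    return (P1 - P2) + 1j * (P3 - P1 - P2)
```

Trading one real matrix product for a handful of cheap matrix additions is the same cost structure that makes Strassen-Winograd attractive, which is why the two combine naturally in the hybrid the abstract describes.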