## Parallelization of the QR Decomposition with Column Pivoting Using Column Cyclic Distribution on Multicore and GPU Processors

### Citations

160 | Efficient algorithms for computing a strong rank-revealing QR factorization
- Gu, Eisenstat
- 1996
Citation Context ...P decomposition may fail to reveal the numerical rank correctly, it is a popular and economical method in many applications. Furthermore, the QRP is used as the first step to more robust RRQR methods [2,8] and for accelerating a Jacobi method for computing the singular value decomposition [5,6]. The QRP decomposition of $A \in \mathbb{R}^{m \times n}$ is defined by an orthonormal Q and an upper triangular matrix R such that ...

122 | Numerical methods for solving linear least squares problems
- Golub
- 1965
Citation Context ...sion 1.2.1). Topics. Parallel and Distributed Computing 1 QR decomposition with column pivoting The QR decomposition with column pivoting (QRP) is proposed for computing a rank revealing QR factorization (RRQR) [7]. Although the QRP decomposition may fail to reveal the numerical rank correctly, it is a popular and economical method in many applications. Furthermore, the QRP is used as the first step to more rob...

81 | A storage-efficient WY representation for products of Householder transformations
- Schreiber, Van Loan
- 1988
Citation Context ... machine. Fig. 2. Execution time in percentage of Intel MKL routines DGEMV and DGEMM in DGEQP3 on a 12 core (6 cores per socket) Intel Xeon X5670 machine. Although DGEQP3 uses the $YTY^T$ representation [10] like DGEQRF, it is not fully blocked because of the column norm updating. The column norm updating requires computing the row vector $v^T A$ for each Householder reflection applied to A. DGEQP3 update...

47 | New fast and accurate Jacobi SVD algorithm: II
- Drmač, Veselić
Citation Context ...mical method in many applications. Furthermore, the QRP is used as the first step to more robust RRQR methods [2,8] and for accelerating a Jacobi method for computing the singular value decomposition [5,6]. The QRP decomposition of $A \in \mathbb{R}^{m \times n}$ is defined by an orthonormal Q and an upper triangular matrix R such that $AP = QR$, where P is a permutation matrix chosen so that $|r_{11}| \ge |r_{22}| \ge \cdots \ge |r_{nn}|$ and mo...

24 | On rank-revealing factorisations
- Chandrasekaran, Ipsen
- 1994
Citation Context ...P decomposition may fail to reveal the numerical rank correctly, it is a popular and economical method in many applications. Furthermore, the QRP is used as the first step to more robust RRQR methods [2,8] and for accelerating a Jacobi method for computing the singular value decomposition [5,6]. The QRP decomposition of $A \in \mathbb{R}^{m \times n}$ is defined by an orthonormal Q and an upper triangular matrix R such that ...

20 | A BLAS-3 Version of the QR Factorization with Column Pivoting
- Quintana-Ortí, Sun, et al.
- 1998
Citation Context ...$-\tau v v^T A$, which can be implemented using three levels of BLAS. The current LAPACK implementation (xGEQP3) is block based. It groups several rank-one updates for exploiting the Level 3 BLAS operations [9]. Figure 1 shows the performance of Intel MKL routine DGEQP3 on a 12 core (6 cores per socket) Intel Xeon X5670 machine. Here the execution is set up to use one thread per core. The poor performance...

11 | Scaling LAPACK panel operations using parallel cache assignment
- Castaldo, Whaley
- 2010
Citation Context ... Algorithm 2 applies the Householder matrices to the rest of the matrix (lines 30 to 35). Recent work on using a parallel cache assignment approach to speed up the panel factorization can be found in [3]. Algorithm 2 processes the columns of the matrix in their natural order from left to right. On a parallel machine, it is natural to group the processors into a logical ring and deal columns in a roun...

8 | A parallel QR factorization algorithm with controlled local pivoting
- Bischof
- 1991
Citation Context ...putation across the processors and guarantees a load balanced computation. This distribution was first proposed in the context of the parallel implementation of a QR decomposition with local pivoting [1]. The selection of which columns are processed by each thread is not left to the OpenMP runtime, but explicitly controlled in lines 8, 15, and 31. Algorithm 2. OpenMP parallel QRP using column cyclic ...
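
The round-robin dealing of columns described in the two contexts above can be sketched in a few lines. This is only an illustration of a column cyclic ownership rule, not the paper's Algorithm 2 (which uses OpenMP); the function name `cyclic_columns` and its parameters are hypothetical, and the sketch assumes column `k` belongs to worker `k % n_workers`.

```python
def cyclic_columns(n_cols, n_workers, worker, start=0):
    """Columns k >= start owned by `worker`, assuming owner(k) = k % n_workers."""
    # Smallest column index >= start that is congruent to `worker` mod n_workers.
    first = start + (worker - start) % n_workers
    return list(range(first, n_cols, n_workers))


# Trailing columns still to be updated once the first 5 columns are factored,
# for 4 workers and a 10-column matrix (hypothetical sizes).
for w in range(4):
    print(w, cyclic_columns(10, 4, w, start=5))
```

Because ownership depends only on the column index, the number of trailing columns per worker differs by at most one at every step, which is the load-balancing property the context refers to.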

2 | On the failure of rank-revealing QR factorization software – a case study
- Drmač, Bujanović
- 2008
Citation Context ... outline of the Householder QRP algorithm is shown in Algorithm 1. Note that the formula for the column norm updating is simplified here. The current LAPACK implementation uses a more robust approach [4]. The omitted detail is not relevant to the parallelization discussed in this paper. Algorithm 1. QR decomposition with column pivoting. 1: $p_{1:n} = 1:n$ 2: $c_{1:n} = \|A e_{1:n}\|_2^2$ 3: for $j = 1:n$ 4: Choose i such tha...
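
Since the listing of Algorithm 1 is cut off above, the following is a minimal serial sketch of Householder QR with column pivoting, written from the general description in these contexts rather than from the paper's pseudocode. It assumes the simplified squared-norm downdate mentioned in the context (not LAPACK's more robust update), and the name `qrp_householder` and its return convention are hypothetical.

```python
import math


def qrp_householder(a):
    """Return (r, perm) for a row-major matrix `a` so that A[:, perm] = Q R."""
    m, n = len(a), len(a[0])
    cols = [[a[i][k] for i in range(m)] for k in range(n)]  # column-wise copy
    perm = list(range(n))
    norms = [sum(x * x for x in col) for col in cols]  # squared column norms

    for j in range(min(m, n)):
        # Pivot: move the remaining column with the largest norm to position j.
        i = max(range(j, n), key=lambda k: norms[k])
        cols[j], cols[i] = cols[i], cols[j]
        perm[j], perm[i] = perm[i], perm[j]
        norms[j], norms[i] = norms[i], norms[j]

        # Householder vector v for rows j..m-1 of the pivot column.
        v = cols[j][j:]
        alpha = -math.copysign(math.sqrt(sum(x * x for x in v)), v[0])
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        if vtv == 0.0:
            continue  # pivot column is already zero below row j
        tau = 2.0 / vtv

        # Apply H = I - tau v v^T to columns j..n-1, then downdate the norms.
        for k in range(j, n):
            w = tau * sum(vi * x for vi, x in zip(v, cols[k][j:]))
            for row, vi in enumerate(v):
                cols[k][j + row] -= w * vi
            if k > j:
                norms[k] -= cols[k][j] ** 2  # simplified norm downdate

    rows = min(m, n)
    r = [[cols[k][i] if i <= k else 0.0 for k in range(n)] for i in range(rows)]
    return r, perm
```

Calling `qrp_householder([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0], [2.0, 0.0, 1.0]])` returns the upper triangular factor together with the column order chosen by the pivoting. Q is not formed explicitly, so a convenient check is that $R^T R$ matches the Gram matrix of the permuted columns.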