Demmel, J. W.; Dhillon, I.; Ren, H. March 1994. "On the correctness of Parallel Bisection in Floating Point," LAPACK Working Note 70, University of Tennessee, Technical Report CS-94-228.
....global symmetric submatrix A(ia:ia+n-1, ja:ja+n-1). Eigenvalues and eigenvectors can be selected by specifying a range of values or a range of indices for the eigenvalues. If n = 0, no computation is performed and the subroutine returns after doing some parameter checking. See references [6], [15], [16], and [17]. Table 42. Data Types: A, vl, vu, abstol, orfac, Z, w, work, gap: long-precision real; iwork, ifail, iclustr: integer; subroutine: PDSYEVX. Syntax (Fortran): CALL PDSYEVX (jobz, range, uplo, n, a, ia, ja, desc_a, vl, vu, il, iu, abstol, m, nz, w, orfac, z, iz, jz, desc_z, work, lwork, ....
Demmel, J. W.; Dhillon, I.; Ren, H. March 1994. "On the correctness of Parallel Bisection in Floating Point," LAPACK Working Note 70, University of Tennessee, Technical Report CS-94-228.
....is, though, that a number of different techniques exist for handling arithmetic exceptions, depending on the context in which the exception occurs. In fact, it is often both easier and cheaper to respond to an exception after the fact than to prevent the exception from occurring in the first place [Demmel and Li 1994; Hull et al. 1994]. Conversely, when exception handling is not available, it is sometimes necessary to artfully evade exceptions, resulting in programs that exhibit no exceptional behavior but waste time doing so [Brown 1981; Parlett 1979]. In recent years, processor manufacturers have become ....
.... of functions such as complex division, or numeric libraries like LAPACK (Linear Algebra PACKage) [Anderson et al. 1995]. These routines are expected to be widely applicable and so must avoid being tripped up by exceptions that are simply an artifact of the way the calculation is performed [Demmel and Li 1994; Hull et al. 1994]. The authors of such routines naturally constitute only a small minority of the people writing numeric code, but the results of their work are incorporated into the work of many others. Programmers who use numeric libraries today are often paying for the lack of standardized, ....
Demmel, J., Dhillon, I., and Ren, H. 1994. On the correctness of parallel bisection in floating point. Tech. Rep. UCB//CSD-94-805, Computer Science Division, Univ. of California, Berkeley, Calif. Also available as LAPACK Working Note 70, http://www.netlib.org/lapack/lawns/lawn70.ps.
....are already in the Java API. 3.1. copySign: public static double copySign(double value, double sign); admits none, yields none. The functionality of copySign is used to implement complex log and complex square root at their discontinuities, as well as for Sturm sequence and eigenvalue computations [7]. The copySign method returns the first argument with the sign of the second argument, without signaling divide by zero if sign is 0.0. The implementation is very straightforward: see if either argument is NaN; if so, return the arguments' sum; otherwise convert sign to integer and isolate the ....
James W. Demmel, Inderjit Dhillon, and Huan Ren, "On the Correctness of Parallel Bisection in Floating Point," Computer Sciences Division Technical Report UCB//CSD-94-805, 1994 (also LAPACK Working Note number 70, http://www.netlib.org/lapack/lawns/lawn70.ps).
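The copySign behavior quoted in the excerpt above can be illustrated with a small sketch. It uses Python's standard `math.copysign` as a stand-in for the Java method under discussion; the helper name `sign_including_zero` is hypothetical and is not taken from any of the cited documents. The point is that the sign of a zero is recovered without a division, which matters when counting negative pivots in a Sturm sequence.

```python
import math

def sign_including_zero(x: float) -> float:
    """Return +1.0 or -1.0 according to the sign bit of x.

    Unlike a comparison such as `x < 0`, this distinguishes -0.0 from +0.0,
    and it does so without dividing by x. (Hypothetical helper, for
    illustration only.)
    """
    return math.copysign(1.0, x)

# A magnitude combined with the sign of a signed zero.
negative_three = math.copysign(3.0, -0.0)
positive_three = math.copysign(3.0, 0.0)
```

A comparison such as `-0.0 < 0.0` evaluates to false, so code that needs the sign of a possibly zero pivot cannot rely on comparisons alone; a copysign-style operation is one way to get it.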
....the programming interface, the implementation techniques, and the driving applications. For each example application, we describe its irregularities and execution behaviors using measurements from parallel executions. Several of the applications are described more completely elsewhere [CDG 93, DDR94, JY95, WY95]. The data structures and applications are then summarized and compared in Section 2.3. We examine four data structures in increasing degree of irregularity of their driving applications: the bipartite graph data structure for bulk synchronous computation over an irregular mesh, the ....
....mechanism used for suspending task migration is described in Section 3.4.2. 2.2.3.3 Example Application: Eigenvalue. The Eigenvalue program computes the eigenvalues of an N by N symmetric tridiagonal matrix, which is known to have N real eigenvalues. The program uses the bisection algorithm [DDR94] to approximate the eigenvalues to an arbitrary precision. Given an input matrix, it first computes an initial interval of real numbers that contains all possible eigenvalues for the matrix. The interval is then divided into two half intervals (hence the name bisection) and the number of ....
J. Demmel, I. Dhillon, and H. Ren. On the correctness of parallel bisection in floating point. Technical Report UCB//CSD-94-805, UC Berkeley Computer Science Division, March 1994.
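The bisection procedure outlined in the excerpt above can be sketched as follows. This is a minimal serial illustration, not the code of any cited implementation: the function names, the Gershgorin-based starting interval, the pivot guard `PIVMIN`, and the tolerance are all assumptions made for the example. The count function evaluates the pivots of the LDL^T factorization of T - xI and counts the negative ones, which gives the number of eigenvalues below x.

```python
PIVMIN = 1e-290  # assumed guard against division by a zero pivot


def count_below(d, e, x):
    """Count eigenvalues less than x for the symmetric tridiagonal matrix
    with diagonal d (length n) and off-diagonal e (length n - 1)."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off_sq = e[i - 1] ** 2 if i > 0 else 0.0
        q = d[i] - x - off_sq / q
        if abs(q) < PIVMIN:
            q = -PIVMIN  # treat a vanishing pivot as slightly negative
        if q < 0.0:
            count += 1
    return count


def gershgorin_interval(d, e):
    """Return an interval [lo, hi] that contains every eigenvalue."""
    n = len(d)
    radii = [(abs(e[i - 1]) if i > 0 else 0.0) +
             (abs(e[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(d[i] - radii[i] for i in range(n))
    hi = max(d[i] + radii[i] for i in range(n))
    return lo, hi


def kth_eigenvalue(d, e, k, tol=1e-12):
    """Approximate the k-th smallest eigenvalue (k = 0 is the smallest)."""
    lo, hi = gershgorin_interval(d, e)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) > k:
            hi = mid  # at least k + 1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, with `d = [2.0, 2.0, 2.0]` and `e = [-1.0, -1.0]` the exact eigenvalues are 2 - sqrt(2), 2, and 2 + sqrt(2), and `kth_eigenvalue(d, e, 0)` approximates the first of these. Each eigenvalue is located independently of the others, which is what makes the method straightforward to parallelize; the correctness of the search depends on the count being monotonic in x.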
....used to successively subdivide the real line and locate all eigenvalues to arbitrary precision. A parallel implementation of bisection can use a static subdivision of the initial range, but this has poor parallel efficiency if the eigenvalues are clustered, because the work load is not balanced [DDR94]. A solution is to use a task queue with load balancing for the scheduling structure. Because our machine target is now a distributed memory multiprocessor, locality is a more obvious concern, but for bisection, the tridiagonal matrix is relatively small and can be statically replicated, so the ....
J. Demmel, I. Dhillon, and H. Ren. On the correctness of parallel bisection in floating point. Tech Report UCB//CSD-94-805, UC Berkeley Computer Science Division, March 1994. Available via anonymous ftp from tr-ftp.cs.berkeley.edu, in directory pub/tech-reports/csd/csd-94-805, file all.ps.
....Parallel Block Basic Linear Algebra Subroutines, and the BLACS [13], Basic Linear Algebra Communication Subroutines. The tridiagonal reduction was written by Jaeyoung Choi [3]. Step 2 is broken into two parts, bisection and inverse iteration, parts of which were written by Inderjit Dhillon [5]. Both bisection and inverse iteration do O(1) communication, with each processor responsible for a subset of eigenvalues and eigenvectors. Gram-Schmidt reorthogonalization of the eigenvectors is only performed within a single processor. Hence, if a cluster of eigenvalues is too large to fit on a ....
....an IEEE standard conforming divide operation which takes at least 50 times as long as multiply or add. We replaced this by a much faster, but less accurate divide. This requires a modification in the simple bisection algorithm to guarantee logical correctness despite possibly nonmonotonic arithmetic [5]. Second, a great deal of time was spent generating random numbers for inverse iteration. We changed from computing normally distributed random numbers, which require expensive transcendental function evaluations, to uniform random numbers. Together, these improvements sped up bisection and ....
J. Demmel, I. Dhillon, and H. Ren. On the correctness of parallel bisection in floating point. Tech Report UCB//CSD-94-805, UC Berkeley Computer Science Division, March 1994. Available via anonymous ftp from tr-ftp.cs.berkeley.edu, in directory pub/tech-reports/csd/csd-94-805, file all.ps.
....sequences for count. In this example we have not yet decided what to do about all these problems, so we currently only guarantee correctness of PDSYEVX for networks of processors with identical floating point formats (but slightly different floating point operations turn out to be acceptable). See [4] for further discussion. Assigning the work by index rather than by range and sorting all the eigenvalues at the end may give the desired result with modest overhead. Of course, if floating point formats differ across processors, sorting is a problem in itself. This requires further investigation. ....
J. Demmel, I. Dhillon, and H. Ren. On the correctness of parallel bisection in floating point. ETNA, 3:116--149, 1995. (See also LAPACK Working Note No. 70.)
....slightly different code sequences for count. We have not yet decided what to do about all of these problems, so we currently only guarantee correctness of PDSYEVX for networks of processors with identical floating point formats (slightly different floating point operations are acceptable). See [4] for details. 3 The Distributed Linear Algebra Machine (DLAM). In this section, we present a theoretical model of a parallel computer dedicated to dense linear algebra. This model is an abstraction of physical models. This ideal model provides a convenient framework for developing parallel ....
J. Demmel, I. Dhillon, and H. Ren. "On the correctness of parallel bisection in floating point". Technical Report UCB//CSD-94-805, University of California, Berkeley Computer Science Division, 1994. Available via anonymous ftp from tr-ftp.cs.berkeley.edu, in directory pub/tech-reports/csd/csd-94-805, file all.ps.