D.E. Keyes. Trends in algorithms for nonuniform applications on hierarchical distributed architectures. In M.D. Salas and W. K. Anderson, editors, Proceedings of the Workshop on Computational Aerosciences for the 21st Century, Kluwer, 2000.


This paper is cited in the following contexts:
Perspectives on Asynchronous Computations for Fluid Flow Problems - Szyld (2000)

....is bound to become more evident with the advent of the new generation of massively parallel computers and clusters of symmetric multiprocessors, where total communication costs are higher. The need for asynchronism is often brought up in connection with Grand Challenge problems; see, e.g., [18], [9]. We believe that the potential for gains using asynchronous parallel iterative methods for linear or nonlinear systems in irregular regions or inhomogeneous clusters can be achieved with a small investment. For example, one could replace the (synchronous) SOR solver for the pressure in the ....

D.E. Keyes. Trends in algorithms for nonuniform applications on hierarchical distributed architectures. In M.D. Salas and W. K. Anderson, editors, Proceedings of the Workshop on Computational Aerosciences for the 21st Century, Kluwer, 2000.


How Scalable is Domain Decomposition in Practice? - Keyes (1998)   (6 citations)  Self-citation (Keyes)

....very slowly at the high-performance end, and most architectural advances drive the percentage of peak available to PDE applications lower, while raising the theoretical peak performance. We briefly describe three such algorithmic improvements under investigation; for additional detail, see [Key99]. Improvement in algorithmic efficiency: A region of strong nonlinearity embedded in a region of nearly linear behavior is characteristic of many important PDE problems (e.g., noise, flame, and crack propagation). Such problems may be said to be nonlinearly stiff; progress of Newton's method ....

....a nonoverlapping analog known as Schur-Newton-Krylov, in which the iteration is reduced to a lower-dimensional interface. Automatic detection of the small regions responsible for the nonlinear stiffness may be possible through an indicator such as tensoricity [Key99]. Improvement in parallel implementation efficiency: Profiling some large-scale runs on 128 and 512 processors, we observe that the percentage of time devoted to global inner products increases from 6% to 9%. The growing percentage of execution time consumed by the global reduction step of inner ....

[Article contains additional citation context not shown here]

Keyes D. E. (1999) Trends in algorithms for nonuniform applications on hierarchical distributed architectures. To appear in the Proceedings of a Workshop on Computational Aerosciences in the 21st Century (Salas, ed.), Kluwer.


How Scalable is Domain Decomposition in Practice? - Keyes (1998)   (6 citations)  Self-citation (Keyes)

....very slowly at the high-performance end, and most architectural advances drive the percentage of peak available to PDE applications lower, while raising the theoretical peak performance. We briefly describe three such algorithmic improvements under investigation; for additional detail, see [11]. 5.1 Improvement in algorithmic efficiency: A region of strong nonlinearity embedded in a region of nearly linear behavior is characteristic of many important PDE problems (e.g., noise, flame, and crack propagation). Such problems may be said to be nonlinearly stiff; progress of Newton's method ....

....an entire inner subdomain. There is a nonoverlapping analog known as Schur-Newton-Krylov, in which the iteration is reduced to a lower-dimensional interface. Automatic detection of the small regions responsible for the nonlinear stiffness may be possible through an indicator such as tensoricity [11]. 5.2 Improvement in parallel implementation efficiency: Profiling the runs of Fig. 3, we observe that the efficiency of the 512-processor case relative to the 128-processor case is 84%, and the percentage of execution time devoted to global inner products increases to 9%, from 6% in the ....

[Article contains additional citation context not shown here]

D. E. Keyes. Trends in algorithms for nonuniform applications on hierarchical distributed architectures. To appear in the Proceedings of a Workshop on Computational Aerosciences in the 21st Century (Salas, ed.), Kluwer, 1999.

