16 citations found.
S. Venugopal and V. K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


This paper is cited in the following contexts:
P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj   (Correct)

....information transparent, performance tuning is extremely difficult. P³T at compile time computes a set of performance parameters each of which reflects a different performance aspect. In the following all P³T performance parameters are described. 4.1 Work Distribution It is well known [8, 6, 44, 42, 30, 40, 13, 41, 34, 25] that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore, providing both programmer and parallelizing compiler with a work distribution parameter for ....

S. Venugopal and V. K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


Scalable Parallel Algorithms for Solving Sparse Systems of Linear.. - Gupta   (Correct)

....single processor run times on iPSC/2 reported in [3]. We also found that for some matrices (e.g. that from a 127 × 127 9-point finite difference grid) our implementation on eight nCUBE2 processors (8.9 seconds) is faster than the 16 processor iPSC/860 implementation (9.7 seconds) reported in [55], although iPSC/860 has much higher computation speeds. 8.1 Load Balancing for Factorization The factorization algorithm as described in this paper requires binary relaxed supernodal elimination trees that are fairly balanced. After obtaining the ordered matrix and the corresponding elimination ....

Sesh Venugopal and Vijay K. Naik. SHAPE: A parallelization tool for sparse matrix computations. Technical Report DCS-TR-290, Department of Computer Science, Rutgers University, New Brunswick, NJ, June 1992.


P³T+: A Performance Estimator for Distributed and Parallel.. - Pozgaj, Fahringer (2000)   (Correct)

....The key functionality of P³T is devoted to compute a set of performance parameters at compile time: • Work Distribution The work distribution parameter describes how well the computations of a program are distributed over the set of available processors. As shown by numerous researchers [15, 13, 80, 68, 49, 62, 22, 67, 53, 34], work distribution has a strong influence on the cost performance ratio of a multiprocessor system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore, providing both programmer and compiler with a work distribution ....

Sesh Venugopal and Vijay K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (1999)   (Correct)

....information transparent, performance tuning is extremely difficult. P³T at compile time computes a set of performance parameters each of which reflects a different performance aspect. In the following all P³T performance parameters are described. 4.1 Work Distribution It is well known [7, 5, 37, 35, 27, 33, 11, 34, 30, 23] that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore, providing both programmer and parallelizing compiler with a work distribution parameter for ....

S. Venugopal and V. K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


Analysis and Design of Scalable Parallel Algorithms for Scientific .. - Gupta (1995)   (2 citations)  (Correct)

....single processor run times on iPSC/2 reported in [9]. We also found that for some matrices (e.g. that from a 127 × 127 9-point finite difference grid) our implementation on eight nCUBE2 processors (8.9 seconds) is faster than the 16 processor iPSC/860 implementation (9.7 seconds) reported in [141], although iPSC/860 has much higher computation speeds. 4.7.1 Load Balancing for Factorization The factorization algorithm as described in this chapter requires binary relaxed supernodal elimination trees that are fairly balanced. After obtaining the ordered matrix and the corresponding ....

Sesh Venugopal and Vijay K. Naik. SHAPE: A parallelization tool for sparse matrix computations. Technical Report DCS-TR-290, Department of Computer Science, Rutgers University, New Brunswick, NJ, June 1992.


A Scalable Parallel Algorithm for Sparse Matrix Factorization - Gupta, Kumar (1994)   (7 citations)  (Correct)

....processor run times on iPSC/2 reported in [3]. We also found that for some matrices (e.g. that from a 127 × 127 9-point finite difference grid) our implementation on eight nCUBE2 processors (8.9 seconds) is faster than the 16 processor iPSC/860 implementation (9.7 seconds) reported in [51], although iPSC/860 has much higher computation speeds. 6.1 Load balancing for factorization The factorization algorithm as described in this paper requires binary relaxed supernodal elimination trees that are fairly balanced. After obtaining the ordered matrix and the corresponding elimination ....

Sesh Venugopal and Vijay K. Naik. SHAPE: A parallelization tool for sparse matrix computations. Technical Report DCS-TR-290, Department of Computer Science, Rutgers University, New Brunswick, NJ, June 1992.


Parallel Direct Methods for Block-Diagonal-Bordered Sparse.. - Koester, Ranka, Fox (1994)   (Correct)

....dense matrix counterparts [20]. Parallel sparse matrix solver performance generally is less than similar dense matrix solvers even though there is more inherent parallelism in sparse matrix algorithms than dense matrix algorithms. This additional parallelism is often described by elimination trees [14, 15, 16, 20, 37, 38, 39, 40, 41, 45], graphs that illustrate the dependencies in the calculations. Parallel sparse linear solvers can simultaneously factor entire groups of mutually independent contiguous blocks of columns or rows without communications; meanwhile, dense linear solvers can only update blocks of contiguous columns or ....

....parallel calculations with no additional parallel communications overhead. 1.2 Block Diagonal Bordered Direct Linear Solvers Block diagonal bordered sparse matrix algorithms require modifications to the normal preprocessing phase described in numerous papers on parallel Choleski factorization [14, 15, 16, 20, 37, 38, 39, 40, 41, 45]. Each of the numerous papers referenced above uses the paradigm to order the sparse matrix and then perform symbolic factorization in order to determine the locations of all fill-in values so that static data structures can be utilized for maximum efficiency when performing numerical factorization. ....

[Article contains additional citation context not shown here]

S. Venugopal and V. K. Naik. SHAPE: A Parallelization Tool for Sparse Matrix Computations. Research Report RC 17899 (77448), IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598, January 1992.


Parallel Block-Diagonal-Bordered Sparse Linear Solvers for Power.. - Koester (1994)   (2 citations)  (Correct)

....performance generally is less than similar dense matrix solvers even though there is more inherent parallelism in sparse matrix algorithms than dense matrix algorithms. This additional parallelism is often described by elimination trees, graphs that illustrate the dependencies in the calculations [19, 20, 21, 29, 55, 56, 57, 58, 64]. Parallel sparse linear solvers can simultaneously factor entire groups of mutually independent contiguous blocks of columns or rows without communications; meanwhile, dense linear solvers can only update blocks of contiguous columns or rows during each pipelined communication cycle. The limited ....

.... been reported in the power systems community journals to solve the special very sparse irregular power systems network matrices, there has been significant research into efficient general sparse linear solvers for general matrices, always larger and less sparse than power systems network matrices [8, 9, 10, 19, 20, 21, 25, 29, 46, 47, 55, 56, 57, 58, 64]. In the research presented in this thesis, we have developed specialized, efficient parallel sparse linear solvers for linear systems derived from power systems networks. The performance of our parallel linear solvers is significantly better than the performance of linear solvers reported in the ....

[Article contains additional citation context not shown here]

S. Venugopal and V. K. Naik. SHAPE: A Parallelization Tool for Sparse Matrix Computations. Research Report RC 17899 (77448), IBM Research Division, T. J. Watson Research Center, Yorktown Heights, NY 10598, January 1992.


On Estimating the Useful Work Distribution of Parallel Programs.. - Fahringer (1996)   (4 citations)  (Correct)

....instantiations for which they own the corresponding sub domain. This naturally specifies the amount of work to be done by each processor and consequently the overall work distribution of a parallel program. Therefore domain decomposition inherently implies a work distribution. It is well known ([8, 5, 3, 28, 26, 18, 22, 6, 25, 19, 13]) that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore providing both programmer and parallelizing compiler with a work distribution parameter ....

Sesh Venugopal and Vijay K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


On Using Volume Computation to Estimate the Work Distribution .. - Fahringer, Hong (1995)   (Correct)

....instantiations for which they own the corresponding sub domain. This naturally specifies the amount of work to be done by each processor and consequently the overall work distribution of a parallel program. Therefore domain decomposition inherently implies a work distribution. It is well known ([8, 5, 3, 28, 26, 17, 21, 6, 25, 18, 12]) that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore providing both programmer and parallelizing compiler with a work distribution parameter for ....

Sesh Venugopal and Vijay K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


Software Support For Parallel Processing Of Irregular And Dynamic.. - Jiao (1996)   (3 citations)  (Correct)

....one of the most important research subjects in parallel computing. Sparse matrix computation is one of the most widely investigated problems, and it is notoriously difficult to obtain performance for this class of problems. Many methods have been proposed. See, for example, work done by Venugopal [51] and Rothberg [44]. One of the noteworthy works, which is very closely related to our approach, is done by Chong et al. [11], where the DSC clustering algorithm from PYRROS was applied to the sparse triangular solution iterative problem. In that paper, it was shown that the DSC algorithm can improve ....

S. Venugopal and V. K. Naik. SHAPE: A parallelization tool for sparse matrix computations. Technical Report DCS-TR-290, Department of Computer Science, Rutgers University, New Brunswick, NJ, June 1992.


A Scalable Parallel Algorithm for Sparse Cholesky Factorization - Gupta, Kumar   (3 citations)  (Correct)

....28] have a lower bound of O(Np) on the total communication volume. In [1] Ashcraft proposes a fan-both family of parallel Cholesky factorization algorithms that have a total communication volume of O(N√p log N). A few schemes with two dimensional partitioning of the matrix have been proposed [22, 29, 27, 26], and the total communication volume in the best of these schemes [27, 26] is O(N√p log p) (box C). In summary, the simple parallel algorithm with O(Np log p) communication volume (box A) has been improved along two directions one by improving the mapping of matrix columns onto processors ....

....processor run times on iPSC/2 reported in [2]. We also found that for some matrices (e.g. that from a 127 × 127 9-point finite difference grid) our implementation on eight nCUBE2 processors (8.9 seconds) is faster than the 16 processor iPSC/860 implementation (9.7 seconds) reported in [29], although iPSC/860 has much higher computation speeds. The factorization algorithm as described in this paper requires binary supernodal elimination trees that are fairly balanced. After obtaining the ordered matrix and the corresponding elimination tree, we run the elimination tree through a ....

Sesh Venugopal and Vijay K. Naik. SHAPE: A parallelization tool for sparse matrix computations. Technical Report DCS-TR-290, Department of Computer Science, Rutgers University, New Brunswick, NJ, June 1992.


Parallelizing Unstructured Sparse Matrix Computations On.. - Venugopal (1993)   Self-citation (Venugopal)   (Correct)

....is described in Chapter 3, and its construction for block based sparse Cholesky factorization is described in Chapter 5. Parallel partitioner for block based sparse Cholesky factorization: We have developed a parallel partitioner for the sparse Cholesky factorization of any general sparse matrix [73, 74, 75], which partitions the symbolic factor into a hybrid mixture of sparse columns, dense triangles and dense rectangles. Tuning knobs in the form of parameters are provided in order that the partitioner may be ported across problems with widely ranging sparsity structures and across architectures ....

....structure of the matrix in extracting the partitions, and allows for control of the partition granularity. Using some of the principles proposed in that paper, we have developed a parallel partitioner for sparse Cholesky factorization on message passing multiprocessor systems, first described in [74]. The partitioning is a mix of dense blocks and sparse columns, and allows for parametric control to make the partitioning sensitive to both the matrix structure and the machine granularity. With such a hybrid partitioning method, higher performance and scalability is achievable, provided a right ....

[Article contains additional citation context not shown here]

S. Venugopal and V. K. Naik. Shape: A parallelization tool for sparse matrix computations. Technical Report DCS-TR-290, Department of Computer Science, Rutgers University, 1992. Also available as IBM Research Report RC 17899.


P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (1999)   (Correct)

No context found.

S. Venugopal and V. K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


P³T+: A Performance Estimator for Distributed and.. - Pozgaj, Fahringer (2000)   (Correct)

No context found.

Sesh Venugopal and Vijay K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


P³T+: A Performance Estimator for Distributed and.. - Fahringer, Pozgaj (2001)   (Correct)

No context found.

S. Venugopal and V. K. Naik. SHAPE: a parallelization tool for sparse matrix computations. Research report RC 17899, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, July 1992.


CiteSeer.IST - Copyright Penn State and NEC