Roman Geus and S. Röllin. Towards a fast parallel sparse matrix-vector multiplication. In E. H. D'Hollander, J. R. Joubert, F. J. Peters, and H. Sips, editors, Proceedings of the International Conference on Parallel Computing (ParCo), pages 308–315. Imperial College Press, 1999.
....write values from CPU register to memory and vice versa). For example, an instruction like a += b can be translated directly into the machine code ADD a, b. This technique is called software pipelining; we reorganised our source code in such a way that the processor pipelines are better filled. In [7] this technique is analysed together with several other techniques for optimizing the performance of the matrix-vector product.
  movl 12(%ebp),%ebx
  leal 0(,%ebx,4),%esi
  movl 8(%ecx),%ebx
  movl (%ebx,%esi),%esi
  movl %esi,-4(%ebp)
  movl -4(%ebp),%ebx
  movl -4(%ebp),%esi
  movl %esi,-28(%ebp)
....
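The snippet contrasts keeping a value in a CPU register with writing it to memory and reading it back; the assembly listing shows a value being stored to a stack slot and then immediately reloaded. A minimal C++ sketch of the register-friendly form of a sparse matrix-vector product follows, assuming compressed sparse row (CSR) storage. The `CsrMatrix` type and its field names are hypothetical and are not taken from the cited paper, and the sketch illustrates only register accumulation, not the software-pipelining transformation itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) storage. The type and field names are
// hypothetical, chosen for this illustration only.
struct CsrMatrix {
    std::size_t n_rows;
    std::vector<std::size_t> row_ptr;  // n_rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Computes y = A * x. The partial sum of each row lives in the local
// variable `sum`, which an optimizing compiler can keep in a register,
// so y[i] is written to memory once per row instead of once per nonzero.
void spmv_csr(const CsrMatrix& A, const std::vector<double>& x,
              std::vector<double>& y) {
    assert(A.row_ptr.size() == A.n_rows + 1);
    y.assign(A.n_rows, 0.0);
    for (std::size_t i = 0; i < A.n_rows; ++i) {
        double sum = 0.0;  // accumulator, no store inside the inner loop
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[i] = sum;  // single store per row
    }
}
```

With `A` holding the 3x3 matrix with rows (4, 0, 1), (0, 2, 0), (3, 0, 5) in CSR form and `x = {1, 2, 3}`, `spmv_csr(A, x, y)` leaves `y = {7, 4, 18}`.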
.... argc, char* argv[])
{
  if (argc != 8)
    ...
  char* szVectorBvalue  = argv[1];
  char* szVectorBdist   = argv[2];
  char* szVectorXvalue  = argv[3];
  char* szVectorXdist   = argv[4];
  char* szMatrixAvalue  = argv[5];
  char* szMatrixAdist   = argv[6];
  char* szVectorXresult = argv[7];

  Vector b;
  Vector x;
  GeneralICCS1Matrix A;
  MatrixArchive ar;

  if (ar.Open(szVectorBvalue, szVectorBdist)) {
    if (ar >> b) {
      ...
      cprintf("error: unable to read distribute vector b\n");
      ...
  if (ar.Open(szVectorXvalue, szVectorXdist)) {
    if (ar >> x) {
....
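The driver reads `b` and `x` from an archive object `ar` and checks each read before going on. A common C++ idiom for this is to overload `operator>>` so that it returns the stream, which converts to `false` once a read fails. The sketch below shows that idiom with standard streams; `DenseVector`, `parse_vector`, and the whitespace-separated text format (a length followed by the values) are hypothetical, since the snippet does not show the `MatrixArchive` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the driver's Vector class.
struct DenseVector {
    std::vector<double> data;
};

// Reads a vector stored as "n v0 v1 ... v(n-1)". Returning the stream lets
// callers write `if (in >> v)`: the stream tests false once any read fails.
std::istream& operator>>(std::istream& in, DenseVector& v) {
    std::size_t n = 0;
    if (!(in >> n)) {
        return in;  // failbit is already set
    }
    std::vector<double> tmp(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (!(in >> tmp[i])) {
            return in;  // leave v untouched on a partial read
        }
    }
    v.data = std::move(tmp);  // commit only after every value was read
    return in;
}

// Convenience wrapper that parses a vector from a string.
bool parse_vector(const std::string& text, DenseVector& v) {
    std::istringstream in(text);
    return static_cast<bool>(in >> v);
}
```

Parsing `"3 1.5 2 -4"` succeeds and fills three entries, while `"3 1.0 oops"` fails and leaves the target vector unchanged, because the values are committed only after a complete read.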
R. Geus and S. Röllin. Towards a fast parallel sparse matrix-vector multiplication. In E. H. D'Hollander, J. R. Joubert, F. J. Peters, and H. Sips, editors, Parallel Computing: Fundamentals & Applications, Proceedings of the International Conference ParCo'99, 17–20 August 1999.
....this sparsity structure. The most tractable solution is to offer the user many different specialized storage formats and let the user choose the format that best exploits his/her sparsity structure in order to maximize cache reuse. This approach is widely used today, as in [31, 11, 37, 45, 47, 28]. The advantage of this solution is that the input format is fixed and assumed to be appropriate to the data structure, just as with dense BLAS. Choosing one of the more optimizable data structures (such as one of the block compressed storage schemes) should allow us to directly leverage the ....
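The snippet names the block compressed storage schemes as one of the more optimizable formats. A minimal C++ sketch of a block compressed sparse row (BCSR) product with fixed 2x2 blocks follows; the block size, the `Bcsr2x2` type, and its field names are assumptions chosen for illustration, not a format defined in the cited work. Each stored block is a small dense tile, so one column index is loaded per four values and each pair of `x` entries is reused for both rows of the block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed block dimensions (an assumption for this sketch).
constexpr std::size_t kBR = 2;  // rows per block
constexpr std::size_t kBC = 2;  // columns per block

// Block compressed sparse row storage with kBR x kBC dense blocks.
// The type and field names are hypothetical.
struct Bcsr2x2 {
    std::size_t n_block_rows;
    std::vector<std::size_t> block_row_ptr;  // n_block_rows + 1 offsets
    std::vector<std::size_t> block_col;      // block-column index of each block
    std::vector<double> values;              // kBR*kBC entries per block, row-major
};

// Computes y = A * x. The block loops have fixed trip counts, each block
// needs one column index for kBR*kBC values, and the kBC entries of x it
// touches are reused for all kBR rows of the block.
void spmv_bcsr(const Bcsr2x2& A, const std::vector<double>& x,
               std::vector<double>& y) {
    assert(A.block_row_ptr.size() == A.n_block_rows + 1);
    assert(A.values.size() == A.block_col.size() * kBR * kBC);
    y.assign(A.n_block_rows * kBR, 0.0);
    for (std::size_t ib = 0; ib < A.n_block_rows; ++ib) {
        double acc[kBR] = {};  // one running sum per row of the block row
        for (std::size_t k = A.block_row_ptr[ib]; k < A.block_row_ptr[ib + 1]; ++k) {
            const double* blk = &A.values[k * kBR * kBC];
            const std::size_t j0 = A.block_col[k] * kBC;  // first column of block
            for (std::size_t r = 0; r < kBR; ++r) {
                for (std::size_t c = 0; c < kBC; ++c) {
                    acc[r] += blk[r * kBC + c] * x[j0 + c];
                }
            }
        }
        for (std::size_t r = 0; r < kBR; ++r) {
            y[ib * kBR + r] = acc[r];
        }
    }
}
```

For the 4x4 matrix with rows (1, 2, 0, 0), (3, 4, 0, 0), (0, 0, 5, 6), (7, 0, 0, 8) and x = (1, 2, 3, 4), the product is (5, 11, 39, 39). The price is fill-in: the explicit zeros inside the lower-left block are stored and multiplied, so BCSR pays off only when the nonzeros cluster into dense blocks.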
Roman Geus and Stefan Röllin. Towards a fast parallel sparse matrix-vector multiplication. Institute of Scientific Computing, ETH Zurich. Submitted to World Scientific, 1999.
No context found.