6 citations found.
S. Booth, J. Fisher, N. MacDonald, P. Maccallum, E. Minty, and A. Simpson. Parallel Programming on the Cray T3D, Version 1.1. Technical report, Edinburgh Parallel Computing Centre, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, August 1994.


This paper is cited in the following contexts:
Sparse LU Factorization on The Cray T3D - Asenjo, Zapata (1995)

....where u; 0 u 1 is a threshold parameter [4, Chap. 7]. This paper is organized as follows. Section 2 describes the data distribution. For a detailed explanation of the parallel algorithm and its implementation see [1]. Section 3 presents the execution times and workload balance on the CRAY T3D [9] for different sizes of the sparse matrix selected from the Harwell-Boeing sparse matrix collection. 2 Parallel Sparse LU Algorithm The parallel algorithm executes a number of iterations, each involving three different phases: Pivots search, rows and columns permutation, and reduced matrix and R ....

S. Booth, J. Fisher, N. MacDonald, P. Maccallum, E. Minty, and A. Simpson. Parallel Programming on the Cray T3D. Edinburgh Parallel Computing Centre, University of Edinburgh, U.K., September 1994.


Sparse LU Factorization on The Cray T3D - Asenjo, Zapata (1995)

....scatter methods. Each processor stores the nonzero elements belonging to its local matrix in two ways: like a semi-ordered, two-dimensional linked list, or by using a storage format that we call Block Row Scatter (BRS) [9]. With all these features, our code was implemented on the CRAY T3D machine [8]. This paper is organized as follows. Section 2 describes the parallel algorithm, highlighting optimizations performed to reduce the overhead due to communications. Section 3 presents the execution time and workload balance on the CRAY T3D [8] for different sizes of sparse matrices selected from ....

....our code was implemented on the CRAY T3D machine [8]. This paper is organized as follows. Section 2 describes the parallel algorithm, highlighting optimizations performed to reduce the overhead due to communications. Section 3 presents the execution time and workload balance on the CRAY T3D [8] for different sizes of sparse matrices selected from the Harwell-Boeing sparse matrix collection. 2 Parallel sparse LU algorithm The parallel algorithm executes a number of iterations, each one containing three different phases: Pivots search, rows and columns permutation, and reduced matrix ....

S. Booth, J. Fisher, N. MacDonald, P. Maccallum, E. Minty, and A. Simpson. Parallel Programming on the Cray T3D. Edinburgh Parallel Computing Centre, University of Edinburgh, U.K., September 1994.


HPF-2 Scope of Activities and Motivating Applications - Forum (1994)   (3 citations)

....message passing interface. The data distribution follows the philosophy of the scatter methods. Each processor stores the nonzero elements belonging to its local matrix like a semi-ordered, two-dimensional linked list [94]. With all these features, our code was implemented on the CRAY T3D machine [68]. Favored parallel algorithms An algorithm for distributed memory systems is the Stappen et al. [94]. Such an algorithm distinguishes different phases: Search for pivots, rows and columns permutations, and updating of the reduced submatrix. They also perform a detailed study of the algorithm ....

S. Booth, J. Fisher, N. MacDonald, P. Maccallum, E. Minty, and A. Simpson. Parallel Programming on the Cray T3D. Edinburgh Parallel Computing Centre, University of Edinburgh, U.K., September 1994.


Scalable and Portable Computing Using the WPRAM Model - Nash, Dew, Dyer (1996)   Self-citation

.... ports, to provide an indirect naming service, and the idea of collective communications groups, to implement global message passing operations (such as gather and scatter). The Cray T3D incurs substantial overheads for message passing operations, of the order of 50-70 microsecs using PVM [3]. In comparison, the machine supports the ability to directly access remote memories, rather than using message passing. This reduces the overheads for data access to around 1-2 microsecs [3]. The idea of a shared address space, accessible by all nodes of the machine, is becoming more prevalent. ....

....T3D incurs substantial overheads for message passing operations, of the order of 50-70 microsecs using PVM [3]. In comparison, the machine supports the ability to directly access remote memories, rather than using message passing. This reduces the overheads for data access to around 1-2 microsecs [3]. The idea of a shared address space, accessible by all nodes of the machine, is becoming more prevalent. In the same spirit as the global router abstraction, data is either accessible by a processor within its local memory, or within a shared address space. The latter is uniformly accessible to ....

S. Booth, J. Fisher, N. MacDonald, P. Maccallum, E. Minty, and A. Simpson. Parallel Programming on the Cray T3D, Version 1.1. Technical report, Edinburgh Parallel Computing Centre, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, August 1994.


A Parallelisation Approach for Supporting Scalable and.. - Nash, Dew, Davy (1997)   (2 citations)  Self-citation

....local address pair) and both the processor and memory are connected to the network, allowing for direct access of remote data without interrupting the execution of the processor. A separate network is available to support barrier synchronisation. For this work, the Cray shmem operations were used [1]. These support the direct remote memory access operations, and provide low latencies and high bandwidths. The first table in Figure 3 gives a summary of the Cray operations used to support the SADTs. • Swap: executes an atomic swap w = W; W = c, for a constant c, local word w and remote word W ....

....analytical model predicts these times quite closely (shown by the pred labels in the graphs). The results for the Counter are less accurate, although by no more than a factor of 3.5 from the observed performance. This is largely due to the Cray using a special high performance annex register [1]. The results for the Event also provide the alternative implementation based on the use of a two-level tree to broadcast the signal. It can again be seen that the predicted results are very close, with the alternative method performing better at around 32 processors. The results for the Flag and ....

S. Booth, J. Fisher, N. MacDonald, P. Maccallum, E. Minty, and A. Simpson. Parallel Programming on the Cray T3D, Version 1.1. Technical report, Edinburgh Parallel Computing Centre, The University of Edinburgh, Edinburgh EH9 3JZ, August 1994.


Towards a Model for Shared Data Abstraction with.. - Goodeve, Dobson, Nash, .. (1998)   (1 citation)  Self-citation

....described above, but the nature of the tradeoffs that can be exploited in the design of types is very similar. The prototype SADT system on the Cray T3D is coded in C, using the primitive operations of the WPRAM computational model [35] and the underlying Cray SHMEM Shared Memory primitives library [11]. These primitives are low level but lightweight, and offer a well founded cost model that can be used to drive the implementation of types. The nature of the network of workstations environment is such that an accurate cost model would be hard, if not impossible, to produce. The network of ....

S. Booth, J. Fisher, N. MacDonald, P. Maccallum, E. Minty, and A. Simpson. Parallel Programming on the Cray T3D, Version 1.1. Technical report, Edinburgh Parallel Computing Centre, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, August 1994.

