7 citations found.
R. Arpaci, D. Culler, A. Krishnamurthy, S. Steinberg, K. Yelick, "Empirical evaluation of the CRAY-T3D: A compiler perspective," in Proceedings of the International Symposium on Computer Architecture, pp. 320--331, June 1995.


This paper is cited in the following contexts:
High-Level Prototyping for the HTMT Petaflop Machine - Yerosheva (2001)

.... different classes of architectures (shared memory, distributed, and the combinations of both). Examples of such classical execution models include: a centralized shared memory architecture (UMA) with multiple processors that share a single centralized (or virtually centralized) memory (Cray T3D [80], Cray T3E [74]); a distributed shared memory architecture with one processor per node [60]; a distributed shared memory architecture among the nodes, possibly with multiple processors at each node; a shared memory on top of hardware for message passing architecture; a distributed shared memory ....

.... Out streams; Send data to the Server. [Figure 8.6: Pseudo code for Client] [Figure 8.8: Partitioning data in DPIMs. A[80,80] and B are divided into a 5 × 5 grid of blocks A_ik and B_kj, so that, e.g., C11 = A11B11 + A12B21 + A13B31 + A14B41 + A15B51.] Object Server Proc(port num): Create server thread with name "Server"; Create a Server Socket to listen for connections on: listen socket; Create the ThreadGroup for connections; Initialize a vector to store connections in; Create ....

R. Arpaci, D. Culler, A. Krishnamurthy, S. Steinberg, K. Yelick, "Empirical evaluation of the CRAY-T3D: A compiler perspective," in Proceedings of the International Symposium on Computer Architecture, pp. 320--331, June 1995.


Operating System Support for Database Management Systems.. - Fellig, Tikhonova (2000)

....it does precisely this for most applications. Isn't a database management system just another application? This is the first question that this paper will attempt to answer, putting NT under scrutiny. This question will be investigated by supplying simple stimuli to NT, using a gray-box approach [1], and capturing its responses to establish empirically what its buffer management policy is. Dabak, Phadke, and Borate [2] point out that the policy used by NT for page replacement when under available physical memory pressure is FIFO. Solomon [8] specifies that in uniprocessor x86 systems, a ....

R. Arpaci, D. Culler, A. Krishnamurthy, S. Steinberg, and K. Yelick. "Empirical Evaluation of the CRAY-T3D: A Compiler Perspective", ISCA, 1995.


Mechanisms for Efficient, Protected Messaging - Lee

....from the past and present are shown in Figure 2-1. The round-trip cost, further explained in Table 2.1, is roughly based on a two-way null remote procedure call (RPC) or a ping-pong operation. It is obtained by doubling the reported value when only the one-way cost is provided in the literature [19, 20, 21, 15, 22, 23, 24, 25, 26, 27]. Since an actual implementation for [18] does not exist, the round-trip cost is extrapolated from the specified overhead of assembling, sending, and receiving a remote read message¹. On the horizontal axis, Figure 2-1 also shows that the systems employ a variety of mechanisms for robustness. ....

....3.4 μs swap [9] CMNF library J-Machine 12.5 MHz MDP 43 cyc [10] Streaming Injection 1024 max round-trip null RPC CS-2 40 MHz 20 μs [39] Channel SPARC 24.6 μs [23] DMA w/ active message Hardware Table Lookup 174 μs [21] PARMACS macros ping-pong 206 μs [20] mpsc library mesg exchange T3D 150 MHz 21064 600 ns [26] Shared Memory 2048 max remote read 2.76 μs [40] Fast Messages F&I-specific 16-byte Fetch-and-Increment Hardware Support 120 μs [26] Interrupt-Driven User-Level Message Message Handler T 88100MP dispatch 20 cyc microthreading remote load [18] NOW HP9000/735 50 μs [24] LAN-based 125 MHz PA-RISC ....

[Article contains additional citation context not shown here]

Remzi H. Arpaci, David E. Culler, Arvind Krishnamurthy, Steve G. Steinberg, Katherine Yelick, "Empirical Evaluation of the CRAY-T3D: A Compiler Perspective," ISCA 1995, pp. 142--153.


Towards Modeling the Performance of a Fast Connected.. - Steven Lumetta   (9 citations)

....local and global objects helps us to apply the cost model for optimization. Using Split-C also gives our implementation portability. Versions of Split-C exist on the Cray T3D, the IBM SP-1 and SP-2, the Intel Paragon, the Thinking Machines Corp. CM-5, the Meiko CS-2, and networks of workstations [2, 20, 23, 29]. Although our algorithm accepts arbitrary graphs as input, obtaining optimal performance requires a reasonable partitioning of the graph across processors to enhance locality and load balancing. Partitioning techniques rely on the ability to determine properties of the graph structure. ....

....DEC Alpha 21064, a 64-bit, dual-issue, second-generation RISC processor, clocked at 150 MHz, with 8 kB split instruction and data caches. A Split-C global read involves a short instruction sequence to gain addressability to the remote node and to load from remote memory, taking approximately 1 μs [2]. The CS-2 is based on a 90 MHz, dual-issue SPARC microprocessor with a large cache. Communication is supported by a dedicated ELAN processor within the network interface, which can access remote memory via word-by-word or DMA network transactions. The Split-C global read issues a command to the ....

R. H. Arpaci et al., "Empirical Evaluation of the Cray T3D: A Compiler Perspective," to appear in Proceedings of the International Symposium on Computer Architecture, 1995.


Synchronization and Communication in the T3E Multiprocessor - Scott (1996)   (52 citations)

....to encounter an application in which barrier time is a large fraction of total run time, and the dedicated barrier network is expensive. In addition, we have found the management of the physical barrier resource to be burdensome. The T3D has several weaknesses, many of which have been reported in [3]. The largest of these is the relatively low single-node performance. This is caused by a fixed clock (150 MHz), which has not tracked improvements in the 21064 processor, and by lack of a board-level cache (each processor uses only its 8 KB on-chip data cache). This last feature, however, does ....

Arpaci, R. H., D. E. Culler, A. Krishnamurthy, S. G. Steinberg, and K. Yelick, "Empirical Evaluation of the CRAY-T3D: A Compiler Perspective," Proc. 22nd International Symposium on Computer Architecture, pp. 320-331, June 1995.


Towards Modeling the Performance of a Fast Connected.. - Lumetta.. (1996)   (9 citations)  Self-citation (Culler, Krishnamurthy)

....let us focus on the process of optimization while hiding specific hardware details. Split-C also gives our implementation portability, with versions running on the Cray T3D, the IBM SP-1 and SP-2, the Intel Paragon, the Thinking Machines CM-5, the Meiko CS-2, and networks of workstations [2, 24, 27, 34]. 2.3 Parallel platforms. We consider three large-scale parallel machines: the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5. These machines offer a range of computational and communication performance against which to evaluate the algorithm implementation. In each case, the Split-C ....

....DEC Alpha 21064, a 64-bit, dual-issue, second-generation RISC processor, clocked at 150 MHz, with 8 kB split instruction and data caches. A Split-C global read involves a short instruction sequence to gain addressability to the remote node and to load from remote memory, taking approximately 1 μs [2]. The CS-2 uses a 90 MHz, dual-issue SPARC microprocessor with a large cache. A dedicated ELAN processor within the network interface supports communication, and is capable of accessing remote memory via word-by-word or DMA network transactions. The Split-C global read issues a command to the ....

R. H. Arpaci, D. E. Culler, A. Krishnamurthy, S. Steinberg, K. Yelick, "Empirical Evaluation of the Cray T3D: A Compiler Perspective," Proceedings of the International Symposium on Computer Architecture, 1995.


Connected Components on Distributed Memory Machines - Krishnamurthy, Lumetta.. (1994)   (12 citations)  Self-citation (Culler, Krishnamurthy, Yelick)

....all of which are stars and are marked with unique values, apply a modified Shiloach-Vishkin algorithm. Iterate over the following steps until done: [Footnote 4] Implementations of the language exist on a variety of machines, including the IBM SP-2, the Intel Paragon, the Cray T3D, and the Meiko CS-2 [1, 8, 10, 11]. [Figure 5: Collapsing remote edges (Processor 1 and Processor 2; local nodes, representative, edge, remote edge, collapsed edge).] By collapsing remote edges before entering the global phase, we reduce the amount of work required for each iteration of that ....

R. H. Arpaci, D. E. Culler, A. Krishnamurthy, S. Steinberg, K. Yelick, "Empirical Evaluation of the Cray T3D: A Compiler Perspective," Proceedings of the International Symposium on Computer Architecture, 1995.


CiteSeer.IST - Copyright Penn State and NEC