8 citations found.
S. R. Goldschmidt and H. Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


This paper is cited in the following contexts:
Comparative Evaluation of Latency Reducing and.. - Gupta, Hennessy.. (1991)   (103 citations)  (Correct)

....per processor, 16 processors require the application to support 64 concurrent processes. However, some of our applications do not scale well to that many processes given the small data sets that we can simulate. The architecture simulator is tightly coupled to the Tango reference generator [9] to assure a correct interleaving of accesses. For example, a process doing a read operation is blocked until that read completes, where the latency of the read is determined by the architecture simulator. Unless specific directives are given by an application, main memory is distributed uniformly ....

S. R. Goldschmidt and H. Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


Load Balancing and Data Locality in Adaptive.. - Singh, Holt.. (1995)   (30 citations)  (Correct)

.... it is inflexible in configuration and unable to track program behavior statistics of interest to us (inherent communication in the program, for example). To overcome these limitations, we also perform experiments on an event-driven simulator of an idealized shared address space multiprocessor [13]. The simulated multiprocessor looks exactly like that described in Figure 10, with the simple, three-level, nonuniform memory hierarchy. The timing of a simulated processor's instruction set is designed to match that of the MIPS R3000 CPU and R3010 floating-point unit. Access latencies in the ....

Stephen R. Goldschmidt and Helen Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


Tolerating Latency Through Software-Controlled Prefetching in.. - Mowry, Gupta (1991)   (232 citations)  (Correct)

....between processors in the same cluster which would significantly complicate the analysis of prefetching behavior. The latency parameters for the simulated architecture are derived from the DASH prototype (see Figure 3). The architecture simulator is tightly coupled to the Tango reference generator [8] to assure a correct interleaving of accesses. For example, a process doing a read operation is blocked until that read completes, where the latency of the read is determined by the architecture simulator. Operating system references are not modeled. Shared memory used by a program is evenly ....

Stephen R. Goldschmidt and Helen Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


A Parallel Adaptive Fast Multipole Method - Singh, Holt, Hennessy, Gupta (1993)   (19 citations)  (Correct)

....and a main memory. The difference is that the main memory on the node is itself converted into a very large, hardware-managed cache, by adding tags to cache-line-sized blocks in main memory. This large cache, which is the only main memory in the machine, is called the attraction memory (AM) [6]. The location of a data item in the machine is thus decoupled from its physical address, and the data item is automatically moved (or replicated) by hardware to the attraction memory of a processor that references it. A processing node on the KSR 1 consists of a single 20MHz custom-built ....

....use have the limitations of having only a certain number of processors, and of distorting inherent program behavior owing to their specific memory system configurations. To overcome these limitations we also run our programs on a simulator of an idealized shared memory multiprocessor architecture [6]. The timing of a simulated processor's instruction set is designed to match that of the MIPS R3000 CPU and R3010 floating-point unit. Every processor forms a cluster with its own cache and equal fraction of the machine's physical memory. A simple three-level nonuniform memory hierarchy is ....

Stephen R. Goldschmidt and Helen Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


The DASH Prototype: Logic Overhead and Performance - Lenoski, Laudon, Joe.. (1993)   (92 citations)  (Correct)

....to that given in Table 2. 5.0 Performance Monitor. One of the prime motivations for building the DASH prototype was to study real applications with large data sets running on a large ensemble of processors. The alternative, simulation, results in a real-time slowdown in the range of 1,000-100,000 [6]. Because many of the applications have an execution time measured in tens of minutes on actual hardware, simulation is prohibitively expensive. To enable more insight into these applications when running on the prototype, we have dedicated over 20% of the DC board to a hardware performance ....

S. Goldschmidt and H. Davis, Tango Introduction and Tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


Performance Evaluation of Memory Consistency Models.. - Gharachorloo, Gupta.. (1991)   (87 citations)  (Correct)

....since it minimizes shortcomings due to the implementation. For our studies, an event-driven simulator is used to simulate the major components of the DASH architecture at a behavioral level. The simulations are based on a 16-processor configuration. The simulator is tightly coupled to the Tango [7] reference generator to assure a correct interleaving of accesses. For example, a process doing a read operation is blocked until that read completes, where the latency of the read is determined by the architecture simulator. Main memory is distributed across all nodes and allocated using a ....

Stephen R. Goldschmidt and Helen Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


Architectural and Implementation Tradeoffs in the Design of.. - James Laudon (1992)   (10 citations)  (Correct)

....tag check. The specific instruction set used is the MIPS I instruction set [10]. This instruction set is representative of load-store RISC architectures. 2.2 Simulation Environment. The simulation environment consists of a detailed architecture simulator, which is tightly coupled to the Tango [6] reference generator. Tango executes multiple processes on a uniprocessor, interleaving the processes to simulate a multiprocessor. Tango augments the parallel program at the assembly code level to add this interleaving. Our memory system and pipeline simulator is linked with the augmented ....

Stephen R. Goldschmidt and Helen Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University, 1990.


Tango Lite: A Multiprocessor Simulation Environment -.. - Herrod (1993)   (16 citations)  (Correct)

.... /* Lock declaration to protect access to the Index */ } *globalMemory; int numProcs; void main(argc, argv) int argc; char *argv[]; { int i; double number; if (argc != 2) { printf("Usage: SquareRoots numberOfProcessors\n"); exit(-1); } /* Get the number of processors from the command line */ sscanf(argv[1], "%d", &numProcs); if (numProcs < 1 || numProcs > MAX_PROCS) { printf("Bad number of Processors\n"); exit(-1); } /* Include this macro before any others are executed */ MAIN_INITENV /* Allocate global, shared memory */ globalMemory = (struct globalMemoryType *) G_MALLOC(sizeof(struct ....

Stephen R. Goldschmidt and Helen Davis. Tango introduction and tutorial. Technical Report CSL-TR-90-410, Stanford University Computer Systems Laboratory, January 1990.


CiteSeer.IST - Copyright Penn State and NEC