21 citations found.
E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.

This paper is cited in the following contexts:
Efficient Implementation of Cache Coherence in Scalable.. - Mannava, Kumar, Bhuyan   (Correct)

....message transmission latency to some extent. In order to study the performance we have developed an exhaustive simulation model for the proposed architecture, including the cache coherence protocol and the network. This simulator is an extension of Proteus, which is an execution driven simulator [11]. Five different application programs are used to evaluate the performance of the schemes under study. These are MP3D and WATER from SPLASH benchmarks [12] and matrix multiplication, Floyd-Warshall's all-pairs shortest path algorithm and FFT developed by us. The results indicate that the proposed ....

....In the proposed scheme, we reduce both the number of messages sent out and the number of links used. 4 Simulation Methodology 4.1 Application Programs We have run extensive execution driven simulations for various application programs. For this purpose an existing simulator, called Proteus [11], has been extended to include our network and cache coherence models. The application programs used are MP3D and Water from the SPLASH benchmark suite. FFT, matrix multiplication, and Floyd-Warshall were developed in house. MP3D is a three dimensional particle simulator used in rarefied fluid ....

E. A. Brewer and C. N. Dellarocas, Proteus: User Documentation , Version 0.5, MIT, Cambridge, MA, 1992.


A New Limited Directory Cache Coherence Scheme for Shared .. - Mannava, Kumar, Bhuyan (1995)   (Correct)

....message transmission latency to some extent. In order to study the performance we have developed an exhaustive simulation model for the proposed architecture, including the cache coherence protocol and the network. This simulator is an extension of proteus, which is an execution driven simulator [10]. Four different application programs are used to evaluate the performance of the schemes under study. These are mp3d, matrix multiplication, Floyd-Warshall's all-pairs shortest path algorithm, and FFT. The results indicate that the proposed scheme performs close to or even better than the full map ....

....using the above scheme we are able to avoid deadlocks by using just a maximum of 3 buffers for each link. 4 Simulation Methodology 4.1 Application Programs We have run extensive execution driven simulations for various application programs. For this purpose an existing simulator, called proteus [10], has been extended to include our network and cache coherence models. The application programs used are MP3D from the SPLASH benchmark suite. FFT, matrix multiplication, and Floyd Warshall were developed in house. MP3D is a three dimensional particle simulator used in rarefied fluid flow ....

E. A. Brewer and C. N. Dellarocas, Proteus: User Documentation , Version 0.5, MIT, Cambridge, MA, 1992.


QuickStep: A System for Performance Monitoring and Debugging.. - Mitra (1995)   (Correct)

....The data packets are received by the host and stored in memory until the end of the run. At the end of the run, they are output into a raw data file in a simple column format. A sample data file is given in Figure 3.9. The raw file is then processed to generate a binary file that is in the Proteus [5] trace file format, that can be viewed with a graphical interface supported by the Proteus Stats program. Chapter 2 shows examples of graphs obtained as outputs of the QuickStep system. The column headings from the raw data file are used to generate headings and menus for the graphs. The graphs ....

Eric A. Brewer and Chrysanthos N. Dellarocas. Proteus User Documentation.


Combining Funnels: A Dynamic Approach To Software Combining - Shavit, Zemach (2000)   (1 citation)  (Correct)

.... them against the most efficient software implementations of fetch and add and stacks known to date, combining trees and elimination trees, on a simulated 256 processor shared memory multiprocessor similar to the MIT Alewife machine [2]. Our simulation uses the well-established Proteus simulator [4, 5]. Since the type of futuristic applications that will benefit from such high levels of concurrency are currently not available, we use a standard collection of synthetic benchmarks that mimic their possible access patterns. Based on our empirical results, we believe our linearizable fetch and add ....

....of reference for performance in low load situations. Our tests were performed on a simulated distributed shared memory multiprocessor similar to the MIT Alewife machine [2] of Agarwal et al. The simulation was conducted using the Proteus multiprocessor simulator developed by Brewer et al. [4, 5]. The simulated Alewife machine is a 256 processor ccNUMA multiprocessor with realistic memory bandwidth and latency. We ran Proteus with accurate network simulation, which traces every packet and models contention and communication hot spots. Though this is not a real 256 node machine, we note ....

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


Building FIFO and Priority-Queuing Spin Locks from Atomic Swap - Craig (1993)   (6 citations)  (Correct)

....and the lock holder granting its request. That protocol didn't involve a swap, but would not be reliable under some reasonable assumptions about read write ordering in a NUMA machine. We have also implemented a nesting version of our coherent FIFO lock on the Proteus multiprocessor simulator [BD91]. One benefit of working on the simulator is that we can monitor the functional performance of the lock scheme. Our prototype includes a detector for violations of mutual exclusion that has been triggered only when we purposely altered the lock acquisition routine to test the detector. We have ....

Eric A. Brewer and Chrysanthos N. Dellarocas. PROTEUS User Documentation, Version 0.2. MIT, Cambridge, Massachusetts, 1991.


Towards a Practical Snapshot Algorithm - Riany, Shavit, Touitou (1999)   (1 citation)  (Correct)

.... The second contribution of this paper, in Section 6, is a comparison of the performance of several single and multi scanner algorithm snapshot techniques, including our own, on a simulated distributed shared memory multiprocessor using the well accepted Proteus Parallel Hardware Simulator [16, 17] of Brewer et al. Our choice of algorithms for simulation was driven not only by their asymptotic complexity, but also by the feasibility of implementing them on multiprocessor machines. The first two compared methods are an algorithm that blocks updates during a scan and a lock free algorithm ....

Brewer, E. A. and Dellarocas, C. N. Proteus User Documentation, Version 4.0, March 1992.


Scalable Concurrent Priority Queue Algorithms - Shavit, Zemach (1999)   (Correct)

....the final value of the counter. 4 Performance Our tests were performed on a simulated 256 processor distributed shared memory multiprocessor similar to the MIT Alewife machine [1] of Agarwal et al. The simulator used is Proteus, a multiprocessor simulator developed by Brewer et al. [9, 10]. In our benchmarks processors alternate between performing some local work and accessing the priority queue. When accessing the queue processors choose whether to insert a random value or apply a delete min operation based on the result of an unbiased coin flip. In all the experiments we show ....

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.
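
The benchmark loop described in this excerpt (local work, then a queue access chosen by an unbiased coin flip) can be sketched roughly as follows. This is only an illustration, not the authors' benchmark: it runs sequentially on one host thread, uses Python's heapq as a stand-in for the concurrent priority queue, and the names run_benchmark, num_processors and ops_per_processor are hypothetical.

```python
import heapq
import random


def run_benchmark(num_processors, ops_per_processor, seed=0):
    """Sequential stand-in for the workload: every simulated processor
    alternates between local work and one priority-queue access."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    queue = []                 # heapq list standing in for the shared queue
    counts = {"insert": 0, "delete_min": 0, "empty_delete": 0}
    for _ in range(ops_per_processor):
        for _ in range(num_processors):
            sum(range(10))  # placeholder for "some local work" (assumption)
            if rng.random() < 0.5:
                # Heads: insert a random value.
                heapq.heappush(queue, rng.randrange(1_000_000))
                counts["insert"] += 1
            elif queue:
                # Tails: remove the current minimum.
                heapq.heappop(queue)
                counts["delete_min"] += 1
            else:
                # Tails on an empty queue: nothing to delete.
                counts["empty_delete"] += 1
    return counts, len(queue)


if __name__ == "__main__":
    counts, remaining = run_benchmark(num_processors=8, ops_per_processor=100)
    print(counts, remaining)
```

The sketch only reproduces the access pattern; contention, latency and the simulated machine itself are outside its scope.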


Combining Funnels: A new twist on an old tale. . . - Shavit, Zemach (1998)   (Correct)

....in order to have a point of reference for performance in low load situations. Our tests were performed on a simulated distributed shared memory multiprocessor similar to the MIT Alewife machine [2] of Agarwal et al. using Proteus, a multiprocessor simulator developed by Brewer et al. [4, 5]. Proteus simulates parallel code by multiplexing several parallel threads on a single CPU. Each thread runs on its own virtual CPU with accompanying local memory, cache and communications hardware, keeping track of how much time is spent using each component. In order to facilitate fast ....

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.
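
The idea in this excerpt, several simulated threads multiplexed on one CPU with per-component time accounting, can be illustrated with a small sketch. It is not Proteus code and does not reflect its actual interfaces: the functions worker and simulate, the component names and the cycle costs are all assumptions made up for the example.

```python
import heapq


def worker(steps):
    """A simulated thread. Each yield reports (component, cycles) consumed.
    The costs below are invented for illustration only."""
    for _ in range(steps):
        yield ("cpu", 3)
        yield ("cache", 1)
        yield ("network", 10)


def simulate(threads):
    """Run every simulated thread on the single host thread, always resuming
    the one whose virtual clock is furthest behind."""
    clocks = {name: 0 for name in threads}   # virtual time per thread
    usage = {name: {} for name in threads}   # cycles per component
    ready = [(0, name) for name in threads]
    heapq.heapify(ready)
    while ready:
        now, name = heapq.heappop(ready)
        try:
            component, cycles = next(threads[name])
        except StopIteration:
            continue  # this thread has finished; do not reschedule it
        clocks[name] = now + cycles
        usage[name][component] = usage[name].get(component, 0) + cycles
        heapq.heappush(ready, (clocks[name], name))
    return clocks, usage


if __name__ == "__main__":
    clocks, usage = simulate({"t0": worker(2), "t1": worker(3)})
    print(clocks)
    print(usage)
```

Generators are used here because they let one host thread suspend and resume each simulated thread cheaply; a real simulator would also model interactions between threads, which this sketch leaves out.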


Scalable Concurrent Priority Queue Algorithms - Nir Shavit (1999)   (Correct)

....determine the final value of the counter. 4 Performance Our tests were performed on a simulated 256 processor distributed shared memory multiprocessor similar to the MIT Alewife machine [1] of Agarwal et al. The simulator used is Proteus, a multiprocessor simulator developed by Brewer et al. [9, 10]. In our benchmarks processors alternate between performing some local work and accessing the priority queue. When accessing the queue processors choose whether to insert a random value or apply a delete min operation based on the result of an unbiased coin flip. In all the experiments we show ....

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


Distributed Shared Memory: Recoverable and Non-recoverable - Limited Update   (Correct)

....faults. We used trace driven simulation method for the experiments. As a preliminary test, we generated synthetic trace data by using a trace generator. The trace generator can produce trace data according to the memory access behavior which we can define as input. We also modified the Proteus [3] to produce trace data for shared memory operations, acquire, release, read, and write. The trace data, produced by our synthetic trace generator or modified Proteus, are used as input for our simulator which computed the cost (the number of page faults, the number of messages, and the amount of ....

E. Brewer and C. Dellarocas, Proteus User Documentation, 1992.


Diffracting Trees - Shavit, Zemach (1995)   (24 citations)  (Correct)

....of parallelism and fault tolerance of counting networks with the beneficial utilization of collisions of a combining tree. We compared the performance of diffracting trees to the above methods in simulated shared memory and message passing environments. The Proteus Parallel Hardware Simulator [10, 9] of Brewer, Dellarocas, Colbrook and Weihl was used to evaluate performance in a shared memory architecture similar to the Alewife machine of Agarwal et al. [3]. Netsim, part of the Rice Parallel Processing Testbed [12, 26] developed by Covington, Dwarkadas, Jump, Sinclair, and Madala was used for ....

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


Diffracting Trees - Shavit (1994)   (24 citations)  (Correct)

....of parallelism and fault tolerance of counting networks with the beneficial utilization of collisions of a combining tree. We compared the performance of diffracting trees to the above methods in simulated shared memory and message passing environments. The Proteus Parallel Hardware Simulator [10, 11] of Brewer, Dellarocas, Colbrook and Weihl was used to evaluate performance in a shared memory architecture similar to the Alewife machine of Agarwal, Chaiken, Johnson, Krantz, Kubiatowicz, Kurihara, Lim, Maa, and Nussbaum [3]. Netsim, part of the Rice Parallel Processing Testbed [15, 29] ....

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


A Steady State Analysis of Diffracting Trees - Nir Shavit   (Correct)

No context found.

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


Reducing Synchronization Overhead in Parallel Simulation - Legedza (1995)   (15 citations)  (Correct)

No context found.

Eric A. Brewer and Chrysanthos N. Dellarocas. Proteus user documentation, version 0.5, December 1992.


Counting Networks are Practically Linearizable - Lynch, Shavit, Shvartsman.. (1996)   (8 citations)  (Correct)

No context found.

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


A Steady State Analysis of Diffracting Trees - Shavit, Upfal, Zemach (1997)   (10 citations)  (Correct)

No context found.

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


Reactive Diffracting Trees - Della-Libera, Shavit (1997)   (5 citations)  (Correct)

No context found.

E. A. Brewer and C. N. Dellarocas. Proteus user documentation, version 4.0, March 1992.


Counting Networks are Practically Linearizable - Nancy Lynch (1996)   (8 citations)  (Correct)

No context found.

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.


A Steady State Analysis of Diffracting Trees (Extended Abstract) - Shavit, et al.   (Correct)

No context found.

E.A. Brewer, C.N. Dellarocas. Proteus User Documentation. MIT, 545 Technology Square, Cambridge, MA 02139, 0.5 edition, December 1992.

CiteSeer.IST - Copyright Penn State and NEC