15 citations found.
W.-D. Weber, S. Gold, P. Helland, T. Shimizu, T. Wicki, and W. Wilcke. The Mercury Interconnect Architecture: A Cost-effective Infrastructure for High-Performance Servers. In Proc. International Symposium on Computer Architecture, pages 98--107. ACM, 1997.

This paper is cited in the following contexts:
Traffic Scheduling Solutions with QoS Support for an.. - Caminero, Carrion.. (2003)

....for providing low latency to best-effort traffic [9], [10]. These networks are not designed to permit concurrent guarantees of communication performance to multiple applications. For example, nowadays available commercial SAN/LAN fabrics such as IBM SP2 [11], Myricom Myrinet [10], HAL Mercury [12], or Tandem ServerNet [13] are not designed to efficiently support real-time traffic. However, contemporary router technology should provide the capacity to fit the high bandwidth and timing requirements demanded by current applications. Developing QoS-aware interconnects for high-speed ....

W.D. Weber et al., "The Mercury interconnect architecture: A cost-effective infrastructure for high-performance servers," in 24th International Symposium on Computer Architecture, 1997.


Reliable Verification Using Symbolic Simulation with Scalar.. - Wilson, Dill (2000)   (1 citation)

....for the circuit we used. Test lengths ranged from around 30 to 100 time frames. The simulator plus design required 36MB of main memory and this value did not grow significantly as a test ran. 5.1 Test Design Description. The design we are using is an industrial bus bridge chip called the MCU [9]. We are modeling the bus interface section of the chip, which has approximately 140K gates and 2,402 state bits. This chip was taped out, went through bring-up, and subsequent follow-on versions were designed. In the process of verifying these follow-on designs, bugs were found in the original design ....

W.-D. Weber et al. The mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In Proc. of the 24th Annual Intl. Symp. on Computer Architecture (ISCA97), 1997.


Fine-Grain Distributed Shared Memory on Clusters of Workstations - Schoinas (1997)   (3 citations)

....N T buffers available in every node to accommodate the maximum buffer requirements. Typically, designers of hardware shared memory coherence protocols know the buffer requirements for their protocol and therefore, they make sure that the network has enough buffers available to avoid deadlocks [WGH 97]. However, such a strategy is not appropriate in a general platform like Blizzard, which is targeted to allow the development of application-specific protocols. Finally, there are protocols that create an arbitrary number of concurrent activation graphs with large buffer requirements. Such ....

Wolf-Dietrich Weber, Stephen Gold, Pat Helland, Takeshi Shimizu, Thomas Wicki, and Winfried Wilcke. The Mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In Proceedings of the 24th Annual International Symposium on Computer Architecture, May 1997.


Reliable Verification Using Symbolic Simulation with Scalar.. - Chris Wilson Computer (2000)   (1 citation)

....frames. The simulator plus design required 36MB of main memory and this value did not grow significantly as a test ran. (Footnote 3: On a Sun Ultra 5 workstation with 128MB of memory running SunOS 5.6.) 5.1 Test Design Description. The design we are using is an industrial bus bridge chip called the MCU [9]. We are modeling the bus interface section of the chip, which has approximately 140K gates and 2,402 state bits. This chip was taped out, went through bring-up, and subsequent follow-on versions were designed. In the process of verifying these follow-on designs, bugs were found in the original ....

W.-D. Weber et al. The mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In Proc. of the 24th Annual Intl. Symp. on Computer Architecture (ISCA97), 1997.


Symbolic Simulation with Approximate Values - Wilson, Dill, Bryant (2000)   (3 citations)

....was built on top of the CMU BDD package [12]. This section reports on the results of running some typical test cases on a large representative system-level circuit. The test design we use for these experiments is an industrial bus-to-network bridge for a distributed shared memory multiprocessor [16]. The properties we are verifying use only the bus portion of the design, which consists of approximately 140K gates and approximately 2,402 state bits. There are two different sets of tests. In the first test, we are looking for a particular hard-to-find bug, and in the second test, we are trying ....

W.-D. Weber et al. The mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In Proc. of the 24th Annual Intl. Symp. on Computer Architecture (ISCA97), 1997.


HIPIQS: A High-Performance Switch Architecture using Input.. - Sivaram, Stunkel, Panda (1998)   (3 citations)

....1 Introduction. Switch-based interconnects are used for a large number of applications. Such applications vary from low-latency interconnects for parallel systems based on networks of workstations [3, 10, 9, 24, 27, 8, 30, 16, 22, 1] to local area and wide area networks [31, 17, 14, 6]. Current generation switches have typically been built keeping particular application domains in mind. For example, parallel system interconnects place heavy emphasis on reducing latency and on accommodating a range of packet sizes. On the ....

....simple input FIFO queuing: k 2 f . The DAMQ approach adds complexity of the required central arbitration. The cost of this arbitration varies with its optimality and with the time constraints in which it operates. It must be noted that several contemporary switches such as Arctic [4], HAL's PRC [30], and IBM's Prizma [6] already incur crossbar complexity similar to HIPIQS (approximately k 3 f ). Furthermore, the HIPIQS architecture has significant potential as a basis for multicast. Finally, as our close examination of the performance of HIPIQS in the next section shows, HIPIQS provides ....

W.-D. Weber, S. Gold, P. Helland, T. Shimizu, T. Wicki, and W. Wilcke. The Mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In Proc. 24th Ann. Int. Symp. on Computer Architecture, pages 98--107, June 1997.


End-To-End Fault Containment In Scalable Shared-Memory.. - Teodosiu (2000)   (1 citation)

....chosen to let the cache coherence protocol and the system software deal with packet loss and truncation rather than hiding those effects. However, if one can provide enough support in the node controller hardware, an implementation of end-to-end reliability could be quite effective, as shown by [Weber97]. Interconnect failures may also lead to the loss of connectivity between functioning nodes. This condition can be remedied by reprogramming the routing tables so that packets are routed around the failed areas. This reprogramming is effected by the recovery algorithm. Loss of connectivity is ....

....has been proposed in [Ghosh98]. This approach only allows sharing of user pages, and does not contain all the necessary support for recovering from hardware failures. The HAL multiprocessor provides an efficient hardware implementation of an end-to-end reliable protocol for coherence traffic [Weber97]. Reliable delivery is achieved by means of a Reliable Packet Mover (RPM) layer implemented in the node controllers. In contrast to the CrayLink interconnect [Galles96], [Galles97], the Mercury interconnect used in the HAL system is tuned towards fast packet delivery, and only provides best-effort ....

W. Weber et al. "The Mercury Interconnect Architecture: A Cost-effective Infrastructure for High-performance Servers." In Proceedings of the 24th Annual International Symposium on Computer Architecture, pp. 98-107, June 1997.


Exploring the Value of Supporting Multiple DSM Protocols in.. - Kuramkote, Carter (1999)

....network so that the latency of a remote miss is close to that of a local memory access. The DSM controller in most commercial hardware DSM systems supports a single architecture and protocol, most often cache-coherent NUMA (CC-NUMA) with a sequentially consistent write-invalidate protocol [16, 19, 29]. To improve performance, designers of such systems employ fast and expensive networks [16], thereby dropping the latency of a remote miss to close to that of a local miss, or add memory to the DSM controller to cache remote data [19], thereby turning remote misses into local misses. Because these ....

W. Weber, S. Gold, P. Helland, T. Shimizu, T. Wicki, and W. Wilcke. The mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In ISCA97, June 1997.


Selective, Accurate, and Timely Self-Invalidation Using.. - Lai, Falsafi (2000)   (2 citations)

....sharing the block. A coherence protocol coordinates sharing of memory blocks among the processors. In this paper, we focus on simple full-map directory write-invalidate hardware coherence protocols such as those implemented in Sun WildFire [4], SGI Origin 2000 [9], and Fujitsu Synfinity [18]. The ideas in this paper, however, are equally applicable to software protocol implementations. In a write-invalidate protocol, a block can be in one of three protocol states: 1) in the Idle state, the block resides at home and is accessed only by the home processor(s); 2) in the Shared state, ....

....predictors [7]. In a highly integrated DSM design such as DSMs based on Alpha 21364 [17], all the DSM hardware is on chip, enabling an easy integration of an LTP. In conventional board-level DSM designs (e.g., SGI Origin [9]) and DSM clusters (e.g., Sun WildFire [4] and Fujitsu Synfinity [18]) in which the memory controller, the DSM hardware, and possibly a network cache for remote data [3] are implemented off-chip, such a design requires that invalidation messages always be exposed to the processor. Moreover, the DSM hardware must provide an interface for the processor to perform a ....

W.-D. Weber, S. Gold, P. Helland, T. Shimizu, T. Wicki, and W. Wilcke. The Mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In Proceedings of the 24th Annual International Symposium on Computer Architecture, May 1997.


Design and Performance of the Software-controlled COMA - Moga (1998)

....by store buffers. 6.1 SC COMA on SMP Clusters. Symmetric MultiProcessors (SMP) have emerged in the last few years as one of the most attractive basic blocks for larger scale shared memory machines. From Pentium quads used in hardware systems, such as the Sequent NUMA Q [51] or the HAL S1 [80], to dual SPARC modules used in hybrid DSM systems, such as Typhoon 0 [63], [72], and to four-processor AlphaServers used in all-software DSM systems, such as Shasta [69], all SMP nodes consist of two or four processors connected by a bus or a crossbar to a uniform access memory. In light of this ....

W.-D. Weber et al. The Mercury Interconnect Architecture: A Cost-effective Infrastructure for High-performance Servers. In Proc. of the 24th Annual Int'l Symposium on Computer Architecture. June 1997.


The Effectiveness of SRAM Network Caches in Clustered DSMs - Moga, Dubois (1998)   (17 citations)

....practically all commercially available (KSR 1 [9], NUMA Q [16], Exemplar [1]), prototype (DASH [15], S3.mp [18]), or proposed (DDM [5], COMA F [6], Simple COMA [21], R NUMA [3]) DSM architectures have provisions for remote data caching. Notable exceptions are the SGI Origin [14] and the HAL S1 [24], which rely exclusively on page migration and replication. The design of a cache for remote data involves several key decisions. A fundamental choice is whether to allocate separate resources for caching local and remote data or to cache both in the same memory, as done in COMA [5], [6]. A study ....

....takes five bus clocks. Snooping results are reported in the address phase. Main memory is two-way interleaved and has a width of 16 bytes. The memory access time is 60ns and a whole cycle takes 120ns. The interconnect is a fast crossbar with a latency of 50ns and a bandwidth of 1.6 GB/s per link [24]. The inter-cluster adapter is a very aggressive design supporting any number of outstanding requests. Requests are buffered only when the block already has a pending request. The coherence protocol is a write-invalidate with streaming of dirty blocks through the home node (misses on remote dirty ....

W.-D. Weber et al. The Mercury Interconnect Architecture: A Cost-effective Infrastructure for High-performance Servers. In Proceedings of the 24th Annual International Symposium on Computer Architecture. June 1997.


Formal Verification of the HAL S1 System Cache Coherence.. - Hu, Fujita, Wilson (1997)

....bandwidth (up to 9.6 GBytes/s per Router). For a system with 4 nodes (16 processors), cache-coherent memory access time to a remote node is only about four to five times slower than access to memory local to the SMP node. Details on the S1 System and Mercury Interconnect are available elsewhere [16]; Figure 1 sketches the architecture of the system. In this paper, we are only concerned with the cache coherence protocol of the S1 System. Within each SMP node, the normal bus-based snooping protocol keeps the processor caches coherent. We will assume the snooping protocol works correctly. A ....

W.-D. Weber, S. Gold, P. Helland, T. Shimizu, T. Wicki, and W. Wilcke. The Mercury interconnect architecture: A cost-effective infrastructure for high-performance servers. In International Symposium on Computer Architecture, 1997.


Investigating QoS Support for Traffic Mixes with the MediaWorm.. - Ki Hwan Yum (2000)   (4 citations)

No context found.

W.-D. Weber, S. Gold, P. Helland, T. Shimizu, T. Wicki, and W. Wilcke. The Mercury Interconnect Architecture: A Cost-effective Infrastructure for High-Performance Servers. In Proc. International Symposium on Computer Architecture, pages 98--107. ACM, 1997.


MediaWorm: A QoS Capable Router Architecture for Clusters - Yum, Kim, Das, Vaidya (2002)

No context found.

W.-D. Weber, S. Gold, P. Helland, T. Shimizu, T. Wicki, and W. Wilcke, "The Mercury Interconnect Architecture: A Cost-effective Infrastructure for High-Performance Servers," in Proceedings of the International Symposium on Computer Architecture (ISCA), 1997, pp. 98--107.


Cache-Coherent Distributed Shared Memory: Perspectives on.. - John Hennessy Mark (1999)   (3 citations)

No context found.

W.D. Weber et al. The Mercury Interconnect Architecture: A Cost-effective Infrastructure for High-performance Servers. In Proceedings of the 24th International Symposium on Computer Architecture, pages 98-107, Denver, CO, June, 1997.
