16 citations found.
L. A. Barroso and M. Dubois, "Performance evaluation of the slotted ring multiprocessor," IEEE Tr. on Computers, vol. 44, pp. 878--890, Jul. 1995.


This paper is cited in the following contexts:
Multistage Ring Network: An Interconnection Network for.. - Yoo, Park, Maeng

....connections can make rings run at high clock speeds and make it easy to use the raw bandwidth of the links to the fullest extent. Moreover, ring connections are technologically scalable, and the bandwidth of ring connections will increase at the same pace as circuit technology improves [1]. The potential of ring connections is demonstrated by SCI (Scalable Coherent Interface, IEEE standard 1596) [8], which offers transfer rates of up to 1 gigabyte per second per link. To accommodate a large number of processors, multiple rings need to be interconnected because a ....

L. A. Barroso and M. Dubois, "Performance evaluation of the slotted ring multiprocessor," IEEE Tr. on Computers, vol. 44, pp. 878--890, Jul. 1995.


Design and Performance Evaluation of. . . - Oi (2000)

....as well as LAN communications [25]. Holliday and Stumm investigated the performance of the hierarchical-ring-based multiprocessor using parametric simulations [31]. The architecture assumed in their study was an extension of Hector [70], the predecessor of NUMAchine. Barroso and Dubois [4] evaluated the performance of the slotted-ring-based multiprocessor. Their study was based on a hybrid approach that combined trace-driven simulations and analytical models. They showed the performance of small- to medium-scale slotted ring multiprocessors with three different coherence protocols ....

....at which the hierarchical ring outperformed the mesh network. Since they assumed a unidirectional ring, the nearest-neighbor communication pattern that some parallel applications exhibit was not taken into account. Barroso and Dubois evaluated the performance of the slotted ring multiprocessor [4]. They investigated the effect of design choices such as the coherence protocol (snoopy, linked-list, and full-map directory) and the processor speed, and they compared a unidirectional ring architecture with a split-transaction bus. Zhang et al. compared the performances of the hierarchical ....

[Article contains additional citation context not shown here]

L. A. Barroso and M. Dubois, "Performance Evaluation of the Slotted Ring Multiprocessor," Transactions on Computers, IEEE, Vol. 44, No. 7, 878--890, July 1995.


A Low-Power Multiprocessor Architecture for Embedded.. - Amerijckx, Legat   (1 citation)

....architecture, all the blocks are connected to a single bus. Access to the ring is controlled by a token: a block that holds the token owns the ring and can transfer data. The main disadvantage of this ring is that only one block can use it at a time. In a slotted ring [18] [19], the bus is divided into a certain number of slots, and a block has to wait for a free slot before it can transfer data. In other words, the slots play the same role as the token in a token ring; as soon as a block detects a free slot, a transmission can occur. The main advantage of the slotted ring ....
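The slot-access rule described above can be sketched as a small Python simulation. It is a minimal illustration under assumptions that are not taken from the cited papers: one slot per node position, slots advancing one node per clock cycle, and the destination freeing a slot when its packet arrives. The function and parameter names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

# A packet occupying a slot: (source node, destination node).
Packet = Tuple[int, int]


def simulate_slotted_ring(queues: List[Deque[int]], cycles: int) -> List[Tuple[int, int, int]]:
    """Simulate a unidirectional slotted ring and return (cycle, source, destination)
    for every delivered packet.

    Illustrative assumptions (not from the cited papers): one slot per node
    position, slots advance one node per clock cycle, and the destination
    frees the slot when the packet arrives.
    """
    n = len(queues)
    slots: List[Optional[Packet]] = [None] * n  # slots[i] is the slot now passing node i
    delivered: List[Tuple[int, int, int]] = []
    for cycle in range(cycles):
        for node in range(n):
            slot = slots[node]
            # The destination removes its packet, which frees the slot.
            if slot is not None and slot[1] == node:
                delivered.append((cycle, slot[0], node))
                slot = None
            # A node may transmit only when an empty slot passes by.
            if slot is None and queues[node]:
                slot = (node, queues[node].popleft())
            slots[node] = slot
        # Rotate: the slot at node i - 1 moves on to node i.
        slots = [slots[(i - 1) % n] for i in range(n)]
    return delivered


# Every node sends one packet to its downstream neighbour in the same cycle.
ring = [deque([1]), deque([2]), deque([3]), deque([0])]
print(simulate_slotted_ring(ring, cycles=3))
# → [(1, 3, 0), (1, 0, 1), (1, 1, 2), (1, 2, 3)]
```

Because any node that sees an empty slot may fill it, all four transfers in the example complete in the same cycle, whereas a single circulating token would serialize them.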

L. A. Barroso, M. Dubois, "Performance Evaluation of the Slotted Ring Multiprocessor", in IEEE Transactions on Computers, vol.44, no. 7, pp. 878-890, July 95.


Performance Issues in Ring-Based Multiprocessor Systems - Chung, Suh, Jhang, Jhon (2000)

....high speed. Sequent Computer Systems introduced a ring-based system, STiNG [1], which supported one-way transmission on the ring and used the Scalable Coherent Interface (SCI) specification [2]. Luiz A. Barroso and Michel Dubois proposed a slotted-ring-based multiprocessor system called the Express Ring [3]. PANDA [4] was presented to improve performance by reducing data miss latency. To widen ring bandwidth, the Data General AViiON [5] architecture uses two SCI [2] rings operating at 500 MB/s each to provide an aggregate bandwidth of 1 GB/s to the system. Each SCI ring uses ....

....is clockwise and the other is counterclockwise) are configured to circulate in opposite directions, and packet routing increases performance by choosing the ring with the shortest path to the target cluster. Although researchers have shown the benefits of ring-based multiprocessor systems [3, 4], there has not yet been a detailed or realistic analysis of the effects of each component in ring-based multiprocessor systems. Such an analysis is required for system architects to select cost-effective components in ring-based multiprocessor systems. 2. Experimental Methodology The following ....
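The shortest-path choice between two counter-rotating rings reduces to comparing the clockwise and counterclockwise hop counts. The Python sketch below is a hypothetical illustration, not the cited system's routing logic: nodes are numbered 0..n-1 around the ring, and ties are assumed to go clockwise.

```python
from typing import Tuple


def choose_ring(src: int, dst: int, n: int) -> Tuple[str, int]:
    """Pick the ring direction with fewer hops from src to dst on an n-node
    bidirectional ring. Ties go clockwise (an arbitrary, assumed rule)."""
    cw = (dst - src) % n   # hops travelling clockwise
    ccw = (src - dst) % n  # hops travelling counterclockwise
    return ("clockwise", cw) if cw <= ccw else ("counterclockwise", ccw)


def mean_hops(n: int, bidirectional: bool) -> float:
    """Average hop count from node 0 to every other node."""
    total = 0
    for dst in range(1, n):
        total += choose_ring(0, dst, n)[1] if bidirectional else dst
    return total / (n - 1)


print(choose_ring(0, 7, 8))  # ('counterclockwise', 1)
# Unidirectional: 4.0 hops on average; bidirectional: 16/7, about 2.29 hops.
print(mean_hops(8, False), mean_hops(8, True))
```

On an 8-node ring, routing each packet the shorter way cuts the average distance from 4 hops to about 2.29, which is the kind of gain the opposite-direction configuration aims for.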

L.A. Barroso and M. Dubois, "Performance evaluation of the slotted ring multiprocessor", IEEE Trans. on Computers, vol. 44, no. 7, pp. 878-890, July 1995.


A Comparative Study of Bidirectional Ring and Crossbar.. - Oi, Ranganathan (1998)

....study mainly showed the maximum number of nodes at which the hierarchical ring outperformed the mesh network. Since they assumed a unidirectional ring, the nearest-neighbor communication pattern was not taken into account. Barroso and Dubois evaluated the performance of the slotted ring multiprocessor [1]. They investigated the effect of design choices such as the coherence protocol (snoopy, linked-list, and full-map directory) and processor speed, and compared their unidirectional ring architecture with a split-transaction bus. Lang et al. studied the effective bandwidth of the crossbar ....

.... 3 Performance Model The methods that have been used for performance evaluation of computer systems include analytical models [17], parametric simulations [14], trace-driven simulations [6], and execution-driven simulations [5]. In this paper, we use the hybrid approach of Barroso and Dubois [1], extending it to the bidirectional ring and the crossbar. Below we briefly describe the derivation of execution time using the hybrid approach. (Footnote 1: Equality holds when all the packets are transmitted in contiguous slots.) The execution time ($ET$) in clock cycles is given by $ET = \mathit{Inst}$ ....

L. A. Barroso and M. Dubois, Performance Evaluation of the Slotted Ring Multiprocessor, Transactions on Computers, IEEE, Vol. 44, No. 7, 878--890, July 1995.


Performance Analysis of the Bidirectional Ring-Based.. - Oi, Ranganathan (1997)   (1 citation)

....is available at http: figaro.csee.usf.edu oi RESEARCH This research is supported in part by a National Science Foundation Grant No. CDA 9522265. ...performance of small- to medium-scale slotted ring multiprocessors with a hybrid approach (combining trace-driven simulations and analytical models) [4]. The KSR 1 of Kendall Square Research was a cache-only memory architecture (COMA) multiprocessor with hierarchical rings [5]. Most research in the past, including the above, is based on unidirectional rings. In a unidirectional ring, messages have to traverse all the way around the ring even if ....

....Also note that even without preference for group access (PG = 0), the bidirectional ring performs 6% to 12% better than the unidirectional ring (Figure 3). 4.2 Effect of System Size Next, we consider the scalability of the ring-based multiprocessor. From the related study including [4], ....
[Figure 3: Effect of Group Access Probability; processor utilization ratio vs. miss rate (misses/instruction) for Pg = 0, 0.25, 0.5, 0.75, 1. A further plot of processor utilization ratio vs. miss rate for G = 1, 2, 4, 8 follows.]

L. A. Barroso and M. Dubois, "Performance Evaluation of the Slotted Ring Multiprocessor", Transactions on Computers, IEEE, Vol. 44, No. 7, 878--890, July 1995.


Performance Issues in the Design of Hierarchical-ring and Direct .. - Ravindran (1998)

....efficient [24]. In this dissertation, we propose buffered wormhole switching and show in Chapter 6 that it can reduce latency and improve system throughput in both hierarchical-ring and direct networks. 2.3.4 Cell Switching An important variation of cut-through switching is cell switching [5, 42, 43, 72, 92, 93]. In cell switching, packets are divided into equal-sized cells that are routed independently (see Figure 2.9). In this sense, it is the same as virtual cut-through switching on a per-cell basis. Each cell contains its own routing information: the first cell of a packet carries the full target ....
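Splitting a packet into equal-sized, independently routed cells can be sketched in Python as follows. The snippet is cut off before it says what the non-first cells carry, so the `packet_id` and `seq` fields below are hypothetical tags added for reassembly; only the first cell carries the full target address, as the text describes. Zero-padding the last cell is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Cell:
    packet_id: int         # hypothetical tag so independently routed cells can be regrouped
    seq: int               # hypothetical position of the cell within its packet
    target: Optional[int]  # full target address, carried only by the first cell
    payload: bytes


def split_into_cells(packet_id: int, target: int, payload: bytes, cell_size: int) -> List[Cell]:
    """Divide a packet into equal-sized cells, zero-padding the last one (assumed)."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    chunks = [payload[i:i + cell_size] for i in range(0, len(payload), cell_size)] or [b""]
    chunks[-1] = chunks[-1].ljust(cell_size, b"\x00")
    return [Cell(packet_id, seq, target if seq == 0 else None, chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(cells: Iterable[Cell], length: int) -> bytes:
    """Rebuild the payload from cells that may arrive in any order."""
    ordered = sorted(cells, key=lambda c: c.seq)
    return b"".join(c.payload for c in ordered)[:length]


cells = split_into_cells(packet_id=7, target=0x1234, payload=b"abcdefghij", cell_size=4)
print([(c.seq, c.target, c.payload) for c in cells])
print(reassemble(reversed(cells), length=10))
```

Reassembling from reversed order mimics cells taking different paths and arriving out of order.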

....is more effective than either wormhole or virtual cut-through switching in hierarchical-ring networks, especially when combined with non-blocking flow control (described in Section 2.6) [72]. Note that in a single ring, cell switching is the same as the slotted ring protocol [5, 43]. In Chapter 6 we also propose and evaluate a hybrid cell-wormhole switching scheme for hierarchical-ring networks. 2.4 Routing Techniques Routing determines the path selected by a packet to reach its destination. Routing is critical to network performance, and a large amount of research has been done on ....

L.A. Barroso and M. Dubois, "Performance evaluation of the slotted ring multiprocessor," IEEE Trans. on Computers, vol.44, no.7, pp. 878-890, July 1995.


A Performance Comparison of Hierarchical Ring- and.. - Ravindran, Stumm (1997)   (7 citations)

....better than meshes only for small system sizes (by 10-30%); for larger systems, the performance of hierarchical rings is severely constrained by bisection bandwidth limitations, and meshes perform significantly better. Although some previous work on the performance of hierarchical ring networks [4, 13, 16, 20, 21] and on mesh networks [1, 2, 8, 12, 23] has been published, we are aware of only one study that compares the performance of both types of networks [15]. That study uses analytical models to conclude that three-level hierarchical systems perform somewhat better than mesh systems. The rest of the ....

L.A. Barroso and M. Dubois, "Performance evaluation of the slotted ring multiprocessor," IEEE Trans. on Computers, vol.44, no.7, pp. 878-890, July 1995.


A Comparative Study of Bidirectional Ring and Crossbar.. - Oi, Ranganathan (1998)

....study mainly showed the maximum number of nodes at which the hierarchical ring outperformed the mesh network. Since they assumed a unidirectional ring, the nearest-neighbor communication pattern was not taken into account. Barroso and Dubois evaluated the performance of the slotted ring multiprocessor [5]. They investigated the effect of design choices such as the coherence protocol (snoopy, linked-list, and full-map directory) and processor speed, and compared their unidirectional ring architecture with a split-transaction bus. Lang et al. studied the effective bandwidth of the crossbar ....

.... 3 Performance Model The methods that have been used for performance evaluation of computer systems include analytical models [8], parametric simulations [4], trace-driven simulations [9], and execution-driven simulations [10]. In this paper, we use the hybrid approach of Barroso and Dubois [5], extending it to the bidirectional ring and the crossbar. Below we briefly describe the derivation of execution time using the hybrid approach. The execution time ($ET$) in clock cycles is given by $ET = \mathit{Inst} + \mathit{DataAccess}$ (1), where $\mathit{Inst}$ accounts for instruction fetches. With assumption of sequential ....
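Equation (1) can be evaluated as a small worked example. The extracted text reads "ET = Inst DataAccess", and the sketch assumes the dropped operator is a plus. How each term is computed below (a fixed cost per instruction fetch; a hit time per data reference plus a miss penalty for the fraction that misses) is a hypothetical breakdown for illustration, not the cited hybrid model, whose details are truncated here.

```python
def execution_time(n_fetches: int, fetch_cycles: float,
                   n_data_refs: int, hit_cycles: float,
                   miss_rate: float, miss_latency: float) -> float:
    """ET = Inst + DataAccess, in clock cycles.

    Hypothetical breakdown (not the cited model): Inst charges a fixed cost
    per instruction fetch; DataAccess charges a hit time per data reference
    plus, for the fraction that misses, a remote latency where a ring or bus
    model would plug in.
    """
    inst = n_fetches * fetch_cycles
    data_access = n_data_refs * (hit_cycles + miss_rate * miss_latency)
    return inst + data_access


# 1000 fetches at 1 cycle, 300 data references, 1-cycle hits,
# 25% misses costing 40 cycles each: 1000 + 300 * (1 + 10) = 4300 cycles.
print(execution_time(1000, 1, 300, 1, 0.25, 40))  # 4300.0
```

Keeping the interconnect latency inside `miss_latency` mirrors the idea of a hybrid approach: the access counts would come from traces, while the latency would come from an analytical network model.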

L. A. Barroso and M. Dubois, Performance Evaluation of the Slotted Ring Multiprocessor, Transactions on Computers, IEEE, Vol. 44, No. 7, 878--890, July 1995.


A Comparative Study of Bidirectional Ring and Crossbar.. - Oi, Ranganathan (1998)

....study mainly showed the maximum number of nodes at which the hierarchical ring outperformed the mesh network. Since they assumed a unidirectional ring, the nearest-neighbor communication pattern was not taken into account. Barroso and Dubois evaluated the performance of the slotted ring multiprocessor [5]. They investigated the effect of design choices such as the coherence protocol (snoopy, linked-list, and full-map directory) and processor speed, and compared their unidirectional ring architecture with a split-transaction bus. Lang et al. studied the effective bandwidth of the crossbar ....

.... 3 Performance Model The methods that have been used for performance evaluation of computer systems include analytical models [8], parametric simulations [4], trace-driven simulations [9], and execution-driven simulations [10]. In this paper, we use the hybrid approach of Barroso and Dubois [5], extending it to the bidirectional ring and the crossbar. Below we briefly describe the derivation of execution time using the hybrid approach. The execution time ($ET$) in clock cycles is given by $ET = \mathit{Inst} + \mathit{DataAccess}$ (1), where $\mathit{Inst}$ accounts for instruction fetches. We assume that all the instruction ....

L. A. Barroso and M. Dubois, Performance Evaluation of the Slotted Ring Multiprocessor, Transactions on Computers, IEEE, Vol. 44, No. 7, 878--890, July 1995.


Analysis Of Interconnection Networks For Cache Coherent.. - Laxmi Bhuyan

....The MIN-based system uses a directory cache protocol [4] to maintain coherence among shared blocks. A large body of research has been reported on queueing network models of these interconnection networks [6, 10, 12]. However, such models for cache-based multiprocessors are rather limited [1, 8, 11]. In [8], the effect of the cache coherence problem was neglected, and a uniform memory distribution was assumed. A detailed analysis of the cache coherence protocol was done in [11], but the paper assumes a synthetic workload in which memory requests are randomly and uniformly distributed to all the ....

....of the cache coherence problem was neglected, and a uniform memory distribution was assumed. A detailed analysis of the cache coherence protocol was done in [11], but the paper assumes a synthetic workload in which memory requests are randomly and uniformly distributed to all the memory modules. In [1], an approximate analysis is reported for multiprocessors with a slotted ring network, where execution-driven simulation is used to measure the input parameters. In this paper, we develop detailed queueing network models for shared bus and MIN .... [Figure: processors, caches, interconnection network, memories] ....

L. Barroso and M. Dubois, "Performance Evaluation of the slotted ring multiprocessor", IEEE Transactions on Computers, July 1995, pp. 878-890


Effect of Message Length and Processor Speed on the.. - Oi, Ranganathan (1997)

....rings. Holliday and Stumm investigated the performance of the hierarchical-ring-based multiprocessor using parametric simulations [4]. Barroso and Dubois evaluated the performance of the slotted ring multiprocessor using a hybrid approach (combining trace-driven simulations and analytical models) [5]. The KSR 1 of Kendall Square Research was a cache-only memory architecture (COMA) multiprocessor with hierarchical rings [6]. The Scalable Coherent Interface (SCI), defined by the IEEE P1596 standard, also provides ring interconnection networks for DSM multiprocessors [7]. Most research in the past, including the ....

....and the KSR 1 [6] (footnote 1). In this section, we study the effect of long data messages on performance. As the cache block size increases, the data message must also become longer. As a sample case, we use PL = PG = 0.5. One way to accommodate a longer data message, used in [5], is to format the entire ring into frames, each consisting of slots for a combination of header messages and a data message. However, this approach could lower ring utilization because each slot can only accommodate a designated message type. Another approach is to divide a ....
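The utilization concern with typed frame slots can be made concrete with a small calculation. The Python sketch below compares a frame whose slots are reserved by message type with an idealized frame of the same size in which any message may use any free bits. The frame layout (two 64-bit header slots plus one 320-bit data slot) and the traffic figures are hypothetical values, not parameters from [5].

```python
def typed_frame_utilization(header_slots: int, header_bits: int,
                            data_slots: int, data_bits: int,
                            header_demand: int, data_demand: int) -> float:
    """Fraction of a frame's bits carrying messages when every slot is
    reserved for one message type (headers cannot use idle data slots)."""
    used = (min(header_demand, header_slots) * header_bits
            + min(data_demand, data_slots) * data_bits)
    total = header_slots * header_bits + data_slots * data_bits
    return used / total


def untyped_frame_utilization(header_slots: int, header_bits: int,
                              data_slots: int, data_bits: int,
                              header_demand: int, data_demand: int) -> float:
    """Same frame size, but any message may use any free bits (an idealized bound)."""
    total = header_slots * header_bits + data_slots * data_bits
    demand = header_demand * header_bits + data_demand * data_bits
    return min(demand, total) / total


# Hypothetical frame: 2 x 64-bit header slots + 1 x 320-bit data slot.
# A burst of 4 header messages and no data: the data slot sits idle.
frame = (2, 64, 1, 320)
print(round(typed_frame_utilization(*frame, 4, 0), 3),
      round(untyped_frame_utilization(*frame, 4, 0), 3))  # 0.286 0.571
```

When the traffic mix matches the frame composition, both layouts reach full utilization; the typed layout loses only when the mix drifts, which is the case the passage warns about.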

L. A. Barroso and M. Dubois, Performance Evaluation of the Slotted Ring Multiprocessor, Transactions on Computers, IEEE, Vol. 44, No. 7, 878--890, July 1995.


A Comparison of Blocking and Non-Blocking Packet Switching.. - Ravindran, Stumm (1996)

....Note that the timeout in such systems must be chosen to be larger than the maximum round-trip latency (including all possible buffering). 3.2.1 Slotted Ring Switching Slotted ring networks differ considerably from both virtual cut-through and wormhole-routed networks [1, 2, 9, 16]. In a slotted ring, packets are divided into equal-sized cells that are routed independently. Each cell, the size of a phit (physical transfer unit), has its own routing information: the first cell of a packet carries the full target memory address, while the remaining cells of ....

L.A. Barroso and M. Dubois, "Performance evaluation of the slotted ring multiprocessor," IEEE Trans. on Computers, 44(7), pp. 878-890, 1995.


A Cache Coherence Protocol for the Bidirectional Ring Based.. - Oi, Ranganathan (1999)

....the performance of the hierarchical-ring-based multiprocessor using parametric simulations [4]. The architecture assumed in their study was an extension of Hector [5], which was the predecessor of NUMAchine. Barroso and Dubois evaluated the performance of the slotted ring multiprocessor [6]. Their study was based on a hybrid approach that combined trace-driven simulations and analytical models. They showed the performance of small- to medium-scale slotted ring multiprocessors with three different coherence protocols (snooping, full-map directory, and linked-list directory) for several benchmark ....

....the memory block. A problem with the full bitmap scheme is that the number of bits in a directory entry increases linearly with the number of PEs. Barroso and Dubois proposed and evaluated a directory-based protocol for ring multiprocessors, comparing it with snooping and linked-list protocols [6]. NUMAchine is also a multiprocessor with a directory-based protocol. We use the protocols of Barroso and Dubois and of NUMAchine for comparison in Sections 3 and 4. 2 Proposed Coherence Protocol for Bidirectional Ring The diagram in Figure 1 shows the possible states and transitions of a cache block in ....
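The linear growth of a full-bitmap directory entry can be tabulated directly. In the Python sketch below, the 2 state bits per entry and the 64-byte cache block are assumptions chosen for illustration. For contrast, it also prints the size of a single PE pointer, which grows only logarithmically; that pointer figure is a generic comparison, not a description of the linked-list protocol evaluated in [6].

```python
import math


def full_map_entry_bits(num_pes: int, state_bits: int = 2) -> int:
    """Bits per directory entry for a full bitmap: one presence bit per PE
    plus a few state bits (the state-bit count is an assumption)."""
    return num_pes + state_bits


def pointer_bits(num_pes: int) -> int:
    """Bits needed to name one PE, e.g. for a pointer-based directory."""
    return max(1, math.ceil(math.log2(num_pes)))


block_bits = 64 * 8  # assumed 64-byte cache block
for pes in (8, 16, 32, 64, 128):
    entry = full_map_entry_bits(pes)
    print(f"{pes:4d} PEs: {entry:4d} bits/entry "
          f"({entry / block_bits:.1%} of a block), one pointer = {pointer_bits(pes)} bits")
```

Doubling the PE count doubles the bitmap, so per-block directory overhead climbs from about 2% at 8 PEs to about 25% at 128 PEs under these assumptions.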

[Article contains additional citation context not shown here]

L. A. Barroso and M. Dubois, Performance Evaluation of the Slotted Ring Multiprocessor, Transactions on Computers, IEEE, Vol. 44, No. 7, 878--890, July 1995.


Multiprocessor Emulation With RPM: Early Experience - Dubois, Gefflaut, Jeong.. (1995)   Self-citation (Dubois)

....snooping protocol on one or multiple buses. However, as uniprocessor speeds keep increasing, it becomes more and more difficult to support more than a few processors on a bus. As an alternative to snooping, industry and academia are exploring point-to-point interconnects such as rings [3] or crossbars to connect a small number (1 to 32) of ultra-fast processors. Our first emulator is a CC-NUMA under strong ordering of shared-memory accesses [11]. Strong ordering is still a requirement of many commercial systems. For example, even though the SPARC V9 (RMO) memory model is very weak, the ....

Barroso, L.A., and Dubois, M.,"Performance Evaluation of the Slotted-Ring Multiprocessor," IEEE Transactions on Computers, Vol. 44, No. 7, pp.878-890, July 1995.


Issues in the Design of Direct Multiprocessor Networks - Ravindran, Stumm (1997)

No context found.

L.A. Barroso and M. Dubois, "Performance evaluation of the slotted ring multiprocessor," IEEE Trans. on Computers, vol.44, no.7, pp. 878-890, July 1995.


CiteSeer.IST - Copyright Penn State and NEC