71 citations found.
A. Agarwal, "Limits on Interconnection Network Performance", IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, October 1991, pp. 398--412.

This paper is cited in the following contexts:

Scalable Opto-Electronic Network (SOENet) - Gupta, Dally, Singh, Towles (2002)

....A method for adapting fat tree networks to the bandwidth available in a given packaging technology is described in [14]. In [7] networks are assumed to be limited by wire bisection and low dimensional cube networks are found to offer minimum latency under this assumption. Reference [4] combines a wire bisection limit with a pin limit and concludes that a slightly higher dimensionality is optimal. Reference [8] adds express channels to cube to balance wire and router delay and shows how channel bandwidth can be matched to the packaging technology. In [19] the impact of ....

A. Agarwal. Limits on interconnection network performance, 1991.


Stability Of 2-D Distributed Processes With.. - Bauer, Sichitiu..

....i.e. large upper bounds on d( reduce the amount of parallelism that can be brought to bear, at least for a finite data set S. Therefore, small and tight bounds on the interprocess communication delays are advantageous, which confirms previous results in distributed parallel computing [6] [8]. Unfortunately, there are network types (such as Ethernet and/or TCP/IP) that do not allow to construct an upper delay bound. If one still assumes an interprocess delay function d with an upper bound, this upper bound will be violated with a certain probability p. In order to ensure a parallel, ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems, vol. 2, pp. 398--412, Oct. 1991.


Execution Based Evaluation of MINs for Cache-Coherent.. - Kumar, Bhuyan, Iyer (1996)

....times of each application for different network architecture. Finally, section 6 presents the conclusion and the direction for further work in this area. 2 Simulation Model. Our simulator is based on Proteus [12]. However, the simulator implemented MIN using an analytical model presented in [13]. We have modified the simulator extensively to exactly model MINs with packet switching and wormhole routing. For wormhole routing, we have also incorporated virtual channels and multi flit buffers. The system considered for evaluation in this paper is a directory based cache coherent ....

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398--412, October 1991.


Accelerated Waveform Methods for Parallel Transient.. - Reichelt, Lumsdaine.. (1993)   (2 citations)

....[1, 2] as MIMD machines become increasingly more popular and cost effective, it is important that efficient algorithms be developed for them as well. To obtain highest performance on a MIMD parallel computer, it is critical that a numerical method avoid frequent parallel synchronization [3]. The waveform relaxation (WR) approach to solving time dependent problems is such a method, because in parallel WR, iterates are communicated between processors only after having been computed over a time interval [4, 5, 6]. Parallel pointwise methods, on the other hand, must communicate ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. Parallel Distrib. Sys., pp. 398-412, October 1991.


Adaptive Bubble Router: a Design to Improve.. - Puente, Beivide.. (1999)   (1 citation)

....Moreover, this pipelined message transmission makes latency less sensitive to the distance in the network provided that messages are long enough, facilitating the search for optimal topologies. Several researchers recommended the use of low dimensional direct networks in the k ary n cube class [1, 8]. As a result, the use of bidimensional or threedimensional meshes and tori or limited degree hypercubes is common in multicomputers and DSMs. However, wormhole switching has also some disadvantages. A main one is that messages block in place when the link requested by the header is busy. So, ....

A. Agarwal, "Limits on interconnection network performance," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398--412, October 1991.


SMART: a Simulator of Massive ARchitectures and Topologies - Petrini, Vanneschi (1997)

....architecture. In our experiments we compared the execution of a 65536 input butterfly on two networks with 256 nodes, a 16 ary 2 cube and a 4 ary 4 tree. A fair comparison of interconnection networks should take into account physical constraints as the pin count, wire delay and bisection width [1]. In our experiments we normalized the communication performance by setting the flit and the data path size on the fat tree at one byte and at two bytes on the toroidal cube. In both cases the flit transmission delay is set to a single cycle. This normalization can be interpreted in the following ....

A. Agarwal. Limits on Interconnection Network Performance. IEEE Transactions on Parallel and Distributed Systems, 2(4):398--412, October 1991.


On the Reduction of Deadlock Frequency by Limiting.. - López, Martínez..

....queue) and throughput (maximum traffic accepted by the network). Traffic is the flit reception rate. Latency is measured in clock cycles. Traffic is measured in flits per node per cycle. Taking into account the sizes of current multicomputers and the studies about the optimal number of dimensions [1], we have evaluated the performance of the new injection mechanism on a bidirectional 8 ary 3 cube network (512 nodes). 4.1 Network Model. Our simulator models the network at the flit level. Each node in the network consists of a processor, its local memory, a routing control unit, a switch, and ....

....$a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ communicates with the node $a_0, a_{n-2}, \ldots, a_1, a_{n-1}$ (exchange the most and least significant bits). For message length, 16 flit messages were considered. This value is considered as typical in other interconnection network evaluation studies [1, 26]. 4.3 Performance Comparison. In this section we compare the results obtained by the routing algorithm considering the new injection limitation mechanism with different translation tables. We will also show the results obtained with the previous message injection limitation mechanism described in ....
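
The traffic pattern in the excerpt above, where each node sends to the node whose address has the most and least significant bits exchanged, can be sketched roughly as follows. This is an illustration only, not code from the cited paper; the function name `swap_msb_lsb` and the parameter `n_bits` (address width in bits) are assumptions introduced here.

```python
def swap_msb_lsb(node: int, n_bits: int) -> int:
    """Return the destination of `node` when the most and least
    significant bits of its n_bits-wide address are exchanged.
    (Hypothetical helper; names are not from the cited paper.)"""
    msb = (node >> (n_bits - 1)) & 1
    lsb = node & 1
    if msb != lsb:
        # When the two end bits differ, flipping both of them swaps them.
        node ^= (1 << (n_bits - 1)) | 1
    return node


if __name__ == "__main__":
    # Print the source -> destination mapping for 3-bit addresses.
    for src in range(8):
        print(f"{src:03b} -> {swap_msb_lsb(src, 3):03b}")
```

Nodes whose end bits are equal map to themselves, and applying the function twice returns the original address.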

A. Agarwal, "Limits on interconnection network performance", IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398--412, Oct. 1991.


The Adaptive Bubble Router - Puente, Izu, Beivide, Gregorio.. (2001)

....The communication subsystem of these machines is composed of a number of interconnected routers arranged in a specific topology. The performance of these direct interconnection networks is governed by that of the router and the interconnect. With respect to network topology, Dally [11] and Agarwal [1] recommended the use of low degree networks belonging to the class of the k ary n cubes. Rings, meshes, tori and hypercubes are representative networks of this class. Some parallel computer manufacturers followed their advice and machines such as the Cray T3D and Cray T3E use three dimensional ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398--412, October 1991.


Network Performance under Physical Constraints - Petrini, Vanneschi (1997)

....of shared memory computation and common parallel algorithms. Our experiments are conducted on a quaternary fat tree and a bi dimensional cube, whose communication performance is properly equalized taking into account physical limitations as the router complexity, wire delay and density [18]. This paper is an attempt to compare apples with apples: with our simulation model we try to eliminate all implementation dependent details and to compare the essential features of the two interconnection networks. The remainder of this paper is organized as follows. Sections 2 and 3 overview ....

....nodes N = k k1 1 . A 4 ary 4 tree and a 16 ary 2 cube satisfy these conditions, so we will consider these two networks in the experimental evaluation. A fair comparison of interconnection networks should also take into account physical constraints as the pin count, wire delay, bisection width [18] and the router complexity [30]. In our experiments we normalize the communication performance by setting the flit and the data path size on the fat tree at two bytes and at four bytes on the cube. If we consider a 4 ary 4 tree and a 16 ary 2 cube, this normalization can be interpreted in the ....

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems, vol. 2, pp. 398--412, October 1991.


The Interaction between Virtual Channel Flow Control and.. - Ramany, Eager (1994)   (7 citations)

....mesh and torus networks are favored by many researchers [1]. (Footnote: This research has been supported in part by grants from the Natural Sciences and Engineering Research Council of Canada. Appeared in the Eighth ACM International Conference on Supercomputing, Manchester, England, July 11-15, 1994.) An example 2 dimensional mesh is shown in Figure 1; note that we assume here that each bidirectional link has associated with it two (uni directional) physical channels. Figure 1: A 2 dimensional mesh of size 4 × 4. The most popular switching technology at present is wormhole switching ....

A. Agarwal, "Limits of Interconnection Network Performance", IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 4, Oct. 1991, pp. 398-412.


Processor Management Policies for Multiprocessors - Yu (1994)

....for hypercubes also for the MIN machines. The processor allocation problem has to be extended to other classes of multiprocessors such as a cluster based system, a mesh system and k ary n cube machines. In particular, we are interested in processor allocation in k ary n cube architectures [69] [71]. An efficient and general allocation algorithm for the k ary n cube topology is difficult to achieve due to the representation problem. In [69] mixed radix representation was proposed for more general topology. In order to keep the problem tractable, we plan to confine the architecture to a 2 ....

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Trans. Parallel and Distributed Systems, Vol. 2, pp. 398-412, Oct. 1991.


Efficient Personalized Communication on Wormhole Networks - Petrini, Vanneschi (1997)

....three families of topologies: k ary n cubes, k ary n flies and k ary n trees and a node architecture with processing capabilities and a memory hierarchy. A fair analysis of interconnection networks should take into account physical constraints as the pin count, wire delay, bisection width [39] and the router complexity [40]. In our experiments we normalize the communication performance by setting the flit and the data path size on the fat tree at two bytes and at four bytes on the toroidal cube. If we consider two representative networks with 256 nodes, a 4 ary 4 tree and a 16 ary ....

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems, vol. 2, pp. 398--412, October 1991.


An Application-driven Study of Parallel System.. - Sivasubramaniam.. (1999)   (1 citation)

....time. The studies differ in the techniques used to quantify these metrics. Crovella and LeBlanc [10] use experimentation, while simulation is used in our approach. III. Related Work. There have been a number of studies addressing architectural issues such as network latency and contention [11], [12], [13], [14] and synchronization [15], [16] in isolation. While such issues are extremely important, their performance impact should be put in perspective by considering them in the context of the overall application. Recognizing this importance, the current trend in architectural ....

....hypercube network topology. The cube represents a highly scalable network where the bisection bandwidth grows linearly with the number of processors. Even though cubes of 1024 nodes have been built [2], cost and technology factors often play an important role in its physical realization. Agarwal [11] and Dally [12] show that wire delays (due to increased wire lengths associated with planar layouts) of higher dimensional networks make low dimensional networks more viable. The 2 dimensional [50] and 3 dimensional [51], [52] toroids are common topologies used in current day networks, and it ....

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398--412, October 1991.


The Effect of the Number of Virtual Channels on the .. - Sarbazi-Azad.. (2000)

....Deterministic routing has been widely used in practice [12,13,14,15,19] as a result of its simplicity and minimal requirement for virtual channels. Analytical models of deterministic routing in wormhole routed k ary n cubes, e.g. hypercubes and tori, have been widely reported in the literature [1,4,7,9,10]. More recently, a similar model for the 2 dimensional mesh has been proposed by Greenberg and Guan [8]. As these models either have not considered virtual channels or have considered but assume one flit buffer for each virtual channels. So they can not show the effect of the number of virtual ....

A. Agarwal, Limits on interconnection network performance, IEEE Trans. Parallel & Distributed Systems 2 (4) (1991) 398--412.


Stability and Performance of Alternative Two-level.. - Chowdhury, Holliday (1991)   (4 citations)

....point to point network. They assume that the arrival of packets constitutes a Poisson process, and the packet lengths are assumed to be exponentially distributed. Many have studied issues related to k ary (the number of nodes in a dimension is k, instead of s) n cube in the one level case [8, 9, 10, 11, 12, 13]. In the next section we derive the stability conditions and plot these limits for all three communications schemes as a function of the number of processors per node. In Section 3 we derive the packet delay results and plot these delays for all three communications schemes as a function of the ....

A. Agarwal, "Limits on interconnection network performance," IEEE Transactions on Parallel and Distributed Systems, to appear.


The Performance of SCI Memory Hierarchies - Roberto Hexsel Nigel (1994)   (1 citation)

....the network links. Scott and Goodman, in [25] investigate the performance of pipelined k ary n cube networks. In such a network, multiple bits may be traversing the same wire simultaneously. This makes the network's cycle time independent of wire length. When compared to synchronous networks (see [10, 2]) the pipelined networks yield lower latency and higher bandwidth, especially for high dimensional networks. The optimal dimensionality of pipelined networks is higher than that of synchronous networks and they should be grown by increasing the dimensionality while keeping the radix unchanged. ....

Anant Agarwal. Limits on interconnection network performance. IEEE Trans. on Parallel and Distributed Systems, 2(4):398--412, October 1991.


Acknowledgments - Would Like To

....is used. However, when a constant pin out constraint is considered, higher dimensional networks provide better performance. However, Abraham and Padmanabhan's analysis does not consider the longer wiring delays that might provide an additional penalty to higher dimensional networks. Agarwal [21] also introduced a pin out constraint (which he terms constant node size). He analyzed both unidirectional and bidirectional networks under constant bisection, constant pin out, and constant link width. The latter reduces to unconstrained analysis. He obtains results similar to Dally's, although ....

....is added to the average waiting time when a hypercube network is operating under certain traffic loads. The analytical model is validated in section 2.10 using several unidirectional network simulations. In section 2.11, our model for average queue waiting time is compared to the model proposed in [21]. Expressions for average message latency with basic message switching and virtual cut through switching are derived in section 2.12. These results are used in section 2.13 to compare various network configurations of a 4096 node system when different constraints and message lengths are applied. ....

[Article contains additional citation context not shown here]

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. Parallel and Distributed Systems, vol. 2, no. 4, pp. 398-412, Oct. 1991.


Adaptive Granularity: Transparent Integration of Fine-Grain and.. - Park (1996)   (1 citation)

....thread based (e.g. SPLASH2 [24]) applications to be simulated without requiring the program to be modified and provides a detailed model of the various hardware components. All hardware contentions in the machine are simulated, including the network. For the network, the model proposed by Agarwal [2] is used. Instruction references are assumed to take one cycle and virtual memory is enabled in all simulations. 4.2 Benchmark Applications. In order to compare the performance of Adaptive Granularity with other approaches, we use four scientific applications that have different communication ....

A. Agarwal. Limits on Interconnection Network Performance. IEEE Trans. on Parallel and Distributed Systems, 2(4):398--412, October 1991.


Scalable Architectures With K-Ary N-Cube Cluster-C Organization - Basak, Panda (1993)   (2 citations)

....depend on various parameters like cluster size, the intranet and the internet topologies, and the routing schemes used. Our objective is to select the configuration that offers best system performance. For a given hierarchical configuration, we use the constant bisection bandwidth constraint [1, 2] to determine channel widths in each network. The number of wires that need to cross a bisection of the network is called bisection width. In general, bisection width cannot be increased arbitrarily. Factors like available layout area [1], allowable system size, cost, and power considerations put ....

....put limitations on the bisection width. In such cases, the bisection width can be held constant at some limit. This limit directly affects the channel width and indirectly determines message length in flits. Consider a k ary n cube with W bit channels. It has a bisection width of $4Wk^{n-1}$ [2]. With N being total number of nodes, this width becomes $4WN/k$. For a linear array of processors, this width is $2W$. For comparison across different topologies, we normalize bisection widths to that of a 2 ary n cube with unit width bidirectional channels. Thus the bisection width of such a 2 ary ....
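
The bisection-width expression quoted above can be checked numerically with a short sketch. This is an illustration under stated assumptions, not code from the cited paper: the function names are hypothetical, channels are taken to be bidirectional with W bits per direction, and k is assumed even and greater than 2 so that the cut is balanced and the wraparound links are distinct.

```python
def bisection_width(k: int, n: int, w: int = 1) -> int:
    """Bisection width (in wires) of a k-ary n-cube: 4 * W * k^(n-1).
    Assumes bidirectional W-bit channels and even k > 2 (hypothetical helper)."""
    return 4 * w * k ** (n - 1)


def bisection_width_from_nodes(num_nodes: int, k: int, w: int = 1) -> int:
    """Same quantity written in terms of the node count N = k^n: 4 * W * N / k."""
    return 4 * w * num_nodes // k


if __name__ == "__main__":
    k, n, w = 16, 2, 1
    num_nodes = k ** n  # 256 nodes in a 16-ary 2-cube
    print(bisection_width(k, n, w), bisection_width_from_nodes(num_nodes, k, w))
```

Both forms agree because N = k^n, so 4Wk^(n-1) and 4WN/k are the same number.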

[Article contains additional citation context not shown here]

Agarwal Anant, "Limits on Interconnection Network Performance." IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, Oct. 1991.


Performance Analysis of Mesh Interconnection Networks with.. - Adve, Vernon (1994)   (41 citations)

....networks are a special case of k ary n cube networks in which the number of dimensions, n, is two. Recent studies of k ary n cubes with wormhole routing (a low latency pipelined routing scheme [9]) have shown that under reasonable assumptions, the optimal value for n is two or three [2, 8, 10]. Many existing and emerging multiprocessor systems use such low dimensional direct networks to interconnect the processors, including the Intel Paragon, Cray T3D, Stanford Dash [14], M.I.T. Alewife [1], M.I.T. J Machine [16] and CMU Intel iWarp [5]. In this paper, we develop performance models to ....

....we develop performance models to study k ary n cube networks with wormhole routing, with either single flit or infinite network buffers. Our model for the single flit buffer case includes the deadlock free routing algorithm of Dally and Seitz [9]. In contrast to previous analyses of these networks [2, 10, 11], the models we derive are closed queueing network models. Also in contrast to previous work, i) we include the effects of .... (Footnote: This research was supported by the National Science Foundation under grant number DCR 8451405, and by an IBM Graduate Fellowship. Vikram S. Adve is with ....)

[Article contains additional citation context not shown here]

A. Agarwal, Limits on Interconnection Network Performance, IEEE Trans. on Parallel and Distributed Systems 2, 4 (October 1991), 398-412.


Accelerated Waveform Methods for Parallel Transient Simulation .. - Mark Reichelt (1993)   (2 citations)

....[1, 2] as MIMD machines become increasingly more popular and cost effective, it is important that efficient algorithms be developed for them as well. To obtain highest performance on a MIMD parallel computer, it is critical that a numerical method avoid frequent parallel synchronization [3]. The waveform relaxation (WR) approach to solving time dependent problems is such a method, because in parallel WR, iterates are communicated between processors only after having been computed over a time interval [4, 5, 6]. Parallel pointwise methods, on the other hand, must communicate iterates ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. Parallel Distrib. Sys., pp. 398--412, October 1991.


A Necessary and Sufficient Condition for Deadlock-Free Routing in.. - Duato (1995)   (64 citations)

....silicon area. Also, large buffers usually increase propagation delay, thus slowing down clock frequency. As VLSI technology progresses, more transistors are available and larger buffers can be integrated. Moreover, as circuits become faster, channel propagation delay is becoming the bottleneck [1]. As a consequence, the impact of node delay on performance will decrease over time, and virtual cut through may be the choice for the future, provided that buffer design is optimized. This is specially true for distributed shared memory multiprocessors with hardware cache coherence protocols ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. Parallel Distributed Syst., vol. 2, no. 4, pp. 398--412, Oct. 1991.


Issues in the Design of Direct Multiprocessor Networks - Ravindran, Stumm (1997)

....network and when it is received by the destination node. It is the sum of the internode distance and the amount of time the packet is blocked in the network waiting for resources. The throughput of a network is the total number of bytes transmitted in the network per unit time. Bisection width [1] is defined as the minimum number of channels (bisection channels) that must be removed to partition the network into equal halves. Bisection bandwidth is the total bandwidth of the bisection channels. The significance of the bisection width is that, if memory request destinations are selected at ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398-412, April 1991.


Performance Issues in the Design of Hierarchical-ring and Direct .. - Ravindran (1998)

....the packet is blocked in the network waiting for resources. The multiprocessor system throughput is defined as the total number of memory requests completed per unit time. Bisection width is defined as the minimum number of links that must be removed to partition the network into equal halves [3]. Bisection bandwidth is the total bandwidth of the bisection links. The significance of the bisection width is that if memory request destinations are selected at random, then half of the requests must traverse the bisection channels. Therefore, for an application that exhibits poor memory access ....

....had a smaller network diameter at the expense of lower per link bandwidth. More recent multiprocessor systems use lower dimensional direct networks, after it was shown that lower dimension networks (with at most 3 dimensions) generally perform better than their higher dimension counterparts [3, 20]. The most common direct network topology in use today is the 2 dimensional (2D) mesh because of its low degree, which permits efficient layouts and construction with standard components (see Figure 2.1) An interesting variation is the cube connected cycle, where each node of an n dimensional ....

[Article contains additional citation context not shown here]

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, pp. 398-412, April 1991.


Performance Evaluation of Wire-limited Hierarchical Networks - Hsu, Yew (1992)   (2 citations)

....[HsYe91]. However, most of these studies are only limited to the discussions of traffic patterns with a high locality. Very few of them addressed physical packaging constraints. On the other hand, most recent work on packaging issues in multiprocessor interconnects, such as [Dall90], [AbPa90] and [Agar91], address only non hierarchical systems. In this paper, we study and compare the design of interconnects for hierarchical multiprocessors with packaging constraints. We will limit our study to two level hierarchical systems. In section 2, we give a short overview of previous work, and a ....

....For large systems, where signals need to run across the boundary of packages, pinout is a more severe constraint. However, in this paper, we will use both pinout and bisection width as the determinants of the system cost. Other constraints have also been studied in the literature. For example, [Agar91] examined fixed channel width, bisection width and node pinout, and took into account both switch delay and wire delay. In our analytical models, we ignore the effect of wire length and time of flight delays. The technologies used in Thinking Machine's CM 5 and IBM's Vulcan, for example, allow ....

[Article contains additional citation context not shown here]

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, October 1991.


Optimal Convolution Sor Acceleration Of Waveform Relaxation With .. - Reichelt (1995)   (7 citations)

....of this new method, it is used to solve the differential algebraic system generated by spatial discretization of the time dependent semiconductor device equations. Introduction. To achieve highest performance on a parallel computer, a numerical algorithm must avoid frequent parallel synchronization [1]. The waveform relaxation approach to solving time dependent initial value problems is just such a method, as the iterates are waveforms over an interval, rather than single timepoints [2, 3, 4]. Like any relaxation scheme, efficiency depends on rapid convergence, and there have been several ....

....SOR. Theorem 3.1. On a finite simulation interval, the iterations defined by (11) and (14) have the same asymptotic convergence rate. Proof. Let $y^k$ denote the large vector consisting of the concatenation of vectors $\Delta x^k[m]$ at all $L$ discrete timepoints, i.e. $y^k = [\Delta x^k[1]^T, \ldots, \Delta x^k[L]^T]^T$. Collecting together the equations (14) generated at each timepoint into one large matrix equation in terms of vectors $y^{k+1}$ and $y^k$ yields $M \Delta y^{k+1} = N \Delta y^k$ where $M, N \in \mathbb{R}^{Ln \times Ln}$ are block lower triangular banded matrices, with ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. Parallel Distrib. Sys., pp. 398--412, October 1991.


The Offset Cube: A Three-Dimensional Multicomputer Network.. - Stephen Lacy (1996)   (1 citation)

....of equal size offset cube and bidirectional 3D mesh networks. We choose the mesh as a standard for comparison because its performance advantages under bisection bandwidth and pinout constraints over other direct topologies (such as binary hypercubes) have been demonstrated in numerous studies [1, 2, 12, 32]. Furthermore, the 3D mesh can be efficiently implemented using the stacked MCM technique depicted in Figure 1. Flit level network simulations assuming constant pin out show that the latency throughput characteristics of the offset cube are generally comparable to those of an equal size ....

....and packaging aspects have been abstracted away in terms of the wireability (e.g. constant bisection bandwidth or constant node size) and wire delay (e.g. constant, linear, or logarithmic with nonpipelined or pipelined channels) associated to generic 2D and 3D wire based system implementations [1, 2, 5, 12, 32]. While these studies have identified which interconnection topology performs best given a particular set of algorithmic (traffic patterns) and technological constraints, the assumption of an electrical interconnect technology model limits the insight that can be garnered with respect to ....

[Article contains additional citation context not shown here]

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Trans. on Parallel and Distributed Systems, vol. 20, pp. 398-412, October 1994.


A Preliminary Evaluation of Cache-Miss-Initiated.. - Bianchini, LeBlanc (1994)   (12 citations)

....bandwidth also favors large cache blocks, since more data can be transferred for little extra cost. Large cache blocks can introduce network contention problems however, since small packets generate less contention than large ones (assuming the same amount of data is transferred in both cases) [Agarwal, 1991]. Also, memory performance is affected by the block size; large blocks increase the memory busy time, thereby delaying contending processors. Increased network and memory bandwidth can reduce the cost of transferring large cache blocks, but do not change the role of the miss rate. An increase in ....

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Transactions on Parallel and Distributed Systems, 2(4):398--412, Oct 1991.


A Performance Comparison of Hierarchical Ring- and.. - Ravindran, Stumm (1997)   (7 citations)

....sizes (by 10 30 ) for larger systems the performance of hierarchical rings is severely constrained due to bisection bandwidth limitations and meshes perform significantly better. Although some previous work on the performance of hierarchical ring networks [4, 13, 16, 20, 21] and on mesh networks [1, 2, 8, 12, 23] has been published, we are aware of only one study that compares the performance of both types of networks [15]. That study uses analytical models to conclude that three level hierarchical systems perform somewhat better than mesh systems. The rest of the paper is organized as follows. Section 2 ....

....of mesh networks. As described in Section 2, we assume square, 2 dimensional bi directional wormhole routed meshes with no end around connection and simple, deterministic e cube routing. We are brief in our presentation, since our results are compatible with those obtained by other researchers [1, 2, 8]. Figure 12 presents latency curves for the access pattern with R = 1.0, C = 0.04, and T = 4, assuming buffers sizes of 1, 4 or cl flits, where cl is the size required to accommodate a packet containing a cache line. One significant observation from the figure is that the increase in latency as a ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398-412, April 1991.


Designing Scalable Systems with two-level k-ary n-cube.. - Basak, Panda (1993)   (Correct)

....depend on various parameters like cluster size, the intranet and the internet topologies, and the routing schemes used. Our objective is to select the configuration that offers best system performance. For a given hierarchical configuration, we use the constant bisection bandwidth constraint[2, 3] to determine channel widths in each network. The number of wires that need to cross the bisection of a network is called bisection width. In general, this width cannot be increased arbitrarily and is limited by factors like available layout area[2] allowable system size, cost, and power ....

....the bisection width can be held constant at some limit. This limit directly affects the channel width and indirectly determines the number of flits required for a given message. Consider a k-ary n-cube with bidirectional channels, each of width W bits. It has a bisection width of $4Wk^{n-1}$ [3]. With N being the total number of nodes, this width is equal to $4WN/k$. For a linear array of processors, this width is $2W$. For comparison across different topologies, we normalize bisection widths to that of a 2-ary n-cube (hypercube) with unit-width bidirectional channels. Thus the bisection width of ....
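The bisection-width expressions quoted in this excerpt can be checked with a small sketch. The function names below are illustrative and do not come from the cited paper; the sketch only evaluates the formulas as stated above, and does not attempt the normalization step, which is cut off in the excerpt.

```python
def kary_ncube_bisection_width(k: int, n: int, w: int = 1) -> int:
    """Bisection width 4*W*k^(n-1) of a k-ary n-cube with
    bidirectional channels of width w bits, as stated in the excerpt."""
    return 4 * w * k ** (n - 1)


def kary_ncube_bisection_width_from_nodes(num_nodes: int, k: int, w: int = 1) -> int:
    """Equivalent form 4*W*N/k, where N = k^n is the total node count."""
    return 4 * w * num_nodes // k


def linear_array_bisection_width(w: int = 1) -> int:
    """Bisection width 2*W of a linear array of processors."""
    return 2 * w


if __name__ == "__main__":
    k, n, w = 8, 2, 1          # an 8-ary 2-cube with unit-width channels
    num_nodes = k ** n         # N = 64
    print(kary_ncube_bisection_width(k, n, w))
    print(kary_ncube_bisection_width_from_nodes(num_nodes, k, w))
    print(linear_array_bisection_width(w))
```

Both k-ary n-cube forms should agree for any k and n, since N = k^n.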

[Article contains additional citation context not shown here]

Agarwal, Anant, "Limits on Interconnection Network Performance." IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, Oct. 1991.


A Queueing Model for Wormhole Routing with Timeout - Hu, Kleinrock (1995)   (1 citation)  (Correct)

....[20] which has been adopted as the LAN infrastructure for the Supercomputer SuperNet (SSN), a research project being conducted at UCLA, JPL and Aerospace Corp. [14]. Many performance studies for wormhole routing in a supercomputer environment have been carried out and presented in the literature [1, 2, 5, 6, 9, 16]. However, many performance studies do not focus on the LAN environment, which has irregular topology and usually consists of low-cost, non-intelligent switches. Moreover, except for the simulation studies in [16], there is no analysis work evaluating the timeout reset mechanism, which is not only ....

A. Agarwal, "Limits on Interconnection Network Performance", IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 4, Oct. 1991.


Tailoring Routing and Switching Schemes to.. - Feng, Rexford.. (1995)   (Correct)

.... routing outperforms diagonal routing in Figure 4(b). In dimension-ordered routing, a packet entering a node in one direction generally exits the node traveling in the same direction; this reduces the likelihood that packets from different incoming links contend for the same output port [28]. In contrast, adaptive algorithms often allow packets to alternate dimensions, possibly blocking other arriving traffic [16]. Adaptively changing dimensions may also increase congestion in the center of the mesh, as evidenced by the early saturation of the diagonal routing plot in Figure 4(b) ....

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. Parallel and Distributed Systems, vol. 2, pp. 398--412, October 1991.


LoGPC: Modeling Network Contention in Message-Passing Programs - Moritz, Frank (1998)   (13 citations)  (Correct)

....regular applications with good communication locality and tight synchronization. In addition, LoGPC uses the features of the LogGP model to account for long message bandwidth. LoGPC extends these models with a simple model of network contention effects. We use Agarwal's open model for k-ary n-cubes [1] and close it by including the impact of network contention on the message injection rate. Finally, LoGPC models the pipelining characteristics of DMA engines, which allow the overlap of memory and network access times. We validate LoGPC by comparing its predictions to the measured performance of ....

....we take is to begin with the LogGP machine parameters along with information about a specific program's messaging rate, and to then apply these parameters to a queueing model to calculate the network contention observed by the program. Our technique uses Agarwal's open model for k-ary n-cubes [1] to calculate the network contention from the message injection rate, and then closes the model by feeding the network contention costs back into the calculation for the message injection rate. This section begins by giving a brief overview of Agarwal's model (Equations 3 through 6). Then we ....
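The feedback loop described in this excerpt (contention computed from the injection rate, then fed back into the injection rate) can be sketched as a fixed-point iteration. This is only an illustration under stated assumptions: `toy_contention` is a hypothetical single-queue placeholder, not Agarwal's k-ary n-cube equations, and `close_model`, `compute_time`, `base_latency`, and `service_time` are names invented here rather than LoGPC parameters.

```python
def toy_contention(rate: float, service_time: float) -> float:
    """Hypothetical contention delay for a single queue.
    A placeholder only; it is NOT the open model from the cited paper."""
    rho = rate * service_time  # utilization of the toy channel
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must lie in [0, 1)")
    return service_time * rho / (1.0 - rho)


def close_model(compute_time: float, base_latency: float,
                service_time: float, tol: float = 1e-12,
                max_iter: int = 1000) -> tuple[float, float]:
    """Iterate rate -> contention -> rate until the rate stops changing.

    Assumes each message costs compute_time + base_latency + contention,
    so the injection rate is the reciprocal of that total.
    """
    rate = 1.0 / (compute_time + base_latency)  # contention-free starting point
    for _ in range(max_iter):
        delay = toy_contention(rate, service_time)
        new_rate = 1.0 / (compute_time + base_latency + delay)
        if abs(new_rate - rate) < tol:
            return new_rate, toy_contention(new_rate, service_time)
        rate = new_rate
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    rate, delay = close_model(compute_time=100.0, base_latency=20.0,
                              service_time=10.0)
    print(rate, delay)
```

Starting from the contention-free rate keeps the toy utilization below 1 whenever `service_time` is smaller than `compute_time + base_latency`, so the iteration stays in the valid range for inputs of that kind.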

[Article contains additional citation context not shown here]

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, October 1991.


Software Technologies for Reconfigurable Systems - Hauck, Agarwal (1996)   (2 citations)  Self-citation (Agarwal)   (Correct)

No context found.

A. Agarwal, "Limits on Interconnection Network Performance", IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 4, pp. 398-412, October, 1991.


The Wall Mesh - Chen, Lau (1997)   (Correct)

No context found.

A. Agarwal, "Limits on Interconnection Network Performance", IEEE Trans. on Parallel and Distributed Systems,Vol. 2, No. 4, October 1991, pp. 398--412.


High-Level Power Analysis for On-Chip Networks - Eisley, Peh (2004)   (Correct)

No context found.

A. Agarwal, "Limits on Interconnection Network Performance," IEEE Trans. on Par. and Dist. Syst., vol. 2, no. 4, pp. 398-412, October 1991.


An Analytical Model of Adaptive Wormhole Routing With Time-Out - Khonsari   (Correct)

No context found.

A. Agarwal, Limits on interconnection network performance, IEEE TPDS, vol. 2, pp.398-412, 1991.


Wormhole Routing in De Bruijn Networks and Hyper-Debruijn.. - Ganesan, Pradhan (2003)   (3 citations)  (Correct)

No context found.

Agarwal, A., "Limits on interconnection network performance," IEEE Trans. on Parallel and Distributed Systems, vol. 2, pp. 398--412, Oct. 1991.


Evaluating the Cost of the Dynamic Reconfiguration of a.. - Garcia, Duato   (Correct)

No context found.

Agarwal, A. "Limits on interconnection network performance". IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No. 4, pp. 398-412, October 1991.


Viable Architectures for High-Performance Computing - Ziavras, Wang, Papathanasiou (2003)   (Correct)

No context found.

Agarwal, A. (1991) Limits on interconnection network performance. IEEE Trans. Parallel Distrib. Syst., 2, 398--412.


Dynamic Reconfiguration of Multicomputer Networks: Limitations .. - Garcia, Duato (1993)   (1 citation)  (Correct)

No context found.

Agarwal, A. "Limits on interconnection network performance". IEEE Trans. on Parallel and Distributed Systems, Vol. 2, No.


2.5n-Step Sorting on n×n Meshes in the Presence.. - Yeh, Parhami, Lee..   (Correct)

No context found.

Agarwal, A., "Limits on interconnection network performance, " IEEE Trans. Parallel Distrib. Sys., Vol. 2, no. 4, Oct. 1991, pp. 398-412.


A Novel Approach to Improve the Performance of.. - Garcia, Flores   (Correct)

No context found.

A. Agarwal. Limits on interconnection network performance. IEEE Trans. on Parallel and Distributed Systems, 2(4):398--412, October 1991.


Improving Parallel System Performance by Changing the.. - Puente Izu Gregorio   (Correct)

No context found.

A. Agarwal, "Limits on Interconnection Network Performance", IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 4, pp. 398-412, October 1991.


Contention and Queueing in an Experimental Multicomputer.. - Fang, Felten, al. (1996)   (7 citations)  (Correct)

No context found.

A. Agarwal. Limits on interconnection network performance. IEEE Trans. on Parallel and Distributed Systems, 2(4):398--412, Oct. 1991.


Performance Modeling of Optical Interconnection.. - Cruz-Rivera..   (Correct)

No context found.

A. Agarwal, "Limits on interconnection network performance," IEEE Trans. Parallel Distributed Syst., vol. 2, pp. 398--412, Oct. 1991.


Necessary and Sufficient Conditions for Deadlock-free Networks - Carrion Beivide   (Correct)

No context found.

A. Agarwal. "Limits on Interconnection Network Performance." IEEE Trans. on Parallel Distributed Systems, Vol. 2, no. 4, pp. 398-412, Oct. 1991.


Modeling the Technology Impact on the Design of a Two-Level.. - Cruz-Rivera (1997)   (Correct)

No context found.

A. Agarwal, "Limits on Interconnection Network Performance", IEEE Transactions on Parallel and Distributed Systems, 2(4), pages 398-412, October 1991.


Final Report on Research in Parallel Computing.. - December Carnegie (1996)   (Correct)

No context found.

Agarwal, A. Limits on Interconnection Network Performance. IEEE TPDS 2(4):398-412, October, 1991.


CiteSeer.IST - Copyright Penn State and NEC