95 citations found.
S. Borkar et al., "Supporting systolic and memory communication in iWarp," in Proc. 17th Int. Symp. Computer Architecture, Seattle, WA, 1990, pp. 70--81.

This paper is cited in the following contexts:

Compiler Support for Scalable and Efficient Memory Systems - Barua, Lee, Amarasinghe.. (2001)   (2 citations)

....paper aim for balanced distribution and minimize code growth. It provides the opportunity for a back end to optimize for locality, but it does not address the locality issue directly. The methods in this paper apply to any bank-exposed architecture for both general-purpose and embedded systems [7, 11, 20, 22, 32]. In theory, it may also be used to map sequential programs onto distributed shared memory multiprocessors, although the communication latencies on DSMs have historically been too high to profitably exploit the instruction-level parallelism extracted by this compiler approach. This ....

....A[2i 1] are not. Low-order interleaving is the distribution of array elements in a round-robin manner across the memory banks. That is, for a low-order interleaved array A, element A[i] is allocated on bank i mod N. ....
[Figure residue: placement of A[0] to A[11] across the memory banks, with the loops "for i = 0 to 99 do A[i]" and "for i = 0 to 99 step 4 do A[i]".]
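The low-order interleaving rule quoted above (element A[i] on bank i mod N) can be sketched in a few lines of Python. This is an illustrative sketch, not code from the cited paper: the function names are hypothetical and N = 4 banks is an assumption chosen for the example. It also shows why a stride equal to the number of banks is harmful: every access of a loop such as "for i = 0 to 99 step 4" lands on the same bank.

```python
from typing import List


def bank_of(i: int, num_banks: int) -> int:
    """Low-order interleaving: element A[i] is placed on bank i mod N."""
    return i % num_banks


def offset_in_bank(i: int, num_banks: int) -> int:
    """Slot of A[i] inside its bank (consecutive rows of N elements)."""
    return i // num_banks


def bank_contents(length: int, num_banks: int) -> List[List[int]]:
    """Indices of A[0..length-1] held by each bank, in slot order."""
    banks: List[List[int]] = [[] for _ in range(num_banks)]
    for i in range(length):
        banks[bank_of(i, num_banks)].append(i)
    return banks


if __name__ == "__main__":
    N = 4  # hypothetical number of memory banks
    print(bank_contents(12, N))
    # A stride-4 loop over A[0..99] touches only one bank.
    print({bank_of(i, N) for i in range(0, 100, 4)})
```

With N = 4, bank_contents(12, 4) returns [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]], and the stride-4 loop hits only bank 0, so its accesses cannot proceed in parallel.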

[Article contains additional citation context not shown here]

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 70--81, June 1990.


MORPH: A System Architecture for Robust High Performance Using .. - Chien, Gupta (1996)   (5 citations)

....the packages to about 90 GB/sec for a pin-out of 1800 usable MCM pins. The bisection bandwidth would be in the range of 10 TB/sec. MORPH's flexible architecture subsumes both the processor-in-memory (PIM) and scalable shared-memory approaches. Based on the experience of several PIM-like systems [21, 20, 55, 7, 6], there is evidence that PIM organizations represent significant programming challenges, particularly for irregular applications. We believe that the use of more traditional processor-memory structures will yield a machine with more accessible performance. ....

Borkar, S., Cohn, R., Cox, G., Gross, T., Kung, H. T., Lam, M., Levine, M., Moore, B., Moore, W., Peterson, C., Susman, J., Sutton, J., Urbanski, J., and Webb, J. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th International Symposium on Computer Architecture (1990), IEEE Computer Society, pp. 70--81.


The M-Machine Multicomputer - Fillo, Keckler, Dally, Carter.. (1995)   (22 citations)

.... communication, avoiding the overhead of memory copying at both the sender and the receiver, and eliminating the dedicated memory for message arrival, as is found on the J-Machine [8]. Register-mapped network interfaces have been used previously in the Mars Machine [2], the J-Machine, and iWarp [4], and have been described by *T [26] as well as Henry and Joerg [15]. However, none of these systems provides protection for user-level messages. Systems like the J-Machine that provide user access to the network interface without atomicity must temporarily disable interrupts to allow the ....

BORKAR, S., ET AL. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th International Symposium on Computer Architecture (May 1990), pp. 70-81.


Compiler Analysis to Implement Point-to-Point Synchronization in.. - Nguyen (1993)   (3 citations)

....In the conventional scheme of dynamic routing, a message is routed by examining its header, which identifies the destination processor for the message. In situations where two messages need to access the same resource, one message must be either blocked or buffered. Architectures such as iWarp [Bor90] or NuMesh [War93] seek to alleviate contention costs by introducing the idea of static routing. When the destinations of messages are known at compile time, routing can be scheduled statically to avoid unnecessary contention [SA91]. Furthermore, hardware which supports static routing can avoid ....
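The idea that statically known destinations let routing be scheduled at compile time can be illustrated with a greedy slot assignment. This is a hypothetical sketch, not the scheme used by iWarp, NuMesh or [SA91]: it assumes each route is a known list of directed links and that a message advances one link per time slot, and it delays a message until every link on its path is free in the slot in which it would be used.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # directed link (from_node, to_node)


def schedule_static(routes: Dict[str, List[Link]]) -> Dict[str, int]:
    """Assign each message a start slot so that no link carries two messages
    in the same slot. A message injected at slot t is assumed to use the k-th
    link of its route during slot t + k (one hop per slot)."""
    busy: Set[Tuple[Link, int]] = set()  # (link, slot) pairs already reserved
    start: Dict[str, int] = {}
    for name, route in routes.items():  # messages are scheduled in the given order
        t = 0
        while any((link, t + k) in busy for k, link in enumerate(route)):
            t += 1  # contention found: try the next injection slot
        for k, link in enumerate(route):
            busy.add((link, t + k))
        start[name] = t
    return start


if __name__ == "__main__":
    routes = {
        "m1": [(0, 1), (1, 2)],
        "m2": [(1, 2), (2, 3)],
        "m3": [(0, 1)],
    }
    print(schedule_static(routes))
```

Here m1 and m2 can start together because they use link (1, 2) in different slots, while m3 is delayed by one slot so that it does not share link (0, 1) with m1. Because the whole schedule is fixed before run time, no arbitration or buffering is needed for these messages.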

Shekhar Borkar, et al. Supporting systolic and memory communication in iWarp. In Proceedings of the International Symposium on Computer Architecture, pages 70-81, 1990.


An Efficient Virtual Network Interface in the FUGU Scalable.. - Mackenzie

.... advantage of the characteristics of the so-called System Area Network (SAN) environment [76, 63, 80, 6, 18, 16, 28, 13]. Higher-performance network interfaces suitable for significantly finer-grain parallel problems have been demonstrated in massively parallel processors as research prototypes [69, 7, 15, 1, 60, 2, 55] and as commercial machines [44, 68, 71]. However, MPP work has largely ignored issues of mixed workloads that require multiprogramming, demand paging and interactive scheduling. A Scalable Workstation represents one vision of the convergence of SMP, cluster and MPP goals and technologies that ....

....et al.'s CNI6Qm [55, 56] interface provides both a fast path and a (potentially virtual) buffered path by using the network interface to buffer messages. Hybrid solutions will be discussed in more detail in Chapter 8. Direct network interfaces (Figure 8-1a) have been used in research machines [15, 7, 60, 1, 55] and one commercial machine, the CM-5 [44]. These interfaces feature low latency by allowing the processor direct access to the network queue. Direct NIs can be inefficient unless placed close to the processor. Anticipating continued system integration, we place our NI on the processor cache bus. ....

[Article contains additional citation context not shown here]

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 70-81, June 1990.


Compressionless Routing: A Framework for Adaptive and.. - Kim, Liu, Chien (1996)   (22 citations)

....the bottleneck of network interface bandwidth. Figs. 14(e) and (f) show that, when enough source and sink bandwidth is provided, .... (Footnote 9: The importance of interface bandwidth has been observed by other researchers, and the multichannel interface was actually used in the design of the Intel iWARP [30].)
[Figure residue: plot comparing CR (vc=1, b=2), CR (vc=2, b=2) and DOR (vc=2, b=2, 4, 8, 16).]

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb, "Supporting systolic and memory communication in iWarp," in Proceedings of the 17th International Symposium on Computer Architecture, pp. 70-81, IEEE Computer Society, 1990.


Evaluation of Wormhole Routed Networks Under Hybrid Traffic Loads - Kim, Chien (1993)   (9 citations)

....adaptive routing. Both of these features relax the binding of messages to physical channels, reducing the amount of intermessage interference possible due to the shared resources (the physical channels). Virtual lanes, described by Dally in [17] and also called logical channels by Kung et al. [6], virtualize the physical channels, allowing them to be shared between several messages. This allows an unobstructed message to pass a blocked message, decreasing the coupling between messages (such as that between a blocked message and others). Thus, virtual lanes reduce the performance loss due ....
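The passing behavior described above can be modelled with a round-robin arbiter that time-multiplexes one physical channel among several virtual lanes. This is a simplified, hypothetical sketch (one flit per cycle, a fixed set of blocked lanes, no credit-based flow control), not the router design of Dally, of Kung et al., or of the citing paper: a lane whose message is blocked downstream is skipped, so a message on another lane keeps using the channel.

```python
from collections import deque
from typing import Deque, List, Optional, Set


def arbitrate(lanes: List[Deque[str]], blocked: Set[int], cycles: int) -> List[Optional[str]]:
    """Round-robin multiplexing of one physical channel among virtual lanes.

    Each cycle, the next lane after the one served last that holds a flit
    and is not blocked downstream sends one flit. Returns the flit sent in
    each cycle, or None when no lane can send."""
    sent: List[Optional[str]] = []
    last = -1
    n = len(lanes)
    for _ in range(cycles):
        choice = None
        for step in range(1, n + 1):
            lane = (last + step) % n
            if lanes[lane] and lane not in blocked:
                choice = lane
                break
        if choice is None:
            sent.append(None)  # channel idles this cycle
        else:
            sent.append(lanes[choice].popleft())
            last = choice
    return sent


if __name__ == "__main__":
    lanes = [deque(["a0", "a1"]), deque(["b0", "b1"])]
    print(arbitrate(lanes, blocked={0}, cycles=3))
```

With lane 0 blocked, the flits on lane 1 still cross the channel (['b0', 'b1', None] for three cycles). If both messages shared a single FIFO instead, b0 and b1 would be stuck behind the blocked flits a0 and a1.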

....the same physical channel pass each other without interference. Each router is connected to its corresponding processing element through two channels. The importance of this path has been observed by other researchers, and a multichannel path was actually used in the design of the Intel iWARP [6]. Our design includes two source channels for injecting messages into the network and two sink channels for removing them from the network. One of each type is reserved for long messages; the other is used for short messages. This prevents short messages from being blocked by long messages in the node. ....

Shekhar Borkar, Robert Cohn, George Cox, Thomas Gross, H. T. Kung, Monica Lam, Margie Levine, Brian Moore, Wire Moore, Craig Peterson, Jim Susman, Jim Sutton, John Urbanski, and Jon Webb. Supporting Systolic and Memory Communication in iWARP. In Proceedings of the 17th International Symposium on Computer Architecture. IEEE Computer Society, 1990.


The Cost of Adaptivity and Virtual Lanes in a Wormhole Router - Aoyama, Chien (1995)   (13 citations)

....routing feasible, but not without cost. Deciding whether or not to incorporate adaptive routing into a router is still a complex cost-performance tradeoff, with the cost side of the equation still largely undefined. Virtual lanes have been proposed as a mechanism to improve router performance [13, 7]. However, adding virtual lanes not only increases channel utilization; it also increases router complexity, slowing implementations. While virtual lanes are also attractive, they have yet to attain widespread acceptance in commercial machines. In this paper, we examine the cost of adaptivity and ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th International Symposium on Computer Architecture. IEEE Computer Society, 1990.


Planar-Adaptive Routing (PAR): Low-Cost Adaptive Networks For.. - Jae Kim Eng

....buffer is blocked, the packet releases the physical channel so that other packets can use it for bypassing. In [19] Dally showed that virtual lanes give much higher performance than does the use of the given storage as a single deep FIFO queue. A similar idea was used in the Intel iWARP machines [9] where they were called logical channels. Each iWARP node provides 20 logical input channels and 20 logical output channels dynamically connected by a crossbar switch. The logical channels share 8 physical channels linked to neighbor nodes by time multiplexing. In addition to increasing the ....

....two dimensional routers. The two adjacent planes are connected only through a dimension interface. Figure 5.2 shows the internal organization of a three dimensional planar adaptive router. The figures assumed that a router is integrated into a processor chip as in NDF [26] and the Intel iWARP [9]. Because of its small size and enhanced VLSI technology, the PAR can be embedded in the processor chip without a significant effect on processor performance. The embedded router scheme eliminates an external interface between a processor and a network. Thus, a number of pins required for the ....

[Article contains additional citation context not shown here]

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWARP. In Proceedings of the 17th International Symposium on Computer Architecture. IEEE Computer Society, 1990.


A High-Speed Network Interface for Distributed-Memory Systems.. - Steenkiste (1996)   (2 citations)

....the iWarp cell to four neighbors through 40 MByte/second buses; the cells in the iWarp array are configured as a torus. The communication system supports high-speed interprocessor communication for a variety of communication models, including systolic communication and memory communication [7]. In systolic communication, the CPU writes data directly onto the interconnect, thus minimizing communication latency. Memory communication is supported through the use of spools, on-chip DMA engines that move data between the local memory and the interconnection network. The iWarp system ....
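The two communication models described above can be contrasted with a toy model. The class and function names are hypothetical and do not correspond to the iWarp instruction set or its libraries; the sketch only shows the difference in data path. Systolic sends go from the CPU straight onto the channel, while a spool copies a block from local memory to the channel one word per step without the CPU handling each word.

```python
from collections import deque
from typing import Deque, List


class Channel:
    """Hypothetical model of the outgoing queue of one interconnect link."""

    def __init__(self) -> None:
        self.queue: Deque[int] = deque()


def systolic_send(ch: Channel, values: List[int]) -> None:
    """Systolic communication: the CPU writes each computed word directly
    onto the interconnect, with no staging buffer in local memory."""
    for v in values:
        ch.queue.append(v)


class Spool:
    """Memory communication: a DMA-like engine that moves a block of words
    from local memory to a channel, one word per step."""

    def __init__(self, memory: List[int], start: int, count: int, ch: Channel) -> None:
        self.memory = memory
        self.addr = start
        self.remaining = count
        self.ch = ch

    def step(self) -> bool:
        """Transfer one word; return False once the block is finished."""
        if self.remaining == 0:
            return False
        self.ch.queue.append(self.memory[self.addr])
        self.addr += 1
        self.remaining -= 1
        return True


if __name__ == "__main__":
    ch = Channel()
    systolic_send(ch, [1, 2])  # values produced by the CPU
    local_memory = [10, 20, 30, 40]
    spool = Spool(local_memory, start=1, count=2, ch=ch)
    while spool.step():
        pass  # in hardware, the CPU could keep computing meanwhile
    print(list(ch.queue))
```

The resulting queue is [1, 2, 20, 30]: the first two words came directly from the CPU, the last two were copied out of local memory by the spool.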

....in Section 7.
5.3 The iWarp Streams Software
We give a more detailed description of the streams implementation for iWarp.
5.3.1 Data and control interface
The data interface between the distributed memory system and the network interface is based on the PCS and ESPL communication libraries [24, 7]. PCS is used to create application-specific connections, and ESPL is a fast spooling library that achieves bandwidths close to the 40 MByte/second link rate, even for short messages. To support striping, we developed the Enhanced Multiple SPool Library (EMSPL), an extension of ESPL that manages ....
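Striping, which EMSPL was developed to support, means splitting one transfer across several connections so that their link bandwidths add up. The sketch below only illustrates the idea with fixed-size chunks dealt round-robin to num_links connections; the chunk size, the function names and the round-robin policy are assumptions, and this is not the EMSPL interface.

```python
from typing import List


def stripe(data: bytes, num_links: int, chunk: int) -> List[List[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across links."""
    per_link: List[List[bytes]] = [[] for _ in range(num_links)]
    for idx, pos in enumerate(range(0, len(data), chunk)):
        per_link[idx % num_links].append(data[pos:pos + chunk])
    return per_link


def reassemble(per_link: List[List[bytes]]) -> bytes:
    """Inverse of stripe(): chunk number idx sits on link idx % n at position idx // n."""
    n = len(per_link)
    total = sum(len(chunks) for chunks in per_link)
    return b"".join(per_link[idx % n][idx // n] for idx in range(total))


if __name__ == "__main__":
    parts = stripe(b"abcdefg", num_links=2, chunk=2)
    print(parts)
    print(reassemble(parts))
```

Because the receiver applies the same deterministic round-robin rule, it can restore the original order without per-chunk sequence numbers, as long as each connection delivers its own chunks in order.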

[Article contains additional citation context not shown here]

Shekhar Borkar, Robert Cohn, George Cox, Thomas Gross, H.T. Kung, Monica Lam, Margie Levine, Brian Moore, Wire Moore, Craig Peterson, Jim Susman, Jim Sutton, John Urbanski, and Jon Webb. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 70--81, Seattle, May 1990. ACM/IEEE.


Efficient Compilation of Array Statements for Private.. - James Stichnoth February (1993)   (31 citations)

....unstructured and irregular data distributions. The work presented here is meant to be sufficiently detailed to allow a compiler writer to use it as a manual for implementing a similar compiler. We have validated our work by implementing it in a prototype compiler, called Fx, for the iWarp system [2, 3]. The Fx compiler takes as input Fortran 77 statements augmented with whole array and array slice syntax and directives for data distribution, and produces as output Fortran 77 code augmented with communication primitives, which is passed on to the native iWarp Fortran compiler. The paper is ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proc. 17th Intl. Symposium on Computer Architecture, pages 70--81. ACM, May 1990.


The Cranium Network Interface Architecture: Support for Message.. - McKenzie (1997)

.... the network interface is implemented using a separate chip, and the computing node is based on a standard processor architecture such as the SPARC, the DEC Alpha or the Intel Pentium. Some examples from academic research projects are the MIT Message-Driven Processor (MDP) [22], the CMU iWarp [23] and the Caltech Mosaic [24]. In all of these projects, the network interface is tightly coupled with the processor; i.e., the network interface and the processor are placed on the same silicon. The motivation for the tight coupling is to reduce the overhead associated with short messages. For ....

....network interface designs that contain two different mechanisms use two completely separate strategies for short and long messages. For instance, the MIT Alewife interface [26] provides a message passing interface for long messages and a shared memory interface for short messages. The iWarp chip [23] also provides two separate strategies, one for large messages called memory communication and the other for small messages called systolic communication. These designs therefore become more complex than is necessary. We believe that our design represents the minimum complexity that is required to ....

[Article contains additional citation context not shown here]

Shekhar Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski and J. Webb. Supporting systolic and memory communication in iWarp. Proc. of the 17th Annual International Symposium on Computer Architecture, Seattle WA, May 1990, pp. 70-81.


Robot Instruction by Human Demonstration - Kang (1994)   (11 citations)

....from the four cameras at video rate, i.e., at 30 Hz. (Footnote 1: The word active refers to the addition of projected structured lighting during image capture. Footnote 2: The iWarp is a high-speed system architecture which is the result of a joint effort between Carnegie Mellon University and Intel Corporation [15].) [Figure residue: Sun workstation, light-stripe rangefinder, CyberGlove and Polhemus.] Normally, two cameras are sufficient to recover depth from triangulation. However, having a redundant number of cameras facilitates correct matches between images, which is critical to accurate depth recovery [117] ....

S. Borkar, R. Cohn, et al., "Supporting systolic and memory communication in iWarp," Proc. 17th Int'l Symp. on Computer Architecture, Seattle, WA, 1990, pp. 70-81.


A Compiler for Parallel Finite Element Methods with.. - Shewchuk, Ghattas (1993)   (1 citation)

.... matrix, and may be manipulated in the same fashion as K and K̃_II (although the inverted matrices that compose M^-1 need not be explicitly formed). Below, we give our domain decomposition algorithm and the performance observed solving a heat conduction problem on a 64-processor iWarp [1]. We use the mesh of Figure 2, which has 8837 unknowns and employs quadratic triangular elements. For domain decomposition, an additional overhead of 0.6810 seconds is required to form the Schur complement matrix; this is quickly amortized if multiple ....

S. Borkar, G. Cox, H.T. Kung, M. Levine, W. Moore, J. Susman, J. Urbanski, R. Cohn, T. Gross, B. Moore, C. Peterson, J. Sutton, and J. Webb, Supporting systolic and memory communication in iWarp, Proceedings of the 17th Annual International Symposium on Computer Architecture, IEEE Computer Society and ACM, May 1990.


From equations to hardware. Towards the systematic .. - Charot, Frison.. (1992)

.... This is essential to meet the high-speed requirements of applications while keeping the hardware volume reasonably low; • it is programmable, in order to cover as large a class of applications as possible; • it follows a SIMD execution mode, in order to simplify the control of the machine and its programming; • its instruction set is dedicated to systolic ....
(Footnote 1: One of the few processors which would meet these specifications is the iWarp [BCC+90], but to our knowledge, it is not available as an independent component.)
[Figure 5: MicMacs architecture (MICS instructions, MACS, host).]

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. Technical Report, Carnegie Mellon University, 1990.


Toward The Design Of Large-Scale, Shared-Memory Multiprocessors - Scott (1992)   (3 citations)

.... to be exploited [Seit84]. Direct networks are gaining in popularity and have been employed in many recent existing or proposed machines, including the Thinking Machines CM-2 [Hill85], Intel iPSC and Paragon, Cosmic Cube [Seit85], MIT Alewife [Agar90], Tera supercomputer [Alve90], CMU-Intel iWarp [Bork90], and Stanford DASH multiprocessor [Leno89]. The most commonly used direct networks are variants of the k-ary n-cube. Recall that a k-ary n-cube consists of N = k^n nodes, arranged in n dimensions with k nodes per dimension. Figure 4.1 illustrates several different k-ary n-cubes. Each node is ....
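The k-ary n-cube definition quoted above (N = k^n nodes, n dimensions, k nodes per dimension) can be made concrete with two small helpers. This sketch assumes wraparound links in every dimension, as in a torus such as the iWarp arrays described elsewhere on this page; some authors use the name for meshes without wraparound, in which case the neighbor rule differs. The function names are illustrative.

```python
from typing import List, Tuple

Node = Tuple[int, ...]


def num_nodes(k: int, n: int) -> int:
    """A k-ary n-cube has N = k**n nodes."""
    return k ** n


def neighbors(node: Node, k: int) -> List[Node]:
    """Neighbors in a k-ary n-cube with wraparound (torus) links: one step
    of +1 or -1 (mod k) along each dimension, without duplicates."""
    result: List[Node] = []
    for d in range(len(node)):
        for delta in (1, -1):
            nb = list(node)
            nb[d] = (nb[d] + delta) % k
            t = tuple(nb)
            if t != node and t not in result:  # k = 2: +1 and -1 coincide
                result.append(t)
    return result


if __name__ == "__main__":
    print(num_nodes(8, 3))
    print(neighbors((0, 0), 4))
```

For example, num_nodes(8, 3) is 512, and node (0, 0) of a 4-ary 2-cube has neighbors (1, 0), (3, 0), (0, 1) and (0, 3). With k = 2 each node has exactly n neighbors, which is the binary hypercube.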

....flow control, routing, performance and many other issues regarding these networks [Tane88]. Pipelined channels are not widely used in multiprocessor interconnects, however. Limited pipelined channels (allowing a small, fixed number of bits on a wire) are used in the CMU-Intel iWarp [Bork90], various Cray Research machines [Smit92] and the Thinking Machines CM-5 [TMC91]. A higher degree of pipelining is achievable using the Caltech Slack chip [Seit92], which is designed to allow tightly coupled Mosaic channels to communicate efficiently over long cables. The IEEE Scalable Coherent ....

Borkar, S., R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb, Supporting Systolic and Memory Communication in iWarp, Proc. 17th Annual International Symposium on Computer Architecture, May 1990, 70-81.


ASOC: A Scalable, Single-Chip Communications Architecture - Liang, Swaminathan, Tessier (2000)   (2 citations)

....communication is drawn from previous work in coarse-grained parallel processing. In a number of previous projects [7], [18], [21], distributed, pipelined routing has been used to moderate global communication cycles across large parallel systems. The availability of instruction-configurable processors and bitstream-configurable FPGA resources brings an important aspect of hardware adaptability to SoC ....
[Figure 1: Adaptive System on a Chip (aSOC): tiles containing memory, microprocessor, FPGA and multiplier cores, each with node control, a communication interface and a crossbar linking North, South, East and West.]

....software development. While many applications require at least some dynamic route support, in many cases static routing can be used for most, if not all, communication. Compile-time static routing of communication has been used effectively in a number of parallel processing systems. In iWarp [7], interprocessor communication patterns were determined during program compilation and implemented with the aid of programmable interprocessor buffers. This concept was extended by the NuMesh project [18] to include collections of heterogeneous processing elements interconnected in a mesh ....

S. Borkar. Supporting Systolic and Memory Communication in iWarp. In Proceedings 17th International Symposium on Computer Architecture, 1990.


ASOC: A Scalable, Single-Chip Communications Architecture - Liang, Swaminathan, Tessier (2000)   (2 citations)

....at compile time and inter-tile communication can be performed without the need for significant dynamic arbitration. This compile-time approach to scalable and flexible system-wide communication is drawn from previous work in coarse-grained parallel processing. In a number of previous projects [9], [19], [22], distributed, pipelined routing has been used to moderate global communication cycles across large parallel systems. The availability of instruction-configurable processors and bitstream-configurable FPGA resources brings an important aspect of hardware adaptability to SoC environments. ....

....contrast to most multiprocessor networks that share inter-component wires using time-varying packet switching. Packet-switched interconnect based on both static and dynamic routing has been used effectively for over 25 years for multiprocessor communication. Except for a few special cases [22], [9], a large majority of multiprocessor networks have been dynamic, as characterized by run-time routing determination. In general, the use of dynamic routing to support scalable single-chip interconnect seems questionable, at best. Many algorithms targeted to single-chip systems have predictable ....

[Article contains additional citation context not shown here]

S. Borkar. Supporting Systolic and Memory Communication in iWarp. In Proceedings 17th International Symposium on Computer Architecture, 1990.


Network Interface Support for User-Level Buffer Management - Dubnicki, Li, Mesarina (1994)   (4 citations)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [2, 6, 3]. Writing and reading these registers queues and dequeues data from the FIFOs, respectively. While this is efficient for fine-grain, low-latency communication, it requires the use of a nonstandard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....
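Mapping network interface FIFOs onto processor registers means that an ordinary register write becomes a send and an ordinary register read becomes a receive. The Python model below is a hypothetical sketch, not the interface of iWarp or any other cited machine: the class name, the FIFO depth and the register names are assumptions, and where a real processor would stall on a full or empty FIFO, the model raises an exception instead.

```python
from collections import deque
from typing import Deque


class RegisterMappedNI:
    """Toy model of a network interface whose FIFOs are mapped to registers.

    Writing send_reg enqueues a word on the outgoing FIFO; reading recv_reg
    dequeues a word from the incoming FIFO. Depth and stall behavior are
    assumptions of this sketch."""

    def __init__(self, depth: int = 8) -> None:
        self.depth = depth
        self.send_fifo: Deque[int] = deque()
        self.recv_fifo: Deque[int] = deque()

    @property
    def send_reg(self) -> int:
        raise AttributeError("send register is write-only")

    @send_reg.setter
    def send_reg(self, word: int) -> None:
        if len(self.send_fifo) >= self.depth:
            raise BlockingIOError("send FIFO full (a real CPU would stall)")
        self.send_fifo.append(word)

    @property
    def recv_reg(self) -> int:
        if not self.recv_fifo:
            raise BlockingIOError("receive FIFO empty (a real CPU would stall)")
        return self.recv_fifo.popleft()


if __name__ == "__main__":
    ni = RegisterMappedNI(depth=2)
    ni.send_reg = 0x1234     # register write == enqueue
    ni.recv_fifo.append(42)  # pretend a word arrived from the network
    print(list(ni.send_fifo), ni.recv_reg)
```

The model also makes the protection problem mentioned in the text visible: any code that can name the register can inject words into the network, so nothing separates the messages of different contexts.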

Shekhar Borkar, Robert Cohn, George Cox, Thomas Gross, H.T.Kung, Monica Lam, Margie Levine, Brian Moore, Wire Moore, Craig Peterson, Jim Susman, Jim Sutton, John Urbanski, and Jon Webb. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th ISCA, pages 70--81, May 1990.


Architectures Parallèles Spécialisées Pour.. - Charot (1993)

....arithmetic and logic unit operating on 8, 16 or 32 bits. Division and square-root operations are also supported. The register unit supports up to nine read operations and six write operations in one 50 ns clock cycle. • The communication unit [Bork90] implements the exchange mechanisms between iWARP processors. The 8 I/O buses (4 input, 4 output), each 8 bits wide, support a bandwidth of 320 megabytes per second. Each physical port can be multiplexed to provide up to 20 ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proc. 17th Annual Symposium on Computer Architecture, pages 70--81, Seattle WA (USA), May 1990.


Virtual Memory Mapped Network Interface for the.. - Blumrich, Li.. (1994)   (238 citations)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [5, 11, 7]. Writing and reading these registers queues and dequeues data from the FIFOs, respectively. While this is efficient for fine-grain, low-latency communication, it requires the use of a non-standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Shekhar Borkar, Robert Cohn, George Cox, Thomas Gross, H.T.Kung, Monica Lam, Margie Levine, Brian Moore, Wire Moore, Craig Peterson, Jim Susman, Jim Sutton, John Urbanski, and Jon Webb. Supporting systolic and memory communication in iWarp. In Proceedings of 17th ISCA, pages 70--81, May 1990.


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

....on huge databases of market activity. The advantage of dedicated machines is that the application has complete control over the hardware, with no operating system interference. Thus interprocessor communication can be done with fewer software layers of protection, leading to higher performance [343, 76]. It is interesting to see how history repeats itself, even after 40 years in a fast-changing industry such as computers. The first uniprocessors were also dedicated to a single application, e.g., census tabulation, and many mainframes remain dedicated to transaction processing or payroll ....

S. Borkar et al., "Supporting systolic and memory communication in iWarp". In 17th Ann. Intl. Symp. Computer Architecture Conf. Proc., pp. 70--81, May 1990.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [3, 8, 4]. Writing and reading these registers queues and dequeues data from the FIFOs, respectively. While this is efficient for fine-grain, low-latency communication, it requires the use of a non-standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th Annual Symposium on Computer Architecture, June 1990.


Acknowledgments - Would Like To

....of attention. This is primarily due to the number of parallel computer systems that use multidimensional meshes for interconnection. Some examples of existing or proposed machines that make use of direct networks are: Caltech Cosmic Cube [4]; Caltech Mosaic [5]; CMU-Intel iWarp [6], [7]; Connection Machine [8]; HORIZON [9]; Intel iPSC and Paragon; MIT Alewife [10]; MIT J-machine [11]; MuNet [12]; Stanford DASH Multiprocessor [13]; Thinking Machines CM-2 [8]; and Cray T3E [14]. ....
[Fig. 1.1: two direct-network organizations, (a) and (b), built from processing elements (PE) and switches (SW).]

....routed from the PE at node (0,0,0) to each of the output links j, where 0 ≤ j ≤ 6. Table 3.1 shows the message traffic. The routing notation is the same as was used in chapter 2. For example, messages routed to output link 5 are all those messages whose coordinates x, y, and z are in the ranges [1,4], [0,7], and [0,7], respectively. This is a total of 4 × 8 × 8, or 256, messages.

Table 3.1 Message traffic from PE to output links at node (0,0,0)

Input | Message traffic (x, y, z) | Output | Number of messages
PE | (1..4, 0..7, 0..7) | 5 | (k/2)k^2
PE | (5..7, 0..7, 0..7) | 4 | (k/2 - 1)k^2
PE | (0, 1..4, 0..7) | 3 | (k/2)k
....
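The rows of Table 3.1 that survive can be reproduced by enumerating all destinations. This sketch assumes dimension-order routing (x, then y, then z) on a k-ary 3-cube torus, with an offset of 1 to k/2 sent in the positive direction; the table names only links 5, 4 and 3, so the numbering of the other links (and the use of 6 for local delivery) is a hypothetical choice. For k = 8 it gives 256 messages on link 5, matching the 4 × 8 × 8 count in the text.

```python
from collections import Counter
from itertools import product
from typing import Tuple


def output_link(dest: Tuple[int, int, int], k: int) -> int:
    """Output link used at node (0,0,0) for a message to dest.

    Hypothetical numbering: 5/4 = +x/-x, 3/2 = +y/-y, 1/0 = +z/-z,
    6 = delivered locally. Dimensions are resolved in x, y, z order."""
    for dim, (plus, minus) in enumerate(((5, 4), (3, 2), (1, 0))):
        offset = dest[dim]
        if offset != 0:
            # Offsets 1..k/2 go the positive way round the ring, the rest negative.
            return plus if offset <= k // 2 else minus
    return 6


def traffic(k: int) -> Counter:
    """Number of messages per output link over all k**3 destinations."""
    return Counter(output_link(d, k) for d in product(range(k), repeat=3))


if __name__ == "__main__":
    counts = traffic(8)
    print(counts[5], counts[4], counts[3])
```

For k = 8 the counts on links 5, 4 and 3 are 256, 192 and 32, which equal (k/2)k^2, (k/2 - 1)k^2 and (k/2)k from the table.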

[Article contains additional citation context not shown here]

S. Borkar et al., "Supporting systolic and memory communication in iWarp," in Proc. 17th Ann. Int. Symp. Comput. Architecture, May 1990, pp. 70-81.


A New Approach for Automatic Parallelization of Blocked Linear .. - Kung, Subhlok (1991)   (6 citations)

.... for three reasons: 1) Efficient systolic algorithms exist for parallelizing block operations such as matrix multiplication [10], [13], [14]. 2) Fine-grain distributed-memory parallel machines capable of efficient execution of systolic algorithms, such as iWarp [2], [3], have become available. iWarp is commercially available from Intel. 3) Libraries written using block routines for linear algebra computations, such as LAPACK [6], [8], have been developed. Thus our approach takes advantage of advances in several areas, including parallel algorithm design, parallel architectures, and ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb, "Supporting systolic and memory communication in iWarp," in Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 70--81, Seattle, WA, May 1990.


Computer Science Department - Electrical And Computer (1994)

....going to the same acquisition board would be genlocked to one another. This simplified the acquisition problem, since only one signal need be watched when acquiring and distributing data; all the other cameras will have the same timing.
4.3 Distribution to other processors
iWarp's systolic design [Borkar, Cohn et al. 1990] allows network connections to be mapped to special on-chip registers called gates. Data can be read from memory and stored to a gate just as to any other register. One read from the ADC returns the four video signals as one 32-bit word; each byte represents the grey level of a single pixel, ....
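Splitting the 32-bit ADC word described above into its four grey levels is a shift-and-mask operation. The byte order below (camera 0 in the least-significant byte) is an assumption, since the text does not say which byte belongs to which camera, and the function name is hypothetical.

```python
from typing import Tuple


def unpack_pixels(word: int) -> Tuple[int, int, int, int]:
    """Split one 32-bit ADC word into four 8-bit grey levels, one per camera.

    Assumed layout: camera 0 in bits 0-7, camera 1 in bits 8-15,
    camera 2 in bits 16-23, camera 3 in bits 24-31."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit word")
    return (
        word & 0xFF,
        (word >> 8) & 0xFF,
        (word >> 16) & 0xFF,
        (word >> 24) & 0xFF,
    )


if __name__ == "__main__":
    print(unpack_pixels(0x40302010))
```

With this layout, unpack_pixels(0x40302010) returns (16, 32, 48, 64). Packing all four cameras into one word means a single load and a single gate write move one pixel from every camera.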

Borkar, S., R. Cohn, et al. (1990). Supporting Systolic and Memory Communication in iWarp. 17th International Symposium on Computer Architecture, Seattle, WA.


Computing the Pipelined Phase-Rotation FFT - Langhorne Withers Jr (1993)   (1 citation)

....the phase-rotation FFT. We present a new set of recipes for generating the twiddles and shuffle indices directly in terms of the parallel pipeline. Finally, we describe mapping strategies for the phase-rotation FFT on the iWarp, a parallel computer system developed by Intel and Carnegie Mellon [1, 2]. We describe a fine-grained approach for an N-point radix-2 phase-rotation FFT that balances computation and communication to run at the full 40 Mbytes/sec rate of the iWarp physical links, regardless of the size of the input data sets. Section 2 introduces the phase-rotation concept. Section 3 ....

....for the radix-2 FFT on the iWarp system. The main result is a scalable implementation of the pipelined phase-rotation FFT that runs at the full 40 Mbytes/sec rate of the iWarp physical links. 5.1 iWarp. The iWarp is a private-memory multicomputer developed jointly by Intel and Carnegie Mellon [1, 2]. iWarp systems are 2-dimensional tori of iWarp nodes, ranging in size from 4 to 1024 nodes. Each node consists of an iWarp component, up to 16 Mbytes of off-chip local memory, and a set of 8 unidirectional communication links that physically connect the node to four neighboring nodes. The iWarp ....

BORKAR, S., COHN, R., COX, G., GROSS, T., KUNG, H. T., LAM, M., LEVINE, M., MOORE, B., MOORE, W., PETERSON, C., SUSMAN, J., SUTTON, J., URBANSKI, J., AND WEBB, J. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture (Seattle, WA, May 1990), pp. 70--81.


Design, Implementation, and Performance of a Scalable.. - Frankel (1995)   (1 citation)

....to the user. Our system meets all of these goals, and provides considerable flexibility for future expansion. In the current implementation it supports four synchronized cameras sampling 512×480 8-bit grayscale images at 30 Hz. The foundation of this system is an iWarp parallel computer [2, 3], which manages the overall data flow. Video input to the iWarp is performed by locally developed hardware. The video data is stored in the iWarp's local memory, and simultaneously sent via a High Performance Parallel Interface (HiPPI) network to a frame buffer, where all four images are displayed ....

Borkar, S., R. Cohn, et al. (1990). Supporting Systolic and Memory Communication in iWarp. 17th International Symposium on Computer Architecture, Seattle, WA, 70-81.


An Active Multibaseline Stereo System With Real-Time.. - Kang, Webb, Zitnick.. (1994)   (3 citations)

....of which have 16 MB DRAMs per cell. The video interface, which is described in detail elsewhere [17], is connected directly to the iWarp cell through the memory interface; the digitized video data is routed and distributed at video rate to the DRAMs by taking advantage of iWarp's systolic design [4]. 3 Camera calibration. Before data images can be taken and the scene depth recovered, we must first calibrate the camera configuration. Calibrating the camera configuration refers to the determination of the extrinsic (relative pose) and intrinsic (optic center offset, focal length and aspect ....

Borkar, S., et al. Supporting Systolic and Memory Communication in iWarp. In Proceedings of the 17th International Symposium on Computer Architecture. 1990. Seattle, WA: p. 70-81.


Message Routing on Irregular 2D-Meshes and Tori - Thomas Stricker January (1991)   (3 citations)

....main processor or can be implemented off-chip as a coprocessor. We call this unit the communication agent. A typical function performed by the communication agent is forwarding messages without any participation of the main processing unit in that cell. The communication agent of the iWarp component [iWarp90] supports wormhole message routing as well as other schemes. Wormhole routing is one of the fastest forwarding techniques currently known. In wormhole routing, usually only one link is used to connect adjacent cells. A physical link can be used by a single channel or be shared among multiple ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proceedings of 17th Int'l Symposium on Computer Architecture, May 1990.


A Multibaseline Stereo System with Active Illumination and.. - Sing Bing (1995)

....of which have 16 MB DRAMs per cell. The video interface, which is described in detail elsewhere [18], is connected directly to the iWarp cell through the memory interface; the digitized video data is routed and distributed at video rate to the DRAMs by taking advantage of iWarp's systolic design [4]. 3 Camera calibration. Before data images can be taken and the scene depth recovered, we must first calibrate the camera configuration. Calibrating the camera configuration refers to the determination of the extrinsic (relative pose) and intrinsic (optic center offset, focal length and aspect ....

Borkar, S., et al. Supporting Systolic and Memory Communication in iWarp. In Proc. of the 17th Int'l Symp. on Computer Architecture. 1990. Seattle, WA: p. 70-81.


Managing Scheduled Routing With A High-Level Communications.. - Metcalf (1997)   (1 citation)

....in this category include, for example, Linda [14], Split-C [16], and CSP [29]. 3.2 Scheduled Routing Architectures. There are a number of existing scheduled routing architectures discussed in the literature. Two of them are presented here: iWarp and GF11. 3.2.1 iWarp. The iWarp architecture [54, 10, 11] is CMU and Intel's follow-on project to the Warp architecture. iWarp integrates the processing and routing units on a single chip, targeted at DSP, scientific, and image processing. The interface between the processor and router is an interface register file used for systolic communication, ....

Shekhar Borkar, Robert Cohn, George Cox, et al. Supporting systolic and memory communication in iWarp. Technical Report 90-197, CMU-CS, September 1990. See also Proceedings 17th SIGARCH.


Early Experience with Message-Passing on the SHRIMP.. - Felten, Alpert.. (1996)   (23 citations)

....build packet headers. They have not tried to implement message-passing libraries using the underlying communication mechanism. Several projects have tried to lower overhead by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [11, 21, 13]. While this is efficient for fine-grain, low-latency communication, it requires the use of a non-standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. The Connection Machine CM-5 implements user-level communication through memory-mapped ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proceedings of the 17th Annual Symposium on Computer Architecture, June 1990.


Stream Sockets on SHRIMP - Damianakis, Dubnicki, Felten (1996)   (6 citations)

....build packet headers. They have not tried to implement message-passing libraries using the underlying communication mechanism. Several projects have tried to lower overhead by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [8, 16, 9]. While this is efficient for fine-grain, low-latency communication, it requires the use of a non-standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. The Connection Machine CM-5 implements user-level communication through memory-mapped ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proceedings of 17th International Symposium on Computer Architecture, June 1990.


Special Purpose Parallel Computing - McColl (1993)   (9 citations)

....magnetic resonance imagery, radar and sonar simulation, graph algorithms. The Warp project demonstrated the feasibility of programmable, high-performance systolic array computers. In a successor project, Intel and Carnegie Mellon University have been developing the iWarp parallel architecture [27, 49, 50, 208]. Although based on the original Warp design, iWarp adds much more flexibility in the types of communication that can be efficiently supported. In particular, non-neighbour communication can be performed without involving programs at intervening cells. Also, iWarp architectures can handle ....

.... of specification and programming languages for systolic computation [69, 77, 221, 346, 391] • verification of systolic designs [71, 72, 77, 164, 346] • investigation of asynchronous designs and wavefront arrays [213, 214, 215, 277] • development of programmable systolic architectures [11, 27, 49, 50, 208]. A major emphasis has been placed on the formal synthesis of systolic designs. In this paper we have described most of the systolic designs informally, using pictures. In order to be able to specify and derive designs in a way that allows one to reason ....

S. Borkar et al. Supporting systolic and memory communication in iWarp. In Proc. 17th Annual International Symposium on Computer Architecture, pages 70--81. IEEE Press, 1990.


Message Passing Support for Multi-grained.. - Ang, Chiou, Rudolph.. (1996)

....sharing of the message queue is not addressed except among the dataflow machines; these machines support only small messages sent and (implicitly) received with individual instructions. Examples of processors with integrated network interfaces include transputers [45], the iWarp systolic processor [6, 7], dataflow processors like Monsoon [15] and the EMC-R [38], and hybrid processors such as the MDP [16], the M-Machine [17], and the MIT/Motorola 88110MP [36]. Alewife [1] is an example of a network interface on the L1 cache interface. Alewife supports multi-part message specification, and its Sparcle ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proceedings of the 17th International Symposium on Computer Architecture, pages 70--81, May 1990.


Briques De Base Pour La Realisation D'architectures Paralleles.. - Charot (1993)

....arithmetic and logic unit operating on 8, 16, or 32 bits. Division and square-root operations are also supported. The register unit supports up to nine read operations and six write operations in a single 50 ns clock cycle. • The communication unit [Bork90] implements the exchange mechanisms between iWARP processors. The 8 input/output buses (4 input, 4 output), each 8 bits wide, support a bandwidth of 320 megabytes per second. Each physical port can be multiplexed to provide up to 20 ....
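As an illustrative consistency check (our own arithmetic combining two excerpts on this page, not a calculation from either citing paper), the 320 megabytes per second aggregate for the 8 unidirectional buses matches the 40 Mbytes/sec per-link rate quoted in the phase-rotation FFT excerpt, assuming all eight links run at that full rate simultaneously:

$$
8\ \text{links} \times 40\ \tfrac{\text{MB}}{\text{s}}\ \text{per link} = 320\ \tfrac{\text{MB}}{\text{s}}\ \text{aggregate}
$$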

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proc. 17th Annual Symposium on Computer Architecture, pages 70--81, Seattle, WA (USA), May 1990.


Exploiting Two-Case Delivery for Fast Protected Messaging - Mackenzie, Kubiatowicz, .. (1998)   (14 citations)

....it only to let the operating system clear the network. A polling watchdog mode could be implemented in the FUGU system. Direct Network Interfaces. Several machines have provided direct network interfaces. These include the CM-5, the J-Machine, iWarp, the *T interface, Alewife, and Wisconsin's CNI [20, 8, 5, 26, 1, 24]. These interfaces feature low latency by allowing the processor direct access to the network queue. Direct NIs can be inefficient unless placed close to the processor. Anticipating continued system integration, we place our NI on the processor cache bus. The CNI work shows how to partly ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting Systolic and Memory Communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 70--81, June 1990.


Virtual Memory Mapped Network Interface for the.. - Blumrich, Alpert.. (1993)   (238 citations)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [5, 10, 6]. Writing and reading these registers queues and dequeues data from the FIFOs, respectively. While this is efficient for fine-grain, low-latency communication, it requires the use of a nonstandard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Shekhar Borkar, Robert Cohn, George Cox, Thomas Gross, H. T. Kung, Monica Lam, Margie Levine, Brian Moore, Wire Moore, Craig Peterson, Jim Susman, Jim Sutton, John Urbanski, and Jon Webb. Supporting systolic and memory communication in iWarp. In Proceedings of 17th International Symposium on Computer Architecture, pages 70--81, May 1990.


Baring it all to Software: The Raw Machine - Waingold, Taylor, Sarkar, Lee.. (1997)   (111 citations)

....multigranularity. One way to view configurable logic is as a mechanism to provide the compiler a means to create customized single- or multi-cycle instructions without resorting to longer software sequences. 1.1 Comparison with other Architectures. Raw builds on several previous architectures. iWarp [3] and NuMesh [4] both share with Raw the philosophy of building point-to-point networks that support static scheduling. However, the cost of initiating a message in both systems was so high that compilers could exploit only coarse-grain parallelism, or had to focus on very regular signal processing ....

S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proc. 17th Annual Symposium on Computer Architecture, Seattle, May 1990.


Communication and Memory Requirements as the Basis .. - Subhlok.. (1994)   (21 citations)  Self-citation (Gross Webb)

....and multibaseline stereo imaging [17]. We briefly describe each application, show how it is implemented as an Fx program, and show how to use the framework to reason about mapping task-parallel programs and to obtain a good mapping. We present performance results on a 64-processor iWarp array [3] to validate our approach. Each application has unique properties that exercise different parts of the mapping framework. We have found these applications to be invaluable; interested readers can obtain complete and self-contained Fortran 77 and Fx (similar to HPF) implementations from the ....

BORKAR, S., COHN, R., COX, G., GROSS, T., KUNG, H. T., LAM, M., LEVINE, M., MOORE, B., MOORE, W., PETERSON, C., SUSMAN, J., SUTTON, J., URBANSKI, J., AND WEBB, J. Supporting systolic and memory communication in iWarp. In Proceedings of the 17th Annual International Symposium on Computer Architecture (Seattle, WA, May 1990), pp. 70--81.


Toward Automatic Robot Instruction from Perception - Mapping.. - Kang, Ikeuchi (1997)   (32 citations)

No context found.

S. Borkar et al., "Supporting systolic and memory communication in iWarp," in Proc. 17th Int. Symp. Computer Architecture, Seattle, WA, 1990, pp. 70--81.


Network-Based Multicomputers: A Practical Supercomputer.. - Steenkiste (1996)   (4 citations)

No context found.

Shekhar Borkar, Robert Cohn, George Cox, Thomas Gross, H.T. Kung, Monica Lam, Margie Levine, Brian Moore, Wire Moore, Craig Peterson, Jim Susman, Jim Sutton, John Urbanski, and Jon Webb. Supporting Systolic and Memory Communication in iWarp. Proceedings of the 17th Annual International Symposium on Computer Architecture, ACM/IEEE, Seattle, May, 1990, pp. 70-81.


An Efficient and Low-Cost I/O Subsystem for Network.. - Pnevmatikatos, Sourdis.. (2003)

No context found.

S. Borkar et al., "Supporting Systolic and Memory Communication in iWarp," Proc. 17th Ann. Int'l Symp. Computer Architecture (ISCA 90), IEEE CS Press, 1990, pp. 70-81.


An Efficient and Low-Cost Input/Output Subsystem for.. - Sourdis.. (2002)

No context found.

Borkar, S., Cohn, R., Cox, G., Gross, T., Kung, H.T., Lam, M., Levine, M., Moore, B., Moore, W., Peterson, C., Susman, J., Sutton, J., Urbanski, J., Webb, J. Supporting systolic and memory communication in iWarp. In Proceedings, 17th Annual International Symposium on Computer Architecture, 1990.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)

No context found.

S. Borkar, R. Cohn, G. Cox, T. Gross, H. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb, "Supporting systolic and memory communication in iWarp," in Proceedings of the 17th Annual Symposium on Computer Architecture, June 1990.


Supporting the hypercube programming model on mesh architectures .. - Stricker (1992)   (13 citations)

No context found.

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proceedings of 17th Int'l Symposium on Computer Architecture, May 1990.


Hardware Mechanisms for Efficient Interprocessor Communication - Henry (1996)

No context found.

S. Borkar, R. Cohn, G. Cox, T. Gross, H.T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, and J. Webb. Supporting systolic and memory communication in iWarp. In Proc. of the 17th ISCA, June 1990.



CiteSeer.IST - Copyright Penn State and NEC