21 citations found.
B. Falsafi and D.A. Wood, "Scheduling Communication on an SMP Node Parallel Machine," Proc. 3rd Int'l Symp. High-Performance Computer Architecture (HPCA-3), IEEE CS Press, Los Alamitos, Calif. 1997, pp. 288-297.


This paper is cited in the following contexts:
A Reconfigurable Extension to the Network Interface of.. - Underwood, Sass.. (2001)

....a processor dedicated to communications processing. Since then, others have considered using all processors for both communication and computation. In the cluster context, this translates into multiprocessor nodes (SMP workstations). However, results from researchers at the University of Wisconsin [10] suggest that fixing one processor for communication processing benefits light weight protocols and improves performance when communication is a bottleneck. These results when combined with the arguments made in [11] which suggest that I/O system busses will continue to inhibit gigabit ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of Third International Symposium on High Performance Computer Architecture, Feb. 1997.


Acceleration of a 2D-FFT on an Adaptable Computing Cluster - Keith Underwood Ron (2001)

....dedicated to communications processing. Since then, others have considered using multiple processors for both communication and computation. In the cluster context, this translates into multiprocessor nodes (SMP workstations). However, results from researchers at the University of Wisconsin [7] suggest that fixing one processor for communication processing benefits light weight protocols and improves performance when communication is a bottleneck. These results when combined with the arguments made in [8] which suggest that I/O system busses will continue to inhibit gigabit speed ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of Third International Symposium on High Performance Computer Architecture, San Antonio, Texas, USA, Feb. 1997.


Cost Effectiveness of an Adaptable Computing Cluster - Underwood, Sass, Ligon, III (2001)

....networks in their commodity systems. It is reasonable to expect that a volume produced INIC would be of similar cost to Myrinet or Servernet II interfaces. A number of efforts have researched using dedicated computational resources on network interfaces. Research at the University of Wisconsin [8] suggests that fixing one processor of an SMP for communication processing benefits light weight protocols and improves performance when communication is a bottleneck. Indeed, many gigabit networks now include embedded processors on the NIC for various network processing tasks. Research efforts ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of Third International Symposium on High Performance Computer Architecture, San Antonio, Texas, USA, Feb. 1997.


Fine-Grain Distributed Shared Memory on Clusters of Workstations - Schoinas (1997)   (3 citations)

....design effort increases as the processor is more extensively customized. Blizzard can also dedicate a processor for execution of protocol actions. However, studies have shown that for networks of workstations this policy generally does not result in the best utilization of the processor [FW96,FW97,Fal97]. 2.2 Fine Grain Access Control Abstraction. This section examines the integration of the fine grain access control as a memory attribute within other memory related abstractions. When the operating system virtualizes a hardware resource, it exposes an abstraction of the resource that the ....

....of parallel applications. In the results presented in this section, message notification is done with the polling code inserted directly into the executable as discussed in Chapter 3. A dedicated network processor is not used because it does not result in the best utilization for that processor [FW97]. Chapter 5 discusses alternative ways to employ more processors per node in FGDSM systems. The inserted code polls on a cacheable memory location. On message arrivals, this memory location is updated by the network interface using DMA. This is the preferred technique when the polling code is ....

[Article contains additional citation context not shown here]

Babak Falsafi and David A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of the Third IEEE Symposium on High-Performance Computer Architecture, pages 128-138, February 1997.


Accelerating Shared Virtual Memory via General-purpose.. - Angelos Bila Bilaw (2001)   (1 citation)

....at the other end of the spectrum. The network interface can be used not only to avoid interrupting the compute processor but also to perform full blown protocol processing, including diff creation and application and the management of timestamps and write notices. This approach was taken in [53]. [18] reserves a compute processor in an SMP node for protocol processing. The amount of protocol processing involved in SVM systems with SMP nodes was examined also in an earlier simulation study [30] and other ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In The 3rd IEEE Symposium on High-Performance Computer Architecture, pages 128--138, 1997.


Responsiveness without Interrupts - Perkovic, Keleher (1999)   (2 citations)

....this approach is that it could only be widely implemented if memory bus interfaces are standardized. Nonetheless, this approach should perform better than the fast interrupts that we simulate, and far better than basic polling mechanism used today, i.e. polling after message sends. Falsafi et al. [10] studied the scheduling of protocol operations on SMP nodes. They show that using a dedicated processor within an SMP is cost effective when the protocol is light weight, there are more than two processors per SMP node, or applications are communication intensive. Dedicated processors poll ....

B. Falsafi and D. A. Wood, "Scheduling Communication on an SMP Node Parallel Machine," in IEEE International Symposium on High Performance Computer Architecture (HPCA), February 1997.


Multigrain Shared Memory - Yeung, Kubiatowicz, Agarwal (2000)

....on a commercial operating system would achieve noticeably lower performance due to the higher cost of interrupts. We note, however, that the impact of costly interrupts on a commercial operating system can be minimized by modifying our MGS design to reduce the frequency of interrupts. Prior work [20, 8] has investigated polling techniques to eliminate interrupts for message invocation and TLB invalidation events. We believe that results similar to those reported in Section 5 can be achieved on a commercial operating system if existing techniques for reducing the frequency of interrupts are ....

Babak Falsafi and David A. Wood. Scheduling Communication on an SMP Node Parallel Machine. In Proceedings of the International Symposium on High Performance Computer Architecture. IEEE, February 1997.


Accelerating Shared Virtual Memory via General-purpose.. - Bilas, Jiang, al. (2001)   (1 citation)

....at the other end of the spectrum. The network interface can be used not only to avoid interrupting the compute processor but also to perform full blown protocol processing, including diff creation and application and the management of timestamps and write notices. This approach was taken in [53]. [18] reserves a compute processor in an SMP node for protocol processing. The amount of protocol processing involved in SVM systems with SMP nodes was examined also in an earlier simulation study [30] and other research, ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In The 3rd IEEE Symposium on High-Performance Computer Architecture, pages 128--138, 1997.


NODE-LEVEL REPLICATED OBJECTS in SMP CLUSTERS - Ilker Cengiz Attila

....performance of our design and implementation of SMP support and NLBOC pattern. 4.1 Ring of Nodes. Scheduling communication on an SMP node is a point worth taking into account when designing applications for SMP Clusters. Several research activities are going on in this field. Work proposed in [10] addresses two policies for scheduling communication in an SMP node: fixed, where one processor is dedicated to communication in each node, and floating, where all processors alternately act as communication processor. The decision for choosing a policy is closely related to the application to ....

Babak Falsafi and David A. Wood, "Scheduling Communication on an SMP Node Parallel Machine", Proceedings of the Third International Symposium on High Performance Computer Architecture, Feb 1-5 1997, San Antonio, Texas, USA.


Improving the Performance of Shared Virtual Memory on System Area.. - Bilas (1998)   (5 citations)

....interface; we use a programmable network interface only to prototype and evaluate the mechanisms on a real platform. Having shown in Chapter 4 [12] that interrupts and scheduling can be a major problem for SVM, and many efforts have been made to deal with protocol handler scheduling on SMP nodes [31], in this chapter we propose a protocol that eliminates the need for asynchronous protocol processing. Each of our mechanisms reduces the need for interrupts and asynchronous protocol handling; the final protocol (SVM NI) does not use interrupts for protocol processing at all, and in fact ....

....(by polling) or to handle the requests by interrupting one of the computation processors. A dedicated processor implementation helps to avoid interrupts. However, this choice wastes a valuable resource, by reducing the number of compute processors. Our experiments as well as other work [49, 31] show that this dedicated processor is mostly idle, since actual protocol processing overhead is not very high. If we do not devote a processor to protocol handling and we use the compute processors to handle requests, then we could either statically assign one compute processor for this purpose ....

[Article contains additional citation context not shown here]

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In The 3rd IEEE Symposium on High-Performance Computer Architecture, pages 128--138, 1997.


Using Network Interface Support to Avoid Asynchronous.. - Bilas, Liao, al. (1999)   (11 citations)

....much slower compared to the compute processor than in the Paragon, and it does not have good enough access to main memory to perform protocol processing efficiently. In these cases, a compute processor in an SMP node can be reserved for protocol processing alone. This approach was examined in [19], where it was found that the benefit of this approach is small due to poor utilization of that processor, especially compared to a system that uses all processors both for computing and protocol processing. The amount of protocol processing involved in SVM systems with SMP nodes was examined ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In The 3rd IEEE Symposium on High-Performance Computer Architecture, pages 128--138, 1997.


Toward A Cost-Effective DSM Organization That Exploits.. - Torrellas, Yang, Nguyen (2000)   (6 citations)

....P and D nodes before the application starts executing. In a dynamic approach, we let the total number and type of nodes change as the application executes. A third approach would be to dynamically multiplex a P and a D thread on each node. Such an environment has been studied by Falsafi and Wood [3]. As it is, our machine does not support multiplexing because a P and a D thread use the local memory and memory controller differently. Indeed, P threads use the local memory as a fast hardware managed cache to extract high performance and to exploit physical locality (Section 2.1.1). D threads, ....

B. Falsafi and D. Wood. Scheduling Communication on an SMP Node Parallel Machine. In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture, pages 128-138, February 1997.


Network Interface Support for Shared Virtual Memory on Clusters - Bilas, Liao, Singh (1998)   (1 citation)

....we use a programmable network interface to only prototype and evaluate the mechanisms on a real platform. Since previous work [6] has shown that interrupts and scheduling can be a major problem for SVM, and many efforts have been made to deal with protocol handler scheduling on SMP nodes [13], we propose a protocol that eliminates the need for asynchronous protocol processing. Each of our mechanisms reduces the need for interrupts and asynchronous protocol handling; the final protocol (SVMNI) does not use interrupts for protocol processing at all, and in fact eliminates the need for ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In The 3rd IEEE Symposium on High-Performance Computer Architecture, pages 128--138, 1997.


Push-Pull Messaging: A High-Performance Communication Mechanism .. - Wong, Wang (1999)

....polling is a light weight approach to handle incoming packets. Polling routine watches the change of state variables and starts the handling routine if necessary. The frequency of polling determines the reliability of the channel. In COMP nodes, efficient polling mechanisms have been discussed [10][13]. Stage 4: Reception Processing. After invoking the reception handler, the handler processes packets immediately. Reception processing involves re-assembly of packets, copying between buffers, de-queuing buffers and pending requests, and synchronization between user and kernel threads. In a ....

B. Falsafi and D. A. Wood. "Scheduling Communication on an SMP Node Parallel Machine", Proc. of the 3rd International Symposium on High-Performance Computer Architecture (HPCA-3), 1997.


A Programming Model for Block-Structured Scientific Calculations.. - Fink (1998)   (4 citations)

....protocol processor, compared to an implementation that scavenged cycles from compute processors to handle coherence protocol processing. Their work assumes that compute processors suffer much idle time due to stalls for remote read requests over the high latency ATM switch. Falsafi and Wood [61] also compared implementation tradeoffs when using a communication coprocessor on an SMP cluster with distributed virtual shared memory. They report that dedicating a single SMP processor to handle protocol processing improves performance for high overhead protocols and communication-intensive ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of the Third International Symposium on High-Performance Computer Architecture, pages 128--38, San Antonio, TX, February 1997.


Sirocco: Cost-Effective Fine-Grain Distributed . . . - Schoinas, al. (1998)   (6 citations)

....lower performance. Commodity network interface cards are typically placed far from processors on a slow peripheral bus and do not provide support for multiple message queues [6,8]. As such, frequent network communication using a single pair of message queues on an SMP may result in a bottleneck [12]. Multiplexing computation and protocol execution on processors may also lead to cache interference, lower cache performance, and result in higher memory bus contention. Besides performance, clustering processors into SMP nodes also impacts the cost trade-off. SMPs typically charge higher price ....

....request messages from multiple processors for a single memory block and allows a processor to use memory blocks fetched by others. Sharing protocol resources (e.g. the directory, message queues) allows idle processors to execute protocol handlers while other processors are busy computing [12,7]. Sharing resources, however, may violate the shared-memory access semantics. Shared memory dictates that coherence operations on data in the remote cache and home pages must appear to execute atomically [22,25]. Figure 2 illustrates examples of atomic sequences required in FGDSM coherence ....

[Article contains additional citation context not shown here]

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of the Third IEEE Symposium on High-Performance Computer Architecture, pages 128--138, Feb. 1997.


Fine-Grain Protocol Execution Mechanisms & Scheduling Policies on .. - Falsafi (1998)   Self-citation (Falsafi)

....processors being idle (e.g. waiting for synchronization) enabling them to contribute to software protocol execution. Such a design both improves performance by increasing the parallelism in protocol execution and reduces cost by obviating the need for extra dedicated protocol processors [FW97c,FW96]. In this chapter, I present a taxonomy of software protocol execution semantics: Single-threaded protocol execution allows for only a single protocol thread to run on an SMP node at any given time. Multi-threaded protocol execution allows multiple protocol threads to simultaneously ....

....implementations of user level interrupts and instrumented polling may incur high overheads (see Section 3.2) and result in poor performance under a multiplexed policy. Large scale SMPs increase the likelihood of one or more processors being idle, significantly reducing the frequency of interrupts [FW97c,KS96]. Instrumented polling, however, always incurs a minimum overhead of checking for protocol events. Executing protocol threads on a compute processor may pollute the processor's instruction [MPO95] and data [PC94] cache hierarchy. A dedicated policy has the advantage of providing a separate ....

[Article contains additional citation context not shown here]

Babak Falsafi and David A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of the Third IEEE Symposium on High-Performance Computer Architecture, pages 128--138, February 1997.


Parallel Dispatch Queue: A Queue-Based Programming Abstraction .. - Falsafi, Wood (1999)   (2 citations)  Self-citation (Falsafi Wood)

....uniprocessor node parallel computers. As such, the protocol handlers either executed on the node's commodity processor along with computation or an embedded processor on the network interface card. Multiple SMP processors, however, increase the demand on fine grain protocol execution on a node [23,16,14,6]. To maintain the balance between computation and communication, protocol execution performance must increase commensurate to the number of SMP processors. One approach to increase software protocol performance is to execute protocol handlers in parallel. Legacy stack protocols (e.g. TCP/IP) have ....

....invalidates the cached copy of the PDR forcing a protocol processor to read the new PDR contents. A protocol processor indicates completion of a handler by performing an uncached write into its PDR. A Hurricane 1 device allows both dedicated and multiplexed protocol scheduling on SMP processors [6]. Multiple SMP processors can be dedicated to only execute protocol handlers for the duration of an application's execution. Dedicated protocol processors save overhead by not interfering with the computation, result in a lower protocol occupancy [9], i.e. the time to execute a protocol handler, ....

B. Falsafi and D. A. Wood. Scheduling communication on an SMP node parallel machine. In Proceedings of the Third IEEE Symposium on High-Performance Computer Architecture, pages 128--138, Feb. 1997.


Hardware Support for Flexible Distributed Shared Memory - Reinhardt, al. (1998)   (1 citation)  Self-citation (Wood)

....of a slower integrated processor. Second, a general purpose protocol processor may be used for the application's primary computation when there are no protocol events to handle. Similarly, any processor on a symmetric multiprocessor node may serve as the protocol processor. Falsafi and Wood [17] show that dynamically scheduling protocol processing tasks across all available processors is often more efficient than dedicating a protocol processor. Both Typhoon 1 and Typhoon 0 are capable of supporting this dynamic model. However, we assume a dedicated protocol processor in this study to ....

B. Falsafi and D. A. Wood. "Scheduling communication on an SMP node parallel machine." In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture (HPCA), pages 128--138, Feb. 1997.


Architectural Support For User-Level Input/Output - Schaelicke (2001)

No context found.

B. Falsafi and D.A. Wood, "Scheduling Communication on an SMP Node Parallel Machine," Proc. 3rd Int'l Symp. High-Performance Computer Architecture (HPCA-3), IEEE CS Press, Los Alamitos, Calif. 1997, pp. 288-297.


Design and Performance of the Software-controlled COMA - Moga (1998)

No context found.

B. Falsafi and D. A. Wood. Scheduling Communication on an SMP Node Parallel Machine. In Proc. of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA-3), February 1997.


CiteSeer.IST - Copyright Penn State and NEC