29 citations found.
M. Bjorkman and P. Gunningberg. Locking effects in multiprocessor implementations of protocols. In Proceedings of the ACM SIGCOMM '93 Conference, pages 74--83. ACM Press, 1993.


This paper is cited in the following contexts:


Exploiting Task-level Concurrency in a Programmable Network.. - Kim, Pai, Rixner (2003)

....in this paper is independent of the particular user level protocols because the basic role of network interfaces is still sending and receiving packets. Others have studied increasing networking performance by parallelizing network protocols on general purpose multiprocessor operating systems [2, 6, 10]. Such parallelization schemes exploit concurrency at various levels, such as across packets, protocol layers, and connections. Parallel network protocol processing deals with the layers above the network interface, and can thus improve performance in conjunction with the network interface layer ....

M. Bjorkman and P. Gunningberg. Locking effects in multiprocessor implementations of protocols. In Proceedings of the ACM SIGCOMM '93 Conference, pages 74--83. ACM Press, 1993.


High Performance Implementation of Communication Subsystems - Dabbous

....diversity. Furthermore, software protocol implementations on modern workstations have shown increased performance, thereby reducing the interest of a hardware solution. • Parallel implementations of transport protocols were proposed to increase the protocol processing speed [18], [19], [20], [21]. • Integrated layer processing (ILP) [22] was proposed by Clark and Tennenhouse in order to enhance the performance of protocol implementation on RISC workstations. The main concept behind ILP is to minimize the memory reads loads by combining byte manipulation oriented operations High ....

M. Bjorkman and P. Gunningberg. Locking Effects in Multiprocessor Implementation of Protocols. Submitted to ACM SIGCOMM'93, March 1993.


High Performance Presentation and Transport Mechanisms for.. - Dabbous (1995)   (2 citations)

.... the protocol implementation performance in a given software or a hardware environment: outboard protocol processors (e.g. [Kan88], [Coo90]), hardware protocol implementations (early work on XTP Protocol Engine [Che89]), or parallel implementations of transport protocols [Bra92], [Rut92], [Lap92], [Bjo93]. A detailed survey of protocol implementation optimization techniques can be found in [Dab91] and [Fel93b]. The performance of workstations has increased with the advent of modern RISC architectures but not at the same pace as the network bandwidth during past years. Furthermore, access to ....

M. Bjorkman and P. Gunningberg. "Locking Effects in Multiprocessor Implementation of Protocols", In Proceedings ACM SIGCOMM'93.


Flexible Control of Parallelism in a Multiprocessor PC Router - Chen, Morris (2001)   (8 citations)

....that software in a way that takes advantage of the multiple CPUs. However, current generation network processors have a limited program memory in their processing elements, which limit their use to small pieces of tight code [25]. Previous work in the area of parallelizing host network protocols [20, 19, 3, 24] has compared layer, packet, and connection parallelism. One of their conclusions is that performance is best if the packets of each connection are processed on only one CPU, to avoid contention over per connection data. To a first approximation this is a claim that a host's protocol processing ....

M. Bjorkman and P. Gunningberg. Locking effects in multiprocessor implementation of protocols. In Proc. ACM SIGCOMM '93 Conference, pages 74--83, October 1993.


END: A Network Adapter Design Tool - Indiresan, Mehra, Shin (1997)

....events at this low granularity in simulation models would require highly accurate resource models and render the simulation extremely slow. Network adapter as source/sink of data: Network adapters have been modeled as simple data sources and sinks for parallel protocol implementations [24]. Interestingly, these models are executed on a separate processor within the host. Besides modeling the source/sink behavior of the network, our approach captures significantly more details of adapter design and interaction with the protocol stack. I/O device modeling: Other researchers have ....

M. Bjorkman and P. Gunningberg, "Locking effects in multiprocessor implementations of protocols," in Proc. of ACM SIGCOMM, pp. 74--83, September 1993.


Structuring Communication Software for Quality-of-Service.. - Mehra, Indiresan, Shin (1996)   (29 citations)

.... possible to extend the message service time computation to accurately account for the potential perturbation caused by the DMA transfers via careful analysis [27]. We note that at least two other efforts have employed such artificial sources and sinks of data, namely, the virtual network device in [28] that resides on a separate processor, and the in-memory network device used in [29]. 5 Experimental Evaluation We evaluate the efficacy of the proposed architecture in isolating real-time channels from each other and from best-effort traffic. The evaluation is conducted for a subset of the ....

M. Bjorkman and P. Gunningberg, "Locking effects in multiprocessor implementations of protocols," in Proc. of ACM SIGCOMM, pp. 74--83, September 1993.


Connection-Level Parallelism For Network Protocols On.. - Yates (1997)   (1 citation)

....[Figure panels: Functional Parallelism, Layer Parallelism, Packet-Level Parallelism, Connection-Level Parallelism] Figure 1. 3 Approaches to Parallelism Many approaches to parallelizing network protocols have been proposed and are briefly described here; additional surveys can be found in [5, 24, 67]. In general, we attempt to classify approaches by the unit of concurrency, or what it is that processing elements do in parallel. Here a processing element is a locus of execution for protocol processing, and can be a dedicated processor, a heavyweight process, or lightweight thread. Figure ....

....regardless of their connection or where they are in the protocol stack, achieving speedup both with multiple connections and within a single connection. The disadvantage is that it requires locking shared state, most notably the protocol state at each layer. Systems using this approach include [5, 24, 29, 31, 50]. A set of connections forms the unit of concurrency in connection level parallelism [21, 60, 65, 67]. Speedup is achieved using multiple connections, which can potentially be processed in parallel. The advantage of this approach is that it exploits the natural concurrency between connections. ....

[Article contains additional citation context not shown here]

Bjorkman, M. and Gunningberg, P. Locking effects in multiprocessor implementations of protocols. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 74--83, San Francisco, CA, Sept. 1993. ACM.


Performance Issues in Parallelized Network Protocols - Nahum, Yates, Kurose, Towsley (1994)   (22 citations)

....and Peterson in the x-kernel [14], this approach distributes packets across processors, achieving speedup both with multiple connections and within a single connection. Packets can be processed on any processor, maximizing flexibility and utilization. Other systems using this approach include [5, 11]. Several other approaches to parallelism have also been proposed and are briefly described here; more detailed surveys can be found in [5, 11]. In layered parallelism, protocols are assigned to specific processors, and messages passed between layers through interprocess communication. ....

....within a single connection. Packets can be processed on any processor, maximizing flexibility and utilization. Other systems using this approach include [5, 11]. Several other approaches to parallelism have also been proposed and are briefly described here; more detailed surveys can be found in [5, 11]. In layered parallelism, protocols are assigned to specific processors, and messages passed between layers through interprocess communication. Parallelism gains can be achieved mainly through pipelining effects. An example is found in [10]. Connection level parallelism associates connections with ....

[Article contains additional citation context not shown here]

Mats Bjorkman and Per Gunningberg. Locking effects in multiprocessor implementations of protocols. In ACM SIGCOMM Symposium on Communications Architectures and Protocols, pages 74--83, San Francisco, CA, September 1993.


PATROCLOS: A Flexible and High-Performance Transport Subsystem - Braun (1994)   (2 citations)

....by concurrent processing of the different FSMs. Parallel implementations of standard protocols like OSI TP4/CLNP or TCP/IP suffers from their inherent sequential structure. Therefore, parallel implementations of standard protocols use pipelining [5] or parallelism on a per packet basis [6]. The transport subsystem PATROCLOS realizes a parallel protocol architecture with fine granularity adequate for parallel implementations and overcomes performance limitations by the integration of protocol and implementation issues. In contrast to other approaches of parallel protocols [7, 8, 9] ....

Björkman, M.; Gunningberg, P.: Locking Effects in Multiprocessor Implementations of Protocols, ACM Sigcomm'93 Conference Proceedings, San Francisco, CA, September 13-17, 1993, pp. 74-83


QoS-Sensitive Protocol Processing in Shared-Memory - Multiprocessor Multimedia..

....of processing resources, and complicate global coordination for network access. Our proposal for static partitioning of processing resources is similar to those for multiprocessor front ends [12] except that a set of processors within the host are dedicated for protocol processing, as in [13]. Vertical process architectures employing process per connection and process per message models have been proposed to exploit parallelism in protocol implementations [3]. A process per message model seems unsuitable for QoS sensitive protocol processing. Assuming that each message's shepherd ....

M. Bjorkman and P. Gunningberg, "Locking effects in multiprocessor implementations of protocols," in Proc. of ACM SIGCOMM, pp. 74--83, September 1993.


Structuring Host Communication Software For Quality Of Service.. - Mehra (1997)

....Multiprocessor front ends may be designed using special-purpose or general purpose hardware. Special purpose designs facilitate efficient interaction with the host and the network interface unit [82, 83] while general purpose designs must explicitly coordinate accesses to these interfaces [14, 138, 182]. More processing power in the front end can improve the quality of service provided to the applications, by reducing queuing delays within the communication subsystem. Network interfaces for multicomputers: Network interface design for multicomputer environments presents unique opportunities and ....

....be classified into horizontal or vertical process architectures [152]. In horizontal architectures [182], each process implements a specific layer of a protocol graph; at most two processes can be assigned to each layer, one each for transmission and reception. In vertical process architectures [14, 76, 83], on the other hand, processes are assigned to active entities such as connections or messages and each process implements one path through the protocol graph. This approach significantly reduces context switches and message buffering that are unavoidable in horizontal process architectures. ....

[Article contains additional citation context not shown here]

M. Bjorkman and P. Gunningberg, "Locking effects in multiprocessor implementations of protocols," in Proc. of ACM SIGCOMM, pp. 74--83, September 1993.


Scheduling for Cache Affinity in Parallelized.. - Salehi, Kurose, Towsley (1994)   (2 citations)

....paradigms in which each message, during the course of its processing, visits a single processor and executes within the context of a single thread 2 . This captures the parallelization found in several multiprocessor protocol implementations, including parallelizations of the x-kernel [2, 13], the STREAMS implementation in Plan 9 [17] and the ASX framework [19]. A related form of parallelism is found in the STREAMS implementations in many commercial operating systems [18, 4, 6]. We present two sets of results. First, we show that affinity scheduling can significantly reduce message ....

....techniques. Third, the model does not reflect the impact of contention for software locks, which would inflate the protocol execution time (e.g. [18]). However, UDP/IP has been shown to scale up nearly linearly (to about 20 processors) in a multiprocessor parallelization of the x-kernel [2]. Moreover, this behavior was observed within a single UDP/IP stream, whereas inter stream scalability would be higher. Similar results are reported in [13]. These facts support our decision to neglect lock contention in the range of processor parallelization that we consider, which generally does ....

Mats Bjorkman and Per Gunningberg. Locking effects in multiprocessor implementations of protocols. In Proceedings of the ACM SIGCOMM Conference on Communications, Architectures, Protocols and Applications, pages 74--83, San Francisco, CA, September 1993.


A Scheduling Scheme for Network Saturated NT Multiprocessors - Hansen, Jul (1997)   (1 citation)

....limits on how fast a processor that it is possible to use. As an alternative we propose using multiprocessors to provide sufficient processing power. Previous work in the area of multiprocessor network performance has concentrated mainly on improving the performance of higher level protocols ([1], [4] and [5]) and furthermore, these approaches use a single network interface. We consider the scheduling issues related to handling one or multiple network interfaces on multiprocessors. In the following, we first describe the problems of thread starvation, then we present a two level network ....

Mats Bjorkman and Per Gunningberg. Locking Effects in Multiprocessor Implementations of Protocols. In Proceedings of SIGCOMM '93, pages 74--83, September 1993.


The END: Exploring QoS Issues in Adapter Design via an.. - Indiresan, Mehra, Shin (1996)

....capturing events at this low granularity in simulation models would require highly accurate resource models and render the simulation extremely slow. Network adapter as source/sink of data: Network adapters have been modeled as simple data sources and sinks for parallel protocol implementations [15]. Interestingly, the adapter models are executed on a separate processor within the host. In addition to modeling the source/sink behavior of the network, our approach captures significantly more details of adapter design and interaction with the protocol stack. Communication subsystem design and ....

M. Bjorkman and P. Gunningberg, "Locking effects in multiprocessor implementations of protocols," in Proc. of ACM SIGCOMM, pp. 74--83, September 1993.


Architectural Concepts in Implementation of End-system.. - Ravindran, Singh

....functions during execution. The relationship among functions may determine the extent of parallelism by forcing their processors to synchronize their access to the state. Typically, spin locks provide an efficient way of synchronizing the executions on various processors through shared memory [13], whereby a processor waits by spinning on a lock, i.e. by busy looping until the lock variable in shared memory is set by another processor. And the lock variable is reset once the spinning processor unblocks from the wait. Typical overhead incurred for spin locks (excluding the spinning ....

M. Bjorkman and P. Gunningberg. Locking Effects in Multiprocessor Implementations of Protocols. In proc. ACM SIGCOMM'93, pp.74-83, Sept. 1993.


Towards High Performance Cryptographic Software - Nahum, O'Malley, Orman.. (1995)   (3 citations)

....of processors tested. This is because the encryption protocols are compute bound, overshadowing any locking cost, and the encryption is done outside the scope of any locks. Similar linear speedups were observed for cryptographic UDP based stacks, not shown due to space limitations. Previous work [2, 11] has shown limited packet level parallelism using a single TCP connection, barring any other protocol processing. Given that the locked component of manipulating the TCP connection state limits the throughput to about 200 Mbits on this platform, we estimate that the TCP MD5 IP stack would ....

M. Bjorkman and P. Gunningberg. Locking effects in multiprocessor implementations of protocols. In ACM SIGCOMM Symposium on Communications Architectures and Protocols.


The Performance Impact of Scheduling for Cache Affinity.. - Salehi, Kurose, Towsley (1995)   (2 citations)

.... of affinity-based scheduling in the context of general parallel programs (i.e. non network related application processing) [4, 6, 12, 24, 27]. In this paper, we explore affinity-based scheduling of parallel networking, an area of research which has recently generated considerable interest (e.g. [3, 11, 13, 19, 21]). In general, for affinity based scheduling to be effective the time between rescheduling of the affinity managed resource must be small in comparison to the overhead of reloading the displaced cache lines. This work was supported by NSF under grant NCR 9206908 and by ARPA under ESD AVS ....

....protocol parallelization paradigms in which each message, during the course of its processing, visits a single processor and executes within the context of a single thread. This captures the packet level and connection level parallelism found in several multiprocessor protocol implementations [3, 13, 16, 21]; a related form of parallelism is found in the STREAMS implementations in several commercial operating systems (e.g. [19]). We do not consider functional or layer parallelisms, since they incur high synchronization overheads on RISC based shared memory machines [20]. We present two sets of ....

[Article contains additional citation context not shown here]

M. Bjorkman and P. Gunningberg. "Locking Effects in Multiprocessor Implementations of Protocols". Proceedings of ACM SIGCOMM Conference, p. 74-83, Sep. 1993.


Protocol Parallelization - Touch (1995)   (1 citation)

..... sourcing limits (ability to have enough questions to think about) Thinking is a processing bottleneck, exhibited in the processing speed of TCP/IP and the operating system (OS) interface involved in the transaction. The processing bottleneck for TCP has been addressed by parallelism [16], [1] (both discussed later) even though its performance has been shown not to be the predominant limitation to communication [17], [10]. Other components of the protocol stack have also been parallelized, e.g. via pipelining of the IP checksum with the data transfer [5]. Processing in the OS has ....

.... on imprecision) [Table, columns: BW rule; hdr time; data time; hdr data. Rows: low bw: low, low, avg; high bw periodic: low, low, avg; high bw ASAP: high, high, avg; high bw huge packets: low, high, low] [Figure labels: Low BW; High BW Periodic; High BW ASAP; High BW Big Packets] Others considered per packet parallelization [1]. They measured parallelism using simulations, and implementations of a multiprocessor x-Kernel implementation with spin locks. They observed that the parallel processing contends for the shared protocol state (the Connection Control Block). They claim TCP saturates at a parallelism of 7 (measured ....

Bjorkman, M., and Gunningberg, P., "Locking Effects in Multiprocessor Implementation of Protocols," Proc. ACM Sigcomm, Oct. 1993, pp. 74-83.


The Effectiveness of Affinity-Based Scheduling in Multiprocessor.. - Salehi (1996)   (14 citations)

....which attempts to manage processors and threads in a manner that reduces cache misses and decreases execution times. In this paper, we explore affinity based scheduling of parallel network protocol processing, an area of research which has recently generated considerable interest (e.g. [2, 5, 11, 15, 18, 23, 24]). The use of parallelism in protocol processing is motivated by the development of high speed networks, such as ATM, capable of delivering gigabit range bandwidth to individual machines. Emerging large scale server applications, such as digital multimedia information repositories, require ....

M. Bjorkman and P. Gunningberg. "Locking Effects in Multiprocessor Implementations of Protocols". Proc. ACM SIGCOMM, p. 74-83, Sep. 1993.


Demultiplexed Architectures: A Solution for Efficient.. - Roca, Braun, Diot (1997)   (11 citations)

....most one thread per context). On the contrary, with the message parallelism, each connection can benefit from all the processors, at the expense of increased synchronization needs. Anyway, experiments show that a TCP/IP stack has a limited scalability, even with a message based type of parallelism [Bjorkman93]. The TCP scalability (3.4 ratio on an octoprocessor) is well behind the optimal speedup ratio, i.e. the number of processors. The situation is different with UDP which requires less synchronization due to its connectionless nature. 3 IMPROVING STREAMS BASED COMMUNICATION SYSTEM In this section, ....

M. Bjorkman, P. Gunningberg, "Locking effects in multiprocessor implementations of protocols", ACM SIGCOMM'93, September 1993.


Networking Support For High-Performance Servers - Nahum (1997)

....is required in the network protocol stack; otherwise, a server's network bandwidth will be limited by the performance of a single processor, which may become a bottleneck. Many approaches to concurrency in protocols have been proposed. One that has gained favor is packet level parallelism [12, 48], where packets or messages are the unit of concurrency. Packets are processed in parallel, regardless of their connection or where they are in the protocol stack. This appears able to react to the workload more closely than other approaches to parallelism in protocols. We present a taxonomy of ....

....take advantage of the machines full capabilities. One way to improve performance in the network protocol subsystem is to exploit the availability of multiple processors in the host. The use of parallelism in network protocol processing has recently become an active area of research in both academia [5, 12, 20, 21, 22, 35, 46, 47, 48, 59, 62, 63, 64, 69, 70, 71, 72, 74, 75, 76, 77, 87, 88, 89, 95, 100, 106, 107, 108, 109, 111, 112, 116, 117, 124, 125] and industry [18, 37, 42, 45, 49, 68, 90, 94, 110, 120]. Many approaches to parallelism in network protocols have been proposed. We provide a brief taxonomy of parallelism in protocols here; more detailed surveys can be found in [12, 48]. In general, we attempt to classify approaches by the unit ....

[Article contains additional citation context not shown here]

Bjorkman, M. and Gunningberg, P. Locking effects in multiprocessor implementations of protocols. In ACM SIGCOMM Symposium on Communications Architectures and Protocols, pages 74--83, San Francisco, CA, Sept. 1993.


The Effectiveness of Affinity-Based Scheduling in.. - Salehi, Kurose, Towsley (1996)   (14 citations)

....which attempts to manage processors and threads in a manner that reduces cache misses and decreases execution times. In this paper, we explore affinity based scheduling of parallel network protocol processing, an area of research which has recently generated considerable interest (e.g. [3, 8, 19, 23, 28, 32, 33, 34, 40]). The use of parallelism An earlier version of this paper was presented at the IEEE Infocom 96 Conference. The paper was selected by the conference as one of its top papers and referred to the Transactions for possible publication after the Transactions own independent review. This work was ....

....both increase the bandwidth and decrease the latency of multiprocessor communication. In functional parallelism, an individual packet concurrently visits multiple processors (e.g. [16, 19]). In layer parallelism, packets visit multiple processors in a pipelined fashion (e.g. [33]). Packet-level [3, 8, 11, 13, 14, 23, 28, 32, 33, 34] and connection-level [8, 11, 28, 32, 34, 40] parallelisms enable concurrency at higher levels of granularity. In general, some form of network parallelism is generally necessary on multiprocessor machines, since the alternative would restrict aggregate network access to the bandwidth capacity of ....

Mats Bjorkman and Per Gunningberg. Locking effects in multiprocessor implementations of protocols. In Proc. ACM SIGCOMM, pages 74--83, San Francisco, CA, September 1993.


Scheduling for Cache Affinity in Parallelized.. - Salehi, Kurose, Towsley (1994)   (2 citations)

....entity (in these cases, the process) is much larger than the time required to entirely reload the referenced memory locations into the cache. In this paper, we explore the benefits of affinity scheduling of parallel networking, an area of research which has recently generated considerable interest [3, 4, 6, 12, 13, 15, 17, 18, 20, 23, 25, 26, 27, 35] and one for which affinity scheduling has not yet been examined. Intuitively, parallel networking is a potential candidate for the technique since packets can be individually scheduled and the time to process a packet is relatively short. However, several aspects of the application domain ....

....of the x-kernel 1 Depending on the coherence state of the referenced cache line. 2 In functional parallelism, an individual packet concurrently visits multiple processors [4, 15, 17]. In layer parallelism, packets visit multiple processors in a pipelined fashion [7, 24, 26]. Packet level [3, 6, 9, 12, 13, 20, 23, 25, 26, 27] and connection level [6, 9, 23, 25, 27] parallelisms enable concurrency at higher levels of granularity. 3 We use the terms thread and process interchangeably. [3, 20], the STREAMS implementation in Plan 9 [23] and the ASX framework [26, 27]. A related form of parallelism is found in the ....

[Article contains additional citation context not shown here]

Mats Bjorkman and Per Gunningberg. Locking effects in multiprocessor implementations of protocols. In Proceedings of the ACM SIGCOMM Conference on Communications, Architectures, Protocols and Applications, pages 74--83, San Francisco, CA, September 1993.


Parallelized Network Security Protocols - Nahum, Yates, O'Malley, Orman.. (1996)   (7 citations)

.... proposed and are briefly described here; more detailed surveys can be found in [3, 13]. [Figure 1: Approaches to Concurrency. Panels: Connection Level Parallelism, Packet Level Parallelism, Layer Parallelism] In general, we attempt to classify approaches by the unit of concurrency, or what it is that processing elements do in parallel. Here a processing element is a locus of execution for protocol processing, and can be a dedicated processor, a heavyweight process, or a lightweight thread. Figure 1 ....

....of their connection or where they are in the protocol stack, achieving speedup both with multiple connections and within a single connection. The disadvantage is that it requires locking shared state, most significantly the protocol state at each layer. Systems using this approach include [3, 13]. In functional parallelism, a protocol layer's functions are the unit of concurrency. Functions within a single protocol layer (e.g. checksum, ACK generation) are decomposed, and each assigned to a processing element. The advantage to this approach is that it is relatively fine grained, and thus ....

[Article contains additional citation context not shown here]

M. Bjorkman and P. Gunningberg. Locking effects in multiprocessor implementations of protocols. In ACM SIGCOMM Symposium on Communications Architectures and Protocols, pages 74--83, San Francisco, CA, Sept. 1993.


Appendix D - Research Plan For

No context found.

Björkman, M. & Gunningberg, P., "Locking Effects in Multiprocessor Implementation of Protocols". Journal of High Speed Networks, 1994, No 2, Vol 3.

CiteSeer.IST - Copyright Penn State and NEC