44 citations found.
D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In SIGCOMM, 1996.

This paper is cited in the following contexts:

Determining The User-Level Transmission Delay In Networks Of.. - Suciu, Fetzer (1996)   (Correct)

....is the background processor load, not the background network load. This is due to the fact that, during a message transmission, most of the time is spent in the operating system kernel, not on the network wire [5] All previous studies we are aware of confirm this result. In [9] and in [10], the operating system and the memory performance are found to be the main bottlenecks of the message transmission. In [2] and in [3] varying background processor loads are placed on the respective systems, and severe network performance degradations are observed. Scheduling Priority Since the ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. Computer Communication Review, 26(4), 73--84, 1996. 9


System Support for Online Reconfiguration - Soules, Appavoo, Hui.. (2003)   (7 citations)  (Correct)

....of getting this behavior is with an IF statement at the top of a component with both implementations. A hot swapping approach separates the two implementations, simplifying testing by reducing internal states and increasing performance by reducing negative cache effects of the uncommon case code [40]. Section 6.3 evaluates the use of online reconfiguration to specialize exclusive access to a file, while still supporting full sharing semantics when necessary. Dynamic monitoring: Instrumentation gives developers and administrators useful information in the face of system anomalies, but ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. ACM SIGCOMM Conference. Published as Computer Communication Review, 26(4):73--84. ACM, 1996.


Resource Control of Untrusted Code in an Open Network Environment - Menage (2003)   (2 citations)  (Correct)

....or the video display subsystem Scout attempts to offer guarantees to particular activities. A further advantage is that the path abstraction permits further optimisations such as partially evaluating the functions along a path to provide versions specialised for a particular data stream [Mosberger96a] The performance improvements achieved by these optimisations must be offset by the loss of spatial locality in the processing code caused by multiple specialised versions of common functions. Nemesis [Roscoe95, Leslie96] is an operating system designed to provide resource guarantees and QoS to ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In SIGCOMM


ESP: A Language for Programmable Devices - Kumar (2002)   (Correct)

....that are available when the file is opened. The Scout operating system [75] makes paths an explicit abstraction mechanism to improve resource allocation and scheduling decisions. It uses compiler optimizations like outlining, cloning, and path inlining improve the performance of the fast paths [76]. However, since the paths are dynamically created, the compiler cannot always optimize these paths. To address this, the compiler generates optimized code for some paths. When a path is created at runtime, if the runtime system can determine that optimized code is available for that path, it uses ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In Proceedings of the SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, Stanford, California, August 1996.


Programming Language Optimizations for Modular Router.. - Kohler, Morris, Chen   (Correct)

....implementing and composing network protocols. Protocol nodes in the x-kernel resemble Click elements. Hand optimization of x-kernel configurations demonstrated that path inlining, which combines the effect of our devirtualizer (Section 6.1) with inlining, can significantly decrease protocol latency [12]. Automatic configuration optimization is not supported. Scout [13], a successor to the x-kernel, was designed for routing and high performance networking, rather than protocol composition. Scout comes with a simple rule-based optimizer similar to our click-xform tool (Section 6.2) ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. ACM SIGCOMM Conference (SIGCOMM '96), pages 73--84, August 1996.


Applications of Randomness in System Performance Measurement - Blackwell (1998)   (2 citations)  (Correct)

....number of cycles lost due to mispredicted branches. However, the size of the frequently executed code increases. Thus the processor wastes fewer cycles due to branches, but may incur more cache misses. Work by others has focussed on reducing the footprint of code to improve performance. Mosberger [37] found that removing rarely executed code from the contiguous block of memory occupied by a function significantly reduced memory system stalls. Studies of these and other optimizations need robust and reproducible measurements of memory system costs. 37 To estimate the magnitude of the potential ....

D. Mosberger, L.L. Peterson, P.G. Bridges, S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In Proceedings of ACM SIGCOMM `96.


Improving Computer Communication Performance by Reducing Memory.. - Ahlgren (1997)   (2 citations)  (Correct)

.... researchers attention, several researchers have investigated the possibilities to control data placement in the cache in order to avoid cache conflicts [19, 50] University of Arizona has given special attention to caching for communication, both on the data side [59] and on the instruction side [57]. For the instruction side, they investigate in how to use compiler techniques to increase cache performance. Salehi, Kurose and Towsley [66] have studied cache behavior in the context of parallelized communication protocols. Braun and Diot [21] report on, and compare cache hit rates of, an ILP ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In SIGCOMM '96 Conference Proceedings, pages 73--84, Palo Alto, CA, USA, August 26--30, 1996. ACM SIGCOMM Computer Communication Review, 26(4).


Dynamic Kernel I-Cache Optimization - Tamches, Miller (2001)   (Correct)

....in each invocation of tcp_rput_data and a 7 reduction in the benchmark's elapsed run time, demonstrating that even I/O benchmarks can incur enough CPU time to benefit from I-cache optimization. Code positioning consists of three optimizations: Procedure splitting. Also called outlining [16], this optimization segregates frequently executed (hot) basic blocks from cold ones, to reduce I-cache pollution. Cold code is prevalent in kernels, due to extensive error checking. Basic block positioning. A function's blocks are reordered to increase straight-lined execution in the common ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. ACM Applications, Technologies, Architectures and Protocols for Computer Communication (SIGCOMM), Stanford, CA, August 1996.


Fine-Grained Dynamic Instrumentation Of Commodity Operating.. - Tamches (2001)   (34 citations)  (Correct)

....the call site is altered to directly call the optimized version of the function. A second example of run time code optimization that can be performed with dynamic instrumentation is code positioning, or moving seldom executed basic blocks out of line to improve instruction cache performance [61, 69]. First, a function can be tested for poor instruction cache performance by inserting instrumentation that measures the number of icache misses incurred. After a time, the instrumentation is removed, and if instruction cache performance is poor, the function's basic blocks are instrumented for ....

....locations where kernel code can be inserted, can be almost any machine code instruction within the kernel. Runs on a commodity kernel. This enables instrumentation under real world workloads. It is worthwhile to note that much recent operating system research has taken place on custom kernels [10, 11, 21, 34, 35, 36, 37, 47, 59, 61, 62, 68, 73, 74, 75, 81, 82, 83, 84]; this dissertation shows that run time instrumentation is feasible on a commodity kernel. Runs on an unmodified kernel. This contribution is important, because requiring a modified or somehow customized kernel, even an otherwise commodity one, would likely preclude an instrumentation tool's ....

[Article contains additional citation context not shown here]

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. ACM Applications, Technologies, Architectures and Protocols for Computer Communication (SIGCOMM), Stanford, CA, August 1996.


Structuring Communication Software for Quality-of-Service.. - Mehra, Indiresan, Shin (1996)   (29 citations)  (Correct)

....and resource management services within the communication subsystem. Communication subsystem design and performance optimization: Several recent efforts have focused on optimizing the performance of the data transfer path in TCP/IP protocol stacks, via improvement of protocol processing latency [57-59], and user level handling of network data [14-16, 60, 61] to increase throughput via data copy minimization. Several researchers have studied the issues affecting the design and performance of network adapters [8, 16, 62] and communication subsystems in general [24, 63] All of these efforts are ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of techniques to improve protocol processing latency," in Proc. of ACM SIGCOMM, pp. 73--84, October 1996.


The Click Modular Router - Kohler (2000)   (64 citations)  (Correct)

....of the graph and to determine whether elements can share code. Both analyses use relatively simple data flow algorithms. It does not require substantial graph manipulation. Devirtualization is a well known technique in object oriented programming languages such as Java. Mosberger et al. [31] demonstrate that path inlining, essentially devirtualization with inlining, is useful for decreasing protocol latency in a modular networking system (the x-kernel [22]) but they implement it by hand. To our knowledge, neither the x-kernel nor Scout [36] can implement devirtualization ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. ACM SIGCOMM '96 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 73--84, August 1996.


Resource-Bounded Partial Evaluation - Debray   (6 citations)  (Correct)

....into a partial evaluator, preliminary experiments appear encouraging. Acknowledgements This paper has benefited greatly from comments by Peter Holst Andersen as well as the anonymous referees. 5 In operating systems parlance, this kind of selective specialization is referred to as outlining [8, 26, 27]. 20 ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency", Proc. SIGCOMM '96, pp. 73--84, Sept. 1996.


Resource-Bounded Partial Evaluation - Debray (1996)   (6 citations)  (Correct)

....generalizations of ideas traditionally used in offline partial evaluation. While our algorithms have not been incorporated into a partial evaluator, preliminary experiments appear encouraging. 2 In operating systems parlance, this kind of selective specialization is referred to as outlining [7, 17, 18]. 14 ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency", Proc. SIGCOMM '96, pp. 73--84, Sept. 1996.


Structuring Host Communication Software For Quality Of Service.. - Mehra (1997)   (Correct)

....and evaluated in [1] reduces the number of accesses to network data by effectively collapsing protocol layers and executing them in an integration fashion for each data word accessed. Several recent efforts have also focused on optimizing the protocol processing latency in TCP/IP protocol stacks [16, 129, 176]. User level protocol processing: Several research efforts have focused on increasing communication subsystem throughput via user level handling of network data [57, 112, 167] In addition to data copy minimization compared to a server based implementation, user level protocol processing ....

....protocols is highlighted in [16] This has significant implications for system parameterization since it highlights the difficulty in measuring various processing overheads accurately. Cache predictability may be improved via appropriate protocol implementation and compilation techniques [129], or via cache partitioning and appropriate OS support [110] Any worst case processing estimates are likely to be overly conservative. We note that this problem relates to memory subsystem design for modern processors, and is not related to the actual mechanism employed to profile communication ....

[Article contains additional citation context not shown here]

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of techniques to improve protocol processing latency," in Proc. of ACM SIGCOMM, pp. 73--84, October 1996.


Self-Parameterizing Protocol Stacks for Quality-of-Service.. - Mehra, Wang, Shin   (Correct)

....portable QoS sensitive communication software. A number of recent research efforts have developed efficient architectures and performance optimizations for the components constituting the communication subsystem, namely, the protocol stack and its interaction with the attached network interfaces [7, 8, 9, 10, 11, 12]. The primary thrust of these efforts has been to improve the average latency and throughput delivered by the communication subsystem to applications. The issues explored include optimizations to improve protocol processing latency [9, 10, 11] techniques to minimize data copies [7, 12] and ....

....with the attached network interfaces [7, 8, 9, 10, 11, 12] The primary thrust of these efforts has been to improve the average latency and throughput delivered by the communication subsystem to applications. The issues explored include optimizations to improve protocol processing latency [9, 10, 11], techniques to minimize data copies [7, 12] and high performance network interface design [8, 7] However, while improving communication subsystem performance, these approaches are insufficient for the design and development of QoS sensitive communication subsystems, as explained later. It ....

[Article contains additional citation context not shown here]

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley, "Analysis of techniques to improve protocol processing latency," in Proc. of ACM SIGCOMM, October 1996, pp. 73--84.


Programming Language Techniques for Modular Router.. - Kohler, Chen.. (2000)   (4 citations)  (Correct)

....of the graph and to determine whether elements can share code. Both analyses are relatively simple data flow algorithms. It does not require substantial graph manipulation. Devirtualization is a well known technique in object oriented programming languages such as Java. Mosberger et al. [16] demonstrate that path inlining, essentially devirtualization with inlining, is useful for decreasing protocol latency in a modular networking system (the x-kernel [11]) but they implement it by hand. To our knowledge, neither the x-kernel nor Scout [18] can implement devirtualization ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. ACM SIGCOMM Conference (SIGCOMM '96), pages 73--84, August 1996.


Fast, Optimized Sun RPC Using Automatic Program.. - Muller, Marlet.. (1997)   (3 citations)  (Correct)

....such as specific RPC optimizations, kernel level optimizations, operating system structuring, and automatic program transformation. Let us outline the salient aspects of these research directions. General RPC optimizations. A considerable amount of work has been dedicated to optimize RPC (see [32, 17, 36, 25, 24]) In most of these studies, a fast path in the RPC is identified, corresponding to a performance critical, frequently used case. The fast path is then optimized using a wide range of techniques. The optimizations address different layers of the protocol stack, and are performed either manually ....

....to improve network throughput and to reduce latency. Madea and Bershad propose to restructure network layers and to move some functions into user space [20] Mosberger et al. describe techniques for improving protocols by reducing the number of cycles stalled to wait for memory access completion [24]. Manual specialization. In a first step, operating systems specialization has been performed manually in experiments such as Synthesis [28, 21] and Synthetix [27] Manual specialization, however, tends to compromise other system properties such as maintainability and portability. Furthermore, ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S.W. O'Malley. Analysis of techniques to improve protocol processing latency. In SIGCOMM96 [33].


Applying Optimization Principle Patterns to Real-time.. - Pyarali, O'Ryan.. (2000)   (1 citation)  (Correct)

....memory to reduce unnecessary data copying and achieve high throughput. This optimization is based on Principle Pattern 2, which focuses on eliminating gratuitous waste and Principle Pattern 3, which replaces generic schemes with efficient, special purpose ones. 4.1.5 Improving cache affinity [51] describes a scheme called outlining that when used improves processor cache effectiveness, thereby improving performance. 4.1.6 Efficient demultiplexing Demultiplexing routes messages between different levels of functionality in layered communication protocol stacks. Most conventional ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency," in Proceedings of SIGCOMM '96, (Stanford, CA), pp. 73--84, ACM, August 1996.


A Readable TCP in the Prolac Protocol Language - Kohler (1998)   (15 citations)  (Correct)

....transport protocols, for example. In this case, leaf analysis would appropriately fail, and the necessary dynamic dispatches would be generated. It would continue to be effective within the module hierarchies for the individual protocols, however. 3.4.2 Inlining and outlining Mosberger et al. [MPBO96] list a number of useful techniques for improving protocol efficiency. Prolac has direct support for three of these: inlining, path inlining, and outlining. Inlining is replacing a function call with the function's body; path inlining is simply recursive inlining, where functions called by an ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM 1996 Conference, pages 73--84, August 1996.


Building Reliable, High-Performance Communication.. - Liu, Kreitz, van.. (1999)   (6 citations)  (Correct)

....The first three steps do not affect the layering abstraction itself. However, the final two steps require generating special code for common cases. Finding common cases and compressing headers is far beyond the capabilities of current compiler optimization techniques, and therefore previous work [1, 2, 9, 21] involves hand optimization or at least significant annotation of the code. Both [1] and [13] report that this is a difficult and error-prone process, which is consistent with our own experience in trying to do so. Chapter 5 of [10] shows how such optimizations can be formalized using predicates ....

....computation, encryption, etc. on large data packets. In our setting, many packets are quite small, and a large amount of protocol latency is introduced by protocol abstraction boundaries. Our work presents a technique for optimizing mainly non data touching operations, similar to path inlining [21]. Path inlining turns out to be difficult because of message ordering constraints, and it is out of the reach of traditional compiler optimization techniques because of the need for 90 path constraints that cross component boundaries. Formal tools are able to analyze global properties, and we use ....

MOSBERGER, D., PETERSON, L., BRIDGES, P., AND O'MALLEY, S. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM Conference (New York, 1996), pp. 73--84.


Fast Paths in Concurrent Programs - Xu, Kumar, Li (2004)   (Correct)

No context found.

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In SIGCOMM, 1996.


Copyright 2002, Intel Corporation, All rights reserved. - Queue-Pair Ip Hybrid   (Correct)

No context found.

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of techniques to improve protocol processing latency," Proceedings of ACM SIGCOMM '96,ACM, Stanford, CA, USA, 1996.


System Support for Online Reconfiguration - Craig Soules Jonathan (2003)   (7 citations)  (Correct)

No context found.

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. ACM SIGCOMM Conference. Published as Computer Communication Review, 26(4):73--84. ACM, 1996.


Eddie Kohler - Technology Square Room   (Correct)

No context found.

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM 1996.


Architectural Analysis and Instruction-Set Optimization for.. - Haiyong Xie Li   (Correct)

No context found.

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. Proceedings of SIGCOMM '91 Symposium on Communication Architectures and Protocols, 1996

CiteSeer.IST - Copyright Penn State and NEC