44 citations found.
D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In SIGCOMM, 1996.

This paper is cited in the following contexts:
Determining The User-Level Transmission Delay In Networks Of.. - Suciu, Fetzer (1996)

....is the background processor load, not the background network load. This is due to the fact that, during a message transmission, most of the time is spent in the operating system kernel, not on the network wire [5]. All previous studies we are aware of confirm this result. In [9] and in [10], the operating system and the memory performance are found to be the main bottlenecks of the message transmission. In [2] and in [3], varying background processor loads are placed on the respective systems, and severe network performance degradations are observed. Scheduling Priority Since the ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. Computer Communication Review, 26(4), 73--84, 1996.


System Support for Online Reconfiguration - Soules, Appavoo, Hui.. (2003)   (7 citations)

....of getting this behavior is with an IF statement at the top of a component with both implementations. A hot swapping approach separates the two implementations, simplifying testing by reducing internal states and increasing performance by reducing negative cache effects of the uncommon case code [40]. Section 6.3 evaluates the use of online reconfiguration to specialize exclusive access to a file, while still supporting full sharing semantics when necessary. Dynamic monitoring: Instrumentation gives developers and administrators useful information in the face of system anomalies, but ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. ACM SIGCOMM Conference. Published as Computer Communication Review, 26(4):73--84. ACM, 1996.


Resource Control of Untrusted Code in an Open Network Environment - Menage (2003)   (2 citations)

....or the video display subsystem Scout attempts to offer guarantees to particular activities. A further advantage is that the path abstraction permits further optimisations such as partially evaluating the functions along a path to provide versions specialised for a particular data stream [Mosberger96a]. The performance improvements achieved by these optimisations must be offset by the loss of spatial locality in the processing code caused by multiple specialised versions of common functions. Nemesis [Roscoe95, Leslie96] is an operating system designed to provide resource guarantees and QoS to ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In SIGCOMM


ESP: A Language for Programmable Devices - Kumar (2002)

....that are available when the file is opened. The Scout operating system [75] makes paths an explicit abstraction mechanism to improve resource allocation and scheduling decisions. It uses compiler optimizations like outlining, cloning, and path inlining improve the performance of the fast paths [76]. However, since the paths are dynamically created, the compiler cannot always optimize these paths. To address this, the compiler generates optimized code for some paths. When a path is created at runtime, if the runtime system can determine that optimized code is available for that path, it uses ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In Proceedings of the SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, Stanford, California, August 1996.


Programming Language Optimizations for Modular Router.. - Kohler, Morris, Chen

....implementing and composing network protocols. Protocol nodes in the x-kernel resemble Click elements. Hand optimization of x-kernel configurations demonstrated that path inlining, which combines the effect of our devirtualizer (Section 6.1) with inlining, can significantly decrease protocol latency [12]. Automatic configuration optimization is not supported. Scout [13], a successor to the x-kernel, was designed for routing and high performance networking, rather than protocol composition. Scout comes with a simple rule-based optimizer similar to our click-xform tool (Section 6.2) ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. ACM SIGCOMM Conference (SIGCOMM '96), pages 73--84, August 1996.


Applications of Randomness in System Performance Measurement - Blackwell (1998)   (2 citations)

....number of cycles lost due to mispredicted branches. However, the size of the frequently executed code increases. Thus the processor wastes fewer cycles due to branches, but may incur more cache misses. Work by others has focussed on reducing the footprint of code to improve performance. Mosberger [37] found that removing rarely executed code from the contiguous block of memory occupied by a function significantly reduced memory system stalls. Studies of these and other optimizations need robust and reproducible measurements of memory system costs. To estimate the magnitude of the potential ....

D. Mosberger, L.L. Peterson, P.G. Bridges, S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In Proceedings of ACM SIGCOMM '96.


Improving Computer Communication Performance by Reducing Memory.. - Ahlgren (1997)   (2 citations)

....researchers' attention, several researchers have investigated the possibilities to control data placement in the cache in order to avoid cache conflicts [19, 50]. University of Arizona has given special attention to caching for communication, both on the data side [59] and on the instruction side [57]. For the instruction side, they investigate how to use compiler techniques to increase cache performance. Salehi, Kurose and Towsley [66] have studied cache behavior in the context of parallelized communication protocols. Braun and Diot [21] report on, and compare cache hit rates of, an ILP ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In SIGCOMM '96 Conference Proceedings, pages 73--84, Palo Alto, CA, USA, August 26--30, 1996. ACM SIGCOMM Computer Communication Review, 26(4).


Dynamic Kernel I-Cache Optimization - Tamches, Miller (2001)

....in each invocation of tcp_rput_data and a 7% reduction in the benchmark's elapsed run time, demonstrating that even I/O benchmarks can incur enough CPU time to benefit from I-cache optimization. Code positioning consists of three optimizations: Procedure splitting. Also called outlining [16], this optimization segregates frequently executed (hot) basic blocks from cold ones, to reduce I-cache pollution. Cold code is prevalent in kernels, due to extensive error checking. Basic block positioning. A function's blocks are reordered to increase straight lined execution in the common ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. ACM Applications, Technologies, Architectures and Protocols for Computer Communication (SIGCOMM), Stanford, CA, August 1996.


Fine-Grained Dynamic Instrumentation Of Commodity Operating.. - Tamches (2001)   (34 citations)

....the call site is altered to directly call the optimized version of the function. A second example of run time code optimization that can be performed with dynamic instrumentation is code positioning, or moving seldom executed basic blocks out of line to improve instruction cache performance [61, 69]. First, a function can be tested for poor instruction cache performance by inserting instrumentation that measures the number of icache misses incurred. After a time, the instrumentation is removed, and if instruction cache performance is poor, the function's basic blocks are instrumented for ....

....locations where kernel code can be inserted, can be almost any machine code instruction within the kernel. Runs on a commodity kernel. This enables instrumentation under real world workloads. It is worthwhile to note that much recent operating system research has taken place on custom kernels [10, 11, 21, 34, 35, 36, 37, 47, 59, 61, 62, 68, 73, 74, 75, 81, 82, 83, 84]; this dissertation shows that run time instrumentation is feasible on a commodity kernel. Runs on an unmodified kernel. This contribution is important, because requiring a modified or somehow customized kernel, even an otherwise commodity one, would likely preclude an instrumentation tool's ....

[Article contains additional citation context not shown here]

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. ACM Applications, Technologies, Architectures and Protocols for Computer Communication (SIGCOMM), Stanford, CA, August 1996.


Structuring Communication Software for Quality-of-Service.. - Mehra, Indiresan, Shin (1996)   (29 citations)

....and resource management services within the communication subsystem. Communication subsystem design and performance optimization: Several recent efforts have focused on optimizing the performance of the data transfer path in TCP/IP protocol stacks, via improvement of protocol processing latency [57-59], and user level handling of network data [14-16, 60, 61] to increase throughput via data copy minimization. Several researchers have studied the issues affecting the design and performance of network adapters [8, 16, 62] and communication subsystems in general [24, 63]. All of these efforts are ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of techniques to improve protocol processing latency," in Proc. of ACM SIGCOMM, pp. 73--84, October 1996.


The Click Modular Router - Kohler (2000)   (64 citations)

....of the graph and to determine whether elements can share code. Both analyses use relatively simple data flow algorithms. It does not require substantial graph manipulation. Devirtualization is a well known technique in object oriented programming languages such as Java. Mosberger et al. [31] demonstrate that path inlining, essentially devirtualization with inlining, is useful for decreasing protocol latency in a modular networking system (the x-kernel [22]), but they implement it by hand. To our knowledge, neither the x-kernel nor Scout [36] can implement devirtualization ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. ACM SIGCOMM '96 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 73--84, August 1996.


Resource-Bounded Partial Evaluation - Debray   (6 citations)

....into a partial evaluator, preliminary experiments appear encouraging. Acknowledgements This paper has benefited greatly from comments by Peter Holst Andersen as well as the anonymous referees. 5 In operating systems parlance, this kind of selective specialization is referred to as outlining [8, 26, 27]. ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency", Proc. SIGCOMM '96, pp. 73--84, Sept. 1996.


Resource-Bounded Partial Evaluation - Debray (1996)   (6 citations)

....generalizations of ideas traditionally used in offline partial evaluation. While our algorithms have not been incorporated into a partial evaluator, preliminary experiments appear encouraging. 2 In operating systems parlance, this kind of selective specialization is referred to as outlining [7, 17, 18]. ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency", Proc. SIGCOMM '96, pp. 73--84, Sept. 1996.


Structuring Host Communication Software For Quality Of Service.. - Mehra (1997)

....and evaluated in [1] reduces the number of accesses to network data by effectively collapsing protocol layers and executing them in an integration fashion for each data word accessed. Several recent efforts have also focused on optimizing the protocol processing latency in TCP/IP protocol stacks [16, 129, 176]. User level protocol processing: Several research efforts have focused on increasing communication subsystem throughput via user level handling of network data [57, 112, 167]. In addition to data copy minimization compared to a server based implementation, user level protocol processing ....

....protocols is highlighted in [16]. This has significant implications for system parameterization since it highlights the difficulty in measuring various processing overheads accurately. Cache predictability may be improved via appropriate protocol implementation and compilation techniques [129], or via cache partitioning and appropriate OS support [110]. Any worst case processing estimates are likely to be overly conservative. We note that this problem relates to memory subsystem design for modern processors, and is not related to the actual mechanism employed to profile communication ....

[Article contains additional citation context not shown here]

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of techniques to improve protocol processing latency," in Proc. of ACM SIGCOMM, pp. 73--84, October 1996.


Self-Parameterizing Protocol Stacks for Quality-of-Service.. - Mehra, Wang, Shin

....portable QoS sensitive communication software. A number of recent research efforts have developed efficient architectures and performance optimizations for the components constituting the communication subsystem, namely, the protocol stack and its interaction with the attached network interfaces [7, 8, 9, 10, 11, 12]. The primary thrust of these efforts has been to improve the average latency and throughput delivered by the communication subsystem to applications. The issues explored include optimizations to improve protocol processing latency [9, 10, 11], techniques to minimize data copies [7, 12], and ....

....with the attached network interfaces [7, 8, 9, 10, 11, 12]. The primary thrust of these efforts has been to improve the average latency and throughput delivered by the communication subsystem to applications. The issues explored include optimizations to improve protocol processing latency [9, 10, 11], techniques to minimize data copies [7, 12], and high performance network interface design [8, 7]. However, while improving communication subsystem performance, these approaches are insufficient for the design and development of QoS sensitive communication subsystems, as explained later. It ....

[Article contains additional citation context not shown here]

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley, "Analysis of techniques to improve protocol processing latency," in Proc. of ACM SIGCOMM, October 1996, pp. 73--84.


Programming Language Techniques for Modular Router.. - Kohler, Chen.. (2000)   (4 citations)

....of the graph and to determine whether elements can share code. Both analyses are relatively simple data flow algorithms. It does not require substantial graph manipulation. Devirtualization is a well known technique in object oriented programming languages such as Java. Mosberger et al. [16] demonstrate that path inlining, essentially devirtualization with inlining, is useful for decreasing protocol latency in a modular networking system (the x-kernel [11]), but they implement it by hand. To our knowledge, neither the x-kernel nor Scout [18] can implement devirtualization ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. ACM SIGCOMM Conference (SIGCOMM '96), pages 73--84, August 1996.


Fast, Optimized Sun RPC Using Automatic Program.. - Muller, Marlet.. (1997)   (3 citations)

....such as specific RPC optimizations, kernel level optimizations, operating system structuring, and automatic program transformation. Let us outline the salient aspects of these research directions. General RPC optimizations. A considerable amount of work has been dedicated to optimize RPC (see [32, 17, 36, 25, 24]). In most of these studies, a fast path in the RPC is identified, corresponding to a performance critical, frequently used case. The fast path is then optimized using a wide range of techniques. The optimizations address different layers of the protocol stack, and are performed either manually ....

....to improve network throughput and to reduce latency. Maeda and Bershad propose to restructure network layers and to move some functions into user space [20]. Mosberger et al. describe techniques for improving protocols by reducing the number of cycles stalled to wait for memory access completion [24]. Manual specialization. In a first step, operating systems specialization has been performed manually in experiments such as Synthesis [28, 21] and Synthetix [27]. Manual specialization, however, tends to compromise other system properties such as maintainability and portability. Furthermore, ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S.W. O'Malley. Analysis of techniques to improve protocol processing latency. In SIGCOMM96 [33].


Applying Optimization Principle Patterns to Real-time.. - Pyarali, O'Ryan.. (2000)   (1 citation)

....memory to reduce unnecessary data copying and achieve high throughput. This optimization is based on Principle Pattern 2, which focuses on eliminating gratuitous waste and Principle Pattern 3, which replaces generic schemes with efficient, special purpose ones. 4.1.5 Improving cache affinity [51] describes a scheme called outlining that when used improves processor cache effectiveness, thereby improving performance. 4.1.6 Efficient demultiplexing Demultiplexing routes messages between different levels of functionality in layered communication protocol stacks. Most conventional ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency," in Proceedings of SIGCOMM '96, (Stanford, CA), pp. 73--84, ACM, August 1996.


A Readable TCP in the Prolac Protocol Language - Kohler (1998)   (15 citations)

....transport protocols, for example. In this case, leaf analysis would appropriately fail, and the necessary dynamic dispatches would be generated. It would continue to be effective within the module hierarchies for the individual protocols, however. 3.4.2 Inlining and outlining Mosberger et al. [MPBO96] list a number of useful techniques for improving protocol efficiency. Prolac has direct support for three of these: inlining, path inlining, and outlining. Inlining is replacing a function call with the function's body; path inlining is simply recursive inlining, where functions called by an ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM 1996 Conference, pages 73--84, August 1996.


Building Reliable, High-Performance Communication.. - Liu, Kreitz, van.. (1999)   (6 citations)

....The first three steps do not affect the layering abstraction itself. However, the final two steps require generating special code for common cases. Finding common cases and compressing headers is far beyond the capabilities of current compiler optimization techniques, and therefore previous work [1, 2, 9, 21] involves hand optimization or at least significant annotation of the code. Both [1] and [13] report that this is a difficult and error-prone process, which is consistent with our own experience in trying to do so. Chapter 5 of [10] shows how such optimizations can be formalized using predicates ....

....computation, encryption, etc. on large data packets. In our setting, many packets are quite small, and a large amount of protocol latency is introduced by protocol abstraction boundaries. Our work presents a technique for optimizing mainly non-data-touching operations, similar to path inlining [21]. Path inlining turns out to be difficult because of message ordering constraints, and it is out of the reach of traditional compiler optimization techniques because of the need for path constraints that cross component boundaries. Formal tools are able to analyze global properties, and we use ....

MOSBERGER, D., PETERSON, L., BRIDGES, P., AND O'MALLEY, S. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM Conference (New York, 1996), pp. 73--84.


An Efficient End-Host Protocol Processing Architecture for.. - Zuberi, Shin (1998)

....code is still fetched into the I-cache, causing replacement misses. Moreover, repeated branches can cause CPU pipeline stalls. For relatively slow CPUs such as those used in IAs, this results in significant non-data-touching overhead. Researchers have proposed techniques such as outlining cloning [8] and incremental specialization [9] to reduce I-cache misses, but these schemes involve low-level optimization of frequently executed code in the protocol stack, and this .... [Figure 1: Typical structure of error checks in protocol code.]

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of techniques to improve protocol processing latency," in Proc. SIGCOMM, pp. 73--84, August 1996.


On the Performance Impact of Supporting QoS . . . - Kandlur, Mehra, Saha

....paths for QoS sessions relative to the best effort sessions. Our results also indicate that the QoS support provided has no impact on the performance of the best effort data path. The performance of protocol stacks has been the subject of numerous research articles, including some very recent ones [4, 11, 14]. However, almost all of these studies focus on the traditional best effort data path. Our study assumes significance in that it quantifies the performance penalty imposed by new data handling components in the protocol stack, and their impact on the best effort data path. While the results ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. of ACM SIGCOMM, pages 73--84, October 1996.


Optimizing a CORBA Inter-ORB Protocol (IIOP) Engine for.. - Gokhale, Schmidt

....factor in ORB performance. B.5 Optimization Steps 3 and 4: Optimizing for Processor Caches Processor caches are small, very fast memory used to significantly speed up operations [25]. To leverage the advantages offered by the processor cache it is imperative that operation footprints be small. [26] describes several techniques to improve protocol latency. One of the primary areas to be considered for improving protocol performance is to improve the processor cache effectiveness. ....

....3 from Table I, which replaces general-purpose methods with efficient special purpose ones. In the present case, however, the large, monolithic interpreter is replaced by special purpose methods for encoding and decoding. 2. Using outlining to optimize for the frequently executed case: Outlining [26] is used to remove gaps that are introduced in the processor cache as a result of branch instructions arising from error handling code. Processor cache gaps are undesirable because they waste memory bandwidth and introduce useless no-op instructions in the cache. The purpose of outlining is to ....

[Article contains additional citation context not shown here]

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency," in Proceedings of SIGCOMM '96, (Stanford, CA), pp. 73--84, ACM, August 1996.


A Readable TCP in the Prolac Protocol Language - Kohler, Kaashoek, Montgomery (1999)   (15 citations)

....for example. In this case, static class hierarchy analysis would appropriately fail, and the necessary dynamic dispatches would be generated. The analysis would continue to be effective within the module hierarchies for the individual protocols. 3.4.2 Inlining and outlining Mosberger et al. [16] list a number of useful techniques for improving protocol efficiency. Prolac has direct support for three of these: inlining, path inlining, and outlining. Inlining is replacing a function call with the function's body; path inlining is recursive inlining, where functions called by an inlined ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM 1996 Conference, pages 73--84, August 1996.


High-Performance Application-Specific Networking - Wallach (1997)   (2 citations)

....the 125, the I/O devices are accessed over a 12.5 MHz TURBOchannel bus. The four DECstations are connected with an AN2 switch [3]. 6.1.2 Methodology While collecting the numbers reported in this thesis, we had a fair number of problems with cache conflicts (similar to problems reported by others [42]) because the DECstations have direct-mapped caches. In order to minimize the effect these conflicts had on our experiments, we automatically linked the kernel object files in many different orders and picked a best case timing to report, for every application. We feel that this ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. Technical Report TR96-93, University of Arizona, 1996.


Prolac: A Language For Protocol Compilation - Kohler (1997)   (1 citation)

....with the growing importance of networking, and the occasional need for protocol extensions [Ste97, BOP94] only complicates the issue. Unfortunately, these tensions work against one another. Many optimizations which make protocol code more efficient also tend to make it much harder to understand [MPBO96], and therefore harder to get right. Extensions affect deeply buried snippets of protocol code rarely identifiable a priori. Finally, the clearest organization of protocol code is often among the slowest. Specialized language tools are a natural area to investigate for a solution to this software ....

....body at a call site. Outlining is when code for an infrequently executed branch is moved to the end of the function body; this will improve a program's i-cache and instruction pipeline behavior when the common path is taken. These optimizations are particularly important for network protocols [MPBO96]. Protocol code is too important to suffer frequent function call overhead and loss of intraprocedural optimization opportunities; Prolac's focus on many small rules makes inlining even more important. Secondly, much protocol code contains a lot of error handling code (or, more generally, ....

[Article contains additional citation context not shown here]

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM 1996 Conference, pages 73--84, August 1996.


Design Principles and Optimizations for High Performance ORBs - Gokhale, Schmidt (1997)

....A packet filter demultiplexes incoming packets to the appropriate target application(s). Rather than having demultiplexing occur at every layer, each protocol layer passes certain information to the packet filter, which allows it to identify which packets are destined for which protocol layer. [18] describes a scheme called outlining that when used improves processor cache effectiveness, thereby improving performance. Related work on CORBA performance measurements: [9, 10, 11] show that the performance of CORBA middleware implementations is relatively poor, compared to lower level ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In Proceedings of SIGCOMM '96, pages 73--84, Stanford, CA, August 1996. ACM.


Exploring the Performance Impact of QoS Support in.. - Engel, Kandlur, Mehra, .. (1998)

....(i.e. those supporting a sockets based communication system) Internet servers, the typical sources of multimedia data on the Internet. Protocol stack performance and optimizations of existing implementations has been the subject of numerous research articles, including some very recent ones [6, 7]. However, all these studies focus on the traditional best effort data path. Our study assumes significance in that it quantifies the performance penalty imposed by new data handling components in the protocol stack, and their impact on the best effort data path. With the popularity of networked ....

....traffic on a particular reservation is based on the service class of that reservation. Protocol Processing and Data Transfer Optimizations: Several recent efforts have focused on optimizing the performance of the data transfer path in TCP/IP protocol stacks, including protocol processing latency [6, 7] and user level handling of network data [22, 23] to increase throughput via data copy minimization. All these efforts target traditional best effort traffic, and as such are complementary to our work, which focuses on the performance impact of supporting QoS in TCP/IP protocol stacks. More ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley, "Analysis of techniques to improve protocol processing latency", in Proc. of ACM SIGCOMM, pp. 73--84, October 1996.


Fine-Grained Dynamic Instrumentation of Commodity Operating.. - Tamches, Miller (1999)   (34 citations)

....check: sites where the function is called can be examined for an actual parameter that always equals the specialized value. If so, the call site is altered to directly call the optimized version of the function. Moving seldom executed basic blocks out of line to improve instruction cache behavior [13] can be performed using fine grained dynamic instrumentation. A function's entry and exit point(s) can be annotated to measure the number of icache misses it incurs. If the value is high, the function's basic blocks can be instrumented to count execution frequency. An optimized version of the ....

....scenarios. Recovery in a commodity operating system after an open annotation fault is an area we are actively researching. Code adaptations intentionally change the behavior of the underlying system in some way. Examples include on the fly optimizations such as specialization [16] and outlining [13]. An adaptation may take some part of the kernel and replace it with code that accomplishes the same task, but in a more efficient or reliable manner. We are currently developing the mechanisms for closed looped dynamic measurement and optimization. Adaptations may also include adding new ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. ACM SIGCOMM 1996, Stanford, CA, Aug. 1996.


Networking Support For High-Performance Servers - Nahum (1997)

....He shows that by combining the packet checksum with the data copy, the checksum incurs little additional overhead since it is hidden in the memory latency of the copy. We have measured the cache miss rates of protocol stacks with and without both the copy and the checksum. Mosberger et al. [83] examine several compiler related approaches to improving protocol latency. They present an updated study of protocol processing on a DEC Alpha, including a detailed analysis of instruction cache effectiveness. Using a combination of their techniques (outlining, cloning, and path inlining) they ....

....4.6 Improving I-Cache Performance with Cord. In this section we examine the flip side of hardware-software interaction: tuning or changing the software to take better advantage of the hardware. In this chapter, we have advocated techniques that improve instruction cache behavior. Mosberger et al. [83] and Blackwell [13] provide two examples of how this can be done. Mosberger et al. examine several compiler-related approaches to improving protocol latency. Using a combination of their techniques (outlining, cloning, and path inlining) they show up to a 40 percent reduction in protocol ....

Mosberger, D., Peterson, L. L., Bridges, P. G., and O'Malley, S. Analysis of techniques to improve protocol processing latency. In ACM SIGCOMM Symposium on Communications Architectures and Protocols, Stanford, CA, Aug. 1996.


Principles for Optimizing CORBA Internet Inter-ORB Protocol.. - Gokhale, Schmidt (1998)   (12 citations)  (Correct)

....scheme can be used to marshal data types that are seldom transferred. This hybrid scheme tries to achieve an optimal time and space tradeoff by using fast, but large compiled stubs as well as a slow, but compact interpreter. 3.2.5 Optimization Steps 3 and 4: Optimizations for Processor Caches. [19] describes several techniques to improve protocol latency. One of the primary areas to be considered for improving protocol performance is to improve the processor cache effectiveness. Hence, the optimizations described in this section are aimed at improving processor cache affinity, thereby ....

....at this level of granularity. This principle is similar to Principle 3 from Table 1, which replaces general-purpose methods with efficient special-purpose ones. In our case, the large, monolithic interpreter is replaced by special-purpose methods for encoding and decoding. • Using outlining [19] to optimize for the frequently executed case: Outlining is used to remove gaps that are introduced in the processor cache as a result of branch instructions arising out of error handling code. Processor cache gaps are undesirable because they waste memory bandwidth and introduce useless ....

[Article contains additional citation context not shown here]

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In Proceedings of SIGCOMM '96, pages 73--84, Stanford, CA, August 1996. ACM.


Optimizing a CORBA Inter-ORB Protocol (IIOP) Engine for.. - Gokhale, Schmidt   (Correct)

....factor in ORB performance. B.5 Optimization Steps 3 and 4: Optimizing for Processor Caches. Processor caches are small, very fast memory used to significantly speed up operations [25]. To leverage the advantages offered by the processor cache it is imperative that operation footprints be small. [26] describes several techniques to improve protocol latency. One of the primary areas to be considered for improving protocol performance is to improve the processor cache effectiveness. Hence, the optimizations described in this section are aimed at improving processor cache affinity, thereby ....

....3 from Table I, which replaces general-purpose methods with efficient special-purpose ones. In the present case, however, the large, monolithic interpreter is replaced by special-purpose methods for encoding and decoding. 2. Using outlining to optimize for the frequently executed case: Outlining [26] is used to remove gaps that are introduced in the processor cache as a result of branch instructions arising from error handling code. Processor cache gaps are undesirable because they waste memory bandwidth and introduce useless no-op instructions in the cache. The purpose of outlining is to ....

[Article contains additional citation context not shown here]

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency," in Proceedings of SIGCOMM '96, (Stanford, CA), pp. 73--84, ACM, August 1996.


Design and Implementation of an RSVP based Quality of .. - Barzilai, Kandlur.. (1997)   (7 citations)  (Correct)

....traffic on a particular reservation is based on the service class of that reservation. Protocol Processing and Data Transfer Optimizations: Several recent efforts have focused on optimizing the performance of the data transfer path in TCP/IP protocol stacks, including protocol processing latency [26, 6, 31] and user-level handling of network data [23, 30, 14, 15, 10] to increase throughput via data copy minimization. All of these efforts are geared towards traditional best-effort traffic, and as such are complementary to our work, which focuses on the performance impact of supporting QoS in TCP/IP ....

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. In Proc. of ACM SIGCOMM, pages 73--84, October 1996.


ASHs: Application-Specific Handlers for High-Performance Messaging - Wallach (1996)   (41 citations)  (Correct)

....for instructions and data. Memory and I/O devices are accessed over a 25 MHz TURBOchannel bus. The two DECstation 240s are connected with an AN2 switch. While collecting the numbers reported in this paper, we had a fair number of problems with cache conflicts (similar to problems reported by others [32]) because the DECstations have direct-mapped caches. We took two steps in order to minimize the effect these conflicts had on our experiments: first, after examining the results from linking object files in many different orders, we picked a best case timing to report, and second, for any set of ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. Technical Report TR96-93, University of Arizona, 1996.


Optimizing a CORBA IIOP Protocol Engine for Minimal.. - Gokhale, Schmidt (1998)   (4 citations)  (Correct)

....3.2.5 Optimization Steps 3 and 4: Optimizing for Processor Caches. Processor caches are small, very fast memory used to significantly speed up operations [20]. To leverage off the advantages offered by the processor cache it is imperative that the footprint of the operations be small. [21] describes several techniques to improve protocol latency. One of the primary areas to be considered for improving protocol performance is to improve the processor cache effectiveness. Hence, the optimizations described in this section are aimed at improving processor cache affinity, thereby ....

....from Table 1, which replaces general-purpose methods with efficient special-purpose ones. In the present case, however, the large, monolithic interpreter is replaced by special-purpose methods for encoding and decoding. 2. Using outlining to optimize for the frequently executed case: Outlining [21] is used to remove gaps that are introduced in the processor cache as a result of branch instructions arising from error handling code. Processor cache gaps are undesirable because they waste memory bandwidth and introduce useless no-op instructions in the cache. The purpose of outlining is to ....

[Article contains additional citation context not shown here]

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of Techniques to Improve Protocol Processing Latency," in Proceedings of SIGCOMM '96, (Stanford, CA), pp. 73--84, ACM, August 1996.


A Uniform and Automatic Approach to Copy.. - Volanschi.. (1996)   (1 citation)  (Correct)

....to improve network throughput and to reduce latency. Maeda and Bershad propose to restructure network layers and to move some functions into user space [11]. Mosberger et al. describe techniques for improving protocols by reducing the number of cycles stalled to wait for memory access completion [20]. Safety issues in extensible operating systems. Safety is a well-known problem encountered in extensible operating systems when extension code has to be downloaded directly into the kernel. While the previous systems did not include a safety verification mechanism, recent extensible operating ....

D. Mosberger, L.L. Peterson, P.G. Bridges, and S.W. O'Malley. Analysis of techniques to improve protocol processing latency. Technical Report 96-03, Department of Computer Science, The University of Arizona, 1996.


Exploring the Performance Impact of QoS Support in.. - Engel, Kandlur, Mehra, ..   (Correct)

....QoS architecture is, therefore, contingent in part upon the accuracy with which an application specifies its run-time communication behavior. Protocol stack performance and optimizations of existing implementations have been the subject of numerous research articles, including some very recent ones [7, 8, 9]. However, all these studies focus on the traditional best-effort data path. Our study assumes significance in that it quantifies the performance penalty imposed by new data handling components in the protocol stack, and their impact on the best-effort data path. With the popularity of networked ....

....traffic on a particular reservation is based on the service class of that reservation. Protocol Processing and Data Transfer Optimizations: Several recent efforts have focused on optimizing the performance of the data transfer path in TCP/IP protocol stacks, including protocol processing latency [7, 8, 9] and user-level handling of network data [31, 32, 33, 34, 35] to increase throughput via data copy minimization. All of these efforts are geared towards traditional best-effort traffic, and as such are complementary to our work, which focuses on the performance impact of supporting QoS in TCP/IP ....

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley, "Analysis of techniques to improve protocol processing latency", in Proc. of ACM SIGCOMM, pp. 73--84, October 1996.


Principles for Optimizing CORBA Internet Inter-ORB Protocol.. - Gokhale, Schmidt (1998)   (12 citations)  (Correct)

....[Figure 16: Sender-side Overhead After Applying the Second Optimization (getting rid of waste and precomputation); panels: Analysis for doubles, Analysis for BinStructs] 3.2.5 Optimization Steps 3 and 4: Optimizing for Processor Caches. [19] describes several techniques to improve protocol latency. One of the primary areas to be considered for improving protocol performance is to improve the processor cache effectiveness. Hence, the optimizations described in this section are aimed at improving processor cache affinity, thereby ....

....from Table 1, which replaces general-purpose methods with efficient special-purpose ones. In the present case, however, the large, monolithic interpreter is replaced by special-purpose methods for encoding and decoding. 2. Using outlining to optimize for the frequently executed case: Outlining [19] is used to remove gaps that are introduced in the processor cache as a result of branch instructions arising from error handling code. Processor cache gaps are undesirable because they waste memory bandwidth and introduce useless no-op instructions in the cache. The purpose of outlining is to ....

[Article contains additional citation context not shown here]

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In Proceedings of SIGCOMM '96, pages 73--84, Stanford, CA, August 1996. ACM.


Analysis of Techniques to Improve Protocol Processing Latency - Mosberger (1996)   (43 citations)  Self-citation (Mosberger Peterson Bridges O'malley)   (Correct)

....conflicts. All code was compiled using a version of gcc 2.6.0 that was modified to support outlining [31]. While we started with the regular x-kernel distribution, we did apply some modifications in the process of porting it to the Alpha. These modifications, which are described in detail elsewhere [22], are summarized below: • D-cache optimizations: First, data structures were reorganized to minimize compiler-introduced padding and to co-locate structure members that are accessed together. Second, the kernel was adapted to use continuations [8] and stacks that are first-class objects so as ....

....of the execution time of the traced code. The instruction traces do not cover all of the processing since the tracing facility did not allow the tracing of interrupt handling. Other than that, the traces are complete. For the sake of brevity, we only summarize the most important results; see [22] for a more detailed discussion. [Footnote 2: Numbers in this range have been reported in the literature for FDDI and ATM controllers [7].] [To appear in SIGCOMM 96] [Table columns, for TCP/IP and for RPC: Tp [µs], Length, iCPI, mCPI. Row BAD: TCP/IP 167.0 ± 1.75, 4718, 1.61, 4.58; RPC 154.2 ± 0.47, 4253, 1.69, 4.66. Row STD ....]

[Article contains additional citation context not shown here]

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. Technical Report 96-03, University of Arizona, Tucson, AZ 85721, 1996.


Fast Paths in Concurrent Programs - Xu, Kumar, Li (2004)   (Correct)

No context found.

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. In SIGCOMM, 1996.


Copyright 2002, Intel Corporation, All rights reserved. - Queue-Pair Ip Hybrid   (Correct)

No context found.

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley, "Analysis of techniques to improve protocol processing latency," Proceedings of ACM SIGCOMM '96, ACM, Stanford, CA, USA, 1996.


System Support for Online Reconfiguration - Craig Soules Jonathan (2003)   (7 citations)  (Correct)

No context found.

D. Mosberger, L. L. Peterson, P. G. Bridges, and S. O'Malley. Analysis of techniques to improve protocol processing latency. ACM SIGCOMM Conference. Published as Computer Communication Review, 26(4):73--84. ACM, 1996.


Eddie Kohler - Technology Square Room   (Correct)

No context found.

David Mosberger, Larry L. Peterson, Patrick G. Bridges, and Sean O'Malley. Analysis of techniques to improve protocol processing latency. In Proceedings of the ACM SIGCOMM 1996.


Architectural Analysis and Instruction-Set Optimization for.. - Haiyong Xie Li   (Correct)

No context found.

D. Mosberger, L.L. Peterson, P.G. Bridges, and S. O'Malley. Analysis of Techniques to Improve Protocol Processing Latency. Proceedings of SIGCOMM '91 Symposium on Communication Architectures and Protocols, 1996

