Results 1 - 10
of
56
Exokernel: An Operating System Architecture for Application-Level Resource Management
, 1995
"... We describe an operating system architecture that securely multiplexes machine resources while permitting an unprecedented degree of application-specific customization of traditional operating system abstractions. By abstracting physical hardware resources, traditional operating systems have signifi ..."
Abstract
-
Cited by 561 (20 self)
- Add to MetaCart
We describe an operating system architecture that securely multiplexes machine resources while permitting an unprecedented degree of application-specific customization of traditional operating system abstractions. By abstracting physical hardware resources, traditional operating systems have significantly limited the performance, flexibility, and functionality of applications. The exokernel architecture removes these limitations by allowing untrusted software to implement traditional operating system abstractions entirely at application-level. We have implemented a prototype exokernel-based system that includes Aegis, an exokernel, and ExOS, an untrusted application-level operating system. Aegis defines the low-level interface to machine resources. Applications can allocate and use machine resources, efficiently handle events, and participate in resource revocation. Measurements show that most primitive Aegis operations are 10–100 times faster than Ultrix,a mature monolithic UNIX operating system. ExOS implements processes, virtual memory, and inter-process communication abstractions entirely within a library. Measurements show that ExOS’s application-level virtual memory and IPC primitives are 5–50 times faster than Ultrix’s primitives. These results demonstrate that the exokernel operating system design is practical and offers an excellent combination of performance and flexibility. 1
Extensibility, safety and performance in the SPIN operating system
, 1995
"... This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensi ..."
Abstract
-
Cited by 392 (14 self)
- Add to MetaCart
This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and functionality. SPIN uses language and link-time mechanisms to inexpensively export ne-grained interfaces to operating system services. Extensions are written in a type safe language, and are dynamically linked into the operating system kernel. This approach o ers extensions rapid access to system services, while protecting the operating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations. 1
Performance Evaluation of Two Home-Based Lazy Release Consistency Protocols for Shared Virtual Memory Systems
- In Proceedings of the Operating Systems Design and Implementation Symposium
, 1996
"... This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large a ..."
Abstract
-
Cited by 146 (19 self)
- Add to MetaCart
This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large amount of memory it consumes for protocol overhead data, and because of the diculty of garbage collecting that data. To achieve more scalable performance, we introduce and evaluate two new protocols. The rst, Home-based LRC (HLRC), is based on the Automatic Update Release Consistency (AURC) protocol. Like AURC, HLRC maintains a home for each page to which all updates are propagated and from which all copies are derived. Unlike AURC, HLRC requires no specialized hardware support. We nd that the use of homes provides substantial improvements in performance and scalability over LRC. Our second protocol, called Overlapped Home-based LRC (OHLRC), takes advantage of the communication processor found on each node of the Paragon to ooad some of the protocol overhead of HLRC from the critical path followed by the compute processor. We nd that OHLRC provides modest improvements over HLRC. We also apply overlapping to the base LRC protocol, with similar results. Our experiments were done using ve of the Splash-2 benchmarks. We report overall execution times, as well as detailed breakdowns of elapsed time, message trac, and memory use for each of the protocols. 1
Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys
, 2004
"... This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show t ..."
Abstract
-
Cited by 134 (0 self)
- Add to MetaCart
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks
Software Write Detection for a Distributed Shared Memory
- IN PROCEEDINGS OF THE FIRST USENIX SYMPOSIUM ON OPERATING SYSTEM DESIGN AND IMPLEMENTATION
, 1994
"... Most software-based distributed shared memory (DSM) systems rely on the operating system's virtual memory interface to detect writes to shared data. Strategies based on virtual memory page protection create two problems for a DSM system. First, writes can have high overhead since they are detected w ..."
Abstract
-
Cited by 94 (0 self)
- Add to MetaCart
Most software-based distributed shared memory (DSM) systems rely on the operating system's virtual memory interface to detect writes to shared data. Strategies based on virtual memory page protection create two problems for a DSM system. First, writes can have high overhead since they are detected with a page fault. As a result, a page must be writtenmany times to amortize the cost of that fault. Second, the size of a virtual memory page is too big to serve as a unit of coherency, inducing false sharing. Mechanisms to handle false sharing can increase runtime overhead and may cause data to be unnecessarily communicated between processors. In this paper, we present a new method for write detection that solves these problems. Our method relies on the compiler and runtime system to detect writes to shared data without invoking the operating system. We measure and compare implementations of a distributed shared memory system using both strategies, virtual memory and compiler /runtime, run...
Improving Release-Consistent Shared Virtual Memory using Automatic Update
- IN THE 2ND IEEE SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE
, 1996
"... Shared virtual memory is a software technique to provide shared memory on a network of computers without special hardware support. Although several relaxed consistency models and implementations are quite effective, there is still a considerable performance gap between the "software-only" approach a ..."
Abstract
-
Cited by 82 (20 self)
- Add to MetaCart
Shared virtual memory is a software technique to provide shared memory on a network of computers without special hardware support. Although several relaxed consistency models and implementations are quite effective, there is still a considerable performance gap between the "software-only" approach and the hardware approach that uses directory-based caches. Automatic update is a simple communication mechanism, implemented in the SHRIMP multicomputer, that forwards local writes to remote memory transparently. In this paper we propose a new lazy release consistency based protocol, called Automatic Update Release Consistency (AURC), that uses automatic update to propagate and merge shared memory modifications. We compare the performance of this protocol against a software-only LRC implementation on several Splash2 applications and show that the AURC approach can substantially improve the performance of LRC. For 16 processors, the average speedup has increased from 5.9 under LRC, to 8.3 under AURC.
VMMC-2: Efficient Support for Reliable, Connection-Oriented Communication
- IN PROCEEDINGS OF HOT INTERCONNECTS
, 1997
"... The basic virtual memory-mapped communication (VMMC) model provides protected, direct communication between the sender's and receiver's virtual address spaces, but it does not support high-level connection-oriented communication APIs well. This paper presents VMMC-2, an extension to the basic VMMC.W ..."
Abstract
-
Cited by 71 (18 self)
- Add to MetaCart
The basic virtual memory-mapped communication (VMMC) model provides protected, direct communication between the sender's and receiver's virtual address spaces, but it does not support high-level connection-oriented communication APIs well. This paper presents VMMC-2, an extension to the basic VMMC.We describe the design, implementation, and evaluate the performance of three mechanisms in VMMC-2: (1) a user-managed TLB mechanism for address translation which enables user libraries to dynamically manage the amount of pinned space and requires only driver support from many operating systems# (2) a transfer redirection mechanism whichavoids copying on the receiver 's side# (3) a reliable communication protocol at the data link layer whichavoids copying on the sender's side. Tovalidate our extensions we implemented stream sockets on top of the VMMC-2 running on a Myrinet network of Pentium PCs. This zero-copysockets implementation provides a maximum bandwidth of over 84 Mbytes/s and a one-way latency of 20 µs.
A Comparison of OS Extension Technologies
, 1996
"... The current trend in operating systems research is to allow applications to dynamically extend the kernel to improve application performance or extend functionality, but the most effective approach to extensibility remains unclear. Some systems use safe languages to permit code to be downloaded dire ..."
Abstract
-
Cited by 70 (7 self)
- Add to MetaCart
The current trend in operating systems research is to allow applications to dynamically extend the kernel to improve application performance or extend functionality, but the most effective approach to extensibility remains unclear. Some systems use safe languages to permit code to be downloaded directly into the kernel; other systems provide in-kernel interpreters to execute extension code; still others use software techniques to ensure the safety of kernel extensions. The key characteristics that distinguish these systems are the philosophy behind extensibility and the technology used to implement extensibility. This paper presents a taxonomy of the types of extensions that might be desirable in an extensible operating system, evaluates the performance cost of various extension technologies currently being employed, and compares the cost of adding a kernel extension to the benefit of having the extension in the kernel. Our results show that compiled technologies (e.g. Modula-3 and sof...
Self-paging in the nemesis operating system
- In Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation
, 1999
"... In contemporary operating systems, continuous media (CM) applications are sensitive to the behaviour of other tasks in the system. This is due to contention in the kernel (or in servers) between these applications. To properly support CM tasks, we require “Quality of Service Firewalling” between dif ..."
Abstract
-
Cited by 68 (3 self)
- Add to MetaCart
In contemporary operating systems, continuous media (CM) applications are sensitive to the behaviour of other tasks in the system. This is due to contention in the kernel (or in servers) between these applications. To properly support CM tasks, we require “Quality of Service Firewalling” between different applications. This paper presents a memory management system supporting Quality of Service (QoS) within the Nemesis operating system. It combines application-level paging techniques with isolation, exposure and responsibility in a manner we call self-paging. This enables rich virtual memory usage alongside (or even within) continuous media applications. 1
Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors
, 1996
"... Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited ..."
Abstract
-
Cited by 56 (3 self)
- Add to MetaCart
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism for observing and reacting to memory behavior directly. To fill this need, we propose a new class of memory operations called informing memory operations, which essentially consist of a memory operation combined (either implicitly or explicitly) with a conditional branch-and-link operation that is taken only if the reference suffers a cache miss. We describe two different implementations of informing memory operations---one based on a cache-outcome condition code and another based on low-overhead traps---and find that modern in-order-issue and out-of-order-issue superscalar processors already contain the bulk of the necessary hardware support. We describe how a number of software-based memory optimizations can exploit informing memory operations to enhance performance, and look at cache coherence with fine-grained access control as a case study. Our performance results demonstrate that the runtime overhead of invoking the informing mechanism on the Alpha 21164 and MIPS R10000 processors is generally small enough to provide considerable flexibility to hardware and software designers, and that the cache coherence application has improved performance compared to other current solutions. We believe that the inclusion of informing memory operations in future processors may spur even more innovative performance optimizations.

