Results 1 - 10
of
464
Xen and the art of virtualization
- IN SOSP
, 2003
"... Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100 % binary compatibility at the expense of performance. Others sacrifice security or fun ..."
Abstract
-
Cited by 2010 (35 self)
- Add to MetaCart
(Show Context)
Numerous systems have been designed which use virtualization to subdivide the ample resources of a modern computer. Some require specialized hardware, or cannot support commodity operating systems. Some target 100 % binary compatibility at the expense of performance. Others sacrifice security or functionality for speed. Few offer resource isolation or performance guarantees; most provide only best-effort provisioning, risking denial of service. This paper presents Xen, an x86 virtual machine monitor which allows multiple commodity operating systems to share conventional hardware in a safe and resource managed fashion, but without sacrificing either performance or functionality. This is achieved by providing an idealized virtual machine abstraction to which operating systems such as Linux, BSD and Windows XP, can be ported with minimal effort. Our design is targeted at hosting up to 100 virtual machine instances simultaneously on a modern server. The virtualization approach taken by Xen is extremely efficient: we allow operating systems such as Linux and Windows XP to be hosted simultaneously for a negligible performance overhead — at most a few percent compared with the unvirtualized case. We considerably outperform competing commercial and freely available solutions in a range of microbenchmarks and system-wide tests.
EROS: a fast capability system
- In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP’99), pages 170–185, Kiawah Island Resort
, 1999
"... EROS is a capability-based operating system for commodity processors which uses a single level storage model. The single level store's persistence is transparent to applications. The performance consequences of support for transparent persistence and capability-based architectures are generally ..."
Abstract
-
Cited by 236 (29 self)
- Add to MetaCart
(Show Context)
EROS is a capability-based operating system for commodity processors which uses a single level storage model. The single level store's persistence is transparent to applications. The performance consequences of support for transparent persistence and capability-based architectures are generally believed to be negative. Surprisingly, the basic operations of EROS (such as IPC) are generally comparable in cost to similar operations in conventional systems. This is demonstrated with a set of microbenchmark measurements of semantically similar operations in Linux. The EROS system achieves its performance by coupling well-chosen abstract objects with caching techniques for those objects. The objects (processes, nodes, and pages) are well-supported by conventional hardware, reducing the overhead of capabilities. Software-managed caching techniques for these objects reduce the cost of persistence. The resulting performance suggests that composing protected subsystems may be less costly than commonly believed. 1
SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes
- SOSP'07
, 2007
"... We propose SecVisor, a tiny hypervisor that ensures code integrity for commodity OS kernels. In particular, SecVisor ensures that only user-approved code can execute in kernel mode over the entire system lifetime. This protects the kernel against code injection attacks, such as kernel rootkits. SecV ..."
Abstract
-
Cited by 217 (10 self)
- Add to MetaCart
(Show Context)
We propose SecVisor, a tiny hypervisor that ensures code integrity for commodity OS kernels. In particular, SecVisor ensures that only user-approved code can execute in kernel mode over the entire system lifetime. This protects the kernel against code injection attacks, such as kernel rootkits. SecVisor can achieve this property even against an attacker who controls everything but the CPU, the memory controller, and system memory chips. Further, SecVisor can even defend against attackers with knowledge of zero-day kernel exploits. Our goal is to make SecVisor amenable to formal verification and manual audit, thereby making it possible to rule out known classes of vulnerabilities. To this end, SecVisor offers small code size and small external interface. We rely on memory virtualization to build SecVisor and implement two versions, one using software memory virtualization and the other using CPU-supported memory virtualization. The code sizes of the runtime portions of these versions are 1739 and 1112 lines, respectively. The size of the external interface for both versions of SecVisor is 2 hypercalls. It is easy to port OS kernels to SecVisor. We port the Linux kernel version 2.6.20 by adding 12 lines and deleting 81 lines, out of a total of approximately 4.3 million lines of code in the kernel.
Small forwarding tables for fast routing lookups
- in ACM Sigcomm
, 1997
"... For some time, the networking community has assumed that it is impossible to do IP routing lookups in software fast enough to support gigabit speeds. IP routing lookups must �nd the routing entry with the longest matching pre�x, a task that has been thought to require hardware support at lookup freq ..."
Abstract
-
Cited by 216 (0 self)
- Add to MetaCart
(Show Context)
For some time, the networking community has assumed that it is impossible to do IP routing lookups in software fast enough to support gigabit speeds. IP routing lookups must �nd the routing entry with the longest matching pre�x, a task that has been thought to require hardware support at lookup frequencies of millions per second. We present a forwarding table data structure designed for quick routing lookups. Forwarding tables are small enough to �t in the cache of a conventional general purpose processor. With the table in cache, a 200 MHz Pentium Pro or a 333 MHz Alpha 21164 can perform a few million lookups per second. This means that it is feasible to do a full routing lookup for each IPpacket at gigabit speeds without special hardware. The forwarding tables are very small, a large routing table with 40,000 routing entries can be compacted to a forwarding table of 150�160 Kbytes. A lookup typically requires less than 100 instructions on an Alpha, using eight memory references accessing a total of 14 bytes. 1
Linux security modules: General security support for the linux kernel
- In Proceedings of the 11th USENIX Security Symposium
, 2002
"... Symposium ..."
(Show Context)
Dynamic Voltage Scaling on a Low-Power Microprocessor
- Mobile Computing and Networking - Mobicom
, 2001
"... Power consumption is the limiting factor for the functionality of future wearable devices. Since interactive applications like wireless information access generate bursts of activities, it is important to match the performance of the wearable device accordingly. This paper describes a system with a ..."
Abstract
-
Cited by 156 (7 self)
- Add to MetaCart
Power consumption is the limiting factor for the functionality of future wearable devices. Since interactive applications like wireless information access generate bursts of activities, it is important to match the performance of the wearable device accordingly. This paper describes a system with a microprocessor whose speed can be varied (frequency scaling) as well as its input voltage. Voltage scaling is important for reducing power consumption to very low values when operating at low speeds. Measurements show that the energy per instruction at minimal speed (59 MHz) is 1/5 of the energy required at full speed (251 MHz). The frequency and voltage can be scaled dynamically from user space in only 140 ¡ s. This allows power-aware applications to quickly adjust the performance level of the processor whenever the workload changes. 1
Performance Characterization of the quad Pentium Pro SMP using OLTP workloads
- In Proc. of the Intl. Symp. on Computer Architecture
, 1998
"... Commercial applications are an important, yet often overlooked, workload with significantly different characteristics from technical workloads. The potential impact of these differences is that computers optimized for technical workloads may not provide good performance for commercial applications, ..."
Abstract
-
Cited by 150 (4 self)
- Add to MetaCart
(Show Context)
Commercial applications are an important, yet often overlooked, workload with significantly different characteristics from technical workloads. The potential impact of these differences is that computers optimized for technical workloads may not provide good performance for commercial applications, and these applications may not fully exploit advances in processor design. To evaluate these issues, we use hardware counters to measure architectural features of a four-processor Pentium Pro-based server running a TPC-C-like workload on an Informix database. We examine the effectiveness of out-of-order execution, branch prediction, speculative execution, superscalar issue and retire, caching and multiprocessor scaling. We find that out-of-order execution, superscalar issue and retire, and branch prediction are not as effective for database workloads as they are for technical workloads, such as SPEC. We find that caches are effective at reducing processor traffic to memory; even larger caches would be helpful to satisfy more data requests. Multiprocessor scaling of this workload is good, but even modest memory system utilization degrades application memory latency, limiting database throughput.
The performance of µ-kernel-based systems
- In 16 th ACM Symposium on Operating Systems Principles
, 1997
"... First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux op ..."
Abstract
-
Cited by 148 (17 self)
- Add to MetaCart
(Show Context)
First-generation µ-kernels have a reputation for being too slow and lacking sufficient flexibility. To determine whether L4, a lean second-generation µ-kernel, has overcome these limitations, we have repeated several earlier experiments and conducted some novel ones. Moreover, we ported the Linux operating system to run on top of the L4 µ-kernel and compared the resulting system with both Linux running native, and MkLinux, a Linux version that executes on top of a first-generation Mach-derived µ-kernel. For L4Linux, the AIM benchmarks report a maximum throughput which is only 5 % lower than that of native Linux. The corresponding penalty is 5 times higher for a co-located in-kernel version of MkLinux, and 7 times higher for a user-level version of MkLinux. These numbers demonstrate both that it is possible to implement a high-performance conventional operating system personality above a µ-kernel, and that the performance of the µ-kernel is crucial to achieve this. Further experiments illustrate that the resulting system is highly extensible and that the extensions perform well. Even real-time memory management including second-level cache allocation can be implemented at user-level, coexisting with L4 Linux. 1
Microkernels meet recursive virtual machines
, 1996
"... This paper describes a novel approach to providing modular and extensible operating system functionality and encapsulated environments based on a synthesis of microkernel and virtual machine concepts. We have developed a software-based virtualizable architecture called Fluke that allows recursive vi ..."
Abstract
-
Cited by 132 (27 self)
- Add to MetaCart
This paper describes a novel approach to providing modular and extensible operating system functionality and encapsulated environments based on a synthesis of microkernel and virtual machine concepts. We have developed a software-based virtualizable architecture called Fluke that allows recursive virtual machines (virtual machines running on other virtual machines) to be implemented efficiently by a microkernel running on generic hardware. A complete virtual machine interface is provided at each level; efficiency derives from needing to implement only new functionality at each level. This infrastructure allows common OS functionality, such as process management, demand paging, fault tolerance, and debugging support, to be provided by cleanly modularized, independent, stackable virtual machine monitors, implemented as user processes. It can also provide uncommon or unique OS features, including the above features specialized for particular applications ’ needs, virtual machines transparently distributed cross-node, or security monitors that allow arbitrary untrusted binaries to be executed safely. Our prototype implementation of this model indicates that it is practical to modularize operating systems this way. Some types of virtual machine layers impose almost no overhead at all, while others impose some overhead (typically 0–35%), but only on certain classes of applications.
Effective Distributed Scheduling of Parallel Workloads
, 1996
"... We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particul ..."
Abstract
-
Cited by 127 (5 self)
- Add to MetaCart
(Show Context)
We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler p...