Serverless Network File Systems
ACM Transactions on Computer Systems, 1995
"... In this paper, we propose a new paradigm for network file system design, serverless network file systems. While traditional network file systems rely on a central server machine, a serverless system utilizes workstations cooperating as peers to provide all file system services. Any machine in the sy ..."
Abstract
-
Cited by 473 (28 self)
- Add to MetaCart
In this paper, we propose a new paradigm for network file system design, serverless network file systems. While traditional network file systems rely on a central server machine, a serverless system utilizes workstations cooperating as peers to provide all file system services. Any machine in the system can store, cache, or control any block of data. Our approach uses this location independence, in combination with fast local area networks, to provide better performance and scalability than traditional file systems. Further, because any machine in the system can assume the responsibilities of a failed component, our serverless design also provides high availability via redundant data storage. To demonstrate our approach, we have implemented a prototype serverless network file system called xFS. Preliminary performance measurements suggest that our architecture achieves its goal of scalability. For instance, in a 32-node xFS system with 32 active clients, each client receives nearly as much read or write throughput as it would see if it were the only active client.
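A minimal C sketch of the location-independence idea described above: a deterministic map from a file block to the node that manages it, so any peer can locate control state without asking a central server. The names, hash, and fixed node count are illustrative stand-ins, not xFS's actual manager-map design.

/* Sketch: any node computes the same block-to-manager mapping locally. */
#include <stdint.h>
#include <stdio.h>

#define NUM_NODES 32

/* Hash a (file, block) pair onto one of the cooperating workstations. */
static uint32_t manager_for_block(uint32_t file_id, uint32_t block_no) {
    uint64_t key = ((uint64_t)file_id << 32) | block_no;
    key ^= key >> 33;             /* cheap mixing, stand-in for a real hash */
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (uint32_t)(key % NUM_NODES);
}

int main(void) {
    /* Every peer computes the same answer, so control of block (7, 42)
     * is located without contacting a central server. */
    printf("block (7,42) is managed by node %u\n", manager_for_block(7, 42));
    return 0;
}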
Application performance and flexibility on Exokernel systems
In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, 1997
"... The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exokernel system that allows specialized applications to achieve high performance without sacrificing t ..."
Abstract
-
Cited by 207 (10 self)
- Add to MetaCart
(Show Context)
The exokernel operating system architecture safely gives untrusted software efficient control over hardware and software resources by separating management from protection. This paper describes an exokernel system that allows specialized applications to achieve high performance without sacrificing the performance of unmodified UNIX programs. It evaluates the exokernel architecture by measuring end-to-end application performance on Xok, an exokernel for Intel x86-based computers, and by comparing Xok’s performance to the performance of two widely used 4.4BSD UNIX systems (FreeBSD and OpenBSD). The results show that common unmodified UNIX applications can enjoy the benefits of exokernels: applications either perform comparably on Xok/ExOS and the BSD UNIXes, or perform significantly better. In addition, the results show that customized applications can benefit substantially from control over their resources (e.g., a factor of eight for a Web server). This paper also describes insights about the exokernel approach gained through building three different exokernel systems, and presents novel approaches to resource multiplexing.
A cost-effective, high-bandwidth storage architecture
In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1998
"... (NASD) storage architecture, prototype implementations oj NASD drives, array management for our architecture, and three,filesystems built on our prototype. NASD provides scal-able storage bandwidth without the cost of servers used primarily,fijr trut&rring data from peripheral networks (e.g. SCS ..."
Abstract
-
Cited by 197 (12 self)
- Add to MetaCart
This paper describes the Network-Attached Secure Disk (NASD) storage architecture, prototype implementations of NASD drives, array management for our architecture, and three filesystems built on our prototype. NASD provides scalable storage bandwidth without the cost of servers used primarily for transferring data from peripheral networks (e.g. SCSI) to client networks (e.g. ethernet). Increasing dataset sizes, new attachment technologies, the convergence of peripheral and interprocessor switched networks, and the increased availability of on-drive transistors motivate and enable this new architecture. NASD is based on four main principles: direct transfer to clients, secure interfaces via cryptographic support, asynchronous non-critical-path oversight, and variably-sized data objects. Measurements of our prototype system show that these services can be cost-effectively integrated into a next generation disk drive ASIC. End-to-end measurements of our prototype drive and filesystems suggest that NASD can support conventional distributed filesystems without performance degradation. More importantly, we show scalable bandwidth for NASD-specialized filesystems. Using a parallel data mining application, NASD drives deliver a linear scaling of 6.2 MB/s per client-drive pair, tested with up to eight pairs in our lab.
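A toy C sketch of the "direct transfer plus secure interfaces" principles: a file manager issues a keyed capability off the critical path, and the drive verifies it before serving the object directly to the client. The struct layout and MAC are deliberately simplified stand-ins, not NASD's actual protocol or cryptography.

/* Sketch: capability-checked object access at the drive. */
#include <stdint.h>
#include <stdio.h>

struct capability {
    uint64_t object_id;   /* which variably-sized object may be touched */
    uint32_t rights;      /* e.g. read-only */
    uint64_t mac;         /* keyed check; a real drive would use a real MAC */
};

static uint64_t toy_mac(uint64_t key, uint64_t obj, uint32_t rights) {
    return (obj * 0x9e3779b97f4a7c15ULL) ^ (rights * 0x85ebca6bULL) ^ key;
}

static int drive_read(uint64_t drive_key, const struct capability *cap,
                      uint64_t object_id) {
    if (cap->object_id != object_id ||
        cap->mac != toy_mac(drive_key, cap->object_id, cap->rights))
        return -1;  /* reject: forged or mismatched capability */
    return 0;       /* transfer object data directly to the client */
}

int main(void) {
    uint64_t key = 0xdeadbeefULL;  /* key shared by file manager and drive */
    struct capability cap = { 42, 1, toy_mac(key, 42, 1) };
    printf("read: %s\n", drive_read(key, &cap, 42) == 0 ? "granted" : "denied");
    return 0;
}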
Lazy Receiver Processing (LRP): A Network Subsystem Architecture for Server Systems
1996
"... The explosive growth of the Internet, the widespread use of WWW-related applications, and the increased reliance on client-server architectures places interesting new demands on network servers. In particular, the operating system running on such systems needs to manage the machine’s resources in a ..."
Abstract
-
Cited by 192 (8 self)
- Add to MetaCart
(Show Context)
The explosive growth of the Internet, the widespread use of WWW-related applications, and the increased reliance on client-server architectures place interesting new demands on network servers. In particular, the operating system running on such systems needs to manage the machine’s resources in a manner that maximizes and maintains throughput under conditions of high load. We propose and evaluate a new network subsystem architecture that provides improved fairness, stability, and increased throughput under high network load. The architecture is hardware independent and does not degrade network latency or bandwidth under normal load conditions.
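A hedged C sketch of the lazy-receiver idea: demultiplex each packet to its socket's queue with minimal work at interrupt time, drop when that queue is full, and defer protocol processing until the owning process reads. Queue layout and sizes are illustrative, not LRP's implementation.

/* Sketch: early demux to a per-socket queue, lazy protocol processing. */
#include <stdio.h>

#define QUEUE_DEPTH 8

struct socket_queue {
    int raw[QUEUE_DEPTH];  /* unprocessed packets, filled at interrupt time */
    int head, tail;
};

/* Early demux: minimal work per packet; drops when the owner's queue is
 * full instead of queueing work the receiver may never consume. */
static int enqueue_raw(struct socket_queue *q, int pkt) {
    int next = (q->tail + 1) % QUEUE_DEPTH;
    if (next == q->head) return -1;     /* avoid receiver livelock: drop */
    q->raw[q->tail] = pkt;
    q->tail = next;
    return 0;
}

/* Lazy processing: protocol work is charged to the receiving process, at
 * its priority, only when it actually reads. */
static int lazy_recv(struct socket_queue *q) {
    if (q->head == q->tail) return -1;  /* nothing pending */
    int pkt = q->raw[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    return pkt;                         /* checksum/TCP work would go here */
}

int main(void) {
    struct socket_queue q = { {0}, 0, 0 };
    enqueue_raw(&q, 1234);
    printf("received packet %d\n", lazy_recv(&q));
    return 0;
}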
DPF: Fast, Flexible Message Demultiplexing using Dynamic Code Generation
In ACM Communication Architectures, Protocols, and Applications (SIGCOMM), 1996
"... Fast and flexible message demultiplexing are well-established goals in the networking community [1, 18, 22]. Currently, however, network architects have had to sacrifice one for the other. We present a new packet-filter system, DPF (Dynamic Packet Filters), that provides both the traditional flexibi ..."
Abstract
-
Cited by 142 (12 self)
- Add to MetaCart
(Show Context)
Fast and flexible message demultiplexing are well-established goals in the networking community [1, 18, 22]. Currently, however, network architects have had to sacrifice one for the other. We present a new packet-filter system, DPF (Dynamic Packet Filters), that provides both the traditional flexibility of packet filters [18] and the speed of hand-crafted demultiplexing routines [3]. DPF filters run 10-50 times faster than the fastest packet filters reported in the literature [1, 17, 18, 27]. DPF's performance is either equivalent to or, when it can exploit runtime information, superior to handcoded demultiplexors. DPF achieves high performance by using a carefully-designed declarative packet-filter language that is aggressively optimized using dynamic code generation. The contributions of this work are: (1) a detailed description of the DPF design, (2) discussion of the use of dynamic code generation and quantitative results on its performance impact, (3) quantitative results on how ...
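A C sketch of the declarative-filter idea. DPF compiles filters like this to native machine code at runtime; for brevity this sketch only "compiles" to a compact atom array walked by a tight loop, and the header offsets are illustrative (a UDP-to-port-53 filter assuming a 20-byte IP header).

/* Sketch: a filter is a declarative list of (offset, value) comparisons. */
#include <stdint.h>
#include <stdio.h>

struct atom { uint16_t offset; uint8_t value; };

static int match(const struct atom *f, int n, const uint8_t *pkt) {
    for (int i = 0; i < n; i++)
        if (pkt[f[i].offset] != f[i].value)
            return 0;   /* first mismatch rejects the packet */
    return 1;
}

int main(void) {
    /* "Protocol byte is 17 (UDP) and dest-port low byte is 53." */
    struct atom filter[] = { { 9, 17 }, { 23, 53 } };
    uint8_t pkt[64] = {0};
    pkt[9] = 17; pkt[23] = 53;
    printf("packet %s\n", match(filter, 2, pkt) ? "matches" : "rejected");
    return 0;
}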
Cluster I/O with River: Making the Fast Case Common
In Proceedings of the Sixth Workshop on Input/Output in Parallel and Distributed Systems, 1999
"... We introduce River, a data-flow programming environment and I/O substrate for clusters of computers. River is designed to provide maximum performance in the common case --- even in the face of nonuniformities in hardware, software, and workload. River is based on two simple design features: a high-p ..."
Abstract
-
Cited by 137 (14 self)
- Add to MetaCart
(Show Context)
We introduce River, a data-flow programming environment and I/O substrate for clusters of computers. River is designed to provide maximum performance in the common case, even in the face of nonuniformities in hardware, software, and workload. River is based on two simple design features: a high-performance distributed queue, and a storage redundancy mechanism called graduated declustering. We have implemented a number of data-intensive applications on River, which validate our design with near-ideal performance in a variety of non-uniform performance scenarios.
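A minimal C sketch of a distributed-queue policy in the spirit of River: each record goes to the consumer with the least backlog, so faster consumers naturally absorb more work and a slow node does not gate throughput. The backlog heuristic and data layout are illustrative, not River's actual mechanism.

/* Sketch: send each record to the least-backlogged consumer. */
#include <stdio.h>

#define CONSUMERS 4

int queued[CONSUMERS];   /* records currently buffered at each consumer */

static int pick_consumer(void) {
    int best = 0;
    for (int i = 1; i < CONSUMERS; i++)
        if (queued[i] < queued[best])  /* least backlog: draining fastest */
            best = i;
    return best;
}

int main(void) {
    /* consumer 2 is slow and already has backlog */
    queued[0] = 1; queued[1] = 0; queued[2] = 7; queued[3] = 2;
    for (int r = 0; r < 5; r++) {
        int c = pick_consumer();
        queued[c]++;
        printf("record %d -> consumer %d\n", r, c);
    }
    return 0;
}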
Effective Distributed Scheduling of Parallel Workloads
1996
"... We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particul ..."
Abstract
-
Cited by 127 (5 self)
- Add to MetaCart
(Show Context)
We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler p...
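A short C sketch of the two-phase fixed-spin blocking decision the abstract describes: a waiting process spins for a bounded interval, betting that its communication partner is coscheduled and will respond quickly, and only then yields the processor to the local scheduler. The spin constant and event flag are illustrative stand-ins.

/* Sketch: two-phase (spin, then block) waiting. */
#include <stdbool.h>
#include <sched.h>
#include <stdio.h>

#define SPIN_LIMIT 100000  /* fixed spin, tuned near a context-switch cost */

volatile bool event_done = true;  /* stub: the messaging layer would set this */

static void wait_for_event(void) {
    /* Phase 1: spin, hoping the sender is running right now. */
    for (long i = 0; i < SPIN_LIMIT; i++)
        if (event_done) return;
    /* Phase 2: give up the CPU; the local scheduler runs someone else. */
    while (!event_done)
        sched_yield();
}

int main(void) {
    wait_for_event();
    puts("event completed during spin phase");
    return 0;
}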
Mondrian Memory Protection
2002
"... Mondrian memory protection (MMP) is a fine-grained protection scheme that allows multiple protection domains to flexibly share memory and export protected services. In contrast to earlier pagebased systems, MMP allows arbitrary permissions control at the granularity of individual words. We use a com ..."
Abstract
-
Cited by 124 (3 self)
- Add to MetaCart
Mondrian memory protection (MMP) is a fine-grained protection scheme that allows multiple protection domains to flexibly share memory and export protected services. In contrast to earlier page-based systems, MMP allows arbitrary permissions control at the granularity of individual words. We use a compressed permissions table to reduce space overheads and employ two levels of permissions caching to reduce run-time overheads. The protection tables in our implementation add less than 9% overhead to the memory space used by the application. Accessing the protection tables adds less than 8% additional memory references to the accesses made by the application. Although it can be layered on top of demand-paged virtual memory, MMP is also well-suited to embedded systems with a single physical address space. We extend MMP to support segment translation which allows a memory segment to appear at another location in the address space. We use this translation to implement zero-copy networking underneath the standard read system call interface, where packet payload fragments are connected together by the translation system to avoid data copying. This saves 52% of the memory references used by a traditional copying network stack.
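A hedged C sketch of what word-granularity permission lookup means: every memory word has its own permission entry consulted on access. The flat one-byte-per-word table here is purely for illustration; MMP's contribution is the compressed multi-level table and permission caches that make such lookups affordable.

/* Sketch: map a word address to a per-word permission. */
#include <stdint.h>
#include <stdio.h>

enum perm { PERM_NONE = 0, PERM_READ = 1, PERM_RW = 2, PERM_EXEC = 3 };

/* One flat byte per word keeps the sketch simple; MMP instead uses a
 * compressed multi-level table plus permission caches to cut overhead. */
static uint8_t perm_table[1 << 16];

static enum perm lookup(uint32_t addr) {
    return (enum perm)perm_table[(addr >> 2) & 0xffff]; /* word index */
}

int main(void) {
    perm_table[(0x1000 >> 2) & 0xffff] = PERM_READ;
    printf("perm at 0x1000: %d (1 = read-only)\n", lookup(0x1000));
    printf("perm at 0x1004: %d (0 = no access)\n", lookup(0x1004));
    return 0;
}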
Building a robust software-based router using network processors
In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), 2001
"... ABSTRACT Recent efforts to add new services to the Internet have increased in-terest in software-based routers that are easy to extend and evolve. This paper describes our experiences using emerging network pro-cessors--in particular, the Intel IXP1200--to implement a router. We show it is possible ..."
Abstract
-
Cited by 119 (1 self)
- Add to MetaCart
Recent efforts to add new services to the Internet have increased interest in software-based routers that are easy to extend and evolve. This paper describes our experiences using emerging network processors, in particular the Intel IXP1200, to implement a router. We show it is possible to combine an IXP1200 development board and a PC to build an inexpensive router that forwards minimum-sized packets at a rate of 3.47 Mpps. This is nearly an order of magnitude faster than existing pure PC-based routers, and sufficient to support 1.77 Gbps of aggregate link bandwidth. At lesser aggregate line speeds, our design also allows the excess resources available on the IXP1200 to be used robustly for extra packet processing. For example, with 8 x 100 Mbps links, 240 register operations and 96 bytes of state storage are available for each 64-byte packet.
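The headroom claim can be sanity-checked with back-of-envelope arithmetic, sketched in C below. The calculation ignores Ethernet preamble and inter-frame gap, so the offered load is a slight overestimate; only the 3.47 Mpps capacity figure comes from the abstract.

/* Sketch: offered load of minimum-sized packets vs. forwarding capacity. */
#include <stdio.h>

int main(void) {
    double link_bps = 100e6;   /* one fast Ethernet link */
    double pkt_bits = 64 * 8;  /* minimum-sized packet */
    double links    = 8;
    double capacity = 3.47e6;  /* forwarding rate reported above */

    double offered = links * link_bps / pkt_bits;   /* packets per second */
    printf("offered load: %.2f Mpps\n", offered / 1e6);        /* ~1.56 */
    printf("forwarding capacity: %.2f Mpps\n", capacity / 1e6);
    printf("headroom factor: %.2fx\n", capacity / offered);    /* ~2.2x */
    return 0;
}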
Effects of communication latency, overhead, and bandwidth in a cluster architecture
In Proceedings of the 24th Annual International Symposium on Computer Architecture, 1997
"... This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on ..."
Abstract
-
Cited by 108 (6 self)
- Add to MetaCart
(Show Context)
This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
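The linear dependence on overhead falls out of a simple LogP-style cost model, sketched in C below with illustrative numbers (not the paper's workloads): if each of M messages charges its send/receive overhead o to the processor, runtime grows as compute + M x o, so runtime scales linearly in o while latency can be overlapped.

/* Sketch: why per-message overhead dominates runtime in this model. */
#include <stdio.h>

int main(void) {
    double compute_s = 1.0;     /* fixed computation per processor */
    long   messages  = 200000;  /* messages handled per processor */
    double overheads_us[] = { 3.0, 103.0 };

    for (int i = 0; i < 2; i++) {
        double o = overheads_us[i] * 1e-6;
        double total = compute_s + messages * o;  /* latency overlapped */
        printf("o = %5.1f us -> runtime %.2f s\n", overheads_us[i], total);
    }
    return 0;
}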