Sinfonia: a new paradigm for building scalable distributed systems
In SOSP, 2007
Cited by 153 (12 self)
We propose a new paradigm for building scalable distributed systems. Our approach does not require dealing with message-passing protocols—a major complication in existing distributed systems. Instead, developers just design and manipulate data structures within our service called Sinfonia. Sinfonia keeps data for applications on a set of memory nodes, each exporting a linear address space. At the core of Sinfonia is a novel minitransaction primitive that enables efficient and consistent access to data, while hiding the complexities that arise from concurrency and failures. Using Sinfonia, we implemented two very different and complex applications in a few months: a cluster file system and a group communication service. Our implementations perform well and scale to hundreds of machines.
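The minitransaction idea can be illustrated with a small single-process sketch. It assumes the compare/read/write item structure from the published Sinfonia paper, which this abstract does not spell out. The names `MemoryNode` and `exec_minitransaction` are hypothetical, and the sketch omits distribution, the commit protocol, and failure handling entirely.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Hypothetical stand-in for a memory node: one linear address space."""
    size: int
    data: bytearray = field(init=False)

    def __post_init__(self):
        self.data = bytearray(self.size)


def exec_minitransaction(nodes, compares=(), reads=(), writes=()):
    """Apply a minitransaction to a dict of node_id -> MemoryNode.

    compares: (node_id, addr, expected_bytes)
    reads:    (node_id, addr, length)
    writes:   (node_id, addr, new_bytes)
    Returns (committed, read_results). Nothing is written unless every
    compare item matches.
    """
    for node_id, addr, expected in compares:
        current = bytes(nodes[node_id].data[addr:addr + len(expected)])
        if current != expected:
            return False, []  # abort: a compare item failed
    results = [bytes(nodes[n].data[a:a + length]) for n, a, length in reads]
    for node_id, addr, new in writes:
        nodes[node_id].data[addr:addr + len(new)] = new
    return True, results
```

In this sketch, a compare-and-swap spanning two nodes is a single call: one compare item on the first node plus write items on both nodes. A failed compare leaves every node unchanged.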
NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, Non-Volatile Memories
Cited by 61 (4 self)
Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent. We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps outperform BerkeleyDB and Stasis implementations by 32× and 244×, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.
Operating System Transactions
2008
Cited by 47 (9 self)
Operating systems should provide system transactions to user applications, in which user-level processes execute a series of system calls atomically and in isolation from other processes on the system. System transactions provide a simple tool for programmers to express safety conditions during concurrent execution. This paper describes TxOS, a variant of Linux 2.6.22, which is the first operating system to implement system transactions on commodity hardware with strong isolation and fairness between transactional and non-transactional system calls. System transactions provide a simple and expressive interface for user programs to avoid race conditions on system resources. For instance, system transactions eliminate time-of-check-to-time-of-use (TOCTTOU) race conditions in the file system, a class of security vulnerability that is difficult to eliminate with other techniques. System transactions also provide transactional semantics for user-level transactions that require system resources, allowing applications that use hardware or software transactional memory systems to safely make system calls. While system transactions may reduce single-thread performance, they can yield more scalable performance. For example, enclosing link and unlink within a system transaction outperforms rename on Linux by 14% at 8 CPUs.
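The TOCTTOU race mentioned above can be reproduced deterministically in a small sketch. It uses only standard Python calls (`os.access`, `open`, `os.replace`). The `between` parameter is a hypothetical hook standing in for a concurrent process that runs inside the window between the check and the use. It is not part of TxOS or of any real API.

```python
import os
import tempfile


def check_then_open(path, between=lambda: None):
    """Racy pattern: the access check and the open are separate system calls."""
    if not os.access(path, os.R_OK):   # time of check
        return None
    between()                          # stand-in for a concurrent process (hypothetical hook)
    with open(path, "rb") as f:        # time of use
        return f.read()


def demo():
    """Swap the file at `target` after the check but before the open."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "report.txt")
        other = os.path.join(d, "other.txt")
        with open(target, "wb") as f:
            f.write(b"checked contents")
        with open(other, "wb") as f:
            f.write(b"swapped contents")
        # The "attacker" renames another file over the checked path.
        return check_then_open(target, between=lambda: os.replace(other, target))
```

Here `demo()` returns the swapped file's bytes, not the contents that were checked. If the check and the open ran inside one system transaction as the abstract describes, the rename could not interleave between them.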
Boom analytics: exploring data-centric, declarative programming for the cloud
In EuroSys, 2010
Cited by 47 (4 self)
Building and debugging distributed software remains extremely difficult. We conjecture that by adopting a data-centric approach to system design and by employing declarative programming languages, a broad range of distributed software can be recast naturally in a data-parallel programming model. Our hope is that this model can significantly raise the level of abstraction for programmers, improving code simplicity, speed of development, ease of software evolution, and program correctness. This paper presents our experience with an initial large-scale experiment in this direction. First, we used the Overlog language to implement a “Big Data” analytics stack that is API-compatible with Hadoop and HDFS and provides comparable performance. Second, we extended the system with complex distributed features not yet available in Hadoop, including high availability, scalability, and unique monitoring and debugging facilities. We present both quantitative and anecdotal results from our experience, providing some concrete evidence that both data-centric design and declarative languages can substantially simplify distributed systems programming.
Transactional flash
In Proc. Symposium on Operating Systems Design and Implementation (OSDI), 2008
Cited by 32 (0 self)
Transactional flash (TxFlash) is a novel solid-state drive (SSD) that uses flash memory and exports a transactional interface (WriteAtomic) to the higher-level software. The copy-on-write nature of the flash translation layer and the fast random access make flash memory the right medium to support such an interface. We further develop a novel commit protocol called cyclic commit for TxFlash; the protocol has been specified formally and model checked. Our evaluation, both on a simulator and an emulator on top of a real SSD, shows that TxFlash does not increase the flash firmware complexity significantly and provides transactional features with very small overheads (less than 1%), thereby making file systems easier to build. It further shows that the new cyclic commit protocol significantly outperforms traditional commit for small transactions (95% improvement in transaction throughput) and completely eliminates the space overhead due to commit records.
Architecture of a Database System
Cited by 25 (0 self)
Database Management Systems (DBMSs) are a ubiquitous and critical component of modern computing, and the result of decades of research and development in both academia and industry. Historically, DBMSs were among the earliest multi-user server systems to be developed, and thus pioneered many systems design techniques for scalability and reliability now in use in many other contexts. While many of the algorithms and abstractions used by a DBMS are textbook material, there has been relatively sparse coverage in the literature of the systems design issues that make a DBMS work. This paper presents an architectural discussion of DBMS design principles, including process models, parallel architecture, storage system design, transaction system implementation, query processor and optimizer architectures, and typical shared components and utilities. Successful commercial and open-source systems are used as points of reference, particularly when multiple alternative designs have been adopted by different groups.
Enabling transactional file access via lightweight kernel extensions
In Proc. 7th USENIX Conference on File and Storage Technologies (FAST ’09), 2009
Cited by 24 (4 self)
Transactions offer a powerful data-access method used in many databases today through a specialized query API. User applications, however, use a different file-access API (POSIX) which does not offer transactional guarantees. Applications using transactions can become simpler, smaller, easier to develop and maintain, more reliable, and more secure. We explored several techniques for providing transactional file access with minimal impact on existing programs. Our first prototype was a standalone kernel component within the Linux kernel, but it complicated the kernel considerably and duplicated some of Linux’s existing facilities. Our second prototype was all in user level, and while it was easier to develop, it suffered from high overheads. In this paper we describe our latest prototype and the evolution that led to it. We implemented a transactional file API inside the Linux kernel which integrates easily and seamlessly with existing kernel facilities. This design is easier to maintain, simpler to integrate into existing OSs, and efficient. We evaluated our prototype and other systems under a variety of workloads. We demonstrate that our prototype’s performance is better than comparable systems and comes close to the theoretical lower bound for a log-based transaction manager.
I Do Declare: Consensus in a Logic Language
ACM SIGOPS Operating Systems Review, 2010
Cited by 17 (5 self)
The Paxos consensus protocol can be specified concisely, but is notoriously difficult to implement in practice. We recount our experience building Paxos in Overlog, a distributed declarative programming language. We found that the Paxos algorithm is easily translated to declarative logic, in large part because the primitives used in consensus protocol specifications map directly to simple Overlog constructs such as aggregation and selection. We discuss the programming idioms that appear frequently in our implementation, and the applicability of declarative programming to related application domains.
Modular Data Storage with Anvil
Cited by 17 (0 self)
Databases have achieved orders-of-magnitude performance improvements by changing the layout of stored data – for instance, by arranging data in columns or compressing it before storage. These improvements have been implemented in monolithic new engines, however, making it difficult to experiment with feature combinations or extensions. We present Anvil, a modular and extensible toolkit for building database back ends. Anvil’s storage modules, called dTables, have much finer granularity than prior work. For example, some dTables specialize in writing data, while others provide optimized read-only formats. This specialization makes both kinds of dTable simple to write and understand. Unifying dTables implement more comprehensive functionality by layering over other dTables – for instance, building a read/write store from read-only tables and a writable journal, or building a general-purpose store from optimized special-purpose stores. The dTable design leads to a flexible system powerful enough to implement many database storage layouts. Our prototype implementation of Anvil performs up to 5.5 times faster than an existing B-tree-based database back end on conventional workloads, and can easily be customized for further gains on specific data and workloads.
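The layering idea in this abstract, a read/write store built from a read-only table and a writable journal, can be sketched in a few lines. All class and method names below (`ReadOnlyTable`, `JournalTable`, `OverlayTable`, `digest`) are illustrative assumptions and not Anvil's actual API. The sketch keeps everything in memory and ignores on-disk formats and durability.

```python
_TOMBSTONE = object()  # marks a key removed in the journal
_MISSING = object()    # marks a key the journal has never seen


class ReadOnlyTable:
    """Immutable key-value table, standing in for an optimized read-only format."""

    def __init__(self, items):
        self._items = dict(items)

    def get(self, key, default=None):
        return self._items.get(key, default)

    def items(self):
        return self._items.items()


class JournalTable:
    """Write-specialized table: records the latest update per key."""

    def __init__(self):
        self._log = {}

    def put(self, key, value):
        self._log[key] = value

    def remove(self, key):
        self._log[key] = _TOMBSTONE

    def lookup(self, key):
        return self._log.get(key, _MISSING)

    def items(self):
        return self._log.items()


class OverlayTable:
    """Unifying layer: journal entries shadow the read-only base."""

    def __init__(self, base, journal):
        self.base, self.journal = base, journal

    def put(self, key, value):
        self.journal.put(key, value)

    def remove(self, key):
        self.journal.remove(key)

    def get(self, key, default=None):
        value = self.journal.lookup(key)
        if value is _MISSING:
            return self.base.get(key, default)
        return default if value is _TOMBSTONE else value

    def digest(self):
        """Fold the journal into a fresh read-only base and start a new journal."""
        merged = dict(self.base.items())
        for key, value in self.journal.items():
            if value is _TOMBSTONE:
                merged.pop(key, None)
            else:
                merged[key] = value
        self.base, self.journal = ReadOnlyTable(merged), JournalTable()
```

Each piece stays simple because it does one job: the base never changes, the journal only records updates, and the overlay decides which layer answers a lookup.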
Rose: Compressed, log-structured replication
Cited by 14 (2 self)
Rose is a database storage engine for high-throughput replication. It targets seek-limited, write-intensive transaction processing workloads that perform near real-time decision support and analytical processing queries. Rose uses log-structured merge (LSM) trees to create full database replicas using purely sequential I/O, allowing it to provide orders of magnitude more write throughput than B-tree-based replicas. Also, LSM-trees cannot become fragmented and provide fast, predictable index scans. Rose’s write performance relies on replicas’ ability to perform writes without looking up old values. LSM-tree lookups have performance comparable to B-tree lookups. If Rose read each value that it updated, then its write throughput would also be comparable to a B-tree. Although we target replication, Rose provides high write throughput to any application