Results 1 - 10
of
54
PNUTS: Yahoo!’s hosted data serving platform
- IN PROC. 34TH VLDB
, 2008
"... We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistenc ..."
Abstract
-
Cited by 60 (2 self)
- Add to MetaCart
We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.
ZooKeeper: Wait-free Coordination for Internet-scale Systems
- In USENIX Annual Technical Conference
"... In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates ..."
Abstract
-
Cited by 31 (2 self)
- Add to MetaCart
In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by Zoo-Keeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service. The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high performance processing pipeline with read requests being satisfied by local servers. We show for the target workloads, 2:1 to 100:1 read to write ratio, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications. 1
Large-scale Incremental Processing Using Distributed Transactions and Notifications
- 9th USENIX Symposium on Operating Systems Design and Implementation
"... Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These ta ..."
Abstract
-
Cited by 26 (0 self)
- Add to MetaCart
Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google’s indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency. We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%. 1
OLTP Through the Looking Glass, and What We Found There
"... Online Transaction Processing (OLTP) databases include a suite of features — disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading — that were optimized for computer technology of the late 1970’s. Advances in modern processors, memories, and networks me ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
Online Transaction Processing (OLTP) databases include a suite of features — disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading — that were optimized for computer technology of the late 1970’s. Advances in modern processors, memories, and networks mean that today’s computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little. Based on this observation, we look at some interesting variants of conventional database systems that one might build that exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20x in raw performance. We also show that there is no single “high pole in the tent ” in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations. Categories and Subject Descriptors H.2.4 [Database Management]: Systems — transaction processing; concurrency.
Abbadi, “G-Store: A Scalable Data Store for Transactional Multi key
- Access in the Cloud,” in SOCC, 2010
"... Cloud computing has emerged as a preferred platform for deploying scalable web-applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud infrastructure. Key-Value stores – such as Bigtable, PNUTS, D ..."
Abstract
-
Cited by 15 (7 self)
- Add to MetaCart
Cloud computing has emerged as a preferred platform for deploying scalable web-applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud infrastructure. Key-Value stores – such as Bigtable, PNUTS, Dynamo, and their open source analogues – have been the preferred data stores for applications in the cloud. In these systems, data is represented as Key-Value pairs, and atomic access is provided only at the granularity of single keys. While these properties work well for current applications, they are insufficient for the next generation web applications – such as online gaming, social networks, collaborative editing, and many more – which emphasize collaboration. Since collaboration by definition requires consistent access to groups of keys, scalable and consistent multi key access is critical for such applications. We propose the Key Group abstraction that defines a relationship between a group of keys and is the granule for on-demand transactional access. This abstraction allows the Key Grouping protocol to collocate control for the keys in the group to allow efficient access to the group of keys. Using the Key Grouping protocol, we design and implement G-Store which uses a key-value store as an underlying substrate to provide efficient, scalable, and transactional multi key access. Our implementation using a standard key-value store and experiments using a cluster of commodity machines show that G-Store preserves the desired properties of key-value stores, while providing multi key access functionality at a very low overhead.
Operating System Transactions
, 2008
"... Operating systems should provide system transactions to user applications, in which user-level processes execute a series of system calls atomically and in isolation from other processes on the system. System transactions provide a simple tool for programmers to express safety conditions during conc ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
Operating systems should provide system transactions to user applications, in which user-level processes execute a series of system calls atomically and in isolation from other processes on the system. System transactions provide a simple tool for programmers to express safety conditions during concurrent execution. This paper describes TxOS, a variant of Linux 2.6.22, which is the first operating system to implement system transactions on commodity hardware with strong isolation and fairness between transactional and non-transactional system calls. System transactions provide a simple and expressive interface for user programs to avoid race conditions on system resources. For instance, system transactions eliminate time-of-check-to-time-of-use (TOCTTOU) race conditions in the file system which are a class of security vulnerability that are difficult to eliminate with other techniques. System transactions also provide transactional semantics for user-level transactions that require system resources, allowing applications using hardware or software transactional memory system to safely make system calls. While system transactions may reduce single-thread performance, they can yield more scalable performance. For example, enclosing link and unlink within a system transaction outperforms rename on Linux by 14 % at 8 CPUs.
Object Storage on CRAQ High-throughput chain replication for read-mostly workloads
"... Massive storage systems typically replicate and partition data over many potentially-faulty components to provide both reliability and scalability. Yet many commerciallydeployed systems, especially those designed for interactive use by customers, sacrifice stronger consistency properties in the desi ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
Massive storage systems typically replicate and partition data over many potentially-faulty components to provide both reliability and scalability. Yet many commerciallydeployed systems, especially those designed for interactive use by customers, sacrifice stronger consistency properties in the desire for greater availability and higher throughput. This paper describes the design, implementation, and evaluation of CRAQ, a distributed object-storage system that challenges this inflexible tradeoff. Our basic approach, an improvement on Chain Replication, maintains strong consistency while greatly improving read throughput. By distributing load across all object replicas, CRAQ scales linearly with chain size without increasing consistency coordination. At the same time, it exposes noncommitted operations for weaker consistency guarantees when this suffices for some applications, which is especially useful under periods of high system churn. This paper explores additional design and implementation considerations for geo-replicated CRAQ storage across multiple datacenters to provide locality-optimized operations. We also discuss multi-object atomic updates and multicast optimizations for large-object updates. 1
ElasTraS: An Elastic, Scalable, and Self Managing Transactional Database for the Cloud
"... Cloud computing has emerged as a pervasive platform for deploying scalable and highly available Internet applications. To facilitate the migration of data-driven applications to the cloud: elasticity, scalability, fault-tolerance, and self-manageability (henceforth referred to as cloud features) are ..."
Abstract
-
Cited by 12 (10 self)
- Add to MetaCart
Cloud computing has emerged as a pervasive platform for deploying scalable and highly available Internet applications. To facilitate the migration of data-driven applications to the cloud: elasticity, scalability, fault-tolerance, and self-manageability (henceforth referred to as cloud features) are fundamental requirements for database management systems (DBMS) driving such applications. Even though extremely successful in the traditional enterprise setting – the high cost of commercial relational database software, and the lack of the desired cloud features in the open source counterparts – relational databases (RDBMS) are not a competitive choice for cloud-bound applications. As a result, Key-Value stores have emerged as a preferred choice for scalable and faulttolerant data management, but lack the rich functionality, and transactional guarantees of RDBMS. We present ElasTraS, an Elastic TranSactional relational database, designed to scale out using a cluster of commodity machines while being fault-tolerant and self managing. ElasTraS is designed to support both classes of database needs for the cloud: (i) large databases partitioned across a set of nodes, and (ii) a large number of small and independent databases common in multi-tenant databases. ElasTraS borrows from the design philosophy of scalable Key-Value stores to minimize distributed synchronization and remove scalability bottlenecks, while leveraging decades of research on transaction processing, concurrency control, and recovery to support rich functionality and transactional guarantees. We present the design of ElasTraS, implementation details of our initial prototype system, and experimental results executing the TPC-C benchmark.
DiSTM: A software transactional memory framework for clusters
- In Proc. of the International Conference on Parallel Processing (ICPP
, 2008
"... While Transactional Memory (TM) research on sharedmemory chip multiprocessors has been flourishing over the last years, limited research has been conducted in the cluster domain. In this paper, we introduce a research platform for exploiting software TM on clusters. The Distributed Software Transact ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
While Transactional Memory (TM) research on sharedmemory chip multiprocessors has been flourishing over the last years, limited research has been conducted in the cluster domain. In this paper, we introduce a research platform for exploiting software TM on clusters. The Distributed Software Transactional Memory (DiSTM) system has been designed for easy prototyping of TM coherence protocols and it does not rely on a software or hardware implementation of distributed shared memory. Three TM coherence protocols have been implemented and evaluated with established TM benchmarks. The decentralized Transactional Coherence and Consistency protocol has been compared against two centralized protocols that utilize leases. Results indicate that depending on network congestion and amount of contention different protocols perform better. 1.
Fabric: A platform for secure distributed computation and storage
- In Proc. ACM Symposium on Operating Systems Principles
, 2009
"... Fabric is a new system and language for building secure distributed information systems. It is a decentralized system that allows heterogeneous network nodes to securely share both information and computation resources despite mutual distrust. Its high-level programming language makes distribution a ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
Fabric is a new system and language for building secure distributed information systems. It is a decentralized system that allows heterogeneous network nodes to securely share both information and computation resources despite mutual distrust. Its high-level programming language makes distribution and persistence largely transparent to programmers. Fabric supports data-shipping and function-shipping styles of computation: both computation and information can move between nodes to meet security requirements or to improve performance. Fabric provides a rich, Java-like object model, but data resources are labeled with confidentiality and integrity policies that are enforced through a combination of compile-time and run-time mechanisms. Optimistic, nested transactions ensure consistency across all objects and nodes. A peer-to-peer dissemination layer helps to increase availability and to balance load. Results from applications built using Fabric suggest that Fabric has a clean, concise programming model, offers good performance, and enforces security. 1

