Optimal Inter-Object Correlation When Replicating for Availability
, 2008
Cited by 3 (0 self)
Data replication is a key technique for ensuring data availability. Traditionally, researchers have focused on the availability of individual objects, even though user-level tasks (called operations) typically request multiple objects. Our recent experimental study has shown that the assignment of object replicas to machines results in subtle yet dramatic effects on the availability of these operations, even though the availability of individual objects remains the same. This paper is the first to approach the assignment problem from a theoretical perspective, and obtains a series of results regarding assignments that provide the best and the worst availability for user-level operations. We use a range of techniques to obtain our results, from standard combinatorial techniques and hill climbing methods to Janson’s inequality (a strong probabilistic tool). Some of the results demonstrate that even quite simple versions of the assignment problem can have surprising answers.
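The claim that replica assignment changes operation availability while per-object availability stays fixed can be illustrated with a small calculation. The sketch below is not the paper's model: it assumes independent machine failures with a made-up failure probability, two replicas per object, and an operation that succeeds only if every requested object has at least one live replica. The function name and the two example assignments are hypothetical.

```python
from itertools import product


def operation_availability(assignment, num_machines, p_fail):
    """Probability that every object has at least one live replica.

    assignment: list of tuples, one per object, naming the machines
    that hold its replicas. Machines fail independently (assumption).
    """
    total = 0.0
    # Enumerate every up/down combination of the machines.
    for state in product([True, False], repeat=num_machines):
        prob = 1.0
        for up in state:
            prob *= (1 - p_fail) if up else p_fail
        # The operation succeeds only if all objects are reachable.
        if all(any(state[m] for m in replicas) for replicas in assignment):
            total += prob
    return total


# Two objects, two replicas each, on a pool of four machines.
co_located = [(0, 1), (0, 1)]  # both objects share the same machine pair
scattered = [(0, 1), (2, 3)]   # each object has its own machine pair

single_object = operation_availability([(0, 1)], 4, 0.1)
together = operation_availability(co_located, 4, 0.1)
apart = operation_availability(scattered, 4, 0.1)
```

Under these assumptions each object alone is available with probability 0.99 in both layouts, yet the two-object operation succeeds with probability 0.99 when the replicas are co-located and 0.9801 when they are scattered. Operations that can tolerate missing objects may favor a different layout, which is part of what makes the assignment problem non-trivial.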
Using Data Accessibility for Resource Selection in Large-Scale Distributed Systems
, 2009
Cited by 2 (2 self)
Large-scale distributed systems provide an attractive scalable infrastructure for network applications. However, the loosely coupled nature of this environment can make data access unpredictable, and in the limit, unavailable. We introduce the notion of accessibility to capture both availability and performance. An increasing number of data-intensive applications require not only considerations of node computation power but also accessibility for adequate job allocations. For instance, selecting a node with intolerably slow connections can offset any benefit to running on a fast node. In this paper, we present accessibility-aware resource selection techniques by which it is possible to choose nodes that will have efficient data access to remote data sources. We show that the local data access observations collected from a node’s neighbors are sufficient to characterize accessibility for that node. By conducting trace-based, synthetic experiments on PlanetLab, we show that the resource selection heuristics guided by this principle significantly outperform conventional techniques such as latency-based or random allocations. The suggested techniques are also shown to be stable even under churn despite the loss of prior observations.
Defragmenting DHT-based Distributed File Systems
Cited by 1 (1 self)
Existing DHT-based file systems use consistent hashing to assign file blocks to random machines. As a result, a user task accessing an entire file or multiple files needs to retrieve blocks from many different machines. This paper demonstrates that significant availability and performance gains can be achieved if, instead, users are able to retrieve all the data needed for a given task from only a few DHT nodes. We explore the design and implications of such a “defragmented” DHT-based distributed file system, called D2, that also maintains important DHT properties like storage load balance. We show using real-world file system traces that a simple key encoding scheme is sufficient to maintain good defragmentation for most user tasks. Using both simulation and an actual 1,000 node deployment, we show that D2 increases availability by over an order of magnitude and improves user-perceived latency by 30–100% compared to a traditional design.
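The contrast between hashed and locality-preserving block keys can be sketched as follows. This is an illustration only and not D2's actual key encoding: the 32-bit key space, the equal-range node mapping, the node count, and all function names are assumptions made for the example.

```python
import hashlib

NUM_NODES = 100     # hypothetical cluster size
KEY_SPACE = 2 ** 32  # hypothetical 32-bit key space


def node_for_key(key):
    """Map a key to a node; each node owns one contiguous key range."""
    return key * NUM_NODES // KEY_SPACE


def hashed_key(path, block):
    """Traditional style: hash path and block index together."""
    digest = hashlib.sha1(f"{path}:{block}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def locality_key(path, block):
    """Illustrative encoding: high 16 bits identify the file,
    low 16 bits are the block index, so a file's blocks are adjacent."""
    digest = hashlib.sha1(path.encode()).digest()
    prefix = int.from_bytes(digest[:2], "big")
    return (prefix << 16) | (block & 0xFFFF)


def nodes_touched(key_fn, path, num_blocks):
    """Count the distinct nodes a task must contact to read the file."""
    return len({node_for_key(key_fn(path, b)) for b in range(num_blocks)})


hashed_count = nodes_touched(hashed_key, "/home/alice/report.tex", 64)
local_count = nodes_touched(locality_key, "/home/alice/report.tex", 64)
```

With the locality-preserving keys, the 64 blocks occupy 64 consecutive keys and therefore land on at most two nodes, whereas the hashed keys spread them over many nodes. A task that needs every block succeeds only if all of those nodes are reachable, which is the intuition behind the availability gain described in the abstract.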
An Analytical Framework and Its Applications for Studying Brick Storage Reliability
Cited by 1 (0 self)
The reliability of a large-scale storage system is influenced by a complex set of inter-dependent factors. This paper presents a comprehensive and extensible analytical framework that offers quantitative answers to many design tradeoffs. We apply the framework to a number of important design strategies that a designer and/or administrator must face in reality, including topology-aware replica placement and proactive replication that uses small background network bandwidth and unused disk space to create additional copies. We also quantify the impact of slow (but potentially more accurate) failure detection and lazy replacement of failed disks. We use detailed simulation to verify and refine our analytical model. These results demonstrate the versatility of the framework and serve as a solid step towards more quantitative studies of fundamental system tradeoffs between reliability, performance, and cost in large-scale distributed storage systems.
Towards Optimal Resource Allocation in Partial-Fault Tolerant Applications
We introduce Zen, a new resource allocation framework that assigns application components to node clusters to achieve high availability for partial-fault tolerant (PFT) applications. These applications have the characteristic that under partial failures, they can still produce useful output though the output quality may be reduced. Thus, the primary goal of resource allocation for PFT applications is to prevent, delay, or minimize the impact of failures on the application output quality. This paper is the first to approach this resource allocation problem from a theoretical perspective, and obtains a series of results regarding component assignments that provide the highest service availability under the constraints imposed by the application data flow graph and the hosting clusters. We show that (1) even simple versions of this resource allocation problem are NP-Hard, (2) a 2-approximate polynomial-time algorithm works for tree topologies, and (3) a simple greedy component placement performs well in practice for general application topologies. We implement a system prototype to study the application availability achieved by Zen compared to failure-oblivious placement, replication, and Zen+replication. Our experimental results show that three PFT applications achieve significant data output quality and availability benefits using Zen.
Brief Announcement: Object Replication Degree Customization for High Availability
Object replication is commonly employed to enhance the availability of data-intensive services. As far as we know, existing availability-oriented replication schemes are oblivious to object request popularities when determining object replication degrees. However,
Replica Placement for High Availability in Distributed Stream Processing Systems
A significant number of emerging on-line data analysis applications require the processing of data streams, large amounts of data that get updated continuously, to generate outputs of interest or to identify meaningful events. Example domains include network traffic management, stock price monitoring, customized e-commerce websites, and analysis of sensor data. In this paper we look at the problem of high availability in such a distributed stream processing system. By taking into account the particular characteristics of stream processing applications we first identify design principles for a replica placement algorithm for high availability. We incorporate these principles in a decentralized replica placement protocol that aims to maximize availability, while respecting resource constraints, and making performance-aware placement decisions. We have integrated our replica placement protocol in Synergy, our distributed stream processing middleware. Our experimental comparison over PlanetLab with the current state of the art corroborates our claims that our techniques maximize availability while sustaining good performance.
by
, 2008
I would like to thank my advisor, Prof. Vana Kalogeraki, for the inspiration and guidance, motivation and support she has offered me. I would also like to thank Prof. Xiaohui Gu for her sharp guidance and inspiring attitude. I am also grateful to my committee members, Prof. Dimitrios Gunopulos and Prof. Michalis Faloutsos, for their time and their insightful feedback. I am especially thankful to all my mentors, Dr. Arun Iyengar and Dr. Isabelle Rouvellou from IBM Research, Dr. Michael Kaminsky and Dr. Haifeng Yu from Intel Research, Dr. Debby Levinson and Chris Stroberger from Hewlett-Packard, and Dr. Eric Burger from BEA, for their time and guidance. I would also like to thank the rest of the members of the Distributed Real-Time Systems lab and the anonymous reviewers of [67,68,70–72] for their comments. Finally, I would like to thank my friends and above all my family for their support throughout these years. When the night is at its deepest, the day is at its nearest.

