Results 1 - 10 of 14
IOFlow: A Software-Defined Storage Architecture
"... In data centers, the IO path to storage is long and complex. It comprises many layers or “stages ” with opaque interfaces between them. This makes it hard to enforce end-to-end policies that dictate a storage IO flow’s performance (e.g., guarantee a tenant’s IO bandwidth) and routing (e.g., route an ..."
Abstract
-
Cited by 14 (5 self)
In data centers, the IO path to storage is long and complex. It comprises many layers or “stages” with opaque interfaces between them. This makes it hard to enforce end-to-end policies that dictate a storage IO flow’s performance (e.g., guarantee a tenant’s IO bandwidth) and routing (e.g., route an untrusted VM’s traffic through a sanitization middlebox). These policies require IO differentiation along the flow path and global visibility at the control plane. We design IOFlow, an architecture that uses a logically centralized control plane to enable high-level flow policies. IOFlow adds a queuing abstraction at data-plane stages and exposes this to the controller. The controller can then translate policies into queuing rules at individual stages. It can also choose among multiple stages for policy enforcement. We have built the queue and control functionality at two key OS stages: the storage drivers in the hypervisor and the storage server. IOFlow does not require application or VM changes, a key strength for deployability. We have deployed a prototype across a small testbed with a 40 Gbps network and storage devices. We have built control applications that enable a broad class of multipoint flow policies that are hard to achieve today.
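A minimal sketch of the control-plane idea this abstract describes, assuming a hypothetical FlowPolicy/QueueRule schema and token-bucket enforcement (not IOFlow's actual API): a logically centralized controller turns one high-level bandwidth policy into matching queuing rules at each data-plane stage.

```python
# Illustrative sketch only: stage names, rule fields, and token-bucket
# enforcement are assumptions, not IOFlow's real interfaces.
from dataclasses import dataclass

@dataclass
class FlowPolicy:
    vm: str             # source VM of the IO flow
    share: str          # destination storage share
    min_mbps: float     # guaranteed bandwidth for the flow

@dataclass
class QueueRule:
    stage: str          # data-plane stage that enforces the rule
    match: tuple        # (vm, share) identifying the flow's IOs
    rate_mbps: float    # token-bucket rate applied to the queue

class Controller:
    def __init__(self, stages):
        self.stages = stages   # e.g. ["hypervisor-driver", "storage-server"]

    def translate(self, policy: FlowPolicy):
        # Install a matching queue at every stage; a controller could instead
        # pick a single enforcement stage along the flow path.
        return [QueueRule(stage=s,
                          match=(policy.vm, policy.share),
                          rate_mbps=policy.min_mbps)
                for s in self.stages]

if __name__ == "__main__":
    ctl = Controller(["hypervisor-driver", "storage-server"])
    for rule in ctl.translate(FlowPolicy("vm-17", "//server/shareX", 200.0)):
        print(rule)
```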
Limplock: Understanding the Impact of Limpware on Scale-Out Cloud Systems
"... We highlight one often-overlooked cause of performance failure: limpware – “limping ” hardware whose performance degrades significantly compared to its specification. We report anecdotes of degraded disks and network components seen in large-scale production. To measure the system-level impact of li ..."
Abstract
-
Cited by 6 (2 self)
We highlight one often-overlooked cause of performance failure: limpware – “limping” hardware whose performance degrades significantly compared to its specification. We report anecdotes of degraded disks and network components seen in large-scale production. To measure the system-level impact of limpware, we assembled limpbench, a set of benchmarks that combine data-intensive load and limpware injections. We benchmark five cloud systems (Hadoop, HDFS, ZooKeeper, Cassandra, and HBase) and find that limpware can severely impact distributed operations, nodes, and an entire cluster. From this, we introduce the concept of limplock, a situation where a system progresses slowly due to the presence of limpware and is not capable of failing over to healthy components. We show how each cloud system that we analyze can exhibit operation, node, and cluster limplock. We conclude that many cloud systems are not limpware tolerant.
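As an illustration of the limpware notion only (not part of limpbench), a sketch that flags a degraded-but-alive component by comparing observed throughput against its specification; the spec table and the 10x degradation threshold are assumptions.

```python
# Hypothetical spec sheet and threshold, for illustration only.
SPEC_MBPS = {"disk-3": 120.0, "nic-1": 1000.0}

def is_limping(component: str, observed_mbps: float, factor: float = 10.0) -> bool:
    """A component 'limps' if it runs far below spec yet has not failed outright."""
    spec = SPEC_MBPS[component]
    return 0.0 < observed_mbps < spec / factor

# A degraded-but-alive disk never triggers ordinary failover, so an operation,
# a node, or a whole cluster can enter limplock behind it.
print(is_limping("disk-3", 5.0))    # True  -> candidate for proactive failover
print(is_limping("disk-3", 110.0))  # False -> healthy
```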
Natjam: Design and Evaluation of Eviction Policies For Supporting Priorities and Deadlines in Mapreduce Clusters
"... This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for Mapreduce clusters that are resource-constrained. Our contributions include: i) exploration and evaluation of smart eviction policies for jobs and for tasks, based on ..."
Abstract
-
Cited by 5 (1 self)
This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for Mapreduce clusters that are resource-constrained. Our contributions include: i) exploration and evaluation of smart eviction policies for jobs and for tasks, based on resource usage, task runtime, and job deadlines; and ii) a work-conserving task preemption mechanism for Mapreduce. We incorporated Natjam into the Hadoop YARN scheduler framework (in Hadoop 0.23). We present experiments from deployments on a test cluster, on Emulab, and on a Yahoo! commercial cluster, using both synthetic workloads and Hadoop cluster traces from Yahoo!. Our results reveal that Natjam incurs overheads as low as 7% and is preferable to existing approaches.
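To make the eviction-policy idea concrete, a hedged sketch of two victim-selection rules in the spirit of the abstract, using hypothetical task fields (job deadline, elapsed runtime); this is illustrative, not Natjam's Hadoop YARN code.

```python
from dataclasses import dataclass

@dataclass
class Task:
    job: str
    deadline: float   # absolute deadline of the owning job (seconds)
    elapsed: float    # how long the task has already run (seconds)

def victim_latest_deadline(tasks):
    # Evict from the job whose deadline is furthest away; it can best absorb delay.
    return max(tasks, key=lambda t: t.deadline)

def victim_shortest_runtime(tasks):
    # Evict the task that has done the least work, minimizing wasted computation.
    return min(tasks, key=lambda t: t.elapsed)

running = [Task("jobA", deadline=300, elapsed=40),
           Task("jobB", deadline=900, elapsed=5)]
print(victim_latest_deadline(running).job)    # jobB
print(victim_shortest_runtime(running).job)   # jobB
```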
PriorityMeister: Tail Latency QoS for Shared Networked Storage
"... Meeting service level objectives (SLOs) for tail latency is an important and challenging open problem in cloud computing infrastructures. The challenges are exacerbated by burstiness in the workloads. This paper describes PriorityMeister – a sys-tem that employs a combination of per-workload priorit ..."
Abstract
-
Cited by 3 (0 self)
Meeting service level objectives (SLOs) for tail latency is an important and challenging open problem in cloud computing infrastructures. The challenges are exacerbated by burstiness in the workloads. This paper describes PriorityMeister – a system that employs a combination of per-workload priorities and rate limits to provide tail latency QoS for shared networked storage, even with bursty workloads. PriorityMeister automatically and proactively configures workload priorities and rate limits across multiple stages (e.g., a shared storage stage followed by a shared network stage) to meet end-to-end tail latency SLOs. In real system experiments and under production trace workloads, PriorityMeister outperforms most recent reactive request scheduling approaches, with more workloads satisfying latency SLOs at higher latency percentiles. PriorityMeister is also robust to mis-estimation of underlying storage device performance and contains the effect of misbehaving workloads.
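A simplified sketch of the priority-plus-rate-limit combination described above, assuming a per-workload token bucket and strict-priority dispatch; the structure is illustrative, not PriorityMeister's implementation.

```python
import time

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def try_take(self):
        # Refill in proportion to elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class Dispatcher:
    def __init__(self):
        self.workloads = []   # list of (priority, token_bucket, queue)

    def add(self, priority, rate, burst):
        self.workloads.append((priority, TokenBucket(rate, burst), []))
        self.workloads.sort(key=lambda w: w[0])   # lower number = higher priority

    def submit(self, priority, request):
        for p, _, q in self.workloads:
            if p == priority:
                q.append(request)
                return

    def next_request(self):
        # Serve the highest-priority workload that has a queued request
        # and an available token; the rate limit bounds its burstiness.
        for _, bucket, queue in self.workloads:
            if queue and bucket.try_take():
                return queue.pop(0)
        return None
```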
From application requests to Virtual IOPs: Provisioned key-value storage with Libra
"... Achieving predictable performance in shared cloud storage services is hard. Tenants want reservations in terms of system-wide application-level throughput, but the provider must ultimately deal with low-level IO resources at each storage node where contention arises. Such a guarantee has thus proven ..."
Abstract
-
Cited by 3 (0 self)
Achieving predictable performance in shared cloud storage services is hard. Tenants want reservations in terms of system-wide application-level throughput, but the provider must ultimately deal with low-level IO resources at each storage node where contention arises. Such a guarantee has thus proven elusive, due to the complexities inherent to modern storage stacks: non-uniform IO amplification, unpredictable IO interference, and non-linear IO performance. This paper presents Libra, a local IO scheduling framework designed for a shared SSD-backed key-value storage system. Libra guarantees per-tenant application-request throughput while achieving high utilization. To accomplish this, Libra leverages two techniques. First, Libra tracks the IO resource consumption of a tenant’s application-level requests across complex storage stack interactions, down to low-level IO operations. This allows Libra to allocate per-tenant IO resources for achieving app-request reservations based on their dynamic IO usage profile. Second, Libra uses a disk-IO cost model based on virtual IO operations (VOP) that captures the non-linear relationship between SSD IO bandwidth and IO operation (IOP) throughput. Using VOPs, Libra can both account for the true cost of an IOP and determine the amount of provisionable IO resources available under IO interference. An evaluation shows that Libra, when applied to a LevelDB-based prototype with SSD-backed storage, satisfies tenant app-request reservations and achieves accurate low-level VOP allocations over a range of workloads, while still supporting high utilization.
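To illustrate the VOP idea, a sketch of a virtual-IOP cost function and per-tenant accounting; the constants and the piecewise-linear shape are assumptions for illustration, not Libra's calibrated SSD model.

```python
BASE_IO_BYTES = 4096          # IOs up to this size cost one VOP (assumed)
BYTES_PER_EXTRA_VOP = 16384   # each additional 16 KiB adds one VOP (assumed)

def vop_cost(io_bytes: int) -> float:
    """Charge small IOs a flat cost; charge large IOs for the bandwidth they consume."""
    if io_bytes <= BASE_IO_BYTES:
        return 1.0
    return 1.0 + (io_bytes - BASE_IO_BYTES) / BYTES_PER_EXTRA_VOP

class TenantAccount:
    """Track per-tenant VOP usage so app-request reservations can be enforced."""
    def __init__(self, reserved_vops_per_sec: float):
        self.reserved = reserved_vops_per_sec
        self.used = 0.0

    def charge(self, io_bytes: int) -> bool:
        cost = vop_cost(io_bytes)
        if self.used + cost > self.reserved:
            return False          # over reservation: throttle or queue the IO
        self.used += cost
        return True

tenant = TenantAccount(reserved_vops_per_sec=1000)
print(vop_cost(4096), vop_cost(1 << 20))   # 1.0 VOP vs ~64.75 VOPs for a 1 MiB IO
print(tenant.charge(1 << 20))              # True: still within the reservation
```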
Fairness and Isolation in Multi-Tenant Storage as Optimization Decomposition
"... Shared storage services enjoy wide adoption in commercial clouds. But most systems today provide weak performance isolation and fairness between tenants, if at all. Today’s approaches for multi-tenant resource allocation are based either on per-VM allocations or hard rate limits that assume uniform ..."
Abstract
-
Cited by 1 (0 self)
Shared storage services enjoy wide adoption in commercial clouds. But most systems today provide weak performance isolation and fairness between tenants, if any at all. Today’s approaches for multi-tenant resource allocation are based either on per-VM allocations or on hard rate limits that assume uniform workloads to achieve high utilization. Instead, Pisces, our system for shared key-value storage, achieves datacenter-wide per-tenant performance isolation and fairness. Pisces achieves per-tenant weighted fair sharing of system resources across the entire shared service, even when partitions belonging to different tenants are co-located and when demand for different partitions is skewed or time-varying. The primary focus of this paper is to highlight the optimization model that motivates the decomposition of Pisces’s fair sharing problem into a combination of four complementary mechanisms—partition placement, weight allocation, replica selection, and weighted fair queuing—that operate on different time-scales to provide system-wide max-min fairness. An evaluation of our Pisces storage prototype shows nearly ideal (0.98 min-max ratio) fair sharing, strong performance isolation, and robustness to skew and shifts in tenant demand.
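One of the four mechanisms, weighted fair queuing, can be sketched with a simplified virtual-clock scheduler; this is illustrative only, not Pisces's implementation, and it omits the placement, weight-allocation, and replica-selection mechanisms it would be composed with.

```python
import heapq, itertools

class WeightedFairQueue:
    def __init__(self, weights):
        self.weights = weights                     # tenant -> weight
        self.finish = {t: 0.0 for t in weights}    # per-tenant virtual finish time
        self.heap, self.seq = [], itertools.count()

    def enqueue(self, tenant, request, cost=1.0):
        # A tenant's virtual finish time advances in inverse proportion to its
        # weight, so heavier tenants receive proportionally more service.
        self.finish[tenant] += cost / self.weights[tenant]
        heapq.heappush(self.heap, (self.finish[tenant], next(self.seq), request))

    def dequeue(self):
        return heapq.heappop(self.heap)[2] if self.heap else None

wfq = WeightedFairQueue({"tenantA": 2, "tenantB": 1})
for i in range(3):
    wfq.enqueue("tenantA", f"A{i}")
    wfq.enqueue("tenantB", f"B{i}")
print([wfq.dequeue() for _ in range(6)])   # interleaving favors tenantA (weight 2)
```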
Universidade de Lisboa,
"... Abstract: With the advent of Cloud Computing, Big Data management has become a fundamental challenge during the deployment and operation of distributed highly available and fault-tolerant storage systems such as the HBase extensible record-store. These systems can provide support for geo-replication ..."
Abstract
- Add to MetaCart
(Show Context)
With the advent of Cloud Computing, Big Data management has become a fundamental challenge in the deployment and operation of distributed, highly available, and fault-tolerant storage systems such as the HBase extensible record-store. These systems can provide support for geo-replication, which comes with the issue of data consistency among distributed sites. In order to offer a best-in-class service to applications, one wants to maximise performance while minimising latency. In terms of data replication, that means incurring as little latency as possible when moving data between distant data centres. Traditional consistency models introduce a significant problem for systems architects, which is especially important in cases where large amounts of data need to be replicated across wide-area networks. In such scenarios it might be suitable to use eventual consistency: even though it is not always convenient, latency can be partly reduced and traded for consistency guarantees so that data transfers do not impact performance. In contrast, this work proposes a broader range of consistency semantics that prioritise critical data, at the cost of a minimal latency overhead on the remaining non-critical updates. Finally, we show how these semantics can help find an optimal data replication strategy that achieves just the required level of data consistency with low latency and more efficient network bandwidth utilisation.
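A minimal sketch of the prioritised-replication idea this abstract argues for, assuming a hypothetical ship() callable to a remote data centre: critical updates take the immediate, low-latency path, while non-critical updates are batched and shipped lazily, trading their consistency lag for bandwidth.

```python
import threading, time

class PrioritizedReplicator:
    def __init__(self, ship, batch_interval=5.0):
        self.ship = ship                       # callable sending updates to a remote DC
        self.batch, self.lock = [], threading.Lock()
        threading.Thread(target=self._flush_loop, args=(batch_interval,),
                         daemon=True).start()

    def replicate(self, update, critical=False):
        if critical:
            self.ship([update])                # synchronous path for critical data
        else:
            with self.lock:
                self.batch.append(update)      # deferred path for everything else

    def _flush_loop(self, interval):
        # Periodically drain the non-critical batch in one wide-area transfer.
        while True:
            time.sleep(interval)
            with self.lock:
                pending, self.batch = self.batch, []
            if pending:
                self.ship(pending)

repl = PrioritizedReplicator(ship=lambda ups: print("shipping", len(ups), "updates"))
repl.replicate({"row": "account-1", "balance": 0}, critical=True)
repl.replicate({"row": "clickstream-42"})
```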
Chasing the tail of atomic broadcast protocols
"... Abstract—Many applications today rely on multiple services, whose results are combined to form the application’s response. In such contexts, the most unreliable service and the slowest service determine the application’s reliability and response time, respectively. State-machine replication and atom ..."
Abstract
- Add to MetaCart
(Show Context)
Many applications today rely on multiple services, whose results are combined to form the application’s response. In such contexts, the most unreliable service and the slowest service determine the application’s reliability and response time, respectively. State-machine replication and atomic broadcast are fundamental abstractions to build highly available services. In this paper, we consider the latency variability of atomic broadcast protocols. This is important because atomic broadcast has a direct impact on the response time of services. We study four high-performance atomic broadcast protocols representative of different classes of protocol design and characterize their latency tail distribution under different workloads. Next, we assess how key design features of each protocol can be related to the observed latency tail distributions. Our observations hint at request batching as a simple yet effective way to shorten the latency tails of some of the studied protocols, an improvement within the reach of application implementers. Indeed, this observation is not only verified experimentally; it also allows us to assess which key design principles favor the construction of latency-predictable protocols.
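A minimal sketch of the request batching the authors point to as being within reach of application implementers, assuming a generic broadcast callable and illustrative size/delay thresholds: requests accumulate until the batch fills or ages out, and the whole batch is handed to the atomic broadcast primitive in one call.

```python
import time

class Batcher:
    def __init__(self, broadcast, max_batch=64, max_delay=0.002):
        self.broadcast = broadcast    # stand-in for the protocol's abcast() call
        self.max_batch, self.max_delay = max_batch, max_delay
        self.buf, self.first_ts = [], None

    def submit(self, request):
        if not self.buf:
            self.first_ts = time.monotonic()
        self.buf.append(request)
        self._maybe_flush()

    def _maybe_flush(self):
        # Flush on either a full batch or a batch that has waited too long,
        # bounding the extra latency that batching itself introduces.
        full = len(self.buf) >= self.max_batch
        aged = self.buf and (time.monotonic() - self.first_ts) >= self.max_delay
        if full or aged:
            self.broadcast(self.buf)
            self.buf, self.first_ts = [], None

batcher = Batcher(broadcast=lambda reqs: print("abcast", len(reqs), "requests"))
for i in range(70):
    batcher.submit(f"req-{i}")
```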
The Case for Limping-Hardware Tolerant Clouds
Thanh Do, University of Wisconsin–Madison
"... With the advent of cloud computing, thousands of machines are connected and managed collectively. This era is confronted with a new challenge: performance variability, primarily caused by large-scale management issues such as hardware failures, software bugs, and configuration mistakes. In this pape ..."
Abstract
- Add to MetaCart
(Show Context)
With the advent of cloud computing, thousands of machines are connected and managed collectively. This era is confronted with a new challenge: performance variability, primarily caused by large-scale management issues such as hardware failures, software bugs, and configuration mistakes. In this paper, we highlight one overlooked cause: limping hardware – hardware whose performance degrades significantly compared to its specification. We present numerous cases of limping disks, network components, and processors seen in production, along with the negative impacts of such failures on existing large-scale distributed systems. From these findings, we advocate the concept of limping-hardware tolerant clouds.
Computer Sciences, 2014
"... ii iv vTo my parents, my wife, and my wonderful kids vi vii Acknowledgements My Ph.D. journey has been supported greatly by faculty, friends, and family mem-bers, without whom it would have not been exceptional. I would like to express my deep gratitude for these individuals. First and foremost, I w ..."
Abstract
- Add to MetaCart
(Show Context)
To my parents, my wife, and my wonderful kids.
Acknowledgements: My Ph.D. journey has been supported greatly by faculty, friends, and family members, without whom it would not have been exceptional. I would like to express my deep gratitude to these individuals. First and foremost, I would like to thank my three great advisors, Remzi H. Arpaci-Dusseau, Andrea C. Arpaci-Dusseau, and Haryadi S. Gunawi, for their countless lessons, guidance, and advice. Remzi, the research maestro, showed me how one could perform top-notch research effortlessly. Andrea’s meticulous guidance helped me sharpen many vital skills for a great researcher. Finally, Haryadi’s consistent, timely, effective, and invaluably detailed advice enabled me to first survive and eventually enjoy every step of my Ph.D. journey. I also would like to thank Shan Lu and Menggang Yu, the other members of my thesis committee, for their great insights and feedback, which helped improve the quality of the dissertation. During my graduate life, I was fortunate to work on some projects with smart