End-to-end Performance Isolation through Virtual Datacenters
Cited by 5 (2 self)
The lack of performance isolation in multi-tenant datacenters at appliances like middleboxes and storage servers results in volatile application performance. To insulate tenants, we propose giving them the abstraction of a dedicated virtual datacenter (VDC). VDCs encapsulate end-to-end throughput guarantees, specified in a new metric based on virtual request cost, that hold across distributed appliances and the intervening network. We present Pulsar, a system that offers tenants their own VDCs. Pulsar comprises a logically centralized controller that uses new mechanisms to estimate tenants’ demands and appliance capacities, and allocates datacenter resources based on flexible policies. These allocations are enforced at end-host hypervisors through multi-resource token buckets that ensure tenants with changing workloads cannot affect others. Pulsar’s design does not require changes to applications, guest OSes, or appliances. Through a prototype deployed across 113 VMs, three appliances, and a 40 Gbps network, we show that Pulsar enforces tenants’ VDCs while imposing overheads of less than 2% at the data and control plane.
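The abstract names multi-resource token buckets as the enforcement mechanism but gives no details. The following is a minimal sketch of the general idea only, not Pulsar's implementation: it assumes one bucket per resource and admits a request only when every bucket can pay its cost. The class name, the resource labels `network` and `storage`, and all rates and costs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MultiResourceTokenBucket:
    """One token bucket per resource; a request passes only if all can pay."""
    rates: Dict[str, float]        # tokens added per second, per resource (assumed)
    capacities: Dict[str, float]   # maximum burst size, per resource (assumed)
    tokens: Dict[str, float] = field(default_factory=dict)
    last_refill: float = 0.0

    def __post_init__(self) -> None:
        # Start with full buckets.
        self.tokens = dict(self.capacities)

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last_refill)
        for name, rate in self.rates.items():
            self.tokens[name] = min(self.capacities[name],
                                    self.tokens[name] + rate * elapsed)
        self.last_refill = max(self.last_refill, now)

    def try_consume(self, costs: Dict[str, float], now: float) -> bool:
        """Admit the request at time `now` if every resource has enough tokens."""
        self._refill(now)
        if any(self.tokens[name] < cost for name, cost in costs.items()):
            return False  # reject (or queue) without charging any bucket
        for name, cost in costs.items():
            self.tokens[name] -= cost
        return True


# Hypothetical tenant limits: 100 network tokens/s and 10 storage tokens/s.
bucket = MultiResourceTokenBucket(
    rates={"network": 100.0, "storage": 10.0},
    capacities={"network": 100.0, "storage": 10.0},
)
first = bucket.try_consume({"network": 50.0, "storage": 8.0}, now=0.0)
second = bucket.try_consume({"network": 10.0, "storage": 5.0}, now=0.0)
third = bucket.try_consume({"network": 10.0, "storage": 5.0}, now=0.5)
```

In this sketch the second request is held back because the storage bucket is nearly empty, even though plenty of network tokens remain. That all-or-nothing check is what makes the bucket multi-resource rather than a set of independent limiters.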
Strategyproof Allocation of Discrete Jobs on Multiple Machines
Cited by 4 (0 self)
We present a model for fair strategyproof allocations in a realistic model of cloud computing centers. This model has the standard Leontief preferences but also captures a key property of virtualization, the use of containers to isolate jobs. We first present several impossibility results for deterministic mechanisms in this setting. We then construct an extension of the well-known dominant resource fairness mechanism (DRF), which somewhat surprisingly does not involve the notion of a dominant resource. Our mechanism relies on the connection between the DRF mechanism and the Kalai-Smorodinsky bargaining solution; by computing a weighted max-min over the convex hull of the feasible region we can obtain an ex-ante fair, efficient and strategyproof randomized allocation. This randomized mechanism can be used to construct other mechanisms which do not rely on users being expected (ex-ante) utility maximizers, in several ways. First, for the case of m identical machines one can use the convex structure of the mechanism to get a simple mechanism which is approximately ex-post fair, efficient and strategyproof. Second, we present a more subtle construction for an arbitrary set of machines, using the Shapley-Folkman-Starr theorem to show the existence of an allocation which is approximately ex-post fair, efficient and strategyproof. This paper provides both a rigorous foundation for developing protocols that explicitly utilize the detailed structure of the modern cloud computing hardware and software, and a general method for extending the dominant resource fairness mechanism to more complex settings.
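For orientation, the baseline that this abstract extends can be sketched in a few lines. The code below computes the textbook DRF allocation for divisible tasks on a single pooled machine. It is not the randomized mechanism the paper constructs. It assumes every user needs a positive amount of every resource, and the function name and example numbers are illustrative.

```python
from fractions import Fraction
from typing import Dict, List


def drf_divisible(capacity: List[int],
                  demands: Dict[str, List[int]]) -> Dict[str, Fraction]:
    """Return the (fractional) number of tasks per user under textbook DRF.

    Assumes each user demands a positive amount of every resource, so all
    users stop growing at the same moment and a closed form exists.
    """
    # Dominant share consumed by ONE task of each user.
    per_task = {
        user: max(Fraction(d, c) for d, c in zip(vec, capacity))
        for user, vec in demands.items()
    }
    # Fraction of each resource used if every user held dominant share 1.
    load = [
        sum(Fraction(vec[r], capacity[r]) / per_task[user]
            for user, vec in demands.items())
        for r in range(len(capacity))
    ]
    share = 1 / max(load)  # common dominant share when a resource saturates
    return {user: share / per_task[user] for user in demands}


# Illustrative pool: 9 CPUs and 18 GB of memory.
# User A needs <1 CPU, 4 GB> per task; user B needs <3 CPU, 1 GB> per task.
tasks = drf_divisible([9, 18], {"A": [1, 4], "B": [3, 1]})
```

With these inputs user A runs 3 tasks and user B runs 2, so each holds a dominant share of 2/3 (memory for A, CPU for B). Under Leontief preferences a user's utility is simply the number of complete demand vectors it holds, which is why the task count is the quantity returned.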
Tetrisched: Space-Time Scheduling for Heterogeneous
, 2013
Cited by 1 (0 self)
Tetrisched is a new scheduler that explicitly considers both job-specific preferences and estimated job runtimes in its allocation of resources. Combined, this information allows tetrisched to provide higher overall value to complex application mixes consolidated on heterogeneous collections of machines. Job-specific preferences, provided by tenants in the form of composable utility functions, allow tetrisched to understand which resources are preferred, and by how much, over other acceptable options. Estimated job runtimes allow tetrisched to plan ahead in deciding whether to wait for a busy preferred resource to become free or to assign a less preferred resource. Tetrisched translates this information, which can be provided automatically by middleware (our wizard) that understands the right SLOs, runtime estimates, and budgets, into a MILP problem that it solves to maximize overall utility. Experiments with a variety of job type mixes, workload intensities, degrees of burstiness, preference strengths, and input inaccuracies show that tetrisched consistently provides significantly better schedules than alternative approaches.
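The wait-or-fall-back decision described above can be illustrated for a single job. This is a simplified sketch under stated assumptions, not Tetrisched's MILP formulation, which optimizes across many jobs at once. The `Option` type, the linearly decaying utility function, and all times are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Option:
    name: str            # label of the candidate resource set
    available_at: float  # earliest time these resources are free
    runtime: float       # estimated job runtime on these resources


def best_option(options: List[Option],
                utility: Callable[[float], float]) -> Tuple[Option, float]:
    """Pick the option whose estimated completion time has the highest utility."""
    scored = [(opt, utility(opt.available_at + opt.runtime)) for opt in options]
    return max(scored, key=lambda pair: pair[1])


def linear_decay(completion_time: float) -> float:
    """Hypothetical utility: worth 100 at time 0, losing 1 per time unit."""
    return max(0.0, 100.0 - completion_time)


# Preferred machines are busy until t=30 but run the job in 20 time units;
# the fallback machines are free now but need 80 time units.
choice, value = best_option(
    [Option("preferred", 30.0, 20.0), Option("fallback", 0.0, 80.0)],
    linear_decay,
)

# If the preferred machines stay busy until t=90, waiting no longer pays off.
late_choice, late_value = best_option(
    [Option("preferred", 90.0, 20.0), Option("fallback", 0.0, 80.0)],
    linear_decay,
)
```

The point of the sketch is that runtime estimates turn "wait or not" into a comparison of completion-time utilities. Without the estimates, a scheduler could only compare the options on current availability.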
Multi-resource fair allocation with bounded number of tasks in cloud computing systems
, 2014
Multi-Resource Fair Allocation in Heterogeneous Cloud Computing Systems
Cited by 1 (0 self)
We study the multi-resource allocation problem in cloud computing systems where the resource pool is constructed from a large number of heterogeneous servers, representing different points in the configuration space of resources such as processing, memory, and storage. We design a multi-resource allocation mechanism, called DRFH, that generalizes the notion of Dominant Resource Fairness (DRF) from a single server to multiple heterogeneous servers. DRFH provides a number of highly desirable properties. With DRFH, no user prefers the allocation of another user; no one can improve its allocation without decreasing that of the others; and more importantly, no coalition behavior of misreporting resource demands can benefit all its members. DRFH also ensures some level of service isolation among the users. As a direct application, we design a simple heuristic that implements DRFH in real-world systems. Large-scale simulations driven by Google cluster traces show that DRFH significantly outperforms the traditional slot-based scheduler, leading to much higher resource utilization with substantially shorter job completion times. Index Terms: Cloud computing, heterogeneous servers, job scheduling, multi-resource allocation, fairness.
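The first property listed above (no user prefers another user's allocation) is envy-freeness, and it can be checked mechanically. The sketch below assumes Leontief utility counted as runnable tasks summed over servers. It only tests a given allocation and does not compute a DRFH allocation. The function names and the two-server example are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List

Bundle = List[List[int]]  # bundle[server][resource]


def tasks_runnable(demand: List[int], bundle: Bundle) -> Fraction:
    """Tasks a user with per-task `demand` could run on `bundle`, summed over servers.

    Assumes every entry of `demand` is positive.
    """
    return sum(
        (min(Fraction(have, need) for have, need in zip(per_server, demand))
         for per_server in bundle),
        Fraction(0),
    )


def is_envy_free(demands: Dict[str, List[int]], alloc: Dict[str, Bundle]) -> bool:
    """True if no user could run more tasks with another user's bundle."""
    return all(
        tasks_runnable(demands[i], alloc[i]) >= tasks_runnable(demands[i], alloc[j])
        for i in demands for j in demands if i != j
    )


# Per-task demands as <CPU, GB>; allocations are listed per server.
demands = {"A": [1, 4], "B": [3, 1]}
alloc = {
    "A": [[2, 8], [1, 4]],   # A runs 2 tasks on server 1 and 1 on server 2
    "B": [[3, 1], [3, 1]],   # B runs 1 task on each server
}
fair = is_envy_free(demands, alloc)

# Swapping the bundles leaves each user with resources shaped for the other.
swapped = {"A": alloc["B"], "B": alloc["A"]}
unfair = is_envy_free(demands, swapped)
```

Here user A can run 3 tasks on its own bundle but only half a task on B's, and B can run 2 tasks on its own bundle but only 1 on A's, so neither envies the other. After the swap both would prefer to trade back, and the check fails.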
Enhanced cluster computing performance through proportional fairness. arXiv:1404.2266
, 2014
From Lone Dwarfs to Giant Superclusters: Rethinking Operating System Abstractions for the Cloud
- In: Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS). Kartause Ittingen
Cited by 1 (0 self)
Unix took a rich smorgasbord of operating system features from its predecessors and pared it down to a small but powerful set of abstractions: files, processes, pipes, and the shell to glue the system together. In the intervening forty years, the common-case computational substrate has evolved from a lone PDP-11 minicomputer to vast clouds of virtualized computational resources. Contemporary distributed systems are being built by adding layer upon layer atop the foundation established by Unix's chosen abstractions. Unfortunately, the resulting mess has lost the "simplicity, elegance, and ease of use" that was a hallmark of the original Unix design.
Filo: consolidated consensus as a cloud service
Consensus is at the core of many production-grade distributed systems. Given the prevalence of these systems, it is important to offer consensus as a cloud service. To match the multi-tenant requirements of the cloud, consensus as a service must provide performance guarantees, and prevent aggressive tenants from disrupting the others. Fulfilling this goal is not trivial without overprovisioning and under-utilizing resources. We present Filo, the first system to provide consensus as a multi-tenant cloud service with throughput guarantees and efficient utilization of cloud resources. Tenants request an SLA by specifying their target throughput and degree of fault-tolerance. Filo then efficiently consolidates tenants on a shared set of servers using a novel placement algorithm that respects constraints imposed by the consensus problem. To respond to the load variations at runtime, Filo proposes a novel distributed controller that piggybacks on the consensus protocol to coordinate resource allocations across the servers and distribute the unused capacity fairly. Using a real testbed and simulations, we show that our placement algorithm is efficient at consolidating tenants, and while obtaining comparable efficiency and fairness, our distributed controller is about 5x faster than the centralized baseline approach.
End-to-end Performance Isolation through Virtual Datacenters
- In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’14), USENIX Association
Altruistic Scheduling in Multi-Resource Clusters
Given the well-known tradeoffs between fairness, performance, and efficiency, modern cluster schedulers often prefer instantaneous fairness as their primary objective to ensure performance isolation between users and groups. However, instantaneous, short-term convergence to fairness often does not result in noticeable long-term benefits. Instead, we propose an altruistic, long-term approach, CARBYNE, where jobs yield fractions of their allocated resources without impacting their own completion times. We show that leftover resources collected via altruisms of many jobs can then be rescheduled to further secondary goals such as application-level performance and cluster efficiency without impacting performance isolation. Deployments and large-scale simulations show that CARBYNE closely approximates the state-of-the-art solutions (e.g., DRF [27]) in terms of performance isolation, while providing 1.26× better efficiency and 1.59× lower average job completion time.