Results 1 - 10 of 47
Paragon: QoS-aware scheduling for heterogeneous datacenters
- In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2013
"... Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty to match applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees ..."
Cited by 37 (7 self)
Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many available hardware platforms can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications. We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods; instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown, incoming workload with respect to heterogeneity and interference in multiple shared resources, by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. Paragon scales to tens of thousands of servers with marginal scheduling overheads in terms of time or state. We evaluate Paragon with a wide range of workload scenarios, on both small- and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers provide similar guarantees for only 14%, 11%, and 3% of workloads. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.
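The collaborative-filtering step lends itself to a short illustration. Below is a minimal sketch of that idea, not the authors' implementation: it assumes a score matrix of previously seen workloads against platform/interference dimensions, profiles a new workload on a few columns, and fills in the rest from a rank-k SVD. The matrix contents, rank k, and column meanings are all invented for illustration.

```python
# Collaborative-filtering classification sketch (illustrative, not Paragon's code).
import numpy as np

rng = np.random.default_rng(0)

# Scores the system already has for 200 previously seen workloads on
# 12 platform/interference dimensions.
known = rng.uniform(0.0, 1.0, size=(200, 12))

k = 4                                # latent "workload type" dimensions
_, _, Vt = np.linalg.svd(known, full_matrices=False)
basis = Vt[:k]                       # k x 12 latent basis

def classify(partial_scores, observed_cols):
    """Estimate a new workload's full 12-dimensional score vector
    from the few columns we actually profiled it on."""
    B = basis[:, observed_cols]                       # k x |observed|
    coeffs, *_ = np.linalg.lstsq(B.T, partial_scores, rcond=None)
    return coeffs @ basis                             # estimated full row

observed = [0, 3, 5, 9, 11]          # profile only 5 of 12 dimensions
estimate = classify(rng.uniform(0.0, 1.0, size=5), observed)
print(np.round(estimate, 2))         # scheduler can now rank platforms
```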
Sparrow: Distributed, Low Latency Scheduling
"... Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of t ..."
Cited by 24 (1 self)
Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
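The "decentralized, randomized sampling approach" is in the power-of-two-choices family, which a few lines can demonstrate. The toy simulation below is my own illustration under invented parameters, not Sparrow itself: each task probes d randomly chosen workers and joins the shortest probed queue.

```python
# Power-of-two-choices sampling sketch (illustrative parameters).
import random

random.seed(1)
NUM_WORKERS, D, TASKS = 100, 2, 10_000

def run(probes):
    queues = [0] * NUM_WORKERS
    for _ in range(TASKS):
        choices = random.sample(range(NUM_WORKERS), probes)
        target = min(choices, key=lambda w: queues[w])
        queues[target] += 1
    return max(queues)

# Probing d=2 workers flattens the queue-length tail dramatically
# compared with placing each task on one worker chosen at random.
print("random placement, longest queue:", run(1))
print("power of two choices, longest queue:", run(D))
```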
Quasar: Resource-Efficient and QoS-Aware Cluster Management
"... Cloud computing promises flexibility and high performance for users and high cost-efficiency for operators. Neverthe-less, most cloud facilities operate at very low utilization, hurting both cost effectiveness and future scalability. We present Quasar, a cluster management system that increases reso ..."
Cited by 23 (5 self)
Cloud computing promises flexibility and high performance for users and high cost-efficiency for operators. Nevertheless, most cloud facilities operate at very low utilization, hurting both cost effectiveness and future scalability. We present Quasar, a cluster management system that increases resource utilization while providing consistently high application performance. Quasar employs three techniques. First, it does not rely on resource reservations, which lead to underutilization as users do not necessarily understand workload dynamics and physical resource requirements of complex codebases. Instead, users express performance constraints for each workload, letting Quasar determine the right amount of resources to meet these constraints at any point. Second, Quasar uses classification techniques to quickly and accurately determine the impact of the amount of resources (scale-out and scale-up), type of resources, and interference on performance for each workload and dataset. Third, it uses the classification results to jointly perform resource allocation and assignment, quickly exploring the large space of options for an efficient way to pack workloads on available resources. Quasar monitors workload performance and adjusts resource allocation and assignment when needed. We evaluate Quasar over a wide range of workload scenarios, including combinations of distributed analytics frameworks and low-latency, stateful services, both on a local cluster and a cluster of dedicated EC2 servers. At steady state, Quasar improves resource utilization by 47% in the 200-server EC2 cluster, while meeting performance constraints for workloads of all types.
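A schematic of the "classify, then greedily pack" flow the abstract describes is sketched below; this is not the paper's system. Classification output is stubbed as a per-platform throughput table, and allocate() greedily takes cores from the most efficient servers until the workload's performance constraint is met. All names and numbers are illustrative assumptions.

```python
# Joint allocation-and-assignment sketch (illustrative, not Quasar's code).
import math
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    free_cores: int

# Stub for classification: estimated requests/sec per core this workload
# achieves on each platform (a real system learns this online).
THROUGHPUT_PER_CORE = {"hi-cpu": 150.0, "hi-mem": 120.0, "old-gen": 60.0}

def allocate(servers, platform_of, required_throughput):
    ranked = sorted(servers,
                    key=lambda s: THROUGHPUT_PER_CORE[platform_of[s.name]],
                    reverse=True)
    plan, got = [], 0.0
    for s in ranked:
        if got >= required_throughput:
            break
        rate = THROUGHPUT_PER_CORE[platform_of[s.name]]
        take = min(s.free_cores,
                   math.ceil((required_throughput - got) / rate))
        if take > 0:
            plan.append((s.name, take))
            got += take * rate
    # One pass decides both allocation (how many cores) and
    # assignment (which servers provide them).
    return plan if got >= required_throughput else None

servers = [Server("a", 8), Server("b", 16), Server("c", 32)]
platform_of = {"a": "hi-cpu", "b": "hi-mem", "c": "old-gen"}
print(allocate(servers, platform_of, required_throughput=2000.0))
```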
Scale-up vs Scale-out for Hadoop: Time to rethink?
"... In the last decade we have seen a huge deployment of cheap clusters to run data analytics workloads. The conventional wisdom in industry and academia is that scaling out using a cluster of commodity machines is better for these workloads than scaling up by adding more resources to a single server. P ..."
Cited by 17 (1 self)
In the last decade we have seen a huge deployment of cheap clusters to run data analytics workloads. The conventional wisdom in industry and academia is that scaling out using a cluster of commodity machines is better for these workloads than scaling up by adding more resources to a single server. Popular analytics infrastructures such as Hadoop are aimed at such a cluster scale-out environment. Is this the right approach? Our measurements as well as other recent work show that the majority of real-world analytic jobs process less than 100 GB of input, but popular infrastructures such as Hadoop/MapReduce were originally designed for petascale processing. We claim that a single "scale-up" server can process each of these jobs and do as well or better than a cluster in terms of performance, cost, power, and server density. We present an evaluation across 11 representative Hadoop jobs that shows scale-up to be competitive in all cases, and significantly better in some cases, than scale-out. To achieve that performance, we describe several modifications to the Hadoop runtime that target scale-up configurations. These changes are transparent, do not require any changes to application code, and do not compromise scale-out performance; at the same time, our evaluation shows that they significantly improve Hadoop's scale-up performance.
IX: A Protected Dataplane Operating System for High Throughput and Low Latency
- In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), 2014
"... The conventional wisdom is that aggressive networking requirements, such as high packet rates for small mes-sages and microsecond-scale tail latency, are best ad-dressed outside the kernel, in a user-level networking stack. We present IX, a dataplane operating system that provides high I/O performan ..."
Cited by 17 (1 self)
The conventional wisdom is that aggressive networking requirements, such as high packet rates for small messages and microsecond-scale tail latency, are best addressed outside the kernel, in a user-level networking stack. We present IX, a dataplane operating system that provides high I/O performance, while maintaining the key advantage of strong protection offered by existing kernels. IX uses hardware virtualization to separate management and scheduling functions of the kernel (control plane) from network processing (dataplane). The dataplane architecture builds upon a native, zero-copy API and optimizes for both bandwidth and latency by dedicating hardware threads and networking queues to dataplane instances, processing bounded batches of packets to completion, and by eliminating coherence traffic and multi-core synchronization. We demonstrate that IX outperforms Linux and state-of-the-art, user-space network stacks significantly in both throughput and end-to-end latency. Moreover, IX improves the throughput of a widely deployed key-value store by up to 3.6× and reduces tail latency by more than 2×.
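The run-to-completion, bounded-batch model can be rendered as a toy event loop; a real dataplane runs in the kernel on dedicated hardware threads, so the sketch below only illustrates the control flow. The batch bound and the packet handler are illustrative assumptions.

```python
# Run-to-completion with bounded batches (schematic only, not IX code).
from collections import deque

BATCH = 16   # bound on packets taken per iteration (keeps latency bounded)

def handle(pkt):
    # Runs to completion: no context switch until this packet is done.
    return pkt * 2

def dataplane_loop(rx_queue):
    results = []
    while rx_queue:
        # Poll the receive queue, take at most BATCH packets, and process
        # the whole batch to completion before polling again.
        batch = [rx_queue.popleft()
                 for _ in range(min(BATCH, len(rx_queue)))]
        results.extend(handle(p) for p in batch)
    return results

rx = deque(range(40))                 # 40 "packets"
print(len(dataplane_loop(rx)))        # processed in 3 bounded batches
```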
Hierarchical scheduling for diverse datacenter workloads
- In Proc. ACM SoCC, 2013
"... There has been a recent industrial effort to develop multi-resource hierarchical schedulers. However, the existing implementations have some shortcomings in that they might leave resources unallocated or starve certain jobs. This is because the multi-resource setting introduces new challenges for hi ..."
Cited by 13 (1 self)
There has been a recent industrial effort to develop multi-resource hierarchical schedulers. However, the existing implementations have some shortcomings in that they might leave resources unallocated or starve certain jobs. This is because the multi-resource setting introduces new challenges for hierarchical scheduling policies. We provide an algorithm, which we implement in Hadoop, that generalizes the most commonly used multi-resource scheduler, DRF [1], to support hierarchies. Our evaluation shows that our proposed algorithm, H-DRF, avoids the starvation and resource inefficiencies of the existing open-source schedulers and outperforms slot scheduling.
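For context, here is a compact sketch of plain (non-hierarchical) DRF, the policy the paper generalizes; H-DRF itself is more involved. DRF does progressive filling: repeatedly give one task to the user with the smallest dominant share, if its demand vector still fits. Capacities and per-task demands below are illustrative.

```python
# Dominant Resource Fairness (flat DRF) sketch with illustrative numbers.
CAPACITY = {"cpu": 90.0, "mem": 180.0}
DEMANDS = {"A": {"cpu": 1.0, "mem": 4.0},    # memory-dominant user
           "B": {"cpu": 3.0, "mem": 1.0}}    # cpu-dominant user

def drf(capacity, demands):
    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(u):
        # A user's largest fractional consumption of any single resource.
        return max(tasks[u] * demands[u][r] / capacity[r] for r in capacity)

    while True:
        for u in sorted(demands, key=dominant_share):
            if all(used[r] + demands[u][r] <= capacity[r] for r in capacity):
                tasks[u] += 1
                for r in capacity:
                    used[r] += demands[u][r]
                break
        else:
            return tasks              # no user's next task fits

# Both users end with equal dominant shares (the classic DRF outcome).
print(drf(CAPACITY, DEMANDS))         # {'A': 30, 'B': 20}
```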
Reconciling high server utilization and sub-millisecond quality-of-service
- In European Conference on Computer Systems (EuroSys '14), 2014
"... The simplest strategy to guarantee good quality of service (QoS) for a latency-sensitive workload with sub-millisecond latency in a shared cluster environment is to never run other workloads concurrently with it on the same server. Unfortu-nately, this inevitably leads to low server utilization, red ..."
Cited by 11 (4 self)
The simplest strategy to guarantee good quality of service (QoS) for a latency-sensitive workload with sub-millisecond latency in a shared cluster environment is to never run other workloads concurrently with it on the same server. Unfortunately, this inevitably leads to low server utilization, reducing both the capability and cost effectiveness of the cluster. In this paper, we analyze the challenges of maintaining high QoS for low-latency workloads when sharing servers with other workloads. We show that workload co-location leads to QoS violations due to increases in queuing delay, scheduling delay, and thread load imbalance. We present techniques that address these vulnerabilities, ranging from provisioning the latency-critical service in an interference-aware manner, to replacing the Linux CFS scheduler with a scheduler that provides good latency guarantees and fairness for co-located workloads. Ultimately, we demonstrate that some latency-critical workloads can be aggressively co-located with other workloads while achieving good QoS, and that such co-location can improve a datacenter's effective throughput per TCO-$ by up to 52%.
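The queueing-delay effect the abstract names can be seen in a back-of-the-envelope M/M/1 calculation; this model and its numbers are my assumption, not the paper's methodology. Raising utilization, for example by co-locating background work, inflates tail latency non-linearly.

```python
# M/M/1 tail-latency illustration (assumed model, illustrative numbers).
import math

MU = 100_000.0        # service rate: requests/sec one core can handle

def p99_us(utilization):
    """99th-percentile M/M/1 sojourn time in microseconds
    (sojourn time is exponential with rate mu - lambda)."""
    lam = utilization * MU
    return -math.log(1 - 0.99) / (MU - lam) * 1e6

for u in (0.3, 0.6, 0.9):  # e.g. extra load from co-located workloads
    print(f"utilization {u:.0%}: p99 ~ {p99_us(u):6.0f} us")
```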
The Case for Tiny Tasks in Compute Clusters
"... To see the world in a grain of sand... ..."
Achieving Efficient Distributed Scheduling with Message Queues in the Cloud for Many-Task Computing and High-Performance Computing
"... Abstract — Task scheduling and execution over large scale, distributed systems plays an important role on achieving good performance and high system utilization. Due to the explosion of parallelism found in today’s hardware, applications need to perform over-decomposition to deliver good performance ..."
Cited by 7 (2 self)
(Show Context)
Abstract — Task scheduling and execution over large scale, distributed systems plays an important role on achieving good performance and high system utilization. Due to the explosion of parallelism found in today’s hardware, applications need to perform over-decomposition to deliver good performance; this over-decomposition is driving job management systems’ requirements to support applications with a growing number of tasks with finer granularity. Our goal in this work is to provide a compact, light-weight, scalable, and distributed task execution framework (CloudKon) that builds upon cloud computing building blocks (Amazon EC2, SQS, and DynamoDB). Most of today’s state-of-the-art job execution systems have predominantly Master/Slaves architectures, which have inherent limitations, such as scalability issues at extreme scales and single point of failures. On the other hand distributed job management systems are complex, and employ non-trivial load balancing algorithms to maintain good utilization. CloudKon is a distributed job management system that can support both HPC and MTC workloads with millions of tasks/jobs. We compare our work with other state-of-the-art job management systems including Sparrow and MATRIX. The results show that CloudKon delivers better scalability compared to other state-of-the-art systems for some metrics – all with a significantly smaller code-base (5%). Keywords-CloudKon, Many-Task Computing, distributed scheduling, distributed HPC scheduling