Results 1 - 10 of 184
Mesos: A platform for fine-grained resource sharing in the data center
2010
"... We present Mesos, a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI 1. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve ..."
Abstract
-
Cited by 141 (24 self)
We present Mesos, a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve data locality by taking turns reading data stored on each machine. To support the sophisticated schedulers of today’s frameworks, Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them. Our experimental results show that Mesos can achieve near-optimal locality when sharing the cluster among diverse frameworks, can scale up to 50,000 nodes, and is resilient to node failures.
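Not part of the listing itself: a minimal sketch of the two-level resource-offer mechanism the abstract describes, assuming a toy Master/Framework split with made-up names; the real Mesos API is far richer.

```python
# Minimal sketch of two-level scheduling via resource offers (hypothetical
# names, not the actual Mesos API). The master decides how much to offer
# each framework; each framework decides which offers to accept.

class Framework:
    def __init__(self, name, cpus_needed):
        self.name = name
        self.cpus_needed = cpus_needed

    def consider(self, offer):
        """Framework-side decision: accept only offers big enough for a task."""
        return offer["cpus"] >= self.cpus_needed


class Master:
    def __init__(self, nodes):
        self.free = dict(nodes)              # node -> free CPUs

    def offer_round(self, frameworks):
        """Master-side decision: which framework is offered which resources."""
        for node, cpus in list(self.free.items()):
            if cpus == 0:
                continue
            for fw in frameworks:
                offer = {"node": node, "cpus": cpus}
                if fw.consider(offer):       # framework accepts or declines
                    self.free[node] = 0
                    print(f"{fw.name} accepted {cpus} CPUs on {node}")
                    break


master = Master({"n1": 4, "n2": 8})
master.offer_round([Framework("hadoop", 8), Framework("mpi", 4)])
```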
Improving the scalability of data center networks with traffic-aware virtual machine placement
in Proc. of INFOCOM '10, 2010
"... Abstract—The scalability of modern data centers has become a practical concern and has attracted significant attention in recent years. In contrast to existing solutions that require changes in the network architecture and the routing protocols, this paper proposes using traffic-aware virtual machin ..."
Abstract
-
Cited by 101 (2 self)
The scalability of modern data centers has become a practical concern and has attracted significant attention in recent years. In contrast to existing solutions that require changes in the network architecture and the routing protocols, this paper proposes using traffic-aware virtual machine (VM) placement to improve the network scalability. By optimizing the placement of VMs on host machines, traffic patterns among VMs can be better aligned with the communication distance between them, e.g., VMs with large mutual bandwidth usage are assigned to host machines in close proximity. We formulate the VM placement as an optimization problem and prove its hardness. We design a two-tier approximate algorithm that efficiently solves the VM placement problem for very large problem sizes. Given the significant difference in the traffic patterns seen in current data centers and the structural differences of the recently proposed data center architectures, we further conduct a comparative analysis on the impact of the traffic patterns and the network architectures on the potential performance gain of traffic-aware VM placement. We use traffic traces collected from production data centers to evaluate our proposed VM placement algorithm, and we show a significant performance improvement compared to existing generic methods that do not take advantage of traffic patterns and data center network characteristics.
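As an aside, a toy greedy sketch of the placement intuition above (heavy-traffic VM pairs go onto nearby hosts); the traffic matrix, host distances, and the greedy rule are all illustrative assumptions, not the paper's two-tier approximation algorithm.

```python
# Toy greedy placement: process VM pairs by descending mutual traffic and
# put each VM on the host closest to its already-placed peer, so the cost
# sum(traffic * distance) stays small.

traffic = {("vm1", "vm2"): 90, ("vm1", "vm3"): 10, ("vm2", "vm3"): 5}   # Mbps
distance = {("hA", "hA"): 0, ("hA", "hB"): 2, ("hB", "hB"): 0}          # hops
capacity = {"hA": 2, "hB": 2}                                           # VM slots

def dist(h1, h2):
    return distance[tuple(sorted((h1, h2)))]

placement = {}
for (u, v), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
    for vm, peer in ((u, v), (v, u)):
        if vm in placement:
            continue
        # prefer the host closest to the already-placed peer (if any)
        hosts = sorted(capacity, key=lambda h: dist(h, placement.get(peer, h)))
        for h in hosts:
            if capacity[h] > 0:
                placement[vm] = h
                capacity[h] -= 1
                break

cost = sum(rate * dist(placement[u], placement[v])
           for (u, v), rate in traffic.items())
print(placement, "total traffic-distance cost =", cost)
```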
XORing Elephants: Novel Erasure Codes for Big Data
"... Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of threereplicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an ..."
Abstract
-
Cited by 53 (7 self)
Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability. This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently repairable and offer higher reliability compared to Reed-Solomon codes. We show analytically that our codes are optimal on a recently identified tradeoff between locality and minimum distance. We implement our new codes in Hadoop HDFS and compare ...
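For intuition only, a sketch of the locality property the abstract mentions, using plain XOR parities over a small group of blocks; this is an assumed toy construction, not the codes presented in the paper.

```python
# Conceptual sketch of "locality" in erasure coding: adding a local XOR
# parity per small group lets a single lost block be rebuilt from its group
# instead of reading the whole stripe.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# A stripe of 6 data blocks, split into two local groups of 3.
data = [bytes([i] * 4) for i in range(6)]
groups = [data[:3], data[3:]]
local_parity = [xor_blocks(g) for g in groups]

# Lose one block from group 0 and repair it from 2 surviving blocks plus
# 1 local parity (3 reads), instead of reading many blocks across the stripe.
lost_index = 1
survivors = [b for i, b in enumerate(groups[0]) if i != lost_index]
repaired = xor_blocks(survivors + [local_parity[0]])
assert repaired == data[lost_index]
print("repaired block", lost_index, "from", len(survivors) + 1, "reads")
```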
NoHype: Virtualized Cloud Infrastructure without the Virtualization
"... Cloud computing is a disruptive trend that is changing the way we use computers. The key underlying technology in cloud infrastructures is virtualization – so much so that many consider virtualization to be one of the key features ratherthansimplyanimplementationdetail. Unfortunately, the use of vir ..."
Abstract
-
Cited by 51 (4 self)
Cloud computing is a disruptive trend that is changing the way we use computers. The key underlying technology in cloud infrastructures is virtualization – so much so that many consider virtualization to be one of the key features rather than simply an implementation detail. Unfortunately, the use of virtualization is the source of a significant security concern. Because multiple virtual machines run on the same server and since the virtualization layer plays a considerable role in the operation of a virtual machine, a malicious party has the opportunity to attack the virtualization layer. A successful attack would give the malicious party control over the all-powerful virtualization layer, potentially compromising the confidentiality and integrity of the software and data of any virtual machine. In this paper we propose removing the virtualization layer, while retaining the key features enabled by virtualization. Our NoHype architecture, named to indicate the removal of the hypervisor, addresses each of the key roles of the virtualization layer: arbitrating access to CPU, memory, and I/O devices, acting as a network device (e.g., Ethernet switch), and managing the starting and stopping of guest virtual machines. Additionally, we show that our NoHype architecture may indeed be "no hype" since nearly all of the needed features to realize the NoHype architecture are currently available as hardware extensions to processors and I/O devices.
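A speculative sketch of the static-partitioning idea implied above: resources are carved up once at guest start-up so there is nothing left for a hypervisor to arbitrate at run time. The GuestSlice fields and the carve helper are invented for illustration, not taken from the paper.

```python
# Illustrative only: pre-allocate dedicated cores, a fixed memory range, and
# a dedicated network function to each guest before it boots.

from dataclasses import dataclass

@dataclass
class GuestSlice:
    cores: tuple          # dedicated physical cores
    mem_start: int        # fixed guest-physical memory base (bytes)
    mem_size: int         # fixed size; no ballooning or overcommit
    nic_vf: int           # dedicated virtual NIC function (assumed)

def carve(total_cores, total_mem, guests):
    """Statically partition the machine among guests; sharing is forbidden."""
    slices, core, mem = {}, 0, 0
    for name, (ncores, mem_size) in guests.items():
        if core + ncores > total_cores or mem + mem_size > total_mem:
            raise ValueError("machine is oversubscribed; static partitioning "
                             "cannot admit this guest")
        slices[name] = GuestSlice(tuple(range(core, core + ncores)),
                                  mem, mem_size, nic_vf=len(slices))
        core += ncores
        mem += mem_size
    return slices

print(carve(8, 32 << 30, {"vm0": (2, 8 << 30), "vm1": (4, 16 << 30)}))
```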
PAST: Scalable Ethernet for data centers
in ACM SIGCOMM CoNEXT Conference, 2012
"... We present PAST, a novel network architecture for data center Ethernet networks that implements a Per-Address Spanning Tree routing algorithm. PAST preserves Ethernet’s self-configuration and mobility support while increasing its scalability and usable bandwidth. PAST is explicitly designed to accom ..."
Abstract
-
Cited by 38 (6 self)
We present PAST, a novel network architecture for data center Ethernet networks that implements a Per-Address Spanning Tree routing algorithm. PAST preserves Ethernet’s self-configuration and mobility support while increasing its scalability and usable bandwidth. PAST is explicitly designed to accommodate unmodified commodity hosts and Ethernet switch chips. Surprisingly, we find that PAST can achieve performance comparable to or greater than Equal-Cost Multipath (ECMP) forwarding, which is currently limited to layer-3 IP networks, without any multipath hardware support. In other words, the hardware and firmware changes proposed by emerging standards like TRILL are not required for high-performance, scalable Ethernet networks. We evaluate PAST on Fat Tree, HyperX, and Jellyfish topologies, and show that it is able to capitalize on the advantages each offers. We also describe an ...
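To make the routing idea concrete, a hedged sketch that builds one spanning tree per destination (here simply a BFS tree rooted at the switch hosting the address) and derives per-switch forwarding entries; PAST's actual tree-selection policies are more sophisticated than this.

```python
# Per-address spanning tree sketch: route to each destination address along
# a tree rooted at that destination's switch, computed as a plain BFS tree.

from collections import deque

topology = {                       # switch adjacency (toy 4-switch network)
    "s1": ["s2", "s3"],
    "s2": ["s1", "s4"],
    "s3": ["s1", "s4"],
    "s4": ["s2", "s3"],
}

def spanning_tree(root):
    """BFS tree rooted at the switch hosting the destination address."""
    parent, seen, q = {}, {root}, deque([root])
    while q:
        u = q.popleft()
        for v in topology[u]:
            if v not in seen:
                seen.add(v)
                parent[v] = u          # at switch v, forward toward root via u
                q.append(v)
    return parent

# One forwarding entry per (destination address, switch) pair.
fib = {("dst@" + root, sw): nh for root in topology
       for sw, nh in spanning_tree(root).items()}
print(fib[("dst@s4", "s1")])           # next hop from s1 toward a host on s4
```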
Energy Aware Network Operations
In Global Internet Symposium, 2009
"... Abstract—Networking devices today consume a non-trivial amount of energy and it has been shown that this energy consumption is largely independent of the load through the devices. With a strong need to curtail the rising operational costs of IT infrastructure, there is a tremendous opportunity for i ..."
Abstract
-
Cited by 37 (4 self)
Networking devices today consume a non-trivial amount of energy and it has been shown that this energy consumption is largely independent of the load through the devices. With a strong need to curtail the rising operational costs of IT infrastructure, there is a tremendous opportunity for introducing energy awareness in the design and operation of enterprise and data center networks. We focus on these networks as they are under the control of a single administrative domain in which network-wide control can be consistently applied. In this paper, we describe and analyze three approaches to saving energy in single administrative domain networks, without significantly impacting the networks’ ability to provide the expected levels of performance and availability. We also explore the tradeoffs between conserving energy and meeting performance and availability requirements. We conduct an extensive case study of our algorithms by simulating a real Web 2.0 workload in a real data center network topology using power characterizations that we obtain from real network hardware. Our results indicate that for our workload and data center scenario, 16% power savings (with no performance penalty and small decrease in availability) can be obtained merely by appropriately adjusting the active network elements (links). Significant additional savings (up to 75%) can be obtained by incorporating network traffic management and server workload consolidation.
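One of the approach families the abstract alludes to, sketched with assumed topology data: greedily power down links while a connectivity check still passes, ignoring the load and performance constraints a real algorithm would also enforce.

```python
# Greedy link deactivation sketch: turn a link off only if the remaining
# topology is still connected (availability constraint only).

def connected(nodes, links):
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack += [b if a == u else a for a, b in links if u in (a, b)]
    return seen == set(nodes)

nodes = {"core", "agg1", "agg2", "tor1", "tor2"}
links = [("core", "agg1"), ("core", "agg2"), ("agg1", "tor1"),
         ("agg1", "tor2"), ("agg2", "tor1"), ("agg2", "tor2")]

active = list(links)
for link in sorted(links):                      # iterate links in a fixed order
    candidate = [l for l in active if l != link]
    if connected(nodes, candidate):             # a real version would also
        active = candidate                      # check load and performance
print("links off:", len(links) - len(active), "of", len(links))
```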
Inter-Datacenter Bulk Transfers with NetStitcher
"... Large datacenter operators with sites at multiple locations dimension their key resources according to the peak demand of the geographic area that each site covers. The demand of specific areas follows strong diurnal patterns with high peak to valley ratios that result in poor average utilization ac ..."
Abstract
-
Cited by 32 (3 self)
Large datacenter operators with sites at multiple locations dimension their key resources according to the peak demand of the geographic area that each site covers. The demand of specific areas follows strong diurnal patterns with high peak to valley ratios that result in poor average utilization across a day. In this paper, we show how to rescue unutilized bandwidth across multiple datacenters and backbone networks and use it for non-real-time applications, such as backups, propagation of bulky updates, and migration of data. Achieving the above is non-trivial since leftover bandwidth appears at different times, for different durations, and at different places in the world. To this end, we have designed, implemented, and validated NetStitcher, a system that employs a network of storage nodes to stitch together unutilized bandwidth, whenever and wherever it exists. It gathers information about leftover resources, uses a store-and-forward algorithm to schedule data transfers, and adapts to resource fluctuations. We have compared NetStitcher with other bulk transfer mechanisms using both a testbed and a live deployment on a real CDN. Our testbed evaluation shows that NetStitcher outperforms all other mechanisms and can rescue up to five times additional datacenter bandwidth thus making it a valuable tool for datacenter providers. Our live CDN deployment demonstrates that our solution can perform large data transfers at a substantially lower cost than naive end-to-end or store-and-forward schemes.
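A simplified, assumed-numbers illustration of store-and-forward scheduling over leftover bandwidth, not NetStitcher's scheduler: data parked at a relay during the source-side valley drains toward the destination during the destination-side valley.

```python
# Store-and-forward bulk transfer sketch with hypothetical per-slot slack.

slots = 6                                  # e.g. hour-long slots
spare_src_relay = [40, 40, 0, 0, 0, 0]     # GB of leftover capacity per slot
spare_relay_dst = [0, 0, 0, 30, 30, 30]    # diurnal valleys at different times

to_send, at_relay, delivered = 80, 0, 0
for t in range(slots):
    hop1 = min(to_send, spare_src_relay[t])       # stitch src -> relay
    to_send -= hop1
    at_relay += hop1
    hop2 = min(at_relay, spare_relay_dst[t])      # stitch relay -> dst
    at_relay -= hop2
    delivered += hop2
    print(f"slot {t}: relay={at_relay:3d} GB, delivered={delivered:3d} GB")

# A direct end-to-end transfer could only use min(src, dst) slack per slot,
# which is zero in every slot here, so nothing would move without the relay.
```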
It's not easy being green
In Proceedings of the ACM SIGCOMM 2012 Conference (SIGCOMM '12), 2012
"... Large-scale Internet applications, such as content distribution networks, are deployed across multiple datacenters and consume massive amounts of electricity. To provide uniformly low access latencies, these datacenters are geographically distributed and the deployment size at each location reflects ..."
Abstract
-
Cited by 30 (0 self)
Large-scale Internet applications, such as content distribution networks, are deployed across multiple datacenters and consume massive amounts of electricity. To provide uniformly low access latencies, these datacenters are geographically distributed and the deployment size at each location reflects the regional demand for the application. Consequently, an application’s environmental impact can vary significantly depending on the geographical distribution of end-users, as electricity cost and carbon footprint per watt is location specific. In this paper, we describe FORTE: Flow Optimization based framework for request-Routing and Traffic Engineering. FORTE dynamically controls the fraction of user traffic directed to each datacenter in response to changes in both request workload and carbon footprint. It allows an operator to navigate the three-way tradeoff between access latency, carbon footprint, and electricity costs and to determine an optimal datacenter upgrade plan in response to increases in traffic load. We use FORTE to show that carbon taxes or credits are impractical in incentivizing carbon output reduction by providers of large-scale Internet applications. However, they can reduce carbon emissions by 10% without increasing either the mean latency or the electricity bill.
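A toy rendering of the three-way tradeoff, with made-up datacenter figures and a simple weighted score instead of FORTE's flow-optimization formulation; it only shows how different weightings shift traffic between sites.

```python
# Score-based traffic split across datacenters: weighted latency, carbon
# intensity, and power price decide which sites fill up first.

datacenters = {           # latency (ms), gCO2/kWh, $/kWh, capacity (req/s)
    "us-east": (20, 500, 0.08, 600),
    "us-west": (70, 250, 0.10, 600),
    "iceland": (120, 20, 0.06, 400),
}

def split(load, w_lat, w_co2, w_cost):
    def score(name):
        lat, co2, price, _cap = datacenters[name]
        return w_lat * lat + w_co2 * co2 + w_cost * price * 1000
    plan = {}
    for name in sorted(datacenters, key=score):   # best-scoring sites first
        take = min(load, datacenters[name][3])
        plan[name], load = take, load - take
    return plan

# Latency-heavy vs. carbon-heavy weightings give different splits.
print(split(1000, w_lat=1.0, w_co2=0.0, w_cost=0.1))
print(split(1000, w_lat=0.1, w_co2=1.0, w_cost=0.1))
```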
Dynamic resource allocation and power management in virtualized data centers
in IEEE Network Operations and Management Symposium (NOMS), 2010
"... Abstract—We investigate optimal resource allocation and power management in virtualized data centers with time-varying workloads and heterogeneous applications. Prior work in this area uses prediction based approaches for resource provisioning. In this work, we take an alternate approach that makes ..."
Abstract
-
Cited by 27 (0 self)
We investigate optimal resource allocation and power management in virtualized data centers with time-varying workloads and heterogeneous applications. Prior work in this area uses prediction-based approaches for resource provisioning. In this work, we take an alternate approach that makes use of the queueing information available in the system to make online control decisions. Specifically, we use the recently developed technique of Lyapunov Optimization to design an online admission control, routing, and resource allocation algorithm for a virtualized data center. This algorithm maximizes a joint utility of the average application throughput and energy costs of the data center. Our approach is adaptive to unpredictable changes in the workload and does not require estimation and prediction of its statistics. Index Terms—Data Center Automation, Cloud Computing, Virtualization, Resource Allocation, Lyapunov Optimization
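A back-of-the-envelope sketch of the drift-plus-penalty flavor of Lyapunov control described above, with invented parameters: every decision uses only the current backlog Q and a tradeoff knob V, never a workload forecast.

```python
# Online admission control and server allocation driven by queue backlog only.

import random
random.seed(1)

Q, V = 0.0, 50.0                        # request backlog, tradeoff parameter
MAX_SERVERS, MU, POWER = 10, 5.0, 1.0   # per-server service rate and power

energy = served = 0.0
for t in range(10_000):
    arrivals = random.uniform(0, 20)
    admitted = arrivals if Q <= V else 0.0    # admit while the backlog is small
    # Drift-plus-penalty: pick the server count minimizing V*power - Q*service;
    # with this linear model that is all servers when Q*MU > V*POWER, else none.
    servers = MAX_SERVERS if Q * MU > V * POWER else 0
    done = min(Q + admitted, servers * MU)
    Q = Q + admitted - done
    energy += servers * POWER
    served += done

print(f"throughput={served / 10_000:.1f} req/slot, avg power={energy / 10_000:.2f}")
```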
GreenWare: Greening cloud-scale data centers to maximize the use of renewable energy
in Lecture Notes in Computer Science
"... Abstract. To reduce the negative environmental implications (e.g., CO2 emission and global warming) caused by the rapidly increasing energy consumption, many Internet service operators have started taking various initiatives to operate their cloud-scale data centers with renewable energy. Unfortunat ..."
Abstract
-
Cited by 25 (2 self)
To reduce the negative environmental implications (e.g., CO2 emission and global warming) caused by the rapidly increasing energy consumption, many Internet service operators have started taking various initiatives to operate their cloud-scale data centers with renewable energy. Unfortunately, due to the intermittent nature of renewable energy sources such as wind turbines and solar panels, currently renewable energy is often more expensive than brown energy that is produced with conventional fossil-based fuel. As a result, utilizing renewable energy may impose a considerable pressure on the sometimes stringent operation budgets of Internet service operators. Therefore, two key questions faced by many cloud-service operators are 1) how to dynamically distribute service requests among data centers in different geographical locations, based on the local weather conditions, to maximize the use of renewable energy, and 2) how to do that within their allowed operation budgets. In this paper, we propose GreenWare, a novel middleware system that ...
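A hedged toy answer to the two questions posed above (illustrative site data, not GreenWare's formulation): prefer sites with renewable headroom, then spill the rest onto the cheapest brown capacity that still fits the cost budget.

```python
# Greedy request distribution: renewable headroom first, then cheapest brown
# energy until the hourly budget is exhausted.

sites = {   # renewable headroom (req/s), brown $/req, capacity (req/s)
    "wind-farm-dc": (300, 0.0020, 800),
    "solar-dc":     (150, 0.0018, 500),
    "coal-grid-dc": (0,   0.0010, 900),
}

def distribute(load, budget):
    plan, cost = {s: 0 for s in sites}, 0.0
    # Pass 1: renewable headroom first (assumed to already fit the budget).
    for s, (green, _price, cap) in sorted(sites.items(), key=lambda kv: -kv[1][0]):
        take = min(load, green, cap)
        plan[s] += take
        load -= take
    # Pass 2: remaining load goes to the cheapest brown capacity within budget.
    for s, (green, price, cap) in sorted(sites.items(), key=lambda kv: kv[1][1]):
        room = cap - plan[s]
        affordable = int((budget - cost) / price) if price > 0 else room
        take = min(load, room, affordable)
        plan[s] += take
        cost += take * price
        load -= take
    return plan, cost, load            # load > 0 means the budget was too tight

print(distribute(load=1200, budget=1.0))
```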