Results 1 - 10 of 418
Energy-aware scheduling of MapReduce jobs
- In Proc. of the 3rd IEEE International Congress on Big Data
, 2014
"... Abstract—The majority of large-scale data intensive applications executed by data centers are based on MapReduce or its open-source implementation, Hadoop. Such applications are executed on large clusters requiring large amounts of energy, making the energy costs a large fraction of the data center ..."
Cited by 4 (2 self)
center’s overall costs. Therefore minimizing the energy consumption when executing MapReduce jobs is a critical concern for data centers. In this paper, we propose a framework for improving the energy efficiency of MapReduce applications, while satisfying the service level agreement (SLA). We first
MapReduce: Simplified data processing on large clusters.
- In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI-04)
, 2004
"... Abstract MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of ..."
Cited by 3439 (3 self)
and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
SLO-driven right-sizing and resource provisioning of MapReduce jobs
- In Workshop on Large Scale Distributed Systems and Middleware (LADIS) in conjunction with VLDB
, 2011
"... There is an increasing number of MapReduce applications, e.g., personalized advertising, spam detection, real-time event log analysis, that require completion time guarantees or need to be completed within a given time window. Currently, there is a lack of performance models and workload analysis t ..."
Cited by 4 (0 self)
analysis tools available to system administrators for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from
Real-time scheduling of skewed MapReduce jobs in heterogeneous environments
"... Abstract Supporting real-time jobs on MapReduce systems is particularly challenging due to the heterogeneity of the environment, the load imbalance caused by skewed data blocks, as well as real-time response demands imposed by the applications. In this paper we describe our approach for scheduling ..."
Cited by 1 (0 self)
Joint scheduling of MapReduce jobs with servers: Performance bounds and experiments (msjo package
, 2014
"... Abstract—MapReduce has achieved tremendous success for large-scale data processing in data centers. A key feature distinguishing MapReduce from previous parallel models is that it interleaves parallel and sequential computation. Past schemes, and especially their theoretical bounds, on general para ..."
Cited by 1 (1 self)
parallel models are therefore unlikely to be applied to MapReduce directly. There are many recent studies on MapReduce job and task scheduling. These studies assume that the servers are assigned in advance. In current data centers, multiple MapReduce jobs of different importance levels run together
Energy-aware scheduling of MapReduce jobs for big data applications
- IEEE Trans. Parallel Distrib. Syst. (forthcoming
"... Abstract—The majority of large-scale data intensive applications executed by data centers are based on MapReduce or its open-source implementation, Hadoop. Such applications are executed on large clusters requiring large amounts of energy, making the energy costs a considerable fraction of the data ..."
Cited by 1 (1 self)
center’s overall costs. Therefore minimizing the energy consumption when executing each MapReduce job is a critical concern for data centers. In this paper, we propose a framework for improving the energy efficiency of MapReduce applications, while satisfying the service level agreement (SLA). We first
Improving MapReduce Performance in Heterogeneous Environments
, 2008
"... MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is cri ..."
Cited by 350 (19 self)
MapReduce Online
, 2009
"... MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire output of each map and reduce task before it can be consumed. In this paper, we propose a modified MapReduce architecture tha ..."
Cited by 181 (3 self)
Hive - A Warehousing Solution Over a Map-Reduce Framework
- In VLDB '09: Proceedings of the VLDB Endowment
, 2009
"... The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and pr ..."
Cited by 265 (1 self)
Hive supports queries expressed in a SQL-like declarative language - HiveQL, which are compiled into map-reduce jobs executed on Hadoop. In addition, HiveQL supports custom map-reduce scripts to be plugged into queries. The language includes a type system with support for tables containing primitive
MapReduce: a flexible data processing tool
- Commun. ACM
"... contributed articles doi:10.1145/1629175.1629198 MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs. ..."
Cited by 157 (0 self)