Research Statement
My research interests are algorithms for massive data, data structures, and approximation/online algorithms.
MIMP: Deadline and Interference Aware Scheduling of Hadoop Virtual Machines
Virtualization promised to dramatically increase server utilization levels, yet many data centers are still only lightly loaded. In some ways, big data applications are an ideal fit for using this residual capacity to perform meaningful work, but the high level of interference between interactive and batch processing workloads currently prevents this from being a practical solution in virtualized environments. Further, the variable nature of spare capacity may make it difficult to meet big data application deadlines. In this work we propose two schedulers: one in the virtualization layer designed to minimize interference on high priority interactive services, and one in the Hadoop framework that helps batch processing jobs meet their own performance deadlines. Our approach uses performance models to match Hadoop tasks to the servers that will benefit them the most, and deadline-aware scheduling to effectively order incoming jobs. The combination of these schedulers allows data center administrators to safely mix resource intensive Hadoop jobs with latency sensitive web applications, and still achieve predictable performance for both. We have implemented our system using Xen and Hadoop, and our evaluation shows that our schedulers allow a mixed cluster to reduce web response times by more than tenfold, while meeting more Hadoop deadlines and lowering total task execution times by 6.5%.
Keywords: scheduling; virtualization; Map Reduce; interference; deadlines
Minimizing Interference and Maximizing Progress for Hadoop Virtual Machines
Virtualization promised to dramatically increase server utilization levels, yet many data centers are still only lightly loaded. In some ways, big data applications are an ideal fit for using this residual capacity to perform meaningful work, but the high level of interference between interactive and batch processing workloads currently prevents this from being a practical solution in virtualized environments. Further, the variable nature of spare capacity may make it difficult to meet big data application deadlines. In this work we propose two schedulers: one in the virtualization layer designed to minimize interference on high priority interactive services, and one in the Hadoop framework that helps batch processing jobs meet their own performance deadlines. Our approach uses performance models to match Hadoop tasks to the servers that will benefit them the most, and deadline-aware scheduling to effectively order incoming jobs. We use admission control to meet deadlines even when resources are overloaded. The combination of these schedulers allows data center administrators to safely mix resource intensive Hadoop jobs with latency sensitive web applications, and still achieve predictable performance for both. We have implemented our system using Xen and Hadoop, and our evaluation shows that our schedulers allow a mixed cluster to reduce web response times by more than tenfold compared to the existing Xen Credit Scheduler, while meeting more Hadoop deadlines and lowering total task execution times by 6.5%.
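The abstract above mentions deadline-aware ordering of jobs together with admission control. As a rough illustration of how those two ideas can fit together, the sketch below orders jobs by earliest deadline and admits a job only if it is still projected to finish on time. This is not the authors' scheduler: the `Job` fields, the `admit_edf` function, and the single `capacity` parameter are all hypothetical simplifications, and the sketch does not model interference or per-server performance.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: float      # estimated work units (hypothetical estimate)
    deadline: float  # deadline, in seconds from now


def admit_edf(jobs, capacity=1.0):
    """Return (admitted, rejected) lists using earliest-deadline-first order.

    Jobs are considered in order of increasing deadline. A job is admitted
    only if, when run after all previously admitted jobs at `capacity` work
    units per second, it is projected to finish by its deadline.
    """
    admitted, rejected = [], []
    finish = 0.0  # projected completion time of the admitted jobs so far
    for job in sorted(jobs, key=lambda j: j.deadline):
        projected = finish + job.work / capacity
        if projected <= job.deadline:
            admitted.append(job)
            finish = projected
        else:
            rejected.append(job)
    return admitted, rejected


if __name__ == "__main__":
    jobs = [Job("a", 4, 5), Job("b", 3, 6), Job("c", 2, 10)]
    ok, dropped = admit_edf(jobs, capacity=1.0)
    print([j.name for j in ok], [j.name for j in dropped])
```

With these example values and a capacity of one work unit per second, job "b" is rejected because it would finish at time 7, after its deadline of 6. Doubling the capacity lets all three jobs through.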