Results 1 - 10 of 52
Performance of multithreaded chip multiprocessors and implications for operating system design
- In USENIX 2005 Annual Technical Conference, 2005
"... An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT) – a new generation of processors designed to improve performance of ..."
Abstract
-
Cited by 73 (0 self)
- Add to MetaCart
(Show Context)
An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT) – a new generation of processors designed to improve the performance of memory-intensive applications. The first systems equipped with CMT processors are just becoming available, so it is critical that we now understand how to obtain the best performance from such systems. The goal of our work is to understand the fundamentals of CMT performance and identify the implications for operating system design. We have analyzed how the performance of a CMT processor is affected by contention for the processor pipeline, the L1 data cache, and the L2 cache, and have investigated operating system approaches to the management of these performance-critical resources. Having found that contention for the L2 cache can have the greatest negative impact on processor performance, we have quantified the potential performance improvement that can be achieved from L2-aware OS scheduling. We evaluated a scheduling policy based on the balance-set principle and found that it has the potential to reduce L2 miss ratios by 19-37% and improve processor throughput by 27-45%. Achieving a similar improvement in hardware would require doubling the size of the L2 cache.
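To make the balance-set idea concrete, here is a minimal sketch, assuming the scheduler can estimate each thread's working set (e.g., from hardware counters): threads are packed first-fit into co-scheduled sets whose combined working sets fit the shared L2. The thread names, working-set sizes, and L2 capacity are all hypothetical, not taken from the paper.

    # Minimal sketch of balance-set style, L2-aware co-scheduling.
    # Working-set estimates and the L2 size below are invented.

    L2_SIZE_KB = 2048  # assumed shared L2 capacity

    def balance_sets(threads, l2_size_kb=L2_SIZE_KB):
        """threads: list of (thread_id, estimated_working_set_kb)."""
        sets, current, used = [], [], 0
        # First-fit-decreasing packing: each resulting set runs for one
        # time slice, so its members can share the L2 without thrashing.
        for tid, ws in sorted(threads, key=lambda t: -t[1]):
            if used + ws > l2_size_kb and current:
                sets.append(current)
                current, used = [], 0
            current.append(tid)
            used += ws
        if current:
            sets.append(current)
        return sets

    print(balance_sets([("a", 900), ("b", 800), ("c", 700), ("d", 300)]))
    # -> [['a', 'b'], ['c', 'd']]

Each returned set is then time-sliced as a unit, so threads that would together overflow the L2 never run concurrently.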
Power-aware scheduling of virtual machines in DVFS-enabled clusters
- In Proc. IEEE Int’l Conf. Cluster Computing, 2009
"... Abstract—With the advent of Cloud computing, large-scale virtualized compute and data centers are becoming common in the computing industry. These distributed systems leverage commodity server hardware in mass quantity, similar in theory to many of the fastest Supercomputers in existence today. Howe ..."
Abstract
-
Cited by 31 (4 self)
- Add to MetaCart
(Show Context)
Abstract—With the advent of Cloud computing, large-scale virtualized compute and data centers are becoming common in the computing industry. These distributed systems leverage commodity server hardware in mass quantity, similar in theory to many of the fastest supercomputers in existence today. However, these systems can consume a city's worth of power just running idle, and require equally massive cooling systems to keep the servers within normal operating temperatures. This produces CO2 emissions and significantly contributes to the growing environmental issue of global warming. Green computing, a new trend in high-end computing, attempts to alleviate this problem by delivering both high performance and reduced power consumption, effectively maximizing total system efficiency. This paper focuses on scheduling virtual machines in a compute cluster to reduce power consumption via the technique of Dynamic Voltage and Frequency Scaling (DVFS). Specifically, we present the design and implementation of an efficient scheduling algorithm that allocates virtual machines in a DVFS-enabled cluster by dynamically scaling the supplied voltages. The algorithm is studied via simulation and implementation on a multi-core cluster. Test results and a performance discussion justify the design and implementation of the scheduling algorithm.
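As an illustration of the DVFS idea, here is a rough sketch, not the paper's actual algorithm: virtual machines are placed greedily, and each host is set to the lowest frequency step that covers its summed vCPU demand. The frequency steps, demand figures, and function names are all assumptions.

    # Hypothetical sketch: greedy VM placement plus per-host DVFS.
    FREQ_STEPS_GHZ = [1.0, 1.4, 1.8, 2.2, 2.6]  # assumed operating points

    def lowest_sufficient_freq(demand_ghz):
        for f in FREQ_STEPS_GHZ:
            if f >= demand_ghz:
                return f
        return FREQ_STEPS_GHZ[-1]  # saturated: run at maximum

    def place_vms(vm_demands, num_hosts):
        """vm_demands: list of (vm_name, cpu_demand_ghz)."""
        loads = [0.0] * num_hosts
        plan = []
        for vm, demand in vm_demands:
            # Pick the host whose operating frequency would rise the least.
            best = min(range(num_hosts),
                       key=lambda h: lowest_sufficient_freq(loads[h] + demand))
            loads[best] += demand
            plan.append((vm, best, lowest_sufficient_freq(loads[best])))
        return plan  # (vm, host index, frequency to request via DVFS)

    print(place_vms([("vm1", 0.9), ("vm2", 1.2), ("vm3", 0.5)], 2))

Keeping each host at the lowest sufficient frequency is what saves power, since supply voltage scales down with frequency.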
Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources
- Proc. Int. Parallel and Distributed Processing Symposium (IPDPS'03), 2003
"... Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically achieved by space slicing with variable partitioning, wherein nodes are dedicated for the duration of the run, or by gang scheduling, wherein time ..."
Abstract
-
Cited by 21 (6 self)
- Add to MetaCart
(Show Context)
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically achieved by space slicing with variable partitioning, wherein nodes are dedicated for the duration of the run, or by gang scheduling, wherein time slicing is coordinated across processors. Both schemes suffer from fragmentation, where processors are left idle because jobs cannot be packed with 100% efficiency. Naturally, this leads to reduced utilization and sub-optimal performance. Flexible coscheduling (FCS) solves this problem by monitoring each job's granularity and communication activity, and using gang scheduling only for those jobs for which it is appropriate. Processes from other jobs, which can be scheduled without any constraints, are used as filler to reduce fragmentation. In addition, inefficiencies due to load imbalance and hardware heterogeneity are also reduced because the classification is done on a per-process basis. FCS has been fully implemented as part of the STORM resource manager, and shown to be competitive with gang scheduling and implicit coscheduling.
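A minimal sketch of the classification step follows; FCS derives its classes from measured communication granularity, but the specific metrics and thresholds here are invented for illustration.

    # Hypothetical sketch: classify a process from its measured
    # communication rate and the fraction of time it spends blocked.
    from enum import Enum

    class SchedClass(Enum):
        CS = "coscheduled"   # fine-grained: gang-schedule it
        F  = "frustrated"    # communicates, but mostly waits (imbalanced)
        DC = "dont_care"     # coarse-grained: usable as filler

    def classify(msgs_per_sec, wait_ratio,
                 comm_threshold=100.0, wait_threshold=0.3):
        if msgs_per_sec < comm_threshold:
            return SchedClass.DC      # rarely communicates
        if wait_ratio > wait_threshold:
            return SchedClass.F       # blocked too often to benefit
        return SchedClass.CS          # fine-grained and well balanced

    print(classify(msgs_per_sec=500.0, wait_ratio=0.1))  # SchedClass.CS

Only CS processes are then gang-scheduled; DC processes can be dropped into idle slots anywhere to reduce fragmentation.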
Centralized Run-Time Resource Management in a Network-on-Chip Containing Reconfigurable Hardware Tiles
- Proceedings of Design, Automation and Test in Europe, 2005
"... Run-time management of both communication and computation resources in a heterogeneous Network-on-Chip (NoC) is a challenging task. First, platform resources need to be assigned in a fast and efficient way. Secondly, the resources might need to be reallocated when platform conditions or user require ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
(Show Context)
Run-time management of both communication and computation resources in a heterogeneous Network-on-Chip (NoC) is a challenging task. First, platform resources need to be assigned in a fast and efficient way. Second, the resources might need to be reallocated when platform conditions or user requirements change. We developed a run-time resource management scheme that is able to efficiently manage a NoC containing fine-grained reconfigurable hardware tiles. This paper details our task assignment heuristic and two run-time task migration mechanisms that deal with the message consistency problem in a NoC. We show that specific support for reconfigurable hardware tiles improves the performance of the heuristic, and that task migration mechanisms need to be tailored to on-chip networks.
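One conceivable shape for a migration mechanism that preserves message consistency is sketched below; every API name is invented, and the paper's actual mechanisms may differ.

    # Hypothetical sketch: freeze, redirect, drain, then resume, so no
    # message addressed to the task is lost or reordered mid-migration.

    def migrate(task, src_tile, dst_tile, noc):
        noc.block_input(task, src_tile)         # stop accepting new messages
        noc.broadcast_redirect(task, dst_tile)  # future senders target dst
        pending = noc.drain(task, src_tile)     # collect in-flight messages
        state = src_tile.checkpoint(task)       # capture task state
        dst_tile.restore(task, state)           # reconfigure the new tile
        dst_tile.enqueue(task, pending)         # replay drained messages in order
        noc.unblock_input(task, dst_tile)       # resume normal operation

The drain step is the NoC-specific part: messages already injected into the network must arrive somewhere before the task state moves.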
Adaptive parallel job scheduling with flexible coscheduling
- IEEE Trans. Parallel & Distributed Syst., 2005
"... Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
(Show Context)
Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, wherein nodes are dedicated for the duration of the run, and gang scheduling, wherein time slicing is coordinated across processors. Both work well when jobs are load-balanced and make use of the entire parallel machine. However, these conditions are rarely met and most realistic workloads consequently suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This situation leads to reduced utilization and suboptimal performance. Flexible CoScheduling (FCS) addresses this problem by monitoring each job’s computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified, and are not coscheduled; instead, these processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager.
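To illustrate how non-coscheduled processes reduce fragmentation, here is a toy sketch (process names and the CPU count are invented): a gang-scheduled job owns a time slot, and "don't care" processes backfill whatever CPUs it leaves idle.

    # Hypothetical sketch of filling one gang-scheduling time slot.
    def build_timeslot(gang_job, filler_pool, num_cpus):
        """gang_job: process ids that must run together, one per CPU."""
        slot = list(gang_job)            # coscheduled processes first
        idle = num_cpus - len(slot)
        slot.extend(filler_pool[:idle])  # filler absorbs the fragmentation
        return slot

    print(build_timeslot(["mpi.0", "mpi.1", "mpi.2"], ["batch.a", "batch.b"], 4))
    # -> ['mpi.0', 'mpi.1', 'mpi.2', 'batch.a']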
The hybrid scheduling framework for virtual machine systems
- In Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), 2009
"... The virtualization technology makes it feasible that multiple guest operating systems run on a single physical machine. It is the virtual machine monitor that dynamically maps the virtual CPU of virtual machines to physical CPUs according to the scheduling strategy. The scheduling strategy in Xen sc ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
(Show Context)
Virtualization makes it feasible for multiple guest operating systems to run on a single physical machine. It is the virtual machine monitor that dynamically maps the virtual CPUs of virtual machines to physical CPUs according to its scheduling strategy. Xen's scheduling strategy schedules the virtual CPUs of a virtual machine asynchronously while guaranteeing each machine a proportion of CPU time corresponding to its weight, which maximizes system throughput. However, this strategy can degrade performance when a virtual machine runs concurrent applications such as parallel or multithreaded programs. In this paper, we analyze the CPU scheduling problem in the virtual machine monitor theoretically, and show that the asynchronous CPU scheduling strategy wastes considerable physical CPU time when the workload consists of concurrent applications. We then present a hybrid scheduling framework for CPU scheduling in the virtual machine monitor. The framework distinguishes two types of virtual machines: high-throughput and concurrent. A virtual machine can be set to the concurrent type when the majority of its workload is concurrent applications, in order to reduce the cost of synchronization; otherwise it defaults to the high-throughput type. We implement the hybrid scheduling framework on Xen and describe the implementation in detail. Finally, we evaluate the framework on a multi-core platform; the results indicate that it improves the performance of the virtual machine system.
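A rough sketch of the hybrid dispatch decision follows; this is not the authors' Xen code, and the queue layout and type labels are assumptions. Concurrent-type VMs get their vCPUs picked in the same round across physical CPUs, while the rest are chosen asynchronously by credit order.

    # Hypothetical sketch: per-round vCPU selection for a hybrid scheduler.
    def pick_next(run_queues, vm_types):
        """run_queues: one list per pCPU of (vm, vcpu), sorted by credit."""
        mapping = {}
        # Pass 1: vCPUs of concurrent-type VMs are dispatched together;
        # their sibling vCPUs sit in other pCPUs' queues and are picked
        # there in the same round, giving a co-scheduled gang.
        for pcpu, queue in enumerate(run_queues):
            for vm, vcpu in queue:
                if vm_types.get(vm) == "concurrent":
                    mapping[pcpu] = (vm, vcpu)
                    break
        # Pass 2: fill the remaining pCPUs asynchronously with the
        # highest-credit runnable vCPU, as a credit scheduler would.
        for pcpu, queue in enumerate(run_queues):
            if pcpu not in mapping and queue:
                mapping[pcpu] = queue[0]
        return mapping  # pCPU -> (vm, vcpu) to run this round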
Symbiotic space-sharing on SDSC’s DataStar system
- In The 12th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP ’06), 2006
"... Abstract. Using a large HPC platform, we investigate the effectiveness of “symbiotic space-sharing”, a technique that improves system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We demonstrate that relevant benchmarks ..."
Abstract
-
Cited by 15 (5 self)
- Add to MetaCart
(Show Context)
Abstract. Using a large HPC platform, we investigate the effectiveness of “symbiotic space-sharing”, a technique that improves system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We demonstrate that relevant benchmarks commonly suffer a 10-60% penalty in runtime efficiency due to memory resource bottlenecks, and up to several orders of magnitude for I/O. We show that this penalty can often be mitigated, and sometimes virtually eliminated, by symbiotic space-sharing techniques, and we deploy a prototype scheduler that leverages these findings to improve system throughput by 20%.
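The pairing idea can be sketched in a few lines, with job profiles assumed to come from prior profiling runs; the job names and profile labels below are invented.

    # Hypothetical sketch: pair jobs with complementary bottlenecks so
    # they can space-share a node without fighting over one resource.
    def pair_jobs(jobs):
        """jobs: list of (job_id, profile), profile in {"mem", "cpu"}."""
        mem = [j for j, p in jobs if p == "mem"]
        cpu = [j for j, p in jobs if p == "cpu"]
        pairs = list(zip(mem, cpu))      # complementary pairs share a node
        n = len(pairs)
        leftovers = mem[n:] + cpu[n:]    # unmatched jobs run as before
        return pairs, leftovers

    print(pair_jobs([("A", "mem"), ("B", "cpu"), ("C", "mem"), ("D", "cpu")]))
    # -> ([('A', 'B'), ('C', 'D')], [])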
Magnet: A novel scheduling policy for power reduction in cluster with virtual machines
- 2008 IEEE International Conference on Cluster Computing
"... Abstract—The concept of green computing has attracted much attention recently in cluster computing. However, previous local approaches focused on saving the energy cost of the components in a single workstation without a global vision on the whole cluster, so it achieved undesirable power reduction ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
(Show Context)
Abstract—The concept of green computing has attracted much attention recently in cluster computing. However, previous local approaches focused on saving the energy cost of the components in a single workstation, without a global view of the whole cluster, and therefore achieved only limited power reduction. Other cluster-wide energy saving techniques could only be applied to homogeneous workstations and specific applications. This paper describes the design and implementation of a novel approach that uses live migration of virtual machines to transfer load among the nodes of a multilayer ring-based overlay. This scheme can reduce power consumption greatly by treating all the cluster nodes as a whole. Moreover, it can be applied to both homogeneous and heterogeneous servers. Experimental measurements show that the new method can reduce power consumption by up to 74.8% over the baseline with an adjustable, acceptable overhead. The effectiveness and performance insights are also analytically verified.
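As a rough sketch of the consolidation idea (ignoring the paper's ring-based overlay; the watermarks and load model are invented): VMs are live-migrated off lightly loaded nodes, and a node that empties out is powered down.

    # Hypothetical sketch: migrate VMs off underloaded donor nodes, then
    # power off any donor whose VMs all found a new home.
    def consolidate(nodes, low=0.2, high=0.8):
        """nodes: dict node -> list of (vm, cpu_utilization in [0, 1])."""
        load = lambda n: sum(u for _, u in nodes[n])
        migrations, poweroff = [], []
        donors = [n for n in nodes if load(n) < low]
        for n in donors:
            moved = []
            for vm, util in nodes[n]:
                # Receiver: any non-donor that stays under the watermark.
                dst = next((m for m in nodes
                            if m not in donors and load(m) + util <= high),
                           None)
                if dst is not None:
                    nodes[dst].append((vm, util))
                    migrations.append((vm, n, dst))
                    moved.append((vm, util))
            nodes[n] = [v for v in nodes[n] if v not in moved]
            if not nodes[n]:
                poweroff.append(n)   # idle node: candidate for power-off
        return migrations, poweroff

Treating the cluster as a whole is what distinguishes this from per-node power management: savings come from emptying and switching off entire machines.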
The still image lossy compression standard - JPEG
"... “JPEG ” stands for Joint Photographic Experts Group. JPEG is a well known standardized image compression technique for compressing pictures which do not have sharp changes e.g. landscape pictures. JPEG supports both color or grayscale images. JPEG loses information, so the decompressed picture is no ..."
Abstract
-
Cited by 10 (10 self)
- Add to MetaCart
(Show Context)
“JPEG” stands for Joint Photographic Experts Group. JPEG is a well-known standardized image compression technique for compressing pictures that do not have sharp changes, e.g., landscape pictures. JPEG supports both color and grayscale images. JPEG loses information, so the decompressed picture is not the same as the original one. By adjusting
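The lossy core of JPEG can be sketched in a few lines: each 8x8 block is transformed with the 2-D DCT, and its coefficients are divided by a quantization table and rounded, which is where information is discarded. The uniform quantization table below is a toy, not the standard's table.

    # Minimal sketch of JPEG's DCT + quantization stage (entropy coding,
    # chroma subsampling, and zig-zag ordering are omitted).
    import numpy as np

    def dct2(block):
        """Naive 2-D DCT-II of an 8x8 block."""
        n = 8
        c = np.array([1 / np.sqrt(2)] + [1.0] * (n - 1))
        basis = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1)
                       * np.arange(n)[:, None] / (2 * n))
        return 0.25 * np.outer(c, c) * (basis @ block @ basis.T)

    def quantize(coeffs, q_table):
        # Dividing by the table and rounding is the lossy step.
        return np.round(coeffs / q_table).astype(int)

    block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # level shift
    q = np.full((8, 8), 16.0)  # toy uniform quantization table
    print(quantize(dct2(block), q))

A coarser table (larger divisors) zeroes more high-frequency coefficients, giving smaller files and more visible loss; this trade-off is what a JPEG quality setting adjusts.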
New Challenges of Parallel Job Scheduling
"... Abstract. The workshop on job scheduling strategies for parallel processing (JSSPP) studies the myriad aspects of managing resources on parallel and distributed computers. These studies typically focus on largescale computing environments, where allocation and management of computing resources prese ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
(Show Context)
Abstract. The workshop on Job Scheduling Strategies for Parallel Processing (JSSPP) studies the myriad aspects of managing resources on parallel and distributed computers. These studies typically focus on large-scale computing environments, where allocation and management of computing resources present numerous challenges. Traditionally, such systems consisted of massively parallel supercomputers, or more recently, large clusters of commodity processor nodes. These systems are characterized by architectures that are largely homogeneous and workloads that are dominated by both computation and communication-intensive applications. Indeed, the large majority of the articles in the first ten JSSPP workshops dealt with such systems and addressed issues such as queuing systems and supercomputer workloads. In this paper, we discuss some of the recent developments in parallel computing technologies that depart from this traditional domain of problems. In particular, we identify several recent and influential technologies that could have a significant impact on the future of research on parallel scheduling. We discuss some of the more specific research challenges that these technologies introduce to the JSSPP community, and propose to enhance the scope of future JSSPP workshops to include these topics.