Enhancing Throughput of Hadoop Distributed File System for Interaction-Intensive Tasks
Abstract
The performance of the Hadoop Distributed File System (HDFS) decreases dramatically when handling interaction-intensive files, i.e., files that are relatively small but accessed frequently. This paper analyzes the cause of throughput degradation when accessing interaction-intensive files and presents an enhanced HDFS architecture, along with an associated storage allocation algorithm, that overcomes the performance degradation. Experiments show that with the proposed architecture and storage allocation algorithm, HDFS throughput for interaction-intensive files increases by 300% on average, with only a negligible performance decrease for large-dataset tasks.
Evaluating Storage Systems for Scientific Data in the
Abstract
Infrastructure-as-a-Service (IaaS) clouds are an appealing resource for scientific computing. However, the bare-bones presentation of raw Linux virtual machines leaves much to the application developer. For many cloud applications, effective data handling is critical to efficient application execution. This paper investigates the capabilities of a variety of POSIX-accessible distributed storage systems to manage the data access patterns that result from workflow application executions in the cloud. We leverage the expressivity of the Swift parallel scripting framework to benchmark the performance of a number of storage systems using synthetic workloads and three real-world applications. We characterize two representative commercial storage systems (Amazon S3 and HDFS) and two emerging research-based storage systems (Chirp/Parrot and MosaStore). We find the use of aggregated node-local resources effective and economical compared with remotely located S3 storage. Our experiments show that applications run at scale with MosaStore achieve up to a 30% improvement in makespan compared with those run with S3. We also find that storage-system-driven application deployments in the cloud result in better runtime performance than an on-demand, data-staging-driven approach.
Research Article: An Effective Cache Algorithm for Heterogeneous Storage Systems
Abstract
Copyright © 2013 Yong Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Modern storage environments are commonly composed of heterogeneous storage devices. However, traditional cache algorithms exhibit performance degradation in heterogeneous storage systems because they were not designed for devices with diverse performance characteristics. In this paper, we present a new cache algorithm, HCM, for heterogeneous storage systems. HCM partitions the cache among the disks and adopts an effective scheme to balance the workload across them. Furthermore, it applies benefit-cost analysis to choose the best allocation of cache blocks to improve performance. In simulations with a variety of traces and a wide range of cache sizes, HCM significantly outperforms existing state-of-the-art storage-aware cache algorithms.
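The abstract does not spell out HCM's allocation rule. As a hedged illustration of the general idea of benefit-cost cache partitioning (a generic greedy scheme, not HCM itself), the sketch below hands out cache blocks one at a time to whichever disk gains the most, where the gain of one more block is the extra expected hits times that disk's per-miss cost. The function name `partition_cache`, the hit curves, and the miss costs are hypothetical inputs chosen only for illustration.

```python
import heapq
from typing import Dict, List


def partition_cache(
    total_blocks: int,
    hit_curves: Dict[str, List[float]],
    miss_cost: Dict[str, float],
) -> Dict[str, int]:
    """Hypothetical greedy benefit-cost cache partitioning (not HCM itself).

    hit_curves[d][k]: assumed expected hits for disk d when given k blocks.
    miss_cost[d]: assumed cost of one cache miss served by disk d.
    """
    alloc = {d: 0 for d in hit_curves}
    heap: list = []  # max-heap via negated gains: (-gain, disk)

    def push_next_gain(d: str) -> None:
        k = alloc[d]
        curve = hit_curves[d]
        if k + 1 < len(curve):  # the curve still defines a next block
            benefit = (curve[k + 1] - curve[k]) * miss_cost[d]
            heapq.heappush(heap, (-benefit, d))

    for d in hit_curves:
        push_next_gain(d)

    for _ in range(total_blocks):
        if not heap:  # every hit curve is exhausted
            break
        _, d = heapq.heappop(heap)
        alloc[d] += 1  # give one block to the disk with the largest gain
        push_next_gain(d)
    return alloc


if __name__ == "__main__":
    curves = {"ssd": [0, 10, 15, 17], "hdd": [0, 8, 14, 18]}
    costs = {"ssd": 1.0, "hdd": 10.0}
    print(partition_cache(4, curves, costs))  # {'ssd': 1, 'hdd': 3}
```

In this made-up example the slow disk wins the first three blocks because each of its misses is assumed to cost ten times more. Greedy allocation of this kind is optimal only when every hit curve has non-increasing marginal gains; a real storage-aware cache would also have to estimate those curves online.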
Minimization of Cloud Task Execution Length with Workload Prediction Errors
Abstract
In cloud systems, it is non-trivial to optimize a task's execution performance within a user's affordable budget, especially when workload predictions may be wrong. Starting from an optimal algorithm that minimizes a cloud task's execution length given a predicted workload and budget, we theoretically derive an upper bound on the task execution length that accounts for possible workload prediction errors. With this bound, the worst-case performance of a task execution under a given workload prediction error becomes predictable. In addition, we build a close-to-practice cloud prototype on a real cluster environment deployed with 56 virtual machines and evaluate our solution under different degrees of resource contention. Experiments show that task execution lengths under our solution, with estimates of worst-case performance, are close to their theoretical ideal values in both the non-competitive situation with adequate resources and the competitive situation with limited available resources. We also observe fair resource allocation among all tasks.