Results 1 - 6 of 6
Reliability-Aware Deduplication Storage: Assuring Chunk Reliability and Chunk Loss Severity
"... Abstract—Reliability in deduplication storage has not attracted much research attention yet. To provide a demanded reliability for an incoming data stream, most deduplication storage systems first carry out deduplication process by eliminating duplicates from the data stream and then apply erasure c ..."
Abstract
Abstract—Reliability in deduplication storage has not yet attracted much research attention. To provide the reliability demanded for an incoming data stream, most deduplication storage systems first carry out a deduplication process that eliminates duplicates from the data stream, and then apply erasure coding to the remaining (unique) chunks. A unique chunk may be shared (i.e., duplicated) at many places in the data stream and by other data streams, which is why deduplication can reduce the required storage capacity. However, this sharing occasionally makes it problematic to assure the different reliability levels required by different data streams. We introduce two reliability parameters for deduplication storage: chunk reliability and chunk loss severity. Chunk reliability is each chunk’s tolerance level in the face of failures. Chunk loss severity represents the expected damage in the event of a chunk loss, formally defined as the product of the actual damage and the probability of a chunk loss. We propose a reliability-aware deduplication solution that not only assures all demanded chunk reliability levels, by making an already existing chunk sharable only if its reliability is high enough, but also mitigates chunk loss severity by adaptively reducing the probability of a chunk loss. In addition, we provide future research directions following the current study.
Keywords—storage; deduplication; reliability; loss severity
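To make the two parameters concrete, here is a minimal Python sketch of the sharing decision and the severity computation as described in the abstract; the names StoredChunk, can_share, and loss_severity, and the use of a chunk's reference count as a stand-in for the actual damage of a loss, are illustrative assumptions rather than the paper's implementation.

from dataclasses import dataclass

@dataclass
class StoredChunk:
    reliability: float  # chunk's tolerance level, modeled as P(chunk survives failures)
    ref_count: int      # how many places in the data stream(s) share this chunk

def can_share(existing: StoredChunk, demanded_reliability: float) -> bool:
    # Deduplicate against an already stored chunk only if its reliability
    # is high enough for the reliability demanded by the incoming stream.
    return existing.reliability >= demanded_reliability

def loss_severity(existing: StoredChunk) -> float:
    # Expected damage of a chunk loss: actual damage (approximated here by the
    # number of references that would break) times the probability of loss.
    return existing.ref_count * (1.0 - existing.reliability)

# A widely shared chunk with modest reliability: refuse sharing, high loss severity.
chunk = StoredChunk(reliability=0.999, ref_count=500)
print(can_share(chunk, demanded_reliability=0.9999))  # False
print(loss_severity(chunk))                           # 0.5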
unknown title, 2012
"... I owe my thanks and gratitude to numerous people who have helped me in accomplish-ing my PhD study. First of all, I would like express my sincere gratitude to my advisor, Professor David Hung-Chang Du, for his continuous support and truly inspiring guid-ance throughout my 5-year PhD study. He is not ..."
Abstract
I owe my thanks and gratitude to the numerous people who have helped me in accomplishing my PhD study. First of all, I would like to express my sincere gratitude to my advisor, Professor David Hung-Chang Du, for his continuous support and truly inspiring guidance throughout my 5-year PhD study. He is not only the person who brought me into the research field, but also devoted much time and effort to teaching me thinking and presenting skills and the attitude towards research work. More importantly, he taught me how to be a mature researcher. Besides my advisor, I would like to thank the rest of my thesis committee members, Prof. Jon Weissman, Prof. Abhishek Chandra, and Prof. Matthew O’Keefe, for their encouragement and insightful comments. I also hope to take this opportunity to give my appreciation to Prof. Youngjin Nam for his immensely valuable help, guidance, and suggestions in my research. I would also like to thank Dr. Weijun Xiao for his great help with my research; in particular, without his generosity in offering me his own experimental platform for months, I could not have finished this thesis. I also thank Cory Devor and the other members of the CRIS group for their help and support. I should say thanks to my mentor Dr. Erik Kruus at NEC Laboratories America for his great tutoring on essential research skills, and I will forever value and treasure the friendship of two of my friends, Dr. Biplob Debnath and Dr. Yu Jin, who have given me so much sincere help both in my research collaborations and in my life. I truly enjoyed the time and experience I spent with my labmates during my PhD study. In particular, I would like to give my thanks to Nohhyun Park, Dong-chul
VELVIZHI J,
"... Data deduplication avoids redundant data in the storage of backup operation. Deduplication reduces storage space and overall amount of time. In this system, files that contain data are split into chunks by using context aware chunking and fingerprints lookup to each chunk. Backup storage process is ..."
Abstract
Data deduplication avoids storing redundant data during backup operations. Deduplication reduces both the required storage space and the overall backup time. In this system, files are split into chunks using context-aware chunking, and a fingerprint is computed for each chunk; the backup storage process then avoids duplicate data through fingerprint lookup. In this paper, we compare three backup methodologies: full backup, cumulative incremental backup, and differential incremental backup. Full backups contain all data file blocks. Cumulative incremental backups at level n contain the blocks modified since the most recent backup at level n-1 or lower; restoration is faster than with differential incremental backup, but much more storage space is occupied. Differential incremental backups at level n contain only the blocks modified since the most recent backup at level n or lower, so each backup copies just the changes made since the previous backup.
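As a rough sketch of the block-selection rule behind the three backup types (not the paper's code), the Python function below picks the blocks to copy based on modification timestamps; blocks_to_copy, the history structure, and the timestamp comparison are assumptions made for illustration.

def blocks_to_copy(blocks, history, level, cumulative):
    # blocks:  dict mapping block_id -> last-modified timestamp
    # history: dict mapping backup level -> timestamp of the most recent backup at that level
    if level == 0:
        return set(blocks)  # full backup: copy all data file blocks
    # Cumulative incremental: changes since the latest backup at level n-1 or lower.
    # Differential incremental: changes since the latest backup at level n or lower.
    reference_levels = range(level) if cumulative else range(level + 1)
    baseline = max(history.get(l, 0) for l in reference_levels)
    return {b for b, mtime in blocks.items() if mtime > baseline}

# Example: full backup at t=100, a level-1 backup at t=500.
blocks = {"b1": 10, "b2": 250, "b3": 900}
history = {0: 100, 1: 500}
print(blocks_to_copy(blocks, history, level=1, cumulative=True))   # {'b2', 'b3'}
print(blocks_to_copy(blocks, history, level=1, cumulative=False))  # {'b3'}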
unknown title
"... Basically Data has been classified into two types namely ‘structured data ’ and ‘unstructured data ’ which are playing a major role in the recent trend. Normally the structure data can be easily organized includes website ..."
Abstract
Basically, data has been classified into two types, namely ‘structured data’ and ‘unstructured data’, both of which play a major role in the recent trend. Normally, structured data can be easily organized and includes website
unknown title
"... with compute nodes and uniform replication of data to sustain high I/O throughput and fault tolerance. However, not all data is accessed at the same time or rate. Thus, if a large replication factor is used to support higher throughput for popular data, it wastes storage by unnecessarily replicating ..."
Abstract
with compute nodes and uniform replication of data to sustain high I/O throughput and fault tolerance. However, not all data is accessed at the same time or rate. Thus, if a large replication factor is used to support higher throughput for popular data, storage is wasted by unnecessarily replicating unpopular data as well. Conversely, if less replication is used to conserve storage for unpopular data, even popular data has fewer replicas and thus lower I/O throughput. We present AptStore, a dynamic data management system for Hadoop that aims to improve overall I/O throughput while reducing storage cost. We design a tiered storage system that uses standard DAS for popular data to sustain high I/O throughput, and network-attached enterprise filers for cost-effective, fault-tolerant, but lower-throughput storage of unpopular data. We design a file Popularity Prediction Algorithm (PPA) that analyzes file system audit logs and predicts the appropriate storage policy for each file, and we use this information for transparent data movement between tiers. Our evaluation of AptStore on a real cluster shows a 21.3% improvement in application execution time over standard Hadoop, while trace-driven simulations show a 23.7% increase in read throughput and a 43.4% reduction in the system's storage capacity requirement.
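As a hedged illustration of popularity-driven tier assignment in the spirit of the PPA described above (not AptStore's actual algorithm), the Python sketch below counts reads per file from an audit log and assigns each file to a tier; the read-count threshold and the tier labels are assumptions.

from collections import Counter

def predict_tiers(audit_log_reads, popularity_threshold=100):
    # audit_log_reads: iterable of file paths, one entry per read recorded in the audit log.
    # Files read at least popularity_threshold times stay on replicated DAS for throughput;
    # the rest are candidates for the cheaper, lower-throughput network-attached filer tier.
    read_counts = Counter(audit_log_reads)
    return {
        path: ("DAS-replicated" if count >= popularity_threshold else "enterprise-filer")
        for path, count in read_counts.items()
    }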