Results 1 - 4 of 4
Failure trends in a large disk drive population
- In Proceedings of the 5th USENIX Conference on File and Storage Technologies, 2007
Cited by 97 (1 self)
It is estimated that over 90% of all new information produced in the world is being stored on magnetic media, most of it on hard disk drives. Despite their importance, there is relatively little published work on the failure patterns of disk drives, and the key factors that affect their lifetime. Most available data are based either on extrapolation from accelerated aging experiments or on relatively modest-sized field studies. Moreover, larger population studies rarely have the infrastructure in place to collect health signals from components in operation, which is critical information for detailed failure analysis. We present data collected from detailed observations of a large disk drive population in a production Internet services deployment. The population observed is many times larger than that of previous studies. In addition to presenting failure statistics, we analyze the correlation between failures and several parameters generally believed to impact longevity. Our analysis identifies several parameters from the drive’s self-monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.
Using Fault Model Enforcement to Improve Availability
- In Proceedings of the Second Workshop on Evaluating and Architecting System dependabilitY (EASY), 2002
Cited by 15 (7 self)
Today's network services run on complex arrays of computing systems consisting of a myriad of hardware and software components. In this work, we claim that it is impractical to try to tolerate all (or even a significant fraction of) fault types in these systems. We argue instead that a new approach, called fault model enforcement, that maps actual faults to expected faults of an abstract fault model can be used to increase system availability. This enforcement approach works because it transforms faults not factored into the initial design into faults that the system was designed to tolerate. Using fault injection and analytic modeling, we show that this enforcement approach has the potential to decrease the unavailability of a distributed Web server by over 50%.
The Art of Massive Storage: A Case Study of a Web Archive
- IEEE Computer, 1999
Cited by 2 (0 self)
This paper presents an overview of a large-scale, on-line image collection. The archive contains 72,213 images from the San Francisco Fine Arts Museums and is the largest database of on-line art in the world. We describe our system, what we learned, and how these lessons can be used by other web-based archives. We use our experiences to address questions about this type of site. First, what are users looking for? We describe the user access patterns. Second, how does the choice of user interface affect the system? We describe how well caching works within our user interface and how our access patterns are affected by the interface's style. Third, how good are the web server and underlying file system at caching a tile-based image workload? We estimate the cache hit ratio for our site. Finally, how much availability do users expect? We describe the failures we have experienced, and describe how much availability is needed from a web site.
Performance comparison of IDE and SCSI disks
- 2001
Cited by 1 (0 self)
It is widely believed that the IDE disks found in PCs are inexpensive but slow, whereas the SCSI disks used in servers and workstations are faster, more reliable, and more manageable. The belief that current IDE disks have performance and reliability disadvantages has been called into question by several recent reports. Thus we consider the possibility of achieving tremendous cost advantages by using IDE disks as the foundation of a storage system. In this paper, we give an extensive performance comparison of IDE and SCSI disks. We measure their performance on a variety of microbenchmarks and macrobenchmarks, and we explain these results with the help of kernel instrumentation and device activity traces collected by a SCSI analyzer. We consider the impact of several factors, including sequential vs. random workloads, file system enhancements such as journaling and Soft Updates, I/O scheduling in the kernel vs. in the disk drive (as enabled by tagged queuing), and the use of RAID technology to obtain I/O parallelism. In our testbed we find that the IDE disk is faster than the SCSI disk for sequential I/O, but the SCSI disk is faster for random I/O. We also observe that the random I/O performance deficit of the IDE disk is partly overcome by kernel I/O scheduling, and is further mitigated by scheduling in the drive (as enabled by tagged queuing), and by the use of journaling and Soft Updates. Taken as a whole, our results lead us to conclude that RAID systems based on IDE drives can be both faster and significantly less expensive than SCSI RAID systems.

