Results 1 - 10 of 22
Using Model Checking to Find Serious File System Errors, 2004
Cited by 167 (16 self)
This paper shows how to use model checking to find serious errors in file systems. Model checking is a formal verification technique tuned for finding corner-case errors by comprehensively exploring the state spaces defined by a system. File systems have two dynamics that make them attractive for such an approach. First, their errors are some of the most serious, since they can destroy persistent data and lead to unrecoverable corruption. Second, traditional testing needs an impractical, exponential number of test cases to check that the system will recover if it crashes at any point during execution. Model checking employs a variety of state-reducing techniques that allow it to explore such vast state spaces efficiently. We built a system, FiSC, for model checking file systems. We applied it to three widely-used, heavily-tested file systems: ext3 [13], JFS [21], and ReiserFS [27]. We found serious bugs in all of them, 32 in total. Most have led to patches within a day of diagnosis. For each file system, FiSC found demonstrable events leading to the unrecoverable destruction of metadata and entire directories, including the file system root directory “/”.
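The exhaustive crash-point exploration the abstract describes can be sketched in a few lines. This is an illustrative toy, not the FiSC implementation: for every prefix of the disk writes an operation issues, simulate a crash after that prefix, run a recovery routine, and check a consistency invariant. All names here (`crash_check`, `recover`, `invariant_holds`) are assumptions for the sketch.

```python
def crash_check(writes, recover, invariant_holds):
    """Replay every crash point: apply writes[:k], recover, verify the invariant."""
    failures = []
    for k in range(len(writes) + 1):
        disk = {}
        for addr, value in writes[:k]:   # on-disk state at the crash point
            disk[addr] = value
        state = recover(dict(disk))      # recover from a copy of the crashed disk
        if not invariant_holds(state):
            failures.append(k)           # record which crash point broke consistency
    return failures

# Toy workload: a two-block "commit" that writes data, then a commit flag.
writes = [("data", 42), ("committed", True)]

def recover(disk):
    # Journal-style recovery rule: discard data that was never committed.
    if not disk.get("committed"):
        disk.pop("data", None)
    return disk

def invariant_holds(disk):
    # Consistency invariant: committed implies the data block is present.
    return not disk.get("committed") or "data" in disk

print(crash_check(writes, recover, invariant_holds))  # [] → no crash point violates the invariant
```

Reordering the two writes (flag before data) would make `crash_check` report a failing crash point, which is exactly the class of ordering bug such checkers hunt for.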
Efficient Transparent Application Recovery in Client-Server Information Systems
Cited by 22 (6 self)
Database systems recover persistent data, providing high database availability. However, database applications, typically residing on client or "middle-tier" application-server machines, may lose work because of a server failure. This prevents the masking of server failures from the human user and substantially degrades application availability. This paper aims to enable high application availability with an integrated method for database server recovery and transparent application recovery in a client-server system. The approach, based on application message logging, is similar to earlier work on distributed-system fault tolerance. However, we exploit advanced database logging and recovery techniques and request/reply messaging properties to significantly improve efficiency. Forced log I/Os, frequently required by other methods, are usually avoided. Restart time, for both a failed server and a failed client, is reduced by checkpointing and log truncation. Our method ensures that a server can recover independently of its clients. A client may reduce logging overhead in return for a dependency on server availability during client restart.
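One core ingredient of request/reply message logging can be sketched briefly. This is an illustrative toy, not the paper's protocol: the server logs each reply keyed by request id, so a request replayed after a failure receives the original reply rather than being re-executed. The class and field names are assumptions for the sketch; a real system would put the reply log on stable storage.

```python
class LoggingServer:
    """Toy server that logs replies so replayed requests are idempotent."""

    def __init__(self):
        self.reply_log = {}   # request_id -> reply (stable storage in a real system)
        self.balance = 100

    def handle(self, request_id, amount):
        if request_id in self.reply_log:     # duplicate after a client or server restart
            return self.reply_log[request_id]
        self.balance += amount               # execute the request exactly once
        reply = ("ok", self.balance)
        self.reply_log[request_id] = reply   # log the reply before sending it
        return reply

s = LoggingServer()
print(s.handle("req-1", 50))   # ('ok', 150)
print(s.handle("req-1", 50))   # replayed request: same reply, no double deposit
```

Because the logged reply is returned verbatim, a client that crashes after sending but before receiving can simply resend, which is the masking-of-failures property the abstract is after.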
Rewriting Histories: Recovering from Malicious Transactions, 1999
Cited by 22 (2 self)
We consider recovery from malicious but committed transactions. Traditional recovery mechanisms do not address this problem, except for complete rollbacks, which undo the work of good transactions as well as malicious ones, and compensating transactions, whose utility depends on application semantics. We develop an algorithm that rewrites execution histories for the purpose of backing out malicious transactions. Good transactions that are affected, directly or indirectly, by malicious transactions complicate the process of backing out undesirable transactions. We show that the prefix of a rewritten history produced by the algorithm serializes exactly the set of unaffected good transactions. The suffix of the rewritten history includes special state information to describe affected good transactions as well as malicious transactions. We describe techniques that can extract additional good transactions from this latter part of a rewritten history. The latter processing saves more ...
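The transitive "affected" relation that complicates backing out malicious transactions can be sketched concretely. This is an illustrative sketch under a simple read-from model, not the paper's algorithm: a good transaction is affected if it read an item written by a malicious or already-affected transaction earlier in the history, and the remaining transactions are the unaffected set that the rewritten prefix serializes.

```python
def affected_set(history, malicious):
    """history: list of (txn_id, reads, writes) in commit order."""
    tainted_items = set()            # items whose values derive from malicious work
    affected = set(malicious)
    for txn, reads, writes in history:
        if txn in malicious or reads & tainted_items:
            affected.add(txn)        # directly or transitively affected
            tainted_items |= writes  # its writes now spread the taint
    return affected

history = [
    ("T1", {"x"}, {"y"}),   # malicious: corrupts y
    ("T2", {"y"}, {"z"}),   # reads y -> directly affected
    ("T3", {"z"}, {"w"}),   # reads z -> transitively affected
    ("T4", {"a"}, {"b"}),   # untouched: belongs in the serialized prefix
]
print(affected_set(history, {"T1"}))  # T1, T2 and T3 are affected; T4 is not
```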
Repeating History beyond ARIES, 1999
Cited by 21 (4 self)
In this paper, I first describe the background behind the development of the original ARIES recovery method, and its significant impact on the commercial world and the research community. Next, I provide a brief introduction to the various concurrency control and recovery methods in the ARIES family of algorithms. Subsequently, I discuss some of the recent developments affecting the transaction management area and what these mean for the future. In ARIES, the concept of repeating history turned out to be an important paradigm. As I examine where transaction management is headed in the world of the internet, I observe history repeating itself, in the sense that requirements that used to be considered significant in the mainframe world (e.g., performance, availability and reliability) are now becoming important requirements of the broader information technology community as well.
A Review of the Rationale and Architectures of PJama: a Durable, Flexible, Evolvable and Scalable Orthogonally Persistent Programming Platform, 2000
Cited by 21 (5 self)
This paper reports on our designs, implementations and experiences with this platform, and analyses the initial evidence from the experiments.
Verifiable Transaction Atomicity for Electronic Payment Protocols
In ICDCS ’96: Proceedings of the 16th International Conference on Distributed Computing Systems, 1996
Cited by 8 (1 self)
We study the transaction atomicity problem for designing electronic payment protocols in distributed systems. We observe that the techniques that are used to guarantee transaction atomicity in a database system are not robust enough to guarantee transaction atomicity in an electronic payment system, in which a set of dishonest or malicious participants may exhibit unpredictable behavior and cause arbitrary failures. We present a new concept, verifiable transaction atomicity, for designing electronic payment protocols. We give formal specifications of the verifiable atomic commitment problem. Then we design a robust electronic currency system that meets the specifications and achieves verifiable transaction atomicity.
On-line Reorganization of Sparsely-populated B+-trees
In Proceedings of the ACM SIGMOD Annual Conference on Management of Data, 1996
Cited by 6 (3 self)
In this paper, we present an efficient method for on-line reorganization of sparsely-populated B+-trees. It reorganizes the leaves first, compacting groups of leaves with the same parent in short operations. After compacting, optionally, the new leaves may swap locations or be moved into empty pages so that they are in key order on the disk. After the leaves are reorganized, the method shrinks the tree by making a copy of the upper part of the tree while leaving the leaves in place. A new concurrency method is introduced so that only a minimum number of pages are locked during reorganization. During leaf reorganization, Forward Recovery is used to save all work already done while maintaining consistency after system crashes. A heuristic algorithm is developed to reduce the number of swaps needed during leaf reorganization, so that better concurrency and easier recovery can be achieved. A detailed description of switching from the old B+-tree to the new B+-tree is describe...
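The first step the abstract describes, compacting sparse sibling leaves, can be sketched with plain lists standing in for leaf pages. This is an illustrative toy, not the paper's method; `CAPACITY` and the structures are assumptions, and a real implementation would do each compaction as a short, independently recoverable operation.

```python
CAPACITY = 4  # entries per leaf page (illustrative)

def compact_siblings(leaves):
    """leaves: sorted key lists sharing one parent; returns densely packed leaves."""
    keys = [k for leaf in leaves for k in leaf]      # siblings are already in key order
    # Repack into as few full leaves as possible, preserving key order.
    return [keys[i:i + CAPACITY] for i in range(0, len(keys), CAPACITY)]

sparse = [[1, 4], [7], [9, 12, 15], [20]]            # four leaves, mostly empty
print(compact_siblings(sparse))                      # [[1, 4, 7, 9], [12, 15, 20]]
```

Here four half-empty leaves shrink to two packed ones; the optional follow-up step in the abstract would then place the packed leaves in key order on disk.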
Recovery Guarantees in Mobile Systems
In Proceedings of the 1st ACM International Workshop on Data Engineering for Wireless and Mobile Access, 1999
Cited by 4 (0 self)
Mobile applications increasingly require transaction-like properties, particularly those of recovery. Because there is a lack of abstractions to decompose the machinery of recovery, realizing recovery is difficult and error-prone, especially in a novel context like mobile systems. We introduce recovery guarantees to tackle this problem by characterizing the assurances relevant to recovery that one subsystem must give to another. They describe what a subsystem can expect, but not how recovery is implemented. Guarantees are complemented by recovery protocols, which prescribe behaviors subsystems should follow in order to take advantage of the guarantees. In this paper we use the notions of recovery guarantees and protocols to show the relationships, vis-a-vis recovery, between the components of a mobile system. Our analysis shows which components of recovery remain unchanged (from a conventional recovery design) and which respond to the particular needs of mobile systems. This sheds...
Dynamic Hierarchical Data Clustering And Efficient On-Line Database Reorganization, 1996
Cited by 2 (1 self)
In recent years, as more applications start using massive databases as their main source of information, more emphasis is placed on the performance of the database system. These applications require not only that the database system have good performance, but also that it be continually available. The research in this thesis makes strides in meeting these requirements: dynamically clustering data improves the database performance, and efficient on-line reorganization methods enable the database systems to be continually available. A new algorithm, Enc, for dynamically clustering hierarchical data is presented in this thesis. It uses a primary B+-tree as the main storage structure; all relations in the hierarchy are stored in the B+-tree. The hierarchical relationship is encoded into the keys of the B+-tree. The Enc algorithm maintains good clustering in the presence of insertions and deletions. Experimental results show that using the Enc algorithm, hierarchical queries can be process...
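The abstract does not give the Enc encoding itself, but the general idea of encoding a hierarchy into B+-tree keys can be sketched. The tuple-of-ancestor-ids scheme below is an illustrative assumption, not the paper's encoding: because tuples compare lexicographically, sorting by key clusters each record next to its ancestors and siblings, so a hierarchical query becomes a contiguous key-range scan.

```python
def hier_key(path):
    """path: ids from root to record, e.g. (dept_id, emp_id, task_id). Illustrative."""
    return tuple(path)

rows = [
    (hier_key((2,)),      "dept 2"),
    (hier_key((1, 7, 3)), "task 3 of emp 7"),
    (hier_key((1,)),      "dept 1"),
    (hier_key((2, 5)),    "emp 5 in dept 2"),
    (hier_key((1, 7)),    "emp 7 in dept 1"),
]
rows.sort()   # B+-tree key order == depth-first hierarchy order
for key, desc in rows:
    print(key, desc)

# A hierarchical query for everything under dept 1 is a contiguous range scan:
dept1 = [desc for key, desc in rows if key[0] == 1]
print(dept1)  # ['dept 1', 'emp 7 in dept 1', 'task 3 of emp 7']
```

The clustering property is what makes insertions and deletions cheap to keep well-clustered: a new record's key places it directly into its parent's key range.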
Phoenix: Making Applications Robust
Dealing with errors or exceptions is a very large part of getting applications right. Failures are not only an application programming problem but an operational and an availability problem as well. The Phoenix goal is to increase the availability