Results 11 - 20 of 304
RaceTrack: Efficient detection of data race conditions via adaptive tracking
- In SOSP, 2005
"... Bugs due to data races in multithreaded programs often exhibit non-deterministic symptoms and are notoriously difficult to find. This paper describes RaceTrack, a dynamic race detection tool that tracks the actions of a program and reports a warning whenever a suspicious pattern of activity has been ..."
Abstract - Cited by 168 (0 self)
Bugs due to data races in multithreaded programs often exhibit non-deterministic symptoms and are notoriously difficult to find. This paper describes RaceTrack, a dynamic race detection tool that tracks the actions of a program and reports a warning whenever a suspicious pattern of activity has been observed. RaceTrack uses a novel hybrid detection algorithm and employs an adaptive approach that automatically directs more effort to areas that are more suspicious, thus providing more accurate warnings for much less overhead. A post-processing step correlates warnings and ranks code segments based on how strongly they are implicated in potential data races. We implemented RaceTrack inside the virtual machine of Microsoft’s Common Language Runtime (product version v1.1.4322) and monitored several major, real-world applications directly out-of-the-box, without any modification. Adaptive tracking resulted in a slowdown ratio of about 3x on memory-intensive programs and typically much less than 2x on other programs, and a memory ratio of typically less than 1.2x. Several serious data race bugs were revealed, some previously unknown.
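To make the lockset side of such hybrid detection concrete, here is a minimal sketch of the classic lockset-refinement heuristic that tools in this family build on. It is an illustration only, not RaceTrack's algorithm: the adaptive granularity, happens-before tracking, and warning ranking described in the abstract are all omitted, and the class and variable names are ours.

```python
# Minimal lockset sketch; hybrid detectors such as RaceTrack also use
# happens-before information and adaptive tracking, which are omitted here.

class LocksetChecker:
    def __init__(self):
        # candidate lockset per shared variable: the locks that have protected
        # every access observed so far
        self.candidates = {}

    def on_access(self, var, thread, locks_held):
        """Record an access to `var` by `thread` while holding `locks_held`."""
        held = frozenset(locks_held)
        if var not in self.candidates:
            self.candidates[var] = held
        else:
            # refine: only locks held on *every* access stay in the candidate set
            self.candidates[var] &= held
        if not self.candidates[var]:
            print(f"warning: {var} accessed by {thread} with no consistent lock")

checker = LocksetChecker()
checker.on_access("counter", "T1", {"mu"})
checker.on_access("counter", "T2", {"mu"})   # fine: mu consistently protects counter
checker.on_access("counter", "T3", set())    # empty lockset -> potential data race
```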
Using Secure Coprocessors
, 1994
"... The views and conclusions in this document are those of the authors and do not necessarily represent the official policies or endorsements of any of the research sponsors. How do we build distributed systems that are secure? Cryptographic techniques can be used to secure the communications between p ..."
Abstract - Cited by 165 (8 self)
The views and conclusions in this document are those of the authors and do not necessarily represent the official policies or endorsements of any of the research sponsors. How do we build distributed systems that are secure? Cryptographic techniques can be used to secure the communications between physically separated systems, but this is not enough: we must be able to guarantee the privacy of the cryptographic keys and the integrity of the cryptographic functions, in addition to the integrity of the security kernel and access control databases we have on the machines. Physical security is a central assumption upon which secure distributed systems are built; without this foundation even the best cryptosystem or the most secure kernel will crumble. In this thesis, I address the distributed security problem by proposing the addition of a small, physically secure hardware module, a secure coprocessor, to standard workstations and PCs. My central axiom is that secure coprocessors are able to maintain the privacy of the data they process. This thesis attacks the distributed security problem from multiple sides. First, I analyze the security properties of existing system components, both at the hardware and
Filtering near-duplicate documents
- Proc. FUN 98, 1998
"... Abstract. The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch ” for each document. For a large collection of documents (say hundreds of millions) the size of this sketch is of the orde ..."
Abstract - Cited by 157 (6 self)
The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed-size "sketch" for each document. For a large collection of documents (say hundreds of millions) the size of this sketch is of the order of a few hundred bytes per document. However, for efficient large-scale web indexing it is not necessary to determine the actual resemblance value: it suffices to determine whether newly encountered documents are duplicates or near-duplicates of documents already indexed. In other words, it suffices to determine whether the resemblance is above a certain threshold. In this talk we show how this determination can be made using a "sample" of less than 50 bytes per document. The basic approach for computing resemblance has two aspects: first, resemblance is expressed as a set (of strings) intersection problem, and second, the relative size of intersections is evaluated by a process of random sampling that can be done independently for each document. The process of estimating the relative size of intersection of sets and the threshold test discussed above can be applied to arbitrary sets, and thus might be of independent interest. The algorithm for filtering near-duplicate documents discussed here has been successfully implemented and has been used for the last three years in the context of the AltaVista search engine.
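A minimal sketch of the sampling idea described above, using min-hashing over word shingles: resemblance is the relative size of the intersection of two documents' shingle sets, and agreement between per-hash minima estimates it. The shingle width, number of hash functions, and hash family are illustrative assumptions, not the paper's parameters.

```python
# Illustrative resemblance estimation by random sampling (min-hashing).
import hashlib

def shingles(text, width=4):
    words = text.split()
    return {" ".join(words[i:i + width])
            for i in range(max(1, len(words) - width + 1))}

def minhash_sketch(doc, num_hashes=64):
    sketch = []
    for seed in range(num_hashes):
        # a seeded hash stands in for a min-wise independent permutation
        h = lambda s: int(hashlib.sha1(f"{seed}:{s}".encode()).hexdigest(), 16)
        sketch.append(min(h(s) for s in shingles(doc)))
    return sketch

def estimated_resemblance(sk_a, sk_b):
    # fraction of agreeing minima estimates |A intersect B| / |A union B|
    return sum(a == b for a, b in zip(sk_a, sk_b)) / len(sk_a)

a = minhash_sketch("the quick brown fox jumps over the lazy dog near the river bank")
b = minhash_sketch("the quick brown fox jumps over the lazy dog near the river shore")
print(estimated_resemblance(a, b))   # typically around 0.8 for these near-duplicates
```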
Polymorphic Worm Detection Using Structural Information of Executables
- In RAID, 2005
"... Abstract. Network worms are malicious programs that spread automatically across networks by exploiting vulnerabilities that affect a large number of hosts. Because of the speed at which worms spread to large computer populations, countermeasures based on human reaction time are not feasible. Therefo ..."
Abstract - Cited by 149 (11 self)
Network worms are malicious programs that spread automatically across networks by exploiting vulnerabilities that affect a large number of hosts. Because of the speed at which worms spread to large computer populations, countermeasures based on human reaction time are not feasible. Therefore, recent research has focused on devising new techniques to detect and contain network worms without the need for human supervision. In particular, a number of approaches have been proposed to automatically derive signatures to detect network worms by analyzing a number of worm-related network streams. Most of these techniques, however, assume that the worm code does not change during the infection process. Unfortunately, worms can be polymorphic. That is, they can mutate as they spread across the network. To detect these types of worms, it is necessary to devise new techniques that are able to identify similarities between different mutations of a worm. This paper presents a novel technique based on the structural analysis of binary code that allows one to identify structural similarities between different worm mutations. The approach is based on the analysis of a worm's control flow graph and introduces an original graph coloring technique that supports a more precise characterization of the worm's structure. The technique has been used as a basis to implement a worm detection system that is resilient to many of the mechanisms used to evade approaches based on instruction sequences only.
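A deliberately simplified sketch of structural comparison of control flow graphs. The paper fingerprints small connected subgraphs using an instruction-class coloring; the toy version below only hashes each node's color together with its successors' colors and compares fingerprint sets, so the names, coloring, and similarity measure are illustrative, not the authors' exact construction.

```python
# Toy structural fingerprinting of control flow graphs (illustrative only).
import hashlib

def node_fingerprints(cfg, colors):
    """cfg: {node: [successor nodes]}, colors: {node: instruction-class label}."""
    fps = set()
    for node, succs in cfg.items():
        # hash the node's color together with the multiset of successor colors
        local = (colors[node], tuple(sorted(colors[s] for s in succs)))
        fps.add(hashlib.sha1(repr(local).encode()).hexdigest()[:16])
    return fps

def structural_similarity(fp_a, fp_b):
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)   # Jaccard over fingerprint sets

# Two mutations with renamed blocks but identical structure and coloring.
cfg1 = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cfg2 = {"X": ["Y", "Z"], "Y": ["W"], "Z": ["W"], "W": []}
colors1 = {"A": "arith", "B": "mem", "C": "mem", "D": "branch"}
colors2 = {"X": "arith", "Y": "mem", "Z": "mem", "W": "branch"}
print(structural_similarity(node_fingerprints(cfg1, colors1),
                            node_fingerprints(cfg2, colors2)))   # 1.0
```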
Finding near-duplicate web pages: a large-scale evaluation of algorithms
- In Proc. of Int. Conf. on Information Retrieval (SIGIR), 2006
"... Broder et al.’s [3] shingling algorithm and Charikar’s [4] ran-dom projection based approach are considered “state-of-the-art ” algorithms for finding near-duplicate web pages. Both algorithms were either developed at or used by popular web search engines. We compare the two algorithms on a very lar ..."
Abstract - Cited by 118 (2 self)
Broder et al.'s [3] shingling algorithm and Charikar's [4] random-projection-based approach are considered "state-of-the-art" algorithms for finding near-duplicate web pages. Both algorithms were either developed at or used by popular web search engines. We compare the two algorithms on a very large scale, namely on a set of 1.6B distinct web pages. The results show that neither of the algorithms works well for finding near-duplicate pairs on the same site, while both achieve high precision for near-duplicate pairs on different sites. Since Charikar's algorithm finds more near-duplicate pairs on different sites, it achieves a better precision overall, namely 0.50 versus 0.38 for Broder et al.'s algorithm. We present a combined algorithm which achieves precision 0.79 with 79% of the recall of the other algorithms.
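For readers unfamiliar with Charikar's approach, here is a minimal simhash-style sketch of random-projection fingerprints: near-duplicate pages yield fingerprints that are close in Hamming distance. Feature extraction, weighting, and the 64-bit width are illustrative assumptions rather than the configuration evaluated in the paper.

```python
# Minimal simhash-style fingerprint; parameters are illustrative.
import hashlib

def simhash(features, bits=64):
    v = [0] * bits
    for feat in features:
        h = int(hashlib.md5(feat.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1   # vote per bit position
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    return bin(a ^ b).count("1")

fp1 = simhash("the quick brown fox jumps over the lazy dog".split())
fp2 = simhash("the quick brown fox leaps over the lazy dog".split())
print(hamming(fp1, fp2))   # typically small for near-duplicate texts
```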
Some applications of Rabin's fingerprinting method
- Sequences II: Methods in Communications, Security, and Computer Science, 1993
"... Rabin's fingerprinting scheme is based on arithmetic modulo an irreducible polynomial with coefficients in Z 2 . This paper presents an implementation and several applications of this scheme that take considerable advantage of its algebraic properties. 1 Introduction Fingerprints are short tag ..."
Abstract - Cited by 113 (3 self)
Rabin's fingerprinting scheme is based on arithmetic modulo an irreducible polynomial with coefficients in Z_2. This paper presents an implementation and several applications of this scheme that take considerable advantage of its algebraic properties. Fingerprints are short tags for larger objects. They have the property that if two fingerprints are different then the corresponding objects are certainly different, and there is only a small probability that two different objects have the same fingerprint. (The latter event is called a collision.) More precisely, a fingerprinting scheme is a certain collection of functions F = { f : Ω → {0,1}^k }, where Ω is the set of all possible objects of interest and k is the length of the fingerprint, such that, for any choice of a fixed set S ⊆ Ω of n distinct objects, if f is chosen uniformly at random in F, then with high probability |f(S)| = |S|. In other words, if an adversary chooses a set S ⊆ Ω ...
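A minimal sketch of the idea: treat the data as a polynomial over Z_2 and keep its residue modulo an irreducible polynomial, reducing bit by bit. The degree-64 modulus below is an arbitrary illustrative constant (a real implementation must choose a polynomial verified to be irreducible), and none of the paper's algebraic optimizations are shown.

```python
# Bit-at-a-time Rabin fingerprint over Z_2 (illustrative modulus, not verified
# irreducible; real implementations pick a known irreducible polynomial).

IRRED = (1 << 64) | 0x1B   # stands for x^64 + x^4 + x^3 + x + 1
DEGREE = 64

def rabin_fingerprint(data: bytes) -> int:
    fp = 0
    for byte in data:
        for bit in range(7, -1, -1):
            fp = (fp << 1) | ((byte >> bit) & 1)   # append next message bit
            if fp >> DEGREE:                       # degree reached 64: reduce
                fp ^= IRRED                        # subtract (XOR) the modulus
    return fp                                      # residue, fits in 64 bits

print(hex(rabin_fingerprint(b"hello world")))
print(rabin_fingerprint(b"hello world") == rabin_fingerprint(b"hello world!"))  # False
```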
Shark: Scaling File Servers via Cooperative Caching
"... Network file systems offer a powerful, transparent interface for accessing remote data. Unfortunately, in current network file systems like NFS, clients fetch data from a central file server, inherently limiting the system’s ability to scale to many clients. While recent distributed (peer-topeer) sy ..."
Abstract - Cited by 105 (3 self)
Network file systems offer a powerful, transparent interface for accessing remote data. Unfortunately, in current network file systems like NFS, clients fetch data from a central file server, inherently limiting the system's ability to scale to many clients. While recent distributed (peer-to-peer) systems have managed to eliminate this scalability bottleneck, they are often exceedingly complex and provide non-standard models for administration and accountability. We present Shark, a novel system that retains the best of both worlds: the scalability of distributed systems with the simplicity of central servers. Shark is a distributed file system designed for large-scale, wide-area deployment, while also providing a drop-in replacement for local-area file systems. Shark introduces a novel cooperative-caching mechanism, in which mutually-distrustful clients can exploit each other's file caches to reduce load on an origin file server. Using a distributed index, Shark clients find nearby copies of data, even when files originate from different servers. Performance results show that Shark can greatly reduce server load and improve client latency for read-heavy workloads both in the wide and local areas, while still remaining competitive for single clients in the local area. Thus, Shark enables modestly-provisioned file servers to scale to hundreds of read-mostly clients while retaining traditional usability, consistency, security, and accountability.
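The cooperative read path can be pictured roughly as follows. Every class and method name here is a hypothetical stand-in: Shark's real design uses a distributed index over content hashes plus integrity and authorization checks between mutually distrustful clients, none of which are modelled in this toy single-process sketch.

```python
# Toy cooperative-caching read path: local cache -> nearby peer -> origin server.

class OriginServer:
    def __init__(self, chunks):
        self.chunks, self.fetches = chunks, 0
    def fetch(self, chunk_id):
        self.fetches += 1
        return self.chunks[chunk_id]

class InMemoryIndex:
    """Hypothetical stand-in for the distributed index of cached copies."""
    def __init__(self):
        self.holders = {}
    def lookup(self, chunk_id):
        return list(self.holders.get(chunk_id, []))
    def announce(self, chunk_id, client):
        self.holders.setdefault(chunk_id, []).append(client)

class Client:
    def __init__(self, index, origin):
        self.cache, self.index, self.origin = {}, index, origin
    def fetch(self, chunk_id):              # peers call this to read our cache
        return self.cache.get(chunk_id)
    def read(self, chunk_id):
        if chunk_id in self.cache:                       # 1. local hit
            return self.cache[chunk_id]
        data = None
        for peer in self.index.lookup(chunk_id):         # 2. nearby peer copies
            data = peer.fetch(chunk_id)
            if data is not None:
                break
        if data is None:                                 # 3. fall back to origin
            data = self.origin.fetch(chunk_id)
        self.cache[chunk_id] = data
        self.index.announce(chunk_id, self)              # advertise our copy
        return data

origin = OriginServer({"c1": b"block data"})
index = InMemoryIndex()
a, b = Client(index, origin), Client(index, origin)
a.read("c1"); b.read("c1")
print(origin.fetches)   # 1: the second client read from the first client's cache
```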
Dyad: A System for Using Physically Secure Coprocessors
- Proceedings of the Joint Harvard-MIT Workshop on Technological Strategies for the Protection of Intellectual Property in the Network Multimedia Environment, 1991
"... The Dyad project at Carnegie Mellon University is using physically secure coprocessors to achieve new protocols and systems addressing a number of perplexing security problems. These coprocessors can be produced as boards or integrated circuit chips and can be directly inserted in standard workstati ..."
Abstract - Cited by 96 (1 self)
The Dyad project at Carnegie Mellon University is using physically secure coprocessors to achieve new protocols and systems addressing a number of perplexing security problems. These coprocessors can be produced as boards or integrated circuit chips and can be directly inserted in standard workstations or PC-style computers. This paper presents a set of security problems and easily implementable solutions that exploit the power of physically secure coprocessors: (1) protecting the integrity of publicly accessible workstations, (2) tamper-proof accounting/audit trails, (3) copy protection, and (4) electronic currency without centralized servers. We outline the architectural requirements for the use of secure coprocessors.
Detecting Near-Duplicates for Web Crawling
- WWW, 2007
"... Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a nea ..."
Abstract - Cited by 92 (0 self)
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the course of developing a near-duplicate detection system for a multi-billion page repository, we make two research contributions. First, we demonstrate that Charikar's fingerprinting technique is appropriate for this goal. Second, we present an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k. Our technique is useful for both online queries (single fingerprints) and batch queries (multiple fingerprints). Experimental evaluation over real data confirms the practicality of our design.
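One way to picture the at-most-k-bits lookup problem is the pigeonhole trick sketched below: split each f-bit fingerprint into k+1 blocks, so any fingerprint within Hamming distance k must agree exactly on at least one block, then verify the few candidates retrieved by block. The paper's actual design (tables of permuted, sorted fingerprints scaled to billions of pages) is more elaborate; the block count and widths here are illustrative.

```python
# Pigeonhole-style index for fingerprints within Hamming distance K (illustrative).
from collections import defaultdict

F_BITS, K = 64, 3
BLOCKS = K + 1
BLOCK_BITS = F_BITS // BLOCKS          # 16-bit blocks of a 64-bit fingerprint

def blocks_of(fp):
    for i in range(BLOCKS):
        yield i, (fp >> (i * BLOCK_BITS)) & ((1 << BLOCK_BITS) - 1)

class NearDuplicateIndex:
    def __init__(self):
        self.tables = [defaultdict(list) for _ in range(BLOCKS)]
    def add(self, fp):
        for i, block in blocks_of(fp):
            self.tables[i][block].append(fp)
    def query(self, fp):
        candidates = set()
        for i, block in blocks_of(fp):          # any stored near-duplicate must
            candidates.update(self.tables[i][block])   # share at least one block
        return [c for c in candidates if bin(c ^ fp).count("1") <= K]

idx = NearDuplicateIndex()
idx.add(0x0123456789ABCDEF)
print(idx.query(0x0123456789ABCDEC))    # differs in 2 bits -> reported
print(idx.query(0xFFFFFFFFFFFFFFFF))    # far away -> []
```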
Redundancy Elimination Within Large Collections of Files
, 2004
"... Ongoing advancements in technology lead to everincreasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements ..."
Abstract - Cited by 83 (3 self)
Ongoing advancements in technology lead to ever-increasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements for resources such as computation and memory. We propose a new scheme for storage reduction that reduces data sizes with an effectiveness comparable to the more expensive techniques, but at a cost comparable to the faster but less effective ones. The scheme, called Redundancy Elimination at the Block Level (REBL), leverages the benefits of compression, duplicate block suppression, and delta-encoding to eliminate a broad spectrum of redundant data in a scalable and efficient manner. REBL generally encodes more compactly than compression (up to a factor of 14) and a combination of compression and duplicate suppression (up to a factor of 6.7). REBL also encodes similarly to a technique based on delta-encoding, reducing overall space significantly in one case. Furthermore, REBL uses super-fingerprints, a technique that reduces the data needed to identify similar blocks while dramatically reducing the computational requirements of matching the blocks: it turns comparisons into hash table lookups. As a result, using super-fingerprints to avoid enumerating matching data objects decreases computation in the resemblance detection phase of REBL by up to a couple of orders of magnitude.
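A rough sketch of how super-fingerprints turn resemblance detection into hash-table lookups: per-block features (here, simple min-hashes over byte shingles) are grouped, each group is hashed into one super-fingerprint, and blocks sharing any super-fingerprint become resemblance candidates without pairwise comparison. Shingle size, feature counts, and hashing below are illustrative assumptions, not REBL's parameters.

```python
# Toy super-fingerprint index: candidates found by hash lookups, not comparisons.
import hashlib
from collections import defaultdict

def features(block: bytes, n=16, shingle=8):
    fps = []
    for seed in range(n):
        # min-hash over byte shingles, seeded per feature
        best = min(hashlib.sha1(bytes([seed]) + block[i:i + shingle]).digest()
                   for i in range(max(1, len(block) - shingle + 1)))
        fps.append(best)
    return fps

def super_fingerprints(feats, group=4):
    # hash each group of features into one super-fingerprint
    return [hashlib.sha1(b"".join(feats[i:i + group])).hexdigest()[:16]
            for i in range(0, len(feats), group)]

index = defaultdict(set)   # super-fingerprint -> block ids

def insert(block_id, data):
    for sfp in super_fingerprints(features(data)):
        index[sfp].add(block_id)

def candidates(data):
    found = set()
    for sfp in super_fingerprints(features(data)):
        found |= index[sfp]          # one hash-table lookup per super-fingerprint
    return found

insert("b1", b"the quick brown fox jumps over the lazy dog " * 4)
print(candidates(b"the quick brown fox jumps over the lazy dog " * 4))   # {'b1'}
```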