Results 1 - 10
of
129
Handbook of Applied Cryptography
, 1997
"... As we draw near to closing out the twentieth century, we see quite clearly that the information-processing and telecommunications revolutions now underway will continue vigorously into the twenty-first. We interact and transact by directing flocks of digital packets towards each other through cybers ..."
Abstract
-
Cited by 2057 (29 self)
- Add to MetaCart
As we draw near to closing out the twentieth century, we see quite clearly that the information-processing and telecommunications revolutions now underway will continue vigorously into the twenty-first. We interact and transact by directing flocks of digital packets towards each other through cyberspace, carrying love notes, digital cash, and secret corporate documents. Our personal and economic lives rely more and more on our ability to let such ethereal carrier pigeons mediate at a distance what we used to do with face-to-face meetings, paper documents, and a firm handshake. Unfortunately, the technical wizardry enabling remote collaborations is founded on broadcasting everything as sequences of zeros and ones that one's own dog wouldn't recognize. What is to distinguish a digital dollar when it is as easily reproducible as the spoken word? How do we converse privately when every syllable is bounced off a satellite and smeared over an entire continent? How should a bank know that it really is Bill Gates requesting from his laptop in Fiji a transfer of $10,000,000,000 to another bank? Fortunately, the magical mathematics of cryptography can help. Cryptography provides techniques for keeping information secret, for determining that information
A universal algorithm for sequential data compression
- IEEE TRANSACTIONS ON INFORMATION THEORY
, 1977
"... A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainabl ..."
Abstract
-
Cited by 965 (4 self)
- Add to MetaCart
A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
Compression of Individual Sequences via Variable-Rate Coding
, 1978
"... this paper contains two parts: descrip five part (Section II) where all the results are stated 532 XEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 1T-24, NO. 5, SEPTEMF..R 1978 and discussed and a formal part (Section III) where all proofs except that of Theorem 2 are given. The proof of Theorem 2, w ..."
Abstract
-
Cited by 612 (6 self)
- Add to MetaCart
this paper contains two parts: descrip five part (Section II) where all the results are stated 532 XEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 1T-24, NO. 5, SEPTEMF..R 1978 and discussed and a formal part (Section III) where all proofs except that of Theorem 2 are given. The proof of Theorem 2, which is constructive and thus informative, is presented in the mainstream of Section II
Compressed full-text indexes
- ACM COMPUTING SURVEYS
, 2007
"... Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text l ..."
Abstract
-
Cited by 142 (70 self)
- Add to MetaCart
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which in addition contain enough information to reproduce any text portion, so they replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising results in a very short time, and radically changed the status of this area in less than five years. The most successful indexes nowadays are able to obtain almost optimal space and search time simultaneously. In this paper we present the main concepts underlying self-indexes. We explain the relationship between text entropy and regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes up to date, focusing on the essential aspects on how they exploit the text compressibility and how they solve efficiently various search problems. We aim at giving the theoretical background to understand and follow the developments in this area.
Universal Prediction
- IEEE Transactions on Information Theory
, 1998
"... This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. ..."
Abstract
-
Cited by 99 (6 self)
- Add to MetaCart
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression.
Fast and Flexible Word Searching on Compressed Text
, 2000
"... ... text. When searching complex or approximate patterns, our algorithms are up to 8 times faster than the search on uncompressed text. We also discuss the impact of our technique in inverted files pointing to logical blocks and argue for the possibility of keeping the text compressed all the time, ..."
Abstract
-
Cited by 74 (30 self)
- Add to MetaCart
... text. When searching complex or approximate patterns, our algorithms are up to 8 times faster than the search on uncompressed text. We also discuss the impact of our technique in inverted files pointing to logical blocks and argue for the possibility of keeping the text compressed all the time, decompressing only for displaying purposes.
Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees
- Theoretical Computer Science
, 1995
"... The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length n into variable phrases such that a new phrase is the shortest substring not seen in the past as a phrase. The parameter of interest is the n ..."
Abstract
-
Cited by 56 (28 self)
- Add to MetaCart
The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length n into variable phrases such that a new phrase is the shortest substring not seen in the past as a phrase. The parameter of interest is the number M n of phrases that one can construct from a sequence of length n. In this paper, for the memoryless source with unequal probabilities of symbols generation we derive the limiting distribution of M n which turns out to be normal. This proves a long standing open problem. In fact, to obtain this result we solved another open problem, namely, that of establishing the limiting distribution of the internal path length in a digital search tree. The latter is a consequence of an asymptotic solution of a multiplicative differential-functional equation often arising in the analysis of algorithms on words. Interestingly enough, our findings are proved by a combination of probabilistic techniques such as renewal equation and uniform integrability, and analytical techniques such as Mellin transform, differential-functional equations, de-Poissonization, and so forth. In concluding remarks we indicate a possibility of extending our results to Markovian models.
A New Challenge for Compression Algorithms: Genetic Sequences
- Information Processing & Management
, 1994
"... Universal data compression algorithms fail to compress genetic sequences. It is due to the specificity of this particular kind of "text". We analyze in some details the properties of the sequences, which cause the failure of classical algorithms. We then present a lossless algorithm, biocompress-2, ..."
Abstract
-
Cited by 55 (0 self)
- Add to MetaCart
Universal data compression algorithms fail to compress genetic sequences. It is due to the specificity of this particular kind of "text". We analyze in some details the properties of the sequences, which cause the failure of classical algorithms. We then present a lossless algorithm, biocompress-2, to compress the information contained in DNA and RNA sequences, based on the detection of regularities, such as the presence of palindromes. The algorithm combines substitutional and statistical methods, and to the best of our knowledge, lead to the highest compression of DNA. The results, although not satisfactory, gives insight to the necessary correlation between compression and comprehension of genetic sequences. 1 Introduction There are plenty of specific types of data which need to be compressed, for ease of storage and communication. Among them are texts (such as natural language and programs), images, sounds, etc. In this paper, we focus on the compression of a specific kin...
Autocorrelation On Words And Its Applications - Analysis of Suffix Trees by String-Ruler Approach
- J. Combin.Theory Ser. A
, 1994
"... We study in a probabilistic framework some topics concerning the way words can overlap. Our probabilistic models assumes that a word is a sequence of i.i.d. symbols generated from a finite alphabet. This defines the so called Bernoulli model. We investigate the length of a subword that can be recopi ..."
Abstract
-
Cited by 46 (20 self)
- Add to MetaCart
We study in a probabilistic framework some topics concerning the way words can overlap. Our probabilistic models assumes that a word is a sequence of i.i.d. symbols generated from a finite alphabet. This defines the so called Bernoulli model. We investigate the length of a subword that can be recopied, that is, a subword that occurs at least twice in a given word. An occurrence of such repeated substrings is easy to detect in a digital tree called a suffix tree. The length of a repeated substring corresponds to the typical depth in the associated suffix tree. Our main finding shows that the typical depth in a suffix tree is asymptotically distributed in the same manner as the typical depth in a digital tree that stores independent keys (i.e., independent tries). More precisely, we prove that the typical depth in a suffix tree built from the first n suffixes of a random word is normally distributed with the mean asymptotically becoming 1=h 1 log n and the variance ff \Delta log n, where...
Lempel-Ziv parsing and sublinear-size index structures for string matching (Extended Abstract)
- Proc. 3rd South American Workshop on String Processing (WSP'96
, 1996
"... String matching over a long text can be significantly speeded up with an index structure formed by preprocessing the text. For very long texts, the size of such an index can be a problem. This paper presents the first sublinear-size index structure. The new structure is based on Lempel-Ziv parsing ..."
Abstract
-
Cited by 46 (1 self)
- Add to MetaCart
String matching over a long text can be significantly speeded up with an index structure formed by preprocessing the text. For very long texts, the size of such an index can be a problem. This paper presents the first sublinear-size index structure. The new structure is based on Lempel-Ziv parsing of the text and has size linear in N, the size of the Lempel-Ziv parse. For a text of length n, N = O(n = log n) and can be still smaller if the text is compressible. With the new index structure, all occurrences of a pattern string of length m can be found in time O(m 2

