Results 1  10
of
954
A blocksorting lossless data compression algorithm
, 1994
"... We describe a blocksorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block o ..."
Abstract

Cited by 809 (5 self)
 Add to MetaCart
We describe a blocksorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block of input text. The transformation does not itself compress the data, but reorders it to make it easy to compress with simple algorithms such as movetofront coding. Our algorithm achieves speed comparable to algorithms based on the techniques of Lempel and Ziv, but obtains compression close to the best statistical modelling techniques. The size of the input block must be large (a few kilobytes) to achieve good compression.
Data Compression Using Adaptive Coding and Partial String Matching
 IEEE TRANSACTIONS ON COMMUNICATIONS
, 1984
"... The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during ..."
Abstract

Cited by 442 (20 self)
 Add to MetaCart
The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during the course of the transmission, and has been shown to incur a smaller coding overhead than explicit transmission of the model's statistics. But there is a basic conflict between the desire to use highorder Markov models and the need to have them formed quickly as the initial part of the message is sent. This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixedcase English text can be coded in as little as 2.2 bits/ character with no prior knowledge of the source.
Hidden Markov processes
 IEEE Trans. Inform. Theory
, 2002
"... Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finite ..."
Abstract

Cited by 264 (5 self)
 Add to MetaCart
(Show Context)
Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finitestate finitealphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximumlikelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes. These processes generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms—Baum–Petrie algorithm, entropy ergodic theorems, finitestate channels, hidden Markov models, identifiability, Kalman filter, maximumlikelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality. I.
Compressed fulltext indexes
 ACM COMPUTING SURVEYS
, 2007
"... Fulltext indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text l ..."
Abstract

Cited by 263 (94 self)
 Add to MetaCart
(Show Context)
Fulltext indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into selfindexes, which in addition contain enough information to reproduce any text portion, so they replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising results in a very short time, and radically changed the status of this area in less than five years. The most successful indexes nowadays are able to obtain almost optimal space and search time simultaneously. In this paper we present the main concepts underlying selfindexes. We explain the relationship between text entropy and regularities that show up in index structures and permit compressing them. Then we cover the most relevant selfindexes up to date, focusing on the essential aspects on how they exploit the text compressibility and how they solve efficiently various search problems. We aim at giving the theoretical background to understand and follow the developments in this area.
Optimal Prefetching via Data Compression
, 1995
"... Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the pr ..."
Abstract

Cited by 258 (7 self)
 Add to MetaCart
Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault ratio, with particular applications to largescale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, you have to be able to predict future data well, and thus good data compressors should be able to predict well for purposes of prefetching. We show for powerful models such as Markov sources and nth order Markov sources that the page fault rates incurred by our prefetching algorithms are optimal in the limit for almost all sequences of page requests.
Compressing TCP/IP headers for lowspeed serial links
 RFC 1144, NETWORK INFORMATION CENTER, SRI INTERNATIONAL, MENLO PARK
, 1990
"... This RFC is a proposed elective protocol for the Internet community and requests discussion and suggestions for improvement. It describes a method for compressing the headers of TCP/IP datagrams to improve performance over low speed serial links. The motivation, implementation and performance of the ..."
Abstract

Cited by 245 (0 self)
 Add to MetaCart
This RFC is a proposed elective protocol for the Internet community and requests discussion and suggestions for improvement. It describes a method for compressing the headers of TCP/IP datagrams to improve performance over low speed serial links. The motivation, implementation and performance of the method are described. C code for a sample implementation is given for reference. Distribution of this memo is unlimited. NOTE: Both ASCII and Postscript versions of this document are available. The ASCII version, obviously, lacks all the figures and all the information encoded in typographic variation (italics, boldface, etc.). Since this information was, in the author’s opinion, an essential part of the document, the ASCII version is at best incomplete and at worst misleading. Anyone who plans to work with this protocol is strongly encouraged obtain the Postscript version of this RFC.
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
 Machine Learning
, 1996
"... . We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions gene ..."
Abstract

Cited by 226 (17 self)
 Add to MetaCart
(Show Context)
. We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KLdivergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs, can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in humanmachine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
Replacing suffix trees with enhanced suffix arrays
, 2004
"... The suffix tree is one of the most important data structures in string processing and comparative genomics. However, the space consumption of the suffix tree is a bottleneck in large scale applications such as genome analysis. In this article, we will overcome this obstacle. We will show how every ..."
Abstract

Cited by 207 (10 self)
 Add to MetaCart
The suffix tree is one of the most important data structures in string processing and comparative genomics. However, the space consumption of the suffix tree is a bottleneck in large scale applications such as genome analysis. In this article, we will overcome this obstacle. We will show how every algorithm that uses a suffix tree as data structure can systematically be replaced with an algorithm that uses an enhanced suffix array and solves the same problem in the same time complexity. The generic name enhanced suffix array stands for data structures consisting of the suffix array and additional tables. Our new algorithms are not only more space efficient than previous ones, but they are also faster and easier to implement.
Reliable Communication Under Channel Uncertainty
 IEEE TRANS. INFORM. THEORY
, 1998
"... In many communication situations, the transmitter and the receiver must be designed without a complete knowledge of the probability law governing the channel over which transmission takes place. Various models for such channels and their corresponding capacities are surveyed. Special emphasis is pla ..."
Abstract

Cited by 176 (5 self)
 Add to MetaCart
(Show Context)
In many communication situations, the transmitter and the receiver must be designed without a complete knowledge of the probability law governing the channel over which transmission takes place. Various models for such channels and their corresponding capacities are surveyed. Special emphasis is placed on the encoders and decoders which enable reliable communication over these channels.