Compactly encoding unstructured inputs with differential compression (2002) [24 citations, 8 self]

by Miklos Ajtai, Randal Burns, Ronald Fagin, Darrell D. E. Long, Larry Stockmeyer
Journal of the ACM
http://www.almaden.ibm.com/cs/people/stock/diff6.ps

Abstract:

The subject of this paper is differential compression, the algorithmic task of finding common strings between versions of data and using the commonality between versions to encode one version compactly by describing it as a set of changes from its companion. A main goal of the work is to present new differencing algorithms that operate at a fine granularity (the atomic unit of change), make no assumptions about the format or alignment of input data, and use linear time and give good compression on version data that typically arises in practice. First we review existing differencing algorithms that compress optimally but use time proportional to n² in the worst case and space proportional to n in the best and worst case. Then we present new algorithms, which do not always compress optimally but use considerably less time and space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains ⌈log n⌉ bits, and the space to hold the input data is not included). We introduce two new techniques for differential compression, and we apply these to give additional algorithms that improve the compression performance of the linear-time algorithm and the time performance of the quadratic-time algorithm. Having presented these algorithms, we experimentally explore their properties, such as time and compression performance, by running them on actual versioned data. In these experiments, the new algorithms run in linear time and constant space, and their compression performance closely approximates that of previous algorithms that use considerably more time and space. Finally, we present theoretical results that limit the compression power of differencing algorithms that are restricted to making only a single pass over the data.
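To make "describing one version as a set of changes from its companion" concrete, here is a minimal Python sketch of a generic hash-based differencing scheme. It is an illustration under stated assumptions, not the authors' algorithms. It indexes fixed-length seeds of the reference version in a fixed-size table using a Karp-Rabin-style rolling fingerprint, scans the new version, verifies and extends each candidate match, and emits COPY and ADD commands. The names `Copy`, `Add`, `encode`, `decode`, `SEED_LEN`, and `TABLE_SIZE`, and their values, are hypothetical. For a constant seed length the running time is linear in the input size. However, the sketch buffers unmatched bytes and the full command list, so it does not meet the O(1)-space bound described above, and it makes no attempt at optimal compression.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple, Union

SEED_LEN = 4          # hypothetical seed length in bytes
TABLE_SIZE = 1 << 12  # hypothetical fixed number of hash-table slots
BASE = 257
MOD = (1 << 61) - 1

@dataclass
class Copy:
    offset: int  # start of the matched string in the reference version
    length: int  # number of bytes to copy

@dataclass
class Add:
    data: bytes  # literal bytes that were not matched in the reference

Command = Union[Copy, Add]

def fingerprint(window: bytes) -> int:
    """Karp-Rabin-style fingerprint of one window, computed directly by Horner's rule."""
    h = 0
    for b in window:
        h = (h * BASE + b) % MOD
    return h

def rolling_fingerprints(data: bytes, k: int) -> Iterator[Tuple[int, int]]:
    """Yield (position, fingerprint) for every length-k window, updating in O(1) per step."""
    if len(data) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the byte leaving the window
    h = fingerprint(data[:k])
    yield 0, h
    for i in range(1, len(data) - k + 1):
        h = ((h - data[i - 1] * high) * BASE + data[i + k - 1]) % MOD
        yield i, h

def encode(ref: bytes, ver: bytes, k: int = SEED_LEN) -> List[Command]:
    """Describe `ver` as COPY/ADD commands against `ref` (hypothetical sketch)."""
    # Fixed-size table: each slot keeps the first reference offset that hashed to it.
    table: List[Optional[int]] = [None] * TABLE_SIZE
    for pos, fp in rolling_fingerprints(ref, k):
        slot = fp % TABLE_SIZE
        if table[slot] is None:
            table[slot] = pos

    commands: List[Command] = []
    pending = bytearray()  # unmatched version bytes waiting to become an ADD
    i = 0
    while i < len(ver):
        cand = None
        if i + k <= len(ver):
            cand = table[fingerprint(ver[i:i + k]) % TABLE_SIZE]
        # Fingerprints and slots can collide, so confirm the seed byte by byte.
        if cand is not None and ref[cand:cand + k] == ver[i:i + k]:
            length = k
            while (cand + length < len(ref) and i + length < len(ver)
                   and ref[cand + length] == ver[i + length]):
                length += 1  # extend the verified match forward
            if pending:
                commands.append(Add(bytes(pending)))
                pending.clear()
            commands.append(Copy(cand, length))
            i += length
        else:
            pending.append(ver[i])
            i += 1
    if pending:
        commands.append(Add(bytes(pending)))
    return commands

def decode(ref: bytes, commands: List[Command]) -> bytes:
    """Rebuild the version from the reference and the command list."""
    out = bytearray()
    for cmd in commands:
        if isinstance(cmd, Copy):
            out += ref[cmd.offset:cmd.offset + cmd.length]
        else:
            out += cmd.data
    return bytes(out)
```

For example, encoding `b"the quick red fox jumps over the lazy cat"` against `b"the quick brown fox jumps over the lazy dog"` starts with `Copy(0, 10)` for the shared prefix, followed by an `Add` holding the unmatched bytes beginning with `b"red"`. Because every COPY is verified byte by byte before it is emitted, `decode(ref, encode(ref, ver)) == ver` holds even when fingerprints collide; a collision only costs compression, never correctness.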