| Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-Based Delta Compression of Collections of Files. Proc. International Conference on Web Information Systems Engineering (WISE), 2002. |
....are more similar to the general applications described herein, they did not initially use the more efficient resemblance detection methods of Manber and Broder to best select the base versions. Subsequently, they applied resemblance detection techniques to scale the technique to larger collections [19]. This work, roughly concurrent with our own, is similar in its general approach. However, the largest dataset they analyzed was just over 20,000 web pages, and they did not consider other types of data such as email. Another possibly significant distinction is that they used shingle sizes of only ....
....We have also extended Manber's use of this technique on a single server [14] and quantified its potential benefits both for a general file system and specifically for email. For web content, we have found substantial overlap among pages on a single site. This is consistent with Chan and Woo [7], Ouyang et al. [19], and recent work on automatic detection of common fragments within pages [23]. For the five web datasets we considered, deltas reduced the total size of the dataset to 8-19% of the original data, compared to 29-36% using compression. For files and email, there was much more variability, and the ....
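A rough way to reproduce the kind of comparison quoted above (delta-encoding a page against a similar base version versus compressing it on its own) is sketched below. It is a stand-in under stated assumptions, not the encoder used in the cited work: it uses zlib's preset-dictionary feature (`zdict`) as a crude delta encoder, and the helper names `compressed_size`, `delta_encode`, and `delta_decode` are hypothetical.

```python
import zlib


def compressed_size(target: bytes) -> int:
    """Baseline: size of the target compressed on its own."""
    return len(zlib.compress(target, 9))


def delta_encode(target: bytes, base: bytes) -> bytes:
    """Encode target using base as a zlib preset dictionary.

    Assumption: this only approximates a real delta encoder, since DEFLATE
    can reference at most the last 32 KiB of the dictionary.
    """
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(target) + comp.flush()


def delta_decode(delta: bytes, base: bytes) -> bytes:
    """Rebuild the target; the decoder must hold the same base version."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


if __name__ == "__main__":
    # Hypothetical base page and a slightly edited version of it.
    base = " ".join(f"item{i}" for i in range(300)).encode()
    target = base.replace(b"item150", b"ITEM150")
    delta = delta_encode(target, base)
    assert delta_decode(delta, base) == target
    print("compressed:", compressed_size(target), "delta:", len(delta))
```

The design choice here is convenience: the standard library already supports preset dictionaries, so the gap between the two printed sizes gives a quick feel for why a well-chosen base version matters, without claiming the ratios reported above.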
Zan Ouyang, Nasir Memon, Torsten Suel, and Dimitre Trendafilov. Cluster-based delta compression of a collection of files. In International Conference on Web Information Systems Engineering (WISE), December 2002.
No context found.
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. In Third Int. Conf. on Web Information Systems Engineering, December 2002.
....reference file that provides the best compression ratio. Our proposed solution is very simple, and uses random hash functions to create fingerprints for files according to a technique proposed by Broder in [2]. This technique has been previously used to cluster documents in [3] and was shown in [14] to provide reasonable estimates for the size of a delta between two files. In particular, we hash all substrings of a fixed length in a file to integer values, and then retain the smallest hash values. We then estimate the similarity of two files by intersecting their samples. Given a page request, we ....
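The fingerprinting step described in this context (hash every fixed-length substring, keep the smallest hashes, compare samples) can be sketched as a bottom-s min-hash. This is a minimal illustration, assuming BLAKE2b as the hash, a shingle length `k = 8`, and a sample size `s = 64`; the excerpt does not give these values, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def shingle_hashes(data: bytes, k: int) -> set[int]:
    """Hash every length-k substring (shingle) of data to a 64-bit integer."""
    return {
        int.from_bytes(hashlib.blake2b(data[i:i + k], digest_size=8).digest(), "big")
        for i in range(len(data) - k + 1)
    }


def sketch(data: bytes, k: int = 8, s: int = 64) -> set[int]:
    """Bottom-s sample: keep only the s smallest shingle hashes."""
    return set(sorted(shingle_hashes(data, k))[:s])


def resemblance(a: set[int], b: set[int], s: int = 64) -> float:
    """Estimate the Jaccard resemblance of two files from their bottom-s sketches.

    Take the s smallest hashes of the union of the two sketches and return
    the fraction of them that appear in both sketches (Broder's estimator).
    """
    bottom = set(sorted(a | b)[:s])
    if not bottom:
        return 0.0
    return len(bottom & a & b) / len(bottom)


def best_reference(target: bytes, candidates: dict[str, bytes]) -> str:
    """Pick the candidate file whose sketch most resembles the target's."""
    t = sketch(target)
    return max(candidates, key=lambda name: resemblance(t, sketch(candidates[name])))
```

In use, sketches would be computed once per stored file and kept in memory, so choosing a reference file for a request costs only set operations on small samples rather than a full delta computation against every candidate.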
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. In Third Int. Conf. on Web Information Systems Engineering, December 2002.
....next three subsections look at different techniques that resolve this problem, and finally Subsection 4.6 presents results for our best two algorithms on a larger data set. Due to space constraints and the large number of options, we can only give a selection of our results. We refer the reader to [20] for a more complete evaluation. 4.1 Algorithms We implemented a number of different algorithms and variants. In particular, we have the following options: Basic scheme: MH vs. LSH. Number of hash functions: single hash vs. multiple hashes. Sample size: fixed size vs. fixed rate. Similarity ....
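The option axes listed above (single vs. multiple hash functions, fixed-size vs. fixed-rate samples) can be illustrated side by side. The excerpt cuts off before defining them precisely, so the following is a hedged sketch under assumptions: "fixed rate" is taken to mean keeping every hash divisible by a modulus, as in Manber-style sampling, "multiple hash" is taken to mean one minimum per keyed hash function, and all function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def shingle_hashes(data: bytes, k: int, seed: int = 0) -> list[int]:
    """64-bit hashes of all length-k shingles; `seed` picks one function from a keyed family."""
    key = seed.to_bytes(8, "big")
    return [
        int.from_bytes(hashlib.blake2b(data[i:i + k], digest_size=8, key=key).digest(), "big")
        for i in range(len(data) - k + 1)
    ]


def fixed_size_sample(data: bytes, k: int, s: int) -> set[int]:
    """Single hash, fixed size: the s smallest distinct hashes, whatever the file length."""
    return set(sorted(set(shingle_hashes(data, k)))[:s])


def fixed_rate_sample(data: bytes, k: int, rate: int) -> set[int]:
    """Single hash, fixed rate: every hash divisible by `rate`, so larger files keep more."""
    return {h for h in shingle_hashes(data, k) if h % rate == 0}


def sample_jaccard(a: set[int], b: set[int]) -> float:
    """Resemblance estimate from two fixed-rate samples: overlap over union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def multi_hash_signature(data: bytes, k: int, n: int) -> list[int]:
    """Multiple hash: the minimum shingle hash under each of n hash functions."""
    if len(data) < k:
        raise ValueError("data shorter than shingle length")
    return [min(shingle_hashes(data, k, seed)) for seed in range(n)]


def signature_agreement(a: list[int], b: list[int]) -> float:
    """Fraction of hash functions whose minima agree; estimates the resemblance."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

The trade-off this makes visible: a fixed-size sample keeps memory per file constant, a fixed-rate sample grows with file length but keeps small and large files comparable, and multiple hash functions cost one pass over the shingles per function.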
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. Technical Report TR-CIS-2002-05, Polytechnic University, CIS Department, October 2002.
....next three subsections look at different techniques that resolve this problem, and finally Subsection 4.7 presents results for our best two algorithms on a larger data set. Due to space constraints and the large number of options, we can only give a selection of our results. We refer the reader to [20] for a more complete evaluation. 4.1 Algorithms We implemented a number of different algorithms and variants. In particular, we have the following options: Basic scheme: MH vs. LSH. Number of hash functions: single hash vs. multiple hashes. Sample size: fixed size vs. fixed rate. Similarity ....
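The excerpt names LSH as an alternative basic scheme without describing it. A common construction, assumed here rather than taken from the cited report, splits each multiple-hash signature into `bands` groups of `rows` values and treats files that collide on any whole band as candidate pairs; a pair with resemblance p then becomes a candidate with probability 1 - (1 - p^rows)^bands. The function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(
    signatures: dict[str, list[int]], bands: int, rows: int
) -> set[tuple[str, str]]:
    """Return pairs of files that agree on every value of at least one band."""
    buckets: dict[tuple[int, tuple[int, ...]], list[str]] = defaultdict(list)
    for name, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError(f"{name}: expected {bands * rows} values, got {len(sig)}")
        for band in range(bands):
            # Bucket key: band index plus that band's slice of the signature.
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].append(name)
    pairs: set[tuple[str, str]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def candidate_probability(p: float, bands: int, rows: int) -> float:
    """Chance that two files with resemblance p share at least one band."""
    return 1.0 - (1.0 - p ** rows) ** bands
```

Compared with scoring every pair of min-hash samples, banding avoids the quadratic comparison at the cost of tuning `bands` and `rows` so that the probability curve is steep around the similarity threshold of interest.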
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. Technical Report TR-CIS-2002-05, Polytechnic University, CIS Department, October 2002.
No context found.
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-Based Delta Compression of Collections of Files. Proc. International Conference on Web Information Systems Engineering (WISE), 2002.