Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks (2004)
| Venue: | Proc. of the Int. Conf. on Data Engineering |
| Citations: | 14 - 5 self |
BibTeX
@INPROCEEDINGS{Suel04improvedfile,
author = {Torsten Suel and Patrick Noel and Dimitre Trendafilov},
title = {Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks},
booktitle = {Proc. of the Int. Conf. on Data Engineering},
year = {2004},
pages = {153--164},
}
Abstract
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of important applications, such as synchronization of data between accounts or devices, content distribution and web caching networks, web site mirroring, storage networks, and large-scale web search and mining. At the core of the problem lies the following challenge, called the file synchronization problem: given two versions of a file on different machines, say an outdated and a current one, how can we update the outdated version with minimum communication cost, by exploiting the significant similarity between the versions? While a popular open-source tool for this problem called rsync is used in hundreds of thousands of installations, there have been only very few attempts to improve upon this tool in practice. In this paper,







