Friday, October 14, 2005

rsync and --partial

rsync --partial doesn't do what I thought it would do. In fact, for large files (my current project involves rsyncing half-gigabyte files over relatively slow and often unstable networks), if the rsync dies early, --partial can make the destination file lose so much data that practically the whole file has to be downloaded again.

I was under the impression that --partial did something smart; for example, that it would be truly incremental. That is, if a large file was being downloaded (from source to dest) and the download died somewhere in the middle of the file, all the *good* data that had been correctly downloaded from the source would be saved, AND THEN the remaining half of the existing destination file would be *APPENDED* to that downloaded half. This way, if the download were continued, the already good half would all hash correctly and the transfer would pick up at the first wrong block.
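To make that idea concrete, here is a rough Python sketch of the merge, under some loud assumptions: `merge_partial`, `block_digests` and `first_mismatched_block` are hypothetical helpers, not part of rsync; the files are held in memory; and blocks are compared position by position with MD5, which is far simpler than rsync's real rolling-checksum matching. It only illustrates why the merged file would let a resumed transfer start at the first wrong block.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real transfers use much larger blocks


def merge_partial(good_prefix: bytes, old_dest: bytes) -> bytes:
    # Hypothetical: keep the data actually received from the source, then
    # append whatever the old destination file had beyond that point.
    return good_prefix + old_dest[len(good_prefix):]


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    # One MD5 per fixed-size block (a stand-in for rsync's checksums).
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def first_mismatched_block(source: bytes, dest: bytes,
                           block_size: int = BLOCK_SIZE):
    # Index of the first source block whose digest differs from the
    # destination block at the same position, or None if every source
    # block matches (extra trailing data in dest is ignored).
    src = block_digests(source, block_size)
    dst = block_digests(dest, block_size)
    for i, digest in enumerate(src):
        if i >= len(dst) or dst[i] != digest:
            return i
    return None
```

With a 16-byte source, a stale destination whose first block happens to be correct, and an attempt that died after 8 bytes, the merged file keeps the old length and the first mismatch moves from block 1 to block 2, so nothing before it would need to be sent again.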

Or maybe --partial would keep all of the good downloaded data in those temporary .[real_filename].[random_letters_and_digits] files, and the temporary file would not be removed, so that the next download could reuse it.
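The second idea amounts to resuming into a temporary file that survives between runs. Here is a hypothetical sketch of that (the names `resume_into_temp` and `sync_with_retries` are made up, a `bytearray` stands in for the kept temporary file, and it assumes the source doesn't change between attempts, since nothing here re-verifies the kept bytes):

```python
def resume_into_temp(source: bytes, temp: bytearray, budget: int) -> int:
    """Hypothetical: copy at most `budget` bytes of `source` into the kept
    temporary buffer, starting where the previous attempt stopped.
    Returns the number of bytes moved in this attempt."""
    start = len(temp)  # bytes before this offset survived the last attempt
    chunk = source[start:start + budget]
    temp.extend(chunk)
    return len(chunk)


def sync_with_retries(source: bytes, budgets: list[int]):
    """Run one attempt per entry in `budgets` (the bytes the flaky link
    manages before dying). Returns (completed_file_or_None, bytes_sent)."""
    temp = bytearray()  # stands in for the kept .[real_filename].[...] file
    sent = 0
    for budget in budgets:
        sent += resume_into_temp(source, temp, budget)
        if len(temp) == len(source):
            return bytes(temp), sent  # complete: would be renamed into place
    return None, sent
```

With a 1024-byte source and flaky attempts that each move at most 300, 300 and 1000 bytes, the file completes on the third attempt and the total sent equals the file size, so no byte crosses the slow link twice.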

Unfortunately, --partial doesn't do either of those. It works well if you're downloading a completely new file (the destination didn't exist before, or it's *much* smaller than the original). In that case, the file is partially downloaded, and when the rsync dies, the partially downloaded file is kept. But if you try this with a large file that has already been mostly downloaded, and the new download dies before it reaches the size of the existing file, you *lose* data, since the short, newly downloaded file *replaces* the old, larger, partially downloaded one.
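A back-of-the-envelope model of that failure, following the behavior described above (this is plain arithmetic, not rsync's code; the 500 MB file size and the assumption that surviving bytes form a correct prefix of the source are illustrative):

```python
MB = 1024 * 1024


def surviving_bytes(old_dest_size: int, received_size: int, partial: bool) -> int:
    """Size of the destination after an interrupted run, per the behavior
    described above: with --partial the short temp file replaces the old
    destination; without it the temp file is discarded."""
    return received_size if partial else old_dest_size


def bytes_to_refetch(total_size: int, surviving: int) -> int:
    # Simplification: assume every surviving byte is a correct prefix of the
    # source, so only the remainder must cross the network again.
    return max(total_size - surviving, 0)
```

For a 500 MB file with 400 MB already on disk and a run that dies after 100 MB, keeping the partial file leaves 400 MB to fetch again, while leaving out --partial leaves only 100 MB. When nothing was on disk before, --partial is the clear win.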

So --partial is very useful when downloading files from scratch, but it's not useful (in fact, it may be very harmful) when syncing files you've already partly downloaded. For those, just use the regular rsync parameters and leave out --partial.

As good as rsync is, I think there is still quite a bit of room for improvement when bandwidth is slow or unstable and the files being transferred are very large. Either of the mechanisms suggested above might be worth implementing.
