Friday, March 11, 2005

rsync and compressed files

I did some testing, and I found that it is generally better to rsync uncompressed files than the corresponding compressed files or compressed archives. In particular, tar.gz archives are bad for rsync, while plain tar files are fine.


  1. I took a directory of source code and test data, around 9 MB.

  2. I copied it to a remote box.

  3. On both sides, I ran tar cvzf to create a .tgz archive and tar cvf to create a plain .tar archive.

  4. On the source box, I edited one source file, inserting a single line.

  5. I reran tar cvzf and tar cvf on the source box. The source box then had the sources, a .tar and a .tgz, whose contents differ from the remote copies by only one line in one internal file.

  6. rsync of the source directory gave a speedup of 450 (about 14 KB sent, 94 bytes received); rsync of the .tar gave a speedup of 85000+ (20 bytes sent, 78 bytes received); rsync of the .tgz gave a speedup of only 1.48 (2.4 MB sent, about 12 KB received).
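The "speedup" figures in step 6 are the number rsync prints at the end of its summary line: the total size of the files divided by the bytes sent plus the bytes received. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the example inputs are made-up round numbers, not the measurements above.

```python
def rsync_speedup(total_size: int, bytes_sent: int, bytes_received: int) -> float:
    """Speedup as rsync's summary reports it: total size of the files
    divided by the bytes that actually crossed the wire (sent + received)."""
    return total_size / (bytes_sent + bytes_received)


def fraction_transferred(speedup: float) -> float:
    """Fraction of the data's size that went over the wire (inverse of speedup)."""
    return 1.0 / speedup


# Hypothetical example: a 1,000,000-byte file synced with
# 9,000 bytes sent and 1,000 bytes received.
s = rsync_speedup(1_000_000, 9_000, 1_000)
print(s)                        # 100.0
print(fraction_transferred(s))  # 0.01
```

Read this way, the .tgz result's speedup of 1.48 means roughly two thirds of the archive's size (1/1.48 ≈ 0.68) still went over the wire, while the .tar result's 85000+ means only a negligible fraction did.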

So rsync of a tar file is best (because only one file needs to be analyzed to see where the differences are). rsync of a compressed file (at least a .tgz, but probably the output of any compressor) is bad. I'm not sure why, but I wouldn't be surprised if the compressed representation of a stretch of data depends on what has come before it, so that a one-line change alters the compressed bytes from that point onward. Effects like that would confound the difference finder, since too much is found to be different.
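That guess can be checked on a toy scale. The sketch below is a simplification, not rsync's actual implementation: it indexes fixed 512-byte blocks of the old data by MD5 and then walks the new data, counting bytes that match no old block and would therefore be sent literally. The block size, the MD5 choice, and hashing every window directly are all assumptions; real rsync uses a rolling weak checksum plus a strong checksum and picks its own block size. The test data (5000 hypothetical pseudo-random "code" lines with one line inserted in the middle) stands in for the source tree.

```python
import gzip
import hashlib
import random

BLOCK = 512  # hypothetical block size; real rsync chooses its own


def literal_bytes(old: bytes, new: bytes, block: int = BLOCK) -> int:
    """Rough count of bytes of `new` an rsync-style delta would send literally.

    Simplification: every window of `new` is hashed directly with MD5 instead
    of using rsync's rolling weak checksum, and match references count as free.
    """
    # Receiver's side: hashes of every full, aligned block of the old file.
    known = {hashlib.md5(old[i:i + block]).digest()
             for i in range(0, len(old) - block + 1, block)}
    literal = 0
    i = 0
    while i < len(new):
        window = new[i:i + block]
        if len(window) == block and hashlib.md5(window).digest() in known:
            i += block    # matched block: only a short reference is needed
        else:
            literal += 1  # no match at this offset: send one byte literally
            i += 1
    return literal


# Hypothetical stand-in for a source tree: 5000 pseudo-random "code" lines.
rng = random.Random(42)
words = ["int", "return", "foo", "bar", "if", "else", "while", "count", "buffer", "size"]
lines = [" ".join(rng.choice(words) for _ in range(8)) + ";\n" for _ in range(5000)]
old = "".join(lines).encode()
# Insert exactly one line in the middle, as in step 4 of the test.
new = "".join(lines[:2500] + ["/* one inserted line */\n"] + lines[2500:]).encode()

# mtime=0 keeps the gzip headers identical, so only the compressed payload differs.
gz_old = gzip.compress(old, mtime=0)
gz_new = gzip.compress(new, mtime=0)

raw_literal = literal_bytes(old, new)
gz_literal = literal_bytes(gz_old, gz_new)
print(f"uncompressed: {raw_literal} of {len(new)} bytes sent literally")
print(f"gzip:         {gz_literal} of {len(gz_new)} bytes sent literally")
```

On the uncompressed data, only about a block's worth of bytes around the inserted line (plus the short unindexed tail) should come out literal. On the gzip output, roughly everything from the change onward should, which is consistent with the poor .tgz result.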
