Thursday, September 15, 2016

OOM during rsync with USB drives

I had a (maybe/probably) dying external USB drive. It's part of a non-RAID LVM setup, so I needed to grab everything off it before it died. I mounted a second external USB drive and started an rsync job to copy everything from the failing drive to the new one. After copying around 250GB, rsync would get killed by the OOM (out of memory) killer.

Subsequent attempts failed the same way, but much more quickly. Running rsync with ionice -c 3 (suggested in an ubuntuforums post) didn't help, and I wasn't using any of the options that would disable rsync's incremental recursion.

I tried quite a few things (including --delete-during, so that successive rsync attempts would have fewer files to keep track of) but finally found the solution.

While rsync is running in one screen window, in a separate screen window I run: 

# run as root: writing to drop_caches requires root privileges
while true; do
  echo 3 > /proc/sys/vm/drop_caches
  sleep 2
done

There may be some interaction between the USB drivers (perhaps timing/slowness?) and the memory used by the buffer caches. Perhaps when rsync needs memory the kernel isn't able to free cache fast enough, and the OOM killer steps in before that memory becomes available to give to rsync?

In any case, with this drop_caches loop, rsync is now running happily for hours. I expect it'll actually finish. I don't mind the loss of caching. It's only for very large copies from USB drives and the copies are reading everything sequentially.

There won't be much of a benefit to caching. Perhaps I could write "1" instead of "3" though, which frees only the page cache and keeps inodes and dentries cached. But the limiting factor here is really the USB transfer rate, and the transfer rate while dropping caches is just the same as the rate when not dropping caches (before the OOM kill).
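
To make the "1" versus "3" choice concrete, here is a minimal sketch of the same idea as a small helper. The function name drop_caches and its optional second argument are my own assumptions for illustration, not something from the setup above; the mode values are the ones the kernel documents for /proc/sys/vm/drop_caches.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original loop): write a drop_caches mode.
#   1 = free the page cache only (dentries and inodes stay cached)
#   2 = free dentries and inodes only
#   3 = free both
# The optional second argument is an assumption added so the function can be
# tried against a scratch file; by default it writes to the real kernel file,
# which requires root.
drop_caches() {
  mode=${1:-1}
  target=${2:-/proc/sys/vm/drop_caches}
  case "$mode" in
    1|2|3)
      echo "$mode" > "$target"
      ;;
    *)
      echo "drop_caches: mode must be 1, 2 or 3" >&2
      return 1
      ;;
  esac
}
```

With that defined, the loop becomes: while true; do drop_caches 1; sleep 2; done. I haven't measured whether mode 1 alone is enough to keep the OOM killer away, so treat it as something to try rather than a known fix.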


Bopolissimus X Platypus said...

Ah, nealmcb suggests using nocache.

I'm going to be doing more copying around after I've moved the data off the bad HDD. I'll try that then. If it works it'll be *much* better than setting drop_caches in a loop :-).

Bopolissimus X Platypus said...

nocache doesn't work for me. Or there may be some situations where it works and others where it doesn't. I'll remember to monitor buff/cache when doing large nocache rsyncs. If the number doesn't stabilize then I'll go back to doing the echo "3" > drop_caches.
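
For the monitoring part, a rough sketch of one way to watch that number from a second screen window. The function name buff_cache_kib is made up for this example, and summing Buffers, Cached and SReclaimable from /proc/meminfo is my assumption about what recent versions of free report as buff/cache; older versions may compute it differently.

```shell
#!/bin/sh
# Hypothetical helper: print an approximation of free's "buff/cache" in kB.
# Assumption: buff/cache is roughly Buffers + Cached + SReclaimable.
# The optional argument lets the function read a saved copy of meminfo
# instead of the live /proc/meminfo.
buff_cache_kib() {
  meminfo=${1:-/proc/meminfo}
  awk '$1 == "Buffers:" || $1 == "Cached:" || $1 == "SReclaimable:" { sum += $2 }
       END { print sum + 0 }' "$meminfo"
}
```

Something like: while sleep 5; do echo "$(date +%T) $(buff_cache_kib) kB"; done will print a timestamped reading every five seconds. If the readings keep climbing during a nocache rsync, that would be the cue to fall back to the drop_caches loop.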