Thursday, October 27, 2005

Large copies to USB 1.0

I've got a laptop that only has USB 1.0. This is great for my USB keyboard and mouse (there are no PS/2 ports at all, so external input devices *have* to be USB; I've never heard of PCMCIA keyboards or mice :-) and also for my USB Bluetooth adapter and memory sticks, but it's terrible for a USB hard drive.

Recently I was copying multi-gigabyte files to the USB drive and I couldn't stand that it was taking so long and that the laptop would pause and block for tens of seconds. I thought what was happening was that the data was being buffered and then, when the data was actually written out to the hard drive, the kernel was so busy handling interrupts (or polling, or whatever) that the computer blocked.

I came up with two workarounds. The first workaround was to set the buffer flushing period to very short, so that the laptop wouldn't block completely for tens of seconds but instead would just get slow every few seconds.

update -3 32 -f 1 -s 1

The second workaround was to limit the bandwidth going down the USB pipe. Rsync is a great tool. This time I couldn't use it to do incremental syncing, since in this setup checksumming would be far slower than just copying the data file over directly. I didn't want to fill the pipe, though, since I thought I'd noticed the laptop choking a bit even with the update trick. So what I did was use rsync as a slow cp. Rsync has a --bwlimit option for application-level bandwidth limiting, which works even within the same computer. So:

rsync -a --bwlimit=384 [srcfile] [dstfile]

limits the amount of data transferred to 384 kilobytes per second. The link can handle up to around 1 megabyte per second, but limiting the copy bandwidth keeps the laptop usable. Then I just hid the rsync window so that I could continue using the laptop.
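For the curious, here is a minimal Python sketch of what application-level bandwidth limiting amounts to: copy a chunk, then sleep long enough that the average rate stays under the limit. This is not rsync's actual code. The function name throttled_copy, the chunk size, and the clock and sleep parameters are all made up for illustration.

```python
import time


def throttled_copy(src, dst, limit_bytes_per_sec, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, pausing so the average rate stays at or below the limit.

    src and dst are binary file objects. clock and sleep can be swapped out,
    which is handy for trying the pacing logic without really waiting.
    """
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # Earliest moment at which this many bytes are allowed to have gone out.
        earliest = start + copied / limit_bytes_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return copied
```

Calling it with two files opened in binary mode and a limit of 384 * 1024 would roughly mimic the command above, assuming a kilobyte here means 1024 bytes.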

rsync --bwlimit alone wouldn't be enough here. Without the update trick (making buffer flushes shorter and smaller) the buffers would still get filled with hundreds of megabytes of data, which would then take around one second per megabyte to go down the wire. With update and no --bwlimit, the laptop was still slow and inconvenient to use. Together, though, they made the laptop usable while it was copying large files in the background.
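To put rough numbers on that, here is a back-of-the-envelope calculation in Python. The 300 MB buffer and the 2 GB file are assumed figures standing in for "hundreds of megabytes" and "multi-gigabyte"; I didn't measure them. The helper name transfer_seconds is also just for illustration, and a kilobyte is taken to be 1024 bytes.

```python
KIB = 1024
MIB = 1024 * KIB


def transfer_seconds(size_bytes, rate_bytes_per_sec):
    """Time to move size_bytes at a steady rate, ignoring all overhead."""
    return size_bytes / rate_bytes_per_sec


# Assumed: 300 MB sitting in the buffers, draining at about 1 MB per second.
buffered_drain = transfer_seconds(300 * MIB, 1 * MIB)

# Assumed: a 2 GB file copied at the --bwlimit=384 rate.
throttled_copy_time = transfer_seconds(2 * 1024 * MIB, 384 * KIB)

print(f"draining the buffers: about {buffered_drain / 60:.0f} minutes")
print(f"throttled copy: about {throttled_copy_time / 60:.0f} minutes")
```

So the throttled copy takes a long time in total, but it never leaves a big backlog waiting to be flushed all at once.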

Eventually though, I'll just have to use a laptop that's got USB 2.0 :-).

tiger

Monday, October 24, 2005

600+ kBps!

I downloaded OpenOffice.org 2.0 just now and I was shocked at the download speed. I downloaded through BitTorrent and I was getting 600+ kilobytes per second. That's one advantage of going with an ISP that doesn't do bandwidth capping. I use Destiny Cable internet and while I've never before seen 600+ kBps, I regularly see 100-200 kBps.

I guess someone else in the Destiny network was seeding the torrent. Or maybe there's just a whole lot of bandwidth available on Monday morning and I got all the benefit of it.

I downloaded the SUSE 10 eval DVD ISO a week ago and that came down at around 60 kBps on average. There's something to be said for downloading an ISO via the official torrent too. When SUSE 10 was very new I tried to download with the official torrent and that was so slow (2-3 kilo*bits* per second) that I decided to download it over the eDonkey network instead. That download completed but the ISO was bad, so it was quite a waste of time. When I downloaded from the official torrent again, it slowly went up to around 60 kilobytes per second. So official torrent+wait+a+few+days is the right thing to do, I guess.

Possibly, too, downloading other (more popular) distributions would be better via the official torrents. I've done my bit at misinforming people about SUSE licensing (old information from when it was still SUSE and not Novell). Probably people stay away from SUSE because they still have the impression (as I did, until I was gently corrected by a SUSE/Novell representative) that SUSE is not yet freely distributable. Well, the eval edition *is* now freely distributable.

Anyway, I'm waiting for Mandriva 2006 to become available to see what the download rate on that will be :-). I'm the second worst type of ISP client, really, downloading things sometimes only to see what the ISP's performance is like. The worst type, of course, are those guys with huge hard disks who download movies, games and mp3s as if there were no tomorrow.

tiger

Friday, October 14, 2005

rsync algorithm talk by tridge

Andrew Tridgell has a great talk on The Rsync Algorithm

rsync and --partial

rsync --partial doesn't do what I thought it would do. In fact, for large files (my current project involves rsync of half gig files over relatively slow networks, and often networks which are relatively unstable) it can actually make the destination file lose so much information if the rsync dies early that it would become necessary to re-download practically the whole file.

I was under the impression that --partial did something smart, e.g., maybe it would be truly incremental. That is, if a large file was being downloaded (from source to dest) and the download died somewhere in the middle, all the *good* data that had been correctly downloaded from the source would be saved AND THEN the rest of the old file (on the dest) would be *APPENDED* to the partly downloaded half. This way, when the download was continued, the already good half would all hash correctly and the download would start at the first wrong block.

Or maybe --partial would keep all of the good downloaded file in those temporary .[real_filename].[random_letters_and_digits] files and the temporary would not be removed so that the next download would use the same temporary file.

Unfortunately, --partial doesn't do either of those. It's pretty good if you're downloading a completely new file (the destination didn't exist before or it's *much* smaller than the original). In that case, the file is partially downloaded and when the rsync dies, the partially downloaded file is kept. If you tried to do this with a large file already downloaded, and the new download died before it got to the size of the already downloaded file, you'd *lose* data since the short newly downloaded file would *replace* the old, larger, already downloaded partial file.

So --partial is very useful when downloading files from scratch, but it's not useful (in fact, may be very harmful) when syncing files which you've already partly downloaded before. For that, just use the regular rsync parameters and don't use --partial.
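Here is a toy Python model of the difference between what I saw and the first mechanism I had hoped for. It is only a sketch of my description above, not rsync's implementation: the function names are invented, files are plain byte strings, and blocks are compared at fixed offsets (real rsync is smarter about matching blocks).

```python
def keep_partial_as_observed(old_dest: bytes, received: bytes) -> bytes:
    """What I saw: the interrupted download simply replaces the old file."""
    return received


def splice_with_old_tail(old_dest: bytes, received: bytes) -> bytes:
    """What I hoped for: keep the received prefix, then append the old tail."""
    return received + old_dest[len(received):]


def matching_blocks(candidate: bytes, source: bytes, block: int = 4) -> int:
    """Count fixed-offset blocks of source that candidate already has right."""
    return sum(candidate[i:i + block] == source[i:i + block]
               for i in range(0, len(source), block))


source = b"AAAAXXXXCCCCDDDD"    # current file on the source
old_dest = b"AAAABBBBCCCCDDDD"  # older copy already on the dest
received = source[:6]           # the transfer died after 6 bytes

observed = keep_partial_as_observed(old_dest, received)
hoped = splice_with_old_tail(old_dest, received)

print(len(observed), matching_blocks(observed, source))
print(len(hoped), matching_blocks(hoped, source))
```

In the toy example the observed behavior leaves a 6-byte file with only one usable block, while the spliced file keeps its full length and only the one changed block would need to be sent again.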

As good as rsync is, I think there's quite a bit of improvement yet to be done for the case where the bandwidth is slow/unstable and the files to be transferred are very large. And either of the suggested mechanisms above might be a good thing to implement.

Monday, October 03, 2005

Resumes with no email address

I received a resume a few weeks ago and the candidate is very interesting. It's in the U.S. style, two pages with only relevant information in there. The Philippine style, with parents' names, religion, and picture, is useful for discriminating against people based on surface characteristics. I discriminate against people based on stupidity, and while that can sometimes be seen in resumes, more often I have to wait until the interview for that.

Many of his skills are just what we need, and he's interested in learning other skills that we need that he doesn't quite have yet. That he studied at Mindanao State University and Ateneo de Davao and graduated from the University of the Philippines at Diliman doesn't hurt either.

But his resume doesn't have an email address. Or a cellphone number either. Maybe HR got this from JobsDB or similar and they need to pay extra money to get his contact information. Whatever the case, I dropped the resume and the positions have since been filled.

Resumes with no convenient contact information are just not going to be seen. Well, by me, anyway. He's got a snailmail address there too. But it's in Los Angeles, CA, for some reason. Now that's another filter. If he's in the U.S., he's either immigrating (legally or otherwise) or he's just visiting. If the first, why submit a resume to a Philippine-based company? If the second, he should put down his Philippine snailmail address, since a U.S. address is instantly off-putting. Why waste time considering someone who is probably already making more money in the U.S. as a busboy than he would make in the Philippines?

Of course there are exceptions. There are people who go to the U.S. for a few months, work there as computer consultants or software developers for very small (t<6 months) projects and come home. But there should still always be an email address or at least a chikka-able number, so that the employer can check whether any such exceptions apply.