Friday, August 31, 2007

rsync speedup -- only 2.49

I had to transfer 2.4GB of svn working copy over the internet and I wanted to save as much bandwidth as possible. Plain rsync of the working copy wouldn't have given me enough bandwidth savings, since rsync only compares each file against its counterpart on the other side, and the duplicated data (not just blocks inside files but entire duplicated files) sits in different files.

I decided to try to take advantage of that duplication detection by creating a tar (not tar-gz) of the working copy. I hypothesized that there would be at least a 2x bandwidth saving because the pristine copies (in .svn/text-base) would be exactly the same as the working copy except where the working copy was modified, and even then only a few lines are typically modified, out of perhaps 500-1500.

There should be even more savings, because the working copy is so large mainly due to a few branches (experimental working directories tracked via svnmerge) and 5 or so tags left lying around (older tags are removed and documented in a readme so they can be resurrected by referring to the revision number, but we don't keep everything around since we release very often and the tags are very large).

Because of the tags, I expected a speedup of 10-20. I only got 2.49, though. That's good, but close to an order of magnitude away from what I expected. I'll play with this some more. I just took another quick look at the rsync technical report and it doesn't seem to invalidate my hypothesis:
"α searches through A to find all blocks of length S bytes (at any offset, not just multiples of S) that have the same weak and strong checksum as one of the blocks of B."

Tweaking the block size might help. The block size is set according to the size of the file, and I expect it to be directly proportional to the file size (probably to minimize the amount of checksum data to transmit). Setting it smaller will mean more checksums to send, but possibly more matched blocks. I expect to waste quite a lot of time and bandwidth on this :-).
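To make the quoted step and the block-size trade-off concrete, here is a minimal Python sketch of the block search. It is not rsync's actual code: the names `weak_sum` and `matched_bytes` are made up for illustration, SHA-256 stands in for the strong checksum, and the weak checksum is recomputed at every offset instead of being rolled, which is slow but easier to follow.

```python
import hashlib


def weak_sum(block: bytes) -> int:
    """Adler-style weak checksum (illustrative; not rsync's exact arithmetic)."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def matched_bytes(old: bytes, new: bytes, block_size: int) -> int:
    """Count bytes of `new` that match a fixed-offset block of `old`."""
    # Receiver side: checksum every non-overlapping block of the old file.
    table = {}
    for off in range(0, len(old) - block_size + 1, block_size):
        block = old[off:off + block_size]
        table.setdefault(weak_sum(block), set()).add(hashlib.sha256(block).digest())

    # Sender side: slide a window over the new file, one byte at a time.
    matched = 0
    i = 0
    while i + block_size <= len(new):
        window = new[i:i + block_size]
        strong = table.get(weak_sum(window))
        if strong is not None and hashlib.sha256(window).digest() in strong:
            matched += block_size  # whole block can be sent as a reference
            i += block_size
        else:
            i += 1                 # no match: this byte goes over the wire
    return matched


old = bytes(range(16))
shifted = b"xyz" + old                 # same data, moved by 3 bytes
edited = old[:5] + b"\xff" + old[6:]   # one byte changed

print(matched_bytes(old, shifted, 4))  # 16
print(matched_bytes(old, edited, 4))   # 12
print(matched_bytes(old, edited, 8))   # 8
```

In this toy case the shifted data is still found in full, and halving the block size recovers 12 bytes instead of 8 around the one-byte edit, at the cost of sending four block checksums instead of two.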

Thursday, August 23, 2007

git-svn again

I'd been having trouble figuring out git-svn and git. I worked at it a bit more this week and I've finally got something working.

mkdir svn-git
git-clone [URL]
cd [directory]

and the files are there. I can do most of the basic things. I haven't yet practiced branching and merging. I'll do that in a week or so, or tomorrow, as the mood arises. For now, I'm happy that I can remotely mirror a repository (my personal backup of the entire repository, including commit messages). It also makes viewing log messages a lot faster when the svn server is far away or my link to it is slow. That happened to me and sol the past three nights: the route to the svn server passed through an ISP in Singapore that was experiencing 50% packet loss, and it took a few days for the upstream ISPs to figure out that they shouldn't go through that lame router.

I had some confusion earlier since I deleted a subdirectory (with just rm -rf) and then couldn't figure out a way to revert. I still don't know how to revert from a "pristine copy", if git has that at all. Instead, what I had to do was a local commit and then a git-revert (which reverts a commit, something that svn doesn't have yet). I don't know if that's the canonical way to do that, but it worked.

I tested git-commit and git-svn dcommit and that worked pretty well. The commit messages are clean, which is my main issue. I'm looking into git-svn partly so that I'll be able to work with version control even if not connected to a repository. I'd previously used svk but wasn't happy with the format of the commit messages pushed into svn when it came time to merge back into svn. That might have improved by now, but then there were problems with speed too. Those won't be fixed except by rewriting in something faster than perl. If I learn, like, and learn to like git, I'll stick with it for disconnected work.

But first I need to test out its features. There's a long weekend coming up (two long weekends in a row this month), I'll play with git a bit then.

OK, the git workflow description at Calice points me at git reset. So

git-reset --hard HEAD

does what git commit+git revert does, but without the bogosity of actually having a commit message in the logs.

Now I'm trying to figure out how to get the list of files modified in a given commit (in svn, svn log -v).

Ah well, now it's just time to read the git manual. I read the CVS manual and the SVN manual, I should be able to wrap my brain around git :-).

Friday, August 17, 2007

gmail smtp requires TLS

I use Evolution at work (none of that Outlook creationism for me) and on my laptop too. At work I was confused because I couldn't send email through I thought it was just work firewall rules. It wasn't, though. Formerly, I could send from my * addresses using the company SMTP server. Then the SMTP server was reconfigured to be tighter and to require authentication. I couldn't send mail with a From: of anymore. I tried to set up SMTP+SSL to And then I wondered for a long time why that was failing.

Well, it's because I'm too lazy to read the damn manual.

It says so right on gmail: SMTP requires TLS. It's POP3 that requires SSL. If I cared more about the asymmetry there I'd wonder why they aren't both TLS or SSL. But no, I'm too busy with other work to care about that just now :-). I'm just glad I can send mail out again without having to log into the web based gmail client. That's a great client (I was testing out yesterday and that site is ridiculously slow on my ridiculously slow 800MHz workstation :-), but I do prefer to have everything in one place, retreating to the web based client only in an emergency (which is never, actually, since if I can't check my mail on my laptop or my work desktop, then I just won't check mail). I'm too paranoid to check my mail at internet cafes since they all run Windows.
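Outside of Evolution, the TLS-versus-SSL distinction can be sketched with Python's standard smtplib. This is only a sketch under assumptions: the host name, the addresses, the password, and port 587 (the usual mail submission port) are placeholders I'm using for illustration, and `build_message` and `send_with_starttls` are made-up helper names. The point is that the connection starts in plain text and is upgraded with STARTTLS, rather than being wrapped in SSL from the first byte (which is what `smtplib.SMTP_SSL` does).

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text message with the given From: address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_starttls(msg: EmailMessage, host: str, port: int,
                       user: str, password: str) -> None:
    """Connect in plain text, upgrade to TLS, authenticate, then send."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)  # upgrade the connection to TLS
        smtp.ehlo()                     # re-identify over the encrypted channel
        smtp.login(user, password)      # authenticate only after TLS is up
        smtp.send_message(msg)


msg = build_message("me@example.com", "you@example.com",
                    "test", "sent over STARTTLS")
# Placeholder server and credentials; uncomment with real values to send.
# send_with_starttls(msg, "smtp.example.com", 587, "me@example.com", "password")
```

Logging in only after `starttls()` keeps the password off the wire in clear text, which is presumably why a server would insist on it.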

Wednesday, August 15, 2007

Windows is Free

Dave Gutteridge, on has a great article on how common Windows piracy is and how that affects Linux uptake.

He says a lot of things I've always wanted to say but never had the time or patience to set down at such length.


Tuesday, August 14, 2007

HP sales versus newbie

A very good friend of mine went to buy a computer. I couldn't go with her, and she's comfortable enough financially that she doesn't need to find the cheapest computer, so she went to an HP reseller.

I prefer that she (and other newbies who can afford it) buy branded hardware anyway (although I wouldn't necessarily go with HP; I'd go with something a bit cheaper, say Lenovo or Acer). The presumption is that the hardware quality will be a bit better than that of pure clones. While it's possible to build superior boxes out of clone parts (with careful study of all the parts/options), I didn't have the time to help her with that, and she didn't have the knowledge (frankly, neither do I, anymore) to do that.

Well, the HP reseller sold her on the "fastest computer available". It came with Vista Home Basic and with evaluation versions of Home/Student MS Office 2007.

Gripe #1: the computer came with 512MB of memory. For a $1000 computer, there should be
1-2GB of memory in there.

Gripe #2: the CPU is 1.6GHz. That's not a speedster by any standard. My laptop is faster
than that.

Gripe #3: Vista Home Basic is ridiculously slow. A lot of the user interface has changed
so it's difficult to find things.

Gripe #4: MS Office 2007 has completely discombobulated the user interface. I couldn't find
anything until, after some random clicking, we clicked on the icon at the top left
of the window. It's sort of like the [Start] button (except Vista doesn't have a
Start button anymore, just an icon); in Office, it's a something button that brings
down a vertical menu because the former horizontal menu isn't there anymore.

Even the random clicking didn't help much at first since the computer is so slow
that clicking on that icon didn't do anything (or it caused some flicker as the
menu popped down and then went away instantly due to clicks elsewhere).

Boy, I'd forgotten how dumb Windows makes me feel. And the fact that MS changes the user interface so gratuitously and so radically makes me wonder about its future. I'm not going to have any fun helping my friend with her computer. If I had the time I'd save her a bunch of money and get some other brand from any store other than the one she bought from, and definitely not HP either. Unfortunately, she's going to upgrade her computer to a better CPU and I'll go back there to help her set things up on it. The only fun I'll be having is when I surreptitiously repartition the drive and put Linux on there, and when I claim my unlimited beers. The beers will deaden the pain of working with Vista (yech) and Office (yech, yech, gaaak), and HP (gaak).

Thursday, August 09, 2007

10 things your IT guy wants you to know

Good article on 10 Things your IT guy wants you to know.

I like it because it's solution-oriented and very mature, the very opposite of the BOFH.