Monotrematica: 2004

Wednesday, December 22, 2004

Reading Multiple GMail accounts using the same computer user

I've had a bit of a hack in place for reading my two gmail accounts. I would open two copies of firefox (or mozilla), but using two different logins. The second would do an su and run the other firefox copy. E.g., I'd run firefox as user tiger, and then run firefox as user tiger1. I could then read my bopolissimus@gmail.com account as tiger, and the other firefox running as tiger1 would read bopolissimus.lists@gmail.com.

It worked out that way (couldn't read bopolissimus.lists@gmail.com as tiger) since when I'd surf to www.gmail.com, the browser would notice that it already had a cookie for that site and would identify itself with that cookie.

I just noticed that it *is* possible to open two different gmail accounts as the same (linux logged in) user. Firefox has a -P parameter (for profile). The solution is:

1. start firefox with the -P parameter. This allows you to create, edit, and delete profiles.
2. create two profiles (or one profile for the other account, and use the default profile
for the first account).
3. when starting firefox, specify which profile to use with -P

This isn't as big a deal as it might seem, since the normal thing to do would
be to create a launcher (shortcut) on the desktop. For the main account,
set the program to run as "firefox -P default", and for the other account,
set it to "firefox -P [whatever_the_other_profile_is]".
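A launcher can also be a small script. Here is a minimal Python sketch of the same idea, assuming `firefox` is on the PATH; the helper names are made up for illustration. It only wraps the `firefox -P <profile>` command described above.

```python
import subprocess


def firefox_command(profile, binary="firefox"):
    """Build the argument list for starting Firefox with a named profile.

    `binary` is an assumption: it expects a `firefox` executable on the PATH.
    """
    if not profile:
        raise ValueError("profile name must not be empty")
    return [binary, "-P", profile]


def launch(profile):
    """Start Firefox with the given profile without waiting for it to exit."""
    # Popen returns immediately, so two profiles can be started back to back.
    return subprocess.Popen(firefox_command(profile))
```

Calling `launch("default")` and then `launch("lists")` would start one browser per profile, where "lists" stands in for whatever the second profile is called.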

Netscape has always had that feature, as does mozilla, I think. I just never
used profiles before. Now it comes in handy though.

Sunday, December 19, 2004

Misc Articles

I read a lot, mostly online, but dead tree too. I often find
something worth passing on, but I rarely do. No time, lazy,
not online. http://www.jerrypournelle.com is a good source
for good articles and discussion.

Here's one. Not my country, don't care that much. One of
the things that I didn't like about life in the US though.

A Nation of Wimps

Focus on the first derivative

Sunday, November 14, 2004

Things to do/learn

Things I need to do or learn when I find the time (not anytime soon, but maybe one thing at a time on weekends when current project is done).

1. Switch to Subversion from CVS (might do this soon).
2. Learn how to set up drbd (vmware or UML for testing).
3. Learn Postgresql Slony replication.
4. Set up bugzilla and mantisbt (the one just for fun, the other, probably for use).
5. Test setting up heartbeat and virtual servers (#2, maybe watch ian or cedric set this up?).
More fun to figure it out on my own, but takes a lot longer :)
6. Test setting up ypserv and ypbind
7. openldap too, for PAM and email and other auth...
8. Java this, Java that, lots of new stuff to look at there.
9. Test MySQL transactions (InnoDB) and subselects (latest version). Of course MySQL
really isn't usable yet for serious databases until something like plpgsql becomes
available for it. But it's getting there.

Sigh, when will I ever find the time for all of that.

Saturday, October 09, 2004

CVS on USB Flashdisk

I recently got a sandisk 128MB USB flash disk and I've decided to use it as my CVS repository (or arch, I'm looking into that, or svn, I'll look into that after I look at arch). I shuttle between multiple sites, and they're not all on the internet. Some of them are completely firewalled off from the world.

I don't share the repository with anyone, so there's no problem keeping it in usb. Of course that wouldn't work well for a team. But for me, well, I just like having all my code, editing history, and releases in a repository, so a USB disk is fine as a repository.

The only thing I'm worried about is the write cycle limit. Need to research that. Some flash memories are supposed to have only a limited number of write cycles, and beyond that limit it's not possible to write to the device anymore.

-- flash -- I decided on svn. I didn't like the arch interface (actually, I couldn't understand it, it's probably easy, I just didn't give it enough of a chance). I really like svn though. Performance isn't that great, but then my code base isn't that huge and svn has features that have been missing in cvs forever (file and directory rename, move, etc).

Porting Blogs

I've got another blog on another server but I'll be moving those posts here. Did one already, I'll be doing the rest slowly. Maybe one or two posts a day. I won't be doing the comments though. Not motivated enough for that. Hmmm, I should go over there and get the mysql database so I've got it here, in case the server goes away.

Sunday, October 03, 2004

Consulting

I recently moved into consulting, after a few comfortable and enjoyable years developing software for an internet service provider.

I'm enjoying it a lot. It's not the money (I've seen more of that than I used to make, but I've also moved to where things are more expensive, so things even out). Rather, it's the fact that I work with many different technologies, so I get to learn more as I work.

Lately, I've been learning postgresql functions and triggers in plpgsql. I went back to something I used to do all the time, code generation. I wrote a multi-threaded program in C/C++ and learned the benefits of STL at the same time. Of course I already knew what the STL could give me, I just hadn't actually experienced the benefits yet.

After using STL for a few days, I dropped my handcrafted string and container classes.

Doxygen is great and I'll be getting into PHP+SOAP (it should really be java+SOAP or something similar, but I need quick wins, like results within a week, and java is too complicated for that kind of thing, maybe on my next project I'll work with java).

One thing about consulting is, there's always something different happening, so burnout due to boredom won't happen. Although burnout due to stress is always a possibility :). But then I can usually set my own schedule, so that's *less* likely than it seems.

Friday, August 20, 2004

Gmail Retroactive application

I've wanted to apply filters retroactively in Gmail (i.e., if I've got 10MB of mail in GMail and now I want to organize it into labels, I create the labels, create the filters to auto-archive and set labels, but how do I apply the same filter to old mail?).

There didn't seem to be an obvious way to do that. But then that's because I was thinking about it as "applying filters". So naturally I was looking for it in the Filter stuff (create, edit, delete, test, etc).

As it happens, there's a simple way to do retroactive filters in GMail. Just search for the relevant email using the search function. It's not going to be exactly the same as filtering, UNLESS one uses the "Show search options" link. That brings up a dialog box similar to the one for creating a filter, except it works only for one search.

After the relevant emails have been found, it's a simple matter to select them all, apply a label, and archive them (I like to do this so they go away from the inbox but are still in the label/folders). It's not quite as easy as it might be, but it works. Now I just need to remember that that's how it's done.

Saturday, July 24, 2004

Spam Classification Results from an informal test

I'd been noticing that SpamAssassin, at a threshold of 4.5 and even with its built-in Bayesian scoring, was just not performing as well as Bogofilter, which ONLY has Bayesian scoring (but of course, I tweaked the spam and ham cutoffs and other parameters around 3 months ago). I decided to do an informal test.

Procedure:

0. I used my already trained bogofilter and sa-learn setups. For about a month now, I've
been taking spam that bogofilter found but that spamassassin did not determine to be
spam, and I've been feeding them to sa-learn in hopes that spamassassin would eventually
score them as spam since spamassassin would learn through its bayesian test about the
spam that it had not found before. However, even after a month of this training, I see
the result documented below (i.e., spamassassin's bayesian component doesn't seem to
learn very well).

1. Get Mboxes from various sources. The Mboxes include both spam and ham.

2. Run the email through spamassassin and bogofilter. The bogofilter wordlist does not
include any spamassassin markup because all email is run through a filter that removes
such markup (and performs other cleanup, e.g., removing all lines with too many
consecutive characters without whitespace; the main effect of this is to throw away
attachments that are encoded via MIME, BASE-64 or other encoding schemes).

3. Have evolution group the email into ham, mail that only bogofilter thought was spam,
mail that only spamassassin thought was spam, and mail that both thought was spam.

4. Eyeball all that email (very quickly, mainly looking at from and subject lines, and then
viewing the body of suspicious email).
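The long-line cleanup mentioned in step 2 can be sketched in a few lines of Python. This is not the actual filter used for the test. The function name and the 60-character threshold are assumptions picked for illustration, and removing the spamassassin markup is not covered here.

```python
import re

# Assumed threshold: a line is dropped when it contains a run of more than
# this many consecutive non-whitespace characters. The value 60 is a guess;
# base64-encoded attachment lines are usually longer than that.
MAX_RUN = 60


def strip_long_runs(text, max_run=MAX_RUN):
    """Remove every line holding a non-whitespace run longer than max_run."""
    pattern = re.compile(r"\S{%d,}" % (max_run + 1))
    kept = [line for line in text.splitlines() if not pattern.search(line)]
    return "\n".join(kept)
```

Ordinary prose lines pass through untouched because their words are short, while encoded attachment bodies, which have no whitespace at all within a line, get thrown away.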

At the end of all that, I see the following numbers:

On the positive side for both:

1339 spam correctly classified by bogofilter

1337 spam correctly classified by both bogofilter and spamassassin

697 non-spam correctly classified by both bogofilter and spamassassin

0 false negatives by either bogofilter or spamassassin

0 false positives misclassified by bogofilter

On the minus side:

104 bogofilter false-negatives (spam that bogofilter didn't classify, all these false negatives were also misclassified as negatives by spamassassin)

90 false positives misclassified by spamassassin only (bogofilter correctly said they were not spam)

SpamAssassin has too high a false positive rate for me. Any false positives are a major problem since, with so much spam overwhelming the nonspam, false positives are very likely to hide in the spam noise and thus get lost. And while the rate here is very low in terms of probability, that is still too high for me.

False negatives aren't such a big deal since basically, the amount of spam is cut down to 1/100th or less of the true spam volume and the little spam left in inboxes is merely a nuisance and not the productivity destroyer that it used to be.

Given these results, where fully half of the spam I found is not correctly classified by SpamAssassin, I cannot afford to use only SpamAssassin. Of course, possibly my threshold of 4.5 is too high, but with the already too high levels of false positives now, lowering the threshold to catch more spam will mean that there will be an increase in false positives too. I'll continue my current system where both spamassassin and bogofilter are in use.

Email that bogofilter doesn't flag as spam but spamassassin does, is examined and, if it's really spam, sent to bogofilter for training.

If it's not really spam, then it's sent to sa-learn for training as --ham, so that the bayesian component will eventually learn that it isn't spam and, hopefully, contribute to decreasing the spamassassin scores of similar email in the future.

Email that bogofilter flags as spam but spamassassin doesn't is examined and if it's really spam, is sent to sa-learn for training.

If it isn't spam, then it's sent to sa-learn for training as --ham.

Email that neither bogofilter nor SA classifies as spam but which *is* spam (a false negative) is trained as spam in both.

I generally just delete email that is flagged as spam by both, since my false positive rates there are zero: I haven't seen any false positives from bogofilter, or from bogofilter+spamassassin, in a year.
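The routing rules above fit in a small decision function. The following Python sketch only restates them. The function name and the action labels are made up for illustration, and it does not call bogofilter or sa-learn.

```python
def training_actions(bogo_spam, sa_spam, really_spam):
    """Return follow-up actions for one hand-checked message.

    bogo_spam / sa_spam: what each filter said about the message.
    really_spam: the verdict after eyeballing it.
    The labels are hypothetical names, not real command lines.
    """
    if bogo_spam and sa_spam:
        # Flagged by both: just delete, no retraining.
        return ["delete"]
    if sa_spam:
        # Only spamassassin flagged it.
        return ["bogofilter:spam"] if really_spam else ["sa-learn:ham"]
    if bogo_spam:
        # Only bogofilter flagged it. The post sends both outcomes to
        # sa-learn, so the sketch does the same.
        return ["sa-learn:spam"] if really_spam else ["sa-learn:ham"]
    # Neither filter flagged it.
    if really_spam:
        return ["bogofilter:spam", "sa-learn:spam"]
    return []
```

For example, `training_actions(False, True, False)` covers a spamassassin-only false positive and yields the sa-learn ham training step.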

Friday, July 23, 2004

Mailbomb DDOS and Postfix solution

We're suddenly getting hit by a DDoS that's mailbombing our SMTP server with many simultaneous incoming emails for email addresses that don't exist. So we're getting a lot of errors in our logs about rejected email because of "User unknown in local recipient table". It took us a while to get a handle on this. We got part of the way with some hacks, but the server was still unstable. I posted questions on the Philippine Linux User's Group mailing list and the postfix-users mailing list, and I've got a recipe of things to mitigate the problem.

Orly at mozcom says to do:

disable_vrfy_command = yes
smtpd_banner = $myhostname NO UCE ESMTP
smtpd_delay_reject = no

# slowing down bad clients [added recommendations from wietse]
# we NEED hard_error_limit in order for dictionary-attack stoppage to work
smtpd_error_sleep_time = 0s
smtpd_soft_error_limit = 5
smtpd_hard_error_limit = 10
smtpd_timeout = 30s

and Victor Duchovni on the postfix mailing list gave me the smtpd_error_sleep_time thing too. Thanks to both. We've checked with upstream and downstream mailservers and they're not getting bombed. So it's probably a targeted DDoS. Some competitor in CDO is sufficiently worried about us that they're willing to pay real money to have thousands of zombie computers out there attack us (many of the IPs resolve to dsl and cable companies in the states, so they're always-on, high bandwidth, cracked-wide-open windows boxes being orchestrated to attack us at the same time). We had a similar problem around midnight one night, with a very high volume of UDP packets coming in. Ah well, there's probably no way to trace this back to the person or company that commissioned this short of going and finding the person/persons who cracked those zombie machines and, well, dismembering them little by little until they squeal.