Tuesday, March 27, 2007

dropping duplicate email in Evolution

I recently had to reload all my mail from Gmail once again. (I have an always-on computer that runs Postfix and stores mail in Maildir format; fetchmail pulls my mail from Gmail and delivers it locally, and I check my mail on that box regularly.)

I had to reload because I ran a script of mine that removes duplicate emails from mboxes (Evolution uses mbox, unfortunately). Between a brain fart, a Ctrl-C, and a script that isn't very smart, I lost some mboxes. The simplest fix was to reload everything from Gmail, even though that takes three days or so.

The script used formail and wrote temporary files which, once complete, were written back over the main mbox with the duplicates removed.
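For comparison, here is a minimal sketch of a safer mbox deduplicator in Python. It is not the script described above (that one used formail); it assumes two messages are duplicates when their Message-ID headers match, and it writes to a brand-new mbox instead of rewriting the original, so an interrupted run can at worst leave an incomplete copy behind. The name dedup_mbox and the two-argument command line are illustrative only.

```python
#!/usr/bin/env python3
# Copy an mbox, keeping only the first message for each Message-ID.
# The source mbox is only read, never rewritten, so interrupting the
# script cannot destroy it; at worst the destination copy is incomplete.
import mailbox
import os
import sys


def dedup_mbox(src_path: str, dst_path: str) -> tuple[int, int]:
    """Write de-duplicated messages from src_path to a new mbox at dst_path.

    Returns (kept, dropped). Messages without a Message-ID are always kept.
    """
    if os.path.exists(dst_path):
        raise FileExistsError(dst_path)  # never overwrite an existing file
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    seen = set()
    kept = dropped = 0
    try:
        for msg in src:
            msg_id = str(msg.get("Message-ID", "")).strip()
            if msg_id and msg_id in seen:
                dropped += 1
                continue
            seen.add(msg_id)
            dst.add(msg)
            kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept, dropped


if __name__ == "__main__" and len(sys.argv) == 3:
    kept, dropped = dedup_mbox(sys.argv[1], sys.argv[2])
    print(f"kept {kept}, dropped {dropped}")
```

Run as `python3 dedup_mbox.py Inbox Inbox.dedup`, it leaves the original untouched; swapping the files afterwards is a manual step, probably best done with Evolution closed.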

I'm switching to a safer alternative. In Evolution, I've created a filter thus:

Filter action is Pipe to Program
The program is /usr/bin/formail -D 9999999 /home/tiger/.evolution/dup-idcache
Action is move to some folder (e.g., ZZZZDups)
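As I read formail's man page, `-D maxlen idcache` checks whether the current message's Message-ID has already been seen, using an idcache file of approximately maxlen size, and (when not splitting) returns success, exit status 0, if it finds a duplicate; presumably that status is what the Pipe to Program rule keys on. The Python sketch below imitates that behaviour to show the idea. It is not formail: the plain-text cache (one Message-ID per line), the MAX_IDS cap, and the script's command line are assumptions for illustration, and it does no file locking.

```python
#!/usr/bin/env python3
# Rough imitation of `formail -D maxlen idcache`: exit 0 if this message's
# Message-ID was seen before, 1 otherwise. The cache here is a plain text file
# with one Message-ID per line (an assumption; formail's real idcache differs).
import sys
from email.parser import BytesHeaderParser

MAX_IDS = 100_000  # assumed cap on remembered IDs; the oldest are dropped


def message_id(raw: bytes) -> str:
    """Return the Message-ID header of a raw message, or '' if it has none."""
    headers = BytesHeaderParser().parsebytes(raw)
    return str(headers.get("Message-ID", "")).strip()


def check_and_record(msg_id: str, cache_path: str, max_ids: int = MAX_IDS) -> bool:
    """Return True if msg_id is already cached; otherwise add it and return False.

    No file locking is done, so this assumes messages are checked one at a time.
    """
    try:
        with open(cache_path, encoding="ascii", errors="replace") as f:
            ids = f.read().splitlines()
    except FileNotFoundError:
        ids = []
    if msg_id in ids:
        return True
    ids = (ids + [msg_id])[-max_ids:]  # keep only the newest max_ids entries
    with open(cache_path, "w", encoding="ascii", errors="replace") as f:
        f.write("\n".join(ids) + "\n")
    return False


def main(cache_path: str) -> int:
    msg_id = message_id(sys.stdin.buffer.read())
    if not msg_id:
        return 1  # no Message-ID: never call it a duplicate
    return 0 if check_and_record(msg_id, cache_path) else 1


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Feeding the same message twice, e.g. `python3 dupcheck.py ~/.evolution/dup-ids < msg.eml` (hypothetical script and cache names), exits with 1 the first time and 0 the second.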

I've tested it on a mailbox that was horrendously bloated with duplicate emails. It found the dups, kept one copy of each email in the original mailbox, and moved the duplicates to the designated dup folder. I double-checked the results (well, sampled them, since there were too many), and every dup it found really was a dup: the same email was still in the original mailbox.

This is a much safer way to find dups, though it has drawbacks: I'll need to Ctrl-E the Inbox regularly, it's not very efficient (it runs formail once for every incoming email), and Evolution doesn't much like the program specification (when I open the filters, it shows an error for this one, with the spaces converted to %20 and so on, i.e., URL-encoded). It's very convenient, though, and I'll use it until I figure out something better, probably a small script that hides the parameters and returns whatever formail returns.
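A wrapper along those lines could be as small as the following Python sketch (one possible shape, not a finished tool). It runs formail with the same fixed arguments as the filter, lets the piped-in message flow through on stdin, and hands back formail's exit status, so the filter's program field would need only a path with no spaces. The helper names formail_command, run and main are hypothetical.

```python
#!/usr/bin/env python3
# Hypothetical wrapper so the Evolution filter can name a single program
# with no arguments: it runs formail with fixed parameters and returns
# formail's exit status unchanged.
import subprocess

FORMAIL = "/usr/bin/formail"
MAXLEN = "9999999"
IDCACHE = "/home/tiger/.evolution/dup-idcache"  # adjust to your own home


def formail_command(formail: str = FORMAIL, maxlen: str = MAXLEN,
                    idcache: str = IDCACHE) -> list[str]:
    """Build the command line the filter would otherwise spell out."""
    return [formail, "-D", maxlen, idcache]


def run(cmd: list[str]) -> int:
    """Run cmd with this process's stdin and return its exit status.

    The child inherits stdin, so the message Evolution pipes in reaches
    formail directly; formail's stdout is discarded because only the
    status is needed.
    """
    return subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode


def main() -> int:
    return run(formail_command())
```

As shown, the module only defines functions; to use it as the filter program, one would make it executable (say, as a hypothetical ~/bin/evo-dupcheck) and end it with `import sys` and `sys.exit(main())` under an `if __name__ == "__main__":` guard.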

The same technique works pretty well with bogofilter. I'd wondered how to integrate bogofilter into Evolution. I'm OK with just finding spam myself and running bogofilter -s on it directly, so I don't need a fancy interface. On my first test run earlier, bogofilter correctly flagged spam and the filter moved it to a designated folder. I'll see whether it catches more spam in the future.
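This works because bogofilter, like formail -D, reports its verdict through its exit status: its man page lists 0 for spam, 1 for non-spam, 2 for unsure (normally only when an unsure range is configured) and 3 for I/O or other errors, so a Pipe to Program rule matching status 0 would presumably catch spam. The Python sketch below wraps those conventions; the function names and label strings are assumptions, and register_spam simply runs the bogofilter -s command mentioned in the paragraph above.

```python
# Sketch of reading bogofilter's verdict from its exit status, which is
# what a Pipe to Program filter rule would test.
import subprocess

# Exit statuses as documented for bogofilter's classification mode.
BOGO_STATUS = {0: "spam", 1: "ham", 2: "unsure", 3: "error"}


def label_for(status: int) -> str:
    """Map a bogofilter exit status to a label; unknown codes count as errors."""
    return BOGO_STATUS.get(status, "error")


def classify(raw: bytes, bogofilter: str = "bogofilter") -> str:
    """Pipe one raw message into bogofilter and return its verdict."""
    result = subprocess.run([bogofilter], input=raw, stdout=subprocess.DEVNULL)
    return label_for(result.returncode)


def register_spam(raw: bytes, bogofilter: str = "bogofilter") -> None:
    """Train bogofilter on a message known to be spam (bogofilter -s)."""
    subprocess.run([bogofilter, "-s"], input=raw, check=True)
```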

I don't really get a lot of spam anymore. Gmail catches most of what goes there, and the mail server at work has reasonably good spam filters. But some spam does get through now and then, and I'll see whether bogofilter does a good job with it.
