Wednesday, March 23, 2005

Anti-spam filtering the Bayesian Way

I don't know about you, but I get hundreds of spam emails per day. I even get spam sent to domain names that don't have a website up yet. I'm sooo fed up with spam, I just want to strangle somebody! I've set up dozens of Outlook rules, to no avail. To top it off, every once in a while Outlook loses its mind, ignores my filters completely, and spills all the junkola right into my inbox. Arrgh!

Filters help some, but filtering by rules doesn't work when the spamsters use a different email address every time, a fake email address, or a randomly generated subject.

I just love all those naive people whose best anti-spam advice is "What's the big deal about spam? Just use the delete key." It makes me wonder whether they're spammers themselves. That would explain their nonchalant attitude while the rest of us are pulling our hair out and killing our productivity, spending as much as an hour a day just deleting spam. I need an intelligent anti-spam solution, but one that won't delete my important business emails.

I recently tried out SpamBayes, which is free (those SourceForge guys, you just gotta love 'em!). It only works with Outlook, not Outlook Express. Installation is easy enough: when you re-open Outlook after installing the program, the configuration wizard starts. You can either let it do its thing and learn as it goes along, or, if you want faster results, take a proactive approach and separate your spam from your non-spam, or "ham" as they call it.

SpamBayes learns to detect spam based on the training you provide. You sort your good emails into one folder and your bad emails into a different folder. You point it at your spam emails as if it were a drug-sniffing dog, and then you show it your good emails. The more emails you show it, the more effective it gets, and the learning never stops. SpamBayes doesn't delete anything: it moves the messages it's sure are spam into one special folder and the messages it only suspects are spam into another. That's the part I like. I can't afford to lose even one business email, so I can't use a program that automatically deletes emails.
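
For the curious, here's roughly what that training boils down to in a generic Bayesian filter. This is my own minimal Python sketch, not SpamBayes's code: the tokenizer, the smoothing constants (strength 0.45, prior 0.5), and the 0.20/0.90 cutoffs for the "unsure" middle ground are assumptions for illustration. The scoring step uses the simple naive-Bayes combination popularized by Paul Graham, whereas SpamBayes's own scoring builds on Gary Robinson's chi-squared method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (an assumption): lower-cased words of three or more letters."""
    return set(re.findall(r"[a-z']{3,}", text.lower()))


class BayesFilter:
    def __init__(self, ham_cutoff=0.20, spam_cutoff=0.90):
        self.spam_counts = Counter()  # word -> number of spam messages containing it
        self.ham_counts = Counter()   # word -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0
        self.ham_cutoff = ham_cutoff    # assumed: score at or below this -> ham
        self.spam_cutoff = spam_cutoff  # assumed: score at or above this -> spam

    def train(self, text, is_spam):
        """Show the filter one message from the spam folder or the good-mail folder."""
        words = tokenize(text)
        if is_spam:
            self.spam_counts.update(words)
            self.nspam += 1
        else:
            self.ham_counts.update(words)
            self.nham += 1

    def word_prob(self, word, strength=0.45, prior=0.5):
        """Estimate P(spam | word), pulled toward `prior` for rarely seen words."""
        spam_ratio = self.spam_counts[word] / max(self.nspam, 1)
        ham_ratio = self.ham_counts[word] / max(self.nham, 1)
        if spam_ratio + ham_ratio == 0:
            return prior  # never seen: no evidence either way
        p = spam_ratio / (spam_ratio + ham_ratio)
        n = self.spam_counts[word] + self.ham_counts[word]
        return (strength * prior + n * p) / (strength + n)

    def score(self, text):
        """Naive-Bayes combination prod(p) / (prod(p) + prod(1 - p)), done in log space."""
        probs = [self.word_prob(w) for w in tokenize(text)]
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1.0 - p) for p in probs)
        diff = max(min(log_ham - log_spam, 700.0), -700.0)  # keep exp() from overflowing
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text):
        """Route a message to one of three piles instead of deleting anything."""
        s = self.score(text)
        if s >= self.spam_cutoff:
            return "spam"
        if s <= self.ham_cutoff:
            return "ham"
        return "unsure"


# Usage: train on a toy spam folder and a toy good-mail folder.
spam_folder = ["Cheap viagra, buy now! Free offer", "Free money, click now, limited offer"]
ham_folder = ["Meeting agenda for the project review", "Please review the attached project invoice"]

bayes = BayesFilter()
for msg in spam_folder:
    bayes.train(msg, is_spam=True)
for msg in ham_folder:
    bayes.train(msg, is_spam=False)

print(bayes.classify("Free offer, act now"))     # spam
print(bayes.classify("Project review meeting"))  # ham
print(bayes.classify("Free project"))            # unsure
```

Notice the three-way split: a message with mixed evidence, like "Free project", lands in the unsure pile instead of being thrown away, which is exactly the behavior I want for business mail.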

If a suspected spam email turns out to be legit, just click on that email and then click a button that tells SpamBayes it isn't spam. That trains the program to recognize similar email as legitimate in the future. Likewise, if a spam email gets past the filters, click on it, choose "Delete as Spam," and voilà! You've just helped the program get even smarter the next time it sees a similar spam email.
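
Those buttons amount to retraining. Here's a rough Python sketch of how a filter could handle a correction: if a message was already learned the wrong way, back it out of that pile's word counts before adding it to the other pile, so it isn't counted twice. The class name, the message-ID bookkeeping, and the tokenizer are my own illustrative assumptions, not SpamBayes code.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (an assumption): lower-cased words of three or more letters."""
    return set(re.findall(r"[a-z']{3,}", text.lower()))


class RetrainableFilter:
    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # is_spam -> word -> message count
        self.totals = {True: 0, False: 0}                  # is_spam -> messages trained
        self.trained_as = {}                               # message id -> verdict it was trained as

    def _update(self, text, is_spam, delta):
        """Add (+1) or remove (-1) one message's words from one pile."""
        for word in tokenize(text):
            self.counts[is_spam][word] += delta
        self.totals[is_spam] += delta

    def train(self, msg_id, text, is_spam):
        """Train on a message; if it was trained the other way before, undo that first."""
        previous = self.trained_as.get(msg_id)
        if previous == is_spam:
            return  # already learned this verdict; don't count the message twice
        if previous is not None:
            self._update(text, previous, -1)  # the earlier verdict was wrong: forget it
        self._update(text, is_spam, +1)
        self.trained_as[msg_id] = is_spam

    def word_prob(self, word):
        """Unsmoothed P(spam | word) from the current counts (0.5 if never seen)."""
        spam_ratio = self.counts[True][word] / max(self.totals[True], 1)
        ham_ratio = self.counts[False][word] / max(self.totals[False], 1)
        if spam_ratio + ham_ratio == 0:
            return 0.5
        return spam_ratio / (spam_ratio + ham_ratio)


f = RetrainableFilter()
f.train("msg-1", "Cheap pills, free offer", is_spam=True)
f.train("msg-2", "Free lunch with the team", is_spam=True)  # wrongly judged spam
f.train("msg-3", "Quarterly report attached", is_spam=False)
print(f.word_prob("free"))  # 1.0: so far "free" has only appeared in spam

# The user says msg-2 is not spam, so it moves from the spam counts to the ham counts.
f.train("msg-2", "Free lunch with the team", is_spam=False)
print(f.word_prob("free"))  # about 0.67 now that one ham message contains it
```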

Nosy techie that I am, I snooped around the program and found that the way it works isn't all mystery. You can actually see the man behind the curtain pulling the strings. Click on an email in the suspected junk mail folder, then click "Show spam clues for current message", and you can see which words in the email are causing it to be considered spam, along with its spam probability percentage. This is great information because it also helps you reduce the spam factor of the emails you send out. It also lets you know which of your filter-dodging tricks (such as writing "freee") ain't foolin' the spam filters one bit.
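
Here's my guess at what a "spam clues" report involves, as a minimal Python sketch: rank the message's words by how far their spam probability sits from the neutral 0.5, drop the near-neutral ones, and combine the rest into one score. The combining step follows Gary Robinson's chi-squared (Fisher) approach; the word probabilities, the 0.1 minimum strength, and the function names are hypothetical, and SpamBayes's actual implementation has more to it. Probabilities must lie strictly between 0 and 1 for the logarithms to work.

```python
import math
import re


def tokenize(text):
    """Crude tokenizer (an assumption): lower-cased words of three or more letters."""
    return set(re.findall(r"[a-z']{3,}", text.lower()))


def chi2Q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), valid for even v."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spam_clues(text, word_probs, min_strength=0.1):
    """Return (word, probability) pairs, strongest evidence first.

    Unknown words count as neutral (0.5); words within min_strength of 0.5 are dropped.
    """
    clues = [(w, word_probs.get(w, 0.5)) for w in tokenize(text)]
    clues = [(w, p) for w, p in clues if abs(p - 0.5) >= min_strength]
    clues.sort(key=lambda wp: abs(wp[1] - 0.5), reverse=True)
    return clues


def spam_score(clues):
    """Chi-squared combining of clue probabilities into one score in [0, 1]."""
    if not clues:
        return 0.5
    n = len(clues)
    # Evidence for spam: small (1 - p) values across the clues.
    spam_evidence = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for _, p in clues), 2 * n)
    # Evidence for ham: small p values across the clues.
    ham_evidence = 1.0 - chi2Q(-2.0 * sum(math.log(p) for _, p in clues), 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0


# Hypothetical word probabilities, as if learned from earlier training.
word_probs = {"viagra": 0.99, "free": 0.97, "meeting": 0.05}
clues = spam_clues("Free viagra meeting today", word_probs)
for word, prob in clues:
    print(f"{word:10s} {prob:.2f}")
print(f"overall spam score: {spam_score(clues):.2f}")
```

With one strong ham clue ("meeting") fighting two strong spam clues, the combined score comes out around 0.7, a hedge rather than a confident spam verdict. That's the kind of mixed message that makes sense to park in a suspects folder.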

Disadvantages: I'd like to be able to tell it which emails are definitely spam. For example, emails sent to specific addresses I've never used are always spam. But guess what? So far I haven't had to tell it. It knows already. I'm so used to filters where you have to hold their hand and tell them what to do that it's hard to get used to letting the program figure things out on its own.

And the results... Wow! I'm actually looking forward to the spam emails just to see what this baby does with them. After using it for a couple of hours, I'm blown away by this program so far. I'll let you know how it's working after I've used it a bit longer.

Get SpamBayes yourself at:

http://spambayes.sourceforge.net/windows.html