Tue, 11 Jun 2013

De-spammed this blog (with Naive Bayes)

This morning, I was trying to decrease the amount of email in my inbox. I had a few messages with subjects like:

But all the comments in this case were spam. I'm using an Akismet API plugin for pyblosxom, but that has a few shortcomings. Like any filter, it misses some spam. More importantly, it doesn't help me find and remove old spam comments in bulk.

My pattern with email is basically to ignore it for a while and then deal with it in bulk, sometimes missing older messages. As a result, I have often missed these comment notifications, and it has been a bit of a drag to figure out which comments I had already dealt with.

So I wrote a small tool this morning. Here is how it works:

Voila! A spam moderation queue with artificial intelligence.
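
For readers curious about the Naive Bayes part, here is a minimal sketch of the general technique. It is an illustration only, not the code from the tool: the names `tokenize`, `NaiveBayes`, `train`, and `spam_probability` are hypothetical, and the word-level tokenization and add-one smoothing are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Tiny two-class (spam/ham) multinomial Naive Bayes classifier."""

    LABELS = ("spam", "ham")

    def __init__(self):
        # Per-label word frequencies and number of training comments.
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one hand-labeled comment."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Return the estimated probability that the text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for label in self.LABELS:
            # Class prior, smoothed so an untrained class never gets log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denominator = total_words + len(vocab) + 1
            for word in tokens:
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long comments.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

After training on a handful of comments labeled by hand, sorting the remaining comments by `spam_probability` in descending order gives a rough moderation queue: the likeliest spam floats to the top, and each decision you make can be fed back in with `train`.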

You can find it here, on my GitHub account:

Permission to re-use the code is granted under the terms of CC Zero or Apache License 2.0, at your option.

Best of all, I now believe there are zero spam comments left lying around this blog!

