Monday, April 18, 2005

Another trackback spam filter


I have modified the validatetrackback plugin that I wrote earlier to include an option to hook into the Bayesian spam filter built into plog.  When enabled, the plugin now has two options.


  1. Check the URL included in the trackback itself, and see whether it points at an HTML page that has a trackback URL.  If it doesn't, throw away the trackback.

  2. Tokenize the trackback, and pass it through plog's built-in Bayesian spam filter.  If the spam score is higher than the user-specified threshold, toss the trackback.
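
To make the first option concrete, here is a rough sketch in Python (not the plugin's actual PHP code). It assumes the linked page advertises its trackback URL through a `trackback:ping` attribute, as trackback auto-discovery does, and that the page's HTML has already been fetched. The function names are made up for illustration.

```python
import re
from typing import Optional

# Assumption: the page embeds auto-discovery RDF containing
# trackback:ping="...". The real plugin may detect this differently.
_PING_RE = re.compile(r'trackback:ping\s*=\s*"([^"]+)"', re.IGNORECASE)


def find_trackback_url(html: str) -> Optional[str]:
    """Return the first advertised trackback URL, or None if there is none."""
    match = _PING_RE.search(html)
    return match.group(1) if match else None


def passes_url_check(html: str) -> bool:
    """Keep the trackback only if the linked page has a trackback URL."""
    return find_trackback_url(html) is not None
```

The idea behind the check is that a real weblog entry will usually advertise a trackback URL, while a spammer's landing page usually won't.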





Notes:


  1. The Bayesian filter will only do something if the user has enabled the spam filter for their comments.

  2. I am not sure this will be as effective for trackback spam as it is for comment spam.  In general, trackback spam is limited to 255 characters, so there may not be enough tokens to calculate a reliable spam score.

  3. The Bayesian spam filter is not trained on the trackbacks, so this will not make spam catching any better or worse for comments.
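
For anyone curious what the second option boils down to, here is a toy Bayesian scorer in Python. It is only an illustration of the general technique, not plog's filter: the class, the tokenizer, the smoothing, and the 0.5 "no evidence" default are all my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Hypothetical stand-in for a Bayesian filter trained on comments."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(tokenize(text))

    def score(self, text):
        """Return a spam probability in [0, 1]; 0.5 means no evidence."""
        spam_total = sum(self.spam.values())
        ham_total = sum(self.ham.values())
        vocab = len(set(self.spam) | set(self.ham))
        log_odds = 0.0
        for tok in set(tokenize(text)):
            if tok not in self.spam and tok not in self.ham:
                continue  # tokens never seen in training carry no evidence
            # Add-one smoothing so a one-sided token never divides by zero.
            p_spam = (self.spam[tok] + 1) / (spam_total + vocab)
            p_ham = (self.ham[tok] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp
        return 1 / (1 + math.exp(-log_odds))


def should_toss(bayes, trackback_text, threshold):
    """Toss the trackback when its score exceeds the user's threshold."""
    return bayes.score(trackback_text) > threshold
```

In this sketch, a short trackback whose tokens never appeared in the comment training data scores exactly 0.5, which is the worry in note 2: with so few tokens, the score may not move far enough from neutral to cross the threshold.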


For testing purposes, I have disabled the trackback URL filter and left only the Bayesian filter enabled.  I am curious to see how well it does.