Tuesday, July 12, 2005

Fixed filtering bug in pLog

I had noticed that spam comments were getting through my spam filter in pLog.  When I looked at the Bayesian token database, the tokens from those comments appeared to be classified as non-spam tokens, but I wasn't able to find the messages that contained them.

After a little investigation, I found the problem.  If a message passed through the Bayesian filter, which decided it wasn't spam, and was then blocked by another filter, the message was deleted.  With the message gone, I had no way to correct the Bayesian filter's training.

I made a fix for this and have attached it to the bug report.  Now when a comment is rejected for any reason, if the Bayesian filter thought that it was not spam, the filter untrains it and then retrains it as spam.  Once the fix is approved, I will check it in.
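To make the untrain-then-retrain step concrete, here is a minimal sketch in Python.  It is not the actual pLog patch: the `TokenStore` class, the `handle_rejection` function, and the simple per-token counts are all hypothetical stand-ins for whatever the real token database does.

```python
from collections import Counter


class TokenStore:
    """Toy stand-in for a Bayesian token database (hypothetical, not pLog's schema)."""

    def __init__(self):
        self.spam = Counter()  # how often each token was seen in spam
        self.ham = Counter()   # how often each token was seen in non-spam

    def train(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(tokens)

    def untrain(self, tokens, is_spam):
        counts = self.spam if is_spam else self.ham
        counts.subtract(tokens)
        # Drop tokens whose count fell to zero or below.
        for token in [t for t, n in counts.items() if n <= 0]:
            del counts[token]


def handle_rejection(store, tokens, bayes_said_spam):
    """Hypothetical hook, called when any filter rejects a comment."""
    if not bayes_said_spam:
        # The Bayesian filter had learned these tokens as non-spam:
        # undo that, then record them as spam instead.
        store.untrain(tokens, is_spam=False)
        store.train(tokens, is_spam=True)
```

The point of untraining first is that the tokens do not end up counted on both sides; simply training as spam would leave the earlier non-spam counts in place.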

Since applying this fix, all spam messages have been blocked.  (And the Bayesian database is being trained correctly.)

Update: The larger change was not taken into pLog 1.0.2, but I checked in a smaller one.  The checked-in change just makes sure that the Bayesian filter runs last.
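As a rough illustration of "runs last" (again in Python, and not the checked-in pLog code), the ordering could look like the sketch below.  The filter names and the `order_filters` helper are made up for the example.  Presumably the benefit is that a comment already blocked by another filter never reaches the Bayesian filter, so it is never learned as non-spam in the first place.

```python
def order_filters(filters, last_name="bayesian"):
    """Return the filter names with `last_name` moved to the end.

    sorted() is stable, so the other filters keep their relative order.
    """
    return sorted(filters, key=lambda name: name == last_name)


def first_rejecting_filter(comment, filters, checks):
    """Run filters in order and return the name of the first one that rejects.

    `checks` maps a filter name to a function returning True for "reject".
    Returns None if every filter accepts the comment.
    """
    for name in order_filters(filters):
        if checks[name](comment):
            return name
    return None
```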

Personally, I am running the larger change.  It trains messages as spam if they were blocked by any other filter.  I have posted the diffs (plog diff, plugin diff), so you can apply them yourself if you are interested.
