Thursday, January 20, 2005

Google and comment spam

Google and Yahoo have agreed to a new mechanism to help combat blog comment spam. They are going to respect the rel="nofollow" attribute on links: when a link carries this attribute, these search engines will not follow it.
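For illustration, blog software could apply the attribute automatically to every link in a visitor's comment. Here is a minimal Python sketch of that idea. `add_nofollow` is a hypothetical helper, and it assumes the comment's links don't already carry a `rel` attribute. A regex keeps it short; real comment HTML would be better handled by a proper HTML parser.

```python
import re

# Matches the start of an anchor tag ("<a" followed by whitespace), case-insensitively.
# Closing tags ("</a>") and other tags starting with "a" (e.g. "<abbr>") are not matched.
ANCHOR_OPEN = re.compile(r"<a(\s)", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Insert rel="nofollow" into every <a> tag of a comment (hypothetical helper).

    Assumes the anchors have no existing rel attribute.
    """
    return ANCHOR_OPEN.sub(lambda m: '<a rel="nofollow"' + m.group(1), comment_html)


print(add_nofollow('Nice post! <a href="http://spam.example/">cheap pills</a>'))
# Nice post! <a rel="nofollow" href="http://spam.example/">cheap pills</a>
```

The link is still there for readers to see and click; only search engines that honor the attribute treat it differently.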

I think this will help keep these websites' rankings down in the search results, but I don't think it really solves the problem. The spam URLs in these comments will still be visible to readers of the blogs they are posted on. I see these comments as a sort of graffiti on those web sites. I wouldn't want to leave graffiti up on a wall even if it had a note next to it saying "don't read this" (which is essentially what the nofollow attribute says).

There are three solutions that I think are better:

  1. Have a Bayesian spam filter for the comments. pLog has a great implementation of this; it is keeping all of the comment spam I get from even appearing.

  2. Have Google or some other company keep track of all of the URLs that are posted in comments. If a comment contains URLs that have been posted too many times, the blog software could reject the comment.

  3. Use SURBL. This is something that I wrote about a while ago, and I have been using it as part of my email spam solution. It looks at the URIs in the body of a message and lets you block messages that contain URIs on its list. SURBL draws its list of URLs from several sources, such as SpamCop.
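Solutions 2 and 3 both start by pulling the URLs out of a comment and checking their domains against a shared list. Below is a rough Python sketch of both, under several assumptions: `extract_domains`, `UrlPostCounter` and `is_listed` are hypothetical names; the counter is an in-memory `Counter` standing in for the kind of shared service imagined in item 2; and the SURBL check assumes the `multi.surbl.org` DNS zone, where a listed domain resolves to an address in `127.0.0.x` and an unlisted one does not resolve at all.

```python
import re
import socket
from collections import Counter
from typing import Callable
from urllib.parse import urlparse

# Rough pattern for http(s) URLs in comment text (an assumption, not exhaustive).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_domains(comment: str) -> list[str]:
    """Return the lowercased host of every http(s) URL in a comment, without 'www.'."""
    domains = []
    for url in URL_RE.findall(comment):
        host = urlparse(url).hostname or ""  # .hostname is already lowercased
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    return domains


class UrlPostCounter:
    """Solution 2: a hypothetical shared tally of how often each domain is posted."""

    def __init__(self, max_posts: int) -> None:
        self.max_posts = max_posts
        self.seen = Counter()

    def accept(self, comment: str) -> bool:
        """Record the comment's domains; reject it if any has exceeded max_posts."""
        domains = extract_domains(comment)
        self.seen.update(domains)
        return all(self.seen[d] <= self.max_posts for d in domains)


def surbl_query_name(domain: str, zone: str = "multi.surbl.org") -> str:
    """Build the DNS name to look up, e.g. spam.example -> spam.example.multi.surbl.org.

    Simplification: keeps only the last two labels, which is wrong for
    registries such as co.uk and for raw IP addresses.
    """
    base = ".".join(domain.split(".")[-2:])
    return f"{base}.{zone}"


def is_listed(domain: str, resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Solution 3: a listed domain resolves to 127.0.0.x; unlisted ones don't resolve."""
    try:
        answer = resolve(surbl_query_name(domain))
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False
    return answer.startswith("127.")
```

The `resolve` parameter exists so the lookup can be exercised without the network. By default it uses `socket.gethostbyname`, so calling `is_listed("example.com")` performs a real DNS query. Note how the counter illustrates the difficulty with solution 2: `max_posts` has to be picked by hand.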


I have had great luck with pLog's Bayesian spam filter. It has not miscategorized any valid comments. I think it helps that the filter also trains on the bodies of the posts themselves, which gives it a good corpus of ham (legitimate) content.
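pLog's filter internals aren't described here, so the following is not its implementation. It is a minimal multinomial naive Bayes sketch in Python showing the general idea: train on known spam comments, and feed post bodies in as ham. The class name, the tokenizer and the sample training text are all assumptions for illustration. Scores are computed as log probabilities with add-one (Laplace) smoothing so that words never seen in one class don't zero out its score.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word-like tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesCommentFilter:
    """Tiny two-class (spam vs. ham) naive Bayes filter; hypothetical, not pLog's."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label); needs both classes trained."""
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing over the combined vocabulary
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text: str) -> bool:
        return self.log_score(text, "spam") > self.log_score(text, "ham")


filt = NaiveBayesCommentFilter()
filt.train("cheap pills casino poker online", "spam")
filt.train("buy cheap viagra online now", "spam")
# Post bodies double as ham, as described above for pLog
filt.train("today I set up a bayesian filter for my blog comments", "ham")
filt.train("notes on configuring email and dns for the blog", "ham")
print(filt.is_spam("cheap online poker pills"))           # True
print(filt.is_spam("bayesian filter for blog comments"))  # False
```

With only a handful of training comments the result is fragile. Training on every post body is what gives a filter like this a large, blog-specific picture of what legitimate text looks like.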

1 comment:

  1. I'm not a big fan of the second solution. It's tough to choose how many comments you allow with the same URL.
    The first solution seems pretty good, and would probably solve my comment spam problems. Do you have any statistics about the percentage of comments that are miscategorized?

