Sunday, August 7, 2005

duckdriver referrer


I noticed some strange referrers in my HTTP logs, with entries like:


80.175.64.69 - - [08/Aug/2005:01:30:59 -0700] "GET /imgs/rss20_logo.gif HTTP/1.1" 200 989 "http://lisa.duckdriver.com:3420/cache.px?id=22823" "Mozilla/5.0 (compatible; Konqueror/3.1; Linux)"
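For anyone who wants to pull the referrer out of entries like this one, here is a minimal sketch. It assumes the log uses the common Apache "combined" layout, which is what the entry above looks like; the pattern and the `parse_entry` helper are illustrative names, not something from my server setup.

```python
import re

# Assumed layout: Apache-style "combined" log format.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# The entry quoted above, split across string literals for readability.
entry = parse_entry(
    '80.175.64.69 - - [08/Aug/2005:01:30:59 -0700] '
    '"GET /imgs/rss20_logo.gif HTTP/1.1" 200 989 '
    '"http://lisa.duckdriver.com:3420/cache.px?id=22823" '
    '"Mozilla/5.0 (compatible; Konqueror/3.1; Linux)"'
)
print(entry["host"], entry["referrer"])
```

Grouping the parsed entries by referrer host is then an easy way to spot unfamiliar sources.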


There appears to be a web site called DuckDriver, which describes itself as a Personal Internet Manager, and it looks like it is caching my site.  That is not the bad part, though: it also appears to be stripping part of my HTML.  One of the things it strips is my Google AdSense JavaScript.


I have blocked access from clients with this referrer, and from the host that is doing the caching itself (80.175.64.72).
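The blocking rule itself is simple. As a rough sketch of the logic only (not the actual server configuration I used), the check below refuses a request if it comes from the caching host or carries a referrer on the duckdriver.com domain. The function name `should_block` and the constants are hypothetical.

```python
from urllib.parse import urlsplit

BLOCKED_REFERRER_DOMAIN = "duckdriver.com"  # domain seen in the logged referrer
BLOCKED_HOSTS = {"80.175.64.72"}            # the host doing the caching


def should_block(remote_addr, referrer):
    """Return True if the request matches either blocking rule."""
    if remote_addr in BLOCKED_HOSTS:
        return True
    # hostname is None for an empty or "-" referrer, so fall back to "".
    host = urlsplit(referrer).hostname or ""
    return (host == BLOCKED_REFERRER_DOMAIN
            or host.endswith("." + BLOCKED_REFERRER_DOMAIN))


print(should_block("80.175.64.69",
                   "http://lisa.duckdriver.com:3420/cache.px?id=22823"))
```

Matching on the parsed hostname rather than a substring of the whole referrer avoids blocking unrelated sites that merely mention the name in a path or query string.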


Update:  It looks like the site that held a cache of my content is no longer accessible from the outside, and requests are redirected to this page.  I found something interesting on that page:


Because the bots do not 'spider' your website (ie. they do not recursively grab pages) they do not check robots.txt before scanning. The ability to regularly scan the submitted page is essential for the upkeep of the Blogwise database and we require this action to avoid delisting blogs unnecessarily.



So if I wanted to keep their bot from scanning my site, I wouldn't be able to use the standard robots.txt mechanism.  This seems wrong.
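For comparison, this is roughly what the check looks like for a bot that does honor robots.txt, sketched with Python's standard `urllib.robotparser`. The user agent names `ExampleBot` and `OtherBot`, the rules, and the example.com URL are made up for illustration; I don't know what user agent the Blogwise bot actually sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out one bot entirely.
rules = RobotFileParser()
rules.parse([
    "User-agent: ExampleBot",
    "Disallow: /",
])

# A polite bot asks before every fetch, even for a single page.
print(rules.can_fetch("ExampleBot", "http://example.com/index.html"))
print(rules.can_fetch("OtherBot", "http://example.com/index.html"))
```

The check costs one extra request for robots.txt, so fetching only a single page per site is not much of a reason to skip it.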


Update #2:  Sven from Blogwise added a comment explaining that this is the Blogwise cache, which is not intended to be public.  I will be removing the blocks that I added earlier.

