I noticed some strange referrers in my HTTP logs. I saw some entries like:
18.104.22.168 - - [08/Aug/2005:01:30:59 -0700] "GET /imgs/rss20_logo.gif HTTP/1.1" 200 989 "http://lisa.duckdriver.com:3420/cache.px?id=22823" "Mozilla/5.0 (compatible; Konqueror/3.1; Linux)"
I have blocked access from clients with this referrer, and from the host that is doing the caching itself (22.214.171.124).
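For anyone who wants to find similar entries in their own logs, here is a minimal Python sketch. It assumes the log uses the common "combined" format shown above. The names `LOG_PATTERN`, `BLOCKED_REFERRER_HOSTS`, `referrer_host` and `is_blocked` are made up for illustration, and this is not the mechanism I used for the actual block.

```python
import re
from urllib.parse import urlsplit

# Combined log format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumption: the referrer host to flag, taken from the log entry above.
BLOCKED_REFERRER_HOSTS = {"lisa.duckdriver.com"}


def referrer_host(line):
    """Return the host name of the referrer in a log line, or None."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # hostname drops the port, so ":3420" does not matter here.
    return urlsplit(match.group("referrer")).hostname


def is_blocked(line):
    """True if the request was referred from one of the flagged hosts."""
    return referrer_host(line) in BLOCKED_REFERRER_HOSTS
```

Running `is_blocked` over each line of the access log picks out the requests that came through the cache, whatever port it happens to be served on.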
Update: It looks like the site that had a cache of my content has been made inaccessible from the outside, and requests are now redirected to this page. I found something interesting on that page:
Because the bots do not 'spider' your website (ie. they do not recursively grab pages) they do not check robots.txt before scanning. The ability to regularly scan the submitted page is essential for the upkeep of the Blogwise database and we require this action to avoid delisting blogs unnecessarily.
So if I wanted to prevent the indexing from their spider, I wouldn't be able to use the standard robots.txt mechanism. This seems wrong.
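For comparison, here is roughly what a bot that does honor robots.txt would check before fetching a page. This is only a sketch using Python's standard `urllib.robotparser`. The user agent `ExampleBot` and the `example.com` URL are placeholders, since I don't know what user agent their bot actually sends.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Check whether robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules that shut out a single bot and nobody else.
RULES = """\
User-agent: ExampleBot
Disallow: /
"""

if __name__ == "__main__":
    page = "http://example.com/index.html"
    print(allowed_by_robots(RULES, "ExampleBot", page))
    print(allowed_by_robots(RULES, "OtherBot", page))
```

A bot that skips this check leaves the site owner with only server-side blocks, like the ones I added above.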
Update #2: Sven from Blogwise added a comment explaining that this is the Blogwise cache, which is not intended to be public. I will be removing the blocks that I added earlier.