Imagine for a second that the Google index is a big sheet of glass. Now, imagine GoogleBot is a big, fat, black marker. How do you go about removing the traces of ink? In other words, how do you get GoogleBot to de-index .Mac pages?

Once upon a time, FJZone.org was a .Mac-hosted site. It was as uninteresting as it is today, a lot more purple and, dare I say it, laid out with tables. A good year and a half ago, I moved it from .Mac to my first real host, a company with which I am still on great terms today.

Over the years, the Zone has been moved and removed more often than is reasonable. Now, I always strive to use proper redirection in my .htaccess files so as not to break anything but, inevitably, a couple of links end up no longer valid. Whenever that happens, I ask Google to remove the page in question from its index. It is not an ideal solution by any means, but it prevents disappointment and reduces the number of visitors requesting that particular link. Of course, GoogleBot is supposed to update its index on its own, but in my experience that has never proved true.

For links that used to live at my host’s, the solution is simple: remove the page, add a mod_rewrite rule to the appropriate .htaccess file so that requesting the page returns a “410 Gone” status code, and fill in the removal request form on Google’s site.
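On Apache, such a rule can be as short as `RewriteEngine On` followed by `RewriteRule ^old-page\.html$ - [G]`, where the `G` flag makes the server answer with 410 Gone (`old-page.html` is a placeholder name). Before filing the removal request, it helps to confirm what a robot actually receives. The Python sketch below, using only the standard library, runs a tiny local server that mimics that rule for a hypothetical `RETIRED_PATHS` set, plus a hypothetical `http_status()` helper that reports the raw status code without following redirects. It is an illustration, not a production checker.

```python
# Minimal sketch: a local stand-in for the .htaccess "Gone" rule, plus a
# status checker. RETIRED_PATHS, start_demo_server() and http_status() are
# hypothetical names used for illustration only.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

# Hypothetical pages that were removed on purpose and should answer 410.
RETIRED_PATHS = {"/old-page.html"}


class GoneHandler(BaseHTTPRequestHandler):
    """Mimic the rewrite rule: 410 for retired paths, 200 for the rest."""

    def do_GET(self):
        if self.path in RETIRED_PATHS:
            status, body = 410, b"Gone"
        else:
            status, body = 200, b"Hello"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def http_status(url):
    """Return the status code a robot would see for url (no redirects followed)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=5)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def start_demo_server():
    """Run GoneHandler on a free local port; return (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), GoneHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}"


if __name__ == "__main__":
    server, base = start_demo_server()
    try:
        print(http_status(base + "/old-page.html"))  # 410
        print(http_status(base + "/index.html"))     # 200
    finally:
        server.shutdown()
        server.server_close()
```

Pointing `http_status()` at a real removed URL instead of the demo server shows whether the live rule is doing its job; http.client is used rather than urllib so that a 4xx answer comes back as a plain number instead of an exception.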

Pages created on .Mac, however, are a never-ending headache. Whenever one requests a page on an account that no longer exists (such as the former .Mac FJZone account), the Apple servers dutifully serve a trilingual error page… all the while returning a “200 OK” status code. In other words, as far as robots are concerned, .Mac pages live forever.
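What matters here is the status line, not the page body: a robot that goes purely by the status code sees a 200 and keeps the URL. The Python sketch below illustrates that asymmetry with a so-called “soft 404” check, a 200 response whose body reads like an error page. The `ERROR_MARKERS` phrases are invented placeholders, not the actual wording of Apple’s error page, and the keyword test is a deliberately crude heuristic.

```python
# Sketch of a "soft 404" check. ERROR_MARKERS holds invented phrases, not
# the actual wording of Apple's error page; the keyword test is a crude
# heuristic chosen for illustration.

ERROR_MARKERS = ("page not found", "no longer exists")  # hypothetical


def robot_sees_live(status):
    """A robot going purely by the status line treats any 2xx as 'still here'."""
    return 200 <= status < 300


def is_soft_404(status, body, markers=ERROR_MARKERS):
    """True when the server claims success but the body reads like an error page."""
    text = body.lower()
    return robot_sees_live(status) and any(marker in text for marker in markers)


if __name__ == "__main__":
    # A response shaped like the one described above (body text invented).
    mac_status, mac_body = 200, "<h1>Page Not Found</h1>"
    print(robot_sees_live(mac_status))        # True: stays in the index
    print(is_soft_404(mac_status, mac_body))  # True: an error page in disguise
    print(is_soft_404(410, "Gone"))           # False: an honest removal
```

The body check only exists because the status line is wrong; a server answering 404 or 410 for the deleted account would need no heuristics at all.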

I’ve spent the past few weeks pondering the topic, and my conclusion so far is that the ball is in .Mac’s court. Does anyone know of a better solution?