Imagine for a second that the Google index is a big sheet of glass. Now, imagine GoogleBot is a big, fat, black marker. How do you go about removing traces of ink? In other words, how do you get GoogleBot to de-index .Mac pages?
Once upon a time, FJZone.org was a .Mac hosted site. It was as uninteresting as it is today, a lot more purple and, dare I say it, it used tables for layout. A good year and a half ago, I moved it from .Mac to my first real host, a company with which I am still on great terms today.
Over the years, the Zone has been moved and removed more than is reasonable. Now, I always strive to use proper redirection in my htaccess files so as not to break anything but, as always, a couple of links are no longer valid. Whenever such a situation arises, I ask Google to remove the page in question from their index. Not an ideal solution by any means, but it prevents disappointments and helps reduce the number of incoming visitors requesting that particular link. Of course, GoogleBot is supposed to be auto-updating, but that has never proved true in my experience.
For links that used to live at my host’s, the solution is simple: remove the page, add a mod_rewrite declaration in the appropriate htaccess file so that requesting it returns a “Gone” error code and fill in a form on Google’s site.
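To make the "Gone" part concrete, here is a minimal sketch of the same idea in Python rather than mod_rewrite. It is not the htaccess rule used on the Zone, only a stand-in: the paths in REMOVED_PATHS and the names status_for and GoneAwareHandler are made up for illustration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical list of retired pages; these paths are not taken from the real site.
REMOVED_PATHS = {"/old/resume.html", "/old/articles.html"}


def status_for(path: str) -> HTTPStatus:
    """Return 410 Gone for retired pages and 200 OK for everything else."""
    clean = path.split("?", 1)[0]  # ignore any query string
    return HTTPStatus.GONE if clean in REMOVED_PATHS else HTTPStatus.OK


class GoneAwareHandler(BaseHTTPRequestHandler):
    """Toy handler that answers every GET with the status chosen above."""

    def do_GET(self):
        status = status_for(self.path)
        body = f"{status.value} {status.phrase}\n".encode("utf-8")
        self.send_response(status.value)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, one could pass GoneAwareHandler to http.server.HTTPServer bound to 127.0.0.1 and request one of the retired paths. The point is simply that the status line, not the page a human sees, is what tells a robot the page has been removed on purpose.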
Pages created on .Mac however are the source of a never-ending headache. Indeed, whenever one requests a page on an account that no longer exists (such as the former .Mac FJZone account), the Apple servers dutifully serve a tri-lingual error page… all the while returning a “200 Found” code. In other words, as far as robots are concerned, .Mac pages live forever.
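For anyone who wants to check what a robot actually sees, here is a rough sketch, assuming Python 3 and only the standard library. The function names fetch_status and robot_sees_removed are mine, and the choice to treat only 404 and 410 as "removed" is a simplifying assumption, not a description of how GoogleBot really decides.

```python
import http.client
from typing import Tuple
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (status code, reason phrase) for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for headers only; some servers answer it differently from GET.
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


def robot_sees_removed(status: int) -> bool:
    """Assumption: only 404 (Not Found) and 410 (Gone) signal that a page is gone."""
    return status in (404, 410)
```

Calling fetch_status on a dead address and passing the code to robot_sees_removed shows the problem: an error page served with any status other than 404 or 410, whether a success code or a redirect, looks like a live page to a crawler.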
I’ve spent the past few weeks pondering the topic and my conclusion so far is that the ball is in .Mac’s court. Does anyone know of a better solution?


Can you upload replacement pages identical to Apple's 404 pages, but with meta tags that say "no-index, no-follow" and "redirect"?
(To be clear, the replacement pages are replacements for your old pages. The index.htmls and whatnot. It will consist of a lot of redundant files, but it will get the job done, hopefully.)
Carl,
Thank you for taking the time to post.
That would, indeed, probably work. In the current situation, however, it would require re-opening the FJZone .Mac account (which, admittedly, is something I could do). That isn't overly practical, though, especially considering what it would take to solve the problem globally...
FJ
The pages may indeed be removed from Google's index, but if someone out there in the world has a link to your old page, Google finds it again, and re-indexes it. Try searching for the URL of the "bad" page with the link: parameter to see what pages are linking to the web page. Then you can ask if those web sites would change or remove the link.
Yes, Apple/.Mac is handling the situation poorly. They should be returning 404s. You should not have to pay for hosting at two places forever because of Apple's poor administration decision.
Have you submitted feedback to Apple about it?
http://www.apple.com/feedback/mac/tm.html
Anon,
I have indeed, thanks for reminding me! :-)
FJ
Michael,
Hmm, I would think Google would attempt to follow the link from a third-party site and, upon failure, refrain from adding the URL to its index again. Am I wrong?
FJ
I just checked, because .Mac updated today, and it looks like they fixed that.
If you type http://web.mac.com/fred/dfkdfk you get a proper 404.
Is it still broken for you?
Fred,
Thanks for your kind message. I am afraid even the link you sent does not work properly from here. Indeed, the HTTP headers resulting from attempting to load that page do specify "Found". The page itself is indeed a "404 page" to a human but not to a robot.
Do you think I am missing something?
Thanks again,
FJ
I hope this doesn't sound rude, but why did you even begin to use .Mac for a professional website? .Mac is designed for beginning web users, people who just want to post pictures of their babies or put up a simple blog or something. For professional websites, there are far better (and cheaper!) solutions. Since you obviously know a lot about HTTP protocols and web design, I don't understand why you considered .Mac in the first place.
Trevor,
Thanks for your comments. When FJZone started, a couple years ago, it was merely a place for me to post a resume and links to my O'Reilly articles. A lot of thought went into it as I progressively started using it as an experimenting ground but, at the time, .Mac was a perfect fit for what it was supposed to be (and stay!).
FJ
Have you tried a "Disallow" tag in the robots.txt to try to keep the pages from being re-indexed?
KTC
Kevin,
Thank you for your comment. Unfortunately, .Mac does not allow for the creation of robots.txt files.
FJ