Does Anything Really Disappear from the Internet?

7 Responses

  1. Sure it can vanish… if you didn’t send it anywhere (e.g. via RSS), and if no one came to read it in the interim. For unpopular, unlinked sites that can be a very long time. For a site like this one…well, just hope you didn’t ping any sites to come and do updates, or have the bad luck to be visited by Googlebot or another robot (not to mention a person who kept a copy). As a practical matter, how long the ‘window of forgiveness’ may be depends on your traffic…and luck. But sure, lots of old stuff is gone forever, and new stuff too can vanish un-archived, especially if you get to it quickly enough.

    [And if your server is on Unix, when a file is deleted/changed it is much more erased than on Windows.]

  2. Yes, you can stay out of the Wayback Machine and out of most search engines: you put a robot exclusion on your website. You can also stay out of just the Wayback Machine while remaining in the search engines, etc. Search on “robot exclusion” and you will find the magic incantations.


  3. I’ve actually spent time trying to run down things that went away, especially comments on blogs that have been deleted, etc.

    The archive cycle, especially a while back, was not an hourly or even a daily one.

  4. Paul Gowder says:

    It always surprises me that the engines seem to obey robots.txt, but they do. I don’t know why, unless they think it will insulate them from liability.

    I have successfully destroyed files from the early days of the internet, but today…

  5. John Armstrong says:

    I used to blog back in college before the word was invented. Had to hard-code the HTML myself. That site is mostly dust in the wind. I’ve only ever been able to find the splash page archived anywhere, imploring surfers to look on my works and despair.

    Could a site be up for more than a year and be scoured from the net anymore? I doubt it.

  6. Marie says:

    Deleting the URL from your blog program doesn’t necessarily delete it from the net.

    Tip: To “delete” content from places like Bloglines’ cache, replace the words on the page with new words, or a dot, or something else. But, do it quickly – before you post the number of posts (lastn=”_”) specified in your feed file. I hope that makes sense.

    Anyway, when Bloglines makes its next pass, it’ll cache the new words and the old words will be gone. In theory, this should work with Google’s bot, too.
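
The “robot exclusion” mentioned in comment 2 is a plain-text robots.txt file at the root of a site. As a rough sketch, the Python below checks a sample rule set using the standard library's urllib.robotparser. The rules, the example.com URL, and the helper name `allowed` are illustrative assumptions, as is the use of `ia_archiver` as the user-agent token for the Wayback Machine's crawler. The file only asks crawlers to stay away; it does not force them to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the archive crawler out entirely,
# but let every other robot (search engines included) fetch anything.
# "ia_archiver" is assumed here to be the archive crawler's token.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/2007/01/some-post.html"  # placeholder URL
    print("archive crawler allowed:", allowed("ia_archiver", page))
    print("search crawler allowed:", allowed("Googlebot", page))
```

To stay out of both the archive and the search engines, the second rule would read `Disallow: /` as well.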