Does Anything Really Disappear from the Internet?

I just posted about the Wayback Machine, and that got me wondering whether anything really disappears from the Internet when it is deleted. Certainly, a ton gets archived in the Wayback Machine, as well as in Google’s cache and in RSS readers. And of course, if something appears on the Internet, somebody could see it and copy it before it gets taken down.

But I was wondering to what extent information can vanish completely from the Internet. If a blogger posts something and deletes it a minute later, can it escape permanent fame? Some ill-fated performances might be brief enough to sneak on and off the Internet without being caught. What about a comment on a blog post that the blog’s author zaps quickly? Can it avoid becoming part of some permanent record?

The question, put another way: Can something posted briefly on the Internet, seen and heard by hardly anyone, not snatched up by anybody, and then deleted, be gone forever? Is there an Internet equivalent to a tree falling in the forest that nobody hears?

I don’t know the answer to this question, and I would like to hear from those with more technical expertise.

UPDATE: People with expertise have answered, and their replies are worth checking out if you’re interested in the issue.


7 Responses

  1. Sure it can vanish… if you didn’t send it anywhere (e.g. via RSS) and if no one came to read it in the interim. For unpopular, unlinked sites that interim can be a very long time. For a site like this one… well, just hope you didn’t ping any sites to come and pick up updates, or have the bad luck to be visited by Googlebot or another robot (not to mention a person who kept a copy). As a practical matter, how long the ‘window of forgiveness’ lasts depends on your traffic… and luck. But sure, lots of old stuff is gone forever, and new stuff too can vanish un-archived, especially if you get to it quickly enough.

    [And if your server runs Unix, a deleted or changed file is much more thoroughly erased than on Windows.]

  2. Yes, you can stay out of the Wayback Machine on archive.org and out of most search engines: you put a robot exclusion on your website. You can also stay out of just the Wayback Machine while remaining in the search engines, etc. Search on “robot exclusion” and you will find the magic incantations.

    -brewster

  3. I’ve actually spent time trying to track down things that went away, especially deleted comments on blogs, etc.

    The archive cycle, especially a while back, was not an hourly or even a daily one.

  4. Paul Gowder says:

    It always surprises me that the engines seem to obey robots.txt, but they do. I don’t know why, unless they think it will insulate them from liability.

    I have successfully destroyed files from the early days of the internet, but today…

  5. John Armstrong says:

    I used to blog back in college before the word was invented. Had to hard-code the HTML myself. That site is mostly dust in the wind. I’ve only ever been able to find the splash page archived anywhere, imploring surfers to look on my works and despair.

    Could a site be up for more than a year and be scoured from the net anymore? I doubt it.

  6. Marie says:

    Deleting the URL from your blog program doesn’t necessarily delete it from the net.

    Tip: To “delete” content from places like Bloglines’ cache, replace the words on the page with new words, or a dot, or something else. But do it quickly, before you publish enough new posts to push that entry out of your feed (the number of posts set by lastn=”_” in your feed file). I hope that makes sense.

    Anyway, when Bloglines makes its next pass, it’ll cache the new words and the old words will be gone. In theory, this should work with Google’s bot, too.
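To make Brewster’s “robot exclusion” tip a bit more concrete: the exclusion lives in a plain-text robots.txt file at the root of a site, and crawlers that choose to honor it read that file before fetching pages. Compliance is voluntary, which is what Paul Gowder finds surprising. Below is a minimal sketch, using Python’s standard urllib.robotparser, of how a crawler that honors the file would interpret one. The user-agent name ia_archiver (one the Internet Archive has historically honored) and the example.com URL are assumptions for illustration only; archives have changed their robots.txt policies over time, so check their current documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one archiving crawler entirely
# (the user-agent "ia_archiver" is an assumption here) while leaving
# every other crawler free to fetch everything.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    url = "https://example.com/2006/03/some-post.html"  # hypothetical URL
    for agent in ("ia_archiver", "Googlebot"):
        # can_fetch answers: may this user-agent crawl this URL?
        print(agent, rules.can_fetch(agent, url))
```

Running it prints False for ia_archiver and True for Googlebot: the first block disallows the whole site for one crawler, and the empty Disallow line under User-agent: * leaves everything open to the rest. Nothing here stops a crawler that simply ignores the file.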