Forever is a long time
People say things live on the internet forever. With Twitter limiting access to old tweets and Google apparently becoming increasingly forgetful as it ages, that may not quite be the case.
Twitter tweets expiration
The founding story of Twitter holds that the first tweet was made by Jack Dorsey and read “Just setting up my twttr”. But what was his second tweet? Or his third? What was his first @ message? Today, it’s impossible to answer any of those questions because neither the Twitter search engine nor scrolling through someone’s complete list of tweets will give you all the results.
The Twitter search engine apparently expires content after a few days. A user’s own timeline only reaches back 3,200 tweets, so anything older becomes inaccessible; that works out to a little over three days if you are tweeting at the top rate allowed on the service (users of Twitter are allowed a maximum of 1,000 tweets per day, which may explain why there have been so few uses of Twitter as a fully interactive type of service).
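The arithmetic behind that window is simple enough to sketch. The snippet below is only an illustration: the constant and function names are made up for this example, and it assumes the 1,000-tweet cap is a per-day limit and that tweets are posted at a steady rate.

```python
# Assumed figures taken from the paragraph above:
# a visible history of 3,200 tweets and a cap of 1,000 tweets per day.
HISTORY_LIMIT = 3200
DAILY_CAP = 1000


def days_until_expiry(tweets_per_day, history_limit=HISTORY_LIMIT):
    """Days before a tweet scrolls out of the visible history,
    assuming a constant posting rate (a simplification)."""
    if tweets_per_day <= 0:
        raise ValueError("tweets_per_day must be positive")
    return history_limit / tweets_per_day


print(days_until_expiry(DAILY_CAP))  # 3.2 days at the maximum rate
print(days_until_expiry(10))         # 320.0 days at ten tweets a day
```

Even a fairly casual user posting ten times a day would, under these assumptions, lose sight of their own tweets in less than a year.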
With Twitter now claiming an important role in events like the 2009 Iranian uprising or the 2011 events in the rest of the Middle East, expiring tweets seems like a bad idea because it deletes an important historical record. At the current time, Facebook claims that developers can access “all of a user’s status”, which might imply that its retention policy is stronger than Twitter’s.
Fun with Google searches
But social media may be the exception and not the rule, so I decided to start looking at web pages, which have been around for almost two decades now. Searching the internet of the past is an interesting exercise. For example, let’s look at the tech industry:
The Netscape IPO, seen as the first big internet IPO, happened on August 9, 1995. Doing a search covering the week before and the week after returns 7 results.
Microsoft’s introduction of Internet Explorer was in August 1995, with a second big announcement in December of that year. A search for “Microsoft introduces internet explorer” in 1995 returns 40 results.
Some may claim that I am being unfair, picking events that happened before Google’s creation. So I decided to look at events after 1999, at a time that would be contemporary with Google’s existence.
For example, the presidential election of 2000 was one of the hottest political contests in American history. It pitted Al Gore (421 Google results between January 1, 2000 and January 1, 2001) against George Bush (418 results for the same time period) and left the country wondering who the winner was for several days. There wasn’t a 24-hour news channel or newspaper in the country that did not cover the events extensively. And yet, we are left with fewer than a thousand pages from the period.
Some of those pages in the Google index may not even be from that time period. For example, the last page of my search for “George Bush” in the time range of January 1, 2000 to January 1, 2001 returned a site called celebritytweet.com. Considering that Twitter wouldn’t exist for a few more years, I doubt that the site existed in 2000.
If politics is too narrow a topic, maybe something like the attacks on the World Trade Center would have more impact. A search for pages from the week it happened (I used a date range between September 10, 2001 and September 18, 2001) should surely return tons of pages. The result, according to Google, is 461 pages.
Let me repeat that figure: 461 pages of historical record for what is widely agreed to be one of the most important historical events of our lifetime.
For a quick comparison, I decided to take a somewhat less important event from the past week. Sure, I could have gone for the raid on Bin Laden, but instead I decided to go for something a little more inconsequential: Lady Gaga’s deal with Zynga. A search limited to the last week returned 477 results.
So if Google is the arbiter of what’s important and the repository of most of our collective memory, a visitor from another planet looking at it could easily conclude that Lady Gaga cutting a deal with Zynga was more important than the attacks of 9/11. I’m not one to pass judgment on the cultural importance of Lady Gaga, but something tells me that either the Google algorithm is wrong here or the internet tends to be a very forgetful place.
As more and more media become digital, the concept of media retention is becoming increasingly important. It should be a growing concern for historians and archivists that large portions of the late 20th century and the beginning of the 21st century may be leaving behind a smaller footprint of data than previous eras. Efforts like the Google Book Search project are making great strides in making things like physical books more accessible by creating digital reproductions of that content, but they should also start considering ways to archive more recent, already digitized data. Otherwise, the lack of a past may make us more susceptible to creating a less perfect future.