
tnl.net

Google has a 24-billion-item index, considers MSN Search its nearest competitor

From John Battelle’s site comes the news that Google has decided to drop the number of documents it listed on its front page. The company now claims its index is three times larger than that of its nearest competitor. Let’s look at the numbers.

Google vs. Yahoo

A few weeks ago, Yahoo! claimed that its index held over 20 billion items, broken down as follows:

just over 19.2 billion web documents, 1.6 billion images, and over 50 million audio and video files

If we assume that Google considers Yahoo! its nearest competitor, this would put the Google index at roughly 60 billion items, a very large number that is probably on the high side. So we need to do more analysis to get closer to the truth.
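The arithmetic behind the 60 billion figure can be sketched as below: add up Yahoo!'s published breakdown, then apply Google's three-to-one claim. This is a minimal illustration with made-up variable names; it treats "just over" and "over" as exact values, so the totals are lower bounds.

```python
# Yahoo!'s claimed index breakdown, in items.
# "Just over" / "over" are treated as exact, so these are lower bounds.
yahoo_web_documents = 19_200_000_000
yahoo_images = 1_600_000_000
yahoo_audio_video = 50_000_000

yahoo_total = yahoo_web_documents + yahoo_images + yahoo_audio_video

# Google claims an index three times larger than its nearest competitor's.
GOOGLE_MULTIPLIER = 3
google_if_yahoo_is_nearest = GOOGLE_MULTIPLIER * yahoo_total

print(f"Yahoo! total: {yahoo_total / 1e9:.2f} billion items")
print(f"Google (3x):  {google_if_yahoo_is_nearest / 1e9:.2f} billion items")
```

This prints about 20.85 billion items for Yahoo! and 62.55 billion for Google, in line with the "roughly 60 billion" estimate.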

Google vs. Google

As part of Google’s seventh birthday celebration, Google staffers posted an entry on the official Google blog claiming that their index is now 1,000 times the size of their original index. If that’s the case, figuring out the size of the original index should give us a good number. Fortunately, I have a copy of John Battelle’s excellent book about the company, The Search, which is a must-read for anyone interested in the search space. No other book has gone as deeply into the history of internet search, and few have analyzed Google’s potential futures more keenly. In the book, Battelle relays an email from Larry Page to Terry Winograd dated July 15, 1996. For context, Google started in March 1996, so in July of that year it was all of four months old. The email concerns some of the growth issues the search engine was having and reads (emphasis is mine):

I am almost out of disk space.

I have downloaded about… 24 million unique URLs and about 100 million links… I think I will need 8 gigs more to store everything… Current retail prices are about $1000/4 gigs… I have only about 15% of the pages but it seems promising

If we take that number as a starting point, the original index held around 24 million pages. From there, it is easy to multiply by the 1,000 factor mentioned in the blog post to estimate the number of items in the Google index.

That number would be 24 billion items in the Google index, a little more than what Yahoo! has in its index.
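The Google vs. Google estimate is a single multiplication; the sketch below also compares the result with Yahoo!'s claimed total. It assumes, as the post does, that the 24 million URLs in Larry Page's email correspond to the "original index" in Google's blog post. The constant names are illustrative.

```python
# Starting point: the July 1996 email reports about 24 million unique URLs.
ORIGINAL_INDEX_PAGES = 24_000_000

# Google's seventh-birthday blog post: the index is 1,000 times the original.
GROWTH_FACTOR = 1_000

google_estimate = ORIGINAL_INDEX_PAGES * GROWTH_FACTOR

# Yahoo!'s claimed total (19.2 billion + 1.6 billion + 50 million), for comparison.
yahoo_total = 19_200_000_000 + 1_600_000_000 + 50_000_000

print(f"Google estimate: {google_estimate / 1e9:.1f} billion items")
print(f"Ratio to Yahoo!: {google_estimate / yahoo_total:.2f}")
```

The ratio comes out at about 1.15, well short of the claimed three-to-one margin, which is why Yahoo! looks like an unlikely candidate for the "nearest competitor" in Google's claim.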

Google vs. MSN

In November 2004, MSN was estimated to have about 5 billion pages. Ken Moss, the General Manager of MSN Search, claimed that they had added a lot to their index. While he’s not forthcoming with any detailed information in his post, we can still assume that the MSN Search index is now larger than 5 billion items.

This is interesting in itself, as it may help us triangulate the size of the Google index. If we apply different growth curves to the MSN Search index, we could look at the following:

If we take Google’s assessment that it is three times larger than its nearest competitor and assume that Google considers MSN Search its nearest competitor, those growth curves translate as follows:

When one looks at those results, a pattern emerges. Recall the rough estimate of 24 billion from the Google vs. Google analysis above. On the 50% MSN growth curve, Google is at 22.5 billion items indexed; on the 75% MSN growth curve, it is at 26.5 billion. It could then be that Google considers MSN Search, and not Yahoo!, its nearest competitor, as the 24 billion mark falls right in between.
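The growth-curve reasoning can be sketched as follows, assuming the 5 billion November 2004 estimate as the MSN baseline and applying Google's three-to-one claim. Only the 50% and 75% curves mentioned in the text are computed; the constant names are illustrative. Note that 5 × 1.75 × 3 works out to 26.25 billion, so the 26.5 billion figure above appears to be a rounded value.

```python
MSN_BASELINE = 5_000_000_000  # MSN Search estimate, November 2004
GOOGLE_MULTIPLIER = 3         # Google's "three times larger" claim

# MSN growth rates since November 2004: the two curves discussed in the text.
growth_rates = [0.50, 0.75]

google_estimates = {}
for growth in growth_rates:
    msn_size = MSN_BASELINE * (1 + growth)
    google_estimates[growth] = GOOGLE_MULTIPLIER * msn_size
    print(f"MSN +{growth:.0%}: MSN {msn_size / 1e9:.2f}B, "
          f"Google {google_estimates[growth] / 1e9:.2f}B")

# Working backwards: the MSN growth implied by a 24-billion-item Google index.
GOOGLE_FROM_BLOG = 24_000_000_000
implied_growth = GOOGLE_FROM_BLOG / GOOGLE_MULTIPLIER / MSN_BASELINE - 1
print(f"Implied MSN growth: {implied_growth:.0%}")
```

Working backwards, a 24-billion-item Google index implies MSN grew by about 60% since November 2004, which sits between the two curves.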

Conclusion

While index size is largely a game of public relations, it appears that the Google index sits somewhere between 22.5 and 26.5 billion items and, more likely than not, at around the 24 billion mark. This gives it a slight edge over the Yahoo! index and suggests that the company considers Microsoft its nearest competitor. Of course, this is my own speculation, so your mileage may vary.

Originally published on September 27, 2005 in Business, Technology.
