TNL.net

Google has 24 billion items index, considers MSN search nearest competitor

September 27, 2005

From John Battelle’s site comes the news that Google has decided to drop the number of documents it listed on its front page. The company now claims its index is three times larger than its nearest competitor’s. Let’s look at that number.

Google vs. Yahoo

A few weeks ago, Yahoo! claimed that its index held over 20 billion items, broken down as follows:

just over 19.2 bil­lion web doc­u­ments, 1.6 bil­lion images, and over 50 mil­lion audio and video files

If we assume that Google believes its nearest competitor is Yahoo!, tripling that figure would put the Google index at roughly 60 billion items, which is probably on the high side. So we need to do more analysis to get closer to the truth.
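As a quick check of that arithmetic, here is a minimal sketch in Python. It assumes that "three times larger" simply means a factor of 3, and it treats Yahoo!'s "just over" and "over" figures as exact, so the result is a lower bound rather than a precise count. The variable names are my own.

```python
# Yahoo!'s published breakdown, in billions of items (figures from the quote above).
# The quote says "just over" and "over", so these are lower bounds.
yahoo_breakdown = {
    "web documents": 19.2,
    "images": 1.6,
    "audio and video files": 0.05,
}
yahoo_total = sum(yahoo_breakdown.values())

# Assumption: "three times larger" is read as a plain factor of 3.
GOOGLE_MULTIPLE = 3
google_if_yahoo = GOOGLE_MULTIPLE * yahoo_total

print(f"Yahoo! total: {yahoo_total:.2f} billion")
print(f"Implied Google index: {google_if_yahoo:.2f} billion")
```

Using the itemized breakdown gives a little over 62 billion, while the rounded 20 billion headline figure gives the roughly 60 billion mentioned above.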

Google vs. Google

As part of Google’s seventh birthday celebration, Google staffers posted an entry on the official Google blog claiming that their index is now 1,000 times the size of their original index. If that’s the case, figuring out the original index size should give us a good number. Fortunately, I have a copy of John Battelle’s excellent book about the company, The Search, which is a must-read for anyone interested in the search space. (No other book has gotten as deeply into the history of internet search, and few have analyzed potential futures for Google more keenly.) In the book, Battelle relays an email from Larry Page to Terry Winograd dated July 15, 1996. For context, Google started in March of 1996, so in July of that year it was all of four months old. The email concerns some of the growth issues the search engine was having and reads (emphasis is mine):

I am almost out of disk space.

I have downloaded about… 24 million unique URLs and about 100 million links… I think I will need 8 gigs more to store everything… Current retail prices are about $1000/4 gigs… I have only about 15% of the pages but it seems promising

If we take that number as a starting point, the original index was around 24 million pages. From there, it is easy to multiply by the 1,000 factor they mention in their blog and get the number of items in the Google index.

That number would be 24 billion items in the Google index, a little more than what Yahoo! has in its index.
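The calculation is short enough to write out. This sketch assumes, as the analysis above does, that the 24 million URLs in Page's email can stand in for the "original index", and it uses Yahoo!'s rounded 20 billion claim for the comparison. The constant names are illustrative.

```python
# Assumption: the 24 million unique URLs from the July 1996 email
# are treated as the size of Google's "original index".
ORIGINAL_INDEX_URLS = 24_000_000

# Growth factor claimed on the official Google blog.
GROWTH_FACTOR = 1_000

google_estimate = ORIGINAL_INDEX_URLS * GROWTH_FACTOR

# Yahoo!'s rounded claim of "over 20 billion" items.
YAHOO_CLAIM = 20_000_000_000
ratio_to_yahoo = google_estimate / YAHOO_CLAIM

print(f"Estimated Google index: {google_estimate:,} items")
print(f"Ratio to Yahoo!'s claim: {ratio_to_yahoo:.2f}x")
```

On these assumptions Google comes out about 1.2 times the size of Yahoo!'s claimed index, well short of the threefold gap Google describes.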

Google vs. MSN

In November 2004, MSN was estimated to have about 5 billion pages. Ken Moss, the General Manager of MSN Search, claimed that they added a lot to their index. While he is not forthcoming with detailed figures in his post, we can still assume that the MSN Search index is now larger than 5 billion pages.

This is inter­est­ing in itself in that it may actu­ally help us tri­an­gu­late to the right size for the Google index. If we try dif­fer­ent growth curves against the MSN search, we could look at the following:

If we take Google’s assess­ment that it is three times larger than its near­est com­peti­tor and assume that Google is con­sid­er­ing MSN search to be its near­est com­peti­tor, those growth curves trans­late as follows:

When one looks at those results, a pattern emerges. Recall the rough figure of 24 billion from the Google vs. Google analysis above. On the 50% MSN growth curve, Google is at 22.5 billion items indexed. On the 75% MSN growth curve, Google is at 26.5 billion items indexed. It could then be that Google considers MSN Search, and not Yahoo!, to be its nearest competitor, as the 24 billion mark falls right in between.
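The growth scenarios can be sketched as follows. This assumes a 5 billion page baseline for MSN, simple (non-compounded) growth since November 2004, and a plain factor of 3 for Google's claim. Only the 50% and 75% rates come from the discussion above; the other rates in the loop are hypothetical fillers, and the function names are mine. Note that straight multiplication gives 26.25 billion for the 75% case, slightly under the rounded 26.5 billion quoted above.

```python
MSN_BASELINE_BILLIONS = 5.0        # November 2004 estimate
GOOGLE_MULTIPLE = 3                # "three times larger than its nearest competitor"
GOOGLE_VS_GOOGLE_ESTIMATE = 24.0   # billions, from the 1,000x calculation


def implied_google_index(msn_growth: float) -> float:
    """Google index size (billions) if MSN grew by `msn_growth` (0.5 = 50%)."""
    return GOOGLE_MULTIPLE * MSN_BASELINE_BILLIONS * (1 + msn_growth)


def msn_growth_needed(google_billions: float) -> float:
    """MSN growth rate that would make Google exactly three times MSN's size."""
    return google_billions / (GOOGLE_MULTIPLE * MSN_BASELINE_BILLIONS) - 1


# 0.50 and 0.75 are the rates discussed in the post; 0.25 and 1.00 are hypothetical.
for growth in (0.25, 0.50, 0.75, 1.00):
    msn_size = MSN_BASELINE_BILLIONS * (1 + growth)
    google_size = implied_google_index(growth)
    print(f"MSN growth {growth:4.0%}: MSN {msn_size:5.2f}B -> Google {google_size:5.2f}B")

needed = msn_growth_needed(GOOGLE_VS_GOOGLE_ESTIMATE)
print(f"MSN growth matching a 24B Google index: {needed:.0%}")
```

Solving for the growth rate directly puts MSN at about 60% growth, or roughly 8 billion pages, if Google's index is 24 billion items.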

Con­clu­sion

While index size is largely a game of public relations, it appears that the Google index sits somewhere between 22.5 and 26.5 billion items and, more probably than not, near the 24 billion mark. This gives it a slight edge over the Yahoo! index and suggests that the company considers Microsoft its nearest competitor. Of course, this is my own speculation, so your mileage may vary.

2 Comments

  1. VirgoBrain’s Vision of the World » Blog Archive » Speculating about the Semantic Web: Where is the Data? - October 19, 2006 at 7:11 pm

    […] Ref­er­ences [15] Gulli, A. and Sig­norini, A. 2005, “The index­able web is more than 11.5 bil­lion pages”, In Spe­cial inter­est Tracks and Posters of the 14th inter­na­tional Con­fer­ence on World Wide Web (Chiba, Japan, May 10 — 14, 2005). WWW ‘05. ACM Press, New York, NY, 902–903. DOI= http://doi.acm.org/10.1145/1062745.1062789 [20] T. Berners-Lee, J. Hendler, and O. Las­sila, “The Seman­tic Web”, Sci­en­tific Amer­i­can, vol. 284, no. 5, 2001, pp. 34—43 Avail­able at: http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21 [21] Tris­tan Louis Blog, “Google has 24 bil­lion items index, con­sid­ers MSN search near­est com­peti­tor”, Sep­tem­ber 27, 2005 Avail­able at http://tnl.net/blog/2005/09/27/google-has-24-billion-items-index-considers-msn-search-nearest-competitor/ […]

  2. VirgoBrain's Vision of the World - October 19, 2006 at 10:55 pm

    [21] Tris­tan Louis Blog, “Google has 24 bil­lion items index, con­sid­ers MSN search near­est com­peti­tor”, Sep­tem­ber 27, 2005 Avail­able at http://tnl.net/blog/2005/09/27/google-has-24-billion-items-index-considers-msn-search-nearest-competitor/
