tnl.net

The state of HTML validation

There’s been a lot of talk about HTML5 recently and, in some geek circles, there have been snickers when companies have done a poor job of implementing it. But what is the true state of HTML5? To find out, I decided to check whether the top sites on the internet had implemented it and how successful they were in doing so.

Methodology

One of the first things to do in this effort was to get a decent list of sites. Unfortunately, it seems that it has become increasingly difficult to get a sense of which sites are the most popular by number of visits. I eventually settled on Alexa’s Top Sites list because it features most of the sites people think of when considering what large sites are and includes a few non-US sites.

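I then ran the W3C Validator against each of the top 25 sites. This gave me three pieces of information for each site: the doctype it declares, the character encoding it uses, and the number of validation errors and warnings.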

Sur­pris­ingly, a num­ber of pop­u­lar Web 2.0 sites were not in Alexa’s Top 25 so I cre­ated a sep­a­rate list for them.
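For readers who want to reproduce the doctype and encoding columns without the validator, here is a minimal sketch in Python. It is not how the data in this post was collected: the function names are hypothetical, it only inspects the markup (a real validator also looks at the HTTP `Content-Type` header), and the regular expressions are a rough approximation rather than a full HTML parser.

```python
import re

# Hypothetical helpers; the labels mirror the ones used in the tables.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)
CHARSET_RE = re.compile(r"""<meta[^>]*charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)
PUBLIC_ID_RE = re.compile(r"-//W3C//DTD (X?HTML [^/]+)//", re.IGNORECASE)


def classify_doctype(html: str) -> str:
    """Map a page's DOCTYPE declaration to a short label such as 'HTML 5'."""
    match = DOCTYPE_RE.search(html)
    if match is None:
        return "none"
    declaration = " ".join(match.group(1).split())
    if declaration.lower() == "html":
        # The bare <!DOCTYPE html> form is the HTML5 doctype.
        return "HTML 5"
    # Older doctypes carry a public identifier,
    # e.g. "-//W3C//DTD XHTML 1.0 Strict//EN".
    public_id = PUBLIC_ID_RE.search(declaration)
    return public_id.group(1).strip() if public_id else "unknown"


def declared_charset(html: str) -> str:
    """Return the charset named in the first <meta> tag that declares one."""
    match = CHARSET_RE.search(html)
    return match.group(1).lower() if match else "undeclared"


if __name__ == "__main__":
    page = '<!DOCTYPE html><html><head><meta charset="UTF-8"></head></html>'
    print(classify_doctype(page), declared_charset(page))
```

Feeding it the first few kilobytes of a page is usually enough, since both declarations sit at the top of the document.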

Top 25

Look­ing at the top 25, here are the results:

| Name | Doctype | Encoding | Validation |
| --- | --- | --- | --- |
| Google | HTML 5 | iso-8859-1 | 37 errors, 3 warnings |
| Facebook | HTML 5 | utf-8 | 34 errors |
| YouTube | HTML 5 | utf-8 | 120 errors, 2 warnings |
| Yahoo! | HTML 5 | utf-8 | 144 errors, 8 warnings |
| Blogger | HTML 4.0 Strict | utf-8 | 34 errors, 45 warnings |
| Baidu | HTML 5 | gb2312 | 6 errors, 6 warnings |
| Wikipedia | HTML 5 | utf-8 | 5 errors, 1 warning |
| Windows Live | HTML 4.01 Transitional | utf-8 | 33 errors, 17 warnings |
| Twitter | HTML 5 | utf-8 | 5 errors, 1 warning |
| QQ.com | XHTML 1.0 Transitional | gb2312 | validator crashed |
| MSN | XHTML 1.0 Strict | utf-8 | Completely valid |
| Yahoo Japan | HTML 4.01 Transitional | utf-8 | 26 errors, 24 warnings |
| LinkedIn | HTML 5 | utf-8 | 12 errors, 1 warning |
| Google India | HTML 5 | iso-8859-1 | 40 errors, 2 warnings |
| Amazon | HTML 4.01 Transitional | iso-8859-1 | 516 errors, 125 warnings |
| Sina.com.cn | XHTML 1.0 Transitional | gb2312 | validator crashed |
| Taobao.com | HTML 5 | gb2312 | validator crashed |
| WordPress | XHTML 1.0 Transitional | utf-8 | 4 errors |
| Google HK | HTML 5 | Big5 | 40 errors, 1 warning |
| Google Germany | HTML 5 | iso-8859-1 | 37 errors, 3 warnings |
| Ebay | HTML 4.01 Transitional | utf-8 | 386 errors, 19 warnings |
| Yandex | HTML 4.01 Transitional | utf-8 | 52 errors, 12 warnings |
| Google UK | HTML 5 | iso-8859-1 | 37 errors, 3 warnings |
| Google Japan | HTML 5 | shift_jis | 39 errors, 1 warning |
| Bing | XHTML 1.0 Transitional | utf-8 | 16 errors |

Looking at the data, the first interesting thing is how many sites have made the switch to HTML 5: 14 of the top 25. This means that in the last year, 56 percent of the largest sites on the internet have modified their code base to adopt a new standard. Six sites are still on the older HTML 4 standard and five are sticking to the somewhat more recent XHTML standard.
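The 14 / 6 / 5 split can be checked by tallying the doctype column of the top 25 table. The snippet below is just that arithmetic written out in Python; the `family` helper and the three family labels are my own grouping, not something the validator reports.

```python
from collections import Counter

# Doctype column of the top 25 table, in table order.
DOCTYPES = [
    "HTML 5", "HTML 5", "HTML 5", "HTML 5", "HTML 4.0 Strict",
    "HTML 5", "HTML 5", "HTML 4.01 Transitional", "HTML 5",
    "XHTML 1.0 Transitional", "XHTML 1.0 Strict", "HTML 4.01 Transitional",
    "HTML 5", "HTML 5", "HTML 4.01 Transitional", "XHTML 1.0 Transitional",
    "HTML 5", "XHTML 1.0 Transitional", "HTML 5", "HTML 5",
    "HTML 4.01 Transitional", "HTML 4.01 Transitional", "HTML 5", "HTML 5",
    "XHTML 1.0 Transitional",
]


def family(doctype: str) -> str:
    """Collapse a specific doctype into one of three families."""
    if doctype == "HTML 5":
        return "HTML 5"
    if doctype.startswith("XHTML"):
        return "XHTML"
    return "HTML 4"


counts = Counter(family(d) for d in DOCTYPES)
shares = {name: round(100 * n / len(DOCTYPES)) for name, n in counts.items()}

for name, n in counts.most_common():
    print(f"{name}: {n} sites ({shares[name]}%)")
```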

However, it is also interesting to note that none of the sites that have made the transition actually validate as HTML 5. In fact, of the top 25 sites in the Alexa list, only MSN was found to provide completely valid code. Maybe Microsoft could point the MSN team towards its other properties. Amazon was the worst offender, with 516 errors in its code, which suggests that disregard for standards compliance does not hurt economic performance. However, Ebay and Yahoo came close behind with hundreds of errors in their code, which may mark Amazon as an exception.
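To rank sites by error count, the free-text validation column has to be turned into numbers first. Here is a small sketch of one way to do that, using a handful of rows from the table; `parse_result` is a hypothetical helper, and treating "validator crashed" as "no result" rather than zero errors is my assumption.

```python
import re

# A few rows of the validation column, copied from the top 25 table.
RESULTS = {
    "Amazon": "516 errors, 125 warnings",
    "Ebay": "386 errors, 19 warnings",
    "Yahoo!": "144 errors, 8 warnings",
    "YouTube": "120 errors, 2 warnings",
    "WordPress": "4 errors",
    "MSN": "Completely valid",
    "QQ.com": "validator crashed",
}


def parse_result(text):
    """Return (errors, warnings), or None when the validator gave no result."""
    if "crashed" in text.lower():
        return None
    errors = re.search(r"(\d+)\s+errors?", text)
    warnings = re.search(r"(\d+)\s+warnings?", text)
    # A missing count (e.g. "4 errors" or "Completely valid") means zero.
    return (
        int(errors.group(1)) if errors else 0,
        int(warnings.group(1)) if warnings else 0,
    )


parsed = {name: parse_result(text) for name, text in RESULTS.items()}
ranked = sorted(
    ((name, counts) for name, counts in parsed.items() if counts is not None),
    key=lambda item: item[1][0],
    reverse=True,
)

for name, (errors, warnings) in ranked:
    print(f"{name}: {errors} errors, {warnings} warnings")
```

Sites where the validator crashed are left out of the ranking instead of being counted as error-free.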

Another interesting phenomenon is that most of the large sites have adopted UTF-8, the encoding that supports the most languages, as their default encoding. Once again, over half (56%) of the sites have switched, with Amazon and Google among the notable exceptions. An interesting aside here is that the W3C validator may have issues with Chinese sites: it crashed on QQ.com, Sina.com.cn and Taobao.com before it could finish the job.
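To make the "supports most languages" point concrete, the sketch below tries to encode a few sample words in each of the encodings that appear in the table. The sample words are my own choice and the `can_encode` helper is hypothetical; the encoding names are the ones Python's standard codecs accept.

```python
# Sample words chosen for illustration only.
SAMPLES = {
    "English": "validation",
    "German": "Prüfung",
    "Chinese": "验证",
    "Japanese": "検証",
}

# Encodings seen in the table, spelled the way Python's codecs expect.
ENCODINGS = ["utf-8", "iso-8859-1", "gb2312", "big5", "shift_jis"]


def can_encode(text: str, encoding: str) -> bool:
    """True when every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for encoding in ENCODINGS:
    supported = [lang for lang, word in SAMPLES.items() if can_encode(word, encoding)]
    print(f"{encoding}: {', '.join(supported)}")
```

UTF-8 handles every sample, while iso-8859-1 stops at Western European text, which is why a site that wants one encoding for every market ends up on UTF-8.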

Web 2.0 Companies

Look­ing at Web 2.0 com­pa­nies, the data was surprising:

| Name | Doctype | Encoding | Validation |
| --- | --- | --- | --- |
| Facebook | HTML 5 | utf-8 | 34 errors |
| YouTube | HTML 5 | utf-8 | 120 errors, 2 warnings |
| Blogger | HTML 4.0 Strict | utf-8 | 34 errors, 45 warnings |
| Twitter | HTML 5 | utf-8 | 5 errors, 1 warning |
| LinkedIn | HTML 5 | utf-8 | 12 errors, 1 warning |
| WordPress | XHTML 1.0 Transitional | utf-8 | 4 errors |
| Flickr | HTML 5 | utf-8 | 15 errors, 3 warnings |
| Tumblr | XHTML 1.0 Transitional | utf-8 | 19 errors |
| Foursquare | XHTML 1.0 Strict | utf-8 | 40 errors |
| Groupon | XHTML 1.0 Transitional | utf-8 | 6 errors |
| Zynga | XHTML 1.0 Transitional | utf-8 | 4 errors, 6 warnings |

I captured the same data for Web 2.0 companies outside the top 25 and a few interesting trends seem to pop up. The first surprise is that fewer of these sites have made the transition to HTML 5: only 5 sites out of 11 (or 45 percent). There still seems to be a strong preference for XHTML as the way to encode pages.
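The comparison between the two lists comes down to a couple of ratios. Here it is as a short Python sketch, with the Web 2.0 doctypes copied from the table above and the top 25 figure (14 of 25) taken from the earlier tally; the `share` helper is hypothetical.

```python
# Doctype column of the Web 2.0 table.
WEB_20 = {
    "Facebook": "HTML 5",
    "YouTube": "HTML 5",
    "Blogger": "HTML 4.0 Strict",
    "Twitter": "HTML 5",
    "LinkedIn": "HTML 5",
    "WordPress": "XHTML 1.0 Transitional",
    "Flickr": "HTML 5",
    "Tumblr": "XHTML 1.0 Transitional",
    "Foursquare": "XHTML 1.0 Strict",
    "Groupon": "XHTML 1.0 Transitional",
    "Zynga": "XHTML 1.0 Transitional",
}

# HTML 5 count and list size for the top 25 table.
TOP_25_HTML5, TOP_25_TOTAL = 14, 25


def share(part: int, whole: int) -> int:
    """Percentage of whole represented by part, rounded to a whole number."""
    return round(100 * part / whole)


web20_html5 = sum(1 for d in WEB_20.values() if d == "HTML 5")
web20_xhtml = sum(1 for d in WEB_20.values() if d.startswith("XHTML"))

print(f"Top 25 on HTML 5:  {share(TOP_25_HTML5, TOP_25_TOTAL)}%")
print(f"Web 2.0 on HTML 5: {share(web20_html5, len(WEB_20))}%")
print(f"Web 2.0 on XHTML:  {share(web20_xhtml, len(WEB_20))}%")
```

With a list this small, a single site switching doctype moves the share by about nine points, so the gap between the two lists should be read loosely.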

Also of note is that all of the sites seem to have plans for globalization, encoding their pages in UTF-8, which can support both Western and non-Western alphabets.

However, none of the sites successfully validates against its chosen standard. It looks like there is still much room for improvement in the world of HTML validation.

Originally published on August 21, 2011 in Technology.

  • Rose­marie Pritchard

    This is prob­a­bly just cheeky, but: this entry has 46 errors :P

    • http://www.tnl.net Tris­tan Louis

      It’s actually 15, after I realized that two WordPress plugins were creating masses of errors. Still have to figure out how to deal with the other 15 errors.

  • Anon

    There is so much more to HTML than validation. It’s ok to break a few rules now and again if you know why you’re breaking them. Validate, explain to yourself, have a nice cup of tea and stop worrying.

    much love

    • http://www.tnl.net Tris­tan Louis

      “Corn syrup tastes wonderfully sweet so don’t worry about the fact that it causes obesity” seems to fall under the same line of argument. Yes, HTML is awesome but it doesn’t mean it couldn’t get better. Can’t we all work together to upgrade the web to something that is as good as (or better than) it is today? Wouldn’t it be awesome if most web sites were as advanced as they are today AND also validated, allowing for both forward and backward compatibility?

      • Anon 2

        Your anal­ogy intro­duced an effect, when your orig­i­nal argu­ment had none. That is, you say that corn syrup causes obe­sity, but you never estab­lished that the val­i­da­tion errors on major web­sites are caus­ing any kind of real-world problem.

        The best exam­ple, prob­a­bly, is the Google home­page. The designers/developers/maintainers of that page are inti­mately aware of exactly which val­i­da­tion errors they are trig­ger­ing, and are keep­ing those errors because they allow for a major reduc­tion in band­width with­out los­ing any users.

        We aren’t all top 25 web devel­op­ers, though, which is why val­i­da­tion can still be a con­struc­tive goal for us, but also why this post strug­gles to be rel­e­vant to readers.

      • http://www.tnl.net Tris­tan Louis

        I guess you helped me refine the post. As you point out, it’s still a con­struc­tive goal, which is what we should all strive for. The way I look at it is that if even 1 devel­oper is look­ing to make sites more com­pli­ant as a result of this post, it’s been a success.