<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TNL.net &#187; XHTML</title>
	<atom:link href="http://www.tnl.net/blog/tag/xhtml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tnl.net/blog</link>
	<description>Turning Data into Knowledge</description>
	<lastBuildDate>Wed, 08 Feb 2012 20:15:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='www.tnl.net' port='80' path='/blog/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>The state of HTML validation</title>
		<link>http://www.tnl.net/blog/2011/08/21/the-state-of-html-validation/</link>
		<comments>http://www.tnl.net/blog/2011/08/21/the-state-of-html-validation/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 00:45:51 +0000</pubDate>
		<dc:creator>Tristan Louis</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[HTML 4.0]]></category>
		<category><![CDATA[HTML 5]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[Markup languages]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[UTF-8]]></category>
		<category><![CDATA[XHTML]]></category>
		<category><![CDATA[YouTube]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[eBay]]></category>
		<category><![CDATA[validation]]></category>

		<guid isPermaLink="false">http://www.tnl.net/blog/?p=2657</guid>
		<description><![CDATA[What is the state of HTML5 compliance among large sites?<p><p><i><a href="http://tnl.net/who" rel="author" title="Who is Tristan Louis?">Tristan Louis</a> is the founder and CEO of <a href="http://www.keepskor.com" title="Keepskor">Keepskor</a> and  writes the influential <a href="http://www.tnl.net/" title="tnl.net">tnl.net</a> weblog, where this was initially posted under the title <a href="http://www.tnl.net/blog/2011/08/21/the-state-of-html-validation/">The state of HTML validation</a>. You can follow him on twitter <a href="https://twitter.com/TNLNYC">here</a> or receive his weekly newsletter by subscribing <a href="http://eepurl.com/gb6zD">here</a>.</i></p>
</p>
]]></description>
			<content:encoded><![CDATA[<p>There’s been a lot of talk about HTML5 recently and, <a href="http://news.ycombinator.com/item?id=2897756">in some geek circles</a>, there have been snickers when companies have done a poor job of implementing it. But what is the true state of HTML5? To find out, I decided to check whether the top sites on the internet had implemented it and how successful they were in doing so.</p>
<h2>Methodology</h2>
<p>One of the first things to do in this effort was to get a decent list of sites. Unfortunately, it has become increasingly difficult to get a sense of which sites are the most popular by number of visits. I eventually settled on <a href="http://www.alexa.com/topsites">Alexa’s Top Sites</a> list because it features most of the sites people think of when considering what the large sites are, and it includes a few non-US sites.</p>
<p>I then used the W3C Validator against each of the top 25 sites. This allowed me to get 3 different pieces of information:</p>
<ul>
<li><strong>Doctype</strong>: This is what the site declares as its HTML code version. In other words, how the site identifies what version of HTML it supports.</li>
<li><strong>Encoding</strong>: This is the character encoding the site declares, which gives us a better understanding as to whether they are targeting a particular language or trying to offer a global site.</li>
<li><strong>Validation</strong>: This is how the site validated when tested for errors relating to the HTML version it purported to be offering. It gives us an idea as to how compliant with the standards the site truly is.</li>
</ul>
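<p>To make the first two items concrete, here is a minimal Python sketch of how a doctype and a declared encoding could be read out of a page’s markup. It is not how the W3C Validator works internally (the validator also looks at HTTP headers, among other things); the function names <code>classify_doctype</code> and <code>declared_charset</code> are made up for this illustration.</p>

```python
import re


def classify_doctype(html: str) -> str:
    """Return a rough label for the doctype declared in a page."""
    match = re.search(r"<!DOCTYPE\s+([^>]*)>", html, re.IGNORECASE)
    if match is None:
        return "none declared"
    declaration = match.group(1).strip()
    # The HTML 5 doctype is just "html", with no public identifier.
    if declaration.lower() == "html":
        return "HTML 5"
    # Older doctypes carry a public identifier such as
    # "-//W3C//DTD XHTML 1.0 Strict//EN"; keep the part after "DTD".
    public_id = re.search(r"//DTD\s+([^/]+)//", declaration)
    return public_id.group(1).strip() if public_id else "unknown"


def declared_charset(html: str) -> str:
    """Return the charset named in the markup, lowercased."""
    match = re.search(r"charset\s*=\s*[\"']?([\w-]+)", html, re.IGNORECASE)
    return match.group(1).lower() if match else "not declared"


# Hypothetical page used only to exercise the two helpers.
page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=UTF-8" /></head></html>'
)
print(classify_doctype(page), "/", declared_charset(page))
```

<p>The third item, validation, is the part that really needs the validator: counting errors means checking the whole document against the rules of the declared standard, which a couple of regular expressions cannot do.</p>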
<p>Surprisingly, a number of popular Web 2.0 sites were not in Alexa’s Top 25 so I created a separate list for them.</p>
<h2>Top 25</h2>
<p>Looking at the top 25, here are the results:</p>
<table>
<tbody>
<tr>
<th>Name</th>
<th>Doctype</th>
<th>Encoding</th>
<th>Validation</th>
</tr>
<tr>
<td>Google</td>
<td>HTML 5</td>
<td>iso-8859-1</td>
<td>37 errors, 3 warnings</td>
</tr>
<tr>
<td>Facebook</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>34 errors</td>
</tr>
<tr>
<td>YouTube</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>120 errors, 2 warnings</td>
</tr>
<tr>
<td>Yahoo!</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>144 errors, 8 warnings</td>
</tr>
<tr>
<td>Blogger</td>
<td>HTML 4.0 Strict</td>
<td>utf-8</td>
<td>34 errors, 45 warnings</td>
</tr>
<tr>
<td>Baidu</td>
<td>HTML 5</td>
<td>gb2312</td>
<td>6 errors, 6 warnings</td>
</tr>
<tr>
<td>Wikipedia</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>5 errors, 1 warning</td>
</tr>
<tr>
<td>Windows Live</td>
<td>HTML 4.01 Transitional</td>
<td>utf-8</td>
<td>33 errors, 17 warnings</td>
</tr>
<tr>
<td>Twitter</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>5 errors, 1 warning</td>
</tr>
<tr>
<td>QQ.com</td>
<td>XHTML 1.0 Transitional</td>
<td>gb2312</td>
<td>validator crashed</td>
</tr>
<tr>
<td>MSN</td>
<td>XHTML 1.0 Strict</td>
<td>utf-8</td>
<td>Completely valid</td>
</tr>
<tr>
<td>Yahoo Japan</td>
<td>HTML 4.01 Transitional</td>
<td>utf-8</td>
<td>26 errors, 24 warnings</td>
</tr>
<tr>
<td>LinkedIn</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>12 errors, 1 warning</td>
</tr>
<tr>
<td>Google India</td>
<td>HTML 5</td>
<td>iso-8859-1</td>
<td>40 errors, 2 warnings</td>
</tr>
<tr>
<td>Amazon</td>
<td>HTML 4.01 Transitional</td>
<td>iso-8859-1</td>
<td>516 errors, 125 warnings</td>
</tr>
<tr>
<td>Sina.com.cn</td>
<td>XHTML 1.0 Transitional</td>
<td>gb2312</td>
<td>validator crashed</td>
</tr>
<tr>
<td>Taobao.com</td>
<td>HTML 5</td>
<td>gb2312</td>
<td>validator crashed</td>
</tr>
<tr>
<td>WordPress</td>
<td>XHTML 1.0 Transitional</td>
<td>utf-8</td>
<td>4 errors</td>
</tr>
<tr>
<td>Google HK</td>
<td>HTML 5</td>
<td>Big5</td>
<td>40 errors, 1 warning</td>
</tr>
<tr>
<td>Google Germany</td>
<td>HTML 5</td>
<td>iso-8859-1</td>
<td>37 errors, 3 warnings</td>
</tr>
<tr>
<td>eBay</td>
<td>HTML 4.01 Transitional</td>
<td>utf-8</td>
<td>386 errors, 19 warnings</td>
</tr>
<tr>
<td>Yandex</td>
<td>HTML 4.01 Transitional</td>
<td>utf-8</td>
<td>52 errors, 12 warnings</td>
</tr>
<tr>
<td>Google UK</td>
<td>HTML 5</td>
<td>iso-8859-1</td>
<td>37 errors, 3 warnings</td>
</tr>
<tr>
<td>Google Japan</td>
<td>HTML 5</td>
<td>shift_jis</td>
<td>39 errors, 1 warning</td>
</tr>
<tr>
<td>Bing</td>
<td>XHTML 1.0 Transitional</td>
<td>utf-8</td>
<td>16 errors</td>
</tr>
</tbody>
</table>
<p>Looking at the data, the first interesting thing is how many sites have made the switch to HTML 5: 14 of the top 25. This means that in the last year, 56 percent of the largest sites on the internet have completely modified their code base to comply with a new standard. Six sites remain on the older HTML 4 standards and five are sticking to the somewhat more recent XHTML standard.</p>
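<p>For anyone who wants to check the arithmetic, the tally can be reproduced with a few lines of Python. The doctypes are copied from the table above; the grouping into three families (the <code>family</code> helper) is my own simplification for this sketch.</p>

```python
from collections import Counter

# Doctype declared by each of the top 25 sites, as listed in the table.
doctypes = {
    "Google": "HTML 5", "Facebook": "HTML 5", "YouTube": "HTML 5",
    "Yahoo!": "HTML 5", "Blogger": "HTML 4.0 Strict", "Baidu": "HTML 5",
    "Wikipedia": "HTML 5", "Windows Live": "HTML 4.01 Transitional",
    "Twitter": "HTML 5", "QQ.com": "XHTML 1.0 Transitional",
    "MSN": "XHTML 1.0 Strict", "Yahoo Japan": "HTML 4.01 Transitional",
    "LinkedIn": "HTML 5", "Google India": "HTML 5",
    "Amazon": "HTML 4.01 Transitional",
    "Sina.com.cn": "XHTML 1.0 Transitional", "Taobao.com": "HTML 5",
    "WordPress": "XHTML 1.0 Transitional", "Google HK": "HTML 5",
    "Google Germany": "HTML 5", "eBay": "HTML 4.01 Transitional",
    "Yandex": "HTML 4.01 Transitional", "Google UK": "HTML 5",
    "Google Japan": "HTML 5", "Bing": "XHTML 1.0 Transitional",
}


def family(doctype: str) -> str:
    """Collapse a specific doctype into HTML 5, HTML 4 or XHTML."""
    if doctype.startswith("XHTML"):
        return "XHTML"
    if doctype == "HTML 5":
        return "HTML 5"
    return "HTML 4"


counts = Counter(family(d) for d in doctypes.values())
html5_share = round(100 * counts["HTML 5"] / len(doctypes))

for name, count in counts.most_common():
    print(f"{name}: {count} of {len(doctypes)}")
print(f"HTML 5 share: {html5_share}%")
```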
<p>However, it is also interesting to note that none of the sites that have made the transition validates against the HTML 5 standard. In fact, of the top 25 sites in the Alexa list, only MSN was found to provide completely valid code. Maybe Microsoft could point the people behind MSN towards its other properties, Windows Live and Bing, which did not fare as well. Amazon was the worst offender, with 516 errors in its code, showing that disregard for standards compliance does not seem to have an impact on economic performance. However, eBay (386 errors) and Yahoo! (144 errors) came close behind, suggesting that Amazon may be an exception rather than the rule.</p>
<p>Another interesting phenomenon is that most of the large sites have adopted UTF-8, the encoding that supports the most languages, as their default encoding. Once again, over half (56%) of the sites have switched, with Amazon and Google being among the notable exceptions. An interesting aside here is that the W3C validator may have issues when it comes to validating Chinese sites: it crashed on QQ.com, Sina.com.cn and Taobao.com, although it did finish the job on Baidu.</p>
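<p>Why the encoding matters is easy to demonstrate. The Python sketch below tries to encode a few sample words in some of the encodings that show up in the table; the sample words and the <code>can_encode</code> helper are my own, chosen only for illustration.</p>

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The word "validation" (or a close equivalent) in four languages.
samples = {
    "English": "validation",
    "German": "Prüfung",
    "Chinese": "验证",
    "Japanese": "検証",
}
encodings = ["iso-8859-1", "gb2312", "shift_jis", "utf-8"]

# For each encoding, can it represent every sample at once?
coverage = {
    enc: all(can_encode(text, enc) for text in samples.values())
    for enc in encodings
}

for enc in encodings:
    supported = [lang for lang, text in samples.items() if can_encode(text, enc)]
    print(f"{enc}: {', '.join(supported)}")
```

<p>A site that declares iso-8859-1 can serve Western European text but not Chinese or Japanese, which is presumably why Google’s regional sites each declare a different encoding, while a UTF-8 site can serve all of them from one code base.</p>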
<h2>Web 2.0 Companies</h2>
<p>Looking at Web 2.0 companies, the data was surprising:</p>
<table>
<tbody>
<tr>
<th>Name</th>
<th>Doctype</th>
<th>Encoding</th>
<th>Validation</th>
</tr>
<tr>
<td>Facebook</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>34 errors</td>
</tr>
<tr>
<td>YouTube</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>120 errors, 2 warnings</td>
</tr>
<tr>
<td>Blogger</td>
<td>HTML 4.0 Strict</td>
<td>utf-8</td>
<td>34 errors, 45 warnings</td>
</tr>
<tr>
<td>Twitter</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>5 errors, 1 warning</td>
</tr>
<tr>
<td>LinkedIn</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>12 errors, 1 warning</td>
</tr>
<tr>
<td>WordPress</td>
<td>XHTML 1.0 Transitional</td>
<td>utf-8</td>
<td>4 errors</td>
</tr>
<tr>
<td>Flickr</td>
<td>HTML 5</td>
<td>utf-8</td>
<td>15 errors, 3 warnings</td>
</tr>
<tr>
<td>Tumblr</td>
<td>XHTML 1.0 Transitional</td>
<td>utf-8</td>
<td>19 errors</td>
</tr>
<tr>
<td>Foursquare</td>
<td>XHTML 1.0 Strict</td>
<td>utf-8</td>
<td>40 errors</td>
</tr>
<tr>
<td>Groupon</td>
<td>XHTML 1.0 Transitional</td>
<td>utf-8</td>
<td>6 errors</td>
</tr>
<tr>
<td>Zynga</td>
<td>XHTML 1.0 Transitional</td>
<td>utf-8</td>
<td>4 errors, 6 warnings</td>
</tr>
</tbody>
</table>
<p>I added data for companies outside the top 25 and a few interesting trends seem to pop up. The first surprise is that a smaller share of these sites has made the transition to HTML 5, with only 5 sites out of 11 (or 45 percent) having completed it. There still seems to be a strong preference for XHTML as the way to encode pages.</p>
<p>Also of note is that all of these sites appear to have plans for globalization, encoding their pages in UTF-8, which can support both Western and non-Western alphabets.</p>
<p>However, none of the sites successfully validates against its preferred standard. It looks like there is still much room for improvement in the world of HTML validation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tnl.net/blog/2011/08/21/the-state-of-html-validation/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Much Ado About XHTML 2</title>
		<link>http://www.tnl.net/blog/2003/04/15/much-ado-about-xhtml-2/</link>
		<comments>http://www.tnl.net/blog/2003/04/15/much-ado-about-xhtml-2/#comments</comments>
		<pubDate>Tue, 15 Apr 2003 19:09:33 +0000</pubDate>
		<dc:creator>Tristan Louis</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Browser]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Implementation]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Standard]]></category>
		<category><![CDATA[XHTML]]></category>

		<guid isPermaLink="false">http://tnl.net/blog/2003/04/15/much-ado-about-xhtml-2/</guid>
		<description><![CDATA[There has recently been much grumbling about XHTML 2 in general and its deprecation of the IMG tag in favor of the OBJECT one. While XHTML 2 is indeed a departure from the existing standards instead of being an evolution, it is important to realize that some of the things the workgroup is trying to [...]
]]></description>
			<content:encoded><![CDATA[<p>There has recently been <a title="OBJECT of desire" href="http://www.zeldman.com/daily/0303a.shtml#ap1503">much grumbling</a> about <acronym title="eXtensible HyperText Markup Language">XHTML</acronym> 2 in general and its deprecation of the <code>IMG</code> tag in favor of the <code>OBJECT</code> one.</p>
<p>While XHTML 2 is indeed a departure from the existing standards instead of being an evolution, it is important to realize that some of the things the working group is trying to do are meant to fix old issues and help improve the overall development of the web. While I agree with Zeldman’s assertion that <code>IMG</code> should be deprecated in this version instead of being completely tossed out, I believe that the tag should never have been in <acronym title="HyperText Markup Language">HTML</acronym> in the first place. The argument for an <code>OBJECT</code> tag dates back to the early days of the web (circa 1993), when things broke down into two camps: one that wanted a quick and dirty way to show images on the web (the <code>IMG</code> crowd) and another that looked forward and wanted any type of media to be embeddable in a page (the <code>OBJECT</code> crowd). We are now paying for the decisions that were made back then and, much like tables are still in use for layout on most sites instead of being replaced by <acronym title="Cascading Style Sheets">CSS</acronym>, we will continue to see <code>IMG</code> tags in code for a very long time.</p>
<p>The next assumption by the anti-XHTML 2 crowd is that XHTML 2 won’t be supported by browsers for a long time to come. However, because browsers have now evolved to the point where properly formatted text can be presented, most modern browsers can already display XHTML 2 without any problems (for an example, just check <a title="Sjoerd Visscher's weblog in XHTML 2" href="http://w3future.com/weblog/index.xml?notransform">Sjoerd Visscher’s weblog</a>), as long as a proper <acronym title="Document Type Definition">DTD</acronym> is pointed to. This means that once XHTML 2 reaches recommendation level, all modern browsers will be able to exploit it. However, I suspect there will be a slow uptake (as there has already been with the existing XHTML implementation), largely because a lot of developers do not want to deal with the rigorousness of XHTML (making sure all tags are closed, making sure no improper characters are entered, etc.).</p>
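<p>To give a sense of what that rigorousness means in practice, here is a small Python sketch that feeds a few fragments to a strict XML parser. It only checks well-formedness, not validity against a DTD, and the fragments and the <code>is_well_formed</code> helper are invented for this illustration.</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


fragments = {
    # Every tag closed, ampersand escaped: acceptable as XHTML.
    "closed img": '<p>Logo: <img src="logo.png" alt="Logo" /></p>',
    "escaped ampersand": "<p>Fish &amp; chips</p>",
    # Habits that HTML browsers forgive but an XML parser rejects.
    "unclosed img": '<p>Logo: <img src="logo.png" alt="Logo"></p>',
    "bare ampersand": "<p>Fish & chips</p>",
}

for label, markup in fragments.items():
    verdict = "well-formed" if is_well_formed(markup) else "rejected"
    print(f"{label}: {verdict}")
```

<p>An HTML browser would render all four fragments without complaint; a strict XML parser gives up on the last two, which is exactly the kind of discipline many developers would rather avoid.</p>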
<p>The first step in making sure that XHTML 2 will move forward is ensuring that the browser vendors fix their implementations to conform to the standard. Microsoft’s implementation of the <code>OBJECT</code> tag is broken and needs to be fixed. It does not meet the standard, so it is their responsibility to fix it. The same is true of other browsers that do not render it properly. In the long run, the success or failure of XHTML 2 will be based more on whether those things are fixed than on what people feel is right and, much like the fights over improper CSS nowadays, this kind of thing will only happen once the development community pressures browser vendors into fixing their code.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tnl.net/blog/2003/04/15/much-ado-about-xhtml-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

