Warning: This is a very geeky article.
As readers of the weblog may have noticed, I’ve been getting into an increasingly obscure area of the Internet by trying to meld two different web formats (RSS and XHTML) and come up with documents that could be understood by multiple kinds of software (web browsers, RSS readers).
The exercise was largely an academic one. Much has been said about the semantic web, and I figured I would test whether those claims hold up in practice. The idea behind the semantic web is that the web could be embedded with some basic intelligence: computers would understand extra tagging in a document, and those tags would provide more information about what the document contains.
To tie everything together, the W3C came up with RDF (the Resource Description Framework). The framework relies on a concept called namespaces. The idea of a namespace is that you can create a shortcut (a prefix) in a document that refers to a particular type of XML, so that several types of XML can be “embedded” together. This is very good in theory, because it frees the framework from actually having to be smart, leaving those kinds of details to the people who write the XML standards. Based on this, you have a set of modules (think of each XML standard as a Lego block) that you can tie together using RDF. Or so the theory goes…
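To make the namespace idea concrete, here is a small sketch using Python’s standard xml.etree.ElementTree. The sample document and its example.org URL are made up, and Dublin Core (a vocabulary often used alongside RSS 1.0) stands in as the second module purely for illustration. What it shows is that the prefix really is just a shortcut: the parser replaces each prefix with the namespace URI it stands for, so renaming the prefix changes nothing.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Dublin Core is used here only as an example of a second vocabulary.
DC_NS = "http://purl.org/dc/elements/1.1/"


def make_doc(dc_prefix):
    """Build the same made-up document, choosing any prefix for Dublin Core."""
    return (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:{dc_prefix}="{DC_NS}">'
        f'<rdf:Description rdf:about="http://example.org/post">'
        f'<{dc_prefix}:title>An example post</{dc_prefix}:title>'
        f'</rdf:Description></rdf:RDF>'
    )


def expanded_tags(xml_text):
    """Element names in document order, with each prefix replaced by the
    namespace URI it stands for ('{uri}localname' is ElementTree's notation)."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    for tag in expanded_tags(make_doc("dc")):
        print(tag)
    # The prefix is only a shortcut: renaming it leaves the expanded names unchanged.
    print(expanded_tags(make_doc("dc")) == expanded_tags(make_doc("meta")))
```

A tool that understands Dublin Core would look for the expanded name ending in `title` under the Dublin Core URI and ignore everything else, whatever prefix the author happened to pick.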
RSS as an RDF module
While there have not been many examples of the semantic web, one area where there has been some development is syndication. A few years ago, a new format called RSS (for Really Simple Syndication) was created to syndicate stories on the web. The basic structure was simple: every file was a channel, and every channel had items. An item was a link, a title for that link, and a short description of what it was about. It was nice, it was simple, and it was the perfect thing for building a proof of concept of the semantic web. After many fights within the RSS community, a new, RDF-specific version of RSS came out. Now, remember that RDF is supposed to tie all that stuff together, so technically an RDF-based RSS feed should be modular. RSS 1.0 (as this new formulation of RSS came to be known) has its own definition that can be referenced through a namespace.
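As an illustration, a minimal RSS 1.0 feed might look like the one below: the whole file is an rdf:RDF element, the RSS elements live in the RSS 1.0 namespace (http://purl.org/rss/1.0/), the channel lists its items by reference, and each item carries a title, a link and a description. This is a hand-written sketch with placeholder example.org URLs, read back with Python’s standard xml.etree.ElementTree; a real feed reader would have to cope with far more than this.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
NS = {"rdf": RDF_NS, "rss": RSS_NS}

# Hand-written example feed; the example.org URLs are placeholders.
FEED = f"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{RSS_NS}">
  <channel rdf:about="http://example.org/">
    <title>Example weblog</title>
    <link>http://example.org/</link>
    <description>A channel with two items</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/1"/>
        <rdf:li rdf:resource="http://example.org/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/1">
    <title>First post</title>
    <link>http://example.org/1</link>
    <description>What the first post is about</description>
  </item>
  <item rdf:about="http://example.org/2">
    <title>Second post</title>
    <link>http://example.org/2</link>
    <description>What the second post is about</description>
  </item>
</rdf:RDF>"""


def channel_title(feed_text):
    """Title of the (single) channel in the feed."""
    return ET.fromstring(feed_text).findtext("rss:channel/rss:title", namespaces=NS)


def item_refs(feed_text):
    """URIs of the items the channel points to, in the channel's order."""
    root = ET.fromstring(feed_text)
    path = "rss:channel/rss:items/rdf:Seq/rdf:li"
    return [li.get("{" + RDF_NS + "}resource") for li in root.iterfind(path, NS)]


def read_items(feed_text):
    """(title, link, description) for every item element in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (
            item.findtext("rss:title", namespaces=NS),
            item.findtext("rss:link", namespaces=NS),
            item.findtext("rss:description", namespaces=NS),
        )
        for item in root.findall("rss:item", namespaces=NS)
    ]


if __name__ == "__main__":
    print(channel_title(FEED))
    for title, link, description in read_items(FEED):
        print(f"{title} <{link}>: {description}")
```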
In an effort to let existing HTML documents move bravely into that new world, the World Wide Web Consortium came up with XHTML, a reformulation of HTML in XML that is modular and can be used in the semantic web. XHTML also has its own namespace.
One would then think that two different document types, each with its own namespace, could be put together using RDF and work properly. Let’s review our assumptions so far and follow them to their logical conclusion:
– if the semantic web is a representation of data on the World Wide Web,
– and if RSS is data,
– and if the World Wide Web is composed of XHTML documents,
– and if XHTML is a representation of HTML as XML,
– and if XML is modular,
– and if the modularity is handled through namespaces,
– then, purely theoretically, it should be possible to have a document that is composed of two modules,
– then those two modules would be referenced through namespaces,
– then a tool that reads XHTML would use the XHTML tags,
– then a tool that reads RSS would use the RSS tags,
– then a tool that can read both would look at the structure of the document and, based on that, represent the data appropriately.
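The last steps of that chain can be sketched with a generic XML parser. The hypothetical document below (placeholder URLs) puts an RSS 1.0 item and an XHTML fragment side by side under an rdf:RDF root; it is trimmed to the relevant parts and is not a complete, valid RSS 1.0 feed. Each “tool” is a hypothetical function that keeps only the elements from the namespace it understands. This only works at the level of plain XML namespaces: it says nothing about whether the result is a conforming XHTML document, which is exactly where the trouble starts.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mixed document: one RSS 1.0 item carrying an XHTML fragment.
# Trimmed for illustration; not a complete RSS 1.0 feed.
MIXED = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rss="{RSS_NS}" xmlns:h="{XHTML_NS}">
  <rss:item rdf:about="http://example.org/1">
    <rss:title>First post</rss:title>
    <rss:link>http://example.org/1</rss:link>
    <h:div>
      <h:h1>First post</h:h1>
      <h:p>The full text of the post.</h:p>
    </h:div>
  </rss:item>
</rdf:RDF>"""


def q(namespace, name):
    """ElementTree's expanded '{uri}name' form of a namespaced element name."""
    return "{" + namespace + "}" + name


def rss_tool(xml_text):
    """A reader that only knows RSS: (title, link) for each item."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext(q(RSS_NS, "title")), item.findtext(q(RSS_NS, "link")))
        for item in root.iter(q(RSS_NS, "item"))
    ]


def xhtml_tool(xml_text):
    """A reader that only knows XHTML: text of headings and paragraphs."""
    root = ET.fromstring(xml_text)
    wanted = {q(XHTML_NS, "h1"), q(XHTML_NS, "p")}
    return [el.text for el in root.iter() if el.tag in wanted]


def combined_tool(xml_text):
    """A reader that knows both: element names grouped by namespace."""
    root = ET.fromstring(xml_text)
    groups = {}
    for el in root.iter():
        namespace, name = el.tag[1:].split("}", 1)
        groups.setdefault(namespace, []).append(name)
    return groups


if __name__ == "__main__":
    print(rss_tool(MIXED))
    print(xhtml_tool(MIXED))
    for namespace, names in combined_tool(MIXED).items():
        print(namespace, names)
```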
Of course, the theory looks correct, but it was when I tried to implement it that I started to run into problems. For starters, there is no way to use XHTML as your base document: the W3C, in all its wisdom, essentially said that you can’t do this. Not ideal, but I figured that we could go the other way and embed an XHTML document within an RDF document. But then comes the next problem. In order for a document to conform to the XHTML standard,
– the root element of the document must be <html>, and
– there must be a DOCTYPE declaration in the document prior to the root element.
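Both of these are rules about the document as a whole: they talk about the root element and what comes before it, so an <html> element nested inside some other document can never satisfy them. The sketch below, using Python’s standard xml.dom.minidom, checks those two rules plus the XHTML 1.0 requirement that the root element be in the XHTML namespace. It is an illustration rather than a validator (it does not load the DTD or check anything else in the specification), and the function name and sample documents are hypothetical.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def strict_xhtml_problems(source):
    """Return the document-level rules this (hypothetical) checker finds broken.
    An empty list means only that these particular rules are met."""
    doc = minidom.parseString(source)  # does not fetch the external DTD
    root = doc.documentElement
    problems = []
    if root.localName != "html":
        problems.append(f"root element is <{root.tagName}>, not <html>")
    if root.namespaceURI != XHTML_NS:
        problems.append("root element is not in the XHTML namespace")
    if doc.doctype is None:
        problems.append("no DOCTYPE declaration before the root element")
    return problems


# A stand-alone XHTML 1.0 Strict document.
STANDALONE = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>A post</title></head>
  <body><p>Hello</p></body>
</html>"""

# The same markup embedded in an RDF document: the <html> element is intact,
# but it is no longer the root, so the document-level rules cannot hold.
EMBEDDED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>A post</title></head>
    <body><p>Hello</p></body>
  </html>
</rdf:RDF>"""

if __name__ == "__main__":
    print(strict_xhtml_problems(STANDALONE))
    for problem in strict_xhtml_problems(EMBEDDED):
        print(problem)
```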
So, basically, there is no way to embed an XHTML document in another document.
Now it seems that I have reached an impasse. An XHTML document cannot be embedded into an RDF one, and an RDF document cannot be embedded within an XHTML one. This means that XHTML cannot be treated as a module, since its root element must always be <html>. If we are supposed to combine documents with any kind of semantic markup, this does not make much sense to me.