Module Madness and Semantic Stupidity

Warn­ing: This is a very geeky article.

As readers of the weblog may have noticed, I’ve been getting into an increasingly obscure area of the Internet by trying to meld two different web formats (RSS and XHTML) into documents that could be understood by several kinds of clients (web browsers, RSS readers).

The exercise was largely an academic one, meant to test the validity of the claims made about the semantic web. The idea behind the semantic web is that the web could become embedded with some basic intelligence: documents would carry extra tagging that computers can understand, and those tags would provide more information about the content.

In order to tie everything together, the W3C came up with RDF. The framework relies on an XML concept called namespaces. The idea of a namespace is that you can create a shortcut (a prefix) in a document for each type of XML you refer to, and so “embed” multiple types together. This is very good in theory because it frees the framework from actually being smart, leaving those kinds of details to the people who write XML standards. Based on this, you have a set of modules (think of each XML standard as a Lego block) that you can tie together using RDF. Or so the theory goes...
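To make the "shortcut" idea concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The document, the `shop` and `geo` prefixes, and the `example.org` namespace URIs are all made up for illustration; only the prefix-to-URI mechanism itself comes from the XML namespaces specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two invented vocabularies.
# Each xmlns:prefix attribute binds a short prefix to a full namespace URI.
DOC = """
<shop:order xmlns:shop="http://example.org/ns/shop"
            xmlns:geo="http://example.org/ns/geo">
  <shop:item>Lego block</shop:item>
  <geo:city>Somewhere</geo:city>
</shop:order>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(DOC.strip())

# The parser replaces every prefix with the URI it stands for,
# so the prefix really is just a shortcut.
for element in root.iter():
    print(split_tag(element.tag))
```

The prefix names carry no meaning on their own: renaming `shop` to anything else gives the same parsed result as long as the URI stays the same.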

RSS as an RDF module

While there have not been many examples of the semantic web, one area where there has been some development is the syndication space. A few years ago, a new format called RSS (variously expanded as RDF Site Summary, Rich Site Summary, or Really Simple Syndication) was created to syndicate stories on the web. The basic structure was simple: every file was a channel, and every channel had items. An item was a link, a title for that link, and a short description of what it was about. It was nice, it was simple, and it was the perfect thing for putting together a proof of concept about the semantic web. After many fights within the RSS community, a new RDF-specific version of RSS emerged. Now, remember that RDF is supposed to tie all that stuff together, so technically an RDF-based RSS feed should be modular. RSS 1.0 (as this new formulation of RSS came to be known) has its own definition that can be referenced in a namespace.
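As a rough illustration of the channel/item structure, the sketch below reads a tiny RSS 1.0 feed with Python's standard library. The feed content and `example.org` URLs are invented; the two namespace URIs are the ones used by RDF and RSS 1.0. Note that in RSS 1.0 the items sit next to the channel inside the `rdf:RDF` root, and the channel lists them by reference.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and RSS 1.0; the prefixes here are local shortcuts.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Made-up feed for illustration only.
FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Example channel</title>
    <link>http://example.org/</link>
    <description>A made-up channel.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/story-1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/story-1">
    <title>First story</title>
    <link>http://example.org/story-1</link>
    <description>Short summary of the first story.</description>
  </item>
</rdf:RDF>
"""


def read_items(feed_xml):
    """Return each item's title, link, and description as a dict."""
    root = ET.fromstring(feed_xml.strip())
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            "title": item.findtext("rss:title", namespaces=NS),
            "link": item.findtext("rss:link", namespaces=NS),
            "description": item.findtext("rss:description", namespaces=NS),
        })
    return items


for entry in read_items(FEED):
    print(entry["title"], entry["link"])
```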

XHTML mod­u­lar­iza­tion

In an effort to allow existing HTML documents to bravely move into that new world, the World Wide Web Consortium came up with XHTML, a reformulation of HTML that follows an XML structure, is modular, and can be used in the semantic web. XHTML also has its own namespace reference.

Assump­tions

One would then think that two different document types, with two different namespaces, could be put together using RDF and work properly. Let’s just review our assumptions so far and look at their logical extension:

IF
— the seman­tic web is a rep­re­sen­ta­tion of data on the world wide web,
— and if RSS is data
— and if the world wide web is com­posed of doc­u­ments that are XHTML
— and if XHTML is a rep­re­sen­ta­tion of HTML as XML
— and if XML is mod­u­lar
— and if the mod­u­lar­ity is han­dled through namespaces

THEN
— purely the­o­ret­i­cally, it should be pos­si­ble to have a doc­u­ment that is com­posed of 2 mod­ules
— then those two mod­ules would be ref­er­enced through name­spaces
— then a tool that reads XHTML would use the XHTML tags.
— then a tool that reads RSS would read the RSS tags.
— then a tool that can read both would look at the struc­ture of the doc­u­ment and, based on that, rep­re­sent the data appropriately.
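The last three steps can be sketched in a few lines of Python. The mixed document below is hypothetical, and whether it is valid under either specification is a separate question; the sketch only shows how a tool could pick out the elements of the namespace it understands and skip the rest.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RSS_NS = "http://purl.org/rss/1.0/"

# Hypothetical document combining both vocabularies under an RDF root.
MIXED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rss="http://purl.org/rss/1.0/"
         xmlns:h="http://www.w3.org/1999/xhtml">
  <rss:item rdf:about="http://example.org/story-1">
    <rss:title>First story</rss:title>
    <rss:link>http://example.org/story-1</rss:link>
  </rss:item>
  <h:p>Some <h:em>marked-up</h:em> text.</h:p>
</rdf:RDF>
"""


def local_names_in(xml_text, namespace):
    """List the local names of all elements in one namespace, in document order."""
    root = ET.fromstring(xml_text.strip())
    prefix = "{" + namespace + "}"
    return [
        element.tag[len(prefix):]
        for element in root.iter()
        if element.tag.startswith(prefix)
    ]


# An "RSS reader" would look only at the RSS elements...
print(local_names_in(MIXED, RSS_NS))
# ...while a "browser" would look only at the XHTML ones.
print(local_names_in(MIXED, XHTML_NS))
```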

Cog­ni­tive Dissonance

Of course, the theory looks correct, but it is when trying to implement it that I started to run into problems. For starters, there is no way to use XHTML as your base document. The W3C in all its wisdom essentially said that you can’t do this. Not ideal, but I figured that we could go the other way, embedding an XHTML document within an RDF module. But there comes the next problem. In order for a document to conform to the XHTML standard,

the root element of the document must be <html>

and

there must be a DOCTYPE dec­la­ra­tion in the doc­u­ment prior to the root element.

So basi­cally, there is no way to embed it in another document.
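A small sketch makes the conflict visible. It checks only the two rules quoted above, using Python's standard `xml.dom.minidom`; it is not a full XHTML validator, and both sample documents are made up for illustration.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A standalone XHTML document: DOCTYPE first, <html> as the root.
STANDALONE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>"""

# The same markup wrapped in an RDF root (hypothetical embedding attempt).
EMBEDDED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Example</title></head>
    <body><p>Hello</p></body>
  </html>
</rdf:RDF>"""


def check_two_rules(xml_text):
    """Return (root_is_html, has_doctype) for the two quoted rules."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    root_is_html = root.localName == "html" and root.namespaceURI == XHTML_NS
    # In well-formed XML a DOCTYPE can only appear before the root element,
    # so its presence is enough for the second rule.
    has_doctype = doc.doctype is not None
    return root_is_html, has_doctype


print(check_two_rules(STANDALONE))
print(check_two_rules(EMBEDDED))
```

Once the markup is wrapped in another root element, both checks fail for the document as a whole, even though the XHTML portion itself is unchanged.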

Con­clu­sion

Now this sounds like I have reached an impasse. An XHTML document cannot be embedded into an RDF one, and an RDF document cannot be embedded within an XHTML one. This means that XHTML cannot be treated as a module (since the root element must always be <html>). If we are to embed documents with any kind of semantic markup, this does not seem to make much sense to me.

Originally published on April 25, 2003 in Technology.