HTML 5, the Dialog Tag, and Microformats

A follow-up post on my HTML 5 thoughts, specifically how some of the new concepts can be implemented today using microformats.

A couple of days back I blogged about some first impressions while skimming through the HTML 5 spec, and I made a note about its length, which partly stems from a whole slew of new tags such as <dialog>, <figure> etc.

I can certainly see them being useful in creating a somewhat richer vocabulary for semantic markup. Here is the example from the draft spec:

<dialog>
  <dt>Costello
  <dd>Look, you gotta first baseman?
  <dt>Abbott
  <dd>Certainly.
  ...
</dialog>
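
As an aside, the syntax point is easy to check. The sample relies on HTML's optional end tags for <dt> and <dd>, so an XML parser rejects it. Here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name is_well_formed_xml and the CLOSED_SAMPLE variant are my own for illustration (they are not from the spec), and the "..." line from the sample is left out.

```python
import xml.etree.ElementTree as ET

# The draft spec sample, minus its "..." line: no end tags on <dt>/<dd>.
SPEC_SAMPLE = """<dialog>
  <dt>Costello
  <dd>Look, you gotta first baseman?
  <dt>Abbott
  <dd>Certainly.
</dialog>"""

# Hypothetical variant with explicit end tags added.
CLOSED_SAMPLE = """<dialog>
  <dt>Costello</dt>
  <dd>Look, you gotta first baseman?</dd>
  <dt>Abbott</dt>
  <dd>Certainly.</dd>
</dialog>"""


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed_xml(SPEC_SAMPLE))
print(is_well_formed_xml(CLOSED_SAMPLE))
```

This only checks well-formedness. It says nothing about whether the markup is valid HTML, which it is, since those end tags are optional there.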

Ignoring the problem of this being invalid XML (Strange! Why doesn't HTML 5 require valid XML?), I see a few issues with this approach overall.

First, this introduces a new tag, which isn't going to be available in all browsers any time soon. So I wonder whether developers are likely to use the new tag, or whether they will just continue to use the <dl> tag or other appropriate existing tags.

Second, as it stands, it doesn't really address a meaningful conversation scenario where it might be useful to identify the participants, time stamps etc. There are numerous examples of conversations in a general sense across the web - every blog post's comment stream is a conversation of sorts. Facebook's Wall-To-Wall feature is a conversation. A saved chat transcript is a conversation. The <dialog> tag doesn't really capture the common pieces of information that are typically represented in all these conversations.

Finally, the number of interesting semantics is huge; the number of possible tags is small... so you fundamentally need a way to define semantics and layer them on basic tags, at some point or the other.

During the holidays, I was reading about microformats to understand a bit more about the concepts and patterns at play. So as soon as I saw the <dialog> tag, I began wondering: isn't whatever value there is in describing the semantics of a conversation already achievable today using some sort of microformats-based approach? My own definition of microformats is basically a set of patterns that use existing HTML tags to convey semantic data and metadata embedded in your page content. The microformats.org site describes them as a set of simple, open data formats built upon existing and widely adopted standards. See that page for more about what they are and aren't, as well as other related content.

There isn't an hConversation or hDialog type of microformat agreed upon right now. But here is a rough sketch of how you might use a microformats approach to get not only a working visualization supported by all browsers today, but at the same time, embed additional semantics. To clarify, this isn't some approved conversation microformat, but me being creative and mocking one up while writing this post. Who knows... something might even come out of this. :-)

<ol class="hConversation">
  <li class="message">
    <span class="participant">Nikhil</span>
    <span class="messagecontent">...</span>
  </li>
  ...
</ol>

You can now leverage common microformats patterns. For example, you can include time-stamps using the datetime pattern... so you get:

<ol class="hConversation">
  <li class="message">
    <abbr class="dt" title="2008-01-25T10:00:00">25th January, 2008</abbr>
    <span class="participant">Nikhil</span>
    <span class="messagecontent">...</span>
  </li>
  ...
</ol>
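
The point of the title attribute is that it carries a machine-readable ISO 8601 value while the element text stays human-friendly. Here is a rough sketch, in Python, of how a consumer might read it. The parse_timestamp helper is hypothetical, and I'm assuming the value has no timezone offset, as in the mock-up above.

```python
from datetime import datetime


def parse_timestamp(title: str) -> datetime:
    """Parse the ISO 8601 value stored in the <abbr> title attribute."""
    # fromisoformat accepts the "YYYY-MM-DDTHH:MM:SS" shape used in the
    # mock-up; with no offset present, the result is a naive datetime.
    return datetime.fromisoformat(title)


when = parse_timestamp("2008-01-25T10:00:00")
print(when.year, when.month, when.day, when.hour)
```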

Furthermore you could compose microformats as if they were building blocks. For example, you might want to identify the participants, and use the existing hCard microformat.

<ol>
  <li id="participant1">
    <span class="vcard">
      <a class="fn name url" href="http://www.nikhilk.net">Nikhil</a>
    </span>
  </li>
  ...
</ol>
<ol class="hConversation">
  <li class="message">
    <abbr class="dt" title="2008-01-25T10:00:00">25th January, 2008</abbr>
    <span class="participant vcard"><a class="include" href="#participant1">Nikhil</a></span>
    <span class="messagecontent">...</span>
  </li>
  ...
</ol>

The example above also shows linking to a hCard, rather than embedding it inline in the conversation, using the include pattern. The hCard microformat itself allows for various bits of information to be specified about the identity it corresponds to.
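
To make the building-block idea a bit more concrete, here is a minimal sketch of how a tool might read this markup and follow the include link back to the hCard. It uses Python's standard html.parser module. The class names (hConversation, message, participant, messagecontent, dt) come from my mock-up above, not from any agreed microformat, and ConversationParser and parse_conversation are hypothetical names. It only handles the exact shape shown in this post.

```python
from html.parser import HTMLParser

SAMPLE = """
<ol>
  <li id="participant1">
    <span class="vcard">
      <a class="fn name url" href="http://www.nikhilk.net">Nikhil</a>
    </span>
  </li>
</ol>
<ol class="hConversation">
  <li class="message">
    <abbr class="dt" title="2008-01-25T10:00:00">25th January, 2008</abbr>
    <span class="participant vcard"><a class="include" href="#participant1">Nikhil</a></span>
    <span class="messagecontent">...</span>
  </li>
</ol>
"""


class ConversationParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cards = {}        # element id -> {"name": ..., "url": ...}
        self.messages = []     # one dict per <li class="message">
        self._stack = []       # (tag, classes) for each open element
        self._card_id = None   # id of the <li> currently holding an hCard
        self._message = None   # message dict currently being filled

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        self._stack.append((tag, classes))
        if tag == "li" and "message" in classes:
            self._message = {"when": None, "who": "", "card": None, "text": ""}
            self.messages.append(self._message)
        elif tag == "li" and "id" in attrs:
            self._card_id = attrs["id"]
        elif self._message is not None:
            if tag == "abbr" and "dt" in classes:
                self._message["when"] = attrs.get("title")
            elif tag == "a" and "include" in classes:
                # href="#participant1" -> "participant1"
                self._message["card"] = attrs.get("href", "").lstrip("#")
        elif self._card_id is not None and tag == "a" and "fn" in classes:
            self.cards[self._card_id] = {"name": "", "url": attrs.get("href")}

    def handle_endtag(self, tag):
        # Pop back to the matching open tag.
        while self._stack:
            open_tag, _ = self._stack.pop()
            if open_tag == tag:
                break
        if tag == "li":
            self._message = None
            self._card_id = None

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        open_classes = [classes for _, classes in self._stack]
        if self._message is not None:
            if any("participant" in c for c in open_classes):
                self._message["who"] += text
            elif any("messagecontent" in c for c in open_classes):
                self._message["text"] += text
        elif self._card_id in self.cards and "fn" in self._stack[-1][1]:
            self.cards[self._card_id]["name"] += text


def parse_conversation(markup):
    """Return messages with participant details resolved from linked hCards."""
    parser = ConversationParser()
    parser.feed(markup)
    parser.close()
    resolved = []
    for message in parser.messages:
        card = parser.cards.get(message["card"], {})
        resolved.append({
            "when": message["when"],
            "who": card.get("name") or message["who"],
            "url": card.get("url"),
            "text": message["text"],
        })
    return resolved


for entry in parse_conversation(SAMPLE):
    print(entry)
```

Calling parse_conversation(SAMPLE) yields one message with the timestamp taken from the title attribute, and the name and URL pulled from the linked hCard rather than from the conversation markup itself.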

Over the years, I've increasingly become a fan of semantic markup as the basis for attaching presentation and behavior, achieving SEO etc., especially since starting work on ASP.NET Ajax. It is certainly interesting to think about how to convey the semantics of the data contained within a page rendering. Ultimately the success of these microformats is gated on whether they are broadly applicable and standardized, and on whether things like search engines or browser add-ins let you extract the semantics and do something interesting with them, making it worth your while to put them in there. It's sort of a chicken-and-egg problem.

But to come full circle to HTML 5, personally I think it would be great to see new tags in HTML when they equate to new capabilities, features and user experiences in the browser (e.g. the <video> tag or <input type="date" />), rather than simply higher level tags designed to convey semantics. In other words, I would argue: address the biggest limitations of HTML first with an HTML 5 Core spec.


Posted on Friday, 1/25/2008 @ 5:37 PM


Comments

8 comments have been posted.

Damir

Posted on 1/26/2008 @ 4:16 AM
I am not quite sure what you meant by the question "why doesn't HTML 5 require valid XML syntax?", so sorry if I'm answering the wrong question here. The main difference between HTML 5 and 4 is that the former is not defined in terms of language syntax, but rather the document model. HTML and XHTML 5 are simply two different serializations of the HTML 5 standard and they both will be available. I am myself pretty unsure about the status of XHTML 2.0, whose working draft was last published in July 2006. There are some nice features there...

If your question was "why do they even bother supporting HTML", I believe the answer would be because they want HTML 5 to remain backwards compatible (they didn't even completely remove the font tag!) and to make life easier for those who hate the XHTML way of assigning attribute values like checked.

Anup Shah

Posted on 1/26/2008 @ 4:38 AM
I agree that dialog is limited. The definition list element has always looked broken to me, because definition term and description elements look like they should be grouped further.

Couple of additional notes:
1) Perhaps even more semantic markup for dialogs (in HTML 4) might be to use ordered lists with cite for the person and a block quote to represent what they say. The cite and block quote would appear per list item.

2) Your note on semantic markup being useful for search engine optimization: there are important differences between ranking high and indexing. Semantic markup helps a little bit (but not much) with indexing. Using titles, keywords and descriptions is important, but ranking typically boils down to good content that people are compelled to link to. I try to explain this further here:

http://www.onenaught.com/posts/30/explaining-natural-seo-search-engine-ranking-vs-indexing
Hope that helps.

(Wow, it has taken a while to write this reply, because each time I submit the comment, I just get back an icon saying some words are not allowed without any clue as to which ones!!)

Michael Herndon

Posted on 1/26/2008 @ 8:49 AM
actually damir is partially right.

xhtml 1.0 = xml + html 4
xhtml 2.0 = xml + html 5

but i agree html 5 should be nixed and the next stage should only be xhtml 2.0

Nikhil Kothari

Posted on 1/26/2008 @ 9:24 AM
About the XML, XHTML discussion - I wasn't trying to say HTML 5 should be XHTML, though why there are separate specs is a whole separate discussion. I was merely saying the dialog sample in the spec isn't valid XML in terms of syntax - it doesn't have close tags. As simple and mundane as that.

SEO - the tie with SEO is indirect. First trying to keep pages true to just semantic markup keeps a bunch of other stuff out, which lets the search engine do a better job. However, I think microformats would allow some new semantic searches. For example, if I had a question - find all published conversations that I am part of - could the search engine do that?
I wasn't trying to somehow point to getting your page ranked better because of semantic markup, where other things like title, url, in-bound links etc. all matter much more.

zcorpan

Posted on 1/26/2008 @ 10:20 AM
Michael Herndon: You're right that XHTML 1.0 is "XML + HTML4", however XHTML 2.0 is not "XML + HTML5". "XML + HTML5" is called XHTML5, and XHTML 2.0 is something different altogether.

Nikhil Kothari: The end tags for DT and DD are optional in HTML, and have always been. Why would you apply XML rules to HTML?

David Yancey

Posted on 1/28/2008 @ 7:21 AM
zcorpan: The reason to apply XML rules to HTML is for consistency. While DT, DD, P, IMG and many other closing tags are optional in HTML, for readability and consistency it is better to apply the rules of XML to HTML.

If the closing DT and DD tags are there, then the next developer to work on that project knows where each DT/DD etc. has ended.

Nikhil Kothari

Posted on 1/28/2008 @ 8:10 AM
Not only do closing tags improve readability and make authoring easier, they also help if you need to parse HTML - you can simply use any XML parser, as opposed to hacking one up with all sorts of special rules.

It really shouldn't be a debate anymore on whether end tags should be optional.

Michael Herndon

Posted on 1/28/2008 @ 8:57 AM
you're right, it looks like they made xhtml 2.0 a separate spec and that html 5 & xhtml 2.0 reside in totally separate namespaces. I guess that's due to xhtml 2.0 being in the works for some time now.

but it's regressive. html should just be skipped and only xhtml 5.0 released so that browsers have a good reason to implement true xml parsing and then people can extend or even create their own doctypes and schema.

Also you could simply parse a page and get the things that you need from the page without having to worry about broken markup. the w3c is really starting to outlive its usefulness.

In the xhtml 5 spec they are still supporting the font, numbered headings, i, and b tags and they are pushing predefined classes like copyright & error.... very few people know how to use the numbered h tags properly.