InformationWeek: The Business Value of Technology


May 8, 2000


XHTML: A Bridge To The Future

The W3C's recommendation blends XML and HTML to produce extensible Web-page formatting

By Don Kiely

    Hypertext Markup Language, an aging, inflexible formatting standard, has fueled the phenomenal growth of the Web. Now Extensible Markup Language, a data-markup standard, promises nearly complete flexibility. In a flash of brilliance, the World Wide Web Consortium (W3C) has combined HTML and XML into the new XHTML recommended standard, which reformulates HTML 4.01, the latest version, using XML document type definitions (DTDs).

    HTML is the language behind one of the fastest, most widespread technology adoptions ever. Derived from Standard Generalized Markup Language (SGML), HTML is simple to learn and reasonably flexible for formatting text and graphics, but it doesn't have the extensibility to adapt to dynamic Web applications. Almost every site with valuable content is more of a Web application than a Web site, requiring code components, multimedia effects, and other features that strain the limits of HTML.

    HTML is typically extended through innovations in a single browser, most often Microsoft's Internet Explorer or America Online's Netscape Communicator, and these changes gradually make their way into other browsers. Inevitably, the implementations differ enough that Web authors have a tough time making their sites viewable in different browsers, much less in older versions of those browsers. The more popular extensions, such as frames and scripting languages, eventually make their way into the W3C's HTML standard.

    In the last couple of years, XML has been taking the Web by storm. Whereas HTML formats and presents information, XML marks up data so that the individual pieces of information on a Web page are identified as being of a particular type. In a bank's data, for example, $4,562.03 is marked as the outstanding balance of a customer's loan, and $123.90 as the monthly payment, identifying them as particular kinds of data points. Without XML, these would be just two character strings in a sea of text on a Web page. XML provides metadata--data about data.

    The most important feature of XML is the "X." HTML has a fixed set of tags, but with XML you can define custom tags and group them into namespaces. Industries can band together and create shared namespaces that facilitate the exchange of information.

    Continuing the bank example, <Balance> and <Payment> can identify the two character strings as being specific types of information. This facilitates exchanging data between applications and computer systems, limiting the need for expensive, complex data-conversion programs.
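The bank example can be sketched in a few lines of Python using the standard library's ElementTree parser. The <Loan> wrapper element and the function name are hypothetical; only <Balance> and <Payment> come from the example above. Because each value carries a label, the receiving program finds it by name rather than by its position in a page of text.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical record; only the <Balance> and <Payment> tags come from the article.
LOAN_XML = """<Loan>
  <Balance>4562.03</Balance>
  <Payment>123.90</Payment>
</Loan>"""

def read_loan(xml_text):
    """Return (balance, payment), located by tag name rather than by position in the text."""
    root = ET.fromstring(xml_text)
    # Decimal keeps currency amounts exact (no binary floating-point rounding).
    return Decimal(root.findtext("Balance")), Decimal(root.findtext("Payment"))

balance, payment = read_loan(LOAN_XML)
print(balance, payment)  # 4562.03 123.90
```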

    The XHTML recommendation was published by the consortium on Jan. 26, and refers to XHTML as "a bridge to the future." This new standard promises to make the Web more adaptable to existing and future uses, without forcing a wholesale redesign of millions of Web sites. Backward compatibility was a major concern of the W3C committee, and by staying faithful to the HTML 4.0 standard but extending it with XML, the consortium has created a very flexible technology for building Web sites.

    According to the W3C, XHTML offers three major advantages to Web-site developers: extensibility, portability, and modularity. XHTML is extensible because new elements can be added to a document without altering the DTD that the document is based on.
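In XML terms, this kind of extension relies on namespaces: elements from another vocabulary can sit inside an XHTML page, and a namespace-aware parser keeps them apart from the XHTML elements. The Python sketch below assumes a hypothetical banking namespace, urn:example:banking. The XHTML namespace URI, http://www.w3.org/1999/xhtml, is the one the XHTML 1.0 recommendation defines. Whether a particular browser does anything useful with the foreign elements is a separate question.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
BANK_NS = "urn:example:banking"  # hypothetical vocabulary, for illustration only

PAGE = f"""<html xmlns="{XHTML_NS}" xmlns:bank="{BANK_NS}">
  <head><title>Loan summary</title></head>
  <body>
    <p>Outstanding balance: <bank:Balance>4562.03</bank:Balance></p>
  </body>
</html>"""

def split_by_namespace(xml_text):
    """Group element local names (in document order) by the namespace URI they belong to."""
    groups = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree reports namespaced tags as "{uri}localname".
        if elem.tag.startswith("{"):
            uri, _, local = elem.tag[1:].partition("}")
        else:
            uri, local = "", elem.tag
        groups.setdefault(uri, []).append(local)
    return groups

for uri, names in split_by_namespace(PAGE).items():
    print(uri, names)
```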

    The second major advantage is portability, sometimes referred to as interoperability. Most Internet access is through browsers on desktop computers, but more and different types of devices are being introduced. Some of these devices, such as cell phones, won't have the processing power of a desktop PC, and their browsers will be less tolerant of malformed markup. XHTML is designed to make Web documents accessible and interoperable across platforms, in part by enforcing a rigorous coding standard.
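The difference in tolerance is easy to demonstrate with any XML parser. This sketch uses Python's standard-library parser as a stand-in for a strict XHTML browser, which is an assumption made for illustration because real devices use their own parsers. Markup that a forgiving HTML browser typically renders anyway, such as an unclosed paragraph, overlapping tags, or an unquoted attribute, is rejected outright.

```python
import xml.etree.ElementTree as ET

SAMPLES = {
    "well-formed": "<p>Balance: <b>4562.03</b></p>",
    "unclosed element": "<p>Balance: 4562.03",
    "overlapping tags": "<p><b>Balance: 4562.03</p></b>",
    "unquoted attribute": "<p class=note>Balance</p>",
}

def is_well_formed(markup):
    """True if an XML parser accepts the markup; lenient HTML browsers typically render all four."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

for label, markup in SAMPLES.items():
    print(f"{label}: {is_well_formed(markup)}")
```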

    Modularity made it into the specification late in the process, and acknowledges the growing role that the Web is playing in handheld devices. Browsers on these devices won't need all XHTML elements, so XHTML allows subsets of elements. The primary focus for the next version of XHTML, already called XHTML 1.1, will be modularization.
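A device that supports only a subset of elements could check a page against that subset before trying to render it. The Python sketch below shows the idea. The element subset is hypothetical and chosen for illustration; it is not taken from any XHTML specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical element subset for a small handheld browser (illustration only).
HANDHELD_SUBSET = {"html", "head", "title", "body", "p", "a", "br", "em", "strong"}

def unsupported_elements(xhtml_text, subset=HANDHELD_SUBSET):
    """Return, sorted, the XHTML element names in the page that fall outside the subset."""
    found = set()
    for elem in ET.fromstring(xhtml_text).iter():
        if elem.tag.startswith(XHTML_NS):
            found.add(elem.tag[len(XHTML_NS):])
    return sorted(found - subset)

page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Rates</title></head>
  <body><p>Current rates:</p><table><tr><td>7.5%</td></tr></table></body>
</html>"""
print(unsupported_elements(page))  # ['table', 'td', 'tr']
```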

    The semantics of XHTML elements and their attributes are defined by the current HTML 4.01 specification. XHTML 1.0 specifies three XML document types that correspond to the three HTML 4.01 DTDs: Strict, Transitional, and Frameset. These XHTML DTDs are more restrictive than their HTML counterparts because XML's syntax is stricter. The table below lists the three DTDs and the DOCTYPE declaration for each.

    The recommendation describes a number of ways in which XHTML and HTML differ. Some stem from the relative sloppiness most browsers allow when rendering HTML, and others from the way XML does things. Most of the issues involved in displaying HTML pages as XHTML result from XML's validation and conformance requirements. An XHTML document must be properly structured, and markup that HTML lets authors omit, such as end tags for paragraphs and list items, must be included.
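The three XHTML 1.0 document types are identified by the formal public identifiers below, paired with the W3C-hosted DTD locations. The Python helpers `doctype` and `skeleton` are illustrative names, and the page content is a placeholder. The skeleton also reflects the stricter XML rules: lowercase element names, every element closed, and the XHTML namespace declared on the root `html` element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Formal public identifiers and W3C-hosted DTD locations for the three XHTML 1.0 document types.
XHTML1_DOCTYPES = {
    "Strict": ("-//W3C//DTD XHTML 1.0 Strict//EN",
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    "Transitional": ("-//W3C//DTD XHTML 1.0 Transitional//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    "Frameset": ("-//W3C//DTD XHTML 1.0 Frameset//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"),
}

def doctype(kind):
    """Return the DOCTYPE declaration for 'Strict', 'Transitional', or 'Frameset'."""
    public_id, system_id = XHTML1_DOCTYPES[kind]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'

def skeleton(kind, title):
    """Minimal Strict or Transitional page: lowercase tags, all elements closed, xmlns on <html>.
    (A Frameset page would use <frameset> in place of <body>.)"""
    return (
        f"{doctype(kind)}\n"
        '<html xmlns="http://www.w3.org/1999/xhtml" lang="en">\n'
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body><p>Hello, XHTML.</p></body>\n"
        "</html>\n"
    )

page = skeleton("Strict", "Rates & fees")
root = ET.fromstring(page)  # parses as XML; the external DTD is not fetched
print(root.tag)  # {http://www.w3.org/1999/xhtml}html
```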

    As with any new technology, a Web author has to decide whether to migrate pages from HTML or start from scratch and take full advantage of XHTML. There are a number of benefits to upgrading, as well as some major pitfalls.

    Because HTML is a pervasive standard and XML is becoming one, users can view carefully crafted XHTML documents in current versions of many browsers. XHTML documents can be served with any of three media types: text/html, text/xml, and application/xml. Scripting code that uses the HTML or XML document object models should work in the new format as well.
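To illustrate the DOM point, the same well-formed XHTML page can be loaded by a generic XML DOM implementation and walked with standard calls such as `getElementsByTagName` and `getAttribute`. In this sketch, Python's `xml.dom.minidom` stands in for a browser's scripting engine so that the example can run on its own. The page content is made up.

```python
from xml.dom import minidom

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Loan summary</title></head>
  <body>
    <p id="balance">Outstanding balance: 4562.03</p>
    <p id="payment">Monthly payment: 123.90</p>
  </body>
</html>"""

doc = minidom.parseString(PAGE)

# Ordinary DOM calls work because the XHTML page is also a well-formed XML document.
title = doc.getElementsByTagName("title")[0].firstChild.data
ids = [p.getAttribute("id") for p in doc.getElementsByTagName("p")]
print(title, ids)  # Loan summary ['balance', 'payment']
```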

    As the new standard is more widely adopted, Web editors are beginning to support XHTML, and some will automatically convert existing pages. Code translators have long been the Holy Grail of computer science, but there is a reasonable chance that HTML-to-XHTML tools will actually work reliably. Sloppy HTML code, acceptable to many browsers, will translate poorly in some cases. The new code must then be validated and checked against the HTML compatibility guidelines in the XHTML 1.0 recommendation. Following these guidelines means that the same code base can be used with XHTML-compliant browsers as well as those supporting straight HTML, as long as you avoid using new tag definitions. The W3C's XHTML Web site lists various tools, such as HTML Tidy, the HTML Kit Web editor (which supports HTML Tidy), and the WDG HTML Validator (tools/validator/batch.html).
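Some of the mechanical fixes such a translator makes can be sketched with Python's standard `HTMLParser`, which already lowercases tag and attribute names. The sketch quotes every attribute value, expands minimized attributes such as `checked` to `checked="checked"`, and writes empty elements like `<br>` as `<br />`. The class and function names are hypothetical, and this is a simplified illustration, not a substitute for HTML Tidy. It does not add missing end tags, which is where sloppy HTML translates poorly, and it does not special-case script or style content.

```python
from html import escape
from html.parser import HTMLParser

# HTML 4 elements that never have content; XHTML writes them as <br /> rather than <br>.
EMPTY_ELEMENTS = {"area", "base", "basefont", "br", "col", "frame", "hr",
                  "img", "input", "isindex", "link", "meta", "param"}

class XhtmlSerializer(HTMLParser):
    """Re-serializes HTML as XHTML-style markup (simplified sketch)."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entity references as written
        self.out = []

    def _attrs(self, attrs):
        # HTMLParser has already lowercased attribute names and unescaped the values.
        parts = []
        for name, value in attrs:
            if value is None:   # minimized attribute, e.g. <input checked>
                value = name    # XHTML form: checked="checked"
            parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        close = " /" if tag in EMPTY_ELEMENTS else ""
        self.out.append(f"<{tag}{self._attrs(attrs)}{close}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:  # stray </br> and similar are dropped
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

def to_xhtml(html_text):
    parser = XhtmlSerializer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

print(to_xhtml("<P CLASS=note>Line one<BR>Line two</P>"))
# <p class="note">Line one<br />Line two</p>
```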

    Despite the best efforts of the consortium, HTML has evolved in a less-than-orderly fashion. Because HTML is itself not extensible, browser vendors have added tags rather haphazardly. HTML has evolved at a faster pace than any standards body could possibly keep up with, so the HTML standard is mostly a codification of existing practice, rather than a source of innovation. As a result, any given HTML authoring tool supports--at best--a snapshot of HTML tags in use at a given time, no matter how fast the tool's author tries to keep up.

    Unfortunately, this means that today's favorite HTML editing tool may not be the tool of choice tomorrow, when XHTML becomes the norm--unless, of course, you write HTML in a plain text editor; then you're in fine shape to write new code. XHTML is too new for any of the major players to have made a commitment to support it. But with the rapid spread of support for XML, it would be surprising if all the major editors didn't rush to implement support for XHTML.

    During the transition to XHTML, validating code will be one of the biggest challenges. Validation is a process that verifies documents against the associated DTD, checking to make sure that the structure, elements, and attributes are consistent with the definitions in the DTD. Validating an XHTML 1.0 document involves verifying its markup against one of the three XHTML DTDs.
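Real validation means reading the DTD itself. Python's standard-library parsers do not validate, so dedicated validators or third-party libraries such as lxml are needed for that. The sketch below only illustrates the idea using a hand-written, hypothetical content model covering a handful of elements. For each element, it checks that the element is declared, that its child elements are allowed, and that its attributes are declared. These rules are a simplified stand-in, not the actual XHTML 1.0 Strict DTD, and text content is not checked.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical, heavily simplified content model: element -> (allowed children, allowed attributes).
CONTENT_MODEL = {
    "html": ({"head", "body"}, {"lang"}),
    "head": ({"title"}, set()),
    "title": (set(), set()),
    "body": ({"p"}, {"class"}),
    "p": ({"em", "a"}, {"class", "id"}),
    "em": (set(), set()),
    "a": ({"em"}, {"href"}),
}

def local(tag):
    """Strip the XHTML namespace from an ElementTree tag name."""
    return tag[len(NS):] if tag.startswith(NS) else tag

def validate(xhtml_text):
    """Return a list of problems; an empty list means the page fits the toy model."""
    problems = []
    for elem in ET.fromstring(xhtml_text).iter():
        name = local(elem.tag)
        if name not in CONTENT_MODEL:
            problems.append(f"undeclared element <{name}>")
            continue
        children, attributes = CONTENT_MODEL[name]
        for child in elem:  # element children only; text content is not checked
            if local(child.tag) not in children:
                problems.append(f"<{local(child.tag)}> not allowed inside <{name}>")
        for attr in elem.attrib:
            if attr not in attributes:
                problems.append(f"attribute {attr!r} not declared for <{name}>")
    return problems

bad = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>T</title></head>'
       '<body><p align="left">Hi</p><em>stray</em></body></html>')
print(validate(bad))
```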

    The W3C has an HTML Validation Service that's based on an SGML parser, with options such as including Weblint debugging results and displaying the parse tree. The good news is that when the HTML compatibility guidelines are followed, XHTML 1.0 documents can be rendered by HTML 4.0-compliant browsers. One way to use the W3C validator is to place a link to the validation service on your Web page. Clicking the link with your page loaded validates the page.

    XHTML 1.1 is under development and should make this next stage of Web technologies even more flexible. XML and HTML have a lot to offer each other. XML isn't the "HTML-killer" it was once touted to be, but teamed with its alleged victim in XHTML, it promises to take over the Web.
