Digital Web Magazine

The web professional's online magazine of choice.

Introduction to XML


By J. David Eisenberg

Published on September 17, 2003

What is XML?

If you've been producing sites for any length of time, you've probably heard about XML and wondered exactly why so many people are excited about it. XML stands for Extensible Markup Language. Let's take a closer look at each part of the acronym, and then we'll show you how it all fits together.

Markup

Markup comes from the bad old days before word processors. If you needed a brochure, you'd type it on a typewriter, and then literally mark it up with a red pen to tell the typesetter what you wanted it to look like. The typesetter would follow your instructions and return a finished document to you:

[Figure: Typewritten data with red-pen markup]

In this instance, we're using markup not only to show how text should be presented (italic rather than normal text), but also to show how the document is structured: some of the words form a heading, and the rest are ordinary text.

How to Buy a Wrench

There are two kinds of wrenches: wrenches with fixed size, and adjustable wrenches.

Language [1]

The idea of using markup to impose structure on otherwise anonymous data is such a good one that people came up with a standardized way to create markup languages for general use. This method is called the Standard Generalized Markup Language, or SGML. SGML isn't really a language in and of itself; it's more of a rulebook that tells you how to develop markup languages. Any markup language that follows the SGML rulebook is called an application of SGML.

The most widely known application of SGML is a language used to mark up text for delivery and presentation on the World Wide Web. That language is HTML, the HyperText Markup Language. In HTML, we can mark up the example above to send to a web browser instead of a typesetter:

<h3>How to Buy a Wrench</h3>
<p>There are two kinds of wrenches: wrenches with fixed size,
and <i>adjustable</i> wrenches.</p>

There are many other applications of SGML, but they're mostly found in large corporations and government agencies. That's because the SGML rulebook is very complex, which makes it hard to learn. For example, SGML allows optional opening and closing tags. Quick: is </li> required or not? How about <body>? Additionally, it's difficult (and expensive!) to develop tools that can manage data marked up according to those rules.
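By contrast, XML (introduced below) leaves no room for that quiz: every element must be explicitly closed, so a conforming parser refuses a document with an omitted end tag instead of guessing where it belongs. Here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` parser; Python and the helper name `is_well_formed` are our own choices for illustration, not something the article prescribes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML (an SGML application) lets you leave out </li>; an XML parser does not.
print(is_well_formed("<ul><li>one<li>two</ul>"))
print(is_well_formed("<ul><li>one</li><li>two</li></ul>"))
```

The first call prints `False` because the parser reaches `</ul>` while an `<li>` is still open; the second, with every tag closed, prints `True`.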

HTML Doesn't Do It All

HTML is a good thing, but it doesn't solve all our problems. Consider the following two tables. The data is structured into rows and cells, but nothing (other than your intuition) tells you that the first table gives maximum and minimum temperatures, while the second gives current and maximum capacities for water reservoirs.

<table border="1">
<tr>
  <td>Chicago</td><td>13</td><td>6</td>
</tr>
<tr>
  <td>Dallas</td><td>60</td><td>20</td>
</tr>
</table>
<table border="1">
<tr>
  <td>Calero</td><td>5538</td>
  <td>10050</td>
</tr>
<tr>
  <td>Uvas</td><td>6095</td>
  <td>9935</td>
</tr>
</table>
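A program reading these tables is no better off than you are. As a sketch, here is how the first table's cells might be pulled out with Python's standard-library `html.parser` module; the class name `CellCollector` is our own, chosen for illustration. All the program gets back is rows of strings, with nothing in the markup to say which number is a maximum and which is a minimum.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of each <td>, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self.in_cell = False  # True while we are inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # start a new row
        elif tag == "td":
            self.in_cell = True
            self.rows[-1].append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data  # add text to the current cell


html_table = """
<table border="1">
<tr><td>Chicago</td><td>13</td><td>6</td></tr>
<tr><td>Dallas</td><td>60</td><td>20</td></tr>
</table>
"""

collector = CellCollector()
collector.feed(html_table)
collector.close()
print(collector.rows)
```

The result is `[['Chicago', '13', '6'], ['Dallas', '60', '20']]`: any meaning attached to the second and third columns has to live in the program, not in the data.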

XML Solves the Problems

To solve the complexity issue, XML was designed as a subset of SGML. It eliminates the features that make SGML difficult to learn and parse while retaining most of SGML's power. Tools that analyze and display XML are easier to write, and they are widespread and inexpensive. Because XML keeps SGML's ability to define new markup, it lets you devise any set of tags you wish, which solves the problem of differentiating what would otherwise be anonymous numbers:

<temperatures>
<city name="Chicago">
	<max>13</max><min>6</min>
</city>
<city name="Dallas">
	<max>60</max><min>20</min>
</city>
</temperatures>
<water-banks>
<reservoir name="Calero">
   <current>5538</current>
   <capacity>10050</capacity>
</reservoir>
<reservoir name="Uvas">
   <current>6095</current>
   <capacity>9935</capacity>
</reservoir>
</water-banks>
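With named tags, a program can ask for values by meaning rather than by position. A minimal sketch using Python's standard-library `xml.etree.ElementTree`; the function `read_temperatures` is hypothetical and assumes each `<city>` contains exactly one `<max>` and one `<min>`.

```python
import xml.etree.ElementTree as ET

temperatures_xml = """<temperatures>
  <city name="Chicago"><max>13</max><min>6</min></city>
  <city name="Dallas"><max>60</max><min>20</min></city>
</temperatures>"""


def read_temperatures(xml_text):
    """Map each city name to a (max, min) tuple of integers."""
    root = ET.fromstring(xml_text)
    readings = {}
    for city in root.findall("city"):
        # Each value is looked up by its tag name, not by its column position.
        high = int(city.findtext("max"))
        low = int(city.findtext("min"))
        readings[city.get("name")] = (high, low)
    return readings


for name, (high, low) in read_temperatures(temperatures_xml).items():
    print(f"{name}: max {high}, min {low}")
```

Compare this with the HTML table, where the code would simply have to assume that the second cell is the maximum and the third the minimum.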

Extensibility [2]

With XML, you can devise tags for marking up all the data that appears on the weather page of the newspaper. With this custom markup, the purpose of each number in the table is unambiguous.

People have developed custom XML markup for such diverse content areas as chemical formulas, real estate descriptions, news stories, and even cooking recipes. This is the extensible part of XML: you extend it with whatever tags your content needs, which makes it a very flexible, customizable markup language.
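Because you choose the tag names, a program can write this kind of markup as easily as it reads it. A minimal sketch with Python's standard-library `xml.etree.ElementTree`, reusing the article's `<water-banks>` and `<reservoir>` vocabulary; the helper name `reservoir_report` is hypothetical.

```python
import xml.etree.ElementTree as ET


def reservoir_report(readings):
    """Build a <water-banks> element from (name, current, capacity) tuples."""
    root = ET.Element("water-banks")
    for name, current, capacity in readings:
        # The keyword argument becomes the name="..." attribute.
        reservoir = ET.SubElement(root, "reservoir", name=name)
        ET.SubElement(reservoir, "current").text = str(current)
        ET.SubElement(reservoir, "capacity").text = str(capacity)
    return root


doc = reservoir_report([("Calero", 5538, 10050), ("Uvas", 6095, 9935)])
print(ET.tostring(doc, encoding="unicode"))
```

The output is the same markup shown earlier, minus the indentation whitespace.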

XML and the Web

These custom tags are all well and good, but your browser, which is designed to interpret HTML [3] tags, doesn't understand <reservoir> or <city>.

If you're using the very latest browsers, you can use Cascading Style Sheets to tell a browser how to display your tags. For example, you could present <min> temperature tags in blue and <max> tags in red. That's a client-side solution.

Multi-Purpose Documents

An HTML-formatted weather report is good for only one purpose: web display. As we saw earlier, it's difficult to figure out what the numbers mean in a mass of HTML. If we have an XML-formatted weather report, however, we can use freely available XML tools to convert that one document into several other formats.

This, then, is why everyone is excited about XML. By carefully constructing a markup that shows your data's structure, you create your content once and use XML tools to pour that content into a variety of other molds.
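As a rough illustration of the single-source idea only, here is a Python sketch that renders the same `<temperatures>` document two ways: as an HTML table with labeled columns, and as plain text. The functions `to_html` and `to_text` are hypothetical; in practice you would likely reach for a dedicated XML transformation tool.

```python
import xml.etree.ElementTree as ET
from html import escape

temperatures_xml = """<temperatures>
  <city name="Chicago"><max>13</max><min>6</min></city>
  <city name="Dallas"><max>60</max><min>20</min></city>
</temperatures>"""


def to_html(root):
    """Render <temperatures> as an HTML table whose header names each column."""
    rows = ["<table>", "<tr><th>City</th><th>Max</th><th>Min</th></tr>"]
    for city in root.findall("city"):
        rows.append("<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(city.get("name")),
            escape(city.findtext("max")),
            escape(city.findtext("min"))))
    rows.append("</table>")
    return "\n".join(rows)


def to_text(root):
    """Render <temperatures> as one plain-text line per city."""
    return "\n".join(
        f"{c.get('name')}: high {c.findtext('max')}, low {c.findtext('min')}"
        for c in root.findall("city"))


root = ET.fromstring(temperatures_xml)
print(to_html(root))
print(to_text(root))
```

Both outputs come from one source document; adding a third format means writing one more function, not re-keying the data.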

Where Do I Go From Here?

If your work centers on web design and you don't need custom tags, you may want to start by producing your new web pages in XHTML (HTML written according to the XML rulebook). This won't give you the abstraction of custom tags, but it will make your documents available for processing by XML tools.
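Because XHTML follows the XML rulebook, a general-purpose XML parser can read it. A minimal sketch, assuming the article's wrench example has been wrapped in a small XHTML page (the `<head>` and `<title>` are our addition) that uses the standard XHTML namespace `http://www.w3.org/1999/xhtml`; the dictionary name `XHTML_NS` is our own.

```python
import xml.etree.ElementTree as ET

# Prefix "h" maps to the XHTML namespace for the search paths below.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>How to Buy a Wrench</title></head>
  <body>
    <h3>How to Buy a Wrench</h3>
    <p>There are two kinds of wrenches: wrenches with fixed size,
    and <i>adjustable</i> wrenches.</p>
  </body>
</html>"""

root = ET.fromstring(xhtml)

# Namespace-aware searches for every <h3> heading and every <i> phrase.
headings = [h.text for h in root.iterfind(".//h:h3", XHTML_NS)]
italics = [i.text for i in root.iterfind(".//h:i", XHTML_NS)]
print(headings)
print(italics)
```

The same page would still be served to browsers as ordinary web content; the XML parser is simply an extra consumer of it.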

If you're interested in a specific XML-based markup language such as the Mathematical Markup Language (MathML), VML, or SVG, go to the World Wide Web Consortium (W3C) site and look for resources for that particular language.

If you need to produce your own markup, you'd be well advised to read XML for the World Wide Web by Elizabeth Castro (Peachpit), Learning XML by Erik T. Ray (O'Reilly) or XML, HTML, XHTML Magic by Molly E. Holzschlag (New Riders).


Footnotes

[1] Technically, SGML and XML are meta-languages: languages used to describe other languages. However, it's easier to think of SGML and XML as "rulebooks."

[2] Technically, this is our interpretation of the X in XML rather than an official one. The XML specification itself doesn't say what extensible means, and we've yet to find a definition in any of the books on the subject.

[3] You may be wondering where XHTML fits into all of this. XHTML is simply the HTML you know and love, written according to the XML rulebook. Correctly written XHTML will display properly in any browser.


J. David Eisenberg is a programmer and instructor in San Jose, California, where he lives with his kittens, Marco and Zoë.
