Basic XML Parsing in Delphi

Copyright (c) 2000 by Charlie Calvert

Get Old Open XML Example Source (nodes.pas)

Get Updated Open XML Example Source (xdom.pas)

Get MS XML Example Source

(The Open XML Components recently changed. I have updated this paper to reflect those changes. April 05, 2000.)

XML allows you to store data in a standardized format that can be deciphered by a large number of tools. This paper explains how to get started parsing simple XML files in Delphi using a DOM parser. It is Part I of a two-part series. Part II is available now on the Borland Community site!

DOM is the Document Object Model. You can use the DOM to find the elements in an HTML or XML document. If you need a more complete explanation of DOM or XML, you should turn to other sources, as this paper assumes you know the basics about this subject. I discuss DOM at some length in my book Delphi 4 Unleashed.

This paper will explore two different technologies for parsing XML. The first is a set of components that are part of an open source Delphi project called Open XML. You can read about Open XML and download the Open XML native Delphi components at http://www.philo.de/xml/. The second technology I explore belongs to Microsoft. It is an XML parser that ships with Internet Explorer, and is therefore built into most versions of Windows.

Both the Open XML and Microsoft technologies do more or less the same thing, using objects that have more or less the same names, and which are arranged in almost identically structured hierarchies. In short, both technologies implement the same industry-wide standards. I would normally just use the Microsoft technology, since it is fairly well documented. But with Kylix looming on the horizon, I have decided to show two techniques: one that works on Windows, and one that I hope will work both on Windows and in Kylix. (I am, of course, also interested in Open XML because it is part of an Open Source project, and is thus available to all Delphi programmers at no charge.)

Open XML

Open XML is written entirely in Delphi. It comes in the form of a single Pascal file called XDOM.pas which is a somewhat daunting 8803 lines in length. However, you don't need to know all that code. In fact, to get started, you need only be familiar with a few simple objects and methods. Those methods and objects are the subject of this short paper.

Open XML implements DOM Core Level 2 and Level 2 Document Traversal. That's a fancy way of saying that it is a very complete and very up-to-date tool for working with XML. This component has the same classes and most of the same methods that you find in the MS XML parser, and indeed in most complete, standards-compliant XML parsers. For instance, it includes Delphi implementations of the DocumentElement and ChildNodes properties that are such a common feature in standard implementations of this technology.

Installing the Component

It is a simple matter to install the Open XML components. If you are used to installing Delphi components, you can probably safely skip this section. The first two paragraphs are meant to guide newcomers to Delphi component installation, while the third paragraph gives a few advanced tips.

To install this component, first download it from the Internet. It comes in the form of a zip file containing XDOM.pas and XDOM.dcr. Copy these files to a directory where you normally create packages. I like to put them in a directory on my D drive called d:\srcpas\packages, but you can place them wherever you want. One common place to put them is the Delphi Lib directory, though I personally don't like to keep code in directories that the Delphi installer or uninstaller may overwrite or delete. Placing files in the Lib directory means that they will automatically be on the Delphi library path, which can be convenient in some cases, particularly if you are using an older version of Delphi.

Now choose Component | Install Component from the Delphi menu. A dialog called Install Component is launched. By default, the Package File Name field should be set to dclusr50.dpk. If it is not, browse to the Delphi Lib directory and open this file. Now use the Browse button in the dialog to set the Unit File Name field to XDOM.pas. Press the OK button, and your package will be recompiled and two new components should be added to the Component Palette. To find them, scroll all the way to the end of the Component Palette and look for a new page called XML. On that page you should find two components, one called TDomImplementation and the other called TXmlToDomParser.

On my own machine, I actually prefer a different (and marginally more complex) method than the one I just described. First I choose File | New from the Delphi menu and elect to create a new package. I save the package in my d:\srcpas\packages directory under the name ThirdParty.dpk. I then close all other projects and open ThirdParty.dpk in the Delphi IDE, use the tools in the package editor to add XDOM.pas to the package, and compile it. The components are then installed automatically. (I use ThirdParty.dpk to install most other packages that I get in this way.) The big difference between this method and the one described above is that I keep ThirdParty.dpk in a directory that is safely sheltered from the Delphi install and uninstall process.
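For reference, the source of a package like ThirdParty.dpk might look roughly like the following. This is a minimal sketch, assuming Delphi 5 (hence the vcl50 runtime package in the requires clause) and that XDOM.pas sits in the same directory as the package file; the file Delphi actually generates on your machine may contain additional compiler directives.

```pascal
package ThirdParty;

{$R *.RES}
{$DESCRIPTION 'Third-party components'}
{$IMPLICITBUILD ON}

requires
  vcl50;  // Delphi 5 runtime VCL package (assumption: adjust for your version)

contains
  XDOM in 'XDOM.pas';  // the Open XML unit; add other third-party units here

end.
```

To add another vendor's unit later, you would append it to the contains clause (or use the package editor's Add button) and recompile.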

Figure 1: The Third Party package.

For more information on installing components, look up the following topic in the Delphi online help: "packages, insalling component." (That's actually not a typo, but the way the item really appears in the online help for Delphi 5.)

Building an XML Parsing Program

It's now time to start building your first XML application. Choose File | Close All from the File menu to be sure that you have completely closed the package you just finished installing. Now create a new application and drop a TButton, a TMemo and a TListBox on a Delphi form.

Drop both a TXmlToDomParser object and a TDomImplementation object on your form. You will find both of these objects on the XML page that was just added to the Component Palette. Now you need to link these two components together so they can work with one another. To do this, use the Object Inspector to set the DomImpl property of the TXmlToDomParser to the TDomImplementation object that resides on your form.
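If you would rather not drop the components on the form, the same link can be made in code. The sketch below assumes that both classes are ordinary TComponent descendants with the usual Create(AOwner) constructor, and relies on the DomImpl property named above; the unit name XmlSetup and the function CreateLinkedParser are hypothetical names chosen for this example.

```pascal
unit XmlSetup;

interface

uses
  Classes, XDOM;

// Creates a TDomImplementation and a TXmlToDomParser owned by AOwner and
// links them, which is what setting DomImpl in the Object Inspector does.
function CreateLinkedParser(AOwner: TComponent): TXmlToDomParser;

implementation

function CreateLinkedParser(AOwner: TComponent): TXmlToDomParser;
var
  Impl: TDomImplementation;
begin
  // Both components are freed automatically when AOwner is freed.
  Impl := TDomImplementation.Create(AOwner);
  Result := TXmlToDomParser.Create(AOwner);
  Result.DomImpl := Impl;
end;

end.
```

From a form you would typically pass Self as the owner, for example XmlParser := CreateLinkedParser(Self);, so that both components live exactly as long as the form.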

Create an OnClick handler for your button and enter the following lines of code in it:

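procedure TForm1.Button1Click(Sender: TObject);
var
  Doc: TDOMDocument;
  List: TDomNodeList;
  Len, i: Integer;
  S: string;
begin
  Memo1.Clear;
  ListBox1.Items.Clear;

  // Parse the file into a DOM tree (XDOM must be in the uses clause).
  // Assumes sam1.html sits in the same directory as the executable.
  Doc := XmlToDomParser1.FileToDom(ExtractFilePath(Application.ExeName) + 'sam1.html');

  // Show the markup of each child of the root (<HTML>) element.
  Len := Doc.DocumentElement.ChildNodes.Length;
  for i := 0 to Len - 1 do begin
    S := Doc.DocumentElement.ChildNodes.Item(i).Code;
    Memo1.Lines.Add(S);
  end;

  // Jump straight to every <CD> element, however deeply it is nested.
  List := Doc.DocumentElement.GetElementsByTagName('CD');
  for i := 0 to List.Length - 1 do
    ListBox1.Items.Add(List.Item(i).Code);
end;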

This code uses both of the components you just installed to parse an XML file and display its contents. It then walks the parsed document in two different ways, both of which should be of some interest to readers of this document. The two examples I've chosen will hopefully act as starting points from which you can branch out into a more detailed exploration of this technology.

In the line that calls FileToDom, you use the TXmlToDomParser component that you just installed to load an XML file, and it hands you back an instance of the TDomDocument class. TDomDocument represents an entire XML document. Here is a sample of a very simple XML file you can work with if you want:

<HTML>
  <HEAD>
    <TITLE>Sample XML File</TITLE>
  </HEAD>
  
  <BODY>
    <P>Right under this line I insert an XML data island.</P> 
    <XML ID="CDXML">
    <CDS>
      <CD>Two Against Nature</CD>
      <CD>Giant Steps</CD>
      <CD>Round About Midnight</CD>
      <CD>Imaginary Day</CD>
    </CDS>
    </XML>
  </BODY>
</HTML>  

Just save this file to disk as Sam1.xml or Sam1.html. Then you can parse it into a TDomDocument by writing something like this:

Doc := XmlToDomParser1.FileToDom('c:\temp\sam1.xml');

Of course, on your system, there might be a different path to the file than the one I show here.

The next thing you want to find out is how many nodes there are at the top level of your document. The nodes of the document are arranged hierarchically, so that as you dig into the document you find nodes buried within nodes. For instance, the <CD> node is buried within the <CDS> node. But at first, all we want is the top-level nodes, which are the children of the document's root element. In this file the root element is <HTML>, and the DocumentElement property gives you access to it. To find out how many children it has, write the following line of code:

Len := Doc.DocumentElement.ChildNodes.Length;

Now you want to iterate through these nodes and see what is in them. To get at the content of the nodes, write the following code:

var
  S: string;
begin
  ... // Code omitted here.
  for i := 0 to Len - 1 do begin
    S := Doc.DocumentElement.ChildNodes.Item(i).Code;
    Memo1.Lines.Add(S);
  end;
  ... // Code omitted here.
end;

This excerpt leaves out the surrounding code so that you can focus on the loop itself.

The output from this part of the program should look something like this:

  <HEAD>
    <TITLE>Sample XML File</TITLE>
  </HEAD>
  
  <BODY>
    <P>Right under this line I insert an XML data island.</P> 
    <XML ID="CDXML">
    <CDS>
      <CD>Two Against Nature</CD>
      <CD>Giant Steps</CD>
      <CD>Round About Midnight</CD>
      <CD>Imaginary Day</CD>
    </CDS>
    </XML>
  </BODY>

The code discards the top level HTML tags and divides the content of the HTML file into two sections, or nodes. The first node contains the HEAD element, and the second node contains the BODY element. You can, if you want, dig deeper into the document by using dot notation to find the ChildNodes of the ChildNodes. In short, you can discover the child nodes that comprise the two large nodes we parsed here. But I don't show how to find those child nodes in this paper, in part because there is a simpler way to dig down into the document, as shown in the next paragraph.
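If you do want to walk the child nodes yourself, a recursive procedure is the usual approach. The following sketch assumes that XDOM's TDomNode exposes the standard DOM properties NodeName and ChildNodes, and that TDomNodeList provides the Length and Item members used in the listing above; the unit name NodeWalk and the procedure ListNodes are hypothetical. Following the DOM standard, text nodes should report their name as #text, so they appear in the output alongside element names.

```pascal
unit NodeWalk;

interface

uses
  Classes, XDOM;

// Adds the name of every node below Node to Lines, indenting two spaces
// per level of nesting. Depth is the starting indentation level.
procedure ListNodes(Node: TDomNode; Lines: TStrings; Depth: Integer);

implementation

procedure ListNodes(Node: TDomNode; Lines: TStrings; Depth: Integer);
var
  i: Integer;
  Child: TDomNode;
begin
  for i := 0 to Node.ChildNodes.Length - 1 do
  begin
    Child := Node.ChildNodes.Item(i);
    Lines.Add(StringOfChar(' ', Depth * 2) + Child.NodeName);
    ListNodes(Child, Lines, Depth + 1);  // recurse into this node's children
  end;
end;

end.
```

After adding NodeWalk to the form's uses clause, a call such as ListNodes(Doc.DocumentElement, Memo1.Lines, 0); would print an indented outline of the whole sample file, from HEAD and BODY down to the individual CD elements.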

What we really want is to focus on particular nodes within the HTML file. For instance, we might want to find the nodes that list the CDs shown here by Steely Dan, John Coltrane, Thelonious Monk, and Pat Metheny. To get at this information, use the GetElementsByTagName method:

var
  List: TDomNodeList;
begin
  ... // Code omitted here.
  List := Doc.DocumentElement.GetElementsByTagName('CD');
  for i := 0 to List.Length - 1 do
    ListBox1.Items.Add(List.Item(i).Code);
end;

As you can see, you can go directly to a particular tag in your file. In this example, I simply iterate through the list of elements returned to me, and show a complete list of all the CDs in my little XML database:

   <CD>Two Against Nature</CD>
   <CD>Giant Steps</CD>
   <CD>Round About Midnight</CD>
   <CD>Imaginary Day</CD>

That's all you need to know to get started with this technology. There is, of course, much more to learn about these large open source components, but hopefully this simple example will get you started on the road to XML happiness.

Working with MS XML

To get started using Microsoft's tools, you need to import the Microsoft XML components into Delphi. If you have Internet Explorer 5.0 installed, the components are almost certainly already registered with your operating system, but not necessarily with Delphi itself. To register them with Delphi, choose Project | Import Type Library from the Delphi menu and scroll through the list of registered components until you find Microsoft XML, version 2.0. (If you can't find it, you should be able to use the browse button to locate MSXML.DLL in the c:\Windows\System or c:\winnt\system32 directory. After you select this DLL, Microsoft XML, version 2.0 should appear in the list.)

Figure 2: Importing the MS XML parser into Delphi.

Install the components by clicking the OK button. When you are done, you should find the components on the ActiveX page of the Component Palette. (If you don't have the Professional or Enterprise edition of Delphi 5, you may not be able to perform all these steps, in which case I suggest using the native Delphi components described above.)

Create a new application. Drop a button, a list box and a memo control on the form. Now add the TDOMDocument component from the ActiveX page of the Component Palette. (The ability to wrap COM servers in components is new to Delphi 5. Older versions of the product won't let you wrap the object as a component; instead, see the code I quote at the very end of this article.) Then create an OnClick handler for the button and enter the following code:

procedure TForm1.Button2Click(Sender: TObject);
var
  Len, i: Integer;
  ElemList: IXMLDOMNodeList;
begin
  // MSXML_TLB (generated by the type library import) must be in the uses clause.
  DOMDocument1.async := False;  // load synchronously so the tree is ready below
  DOMDocument1.load('c:\temp\sam1.xml');
  Len := DOMDocument1.documentElement.childNodes.length;
  for i := 0 to Len - 1 do begin
    Memo1.Lines.Add('Node Name: ' + DOMDocument1.documentElement.childNodes.item[i].nodeName);
    Memo1.Lines.Add('');
    Memo1.Lines.Add(DOMDocument1.documentElement.childNodes.item[i].text);
  end;

  ElemList := DOMDocument1.documentElement.getElementsByTagName('CD');
  for i := 0 to ElemList.length - 1 do
    ListBox1.Items.Add(ElemList.item[i].xml);
end;

As you can see, this code is nearly identical to the code shown in the Open XML example. As a result, I will not repeat the explanation found in the first part of this paper.
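One difference is worth knowing about: MSXML's load method runs asynchronously by default, so it is safest to set async to False before calling it and to check the value that load returns. The sketch below does both and reports failures through the parseError object. It also reads each element's text property instead of its xml property, which gives you just the CD titles without the surrounding tags. It assumes the MSXML_TLB unit produced by the type library import, including its CoDOMDocument helper class; the unit name MsXmlHelpers and both routine names are hypothetical.

```pascal
unit MsXmlHelpers;

interface

uses
  Classes, SysUtils, MSXML_TLB;

// Loads FileName synchronously. Returns nil and sets ErrorMsg if the
// file cannot be read or is not well-formed XML.
function LoadXmlFile(const FileName: string; out ErrorMsg: string): IXMLDOMDocument;

// Adds the text content (tags stripped) of every TagName element to Lines.
procedure AddElementText(Doc: IXMLDOMDocument; const TagName: string;
  Lines: TStrings);

implementation

function LoadXmlFile(const FileName: string; out ErrorMsg: string): IXMLDOMDocument;
var
  Doc: IXMLDOMDocument;
  Err: IXMLDOMParseError;
begin
  Result := nil;
  ErrorMsg := '';
  Doc := CoDOMDocument.Create;
  Doc.async := False;  // block until parsing is complete
  if Doc.load(FileName) then
    Result := Doc
  else
  begin
    Err := Doc.parseError;
    ErrorMsg := Format('Line %d, column %d: %s',
      [Err.line, Err.linepos, string(Err.reason)]);
  end;
end;

procedure AddElementText(Doc: IXMLDOMDocument; const TagName: string;
  Lines: TStrings);
var
  List: IXMLDOMNodeList;
  i: Integer;
begin
  List := Doc.documentElement.getElementsByTagName(TagName);
  for i := 0 to List.length - 1 do
    Lines.Add(List.item[i].text);  // .text drops the markup; .xml keeps it
end;

end.
```

In the button handler you could then write Doc := LoadXmlFile('c:\temp\sam1.xml', Msg); and, if Doc is nil, show Msg instead of working with an empty tree.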

Figure 3: Running the MSXMLTest program.

Here is what the code would look like if you are using an earlier version of Delphi that does not support wrapping COM objects in a component:

procedure TForm1.Button1Click(Sender: TObject);
var
  Doc: IXMLDOMDocument;
  Len, i: Integer;
  ElemList: IXMLDOMNodeList;
begin
  // Requires ComObj (for CreateOleObject) and MSXML_TLB in the uses clause.
  Doc := CreateOleObject('Microsoft.XMLDOM') as IXMLDOMDocument;
  Doc.async := False;  // wait for the parse to finish before using the tree

  Doc.load('c:\temp\sam1.html');
  Len := Doc.documentElement.childNodes.length;
  for i := 0 to Len - 1 do begin
    Memo1.Lines.Add('Node Name: ' + Doc.documentElement.childNodes.item[i].nodeName);
    Memo1.Lines.Add('');
    Memo1.Lines.Add(Doc.documentElement.childNodes.item[i].text);
  end;

  ElemList := Doc.documentElement.getElementsByTagName('CD');
  for i := 0 to ElemList.length - 1 do
    ListBox1.Items.Add(ElemList.item[i].xml);
end;

This code differs from the previous example in that it explicitly calls CreateOleObject to retrieve an instance of the IXMLDOMDocument interface. Delphi 5 performs this chore for you automatically when it wraps a COM server in a component. Once you have an instance of the interface you need, you can call it exactly as shown in the previous discussion.

Summary

In this paper you have seen some very simple examples of how to get started parsing XML documents in Delphi 5. There is much more to this technology than I show here, but these simple examples should get you started. More complex examples appear in the second article in this series.

If you are interested in learning more about XML, read Part II of this article, called Intermediate XML Parsing with Delphi.