Using tDOM and tDOM XSLT
A high-performance Tcl-scripted XSLT engine

Cameron Laird (Cameron@Lairds.com)
Vice president, Phaseit, Inc.
February 2002

tDOM is a high-performance, C-coded, DOM-oriented XML processor. tDOM XSLT is an XSLT engine built with tDOM that has extremely good performance in simple tests. tDOM and tDOM XSLT are open-source projects already in mission-critical production for several organizations. This article explains what you need to know to enjoy their advantages.

Simple benchmarks show that tDOM is one of the best-performing XML processors currently available. Access through the Tcl "scripting" language makes for a particularly potent development environment -- fast in both development and execution. A "dual level" (or two-language) model of development combines the advantages of Tcl and XSLT for different aspects of XML manipulation.

tDOM characteristics
tDOM is an open-source, C-coded DOM binding to the Tcl language, created and maintained by Jochen Loewer. tDOM incorporates a version of James Clark's expat, currently based on SourceForge release 1.95.1. expat is renowned for its quality and performance. The most recent tDOM release, the 0.7 alpha, supports DOM Level 2. (Addresses for tDOM and related information appear in Resources.)

Several commercial products already rely on tDOM, including these examples:

  • Ideogramic ApS uses it in its object-oriented CASE tool to parse and generate XML Metadata Interchange (XMI) documents.
  • BMEcat, as described on its home page, is "the most widespread (XML) exchange standard for electronic product catalogues in German-speaking countries"; tDOM is the basis for tools developed by independent consultant Rolf Ade to manage and edit BMEcat data.
  • Engineer Zoran Vasiljevic builds tDOM into AOLserver for very high performance DOM-based HTML generation.
  • Other applications involve complex product configuration and mobile data delivery.

Programming with tDOM
Several benefits motivated Loewer in his implementation of tDOM. It has a flexible object-oriented syntax. It's very speedy, which is a particular advantage with the large XML documents involved in enterprise-scale cataloguing and procurement work. It is also thrifty on memory usage, and boasts a convenient and powerful XPath implementation. Quantitative measurements of these advantages appear below.

It's easy to get started with tDOM, particularly on UNIX hosts. Current source downloads (see Resources) include Makefiles for both Mingw32 and Visual C++, although Loewer does not actively maintain the Visual C++ one. He does provide a tdom0.4.dll, built with Mingw32 for Tcl 8.2, on the download page.

For UNIX, generation and installation are conventional. The README in the root directory directs you to cd unix; ./configure; make; make install, all of which works smoothly on most systems I tried. It's not enough to have Tcl binaries installed: tDOM generation expects Tcl sources as well. I had slightly easier results with gcc than with the bundled compiler on Tru64 UNIX. You might prefer to replace ./configure in the sequence above with ./configure --enable-gcc.

Once you have tDOM installed, it's easiest to use it from Tcl. Begin with a simple example like this:

domBench.tcl

      #!/usr/local/bin/tclsh

      package require tdom
      cd tests
      source test.tcl

While the documentation in the docs directory of the tDOM source distribution is terse, what's there is accurate. A programmer who's familiar with XML concepts needs only a short time to begin coding productively with tDOM.
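For a first taste of the object-style syntax and the XPath support mentioned above, here is a minimal sketch. The sample catalog and its element names are invented for illustration, and the method names are those documented for current tDOM releases; the interim releases discussed in this article may differ in detail.

```tcl
package require tdom

# A made-up sample document, used only for illustration.
set xml {<catalog>
  <item id="a1"><name>Widget</name><price>4.50</price></item>
  <item id="b2"><name>Gadget</name><price>12.00</price></item>
</catalog>}

# Parse the string into a DOM tree; the result is a document object command.
set doc  [dom parse $xml]
set root [$doc documentElement]

# selectNodes evaluates an XPath expression relative to the node.
set names {}
foreach node [$root selectNodes {item[price > 5]/name}] {
    lappend names [$node text]
}
puts "Items over 5: $names"

# Read an attribute from the first item through the same object syntax.
set first   [lindex [$root selectNodes {item[1]}] 0]
set firstId [$first getAttribute id]
puts "First item id: $firstId"

# Document objects persist until released explicitly.
$doc delete
```

Each node is itself a Tcl command, so a query result can be used directly as the receiver of the next method call.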

tDOM saves time and space
Rolf Ade has done the most extensive benchmarking of tDOM, within the framework of the "Java XML Representations Benchmark" (XMLbench). The XMLbench "Full Download" includes ready-to-run .jars of:

  • Crimson v1.1
  • Xerces v1.2.0
  • JDOM b6
  • dom4j v0.2
  • Electric XML v1.4

The "Full Download" also has a small benchmarking framework, including a tiny DOS batch file to iterate through the tests, and three example XML instances:

  • much_ado.xml (202029 bytes; an XML markup of Shakespeare's play)
  • periodic_table.xml (116505 bytes; basic chemical data on the elements)
  • xml.xml (162471 bytes; the XML version of the official XML "recommendation")

Ade supplemented these with his own run.sh to port the tests to Linux:

run.sh

      #!/bin/bash
      # Classpath for the XMLbench harness and the parsers under test.
      CP=lib/xmlbm.jar:lib/jaxp.jar:lib/xerces.jar
      CP=$CP:lib/crimson.jar:lib/jdom.jar:lib/dom4j.jar:lib/EXML.jar
      java -Xms128M -Xmx128M -cp "$CP" com.sosnoski.xmlbench.XMLBench "$@"

For his experiments, Ade used two hosts:

  • Linux 2.2.13 on Pentium II 333MHz with 384 MB memory, running Java version 1.3.0 Classic VM 1.3.0 (IBM Corporation)
  • Win 2000 Professional on Celeron 466 MHz with 192 MB memory, running Java version 1.3.0rc1 Java HotSpot(TM) Client VM 1.3.0rc1-S (Sun Microsystems Inc.)

He used tDOM-0.62-2905 on Linux and tDOM-0.61 on Win 2000, both interim releases. He translated the Java-coded benchmarking code into this small Tcl program:

domBench.tcl

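      package require tdom

      # Time $count SAX-style parses of the document with the expat parser.
      proc SAXtest {xmlstring count} {
          set parser [expat]
          puts "Parsed document text in [time {time {
              $parser parse $xmlstring
              $parser reset} $count}]"
          $parser free
      }

      # Time $count DOM builds. Every tree but the last is deleted, and the
      # last document is returned to the caller.
      proc DOMbuild {xmlstring count} {
          puts "Build from parse in [time {time {
              set doc [dom parse $xmlstring]
              $doc delete} [expr {$count - 1}]
              set doc [dom parse $xmlstring]}]"
          return $doc
      }

      # Read the file once, then run both tests on the in-memory string.
      proc runTest {filename count} {
          set fd [open $filename]
          set xmlstring [read $fd]
          close $fd

          SAXtest $xmlstring $count
          set doc [DOMbuild $xmlstring $count]
      }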

Ade notes that the SAX speed comparison is biased slightly against Java, because the XMLbench implementing code includes a simple event handler. As his interest was in SAX parsing in isolation, he did not write a corresponding event handler for the tDOM SAX test. Tests have shown this slows the Java applications only "a little bit."
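For readers curious what a simple event handler on the tDOM side might look like, here is a small sketch. It is not part of Ade's benchmark: the handler names (onStart, onChars) and the sample document are assumptions made for illustration, and the callback options are the ones documented for tDOM's expat command.

```tcl
package require tdom

# Hypothetical handlers: tally element starts and character-data length.
set elementCount 0
set charCount    0

proc onStart {name attlist} {
    incr ::elementCount
}
proc onChars {data} {
    incr ::charCount [string length $data]
}

# Register the callbacks when the parser is created.
set parser [expat -elementstartcommand onStart \
                  -characterdatacommand onChars]
$parser parse {<play><act>To be</act><act>or not</act></play>}
$parser free

puts "$elementCount elements, $charCount characters"
```

Handlers like these run as Tcl procedures for every event, so a benchmark that includes them measures interpreter overhead as well as raw parsing.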

When to use tDOM
The final results? Ade summarizes: "Under Linux tDOM SAX is 4 times faster than Java, under Windows 3 times. tDOM DOM is around 4 times faster than the fastest Java solution under both platforms." Memory tests confirmed Ade's own intensive experience over 18 months of working with DOM commercially: "the tDOM DOM tree needs typically between 2 and 3.5 times memory of the XML file size ..." Common DOM parsing engines in commercial use bound to C and Java frequently require five to 30 (!) times as much memory as the base document.
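To give those ratios a sense of scale, the following sketch applies them to the three XMLbench sample files listed earlier. It is plain arithmetic on the quoted factors (2 to 3.5 times for tDOM, 5 to 30 times for the other engines), not a measurement, and the helper name kb is made up for illustration.

```tcl
# File sizes in bytes, as listed for the XMLbench sample documents.
set sizes {much_ado.xml 202029 periodic_table.xml 116505 xml.xml 162471}

# Convert a byte count to whole kilobytes.
proc kb {bytes} {
    expr {round($bytes / 1024.0)}
}

foreach {file bytes} $sizes {
    puts [format "%-20s tDOM %4d to %4d KB, others %5d to %5d KB" $file \
        [kb [expr {$bytes * 2}]]  [kb [expr {$bytes * 3.5}]] \
        [kb [expr {$bytes * 5}]]  [kb [expr {$bytes * 30}]]]
}
```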

This does not mean that tDOM applications are all three times as fast as comparable Java-based ones. Interpretive access of the DOM tree, as currently implemented with Tcl for tDOM, slows down processing considerably. Large-scale XML editors, for which parse time dominates and memory consumption can be critical, are a natural choice for tDOM, based purely on performance.

However, even projects that require relatively sluggish Tcl-coded characterdata programming are increasingly choosing tDOM. Its XPath capabilities are uniquely convenient. It also offers a fast, convenient command for in-memory DOM creation, stream-oriented SAX processing, DTD validation, partial support for XPointer and other XML "accessories," and, perhaps most crucially, tDOM XSLT.
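As an illustration of building a DOM tree in memory, here is a minimal sketch that uses only the basic node-creation methods documented for current tDOM releases. It does not attempt to show the specialized creation command the paragraph refers to, and the order document with its sku values is invented for the example.

```tcl
package require tdom

# Create an empty document whose root element is <order>.
set doc  [dom createDocument order]
set root [$doc documentElement]
$root setAttribute number 1001

# Append one <line> child per sku/quantity pair.
foreach {sku qty} {A-17 2 B-04 5} {
    set line [$doc createElement line]
    $line setAttribute sku $sku
    $line appendChild [$doc createTextNode $qty]
    $root appendChild $line
}

# Serialize the tree, then query it with XPath like any parsed document.
puts [$root asXML]
set lineCount [$root selectNodes {count(line)}]
puts "Lines in order: $lineCount"
```

A tree built this way is indistinguishable from one produced by the parser, so the same XPath and serialization calls apply to both.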

Those with extreme performance requirements should consider tDOM, as recent releases include a special "high gear." Independent consultant and frequent open-source volunteer Richard Hipp contributed an add-on streamlined parser to tDOM. Documents known to encode seven-bit ASCII or ISO-8859-1 with no external entities can request tDOM's -simple parser. This builds DOM trees twice as fast as the normal expat-based tDOM -- an order of magnitude faster than Java-coded engines!
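Requesting the streamlined parser is a one-option change. The sketch below parses the same made-up document both ways, checks that the resulting trees serialize identically, and times each path with Tcl's built-in time command. The iteration count of 200 is arbitrary, and timings on a document this small say little about real workloads.

```tcl
package require tdom

# Invented ASCII-only sample with no external entities.
set xml {<table><element symbol="H">Hydrogen</element><element symbol="He">Helium</element></table>}

# Build one tree with each parser and compare their serialized forms.
set d1 [dom parse $xml]
set d2 [dom parse -simple $xml]
set same [expr {[[$d1 documentElement] asXML -indent none] eq
                [[$d2 documentElement] asXML -indent none]}]
puts "Trees identical: $same"
$d1 delete
$d2 delete

# Time repeated parse-and-delete cycles for each parser.
puts "expat-based: [time {[dom parse $xml] delete} 200]"
puts "-simple:     [time {[dom parse -simple $xml] delete} 200]"
```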

tDOM only supports DOM 2 in an alpha release
If you have specific needs for full DOM 2 capabilities, though, the releases of tDOM before 0.7 are probably not for you. It's only with 0.7, scheduled for first release during the first week of February 2002, that tDOM fully supports namespaces. Earlier versions also did not support Unicode conversion, as Loewer and other existing tDOM users have been comfortable processing (external) XML data stores and (internal) DOM structures in the same formats. Moreover, earlier releases of tDOM did not handle complex entity declarations.

Loewer has been implementing and testing the methods DOM 2 requires, including setAttributeNS, getAttributeNS, and so on, since summer 2001. Does 0.7 alpha obsolete the 0.6 release? Apparently not; Loewer's versions have been upward compatible, almost without exception, and the current tDOM users with whom I've spoken expressed few urgent needs for DOM 2 capabilities. Note that TclDOM, described in an earlier developerWorks feature, Programming XML and Web Services in TCL, has supported DOM level 2 since spring 2000. TclDOM performance appears to fall roughly in the range of competitive Java engines.
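A short sketch of the namespace-aware attribute methods named above follows. The namespace URI, the cat prefix, and the element names are all hypothetical, and the argument order shown is the one documented for current tDOM releases; the 0.7 alpha may differ.

```tcl
package require tdom

# Hypothetical namespace URI, used only for this example.
set ns "http://example.com/ns/catalog"

set doc [dom parse "<cat:catalog xmlns:cat=\"$ns\"><cat:item cat:id=\"a1\"/></cat:catalog>"]
set item [[$doc documentElement] firstChild]

# Read an attribute by namespace URI and local name.
set id [$item getAttributeNS $ns id]

# Set an attribute by namespace URI and qualified name, then read it back.
$item setAttributeNS $ns cat:status new
set status [$item getAttributeNS $ns status]

puts "[$item localName] in [$item namespaceURI]: id=$id status=$status"
$doc delete
```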

tDOM XSLT
tDOM XSLT is a new implementation of XSLT based on tDOM. Recall that XSLT (Extensible Stylesheet Language Transformations) is a functional "language for transforming XML documents into other XML documents," as its current defining standard introduces it.

Here's how most organizations use XSLT: They maintain master-source documents, typically in XML. These masters are comprehensive and unified -- that is, each one is supposed to include all the information about a topic, including semantic content, formatting markup, and relations with other documents. To manifest a particular document in a format appropriate for end use, XSLT projects the XML master into a specific syntax. A common example is generation of HTML that's appropriate for conventional Web publication from corresponding XML sources.

XSLT is a high-level language that is generally very efficient at typical XML transformations. One advantage of tDOM XSLT is that it lets developers write their own XPath functions in Tcl for use from XSLT, just as tDOM does for DOM programming. This is an easy way to make XSLT applications extensible.
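Here is a rough sketch of a Tcl-coded XPath function, following the convention documented for current tDOM releases: an unknown XPath function name is looked up as a procedure in the ::dom::xpathFunc namespace, which receives context information plus type/value pairs for its arguments and returns a two-element list of result type and value. The function name shout is invented, and the interface in the releases this article covers may not match.

```tcl
package require tdom

namespace eval ::dom::xpathFunc {
    # Hypothetical extension function: upper-case its string argument.
    # args holds alternating type/value pairs, one pair per XPath argument.
    proc shout {ctxNode pos nodeListType nodeList args} {
        set value [lindex $args 1]
        return [list string [string toupper $value]]
    }
}

set doc [dom parse {<doc><title>hello</title></doc>}]

# The XPath engine hands the unknown function shout() to the Tcl procedure.
set result [[$doc documentElement] selectNodes {shout(string(title))}]
puts $result
$doc delete
```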

How to use tDOM XSLT
When you generate and install release 0.63 or later of tDOM, tDOM XSLT is present as a built-in bonus. tDOM XSLT is still in rapid development. When you are ready to work with it, you'll likely be best off contacting Loewer for the latest sources, because he doesn't always maintain corrections in public areas.

Release 0.63 does not yet include documentation for tDOM XSLT in the form of manual pages. However, the source distribution includes xslttest, an entire directory of examples. As with tDOM proper, the easiest place to start is probably by way of included sample code:


      #!/usr/local/bin/tclsh

      package require tdom
      cd tests
      source xslttest.tcl
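Beyond the bundled tests, a single transformation can be sketched in a few lines. This example assumes the xslt method as documented for current tDOM releases, where it is invoked on the source document with a parsed stylesheet and an optional result variable. Release 0.63 may expose the engine differently, and the stylesheet and input here are invented.

```tcl
package require tdom

set xml {<doc><title>Hello</title></doc>}
set xsl {<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/doc">
    <h1><xsl:value-of select="title"/></h1>
  </xsl:template>
</xsl:stylesheet>}

# Both the source and the stylesheet are ordinary parsed documents.
set srcDoc [dom parse $xml]
set xslDoc [dom parse $xsl]

# Apply the stylesheet; the result document is stored in resultDoc.
$srcDoc xslt $xslDoc resultDoc

set heading [[$resultDoc documentElement] selectNodes {string(.)}]
puts [$resultDoc asXML]
puts "Heading text: $heading"
```

Because the result is another tDOM document, it can be queried, modified, or fed into a second stylesheet without re-parsing.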

Performance of tDOM XSLT
Loewer has tracked tDOM XSLT's performance by testing it on one example XSL specification, the prettyprint.xsl test from XSLTMark. On his 600 MHz Linux host, one iteration using a tDOM XSLT compiled with gcc -O3 takes about 0.18 seconds. The XSLTMark site itself reports per-iteration times on a 500 MHz host of between 0.3 and 16 seconds. Microsoft XML 3.0 came in at 0.45 seconds, and Oracle XSLT 2.0 at 0.43 seconds. While tDOM XSLT remains a "very young" product, in Loewer's words, and no one has yet run direct comparisons on the same hardware, these measurements suggest that tDOM XSLT is quite competitive.

Dual-level XML applications
The performance and the memory thriftiness of tDOM XSLT are sure to catch the eye of readers who have been working with other XSLT engines. tDOM XSLT has other advantages, though.

One benefit of tDOM XSLT is its support of "dual-level" designs. This is a topic I introduced in Programming XML and Web Services in TCL. The idea is to script XSLT applications -- meaning, practitioners code as much of a program as is possible or convenient in XSLT, and then use small fragments of Tcl (or a comparable scripting language) to "glue" the XSLT body to external interfaces. XSLT has the efficiency, transparency (or even provability), and domain aptness of a good functional language. Tcl's expressivity, ease of learning, and extensive connections to "outside" resources including databases, graphical user interfaces (GUI), and system interfaces make for a great partnership between the two languages.

Consider this: Suppose your organization has an involved suite of precise XSLT programs that automate, for example, generation of a payment request from an invoice. This is an excellent job for XSLT, of course. However, Tcl is a good way to connect the application to real world data sources and sinks:

  • Tcl is good at filtering out attached XML instances from a stream of incoming e-mail.
  • Tcl can patch in data that require authentication against such data managers as the lightweight directory access protocol (LDAP).
  • Tcl can implement a GUI control panel that monitors financial summaries in real time by accumulating results that are called back from within the XSLT code.

From a managerial perspective, XSLT handles domain knowledge. XSLT captures organizational intelligence about the workflow of bids, purchase orders, invoices, and so on. Tcl has all the computer-related responsibilities of programming dataflow.
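The division of labor can be sketched in miniature. In the example below, a hypothetical stylesheet holds the business rule (an invoice becomes a payment request), while Tcl plays the glue role: it feeds in documents, runs the transformation, and accumulates a running total. The element names, vendors, and amounts are all invented, the invoices are inline strings standing in for data that would really arrive by e-mail or from a database, and the xslt method is the one documented for current tDOM releases.

```tcl
package require tdom

# Domain knowledge lives in XSLT: how an invoice maps to a payment request.
set xsl {<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/invoice">
    <paymentRequest payee="{vendor}" amount="{sum(line/@amount)}"/>
  </xsl:template>
</xsl:stylesheet>}
set xslDoc [dom parse $xsl]

# Stand-ins for documents that Tcl would collect from outside sources.
set invoices {
    {<invoice><vendor>Acme</vendor><line amount="10"/><line amount="5"/></invoice>}
    {<invoice><vendor>Globex</vendor><line amount="7"/></invoice>}
}

# Dataflow lives in Tcl: iterate, transform, and keep a running total.
set total 0
foreach xml $invoices {
    set in [dom parse $xml]
    $in xslt $xslDoc out
    set req [$out documentElement]
    puts [$req asXML -indent none]
    set total [expr {$total + [$req getAttribute amount]}]
    $in delete
    $out delete
}
puts "Running total: $total"
```

Changing the payment rule means editing only the stylesheet, while changing where invoices come from means editing only the Tcl loop.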

Plenty of other languages are also candidates as a "dual" to XSLT. Java, in particular, appears to be favored by many large organizations now. The efficiency of tDOM makes a compelling argument for Tcl, though; its tiny memory footprint in comparison with some of the Java DOM engines simply eliminates the memory barrier many current XML projects face. Tcl's succinctness and simplicity are equally good matches for the "glue" role I propose. Glue doesn't need the programming weight carried by Java's declarations and type safety. For this role, it's better to have a lightweight language with the intelligence to sort out types on its own, dynamically.

Summary
tDOM XSLT is a young project that's only now moving into production, mostly in Europe. On the other hand, it boasts attention-grabbing performance, and its first users have been successful with early trials of a "dual-level" style that marries XSLT and Tcl. If you want rapid development of speedy XSLT applications, tDOM XSLT deserves your attention.

Resources

About the author
Cameron Laird is a full-time developer for independent consultancy Phaseit, Inc. He writes frequently on XML and other technical subjects. You can contact him at Cameron@Lairds.com.

