Generating XSL for Schema Validation

Francis Norton

1999-05-20

Abstract

An XSL stylesheet can be used to generate XSL validators from XML schemas. This document outlines the mechanics of this process and speculates on other uses for this technology.

Precursors

This note builds directly on the idea of using XSL as a validator for XML Schemas described by Rick Jelliffe in Using XSL for Structural Validation.

Definitions

DCD
A schema language in XML for XML documents. See http://www.w3.org/TR/NOTE-dcd
XML-Data
A schema language supported in Microsoft's IE5 XML parser. See http://www.w3.org/TR/1998/NOTE-XML-data
XML Schema
The W3C proposed schema language. See http://www.w3.org/TR/xmlschema-1/
XSL
The XML stylesheet language. Includes an XML transformation subset, XSLT.
XSLT
The subset of XSL for transforming XML. Strong pattern matching and string processing functionality. Still a W3C Working Draft, available at http://www.w3.org/TR/WD-xslt

Why generate Schema Validators automatically?

It is possible to code by hand an XSL stylesheet that will validate an XML document against some or all constraints of an XML schema. This note presents the case for generating such Validator stylesheets automatically by transforming an XML schema through an XSL validator-generator. The resulting XSL validator can then be used at run-time to validate XML documents that claim to conform to the original XML schema, returning an XML document that contains a list of invalid elements or is, for a valid document, empty.

It is perfectly plausible to suppose that a single schema language such as XML Schema will emerge as a dominant standard, and that parsers from leading parser suppliers will come to have built-in support for validating XML documents against schemas in that language, in the same way that there is already support for XML-Data in the IE5 XML parser.

However the XSL approach has a number of specific advantages, namely that it will provide:

Independence and Portability

The approach is highly portable - any XML environment that supports the XSLT subset of XSL can be used to generate and run validators from XML schemas, and these validators will behave in precisely the same way regardless of parser supplier. This should encourage the take-up of Schemas by users who will no longer have to worry about the down-side of choosing a "wrong" schema language.

Support for the evolution of Schema Languages

This portability also has the advantage that it will permit schema languages to keep evolving along the space between utility and simplicity by reducing the pressure for a dominant single standard to freeze the evolutionary process.

A Case Study for XSLT

Finally this approach will demonstrate that XSLT has the power to take on tasks way beyond its obvious problem domain of document presentation.

Illustration: generating a validator for part of a DCD schema

This illustration generates an XSL validator for part of the element content of a DCD schema.

The Sample DCD Schema

The schema is a slightly simplified copy of an illustration in the DCD Note.
<?xml version="1.0"?>
<DCD xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax#" >
 <ElementDef Type="Booking" Model="Elements" Content="Closed">
   <Description>Describes an airline reservation</Description>
   <Element>LastName</Element> <Element>FirstInitial</Element>
   <Element>SeatRow</Element> <Element>SeatLetter</Element>
   <Element>Departure</Element> <Element>Class</Element>
 </ElementDef>

 <!-- example omits boring field declarations -->
 <ElementDef Type="SeatRow" Model="Data" Datatype="i1" Min="1" Max="72" />
 <ElementDef Type="SeatLetter" Model="Data" Datatype="char" Min="A" Max="K"/>
 <ElementDef Type="Class" Model="Data" Datatype="char" Default="1"/>
</DCD>

A stylesheet for generating (limited) validators

This stylesheet generates a validator stylesheet for part of the element content of a DCD schema to illustrate a possible approach.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

  <!-- match the root element -->
  <xsl:template match="/">
     <xsl:element name="zxsl:stylesheet">
       <xsl:attribute name="xmlns:zxsl">
        http://www.w3.org/TR/WD-xsl
      </xsl:attribute>

      <!-- the root processor -->
      <xsl:element name="zxsl:template">
        <xsl:attribute name="match">/</xsl:attribute>
        <xsl:element name="zxsl:apply-templates">
          <xsl:attribute name="select">*</xsl:attribute>
        </xsl:element>
      </xsl:element>

      <!-- stick the error processor in place -->
      <xsl:element name="zxsl:template">
        <xsl:attribute name="match">*</xsl:attribute>
        <P>
          Error:
          <xsl:element name="zxsl:node-name"/>
          <xsl:element name="zxsl:value-of"/>
        </P>
      </xsl:element>

      <xsl:apply-templates select="*|@*|comment()|text()"/>
     </xsl:element>
  </xsl:template>

  <!-- priority doesn't work in IE5 - make this the lowest priority match by putting it first... -->
  <xsl:template match="*|@*|comment()|pi()|text()">
      <xsl:apply-templates select="*|@*|comment()|pi()|text()"/>
  </xsl:template>

  <!-- pick up the top level ElementDefs -->
  <xsl:template match="DCD/ElementDef">
     <xsl:element name="zxsl:template">
      <xsl:attribute name="match">
        /<xsl:value-of select="@Type"/>
      </xsl:attribute>
      <xsl:element name="zxsl:apply-templates"/>
     </xsl:element>
    <xsl:apply-templates select="*|@*|comment()|pi()|text()"/>
  </xsl:template>

  <!-- now the content models -->
  <xsl:template match="ElementDef/Element">
     <xsl:element name="zxsl:template">
      <xsl:attribute name="match">
        <xsl:value-of select="../@Type"/>/<xsl:value-of/>
      </xsl:attribute>
      <xsl:element name="zxsl:apply-templates"/>
     </xsl:element>
    <xsl:apply-templates select="*|@*|comment()|pi()|text()"/>
  </xsl:template>

</xsl:stylesheet>
    

An auto-generated validator

This is the output, hand-formatted, from running the sample schema through the generator above.
<zxsl:stylesheet xmlns:zxsl="http://www.w3.org/TR/WD-xsl">
  <zxsl:template match="/">
    <zxsl:apply-templates select="*" /> 
  </zxsl:template>
  <zxsl:template match="*">
    <P>
    Error: 
    <zxsl:node-name /> 
    <zxsl:value-of /> 
    </P>
  </zxsl:template>
  <zxsl:template match="/Booking">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="Booking/LastName">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="Booking/FirstInitial">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="Booking/SeatRow">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="Booking/SeatLetter">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="Booking/Departure">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="Booking/Class">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="/SeatRow">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="/SeatLetter">
    <zxsl:apply-templates /> 
  </zxsl:template>
  <zxsl:template match="/Class">
    <zxsl:apply-templates /> 
  </zxsl:template>
</zxsl:stylesheet>
    

An XML document requiring validation

This was also borrowed from the DCD Note, and then deliberately corrupted: the Offence element is not declared in the schema.
<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
  <Offence>Speeding</Offence>
</Booking> 

Results of validation

The validator stylesheet reports an element that wasn't specified in the booking schema.
<P>Error: Offence Speeding</P>
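
For contrast, here is a sketch of the passing case. The instance below is hypothetical (it is the sample document with the Offence element removed, not a listing from the DCD Note). Every element in it is matched by one of the generated templates, so the validator should return an empty result.

```xml
<!-- Hypothetical instance: the sample document minus the Offence element -->
<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>33</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
</Booking>
```

Note that this limited validator only checks which elements may appear where. It does not check whether every element listed in the Booking content model is present, so the missing Class element goes unreported.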
  

Limitations

Although XSLT supports string processing, pattern matching and the generation and processing of lists of nodes, it is not clear whether pure XSLT has enough programming features to validate all the constraints of any one schema language (especially since some useful features such as Regular Expressions may still be added). Using JavaScript or Java extensions would provide a possible solution in this case.
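
As a concrete illustration of a constraint that goes beyond element structure, consider the hypothetical instance below (again, not a listing from the DCD Note). The sample schema declares SeatRow with Min="1" and Max="72", but the generated validator shown earlier contains only structural templates, so it would report nothing for this document.

```xml
<!-- Hypothetical instance: structurally acceptable, but SeatRow exceeds Max="72" -->
<Booking>
  <LastName>Bray</LastName><FirstInitial>T</FirstInitial>
  <SeatRow>99</SeatRow><SeatLetter>B</SeatLetter>
  <Departure>1997-05-24T07:55:00+1</Departure>
  <Class>1</Class>
</Booking>
```

Catching this would need the generator to emit a range check for each ElementDef that carries Min and Max, and whether that can be done in pure XSL depends on the comparison facilities the processor offers.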

Performance is obviously an issue if validation is to be performed at run-time, particularly in the transaction-like environment of middleware. Using XSL may well be slower than using a parser's built-in schema support. However there is likely to be considerable investment in optimising the performance of XSL, and there is also the possibility of pre-compiling an XSL validation stylesheet to Java (as with SAXON) or C, which would allow compilation or other optimisations.

Possible future uses of this approach

These uses are possibly more plausible, if only because they have lower performance requirements. But we have to start somewhere: the more schemas add value, the more they will be used in tools and projects, and the more ways of adding value will emerge.

Data Viewers and Report Formatters

Any schema is likely to tell us enough about the structure of the data involved that we can generate a default format for viewing or reporting the information. Key fields and relational integrity constraints can tell us how to group the data, while data types will tell us which fields may be totalled.
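
A minimal sketch of this idea follows, modelled on the validator generator above and written in the same dialect. It is untested and purely illustrative: the TABLE layout is an assumption, and it ignores key fields, grouping and data types. For each ElementDef with element content it generates a viewer template that lists each declared child element beside its value.

```xml
<?xml version="1.0"?>
<!-- Sketch only: untested, modelled on the validator generator above -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

  <!-- emit the viewer stylesheet and its root template -->
  <xsl:template match="/">
    <xsl:element name="zxsl:stylesheet">
      <xsl:attribute name="xmlns:zxsl">http://www.w3.org/TR/WD-xsl</xsl:attribute>
      <xsl:element name="zxsl:template">
        <xsl:attribute name="match">/</xsl:attribute>
        <xsl:element name="zxsl:apply-templates">
          <xsl:attribute name="select">*</xsl:attribute>
        </xsl:element>
      </xsl:element>
      <xsl:apply-templates select="DCD/ElementDef[@Model='Elements']"/>
    </xsl:element>
  </xsl:template>

  <!-- one viewer template per element type that has element content -->
  <xsl:template match="ElementDef">
    <xsl:element name="zxsl:template">
      <xsl:attribute name="match"><xsl:value-of select="@Type"/></xsl:attribute>
      <TABLE>
        <!-- one row per declared child: its name, then its value -->
        <xsl:for-each select="Element">
          <TR>
            <TD><xsl:value-of/></TD>
            <TD>
              <xsl:element name="zxsl:value-of">
                <xsl:attribute name="select"><xsl:value-of/></xsl:attribute>
              </xsl:element>
            </TD>
          </TR>
        </xsl:for-each>
      </TABLE>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample DCD schema, this should generate a single viewer template for Booking, with one table row for each of its six declared child elements.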

Entry Forms

Given that the purpose of a schema language is to hold all the information we need in order to validate a document, and given that most high-level programming tools hold their forms as templates which get interpreted at run-time, a schema could be used to generate a template for editing instances of conforming documents. In this case the generated template would, I assume, not be XSL so much as XML for some GUI environment. There are already tools (such as XFDL) using XML schemas to configure form editors, though I don't know how these existing tools were implemented. Using XSL for the build-time template generation, rather than hard-coding it in a low-level language, may speed the spread of this technology.
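
The sketch below shows roughly what such a build-time generator might look like. It is untested, and the output vocabulary (Forms, Form, Field and their attributes) is invented for illustration; a real GUI environment would define its own template format.

```xml
<?xml version="1.0"?>
<!-- Sketch only: untested; Forms, Form and Field are hypothetical names -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

  <!-- wrap all generated forms in one container element -->
  <xsl:template match="/">
    <Forms>
      <xsl:apply-templates select="DCD/ElementDef[@Model='Elements']"/>
    </Forms>
  </xsl:template>

  <!-- one form per element type, one field per declared child element -->
  <xsl:template match="ElementDef">
    <Form>
      <xsl:attribute name="for"><xsl:value-of select="@Type"/></xsl:attribute>
      <xsl:for-each select="Element">
        <Field>
          <xsl:attribute name="name"><xsl:value-of/></xsl:attribute>
        </Field>
      </xsl:for-each>
    </Form>
  </xsl:template>

</xsl:stylesheet>
```

A fuller generator would also carry each field's Datatype, Min, Max and Default through to the form, so that the editor could constrain input as the user types.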

EDI

EDI fits in a slightly less direct way: tools for converting EDI to and from XML (such as Redix) will - if they don't already - support the generation of matching XML schemas which would then become available for the viewing, reporting and editing functions.


Copyright (C) 1999 Francis Norton. Feel free to publish this in any way you like, but please keep my name on it and try to update it to the most recent version.