Strongly Typed XML in Java with XMLBeans

by Cezar Cristian Andrei
05/04/2004

This article shows how XML documents can be manipulated in Java applications in a strongly typed fashion. There isn’t a clear definition of a strongly typed language, but for the scope of this article, when I say strongly typed XML I mean it in the same way that Java is a strongly typed language: there are type checks at both compile time and runtime.
Download the author's files associated with this article


XMLBeans Status

XMLBeans was developed by BEA’s WebLogic Workshop team to solve the XML-to-Java data-binding problem. It shipped with BEA WebLogic Workshop 8.1 and was later accepted as an incubated project in the Apache community. Since then, a version 1 release has been produced and a second tree has been added to the project for the next version. Currently the version 1 tree remains very stable (only bug fixes get checked in), while heavy work is done in the v2 tree, keeping it as compatible as possible with the v1 public interfaces. In this article the examples for XMLBeans and the schema type system are based on v1, while the streaming part is based on the new v2 code.

The Problem

Let’s start with a simple XML purchase order like the one below. Because we are talking about strongly typed XML, we also need to define the constraints on this kind of document; they are described in the schema that follows.

Instance document:

  <purchase-order xmlns="http://openuri.org/purchase-order">
    <date>2003-12-19T12:21:00</date>
    <item description="recliner">
       <quantity>5</quantity>
       <price>759</price>
    </item>
    <item description="pen">
       <quantity>100</quantity>
       <price>3.99</price>
    </item>
    <!-- These are gifts -->
    <item description="iPod">
       <quantity>3</quantity>
       <price>399</price>
    </item>
  </purchase-order>

Schema:

  <xs:schema
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:po="http://openuri.org/purchase-order"
     targetNamespace="http://openuri.org/purchase-order"
     elementFormDefault="qualified">

    <xs:element name="purchase-order">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="date" type="xs:dateTime"/>
          <xs:element name="item" type="po:item" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:complexType name="item">
      <xs:sequence>
        <xs:element name="quantity" type="xs:int"/>
        <xs:element name="price" type="xs:float"/>
      </xs:sequence>
      <xs:attribute name="description" type="xs:string"/>
    </xs:complexType>

  </xs:schema>

In order to find out the total price of this purchase order, we have to do several things:
  • Parse the XML
  • Take the contents of the price and quantity elements for each item
  • Transform the contents into values of the right types (i.e., int, float)
  • Multiply price by quantity and add it to the total
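
For comparison, the steps above can be sketched without any data binding, using only the DOM parser that ships with the JDK. This is an illustrative sketch, not code from the article's download: the class name UntypedTotalSketch, the helper childText, and the inlined document are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class UntypedTotalSketch {
    static final String NS = "http://openuri.org/purchase-order";

    // The items of the purchase order, inlined so the sketch is self-contained.
    static final String PO_XML =
        "<purchase-order xmlns='" + NS + "'>"
      + "<item description='recliner'><quantity>5</quantity><price>759</price></item>"
      + "<item description='pen'><quantity>100</quantity><price>3.99</price></item>"
      + "<item description='iPod'><quantity>3</quantity><price>399</price></item>"
      + "</purchase-order>";

    // Hypothetical helper: text of the first descendant element with this local name.
    static String childText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS(NS, localName);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing element: " + localName);
        }
        return nodes.item(0).getTextContent().trim();
    }

    static float total(String xml) {
        try {
            // Step 1: parse the XML (namespace-aware, since the document uses a namespace).
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            float total = 0;
            NodeList items = doc.getElementsByTagNameNS(NS, "item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Steps 2 and 3: everything arrives as String and is converted by hand.
                int quantity = Integer.parseInt(childText(item, "quantity"));
                float price = Float.parseFloat(childText(item, "price"));
                // Step 4: accumulate.
                total += price * quantity;
            }
            return total;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse purchase order", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Total price: " + total(PO_XML));
    }
}
```

Nothing here is checked by the compiler: a misspelled element name or a wrong conversion only shows up at runtime.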
This sounds pretty easy, but several things are more delicate than they look. Validation is one of them: what should happen if the message does not conform to the schema? Some applications may not care much, as long as the nonconformance doesn’t affect the computation of the total price. It is also very important to make sure that a perfectly valid document doesn’t get rejected as invalid.
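
To make the validation question concrete, here is a minimal sketch that uses the JDK's javax.xml.validation API rather than XMLBeans. It assumes a cut-down schema that keeps only the date element; the class name ValidationSketch and the helpers purchaseOrder and isValid are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidationSketch {
    static final String NS = "http://openuri.org/purchase-order";

    // Cut-down schema (assumption): a purchase-order that holds only a date.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
      + "<xs:element name='purchase-order'><xs:complexType><xs:sequence>"
      + "<xs:element name='date' type='xs:dateTime'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Builds a tiny instance document around the given date text.
    static String purchaseOrder(String dateText) {
        return "<purchase-order xmlns='" + NS + "'><date>" + dateText
             + "</date></purchase-order>";
    }

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
    }

    // Returns true when the instance conforms to the schema, false otherwise.
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator()
                           .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, validation errors are thrown
        } catch (IOException e) {
            throw new IllegalStateException("could not read the instance", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(purchaseOrder("2003-12-19T12:21:00")));
        System.out.println(isValid(purchaseOrder("2003-12-19 12:21")));
    }
}
```

The second document differs only in how the date is written, yet it is rejected, because xs:dateTime requires the T separator and a seconds field. An application that only sums prices might prefer to skip validation and tolerate such a document.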

Usually the content of the quantity element arrives in a Java program as a java.lang.String from DOM, SAX, or StAX. We know that it should be an int, so it has to be parsed into a Java int. Even though the semantic spaces of xsd:int and Java int are equal, their lexical spaces differ slightly. For example, +5 is a valid xsd:int, but on JDKs before Java 7, Integer.parseInt("+5") throws a NumberFormatException. Likewise, xsd:int allows surrounding white space, which Integer.parseInt rejects on every JDK.
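
A hand-written conversion therefore has to bridge the two lexical spaces itself. The following is a minimal sketch of such a bridge. The class XsdIntLexicalSketch and the method parseXsdInt are hypothetical names, not part of XMLBeans or the JDK.

```java
class XsdIntLexicalSketch {

    // XML white space is limited to space, tab, line feed, and carriage return.
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Hypothetical helper: converts an xsd:int lexical form into a Java int.
    static int parseXsdInt(String lexical) {
        // xsd:int collapses white space, so strip it from both ends first.
        int start = 0;
        int end = lexical.length();
        while (start < end && isXmlSpace(lexical.charAt(start))) start++;
        while (end > start && isXmlSpace(lexical.charAt(end - 1))) end--;
        String s = lexical.substring(start, end);

        // Optional sign followed by ASCII digits only.
        if (!s.matches("[+-]?[0-9]+")) {
            throw new NumberFormatException("not an xsd:int lexical form: " + lexical);
        }
        // Drop a leading '+' so the call also works on JDKs that reject it;
        // Integer.parseInt still enforces the 32-bit range.
        return Integer.parseInt(s.charAt(0) == '+' ? s.substring(1) : s);
    }

    public static void main(String[] args) {
        System.out.println(parseXsdInt("+5"));
        System.out.println(parseXsdInt(" 100\n"));
    }
}
```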

We have to make sure that the XML values get translated into the right Java values. Java int isn’t the only type that has this problem; others do also, such as short, byte, float, double, and Integer. For date/time types it’s even worse, as the semantic space is different.
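
Two of these gaps can be illustrated with JDK classes alone. In the sketch below, parseXsdFloat maps the schema spellings INF and -INF, which Float.parseFloat does not accept, and parseXsdDateTime uses javax.xml.datatype to keep the distinction between a dateTime with a time zone and one without. The class and method names are assumptions made for the example, and the float helper is deliberately lenient: it still accepts Java-only forms such as 1f.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

class XsdLexicalGapsSketch {

    // Hypothetical helper: xsd:float writes infinity as INF, Java as Infinity.
    static float parseXsdFloat(String lexical) {
        String s = lexical.trim();
        if (s.equals("INF")) return Float.POSITIVE_INFINITY;
        if (s.equals("-INF")) return Float.NEGATIVE_INFINITY;
        return Float.parseFloat(s); // lenient: a strict parser would check the form first
    }

    // Hypothetical helper: parses an xsd:dateTime without forcing a time zone on it.
    static XMLGregorianCalendar parseXsdDateTime(String lexical) {
        try {
            XMLGregorianCalendar cal =
                DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical.trim());
            // The factory accepts all schema date/time types; keep only dateTime.
            if (!DatatypeConstants.DATETIME.equals(cal.getXMLSchemaType())) {
                throw new IllegalArgumentException("not an xsd:dateTime: " + lexical);
            }
            return cal;
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }

    // An xsd:dateTime may omit the time zone; java.util.Date cannot express that.
    static boolean hasTimeZone(XMLGregorianCalendar cal) {
        return cal.getTimezone() != DatatypeConstants.FIELD_UNDEFINED;
    }

    public static void main(String[] args) {
        System.out.println(parseXsdFloat("INF"));
        System.out.println(hasTimeZone(parseXsdDateTime("2003-12-19T12:21:00")));
        System.out.println(hasTimeZone(parseXsdDateTime("2003-12-19T12:21:00Z")));
    }
}
```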

XMLBeans can help users with this problem.

The Beans in XMLBeans

In order to get the price and quantity content, just compile the schema with the XMLBeans compiler:

scomp schemafile.xsd [options.xsdconfig]

The compiler takes the schema files and constructs a schema type system that contains the resolved information about all elements and types. Then, for each type in the schema type system, a Java bean interface is generated.

[Figure 1: generated Java interfaces (image not available)]

Note that a PurchaseOrderDocument interface has been generated to represent the type of a document that contains a purchase-order element. Inside it is a PurchaseOrder interface, which represents the type of the purchase-order element. These two types are not explicitly named in the XML schema, but in Java every type needs a name. The Java types and the schema types line up, making everything symmetric, for both types and instances. In addition, schema files should be given the same rank as Java sources.
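
As a rough picture of what the compiler produces, the nesting described above can be sketched as plain Java interfaces. This is a simplified reconstruction inferred from the method calls used elsewhere in this article, not the actual generated source. The real interfaces extend XmlObject and carry many more members, and the wrapper class GeneratedShapeSketch as well as the getters getDate and getDescription are assumptions.

```java
import java.util.Calendar;

// Wrapper class used only to keep the sketch in one compilable unit (assumption).
class GeneratedShapeSketch {

    // Represents a document whose root is a purchase-order element.
    interface PurchaseOrderDocument {
        PurchaseOrder getPurchaseOrder();
        PurchaseOrder addNewPurchaseOrder();

        // Represents the anonymous complex type of the purchase-order element.
        interface PurchaseOrder {
            Calendar getDate();            // xs:dateTime
            void setDate(Calendar date);
            Item[] getItemArray();         // item, maxOccurs="unbounded"
            Item addNewItem();
        }
    }

    // Represents the named complex type "item".
    interface Item {
        float getPrice();                  // xs:float
        void setPrice(float price);
        int getQuantity();                 // xs:int
        void setQuantity(int quantity);
        String getDescription();           // attribute, xs:string
        void setDescription(String description);
    }
}
```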

Accessing the Document Data

After compiling the schema files we can write this code:


  // Parse the instance into the generated, strongly typed document bean
  PurchaseOrderDocument pod = 
        PurchaseOrderDocument.Factory.parse(new File("po.xml"));
  PurchaseOrderDocument.PurchaseOrder po = pod.getPurchaseOrder();
        
  float total = 0;
  Item[] items = po.getItemArray();
  for (int i = 0; i < items.length; i++)
  {
     // The getters return Java primitives, already converted from the XML text
     float price = items[i].getPrice();
     int quantity = items[i].getQuantity();
     total += price * quantity;
  }

  System.out.println("Total price: " + total); 

You can see that everything in this document is strongly typed, so all the advantages of the Java language apply to your XML schema type system. A whole class of errors will be caught at compile time by the Java compiler.

As you can see in the message, one piece of information does not show up in our Java program: the comment. Most users are interested only in elements, attributes, and their content, but a few care about the entire XML infoset. If they need comments, processing instructions, white space, or the exact order of siblings, they have to use an API that is one level lower: the XmlCursor API. Using XmlCursor gives up the advantages of strong typing, but it provides access to everything in the XML document, and at any point you can find out the type of the XML you are positioned on and switch back to the strongly typed world.

Creating New Documents

With the generated beans, brand new documents can be created from scratch. A program that builds a document like the previous instance (stamped with the current date) could look like this:


        PurchaseOrderDocument pod = PurchaseOrderDocument.Factory.newInstance();
        PurchaseOrderDocument.PurchaseOrder po = pod.addNewPurchaseOrder();
        po.setDate(new GregorianCalendar());
        Item item1 = po.addNewItem();
        item1.setDescription("recliner");
        item1.setPrice(759);
        item1.setQuantity(5);
        Item item2 = po.addNewItem();
        item2.setDescription("pen");
        item2.setQuantity(100);
        item2.setPrice(3.99f);
        Item item3 = po.addNewItem();
        item3.setDescription("iPod");
        item3.setQuantity(3);
        item3.setPrice(399);

        pod.save(System.out, new XmlOptions().setSavePrettyPrint());

Schema Type System

XMLBeans supports 100% of XML schema. When schema type information is needed, use the SchemaTypeSystem API, which is the equivalent of the Java reflection API for XML schema types. A SchemaTypeSystem is a finite set of component definitions.

There are two ways of getting a schema type system:
  1. From a compiled bean:

     SchemaTypeSystem typeSystem = PurchaseOrderDocument.type.getTypeSystem();

  2. From an array of XmlObject instances that represent schema documents:

     SchemaTypeSystem typeSystem = XmlBeans.compileXsd(new XmlObject[]
       { XmlObject.Factory.parse(new File("po.xsd")) },
       XmlBeans.getBuiltinTypeSystem(),
       options);

Note that in the latter case there is no Java information associated with the schema types, since no Java interfaces or implementation classes were generated during the compilation.

Once we have a schema type system, all the information about the schema is available:
  • Global type definitions
  • Global element definitions
  • Global attribute definitions
  • Named model group definitions
  • Attribute group definitions
Also, not all types are global; XML schema allows definition of unnamed inner types. This is the standard way of finding all the schema types:


  List allSeenTypes = new ArrayList();
  allSeenTypes.addAll(Arrays.asList(typeSystem.documentTypes()));
  allSeenTypes.addAll(Arrays.asList(typeSystem.attributeTypes()));
  allSeenTypes.addAll(Arrays.asList(typeSystem.globalTypes()));
  for (int i = 0; i < allSeenTypes.size(); i++)
  {
     SchemaType sType = (SchemaType)allSeenTypes.get(i);
     System.out.println("Visiting " + sType.toString());
     allSeenTypes.addAll(Arrays.asList(sType.getAnonymousTypes()));
  }

Streaming

XMLStreamReader is the interface for reading XML documents defined by JSR 173 (StAX). It plays the same role as the SAX interfaces; the main difference is that the user pulls information out of an XMLStreamReader, whereas in SAX the information is pushed to the user program. In XMLBeans v2 the XMLStreamReader interface was extended so that the simple content values of elements and attributes can be read directly as typed Java values. The extended interface contains methods like int getIntValue() and float getFloatValue(). Its contract is the following:

  • The stream should be positioned on a start element, text, space, or CDATA
  • Simple content is expected, i.e. only text, CDATA, space, entity references, and comments; if a start element is encountered, an exception is thrown
  • If multiple text, CDATA, space, or entity reference events are encountered, their values are concatenated; comments are ignored
  • The appropriate white space collapsing rule is applied before the text is parsed into a value
  • No validation or matching against the schema is done; the simple value's schema type is implied only by the called method and, in some cases such as the date types, by the lexical value
For accessing the values in the attributes, there are two kinds of methods:


            int getAttributeIntValue(int attributeIndex)
            int getAttributeIntValue(String uri, String local)


For date values there are three Java types to represent them: GDate, XmlCalendar, and Date. For string values there are two kinds of methods: one returns the value as it appeared in the XML document, and the other returns the value after white space collapsing has been applied.
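
The element-content part of this contract can be approximated on top of the plain JSR 173 reader that ships with the JDK. The sketch below is not the XMLBeans v2 implementation. The class TypedPullSketch and its static helpers are hypothetical, it only supports starting on a start element, and it relies on the parser expanding entity references (the default).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TypedPullSketch {

    // Reads simple content up to the matching end element and collapses white space.
    static String collapsedContent(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new XMLStreamException("reader must be on a start element");
        }
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    text.append(reader.getText());   // concatenate all text pieces
                    break;
                case XMLStreamConstants.COMMENT:
                    break;                           // comments are ignored
                case XMLStreamConstants.END_ELEMENT:
                    return text.toString().replaceAll("[ \\t\\r\\n]+", " ").trim();
                default:                             // e.g. a nested start element
                    throw new XMLStreamException("simple content expected");
            }
        }
        throw new XMLStreamException("unexpected end of document");
    }

    // The target type is implied only by the method that is called.
    static int getIntValue(XMLStreamReader reader) throws XMLStreamException {
        return Integer.parseInt(collapsedContent(reader));
    }

    static float getFloatValue(XMLStreamReader reader) throws XMLStreamException {
        return Float.parseFloat(collapsedContent(reader));
    }

    // Sums price * quantity, assuming quantity precedes price inside each item.
    static float total(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            float total = 0;
            int quantity = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("quantity")) {
                        quantity = getIntValue(reader);
                    } else if (name.equals("price")) {
                        total += getFloatValue(reader) * quantity;
                    }
                }
            }
            return total;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read purchase order", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<po><item><quantity> 1<!-- split -->0 </quantity>"
                   + "<price><![CDATA[2.5]]></price></item></po>";
        System.out.println("Total price: " + total(xml));
    }
}
```

Because no document tree is built, this style suits large messages where only a few typed values are needed.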

Conclusion

XMLBeans should be used in 90% of cases when XML is accessed/created/manipulated in Java because of its:
  • Full fidelity representation and access to full XML info set
  • 100% XML schema support
  • Full-fledged XML schema type system API
