Tuesday, November 11, 2008

Generate XML Schemas from XML with inst2xsd

In an earlier blog entry, I wrote about using Trang to generate XML Schema from an XML source document. In this blog entry, I look at another tool for the same job: inst2xsd, which is part of Apache XMLBeans and likewise generates XML Schema from a source XML file.

In most cases, one would expect to define an XML Schema and then generate XML that is compliant with that schema. However, I have been surprised at the number of times that I have needed to approach XML and XML Schema from a "backwards" perspective. This is usually the case when I am provided with XML from another source and want to use a framework or tool that relies upon an XSD to work properly. For example, if I want to use JAXB's xjc binding compiler to generate Java classes but only have an example XML file and no XSD, a tool like inst2xsd is very helpful for generating that XSD.

The inst2xsd tool is part of Apache XMLBeans and so can be downloaded from one of the Apache download mirrors. The entire ZIP file is relatively small and the tool is located in the bin directory of the unzipped contents.

An attractive feature of inst2xsd is its simplicity. The following screen snapshot shows its relatively simple command usage. This output can be obtained by running inst2xsd without any options or source XML files or by running it with the -help option.


inst2xsd Help/Usage




As the usage/help information indicates, there are multiple XML Schema design patterns that can be employed when using inst2xsd to generate XML Schema files from source XML. The Sun document Introducing Design Patterns in XML Schema provides an overview of the three XML Schema design patterns that inst2xsd supports (Russian Doll, Salami Slice, and Venetian Blind [default in inst2xsd]) and also covers a fourth called Garden of Eden. The same four XML Schema design patterns are also covered in this presentation.

To demonstrate use of inst2xsd, a source XML file is required. The next code listing shows a simple XML file that will be used as the source in this example.


publications.xml


<?xml version="1.0"?>
<publications>

  <publication title="Applying Flash to Java: Flex and OpenLaszlo"
               publicationDate="2008-10-20"
               publisher="Colorado Software Summit"
               url="http://softwaresummit.org/2008/speakers/marx.htm"
               description="Using Flex and OpenLaszlo with Java EE.">
    <topics>
      <topic>Flash</topic>
      <topic>Java</topic>
      <topic>Flex</topic>
      <topic>OpenLaszlo</topic>
      <topic>RIA</topic>
      <topic>Web</topic>
    </topics>
  </publication>

  <publication title="Java Management Extensions (JMX) Circa 2008"
               publicationDate="2008-10-21"
               publisher="Colorado Software Summit"
               url="http://softwaresummit.org/2008/speakers/marx.htm"
               description="JMX in 2008 is simpler, more open, and more useful.">
    <topics>
      <topic>Java Management Extensions</topic>
      <topic>JMX</topic>
      <topic>Java</topic>
      <topic>Spring Framework</topic>
      <topic>Web Services</topic>
    </topics>
  </publication>

  <publication title="Basic Java Persistence API Best Practices">
    <topics>
      <topic>Java</topic>
      <topic>JPA</topic>
      <topic>ORM</topic>
      <topic>Oracle</topic>
      <topic>RDBMS</topic>
      <topic>Best Practices</topic>
    </topics>
  </publication>

  <publication title="Add Some Spring to Your Oracle JDBC Access"
               publicationDate="2005-11"
               publisher="Oracle Technology Network"
               url="http://www.oracle.com/technology/pub/articles/marx_spring.html"
               description="Use Spring Framework JDBC support with Oracle DB.">
    <topics>
      <topic>Spring Framework</topic>
      <topic>Oracle</topic>
      <topic>Java</topic>
      <topic>JDBC</topic>
      <topic>RDBMS</topic>
    </topics>
  </publication>

  <publication title="More JSP Best Practices"
               publicationDate="2003-07-25"
               publisher="JavaWorld"
               url="http://www.javaworld.com/javaworld/jw-07-2003/jw-0725-morejsp.html"
               description="More and updated tips for better JavaServer Pages.">
    <topics>
      <topic>JavaServer Pages</topic>
      <topic>JSP</topic>
      <topic>Best Practices</topic>
      <topic>Maintainability</topic>
      <topic>Web</topic>
    </topics>
  </publication>

  <publication title="JSP Best Practices"
               publicationDate="2001-11-29"
               publisher="JavaWorld"
               url="http://www.javaworld.com/javaworld/jw-11-2001/jw-1130-jsp.html"
               description="Tips for reusable and maintainable JavaServer Pages.">
    <topics>
      <topic>JavaServer Pages</topic>
      <topic>JSP</topic>
      <topic>Best Practices</topic>
      <topic>Maintainability</topic>
      <topic>Web</topic>
    </topics>
  </publication>

</publications>


The next screen snapshot shows the inst2xsd command run four times. Three of the runs use the -design option to select each available design pattern explicitly (Russian Doll via 'rd', Salami Slice via 'ss', and Venetian Blind via 'vb'). The fourth run specifies no design and so uses the default, which is equivalent to specifying Venetian Blind. The commands also use the -outDir option to set the output directory for the generated XSD files and the -outPrefix option to set the prefix of the generated file names.


Running inst2xsd with All Available Design Patterns




The next three code listings show the output from running inst2xsd as shown above. Note that I don't show the results of running the default design pattern because they are exactly the same as explicitly specifying Venetian Blind with the 'vb' setting.


russian_doll_schema0.xsd


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="publications">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="publication" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="topics">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element type="xs:string" name="topic" maxOccurs="unbounded" minOccurs="0"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute type="xs:string" name="title" use="optional"/>
            <xs:attribute type="xs:string" name="publicationDate" use="optional"/>
            <xs:attribute type="xs:string" name="publisher" use="optional"/>
            <xs:attribute type="xs:anyURI" name="url" use="optional"/>
            <xs:attribute type="xs:string" name="description" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>



salami_slice_schema0.xsd


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="topic" type="xs:string"/>
  <xs:element name="topics">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="topic" maxOccurs="unbounded" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="publications">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="publication" maxOccurs="unbounded" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="publication">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="topics"/>
      </xs:sequence>
      <xs:attribute type="xs:string" name="title" use="optional"/>
      <xs:attribute type="xs:string" name="publicationDate" use="optional"/>
      <xs:attribute type="xs:string" name="publisher" use="optional"/>
      <xs:attribute type="xs:anyURI" name="url" use="optional"/>
      <xs:attribute type="xs:string" name="description" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>



venetian_blind_schema0.xsd


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="publications" type="publicationsType"/>
  <xs:complexType name="publicationType">
    <xs:sequence>
      <xs:element type="topicsType" name="topics"/>
    </xs:sequence>
    <xs:attribute type="xs:string" name="title" use="optional"/>
    <xs:attribute type="xs:string" name="publicationDate" use="optional"/>
    <xs:attribute type="xs:string" name="publisher" use="optional"/>
    <xs:attribute type="xs:anyURI" name="url" use="optional"/>
    <xs:attribute type="xs:string" name="description" use="optional"/>
  </xs:complexType>
  <xs:complexType name="publicationsType">
    <xs:sequence>
      <xs:element type="publicationType" name="publication" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="topicsType">
    <xs:sequence>
      <xs:element type="xs:string" name="topic" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>



The above examples demonstrate how easy it is to generate an XML Schema definition file from an XML source file according to the desired XML Schema design pattern. One can decide which advantages and features matter most in a schema and then choose the pattern that most closely provides them. If you're wondering where the names Russian Doll, Salami Slice, and Venetian Blind come from in this context or, more specifically, what they have to do with XML Schemas, see this explanation. That same document also explains why different design patterns may be preferable for different situations.
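
For comparison, here is a hand-written sketch of how the fourth pattern mentioned earlier, Garden of Eden, might look for the same publications.xml. This is not inst2xsd output (the tool does not offer this design). It assumes the usual description of Garden of Eden, in which both elements and types are declared globally, and it simply reuses the type names from the Venetian Blind listing above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hand-written Garden of Eden sketch; NOT generated by inst2xsd. -->
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Every element is declared globally, as in Salami Slice... -->
  <xs:element name="publications" type="publicationsType"/>
  <xs:element name="publication" type="publicationType"/>
  <xs:element name="topics" type="topicsType"/>
  <xs:element name="topic" type="xs:string"/>
  <!-- ...and every complex type is named and global, as in Venetian Blind. -->
  <xs:complexType name="publicationsType">
    <xs:sequence>
      <xs:element ref="publication" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="publicationType">
    <xs:sequence>
      <xs:element ref="topics"/>
    </xs:sequence>
    <xs:attribute type="xs:string" name="title" use="optional"/>
    <xs:attribute type="xs:string" name="publicationDate" use="optional"/>
    <xs:attribute type="xs:string" name="publisher" use="optional"/>
    <xs:attribute type="xs:anyURI" name="url" use="optional"/>
    <xs:attribute type="xs:string" name="description" use="optional"/>
  </xs:complexType>
  <xs:complexType name="topicsType">
    <xs:sequence>
      <xs:element ref="topic" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

One consequence of declaring every element globally is that any of them could serve as the root of a valid document, which may or may not be what you want.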

The examples already covered demonstrate the central value of inst2xsd. However, it does have some other nice features as well. For example, as indicated in the usage information shown earlier, the -validate option can be used to validate the source XML against the just-generated XML Schema as part of the XML Schema generation process. This is shown in the next screen snapshot.


Validating Source XML with Generated XML Schema




There is also a -verbose option that prints considerably more output during XML Schema generation. Finally, it is worth noting that inst2xsd requires well-formed source XML to generate an XSD. When the source XML is not well-formed, an error message like one of the following will often be displayed:

XML Source Prolog Missing Closing Question Mark




XML Source Not Well-Formed Due to Never-Closed Tag
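
The two cases named in these captions are easy to reproduce. The fragments below are minimal, deliberately malformed illustrations that I am assuming for demonstration (they are cut-down stand-ins, not the exact files used for the snapshots), and the tool's exact error text is not reproduced here. In the first, the XML declaration is missing the question mark before its closing angle bracket:

```xml
<?xml version="1.0">
<publications>
</publications>
```

In the second, the publication element is opened but never closed:

```xml
<?xml version="1.0"?>
<publications>
  <publication title="JSP Best Practices">
    <topics>
      <topic>JSP</topic>
    </topics>
</publications>
```

Changing the declaration to end with ?> in the first case, and adding the missing closing publication tag in the second, makes each fragment well-formed again.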




The Apache XMLBeans tool inst2xsd is a simple but highly useful tool for generating XML Schema files from source XML. The tool allows different XML Schema design patterns to be employed. Once the XML Schema file is generated, it can be manually modified to be more descriptive or more restrictive in its definitions.
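
As one illustration of that kind of manual tightening, here is a hand-edited variant of the Venetian Blind schema shown earlier. The changes are my own assumptions about what a stricter schema might want, not something inst2xsd produces: title becomes required (every publication in the sample has one), and publicationDate gets a hypothetical publicationDateType that accepts either a full date such as 2008-10-20 or a year and month such as 2005-11.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hand-edited from venetian_blind_schema0.xsd; the edits are illustrative assumptions. -->
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="publications" type="publicationsType"/>
  <!-- Hypothetical type: a full date (xs:date) or a year and month (xs:gYearMonth). -->
  <xs:simpleType name="publicationDateType">
    <xs:union memberTypes="xs:date xs:gYearMonth"/>
  </xs:simpleType>
  <xs:complexType name="publicationType">
    <xs:sequence>
      <xs:element type="topicsType" name="topics"/>
    </xs:sequence>
    <!-- Changed from use="optional": every sample publication has a title. -->
    <xs:attribute type="xs:string" name="title" use="required"/>
    <!-- Changed from xs:string to the stricter union type defined above. -->
    <xs:attribute type="publicationDateType" name="publicationDate" use="optional"/>
    <xs:attribute type="xs:string" name="publisher" use="optional"/>
    <xs:attribute type="xs:anyURI" name="url" use="optional"/>
    <xs:attribute type="xs:string" name="description" use="optional"/>
  </xs:complexType>
  <xs:complexType name="publicationsType">
    <xs:sequence>
      <xs:element type="publicationType" name="publication" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="topicsType">
    <xs:sequence>
      <xs:element type="xs:string" name="topic" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The remaining attributes stay optional because the third publication in publications.xml supplies only a title.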
