
XML Schema Standards Recommendation

February 11, 2010

Document Purpose

The goal of this document is to recommend a set of standards and guidelines that should be applied when designing XML schemas for use in interoperability scenarios. This document is intended for a technical audience responsible for creating and/or modifying XML schemas. The reader should be familiar with XML, general use of development tools, and, preferably, object-oriented design.

Structural Guidelines

General Guidance

  • Avoid deep element nesting. The more complex an XML schema is, the more processing is required to parse and validate documents that use it. This guideline is especially important for schemas that are likely to be used by web services: the consumer must marshal its programming objects to XML elements before issuing a request, and the service provider must map the XML elements back to programming objects when processing the request. This can result in significant performance overhead for large documents.
  • Do not use unnamed (anonymous) complex types. When the schema is mapped or bound to programming objects, they can produce programming language constructs that are difficult to use.
  • Use explicit namespace prefixes. Some older parsers treat unprefixed elements in a default namespace as having no namespace.
  • Don’t use the <any> element to pass information. Schemas should be explicit, and the content expected in an <any> element is not documented in your XML schema. As a result, <any> elements almost always introduce ambiguity by reducing the ability of the schema to be self-describing. Additionally, <any> elements introduce the possibility of inconsistent structural validation.
  • All elements, types, and attributes defined in XML schemas should be annotated.
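A minimal sketch of these guidelines applied together, using the example namespace from Figure 2. The SurveyResponse element, the SurveyResponseType type, and their child elements are hypothetical names chosen for illustration: the complex type is named rather than anonymous, every component is annotated, the content is declared explicitly instead of through <any>, and all schema constructs use the explicit xs prefix.

```xml
<xs:schema xmlns:vha="http://schemas.ion.com/consulting/survey/1-0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://schemas.ion.com/consulting/survey/1-0"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Hypothetical root element; it refers to a named complex type instead of an anonymous one -->
  <xs:element name="SurveyResponse" type="vha:SurveyResponseType">
    <xs:annotation>
      <xs:documentation>A completed survey submitted by a respondent.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:complexType name="SurveyResponseType">
    <xs:annotation>
      <xs:documentation>Content of a survey response.</xs:documentation>
    </xs:annotation>
    <!-- Explicit child elements instead of xs:any, so the schema documents the expected content -->
    <xs:sequence>
      <xs:element name="RespondentName" type="xs:token">
        <xs:annotation>
          <xs:documentation>Name of the person who completed the survey.</xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="Comments" type="xs:token" minOccurs="0">
        <xs:annotation>
          <xs:documentation>Optional free-text comments.</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Because SurveyResponseType is named, data binding tools can generate a class with a predictable name for it, rather than deriving one from the enclosing element.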

Namespace Definition Guidance

Namespaces in XML schemas are used to provide a scope or logical space for the definitions described by the schema. Namespaces in XML serve a similar role to package names in Java and namespaces in .Net languages. When using multiple schemas in an XML document, namespaces help ensure the uniqueness of elements and attributes.

 A complete specification of the rules governing namespaces can be found at: http://www.w3.org/TR/REC-xml-names/ .

According to the specification, a namespace name can be any valid URI other than an empty string. While this provides a great deal of autonomy, it also creates a strong likelihood of namespace conflicts if no additional guidelines are applied.

I encourage you to leverage schema namespaces to help classify the purpose of an XML schema document. The recommended namespace format is:

http://schemas.{companyname}.com/{domain}/{purpose}/{version}

Figure 1 – Schema Namespace Format

where the items in braces are replaced with values appropriate for your application context. The following is an example of a well-defined schema namespace:

 http://schemas.ion.com/consulting/survey/1-0

Figure 2 – Example Namespace

Namespace Assignment Guidance

  • Always assign a target namespace to the schema.
  • Set the elementFormDefault attribute to “qualified”. A qualified setting requires elements in any instance document that uses the schema to be namespace-qualified.
  • Set attributeFormDefault to “unqualified”.

The following example shows these settings in a schema that uses the namespace from Figure 2:

<xs:schema xmlns:vha="http://schemas.ion.com/consulting/survey/1-0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://schemas.ion.com/consulting/survey/1-0"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- schema components go here -->
</xs:schema>

Figure 3 – Namespace-Related Settings
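With these settings, an instance document must namespace-qualify every element, including locally declared ones, while its attributes stay unqualified. A minimal sketch, assuming the schema in Figure 3 also declares a hypothetical global Survey element with an id attribute and a local Title child element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- elementFormDefault="qualified": Survey and the local Title element both use the vha prefix.
     attributeFormDefault="unqualified": the id attribute carries no prefix. -->
<vha:Survey xmlns:vha="http://schemas.ion.com/consulting/survey/1-0" id="S-001">
  <vha:Title>Customer Satisfaction</vha:Title>
</vha:Survey>
```

If Title were written without a prefix (and no default namespace were declared), a validating parser would reject it, because the schema expects the element in the target namespace.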

Programming Language Constraints

Schema designers should be aware of the constraints of the development tools most likely to be used to consume schemas. When possible, the constructs listed below should be avoided. Annotations are the exception: all schema components should still be annotated, even though some tools ignore annotations.

Java

  • In JAX-RPC 1.1, there are no constructs to represent:
    • Default values
    • Optional attributes
    • Annotations
  • In JAX-RPC 1.1, WSDL-to-Java mapping tools are allowed to reject documents that use:
    • <xs:redefine>
    • <xs:notation>
    • Substitution groups
  • In JAX-RPC 1.1, support for the following constructs is optional, so behavior varies by vendor platform:
    • xs:ENTITY
    • <xs:notation>
    • xs:IDREF
    • <xs:union>
    • xs:anyType
    • <xs:choice>
    • <xs:group>
    • Nested arrays and arrays of any subtype of the array type
    • Deriving complex types by restriction
    • Abstract types
  • JAX-WS schema support

.Net

  • Xsd.exe and XmlSerializer (System.Xml.Serialization) have the following restrictions:
    • <annotation>: Not supported
    • <any>: Partially supported
    • <anyAttribute>: Partially supported
    • <appinfo>: Not supported
    • <attributeGroup>: Partially supported
    • <documentation>: Not supported
    • <enumeration>: Partially supported
    • <field>: Not supported
    • <fractionDigits>: Not supported
    • <group>: Partially supported
    • <import>: Partially supported
    • <include>: Partially supported
    • <key>: Not supported
    • <keyref>: Not supported
    • <length>: Not supported
    • <list>: Partially supported
    • <maxExclusive>: Not supported
    • <maxInclusive>: Not supported
    • <maxLength>: Not supported
    • <minExclusive>: Not supported
    • <minInclusive>: Not supported
    • <minLength>: Not supported
    • <notation>: Not supported
    • <pattern>: Not supported
    • <redefine>: Not supported
    • <restriction>: Partially supported
    • <selector>: Not supported
    • <simpleContent>: Partially supported
    • <totalDigits>: Not supported
    • <union>: Not supported
    • <unique>: Not supported
    • <whiteSpace>: Not supported


Table 1 – Language Constraints

Datatype Utilization Guidance

  • Prefer the token data type (xs:token) to the string data type (xs:string) when extra spaces in a value are not significant. The lexical space of a data type is the set of valid literals it supports, and each value in its value space maps to one or more of those literals; for example, “100.0” and “1.0E2” are two representations of the same xs:double value. xs:token is derived from xs:string (through xs:normalizedString) but collapses whitespace when mapping a literal to its value: leading and trailing spaces are removed, and tabs, line breaks, and runs of spaces are reduced to a single space. As a result, “ this is a string ” is normalized to “this is a string” as a token, whereas a string keeps the value as “ this is a string ”. This may seem trivial, but such distinctions cause a surprising number of value comparison errors.
  • Avoid unsigned numeric types (xs:unsignedInt, xs:unsignedLong, and so on). Java has no unsigned primitive types, so the mapping from the schema type to a Java type may vary by implementation, which can result in unpredictable system behavior.
  • Avoid data type restriction facets (e.g., restricting a number to the range -180 to 180). Most platforms do not support restrictive subclassing when representing the XML structure as a language structure, so the facet is typically not enforced by the generated code.
  • Avoid decimal and floating-point numbers in XML schemas. Each platform may handle precision differently, which is especially important in cross-platform systems involving financial data.
  • Avoid SOAP-encoded arrays (soapenc:arrayType), since platforms differ in how they serialize empty arrays.
  • Avoid nillable primitive types (e.g., xs:int) and nillable date types in XML schemas. This practice can cause cross-platform issues because many languages do not allow null to be assigned to a primitive.
  • Avoid platform-specific serialization types, such as the .NET DataSet or Java collection classes. Such structures are not interoperable across platforms; represent them with complex types defined explicitly in the XML schema instead.
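A minimal sketch of declarations that follow these datatype guidelines. The fragment assumes it sits inside an xs:schema element configured as in Figure 3 (with the vha prefix bound to the target namespace); CustomerType, AddressType, and their child elements are hypothetical names. Using an optional element in place of a nillable one is shown as one common alternative, not the only option.

```xml
<!-- Fragment: these declarations belong inside an xs:schema element like the one in Figure 3 -->
<xs:complexType name="CustomerType">
  <xs:sequence>
    <!-- xs:token rather than xs:string: leading, trailing, and repeated spaces are not significant -->
    <xs:element name="Name" type="xs:token"/>
    <!-- xs:int rather than xs:unsignedInt, which has no direct Java primitive equivalent -->
    <xs:element name="OrderCount" type="xs:int"/>
    <!-- An optional element (minOccurs="0") rather than nillable="true" on a date type -->
    <xs:element name="LastOrderDate" type="xs:date" minOccurs="0"/>
    <!-- An explicit repeating element rather than soapenc:arrayType or a platform collection type -->
    <xs:element name="Address" type="vha:AddressType" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="AddressType">
  <xs:sequence>
    <xs:element name="Street" type="xs:token"/>
    <xs:element name="City" type="xs:token"/>
  </xs:sequence>
</xs:complexType>
```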

Schema Versioning

It is recommended that schemas support tracking changes using version numbers. Version numbers are controlled at two levels:

  • Major: widespread changes, most likely not backward compatible
  • Minor: small changes and refactoring that are backward compatible. Minor changes may introduce new features without removing or changing existing structures.

The major version number must be included in the target namespace of the XML schema; Figure 2 above shows an example. The minor version number should be specified in the schema’s version attribute, as illustrated in Figure 4 below.

<xs:schema xmlns:vha="http://schemas.ion.com/consulting/survey/1-0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://schemas.ion.com/consulting/survey/1-0"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified"
           version="2">
  <!-- schema components go here -->
</xs:schema>

Figure 4 – Schema Versioning
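As a sketch of a backward-compatible minor change, the hypothetical version below adds an optional element to the end of an existing sequence and increments the version attribute, while the target namespace (and therefore the major version) stays the same. SurveyType, Title, and Description are illustrative names; instance documents written against the earlier minor version should remain valid because the new element may be omitted.

```xml
<xs:schema xmlns:vha="http://schemas.ion.com/consulting/survey/1-0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://schemas.ion.com/consulting/survey/1-0"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified"
           version="3">
  <xs:complexType name="SurveyType">
    <xs:sequence>
      <!-- Hypothetical content from version 2, left unchanged -->
      <xs:element name="Title" type="xs:token"/>
      <!-- Added in version 3: optional, so version 2 instance documents still validate -->
      <xs:element name="Description" type="xs:token" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Removing Title or changing its type would not be backward compatible, so such a change would call for a new major version and a new namespace.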

Design Guidelines

General Guidance

Attributes Versus Elements

There are many cases in which data can be represented as an attribute or an element.

For example:

<item price='42.25'/>

Figure 5 – Attribute Form

or

<item><price>42.25</price></item>

Figure 6 – Element Form

There is no strict rule for when to use attributes versus elements. However, attributes are constrained: a given attribute can appear at most once on an element, and its value cannot contain child structure. The following guidance should inform your decision:

  • Attributes should be used for metadata about the parent element (item is the parent element above).
  • Use elements for data that has a meaning separate from the enclosing element.
  • If you don’t know which to use, then use the element form. It is more extensible.
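Applying this guidance to the item example, here is a hypothetical instance fragment in which the id and currency attributes are illustrative additions, not part of the original example. The attributes carry metadata, while the price itself, which has meaning of its own, remains an element:

```xml
<!-- id identifies the item: metadata about the parent element, so an attribute -->
<item id="A-100">
  <!-- price is data in its own right, so an element; currency qualifies it, so an attribute -->
  <price currency="USD">42.25</price>
</item>
```

Because price is an element, it was possible to attach the currency attribute later; in the attribute form of Figure 5, the price value has nowhere to carry metadata of its own.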

Global Elements

Declare an element globally only if it represents a valid instance document, because any global element declaration can be used as the root of an instance document. For example, if a schema includes a survey element and a survey person element, and you do not intend to process a survey person as a standalone instance document, declare the survey person locally as a child of the survey.

Complex types can be declared globally without this side effect.
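A minimal sketch of the survey example, using the namespace from Figure 2; the Survey, SurveyPerson, and Name elements and their type names are hypothetical. Only Survey is declared globally, so only a Survey can be the root of a valid instance document, while the complex types remain globally reusable.

```xml
<xs:schema xmlns:vha="http://schemas.ion.com/consulting/survey/1-0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://schemas.ion.com/consulting/survey/1-0"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Global element: a Survey is a valid standalone instance document -->
  <xs:element name="Survey" type="vha:SurveyType"/>
  <!-- Global named complex types: reusable, but never valid document roots on their own -->
  <xs:complexType name="SurveyType">
    <xs:sequence>
      <!-- Local element: a SurveyPerson can appear only inside a Survey -->
      <xs:element name="SurveyPerson" type="vha:SurveyPersonType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="SurveyPersonType">
    <xs:sequence>
      <xs:element name="Name" type="xs:token"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```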

Domain-Driven Design

XML schemas should be designed in a similar manner to an object- or component-oriented system. XML elements should reflect the language and structure of the business domain as closely as possible rather than any specific technical implementation. When possible, technical concerns should be factored out of schemas and transmitted as metadata (e.g., in a SOAP header).

For example, an XML schema describing a book in a library might be designed as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:vha="http://schemas.ion.com/library/book/1-0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://schemas.ion.com/library/book/1-0"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified"
           version="1">
  <xs:element name="Book" type="vha:BookType">
    <xs:annotation>
      <xs:documentation>A book held in the library.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <!-- Named complex type, following the guidance against unnamed complex types -->
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="Author" type="xs:token" maxOccurs="unbounded"/>
      <xs:element name="Pages" type="xs:int"/>
      <xs:element name="Subject" type="xs:token" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Figure 7 – Simple Book Schema

There are innumerable texts that describe how to identify objects and create domain driven designs; therefore, that content will not be reiterated within this document.

Design Patterns

As with programming languages, a set of design patterns has emerged for creating XML documents. An XML pattern catalog is available at http://www.xmlpatterns.com/patterns.shtml. These patterns represent solutions to common concerns when developing XML schemas, and they should be treated as a source of guidance for designing schemas rather than as a rule set that must be followed.

Schema Reuse

Where possible, a central schema repository should be created, and before any new schema is developed, the repository should be reviewed to determine whether an existing schema can be leveraged. All new schemas should be stored in this repository. Initially, organizations can use existing document or source management repositories for this purpose. As XML usage becomes more sophisticated, a dedicated XML repository can be considered if it provides a tangible benefit over the simpler solution (e.g., ESB, XML firewall, etc.).

Data defined in schemas should be non-redundant. XML schemas should import and include other XML schemas rather than duplicating or redefining types and elements locally.
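A minimal sketch of reuse through xs:include and xs:import. The file names survey-questions.xsd and common-types.xsd, the common namespace http://schemas.ion.com/common/types/1-0, the cmn prefix, and the AddressType and RespondentType types are all hypothetical, assumed only for illustration.

```xml
<xs:schema xmlns:vha="http://schemas.ion.com/consulting/survey/1-0"
           xmlns:cmn="http://schemas.ion.com/common/types/1-0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://schemas.ion.com/consulting/survey/1-0"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- xs:include brings in components from a schema with the same target namespace -->
  <xs:include schemaLocation="survey-questions.xsd"/>
  <!-- xs:import references components from a different namespace -->
  <xs:import namespace="http://schemas.ion.com/common/types/1-0"
             schemaLocation="common-types.xsd"/>
  <xs:complexType name="RespondentType">
    <xs:sequence>
      <!-- Reuses the shared address type instead of redefining it locally -->
      <xs:element name="Address" type="cmn:AddressType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The xs:include and xs:import declarations must appear before the schema’s other top-level components, which is why they come first here.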

Applicable Standards

  • XML 1.1
  • XML Schema 1.0

Please note that W3C XML Schema (.xsd) is the recommended format for developing schemas. Other formats, such as DTD and RELAX NG, are not preferred because of their less widespread adoption and their incompatibility with modern web service standards.

Tools

| Tool | Description |
| --- | --- |
| Xsd2Code | An open-source add-in for Visual Studio, available at http://www.codeplex.com/Xsd2Code, that generates C# or VB classes from an XML schema |
| XMLSpy | |
| Xsd.exe | A command-line tool included with Visual Studio that can generate an XSD from classes, or classes from an XSD |
| SchemaAgent | |

Table 2 – XML Tools

Appendix – Datatype Mappings

The following datatype mapping is provided to minimize compatibility issues across languages:

| XML Schema Type | .Net Type | Java Type |
| --- | --- | --- |
| anyURI | System.Uri | java.net.URI |
| base64Binary | System.Byte[] | byte[] |
| boolean | System.Boolean | boolean |
| byte | System.SByte | byte |
| date | System.DateTime | java.util.Calendar |
| decimal | System.Decimal | java.math.BigDecimal |
| double | System.Double | double |
| float | System.Single | float |
| hexBinary | System.Byte[] | byte[] |
| ID | System.String | java.lang.String |
| IDREF | System.String | java.lang.String |
| IDREFS | System.String[] | java.lang.String[] |
| int | System.Int32 | int |
| integer | System.Decimal | java.math.BigInteger |
| language | System.String | java.lang.String |
| long | System.Int64 | long |
| month | System.DateTime | long |
| Name | System.String | java.lang.String |
| NCName | System.String | java.lang.String |
| negativeInteger | System.Decimal | java.math.BigInteger |
| NMTOKEN | System.String | java.lang.String |
| NMTOKENS | System.String[] | java.lang.String[] |
| nonNegativeInteger | System.Decimal | java.math.BigInteger |
| nonPositiveInteger | System.Decimal | java.math.BigInteger |
| normalizedString | System.String | java.lang.String |
| NOTATION | System.String | java.lang.String |
| positiveInteger | System.Decimal | java.math.BigInteger |
| QName | System.Xml.XmlQualifiedName | javax.xml.namespace.QName |
| short | System.Int16 | short |
| string | System.String | java.lang.String |
| time | System.DateTime | java.util.Calendar |
| timePeriod | System.DateTime | java.util.Calendar |
| token | System.String | java.lang.String |
| unsignedByte | System.Byte | short |
| unsignedInt | System.UInt32 | long |
| unsignedLong | System.UInt64 | java.math.BigInteger |
| unsignedShort | System.UInt16 | int |

Table 3 – Datatype Mappings
