The Extensible Markup Language (XML) and Java technology are natural partners in helping developers exchange data and programs across the Internet. That's because XML has emerged as the standard for exchanging data across disparate systems, and Java technology provides a platform for building portable applications. This partnership is particularly important for Web services, which promise users and application developers program functionality on demand from anywhere to anywhere on the Web. XML and Java technology are recognized as ideal building blocks for developing Web services and applications that access Web services.
But how do you couple these partners in practice? More specifically, how do you access and use an XML document (that is, a file containing XML-tagged data) through the Java programming language? One way to do this, perhaps the most typical way, is through parsers that conform to the Simple API for XML (SAX) or the Document Object Model (DOM). Both kinds of parser are provided by the Java API for XML Processing (JAXP). Java developers can invoke a SAX or DOM parser in an application through the JAXP API to parse an XML document, that is, to scan the document and logically break it up into discrete pieces. The parsed content is then made available to the application.
In the SAX approach, the parser starts at the beginning of the document and passes each piece of the document to the application in the sequence it finds it. The parser saves nothing in memory. The application can take action on the data as it gets it from the parser, but it can't do any in-memory manipulation of the data. For example, it can't update the data in memory and return the updated data to the XML file.
In the DOM approach, the parser creates a tree of objects that represents the content and organization of data in the document. In this case, the tree exists in memory. The application can then navigate through the tree to access the data it needs and, if appropriate, manipulate it.
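To make the DOM approach concrete, here is a minimal sketch that uses the standard JAXP classes to build an in-memory tree, navigate it, and update it. The sample XML, its element names, and the helper names (DomBookDemo, parse, bookName, rename) are assumptions made for illustration; they are not taken from the actual books.xml document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBookDemo {
    // Hypothetical sample data; the real books.xml is richer than this.
    static final String XML =
        "<Collection>"
      + "<book><name>Book One</name><ISBN>111</ISBN></book>"
      + "<book><name>Book Two</name><ISBN>222</ISBN></book>"
      + "</Collection>";

    // Parses the XML text into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the <name> element of the book at the given position.
    static Element nameElement(Document doc, int index) {
        Element book = (Element) doc.getElementsByTagName("book").item(index);
        return (Element) book.getElementsByTagName("name").item(0);
    }

    // Navigates the tree to read a book's name.
    static String bookName(Document doc, int index) {
        return nameElement(doc, index).getTextContent();
    }

    // Updates the tree in place; possible because the whole tree is in memory.
    static void rename(Document doc, int index, String newName) {
        nameElement(doc, index).setTextContent(newName);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        int count = doc.getElementsByTagName("book").getLength();
        for (int i = 0; i < count; i++) {
            System.out.println(bookName(doc, i));
        }
        rename(doc, 0, "Book One, Second Edition");
        System.out.println(bookName(doc, 0));
    }
}
```

Note how much of the code deals with generic nodes and casts rather than with books; the program has to know the document's structure and walk it by hand.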
Suppose you need to develop a Java application that accesses and displays data in XML documents such as books.xml. These documents contain data about books, such as book name, author, description, and ISBN. You could use the SAX or DOM approach to access an XML document and then display the data. For example, suppose you took the SAX approach. In that case, you would need to:
- Write a program that creates a SAX parser and then uses that parser to parse the XML document. The SAX parser starts at the beginning of the document. When it encounters something significant (in SAX terms, an "event") such as the start of an XML tag, or the text inside of a tag, it makes that data available to the calling application.
- Create a content handler that defines the methods to be notified by the parser when it encounters an event. These methods, known as callback methods, take the appropriate action on the data they receive.
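The two SAX steps can be sketched roughly as follows, using the standard JAXP and SAX classes. The sample XML and the names SaxBookDemo, NameHandler, and bookNames are hypothetical, chosen only for this illustration; the handler collects the text of each <name> element and ignores everything else.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxBookDemo {
    // Hypothetical sample data standing in for books.xml.
    static final String XML =
        "<Collection>"
      + "<book><name>Book One</name><ISBN>111</ISBN></book>"
      + "<book><name>Book Two</name><ISBN>222</ISBN></book>"
      + "</Collection>";

    // Content handler: its callback methods are invoked by the parser
    // as it encounters events in document order.
    static class NameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <name> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("name".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString());
                text = null;
            }
        }
    }

    // Creates a SAX parser, parses the XML, and returns the collected book names.
    static List<String> bookNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NameHandler handler = new NameHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : bookNames(XML)) {
            System.out.println(name);
        }
    }
}
```

Even for this small task, the handler has to track its own state (here, whether the parser is currently inside a <name> element), because the parser itself keeps nothing.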
Now let's look at how you use JAXB to access an XML document such as books.xml and display its data. Using JAXB, you would:
- Bind the schema for the XML document.
- Unmarshal the document into Java content objects. The Java content objects represent the content and organization of the XML document, and are directly available to your program.
After unmarshalling, your program can access and display the data in the XML document simply by accessing the data in the Java content objects and then displaying it. There is no need to create and use a parser and no need to write a content handler with callback methods. What this means is that developers can access and process XML data without having to know XML or XML processing.
JAXB simplifies access to an XML document from a Java program by presenting the XML document to the program in a Java format. The first step in this process is to bind the schema for the XML document into a set of Java classes that represents the schema.
Schema: A schema is an XML specification that governs the allowable components of an XML document and the relationships between the components. For example, a schema identifies the elements that can appear in an XML document, in what order they must appear, what attributes they can have, and which elements are subordinate (that is, are child elements) to other elements.
Assume, for this example, that the books.xml document has a schema, books.xsd, that is written in the W3C XML Schema Language. This schema defines a <Collection> as an element that has a complex type. This means that it has child elements, in this case, <book> elements. Each <book> element also has a complex type, named bookType. The <book> element has child elements such as <name>, <ISBN>, and <author>. Some of these have their own child elements.
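As an illustration of what such a schema constrains, the sketch below embeds a deliberately simplified schema in a Java string and checks two documents against it with the standard javax.xml.validation API. The schema text is an assumption modeled on the description above, not the actual books.xsd; it flattens <author> to a plain string, and the class name BooksSchemaDemo and its constants are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class BooksSchemaDemo {
    // Simplified, hypothetical schema: a Collection of book elements,
    // each of complex type bookType with name, ISBN, and author children.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Collection'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='book' type='bookType' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "<xs:complexType name='bookType'><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='ISBN' type='xs:long'/>"
      + "<xs:element name='author' type='xs:string'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:schema>";

    static final String VALID =
        "<Collection><book><name>Book One</name><ISBN>111</ISBN>"
      + "<author>A. Writer</author></book></Collection>";

    // Invalid: the schema requires <name> to come before <ISBN>.
    static final String WRONG_ORDER =
        "<Collection><book><ISBN>111</ISBN><name>Book One</name>"
      + "<author>A. Writer</author></book></Collection>";

    // Returns true if the document conforms to the schema above.
    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no custom error handler, validation errors surface as SAXException.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid document accepted: " + isValid(VALID));
        System.out.println("misordered document accepted: " + isValid(WRONG_ORDER));
    }
}
```

The second document contains the same data as the first; it is rejected only because the schema fixes the order in which the child elements must appear.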
For example, the JAXB Reference Implementation provides a binding compiler, xjc, that you can invoke through scripts. A command of the following form binds the books.xsd schema:
    xjc -p test.jaxb -d work books.xsd
The -p option identifies a package for the generated classes, and the -d option identifies a target directory. So for this command, the classes are packaged in test.jaxb within the work directory. In response, the binding compiler generates a set of interfaces and a set of classes that implement the interfaces. Here are the interfaces it generates for the books.xsd schema:
Because the classes are implementation-specific, classes generated by the binding compiler in one JAXB implementation will probably not work with another JAXB implementation. So if you change to another JAXB implementation, you should rebind the schema with the binding compiler provided by that implementation. In total, the generated classes represent the entire books.xsd schema. Notice that the classes define get and set methods that are used to obtain and specify, respectively, data for each type of element and attribute in the schema. You then compile the generated interfaces and classes.
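To give a feel for the get and set style, here is a hand-written sketch of the kind of content objects a program ends up working with. It is not output from any binding compiler: real generated code is implementation-specific and split into interfaces and implementation classes. The names ContentTreeSketch, Book, and Collection, and the properties shown, are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class ContentTreeSketch {
    // Hypothetical stand-in for the type bound to a <book> element.
    static class Book {
        private String name;
        private long isbn;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public long getISBN() { return isbn; }
        public void setISBN(long isbn) { this.isbn = isbn; }
    }

    // Hypothetical stand-in for the type bound to the <Collection> root element.
    static class Collection {
        private final List<Book> books = new ArrayList<>();

        // Returns the live list, so callers add or remove books through it.
        public List<Book> getBook() { return books; }
    }

    // Builds a one-book tree, as a program might when creating content from scratch.
    static Collection sampleTree() {
        Book book = new Book();
        book.setName("Book One");
        book.setISBN(111L);
        Collection collection = new Collection();
        collection.getBook().add(book);
        return collection;
    }

    public static void main(String[] args) {
        Collection tree = sampleTree();
        for (Book b : tree.getBook()) {
            System.out.println(b.getName() + " (" + b.getISBN() + ")");
        }
        // Updating data is a plain method call on the object that holds it.
        tree.getBook().get(0).setName("Book One, Second Edition");
        System.out.println(tree.getBook().get(0).getName());
    }
}
```

The point of the sketch is the programming model: the application reads and updates typed Java properties instead of walking generic nodes or reacting to parser events.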
Unmarshalling an XML document means creating a tree of content objects that represents the content and organization of the document. The content tree is not a DOM-based tree. In fact, content trees produced through JAXB can be more efficient in terms of memory use than DOM-based trees. The content objects are instances of the classes produced by the binding compiler. In addition to providing a binding compiler, a JAXB implementation must provide runtime APIs for JAXB-related operations such as marshalling. The APIs are provided as part of a binding framework.
To unmarshal an XML document, you:
- Create a JAXBContext object. This object provides the entry point to the JAXB API. When you create the object, you need to specify a context path. This is a list of one or more package names that contain interfaces generated by the binding compiler. By allowing multiple package names in the context path, JAXB allows you to unmarshal a combination of XML data elements that correspond to different schemas.
- Create an Unmarshaller object. This object controls the process of unmarshalling. In particular, it contains methods that perform the actual unmarshalling operation.
- Call the unmarshal method. This method does the actual unmarshalling of the XML document.
Recall that the classes that a JAXB compiler generates for a schema include get and set methods you can use to obtain and specify, respectively, data for each type of element and attribute in the schema.
You can validate source data against an associated schema as part of the unmarshalling operation.
Instead of accessing data in an XML document, suppose you need to build an XML document through a Java application. Here too, using JAXB is easier. Let's investigate.
Bind the Schema
This is the same operation you perform prior to unmarshalling a document. In this case, the schema is for the XML document you want to build. Of course, if you've already bound the schema (for instance, you unmarshalled an XML document, updated the data, and now want to write the updated data back to the XML document), you don't have to bind the schema again.
Create the Content Tree
The content tree represents the content that you want to build into the XML document. You can create the content tree by unmarshalling XML data, or you can create it using the ObjectFactory class that's generated by binding the appropriate schema.
Marshalling is the opposite of unmarshalling. It creates an XML document from a content tree. To marshal a content tree, you:
- Create a JAXBContext object, and specify the appropriate context path, that is, the package that contains the classes and interfaces for the bound schema.
- Create a Marshaller object. This object controls the process of marshalling. In particular, it contains methods that perform the actual marshalling operation. The Marshaller object has properties that you can set through the setProperty method. For example, you can specify the output encoding to be used when marshalling the XML data. Or you can tell the Marshaller to format the resulting XML data with line breaks and indentation.
- Call the marshal method. This method does the actual marshalling of the content tree. When you call the method, you specify an object that contains the root of the content tree, and the output target.
A content tree can also be validated on demand through a Validator object:
    Validator validator = jaxbContext.createValidator();
    validator.validate(collection);
Here, by comparison, is a JAXB program that updates an XML document. Specifically, it updates an unmarshalled content tree and then marshals it back to an XML document. Notice how JAXB simplifies the process. The program has direct access to the object it needs to update. The program uses a get method to access the data it needs, and a set method to update the data.
Although it's tempting to think that the XML data can make a "roundtrip" unchanged, there's no guarantee of that. In other words, if you use JAXB to unmarshal an XML document and then marshal it back to the same XML file, there's no guarantee that the XML document will look exactly the same as it did originally. For example, the indentation of the resulting XML document might differ from the original. The JAXB specification does not require that the XML information set be preserved in a roundtrip from XML document to Java representation to XML document. But it doesn't forbid preserving it either.
For example, the XML built-in datatype xsd:string must be bound to the Java data type java.lang.String. Sometimes, though, the default binding is not what you need. Suppose you want an XML data type mapped to a Java data type that is different from the type called for by the default binding specification. Or you want the binding compiler to assign a name of your choice to a class that it generates.
All binding declarations are in an annotation element and its subordinate appinfo element. In fact, all inline binding declarations must be made this way.
Make global customizations: The <jaxb:globalBindings...> element specifies binding declarations that have global scope. In JAXB, binding declarations can be specified at different levels, or "scopes." Each scope inherits from the scopes above it, and binding declarations in a scope override binding declarations in scopes above it.
- Add method signatures. The declaration generateIsSetMethod="true" tells the binding compiler to generate isSet methods for the properties of all generated classes.
- Change binding style. By default, schema components that have complex types and that have a content type property of mixed or element-only are bound with a style called element binding. In element binding, each element in the complex type is mapped to a unique content property. Alternatively, you can change the binding style to model group binding by specifying bindingStyle="modelGroupBinding" choiceContentProperty="true". In model group binding, schema components that have a complex type and that are nested in the schema are mapped to Java interfaces. This gives users a way to specifically customize these nested components.