CSC309 Tutorial -- DOM
Edward Xia, February

Document Object Model (DOM)

DOM
-- Supports navigating and modifying XML documents
-- Hierarchical tree representation of documents
DOM is a language-neutral specification
-- Bindings exist for Java, C++, CORBA, JavaScript
DOM versions
-- DOM Level 1 (1998)
-- DOM Level 2 Core Specification (2000)
Official website for DOM: http://www.w3c.org/dom/

DOM Advantages & Disadvantages

Advantages
-- Robust API for the DOM tree
-- Relatively simple to modify the data structure and extract data
Disadvantages
-- Stores the entire document in memory
-- As DOM was written for any language, its method naming conventions don't follow standard Java programming conventions

Java API for XML Parsing (JAXP)

JAXP provides a vendor-neutral interface to the underlying DOM or SAX parser ( http://java.sun.com/xml/jaxp/dist/1.1/docs/api/index.html )
DOM
-- You can convert an XML document into a collection of objects
-- You can visit any part of the data
-- You can then modify the data, remove it, or insert new data
-- Suitable for small documents
-- Documents are easy to modify
-- Memory intensive: loads the complete XML document
SAX (Simple API for XML)
-- Suitable for large documents; saves significant amounts of memory
-- Traverses the document only once, from start to end
-- Event driven
-- Limited standard functions
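The difference between the two models can be sketched by counting elements both ways. This is a minimal illustration, assuming a JAXP implementation is available (one ships with the JDK). The class name `DomVersusSax`, its method names, and the sample XML string are hypothetical and not part of the tutorial.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSax {
    // Hypothetical sample input, not taken from the slides.
    static final String XML = "<list><item>a</item><item>b</item></list>";

    // DOM: the whole document is loaded into a tree, then queried.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: the parser streams events; nothing is stored unless the handler stores it.
    static int countWithSax(String xml, String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM: " + countWithDom(XML, "item"));   // DOM: 2
        System.out.println("SAX: " + countWithSax(XML, "item"));   // SAX: 2
    }
}
```

Both methods give the same count. The DOM version could go on to modify the tree, while the SAX version sees each element exactly once and keeps nothing.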
Steps for DOM Parsing

Set CLASSPATH and import packages
Create a JAXP document builder
Invoke the parser to create a Document representing an XML document
Normalize the tree
Obtain the root node of the tree
Examine and modify properties of the node

Step 1: Set CLASSPATH and Import Packages

    // On CDF the standard interface to the parser (JAXP) and the Xerces
    // parser itself are both contained in the file /u/csc309h/lib/xerces.jar
    import javax.xml.parsers.*;

    // This is the API for navigating an XML document, called the DOM.
    // An implementation is contained in the file /u/csc309h/lib/saxon.jar
    import org.w3c.dom.*;

Xerces: an XML parser developed by the Apache XML project. It implements standard APIs such as JAXP.
SAXON: a collection of tools for processing XML documents.

With the variable set as

    CLASSPATH = ".:/u/csc309h/lib/saxon.jar:/u/csc309h/lib/xerces.jar"

pass it explicitly:

    javac -classpath $CLASSPATH Test.java
    java -classpath $CLASSPATH Test

or setenv CLASSPATH for xerces.jar and saxon.jar in your .cshrc file, then:

    javac Test.java
    java Test

Step 2: Create a JAXP Document Builder

    // The "Factory" design pattern dynamically finds an
    // appropriate class to parse the XML file and create
    // an in-memory DOM model.
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    // Use the factory to find a DOM document builder.
    DocumentBuilder domBuilder = factory.newDocumentBuilder();

Step 3: Invoke the Parser to Create a Document

    // The "Builder" design pattern takes care of all of the
    // details of reading an XML file, parsing it, and creating
    // an in-memory DOM for it.
    // It returns a DOM-standardized object that references the
    // entire document.
    Document doc = domBuilder.parse(new java.io.File(args[0]));

-- First create an instance of a builder factory, then use that to create a DocumentBuilder object
-- A builder is basically a wrapper around a specific XML parser
-- Call the parse method of the DocumentBuilder, supplying an XML document (for example, a file or an input stream)
-- The Document object that is returned represents the parsed result as a tree structure
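Steps 2 and 3 can be combined into one small helper. The sketch below is an illustration, not the tutorial's own code: it parses from an in-memory string instead of a file so that it is self-contained, and it catches the checked exceptions that the JAXP calls declare. The class name `ParseSketch` and the sample strings are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseSketch {
    // Steps 2 and 3 in one helper: factory -> builder -> Document.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // parse() is overloaded; an InputSource can wrap a character stream.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        try {
            Document doc = parse("<greeting lang=\"en\">hello</greeting>");
            System.out.println(doc.getDocumentElement().getNodeName());
            parse("<broken>");   // not well-formed, so the parser rejects it
        } catch (SAXException e) {
            System.out.println("document is not well-formed");
        } catch (ParserConfigurationException | IOException e) {
            System.out.println("could not set up the parser or read the input: " + e);
        }
    }
}
```

Catching the specific exception types separates a malformed document (`SAXException`) from a configuration or I/O problem. The default error handler may also print its own diagnostic to standard error.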
Step 4: Normalize the Tree

Normalization has two effects:
-- Combines adjacent text nodes, such as text that spans multiple lines
-- Eliminates empty text nodes (whitespace-only text nodes are not empty and are kept)

    doc.getDocumentElement().normalize();

Step 5: Obtain the Root Node of the Tree

Traversing and modifying the tree begins at the root node

    Element rootElement = doc.getDocumentElement();

-- An Element is a subinterface of the more general Node interface and represents an XML element
-- A Node represents any of the various components of an XML document: Document, Element, Attribute, Entity, Text, CDATA, Processing Instruction, Comment, etc.

Step 6: Examine and Modify Properties of the Node

Examine the various node properties
-- getNodeName: returns the name of the node
-- getNodeType: returns the node type; compare to the Node constants DOCUMENT_NODE, ELEMENT_NODE, etc.
-- getAttributes: returns a NamedNodeMap (a collection of Nodes, each representing an attribute)
-- getChildNodes: returns a NodeList collection of all the children
Modify the document
-- setNodeValue: assigns the text value of the node
-- appendChild: adds a new node to the list of children
-- removeChild: removes the child node from the list of children
-- replaceChild: replaces a child with a new node
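The modification methods of Step 6 and the effect of normalization from Step 4 can be tried on a tiny document. This is a sketch under stated assumptions: the class `ModifySketch`, the helper `addStudent`, and the sample XML are hypothetical. `createElement`, `createTextNode`, and `setAttribute` are standard DOM methods that the slides do not list.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ModifySketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // appendChild: new nodes are created by the owning Document, then attached.
    static Element addStudent(Document doc, String id, String name) {
        Element student = doc.createElement("student");
        student.setAttribute("id", id);
        student.appendChild(doc.createTextNode(name));
        doc.getDocumentElement().appendChild(student);
        return student;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input, written without whitespace between the tags.
        Document doc = parse("<students><student id=\"1\">Ann</student></students>");
        Element root = doc.getDocumentElement();

        Element bob = addStudent(doc, "2", "Bo");
        bob.appendChild(doc.createTextNode("b"));               // second, adjacent Text node
        System.out.println(bob.getChildNodes().getLength());    // 2

        root.normalize();                                        // merges "Bo" and "b"
        System.out.println(bob.getChildNodes().getLength());    // 1

        // setNodeValue on the Text node changes the element's text.
        bob.getFirstChild().setNodeValue("Robert");

        // replaceChild(newChild, oldChild), then removeChild(oldChild).
        Node ann = root.getFirstChild();
        root.replaceChild(doc.createElement("placeholder"), ann);
        root.removeChild(root.getFirstChild());

        System.out.println(root.getChildNodes().getLength());   // 1
        System.out.println(bob.getFirstChild().getNodeValue()); // Robert
    }
}
```

Note that the text of an element lives in a child Text node, which is why `setNodeValue` is called on `bob.getFirstChild()` rather than on the element itself.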
DOM Node Types

    Node                   getNodeName()          getNodeValue()                   getAttributes()  getNodeType()
    Attr                   name of attribute      value of attribute               null             2
    CDATASection           #cdata-section         content                          null             4
    Comment                #comment               content                          null             8
    Document               #document              null                             null             9
    DocumentFragment       #document-fragment     null                             null             11
    DocumentType           document type name     null                             null             10
    Element                tag name               null                             NamedNodeMap     1
    Entity                 entity name            null                             null             6
    EntityReference        name of entity ref.    null                             null             5
    Notation               notation name          null                             null             12
    ProcessingInstruction  target                 entire content excluding target  null             7
    Text                   #text                  content                          null             3

DOM Node Type -- Named Constants

    Node Type  Named Constant
    1          ELEMENT_NODE
    2          ATTRIBUTE_NODE
    3          TEXT_NODE
    4          CDATA_SECTION_NODE
    5          ENTITY_REFERENCE_NODE
    6          ENTITY_NODE
    7          PROCESSING_INSTRUCTION_NODE
    8          COMMENT_NODE
    9          DOCUMENT_NODE
    10         DOCUMENT_TYPE_NODE
    11         DOCUMENT_FRAGMENT_NODE
    12         NOTATION_NODE

Example -- DOM Node Type

    // walk the DOM tree and print as you go
    public void walk(Node node) {
        int type = node.getNodeType();
        switch (type) {
            case Node.DOCUMENT_NODE: {
                System.out.println("<?xml version=\"1.0\" encoding=\"" +
                                   "UTF-8" + "\"?>");
                break;
            } // end of document
            case Node.ELEMENT_NODE: {
                System.out.print('<' + node.getNodeName());
                NamedNodeMap nnm = node.getAttributes();
                if (nnm != null) {
                    int len = nnm.getLength();
                    Attr attr;
                    for (int i = 0; i < len; i++) {
                        attr = (Attr) nnm.item(i);
                        System.out.print(' ' + attr.getNodeName() + "=\"" +
                                         attr.getNodeValue() + '"');
                    }
                }
                System.out.print('>');
                break;
            } // end of element
            case Node.ENTITY_REFERENCE_NODE: {
                System.out.print('&' + node.getNodeName() + ';');
                break;
            } // end of entity
            case Node.CDATA_SECTION_NODE: {
                System.out.print("<![CDATA[" + node.getNodeValue() + "]]>");
                break;
            }
            case Node.TEXT_NODE: {
                System.out.print(node.getNodeValue());
                break;
            }
        } // end of switch

        // recurse
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child);
        }

        // without this the ending tags will be missing
        if (type == Node.ELEMENT_NODE) {
            System.out.print("</" + node.getNodeName() + ">");
        }
    } // end of walk

A Complete Example

Input file:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <students>
      <student id="980912987">
        <first>John</first>
        <last>Smith</last>
        <department>Computer Science</department>
      </student>
      <student id="975654132">
        <first>Bill</first>
        <last>Wong</last>
        <department>Mathematics</department>
      </student>
    </students>
A Complete Example (cont'd)

Output:

    student
     id: 980912987
     first: John
     last: Smith
     department: Computer Science
    student
     id: 975654132
     first: Bill
     last: Wong
     department: Mathematics

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    // Test xerces and saxon.
    public class SaxonTest {
        // Parameter is the name of an xml file to parse.
        public static void main(String[] args) {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder domBuilder = factory.newDocumentBuilder();
                Document doc = domBuilder.parse(new java.io.File(args[0]));
                Element students = doc.getDocumentElement();
                students.normalize();
                NodeList studentList = students.getElementsByTagName("student");
                for (int i = 0; i < studentList.getLength(); i++) {
                    Node student = studentList.item(i);
                    System.out.println(student.getNodeName());
                    System.out.println(" id: " + ((Element) student).getAttribute("id"));
                    NodeList childList = student.getChildNodes();
                    // In the indented input file, whitespace text nodes and
                    // elements alternate, so the elements sit at odd indices.
                    for (int j = 1; j < childList.getLength(); j += 2) {
                        Node child = childList.item(j);
                        Node leaf = child.getFirstChild();
                        System.out.println(" " + child.getNodeName() + ": " +
                                           leaf.getNodeValue());
                    }
                }
            } catch (Exception e) {
                System.err.println(e);
            }
        }
    }
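The inner loop of the complete example steps through the odd indices, which works only when every element child is surrounded by whitespace text nodes, as in the indented input file. A variant that checks the node type instead works for any layout. The sketch below is one possible way to write it, not the tutorial's own code; the class `StudentLister` and its methods are hypothetical, and `getTextContent` is a DOM Level 3 method, newer than the DOM Level 2 API the slides describe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StudentLister {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns "name: text" for every element child, skipping Text nodes
    // (including whitespace-only ones) by checking the node type.
    static List<String> fields(Element student) {
        List<String> out = new ArrayList<>();
        for (Node child = student.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            out.add(child.getNodeName() + ": " + child.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Same data as the first <student> of the input file, written
        // without indentation, so the odd-index loop would not apply.
        String xml = "<students><student id=\"980912987\">"
                + "<first>John</first><last>Smith</last>"
                + "<department>Computer Science</department>"
                + "</student></students>";
        Document doc = parse(xml);
        NodeList students = doc.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            System.out.println("student id: " + student.getAttribute("id"));
            for (String line : fields(student)) {
                System.out.println(" " + line);
            }
        }
    }
}
```

The node-type check uses the same `Node.ELEMENT_NODE` constant listed in the named constants table, so the loop no longer depends on how the input file is indented.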