Document Object Model (DOM), Java API for XML Parsing (JAXP), DOM Advantages & Disadvantages. CSC309 Tutorial, Edward Xia


CSC309 Tutorial -- DOM
Edward Xia, February

Document Object Model (DOM)

DOM
  -- Supports navigating and modifying XML documents
  -- Hierarchical tree representation of documents
DOM is a language-neutral specification
  -- Bindings exist for Java, C++, CORBA, JavaScript
DOM versions
  -- DOM 1.0 (1998)
  -- DOM 2.0 Core Specification (2000)
Official website for DOM: http://www.w3c.org/dom/

DOM Advantages & Disadvantages

Advantages
  -- Robust API for the DOM tree
  -- Relatively simple to modify the data structure and extract data
Disadvantages
  -- Stores the entire document in memory
  -- As DOM was written for any language, its method naming conventions don't follow standard Java programming conventions

Java API for XML Parsing (JAXP)

JAXP provides a vendor-neutral interface to the underlying DOM or SAX parser ( http://java.sun.com/xml/jaxp/dist/1.1/docs/api/index.html )
DOM
  -- You can convert an XML document into a collection of objects
  -- You can visit any part of the data
  -- You can then modify the data, remove it, or insert new data
  -- Suitable for small documents
  -- Easy to modify the document
  -- Memory intensive; loads the complete XML document
SAX (Simple API for XML)
  -- Suitable for large documents; saves significant amounts of memory
  -- Traverses the document only once, from start to end
  -- Event driven
  -- Limited standard functions
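The difference between the two models can be illustrated with a short example. The following is a minimal sketch, not part of the original tutorial: the class name `DomVsSax` and the inline sample document are assumptions made for illustration. It counts `<student>` elements once by building a DOM tree and once by reacting to SAX events.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSax {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<students><student id=\"1\"/><student id=\"2\"/></students>";

    // DOM: the whole document is loaded into memory first, then queried.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: the parser reports events during a single pass; no tree is kept.
    static int countWithSax(String xml, String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM count: " + countWithDom(XML, "student"));
        System.out.println("SAX count: " + countWithSax(XML, "student"));
    }
}
```

Both methods should report the same count. The DOM version could go on to revisit or modify any node, while the SAX version sees each element only once as it streams past.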

Steps for DOM Parsing

  -- Set CLASSPATH and import packages
  -- Invoke the parser to create a Document representing an XML document
  -- Normalize the tree
  -- Obtain the root node of the tree
  -- Examine and modify properties of the node

Step 1: Set CLASSPATH and Import Packages

    // On CDF the standard interface to the parser (JAXP) and the Xerces
    // parser itself are both contained in the file /u/csc309h/lib/xerces.jar
    import javax.xml.parsers.*;

    // This is the API to navigate an XML document, called the DOM.
    // An implementation is contained in the file /u/csc309h/lib/saxon.jar
    import org.w3c.dom.*;

Xerces: XML parser developed by the Apache XML project. It implements standard APIs such as JAXP.
SAXON: a collection of tools for processing XML documents.

    CLASSPATH='.:/u/csc309h/lib/saxon.jar:/u/csc309h/lib/xerces.jar'
    javac -classpath $CLASSPATH Test.java
    java -classpath $CLASSPATH Test

or setenv CLASSPATH for xerces.jar and saxon.jar in your .cshrc file, and then run:

    javac Test.java
    java Test

Step 2: Create a JAXP Document Builder

    // A design pattern called "Factory" which will dynamically
    // find an appropriate class to parse the XML file and create
    // an in-memory DOM model.
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    // Use the factory to find a DOM document builder.
    DocumentBuilder domBuilder = factory.newDocumentBuilder();

First create an instance of a builder factory, then use that to create a DocumentBuilder object.
A builder is basically a wrapper around a specific XML parser.

Step 3: Invoke the Parser to Create a Document

    // A design pattern called "Builder" is used to take care
    // of all of the details of reading an XML file, parsing it,
    // and creating an in-memory DOM for it.
    // It returns a DOM-standardized object that references the
    // entire document.
    Document doc = domBuilder.parse(new java.io.File(args[0]));

Call the parse method of the DocumentBuilder, supplying an XML document (for example a file or an input stream).
The Document class represents the parsed result as a tree structure.
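Steps 2 and 3 can be tried without any file on disk. The following is a minimal sketch, assuming the XML is supplied as an in-memory string rather than as `args[0]`; the class name `ParseSketch` and the sample document are hypothetical. It relies on the JAXP implementation bundled with the JDK, so no CLASSPATH setup should be needed on a current Java installation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseSketch {
    // Parses XML text into an in-memory DOM Document.
    static Document parse(String xml) throws Exception {
        // Step 2: the factory locates a parser implementation and
        // hands back a builder that wraps it.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Step 3: the builder reads the input and returns the Document.
        // An InputSource over a StringReader stands in for the file here.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, used only for illustration.
        Document doc = parse("<students><student id=\"1\"/></students>");
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    }
}
```

`DocumentBuilder.parse` is overloaded for `File`, `InputStream`, `InputSource`, and a URI string, so the same two steps apply whichever source the document comes from.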

Step 4: Normalize the Tree

Normalization has two effects:
  -- Merges adjacent text nodes (for example, text that spans multiple lines) into a single node
  -- Eliminates empty text nodes

    doc.getDocumentElement().normalize();

Step 5: Obtain the Root Node of the Tree

Traversing and modifying the tree begins at the root node.

    Element rootElement = doc.getDocumentElement();

  -- Element is a subinterface of the more general Node interface and represents an XML element
  -- A Node represents any of the various components of an XML document: Document, Element, Attribute, Entity, Text, CDATA, Processing Instruction, Comment, etc.

Step 6: Examine and Modify Properties of the Node

Examine the various node properties
  -- getNodeName: returns the name of the node
  -- getNodeType: returns the node type; compare it to the Node constants DOCUMENT_NODE, ELEMENT_NODE, etc.
  -- getAttributes: returns a NamedNodeMap (a collection of Nodes, each representing an attribute)
  -- getChildNodes: returns a NodeList collection of all the children

Step 6: Examine and Modify Properties of the Node (cont'd)

Modify the document
  -- setNodeValue: assigns the text value of the node
  -- appendChild: adds a new node to the list of children
  -- removeChild: removes the child node from the list of children
  -- replaceChild: replaces a child with a new node
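The effect of normalization and of the modification methods can be observed on a small tree built by hand. The following is a minimal sketch, not taken from the tutorial: the class name `NormalizeSketch`, the element names, and the text values are assumptions chosen for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeSketch {
    // Builds <note> with three adjacent text nodes, one of them empty.
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();
        Element root = doc.createElement("note");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createTextNode(""));
        root.appendChild(doc.createTextNode("world"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        Element root = doc.getDocumentElement();

        System.out.println("children before normalize: " + root.getChildNodes().getLength());
        root.normalize(); // merges adjacent text nodes, drops empty ones
        System.out.println("children after normalize: " + root.getChildNodes().getLength());
        System.out.println("text: " + root.getFirstChild().getNodeValue());

        // Step 6 style modifications on the same tree.
        root.getFirstChild().setNodeValue("Hi");
        Element extra = doc.createElement("extra");
        root.appendChild(extra);
        System.out.println("children after appendChild: " + root.getChildNodes().getLength());
        root.removeChild(extra);
        System.out.println("children after removeChild: " + root.getChildNodes().getLength());
    }
}
```

Building the tree with `newDocument()` and `createTextNode` makes the adjacent text nodes explicit. A parser usually delivers contiguous text as a single node already, so the effect of `normalize()` is easiest to see on a tree that has been edited in memory.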

DOM Node Types

    Node                    getNodeName()               getNodeValue()                        getAttributes()   getNodeType()
    Attr                    name of attribute           value of attribute                    null              2
    CDATASection            #cdata-section              content                               null              4
    Comment                 #comment                    content                               null              8
    Document                #document                   null                                  null              9
    DocumentFragment        #document-fragment          null                                  null              11
    DocumentType            document type name          null                                  null              10
    Element                 tag name                    null                                  NamedNodeMap      1
    Entity                  entity name                 null                                  null              6
    EntityReference         name of entity referenced   null                                  null              5
    Notation                notation name               null                                  null              12
    ProcessingInstruction   target                      entire content excluding the target   null              7
    Text                    #text                       content                               null              3

DOM Node Type -- Named Constants

    Node Type   Named Constant
    1           ELEMENT_NODE
    2           ATTRIBUTE_NODE
    3           TEXT_NODE
    4           CDATA_SECTION_NODE
    5           ENTITY_REFERENCE_NODE
    6           ENTITY_NODE
    7           PROCESSING_INSTRUCTION_NODE
    8           COMMENT_NODE
    9           DOCUMENT_NODE
    10          DOCUMENT_TYPE_NODE
    11          DOCUMENT_FRAGMENT_NODE
    12          NOTATION_NODE

Example -- DOM Node Type

    // walk the DOM tree and print as you go
    public void walk(Node node) {
        int type = node.getNodeType();
        switch (type) {
            case Node.DOCUMENT_NODE: {
                System.out.println("<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>");
                break;
            } // end of document

Example -- DOM Node Type (cont'd)

            case Node.ELEMENT_NODE: {
                System.out.print('<' + node.getNodeName());
                NamedNodeMap nnm = node.getAttributes();
                if (nnm != null) {
                    int len = nnm.getLength();
                    Attr attr;
                    // print each attribute as name="value"
                    for (int i = 0; i < len; i++) {
                        attr = (Attr) nnm.item(i);
                        System.out.print(' ' + attr.getNodeName() + "=\""
                                + attr.getNodeValue() + '"');
                    }
                }
                System.out.print('>');
                break;
            } // end of element

Example -- DOM Node Type (cont'd)

            case Node.ENTITY_REFERENCE_NODE: {
                System.out.print('&' + node.getNodeName() + ';');
                break;
            } // end of entity
            case Node.CDATA_SECTION_NODE: {
                System.out.print("<![CDATA[" + node.getNodeValue() + "]]>");
                break;
            }
            case Node.TEXT_NODE: {
                System.out.print(node.getNodeValue());
                break;
            }
        } // end of switch

Example -- DOM Node Type (cont'd)

        // recurse into the children
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child);
        }

        // without this the ending tags will be missing
        if (type == Node.ELEMENT_NODE) {
            System.out.print("</" + node.getNodeName() + ">");
        }
    } // end of walk

A Complete Example

Input file:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <students>
      <student id="980912987">
        <first>John</first>
        <last>Smith</last>
        <department>Computer Science</department>
      </student>
      <student id="975654132">
        <first>Bill</first>
        <last>Wong</last>
        <department>Mathematics</department>
      </student>
    </students>
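When a file laid out like the input above is parsed without a DTD, the line breaks and indentation between the tags are kept in the tree as Text nodes, so the children of a `<student>` element alternate between `#text` and element nodes. The following is a minimal sketch of that behaviour; the class name `ChildNodesSketch` and the shortened inline document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildNodesSketch {
    // Shortened, hypothetical version of the students file, with the
    // same kind of indentation between elements.
    static final String XML =
          "<students>\n"
        + "  <student id=\"980912987\">\n"
        + "    <first>John</first>\n"
        + "    <last>Smith</last>\n"
        + "  </student>\n"
        + "</students>";

    // Returns the node names of all children of the first <student>.
    static List<String> childNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Node student = doc.getDocumentElement()
            .getElementsByTagName("student").item(0);

        List<String> names = new ArrayList<>();
        for (Node child = student.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            names.add(child.getNodeName()); // "#text" for whitespace nodes
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childNames(XML));
    }
}
```

With this layout the element children sit at the odd positions (1, 3, ...) of the child list, and the even positions hold whitespace-only Text nodes. Code that indexes into `getChildNodes()` has to account for that, either by stepping over every other node or by checking `getNodeType()`.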

A Complete Example (cont'd)

Output:

    student
     id: 980912987
     first: John
     last: Smith
     department: Computer Science
    student
     id: 975654132
     first: Bill
     last: Wong
     department: Mathematics

A Complete Example (cont'd)

    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    // Test xerces and saxon.
    public class SaxonTest {
        // Parameter is the name of an XML file to parse.
        public static void main(String args[]) {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder domBuilder = factory.newDocumentBuilder();
                Document doc = domBuilder.parse(new java.io.File(args[0]));
                Element students = doc.getDocumentElement();
                students.normalize();
                NodeList studentList = students.getElementsByTagName("student");

A Complete Example (cont'd)

                for (int i = 0; i < studentList.getLength(); i++) {
                    Node student = studentList.item(i);
                    System.out.println(student.getNodeName());
                    System.out.println(" id: " + ((Element) student).getAttribute("id"));
                    NodeList childList = student.getChildNodes();
                    // Start at 1 and step by 2: the even positions hold the
                    // whitespace text nodes between the child elements.
                    for (int j = 1; j < childList.getLength(); j += 2) {
                        Node child = childList.item(j);
                        // the first child of each element is its text node
                        Node leaf = child.getFirstChild();
                        System.out.println(" " + child.getNodeName() + ": "
                                + leaf.getNodeValue());
                    }
                }
            } catch (Exception e) {
                System.err.println(e);
            }
        }
    }
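The `j = 1; j += 2` loop depends on the input file being indented exactly as shown. A variant that does not depend on the layout can be sketched as follows. This is an illustrative alternative, not the tutorial's code: the class name `StudentsSketch`, the `render` helper, and the inline one-student document are assumptions, and it selects children by checking `getNodeType()` and reads text through `getTextContent()` (DOM Level 3).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StudentsSketch {
    // Hypothetical one-student excerpt of the input file.
    static final String XML =
          "<students>\n"
        + "  <student id=\"980912987\">\n"
        + "    <first>John</first>\n"
        + "    <last>Smith</last>\n"
        + "    <department>Computer Science</department>\n"
        + "  </student>\n"
        + "</students>";

    // Produces the same report format as the tutorial example.
    static String render(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        NodeList students = doc.getDocumentElement().getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Element student = (Element) students.item(i);
            out.append(student.getNodeName()).append('\n');
            out.append(" id: ").append(student.getAttribute("id")).append('\n');
            for (Node child = student.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                // Skip whitespace text, comments, and other non-element nodes.
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                out.append(' ').append(child.getNodeName()).append(": ")
                   .append(child.getTextContent()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render(XML));
    }
}
```

Filtering on `Node.ELEMENT_NODE` should give the same report whether or not the file contains whitespace between tags, and `getTextContent()` avoids a null `getFirstChild()` when an element is empty.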