KNSP: A Kweelt - Niagara based Quilt Processor Inside Cocoon over Apache


Xidong Wang & Shiliang Hu
{wxd, shiliang}@cs.wisc.edu
Department of Computer Science, University of Wisconsin-Madison

1. Introduction

To study how web publishing systems operate in practice, we installed and studied the Apache web server, the Jakarta-Tomcat Servlet Engine, the Cocoon web publishing framework, and the Kweelt Quilt processing mechanism. We partially replaced the KSP (Kweelt in Cocoon) processor with Niagara 1.0 and realized KNSP, a Kweelt- and Niagara-based Server Page processing scheme.

Section 2 introduces the Cocoon web publishing framework and relevant background on the Apache web server, the Jakarta-Tomcat Servlet Engine, and two important XML query languages, Quilt and XML-QL. Section 3 introduces the architecture and implementation of our KNSP (Kweelt-Niagara based Server Page) system. Preliminary performance results and comparisons are presented in Section 4. We conclude in Section 5.

2. Cocoon Web Publishing Framework

Cocoon is a web publishing framework that relies on new W3C technologies (such as DOM, XML, and XSL) to provide web content. The Cocoon project aims to change the way web information is created, rendered, and served. The Cocoon paradigm is based on the observation that document content, style, and logic are often created by different individuals or working groups. Cocoon aims for a complete separation of these three layers, so that they can be designed, created, and managed independently, which reduces management overhead, increases work reuse, and shortens time to market. As part of the Apache XML project, the Cocoon web publishing system embeds a Quilt query processor called Kweelt inside the Apache web server, so that the web server can run queries on XML data sources.
A rough picture of how an XML-aware web server operates, say the Apache web server with Cocoon inside, is as follows. First, a client such as a web browser or a wireless palm device sends a request, as a URI or a POST, to the web server. The server routes the request to an appropriate processor or engine based on the request URI. XML-related requests are dispatched to the Cocoon system, which functions at the engine level; in other words, it manages various actors such as producers, processors, formatters, interpreters, and transformers. The Cocoon engine directs the request to an appropriate producer, which generates an XML document, possibly with embedded Quilt queries or other kinds of embedded logic. This XML document is then handled by the Cocoon processors that match the kinds of logic embedded in it. Processing continues until no further server-side processing or inclusion is needed. Along the way, each processor may consult many XML data sources in order to replace logic with content.
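The processing stage described above (processors applied repeatedly until no embedded logic remains) can be sketched in Java as follows. This is only an illustration under stated assumptions: the `Processor` interface, the `replacing` helper, and the `<include/>` and `<query/>` markers are hypothetical names invented for this sketch and are not part of the Cocoon API. Documents are plain strings here, whereas Cocoon works on parsed XML.

```java
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {
    /** Hypothetical processor: recognizes one kind of embedded logic and replaces it. */
    interface Processor {
        boolean applies(String document);
        String process(String document);
    }

    /** Builds a processor that swaps a marker for content (stand-in for real evaluation). */
    static Processor replacing(final String marker, final String content) {
        return new Processor() {
            public boolean applies(String document) { return document.contains(marker); }
            public String process(String document) { return document.replace(marker, content); }
        };
    }

    /** Applies the processors again and again until none of them finds logic to handle. */
    static String runProcessors(String document, List<Processor> processors) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Processor p : processors) {
                if (p.applies(document)) {
                    document = p.process(document);
                    changed = true;
                }
            }
        }
        return document;
    }

    public static void main(String[] args) {
        // The include step yields a query, so a second pass over the processors is needed.
        List<Processor> processors = Arrays.asList(
            replacing("<query/>", "<result>42</result>"),
            replacing("<include/>", "<query/>"));
        System.out.println(runProcessors("<page><include/></page>", processors));
    }
}
```

The loop terminates only if processors never reintroduce logic they consume; a real engine would need a stronger guarantee than this sketch provides.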

Then the resulting XML file, which holds the content the user asked for, is formatted by the appropriate formatters, chosen according to the desired style or the type of user agent, into a target format such as an HTML file, a PDF document, or WML for mobile devices. Finally, the styled document is sent back to the client.

We briefly discuss the Apache web server and the Jakarta-Tomcat Servlet Engine in Section 2.1 and the Cocoon publishing system in Section 2.2. We also introduce XML-QL and Quilt in Section 2.3 as background for our KNSP work.

2.1 Apache Web Server & Jakarta-Tomcat Servlet Engine

When Apache starts, it loads the JServ module mod_jserv.so into its address space, then spawns one main process and multiple (by default 5) worker processes to serve client requests. The main process listens for HTTP requests. When a client sends an HTTP request, the main process places the available socket onto a list of jobs. Any worker process may then take the top job off the list and handle the connection to completion. If the request should be processed by the JServ engine, the worker process uses mod_jserv.so, which is provided by the JServ engine, to connect to JServ. A special protocol is designed and implemented for sending parameters and receiving answers.

Figure 1: Apache and JServ. (Diagram labels: HTTP request, response, main process, worker, mod_jserv.so, JServ, Cocoon, Apache.)

2.2 The Framework of Cocoon Web Publishing System

We use the module invocation relationship graph in Figure 2 to capture the framework of the Cocoon web publishing system. As mentioned in Section 2.1, HTTP requests are delivered from the Apache processes to Jakarta-Tomcat JServ through the special communication protocol. The Jakarta TCP listener thread accepts HTTP request connections and spawns a TcpConnectionThread to handle each request. Most of the modules in Figure 2 run inside this thread during request processing. The leftmost module in Figure 2 stands for the TcpConnectionThread, which runs the Jakarta-Tomcat Servlet Engine code.
This thread uses HttpConnectionHandler to have ContextManager deliver the request to the right servlet, wrapped in the ServletWrapper interface. In our case, the request is delivered to the Cocoon publishing module. Requests can also be routed to other servlets; for example, JSP requests, whose URLs end in .jsp, are routed to the Jasper JSP servlet.

Cocoon is a processor engine in the sense that it manages various kinds of producers, processors, formatters, interpreters, transformers, cache managers, and so on. Producers generate XML documents from HTTP requests. Processors are invoked according to the processing instructions embedded in the XML document; each handles the corresponding type of logic in the document and replaces it with the processing results. In this project we are interested in only one kind of logic, Quilt queries embedded inside the XML document. We therefore discuss the Kweelt Quilt processor in detail, because it is the only Quilt processor currently available for Cocoon. Formatters render the desired document style and are outside our interest for now, as are the other actors such as interpreters and transformers.

Figure 2. Module Invocation Relationships. (Diagram labels: Jakarta-Tomcat JServ, Servlets, Cocoon, XML producers, Processors and formatters, Niagara query engine.)

Cocoon processes Quilt logic in an XML document with the KweeltProcessor Quilt processor, which parses each Quilt query with a parser generated by JavaCC. As we will see later, the parser specification file for JavaCC is the main battlefield of this project. A query parsed by KweeltProcessor is held as a Kweelt AST and is evaluated recursively, without generating a logical plan or optimizing a query plan. This functional-programming-style recursive evaluation is clear and neat but can be very inefficient. We partially replaced the Kweelt Quilt processor in Cocoon with Niagara version 1.

Figure 3. Processes/Threads Relationships. (Diagram labels: HTTP requests, Apache processes, JServ process, Tomcat JServ, Cocoon, JServ thread, Niagara operator threads.)

Figure 3 shows the process and thread relationships in the overall architecture. The Apache web server, the front end of the whole system, is a group of processes: one main process that listens for requests, and the rest worker processes. JServ as a whole is one process with many threads. Some of these are JServ threads serving Apache requests; the others are Niagara operator threads started by KNSP. Figure 3 shows that JServ, Cocoon, the AST part of KSP, and the Niagara query engine all reside in one thread, the TcpConnectionThread. The Niagara search engine is a separate process that assists the query engine.

2.3 XML Query Languages: XML-QL and Quilt, What Is Different?

Current trends suggest that there will be a huge number of web-accessible XML files in the near future. How to query this upcoming World Wide Database is becoming an interesting research and development issue. Many XML query languages have been proposed, for example XML-QL, XQL, and Quilt. One of the most important issues in these languages is the navigation mechanism for the nested structure of XML files.

XML-QL was one of the earliest XML query languages. A typical XML-QL query consists of a WHERE clause, which specifies the data source and the query conditions, and a CONSTRUCT clause, which specifies the XML elements to be returned. XML-QL uses patterns that resemble the XML element structure to specify both the navigation of the query conditions and the shape of the query results. XML-QL supports wildcards for querying XML files across the Internet.

The newer Quilt query language, on the other hand, adopts the XPath navigation method. To some extent, XPath resembles ordinary navigation in a typical file system: each step specifies the next navigation direction and the desired components. XPath is much more flexible, however. Currently, Quilt does not support wildcards for querying across the Internet. Although Quilt borrows ideas from many languages, its conceptual integrity comes from a deep reliance on the structure of XML, which is based on hierarchy, sequence, and reference.
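The file-system-like flavor of XPath navigation mentioned above can be illustrated with the standard Java XPath 1.0 API (`javax.xml.xpath`). This is a minimal sketch that shows XPath only, not Quilt or Kweelt; the sample `bib` document, its titles, and the `select` helper are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNavigationSketch {
    // Hypothetical sample data, invented for this example.
    static final String BIB =
        "<bib>"
        + "<book year=\"1999\"><title>First</title></book>"
        + "<book year=\"2000\"><title>Second</title></book>"
        + "</bib>";

    /** Evaluates an XPath expression over the XML text and returns the text of each match. */
    static List<String> select(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Step by step from the root, much like a directory path.
        System.out.println(select(BIB, "/bib/book/title"));
        // A predicate filters the books before descending further.
        System.out.println(select(BIB, "/bib/book[@year='2000']/title"));
        // The descendant axis finds titles at any depth.
        System.out.println(select(BIB, "//title"));
    }
}
```

The predicate and descendant-axis forms are examples of the extra flexibility that goes beyond plain path navigation.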
Quilt is able to express queries based on document structure and to produce query results that either preserve the original document structure or generate a new structure. Quilt can operate on flat structures, such as rows from relational databases, and generate hierarchies based on the information contained in these structures. It can also express queries based on parent/child relationships or document sequence, and can preserve these relationships or generate new ones in the output document. Although Quilt can be used to retrieve data from objects, relational databases, or other non-XML sources, this data must be expressed in an XML view, and Quilt relies solely on the structure of the XML view in its data model.

Quilt is an XML query language that not only conforms to the W3C XML query language requirements but also combines many advantages of other existing XML query languages, and hence provides more flexibility and capability. Many people believe that Quilt, rather than XML-QL, could become the standard XML query language.

3. KNSP Architecture and Implementation

Our KNSP processing is embedded entirely inside the Cocoon framework. The only change we made is to replace the Kweelt Quilt query evaluation with Niagara query processing.

3.1 Data Structure in Kweelt and Niagara

Cocoon passes XML documents containing Quilt queries to the KSP processor, KweeltProcessor. KweeltProcessor parses each XML document with an XML parser such as the Sun, IBM, or Xerces parser, extracts the kweelt-query nodes from the document, and processes them one by one. For each kweelt-query node, KweeltProcessor calls QuiltParser to parse the query and generate the Quilt abstract syntax tree (AST). Originally, KSP processes the Quilt AST by evaluating it recursively. Niagara invests more effort in order to promote parallelism and improve performance: a logical plan is produced from the AST, query optimization techniques are applied to the logical plan to obtain a physical plan, and the physical operators of the final physical plan are executed in parallel.

To obtain high performance by combining the Quilt parsing facility of Kweelt with the query optimization and evaluation strengths of Niagara, the data structures on both sides must be mapped and integrated smoothly. Since Kweelt has neither a logical plan nor a physical plan, KNSP is designed to map the Kweelt AST to the Niagara AST; all subsequent work, such as generating the logical plan from the AST and optimizing the logical plan into a physical plan, is done in Niagara.

Quilt query -> Kweelt parser -> AST in Kweelt -> Mapping program -> AST in Niagara -> Niagara logical plan generator -> Logical plan in Niagara

The figure above shows that most of our work lies in the mapping program, whose input is the AST in Kweelt and whose output is the AST in Niagara. After that, instead of letting KSP process the query further, we pass the query, now holding the Niagara AST, to the Niagara query engine. The Niagara query engine returns the query result as a stream. We replace the corresponding kweelt-query node with its query results, which become part of the content of the requested XML file. After all processing finishes, we return the processed XML file to the Cocoon engine. Cocoon processes the XML file further and/or formats it, e.g. with XSLT, and returns the proper result media to the client to complete its URI request.

3.2 The Mapping between Kweelt AST and Niagara AST

Kweelt parses a Quilt query into a syntax tree by parsing the query recursively according to the Quilt language grammar, constructing one expression node at each step. The expression nodes at the different levels of recursion therefore form a syntax tree. The type and content of the individual expression nodes are not our concern; the construction procedure of the syntax tree is, because it reveals the parse structure of the query. The construction procedure is implemented in QuiltParser.jj, a Java parser specification file compiled by the JavaCC package. Our work is to insert Niagara-related code into certain steps of the syntax tree construction procedure, using the parse procedure to produce the Niagara syntax tree. In other words, we do not map directly between the Kweelt AST and the Niagara AST; instead, we use the construction procedure of the Kweelt AST to construct the Niagara AST.

The core of an XML-QL query consists of a WHERE clause that binds one or more variables and a CONSTRUCT clause that uses the values of the bound variables to construct a new document. From XML-QL, Quilt borrows the concept of constructing a new document from bound variables, but Quilt uses a different paradigm for binding the variables and a richer data model to represent the result of the binding.

In Niagara, a query over one document or one kind of document is encapsulated in an inclause structure, whose source element indicates the source of the documents to be evaluated, whose dtdtype element indicates the document type they should conform to, and whose pattern element points to a pattern tree that preserves the hierarchical and sequential relationships among the elements of the evaluated document. During Kweelt's syntax tree construction we have to build all of these Niagara data structures. The recursive steps Kweelt uses to build its syntax tree were studied carefully, and code was injected into them. In the bottom-level step, where Kweelt builds a document expression node, a Niagara inclause object is built and its dtdtype and source elements are assigned from the information in the Kweelt document expression node. When Kweelt parses one of Quilt's XPath expressions, the hierarchical structure of the pattern tree in the Niagara inclause object is set up. When Kweelt uses a FOR or LET clause to bind variables, the corresponding nodes of the pattern tree record the binding in their data members. The same kind of inclause construction happens when Kweelt builds construction expressions to return results, since the return clauses also contain XPath expressions.

One thing we would like to mention is that Quilt is more powerful than XML-QL, so it is quite possible that a Quilt query cannot be mapped to an equivalent XML-QL query. This is a difference between the query languages. Since this project is mainly concerned with the feasibility of integrating Kweelt and Niagara, with inserting the result into Cocoon as a processor, and with demonstrating Niagara's performance advantage, we ignore such differences between the two languages.

3.3 Kweelt-Niagara Integration

After the changes to the QuiltParser.jj parser specification file, the Quilt query parser generated by JavaCC returns queries that conform to the query data structure of the Niagara system. We added another method, executequiltquery, to the Niagara QueryEngine module. This method calls the logical plan generator in Niagara to generate a logical plan and then calls the Niagara query optimizer to optimize the logical plan and generate the physical plan. The physical plan is submitted for execution by appending it to the query queue. From this point on, the execution of the Quilt query in KNSP is identical to normal query processing in the Niagara system.
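The sequence of steps performed by the added method (AST to logical plan, logical plan to physical plan, physical plan onto the query queue) can be sketched as below. Every class and method name here is hypothetical and chosen for illustration; in particular the camel-case `executeQuiltQuery`, the `Ast`, `LogicalPlan`, and `PhysicalPlan` types, and the string-based plans are assumptions of this sketch and do not reflect Niagara's actual code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class QueryEngineSketch {
    /** Hypothetical stand-in for the Niagara-side AST produced by the modified parser. */
    static final class Ast {
        final String query;
        Ast(String query) { this.query = query; }
    }

    /** Hypothetical logical plan; a real one would be a tree of logical operators. */
    static final class LogicalPlan {
        final String description;
        LogicalPlan(String description) { this.description = description; }
    }

    /** Hypothetical physical plan; a real one would hold executable physical operators. */
    static final class PhysicalPlan {
        final String description;
        PhysicalPlan(String description) { this.description = description; }
    }

    // Plans waiting to be picked up by the execution scheduler.
    private final Queue<PhysicalPlan> queryQueue = new ArrayDeque<>();

    static LogicalPlan generateLogicalPlan(Ast ast) {
        return new LogicalPlan("logical(" + ast.query + ")");
    }

    static PhysicalPlan optimize(LogicalPlan plan) {
        return new PhysicalPlan("physical(" + plan.description + ")");
    }

    /** Mirrors the three steps in the text: plan, optimize, enqueue. */
    PhysicalPlan executeQuiltQuery(Ast ast) {
        LogicalPlan logical = generateLogicalPlan(ast);
        PhysicalPlan physical = optimize(logical);
        queryQueue.add(physical);
        return physical;
    }

    int pendingQueries() {
        return queryQueue.size();
    }

    public static void main(String[] args) {
        QueryEngineSketch engine = new QueryEngineSketch();
        PhysicalPlan plan = engine.executeQuiltQuery(new Ast("q1"));
        System.out.println(plan.description + ", pending: " + engine.pendingQueries());
    }
}
```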
The query is scheduled by the execution scheduler among the physical operator queues. The execution scheduler also builds the streams that connect the physical operators, so that they can be processed in a pipelined style for efficiency. Physical operators are executed by physical operator threads.

4. Preliminary Performance and Structure Evaluation

Query evaluation in Kweelt is very primitive. Each node of the syntax tree has an eval function, and evaluation is carried out directly on the syntax tree, recursively, from top to bottom. Since document objects can only be obtained in the document expression nodes at the bottom level, the actual evaluation work is in fact done bottom-up, and results are returned the same way. No query optimization is performed, and the query is executed in a single thread. All intermediate results are kept in an environment object that is passed among the eval functions at the different levels.

When a join is executed, the tuples from one document are kept in the environment, and for each such tuple the query on the other document is executed and the tuple is joined with the resulting tuples of the second document. This is effectively a nested loop join. However, because the query on the second document is executed many times, performance is much lower than that of an ordinary nested loop join, in which the query on each of the two documents is executed once and the results are then joined.

Nor is there any optimization in Kweelt. It is easy to write a query that contains both a scan and a join over one document, for which Kweelt executes the join first and the scan afterwards. If the selection factor of the scan is small, Kweelt loses performance again. Since Niagara is designed and implemented for parallel execution, we expect Niagara to perform much better than Kweelt, and hence KNSP to perform much better than KSP.

Environment setup and result data ..
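The join behavior discussed in this section can be made concrete with a small Java sketch that counts how often the query on the second document is evaluated. It is an illustration only: `CountingQuery` and both join methods are hypothetical names, integers stand in for tuples, and neither method is taken from Kweelt or Niagara.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinSketch {
    /** Stand-in for a query over one document; counts how many times it is run. */
    static final class CountingQuery {
        private final List<Integer> rows;
        int evaluations = 0;

        CountingQuery(Integer... rows) { this.rows = Arrays.asList(rows); }

        List<Integer> evaluate() {
            evaluations++;
            return new ArrayList<>(rows);
        }
    }

    /** Re-runs the inner query for every outer tuple, as described for Kweelt above. */
    static List<int[]> reevaluatingJoin(CountingQuery outer, CountingQuery inner) {
        List<int[]> result = new ArrayList<>();
        for (int a : outer.evaluate()) {
            for (int b : inner.evaluate()) {   // inner query executed once per outer tuple
                if (a == b) result.add(new int[] {a, b});
            }
        }
        return result;
    }

    /** Ordinary nested loop join: each query is run once, then the results are joined. */
    static List<int[]> nestedLoopJoin(CountingQuery outer, CountingQuery inner) {
        List<Integer> left = outer.evaluate();
        List<Integer> right = inner.evaluate();
        List<int[]> result = new ArrayList<>();
        for (int a : left) {
            for (int b : right) {
                if (a == b) result.add(new int[] {a, b});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CountingQuery inner1 = new CountingQuery(2, 3, 4);
        reevaluatingJoin(new CountingQuery(1, 2, 3), inner1);
        CountingQuery inner2 = new CountingQuery(2, 3, 4);
        nestedLoopJoin(new CountingQuery(1, 2, 3), inner2);
        System.out.println("inner evaluations: " + inner1.evaluations + " vs " + inner2.evaluations);
    }
}
```

Both methods return the same pairs; the difference lies only in how many times the second document's query runs, which grows with the number of outer tuples in the first variant.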

5. Conclusions and Future Work

References

1. Brett McLaughlin. Java and XML. 1st Edition, June 2000.
2. Cocoon: http://xml.apache.org/cocoon/guide.html
3. Niagara: http://www.cs.wisc.edu/niagara
4. Jeffrey Naughton, David DeWitt, David Maier et al. The Niagara Internet Query System.
5. Kweelt: http://db.cis.upenn.edu/kweelt
6. Quilt: http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
7. XML, XSL etc.: http://www.w3c.org
8. World Wide Web Consortium. XML Path Language (XPath) Version 1.0. W3C Recommendation. See http://www.w3.org/tr/xpath.html