Parallel/Distributed Databases XML

Similar documents
Introduction. " Documents have tags giving extra information about sections of the document

Introduction. " Documents have tags giving extra information about sections of the document

XML: extensible Markup Language

EMERGING TECHNOLOGIES

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents

ADT 2005 Lecture 7 Chapter 10: XML

<account_number> A-101 </account_number># <branch_name> Downtown </branch_name># <balance> 500 </balance>#

XML. Structure of XML Data XML Document Schema Querying and Transformation Application Program Interfaces to XML Storage of XML Data XML Applications

XML Origin and Usages

Lecture 7 Introduction to XML Data Management

10.3 Answer: 10.5 Answer:

XML Basics for Web Services

XML. Solutions to Practice Exercises

ADT XML, XPath & XQuery

Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/

Chapter 7 - XML. Defined by the WWW Consortium (W3C) Originally intended as a document markup language, not a database language

Introduction to Semistructured Data and XML. Contents

Querying and Transforming XML Data Chapter5 Contents

XQUERY: an XML Query Language

Keep relation as the fundamental. Compare with ëobject-oriented DBMS," which. and tacks on relations as one of many types.

FUNCTIONAL MARKUP For instance, with functional markup, text representing section headings. INTRODUCTION This part of chapter includes

Chapter 13 - XML. Defined by the WWW Consortium (W3C) Originally intended as a document markup language, not a database language

Contains slides made by Naci Akkøk, Pål Halvorsen, Arthur M. Keller, Vera Goebel

Digital Asset Management 3. Multimedia Database System

Keep relation as the fundamental. Compare with ëobject-oriented DBMS," which. and tacks on relations as one of many types.

CSCI3030U Database Models

Relational Data Model is quite rigid. powerful, but rigid.

CS145 Introduction. About CS145 Relational Model, Schemas, SQL Semistructured Model, XML

Introduction to XML. XML: basic elements

IBM WebSphere software platform for e-business

Pre-Discussion. XQuery: An XML Query Language. Outline. 1. The story, in brief is. Other query languages. XML vs. Relational Data

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

XML. Document Type Definitions. Database Systems and Concepts, CSCI 3030U, UOIT, Course Instructor: Jarek Szlichta

XML Metadata Standards and Topic Maps

Structured documents

XML Databases. 孟小峰 信息学院 2014/4/15 数据库系统概论讲义,XML 数据库,2014,2

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop

Chapter 2 XML, XML Schema, XSLT, and XPath

XML and information exchange. XML extensible Markup Language XML

Database Management Systems (CPTR 312)

XML. Document Type Definitions XML Schema. Database Systems and Concepts, CSCI 3030U, UOIT, Course Instructor: Jarek Szlichta

Introduction to XML (Extensible Markup Language)

Several major software companies including IBM, Informix, Microsoft, Oracle, and Sybase have all released object-relational versions of their

Chapter 11 XML Data Modeling. Recent Development for Data Models 2016 Stefan Deßloch

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 14 Database Connectivity and Web Technologies

Semistructured data, XML, DTDs

XML: Extensible Markup Language

XML. Objectives. Duration. Audience. Pre-Requisites

XML. Rodrigo García Carmona Universidad San Pablo-CEU Escuela Politécnica Superior

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

COMP9321 Web Application Engineering

DISCUSSION 5min 2/24/2009. DTD to relational schema. Inlining. Basic inlining

XML. Jonathan Geisler. April 18, 2008

COMP9321 Web Application Engineering

Introduction to Databases CS348

EXtensible Markup Language XML

Chapter A: Network Model

Lecture 23 Database System Architectures

Basic Concepts. Chapter A: Network Model. Cont.) Data-Structure Diagrams (Cont( Data-Structure Diagrams. Basic Concepts

Lecture 3 SQL. Shuigeng Zhou. September 23, 2008 School of Computer Science Fudan University

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

11. EXTENSIBLE MARKUP LANGUAGE (XML)

ODL Design to Relational Designs Object-Relational Database Systems (ORDBS) Extensible Markup Language (XML)

Introduction to Database Systems CSE 414

Chapter 2 The relational Model of data. Relational model introduction

Introduction to XML. Yanlei Diao UMass Amherst April 17, Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.

Query Languages for XML

COMP9321 Web Application Engineering

Chapter 4: SQL. Basic Structure

Semistructured Data and XML

Overview. DBMS Tasks Historic Development

Chapter 3: SQL. Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Data Presentation and Markup Languages

Database System Concepts

Chapter 20: Database System Architectures

Chapter 1: Introduction

Chapter 3: SQL. Chapter 3: SQL

CMSC 424 Database design Lecture 4: Relational Model ER to Relational model. Book: Chap. 2 and 6. Mihai Pop

Chapter 1: Semistructured Data Management XML

7.1 Introduction. extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML

SQL DATA DEFINITION LANGUAGE

1.264 Lecture 13 XML

SQL DATA DEFINITION LANGUAGE

Chapter 2: Relational Model

Chapter 1: Semistructured Data Management XML

Chapter B: Hierarchical Model

Relational Algebra. Relational Algebra. 7/4/2017 Md. Golam Moazzam, Dept. of CSE, JU

Introduction to Semistructured Data and XML

SQL DATA DEFINITION LANGUAGE

Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views Modification of the Database Data Definition

Chapter 13 XML: Extensible Markup Language

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

SQL STORED ROUTINES. CS121: Relational Databases Fall 2017 Lecture 9

The XML Metalanguage

Querying XML. COSC 304 Introduction to Database Systems. XML Querying. Example DTD. Example XML Document. Path Descriptions in XPath

Introduction to XSLT. Version 1.0 July nikos dimitrakas

Introduction to SQL SELECT-FROM-WHERE STATEMENTS SUBQUERIES DATABASE SYSTEMS AND CONCEPTS, CSCI 3030U, UOIT, COURSE INSTRUCTOR: JAREK SZLICHTA

10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344

Transcription:

Parallel/Distributed Databases XML Mihai Pop CMSC424 most slides courtesy of Amol Deshpande

Project due today Admin Sign up for demo, if you haven't already myphpbib.sourceforge.net - example publication DB and API

SQL injection (security) http://www.securiteam.com/securityreviews/5dp0n1p76e.html

Topics Today Database system architectures (Chap. 20) Client-server Parallel and Distributed Systems (Chap. 20, 21, 22) Object Oriented, Object Relational (Chap. 9) XML (Chap. 10) Next class Data warehouses, Information Retrieval, Database Tuning?

Database System Architectures Centralized single-user Client-Server Architectures Connected over a network typically Back-end: manages the database Front-end(s): Forms, report-writes, sqlplus How they talk to each other? ODBC: Interface standard for talking to the server in C JDBC: In Java Transaction servers vs. data servers

Database System Architectures

Parallel Databases Why? More transactions per second, or less time per query Throughput vs. Response Time Speedup vs. Scaleup Database operations are embarrassingly parallel E.g. Consider a join between R and S on R.b = S.b But, perfect speedup doesn t happen Start-up costs (starting 1000s of jobs is expensive) Interference (e.g. shared disk) Skew (not all jobs are the same size)

Parallel Databases Shared-nothing vs. shared-memory vs. shared-disk

Parallel Databases Shared Memory Shared Disk Shared Nothing Communication between processors Extremely fast Disk interconnect is very fast Over a LAN, so slowest Scalability? Not beyond 32 or 64 or so (memory bus is the bottleneck) Not very scalable (disk interconnect is the bottleneck) Very very scalable Notes Cache-coherency an issue Transactions complicated; natural faulttolerance. Distributed transactions are complicated (deadlock detection etc); Main use Low degrees of parallelism Not used very often Everywhere

Distributed Systems Over a wide area network Typically not done for performance reasons For that, use a parallel system Done because of necessity Imagine a large corporation with offices all over the world Also, for redundancy and for disaster recovery reasons Lot of headaches Especially if trying to execute transactions that involve data from multiple sites Keeping the databases in sync 2-phase commit for transactions uniformly hated Autonomy issues Even within an organization, people tend to be protective of their unit/department Locks/Deadlock management Works better for query processing Since we are only reading the data

Next Object oriented, Object relational, XML

Motivation Relational model: Clean and simple Great for much enterprise data But lot of applications where not sufficiently rich Multimedia, CAD, for storing set data etc Object-oriented models in programming languages Complicated, but very useful Smalltalk, C++, now Java Allow Complex data types Inheritance Encapsulation People wanted to manage objects in databases.

History In the 1980 s and 90 s, DB researchers recognized benefits of objects. Two research thrusts: OODBMS: extend C++ with transactionally persistent objects Niche Market CAD etc ORDBMS: extend Relational DBs with object features Much more common Efficiency + Extensibility SQL:99 support Postgres First ORDBMS Berkeley research project Became Illustra, became Informix, bought by IBM

Example Create User Defined Types (UDT) CREATE TYPE BarType AS ( name CHAR(20), addr CHAR(20) ); CREATE TYPE BeerType AS ( name CHAR(20), manf CHAR(20) ); CREATE TYPE MenuType AS ( bar REF BarType, beer REF BeerType, price FLOAT ); Create Tables of UDTs CREATE TABLE Bars OF BarType; CREATE TABLE Beers OF BeerType; CREATE TABLE Sells OF MenuType;

Example Querying: SELECT * FROM Bars; Produces tuples such as: BarType( Joe s Bar, Maple St. ) Another query: SELECT bb.name(), bb.addr() FROM Bars bb; Inserting tuples: SET newbar = BarType(); newbar.name( Joe s Bar ); newbar.addr( Maple St. ); INSERT INTO Bars VALUES(newBar);

Example UDT s can be used as types of attributes in a table CREATE TYPE AddrType AS ( street CHAR(30), city CHAR(20), zip INT ); CREATE TABLE Drinkers ( name CHAR(30), addr AddrType, favbeer BeerType ); Find the beers served by Joe: SELECT ss.beer()->name FROM Sells ss WHERE ss.bar()->name = Joe s Bar ;

An Alternative: OODBMS Persistent OO programming Imagine declaring a Java object to be persistent Everything reachable from that object will also be persistent You then write plain old Java code, and all changes to the persistent objects are stored in a database When you run the program again, those persistent objects have the same values they used to have! Solves the impedance mismatch between programming languages and query languages E.g. converting between Java and SQL types, handling rowsets, etc. But this programming style doesn t support declarative queries For this reason (??), OODBMSs haven t proven popular OQL: A declarative language for OODBMSs Was only implemented by one vendor in France (Altair)

OODBMS Currently a Niche Market Engineering, spatial databases, physics etc Main issues: Navigational access Programs specify go to this object, follow this pointer Not declarative Though advantageous when you know exactly what you want, not a good idea in general Kinda similar argument as network databases vs relational databases

Summary, cont. ORDBMS offers many new features but not clear how to use them! schema design techniques not well understood No good logical design theory for non-1st-normal-form! query processing techniques still in research phase a moving target for OR DBA s! OODBMS Has its advantages Niche market Lot of similarities to XML as well

XML Extensible Markup Language Derived from SGML (Standard Generalized Markup Language) Similar to HTML, but HTML is not extensible Extensible == can add new tags etc Emerging as the wire format (data interchange format)

XML <bank-1> <customer> <customer-name> Hayes </customer-name> <customer-street> Main </customer-street> <customer-city> Harrison </customer-city> <account> <account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance> </account> <account> </account> </customer>.. </bank-1>

Attributes Elements can have attributes <account acct-type = checking > <account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance> </account> Attributes are specified by name=value pairs inside the starting tag of an element An element may have several attributes, but each attribute name can only occur once <account acct-type = checking monthly-fee= 5 >

Attributes Vs. Subelements Distinction between subelement and attribute In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents In the context of data representation, the difference is unclear and may be confusing Same information can be represented in two ways <account account-number = A-101 >. </account> <account> <account-number>a-101</account-number> </account> Suggestion: use attributes for identifiers of elements, and use subelements for contents

Namespaces XML data has to be exchanged between organizations Same tag name may have different meaning in different organizations, causing confusion on exchanged documents Specifying a unique string as an element name avoids confusion Better solution: use unique-name:element-name Avoid using long unique names all over document by using XML Namespaces <bank Xmlns:FB= http://www.firstbank.com > <FB:branch> <FB:branchname>Downtown</FB:branchname> <FB:branchcity> </FB:branch> </bank> Brooklyn </FB:branchcity>

Document Type Definition (DTD) The type of an XML document can be specified using a DTD DTD constraints structure of XML data What elements can occur What attributes can/must an element have What subelements can/must occur inside each element, and how many times. DTD does not constrain data types All values represented as strings in XML DTD syntax <!ELEMENT element (subelements-specification) > <!ATTLIST element (attributes) > Also XML Schema (not covered -read in book & online)

Bank DTD <!DOCTYPE bank [ <!ELEMENT bank ( ( account customer depositor)+)> <!ELEMENT account (account-number branch-name balance)> <! ELEMENT customer(customer-name customer-street customer-city)> <! ELEMENT depositor (customer-name account-number)> <! ELEMENT account-number (#PCDATA)> <! ELEMENT branch-name (#PCDATA)> <! ELEMENT balance(#pcdata)> <! ELEMENT customer-name(#pcdata)> <! ELEMENT customer-street(#pcdata)> <! ELEMENT customer-city(#pcdata)> ]>

IDs and IDREFs An element can have at most one attribute of type ID The ID attribute value of each element in an XML document must be distinct Thus the ID attribute value is an object identifier An attribute of type IDREF must contain the ID value of an element in the same document

Bank DTD with Attributes Bank DTD with ID and IDREF attribute types. <!DOCTYPE bank-2[ <!ELEMENT account (branch, balance)> <!ATTLIST account account-number ID # REQUIRED owners IDREFS # REQUIRED> <!ELEMENT customer(customer-name, customer-street, custome-city)> <!ATTLIST customer customer-id ID # REQUIRED accounts IDREFS # REQUIRED> declarations for branch, balance, customer-name, customer-street and customer-city ]>

XML data with ID and IDREF attributes <bank-2> <account account-number= A-401 owners= C100 C102 > <branch-name> Downtown </branch-name> <balance> 500 </balance> </account> <customer customer-id= C100 accounts= A-401 > <customer-name>joe </customer-name> <customer-street> Monroe </customer-street> <customer-city> Madison</customer-city> </customer> <customer customer-id= C102 accounts= A-401 A-402 > <customer-name> Mary </customer-name> <customer-street> Erin </customer-street> </customer> </bank-2> <customer-city> Newark </customer-city>

Querying and Transforming XML Data Standard XML querying/translation languages XPath Simple language consisting of path expressions Forms a basic component of the next two XSLT Simple language designed for translation from XML to XML and XML to HTML XQuery An XML query language with a rich set of features

Tree Model of XML Data Query and transformation languages are based on a tree model of XML data bank-2 branch-name account balance customer [customer-id= C100, accounts= A-401 customer [..] Downtown 500

XPath /bank-2/customer/customer-name <customer-name>joe</customer-name> <customer-name>mary</customer-name> /bank-2/customer/customer-name/text( ) Joe Mary /bank-2/account[balance > 400] returns account elements with a balance value greater than 400 /bank-2/account[balance > 400]/@account-number returns the account numbers of those accounts with balance > 400

Functions in XPath /bank-2/account[customer/count() > 2] Returns accounts with > 2 customers Boolean connectives and and or and function not() can be used in predicates IDREFs can be referenced using function id() E.g. /bank-2/account/id(@owner) returns all customers referred to from the owners attribute of account elements.

More XPath Features // can be used to skip multiple levels of nodes E.g. /bank-2//customer-name Wild-cards finds any customer-name element anywhere under the /bank-2 element, regardless of the element in which it is contained. /bank-2/*/customer-name Match any element name

XSLT A stylesheet stores formatting options for a document, usually separately from document E.g. HTML style sheet may specify font colors and sizes for headings, etc. The XML Stylesheet Language (XSL) was originally designed for generating HTML from XML XSLT is a general-purpose transformation language Can translate XML to XML, and XML to HTML XSLT transformations are expressed using rules called templates Templates combine selection using XPath with construction of results

XSLT Templates Example of XSLT template with match and select part <xsl:template match= /bank-2/customer > <xsl:value-of select= customer-name /> </xsl:template> <xsl:template match= * /> The match attribute of xsl:template specifies a pattern in XPath Elements in the XML document matching the pattern are processed by the actions within the xsl:template element xsl:value-of selects (outputs) specified values (here, customername) For elements that do not match any template Attributes and text contents are output as is Templates are recursively applied on subelements The <xsl:template match= * /> template matches all elements that do not match any other template Used to ensure that their contents do not get output.

Creating XML Output Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is E.g. to wrap results in new XML elements. <xsl:template match= /bank-2/customer > <customer> <xsl:value-of select= customer-name /> </customer> </xsl:template> <xsl:template match= * /> Example output: <customer> Joe </customer> <customer> Mary </customer>

XQuery XQuery is a general purpose query language for XML data Currently being standardized by the World Wide Web Consortium (W3C) The textbook description is based on a March 2001 draft of the standard. The final version may differ, but major features likely to stay unchanged. Alpha version of XQuery engine available free from Microsoft XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL XQuery uses a for let where.. result syntax for SQL from where SQL where result SQL select let allows temporary variables, and has no equivalent in SQL

FLWR Syntax in XQuery For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath Simple FLWR expression in XQuery find all accounts with balance > 400, with each result enclosed in an <account-number>.. </account-number> tag for $x in /bank-2/account let $acctno := $x/@account-number where $x/balance > 400 return <account-number> $acctno </account-number> Let clause not really needed in this query, and selection can be done In XPath. Query can be written as: for $x in /bank-2/account[balance>400] return <account-number> $x/@account-number </account-number>

Joins Joins are specified in a manner very similar to SQL for $a in /bank/account, $c in /bank/customer, $d in /bank/depositor where $a/account-number = $d/account-number and $c/customer-name = $d/customer-name return <cust-acct> $c $a </cust-acct> The same query can be expressed with the selections specified as XPath selections: for $a in /bank/account $c in /bank/customer $d in /bank/depositor[ account-number = $a/account-number and customer-name = $c/customer-name] return <cust-acct> $c $a</cust-acct>

XML: Summary Becoming the standard for data exchange Many details still need to be worked out!! Active area of research Especially optimization/implementation Worst...idea...ever!