Semistructured Data and Mediation
|
|
- Charleen Walters
- 5 years ago
- Views:
Transcription
1 Semistructured Data and Mediation Prof. Letizia Tanca Dipartimento di Elettronica e Informazione Politecnico di Milano
2 SEMISTRUCTURED DATA FOR THESE DATA THERE IS SOME FORM OF STRUCTURE, BUT IT IS NOT AS PRESCRIPTIVE REGULAR COMPLETE AS IN TRADITIONAL DBMSs EXAMPLES WEB DATA XML DATA BUT ALSO DATA DERIVED FROM THE INTEGRATION OF HETEROGENEOUS DATASOURCES 1
3 AN EXAMPLE OF SEMISTRUCTURED DATA a page produced form a database 2
4 AN EXAMPLE OF SEMISTRUCTURED DATA 3
5 AN EXAMPLE OF SEMISTRUCTURED DATA 4
6 AN EXAMPLE OF SEMISTRUCTURED DATA GRAPH-BASED REPRESENTATION: THE IRREGULAR DATA STRUCTURE APPEARS MORE CLEARLY PROFESSOR PROFESSOR NAME NAME TEACHES TEACHES AGE AGE 37 COURSE YEARS NAME? LAST NAME NAME KING ABSTRACT DATABASES OBJECTS 45 CONCRETE VALUES 5
7 A SIMPLE XML DOCUMENT WITH ITS GRAPH BASED REPRESENTATION <producer> <mn-name>mercury</mn-name> <year>1999</year> <model> <mo-name>sable LT</mo-name> <front-rating>3.84</front-rating> <side-rating>2.14</side-rating> <rank>9</rank> </model> </producer> PRODUCER mn-name year Mercury 1999 mo-name Sable LT front-rating MODEL.. side-rating rank
8 INFORMATION SEARCH IN SEMISTRUCTURED DATABASES WE WOULD LIKE TO: INTEGRATE QUERY COMPARE DATA WITH DIFFERENT STRUCTURES ALSO WITH SEMISTRUCTURED DATA, JUST AS IF THEY WERE ALL STRUCTURED 7
9 DYNAMIC INTEGRATION OF SEMISTRUCTURED DATABASES AN OVERALL DATA REPRESENTATION SHOULD BE PROGRESSIVELY BUILT, AS WE DISCOVER AND EXPLORE NEW INFORMATION SOURCES 8
10 SEMISTRUCTURED BASED ON TEXT TREES GRAPHS LABELED NODES LABELED ARCS BOTH DATA MODELS THEY ARE ALL DIFFERENT AND DO NOT LEND THEMSELVES TO EASY INTEGRATION 9
11 Recall: MEDIATORS The term mediation includes: the processing needed to make the interfaces work the knowledge structures that drive the transformations needed to transform data to information any intermediate storage that is needed (Wiederhold) 10
12 Mediator-based approach IN TSIMMIS: UNIQUE, GRAPH-BASED DATA MODEL DATA MODEL MANAGED BY THE MEDIATOR WRAPPERS FOR THE MODEL-TO- MODEL TRANSLATIONS 11
13 TSIMMIS APPLICATION 1 APPLICATION 2 (LOREL) (LOREL) (OEM) (OEM) MEDIATOR 1 MEDIATOR 2 WRAPPER 1 WRAPPER 2 WRAPPER 3 DATASOURCE 1 (RDBMS) DATASOURCE 2 (LOTUS NOTES) DATASOURCE 3 (WWW) 12
14 OEM (Object Exchange Model) (TSIMMIS) Graph-based Does not represent the schema Directly represents data : self-descriptive <temp-in-farenheit,int,80> 13
15 Object structure <Object-id,label,type,value> Nested structure 14
16 OEM (Object Exchange Model) (TSIMMIS) 15
17 Typical complications when integrating Each mediator is specialized into a certain domain (e.g. weather forecast), thus Each mediator must know domain metadata, which convey the data semantics On-line duplicate recognition and removal (no designer to solve conflicts at design time here!!) 16
18 Query formulation Find books authored by Aho e.g. Datalog-like: Booktitle(X) :- library ( book( author:aho, title:x )) OK, but if this query must be produced at run-time, how does the system know that: A node library exists, which contains nodes book, which in turn contain fields author and title TSIMMIS uses the Dataguide: a-posteriori schema, progressively built while exploring the data sources 17
19 TSIMMIS s language is LOREL Lightweight Object REpository Language Object-based Similar to OQL, with some modifications appropriate for semistructured data select library.book.title where library.book.author = Aho from library (if more than one root is available) 18
20 WRAPPERS (translators) Convert queries into queries/commands which are understandable for the specific data source they can extend the query possibilities of a data source Convert query results from the source s format to a format which is understandable for the application 19
21 WRAPPERS BookTitle Author Editor The HTML Sourcebook J. Graham... Computer Networks A. Tannenbaum... Wrapper Database Systems Data on the Web R. Elmasri, S. Navathe S. Abiteboul, P.Buneman, D.Suciu... HTML page database table(s) (or XML docs) 20
22 Example: estraction of information from Information extraction HTML docs Source Format: plain text with HTML tags (no semantics) Target Format: relational table (possibly nested, NF 2 ) or XML (we add structure, i.e. semantics ) Wrapper Software module which performs an extraction step Intuition: use extraction rules which exploit the marking tags 21
23 A complex extraction process 20-30KB IN HTML >10 attributes with nesting 22
24 Problems Web sites change very frequently A layout change may affect the extraction rules Human-based maintenance of an ad-hoc wrapper is very expensive Better: automatic wrapper generation 23
25 Automatic wrapper generation... We can only use them when pages are regular to some extent OK when: Many pages sharing the same structure e.g. pages are dynamically generated from a DB data intensive web sites 24
26 Online library Name John Smith Paul Jones Authors Id #a1 Database Primer #a2 Books Title Computer Systems Id Author XML at Work #b1 #a1 #b2 #a1 HTML and Scripts #b3 #a2 #b4 #a2 JavaScripts #b5 #a2 Description Id Name #e1 #b1 John Smith #e2 This book Paul Jones #e3 #e4 #e5 An undergraduate #e6 Title A comprehensive #e7 Database Primer Books Editions Book Editions Year Price Details Details Year Price $ First #b1 First Edition, 2000 Paperback 30$ 1998 Second 20$ #b $ First Second Edition, Hard Cover $ #b $ First #b $ null First Edition, Paperback $ #b $ Second Description #b5 First Edition, 2000 Paperback 50$ 1999 null 30$ Computer Systems null $ An useful HTML XML at Work Second Edition, Hard Cover $ HTML and Script A must in Java Script null $ Source Dataset HTML Pages 25
27 The ROAD RUNNER project Page Class It is the collection of all pages generated by the same script from a common dataset Schema Derivation Given a set of HTML sample pages, belonging to the same class, find the underlying dataset structure (database schema) Solution: Wrapper Generator Underlying dataset structure Extraction rules 26
28 B A C D F E G First Edition, 1998 Database This book John Primer Second 2000 Smith Edition, An Computer First Edition, undergraduate 1995 Input HTML Systems Sources A <HTML><BODY><TABLE> First Edition, XML at Work <HTML><BODY><TABLE> comprehensive 1999 <TR> <TR> <TD><FONT>books.com<FONT></TD> <TD><FONT>books.com<FONT></TD> null 1993 <TD><A>John Smith</A></TD> HTML and <TD><A>Paul An useful HTML Jones</A></TD> </TR> Paul Jones Second Scripts </TR> <TR> <TR> Edition, 1999 <TD><FONT>Database Primer<FONT></TD> <TD><FONT>XML at Work<FONT></TD> <TD><B>First Edition, Paperback</B></TD>, JavaScripts <TD><B>First A must in Edition, null Paperback</B></TD>, 2000 Wrapper Output H 20$ 30$ 40$ 30$ 30$ 45$ 50$ Wrapper = Grammar <HTML><BODY><TABLE> <TR> <TD><FONT>books.com<FONT></TD> <TD><A> #PCDATA</A></TD></TR> (<TR> ( <TD><FONT> #PCDATA <FONT></TD> ( <TD><B> #PCDATA </B> )? </TD> ( <TR><TD><FONT> #PCDATA <FONT> <B> #PCDATA </B> </TD></TR> )+ )+ Target Schema SET ( TUPLE (A : #PCDATA; B : SET ( TUPLE ( C : #PCDATA; D : #PCDATA; E : SET ( TUPLE ( F: #PCDATA; G: #PCDATA; 27 H: #PCDATA)))
29 Model Management approach (Atzeni, Bernstein, others ) Given two different data models (e.g. OO and relational, or XML and OO, etc.) (when datasources are at least semistructured) Define general mappings from one model into another, which allow to Map SQL schema to XML schema Map data source to data warehouse Map OO classes to data source tables, To this end, one possibility is to use a metamodel
30 METAMODEL A METAMODEL IS AN ABSTRACT MODEL FOR THE SPECIFICATION OF CONCRETE MODELS TWO TYPES OF METAMODELS: 1. GENERAL ENTITIES WHOSE SPECIALIZATIONS BECOME OBJECTS IN THE TARGET MODEL, E.G. GSMM 2. ENTITIES DESCRIBING THE OBJECTS OF THE TARGET MODEL, E.G.GEOGRAPHIC DATA FILES (GDF) 29
31 Using a metamodel for integrated data representation TRANSLATION OF DIFFERENT MODELS INTO A UNIQUE FORMALISM EASY A-PRIORI COMPARISON BETWEEN THE DIFFERENT MODEL S FEATURES AUTOMATIC TRANSLATION DICTATED BY THE REPRESENTATION RULES OF THE CONCRETE MODEL INTO THE METAMODEL 30
32 GSMM (General Semistructured Meta-Model) GRAPH-BASED DOES NOT REPRESENT THE SCHEMA: SELF- DESCRIPTIVE AS TSIMMIS GSSM REPLACES THE CONCEPT OF SCHEMA WITH THAT OF CONSTRAINT INTRODUCTION OF FLEXIBILITY IN DATA REPRESENTATION 31
33 An XML document PRODUCERS.. <producers> <producer> <mn-name>mercury</mn-name> <year>1999</year> <model> <mo-name>sable LT</mo-name> <front-rating>3.84</frontrating> <side-rating>2.14</side-rating> <rank>9</rank> </model> </producer> </producers> PRODUCER.. mn-name year MODEL Mercury 1999 rank mo-name Sable LT front-rating side-rating
34 Its representation in GSMM <PRODUCERS,root,0, >.. sub-element of <PRODUCER,element,1, > sub-element of <mn-name,element,1, Mercury> sub-element of sub-element of.. <MODEL,element,3, > <year,element,2, 1999> sub-element of sub-element of sub-element of <mo-name,element,1, SableLT> <rank,element,5, 9> <front-rating,element,2, 3.84> sub-element of <side-rating,element,4, 2.14> 33
35 GSL General Semistructured Language RULE BASED WHERE A RULE IS A COLORED GRAPH RED SOLID FOR POSITIVE PREMISES RED DASHED FOR NEGATIVE PREMISES GREEN SOLID FOR POSITIVE CONSEQUENCES RULES REPRESENT: QUERIES CONSTRAINTS (EMPTY CONSEQUENCE) 34
36 GSL QUERIES G Q 1 : Find all the B nodes which have at least a C child and link them to a node R 1 R 1 A R 2 Q 1 R 1 B C B C B D Q 2 : Find all the B nodes that do not have children C and link them to a node R 2 Q 2 R 2 B C 35
37 CONSTRAINTS IN GSL G A C 1 : Whenever a B node has a child, this is a C node B B C 1 B x This is false in the instance { x = C } C D C 2 : Each A node has at least a B child C 2 A B This is true in the instance 36
38 APPLICATION OF THE GSMM METAMODEL THE METAMODEL ALLOWS THE CONSTRUCTION OF A GENERIC GRAPH WHICH REPRESENTS AN INSTANCE OF A CONCRETE MODEL PLUS CONSTRAINTS (also represented by means of graphs): HIGH LEVEL, OR META- CONSTRAINTS these dictate the sintactic rules of the object data model LOW LEVEL CONSTRAINTS as usual, these dictate the application domain semantics IT IS A METAMODEL OF THE FIRST KIND 37
39 XML EACH NODE LABEL HAS CARDINALITY 4: n i = <Ntag i, Ntype i, Norder i, Ncontent i > Ntype i says whether the node is the root, an element, a text, Norder i represents the node s position as a child of its parent node Ncontent i may assume a value of type PCDATA or value EACH EDGE LABEL HAS CARDINALITY 1: ej = < (nh, nk), ELj > EL j = < Etype j > Etype j {attribute of, sub-element of} 38
40 CONSTRAINT REPRESENTATION IN XML <TAG1,TYPE1,ORDER1,CONTENT1> <TAG1,TYPE1,ORDER1,CONTENT1> <E-type> <TAG2,TYPE2,ORDER2,CONTENT2> <E-type> <TAG2,TYPE2,ORDER2,CONTENT2> { E-type=SubElement-of TYPE1=element (TYPE2=element TYPE2=text), E-type=Attribute-of TYPE1=element TYPE2=attribute } HIGH LEVEL, OR META-CONSTRAINT: THE TYPE OF AN ARC DEPENDS ON THE TYPE OF THE DESTINATION NODES { TYPE1=root CONTENT1= } HIGH LEVEL, OR META-CONSTRAINT: THE GRAPH ROOT HAS TYPE root AND UNDEFINED CONTENT HIGH LEVEL CONSTRAINTS CONTRIBUTE TO THE DEFINITION OF THE STRUCTURE OF ANY XML DOCUMENT 39
41 An XML document PRODUCERS.. <producers> <producer> <mn-name>mercury</mn-name> <year>1999</year> <model> <mo-name>sable LT</mo-name> <front-rating>3.84</frontrating> <side-rating>2.14</side-rating> <rank>9</rank> </model> </producer> </producers> PRODUCER.. mn-name year MODEL Mercury 1999 rank mo-name Sable LT front-rating side-rating
42 Its representation in GSMM High-level constraints satisfied sub-element of <mn-name,element,1, Mercury> <PRODUCERS,root,0, > sub-element of <PRODUCERS,element,1, > sub-element of.. sub-element of.. <MODEL,element,3, > <year,element,2, 1999> sub-element of sub-element of sub-element of <mo-name,element,1, SableLT> <rank,element,5, 9> <front-rating,element,2, 3.84> sub-element of <side-rating,element,4, 2.14> 41
43 LOW-LEVEL, OR APPLICATION-LEVEL, CONSTRAINTS <producer,element,order1,content1> <producer,element,order1,content1> <Subelement-of> <Subelement-of> <model,element, ORDER2,CONTENT2> <year,element,order2,content2> { CONTENT2>1990 } Low-level constraint: For each PRODUCER element there is at least one MODEL element Low-level constraint: YEAR must be greater than 1990 LOW-LEVEL CONSTRAINTS EXPRESS THE DOCUMENT S SEMANTICS 42
44 Geographic Data Files (GDF) GDF is a standard, used for describing roadmaps CITY STREET REGISTER STREET SIGN ARCHIVE TRAFFIC CONTROL SYSTEMS CAR NAVIGATOR SYSTEMS (GPS).. IT IS A METAMODEL OF THE SECOND KIND 43
45 GDF STRUCTURE DATASET OF 82 ASCII CHARACTERS RECORDS ENTITIES ATTRIBUTES RELATIONS 11 INTEREST THEMES ROADS AND FERRIES BRIDGES AND TUNNELS RAILROADS RIVERS PUBLIC TRANSPORTATION ADMINISTRATION AREAS.. 3 DESCRIPTION LEVELS FOR EACH THEME 44
46 GDF LEVEL 0 TOPOLOGY A PLANAR GRAPH: POINT ARC NODE POLYGON EACH ELEMENT IS UNIQUELY IDENTIFIED BY AN ID TOPOLOGICAL RULES GOVERN INTERNAL STRUCTURE AND RELATIONSHIPS AMONG ELEMENTS 45
47 GDF LEVELS LEVEL 1 ELEMENTARY ENTITIES THIS IS THE BASIS FOR THE CITY STREET REGISTER ROAD ELEMENT JUNCTION TRAFFIC AREA LEVEL 2 COMPLEX ENTITIES THIS IS THE BASIS FOR THE GEOGRAPHIC INFORMATION SYSTEMS STREET INTERSECTION 46
48 GDF LEVELS VIALE ABRUZZI CITY MAP LEVEL 2 CL234 CS102 CL203 CL235 CS103 L525 S867 S868 L321 L322 L524 S869 S870 LEVEL 1 LEVEL 0 C622 C621 L818 L811 L812 C623 L817 L819 C624 C625 C626 L813 C627 L820 C628 L814 C629 L821 L815 C630 L816 C631 47
49 META-ENTITY DEFINITION ATTRIBUTE_TYPE_ENTITY_TYPE ATTRIBUTE_TYPE 1 COMPOUND_ATTRIBUTE_TYPE ENTITY_TYPE ATTRIBUTE_TYPE_RELATION_TYPE RELATION_TYPE RELATION_TYPE_ENTITY_TYPE 48
50 GDF META-SCHEMA 49
51 AN ORTHOGONAL APPROACH: A STANDARD FOR METADATA MULTIMEDIA CONTENT DESCRIPTION INTERFACE (MPEG-7) PROVIDES A SET OF TOOLS FOR DESCRIBING MULTIMEDIA CONTENT XML-SCHEMA BASED NOT BASED ON SPECIFIC APPLICATION DOMAINS, IT ENABLES EASY EXCHANGE AND REUSE OF MULTIMEDIA CONTENTS SPEECH TEXT IMAGES TEXT IT IS APPLIED IN REAL-TIME AS WELL AS IN NON-REAL- TIME SITUATIONS: OFF-LINE CONTENT STORAGE ON-LINE CONTENT STORAGE STREAM (NO PERMANENT STORAGE) 50
52 MPEG-7 TOP LEVEL ELEMENTS Content Description (abstract) Complete Description (abstract) Content Management (abstract) Content Entity Multimedia Content (abstract) Image Video Audio Audio-visual Multimedia Content Multimedia Collection Signal Ink Content Analytic Edited Video Content Abstraction (abstract) Semantic Description Model Description Summary Description View Description Variation Description User Description Media Description Creation Description Usage Description Classification Scheme Description Da: Martinez, Koenen, Pereira 51
Semistructured Data and Mediation
Semistructured Data and Mediation Prof. Letizia Tanca Dipartimento di Elettronica e Informazione Politecnico di Milano SEMISTRUCTURED DATA FOR THESE DATA THERE IS SOME FORM OF STRUCTURE, BUT IT IS NOT
More informationis that the question?
To structure or not to structure, is that the question? Paolo Atzeni Based on work done with (or by!) G. Mecca, P.Merialdo, P. Papotti, and many others Lyon, 24 August 2009 Content A personal (and questionable
More informationJennifer Widom. Stanford University
Principled Research in Database Systems Stanford University What Academics Give Talks About Other people s papers Thesis and new results Significant research projects The research field BIG VISION Other
More informationExtracting Data From Web Sources. Giansalvatore Mecca
Extracting Data From Web Sources Giansalvatore Mecca Università della Basilicata mecca@unibas.it http://www.difa.unibas.it/users/gmecca/ EDBT Summer School 2002 August, 26 2002 Outline Subject of this
More informationDatabases and the World Wide Web
Databases and the World Wide Web Paolo Atzeni D.I.A. - Università di Roma Tre http://www.dia.uniroma3.it/~atzeni thanks to the Araneus group: G. Mecca, P. Merialdo, A. Masci, V. Crescenzi, G. Sindoni,
More informationData Integration Systems
Data Integration Systems Haas et al. 98 Garcia-Molina et al. 97 Levy et al. 96 Chandrasekaran et al. 2003 Zachary G. Ives University of Pennsylvania January 13, 2003 CIS 650 Data Sharing and the Web Administrivia
More informationTeiid Designer User Guide 7.5.0
Teiid Designer User Guide 1 7.5.0 1. Introduction... 1 1.1. What is Teiid Designer?... 1 1.2. Why Use Teiid Designer?... 2 1.3. Metadata Overview... 2 1.3.1. What is Metadata... 2 1.3.2. Editing Metadata
More informationUSING XML AS A MEDIUM FOR DESCRIBING, MODIFYING AND QUERYING AUDIOVISUAL CONTENT STORED IN RELATIONAL DATABASE SYSTEMS
USING XML AS A MEDIUM FOR DESCRIBING, MODIFYING AND QUERYING AUDIOVISUAL CONTENT STORED IN RELATIONAL DATABASE SYSTEMS Iraklis Varlamis 1, Michalis Vazirgiannis 2, Panagiotis Poulos 3 1,2) Dept of Informatics,
More informationDATA MODELS FOR SEMISTRUCTURED DATA
Chapter 2 DATA MODELS FOR SEMISTRUCTURED DATA Traditionally, real world semantics are captured in a data model, and mapped to the database schema. The real world semantics are modeled as constraints and
More informationIntroduction to XML. Yanlei Diao UMass Amherst April 17, Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau.
Introduction to XML Yanlei Diao UMass Amherst April 17, 2008 Slides Courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives and Gerome Miklau. 1 Structure in Data Representation Relational data is highly
More informationB.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1
Basic Concepts :- 1. What is Data? Data is a collection of facts from which conclusion may be drawn. In computer science, data is anything in a form suitable for use with a computer. Data is often distinguished
More informationDatabase Management Systems MIT Lesson 01 - Introduction By S. Sabraz Nawaz
Database Management Systems MIT 22033 Lesson 01 - Introduction By S. Sabraz Nawaz Introduction A database management system (DBMS) is a software package designed to create and maintain databases (examples?)
More informationCSE 544 Data Models. Lecture #3. CSE544 - Spring,
CSE 544 Data Models Lecture #3 1 Announcements Project Form groups by Friday Start thinking about a topic (see new additions to the topic list) Next paper review: due on Monday Homework 1: due the following
More informationIntroduction to Database Systems CSE 414
Introduction to Database Systems CSE 414 Lecture 14-15: XML CSE 414 - Spring 2013 1 Announcements Homework 4 solution will be posted tomorrow Midterm: Monday in class Open books, no notes beyond one hand-written
More informationA Web-Based OO Platform for the Development of Didactic Multimedia Collaborative Applications
A Web-Based OO Platform for the Development of Didactic Multimedia Collaborative Applications David A. Fuller, Luis A. Guerrero, Jenny Zegarra {dfuller, luguerre, jzegarra}@ing.puc.cl Computer Science
More informationAn introduction to Big Data Integration. Letizia Tanca Politecnico di Milano
An introduction to Big Data Integration Letizia Tanca Politecnico di Milano Trends in Data Integration ü Pre-history: focused on challenges that occur within enterprises ü The Web era: scaling to a much
More informationLecturer 2: Spatial Concepts and Data Models
Lecturer 2: Spatial Concepts and Data Models 2.1 Introduction 2.2 Models of Spatial Information 2.3 Three-Step Database Design 2.4 Extending ER with Spatial Concepts 2.5 Summary Learning Objectives Learning
More informationCSE 880. Advanced Database Systems. Semistuctured Data and XML
CSE 880 Advanced Database Systems Semistuctured Data and XML S. Pramanik 1 Semistructured Data 1. Data is self describing with schema embedded to the data itself. 2. Theembeddedschemacanchangewithtimejustlike
More informationIntroduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington
Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington CS330 Lecture April 8, 2003 1 Overview From HTML to XML DTDs Querying XML: XPath Transforming XML: XSLT
More information10/24/12. What We Have Learned So Far. XML Outline. Where We are Going Next. XML vs Relational. What is XML? Introduction to Data Management CSE 344
What We Have Learned So Far Introduction to Data Management CSE 344 Lecture 12: XML and XPath A LOT about the relational model Hand s on experience using a relational DBMS From basic to pretty advanced
More informationWeb Data Extraction. Craig Knoblock University of Southern California. This presentation is based on slides prepared by Ion Muslea and Kristina Lerman
Web Data Extraction Craig Knoblock University of Southern California This presentation is based on slides prepared by Ion Muslea and Kristina Lerman Extracting Data from Semistructured Sources NAME Casablanca
More informationAdditional Readings on XPath/XQuery Main source on XML, but hard to read:
Introduction to Database Systems CSE 444 Lecture 10 XML XML (4.6, 4.7) Syntax Semistructured data DTDs XML Outline April 21, 2008 1 2 Further Readings on XML Additional Readings on XPath/XQuery Main source
More informationJSON - Overview JSon Terminology
Announcements Introduction to Database Systems CSE 414 Lecture 12: Json and SQL++ Office hours changes this week Check schedule HW 4 due next Tuesday Start early WQ 4 due tomorrow 1 2 JSON - Overview JSon
More informationData Formats and APIs
Data Formats and APIs Mike Carey mjcarey@ics.uci.edu 0 Announcements Keep watching the course wiki page (especially its attachments): https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018 Ditto for
More informationIntroduction to Database Systems CSE 414
Introduction to Database Systems CSE 414 Lecture 13: XML and XPath 1 Announcements Current assignments: Web quiz 4 due tonight, 11 pm Homework 4 due Wednesday night, 11 pm Midterm: next Monday, May 4,
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 11: XML and XPath 1 XML Outline What is XML? Syntax Semistructured data DTDs XPath 2 What is XML? Stands for extensible Markup Language 1. Advanced, self-describing
More informationCSE 3241: Database Systems I Databases Introduction (Ch. 1-2) Jeremy Morris
CSE 3241: Database Systems I Databases Introduction (Ch. 1-2) Jeremy Morris 1 Outline What is a database? The database approach Advantages Disadvantages Database users Database concepts and System architecture
More informationKNOWLEDGE DISCOVERY AND DATA MINING
KNOWLEDGE DISCOVERY AND DATA MINING Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano INFORMATION MANAGEMENT TECHNOLOGIES DATA WAREHOUSE DECISION SUPPORT SYSTEMS
More informationObject-Relational and Nested-Relational Databases Dr. Akhtar Ali
Extensions to Relational Databases Object-Relational and Nested-Relational Databases By Dr. Akhtar Ali Lecture Theme & References Theme The need for extensions in Relational Data Model (RDM) Classification
More informationIntroduction to Semistructured Data and XML
Introduction to Semistructured Data and XML Chapter 27, Part D Based on slides by Dan Suciu University of Washington Database Management Systems, R. Ramakrishnan 1 How the Web is Today HTML documents often
More informationCSE 544 Principles of Database Management Systems. Lecture 4: Data Models a Never-Ending Story
CSE 544 Principles of Database Management Systems Lecture 4: Data Models a Never-Ending Story 1 Announcements Project Start to think about class projects If needed, sign up to meet with me on Monday (I
More informationDatabase Instance And Relational Schema Design A Fact Oriented Approach
Database Instance And Relational Schema Design A Fact Oriented Approach File-oriented approaches create problems for organizations because of d) how master files maintain facts used by certain application
More informationProject Assignment 2 (due April 6 th, 2016, 4:00pm, in class hard-copy please)
Virginia Tech. Computer Science CS 4604 Introduction to DBMS Spring 2016, Prakash Project Assignment 2 (due April 6 th, 2016, 4:00pm, in class hard-copy please) Reminders: a. Out of 100 points. Contains
More informationSangam: A Framework for Modeling Heterogeneous Database Transformations
Sangam: A Framework for Modeling Heterogeneous Database Transformations Kajal T. Claypool University of Massachusetts-Lowell Lowell, MA Email: kajal@cs.uml.edu Elke A. Rundensteiner Worcester Polytechnic
More informationMIT Database Management Systems Lesson 01: Introduction
MIT 22033 Database Management Systems Lesson 01: Introduction By S. Sabraz Nawaz Senior Lecturer in MIT, FMC, SEUSL Learning Outcomes At the end of the module the student will be able to: Describe the
More informationIntroduction to Database Systems CSE 444
Introduction to Database Systems CSE 444 Lecture 25: XML 1 XML Outline XML Syntax Semistructured data DTDs XPath Coverage of XML is much better in new edition Readings Sections 11.1 11.3 and 12.1 [Subset
More informationMULTIMEDIA DATABASES OVERVIEW
MULTIMEDIA DATABASES OVERVIEW Recent developments in information systems technologies have resulted in computerizing many applications in various business areas. Data has become a critical resource in
More informationCSE 344 APRIL 16 TH SEMI-STRUCTURED DATA
CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA ADMINISTRATIVE MINUTIAE HW3 due Wednesday OQ4 due Wednesday HW4 out Wednesday (Datalog) Exam May 9th 9:30-10:20 WHERE WE ARE So far we have studied the relational
More informationOne of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while
1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling
More informationWARE: a tool for the Reverse Engineering of Web Applications
WARE: a tool for the Reverse Engineering of Web Applications Anna Rita Fasolino G. A. Di Lucca, F. Pace, P. Tramontana, U. De Carlini Dipartimento di Informatica e Sistemistica University of Naples Federico
More informationDatabase System Concepts and Architecture
CHAPTER 2 Database System Concepts and Architecture Copyright 2017 Ramez Elmasri and Shamkant B. Navathe Slide 2-2 Outline Data Models and Their Categories History of Data Models Schemas, Instances, and
More informationEstimating the Quality of Databases
Estimating the Quality of Databases Ami Motro Igor Rakov George Mason University May 1998 1 Outline: 1. Introduction 2. Simple quality estimation 3. Refined quality estimation 4. Computing the quality
More informationStudy on XML-based Heterogeneous Agriculture Database Sharing Platform
Study on XML-based Heterogeneous Agriculture Database Sharing Platform Qiulan Wu, Yongxiang Sun, Xiaoxia Yang, Yong Liang,Xia Geng School of Information Science and Engineering, Shandong Agricultural University,
More informationRelational Approach. Problem Definition
Relational Approach (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Slides are mostly based on Information Retrieval Algorithms and Heuristics, Grossman & Frieder 1 Problem Definition Three conceptual
More informationIntroduction to Information Retrieval. Lecture Outline
Introduction to Information Retrieval Lecture 1 CS 410/510 Information Retrieval on the Internet Lecture Outline IR systems Overview IR systems vs. DBMS Types, facets of interest User tasks Document representations
More informationWeb Services Interfaces
Web Services Interfaces Michalis Petropoulos Alin Deutsch Yannis Papakonstantinou Vasilis Vassalos Scott Mitchell UNIVERSITY OF CALIFORNIA, SAN DIEGO Microsoft Research, July 2003 Exporting DBMSs on the
More informationRelational Data Model is quite rigid. powerful, but rigid.
Lectures Desktop - 2 (C) Page 1 XML Tuesday, April 27, 2004 8:43 AM Motivation: Relational Data Model is quite rigid. powerful, but rigid. With the explosive growth of the Internet, electronic information
More informationTeiid Designer User Guide 7.7.0
Teiid Designer User Guide 1 7.7.0 1. Introduction... 1 1.1. What is Teiid Designer?... 1 1.2. Why Use Teiid Designer?... 2 1.3. Metadata Overview... 2 1.3.1. What is Metadata... 2 1.3.2. Editing Metadata
More informationDATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems
DATABASE MANAGEMENT SYSTEMS UNIT I Introduction to Database Systems Terminology Data = known facts that can be recorded Database (DB) = logically coherent collection of related data with some inherent
More informationWEB DATA EXTRACTION METHOD BASED ON FEATURED TERNARY TREE
WEB DATA EXTRACTION METHOD BASED ON FEATURED TERNARY TREE *Vidya.V.L, **Aarathy Gandhi *PG Scholar, Department of Computer Science, Mohandas College of Engineering and Technology, Anad **Assistant Professor,
More informationIntroduction to Database Systems. Fundamental Concepts
Introduction to Database Systems Fundamental Concepts Werner Nutt 1 Characteristics of the DB Approach Insulation of application programs and data from each other Use of a ue to store the schema Support
More informationApproaches. XML Storage. Storing arbitrary XML. Mapping XML to relational. Mapping the link structure. Mapping leaf values
XML Storage CPS 296.1 Topics in Database Systems Approaches Text files Use DOM/XSLT to parse and access XML data Specialized DBMS Lore, Strudel, exist, etc. Still a long way to go Object-oriented DBMS
More informationA MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS
A MODEL FOR ADVANCED QUERY CAPABILITY DESCRIPTION IN MEDIATOR SYSTEMS Alberto Pan, Paula Montoto and Anastasio Molano Denodo Technologies, Almirante Fco. Moreno 5 B, 28040 Madrid, Spain Email: apan@denodo.com,
More informationReview for Exam 1 CS474 (Norton)
Review for Exam 1 CS474 (Norton) What is a Database? Properties of a database Stores data to derive information Data in a database is, in general: Integrated Shared Persistent Uses of Databases The Integrated
More informationAnnouncements. JSon Data Structures. JSon Syntax. JSon Semantics: a Tree! JSon Primitive Datatypes. Introduction to Database Systems CSE 414
Introduction to Database Systems CSE 414 Lecture 13: Json and SQL++ Announcements HW5 + WQ5 will be out tomorrow Both due in 1 week Midterm in class on Friday, 5/4 Covers everything (HW, WQ, lectures,
More information11. EXTENSIBLE MARKUP LANGUAGE (XML)
11. EXTENSIBLE MARKUP LANGUAGE (XML) Introduction Extensible Markup Language is a Meta language that describes the contents of the document. So these tags can be called as self-describing data tags. XML
More informationIntroduction: Databases and Database Users. Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 1
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 1 Introduction: Databases and Database Users Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Types of Databases and Database Applications
More informationChapter 13 XML: Extensible Markup Language
Chapter 13 XML: Extensible Markup Language - Internet applications provide Web interfaces to databases (data sources) - Three-tier architecture Client V Application Programs Webserver V Database Server
More informationData Models: The Center of the Business Information Systems Universe
Data s: The Center of the Business Information Systems Universe Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie, Maryland 20716 Tele: 301-249-1142 Email: Whitemarsh@wiscorp.com Web: www.wiscorp.com
More informationManagement Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management
Management Information Systems Review Questions Chapter 6 Foundations of Business Intelligence: Databases and Information Management 1) The traditional file environment does not typically have a problem
More informationCopyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1
Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.
More informationXSLT and Structural Recursion. Gestão e Tratamento de Informação DEI IST 2011/2012
XSLT and Structural Recursion Gestão e Tratamento de Informação DEI IST 2011/2012 Outline Structural Recursion The XSLT Language Structural Recursion : a different paradigm for processing data Data is
More informationPart XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321
Part XII Mapping XML to Databases Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Outline of this part 1 Mapping XML to Databases Introduction 2 Relational Tree Encoding Dead Ends
More informationCSC Web Technologies, Spring Web Data Exchange Formats
CSC 342 - Web Technologies, Spring 2017 Web Data Exchange Formats Web Data Exchange Data exchange is the process of transforming structured data from one format to another to facilitate data sharing between
More informationDatabase Design. IIO30100 Tietokantojen suunnittelu. Michal Zabovsky. Presentation overview
Database Design IIO30100 Tietokantojen suunnittelu Michal Zabovsky Department of Informatics Faculty of Management Science and Informatics University of Zilina Slovak Republic Presentation overview Software
More informationManual Wrapper Generation. Automatic Wrapper Generation. Grammar Induction Approach. Overview. Limitations. Website Structure-based Approach
Automatic Wrapper Generation Kristina Lerman University of Southern California Manual Wrapper Generation Manual wrapper generation requires user to Specify the schema of the information source Single tuple
More informationMarkup Languages SGML, HTML, XML, XHTML. CS 431 February 13, 2006 Carl Lagoze Cornell University
Markup Languages SGML, HTML, XML, XHTML CS 431 February 13, 2006 Carl Lagoze Cornell University Problem Richness of text Elements: letters, numbers, symbols, case Structure: words, sentences, paragraphs,
More informationA Digital Library Framework for Reusing e-learning Video Documents
A Digital Library Framework for Reusing e-learning Video Documents Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, and Fausto Rabitti ISTI-CNR, via G. Moruzzi 1, 56124 Pisa, Italy paolo.bolettieri,fabrizio.falchi,claudio.gennaro,
More informationDatabase Systems ER Model. A.R. Hurson 323 CS Building
ER Model A.R. Hurson 323 CS Building Database Design Data model is a group of concepts that helps to specify the structure of a database and a set of associated operations allowing data retrieval and data
More informationDatabase Heterogeneity
Database Heterogeneity Lecture 13 1 Outline Database Integration Wrappers Mediators Integration Conflicts 2 1 1. Database Integration Goal: providing a uniform access to multiple heterogeneous information
More informationRelational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity
COS 597A: Principles of Database and Information Systems Relational model continued Understanding how to use the relational model 1 with as weak entity folded into folded into branches: (br_, librarian,
More information1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL 1.2 UML DIAGRAM
1 1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL In the context of federation of repositories of Semantic Interoperability s, a number of entities are relevant. The primary entities to be described by ADMS are the
More informationIntroduction. Web Pages. Example Graph
COSC 454 DB And the Web Introduction Overview Dynamic web pages XML and databases Reference: (Elmasri & Navathe, 5th ed) Ch. 26 - Web Database Programming Using PHP Ch. 27 - XML: Extensible Markup Language
More informationApplied Databases. Sebastian Maneth. Lecture 5 ER Model, Normal Forms. University of Edinburgh - January 30 th, 2017
Applied Databases Lecture 5 ER Model, Normal Forms Sebastian Maneth University of Edinburgh - January 30 th, 2017 Outline 2 1. Entity Relationship Model 2. Normal Forms From Last Lecture 3 the Lecturer
More informationI R UNDERGRADUATE REPORT. Information Extraction Tool. by Alex Lo Advisor: S.K. Gupta, Edward Yi-tzer Lin UG
UNDERGRADUATE REPORT Information Extraction Tool by Alex Lo Advisor: S.K. Gupta, Edward Yi-tzer Lin UG 2001-1 I R INSTITUTE FOR SYSTEMS RESEARCH ISR develops, applies and teaches advanced methodologies
More informationMISM: A platform for model-independent solutions to model management problems
MISM: A platform for model-independent solutions to model management problems Paolo Atzeni, Luigi Bellomarini, Francesca Bugiotti, and Giorgio Gianforme Dipartimento di informatica e automazione Università
More informationPart II: Semistructured Data
Inf1-DA 2011 2012 II: 22 / 119 Part II Semistructured Data XML: II.1 Semistructured data, XPath and XML II.2 Structuring XML II.3 Navigating XML using XPath Corpora: II.4 Introduction to corpora II.5 Querying
More informationTaxonomy Tools: Collaboration, Creation & Integration. Dow Jones & Company
Taxonomy Tools: Collaboration, Creation & Integration Dave Clarke Global Taxonomy Director dave.clarke@dowjones.com Dow Jones & Company Introduction Software Tools for Taxonomy 1. Collaboration 2. Creation
More informationD2.5 Data mediation. Project: ROADIDEA
D2.5 Data mediation Project: ROADIDEA 215455 Document Number and Title: D2.5 Data mediation How to convert data with different formats Work-Package: WP2 Deliverable Type: Report Contractual Date of Delivery:
More informationData integration supports seamless access to autonomous, heterogeneous information
Using Constraints to Describe Source Contents in Data Integration Systems Chen Li, University of California, Irvine Data integration supports seamless access to autonomous, heterogeneous information sources
More informationToday s topics. Null Values. Nulls and Views in SQL. Standard Boolean 2-valued logic 9/5/17. 2-valued logic does not work for nulls
Today s topics CompSci 516 Data Intensive Computing Systems Lecture 4 Relational Algebra and Relational Calculus Instructor: Sudeepa Roy Finish NULLs and Views in SQL from Lecture 3 Relational Algebra
More informationDatabase Ph.D. Qualifying Exam Spring 2006
Database Ph.D. Qualifying Exam Spring 2006 Please answer six of the following nine questions. Question 1. Consider the following relational schema: Employee (ID, Lname, Fname, Salary, Dnumber, City) where
More informationCSE 544 Principles of Database Management Systems. Fall 2016 Lecture 4 Data models A Never-Ending Story
CSE 544 Principles of Database Management Systems Fall 2016 Lecture 4 Data models A Never-Ending Story 1 Announcements Project Start to think about class projects More info on website (suggested topics
More informationSemi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories
Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories Ornsiri Thonggoom, Il-Yeol Song, Yuan An The ischool at Drexel Philadelphia, PA USA Outline Long Term Research
More informationDatabase Systems Relational Model. A.R. Hurson 323 CS Building
Relational Model A.R. Hurson 323 CS Building Relational data model Database is represented by a set of tables (relations), in which a row (tuple) represents an entity (object, record) and a column corresponds
More informationXML: Extensible Markup Language
XML: Extensible Markup Language CSC 375, Fall 2015 XML is a classic political compromise: it balances the needs of man and machine by being equally unreadable to both. Matthew Might Slides slightly modified
More informationMIDTERM EXAMINATION Spring 2010 CS403- Database Management Systems (Session - 4) Ref No: Time: 60 min Marks: 38
Student Info StudentID: Center: ExamDate: MIDTERM EXAMINATION Spring 2010 CS403- Database Management Systems (Session - 4) Ref No: 1356458 Time: 60 min Marks: 38 BC080402322 OPKST 5/28/2010 12:00:00 AM
More informationNexus A Global, Active,
Nexus A Global, Active, and 3D Augmented Reality Model Institute for Parallel and Distributed Systems Universität ität Stuttgarttt t Overview Nexus A Global, Active, and 3D Augmented Reality Model Introduction:
More informationKnowledge discovery from XML Database
Knowledge discovery from XML Database Pravin P. Chothe 1 Prof. S. V. Patil 2 Prof.S. H. Dinde 3 PG Scholar, ADCET, Professor, ADCET Ashta, Professor, SGI, Atigre, Maharashtra, India Maharashtra, India
More informationApplied Databases. Sebastian Maneth. Lecture 5 ER Model, normal forms. University of Edinburgh - January 25 th, 2016
Applied Databases Lecture 5 ER Model, normal forms Sebastian Maneth University of Edinburgh - January 25 th, 2016 Outline 2 1. Entity Relationship Model 2. Normal Forms Keys and Superkeys 3 Superkey =
More informationPractical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems
Practical Database Design Methodology and Use of UML Diagrams 406.426 Design & Analysis of Database Systems Jonghun Park jonghun@snu.ac.kr Dept. of Industrial Engineering Seoul National University chapter
More informationCA Test Data Manager 3.x: Foundations 200
CA EDUCATION COURSE DESCRIPTION CA Test Data Manager 3.x: Foundations 200 Course Overview PRODUCT RELEASE CA Test Data Manager 3.2 This course provides students with primary concepts on each function of
More informationCS145 Introduction. About CS145 Relational Model, Schemas, SQL Semistructured Model, XML
CS145 Introduction About CS145 Relational Model, Schemas, SQL Semistructured Model, XML 1 Content of CS145 Design of databases. E/R model, relational model, semistructured model, XML, UML, ODL. Database
More informationOverview. Introduction. Introduction XML XML. Lecture 16 Introduction to XML. Boriana Koleva Room: C54
Overview Lecture 16 Introduction to XML Boriana Koleva Room: C54 Email: bnk@cs.nott.ac.uk Introduction The Syntax of XML XML Document Structure Document Type Definitions Introduction Introduction SGML
More informationNew Features Summary PowerDesigner 15.2
New Features Summary PowerDesigner 15.2 Windows DOCUMENT ID: DC10077-01-1520-01 LAST REVISED: February 2010 Copyright 2010 by Sybase, Inc. All rights reserved. This publication pertains to Sybase software
More informationSemantics, Metadata and Identifying Master Data
Semantics, Metadata and Identifying Master Data A DataFlux White Paper Prepared by: David Loshin, President, Knowledge Integrity, Inc. Once you have determined that your organization can achieve the benefits
More informationInformation Management (IM)
1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;
More informationORDBMS - Introduction
ORDBMS - Introduction 1 Theme The need for extensions in Relational Data Model Classification of database systems Introduce extensions to the basic relational model Applications that would benefit from
More informationCS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #2: The Relational Model and Relational Algebra
CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #2: The Relational Model and Relational Algebra Course Outline Weeks 1 4: Query/ Manipulation Languages and Data Modeling
More informationHERA: Automatically Generating Hypermedia Front- Ends for Ad Hoc Data from Heterogeneous and Legacy Information Systems
HERA: Automatically Generating Hypermedia Front- Ends for Ad Hoc Data from Heterogeneous and Legacy Information Systems Geert-Jan Houben 1,2 1 Eindhoven University of Technology, Dept. of Mathematics and
More information