Model Management and Schema Mappings: Theory and Practice (Part II)

1 Model Management and Schema Mappings: Theory and Practice (Part II)
  Howard Ho, IBM Almaden Research Center
  For VLDB07 Tutorial with Phil Bernstein, Microsoft Research
  IBM Research, 2006 IBM Corporation

2 The Integration Challenge
  - Complex and heterogeneous environments
  - Many different types of systems
  - Many inter-related applications
  - Escalating needs
  - Variety, velocity, volume
  - People are expensive

3 Outline
  - Clio: Basic Features of a Schema Mapping System
    - Schema Matching
    - Schema Mapping
    - Query Generation
    - IBM Rational Data Architect Product
  - MAUI: Advanced Features of a Schema Mapping System
    - Nested Mapping Model
    - Mapping-Based XML Transformation Engine
    - Schema Integration
    - Schema Evolution
  - Clio2010: Mapping-Based Authoring of Data Flows
    - ETL: Mapping ETL Conversion
    - Web-Service Composition: Mapping for web-service data sources
    - Mashups: Mapping Mashup Generation
    - Reuse: Mapping Polymorphism
  - Conclusions and Future Directions

4 Clio: A Schema Mapping System
  - Supported schemas: XML Schema, DTD, relational
  - The user wants data from S, understands T, and may not understand S
  - Mapping generation: the user mapping between source schema S and target schema T is turned into a logical mapping (internal)
  - Query generation: the logical mapping is compiled into a low-level mapping (SQL, SQL/XML, XQuery, XSLT and Java code) that transforms data conforming to S into data conforming to T
  - Logical mappings can be used for either target materialization or query rewriting

5 Major Features (and Challenges) Schemas can be arbitrarily different Element correspondences Human friendly Automatic discovery Support Nested Structures Nested Relational Model Nested Constraints Produce Correct Grouping Preserve data meaning Discover associations Use constraints & schema Create New Target Values and 4

6 Generate Transformation Queries (XQuery)
  <?xml version="1.0" encoding="utf-8"?>
  <statisticsdb>
    <citystatistics>
      <city/>,
      distinct (
        FOR $x0 IN $doc/expensedb/grant, $x1 IN $doc/expensedb/company
        WHERE $x1/cid/text() = $x0/cid/text()
        RETURN
          <organization>
            <orgid> $x0/cid/text() </orgid>,
            <oname> $x1/cname/text() </oname>,
            distinct (
              FOR $x0l1 IN $doc/expensedb/grant, $x1l1 IN $doc/expensedb/company
              WHERE $x1l1/cid/text() = $x0l1/cid/text()
                AND $x1/cname/text() = $x1l1/cname/text()
                AND $x0/cid/text() = $x0l1/cid/text()
              RETURN
                <funding>
                  <fid> "Sk35(", $x0l1/amount/text(), ", ", $x1l1/cname/text(), ", ", $x0l1/cid/text(), ")" </fid>,
                  <proj> "Sk36(", $x0l1/amount/text(), ", ", $x1l1/cname/text(), ", ", $x0l1/cid/text(), ")" </proj>,
                  <aid> "Sk32(", $x0l1/amount/text(), ", ", $x1l1/cname/text(), ", ", $x0l1/cid/text(), ")" </aid>
                </funding> )
          </organization> ),
      distinct (
        FOR $x0 IN $doc/expensedb/grant, $x1 IN $doc/expensedb/company
        WHERE $x1/cid/text() = $x0/cid/text()
        RETURN
          <financial>
            <aid> "Sk32(", $x0/amount/text(), ", ", $x1/cname/text(), ", ", $x0/cid/text(), ")" </aid>,
            <amount> $x0/amount/text() </amount>
          </financial> )
    </citystatistics>
  </statisticsdb>
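
The generated query above builds identifiers such as "Sk32(amount, cname, cid)" by concatenating a function name with source values. A minimal Python sketch of that idea follows; the helper name `skolem` and the sample grant and company records are assumptions made for illustration, not part of Clio. The point it shows is that identical arguments always yield the identical term, which is how the `aid` under `funding` and the `aid` under `financial` end up referring to the same value.

```python
def skolem(name, *args):
    """Build a deterministic Skolem term, e.g. 'Sk32(100, Acme, c1)'.

    Hypothetical helper: the term is just the function name applied to
    the string forms of its arguments, mirroring the concatenation in
    the generated query.
    """
    return f"{name}({', '.join(str(a) for a in args)})"


# Made-up source tuples standing in for one grant and its company.
grant = {"cid": "c1", "amount": 100}
company = {"cid": "c1", "cname": "Acme"}

# The same Skolem function over the same arguments is used in two places.
funding_aid = skolem("Sk32", grant["amount"], company["cname"], grant["cid"])
financial_aid = skolem("Sk32", grant["amount"], company["cname"], grant["cid"])

# A different function name gives a different term for the same arguments.
funding_fid = skolem("Sk35", grant["amount"], company["cname"], grant["cid"])

print(funding_aid == financial_aid, funding_aid != funding_fid)
```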

7 Generate Transformation Scripts (XSLT)
  <?xml version="1.0" encoding="utf-8"?>
  <xsl:stylesheet version="1.0" xmlns:xsl="
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
    <xsl:template match="/">
      <result>
        <xsl:call-template name="q0"/>
      </result>
    </xsl:template>
    <xsl:template name="q0">
      <xsl:element name="statisticsdb">
        <xsl:attribute name="isroot">true</xsl:attribute>
        <xsl:element name="clioset">
          <xsl:attribute name="id">sk_statisticsdb()</xsl:attribute>
        </xsl:element>
      </xsl:element>
      <xsl:for-each select="/expensedb/grant">
        <xsl:variable name="x0" select="."/>
        <xsl:for-each select="/expensedb/company">
          <xsl:variable name="x1" select="."/>
          <xsl:if test="$x1/cid=$x0/cid">
            <xsl:element name="citystatistics">
              <xsl:attribute name="inset">sk_statisticsdb()</xsl:attribute>
              <xsl:element name="city"/>
              <xsl:element name="clioset">
                <xsl:attribute name="id">sk_statisticsdb_0_1(sk_statisticsdb())</xsl:attribute>
              </xsl:element>
              <xsl:element name="clioset">
                <xsl:attribute name="id">sk_statisticsdb_0_2(sk_statisticsdb())</xsl:attribute>
              </xsl:element>
            </xsl:element>
            <xsl:element name="organization">
              <xsl:attribute name="inset">sk_statisticsdb_0_1(sk_statisticsdb())</xsl:attribute>
              <xsl:element name="orgid"><xsl:value-of select="$x0/cid"/></xsl:element>
              <xsl:element name="oname"><xsl:value-of select="$x1/cname"/></xsl:element>
              <xsl:element name="clioset">
                <xsl:attribute name="id">sk_statisticsdb_0_1_0_2(<xsl:value-of select="$x0/cid"/>, <xsl:value-of select="$x1/cname"/>, Sk_statisticsDB_0_1(Sk_statisticsDB())) </xsl:attribute>
              </xsl:element>
            </xsl:element>
            <xsl:element name="funding">
              <xsl:attribute name="inset">sk_statisticsdb_0_1_0_2(<xsl:value-of select="$x0/cid"/>, <xsl:value-of select="$x1/cname"/>, Sk_statisticsDB_0_1(Sk_statisticsDB())) </xsl:attribute>
              <xsl:element name="fid"> Sk35(<xsl:value-of select="$x0/amount"/>, <xsl:value-of select="$x1/cname"/>, <xsl:value-of select="$x0/cid"/>) </xsl:element>

8 (Flat) Mapping Generation
  Inputs: source schema S, target schema T and the schema correspondences between them. Output: mappings.
  - Step 1. Extraction of concepts (in each schema), giving source concepts and target concepts (relational views). Concept = one category of data that can exist in the schema
  - Step 2. Mapping generation: enumerate all non-redundant maps between pairs of concepts
  References: [Popa, Velegrakis, Miller, Hernandez, Fagin. VLDB 02] [Fagin, Kolaitis, Miller, Popa. ICDT 03] [Haas, Hernandez, Ho, Popa, Roth. SIGMOD 05]

9 (Flat) Mapping Example
  Source schema:
    proj: Set [ dname, pname, emps: Set [ ename, salary ] ]
  Target schema:
    dept: Set [ dname, budget, emps: Set [ ename, salary, workson: Set [ pid ] ], projects: Set [ pid, pname ] ]
  Two basic mappings (or source-to-target tgds or GLAV formulas):
  - m_1 maps proj to dept-projects (the concept of project of a department):
    $m_1: \forall (p_0 \in proj) \rightarrow \exists (d \in dept)\,(p \in d.projects)\; p_0.dname = d.dname \land p_0.pname = p.pname$
  - m_2 maps proj-emps to dept-emps-workson-projects (the concept of project of an employee of a department):
    $m_2: \forall (p_0 \in proj)\,(e_0 \in p_0.emps) \rightarrow \exists (d \in dept)\,(p \in d.projects)\,(e \in d.emps)\,(w \in e.workson)\; w.pid = p.pid \land p_0.dname = d.dname \land p_0.pname = p.pname \land e_0.ename = e.ename \land e_0.salary = e.salary$
    The generators on the right-hand side, together with w.pid = p.pid, form the expression for dept-emps-workson-projects.
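
To make the two flat mappings concrete, here is a small Python sketch that applies them to a made-up source instance. It assumes the reading of m_1 and m_2 as source-to-target rules over the proj and dept schemas of this slide; the function names `apply_m1` and `apply_m2`, the sample data and the flat "fact" dictionaries are illustrative assumptions, not Clio's internal representation.

```python
# Made-up source instance for: proj: Set [dname, pname, emps: Set [ename, salary]]
source = [
    {"dname": "CS", "pname": "Clio",
     "emps": [{"ename": "Ann", "salary": 10}, {"ename": "Bob", "salary": 20}]},
]


def apply_m1(proj):
    """m_1: every proj tuple asserts a (department, project) fact in the target."""
    return [{"dname": p0["dname"], "pname": p0["pname"]} for p0 in proj]


def apply_m2(proj):
    """m_2: every (proj, employee) pair asserts a fact that repeats the
    department and project values alongside the employee values."""
    facts = []
    for p0 in proj:
        for e0 in p0["emps"]:
            facts.append({
                "dname": p0["dname"], "pname": p0["pname"],
                "ename": e0["ename"], "salary": e0["salary"],
            })
    return facts


m1_facts = apply_m1(source)
m2_facts = apply_m2(source)

# The same project is asserted once by m_1 and once more per employee by m_2.
mentions = sum(1 for f in m1_facts + m2_facts if f["pname"] == "Clio")
print(len(m1_facts), len(m2_facts), mentions)
```

Because the two mappings run independently, the project data is produced several times and has to be reconciled afterwards.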

10 IBM Rational Data Architect Product 9

11 IBM Rational Data Architect Product
  - Schema matching, schema mapping and query generation technologies from Clio
  - Value correspondences in the GUI
    - Blue lines: confirmed by the users
    - Gray lines: suggested by the schema matching algorithms
  - Schema matching
    - Five different algorithms: two name-based (including thesaurus lookup) and three instance-based
    - Users can choose:
      - Any weighted combination of the 5 schema matching algorithms
      - Source or target
      - One element (element/attribute or column) or a group of elements (subtree or table)
      - The value k (for the top-k matches)
    - The system returns the top-k matches for each element
  - Current release
    - Source is relational (other IBM products support XML sources)
    - Target can be relational (generates SQL) or XML (generates SQL/XML)
    - Mapping is standardized within IBM, as an EMF in-memory object and as a serialized XML document
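
The weighted combination of matchers and the top-k selection described above can be sketched in a few lines of Python. This is not the product's implementation: the two matchers (a string-similarity score from the standard library's `difflib` and a toy thesaurus lookup), the element names, the thesaurus contents and the weights are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Name-based matcher: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def make_thesaurus_matcher(thesaurus):
    """Name-based matcher with thesaurus lookup: 1.0 for equal names or listed synonyms."""
    def matcher(a, b):
        a, b = a.lower(), b.lower()
        return 1.0 if a == b or b in thesaurus.get(a, set()) else 0.0
    return matcher


def top_k_matches(source_elems, target_elems, matchers, weights, k):
    """Score every (source, target) pair by the weighted average of the
    matchers and keep the k best targets for each source element."""
    total = sum(weights)
    result = {}
    for s in source_elems:
        scored = [
            (sum(w * m(s, t) for m, w in zip(matchers, weights)) / total, t)
            for t in target_elems
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
        result[s] = [t for _, t in scored[:k]]
    return result


# Hypothetical thesaurus and schema elements.
thesaurus = {"salary": {"pay"}}
matchers = [name_similarity, make_thesaurus_matcher(thesaurus)]
top = top_k_matches(["salary"], ["company_name", "pay", "sal"],
                    matchers, weights=[0.5, 0.5], k=2)
print(top)
```

With equal weights the thesaurus hit lifts "pay" above the lexically closer "sal"; changing the weights changes the ranking, which is the knob the slide gives to the user.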

12 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Nested Mapping Model [Fuxman, Hernandez, Ho, Miller, Papotti, Popa. VLDB 06] Mapping-Based XML Transformation Engine Schema Integration Schema Evolution Clio2010: Mapping-Based Authoring of Data Flows Conclusions and Future Directions 11

13 New Nested-Mapping Engine for Clio
  - Existing Clio engine is based on a flat mapping model
    - Pros: easier to implement
    - Cons:
      - Fragmentation into many overlapping mappings
      - Inefficiency in execution
      - Redundancy in the output data
      - No user-defined grouping semantics
  - New Clio engine is based on a nested-mapping model
    - Cons: more challenging to design and implement
    - Pros: overcomes the above problems in the flat mapping model

15 Correlating Mapping Formulas
  The two flat mappings:
    $m_1: \forall (p_0 \in proj) \rightarrow \exists (d \in dept)\,(p \in d.projects)\; p_0.dname = d.dname \land p_0.pname = p.pname$
    $m_2: \forall (p_0 \in proj)\,(e_0 \in p_0.emps) \rightarrow \exists (d \in dept)\,(p \in d.projects)\,(e \in d.emps)\,(w \in e.workson)\; w.pid = p.pid \land p_0.dname = d.dname \land p_0.pname = p.pname \land e_0.ename = e.ename \land e_0.salary = e.salary$
  Replace them with the nested mapping n:
    $n: \forall (p_0 \in proj) \rightarrow \exists (d \in dept)\,(p \in d.projects)\; p_0.dname = d.dname \land p_0.pname = p.pname \land [\, \forall (e_0 \in p_0.emps) \rightarrow \exists (e \in d.emps)\,(w \in e.workson)\; w.pid = p.pid \land e_0.ename = e.ename \land e_0.salary = e.salary \,]$
  - proj tuples are mapped only once
  - The bracketed part is a submapping, correlated to the parent mapping
  - For every proj tuple, we map all employees, as a group (source grouping is preserved)
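
A minimal Python sketch of how the nested mapping n could be evaluated is shown below, assuming the proj and dept schemas of the example. The function name `apply_nested`, the sample data and the Skolem-style string used for the unspecified pid are illustrative assumptions; budget is left out because the mapping does not constrain it.

```python
# Made-up source instance for: proj: Set [dname, pname, emps: Set [ename, salary]]
source = [
    {"dname": "CS", "pname": "Clio",
     "emps": [{"ename": "Ann", "salary": 10}, {"ename": "Bob", "salary": 20}]},
]


def apply_nested(proj):
    """Evaluate n: the outer loop maps each proj tuple once; the inner loop
    is the correlated submapping that maps its employees as a group."""
    depts = []
    for p0 in proj:
        # Hypothetical Skolem-style value for pid, which the source does not provide.
        pid = f"Sk_pid({p0['dname']}, {p0['pname']})"
        dept = {
            "dname": p0["dname"],
            "projects": [{"pid": pid, "pname": p0["pname"]}],
            "emps": [],
        }
        for e0 in p0["emps"]:  # submapping, reusing d and p from the parent
            dept["emps"].append({
                "ename": e0["ename"],
                "salary": e0["salary"],
                "workson": [{"pid": pid}],  # w.pid = p.pid
            })
        depts.append(dept)
    return depts


target = apply_nested(source)
print(len(target), len(target[0]["emps"]))
```

The project is emitted once and every employee refers to it through the shared pid, so there is nothing to de-duplicate afterwards.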

16 Performance
  [Chart 1: flat query execution time / nested query execution time, plotted against nesting level, one series per data size (2111 KB, 514 KB, 312 KB)]
  - Execution time for flat: 22 min
  - Execution time for nested: 1.1 s
  [Chart 2: flat query output size / nested query output size, plotted against nesting level, one series per data size (2111 KB, 514 KB, 312 KB)]
  - Size of generated data (flat), including duplicates: 45 MB
  - Size of generated data (nested): 552 KB
  The nested mapping generates much more efficient execution and less redundant data
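
The two annotated data points can be turned into ratios with a little arithmetic. The sketch below assumes 1 MB = 1000 KB, which the slide does not state; the variable names are illustrative.

```python
# Figures quoted on the slide.
flat_time_s = 22 * 60        # 22 minutes
nested_time_s = 1.1          # 1.1 seconds
flat_size_kb = 45 * 1000     # 45 MB, assuming 1 MB = 1000 KB
nested_size_kb = 552

speedup = flat_time_s / nested_time_s
size_ratio = flat_size_kb / nested_size_kb

print(f"nested is about {speedup:.0f}x faster and its output about {size_ratio:.1f}x smaller")
```

That is a speedup of roughly 1200x and an output roughly 80x smaller for this data point.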

17 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Nested Mapping Model Mapping-Based XML Transformation Engine [Jiang, Ho, Popa, Han. WWW 07] Schema Integration Schema Evolution Clio2010: Mapping-Based Authoring of Data Flows Conclusions and Future Directions 16

18 Performance in Executing Mappings
  [Diagram: data conforming to source schema S, the logical mapping (internal), query generation, the low-level mapping (SQL/XQuery/XSLT), and data conforming to target schema T]
  - Mappings are translated into general-purpose query languages, e.g., SQL, XQuery, XSLT
  - Query optimization issues are left to the runtime engine of each of these languages to decide, i.e., Q.O. decisions are not encoded in the queries
  - Idea: execute mappings directly in our own runtime

19 Mapping Execution Engine in Java
  [Diagram: the mapping execution engine takes the logical mapping (internal) and data conforming to source schema S, and produces data conforming to target schema T]
  - Motivation
    - Grouping (on multiple levels) of transformed values
    - XQuery & XSLT have no specific constructs (hence algorithms) for such grouping tasks
  - Scalability and efficiency
    - Controllable memory resource usage
    - Speed of transformation almost linear in input size
  - Mapping-based
    - Model high-level mapping semantics using the IBM Mapping Specification Language (MSL) standard

20 Three-Phase Algorithm for Executing Mappings
  - Phase 1 (Extract): tuple extraction
    - XML source: streaming holistic twig join, for streaming and stored/indexed XML
    - Relational source: SQL queries through ODBC
  - Phase 2 (Transform): generate an XML tree from each tuple
  - Phase 3 (Merge): data merging
    - A dynamic, scalable merge algorithm
    - Hash-based vs. sort-based algorithms
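
The three phases can be illustrated with a toy Python pipeline. This is only a sketch under simplifying assumptions: extraction is a plain iteration over in-memory rows rather than a twig join or SQL query, each "XML tree" is a nested dictionary, and the merge is a simple hash-based grouping on the department name. All function names and the sample rows are hypothetical.

```python
def extract(rows):
    """Phase 1 (Extract): yield flat tuples from the source.
    Stand-in for a streaming twig join or an SQL query."""
    for dname, ename in rows:
        yield (dname, ename)


def transform(tup):
    """Phase 2 (Transform): build one small target tree per tuple."""
    dname, ename = tup
    return {"dname": dname, "emps": [{"ename": ename}]}


def merge(fragments):
    """Phase 3 (Merge): hash-based merge of the per-tuple trees.
    Trees with the same dname are fused and duplicate children dropped."""
    merged = {}  # hash table keyed on the grouping value
    for frag in fragments:
        node = merged.setdefault(frag["dname"], {"dname": frag["dname"], "emps": []})
        for emp in frag["emps"]:
            if emp not in node["emps"]:
                node["emps"].append(emp)
    return list(merged.values())


# Made-up source rows, including one duplicate.
rows = [("CS", "Ann"), ("CS", "Bob"), ("EE", "Cy"), ("CS", "Ann")]
result = merge(transform(t) for t in extract(rows))
print(result)
```

A sort-based variant would sort the fragments on the grouping key and fuse adjacent ones instead of keeping a hash table, trading memory for an extra sort.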

21 XML-to-XML Comparative Results
  [Chart: running time (seconds) against data size (MB) for Galax, Quip, MonetXQ, Saxon and our engine]

22 XML-to-XML Scalability Results
  [Chart: running time (seconds) of our engine against DBLP data size (MB)]

23 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Nested Mapping Model Mapping-Based XML Transformation Engine Schema Integration [Chiticariu, Hernandez, Popa, Kolaitis. VLDB 07 demo] Schema Evolution Clio2010: Mapping-Based Authoring of Data Flows Conclusions and Future Directions 22

24 Schema Integration
  - Problem: given multiple overlapping schemas in the same domain, consolidate them into one
    [Diagram: input (source 1 schema, source 2 schema, ..., source n schema), schema mappings, and output]
  - Applications
    - Provide a standard representation of the data (for unified querying, data warehousing, reference point, etc.)
    - Metadata chaos reduction
  - Schema integration is a hard problem
    - Requires a lot of human interaction/feedback
    - A series of recipes that arrive at a single integrated schema
  - Related work: [Pottinger, Bernstein. VLDB 03] [Chiticariu, Hernandez, Popa, Kolaitis. VLDB 07 demo]

25 Schema Integration [Chiticariu et al.]
  [Diagram: (1) user schemas S_1 and S_2 with schema correspondences and their extracted concepts; (2) integrated schemas I_1 ... I_n; (3) final schema I. Concepts shown: department, employee of a department, project of a department, project of an employee of a department, over the attributes dname, budget, ename, sal, address, pid, pname]
  - Step 1. Extraction of concepts from a hierarchy
  - Step 2. Consider all possible ways of performing a hierarchical merge of the related concepts
    - Generate initial set of candidate good integrated schemas
    - Duplication-free enumeration algorithm
      - Avoids duplicates by using constraints (Horn clauses)
      - Polynomial-delay algorithm for enumerating satisfying assignments
  - Step 3. Browse/Search/Refine set of candidate schemas
    - Combination of partial enumeration with user constraints
    - Based on the schemas seen so far, users express constraints on the future schemas to be generated

27 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Nested Mapping Model Mapping-Based XML Transformation Engine Schema Integration Schema Evolution Clio2010: Mapping-Based Authoring of Data Flows Conclusions and Future Directions 26

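28 Schema Evolution: T evolves
  [Diagram: source schema S, target schema T and evolved target schema T'. The mapping $M_{ST}$ is given; $M_{TT'}$ is derived from a diff script, given or discovered; the adapted mapping $M_{ST'}$ is obtained by mapping composition]
  Mapping composition: $M_{ST'} = M_{ST} \circ M_{TT'}$
  - When the target schema evolves, the system can generate an initial default mapping for $T \rightarrow T'$ automatically
  - The user adds new mapping lines for new schema elements in $T'$ to complete the mapping $M_{TT'}$ (for $T \rightarrow T'$)
  - The new mapping $M_{ST'}$ ($S \rightarrow T'$) is a composition of $M_{ST}$ ($S \rightarrow T$) and $M_{TT'}$ ($T \rightarrow T'$)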

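29 Schema Evolution: S evolves
  [Diagram: evolved source schema S', source schema S and target schema T. The mapping $M_{ST}$ is given; $M_{S'S}$ is derived from a diff script, given or discovered; the adapted mapping $M_{S'T}$ is obtained by mapping composition]
  Mapping composition: $M_{S'T} = M_{S'S} \circ M_{ST}$
  - When the source schema evolves, the system can generate an initial default mapping for $S' \rightarrow S$ automatically
  - The user adds new mapping lines for new schema elements in $S'$ to complete the mapping $M_{S'S}$ (for $S' \rightarrow S$)
  - The new mapping $M_{S'T}$ ($S' \rightarrow T$) is a composition of $M_{S'S}$ ($S' \rightarrow S$) and $M_{ST}$ ($S \rightarrow T$)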
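
For intuition only, composition can be sketched in Python for the simplest possible case, where a mapping is nothing more than a set of attribute renamings. Real schema mappings are far richer than this, and composing them is considerably more involved (see the composition papers cited in this tutorial). The dictionary encoding, the function name `compose` and all attribute names are assumptions made for the illustration.

```python
def compose(first, second):
    """Compose two attribute-level mappings.

    `first` maps schema A to schema B and is encoded as {b_attr: a_attr};
    `second` maps schema B to schema C and is encoded as {c_attr: b_attr}.
    The result maps A to C as {c_attr: a_attr}. Attributes of C whose
    B-side input is not covered by `first` are left unmapped.
    """
    return {c: first[b] for c, b in second.items() if b in first}


# Target evolution: M_ST (S -> T) followed by M_TT' (T -> T').
m_st = {"name": "cname", "id": "cid"}              # T attribute <- S attribute
m_tt_prime = {"org_name": "name", "org_id": "id"}  # T' attribute <- T attribute
m_st_prime = compose(m_st, m_tt_prime)             # S -> T'

# Source evolution: M_S'S (S' -> S) followed by M_ST (S -> T).
m_s_prime_s = {"cname": "company_name", "cid": "company_id"}  # S attribute <- S' attribute
m_s_prime_t = compose(m_s_prime_s, m_st)                      # S' -> T

print(m_st_prime, m_s_prime_t)
```

In both cases the adapted mapping is obtained without the user redefining the original correspondences; only the lines for genuinely new schema elements have to be added by hand.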

30 Related Theory and Algorithms
  - Mapping Composition: [Madhavan, Halevy. VLDB 03] [Fagin, Kolaitis, Popa, Tan. PODS 04] [Nash, Bernstein, Melnik. PODS 05] [Bernstein, Green, Melnik, Nash. VLDB 06]
  - Mapping Inversion: [Fagin. PODS 06] [Fagin, Kolaitis, Popa, Tan. PODS 07]
  - Schema Evolution: [Rahm, Bernstein. SIGMOD Rec. Dec 06]
  - Mapping Adaptation under Evolving Schemas: [Velegrakis, Miller, Popa. VLDB 03] [Yu, Popa. VLDB 05]
  - Query rewrite: [Yu, Popa. SIGMOD 04]

31 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Clio2010: Mapping-Based Authoring of Data Flows ETL: Mapping ETL Conversion Web-Service Composition: Mapping for web-service data sources Mashups: Mapping Mashup Generation Reuse: Mapping Polymorphism Conclusions and Future Directions 30

32 Mapping-Based Authoring of Data Flows
  Motivation
  - Schema mappings are the building blocks for larger data transformation and integration applications
  - The graph of mappings is a declarative specification of the flow of data
  - Similar to ETL & mashups
  - Need to take functional (web-service) data sources
  - Need to extend Clio so that multiple mappings can be defined, loaded (sharing a context), reused and managed

33 Clio2010
  - Automatically assemble a graph of initial, uncorrelated mappings into larger, richer mappings
  - Support more complicated mapping composition and mapping merge
  - Support functional data sources (e.g., web services)
  - Compile the larger mappings into a global execution plan in:
    - Queries (XQuery)
    - Transformation scripts (XSLT)
    - ETL flows (e.g., IBM WebSphere DataStage)
    - Mashups (e.g., IBM DAMIA)
  - Support mapping reuse through a mapping polymorphism framework

34 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Clio2010: Mapping-Based Authoring of Data Flows ETL: Mapping ETL Conversion [Hernandez et al. 07] Web-Service Composition: Mapping for web-service data sources Mashups: Mapping Mashup Generation Reuse: Mapping Polymorphism Conclusions and Future Directions 33

35 Orchid: Mapping ETL Conversion
  - Round-tripping between ETL (Extract, Transform, Load) scripts and declarative mappings
    - Generate ETL script from mappings
    - Extract mapping information from ETL scripts
    - Track and propagate mapping-related modifications
  - Optimization of ETL scripts
    - Remove redundant operations
    - Push computation to other runtimes
    - Rewrite ETL scripts
  [Diagram: Orchid between the EII mapping in RDA and the ETL flow]

36 Mapping-to-ETL Generation
  The developer inspects the connections between the sources, rules, and data warehouse model and generates a DataStage job to move data

37 ETL-to-Mapping Abstraction
  The developer refines, tests and deploys the DataStage job to production.
  An ETL-to-mapping step is needed:
  - To reflect changes made in the ETL script back in the mapping
  - To extract mappings out of ETL scripts

38 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Clio2010: Mapping-Based Authoring of Data Flows ETL: Mapping ETL Conversion Web-Service Composition: Mappings for web-service data sources [Alexe et al. 07] Mashups: Mapping Mashup Generation Reuse: Mapping Polymorphism Conclusions and Future Directions 37

39 Flows of Mappings
  Source 1: getibmhotels : [ state, city ] -> Set [ hotel, geo, country, airport ]
  Source 2: gethotelreviews : [ hotel, state, city ] -> Set [ image, rooms, rating, reviews: Set [ review ] ]
  Target: root: Set [ state, city, hotel, ratings: Set [ rating, reviews: Set [ review ] ] ]
  [Diagram: mappings M 1, M 2 and M 3 connect the two sources and the target]
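
One way to read this flow is as a chain of function calls: the result of the first service supplies arguments for the second, and both results feed the target. The Python sketch below illustrates that reading with stub services returning made-up data. The stub bodies, the sample values and the function `run_flow` are assumptions; the slide does not specify exactly what each of M 1, M 2 and M 3 does.

```python
def get_ibm_hotels(state, city):
    """Stub for Source 1: [state, city] -> Set [hotel, geo, country, airport]."""
    return [{"hotel": "Example Inn", "geo": "37.3,-121.9",
             "country": "US", "airport": "SJC"}]


def get_hotel_reviews(hotel, state, city):
    """Stub for Source 2: [hotel, state, city] -> Set [image, rooms, rating, reviews]."""
    return [{"image": "inn.png", "rooms": 120, "rating": 4,
             "reviews": ["Quiet rooms", "Close to the airport"]}]


def run_flow(state, city):
    """Chain the two services and reshape the results into the target:
    root: Set [state, city, hotel, ratings: Set [rating, reviews: Set [review]]]."""
    root = []
    for h in get_ibm_hotels(state, city):
        # The hotel name returned by Source 1 becomes an argument of Source 2.
        ratings = [
            {"rating": r["rating"], "reviews": list(r["reviews"])}
            for r in get_hotel_reviews(h["hotel"], state, city)
        ]
        root.append({"state": state, "city": city,
                     "hotel": h["hotel"], "ratings": ratings})
    return root


result = run_flow("CA", "San Jose")
print(result)
```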

40 Functional Data Sources
  - Web service definition: argument type and result type
  - Mapping between functional sources

41 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Clio2010: Mapping-Based Authoring of Data Flows ETL: Mapping ETL Conversion Web-Service Composition: Mapping for web-service data sources Mashups: Mapping Mashup Generation [Camacho et al. 07] Reuse: Mapping Polymorphism Conclusions and Future Directions 40

42 IBM DAMIA: A Mashup Tool
  - Produces easily customizable flows
  - Mashup fabric for data aggregation and transformation
  - Web-based tool with user-friendly interface
  - No scripting/programming skills required
  - XQuery-like model

43 Generate DAMIA Flows from Clio
  [Diagram: Clio high-level mappings go through a compiler that emits SQL, XSLT, XQuery or a DAMIA flow]

44 From XQuery to ETL
  The XQuery has a specific flow pattern, with several phases:
  - Source extraction queries (projection, navigation, join). Result: sets of flat tuples
  - Key generation
  - Union
  - Duplicate elimination
  - Computation of target groups
  - Generate target output (hierarchy): back to hierarchical data
  We can visualize all this as a graph of operators: we get the equivalent of an ETL job for XML
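
The phases listed above can be mimicked by a short chain of Python operators. This is a sketch under assumptions, not the generated plan: the student and course data, the key format and every function name are invented for the example.

```python
def extract(doc):
    """Source extraction: flatten one document into (sid, name, course) tuples."""
    return [(s["sid"], s["name"], s["course"]) for s in doc]


def add_key(rows):
    """Key generation: prefix each tuple with a Skolem-style key for its target group."""
    return [(f"Sk({sid}, {name})", sid, name, course) for sid, name, course in rows]


def run(doc1, doc2):
    rows = add_key(extract(doc1)) + add_key(extract(doc2))  # union
    rows = list(dict.fromkeys(rows))                        # duplicate elimination, order kept
    groups = {}                                             # computation of target groups
    for key, sid, name, course in rows:
        group = groups.setdefault(key, {"sid": sid, "name": name, "courses": []})
        group["courses"].append(course)
    return list(groups.values())                            # hierarchical target output


# Two made-up source documents that overlap on one tuple.
student1 = [{"sid": 1, "name": "Ann", "course": "DB"}]
student2 = [{"sid": 1, "name": "Ann", "course": "DB"},
            {"sid": 1, "name": "Ann", "course": "AI"}]
output = run(student1, student2)
print(output)
```

Each helper corresponds to one operator in the graph, which is what makes the query look like an ETL job.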

45 [Diagram: operator graph for an XQuery over the documents Student1, Student2 and courseeval. The sources are shredded from XML to relational tuples (/Student1/*, /Student2/*, /courseeval/*); projections, joins, Skolem key generation (Sk1, Sk2, F_course(sid, name)) and de-duplication operate on the flat tuples; a nest-join and a group-by on Pid_0 then concatenate Student, Course and Eval and convert the result from relational back to XML]

46 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Clio2010: Mapping-Based Authoring of Data Flows ETL: Mapping ETL Conversion Web-Service Composition: Mapping for web-service data sources Mashups: Mapping Mashup Generation Reuse: Mapping Polymorphism [Wisnesky et al. 07] Conclusions and Future Directions 45

47 Mapping Reuse
  - Motivation: existing mapping formalisms implicitly contain information that we need to make explicit
    - Example: a Clio nested mapping expression implicitly defines a class of schemas for which it has an interpretation
    - But in the Clio mapping representation (XSML), every mapping instance requires concrete source and target schemas
    - The flexibility to reuse mappings is lost
  - Approach: we need a formal language for talking about mappings
    - Requires theory and tools to manipulate mappings as blocks
  - Solution: Mapping Polymorphism

48 Clio Language Embedding into MF
  [Diagram: a bijection between the Clio nested mapping language and terms of mapping type in MF; terms of polymorphic mapping type become terms of mapping type by instantiation with concrete types]
  - The bijection respects mapping semantics
  - MF also includes useful terms that manipulate mappings

49 Outline Clio: Basic Features of a Schema Mapping System MAUI: Advanced Features of a Schema Mapping System Clio2010: Mapping-Based Authoring of Data Flows Conclusion and Related Work 48

50 Clio Innovations Over Time
  [Timeline chart, starting in 1999, relating Clio research components to conference publications and product components]
  - Product components: SQL Assist, Chocolate, Cinnamon (CM 8.3), Criollo (RDA)
  - Conferences shown: SIGMOD 2000 demo, VLDB 2001 & ICDE, VLDB 2002, SIGMOD 2004, PODS, VLDB 2005, SIGMOD, VLDB 2006
  - Clio research components: join-path generation, attribute matching, XML support, XSLT generation, deep union, mapping composition, schema integration, SQL generation, XQuery generation, data viewer, hybrid algorithm, SQL/XML generation, mapping language, XQuery rewrite, schema evolution, Java runtime engine, Java code generation, nested mapping engine, flow mapping, web services

51 Conclusion
  - Mappings define relationships between schemas of data sources
  - Mappings and transformation queries are the new metadata
  - Customers want I (for integration), not EAI, EII, ETL, etc.
  - Consumability for information integration

52 Related Work: Very Small Subset
  - Information integration surveys
    - [Haas ICDT 07] Beauty and the Beast: The Theory and Practice of Information Integration
    - [Kolaitis PODS 05] Schema mappings, data exchange, and metadata management
    - [Halevy, Rajaraman, Ordille VLDB 06] Data Integration: The Teenage Years, as part of their 10-year Best Paper Award on the Information Manifold paper
  - Schema Matching Survey by Erhard Rahm and Philip Bernstein in VLDB J. 01
  - Much work by Philip Bernstein, Anhai Doan, Alon Halevy, Jayant Madhavan
  - Information Discovery project at IBM Almaden (Berthold Reinwald et al.)
  - Data cleansing, conflict resolution, data quality, duplicate detection, entity resolution, data provenance
  - Orchestra project at U Penn
  - Schema Mapping Debugger [Alexe, Chiticariu, Tan. VLDB 06]
  - A Benchmark for Schema Mapping System [Alexe, Tan, Velegrakis. 07]
  - A new schema mapping GUI based on XQBE ideas [Ceri, Hernandez, Raffio et al. 07]
  - Deep Web, peer-to-peer, ontology, etc.

53 Acknowledgements: Clio Research Team
  - The project founders: Laura Haas and Renee Miller (around 1999)
  - Regulars
    - Clio Group: Mauricio Hernandez, Howard Ho, Lucian Popa, Ioana Stanoi
    - Theory Group: Ron Fagin, Phokion Kolaitis, Alan Nash
  - Past visiting scientists: Stefan Dessloch, Paolo Papotti, Wang-Chiew Tan, Yannis Velegrakis, Cathy Wyss
  - Past interns and post-docs: Bogdan Alexe, Alfredo Camacho, Laura Chiticariu, Ariel Fuxman, Michael Gubanov, Haifeng Jiang, Felix Naumann, Paolo Papotti, Ahmed Radwan, Alessandro Raffio, Michael Richmond, Shivkumar Shivaji, Yannis Velegrakis, Melanie Weis, Ryan Wisnesky, Lingling Yan, Cong Yu, Jindan Zhou

Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration

Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration Howard Ho IBM Almaden Research Center Take Home Messages Clio (Schema Mapping for EII) has evolved to Clio 2010

More information

Bio/Ecosystem Informatics

Bio/Ecosystem Informatics Bio/Ecosystem Informatics Renée J. Miller University of Toronto DB research problem: managing data semantics R. J. Miller University of Toronto 1 Managing Data Semantics Semantics modeled by Schemas (structure

More information

Learning mappings and queries

Learning mappings and queries Learning mappings and queries Marie Jacob University Of Pennsylvania DEIS 2010 1 Schema mappings Denote relationships between schemas Relates source schema S and target schema T Defined in a query language

More information

DBAI-TR UMAP: A Universal Layer for Schema Mapping Languages

DBAI-TR UMAP: A Universal Layer for Schema Mapping Languages DBAI-TR-2012-76 UMAP: A Universal Layer for Schema Mapping Languages Florin Chertes and Ingo Feinerer Technische Universität Wien, Vienna, Austria Institut für Informationssysteme FlorinChertes@acm.org

More information

Nested Mappings: Schema Mapping Reloaded

Nested Mappings: Schema Mapping Reloaded Nested Mappings: Schema Mapping Reloaded Ariel Fuxman University of Toronto afuxman@cs.toronto.edu Renee J. Miller University of Toronto miller@cs.toronto.edu Mauricio A. Hernandez IBM Almaden Research

More information

Clio Grows Up: From Research Prototype to Industrial Tool

Clio Grows Up: From Research Prototype to Industrial Tool Clio Grows Up: From Research Prototype to Industrial Tool Laura M. Haas IBM Silicon Valley Labs laura@almaden.ibm.com Mauricio A. Hernández IBM Almaden Research Center mauricio@almaden.ibm.com Lucian Popa

More information

Orchid: Integrating Schema Mapping and ETL

Orchid: Integrating Schema Mapping and ETL Orchid: Integrating Schema Mapping and ETL (Exted Version) Stefan Dessloch, Mauricio A. Hernández, Ryan Wisnesky, Ahmed Radwan, Jindan Zhou Department of Computer Science, University of Kaiserslautern

More information

A Non intrusive Data driven Approach to Debugging Schema Mappings for Data Exchange

A Non intrusive Data driven Approach to Debugging Schema Mappings for Data Exchange 1. Problem and Motivation A Non intrusive Data driven Approach to Debugging Schema Mappings for Data Exchange Laura Chiticariu and Wang Chiew Tan UC Santa Cruz {laura,wctan}@cs.ucsc.edu Data exchange is

More information

Schema Management. Abstract

Schema Management. Abstract Schema Management Periklis Andritsos Λ Ronald Fagin y Ariel Fuxman Λ Laura M. Haas y Mauricio A. Hernández y Ching-Tien Ho y Anastasios Kementsietsidis Λ Renée J. Miller Λ Felix Naumann y Lucian Popa y

More information

The interaction of theory and practice in database research

The interaction of theory and practice in database research The interaction of theory and practice in database research Ron Fagin IBM Research Almaden 1 Purpose of This Talk Encourage collaboration between theoreticians and system builders via two case studies

More information

Scalable Data Exchange with Functional Dependencies

Scalable Data Exchange with Functional Dependencies Scalable Data Exchange with Functional Dependencies Bruno Marnette 1, 2 Giansalvatore Mecca 3 Paolo Papotti 4 1: Oxford University Computing Laboratory Oxford, UK 2: INRIA Saclay, Webdam Orsay, France

More information

Composing Schema Mapping

Composing Schema Mapping Composing Schema Mapping An Overview Phokion G. Kolaitis UC Santa Cruz & IBM Research Almaden Joint work with R. Fagin, L. Popa, and W.C. Tan 1 Data Interoperability Data may reside at several different

More information

Creating a Mediated Schema Based on Initial Correspondences

Creating a Mediated Schema Based on Initial Correspondences Creating a Mediated Schema Based on Initial Correspondences Rachel A. Pottinger University of Washington Seattle, WA, 98195 rap@cs.washington.edu Philip A. Bernstein Microsoft Research Redmond, WA 98052-6399

More information

A Classification of Schema Mappings and Analysis of Mapping Tools

A Classification of Schema Mappings and Analysis of Mapping Tools A Classification of Schema Mappings and Analysis of Mapping Tools Frank Legler IBM Deutschland Entwicklung GmbH flegler@de.ibm.com Felix Naumann Hasso-Plattner-Institut, Potsdam naumann@hpi.uni-potsdam.de

More information

Introduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University.

Introduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University. COCS 6421 Advanced Database Systems CSE, York University March 20, 2008 Agenda 1 Problem description Problems 2 3 Open questions and future work Conclusion Bibliography Problem description Problems Why

More information

Algebraic Model Management: A Survey

Algebraic Model Management: A Survey Algebraic Model Management: A Survey Patrick Schultz 1, David I. Spivak 1, and Ryan Wisnesky 2 1 Massachusetts Institute of Technology 2 Categorical Informatics, Inc. Abstract. We survey the field of model

More information

Foundations of Data Exchange and Metadata Management. Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016

Foundations of Data Exchange and Metadata Management. Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016 Foundations of Data Exchange and Metadata Management Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016 The need for a formal definition We had a paper with Ron in PODS 2004 Back then I was a Ph.D.

More information

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability Georg Gottlob 1,2, Reinhard Pichler 1, and Emanuel Sallinger 2 1 TU Wien and 2 University of Oxford Tuple-generating

More information


