A Collective, Probabilistic Approach to Schema Mapping


1 A Collective, Probabilistic Approach to Schema Mapping. Angelika Kimmig, Alex Memory, Renée Miller, Lise Getoor. ILP 2017 (published at ICDE 2017)

2-6 Context: Data Exchange & Data Integration (relation and column names that did not survive transcription are shown as [?])
source
  emp(id, [?], company): (1, Alice, SAP), (2, Bob, IBM), (3, Pat, MS), (4, Carl, X), (5, Tom, Y7)
  proj(topic, mgr, lead): (BigData, 1, 2), (ML, 1, 1), (egov, 4, 5), (DM, 5, 5)
target
  [?]: (BigData, Alice, 111), (ML, Alice, 111), (MT, Justin, 444), (Deep, Ann, 333)
  leader: Alice, Bob, Jim, Ann, Igor
  org(oid, [?]): (111, SAP), (222, MS), (333, Z), (444, HC)
Data exchange transfers data; data integration rewrites queries. Both need a schema mapping:
  emp(I,N,C) → leader(N)
  proj(T,M,L) → ∃S. ∃O. [?](T,S,O)
  proj(T,M,L) & emp(L,N,C) → ∃O. [?](T,N,O) & org(O,C)

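7 Schema Mapping
st tgd = source-target tuple generating dependency [Fagin et al, 05; ten Cate & Kolaitis, 10]
first order rule $\forall \mathbf{x}\, \varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y})$, where $\varphi$ is a conjunctive query on the source schema and $\psi$ is a conjunctive query on the target schema
a schema mapping is a set of st tgds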
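To make the definition concrete, here is a minimal Python sketch of applying one st tgd to a source instance, creating a fresh labeled null for every existential variable. It is an illustration under stated assumptions, not the authors' implementation: the class and function names (`Atom`, `StTgd`, `matches`, `chase_step`) are made up for this example, and the three-column target relation, whose name did not survive in this transcription, is called `task` purely as a placeholder.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Atom:
    rel: str      # relation name
    args: tuple   # variable names, one per column


@dataclass(frozen=True)
class StTgd:
    body: tuple   # conjunction of atoms over the source schema
    head: tuple   # conjunction of atoms over the target schema


def matches(body, instance, binding=None):
    """Yield every variable binding that satisfies all body atoms in instance."""
    binding = binding or {}
    if not body:
        yield dict(binding)
        return
    first, rest = body[0], body[1:]
    for rel, row in instance:
        if rel != first.rel or len(row) != len(first.args):
            continue
        new = dict(binding)
        # setdefault binds an unbound variable or returns the existing binding
        if all(new.setdefault(v, val) == val for v, val in zip(first.args, row)):
            yield from matches(rest, instance, new)


def chase_step(tgd, instance, fresh=None):
    """Apply one st tgd: each body match gets fresh nulls for existential variables."""
    if fresh is None:
        fresh = count(1)
    out = set()
    for b in matches(tgd.body, instance):
        for atom in tgd.head:
            for v in atom.args:
                if v not in b:                      # existential variable
                    b[v] = f"null{next(fresh)}"
        for atom in tgd.head:
            out.add((atom.rel, tuple(b[v] for v in atom.args)))
    return out


# Source tuples taken from the slides' small data example.
source = [
    ("proj", ("BigData", 1, 2)),
    ("proj", ("ML", 1, 1)),
    ("emp", (1, "Alice", "SAP")),
    ("emp", (2, "Bob", "IBM")),
    ("emp", (3, "Pat", "MS")),
]

# proj(T,M,L) & emp(L,N,C) -> exists O. task(T,N,O) & org(O,C)   ("task" is a placeholder name)
tgd = StTgd(
    body=(Atom("proj", ("T", "M", "L")), Atom("emp", ("L", "N", "C"))),
    head=(Atom("task", ("T", "N", "O")), Atom("org", ("O", "C"))),
)

target = chase_step(tgd, source)
```

Each body match produces its own null, so `target` holds two `task` tuples and two `org` tuples that are linked only through `null1` and `null2`.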

8-9 Goal: learn schema mapping from data (relation and column names that did not survive transcription are shown as [?])
source
  proj(topic, mgr, lead): (BigData, 1, 2), (ML, 1, 1)
  emp(id, [?], company): (1, Alice, SAP), (2, Bob, IBM), (3, Pat, MS)
target
  [?]: (BigData, Alice, 111), (ML, Alice, 111)
  leader: Alice, Bob
  org(oid, [?]): (111, SAP), (222, MS)
mapping to learn
  proj(T,M,L) & emp(L,N,C) → leader(N)
  emp(I,N,C) → ∃O. org(O,C)
  proj(T,M,L) & emp(M,N,C) → ∃O. [?](T,N,O) & org(O,C)
Challenges:
- ambiguous metadata
- imperfect data
- existentials / nulls

10 Here: Given data example (I,J), candidate set C. Find optimal M ⊆ C for (I,J) (relation and column names that did not survive transcription are shown as [?])
I
  proj(topic, mgr, lead): (BigData, 1, 2), (ML, 1, 1)
  emp(id, [?], company): (1, Alice, SAP), (2, Bob, IBM), (3, Pat, MS)
J
  [?]: (BigData, Alice, 111), (ML, Alice, 111)
  leader: Alice, Bob
  org(oid, [?]): (111, SAP), (222, MS)
C
  emp(I,N,C) → leader(N)
  proj(T,M,L) → ∃S. ∃O. [?](T,S,O)
  proj(T,M,L) & emp(L,N,C) → leader(N)
  proj(T,M,L) & emp(M,N,C) → ∃O. [?](T,N,O) & org(O,C)
  proj(T,M,L) & emp(L,N,C) → ∃O. [?](T,N,O) & org(O,C)
  emp(I,N,C) → ∃O. org(O,C)
  emp(I,N,C) → ∃O. org(O,N)
  ...
M
  proj(T,M,L) & emp(L,N,C) → leader(N)
  emp(I,N,C) → ∃O. org(O,C)
  proj(T,M,L) & emp(M,N,C) → ∃O. [?](T,N,O) & org(O,C)

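11 Full st tgds (no ∃)
[Diagram: two overlapping sets, Universal(M,I) and J]
- errors of M: tuples in Universal(M,I) but not in J
- explained by M: tuples in both Universal(M,I) and J
- not explained by M: tuples in J but not in Universal(M,I)
goal: find small M that maximizes intersection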

12-14 Arbitrary st tgds: replace containment with homomorphism checks
[Diagram: two overlapping sets, Universal(M,I) and J, with regions labelled "errors of M" and "explained by M"]
- no errors: tuples in Universal(M,I) where we can replace nulls by constants such that we get tuples in J
- (partially) explained: tuples in J which can be obtained from tuples in Universal(M,I) by replacing nulls by constants

15-19 Example (relation and column names that did not survive transcription are shown as [?])
Universal(M,I)
  [?]: (BigData, Bob, null1), (ML, Alice, null2)
  leader: (none shown)
  org(oid, [?]): (null1, IBM), (null2, SAP)
J
  [?]: (BigData, Alice, 111), (ML, Alice, 111)
  leader: Alice, Bob
  org(oid, [?]): (111, SAP), (222, MS)
- no errors: replace null2 by 111 to explain these: (ML, Alice, null2) and org(null2, SAP) become (ML, Alice, 111) and org(111, SAP)
- errors: can't replace null1 to get tuples in J
- not explained by any tuple in Universal(M,I): the remaining tuples of J, e.g. (BigData, Alice, 111) and org(222, MS)
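The homomorphism checks in this example can be sketched in a few lines of Python. This is a simplified, boolean illustration, not the method from the talk: tuples that share a null are checked together, a tuple of J counts as explained as soon as one tuple of Universal(M,I) can be turned into it, and partial explanation is not modelled. All function names are hypothetical, nulls are assumed to be strings starting with `null`, and `task` is a placeholder for the target relation whose name is missing from the transcription.

```python
def is_null(v):
    """Labeled nulls are represented as strings such as 'null1' (an assumption)."""
    return isinstance(v, str) and v.startswith("null")


def unify(u, j):
    """Return the null -> constant substitution that turns tuple u into j, or None."""
    (urel, uargs), (jrel, jargs) = u, j
    if urel != jrel or len(uargs) != len(jargs):
        return None
    sub = {}
    for a, b in zip(uargs, jargs):
        if is_null(a):
            if sub.setdefault(a, b) != b:   # same null must map to one constant
                return None
        elif a != b:
            return None
    return sub


def blocks(universal):
    """Group tuples that are connected through shared nulls."""
    groups = []  # list of (tuples, nulls); null sets stay pairwise disjoint
    for t in universal:
        tuples, nulls = [t], {a for a in t[1] if is_null(a)}
        keep = []
        for g_tuples, g_nulls in groups:
            if g_nulls & nulls:
                tuples += g_tuples
                nulls |= g_nulls
            else:
                keep.append((g_tuples, g_nulls))
        groups = keep + [(tuples, nulls)]
    return [g for g, _ in groups]


def block_homomorphism(block, J, sub=None):
    """Backtracking search for one substitution sending every tuple of block into J."""
    sub = sub or {}
    if not block:
        return sub
    first, rest = block[0], block[1:]
    for j in J:
        s = unify(first, j)
        if s is not None and all(sub.get(k, v) == v for k, v in s.items()):
            found = block_homomorphism(rest, J, {**sub, **s})
            if found is not None:
                return found
    return None


U = [
    ("task", ("BigData", "Bob", "null1")),
    ("task", ("ML", "Alice", "null2")),
    ("org", ("null1", "IBM")),
    ("org", ("null2", "SAP")),
]
J = [
    ("task", ("BigData", "Alice", 111)),
    ("task", ("ML", "Alice", 111)),
    ("org", (111, "SAP")),
    ("org", (222, "MS")),
    ("leader", ("Alice",)),
    ("leader", ("Bob",)),
]

errors = {t for b in blocks(U) if block_homomorphism(b, J) is None for t in b}
explained = {j for j in J if any(unify(u, j) is not None for u in U)}
unexplained = set(J) - explained
```

On this data the block around `null1` has no homomorphism into J, so both of its tuples are errors, while the block around `null2` maps into J with `null2` replaced by 111.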

20-21 Task
[Diagram: two overlapping sets, Universal(M,I) and J, with regions labelled "errors of M" and "explained by M"]
Given
- source schema S, target schema T
- data example (I,J)
- set C of candidate st tgds
Find an optimal mapping M, i.e.,
$$\arg\min_{M \subseteq C} \Big[ \mathrm{size}(M) + \sum_{t \in J} \big(1 - \mathrm{explains}(M,t)\big) + \sum_{t \in \mathrm{Universal}(C,I) \setminus J} \mathrm{error}(M,t) \Big]$$
NP-hard even for full st tgds
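For full st tgds the objective can be evaluated with plain set operations, which allows a brute-force search over all subsets of C on toy inputs. The sketch below is illustrative only: it assumes explains(M,t) and error(M,t) are 0/1 indicators, that each candidate's produced tuples and size are given up front, and it uses made-up data. The names `cost` and `best_mapping` are hypothetical, and the exhaustive enumeration is exponential in the number of candidates, in line with the NP-hardness noted on the slide.

```python
from itertools import combinations


def cost(M, creates, size, J):
    """size(M) + number of unexplained tuples of J + number of error tuples."""
    produced = set().union(*(creates[f] for f in M))
    unexplained = len(J - produced)   # tuples of J that M does not produce
    errors = len(produced - J)        # tuples M produces that are not in J
    return sum(size[f] for f in M) + unexplained + errors


def best_mapping(candidates, creates, size, J):
    """Exhaustively search all subsets of the candidate set (toy sizes only)."""
    subsets = (
        frozenset(c)
        for r in range(len(candidates) + 1)
        for c in combinations(candidates, r)
    )
    # sorted(M) only breaks ties deterministically
    return min(subsets, key=lambda M: (cost(M, creates, size, J), sorted(M)))


# Made-up toy data: two full candidates that both produce leader tuples.
J = {("leader", (n,)) for n in "abcde"}
creates = {
    "f1": {("leader", (n,)) for n in "abcdexyz"},  # covers J but adds 3 errors
    "f2": {("leader", (n,)) for n in "abcde"},     # covers J exactly
}
size = {"f1": 2, "f2": 3}                          # hypothetical sizes, e.g. atom counts

best = best_mapping(["f1", "f2"], creates, size, J)
```

Here the empty mapping costs 5 (five unexplained tuples), `f1` alone costs 5 (size 2 plus three errors), and `f2` alone costs 3, so `best` is the singleton containing `f2`.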

22 Probabilistic Soft Logic (PSL)
- declarative language to specify probabilistic models over logical atoms / relational domains
- PSL program = set of weighted first order rules $w : b_1(\vec{X}) \wedge \dots \wedge b_n(\vec{X}) \rightarrow h_1(\vec{X}) \vee \dots \vee h_m(\vec{X})$
- MPE inference = finding most likely model
- efficient approximate solver with guarantee on solution quality
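The slide does not spell out how a weighted rule is scored. As background, PSL gives atoms soft truth values in [0, 1] and interprets rules with the Łukasiewicz relaxation, so each ground rule has a "distance to satisfaction" that MPE inference penalizes in proportion to the rule weight. The sketch below illustrates that standard scoring with a linear penalty; the function names are hypothetical and this is not the PSL library's API.

```python
def distance_to_satisfaction(body, head):
    """Łukasiewicz distance to satisfaction of b1 ^ ... ^ bn -> h1 v ... v hm.

    body and head are lists of soft truth values in [0, 1].
    An empty body counts as true (1.0), an empty head as false (0.0).
    """
    body_truth = max(0.0, sum(body) - (len(body) - 1))  # soft conjunction
    head_truth = min(1.0, sum(head))                    # soft disjunction
    return max(0.0, body_truth - head_truth)


def weighted_penalty(rules):
    """Sum of weight * distance over ground rules given as (weight, body, head)."""
    return sum(w * distance_to_satisfaction(b, h) for w, b, h in rules)


# A ground rule whose body is half true and whose head is false is violated by 0.5.
d_violated = distance_to_satisfaction([0.5, 1.0], [0.0])
# A rule whose head is at least as true as its body is fully satisfied.
d_satisfied = distance_to_satisfaction([0.5, 1.0], [1.0])
# A rule with an empty head penalizes its body atom directly.
d_prior = distance_to_satisfaction([1.0], [])

total = weighted_penalty([
    (1.0, [0.5, 1.0], [0.0]),
    (2.0, [1.0], []),
])
```

With these inputs the first rule contributes 0.5 and the second contributes 2.0, so lowering the truth value of the body atom in the second rule is the cheapest way to reduce the total.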

23-27 Our PSL program
given: J(T), covers(F,T), creates(F,T)
find optimal M: in(F), F ∈ M
minimize #errors: 1 : in(F) ∧ creates(F,T) → J(T)
minimize #unexplained: 1 : J(T) → ∃F. covers(F,T) ∧ in(F)
minimize size of M: size(F) : in(F) → ⊥

28 New PSL construct: prioritized disjunction rules
1 : J(T) → ∃F. covers(F,T) ∧ in(F)
- covers(F,T): observed priority between 0 (low) and 1 (high)
- in(F): to be inferred
- automatically transformed into set of standard PSL rules expressing a preference for inferred atoms with higher priority

29 Experimental evaluation
- scenarios generated using iBench [Arocena et al, 15]
- candidate st tgds generated using Clio [Fagin et al, 09]
- E1: increasingly noisy candidates, perfect data: metadata-only baseline suffers, we get perfect mappings
- E2: ambiguous set of candidates, increasingly noisy data: high quality mappings found for up to 25% unexpected and 10% missing target tuples in J

30-33 Our Contributions
Given
- metadata
- data example
- candidate st tgds
Find
- small set of st tgds
- minimally invalid
- maximally explaining
Collective Mapping Discovery (CMD): declarative probabilistic model + MPE inference using Probabilistic Soft Logic (PSL)
- supports arbitrary st tgds
- jointly reasons about metadata and data
- handles noisy input
- efficient solver with quality guarantee
- declarative, extensible definition of optimization
Thanks!

A Collective, Probabilistic Approach to Schema Mapping

Angelika Kimmig (KU Leuven, angelika.kimmig@cs.kuleuven.be), Alex Memory (University of Maryland, memory@cs.umd.edu), Renée J. Miller (University of Toronto)

More information

Structural characterizations of schema mapping languages

Structural characterizations of schema mapping languages Structural characterizations of schema mapping languages Balder ten Cate INRIA and ENS Cachan (research done while visiting IBM Almaden and UC Santa Cruz) Joint work with Phokion Kolaitis (ICDT 09) Schema

More information

Foundations of Data Exchange and Metadata Management. Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016

Foundations of Data Exchange and Metadata Management. Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016 Foundations of Data Exchange and Metadata Management Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016 The need for a formal definition We had a paper with Ron in PODS 2004 Back then I was a Ph.D.

More information

Composing Schema Mapping

Composing Schema Mapping Composing Schema Mapping An Overview Phokion G. Kolaitis UC Santa Cruz & IBM Research Almaden Joint work with R. Fagin, L. Popa, and W.C. Tan 1 Data Interoperability Data may reside at several different

More information

Outline. 1 CS520-5) Data Exchange

Outline. 1 CS520-5) Data Exchange Outline 0) Course Info 1) Introduction 2) Data Preparation and Cleaning 3) Schema matching and mapping 4) Virtual Data Integration 5) Data Exchange 6) Data Warehousing 7) Big Data Analytics 8) Data Provenance

More information

Scalable Data Exchange with Functional Dependencies

Scalable Data Exchange with Functional Dependencies Scalable Data Exchange with Functional Dependencies Bruno Marnette 1, 2 Giansalvatore Mecca 3 Paolo Papotti 4 1: Oxford University Computing Laboratory Oxford, UK 2: INRIA Saclay, Webdam Orsay, France

More information

The interaction of theory and practice in database research

The interaction of theory and practice in database research The interaction of theory and practice in database research Ron Fagin IBM Research Almaden 1 Purpose of This Talk Encourage collaboration between theoreticians and system builders via two case studies

More information

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability Georg Gottlob 1,2, Reinhard Pichler 1, and Emanuel Sallinger 2 1 TU Wien and 2 University of Oxford Tuple-generating

More information

Data Exchange: Semantics and Query Answering

Data Exchange: Semantics and Query Answering Data Exchange: Semantics and Query Answering Ronald Fagin Phokion G. Kolaitis Renée J. Miller Lucian Popa IBM Almaden Research Center fagin,lucian @almaden.ibm.com University of California at Santa Cruz

More information

Core Schema Mappings: Computing Core Solution with Target Dependencies in Data Exchange

Core Schema Mappings: Computing Core Solution with Target Dependencies in Data Exchange Core Schema Mappings: Computing Core Solution with Target Dependencies in Data Exchange S. Ravichandra, and D.V.L.N. Somayajulu Abstract Schema mapping is a declarative specification of the relationship

More information

Efficient and scalable Data Exchange with target functional dependencies

Efficient and scalable Data Exchange with target functional dependencies Efficient and scalable Data Exchange with target functional dependencies Ioana Ileana Joint work 1 with Angela Bonifati (Univ. Lyon 1) and Michele Linardi (Univ. Paris Descartes) 1 A. Bonifati, I. Ileana,

More information

Schema Mappings, Data Exchange, and Metadata Management

Schema Mappings, Data Exchange, and Metadata Management Schema Mappings, Data Exchange, and Metadata Management Phokion G. Kolaitis IBM Almaden Research Center kolaitis@almaden.ibm.com ABSTRACT Schema mappings are high-level specifications that describe the

More information

SCHEMA MAPPING DESIGN SYSTEMS 1. Schema Mapping Design Systems: Example-Driven and Semantic Approaches. Kathryn Dahlgren. CSU Stanislaus CS4960

SCHEMA MAPPING DESIGN SYSTEMS 1. Schema Mapping Design Systems: Example-Driven and Semantic Approaches. Kathryn Dahlgren. CSU Stanislaus CS4960 SCHEMA MAPPING DESIGN SYSTEMS 1 Schema Mapping Design Systems: Example-Driven and Semantic Approaches Kathryn Dahlgren CSU Stanislaus CS4960 Dr. Melanie Martin SCHEMA MAPPING DESIGN SYSTEMS 2 Introduction

More information

Validity-Sensitive Querying of XML Databases

Validity-Sensitive Querying of XML Databases Validity-Sensitive Querying of XML Databases Slawomir Staworko Jan homicki Department of omputer Science University at Buffalo DataX, March 26, 2006 Motivation Querying Invalid XML Integration of XML documents

More information

Data Integration: Schema Mapping

Data Integration: Schema Mapping Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Data

More information

Chapter 2: Intro to Relational Model

Chapter 2: Intro to Relational Model Chapter 2: Intro to Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Example of a Relation attributes (or columns) tuples (or rows) 2.2 Attribute Types The

More information

Bio/Ecosystem Informatics

Bio/Ecosystem Informatics Bio/Ecosystem Informatics Renée J. Miller University of Toronto DB research problem: managing data semantics R. J. Miller University of Toronto 1 Managing Data Semantics Semantics modeled by Schemas (structure

More information

Structural Characterizations of Schema-Mapping Languages

Structural Characterizations of Schema-Mapping Languages Structural Characterizations of Schema-Mapping Languages Balder ten Cate University of Amsterdam and UC Santa Cruz balder.tencate@uva.nl Phokion G. Kolaitis UC Santa Cruz and IBM Almaden kolaitis@cs.ucsc.edu

More information

Data Integration: Schema Mapping

Data Integration: Schema Mapping Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Jan Chomicki

More information

Markov Logic: Representation

Markov Logic: Representation Markov Logic: Representation Overview Statistical relational learning Markov logic Basic inference Basic learning Statistical Relational Learning Goals: Combine (subsets of) logic and probability into

More information

Foundations of Schema Mapping Management

Foundations of Schema Mapping Management Foundations of Schema Mapping Management Marcelo Arenas Jorge Pérez Juan Reutter Cristian Riveros PUC Chile PUC Chile University of Edinburgh Oxford University marenas@ing.puc.cl jperez@ing.puc.cl juan.reutter@ed.ac.uk

More information

A Retrospective on Datalog 1.0

A Retrospective on Datalog 1.0 A Retrospective on Datalog 1.0 Phokion G. Kolaitis UC Santa Cruz and IBM Research - Almaden Datalog 2.0 Vienna, September 2012 2 / 79 A Brief History of Datalog In the beginning of time, there was E.F.

More information

Outline. 1 CS520-5) Data Exchange. 3 CS520-5) Data Exchange. Person Name Address Office-phone Office-address Home-phone

Outline. 1 CS520-5) Data Exchange. 3 CS520-5) Data Exchange. Person Name Address Office-phone Office-address Home-phone Outline IIT DBGroup CS520 Data Integration, Warehousing, and Provenance Boris Glavic http://www.cs.iit.edu/~glavic/ http://www.cs.iit.edu/~cs520/ http://www.cs.iit.edu/~dbgroup/ 0) Course Info 1) Introduction

More information

Consensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande University of Maryland, College Park, USA

Consensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande University of Maryland, College Park, USA Consensus Answers for Queries over Probabilistic Databases Jian Li and Amol Deshpande University of Maryland, College Park, USA Probabilistic Databases Motivation: Increasing amounts of uncertain data

More information

Provable data privacy

Provable data privacy Provable data privacy Kilian Stoffel 1 and Thomas Studer 2 1 Université de Neuchâtel, Pierre-à-Mazel 7, CH-2000 Neuchâtel, Switzerland kilian.stoffel@unine.ch 2 Institut für Informatik und angewandte Mathematik,

More information

Relational Model, Relational Algebra, and SQL

Relational Model, Relational Algebra, and SQL Relational Model, Relational Algebra, and SQL August 29, 2007 1 Relational Model Data model. constraints. Set of conceptual tools for describing of data, data semantics, data relationships, and data integrity

More information

Composition and Inversion of Schema Mappings

Composition and Inversion of Schema Mappings Composition and Inversion of Schema Mappings Marcelo Arenas Jorge Pérez Juan Reutter Cristian Riveros PUC Chile PUC Chile U. of Edinburgh Oxford University marenas@ing.puc.cl jperez@ing.puc.cl juan.reutter@ed.ac.uk

More information

Logical Foundations of Relational Data Exchange

Logical Foundations of Relational Data Exchange Logical Foundations of Relational Data Exchange Pablo Barceló Department of Computer Science, University of Chile pbarcelo@dcc.uchile.cl 1 Introduction Data exchange has been defined as the problem of

More information

Checking Containment of Schema Mappings (Preliminary Report)

Checking Containment of Schema Mappings (Preliminary Report) Checking Containment of Schema Mappings (Preliminary Report) Andrea Calì 3,1 and Riccardo Torlone 2 Oxford-Man Institute of Quantitative Finance, University of Oxford, UK Dip. di Informatica e Automazione,

More information

Chapter 2: Intro to Relational Model

Chapter 2: Intro to Relational Model Non è possibile visualizzare l'immagine. Chapter 2: Intro to Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Example of a Relation attributes (or columns)

More information

The Inverse of a Schema Mapping

The Inverse of a Schema Mapping The Inverse of a Schema Mapping Jorge Pérez Department of Computer Science, Universidad de Chile Blanco Encalada 2120, Santiago, Chile jperez@dcc.uchile.cl Abstract The inversion of schema mappings has

More information

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag. Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE

More information

Logic and Databases. Lecture 4 - Part 2. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Logic and Databases. Lecture 4 - Part 2. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden Logic and Databases Phokion G. Kolaitis UC Santa Cruz & IBM Research - Almaden Lecture 4 - Part 2 2 / 17 Alternative Semantics of Queries Bag Semantics We focused on the containment problem for conjunctive

More information

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data?

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data? Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data? Diego Calvanese University of Rome La Sapienza joint work with G. De Giacomo, M. Lenzerini, M.Y. Vardi

More information

KNOWLEDGE GRAPH CONSTRUCTION

KNOWLEDGE GRAPH CONSTRUCTION KNOWLEDGE GRAPH CONSTRUCTION Jay Pujara Karlsruhe Institute of Technology 7/7/2015 Can Computers Create Knowledge? Internet Massive source of publicly available information Knowledge Computers + Knowledge

More information

Part I: Structured Data

Part I: Structured Data Inf1-DA 2011 2012 I: 92 / 117 Part I Structured Data Data Representation: I.1 The entity-relationship (ER) data model I.2 The relational model Data Manipulation: I.3 Relational algebra I.4 Tuple-relational

More information

Designing and Refining Schema Mappings via Data Examples

Designing and Refining Schema Mappings via Data Examples Designing and Refining Schema Mappings via Data Examples Bogdan Alexe UCSC abogdan@cs.ucsc.edu Balder ten Cate UCSC balder.tencate@gmail.com Phokion G. Kolaitis UCSC & IBM Research - Almaden kolaitis@cs.ucsc.edu

More information

KNOWLEDGE GRAPH IDENTIFICATION

KNOWLEDGE GRAPH IDENTIFICATION KNOWLEDGE GRAPH IDENTIFICATION Jay Pujara 1, Hui Miao 1, Lise Getoor 1, William Cohen 2 1 University of Maryland, College Park, US 2 Carnegie Mellon University International Semantic Web Conference 10/25/2013

More information

Foundations and Applications of Schema Mappings

Foundations and Applications of Schema Mappings Foundations and Applications of Schema Mappings Phokion G. Kolaitis University of California Santa Cruz & IBM Almaden Research Center The Data Interoperability Challenge Data may reside at several different

More information

Query Processing SL03

Query Processing SL03 Distributed Database Systems Fall 2016 Query Processing Overview Query Processing SL03 Distributed Query Processing Steps Query Decomposition Data Localization Query Processing Overview/1 Query processing:

More information

Logic (or Declarative) Programming Foundations: Prolog. Overview [1]

Logic (or Declarative) Programming Foundations: Prolog. Overview [1] Logic (or Declarative) Programming Foundations: Prolog In Text: Chapter 12 Formal logic Logic programming Prolog Overview [1] N. Meng, S. Arthur 2 1 Logic Programming To express programs in a form of symbolic

More information

Nested Mappings: Schema Mapping Reloaded

Nested Mappings: Schema Mapping Reloaded Nested Mappings: Schema Mapping Reloaded Ariel Fuxman University of Toronto afuxman@cs.toronto.edu Renee J. Miller University of Toronto miller@cs.toronto.edu Mauricio A. Hernandez IBM Almaden Research

More information

Data Integration 1. Giuseppe De Giacomo. Dipartimento di Informatica e Sistemistica Antonio Ruberti Università di Roma La Sapienza

Data Integration 1. Giuseppe De Giacomo. Dipartimento di Informatica e Sistemistica Antonio Ruberti Università di Roma La Sapienza Data Integration 1 Giuseppe De Giacomo Dipartimento di Informatica e Sistemistica Antonio Ruberti Università di Roma La Sapienza View-based query processing Diego Calvanese, Giuseppe De Giacomo, Georg

More information

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability Georg Gottlob Reinhard Pichler Emanuel Sallinger University of Oxford Vienna University of Technology Vienna University

More information

Inverting Schema Mappings: Bridging the Gap between Theory and Practice

Inverting Schema Mappings: Bridging the Gap between Theory and Practice Inverting Schema Mappings: Bridging the Gap between Theory and Practice Marcelo Arenas Jorge Pérez Juan Reutter Cristian Riveros PUC Chile PUC Chile PUC Chile R&M Tech marenas@ing.puc.cl jperez@ing.puc.cl

More information

Link Mining & Entity Resolution. Lise Getoor University of Maryland, College Park

Link Mining & Entity Resolution. Lise Getoor University of Maryland, College Park Link Mining & Entity Resolution Lise Getoor University of Maryland, College Park Learning in Structured Domains Traditional machine learning and data mining approaches assume: A random sample of homogeneous

More information

Bachelor in Information Technology (BIT) O Term-End Examination

Bachelor in Information Technology (BIT) O Term-End Examination No. of Printed Pages : 6 I CSI-14 I Bachelor in Information Technology (BIT) O Term-End Examination cn Cn1 June, 2010 CD cp CSI-14 : DATA ANALYSIS AND DATABASE DESIGN Time : 3 hours Maximum Marks : 75

More information

Building Dynamic Knowledge Graphs

Building Dynamic Knowledge Graphs Building Dynamic Knowledge Graphs Jay Pujara Department of Computer Science University of Maryland College Park, MD 20742 jay@cs.umd.edu Lise Getoor Department of Computer Science University of California

More information

Propagating Dependencies under Schema Mappings A Graph-based Approach

Propagating Dependencies under Schema Mappings A Graph-based Approach Propagating Dependencies under Schema Mappings A Graph-based Approach ABSTRACT Qing Wang Research School of Computer Science Australian National University Canberra ACT 0200, Australia qing.wang@anu.edu.au

More information

PPDL: Probabilistic Programming with Datalog

PPDL: Probabilistic Programming with Datalog PPDL: Probabilistic Programming with Datalog Balder ten Cate 1, Benny Kimelfeld 2,1, and Dan Olteanu 3,1 1 LogicBlox, Inc., USA 2 Technion, Israel 3 University of Oxford, UK 1 Introduction There has been

More information

Schema Exchange: a Template-based Approach to Data and Metadata Translation

Schema Exchange: a Template-based Approach to Data and Metadata Translation Schema Exchange: a Template-based Approach to Data and Metadata Translation Paolo Papotti and Riccardo Torlone Università Roma Tre {papotti,torlone}@dia.uniroma3.it Abstract. In this paper we study the

More information

Schema Exchange: a Template-based Approach to Data and Metadata Translation

Schema Exchange: a Template-based Approach to Data and Metadata Translation Schema Exchange: a Template-based Approach to Data and Metadata Translation Paolo Papotti and Riccardo Torlone Università Roma Tre {papotti,torlone}@dia.uniroma3.it Abstract. We study the schema exchange

More information

Learning Programs from Noisy Data

Learning Programs from Noisy Data Learning Programs from Noisy Data Veselin Raychev Pavol Bielik Martin Vechev Andreas Krause ETH Zurich Why learn programs from examples? Input/output examples often easier to provide examples than specification

More information

Multisets and Duplicates. SQL: Duplicate Semantics and NULL Values. How does this impact Queries?

Multisets and Duplicates. SQL: Duplicate Semantics and NULL Values. How does this impact Queries? Multisets and Duplicates SQL: Duplicate Semantics and NULL Values Fall 2015 SQL uses a MULTISET/BAG semantics rather than a SET semantics: SQL tables are multisets of tuples originally for efficiency reasons

More information

Database Constraints and Homomorphism Dualities

Database Constraints and Homomorphism Dualities Database Constraints and Homomorphism Dualities Balder ten Cate 1, Phokion G. Kolaitis 1,2, and Wang-Chiew Tan 2,1 1 University of California Santa Cruz 2 IBM Research-Almaden Abstract. Global-as-view

More information

Introduction to Relational Databases. Introduction to Relational Databases cont: Introduction to Relational Databases cont: Relational Data structure

Introduction to Relational Databases. Introduction to Relational Databases cont: Introduction to Relational Databases cont: Relational Data structure Databases databases Terminology of relational model Properties of database relations. Relational Keys. Meaning of entity integrity and referential integrity. Purpose and advantages of views. The relational

More information

LOGIC AND DISCRETE MATHEMATICS

LOGIC AND DISCRETE MATHEMATICS LOGIC AND DISCRETE MATHEMATICS A Computer Science Perspective WINFRIED KARL GRASSMANN Department of Computer Science University of Saskatchewan JEAN-PAUL TREMBLAY Department of Computer Science University

More information

ON SCHEMA DISCOVERY ICDM Renée J. Miller

ON SCHEMA DISCOVERY ICDM Renée J. Miller ON SCHEMA DISCOVERY ICDM 2011 Renée J. Miller What are Schemas? 2 Schema From the Greek "σχήμα meaning shape, or more generally, plan Structure and constraints the data (should) satisfy Attribute structure

More information

Estimating the Quality of Databases

Estimating the Quality of Databases Estimating the Quality of Databases Ami Motro Igor Rakov George Mason University May 1998 1 Outline: 1. Introduction 2. Simple quality estimation 3. Refined quality estimation 4. Computing the quality

More information

Open Data Integration. Renée J. Miller

Open Data Integration. Renée J. Miller Open Data Integration Renée J. Miller miller@northeastern.edu !2 Open Data Principles Timely & Comprehensive Accessible and Usable Complete - All public data is made available. Public data is data that

More information

Consistent Query Answering
