A Collective, Probabilistic Approach to Schema Mapping
1 A Collective, Probabilistic Approach to Schema Mapping. Angelika Kimmig, Alex Memory, Renée Miller, Lise Getoor. ILP 2017 (published at ICDE 2017).
2-6 Context: Data Exchange & Data Integration. To transfer data from a source database to a target database, or to rewrite queries between them, we need a schema mapping.
Source: emp: (1, Alice, SAP), (2, Bob, IBM), (3, Pat, MS), (4, Carl, X), (5, Tom, Y7); proj(topic, mgr, lead): (BigData, 1, 2), (ML, 1, 1), (egov, 4, 5), (DM, 5, 5).
Target: a three-column relation with (BigData, Alice, 111), (ML, Alice, 111), (MT, Justin, 444), (Deep, Ann, 333); leader: Alice, Bob, Jim, Ann, Igor; org: (111, SAP), (222, MS), (333, Z), (444, HC).
Schema mapping:
$emp(i,n,c) \rightarrow leader(n)$
$proj(t,m,l) \rightarrow \exists S,O.\ \ldots(t,S,O)$
$proj(t,m,l) \wedge emp(l,n,c) \rightarrow \exists O.\ \ldots(t,n,O) \wedge org(O,c)$
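7 Schema mapping. An st tgd (source-to-target tuple-generating dependency) [Fagin et al., 05; ten Cate & Kolaitis, 10] is a first-order rule $\forall \vec{x}\, \big(\varphi(\vec{x}) \rightarrow \exists \vec{y}\, \psi(\vec{x}, \vec{y})\big)$, where $\varphi$ is a conjunctive query on the source schema and $\psi$ is a conjunctive query on the target schema. A schema mapping is a set of st tgds.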
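To make the definition concrete, the following minimal Python sketch (not the authors' code) encodes an st tgd as a pair of source and target atom lists and applies it to a source instance with a naive chase, which invents a fresh labeled null for every existential variable in every source match. For st tgds this yields the canonical universal solution, which is how we read Universal(M,I) on the following slides. The encoding, the function names `match` and `chase`, and the relation name `tgt` (a stand-in for the three-column target relation, whose name is not preserved in this transcript) are assumptions for illustration.

```python
from itertools import count

# Hypothetical encoding: an atom is (relation, variables); an st tgd is a pair
# (source atoms, target atoms). Target variables that do not occur in the
# source atoms are existential. "tgt" is a placeholder relation name.


def match(atoms, inst, binding=None):
    """Yield every variable binding that maps the conjunctive query `atoms` into `inst`."""
    binding = {} if binding is None else binding
    if not atoms:
        yield binding
        return
    (rel, variables), rest = atoms[0], atoms[1:]
    for row in inst.get(rel, set()):
        b = dict(binding)
        # setdefault binds an unbound variable or checks an already bound one.
        if all(b.setdefault(v, val) == val for v, val in zip(variables, row)):
            yield from match(rest, inst, b)


def chase(mapping, inst):
    """Naive chase: fire every tgd once per source match, inventing labeled nulls."""
    fresh = count(1)
    out = {}
    for body, head in mapping:
        body_vars = {v for _, vs in body for v in vs}
        for b in match(body, inst):
            b = dict(b)
            for _, vs in head:
                for v in vs:
                    if v not in body_vars and v not in b:
                        b[v] = f"null{next(fresh)}"  # one null per existential per match
            for rel, vs in head:
                out.setdefault(rel, set()).add(tuple(b[v] for v in vs))
    return out


# Source tuples from the running example.
I = {
    "proj": {("BigData", 1, 2), ("ML", 1, 1)},
    "emp": {(1, "Alice", "SAP"), (2, "Bob", "IBM"), (3, "Pat", "MS")},
}
# proj(t,m,l) & emp(l,n,c) -> exists o. tgt(t,n,o) & org(o,c)
rule = ([("proj", ("t", "m", "l")), ("emp", ("l", "n", "c"))],
        [("tgt", ("t", "n", "o")), ("org", ("o", "c"))])

U = chase([rule], I)
# U["tgt"] holds (BigData, Bob, nullX) and (ML, Alice, nullY); U["org"] holds
# (nullX, IBM) and (nullY, SAP). Which label is null1 depends on iteration order.
```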
8-9 Goal: learn a schema mapping from data.
Source: proj(topic, mgr, lead): (BigData, 1, 2), (ML, 1, 1); emp: (1, Alice, SAP), (2, Bob, IBM), (3, Pat, MS).
Target: (BigData, Alice, 111), (ML, Alice, 111); leader: Alice, Bob; org: (111, SAP), (222, MS).
Mapping to learn:
$proj(t,m,l) \wedge emp(l,n,c) \rightarrow leader(n)$
$emp(i,n,c) \rightarrow \exists O.\ org(O,c)$
$proj(t,m,l) \wedge emp(m,n,c) \rightarrow \exists O.\ \ldots(t,n,O) \wedge org(O,c)$
Challenges: ambiguous metadata, imperfect data, and existentials / nulls.
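10 Here: given a data example (I, J) and a candidate set C, find an optimal $M \subseteq C$ for (I, J).
I: proj(topic, mgr, lead): (BigData, 1, 2), (ML, 1, 1); emp: (1, Alice, SAP), (2, Bob, IBM), (3, Pat, MS).
J: (BigData, Alice, 111), (ML, Alice, 111); leader: Alice, Bob; org: (111, SAP), (222, MS).
C:
$emp(i,n,c) \rightarrow leader(n)$
$proj(t,m,l) \rightarrow \exists S,O.\ \ldots(t,S,O)$
$proj(t,m,l) \wedge emp(l,n,c) \rightarrow leader(n)$
$proj(t,m,l) \wedge emp(m,n,c) \rightarrow \exists O.\ \ldots(t,n,O) \wedge org(O,c)$
$proj(t,m,l) \wedge emp(l,n,c) \rightarrow \exists O.\ \ldots(t,n,O) \wedge org(O,c)$
$emp(i,n,c) \rightarrow \exists O.\ org(O,c)$
$emp(i,n,c) \rightarrow \exists O.\ org(O,n)$
...
M:
$proj(t,m,l) \wedge emp(l,n,c) \rightarrow leader(n)$
$emp(i,n,c) \rightarrow \exists O.\ org(O,c)$
$proj(t,m,l) \wedge emp(m,n,c) \rightarrow \exists O.\ \ldots(t,n,O) \wedge org(O,c)$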
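11 Full st tgds (no $\exists$): the errors of M are $\mathit{Universal}(M,I) \setminus J$, the tuples explained by M are $\mathit{Universal}(M,I) \cap J$, and the tuples not explained by M are $J \setminus \mathit{Universal}(M,I)$. Goal: find a small M that maximizes the intersection.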
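For full st tgds, Universal(M,I) contains no nulls, so the three regions are plain set operations. A minimal Python sketch using the source tuples and the leader tuples of J from the running example, with two full candidates from the candidate set; the helper `compare` and the names m1 and m2 are our own choices.

```python
# Source instance I and the leader part of J from the running example.
proj = {("BigData", 1, 2), ("ML", 1, 1)}
emp = {(1, "Alice", "SAP"), (2, "Bob", "IBM"), (3, "Pat", "MS")}
J_leader = {("Alice",), ("Bob",)}

# Universal(M, I) for two full st tgds (no existentials, hence no nulls).
# m1: proj(t,m,l) & emp(l,n,c) -> leader(n)
U_m1 = {(n,) for (_, _, l) in proj for (i, n, _) in emp if i == l}
# m2: emp(i,n,c) -> leader(n)
U_m2 = {(n,) for (_, n, _) in emp}


def compare(universal, target):
    """Split tuples into errors, explained and unexplained (full st tgds only)."""
    return {
        "errors": universal - target,       # created by M but not in J
        "explained": universal & target,    # created by M and in J
        "unexplained": target - universal,  # in J but not created by M
    }


print(compare(U_m1, J_leader)["errors"])  # set()
print(compare(U_m2, J_leader)["errors"])  # {('Pat',)}
```

Both tgds explain the same leader tuples, but m2 also creates leader(Pat), which is not in J.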
12-14 Arbitrary st tgds: replace containment checks with homomorphism checks. A tuple of Universal(M,I) whose nulls can be replaced by constants such that we get a tuple in J is no error; otherwise it is an error of M. A tuple of J that can be obtained from a tuple in Universal(M,I) by replacing nulls with constants is (partially) explained by M.
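With nulls, membership in J becomes a homomorphism check: a tuple maps into J if its nulls can be replaced by constants to yield a tuple of J. The minimal Python sketch below checks single tuples in isolation, using tuples from the running example; the `Null` class and the function names are assumptions, and a null shared by several tuples would need a joint check across those tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """A labeled null produced by an existential variable."""
    label: str


def unify(t, u):
    """Return a null assignment h with h(t) == u, or None if none exists."""
    if len(t) != len(u):
        return None
    h = {}
    for a, b in zip(t, u):
        if isinstance(a, Null):
            if h.setdefault(a, b) != b:  # the same null must map to the same constant
                return None
        elif a != b:  # constants must match exactly
            return None
    return h


def is_error(t, j_rel):
    """A tuple of Universal(M,I) is an error if no replacement of its nulls lands in J."""
    return all(unify(t, u) is None for u in j_rel)


def is_explained(u, universal_rel):
    """A tuple of J is (partially) explained if some universal tuple maps onto it."""
    return any(unify(t, u) is not None for t in universal_rel)


null1, null2 = Null("null1"), Null("null2")
universal_tgt = {("BigData", "Bob", null1), ("ML", "Alice", null2)}
j_tgt = {("BigData", "Alice", 111), ("ML", "Alice", 111)}

print(is_error(("ML", "Alice", null2), j_tgt))                 # False
print(is_error(("BigData", "Bob", null1), j_tgt))              # True
print(is_explained(("BigData", "Alice", 111), universal_tgt))  # False
```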
15-19 Example. Universal(M,I) contains (BigData, Bob, null1) and (ML, Alice, null2), and org: (null1, IBM), (null2, SAP). J contains (BigData, Alice, 111) and (ML, Alice, 111), leader: Alice, Bob, and org: (111, SAP), (222, MS).
No errors: replacing null2 by 111 turns (ML, Alice, null2) and (null2, SAP) into the J tuples (ML, Alice, 111) and (111, SAP), which are thus explained.
Errors: no replacement of null1 turns (BigData, Bob, null1) or (null1, IBM) into tuples in J.
The remaining tuples of J are not explained by any tuple in Universal(M,I).
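The example hinges on replacing null2 consistently: the same constant 111 must work both for (ML, Alice, null2) and for org(null2, SAP). A minimal Python sketch of that joint check is a backtracking search for one assignment of nulls that maps a whole set of facts into J. `Null`, `extend`, and `homomorphism` are hypothetical names, and `tgt` stands in for the three-column target relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Null:
    """A labeled null, e.g. Null("null2")."""
    label: str


def extend(h, t, u):
    """Extend the null assignment h so that h(t) == u; None if impossible."""
    if len(t) != len(u):
        return None
    h = dict(h)
    for a, b in zip(t, u):
        if isinstance(a, Null):
            if h.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return h


def homomorphism(facts, J, h=None):
    """Backtracking search for one null assignment mapping every (relation, tuple) fact into J."""
    h = {} if h is None else h
    if not facts:
        return h
    (rel, t), rest = facts[0], facts[1:]
    for u in J.get(rel, ()):
        h2 = extend(h, t, u)
        if h2 is not None:
            found = homomorphism(rest, J, h2)
            if found is not None:
                return found
    return None  # no candidate tuple in J works for this fact


null1, null2 = Null("null1"), Null("null2")
J = {
    "tgt": {("BigData", "Alice", 111), ("ML", "Alice", 111)},
    "leader": {("Alice",), ("Bob",)},
    "org": {(111, "SAP"), (222, "MS")},
}

print(homomorphism([("tgt", ("ML", "Alice", null2)), ("org", (null2, "SAP"))], J))
# {Null(label='null2'): 111}
print(homomorphism([("tgt", ("BigData", "Bob", null1)), ("org", (null1, "IBM"))], J))
# None
```

Checking the two org tuples separately would miss conflicts: for instance (ML, Alice, null2) together with org(null2, MS) fails, because the first fact forces null2 to be 111 while MS only appears with oid 222.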
20-21 Task. Given a source schema S, a target schema T, a data example (I, J), and a set C of candidate st tgds, find an optimal mapping M, i.e.,
$$\arg\min_{M \subseteq C} \Big[\, \mathit{size}(M) + \sum_{t \in J} \big(1 - \mathit{explains}(M,t)\big) + \sum_{t \in \mathit{Universal}(C,I) \setminus J} \mathit{error}(M,t) \,\Big]$$
The three terms penalize the size of M, the tuples of J not explained by M, and the errors of M. The problem is NP-hard, even for full st tgds.
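The objective can be evaluated by brute force on a toy input, which also shows why exact search does not scale: it enumerates all 2^|C| subsets. In the minimal Python sketch below, the explains and error sets for four candidates from the running example were worked out by hand; the unit size per candidate, the string encoding of tuples, and the relation name `tgt` are assumptions, so the selected mapping illustrates the objective rather than reproducing the slides' M.

```python
from itertools import combinations

# Hand-derived inputs for four candidates of the running example.
# "tgt" is a placeholder target relation name; N1 is a labeled null.
#   c1: emp(i,n,c) -> leader(n)
#   c2: proj(t,m,l) & emp(l,n,c) -> leader(n)
#   c3: proj(t,m,l) & emp(m,n,c) -> exists O. tgt(t,n,O) & org(O,c)
#   c4: proj(t,m,l) & emp(l,n,c) -> exists O. tgt(t,n,O) & org(O,c)
EXPLAINS = {
    "c1": {"leader(Alice)", "leader(Bob)"},
    "c2": {"leader(Alice)", "leader(Bob)"},
    "c3": {"tgt(BigData,Alice,111)", "tgt(ML,Alice,111)", "org(111,SAP)"},
    "c4": {"tgt(ML,Alice,111)", "org(111,SAP)"},
}
ERRORS = {
    "c1": {"leader(Pat)"},
    "c2": set(),
    "c3": set(),
    "c4": {"tgt(BigData,Bob,N1)", "org(N1,IBM)"},
}
SIZE = {f: 1 for f in EXPLAINS}  # assumption: every candidate costs 1
J = {"leader(Alice)", "leader(Bob)", "tgt(BigData,Alice,111)",
     "tgt(ML,Alice,111)", "org(111,SAP)", "org(222,MS)"}


def objective(M):
    """size(M) + #tuples of J not explained by M + #errors created by M."""
    explained = set().union(*(EXPLAINS[f] for f in M))
    errors = set().union(*(ERRORS[f] for f in M))
    return sum(SIZE[f] for f in M) + len(J - explained) + len(errors)


def best_mapping(candidates):
    """Exhaustive arg min over all 2^|C| subsets; only viable for tiny C."""
    subsets = (frozenset(s) for k in range(len(candidates) + 1)
               for s in combinations(candidates, k))
    return min(subsets, key=objective)


M = best_mapping(sorted(EXPLAINS))
print(sorted(M), objective(M))  # ['c2', 'c3'] 3
```

Here c2 beats c1 because both explain the same leader tuples but c1 also creates leader(Pat), and c4 is never worth its two errors once c3 is selected.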
22 Probabilistic Soft Logic (PSL): a declarative language to specify probabilistic models over logical atoms / relational domains. A PSL program is a set of weighted first-order rules $w : b_1(\vec{X}) \wedge \ldots \wedge b_n(\vec{X}) \rightarrow h_1(\vec{X}) \vee \ldots \vee h_m(\vec{X})$. MPE inference means finding the most likely model; PSL provides an efficient approximate solver with a guarantee on solution quality.
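PSL gives each ground atom a soft truth value in [0, 1] and interprets rules in Łukasiewicz logic: a ground rule body → head is satisfied when the truth of the body is at most the truth of the head, and its penalty is its weight times its distance to satisfaction (optionally squared). MPE inference minimizes the total penalty over the truth values of the inferred atoms. The minimal Python sketch below computes that penalty for ground rules given as (weight, body values, head values) triples; the function names are ours and are not part of the PSL API.

```python
def luk_and(values):
    """Lukasiewicz conjunction (t-norm): max(0, sum(values) - (n - 1))."""
    return max(0.0, sum(values) - (len(values) - 1))


def luk_or(values):
    """Lukasiewicz disjunction (t-conorm): min(1, sum(values)); an empty head is false."""
    return min(1.0, sum(values))


def distance_to_satisfaction(body, head):
    """A ground rule body -> head is satisfied when truth(body) <= truth(head)."""
    return max(0.0, luk_and(body) - luk_or(head))


def total_penalty(ground_rules, squared=False):
    """Weighted (linear or squared) hinge loss summed over ground rules."""
    power = 2 if squared else 1
    return sum(w * distance_to_satisfaction(b, h) ** power
               for w, b, h in ground_rules)


rules = [
    (1.0, [0.9, 0.8], [0.3]),  # body truth 0.7, head truth 0.3 -> distance 0.4
    (2.0, [1.0], []),          # a rule "a -> false" with a = 1 -> distance 1
]
print(total_penalty(rules))  # about 2.4
```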
23-27 Our PSL program. Given (observed atoms): J(T), which holds if tuple T is in J, and covers(F,T) and creates(F,T), which relate each candidate st tgd F to the tuples it covers (explains) and creates. To find an optimal M, we infer in(F), with $in(F) \Leftrightarrow F \in M$, under three weighted rules:
minimize #errors: $1 : in(F) \wedge creates(F,T) \rightarrow J(T)$
minimize #unexplained: $1 : J(T) \rightarrow \exists F.\ covers(F,T) \wedge in(F)$
minimize size of M: $size(F) : in(F) \rightarrow \bot$
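Putting the three rules together, the penalty of a soft assignment to in(F) can be computed directly. A minimal Python sketch on a hypothetical grounding with two leader candidates from the running example: creates(F,T) is taken as 1 for the listed pairs, the sizes are unit weights, and the existential head is read as a maximum over F. That last reading is a simplification of our own, not the prioritized disjunction construct of slide 28.

```python
def luk_and(*xs):
    """Lukasiewicz conjunction of truth values in [0, 1]."""
    return max(0.0, sum(xs) - (len(xs) - 1))


def cmd_penalty(in_, J, creates, covers, size):
    """Total weighted distance to satisfaction of the three rule families."""
    # 1 : in(F) & creates(F,T) -> J(T)            (penalizes errors)
    errors = sum(max(0.0, luk_and(in_[f], 1.0) - (1.0 if t in J else 0.0))
                 for f, t in creates)
    # 1 : J(T) -> exists F. covers(F,T) & in(F)   (penalizes unexplained tuples)
    # Simplification: the existential is read as a max over F.
    unexplained = sum(
        max(0.0, 1.0 - max((luk_and(c, in_[f])
                            for (f, t2), c in covers.items() if t2 == t),
                           default=0.0))
        for t in J)
    # size(F) : in(F) -> false                     (penalizes large mappings)
    penalty_size = sum(size[f] * in_[f] for f in in_)
    return errors + unexplained + penalty_size


# Hypothetical grounding: c1 = emp(i,n,c) -> leader(n),
# c2 = proj(t,m,l) & emp(l,n,c) -> leader(n).
J = {"leader(Alice)", "leader(Bob)"}
creates = {("c1", "leader(Alice)"), ("c1", "leader(Bob)"), ("c1", "leader(Pat)"),
           ("c2", "leader(Alice)"), ("c2", "leader(Bob)")}
covers = {("c1", "leader(Alice)"): 1.0, ("c1", "leader(Bob)"): 1.0,
          ("c2", "leader(Alice)"): 1.0, ("c2", "leader(Bob)"): 1.0}
size = {"c1": 1.0, "c2": 1.0}  # assumption: unit sizes

for assignment in ({"c1": 1.0, "c2": 0.0}, {"c1": 0.0, "c2": 1.0},
                   {"c1": 0.5, "c2": 0.5}):
    print(assignment, cmd_penalty(assignment, J, creates, covers, size))
# penalties 2.0, 1.0 and 2.5
```

Selecting only c2 gives the lowest penalty, since c1 explains the same tuples but also creates leader(Pat).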
28 New PSL construct: prioritized disjunction rules, such as $1 : J(T) \rightarrow \exists F.\ covers(F,T) \wedge in(F)$, where covers(F,T) is observed and acts as a priority between 0 (low) and 1 (high), and in(F) is to be inferred. Such a rule is automatically transformed into a set of standard PSL rules expressing a preference for inferred atoms with higher priority.
29 Experimental evaluation. Scenarios are generated using iBench [Arocena et al., 15] and candidate st tgds using Clio [Fagin et al., 09].
E1: increasingly noisy candidates, perfect data. The metadata-only baseline suffers, while we get perfect mappings.
E2: ambiguous set of candidates, increasingly noisy data. High-quality mappings are found for up to 25% unexpected and 10% missing target tuples in J.
30-33 Our contributions.
Given: metadata, a data example, and candidate st tgds.
Find: a small set of st tgds that is minimally invalid and maximally explaining.
Collective Mapping Discovery (CMD): a declarative probabilistic model plus MPE inference using Probabilistic Soft Logic (PSL). CMD supports arbitrary st tgds, jointly reasons about metadata and data, handles noisy input, uses an efficient solver with a quality guarantee, and gives a declarative, extensible definition of the optimization.
Thanks!
More informationData Integration: Querying Heterogeneous Information Sources Using Source Descriptions & Data Integration: The Teenage Years
Data Integration: Querying Heterogeneous Information Sources Using Source Descriptions & Data Integration: The Teenage Years CPSC 534P Rachel Pottinger September 19, 2011 Administrative Notes Homework
More informationChapter 3. The Relational Model. Database Systems p. 61/569
Chapter 3 The Relational Model Database Systems p. 61/569 Introduction The relational model was developed by E.F. Codd in the 1970s (he received the Turing award for it) One of the most widely-used data
More informationSteps in normalisation. Steps in normalisation 7/15/2014
Introduction to normalisation Normalisation Normalisation = a formal process for deciding which attributes should be grouped together in a relation Normalisation is the process of decomposing relations
More informationChapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases
Chapter 10 Functional Dependencies and Normalization for Relational Databases Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant
More informationLaconic Schema Mappings: Computing the Core with SQL Queries
Laconic Schema Mappings: Computing the Core with SQL Queries Balder ten Cate INRIA and ENS Cachan balder.tencate@inria.fr Laura Chiticariu IBM Almaden chiti@almaden.ibm.com Phokion Kolaitis UC Santa Cruz
More informationITCS 6150 Intelligent Systems. Lecture 13 First-Order Logic Chapter 8
ITCS 6150 Intelligent Systems Lecture 13 First-Order Logic Chapter 8 First-order logic We saw how propositional logic can create intelligent behavior But propositional logic is a poor representation for
More informationProgram Verification using Templates over Predicate Abstraction. Saurabh Srivastava and Sumit Gulwani
Program Verification using Templates over Predicate Abstraction Saurabh Srivastava and Sumit Gulwani ArrayInit(Array A, int n) i := 0; while (i < n) A[i] := 0; i := i + 1; Assert( j: 0 j
More informationLimits of Schema Mappings
Limits of Schema Mappings Phokion G. Kolaitis 1, Reinhard Pichler 2, Emanuel Sallinger 3, and Vadim Savenkov 4 1 University of California Santa Cruz, Santa Cruz, USA; and IBM Research-Almaden, San Jose,
More informationIntroduction to Database Management Systems
Relational Data Model Relational Data Model 1 o Relations o Attributes o Tuples o Relations o Primary Keys o Objectives o Comparison to other models o Components o Relation Properties o Kinds of Relations
More informationElmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2 Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant
More informationPrivacy Preserving Group Linkage
Privacy Preserving Group Linkage Fengjun Li 1, Yuxin Chen 1, Bo Luo 1, Dongwon Lee 2, and Peng Liu 2 1 EECS Department, University of Kansas, 2 College of IST, Penn State University 1 Record Linkage Record
More informationThe data structures of the relational model Attributes and domains Relation schemas and database schemas
The data structures of the relational model Attributes and domains Relation schemas and database schemas databases First normal form (1NF) Running Example Pubs-Drinkers-DB: Pubs (name, location) Drinkers
More information