Managing Inconsistencies in Collaborative Data Management

Similar documents
Inconsistency-tolerant logics

Computing Query Answers with Consistent Support

Research in the Logic Group

INCONSISTENT DATABASES

Data integration lecture 3

arxiv:cs/ v1 [cs.db] 5 Apr 2002

Consistent Query Answering

Static, Incremental and Parameterized Complexity of Consistent Query Answering in Databases Under Cardinality-Based Semantics

Relational Databases

arxiv: v1 [cs.db] 23 May 2016

Semantic Optimization of Preference Queries

Consistent Query Answering

Towards a Logical Reconstruction of Relational Database Theory

Preference-Driven Querying of Inconsistent Relational Databases

Scalar Aggregation in Inconsistent Databases

On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases

The Relational Model

Computing Query Answers with Consistent Support

Consistent Query Answering: Opportunities and Limitations

Data integration lecture 2

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 14. Example. Datalog syntax: rules. Datalog query. Meaning of Datalog rules

Overview. CS389L: Automated Logical Reasoning. Lecture 6: First Order Logic Syntax and Semantics. Constants in First-Order Logic.

CS590U Access Control: Theory and Practice. Lecture 18 (March 10) SDSI Semantics & The RT Family of Role-based Trust-management Languages

CS40-S13: Functional Completeness

Database Theory: Datalog, Views

Project-Join-Repair: An Approach to Consistent Query Answering Under Functional Dependencies

Consistent Answers from Integrated Data Sources

Part I Logic programming paradigm

A Retrospective on Datalog 1.0

From Database Repair Programs to Consistent Query Answering in Classical Logic (extended abstract)

On the Hardness of Counting the Solutions of SPARQL Queries

Query Minimization. CSE 544: Lecture 11 Theory. Query Minimization In Practice. Query Minimization. Query Minimization for Views

Edinburgh Research Explorer

Describe The Differences In Meaning Between The Terms Relation And Relation Schema

Conjunctive queries. Many computational problems are much easier for conjunctive queries than for general first-order queries.

Composing Schema Mapping

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Predicate Logic and Quan0fies

Data Cleaning and Query Answering with Matching Dependencies and Matching Functions

Introduction to Temporal Database Research. Outline

CMPS 277 Principles of Database Systems. Lecture #4

Foundations of AI. 9. Predicate Logic. Syntax and Semantics, Normal Forms, Herbrand Expansion, Resolution

Leonardo de Moura and Nikolaj Bjorner Microsoft Research

Integrity Constraints (Chapter 7.3) Overview. Bottom-Up. Top-Down. Integrity Constraint. Disjunctive & Negative Knowledge. Proof by Refutation

Logic Programs for Consistently Querying Data Integration Systems

1 Scope, Bound and Free Occurrences, Closed Terms

Diagnosis through constrain propagation and dependency recording. 2 ATMS for dependency recording

Consistent Query Answering in Databases

Minimal-Change Integrity Maintenance Using Tuple Deletions

Chapter 1.3 Quantifiers, Predicates, and Validity. Reading: 1.3 Next Class: 1.4. Motivation

Schema Mappings and Data Exchange

THE RELATIONAL MODEL. University of Waterloo

Structural characterizations of schema mapping languages

SYSTEMS OF NONLINEAR EQUATIONS

Applications of Annotated Predicate Calculus to Querying Inconsistent Databases

Query formalisms for relational model relational calculus

arxiv:cs/ v1 [cs.db] 29 Nov 2002

Lecture 1: Conjunctive Queries

Semantic reasoning for dynamic knowledge bases. Lionel Médini M2IA Knowledge Dynamics 2018

Semantic Errors in Database Queries

Computational Logic Lecture 11. Resolution Tricks. Michael Genesereth Autumn Plan

Question Bank. 4) It is the source of information later delivered to data marts.

Consistent Query Answering for Atemporal Constraints over Temporal Databases

The Semantics of Consistency and Trust in Peer Data Exchange Systems

Answer Sets and the Language of Answer Set Programming. Vladimir Lifschitz

2.1 Sets 2.2 Set Operations

Further Improvement on Integrity Constraint Checking for Stratifiable Deductive Databases

Chapter 9: Constraint Logic Programming

CMPS 277 Principles of Database Systems. Lecture #3

9b. Combining Logic Programs with Object-Oriented Representation

Bibliographic citation

Chapter 1: Introduction

Computer Science Technical Report

Notes. Notes. Introduction. Notes. Propositional Functions. Slides by Christopher M. Bourke Instructor: Berthe Y. Choueiry.

Dexter respects users privacy by storing users local data and evaluating queries client sided inside their browsers.

Lecture 9: Datalog with Negation

XML Data Exchange. Marcelo Arenas P. Universidad Católica de Chile. Joint work with Leonid Libkin (U. of Toronto)

arxiv: v2 [cs.db] 13 Aug 2017

Database Consistency: Logic-Based Approaches

Optimized encodings for Consistent Query Answering via ASP from different perspectives

OASIS: Architecture, Model and Management of Policy

8. Negation 8-1. Deductive Databases and Logic Programming. (Sommer 2017) Chapter 8: Negation

Semantic data integration in P2P systems

Artificial Intelligence. Chapters Reviews. Readings: Chapters 3-8 of Russell & Norvig.

On the implementation of a multiple output algorithm for defeasible argumentation

Data Integration: Logic Query Languages

MaanavaN.Com DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK

Chapter 2 & 3: Representations & Reasoning Systems (2.2)

Introduction to Logic Programming

What is a Logic Program. Introduction to Logic Programming. Introductory Examples. Introductory Examples. Foundations, First-Order Language

Estimating the Quality of Databases

OWL 2 Profiles. An Introduction to Lightweight Ontology Languages. Markus Krötzsch University of Oxford. Reasoning Web 2012

Throughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions

Bliksem 1.10 User Manual

Ontology and Database Systems: Foundations of Database Systems

Constraint Propagation for Efficient Inference in Markov Logic

Integration of Multiple Uncertain Data Sources

Database Management System 15

CS221 / Autumn 2017 / Liang & Ermon. Lecture 17: Logic II

CSC Discrete Math I, Spring Sets

Transcription:

Managing Inconsistencies in Collaborative Data Management Eric Kao Logic Group Computer Science Department Stanford University Talk given at HP Labs on November 9, 2010

Structured Data Public Sources Company Directories Product Catalogs Airline Schedules Enterprise Sources Personnel Records Equipment Databases Room Schedules Weather Reports Patient Records Drug Studies Orders Inventories 11/09/10 2

Relationships Among Sources Replicated data Cached data Materialized views (as in data warehouses) Heterogeneity (different schemas or vocabularies) values in euros vs values in dollars French instead of English different numbers of tables or attributes Real World Constraints Physical laws Governmental laws Business rules 11/09/10 3

Update In the face of interrelationships among sources, updates to one source may necessitate updates to other sources in order to preserve their relationships. Update Integration is the process of updating interrelated data sources in an integrated fashion. Analogous to Data Integration for data consumers. 11/09/10 4

Collaboration In large enterprises, consortia, and open information systems (e.g. the WWW), it is common for different data sources to be controlled by different people. In such cases, the individuals performing updates must collaborate (explicitly or implicitly). Collaborative Data Management (CDM) is update integration in a multiple user setting. 11/09/10 5

Inconsistencies In CDM, we capture the relationships between data sources as constraints written in a logical language Not all relationships captured, but many important ones When multiple users update data sources independently, the constraints may be violated. We called these violations inconsistencies. 11/09/10 6

Inconsistency Management Identifying and indicating inconsistencies. Automatically updating some data sources to avoid inconsistencies. Answering queries in the presence of inconsistencies. 11/09/10 7

Decentralized Office assignment A building is divided into spaces. Each space as a space czar who is tasked with assigning people to offices within that space. The building manager, department manager, and department chair also exercise authority over the assignment of offices. Individual professors request specific assignments for his or her affiliates. 11/09/10 8

Desk Assignment Example Alice Desk1 desk(alice,desk1) Bob Desk1 desk(bob,desk1)

Desk Assignment Example Conflict: Data is inconsistent with the following constraint desk(alice,desk1) desk(bob,desk1) x1, x2, y (person.desk(x1,y) x1 = x2) person.desk(x2,y)

Alert the users to the conflict The users can work to resolve the conflict Current snapshot...

Queries Users would like to query the eventual, consistent data But all we have is the current, inconsistent, snapshot. Current snapshot...

Queries desk(alice,desk1) Conflict: Who sits at Desk1? desk(bob,desk1) Best guess: Alice or Bob, but not both!

Relational Calculus Syntax Eric Kao

Database, Constraints Database instance A finite set of ground atoms e.g., D = { p(a,a), p(b,c), q(a), r(a,d) } Constraint set A finite set of relational calculus sentences (no free variables) e.g., C = { x ( p(x,x) q(x)), x q(x) }

Database Query Query A relational calculus formula e.g., Q 1 = p(a,b) e.g., Q 2 (x) = y p(x,y) e.g., Q 3 (x) = z y ( p(x,y) r(y,z))

Min-change Repairs Given: D : database e.g.,{ p(a), p(b) } Ω : set of constraints e.g.,{ p(a) p(b) } A repair is a database D* that is consistent with Ω. e.g., { p(a) }, { p(b) }, { }, { p(a), p(c) },... A minimal change repair is a repair that is minimally different from D. e.g., { p(a) }, { p(b) }

Consistent query answers CQA: answers that arise no matter how the conflicts are resolved (in a minimal-change fashion) [ABC] D CQA Q(t) iff Ω for all minimal-change repairs D* of D w.r.t. to Ω, D* Q(t)

CQA Conflict p(a) p(b) Constraint: p(a) p(b) p(a)? NO {p(b)} p(b)? NO {p(a)} p(a) p(b)? NO p(a) p(b)? YES {p(b)} {p(a)}, {p(b)}

More lenient semantics In some situations, we would like to see plausible though non-guaranteed answers e.g., a student checking out whom his office mates might be Possible answers from repairs: answers that arise from some minimal-change repair

Relax, not repair Constraint: every person must be assigned a desk Data: Charlie is missing a desk assignment Minimal-change repairs: {desk(charles,desk1)}, {desk(charles,desk2)}, {desk(charles,desk3)}, Possible answers: desk1, desk2, desk3,

Incomplete Database An incomplete database is a pair <P,N> where P,N are sets of ground atoms P N = {} e.g., K = <{p(a),q(a)}, {p(c),q(b)}> Intuitively, P gives the known true atoms N gives the known false atoms The rest are unknown <P,N> d Q iff D d Q for all D Instances (<P,N>) d

Relaxations of a database A relaxation of D w.r.t. Ω is an incomplete database K = <P,N>, where P,N base(d) P D N D = <P,N> d Ω Let the set of relaxations of D w.r.t. to Ω be denoted Re d (D,Ω)

Example D = {p(a),p(b)} Ω = { p(a) p(b)} d = {a,b,c} Re d (D,Ω) = { <{p(a)},{p(c)}>, <{p(a)},{}>, <{p(b)},{p(c)}> <{p(b)},{}> <{},{p(c)}>, <{},{}> }

Answers with Consistent Support ACS: answers that are supported by a consistent relaxation of the database. [KG] D CS Ω Q(t) iff there exists K Re d (D,Ω) st K d Q(t) Where d = adom(d,q)

ACS Conflict p(a) p(b) Constraint: p(a) p(b) p(a)? YES <{p(a)},{}> p(b)? YES <{p(b)},{}> p(a) p(b)? NO p(a) p(b)? YES <{p(a)},{}>

More complex queries p a 1 b 1 c 2 d 3 q a b r c d Constraint: xyz ( p(x,z) p(y,z) x=y) x [q(x) p(x,1)]? ACS: NO

Data Complexity Both ACS and CQA are NP-Hard problems [KG] [CM] Reduction from monotone 3-SAT ACS is tractable for * * queries. [KG] CQA is tractable for * queries [KG] Some subclasses of * queries [ABC, CM, FM] with suitable restrictions on the class of constraints

Query rewriting How can we use existing database infrastructure to support these query semantics? Solution: Query rewriting Algorithm ACS-Rewrite[Q,Ω] Rewrite Q ( * *) into Q' so that evaluating Q' on a standard DBMS gives ACS answers to Q. [KG] Similar rewriting algorithms have been devised for CQA [ABC, CM, FM, KG]

Results Tractable subclasses identified for ACS Finitely-closed means the set of constraints has a finite closure under resolution.

ACS: Ground conjunctive queries Q = p(a,1) p(b,1) C = { x1 x2 y ( p(x1,y) p(x2,y) x1=x2) } Q contradicts C. Answer: NO Rewrite: Q' = False

ACS: qf conjunctive queries Q(x1,y1,x2,y2) = p(x1,y1) p(x2,y2) C = { x1 x2 y ( p(x1,y) p(x2,y) x1=x2) } Q(x1,y1,x2,y2) is consistent w.r.t. C if and only if (x1=x2 y1 y2) Rewrite: Q' = Q(x1,x2,y1,y2) (x1=x2 y1 y2)

ACS: * queries Q = x,y ( q(x,y) r(x,y)) C ={ x q(x,x) } q(x,y) r(x,y) q(x,x) r(x,x) Rewrite: Q' = Q x r(x,x)

Ongoing work Identify other tractable classes Recursive queries and aggregates Other answer semantics preferentially-defeasible data? coarser granularity? Semantics that consider history?

Issues Constraint Language Expressiveness (Skolems, aggregates) Evaluation and Rewriting Techniques Managing Inconsistency Inconsistencies and interdependencies Automatic update in the presence of inconsistencies Queries in the presence of inconsistencies Dynamic Constraints Update Preferences Workflow Analysis, Use, Design Privacy and Security Promoting Convergence 11/09/10 35

References [ABC] M. Arenas, L. Bertossi, and J. Chomicki. Consistent query answers in inconsistent databases. Symposium on principles of database systems, Pages 68 79, 1999. [CM] J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput.,197(1-2):90 121, 2005. [FM] A. Fuxman and R. J. Miller. First-order query rewriting for inconsistent databases. J. Comput. Syst. Sci., 73(4):610 635, 2007. [KG] E. Kao and M. Genesereth. Answers with consistent support complexity and rewriting. Working paper, 2010.