Consistent Query Answering

Consistent Query Answering: Opportunities and Limitations

Jan Chomicki
Dept. CSE, University at Buffalo, State University of New York
http://www.cse.buffalo.edu/ chomicki

Integrity constraints

Integrity constraints describe valid database instances. Examples:
- functional dependencies (FDs): every employee has a single salary.
- denial constraints: no employee can make more than her manager.
- inclusion dependencies (INDs): managers have to be employees.

The constraints are formulated in first-order logic:

$\forall n, s, m, s', m'.\ \neg[\mathit{Emp}(n, s, m) \land \mathit{Emp}(m, s', m') \land s > s']$

An inconsistent database violates the constraints.

Traditional view

Integrity constraints are always enforced.

Emp
EmpName    Address            Salary
B. Gates   Redmond, WA        20M
B. Gates   Redmond, WA        30M
A. Grove   Santa Clara, CA    10M

Functional dependency: $\mathit{EmpName} \to \mathit{Address}\ \mathit{Salary}$

This instance cannot arise but... consider data integration.
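
A minimal sketch of how the violation in the Emp instance above could be detected, assuming the table is held as a list of Python tuples. The helper name `fd_violations` and the column-index encoding of the dependency are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

# The Emp instance from the slide: (EmpName, Address, Salary).
emp = [
    ("B. Gates", "Redmond, WA", "20M"),
    ("B. Gates", "Redmond, WA", "30M"),
    ("A. Grove", "Santa Clara, CA", "10M"),
]

def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on the lhs columns but differ on the rhs columns."""
    def proj(row, cols):
        return tuple(row[c] for c in cols)
    return [
        (r1, r2)
        for r1, r2 in combinations(rows, 2)
        if proj(r1, lhs) == proj(r2, lhs) and proj(r1, rhs) != proj(r2, rhs)
    ]

# EmpName -> Address Salary: column 0 determines columns 1 and 2.
conflicts = fd_violations(emp, lhs=[0], rhs=[1, 2])
print(conflicts)
```

The only conflicting pair is the two B. Gates rows, which is what makes the instance inconsistent.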

Ignoring inconsistency

SELECT * FROM Emp WHERE Salary < 25M

B. Gates   Redmond, WA        20M
A. Grove   Santa Clara, CA    10M

The result is not fully reliable.

Quarantining inconsistency

The facts involved in an inconsistency are not used in the derivation of query answers [Bry, IICIS 97].

SELECT * FROM Emp WHERE Salary < 25M

A. Grove   Santa Clara, CA    10M

But what about

SELECT EmpName FROM Emp WHERE Salary > 10K

A. Grove

Partial information cannot be obtained.

Constraints with exceptions

Weaken the constraints [Borgida, TODS 85], without affecting query evaluation.

Emp
EmpName    Address            Salary
B. Gates   Redmond, WA        20M
B. Gates   Redmond, WA        30M
A. Grove   Santa Clara, CA    10M

Constraint with exception:

$(\forall x, y, z, y', z')\ \neg[\mathit{Emp}(x, y, z) \land \mathit{Emp}(x, y', z') \land y \neq y' \land x \neq \text{B.Gates}]$

Data cleaning

Emp
EmpName    Address            Salary
B. Gates   Redmond, WA        20M
B. Gates   Redmond, WA        30M
A. Grove   Santa Clara, CA    10M

Functional dependency: $\mathit{EmpName} \to \mathit{Address}\ \mathit{Salary}$

Repair:
EmpName    Address            Salary
B. Gates   Redmond, WA        30M
A. Grove   Santa Clara, CA    10M

Some information is lost.

CQA: a query-driven approach

Consider all repairs: possible databases that result from fixing the original database in a minimal way.

Return all the answers that belong to the result of query evaluation in every repair (consistent answers).

Emp
EmpName    Address            Salary
B. Gates   Redmond, WA        20M
B. Gates   Redmond, WA        30M
A. Grove   Santa Clara, CA    10M

Functional dependency: $\mathit{EmpName} \to \mathit{Address}\ \mathit{Salary}$

Repairs:

B. Gates   Redmond, WA        30M
A. Grove   Santa Clara, CA    10M

B. Gates   Redmond, WA        20M
A. Grove   Santa Clara, CA    10M

SELECT * FROM Emp WHERE Salary < 25M

A. Grove   Santa Clara, CA    10M

But

SELECT EmpName FROM Emp WHERE Salary > 10K

B. Gates
A. Grove
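
The two consistent answers above can be reproduced by brute force: enumerate every repair, evaluate the query in each, and intersect the results. The sketch below assumes the functional dependency makes EmpName a key, so a repair keeps exactly one row per name. The function names and the numeric salary encoding are illustrative assumptions, and the enumeration is exponential in general.

```python
from itertools import product

# Emp instance with salaries as plain integers (20M = 20_000_000).
emp = [
    ("B. Gates", "Redmond, WA", 20_000_000),
    ("B. Gates", "Redmond, WA", 30_000_000),
    ("A. Grove", "Santa Clara, CA", 10_000_000),
]

def repairs_under_key(rows, key=0):
    """Enumerate repairs for a key constraint: keep exactly one row per key value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return [set(choice) for choice in product(*groups.values())]

def consistent_answers(rows, query):
    """Intersect the query result over all repairs."""
    results = [query(repair) for repair in repairs_under_key(rows)]
    return set.intersection(*results)

def q1(db):
    # SELECT * FROM Emp WHERE Salary < 25M
    return {row for row in db if row[2] < 25_000_000}

def q2(db):
    # SELECT EmpName FROM Emp WHERE Salary > 10K
    return {row[0] for row in db if row[2] > 10_000}

print(consistent_answers(emp, q1))
print(consistent_answers(emp, q2))
```

Unlike quarantining, the second query keeps B. Gates, because both candidate salaries exceed 10K in every repair.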

Inconsistent databases

There are many situations when users want/need to live with inconsistent databases:
- integration of heterogeneous databases with overlapping information
- not enough information to resolve inconsistencies
- preservation of all data (even erroneous)
- the consistency of the database will be restored by executing further transactions
- inconsistency wrt soft integrity constraints (those that we hope to see satisfied but do not/cannot check)

Research goals

- Formal definition of reliable ("consistent") information in an inconsistent database.
- Computational mechanisms for obtaining consistent information.
- Computational complexity analysis:
  - tractable vs. intractable classes of queries and integrity constraints
  - trade-off: complexity vs. expressiveness.
- Implementation: preferably using DBMS technology.

Plan of the talk

1. repairs and consistent query answers
2. computing consistent query answers to relational algebra/calculus/SQL queries
3. extensions and new directions:
   - other notions of repair
   - probabilistic databases
4. further active research directions
5. related work
6. lessons of CQA research
7. the CQA community

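Constraint classes

- Universal: $\forall.\ \neg A_1 \lor \cdots \lor \neg A_n \lor B_1 \lor \cdots \lor B_m$.
- Denial: $\forall.\ \neg A_1 \lor \cdots \lor \neg A_n$.
- Functional dependencies (FDs): $X \to Y$.
- Inclusion dependencies (INDs): $P[X] \subseteq R[Y]$.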

Consistent query answers [Arenas, Bertossi, Ch., PODS 99]

Repair:
- a database that satisfies the integrity constraints
- difference from the given database is minimal (the set of inserted/deleted facts is minimal under set inclusion).

A tuple $(a_1, \ldots, a_n)$ is a consistent query answer to a query $Q(x_1, \ldots, x_n)$ in a database $r$ if it is an element of the result of $Q$ in every repair of $r$.

A logical aside

Belief revision:
- semantically: repairing corresponds to revising the database with integrity constraints
- consistent query answers correspond to counterfactual inference.

Logical inconsistency:
- inconsistent database: database facts together with integrity constraints form an inconsistent set of formulas
- trivialization of reasoning does not occur because constraints are not used in relational query evaluation.

Computational issues

There are too many repairs to evaluate the query in each of them.

A        B
$a_1$    $b_1$
$a_1$    $b'_1$
$a_2$    $b_2$
$a_2$    $b'_2$
...
$a_n$    $b_n$
$a_n$    $b'_n$

Under the functional dependency $A \to B$, this instance has $2^n$ repairs.
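
The count can be checked directly: under $A \to B$ a repair picks one B-value for each A-value, so the number of repairs is the product of the number of alternatives per A-value. The sketch below assumes a two-column instance shaped like the one above; `instance` and `count_repairs` are hypothetical helper names.

```python
def instance(n):
    """Two conflicting rows (a_i, b_i) and (a_i, b_i') for each i in 1..n."""
    return [(f"a{i}", f"b{i}{prime}") for i in range(1, n + 1) for prime in ("", "'")]

def count_repairs(rows):
    """Under A -> B, a repair keeps one B-value per A-value; count the combinations."""
    groups = {}
    for a, b in rows:
        groups.setdefault(a, set()).add(b)
    total = 1
    for b_values in groups.values():
        total *= len(b_values)
    return total

for n in (1, 2, 10, 20):
    print(n, count_repairs(instance(n)))
```

With only 40 rows (n = 20) there are already over a million repairs, which is why the approaches that follow avoid materializing them.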

Computing consistent query answers

Query rewriting: given a query $Q$ and a set of integrity constraints, construct a query $Q'$ such that for every database instance $r$

the set of answers to $Q'$ in $r$ = the set of consistent answers to $Q$ in $r$.

Representing all repairs: given a set of integrity constraints and a database instance $r$:
1. construct a space-efficient representation of all repairs of $r$
2. use this representation to answer (many) queries.

Specifying repairs as logic programs.

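Query rewriting

First-order queries transformed using semantic query optimization techniques [Arenas, Bertossi, Ch., PODS 99].

Functional dependencies:

$(\forall x)(\forall y)(\forall z)(\forall y')(\forall z')(\neg \mathit{Emp}(x, y, z) \lor \neg \mathit{Emp}(x, y', z') \lor y = y')$

$(\forall x)(\forall y)(\forall z)(\forall y')(\forall z')(\neg \mathit{Emp}(x, y, z) \lor \neg \mathit{Emp}(x, y', z') \lor z = z')$

Query: $\mathit{Emp}(x, y, z)$.

Transformed query:

$\mathit{Emp}(x, y, z) \land (\forall y')(\forall z')(\neg \mathit{Emp}(x, y', z') \lor y = y') \land (\forall y')(\forall z')(\neg \mathit{Emp}(x, y', z') \lor z = z')$.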

Scope of query rewriting

Query rewriting:
- queries involving conjunctions of literals (relational algebra: $\sigma, \times, -$) and binary universal integrity constraints [Arenas, Bertossi, Ch., PODS 99].
- existentially-quantified conjunctions ($\pi, \sigma, \times$) and single-key dependencies [Fuxman, Miller, ICDT 05]:
  - CTree queries (no non-key joins)
  - extended to exclusion dependencies [Grieco et al., CIKM 05].

Query:

SELECT Name FROM Emp WHERE Salary > 10K

Rewritten query:

SELECT Name FROM Emp e1
WHERE Salary > 10K
  AND NOT EXISTS (SELECT * FROM Emp e2
                  WHERE e2.Name = e1.Name AND e2.Salary <= 10K)
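
A runnable sketch of this rewriting on an in-memory SQLite database, assuming a single table named Emp with columns Name, Address and Salary, and with 10K written out as 10000. The C. Doe rows are hypothetical test data added to show a name that is filtered out. DISTINCT and ORDER BY are additions for a deterministic, duplicate-free result and are not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (Name TEXT, Address TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [
        ("B. Gates", "Redmond, WA", 20_000_000),
        ("B. Gates", "Redmond, WA", 30_000_000),
        ("A. Grove", "Santa Clara, CA", 10_000_000),
        ("C. Doe", "Buffalo, NY", 5_000),    # hypothetical: one salary below 10K
        ("C. Doe", "Buffalo, NY", 50_000),   # hypothetical: one salary above 10K
    ],
)

# Keep a name only if no conflicting row for the same name fails the condition.
rewritten = """
    SELECT DISTINCT e1.Name FROM Emp e1
    WHERE e1.Salary > 10000
      AND NOT EXISTS (SELECT * FROM Emp e2
                      WHERE e2.Name = e1.Name AND e2.Salary <= 10000)
    ORDER BY e1.Name
"""
names = [row[0] for row in conn.execute(rewritten)]
print(names)
```

C. Doe is dropped because one of the repairs keeps the 5,000 salary, so the name is not an answer in every repair.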

Experimental results (ConQuer) [Fuxman, Fazli, Miller, SIGMOD 05]

The system ConQuer:
- back-end: DB2 UDB
- query rewriting into SQL, producing unnested queries
- queries from the TPC-H workload
- databases can be annotated with consistency indicators
- tested for synthetic databases with 400K to 8M tuples, 0 to 50% conflicts
- relatively little overhead compared to evaluating the original query using the backend

Conflict hypergraph

Denial constraints only.
- Vertices: facts in the original instance.
- Edges: (minimal) sets of facts that violate some constraint.
- Repairs: maximal independent sets.

Example vertices:

B. Gates   Redmond, WA        20M
A. Grove   Santa Clara, CA    10M
B. Gates   Redmond, WA        30M
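
A small sketch of this construction for the three facts above, assuming the running functional dependency from EmpName to Address and Salary as the only constraint, so every edge has two vertices. The brute-force search over subsets is only meant for toy instances, and the function names are illustrative.

```python
from itertools import combinations

facts = [
    ("B. Gates", "Redmond, WA", "20M"),
    ("A. Grove", "Santa Clara, CA", "10M"),
    ("B. Gates", "Redmond, WA", "30M"),
]

# Edges: pairs of facts that jointly violate EmpName -> Address Salary.
edges = [
    frozenset((f, g))
    for f, g in combinations(facts, 2)
    if f[0] == g[0] and f[1:] != g[1:]
]

def is_independent(subset):
    """A set of facts is independent if it contains no edge entirely."""
    return not any(edge <= subset for edge in edges)

def maximal_independent_sets():
    """Brute force over all subsets; exponential in the number of facts."""
    independent = [
        frozenset(s)
        for k in range(len(facts) + 1)
        for s in combinations(facts, k)
        if is_independent(frozenset(s))
    ]
    return [s for s in independent if not any(s < t for t in independent)]

repairs = maximal_independent_sets()
for repair in repairs:
    print(sorted(repair))
```

The single edge joins the two B. Gates facts, and the two maximal independent sets are exactly the two repairs listed earlier.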

Computing CQAs using the conflict hypergraph [Ch., Marcinkowski, I&C, 2005]

Algorithm HProver:
1. input: query $\Phi$, a disjunction of ground literals
2. $\neg\Phi = P_1(t_1) \land \cdots \land P_m(t_m) \land \neg P_{m+1}(t_{m+1}) \land \cdots \land \neg P_n(t_n)$
3. find a repair including $P_1(t_1), \ldots, P_m(t_m)$ and excluding $P_{m+1}(t_{m+1}), \ldots, P_n(t_n)$ by enumerating the appropriate edges.

Excluding a fact $A$:
- $A$ is not in the original instance, or
- $A$ belongs to an edge $\{A, B_1, \ldots, B_k\}$ in the conflict hypergraph and $B_1, \ldots, B_k$ belong to the repair.

Data complexity: P for quantifier-free queries and denial constraints.
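
Step 3 can be sketched as a search for a refuting repair: for every fact that must be excluded, pick one edge containing it and require the rest of that edge to be kept, then check that the kept facts are independent (any independent set extends to a repair). This is a simplified reading of the slide, not the published HProver code; the function and parameter names are assumptions.

```python
from itertools import product

def independent(facts, edges):
    """True if the set of facts contains no edge of the conflict hypergraph."""
    return not any(edge <= facts for edge in edges)

def refuting_repair_exists(include, exclude, instance, edges):
    """Is there a repair containing every fact in `include` and none in `exclude`?"""
    include = frozenset(include)
    if not include <= instance:
        return False  # with denial constraints a repair is a subset of the instance
    # Facts absent from the instance are excluded from every repair for free.
    to_block = [f for f in exclude if f in instance]
    # For each remaining fact, choose an edge whose other members will be kept.
    options = [[e - {f} for e in edges if f in e] for f in to_block]
    for choice in product(*options):
        candidate = include.union(*choice)
        if independent(candidate, edges):
            return True  # candidate extends to a maximal independent set
    return False

g20 = ("B. Gates", "Redmond, WA", "20M")
g30 = ("B. Gates", "Redmond, WA", "30M")
grove = ("A. Grove", "Santa Clara, CA", "10M")
instance = frozenset({g20, g30, grove})
edges = [frozenset({g20, g30})]

# Phi = Emp(grove): true in every repair iff no repair excludes grove.
print(not refuting_repair_exists([], [grove], instance, edges))
# Phi = Emp(g20): some repair excludes g20, so it is not a consistent answer.
print(not refuting_repair_exists([], [g20], instance, edges))
```

For a fixed query the number of excluded facts is constant, so the number of edge combinations tried is polynomial in the size of the instance, which matches the data-complexity bound stated above.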

Hippo

[Figure: Hippo system architecture. Labels: Upper Envelope, Translation, DB, Evaluation, Candidates, Grounding, Conflict Detection, Conflict Graph, HProver, Consistent Answers.]

Experimental results (Hippo) [Ch., Marcinkowski, Staworko, CIKM 04]

The system Hippo:
- back-end: PostgreSQL
- conflict hypergraph (edges) in main memory
- optimization can eliminate many (sometimes all) database accesses in HProver
- tested for synthetic databases with up to 200K tuples, 2% conflicts
- computing consistent query answers using the conflict hypergraph faster than evaluating transformed queries
- relatively little overhead compared to evaluating the original query using the backend

Specifying repairs as logic programs

[Arenas, Bertossi, Ch., TPLP 03], [Greco, Greco and Zumpano, TKDE 03], [Barcelo, Bertossi, PADL 03], [Eiter et al., ICLP 03]:
- using logic programs with negation and disjunction
- repairs correspond to answer sets
- several different encodings
- implemented using main-memory LP systems (dlv, smodels)
- $\Pi^p_2$-complete problems

Scope:
- arbitrary universal constraints, some inclusion dependencies
- arbitrary first-order queries

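The approach of [Arenas, Bertossi, Ch., TPLP 03]

Facts:

Emp("B.Gates", "Redmond WA", 20K).
Emp("B.Gates", "Redmond WA", 30K).
Emp("A.Grove", "Santa Clara CA", 10K).

Rules:

$\neg \mathit{Emp}'(x, y, z) \lor \neg \mathit{Emp}'(x, y', z') \leftarrow \mathit{Emp}(x, y, z),\ \mathit{Emp}(x, y', z'),\ y \neq y'.$

$\neg \mathit{Emp}'(x, y, z) \lor \neg \mathit{Emp}'(x, y', z') \leftarrow \mathit{Emp}(x, y, z),\ \mathit{Emp}(x, y', z'),\ z \neq z'.$

$\mathit{Emp}'(x, y, z) \leftarrow \mathit{Emp}(x, y, z),\ \text{not}\ \neg \mathit{Emp}'(x, y, z).$

$\neg \mathit{Emp}'(x, y, z) \leftarrow \text{not}\ \mathit{Emp}(x, y, z),\ \text{not}\ \mathit{Emp}'(x, y, z).$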

Experimental data (INFOMIX) [Eiter et al., ICLP 03]

The system INFOMIX:
- combines CQA with data integration (GAV)
- relational backend: PostgreSQL
- uses the disjunctive LP system dlv for repair computations
- optimization techniques: localization, factorization
- tested on legacy databases of up to 50K tuples.

Data complexity of consistent query answers

Query class                  Keys / primary keys     Denial    Universal
$\sigma, \times, -$          P                       P         P (binary)
$\sigma, \times, -, \cup$    P                       P         ?
$\sigma, \pi$                co-NPC / P              co-NPC    co-NP-hard
$\sigma, \times, \pi$        co-NPC / P (Ctree)      co-NPC    co-NP-hard

Tractable approaches:
- query rewriting
- conflict graphs

Sources: [Ch., Marcinkowski, I&C 05], [Fuxman, Miller, ICDT 05], [Cali, Lembo, Rosati, PODS 03].

Tractable/intractable queries

Tractable (P), under any denial constraints:

SELECT * FROM P UNION (SELECT * FROM Q EXCEPT SELECT * FROM R)

Schema:

CREATE TABLE P(A PRIMARY KEY, B);
CREATE TABLE Q(C PRIMARY KEY, D)

Tractable (P):

SELECT Q.D FROM P, Q WHERE P.B = Q.C

Intractable (co-NP-complete):

SELECT Q.D FROM P, Q WHERE P.B = Q.D

Alternative frameworks

Different assumptions about database completeness and correctness (in the presence of inclusion dependencies):
- possibly incorrect but complete: repairs by deletion only [Ch., Marcinkowski, I&C 05]
- possibly incorrect and incomplete: fix FDs by deletion, INDs by insertion [Cali, Lembo, Rosati, PODS 03].

Different notions of minimal repairs:
- minimal cardinality changes [Lopatenko, Bertossi, 06]:
  - tractability of incremental CQA: given a consistent database D and a fixed sequence of updates U, what is the complexity of computing CQA over U(D)?
- repairing attribute values.
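
The difference between set-inclusion minimality and cardinality minimality shows up on very small conflict graphs. The sketch below uses a hypothetical instance with three facts in which fact a conflicts with both b and c; only deletions are considered, and all names are illustrative.

```python
from itertools import combinations

facts = ["a", "b", "c"]
# Hypothetical conflicts: a clashes with b, and a clashes with c.
edges = [frozenset("ab"), frozenset("ac")]

def independent(subset):
    """True if the subset contains no conflicting pair entirely."""
    return not any(edge <= subset for edge in edges)

subsets = [frozenset(s) for k in range(len(facts) + 1) for s in combinations(facts, k)]
consistent = [s for s in subsets if independent(s)]

# Set-inclusion-minimal change (deletions only) = maximal consistent subsets.
set_minimal = [s for s in consistent if not any(s < t for t in consistent)]

# Cardinality-minimal change = consistent subsets with the fewest deletions.
largest = max(len(s) for s in consistent)
card_minimal = [s for s in consistent if len(s) == largest]

print([sorted(s) for s in set_minimal])
print([sorted(s) for s in card_minimal])
```

Under set inclusion both {a} and {b, c} are repairs, so no fact is consistently true. Under minimal cardinality only {b, c} survives, so b and c become consistent answers.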

Attribute-based repairs

Several different approaches:
(A) ground and non-ground repairs [Wijsen, TODS 05]
(B) project-join repairs [Wijsen, FQAS 06]
(C) repairs minimizing Euclidean distance [Bertossi et al., DBPL 05]
(D) repairs of minimum cost [Bohannon et al., SIGMOD 05].

Computational complexity:
- (A) and (B): similar to tuple-based repairs
- (C) and (D): checking existence of a repair of cost < K is NP-complete.

Project-join repairs

PJ-repairs: repairs of a lossless join decomposition.

EmpDept
Name    Dept     Location
John    Sales    Buffalo
Mary    Sales    Toronto

Functional dependencies:
$\mathit{Name} \to \mathit{Dept}$
$\mathit{Dept} \to \mathit{Location}$

Repairs:

John    Sales    Buffalo

Mary    Sales    Toronto

Decomposition

$\pi_{\mathit{Name},\mathit{Dept}}(\mathit{EmpDept}) \bowtie \pi_{\mathit{Dept},\mathit{Location}}(\mathit{EmpDept})$

Name    Dept     Location
John    Sales    Buffalo
John    Sales    Toronto
Mary    Sales    Buffalo
Mary    Sales    Toronto

PJ-repairs:

John    Sales    Buffalo
Mary    Sales    Buffalo

John    Sales    Toronto
Mary    Sales    Toronto
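
The example can be replayed in a few lines: project EmpDept onto (Name, Dept) and (Dept, Location), join the projections back, and take ordinary tuple-based repairs of the result. This is a brute-force sketch for the two-row instance only; `violates` and `repairs` are illustrative names.

```python
from itertools import combinations

emp_dept = {("John", "Sales", "Buffalo"), ("Mary", "Sales", "Toronto")}

def violates(r, s):
    """Name -> Dept (columns 0 -> 1) and Dept -> Location (columns 1 -> 2)."""
    return (r[0] == s[0] and r[1] != s[1]) or (r[1] == s[1] and r[2] != s[2])

def repairs(rows):
    """Maximal subsets with no violating pair (brute force; toy sizes only)."""
    rows = sorted(rows)
    ok = [
        frozenset(sub)
        for k in range(len(rows) + 1)
        for sub in combinations(rows, k)
        if not any(violates(r, s) for r, s in combinations(sub, 2))
    ]
    return [s for s in ok if not any(s < t for t in ok)]

# Lossless-join decomposition: project, then join back on Dept.
name_dept = {(n, d) for n, d, _ in emp_dept}
dept_loc = {(d, loc) for _, d, loc in emp_dept}
joined = {(n, d, loc) for n, d in name_dept for d2, loc in dept_loc if d == d2}

tuple_repairs = repairs(emp_dept)  # repairs of the original instance
pj_repairs = repairs(joined)       # PJ-repairs
print([sorted(r) for r in tuple_repairs])
print([sorted(r) for r in pj_repairs])
```

Every PJ-repair still contains both John and Mary in Sales, whereas each tuple-based repair loses one employee entirely.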

Probabilistic framework for dirty databases [Andritsos, Fuxman, Miller, ICDE 06]

Framework:
- potential duplicates identified and grouped into clusters
- worlds correspond to repairs: one tuple from each cluster
- world probability: product of tuple probabilities
- clean answers: in the query result in some (supporting) world
- clean answer probability: sum of the probabilities of supporting worlds

EmpSal
EmpName    Salary    prob
B. Gates   20M       0.7
B. Gates   30M       0.3
A. Grove   10M       0.5
A. Grove   20M       0.5

Functional dependency: $\mathit{EmpName} \to \mathit{Salary}$

Query rewriting:

SELECT Name FROM Emp WHERE Salary > 15M

Rewritten query:

SELECT Name, SUM(e.prob) FROM Emp e WHERE Salary > 15M GROUP BY Name

Evaluation of the rewritten query:

Input:
B. Gates   20M    0.7
B. Gates   30M    0.3
A. Grove   10M    0.5
A. Grove   20M    0.5

Result:
B. Gates   1
A. Grove   0.5
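
The rewritten query can be cross-checked against the possible-worlds definition. The sketch below enumerates all worlds (one tuple per name), sums the probabilities of the worlds that support each answer, and compares the result with the SUM(prob) query run on SQLite. Salaries are encoded in millions and the table layout (Name, Salary, prob) is an assumption taken from the query text.

```python
import sqlite3
from itertools import product

rows = [("B. Gates", 20, 0.7), ("B. Gates", 30, 0.3),
        ("A. Grove", 10, 0.5), ("A. Grove", 20, 0.5)]  # Salary in millions

# Possible-worlds semantics: a world keeps one tuple per cluster (here, per name).
clusters = {}
for name, salary, prob in rows:
    clusters.setdefault(name, []).append((name, salary, prob))

by_worlds = {}
for world in product(*clusters.values()):
    world_prob = 1.0
    for _, _, prob in world:
        world_prob *= prob
    # Names answering "Salary > 15M" in this world.
    for name in {n for n, salary, _ in world if salary > 15}:
        by_worlds[name] = by_worlds.get(name, 0.0) + world_prob

# The rewritten query from the slide, evaluated directly on the dirty table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (Name TEXT, Salary INTEGER, prob REAL)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)", rows)
by_sql = dict(conn.execute(
    "SELECT Name, SUM(e.prob) FROM Emp e WHERE Salary > 15 GROUP BY Name"))

print(by_worlds)
print(by_sql)
```

Both computations agree up to floating-point rounding, without the rewritten query ever enumerating the four worlds.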

Further active topics

SQL queries:
- aggregation: glb/lub answers
- grouping

Data integration:
- GAV, LAV, GLAV, ...
- tension between repairing and satisfying source-to-target dependencies

P2P:
- how to isolate an inconsistent peer

Nulls:
- repairs with nulls?
- clean semantics vs. SQL conformance

Priorities:
- preferred repairs
- conflict resolution

XML:
- what is an integrity constraint?
- what is a repair? minimum edit distance

Related work

Belief revision:
- revising database with integrity constraints
- revised theory changes with each database update
- emphasis on semantics (AGM postulates), not computation
- complexity results [Eiter, Gottlob, AI 92] do not quite transfer
- beyond revision: merging, arbitration

Disjunctive information:
- repair corresponds to possible world (sometimes)
- consistent answer corresponds to certain answer (sometimes)
- using disjunctions to represent resolved conflicts
- complexity results [Imielinski et al., JCSS 95] do not quite transfer

Lessons of CQA research

Need to tame the semantic explosion:
- different repair semantics overwhelm the potential user

More focus on applications:
- 99% technology, 1% applications
- often only repairing is needed
- heuristics

Integrate with other tools:
- schema matching/mapping
- data cleaning

The CQA community

Some data:
- over 30 active researchers
- 80 to 100 publications (since 1999)
- papers in major conferences (PODS, SIGMOD, ICDT, ICDE, CIKM, DBPL) and journals (TODS, TKDE, TCS, I&C, AMAI, JAL, TPLP)
- outreach to the AI community (qualified success)
- yearly workshop: IIDB (+LAAIC?)

Selected papers

1. M. Arenas, L. Bertossi, J. Chomicki. Consistent Query Answers in Inconsistent Databases. PODS 99.
2. L. Bertossi, J. Chomicki. Query Answering in Inconsistent Databases. In Logics for Emerging Applications of Databases, J. Chomicki, R. van der Meyden, G. Saake (eds.), Springer-Verlag, 2003.
3. J. Chomicki, J. Marcinkowski. On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases. In Inconsistency Tolerance, L. Bertossi, A. Hunter, T. Schaub (eds.), Springer-Verlag, 2004.
4. L. Bertossi. Consistent Query Answering in Databases. SIGMOD Record, June 2006.