Database Repairing and Consistent Query Answering

1 Université de Mons (UMONS), October 20, 2009

2 Outline
1. Database Repairing
2. Consistent Query Answering

3 Disclaimer
Not a comprehensive overview. Selected topics, biased by personal interests. See [BC03, Ber06, Cho06, Cho07] for overviews.

5 Context: Inconsistent, Incomplete, Imprecise Data
Traditional database approach:
- Data integrity: assume data is consistent, certain, and precise.
- Completeness: all relevant data is in the database (closed-world assumption, CWA).
How about, for example, today's citation databases?
- ISI Web of Science is incomplete for computer science [MCSvL09].
- CiteSeer seems to be no longer actively maintained.
- Google Scholar is imprecise (e.g., misspelled names, citations from web pages that are not publications, ...).

7 Inconsistent Data
Inconsistent data: data violating integrity constraints.
- Classical dependencies: key, fd, egd, tgd, ...
- New dependencies [Fan08, FGJ08]: cfd, md, ...
Conditional functional dependency [CFG+07]: $\mathit{ZIP}, \mathit{CountryCode} = 44 \rightarrow \mathit{Street}$
Matching dependency [S09]: $\mathit{Phone} : 1.0 \rightarrow \mathit{Street} : 0.9$
If two tuples match on Phone, then the similarity between their Street-values must exceed 0.9 (e.g., Spen Ln $\approx$ Spem Ln).

8 Data Cleaning
Data cleaning: fix errors by some minimal change.

CC  Phone  Name  Street   City  ZIP
           Mike  New St   York  A
           Joe   Spen Ln  York  A1

Kinds of changes that can be considered:
- Delete the first or the second row.
- Change New St into Spen Ln (or vice versa).
- Replace one occurrence of A1 with a fresh value.

9 Repair
Relative to a database db and a set IC of integrity constraints.
Definition (Repair): a repair rep is a database (over the same schema) such that:
- Consistent: $rep \models IC$;
- Preferential: there is no database $rep'$ such that $rep' \models IC$ and $rep'$ is preferred to $rep$, relative to some (partial) preference order.

10 Example
Database repairing

WorksFor (EName, DName): (Ed, Toys)
Dept (DName): (Shoes)

Inclusion dependency: $\mathit{WorksFor}[2] \subseteq \mathit{Dept}[1]$
Two consistent databases:
1. WorksFor = {(Ed, Toys)}, Dept = {(Shoes), (Toys)}
2. WorksFor = {(Ed, Toys)}, Dept = {(Toys)}
We may prefer the first way of repairing.

12 Preference Order on the Repair Space
Several preference orders have been proposed:
- Based on symmetric difference
- Based on cardinality
- Loosely sound semantics
- Based on homomorphism
- Based on metric distance
- ...
Jan Chomicki coined the term "semantic explosion" in this context.

13 Symmetric Difference Repairs [ABC99]
Definition: $rep_1 \preceq rep_2$ if $(db \oplus rep_1) \subseteq (db \oplus rep_2)$.
Minimize (w.r.t. $\subseteq$) the set of deleted and inserted facts.
Equivalent definition: $rep_1 \preceq rep_2$ if $rep_1 \supseteq (rep_2 \cap db)$ and $(rep_1 \setminus db) \subseteq rep_2$.
Example (symmetric difference repair): let $db = \{P(a)\}$, $rep_1 = \{P(a), Q(b)\}$, $rep_2 = \{\}$. Then $rep_1$ and $rep_2$ are not comparable by $\preceq$.
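The two formulations of the symmetric-difference order can be checked mechanically on small instances. The following is a minimal sketch, assuming facts are modelled as Python tuples and databases as frozensets; the function names are illustrative and not taken from [ABC99].

```python
def sym_diff_leq(db, rep1, rep2):
    """rep1 is at least as close to db as rep2: (db ⊕ rep1) ⊆ (db ⊕ rep2)."""
    return (db ^ rep1) <= (db ^ rep2)


def sym_diff_leq_equiv(db, rep1, rep2):
    """Equivalent form: rep1 keeps every db-fact that rep2 keeps,
    and inserts only facts that rep2 also contains."""
    return rep1 >= (rep2 & db) and (rep1 - db) <= rep2


# The example from the slide: the two candidates are incomparable.
db = frozenset({("P", "a")})
rep1 = frozenset({("P", "a"), ("Q", "b")})
rep2 = frozenset()

comparable = sym_diff_leq(db, rep1, rep2) or sym_diff_leq(db, rep2, rep1)
print(comparable)
```

Here `rep1` inserts a fact and `rep2` deletes one, so neither set of changes contains the other.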

14 Example of Symmetric Difference Repair
Infinitely many repairs

WorksFor (EName, DName): (Ed, Toys)
Dept (DName, Budget): (Shoes, 3000K)

Inclusion dependency: $\mathit{WorksFor}[2] \subseteq \mathit{Dept}[1]$
Symmetric difference repairs:
1. WorksFor = {}, Dept = {(Shoes, 3000K)}
2. WorksFor = {(Ed, Toys)}, Dept = {(Shoes, 3000K), (Toys, 9999K)}

15 Example of Symmetric Difference Repair
Set inclusion vs. counting

EMP  Name  Rank   Sal
     Ed    clerk  28
     Tim   clerk  30
     An    boss   20
     An    clerk  40

Functional dependency: $\mathit{Name} \rightarrow \mathit{Rank}, \mathit{Sal}$
$\forall (\mathit{EMP}(x_1, \text{clerk}, s_1) \wedge \mathit{EMP}(x_2, \text{boss}, s_2) \rightarrow s_1 \leq s_2)$
The symmetric difference repairs are:
1. {(Ed, clerk, 28), (Tim, clerk, 30), (An, clerk, 40)}
2. {(An, boss, 20)}

17 Cardinality Repairs
Definition ($\prec_{card}$): $rep_1 \prec_{card} rep_2$ if $|db \oplus rep_1| < |db \oplus rep_2|$.
Example (cardinality repair), for the EMP relation {(Ed, clerk, 28), (Tim, clerk, 30), (An, boss, 20), (An, clerk, 40)}:
{(Ed, clerk, 28), (Tim, clerk, 30), (An, clerk, 40)} $\prec_{card}$ {(An, boss, 20)}
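The difference between set inclusion and counting can be reproduced by brute force on the EMP example. This is a sketch under two assumptions: the salary constraint reads "a clerk earns at most as much as a boss", and only deletions are considered (which suffices for constraints of this kind). The function names are illustrative.

```python
from itertools import combinations

EMP = {("Ed", "clerk", 28), ("Tim", "clerk", 30),
       ("An", "boss", 20), ("An", "clerk", 40)}


def consistent(inst):
    """Check the key Name -> Rank, Sal and the clerk/boss salary constraint."""
    for (n1, r1, s1) in inst:
        for (n2, r2, s2) in inst:
            if n1 == n2 and (r1, s1) != (r2, s2):
                return False  # key violation
            if r1 == "clerk" and r2 == "boss" and s1 > s2:
                return False  # clerk earns more than a boss
    return True


def subset_repairs(db):
    """Maximal (w.r.t. set inclusion) consistent subsets of db."""
    candidates = [frozenset(c) for k in range(len(db) + 1)
                  for c in combinations(db, k) if consistent(c)]
    return [c for c in candidates if not any(c < d for d in candidates)]


def cardinality_repairs(db):
    """Subset repairs with the fewest changed facts."""
    reps = subset_repairs(db)
    best = min(len(db ^ r) for r in reps)
    return [r for r in reps if len(db ^ r) == best]


print(len(subset_repairs(EMP)), len(cardinality_repairs(EMP)))
```

The enumeration is exponential in the number of facts and is meant only for toy instances like this one.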

18 Component-cardinality Repairs [AK09]

WorksFor (EName, DName): (Ed, Shoes), (An, Shoes)
Dept (DName): empty

Inclusion dependency: $\mathit{WorksFor}[2] \subseteq \mathit{Dept}[1]$
Two consistent databases:
- $rep_1$: WorksFor = {(Ed, Shoes), (An, Shoes)}, Dept = {(Shoes)}
- $rep_2$: WorksFor = {}, Dept = {}
$rep_1 \prec_{card} rep_2$, but $rep_2$ is preferred if we only consider Dept.

19 Loosely Sound Semantics [CLR03]
Definition ($\preceq_{ls}$): $rep_1 \preceq_{ls} rep_2$ if $rep_1 \supseteq (rep_2 \cap db)$.
Maximize the set of preserved database facts. Open World Assumption: add as much as you like.
Example (loosely sound semantics): let $db = \{P(a)\}$, $rep_1 = \{P(a), Q(b)\}$, $rep_2 = \{\}$. Then $rep_1 \preceq_{ls} rep_2$.
Cardinality variant ($\prec_{ls}^{card}$): $rep_1 \prec_{ls}^{card} rep_2$ if $|rep_1 \cap db| > |rep_2 \cap db|$.
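For contrast with the symmetric-difference order, the loosely sound comparison ignores inserted facts entirely. A small sketch, with illustrative function names and the same set-of-tuples encoding assumed:

```python
def ls_leq(db, rep1, rep2):
    """rep1 preserves every database fact that rep2 preserves."""
    return rep1 >= (rep2 & db)


def ls_card_better(db, rep1, rep2):
    """Cardinality variant: rep1 preserves strictly more database facts."""
    return len(rep1 & db) > len(rep2 & db)


db = {("P", "a")}
rep1 = {("P", "a"), ("Q", "b")}
rep2 = set()

print(ls_leq(db, rep1, rep2), ls_leq(db, rep2, rep1))
```

On this instance `rep1` is preferred, because the extra fact Q(b) costs nothing under the open-world reading.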

20 Entire Tuples vs. Components of Tuples

EMP  Name  Rank   Sal
     Ed    clerk  28
     Tim   clerk  30
     An    boss   20
     An    clerk  40

$\mathit{Name} \rightarrow \mathit{Rank}, \mathit{Sal}$
$\forall (\mathit{EMP}(x_1, \text{clerk}, s_1) \wedge \mathit{EMP}(x_2, \text{boss}, s_2) \rightarrow s_1 \leq s_2)$
Fixing components of tuples:
1. {(Ed, clerk, 28), (Tim, clerk, 30), (An, clerk, 20)}
2. {(Ed, clerk, 28), (Tim, clerk, 30), (An, boss, 40)}

22 Homomorphism Based Repairs [Wij05]
Idea: $rep_1 \preceq_{ls} rep_2 \iff (rep_1 \cap db) \supseteq (rep_2 \cap db)$.
Replace $A \subseteq B$ with "A homomorphic to B", and $A \cap B$ with $glb(A, B)$.
Definition (Greatest lower bound $glb(A, B)$):
- Lower bound: $glb(A, B)$ is homomorphic to A and to B;
- Greatest: every database that is homomorphic to A and to B is also homomorphic to $glb(A, B)$.
Definition ($\prec_{hom}$): $rep_1 \prec_{hom} rep_2$ if $glb(rep_2, db)$ is homomorphic to $glb(rep_1, db)$, but not vice versa.

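23 Recall Homomorphism
Definition (Homomorphism): a database $db_1$, possibly with variables, is homomorphic to a database $db_2$ if there exists a substitution $\theta$ for the variables in $db_1$ such that $\theta(db_1) \subseteq db_2$.
Example databases: $\{R(a, a, a)\}$, $\{R(b, b, a)\}$, $\{R(x, x, a)\}$, $\{R(u, w, a), R(w, u, a)\}$. Here $\{R(u, w, a), R(w, u, a)\}$ is homomorphic to $\{R(x, x, a)\}$, which in turn is homomorphic to both $\{R(a, a, a)\}$ and $\{R(b, b, a)\}$.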
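The definition can be tested by brute force over all substitutions. The sketch below assumes a hypothetical encoding in which a fact is a tuple `(relation, term, ...)` and variables are strings starting with `?`; it is exponential in the number of variables and only meant for tiny examples.

```python
from itertools import product


def is_var(term):
    """Variables are encoded as strings starting with '?' (an assumption)."""
    return isinstance(term, str) and term.startswith("?")


def homomorphic(db1, db2):
    """True if some substitution theta for the variables of db1
    satisfies theta(db1) ⊆ db2."""
    variables = sorted({t for f in db1 for t in f[1:] if is_var(t)})
    targets = sorted({t for f in db2 for t in f[1:]}, key=str)
    for image in product(targets, repeat=len(variables)):
        theta = dict(zip(variables, image))
        mapped = {(f[0],) + tuple(theta.get(t, t) for t in f[1:]) for f in db1}
        if mapped <= db2:
            return True
    return False


A = {("R", "a", "a", "a")}
B = {("R", "b", "b", "a")}
X = {("R", "?x", "?x", "a")}
UW = {("R", "?u", "?w", "a"), ("R", "?w", "?u", "a")}

print(homomorphic(UW, X), homomorphic(X, A), homomorphic(X, B), homomorphic(A, B))
```

Constants are left untouched by the substitution, so `A` is not homomorphic to `B`.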

24 Example Homomorphism Repairs

db:                 {(Ed, clerk, 28), (Tim, clerk, 30), (An, boss, 20), (An, clerk, 40)}
$rep_2$:            {(Ed, clerk, 28), (Tim, clerk, 30), (An, boss, 40)}
$glb(rep_2, db)$:   {(Ed, clerk, 28), (Tim, clerk, 30), (An, boss, y), (An, x, 40)}
$rep_2 \cap db$:    {(Ed, clerk, 28), (Tim, clerk, 30)}

25 Numerical Attributes
Assumptions:
- Primary keys are satisfied and immutable.
- Inconsistencies occur in numerical data.
Inconsistent numerical data: $\forall x \forall y \forall z\,(\mathit{EMP}(x, y, z) \wedge (y < 5) \rightarrow (z \leq 6000))$, over EMP (Emp, Status, Sal) with tuples for Ed and Tim.
Two approaches:
- Update Based [FFP05]
- Least Square Fixes [BBFL08]

26 Update Based [FFP05]
Principle (update based): $rep_1$ is preferred to $rep_2$ if it requires updating a smaller set of values (in terms of set inclusion or cardinality). The actual new values after update do not matter.
Example: $\forall x \forall y \forall z\,(\mathit{EMP}(x, y, z) \wedge (y < 5) \rightarrow (z \leq 6000))$, over EMP (Emp, Status, Sal) with tuples $t_1$ (Ed) and $t_2$ (Tim).
The atomic updates are $(t_1, \mathit{Status}, 8)$ and $(t_2, \mathit{Sal}, 1000)$. The set of updated values is $\{(t_1, \mathit{Status}), (t_2, \mathit{Sal})\}$.

27 Least Square Fixes [BBFL08]
Principle (least square fixes): $rep_1$ is preferred to $rep_2$ if the distance between $rep_1$ and db is smaller than the distance between $rep_2$ and db.
Example: $\forall x \forall y \forall z\,(\mathit{EMP}(x, y, z) \wedge (y < 5) \rightarrow (z \leq 6000))$, over EMP (Emp, Status, Sal) with tuples for Ed and Tim.
Distance for Ed-tuple: $w_{\mathit{Status}}(2 - 2)^2 + w_{\mathit{Sal}}(\ldots)^2$
Distance for Tim-tuple: $w_{\mathit{Status}}(4 - 5)^2 + w_{\mathit{Sal}}(\ldots)^2$
Global distance: $\Sigma$

28 Comparison Numerical Attributes
Repairs are different in both approaches.
Comparison: x y z(ECTS(x, y, z) (y + z 120)), over ECTS (SID, Year1, Year2) with tuples for Ed and Tim.
This would not be a preferred repair in the update based approach.

29 Repair Checking
Relative to a set IC of integrity constraints and a type of repair.
Definition (Repair checking): repair checking is the complexity of (checking membership of) the set
$RC(IC) = \{(db, rep) \mid rep \text{ is a repair of } db\}$
For results, see [CM05, AK09].

30 Topics to Work on
- Tailoring the repair process: repair-to-source and source-to-repair dependencies. After double-checking the list of bosses: $\mathit{EMP}_{db}(x, \text{boss}, z) \rightarrow \exists z'\, \mathit{EMP}_{rep}(x, \text{boss}, z')$
- Take into account provenance.
- Probability distribution over repairs.
- Repairing semi-structured data.
- ...

32 Semantics
Relative to a database db and a set IC of integrity constraints.
Definition (Consistent answer): the consistent (or certain) answer to a query $q(\vec{x})$ is defined by
$\{\vec{a} \mid q(\vec{a}) \text{ is true in every repair of } db\}$
For a Boolean query q, we say that q is consistently true if q is true in every repair.

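33 Example

EMP (Name, City, Sal): (Blake, Paris, 10), (Blake, Rome, 10)
$rep_1$: {(Blake, Paris, 10)}
$rep_2$: {(Blake, Rome, 10)}

$q_1(y) = \{y \mid \exists z\, \mathit{EMP}(\text{'Blake'}, y, z)\}$ has consistent answer $\{\}$.
$q_2(z) = \{z \mid \exists y\, \mathit{EMP}(\text{'Blake'}, y, z)\}$ has consistent answer $\{10\}$.
$\exists y\, \mathit{EMP}(\text{'Blake'}, y, 10)$ is consistently true.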
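The Blake example can be replayed by enumerating repairs directly. A minimal sketch, assuming Name is the primary key (so a repair keeps exactly one tuple per Name) and using illustrative helper names:

```python
from itertools import product


def key_repairs(rel, key_len=1):
    """All repairs under a primary key on the first key_len attributes:
    pick exactly one tuple from every group of key-equal tuples."""
    blocks = {}
    for t in rel:
        blocks.setdefault(t[:key_len], []).append(t)
    return [set(choice) for choice in product(*blocks.values())]


def consistent_answer(query, rel):
    """Answers returned by the query in every repair."""
    return set.intersection(*(query(r) for r in key_repairs(rel)))


EMP = {("Blake", "Paris", 10), ("Blake", "Rome", 10)}


def q1(inst):
    """Cities of Blake."""
    return {city for (name, city, sal) in inst if name == "Blake"}


def q2(inst):
    """Salaries of Blake."""
    return {sal for (name, city, sal) in inst if name == "Blake"}


print(consistent_answer(q1, EMP), consistent_answer(q2, EMP))
```

Enumerating repairs is exponential in general; the later slides on first-order definability are about avoiding exactly this enumeration.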

34 Complexity for Boolean Queries
Relative to:
- a set IC of integrity constraints;
- a type of repair (e.g., loosely sound semantics);
- a Boolean query q.
Complexity: consistent query answering is the complexity (of deciding membership) of the set
$\mathrm{CQA}(q, IC) = \{db \mid q \text{ is true in every repair of } db\}$

35 Objectives
Tractability: characterize queries q and integrity constraints IC for which $\mathrm{CQA}(q, IC)$ is in P.
FO definability: characterize queries q and integrity constraints IC for which $\mathrm{CQA}(q, IC)$ is first-order definable (and hence in P).
First-order definable: $\mathrm{CQA}(q, IC)$ is first-order definable if there exists a first-order sentence $\psi$ such that for every database db, $db \in \mathrm{CQA}(q, IC)$ if and only if $db \models \psi$.

36 History
Marcelo Arenas, Leopoldo E. Bertossi, Jan Chomicki: Consistent Query Answers in Inconsistent Databases. PODS 1999 [ABC99]
Numerous publications since.
Implemented in prototype systems:
- Hippo [CMS04b]
- ConQuer [FFM05]
- ...

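38 Denial Constraints
Definition (Denial constraint): a denial constraint has the form
$\forall \vec{x}_1 \ldots \forall \vec{x}_k\, \neg (R_1(\vec{x}_1) \wedge \cdots \wedge R_k(\vec{x}_k) \wedge \varphi(\vec{x}_1, \ldots, \vec{x}_k))$
where $\varphi$ is a conjunction of atomic formulas using built-in predicates (such as $=$, $\neq$, $<$).
Examples, for the schema EMP[Name, Rank, Sal]:
$\forall u, x, y, z\, \neg (\mathit{EMP}(u, \text{boss}, y) \wedge \mathit{EMP}(x, \text{clerk}, z) \wedge y < z)$
$\forall x, y_1, z_1, y_2, z_2\, \neg (\mathit{EMP}(x, y_1, z_1) \wedge \mathit{EMP}(x, y_2, z_2) \wedge y_1 \neq y_2)$
$\forall x, y_1, z_1, y_2, z_2\, \neg (\mathit{EMP}(x, y_1, z_1) \wedge \mathit{EMP}(x, y_2, z_2) \wedge z_1 \neq z_2)$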

39 Repairs
We assume here that repairs are maximal consistent subsets of db. This corresponds to symmetric difference repairs or loosely sound semantics (adding new atoms is useless in the case of denials).

40 Exponential Number of Repairs
$\forall x\, \neg (R(x, \text{'a'}) \wedge R(x, \text{'b'}))$

R  A  B
   1  a
   1  b
   2  a
   2  b
   ...
   n  a
   n  b

There are $2^n$ different repairs.

41 Conflict Hypergraph
Relative to a database db and a set of denial constraints.
Definition (Conflict hypergraph): a conflict hypergraph is a hypergraph whose hyperedges are subsets of db. For every denial constraint $\forall \vec{x}_1 \ldots \forall \vec{x}_k\, \neg (R_1(\vec{x}_1) \wedge \cdots \wedge R_k(\vec{x}_k) \wedge \varphi(\vec{x}_1, \ldots, \vec{x}_k))$, if $\theta$ is a valuation such that $\theta(\vec{x}_i) = \vec{a}_i$ for $1 \leq i \leq k$ and $db \models R_1(\vec{a}_1) \wedge \cdots \wedge R_k(\vec{a}_k) \wedge \varphi(\vec{a}_1, \ldots, \vec{a}_k)$, then $\{R_1(\vec{a}_1), \ldots, R_k(\vec{a}_k)\}$ is a hyperedge.

42 Example Conflict Hypergraph [CM05]

EMP       Name  Rank   Sal
$t_1$     Ed    clerk  28
$t_2$     Tim   clerk  30
$t_3$     An    boss   20
$t_4$     An    clerk  40

[Figure: conflict hypergraph on the vertices $t_1$, $t_2$, $t_3$, $t_4$]
Properties:
- Every repair is a maximal (w.r.t. $\subseteq$) subset of db that contains no hyperedge of the conflict hypergraph.
- The number of hyperedges is polynomial in the size of db.
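The hypergraph for this instance can be computed directly. The sketch below assumes the constraints are the key on Name together with the boss/clerk salary denial constraint from the earlier examples; since both constraints relate two tuples, every hyperedge here has two vertices. Names such as `conflict_edges` are illustrative.

```python
from itertools import combinations

EMP = {"t1": ("Ed", "clerk", 28), "t2": ("Tim", "clerk", 30),
       "t3": ("An", "boss", 20), "t4": ("An", "clerk", 40)}


def conflict_edges(db):
    """Pairs of tuple ids that jointly violate a constraint."""
    edges = set()
    for a, b in combinations(sorted(db), 2):
        (n1, r1, s1), (n2, r2, s2) = db[a], db[b]
        key_violation = n1 == n2 and (r1 != r2 or s1 != s2)
        pay_violation = ((r1 == "boss" and r2 == "clerk" and s1 < s2)
                         or (r2 == "boss" and r1 == "clerk" and s2 < s1))
        if key_violation or pay_violation:
            edges.add(frozenset({a, b}))
    return edges


def repairs(db, edges):
    """Maximal sets of tuple ids that contain no hyperedge (brute force)."""
    ids = sorted(db)
    independent = [frozenset(c) for k in range(len(ids) + 1)
                   for c in combinations(ids, k)
                   if not any(e <= frozenset(c) for e in edges)]
    return [s for s in independent if not any(s < t for t in independent)]


edges = conflict_edges(EMP)
print(sorted(sorted(e) for e in edges))
print(sorted(sorted(r) for r in repairs(EMP, edges)))
```

Under these assumptions every conflict involves $t_3$, which is why one repair keeps $t_3$ alone and the other keeps the three clerks.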

43 Boolean Queries
Definition (Boolean query): a quantifier-free Boolean query is a Boolean combination of ground atoms. It can be assumed to be in CNF: $\phi_1 \wedge \phi_2 \wedge \cdots \wedge \phi_l$, where each $\phi_i$ is of the form $\neg A_1 \vee \cdots \vee \neg A_m \vee B_1 \vee \cdots \vee B_n$, with $A_1, \ldots, A_m, B_1, \ldots, B_n$ distinct ground atoms.
Example (Boolean query): EMP('An', 'clerk', 40) EMP('Ed', 'clerk', 28)

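44 Denial Constraints
Question: the problem is to verify for $1 \leq i \leq l$ whether $\phi_i = \neg A_1 \vee \cdots \vee \neg A_m \vee B_1 \vee \cdots \vee B_n$ is true in every repair. We ask instead whether $\phi_i$ is false in some repair, i.e., whether some repair rep satisfies $\neg \phi_i = A_1 \wedge \cdots \wedge A_m \wedge \neg B_1 \wedge \cdots \wedge \neg B_n$.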

45 Denial Constraints
Crux of the HProver algorithm [CM05, CMS04a]: any repair rep satisfying $A_1 \wedge \cdots \wedge A_m \wedge \neg B_1 \wedge \cdots \wedge \neg B_n$ must verify the following conditions:
1. $A_1, \ldots, A_m \in rep$.
2. For each edge E in the conflict hypergraph, $E \not\subseteq rep$.
3. Maximality: for $1 \leq j \leq n$, if $B_j \in db$, then there is an edge $E_j$ in the conflict hypergraph such that $B_j \in E_j$ and $E_j \setminus \{B_j\} \subseteq rep$.
Why is HProver polynomial in the size of db? The Maximality condition chooses n hyperedges among a polynomial number of hyperedges, and n is fixed by the query.
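The three conditions suggest a direct search: pick one hyperedge per excluded atom and test whether the forced facts are conflict-free. The following is a simplified sketch of that idea, not the HProver implementation from [CM05, CMS04a]; facts are represented by hypothetical ids, and the edge set is the one assumed for the EMP example.

```python
from itertools import product


def false_in_some_repair(db, edges, A, B):
    """Is there a repair that contains every fact in A and no fact in B?"""
    if not set(A) <= db:
        return False
    relevant = [b for b in B if b in db]   # facts outside db are never in a repair
    choices = [[e for e in edges if b in e] for b in relevant]
    for picked in product(*choices):       # one blocking edge E_j per B_j
        forced = set(A)
        for b, e in zip(relevant, picked):
            forced |= e - {b}              # E_j \ {B_j} must be in the repair
        if not any(e <= forced for e in edges):
            return True                    # forced is conflict-free, so it extends to a repair
    return False


db = {"t1", "t2", "t3", "t4"}
edges = {frozenset({"t1", "t3"}), frozenset({"t2", "t3"}), frozenset({"t3", "t4"})}

# Clause "not t4 or t1": is there a repair with t4 but without t1?
print(false_in_some_repair(db, edges, A=["t4"], B=["t1"]))
```

The loop ranges over at most (number of hyperedges)^n combinations, which is polynomial in the size of db for a fixed clause.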

47 Imprecise Data
Primary key violations are a natural way to model imprecise data.

Speakers  Name      Affiliation
          Jan       UA
          Jan       UHasselt
          Jef       UMH
          Jef       UMons
          Jean-Luc  FUNDP

Tuples with the same Name but different Affiliation are mutually exclusive. Tuples with different Name are independent.

48 Probabilistic Data [AFM06, HAKO09, DRS09]

Speakers  Name      Affiliation  P
          Jan       UA           0.6
          Jan       UHasselt     0.4
          Jef       UMH          0.8
          Jef       UMons        0.2
          Jean-Luc  FUNDP        1.0

49 Conjunctive Queries
Definition (Boolean conjunctive query): a Boolean conjunctive query has the form
$\exists^{*} (R_1(\underline{\vec{x}_1}, \vec{y}_1) \wedge \cdots \wedge R_k(\underline{\vec{x}_k}, \vec{y}_k))$
This query contains a self-join if $R_i = R_j$ for some $i \neq j$.
Since primary keys are underlined, IC can be read from q. We write $\mathrm{CQA}(q)$ instead of $\mathrm{CQA}(q, IC)$ if IC is clear from q.

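50 Reduction From Graph 3-Colorability
Theorem: $\mathrm{CQA}(q)$ is coNP-complete for $q = \exists x \exists y \exists z\, (C(\underline{x}, z) \wedge C(\underline{y}, z) \wedge E(x, y))$.
Graph is 3-colorable $\iff$ q is false in some repair.
[Figure: example graph on the vertices 1, 2, 3, 4]
C (Vertex, Color): each vertex 1, ..., 4 is paired with each of red, blue, yellow.
E (From, To): the edges of the graph.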

51 Reduction From Graph 3-Colorability
A repair of C keeps exactly one color per vertex:

C  Vertex  Color
   1       blue
   2       red
   3       yellow
   4       blue
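The correspondence between repairs and colorings can be made concrete with a brute-force check. The edge lists below are hypothetical (the E table of the slides is not reproduced here); the sketch only illustrates that q is false in a repair exactly when that repair is a proper coloring.

```python
from itertools import product

COLORS = ("red", "blue", "yellow")


def q_holds(repair_C, E):
    """q = exists x, y, z: C(x, z) and C(y, z) and E(x, y)."""
    color = dict(repair_C)
    return any(color[x] == color[y] for (x, y) in E)


def q_false_in_some_repair(vertices, E):
    """Enumerate the repairs of C (one color per vertex) and look for one
    in which no edge joins two vertices of the same color."""
    for choice in product(COLORS, repeat=len(vertices)):
        repair_C = set(zip(vertices, choice))
        if not q_holds(repair_C, E):
            return True
    return False


vertices = [1, 2, 3, 4]
cycle_with_chord = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]   # hypothetical, 3-colorable
complete_graph = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]  # K4, not 3-colorable

print(q_false_in_some_repair(vertices, cycle_with_chord))
print(q_false_in_some_repair(vertices, complete_graph))
```

For the complete graph on four vertices q is true in every repair, so that database belongs to CQA(q).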

52 First-order Definability Example
$\mathrm{CQA}(q_0)$ is first-order definable.

EMP (Name, City, Sal): (Blake, Paris, 10), (Blake, Rome, 10)
$rep_1$: {(Blake, Paris, 10)}
$rep_2$: {(Blake, Rome, 10)}

$q_0 = \exists y\, \mathit{EMP}(\text{'Blake'}, y, 10)$ is consistently true if and only if the (possibly inconsistent) database satisfies
$\exists y\, (\mathit{EMP}(\text{'Blake'}, y, 10) \wedge \forall y' \forall z\, (\mathit{EMP}(\text{'Blake'}, y', z) \rightarrow z = 10))$

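53 First-order Definability Example
$\mathrm{CQA}(q_1)$ is first-order definable.

E (EName, DName): (Ed, Shoes), (Ed, Toys)
D (DName, Budget, Boss): (Shoes, 3000K, An), (Shoes, 3100K, An), (Toys, 3200K, An)

$q_1 = \exists y \exists z\, (E(\text{'Ed'}, y) \wedge D(y, z, \text{'An'}))$ is consistently true if and only if the (possibly inconsistent) database satisfies
$\exists y\, (E(\text{'Ed'}, y) \wedge \forall y'\, (E(\text{'Ed'}, y') \rightarrow \exists z\, (D(y', z, \text{'An'}) \wedge \forall z' \forall w\, (D(y', z', w) \rightarrow w = \text{'An'}))))$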
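The rewriting for $q_1$ can be compared against repair enumeration on the example tables. This is a sketch with hand-coded checks, assuming EName is the key of E and DName the key of D; `rew_q1` mirrors the first-order sentence above and `certain_q1` evaluates $q_1$ in every repair.

```python
from itertools import product


def key_repairs(rel):
    """One tuple per value of the first attribute (the primary key)."""
    blocks = {}
    for t in rel:
        blocks.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*blocks.values())]


def q1(E, D):
    """exists y, z: E('Ed', y) and D(y, z, 'An')."""
    return any(e == "Ed" and d == y and boss == "An"
               for (e, y) in E for (d, _, boss) in D)


def certain_q1(E, D):
    return all(q1(re, rd) for re in key_repairs(E) for rd in key_repairs(D))


def rew_q1(E, D):
    """Evaluate the rewriting on the (possibly inconsistent) database."""
    ed_depts = [y for (e, y) in E if e == "Ed"]

    def dept_ok(y):
        bosses = [boss for (d, _, boss) in D if d == y]
        return bool(bosses) and all(b == "An" for b in bosses)

    return bool(ed_depts) and all(dept_ok(y) for y in ed_depts)


E = {("Ed", "Shoes"), ("Ed", "Toys")}
D = {("Shoes", "3000K", "An"), ("Shoes", "3100K", "An"), ("Toys", "3200K", "An")}
print(rew_q1(E, D), certain_q1(E, D))
```

Adding a hypothetical tuple such as `("Toys", "3300K", "Bob")` to D makes both checks fail, since some repair then pairs Ed with a department not led by An.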

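54 Rewrite Rules that Emerge [Wij09b]
Rewrite rule for conjunctive queries without self-join:
1. First, express all constraints on variables and constants in a conjunction $\varphi$ of equalities. For example, $\exists x \exists y\, (R(\underline{x}, y) \wedge S(\underline{y}, \text{'a'}))$ is expressed as $R(\underline{x}, y_1) \wedge S(\underline{y_2}, w_a) \wedge \underbrace{y_1 = y_2 \wedge w_a = \text{'a'}}_{\varphi}$.
2. Next, apply the following rewrite rule: $\mathrm{Rew}[R(\underline{\vec{x}}, \vec{y}) \wedge q \wedge \varphi] = \exists \vec{x} \exists \vec{y}\, (R(\underline{\vec{x}}, \vec{y}) \wedge \forall \vec{y}\, (R(\underline{\vec{x}}, \vec{y}) \rightarrow \mathrm{Rew}[q \wedge \varphi]))$. Induction basis: $\mathrm{Rew}[\varphi] = \varphi$.
3. Manual simplification may be applied for readability.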

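55 $db \models \mathrm{Rew}[q] \iff$ q is true in every repair?
For q a conjunctive query without self-join, $db \models \mathrm{Rew}[q] \implies$ q is true in every repair.
The converse ($\Longleftarrow$) is not generally true. Let $q = \exists x \exists y \exists z\, (R(\underline{x}, z) \wedge S(\underline{y}, z))$.
$\mathrm{Rew}[q] = \exists x \exists z\, (R(x, z) \wedge \forall z'\, (R(x, z') \rightarrow \exists y \exists w\, (S(y, w) \wedge \forall w'\, (S(y, w') \rightarrow z' = w'))))$

R (A, C): (1, a), (2, b)
S (B, C): (3, a), (3, b)

By symmetry, the same problem arises for $\exists x \exists y \exists z\, (S(\underline{y}, z) \wedge R(\underline{x}, z))$.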
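The counterexample can be verified on the two tables: q holds in both repairs, yet the rewriting evaluates to false on the database. A minimal sketch, with the first attribute of R and of S assumed to be the key and the nested quantifiers of the rewriting coded by hand:

```python
from itertools import product


def key_repairs(rel):
    """One tuple per value of the first attribute (the primary key)."""
    blocks = {}
    for t in rel:
        blocks.setdefault(t[0], []).append(t)
    return [set(choice) for choice in product(*blocks.values())]


def q(R, S):
    """exists x, y, z: R(x, z) and S(y, z)."""
    return any(z == w for (_, z) in R for (_, w) in S)


def certain_q(R, S):
    return all(q(rr, rs) for rr in key_repairs(R) for rs in key_repairs(S))


def rew_q(R, S):
    """exists x, z: R(x, z) and for all z': R(x, z') implies
    (exists y, w: S(y, w) and for all w': S(y, w') implies z' = w')."""
    def some_s_block_equals(zp):
        return any(all(w2 == zp for (y2, w2) in S if y2 == y) for (y, _) in S)

    return any(all(some_s_block_equals(zp) for (x2, zp) in R if x2 == x)
               for (x, _) in R)


R = {(1, "a"), (2, "b")}
S = {(3, "a"), (3, "b")}
print(certain_q(R, S), rew_q(R, S))
```

Whichever S-tuple a repair keeps, one of the two R-tuples joins with it, but no single R-tuple joins with both; the rewriting commits to one R-tuple first and therefore fails.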

56 Kind of Results [Wij09b]
Theorem: let q be conjunctive without self-join. $\mathrm{CQA}(q)$ is first-order definable if q has a rooted join tree such that whenever $R(\underline{\vec{x}}, \vec{y})$ is the parent of $S(\underline{\vec{u}}, \vec{w})$, then at least one of the following conditions is satisfied:
- every variable that occurs in $\vec{x}$ occurs in $\vec{u}$; or
- if variable v occurs in both $R(\underline{\vec{x}}, \vec{y})$ and $S(\underline{\vec{u}}, \vec{w})$, then v occurs in $\vec{u}$.
[Figure: a rooted join tree satisfying the theorem's conditions, over the atoms $R_0(x, y)$, $R_1(x, y)$, $R_2(y, z)$, $R_3(y, \text{'a'})$, $R_4(z, w)$, with edge labels {x, y}, {y}, {z}, {y}]

57 Join Tree
Definition (Join tree): a join tree for a conjunctive query q is an undirected tree whose vertices are the atoms of q such that:
- Connectedness Condition: whenever the same variable v occurs in two atoms A and B, then v occurs in each atom on the unique path linking A and B.
It is common to label each edge with the set of variables that occur in both end points.

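58 Kind of Results [Wij09a]
Theorem: let $q = \exists^{*} (R(\underline{\vec{x}}, \vec{y}) \wedge S(\underline{\vec{u}}, \vec{w}))$. Let L be the set of variables that occur in both atoms. Let X be the set of variables that occur in $\vec{x}$. Let U be the set of variables that occur in $\vec{u}$.
$\mathrm{CQA}(q)$ is first-order definable if and only if $X \subseteq U$ or $U \subseteq X$ or $L \subseteq X$ or $L \subseteq U$.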
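The condition of the theorem is a purely syntactic test on the two atoms. A small sketch, assuming each atom is described by the variables in its key positions and the variables occurring anywhere in it (constants are simply left out); the function name is illustrative.

```python
def fo_definable_two_atoms(key_vars_r, all_vars_r, key_vars_s, all_vars_s):
    """Test X ⊆ U or U ⊆ X or L ⊆ X or L ⊆ U for a two-atom query."""
    X, U = set(key_vars_r), set(key_vars_s)
    L = set(all_vars_r) & set(all_vars_s)   # variables shared by both atoms
    return X <= U or U <= X or L <= X or L <= U


# q = R(x, z) and S(y, z), keys x and y: the counterexample query.
print(fo_definable_two_atoms({"x"}, {"x", "z"}, {"y"}, {"y", "z"}))

# q1 = E('Ed', y) and D(y, z, 'An'), keys 'Ed' (a constant) and y.
print(fo_definable_two_atoms(set(), {"y"}, {"y"}, {"y", "z"}))
```

The first query fails all four inclusions, which agrees with the failure of the rewrite rule for both atom orderings; the second satisfies $X \subseteq U$ because its first key contains no variables.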

59 Open Questions
Are the following statements equivalent for every Boolean conjunctive query q without self-join?
1. For some ordering of the atoms in q, for every database db, $db \models \mathrm{Rew}[q]$ iff $db \in \mathrm{CQA}(q)$.
2. $\mathrm{CQA}(q)$ is first-order definable.
3. $\mathrm{CQA}(q)$ is not coNP-complete.
Clearly, $1 \implies 2$ and $2 \implies 3$.
- Is there a Boolean conjunctive query q without self-join such that $\mathrm{CQA}(q)$ is first-order definable but our rewrite rule does not apply?
- Is there a Boolean conjunctive query q without self-join such that $\mathrm{CQA}(q)$ is in P but not first-order definable?

61 Topics to Work on
- Almost-certain answers.
- Semi-structured Web data.
- Argumentation.
- ...

62 References
[ABC99] Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query answers in inconsistent databases. In PODS. ACM Press, 1999.
[AFM06] Periklis Andritsos, Ariel Fuxman, and Renée J. Miller. Clean answers over dirty databases: A probabilistic approach. In Ling Liu, Andreas Reuter, Kyu-Young Whang, and Jianjun Zhang, editors, ICDE, page 30. IEEE Computer Society, 2006.
[AK09] Foto N. Afrati and Phokion G. Kolaitis. Repair checking in inconsistent databases: algorithms and complexity. In Fagin [Fag09].
[BBFL08] Leopoldo E. Bertossi, Loreto Bravo, Enrico Franconi, and Andrei Lopatenko. The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inf. Syst., 33(4-5), 2008.
[BC03] Leopoldo E. Bertossi and Jan Chomicki. Query answering in inconsistent databases. In Jan Chomicki, Ron van der Meyden, and Gunter Saake, editors, Logics for Emerging Applications of Databases. Springer, 2003.
[Ber06] Leopoldo E. Bertossi. Consistent query answering in databases. SIGMOD Record, 35(2):68-76, 2006.
[CFG+07] Gao Cong, Wenfei Fan, Floris Geerts, Xibei Jia, and Shuai Ma. Improving data quality: Consistency and accuracy. In Christoph Koch, Johannes Gehrke, Minos N. Garofalakis, Divesh Srivastava, Karl Aberer, Anand Deshpande, Daniela Florescu, Chee Yong Chan, Venkatesh Ganti, Carl-Christian Kanne, Wolfgang Klas, and Erich J. Neuhold, editors, VLDB. ACM, 2007.
[Cho06] Jan Chomicki. Consistent query answering: Opportunities and limitations. In DEXA Workshops. IEEE Computer Society, 2006.
[Cho07] Jan Chomicki. Consistent query answering: Five easy pieces. In Thomas Schwentick and Dan Suciu, editors, ICDT, volume 4353 of Lecture Notes in Computer Science. Springer, 2007.
[CLR03] Andrea Calì, Domenico Lembo, and Riccardo Rosati. On the decidability and complexity of query answering over inconsistent and incomplete databases. In PODS. ACM, 2003.
[CM05] Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1-2):90-121, 2005.
[CMS04a] Jan Chomicki, Jerzy Marcinkowski, and Slawomir Staworko. Computing consistent query answers using conflict hypergraphs. In David A. Grossman, Luis Gravano, ChengXiang Zhai, Otthein Herzog, and David A. Evans, editors, CIKM. ACM, 2004.
[CMS04b] Jan Chomicki, Jerzy Marcinkowski, and Slawomir Staworko. Hippo: A system for computing consistent answers to a class of SQL queries. In Elisa Bertino, Stavros Christodoulakis, Dimitris Plexousakis, Vassilis Christophides, Manolis Koubarakis, Klemens Böhm, and Elena Ferrari, editors, EDBT, volume 2992 of Lecture Notes in Computer Science. Springer, 2004.
[DRS09] Nilesh N. Dalvi, Christopher Ré, and Dan Suciu. Probabilistic databases: diamonds in the dirt. Commun. ACM, 52(7):86-94, 2009.
[Fag09] Ronald Fagin, editor. Database Theory - ICDT 2009, 12th International Conference, St. Petersburg, Russia, March 23-25, 2009, Proceedings, volume 361 of ACM International Conference Proceeding Series. ACM, 2009.
[Fan08] Wenfei Fan. Dependencies revisited for improving data quality. In Maurizio Lenzerini and Domenico Lembo, editors, PODS. ACM, 2008.
[FFM05] Ariel Fuxman, Elham Fazli, and Renée J. Miller. ConQuer: Efficient management of inconsistent databases. In Fatma Özcan, editor, SIGMOD Conference. ACM, 2005.
[FFP05] Sergio Flesca, Filippo Furfaro, and Francesco Parisi. Consistent query answers on numerical databases under aggregate constraints. In Gavin M. Bierman and Christoph Koch, editors, DBPL, volume 3774 of Lecture Notes in Computer Science. Springer, 2005.
[FGJ08] Wenfei Fan, Floris Geerts, and Xibei Jia. A revival of integrity constraints for data cleaning. PVLDB, 1(2), 2008.
[HAKO09] Jiewen Huang, Lyublena Antova, Christoph Koch, and Dan Olteanu. MayBMS: a probabilistic database management system. In Ugur Çetintemel, Stanley B. Zdonik, Donald Kossmann, and Nesime Tatbul, editors, SIGMOD Conference. ACM, 2009.
[MCSvL09] Bertrand Meyer, Christine Choppy, Jørgen Staunstrup, and Jan van Leeuwen. Viewpoint - research evaluation for computer science. Commun. ACM, 52(4):31-34, 2009.
[S09] Shaoxu Song and Lei Chen. Discovering matching dependencies. CoRR, abs/, 2009.
[Wij05] Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3), 2005.
[Wij09a] Jef Wijsen. Consistent query answering under primary keys: a characterization of tractable queries. In Fagin [Fag09].
[Wij09b] Jef Wijsen. On the consistent rewriting of conjunctive queries under primary key constraints. Inf. Syst., 34(7), 2009.

Consistent Query Answering

Consistent Query Answering Consistent Query Answering Sławek Staworko 1 University of Lille INRIA Mostrare Project DEIS 2010 November 9, 2010 1 Some slides are due to [Cho07] Sławek Staworko (Mostrare) CQA DEIS 2010 1 / 33 Overview

More information

Consistent Query Answering

Consistent Query Answering Consistent Query Answering Opportunities and Limitations Jan Chomicki Dept. CSE University at Buffalo State University of New York http://www.cse.buffalo.edu/ chomicki 1 Integrity constraints Integrity

More information

Project-Join-Repair: An Approach to Consistent Query Answering Under Functional Dependencies

Project-Join-Repair: An Approach to Consistent Query Answering Under Functional Dependencies Project-Join-Repair: An Approach to Consistent Query Answering Under Functional Dependencies Jef Wijsen Université de Mons-Hainaut, Mons, Belgium, jef.wijsen@umh.ac.be, WWW home page: http://staff.umh.ac.be/wijsen.jef/

More information

Static, Incremental and Parameterized Complexity of Consistent Query Answering in Databases Under Cardinality-Based Semantics

Static, Incremental and Parameterized Complexity of Consistent Query Answering in Databases Under Cardinality-Based Semantics Static, Incremental and Parameterized Complexity of Consistent Query Answering in Databases Under Cardinality-Based Semantics Leopoldo Bertossi Carleton University Ottawa, Canada Based in part on join

More information

Consistent Query Answering: Opportunities and Limitations

Consistent Query Answering: Opportunities and Limitations Consistent Query Answering: Opportunities and Limitations Jan Chomicki Dept. Computer Science and Engineering University at Buffalo, SUNY Buffalo, NY 14260-2000, USA chomicki@buffalo.edu Abstract This

More information

Edinburgh Research Explorer

Edinburgh Research Explorer Edinburgh Research Explorer Hippo: A System for Computing Consistent Answers to a Class of SQL Queries Citation for published version: Chomicki, J, Marcinkowski, J & Staworko, S 2004, Hippo: A System for

More information

Approximation Algorithms for Computing Certain Answers over Incomplete Databases

Approximation Algorithms for Computing Certain Answers over Incomplete Databases Approximation Algorithms for Computing Certain Answers over Incomplete Databases Sergio Greco, Cristian Molinaro, and Irina Trubitsyna {greco,cmolinaro,trubitsyna}@dimes.unical.it DIMES, Università della

More information

INCONSISTENT DATABASES

INCONSISTENT DATABASES INCONSISTENT DATABASES Leopoldo Bertossi Carleton University, http://www.scs.carleton.ca/ bertossi SYNONYMS None DEFINITION An inconsistent database is a database instance that does not satisfy those integrity

More information

Data integration lecture 3

Data integration lecture 3 PhD course on View-based query processing Data integration lecture 3 Riccardo Rosati Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza {rosati}@dis.uniroma1.it Corso di Dottorato

More information

Database Consistency: Logic-Based Approaches

Database Consistency: Logic-Based Approaches Database Consistency: Logic-Based Approaches Jan Chomicki 1 Wenfei Fan 2 1 University at Buffalo and Warsaw University 2 University of Edinburgh April 27-28, 2007 Plan of the course 1 Integrity constraints

More information

On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases

On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases Jan Chomicki 1 and Jerzy Marcinkowski 2 1 Dept. of Computer Science and Engineering University at Buffalo

More information

arxiv: v1 [cs.db] 23 May 2016

arxiv: v1 [cs.db] 23 May 2016 Complexity of Consistent Query Answering in Databases under Cardinality-Based and Incremental Repair Semantics (extended version) arxiv:1605.07159v1 [cs.db] 23 May 2016 Andrei Lopatenko Free University

More information

Preference-Driven Querying of Inconsistent Relational Databases

Preference-Driven Querying of Inconsistent Relational Databases Preference-Driven Querying of Inconsistent Relational Databases Slawomir Staworko 1, Jan Chomicki 1, and Jerzy Marcinkowski 2 1 University at Buffalo, {staworko,chomicki}@cse.buffalo.edu 2 Wroclaw University

More information

Foundations of Data Exchange and Metadata Management. Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016

Foundations of Data Exchange and Metadata Management. Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016 Foundations of Data Exchange and Metadata Management Marcelo Arenas Ron Fagin Special Event - SIGMOD/PODS 2016 The need for a formal definition We had a paper with Ron in PODS 2004 Back then I was a Ph.D.

More information

DATABASE THEORY. Lecture 18: Dependencies. TU Dresden, 3rd July Markus Krötzsch Knowledge-Based Systems

DATABASE THEORY. Lecture 18: Dependencies. TU Dresden, 3rd July Markus Krötzsch Knowledge-Based Systems DATABASE THEORY Lecture 18: Dependencies Markus Krötzsch Knowledge-Based Systems TU Dresden, 3rd July 2018 Review: Databases and their schemas Lines: Line Type 85 bus 3 tram F1 ferry...... Stops: SID Stop

More information

arxiv:cs/ v1 [cs.db] 5 Apr 2002

arxiv:cs/ v1 [cs.db] 5 Apr 2002 On the Computational Complexity of Consistent Query Answers arxiv:cs/0204010v1 [cs.db] 5 Apr 2002 1 Introduction Jan Chomicki Jerzy Marcinkowski Dept. CSE Instytut Informatyki University at Buffalo Wroclaw

More information

Consistent Query Answering in Databases

Consistent Query Answering in Databases Consistent Query Answering in Databases Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada. bertossi@scs.carleton.ca 1 Introduction For several reasons databases may become

More information

Inconsistency-tolerant logics

Inconsistency-tolerant logics Inconsistency-tolerant logics CS 157 Computational Logic Autumn 2010 Inconsistent logical theories T 1 = { p(a), p(a) } T 2 = { x(p(x) q(x)), p(a), q(a) } Definition: A theory T is inconsistent if T has

More information

Consistent Query Answering for Atemporal Constraints over Temporal Databases

Consistent Query Answering for Atemporal Constraints over Temporal Databases Consistent Query Answering for Atemporal Constraints over Temporal Databases Jan Chomicki Dept. of Computer Science and Engineering SUNY at Buffalo Buffalo, NY, USA chomicki@buffalo.edu Jef Wijsen Département

More information

Structural characterizations of schema mapping languages

Structural characterizations of schema mapping languages Structural characterizations of schema mapping languages Balder ten Cate INRIA and ENS Cachan (research done while visiting IBM Almaden and UC Santa Cruz) Joint work with Phokion Kolaitis (ICDT 09) Schema

More information

Minimal-Change Integrity Maintenance Using Tuple Deletions

Minimal-Change Integrity Maintenance Using Tuple Deletions Minimal-Change Integrity Maintenance Using Tuple Deletions Jan Chomicki University at Buffalo Dept. CSE chomicki@cse.buffalo.edu Jerzy Marcinkowski Wroclaw University Instytut Informatyki jma@ii.uni.wroc.pl

More information

Logical Foundations of Relational Data Exchange

Logical Foundations of Relational Data Exchange Logical Foundations of Relational Data Exchange Pablo Barceló Department of Computer Science, University of Chile pbarcelo@dcc.uchile.cl 1 Introduction Data exchange has been defined as the problem of

More information

On the Hardness of Counting the Solutions of SPARQL Queries

On the Hardness of Counting the Solutions of SPARQL Queries On the Hardness of Counting the Solutions of SPARQL Queries Reinhard Pichler and Sebastian Skritek Vienna University of Technology, Faculty of Informatics {pichler,skritek}@dbai.tuwien.ac.at 1 Introduction

More information

Managing Inconsistencies in Collaborative Data Management

Managing Inconsistencies in Collaborative Data Management Managing Inconsistencies in Collaborative Data Management Eric Kao Logic Group Computer Science Department Stanford University Talk given at HP Labs on November 9, 2010 Structured Data Public Sources Company

More information

Uncertainty in Databases. Lecture 2: Essential Database Foundations

Uncertainty in Databases. Lecture 2: Essential Database Foundations Uncertainty in Databases Lecture 2: Essential Database Foundations Table of Contents 1 2 3 4 5 6 Table of Contents Codd s Vision Codd Catches On Top Academic Recognition Selected Publication Venues 1 2

More information

Dependencies Revisited for Improving Data Quality

Dependencies Revisited for Improving Data Quality Dependencies Revisited for Improving Data Quality Wenfei Fan University of Edinburgh & Bell Laboratories Wenfei Fan Dependencies Revisited for Improving Data Quality 1 / 70 Real-world data is often dirty

More information

The Semantics of Consistency and Trust in Peer Data Exchange Systems

The Semantics of Consistency and Trust in Peer Data Exchange Systems The Semantics of Consistency and Trust in Peer Data Exchange Systems Leopoldo Bertossi 1 and Loreto Bravo 2 1 Carleton University, School of Computer Science, Ottawa, Canada. bertossi@scs.carleton.ca 2

More information

Three easy pieces on schema mappings for tree-structured data

Three easy pieces on schema mappings for tree-structured data Three easy pieces on schema mappings for tree-structured data Claire David 1 and Filip Murlak 2 1 Université Paris-Est Marne-la-Vallée 2 University of Warsaw Abstract. Schema mappings specify how data

Provable data privacy

Provable data privacy Kilian Stoffel 1 and Thomas Studer 2 1 Université de Neuchâtel, Pierre-à-Mazel 7, CH-2000 Neuchâtel, Switzerland kilian.stoffel@unine.ch 2 Institut für Informatik und angewandte Mathematik,

Query Evaluation on Probabilistic Databases

Query Evaluation on Probabilistic Databases Christopher Ré, Nilesh Dalvi and Dan Suciu University of Washington 1 The Probabilistic Data In this paper we consider the query evaluation problem: how can

Computing Query Answers with Consistent Support

Computing Query Answers with Consistent Support Jui-Yi Kao Advised by: Stanford University Michael Genesereth Inconsistency in Databases If the data in a database violates the applicable ICs, we say the

Logic Programs for Consistently Querying Data Integration Systems

Logic Programs for Consistently Querying Data Integration Systems Loreto Bravo Pontificia Universidad Católica de Chile Departamento de Ciencia de Computación Santiago, Chile. lbravo@ing.puc.cl Leopoldo

Updating data and knowledge bases

Updating data and knowledge bases Inconsistency management in data and knowledge bases (2013) Antonella Poggi Sapienza Università di Roma Inconsistency management in data and knowledge bases (2013) Rome,

The Most Probable Database Problem

The Most Probable Database Problem Eric Gribkoff University of Washington eagribko@cs.uw.edu Guy Van den Broeck KU Leuven, UCLA guyvdb@cs.ucla.edu Dan Suciu University of Washington suciu@cs.uw.edu ABSTRACT

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution Leopoldo Bertossi Carleton University School of Computer Science Institute for Data Science Ottawa, Canada bertossi@scs.carleton.ca

Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints

Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints Leopoldo Bertossi 1, Loreto Bravo 1, Enrico Franconi 2, and Andrei Lopatenko 2, 1 Carleton University,

The Inverse of a Schema Mapping

The Inverse of a Schema Mapping Jorge Pérez Department of Computer Science, Universidad de Chile Blanco Encalada 2120, Santiago, Chile jperez@dcc.uchile.cl Abstract The inversion of schema mappings has

Certain Answers as Objects and Knowledge

Proceedings of the Fourteenth International Conference on Principles of Knowledge Representation and Reasoning Certain Answers as Objects and Knowledge Leonid Libkin School of Informatics, University of

Data integration lecture 2

PhD course on View-based query processing Data integration lecture 2 Riccardo Rosati Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza {rosati}@dis.uniroma1.it Corso di Dottorato

Semantic Optimization of Preference Queries

Semantic Optimization of Preference Queries Jan Chomicki University at Buffalo http://www.cse.buffalo.edu/ chomicki 1 Querying with Preferences Find the best answers to a query, instead of all the answers.

Detecting Logical Errors in SQL Queries

Detecting Logical Errors in SQL Queries Stefan Brass Christian Goldberg Martin-Luther-Universität Halle-Wittenberg, Institut für Informatik, Von-Seckendorff-Platz 1, D-06099 Halle (Saale), Germany (brass

Checking Containment of Schema Mappings (Preliminary Report)

Checking Containment of Schema Mappings (Preliminary Report) Andrea Calì 3,1 and Riccardo Torlone 2 Oxford-Man Institute of Quantitative Finance, University of Oxford, UK Dip. di Informatica e Automazione,

Schema Exchange: a Template-based Approach to Data and Metadata Translation

Schema Exchange: a Template-based Approach to Data and Metadata Translation Paolo Papotti and Riccardo Torlone Università Roma Tre {papotti,torlone}@dia.uniroma3.it Abstract. We study the schema exchange

Introduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University.

COCS 6421 Advanced Database Systems CSE, York University March 20, 2008 Agenda 1 Problem description Problems 2 3 Open questions and future work Conclusion Bibliography Problem description Problems Why

Logical Aspects of Massively Parallel and Distributed Systems

Logical Aspects of Massively Parallel and Distributed Systems Frank Neven Hasselt University PODS Tutorial June 29, 2016 PODS June 29, 2016 1 / 62 Logical aspects of massively parallel and distributed

a standard database system

user queries (RA, SQL, etc.) relational database Flight origin destination airline Airport code city VIE LHR BA VIE Vienna LHR EDI BA LHR London LGW GLA U2 LGW London LCA VIE OS LCA Larnaca a standard

Query Answering in Peer-to-Peer Data Exchange Systems

Query Answering in Peer-to-Peer Data Exchange Systems Leopoldo Bertossi and Loreto Bravo Carleton University, School of Computer Science, Ottawa, Canada. {bertossi,lbravo}@scs.carleton.ca Abstract. The

A Comprehensive Semantic Framework for Data Integration Systems

A Comprehensive Semantic Framework for Data Integration Systems Andrea Calì 1, Domenico Lembo 2, and Riccardo Rosati 2 1 Faculty of Computer Science Free University of Bolzano/Bozen, Italy cali@inf.unibz.it

Scalar Aggregation in Inconsistent Databases

Scalar Aggregation in Inconsistent Databases Marcelo Arenas Dept. of Computer Science University of Toronto marenas@cs.toronto.edu Jan Chomicki Dept. CSE University at Buffalo chomicki@cse.buffalo.edu

Schema Exchange: a Template-based Approach to Data and Metadata Translation

Schema Exchange: a Template-based Approach to Data and Metadata Translation Paolo Papotti and Riccardo Torlone Università Roma Tre {papotti,torlone}@dia.uniroma3.it Abstract. In this paper we study the

Validity-Sensitive Querying of XML Databases

Validity-Sensitive Querying of XML Databases Slawomir Staworko and Jan Chomicki University at Buffalo, {staworko,chomicki}@cse.buffalo.edu Abstract. We consider the problem of querying XML documents which

A Retrospective on Datalog 1.0

A Retrospective on Datalog 1.0 Phokion G. Kolaitis UC Santa Cruz and IBM Research - Almaden Datalog 2.0 Vienna, September 2012 2 / 79 A Brief History of Datalog In the beginning of time, there was E.F.

Logic and Databases. Lecture 4 - Part 2. Phokion G. Kolaitis. UC Santa Cruz & IBM Research - Almaden

Logic and Databases Phokion G. Kolaitis UC Santa Cruz & IBM Research - Almaden Lecture 4 - Part 2 2 / 17 Alternative Semantics of Queries Bag Semantics We focused on the containment problem for conjunctive

Database Theory VU 181.140, SS 2011. Codd's Theorem. Reinhard Pichler

Database Theory VU 181.140, SS 2011 3. Codd's Theorem Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 29 March, 2011 Pichler 29 March,

The Relational Model

The Relational Model David Toman School of Computer Science University of Waterloo Introduction to Databases CS348 David Toman (University of Waterloo) The Relational Model 1 / 28 The Relational Model

Bibliographic citation

Bibliographic citation Andrea Calì, Georg Gottlob, Andreas Pieris: Tractable Query Answering over Conceptual Schemata. In Alberto H. F. Laender, Silvana Castano, Umeshwar Dayal, Fabio Casati, Jos Palazzo

On Reconciling Data Exchange, Data Integration, and Peer Data Management

On Reconciling Data Exchange, Data Integration, and Peer Data Management Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati Dipartimento di Informatica e Sistemistica Sapienza

Uncertain Data Models

Uncertain Data Models Christoph Koch EPFL Dan Olteanu University of Oxford SYNONYMS data models for incomplete information, probabilistic data models, representation systems DEFINITION An uncertain data

On the Role of Integrity Constraints in Data Integration

On the Role of Integrity Constraints in Data Integration Andrea Calì, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza

Data Exchange: Semantics and Query Answering

Data Exchange: Semantics and Query Answering Ronald Fagin Phokion G. Kolaitis Renée J. Miller Lucian Popa IBM Almaden Research Center fagin,lucian @almaden.ibm.com University of California at Santa Cruz

Composing Schema Mapping

Composing Schema Mapping An Overview Phokion G. Kolaitis UC Santa Cruz & IBM Research Almaden Joint work with R. Fagin, L. Popa, and W.C. Tan 1 Data Interoperability Data may reside at several different

Scalable Data Exchange with Functional Dependencies

Scalable Data Exchange with Functional Dependencies Bruno Marnette 1, 2 Giansalvatore Mecca 3 Paolo Papotti 4 1: Oxford University Computing Laboratory Oxford, UK 2: INRIA Saclay, Webdam Orsay, France

XXXII Conference on Very Large Data Bases VLDB 2006 Seoul, Korea, 15th September 2006

Andrea Calì Faculty of Computer Science Free University of Bolzano State University of New York at Stony Brook XXXII Conference on Very Large Data Bases VLDB 2006 Seoul, Korea, 15th September 2006 F-Logic

Database Theory VU 181.140, SS 2011. Introduction: Relational Query Languages. Reinhard Pichler

Database Theory VU 181.140, SS 2011 1. Introduction: Relational Query Languages Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 8 March,

Structural Characterizations of Schema-Mapping Languages

Structural Characterizations of Schema-Mapping Languages Balder ten Cate University of Amsterdam and UC Santa Cruz balder.tencate@uva.nl Phokion G. Kolaitis UC Santa Cruz and IBM Almaden kolaitis@cs.ucsc.edu

Data Integration 1. Giuseppe De Giacomo. Dipartimento di Informatica e Sistemistica Antonio Ruberti Università di Roma La Sapienza

Data Integration 1 Giuseppe De Giacomo Dipartimento di Informatica e Sistemistica Antonio Ruberti Università di Roma La Sapienza View-based query processing Diego Calvanese, Giuseppe De Giacomo, Georg

On the Data Complexity of Consistent Query Answering over Graph Databases

On the Data Complexity of Consistent Query Answering over Graph Databases Pablo Barceló and Gaëlle Fontaine Department of Computer Science University of Chile pbarcelo@dcc.uchile.cl, gaelle@dcc.uchile.cl

Data Quality Problems beyond Consistency and Deduplication

Data Quality Problems beyond Consistency and Deduplication Wenfei Fan Floris Geerts Shuai Ma Nan Tang Wenyuan Yu University of Edinburgh {wenfei@inf., fgeerts@inf., sma1@inf., ntang@inf., wenyuan.yu@}ed.ac.uk

38050 Povo Trento (Italy), Via Sommarive 14 THE CODB ROBUST PEER-TO-PEER DATABASE SYSTEM

UNIVERSITY OF TRENTO DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY 38050 Povo Trento (Italy), Via Sommarive 14 http://www.dit.unitn.it THE CODB ROBUST PEER-TO-PEER DATABASE SYSTEM Enrico Franconi,

Kanata: Adaptation and Evolution in Data Sharing Systems

Kanata: Adaptation and Evolution in Data Sharing Systems Periklis Andritsos Ariel Fuxman Anastasios Kementsietsidis Renée J. Miller Yannis Velegrakis Department of Computer Science University of Toronto

Foundations of Schema Mapping Management

Foundations of Schema Mapping Management Marcelo Arenas Jorge Pérez Juan Reutter Cristian Riveros PUC Chile PUC Chile University of Edinburgh Oxford University marenas@ing.puc.cl jperez@ing.puc.cl juan.reutter@ed.ac.uk

Finding Equivalent Rewritings in the Presence of Arithmetic Comparisons

Finding Equivalent Rewritings in the Presence of Arithmetic Comparisons Foto Afrati 1, Rada Chirkova 2, Manolis Gergatsoulis 3, and Vassia Pavlaki 1 1 Department of Electrical and Computing Engineering,

Relative Information Completeness

Relative Information Completeness Abstract Wenfei Fan University of Edinburgh & Bell Labs wenfei@inf.ed.ac.uk The paper investigates the question of whether a partially closed database has complete information

Optimized encodings for Consistent Query Answering via ASP from different perspectives

Optimized encodings for Consistent Query Answering via ASP from different perspectives Marco Manna, Francesco Ricca, and Giorgio Terracina Department of Mathematics, University of Calabria, Italy {manna,ricca,terracina}@mat.unical.it

Database Theory VU 181.140, SS 2018. Introduction: Relational Query Languages. Reinhard Pichler

Database Theory VU 181.140, SS 2018 1. Introduction: Relational Query Languages Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 6 March,

DBAI-TR UMAP: A Universal Layer for Schema Mapping Languages

DBAI-TR-2012-76 UMAP: A Universal Layer for Schema Mapping Languages Florin Chertes and Ingo Feinerer Technische Universität Wien, Vienna, Austria Institut für Informationssysteme FlorinChertes@acm.org

ANDREAS PIERIS JOURNAL PAPERS

ANDREAS PIERIS School of Informatics, University of Edinburgh Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK apieris@inf.ed.ac.uk PUBLICATIONS (authors in alphabetical order) JOURNAL

Conditional Dependencies: A Principled Approach to Improving Data Quality

Conditional Dependencies: A Principled Approach to Improving Data Quality Wenfei Fan 1, Floris Geerts 2, and Xibei Jia 2, 1 University of Edinburgh and Bell Laboratories 2 University of Edinburgh Abstract.

Ontologies and Databases

Ontologies and Databases Diego Calvanese KRDB Research Centre Free University of Bozen-Bolzano Reasoning Web Summer School 2009 September 3-4, 2009 Bressanone, Italy Overview of the Tutorial 1 Introduction

From Database Repair Programs to Consistent Query Answering in Classical Logic (extended abstract)

From Database Repair Programs to Consistent Query Answering in Classical Logic (extended abstract) Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada. bertossi@scs.carleton.ca

Relational Databases

Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

Data integration supports seamless access to autonomous, heterogeneous information

Using Constraints to Describe Source Contents in Data Integration Systems Chen Li, University of California, Irvine Data integration supports seamless access to autonomous, heterogeneous information sources

Conceptual Design. The Entity-Relationship (ER) Model

Conceptual Design. The Entity-Relationship (ER) Model CS430/630 Lecture 12 Slides based on Database Management Systems 3rd ed, Ramakrishnan and Gehrke Database Design Overview Conceptual design The Entity-Relationship

Data Cleaning and Query Answering with Matching Dependencies and Matching Functions

Data Cleaning and Query Answering with Matching Dependencies and Matching Functions Leopoldo Bertossi Carleton University Ottawa, Canada bertossi@scs.carleton.ca Solmaz Kolahi University of British Columbia

Handling Inconsistency through Effective Measurement of Referential

Handling Inconsistency through Effective Measurement of Referential Dependencies in Databases 1 Abdollah Yousefzadeh, 2 Hrudaya Ku Tripathy 1 School of Computing and Technology Asia Pacific University

Increasing the Expressivity of Conditional Functional Dependencies without Extra Complexity

Increasing the Expressivity of Conditional Functional Dependencies without Extra Complexity Loreto Bravo 1, Wenfei Fan 1,2, Floris Geerts 1, Shuai Ma 1 1 School of Informatics, University of Edinburgh,

Semantic Errors in Database Queries

Semantic Errors in Database Queries 1 Semantic Errors in Database Queries Stefan Brass TU Clausthal, Germany From April: University of Halle, Germany Semantic Errors in Database Queries 2 Classification

Data Integration: Schema Mapping

Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Data

Consensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande University of Maryland, College Park, USA

Consensus Answers for Queries over Probabilistic Databases Jian Li and Amol Deshpande University of Maryland, College Park, USA Probabilistic Databases Motivation: Increasing amounts of uncertain data

Data Integration: Schema Mapping

Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Jan Chomicki

Designing Views to Answer Queries under Set, Bag, and BagSet Semantics

Designing Views to Answer Queries under Set, Bag, and BagSet Semantics Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

PCP and Hardness of Approximation

PCP and Hardness of Approximation January 30, 2009 Our goal herein is to define and prove basic concepts regarding hardness of approximation. We will state but obviously not prove a PCP theorem as a starting

Multisets and Duplicates. SQL: Duplicate Semantics and NULL Values. How does this impact Queries?

Multisets and Duplicates SQL: Duplicate Semantics and NULL Values Fall 2015 SQL uses a MULTISET/BAG semantics rather than a SET semantics: SQL tables are multisets of tuples originally for efficiency reasons

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data?

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data? Diego Calvanese University of Rome La Sapienza joint work with G. De Giacomo, M. Lenzerini, M.Y. Vardi

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability

Function Symbols in Tuple-Generating Dependencies: Expressive Power and Computability Georg Gottlob 1,2, Reinhard Pichler 1, and Emanuel Sallinger 2 1 TU Wien and 2 University of Oxford Tuple-generating

following syntax: $R ::= \top_n \mid P \mid (\$i/n : C) \mid \neg R \mid R_1 \sqcap R_2$, $C ::= \top_1 \mid A \mid \neg C \mid C_1 \sqcap C_2 \mid \exists[\$i]R \mid (\leq k\,[\$i]R)$, where i and j denote components of r

Answering Queries Using Views in Description Logics Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria 113,

Advanced Query Processing

Advanced Query Processing CSEP544 Nov. 2015 CSEP544 Optimal Sequential Algorithms Nov. 2015 1 / 28 Lecture 9: Advanced Query Processing Optimal Sequential Algorithms. Semijoin Reduction Optimal Parallel
