Propagating Dependencies under Schema Mappings A Graph-based Approach


Qing Wang
Research School of Computer Science, Australian National University, Canberra ACT 0200, Australia
qing.wang@anu.edu.au

Xi Wen
Department of Computer Science, Nanchang University, Nanchang City, China
wenxi@ncu.edu.cn

Some of the work reported in this paper was undertaken when the second author was visiting the Research School of Computer Science, Australian National University.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
IDEAS '15, July 13-15, 2015, Yokohama, Japan. Copyright 2015 ACM.

ABSTRACT

Schema mapping plays an important role in many database-related transformation tasks, such as data exchange, data integration and data migration. In this paper, we study the dependency propagation problem in the context of schema mappings. This allows us to understand and discover logical consequences among source constraints, target constraints and mapping constraints of a schema mapping. In order to precisely characterize the relationships between source and target schemas, we consider mapping constraints as being bipartite TGDs, i.e., a class of tuple-generating dependencies (TGDs) that include both source-to-target dependencies and target-to-source dependencies. We then develop propagation graphs to represent the relationships among the attributes of different relations and, based on such propagation graphs, we propose algorithms to propagate inclusion and functional dependencies between source and target schemas. We have also designed a schema mapping reasoning tool to implement and evaluate our proposed approach.

Categories and Subject Descriptors
H.2.1 [Information Systems]: Database Management, Logical Design

Keywords
Schema Mappings, Dependency Propagation, Data Integration, Data Dependencies

1. INTRODUCTION

A schema mapping is concerned with specifying relationships between the elements of a source schema and a target schema. It plays an important role in many database-related transformation tasks, such as data exchange [12, 25], data integration [26] and data migration [32, 33]. As the relationships between two schemas are often quite complicated, not just a one-to-one correspondence at the schema level, specifying such relationships is by no means an easy task.

Generally, two lines of research exist in the area of designing schema mappings: one is to generate schema mappings from a visual specification provided by users, and the other is to derive schema mappings based on data examples. The former has been studied for many years [6, 12, 22, 25], while the latter has attracted considerable interest in recent years [1, 2, 21, 30]. Nevertheless, these works also have some limitations. For instance, a visual specification is often ambiguous, making it difficult to generate the desired schema mapping. Data examples may not always be available or, even if available, could be biased, leading to inaccurately derived schema mappings. To remedy these deficiencies, existing approaches either require a manual process of tuning schema mappings, which is often laborious and error-prone, or demand more data examples for improving accuracy.
Despite these attempts, the resulting design quality of schema mappings is still far from ideal. In this paper, we aim to develop an approach that helps us understand how well a schema mapping is designed, including answering the following questions of interest:

- Can we ensure that certain properties of a source database are preserved in the desired target database through the design of a schema mapping?
- Can we determine whether or not a target constraint can be enforced on a target database before the target database is transformed from a given source database?
- If some target constraints cannot be enforced, can we efficiently identify which data in the source database need to be cleansed, or determine whether the schema mapping and target constraints need to be re-designed?

In many real-life applications, implementing a schema mapping to materialize a target database is an expensive undertaking, especially when the source database holds a large amount of data. Therefore, it is crucial to check in advance, if possible, whether a schema mapping is designed meaningfully and effectively, before an implementation takes place.

Related work. Research on schema mappings has received a great deal of attention over the past decades [3, 6, 11, 12, 25, 31]. In relation to the universal target solution and query answering problems, high-level specification languages used for schema mappings have been investigated [3, 6]. Viewing schema mappings as metadata, the composition and inverse operations of schema mappings have been developed in [4, 11, 14, 27, 29]. Early works on generating schema mappings from a visual specification have led to the development of a good number of schema mapping design systems such as Clio [22], HePToX [7], and Altova MapForce. Recent works have focused more on deriving schema mappings from data examples, such as Eirene [2] and Muse [1]. Several works have also studied schema mappings in terms of optimization [13, 20], simplification [8], debugging [9] and learning [30]. Our work in this paper is to complement, not replace, these existing techniques by providing an approach for efficiently analyzing the design of a schema mapping through dependency propagation from target to source, or from source to target, under the schema mapping.

Previously, several works have proposed graphical approaches to represent TGDs and EGDs [10, 12, 28]. In [10] the authors studied the implication problem of functional and inclusion dependencies using a directed graph in which each node represents a relation and each edge represents an inclusion dependency between two relations. A set of inclusion dependencies is acyclic if such a graph has no cycle. The authors of [12] proposed a graphical approach to identify a class of TGDs that can guarantee the termination of a chase procedure, the so-called weakly acyclic sets of TGDs. Our approach in this paper generalizes the graphical representations of these works by associating a relation schema with each vertex, and labelling each edge with a function between attributes of two relation schemas.

Dependency propagation has previously been studied in the context of views by several works [18, 23, 24].
Such views may be defined over a database using different fragments of relational algebra, and the dependencies considered in these works were primarily (conditional) functional dependencies [18, 23] or join dependencies [24]. In particular, the authors of [18] generalised the results on propagating functional dependencies to conditional functional dependencies [17]. Different from dependency propagation on views, which are unidirectional mappings, we study dependency propagation between two schemas that may have bidirectional mappings, and investigate the dependency propagation problem for improving the design quality of schema mappings. To the best of our knowledge, this area has not yet been explored in the literature.

Contributions. This paper makes the following contributions:

- We study schema mappings by allowing mapping constraints to be bipartite TGDs over source and target schemas, including both source-to-target and target-to-source TGDs. This enables us to accurately represent the relationships between two database schemas.
- We propose the notion of propagation graph, and use it as an effective model to visualize the relationship between source and target schemas. We can also navigate through a propagation graph to analyze logical connections among source, target and mapping constraints when designing a schema mapping.
- We investigate the dependency propagation problem in the context of schema mappings. Based on propagation graphs, we develop algorithms to propagate dependencies between source and target schemas under a schema mapping. It is well known that the implication problem for TGDs is undecidable [5]. Our graph-based approach provides an approximate but efficient solution to this problem.
- We have developed a bipartite schema mapping tool to visualize propagation graphs and, on top of propagation graphs, to incorporate our propagation algorithms for inclusion and functional dependencies. We have applied our schema mapping tool to two schema mapping data sets to evaluate its usability in real-world applications.

Outline. The remainder of the paper is structured as follows. Section 2 provides an example to illustrate why dependency propagation is needed in schema mappings. Section 3 presents the basic definitions related to schema mappings and dependency propagation. In Section 4 we describe a graphical model that captures the inter-relationships among TGDs, and discuss the algorithms for propagating inclusion and functional dependencies. The experimental study is presented in Section 5. We conclude the paper in Section 6 with an outlook on future work.

2. A MOTIVATING EXAMPLE

Generally, a schema mapping over a source schema S and a target schema T is associated with three kinds of constraints: source constraints over S, target constraints over T, and mapping constraints over S and T. Source and target constraints govern the integrity of data stored in source and target databases, respectively, while mapping constraints control the transformations between them. Given such a schema mapping, a natural question is: How can we discover logical consequences among its source, target and mapping constraints? An answer to this question would give us a conceptual view of how well a schema mapping is designed, i.e., how much of the semantics specified by the source schema is preserved by the target schema or, conversely, how much of the semantics specified by the target schema is captured by the source schema.

The following example illustrates that, in real-world applications, finding out how well a schema mapping is designed often requires comparing the source and target constraints with respect to a given set of mapping constraints.

Example 2.1. Suppose that we have the following schema mapping over a source schema S and a target schema T in an App application.
(1) S contains three relation schemas: Rent(id, name, address), Rent (no, address, rent), All(name, dob, gender, cid), and source constraints: Σ s =.

3 Rent id name address c1 Tim Jenkin 5 Jicket St, Dunedin c2 Linda Lee 36 Novar St, Dunedin c3 Mike Carl 2 Manor St, Dunedin Rent no address rent 1 5 Jicket St, Dunedin Jicket St, Dunedin Manor St, Dunedin 450 All name dob gender cid Linda Lee 30/Mar/1990 f c2 Mike Carl 15/Jun/1884 m c3 Peter Wong 01/Jan/1880 m c4 Figure 1: A source instance I over the source schema S id c1 c2 c3 name Tim Jenkin Linda Lee Mike Carl no address 1 5 Jicket St, Dunedin 2 2 Manor St, Dunedin id no rent c c Figure 2: A target instance over the target schema T (C1) x, y, z.(rent(x, y, z) (x, y)); (C2) x, y, z, x, z.(rent(x, y, z) Rent(x, z, z ) (x, x, z )); (C3) x, y, z.(rent(x, y, z) x.(x, y) (x, x, z)). (C4) x, y, z.((x, y, z) z.rent(y, z, z)); (C5) x, y, z.((x, y, z) y, z.rent(x, y, z )); (C6) (x, y, z) x, y, z.all(x, y, z, x). Figure 3: Mapping constraints C1 C6 (2) T also contains three relation schemas: (no, address), (id, name), (id, no, rent), and target constraints: Σ t = { : no rent, [id] [id], [no] [no]}, where : no rent is a functional dependency defined on, and the others are inclusion dependencies. (3) Mapping constraints between S and T are presented in Figure 3, which contain: (i) C1 C3, which specify how source instances are transformed into target instances, and (ii) C4-C6, which specify how target instances relate to source instances. For example, C2 states that whenever a tuple (i, n, a) in Rent and a tuple (o, a, r) in Rent coincide on the attribute address, then there must be a tuple (i, o, r) in ; (C3) states that each tuple (o, a, r) in Rent leads to a tuple (o, a) in and a tuple (i, o, r) in where i is a value unknown. In the presence of C1 C6, it would be desired to find a set of constraints Σ s (resp. Σ t) that can be automatically propagated from the target schema to the source schema (resp. the source schema to the target schema), as shown below: Σ s = {Rent : no rent}, Σ t = {[id] [id], [no] [no]}. 
Then, by comparing Σ′s (i.e., propagated source constraints under the given schema mapping) and Σs (i.e., original source constraints), we would know that some source instance might violate the constraint Rent : no → rent, e.g. the source instance in Figure 1. It means that either data in Rent need to be cleansed, or this constraint needs to be reconsidered. Suppose that we clean up Rent by removing the first tuple. Then we would obtain the target instance depicted in Figure 2. Similarly, by comparing Σ′t (i.e., propagated target constraints under the given schema mapping) and Σt (i.e., original target constraints), we would know that the two inclusion dependencies in Σt hold on every target instance under this schema mapping. We will discuss how Σ′s and Σ′t can be obtained through dependency propagation under the given schema mapping in Section 4.

3. PRELIMINARIES

A (relational) database schema S consists of a finite, non-empty set of relation schemas. Each relation schema R has a fixed arity, and a finite set attr(R) of attribute names. A relational atom is an expression of the form R(t1, ..., tn) for n-ary R ∈ S, and an equality atom is an expression of the form t1 = t2, where t1, ..., tn are variables or constants.

A constraint (so-called embedded dependency [16]) σ over S is an expression of the form ∀x̄, ȳ.(ϕ(x̄, ȳ) → ∃z̄.ψ(x̄, z̄)), where ϕ and ψ are conjunctions of atoms over S, and x̄, ȳ and z̄ are mutually disjoint sequences of variables. If only relational atoms occur in ψ, σ is called a tuple-generating dependency (TGD). If only equality atoms occur in ψ, σ is called an equality-generating dependency (EGD). For brevity, the universal quantification is often omitted in σ. We call ϕ the premise of σ, i.e. Pre(σ) = ϕ(x̄, ȳ), and ψ the conclusion of σ, i.e. Con(σ) = ψ(x̄, z̄).

Figure 4: A source instance I

Figure 5: Three possible target instances J1, J2 and J3

A functional dependency (FD) defined on a relation schema R is an EGD expressed as R : X → Y for X, Y ⊆ attr(R) [16] (i.e., whenever two tuples of R agree on attributes X, then they must also agree on attributes Y). An inclusion dependency (IND) defined on two relation schemas R1 and R2 is a TGD expressed as R1[X] ⊆ R2[Y] for X ⊆ attr(R1), Y ⊆ attr(R2) and |X| = |Y| (i.e., whenever a tuple t1 occurs in R1, there must exist a tuple t2 in R2 such that the values in attributes X of t1 are the same as the values in attributes Y of t2).

A (relational) database instance I over S assigns to each relation schema R ∈ S a finite relation I(R). As a convention, we use inst(S) to refer to the set of all database instances over S, and I ⊨ σ to denote that a database instance I ∈ inst(S) satisfies σ. We have I ⊨ Σ iff I ⊨ σ for every σ ∈ Σ.

A schema mapping is a triple M = (S, T, Σm) consisting of a source schema S, a target schema T and a set Σm of mapping constraints specified by some logical formalism over S and T. Instances of S are called source instances and instances of T are called target instances. Similarly, constraints over S are called source constraints and constraints over T are called target constraints. In previous studies of data integration and data exchange, mapping constraints are typically formulated as source-to-target TGDs [3, 12, 31] or certain subclasses of source-to-target TGDs such as LAV, GAV and GLAV [26]. A source-to-target TGD is a TGD in which ϕ is a conjunction of atoms over S and ψ is a conjunction of atoms over T.
Conversely, a target-to-source TGD is a TGD in which ϕ is a conjunction of atoms over T and ψ is a conjunction of atoms over S. Target-to-source TGDs are often used in inverting schema mappings, such as Fagin's S-inverse [11] and quasi-inverse [15].

In this paper, we consider source and target constraints as the union of a set of INDs and a set of FDs, and mapping constraints as including both source-to-target and target-to-source TGDs. To express this formally, we say that each mapping constraint is a bipartite TGD over S and T, which is either a source-to-target TGD or a target-to-source TGD over S and T. Let (I, J) ⊨ Σm denote that a source instance I ∈ inst(S) and a target instance J ∈ inst(T) satisfy all mapping constraints in Σm. A model transformation under M = (S, T, Σm) translates a given source instance I ∈ inst(S) into a target instance J ∈ inst(T) such that (I, J) satisfies every mapping constraint in Σm, and inst(M) = {(I, J) | (I, J) ⊨ Σm}.

The co-existence of target-to-source and source-to-target TGDs in Σm gives us the flexibility to precisely capture the known relationship between source and target instances. Target-to-source TGDs can constrain target instances by working simultaneously with source-to-target TGDs specified in the same Σm, as illustrated by the following example.

Example 3.1. Consider a source schema S = {R} and a target schema T = {P, Q}. Figure 4 depicts a source instance I over S and Figure 5 depicts three possible target instances J1, J2 and J3 over T.

(1) If we have a schema mapping M1 = (S, T, Σ1) where Σ1 contains R(x, y, z) → P(x, y) ∧ Q(y, z), then (I, Ji) ∈ inst(M1) for i = 1, 2, 3.

(2) If we have a schema mapping M2 = (S, T, Σ2) where Σ2 contains R(x, y, z) → P(x, y) ∧ Q(y, z); P(x, y) ∧ Q(y, z) → R(x, y, z), then (I, Ji) ∈ inst(M2) for i = 2, 3, but (I, J1) ∉ inst(M2).

(3) If we have a schema mapping M3 = (S, T, Σ3) where Σ3 contains R(x, y, z) → P(x, y) ∧ Q(y, z); P(x, y) → ∃z.R(x, y, z); Q(y, z) → ∃x.R(x, y, z), then (I, J3) ∈ inst(M3), but (I, Ji) ∉ inst(M3) for i = 1, 2.

4. GRAPH-BASED DEPENDENCY PROPAGATION

In this section we develop a graphical model for describing the inter-relationships among TGDs. Then, based on this graphical model, we propose algorithms for propagating inclusion and functional dependencies across two schemas under a schema mapping.

4.1 Propagation Graphs

Formally, a propagation graph G = (V, E) consists of a set V of vertices and a set E of edges, where each vertex R ∈ V is a relation schema, and each edge R -f→ R′ ∈ E is directed and labelled by a function f : attr(R) → attr(R′). Given a set Σ of TGDs, the propagation graph of Σ can be constructed by applying the following rules for each σ ≡ ϕ(x̄, ȳ) → ∃z̄.ψ(x̄, z̄) in Σ:

(1) Add an edge R -f→ R′ for each relational atom R(x̄′, ȳ′) in Pre(σ) and each relational atom R′(x̄″, z̄′) in Con(σ), where x̄′, x̄″ ⊆ x̄, ȳ′ ⊆ ȳ, and z̄′ ⊆ z̄, and the edge is labelled by f : attr(R) → attr(R′) such that t and f(t) refer to the same variable in x̄′ ∩ x̄″.

(2) If Pre(σ) contains more than one relational atom, all the edges yielded by σ are of type approximate; otherwise, the edges are of type exact. An approximate edge is removed from the graph when there exists an exact edge with the same start vertex, end vertex and label.

Intuitively, an edge R -f→ R′ represents that the existence of values in some attributes of R may require the existence of values in some attributes of R′, where the label f specifies which attributes of R and R′ are related. We distinguish two types of edges, exact and approximate. For an approximate edge, the existence of values in some attributes of R does not always require the existence of the values in the corresponding attributes of R′. For an exact edge, in contrast, the existence of values in some attributes of R implies that the values must exist in the corresponding attributes of R′.

Definition 4.1. Given a propagation graph G, a propagation path in G is a sequence of edges R1 -f1→ R2, ..., Rn−1 -fn−1→ Rn such that the composition of f1, ..., fn−1, denoted as f = fn−1 ∘ ... ∘ f1, is a function that maps a non-empty subset of the attributes of R1 into the attributes of Rn. We say that such a path is labelled by f.

Note that, although every propagation path is a path (i.e., a sequence of edges) in a propagation graph, not every path in a propagation graph is a propagation path. For simplicity, we consider the attributes of an n-ary relation schema as being ordered, each having a distinct position between 1 and n. Consequently, every attribute of a relation schema R can simply be represented by its position.

Figure 6: A propagation graph. Labels of edges: f1: 1→1, 2→2; f2,1: 1→1; f2,2: 1→2, 3→3; f3,1: 1→1, 2→2; f3,2: 1→2, 3→3; f4: 1→1; f5: 2→1, 3→3; f6: 1→4; g1: 1→1; g2: 2→1.

Example 4.1. Consider the schema mapping discussed in Example 2.1. We may have the propagation graph depicted in Figure 6, where C1 yields the edge f1; C2 yields the edges f2,1 and f2,2; C3 yields the edges f3,1 and f3,2; C4 yields the edge f4; C5 yields the edge f5; C6 yields the edge f6; and the two INDs in Σt yield the edges g1 and g2. In Figure 6, only f2,1 is an approximate edge, represented by a dashed line. The other edges are exact. The edge f2,2 is removed from the propagation graph because it is approximate and identical to the exact edge f3,2. The edge f4 followed by the edge f1 is a propagation path with the label f = f1 ∘ f4 such that f(1) = 1. However, the edge f3,2 followed by the edge f6 is not a propagation path because f6 ∘ f3,2 does not map any attribute of Rent to the attributes of All.

In accordance with the rules for constructing propagation graphs, we have the following theorem.

Theorem 4.1. Let Σ be a set of TGDs. The propagation graph of Σ can be constructed in linear time in the size of Σ, i.e., the number of constraints in Σ.

4.2 Propagating Algorithms

In this section, we present our algorithms for propagating inclusion and functional dependencies across a schema mapping between source and target schemas. The algorithm for propagating inclusion dependencies is discussed in Section 4.2.1 and the algorithm for propagating functional dependencies is discussed in Section 4.2.2. Let M = (S, T, Σm) be a schema mapping, which is associated with a set Σs of source constraints and a set Σt of target constraints.
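The construction rules of Section 4.1 and the path labels of Definition 4.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the BSM tool's code: the names `Atom`, `TGD`, `Edge`, `edges_of`, `propagation_graph` and `compose` are hypothetical, attributes are identified by their 1-based positions, and atoms that share no variable are assumed to yield no edge. The TGDs reuse the relations R, P and Q of Example 3.1.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    rel: str        # relation name
    args: tuple     # variable names, one per attribute position

class TGD(NamedTuple):
    premise: tuple      # atoms of Pre(sigma)
    conclusion: tuple   # atoms of Con(sigma)

class Edge(NamedTuple):
    src: str
    dst: str
    label: tuple    # sorted ((i, j), ...): position i of src maps to j of dst
    exact: bool

def edges_of(tgd):
    """Rule (1), plus the edge typing of rule (2), for a single TGD."""
    exact = len(tgd.premise) == 1
    edges = []
    for p in tgd.premise:
        for c in tgd.conclusion:
            # Relate positions that hold the same variable (1-based).
            label = tuple(sorted(
                (i, c.args.index(v) + 1)
                for i, v in enumerate(p.args, start=1) if v in c.args))
            if label:   # assumption: atoms sharing no variable give no edge
                edges.append(Edge(p.rel, c.rel, label, exact))
    return edges

def propagation_graph(tgds):
    edges = {e for t in tgds for e in edges_of(t)}
    exact = {(e.src, e.dst, e.label) for e in edges if e.exact}
    # Rule (2): drop an approximate edge duplicated by an exact one.
    return {e for e in edges
            if e.exact or (e.src, e.dst, e.label) not in exact}

def compose(f, g):
    """Label of a two-edge path: apply f first, then g."""
    g = dict(g)
    return tuple(sorted((i, g[j]) for i, j in f if j in g))

# A source-to-target TGD and a target-to-source TGD over R, P and Q.
s2t = TGD((Atom("R", ("x", "y", "z")),),
          (Atom("P", ("x", "y")), Atom("Q", ("y", "z"))))
t2s = TGD((Atom("P", ("x", "y")), Atom("Q", ("y", "z"))),
          (Atom("R", ("x", "y", "z")),))
graph = propagation_graph([s2t, t2s])
```

Here the two edges leaving R are exact, while the edges from P and Q into R are approximate because the premise of `t2s` has two atoms. A path is a propagation path exactly when the composed label is non-empty, which is what `compose` makes easy to check.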

Input: a schema mapping M = (S, T, Σm) and a set Σs of source constraints
Output: a set Σ′t of INDs over T
Steps:
1. Initialize Σ′t := ∅;
2. Construct the propagation graph G of Σm ∪ Σs;
3. Repeat the following for each propagation path between two R1, R2 ∈ T in G with the label f: if all edges in the propagation path are exact, then Σ′t := Σ′t ∪ {(R1[X] ⊆ R2[Y], exact)}; otherwise Σ′t := Σ′t ∪ {(R1[X] ⊆ R2[Y], approximate)}, where f maps the attributes X of R1 to the attributes Y of R2;
4. If {(R1[X] ⊆ R2[Y], exact), (R1[X] ⊆ R2[Y], approximate)} ⊆ Σ′t, then Σ′t := Σ′t − {(R1[X] ⊆ R2[Y], approximate)};
5. Return Σ′t.

Figure 7: Algorithm for propagating inclusion dependencies from S to T

Definition 4.2. A constraint σ over T is said to be propagated from S to T under M if, for every model transformation (I, J) of M, whenever I satisfies Σs, J must satisfy σ.

Analogously, we can define constraints that are propagated from T to S under M. Since the same principles apply for propagating dependencies in either direction, we will only discuss the algorithms for propagating dependencies from S to T to avoid repetition.

4.2.1 Inclusion Dependencies

Our algorithm for propagating inclusion dependencies is built upon the notion of propagation graph. Figure 7 depicts the algorithm, which takes a schema mapping M and a set Σs of source constraints as input, constructs the propagation graph of Σm ∪ Σs, and then generates a set of propagated inclusion dependencies over T. The key idea is that each propagation path from R1 to R2 in the target schema corresponds to an inclusion dependency R1[X] ⊆ R2[Y] between these two relation schemas. Note that such an inclusion dependency may be associated with more than one propagation path from R1 to R2. Depending on the types of the edges occurring in these propagation paths, an inclusion dependency is either exact or approximate.
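The steps of Figure 7 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the names `Edge`, `compose` and `propagate_inds` are hypothetical, the propagation graph is passed in as a list of already constructed edges, and only paths that do not revisit a relation are enumerated so that the search terminates on cyclic graphs.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    src: str
    dst: str
    label: tuple    # ((i, j), ...): position i of src maps to j of dst
    exact: bool

def compose(f, g):
    """Apply label f first, then label g."""
    g = dict(g)
    return tuple(sorted((i, g[j]) for i, j in f if j in g))

def propagate_inds(edges, targets):
    """Map each IND (R1, X, R2, Y) over the target relations to True
    (exact) or False (approximate)."""
    found = {}

    def walk(start, node, label, exact, seen):
        for e in edges:
            if e.src != node or e.dst in seen:
                continue
            new_label = e.label if label is None else compose(label, e.label)
            if not new_label:
                continue            # not a propagation path (Definition 4.1)
            new_exact = exact and e.exact
            if e.dst in targets:
                xs = tuple(i for i, _ in new_label)
                ys = tuple(j for _, j in new_label)
                key = (start, xs, e.dst, ys)
                # Step 4: an exact derivation overrides an approximate one.
                found[key] = found.get(key, False) or new_exact
            walk(start, e.dst, new_label, new_exact, seen | {e.dst})

    for r in targets:
        walk(r, r, None, True, {r})
    return found

# Edges over R (source) and P, Q (target); P -> R is marked approximate
# purely for illustration.
edges = [
    Edge("R", "P", ((1, 1), (2, 2)), True),
    Edge("R", "Q", ((2, 1), (3, 2)), True),
    Edge("P", "R", ((1, 1), (2, 2)), False),
    Edge("Q", "R", ((1, 2), (2, 3)), True),
]
inds = propagate_inds(edges, {"P", "Q"})
```

With these edges, `inds` contains P[2] ⊆ Q[1] as approximate, since its only path uses the approximate edge, and Q[1] ⊆ P[2] as exact. Storing a single Boolean per IND folds step 4 into the bookkeeping, because an approximate entry is never kept once an exact path has been seen.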
More specifically, R1[X] ⊆ R2[Y] is exact if there exists at least one propagation path from R1 to R2 in which all the edges are exact, and R1[X] ⊆ R2[Y] is approximate if all the propagation paths from R1 to R2 contain at least one approximate edge. To resolve the ambiguities of approximate inclusion dependencies, we may involve human feedback using the following approach:

- For each approximate IND σ and each of its propagation paths, generate a pair ⟨σ, {ϕ1 → ψ1, ..., ϕn → ψn}⟩ where ϕi → ψi (i = 1, ..., n) correspond to the approximate edges occurring in the propagation path.
- Prompt the pair to users so that they can fine-tune the semantics of a schema mapping by confirming whether or not each ϕi → ψi (1 ≤ i ≤ n) holds.
- If ϕi → ψi does not hold, then mark the edge of ϕi → ψi as forbidden (i.e., remove the propagation path). If σ has no propagation path left, then remove σ from Σ′t.
- If ϕi → ψi holds, then add ϕi → ψi into Σm. If σ has one propagation path with all exact edges, change σ to be exact in Σ′t.

In doing so, Σ′t is refined to contain only exact inclusion dependencies, which are sound. Consequently, the specification of the schema mapping is fine-tuned to specify the desired transformations between source and target schemas.

Example 4.2. Consider the schema mapping depicted in Example 4.1 and the propagation graph depicted in Figure 6. Using the algorithm presented in Figure 7, we can obtain Σ′s = {(σ1, approximate)} and Σ′t = {(σ2, exact), (σ3, exact)}, where σ1, σ2 and σ3 and their propagation paths are depicted in Figure 8. Because σ1 is approximate, the pair ⟨σ1, {Rent(x, y, z) → ∃x′, z′.(x, x′, z′)}⟩ is prompted to the users, where the conclusion of the prompted TGD is an atom over the target relation (id, no, rent) and the TGD corresponds to the approximate edge f2,1. If it holds, then we have Σ′s = {(σ1, exact)}. If it does not hold, we have Σ′s = ∅.

4.2.2 Functional Dependencies

Now we discuss how to propagate functional dependencies under a schema mapping M = (S, T, Σm).
In the same spirit as computing a propagation cover of functional dependencies in the context of view dependencies [18, 19], the set of all functional dependencies implied by Σs needs to be calculated first. The following example illustrates this in more detail.

Example 4.3. Suppose that we have a relation schema R(A1, A2, A3) in S, a relation schema R′(B1, B2, B3) in T, Σs = {R : A1 → A2, R : A2 → A3} and Σm = {R(x, y, z) → ∃y′.R′(x, y′, z), R′(x, y, z) → ∃y′.R(x, y′, z)}. By Σm, we know that the values of the attributes B1 and B3 in R′ are identical to the values of the attributes A1 and A3 in R. If we propagate only the FDs in Σs onto R′, then R′ : B1 → B3 would be lost. However, if we propagate the FDs in Σ*s onto R′, then we would have R′ : B1 → B3 propagated from S to T by R : A1 → A3.

Figure 8: Three propagation paths in the propagation graph depicted in Figure 6: (a) the propagation path for σ1, (b) the propagation path for σ2, (c) the propagation path for σ3, where σ1: Rent(x, y, z) → ∃x′, y′, z′.All(x′, y′, z′, x); σ2: (x, y, z) → ∃y′.(x, y′), from the target relation (id, no, rent) to the target relation (id, name); σ3: (x, y, z) → ∃z′.(y, z′), from the target relation (id, no, rent) to the target relation (no, address).

Let Σ*s contain the set of all functional dependencies implied by the functional dependencies in Σs. Then we use the following rules to identify a set of FDs that are (possibly conditionally) propagated from S to T under M.

Push backward: For each R : X → Y ∈ Σ*s, if there exists a propagation path R′, ..., R labelled by f in the propagation graph of M, where R′ ∈ T, and each attribute of XY has a preimage in f such that X′ = {f⁻¹(x) | x ∈ X} and Y′ = {f⁻¹(y) | y ∈ Y}, then R′ : X′ → Y′ is propagated from S to T under M.

Push forward: For each R : X → Y ∈ Σ*s, if there exists a propagation path R, ..., R′ labelled by f in the propagation graph of M, where R′ ∈ T, and each attribute of XY has an image in f such that X′ = {f(x) | x ∈ X} and Y′ = {f(y) | y ∈ Y}, and there does not exist a propagation path R′, ..., R labelled by f⁻¹, then R′ : X′ → Y′ is propagated from S to T under M with the condition that this FD can only be applied to a subset of tuples in a relation I(R′), i.e., {t′ | t′ ∈ I(R′), t ∈ I(R), and t.z = t′.f(z)}, where z ∈ X ∪ Y, and t.z and t′.f(z) denote the values of the attribute z of t and the attribute f(z) of t′, respectively.

Example 4.4. Consider the FD no → rent in Σt, defined on the target relation (id, no, rent), and the propagation graph in Figure 6. A propagation path v1, v2 corresponds to the edge f3,2, where v1 = Rent, v2 is the target relation (id, no, rent), f3,2 is the label of the path, and the preimages of no and rent under f3,2 are still no and rent respectively. By the rule push backward, Rent : no → rent is propagated from T to S.
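The closure step of Example 4.3 and the push backward rule can be sketched as follows. This is a hedged illustration only: `closure` is the textbook attribute-closure procedure, while `push_backward` is a hypothetical helper that assumes the path label f is injective, identifies attributes by position, and enumerates every attribute subset in the image of f (exponential, so suitable only for small relations).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def push_backward(f, fds):
    """FDs pulled back along a path R' -> R labelled by f.

    f maps positions of R' to positions of R and is assumed injective;
    fds are the FDs that hold on R."""
    inv = {j: i for i, j in f.items()}      # f^-1 on the image of f
    image = sorted(inv)
    out = []
    for k in range(1, len(image) + 1):
        for xs in combinations(image, k):
            # Implied right-hand sides that also have a preimage under f.
            ys = (closure(xs, fds) & set(image)) - set(xs)
            if ys:
                out.append((frozenset(inv[x] for x in xs),
                            frozenset(inv[y] for y in ys)))
    return out

# Example 4.3: R : A1 -> A2 and R : A2 -> A3, with A1, A3 linked to B1, B3.
fds = [({1}, {2}), ({2}, {3})]
f = {1: 1, 3: 3}                            # B1 -> A1, B3 -> A3
propagated = push_backward(f, fds)
```

For Example 4.3 this yields the single FD R′ : B1 → B3, which would be missed if only the two FDs listed in Σs were considered, since neither of them has both sides in the image of f.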
Although there is a propagation path v2, v1 corresponding to the edge f5, by the rule push forward and the fact that f3,2⁻¹ = f5, no further FDs can be propagated from T to S based on this path.

5. EXPERIMENTS

In order to evaluate the work presented in this paper, we have developed a bipartite schema mapping (BSM) tool. In practice, this tool can help schema mapping designers in several respects: (1) visualizing propagation graphs for any given schema mappings, (2) assessing the design quality of schema mappings by propagating dependencies between source and target schemas, and (3) facilitating the data cleaning tasks of source instances in accordance with a given schema mapping and desired target constraints. Our BSM tool was written in Python.

We have conducted our experiments over two schema mapping data sets using this BSM tool. The first data set is App. The source and target schemas of App, the source and target constraints, and a schema mapping were described in Example 2.1. In Example 4.1, we have also discussed the propagation graph of the schema mapping described in Example 2.1. Based on this propagation graph, several propagation paths and the corresponding propagated dependencies are presented in Examples 4.2 and 4.4.

The second data set is Amalgam, which was taken from the web page of the Clio Project at the University of Toronto (footnote 2). Amalgam contains four individual database schemas S1, S2, S3 and S4 in the area of bibliographic databases. In the following, we illustrate the main features of the BSM tool based on Amalgam, and treat S1 and S2 as the source and target schemas, respectively.

2 miller/amalgam/

Figure 9: A schema mapping M S1S2 over Amalgam (S1 is the source schema and S2 is the target schema)

Figure 10: The propagation graph for visualizing the schema mapping M S1S2

Table 1: Some statistics about the source schema S1 and the target schema S2 in Amalgam (number of relations, INDs, FDs and MCs), where MCs refer to mapping constraints.

Figure 9 presents the main user interface built in the BSM tool for specifying the source schema, the target schema and a schema mapping between the source and target schemas of Amalgam. As can be seen from Figure 9, source and target constraints in the form of FDs and INDs can also be specified through the user interface. In our experiments, as described in Table 1, the source constraints contain 14 INDs and 23 FDs, and the target constraints are expected to have 26 INDs and 21 FDs. We manually set up 10 mapping constraints which transform the records in the relation schemas Article, ArticlePublished, TechReport, TechPublished and Author of the source schema (i.e. S1) into the corresponding ones in the relation schemas of the target schema (i.e. S2), including Authors, Allbibs, Titles, CitJournal, Institutions, Publisher, Journal, Volumes, Years, Months, Pages, Numbers, etc., and vice versa. Some of these mapping constraints are presented in Figure 12.

Figure 10 visualizes the propagation graph of the schema mapping M S1S2 depicted in Figure 9. Each white node represents a relation schema in the source schema, each black node represents a relation schema in the target schema, exact edges are black and approximate edges are red. From Figure 10, we can see that a schema mapping is often quite complicated in real-world applications, even when the source and target schemas are relatively small. There are many more edges between relation schemas in different schemas than between relation schemas in the same schema. Edges that cross the two schemas indicate the correspondences between the source and target schemas which are specified through M S1S2.
To evaluate how effectively the BSM tool can propagate dependencies across two different schemas through a schema mapping, we conducted an experiment on Amalgam to derive all possible INDs over the target schema S2 from the source constraints over S1 and the mapping constraints in M_S1S2. The experimental result shows that there are 20 propagation paths between relations in the target schema (i.e., both the starting and ending vertices are in S2), and correspondingly 16 non-trivial INDs are derived over the target schema S2. Among these 16 non-trivial INDs, 2 are covered by the expected target constraints while the other 14 are not. Figure 11 presents the propagation graph for deriving such target constraints over S2 (we omit the isolated vertices for simplicity).

Figure 11: The propagation graph for deriving target constraints over S2 based on the schema mapping M_S1S2 and the source constraints over S1

6. CONCLUSION

We have presented an approach for propagating dependencies under schema mappings. Mapping constraints of a schema mapping are permitted to be bipartite TGDs, which enables us to precisely specify the relationship between source and target databases. We have also developed a graphical model to represent the inter-relationships among the attributes of relation schemas and, on this basis, studied the dependency propagation problem in the context of schema mappings. Our solution to this problem allows us to develop a conceptual analysis tool that exploits the semantics of a schema mapping through propagation paths in the corresponding propagation graph. In doing so, the design quality of schema mappings can be assessed before they are actually implemented.
As future work, we will extend our approach in two directions. First, we will study the dependency propagation problem in a peer-to-peer data management environment, which would require us to generalize our schema mapping tool to handle the propagation of dependencies across multiple databases. Second, we will conduct experiments to investigate how our tool for propagating dependencies can be incorporated into existing tools for designing schema mappings in order to improve their quality. In particular, we are interested in exploring how our approach can be used to reason about and repair the mapping constraints in a schema mapping.

(Article(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12), ArticlePublished(x1, y), Author(y, z)) → (Allbibs(x1), Authors(x1, z), Titles(x1, x2), CitJournal(x1, z2), Journal(x3, z2), Years(x1, x4), Months(x1, x5), Pages(x1, x6), Volumes(x1, x7), Numbers(x1, x8));
Article(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12) → (Allbibs(x1), Titles(x1, x2), CitJournal(x1, z2), Journal(x3, z2), Years(x1, x4), Months(x1, x5), Pages(x1, x6), Volumes(x1, x7), Numbers(x1, x8));
ArticlePublished(x, y) → Article(x, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12), Authors(x, z1);
(TechReport(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12), TechPublished(x1, y), Author(y, z)) → (Allbibs(x1), Authors(x1, z), Titles(x1, x2), Institutions(x1, x3), Years(x1, x4), Months(x1, x5), Pages(x1, x6), Volumes(x1, x7), Numbers(x1, x8));
Authors(x, y) → Author(z, y);
Journal(x, y) → Article(x1, x2, x, x4, x5, x6, x7, x8, x9, x10, x11, x12).

Figure 12: Some mapping constraints used in the experiments over Amalgam

7. REFERENCES

[1] Alexe, B., Chiticariu, L., Miller, R. J., and Tan, W.-C. Muse: Mapping understanding and design by example. In ICDE (2008).
[2] Alexe, B., ten Cate, B., Kolaitis, P. G., and Tan, W.-C. EIRENE: Interactive design and refinement of schema mappings via data examples. PVLDB 4, 12 (2011), 1414–.
[3] Arenas, M., Barcelo, P., Libkin, L., and Murlak, F. Relational and XML data exchange. Synthesis Lectures on Data Management 2, 1 (2010).
[4] Arenas, M., Pérez, J., and Riveros, C. The recovery of a schema mapping: bringing exchanged data back. ACM TODS 34, 4 (2009), 22.
[5] Beeri, C., and Vardi, M. Y. A proof procedure for data dependencies. JACM 31, 4 (1984).
[6] Bellahsene, Z., Bonifati, A., and Rahm, E. Schema Matching and Mapping. Springer.
[7] Bonifati, A., Chang, E. Q., Lakshmanan, A. V., Ho, T., and Pottinger, R. HePToX: marrying XML and heterogeneity in your P2P databases. In PVLDB (2005).
[8] Calvanese, D., De Giacomo, G., Lenzerini, M., and Vardi, M. Y. Simplifying schema mappings. In ICDT (2011).
[9] Chiticariu, L., and Tan, W.-C. Debugging schema mappings with routes. In VLDB (2006).
[10] Cosmadakis, S. S., and Kanellakis, P. C. Functional and inclusion dependencies: a graph theoretic approach. In PODS (1984).
[11] Fagin, R. Inverting schema mappings. ACM TODS 32, 4 (2007).
[12] Fagin, R., Kolaitis, P. G., Miller, R. J., and Popa, L. Data exchange: semantics and query answering. TCS 336, 1 (2005).
[13] Fagin, R., Kolaitis, P. G., Nash, A., and Popa, L. Towards a theory of schema-mapping optimization. In PODS (2008).
[14] Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W.-C. Composing schema mappings: Second-order dependencies to the rescue. ACM TODS 30, 4 (2005).
[15] Fagin, R., Kolaitis, P. G., Popa, L., and Tan, W.-C. Quasi-inverses of schema mappings. ACM TODS 33, 2 (2008).
[16] Fagin, R., and Vardi, M. The theory of data dependencies - a survey. Proceedings of Symposia in Applied Mathematics 34 (1986).
[17] Fan, W., Geerts, F., Jia, X., and Kementsietsidis, A. Conditional functional dependencies for capturing data inconsistencies. ACM TODS 33, 2 (2008).
[18] Fan, W., Ma, S., Hu, Y., Liu, J., and Wu, Y. Propagating functional dependencies with conditions. PVLDB 1, 1 (2008).
[19] Gottlob, G. Computing covers for embedded functional dependencies. In PODS (1987).
[20] Gottlob, G., Pichler, R., and Savenkov, V. Normalization and optimization of schema mappings. VLDB 2, 1 (2009).
[21] Gottlob, G., and Senellart, P. Schema mapping discovery from data instances. JACM 57, 2 (2010), 6.
[22] Hernández, M. A., Miller, R. J., and Haas, L. M. Clio: A semi-automatic tool for schema mapping. ACM SIGMOD Record 30, 2 (2001), 607.
[23] Klug, A. Calculating constraints on relational expression. ACM TODS 5, 3 (1980).
[24] Klug, A., and Price, R. Determining view dependencies using tableaux. ACM TODS 7, 3 (1982).
[25] Kolaitis, P. G. Schema mappings, data exchange, and metadata management. In PODS (2005).
[26] Lenzerini, M. Data integration: A theoretical perspective. In PODS (2002).
[27] Madhavan, J., and Halevy, A. Y. Composing mappings among data sources. In PVLDB (2003).
[28] Missaoui, R., and Godin, R. The implication problem for inclusion dependencies: A graph approach. ACM SIGMOD Record 19, 1 (1990).
[29] Nash, A., Bernstein, P. A., and Melnik, S. Composition of mappings given by embedded dependencies. ACM TODS 32, 1 (2007), 4.
[30] ten Cate, B., Dalmau, V., and Kolaitis, P. G. Learning schema mappings. In ICDT (2012).
[31] ten Cate, B., and Kolaitis, P. G. Structural characterizations of schema-mapping languages. JACM 53, 1 (2010).
[32] Thalheim, B., and Wang, Q. Towards a theory of refinement for data migration. In ER (2011).
[33] Thalheim, B., and Wang, Q. Data migration: A theoretical perspective. DKE 87 (2013).


More information

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems Chen Li Information and Computer Science University of California, Irvine, CA 92697 chenli@ics.uci.edu Abstract In data-integration

More information

Inference in Hierarchical Multidimensional Space

Inference in Hierarchical Multidimensional Space Proc. International Conference on Data Technologies and Applications (DATA 2012), Rome, Italy, 25-27 July 2012, 70-76 Related papers: http://conceptoriented.org/ Inference in Hierarchical Multidimensional

More information

The Semantics of Consistency and Trust in Peer Data Exchange Systems

The Semantics of Consistency and Trust in Peer Data Exchange Systems The Semantics of Consistency and Trust in Peer Data Exchange Systems Leopoldo Bertossi 1 and Loreto Bravo 2 1 Carleton University, School of Computer Science, Ottawa, Canada. bertossi@scs.carleton.ca 2

More information

Limits of Schema Mappings

Limits of Schema Mappings Limits of Schema Mappings Phokion G. Kolaitis 1, Reinhard Pichler 2, Emanuel Sallinger 3, and Vadim Savenkov 4 1 University of California Santa Cruz, Santa Cruz, USA; and IBM Research-Almaden, San Jose,

More information

Schema Refinement: Dependencies and Normal Forms

Schema Refinement: Dependencies and Normal Forms Schema Refinement: Dependencies and Normal Forms Grant Weddell Cheriton School of Computer Science University of Waterloo CS 348 Introduction to Database Management Spring 2016 CS 348 (Intro to DB Mgmt)

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

Rewrite and Conquer: Dealing with Integrity Constraints in Data Integration

Rewrite and Conquer: Dealing with Integrity Constraints in Data Integration Rewrite and Conquer: Dealing with Integrity Constraints in Data Integration Andrea Calì, Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini Abstract The work Data Integration under Integrity

More information

Choice Logic Programs and Nash Equilibria in Strategic Games

Choice Logic Programs and Nash Equilibria in Strategic Games Choice Logic Programs and Nash Equilibria in Strategic Games Marina De Vos and Dirk Vermeir Dept. of Computer Science Free University of Brussels, VUB Pleinlaan 2, Brussels 1050, Belgium Tel: +32 2 6293308

More information

Bibliographic citation

Bibliographic citation Bibliographic citation Andrea Calì, Georg Gottlob, Andreas Pieris: Tractable Query Answering over Conceptual Schemata. In Alberto H. F. Laender, Silvana Castano, Umeshwar Dayal, Fabio Casati, Jos Palazzo

More information

Dependencies Revisited for Improving Data Quality

Dependencies Revisited for Improving Data Quality Dependencies Revisited for Improving Data Quality Wenfei Fan University of Edinburgh & Bell Laboratories Wenfei Fan Dependencies Revisited for Improving Data Quality 1 / 70 Real-world data is often dirty

More information

Helping the Tester Get it Right: Towards Supporting Agile Combinatorial Test Design

Helping the Tester Get it Right: Towards Supporting Agile Combinatorial Test Design Helping the Tester Get it Right: Towards Supporting Agile Combinatorial Test Design Anna Zamansky 1 and Eitan Farchi 2 1 University of Haifa, Israel 2 IBM Haifa Research Lab, Israel Abstract. Combinatorial

More information

A Mechanism for Sequential Consistency in a Distributed Objects System

A Mechanism for Sequential Consistency in a Distributed Objects System A Mechanism for Sequential Consistency in a Distributed Objects System Cristian Ţăpuş, Aleksey Nogin, Jason Hickey, and Jerome White California Institute of Technology Computer Science Department MC 256-80,

More information

Semantic Optimization of Preference Queries

Semantic Optimization of Preference Queries Semantic Optimization of Preference Queries Jan Chomicki University at Buffalo http://www.cse.buffalo.edu/ chomicki 1 Querying with Preferences Find the best answers to a query, instead of all the answers.

More information

A GML SCHEMA MAPPING APPROACH TO OVERCOME SEMANTIC HETEROGENEITY IN GIS

A GML SCHEMA MAPPING APPROACH TO OVERCOME SEMANTIC HETEROGENEITY IN GIS A GML SCHEMA MAPPING APPROACH TO OVERCOME SEMANTIC HETEROGENEITY IN GIS Manoj Paul, S. K. Ghosh School of Information Technology, Indian Institute of Technology, Kharagpur 721302, India - (mpaul, skg)@sit.iitkgp.ernet.in

More information

Chapter 6. Curves and Surfaces. 6.1 Graphs as Surfaces

Chapter 6. Curves and Surfaces. 6.1 Graphs as Surfaces Chapter 6 Curves and Surfaces In Chapter 2 a plane is defined as the zero set of a linear function in R 3. It is expected a surface is the zero set of a differentiable function in R n. To motivate, graphs

More information

Implementing mapping composition

Implementing mapping composition The VLDB Journal (2008) 17:333 353 DOI 10.1007/s00778-007-0059-9 SPECIAL ISSUE PAPER Implementing mapping composition Philip A. Bernstein Todd J. Green Sergey Melnik Alan Nash Received: 17 February 2007

More information

Schema Independent Relational Learning

Schema Independent Relational Learning Schema Independent Relational Learning Jose Picado Arash Termehchy Alan Fern School of EECS, Oregon State University Corvallis, OR 97331 {picadolj,termehca,afern}@eecs.oregonstate.edu Abstract Relational

More information

Approximate Functional Dependencies for XML Data

Approximate Functional Dependencies for XML Data Approximate Functional Dependencies for XML Data Fabio Fassetti and Bettina Fazzinga DEIS - Università della Calabria Via P. Bucci, 41C 87036 Rende (CS), Italy {ffassetti,bfazzinga}@deis.unical.it Abstract.

More information

Mining XML Functional Dependencies through Formal Concept Analysis

Mining XML Functional Dependencies through Formal Concept Analysis Mining XML Functional Dependencies through Formal Concept Analysis Viorica Varga May 6, 2010 Outline Definitions for XML Functional Dependencies Introduction to FCA FCA tool to detect XML FDs Finding XML

More information

Representing Product Designs Using a Description Graph Extension to OWL 2

Representing Product Designs Using a Description Graph Extension to OWL 2 Representing Product Designs Using a Description Graph Extension to OWL 2 Henson Graves Lockheed Martin Aeronautics Company Fort Worth Texas, USA henson.graves@lmco.com Abstract. Product development requires

More information

Ontology-Based Schema Integration

Ontology-Based Schema Integration Ontology-Based Schema Integration Zdeňka Linková Institute of Computer Science, Academy of Sciences of the Czech Republic Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic linkova@cs.cas.cz Department

More information