Mining XML Functional Dependencies through Formal Concept Analysis

Size: px
Start display at page:

Download "Mining XML Functional Dependencies through Formal Concept Analysis"

Transcription

1 Mining XML Functional Dependencies through Formal Concept Analysis Viorica Varga May 6, 2010

2 Outline Definitions for XML Functional Dependencies Introduction to FCA FCA tool to detect XML FDs Finding XML keys Detecting XML data redundancy Conclusions and Future Work

3 Outline Definitions for XML Functional Dependencies Introduction to FCA FCA tool to detect XML FDs Finding XML keys Detecting XML data redundancy Conclusions and Future Work

4 Outline Definitions for XML Functional Dependencies Introduction to FCA FCA tool to detect XML FDs Finding XML keys Detecting XML data redundancy Conclusions and Future Work

5 Outline Definitions for XML Functional Dependencies Introduction to FCA FCA tool to detect XML FDs Finding XML keys Detecting XML data redundancy Conclusions and Future Work

6 Outline Definitions for XML Functional Dependencies Introduction to FCA FCA tool to detect XML FDs Finding XML keys Detecting XML data redundancy Conclusions and Future Work

7 Outline Definitions for XML Functional Dependencies Introduction to FCA FCA tool to detect XML FDs Finding XML keys Detecting XML data redundancy Conclusions and Future Work

8 XML Design XML data design: choose an appropriate XML schema, which usually come in the form of DTD (Document Type Definition) or XML Scheme. Functional dependencies (FDs) are a key factor in XML design. The objective of normalization is to eliminate redundancies from an XML document, eliminate or reduce potential update anomalies. Arenas, M., Libkin, L.: A normal form for XML documents. TODS 29(1), (2004) Yu, C., Jagadish, H. V.: XML schema refinement through redundancy detection and normalization. VLDB J. 17(2): (2008)

9 Schema definition Definition (Schema) A schema is defined as a set S = (E, T, r), where: E is a finite set of element labels; T is a finite set of element types, and each e E is associated with a τ T, written as (e : τ), τ has the next form: τ ::= str int float SetOf τ Rcd [e 1 : τ 1,..., e n : τ n ]; r E is the label of the root element, whose associated element type can not be SetOf τ. Types str, int and float are the system defined simple types and Rcd indicate complex scheme elements. Keyword SetOf is used to indicate set schema elements Attributes and elements are treated in the same way, with a symbol before attributes.

10 Figure: CustOrder XML tree

11 Customer s Orders Example Scheme 0 CustOrder : Rcd 1 Customers : SetOf Rcd 2 CustomerID : s t r 3 CompanyName : s t r 4 A d d r e s s : s t r 5 C i t y : s t r 6 P o s talcode : s t r 7 Country : s t r 8 Phone : s t r 9 Orders : SetOf Rcd 10 OrderID : i n t 11 CustomerID : s t r 12 OrderDate : s t r 13 OrderDetails : SetOf Rcd 14 OrderID : i n t 15 ProductID : i n t 16 U n i t P r i c e : f l o a t 17 Q u a n t i t y : f l o a t 18 ProductName : s t r 19 CategoryID : i n t

12 A schema element e k can be identified through a path expression, path(e k ) = /e 1 /e 2 /.../e k, where e 1 = r, and e i is associated with type τ i ::= Rcd [..., e i+1 : τ i+1,...] for all i [1, k 1]. A path is repeatable, if e k is a set element. We adopt XPath steps. (self) and.. (parent) Definition (Data tree) An XML database is defined to be a rooted labeled tree T = N, P, V, n r, where: N is a set of labeled data nodes, each n N has a label e and a node key that uniquely identifies it in T ; n r N is the root node; P is a set of parent-child edges, there is exactly one p = (n, n) in P for each n N (except n r ), where n N, n n, n is called the parent node, n is called the child node; V is a set of value assignments, there is exactly one v = (n, s) in V for each leaf node n N, where s is a value of simple type.

13 Descendant, repeatable element definition We assign a node key, referred to to each data node in the data tree in a pre-order traversal. A data element n k is a descendant of another data element n 1 if there exists a series of data elements n i, such that (n i, n i+1 ) P for all i [1, k 1]. Data element n k can be addressed using a path expression, path(n k ) = /e 1 /... /e k, where e i is the label of n i for each i [1, k], n 1 = n r, and (n i, n i+1 ) P for all i [1, k 1]. A data element n k is called repeatable if e k corresponds to a set element in the schema. Element n k is called a direct descendant of element n a, if n k is a descendant of n a, path(n k ) =... /e a /e 1 /... /e k 1 /e k, and e i is not a set element for any i [1, k 1].

14 Warehouse Example

15 Warehouse Example Scheme 0 warehouse : Rcd 1 s t a t e : SetOf Rcd 2 name : s t r 3 s t o r e : SetOf Rcd 4 contact : Rcd 5 name : s t r 6 a d d r e s s : s t r 7 book : SetOf Rcd 8 ISBN : s t r 9 author : SetOf s t r 10 t i t l e : s t r 11 p r i c e : s t r

16 Element-value equality Definition(Element-value equality) Two data elements n 1 of T 1 = N 1, P 1, V 1, n r1 and n 2 of T 2 = N 2, P 2, V 2, n r2 are element-value equal (written as n 1 = ev n 2 ) if and only if: n 1 and n 2 both exist and have the same label; There exists a set M, such that for every pair (n 1, n 2 ) M, n 1 = ev n 2, where n 1, n 2 are children elements of n 1, n 2, respectively. Every child element of n 1 or n 2 appears in exactly one pair in M. (n 1, s) V 1 if and only if (n 2, s) V 2,where s is a simple value. Example Data elements node 30 and 50 are element value equal if and only if the subtrees rooted at those two elements are identical when the order among sibling elements is ignored.

17 Path-value equality Definition(Path-value equality) Two data element paths p 1 on T 1 = N 1, P 1, V 1, n r1 and p 2 on T 2 = N 2, P 2, V 2, n r2 are path-value equal (written as T 1.p 1 = pv T 2.p 2 ) if and only if there is a set M of matching pairs where For each pair m = (n 1, n 2 ) in M, n 1 N 1, n 2 N 2, path(n 1 ) = p 1, path(n 2 ) = p 2, and n 1 = ev n 2 ; All data elements with path p 1 in T 1 and path p 2 in T 2 participate in M, and each such data element participates in only one such pair. Value equality between two paths is complicated by the fact that a single path can match multiple data elements in the data tree. This definition consider two paths value equal if each node which is pointed to by one path must have a corresponding node that is pointed to by the other path, where the two nodes are element value equal.

18 Generalized tree tuple Definition A generalized tree tuple of data tree T = N, P, V, n r, with regard to a particular data element n p (called pivot node), is a tree t T n p = N t, P t, V t, n r, where: N t N is the set of nodes, n p N t ; P t P is the set of parent-child edges; V t V is the set of value assignments; n r is the same root node in both t T n p and T ; n N t if and only if: n is a descendant or ancestor of np in T, or n is a non-repeatable direct descendant of an ancestor of np in T ; (n 1, n 2 ) P t if and only if n 1 N t, n 2 N t, (n 1, n 2 ) P; (n, s) V t if and only if n N t, (n, s) V.

19 Tuple class A generalized tree tuple is a data tree projected from the original data tree. It has an extra parameter called a pivot node. In contrast with tree tuple defined in Arenas and Libkin s article, which separate sibling nodes with the same path at all hierarchy levels, the generalized tree tuple separate sibling nodes with the same path above the pivot node Based on the pivot node, generalized tree tuples can be categorized into tuple classes: Definition(Tuple class) A tuple class Cp T of the data tree T is the set of all generalized tree tuples tn T, where path(n) = p. Path p is called the pivot path.

20 Figure: Example tree tuple

21 Figure: Example tree tuple

22 XML Functional Dependency Definition(XML FD) An XML FD is a triple C p, LHS, RHS, written as LHS RHS w.r.t. C p, where C p denotes a tuple class, LHS is a set of paths (P li, i = [1, n]) relative to p, and RHS is a single path (P r ) relative to p. An XML FD holds on a data tree T (or T satisfies an XML FD) if and only if for any two generalized tree tuples t 1, t 2 C p - i [1, n], t 1.P li = or t 2.P li =, or - If i [1, n], t 1.P li = pv t 2.P li, then t 1.P r, t 2.P r, t 1.P r = pv t 2.P r. A null value,, results from a path that matches no node in the tuple, and = pv is the path-value equality defined previous.

23 XML Functional Dependency Example Example (XML FD) In our running example whenever two products agree on ProductID values, they have the same ProductName. This can be formulated as follows:./productid./productname w.r.t. C OrderDetails Another example is:./productid./categoryid w.r.t C OrderDetails Example (XML FD) In warehause tree:./isbn./title w.r.t C book../contact/name,./isbn./price w.r.t C book./isbn./author w.r.t C book./author,./title./isbn w.r.t C book

24 Trivial XML FD Definition: (Trivial XML FD) An XML FD C p, LHS, RHS is trivial if: 1. RHS LHS, or 2. For any generalized tree tuple in C p, there is at least one path in LHS that matches no data element. The 2. point can arrise, because of the existence of Choice elements. Example If Contact is a Choice element instead of Rcd, i.e. it can have either name or address as its childs, but not both, then the XML FD: C store,./contact/name,./contact/address,./@key is trivial, because no C store tuple will have both LHS node.

25 XML key Definition (XML key) An XML Key of a data tree T is a pair C p, LHS, where T satisfies the XML FD C p, LHS,./@key. Example We have the XML FD: C Orders,./OrderID,./@key, which implies that C Orders,./OrderID is an XML key. Example C State,./name C Store,./contact/name,./contact/address are XML keys.

26 Structurally redundant XML FDs Theorem Let FD = C p, LHS, RHS, if none of the paths in LHS and RHS specifies a data element that is descendent of the pivot node in the tuple, then FD holds on a data tree T if and only if FD = C p, LHS, RHS holds on T, where C p is the lowest-repeatable-ancestor tuple class of C p paths in LHS and RHS are equivalent to paths in LHS and RHS (i.e. they correspond to the same absolute paths). Example../ISBN../title w.r.t C author is structurally redundant with./isbn./title w.r.t C book

27 Interesting XML FD Tuple classes with repeatable pivot paths are called essential tuple classes. Definition(Interesting XML FD) An XML FD C p, LHS, RHS is interesting if it satisfies the following conditions: RHS / LHS; C p is an essential tuple class; RHS matches to descendent(s) of the pivot node. An interesting XML FD is a non-trivial XML FD with an essential tuple class and is not structurally redundant to any other XML FD.

28 XML data redundancy Definition(XML data redundancy) A data tree T contains a redundancy if and only if T satisfies an interesting XML FD C p, LHS, RHS, but does not satisfy the XML Key C p, LHS. Intuitively: if C p, LHS is not a key for T, then there exists two distinct tuples in C p that share the same LHS. T satisfies C p, LHS, RHS, so RHS of these two tuples must be value equal so: data is redundantly stored

29 GTT-XNF Definition(GTT-XNF) An XML schema S is in GTT-XNF given the set of all satisfied interesting XML FDs if and only if for each such XML FD ( C p, LHS, RHS ), C p, LHS is an XML key. Intuitively: GTT-XNF disallows any satisfied interesting XML FD that indicates data redundancies. Rule 1 (Reflexivity) LHS P 1 w.r.t. C p is satisfied if P 1 LHS. Rule 2 (Augmentation) LHS P 1 w.r.t. C p {LHS, P 2 } P 1 w.r.t. C p. Rule 3 (Transitivity) LHS P 1 w.r.t. C p... LHS P n w.r.t. C p {P 1,..., P n } P w.r.t. C p LHS P w.r.t. C p.

30 XML Data Flat Representation Figure: One relation for the whole XML data

31 XML Data Hierarhical Representation

32 Intra-relation FDs/Keys, Inter-relation FDs/Keys Example (XML FD) In warehause tree:./isbn./title w.r.t C book intra-relation FD../contact/name,./ISBN./price w.r.t C book inter-relation FD./ISBN./author w.r.t C book inter-relation FD./author,./title./ISBN w.r.t C book inter-relation FD Yu, C., Jagadish, H. V.: XML schema refinement through redundancy detection and normalization. VLDB J. 17(2): (2008) presents algorithm based on partitioning to detect intra-relation FDs, another very complicated for inter-relation FDs.

33 Eliminating redundancy-indicating FDs if C p, LHS is not a key for T T satisfies C p, LHS, RHS, so RHS is redundantly stored to eliminate such FD, the schema element corresponding to RHS is moved into a new schema location, such that those data elements are no longer redundantly stored. Let Σ be the set of redundancy-indicating FDs. Example./ISBN./title w.r.t C book {../../name,../contact/name,./isbn }./price w.r.t C book Assumption: {../name,./contact/name} is a key for C store.

34 Local/global XML FD Definition (Local/global XML FD) An XML FD C p, LHS, RHS is local if there exists LHS LHS such that C p, LHS is an XML key, where C p is an ancestor tuple class of C p (i.e. p is a prefix of p). Otherwise, the FD is global. Example./ISBN./title w.r.t C book is global, because no subset of its LHS is a key for any tuple class above C book means: 2 books, regardless whether they are under the same store or state, if they have the same ISBN, then they will have the same title Example {../../name,../contact/name,./isbn }./price w.r.t C book is local, because {../../name,../contact/name} is a key for C store. means: state name and store name uniquely identifies each store, any 2 books, if they have the same ISBN, they will have the same price, as long as they are under the same store.

35 Eliminate global FD Procedure 1 Let F = {P 1,..., P n } P r w.r.t. C p be a redundancy indicated global FD on Schema S root ; {e i i [1, n]} and {τ i i [1, n]} be the sets of schema element labels and types, respectively, associated with each P i ; e r and τ r be the schema element label and type,respectively, associated with P r ; τ parent be the schema element type of the parent element of P r ; τ root =Rcd[e 1 : τ 1,..., e m : τ m] be the element type of the root element. Eliminating redundancy: Create a new schema element with label e new and type τ new =SetOfRcd[e 1 : τ 1,..., e n : τ n, e r : τ r ] Set τ root =Rcd[e 1 : τ 1,..., e m : τ m, e new : τ new ] Remove (e r : τ r ) from τ parent

36 Eliminate global FD Example./ISBN./title w.r.t C book Scheme after eliminating global FD: 0 warehouse : Rcd 1 s t a t e : SetOf Rcd 2 name : s t r 3 s t o r e : SetOf Rcd 4 contact : Rcd 5 name : s t r 6 a d d r e s s : s t r 7 book : SetOf Rcd 8 ISBN : s t r 9 author : SetOf s t r 10 p r i c e : s t r 11 new book : SetofRcd 12 ISBN : s t r 13 t i t l e : s t r

37 Adjusting FD remove F from Σ. the semantics of F is captured by: {P 1,..., P n } P r w.r.t. C new, it is not redundancy, does not need to be added to Σ remove all FDs from Σ that are affected by the move of P r. Example {./author,./title}./isbn w.r.t C book is removed, because it is not valid. It is safe to do, because ISBN is no longer redundant.

38 Eliminate local FD Procedure 2 Let F = {P 1,..., P k 1, P k,..., P n } P r w.r.t. C p be a redundancy indicated local FD on Schema S root ; {P 1,..., P k 1 } is the key for C p; C p is is an ancestor tuple class of C p and there is no other subset L of {P i i [1, n]} such that L is a key for C p C p is an ancestor of C p and a descendant of C p (i.e. C p is the lowest tuple class that can be identified) {e i i [k, n]} and {τ i i [k, n]} be the sets of schema element labels and types, respectively, associated with each P i ; e r and τ r be the schema element label and type,respectively, associated with P r ; τ parent be the schema element type of the parent element of P r ; τ p =Rcd[e 1 : τ 1,..., e m : τ m] be the element type of the schema element corresponding to the pivot path of C p.

39 Eliminate redundancy Create a new schema element with label e new and type τ new =SetOfRcd[e k : τ k,..., e n : τ n, e r : τ r ] Set τ p =Rcd[e 1 : τ 1,..., e m : τ m, e new : τ new ] Remove (e r : τ r ) from τ parent Explanation to eliminate a local FD like {../../name,../contact/name,./isbn }./price w.r.t C book create a new schema element containing the subset of its LHS (ISBN) that are not part of the key for ancestor tuple class (C store ) and RHS element (price) put this new element under the schema element corresponding to the pivot path of the ancestor tuple class (/warehouse/state/store). RHS element is removed from its original position

40 Eliminate redundancy cont. by creating the new schema element under the non-root ancestor, fewer elements needs to be copied under the new scheme after the the modification of the scheme remove any FD that is affected by the move of P r Scheme after eliminating local FD: 0 warehouse : Rcd 1 s t a t e : SetOf Rcd 2 name : s t r 3 s t o r e : SetOf Rcd 4 contact : Rcd 5 name : s t r 6 a d d r e s s : s t r 7 book : SetOf Rcd 8 ISBN : s t r 9 author : SetOf s t r 10 t i t l e : s t r 11 new book : SetOfRcd 12 ISBN : s t r 13 p r i c e : s t r

41 Special case for Procedure 2 if the entire LHS of the FD is a key for some ancestor tuple class Example In DBLP scheme year of an article is determined by the identity (@key) of the issue containing the article instead of creating a new scheme element containing a single element year we move year after issue

42 Normalization algorithm

43 Normalization algorithm explanation FDs are grouped according to their LHS Example./ISBN./title w.r.t C book./isbn./author w.r.t C book if they are dealt separately two new scheme element are created FDs are processed according to the number of paths in their LHS to reduce the storage cost. Example {./title,.author}./isbn w.r.t C book If this FD is processed first, then the elements title and author will remain under book, not ISBN

44 Normalization algorithm explanation cont. FDs are processed according to the hierarchy depth of their tuple class in a bottom-up fashion (lowest first) this is because during the process of FDs for a lower hierarchy tuple class, redundancy-indicating FDs for a higher hierarchy tuple class may be created algorithm terminates because each application of Procedure 1 and 2 either removes one redundancy-indicating FD or converts one redundancy-indicating FD into another one with a tuple class at a higher hierarchy

45 GTT-XNF Scheme of Warehouse xml data Eliminating first:./isbn./title w.r.t C book./isbn./author w.r.t C book this FD is no longer redundancy indicating {../../name,../contact/name,./isbn }./price w.r.t C book 0 warehouse : Rcd 1 s t a t e : SetOf Rcd 2 name : s t r 3 s t o r e : SetOf Rcd 4 contact : Rcd 5 name : s t r 6 a d d r e s s : s t r 7 book : SetOf Rcd 8 ISBN : s t r 9 p r i c e : s t r 10 new book : SetOfRcd 11 ISBN : s t r 12 t i t l e : s t r 13 author : SetOf s t r

46 Introduction to FCA From a philosophical point of view a concept is a unit of thoughts consisting of two parts: the extension, which are objects; the intension consisting of all attributes valid for the objects of the context; Formal Concept Analysis (FCA) introduced by Wille gives a mathematical formalization of the concept notion. A detailed mathematic foundation of FCA can be found in: Ganter, B., Wille, R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin-Heidelberg-New York. (1999) Formal Concept Analysis is applied in many different realms like psychology, sociology, computer science, biology, medicine and linguistics. FCA is a useful tool to explore the conceptual knowledge contained in a database by analyzing the formal conceptual structure of the data.

47 Introduction to FCA FCA studies how objects can be hierarchically grouped together according to their common attributes. In FCA the data is represented by a cross table, called formal context. A formal context is a triple (G, M, I ). G is a finite set of objects M is finite set of attributes The relation I G M is a binary relation between objects and attributes. Each couple (g, m) I denotes the fact that the object g G is related to the item m M.

48 Introduction to FCA For a set A G of objects we define A := {m M gim for all g A} the set of all attributes common to the objects in A. Dually, for a set B M of attributes we define B := {g G gim for all m B} the set of all objects which have all attributes in B. A formal concept of the context K := (G, M, I ) is a pair (A, B) where A G, B M, A = B, and B = A. We call A the extent and B the intent of the concept (A, B). The set of all concepts of the context (G, M, I ) is denoted by B(G, M, I ).

49 Example formal context The following cross table describes for some hotels the attributes they have. In this case the objects are: Oasis, Royal, Amelia, California, Grand, Samira; and the attributes are: Internet, Sauna, Jacuzzi, ATM, Babysitting. ({California, Grand}) := {Sauna, Jacuzzi}. Internet Sauna Jacuzzi ATM Babysitting Oasis X X X X Royal X X X Amelia X X California X X Grand X X Samira X X Table: Formal context of the Hotel facilities example

50 FCA tool to detect XML FDs we elaborate an FCA based tool that identify functional dependencies in XML documents. to achieve this, as a first step, we have to construct the Formal Context of functional dependencies for XML data. we have to identify the objects and attributes of this context in case of XML data. tuple-based XML FD notion proposed in the above section suggests a natural technique for XFD discovery XML data can be converted into a fully unnested relation, a single relational table, and apply existing FD discovery algorithms directly. given an XML document, which contains at the beginning the schema of the data, we create generalized tree tuples from it.

51 Construct Formal Context of XML FDs each tree tuple in a tuple class has the same structure, so it has the same number of elements. we use the flat representation which converts the generalized tree tuples into a flat table each row in the table corresponds to a tree tuple in the XML tree in the flat table we insert non-leaf and leaf level elements (or attributes) from the tree for non-leaf level nodes the associated keys are used as values we include non-leaf level nodes with associated key values, to detect XML keys

52 Flat table for tuple class C Orders Example Let us construct the flat table for tuple class C Orders. There are two non-leaf nodes: Orders, appears as Orders@key OrderDetails, appears as OrderDetails@key.

53 Formal Context for class C Orders Context s Attributes: PathEnd/ElementName for non-leaf level nodes: the name of the attribute is constructed as: and its value will be the associated key value for non-leaf level nodes: the element names of the leaves. Context s Objects: the objects are considered to be the tree tuple pairs, actually the tuple pairs of the flat table. The key values associated to non-leaf elements and leaf element s values are used in these tuple pairs. Context s Properties: the mapping between objects and attributes is defined by a binary relation, this incidence relation of the context shows which attributes of this tuple pairs have the same value.

54 Beginning of the Formal Context of functional dependencies for tuple class C Orders the analyzed XML document may have a large number of tree tuples. we filter the tuple pairs and we leave out those pairs in which there are no common attributes, by an operation called clarifying the context, which does not alter the conceptual hierarchy.

55 Concept Lattice of functional dependencies Formal Context for tuple class C Orders we run the Concept Explorer (ConExp) engine to generate the concepts and create the concept lattice.

56 Processing the Output of FCA a concept lattice consists of the set of concepts of a formal context and the subconcept-superconcept relation between the concepts; every circle represents a formal concept; each concept is a tuple of a set of objects and a set of common attributes, but only the attributes are listed; an edge connects two concepts if one implies the other directly; each link connecting two concepts represents the transitive subconcept-superconcept relation between them; the top concept has all formal objects in its extension; the bottom concept has all formal attributes in its intension.

57 The relationship between FDs in databases and implications in FCA a FD X Y holds in a relation r over R iff the implication X Y holds in the context (G, R, I ) where G = {(t 1, t 2 ) t 1, t 2 r, t 1 t 2 } and A R, (t 1, t 2 )IA t 1 [A] = t 2 [A]. objects of the context are couples of tuples and each object intent is the agree set of this couple the implications in this lattice corresponds to functional dependencies in XML. Example C Orders,./OrderID,./CustomerID C Orders,./Orders@key,./CustomerID C Orders,./OrderDetail/OrderID,./CustomerID

58 Reading the Concept Lattice in the lattice we list only the attributes, these are relevant for our analysis; let there be a concept, labeled by A, B and a second concept labeled by C, where A, B and C are FCA attributes; let concept labeled by A, B be the subconcept of concept labeled by C; tuple pairs of concept labeled by A, B have the same values for attributes A, B, but for attribute C too. tuple pairs of concept labeled by C do not have the same values for attribute A, nor for B, but have the same value for attribute C. tuple pairs of every subconcept of concept labeled by A, B have the same values for attributes A, B. the labeling of the lattice is simplified by putting each attribute only once, at the highest level.

59 Reading the Concept Lattice we analyze attributes A and B: if we have only A B, then A would be a subconcept of B; if only B A holds then B should be a subconcept of A; we have A B and B A, that s why they come side by side in the lattice. So attributes from a concept imply each other. Example We have the next XML FDs: C Orders,./OrderID,./OrderDetails/OrderID C Orders,./OrderID,./Orders@key C Orders,./Orders@key,./OrderID C Orders,./Orders@key,./OrderDetails/OrderID C Orders,./OrderDetails/OrderID,./Orders@key C Orders,./OrderDetails/OrderID,./OrderID

60 The functional dependencies found by software FCAMineXFD Figure: Functional dependencies in tuple class C Orders

61 The concept lattice for the whole XML document

62 Data Analysis we can see the hierarchy of the analyzed data: the node labeled by Customers/Country is on a higher level, than node labeled by Customers/City; the Customer s node with every attribute is a subconcept of node labeled Customers/City; in our XML data, every customer has different name, address, phone number, so these attributes appear in one concept node and imply each other; the Orders node in XML is child of Customers, in the lattice, the node labeled with the key of Orders node, is subconcept of Customers node, so the hierarchy is visible; these are 1:n relationships, from Country to City, from City to Customers, from Customers to Orders. information about products is on the other side of the lattice; Products are in n:m relationship with Customers, linked by OrderDetail node in this case.

63 FDs for the whole XML document

64 Finding XML keys FDs with RHS values can be used to detect the keys in XML. In tuple class C Orders we have XML FD: C Orders,./OrderID,./@key, which implies that COrders,./OrderID is an XML key. C Orders,./OrderDetails/OrderID,./@key, so COrders,./OrderDetails/OrderID is an XML key too. In tuple class C Customers software found XML FD: C Customers,./CustomerID,./@key, which implies that C Customers,./CustomerID is an XML key. other detected XML keys are: CCustomers,./Orders/CustomerID ; CCustomers,./CompanyName ; C Customers,./Address ; C Customers,./Phone.

65 Detecting XML data redundancy having the set of functional dependencies for XML data in a tuple class, we can detect interesting functional dependencies. in essential tuple class C Orders an interesting FD: C Orders,./OrderDetails/ProductID,./OrderDetails/ProductName but C Orders,./OrderDetails/ProductID is not an XML key So it is a data redundancy. the same reason applies for XML FD C Orders,./OrderDetails/ProductName,./OrderDetails/ProductID. the other XML FD s have as LHS a key for tuple class C Orders.

66 Conclusions This paper introduces an approach for mining functional dependencies in XML documents based on FCA. Based on the flat representation of XML, we constructed the concept lattice. We analyzed the resulted concepts, which allowed us to discover a number of interesting dependencies. Our framework offers an graphical visualization for dependency exploration.

67 Future Work given the set of dependencies discovered by our tool: propose a normalization algorithm for converting any XML schema into a correct one

Detecting XML Functional Dependencies through Formal Concept Analysis

Detecting XML Functional Dependencies through Formal Concept Analysis Detecting XML Functional Dependencies through Formal Concept Analysis Katalin Tunde Janosi-Rancz 1, Viorica Varga 2, and Timea Nagy 2 1 Hungarian University of Transylvania, Tirgu Mures, Romania tsuto@ms.sapientia.ro

More information

Efficient Discovery of XML Data Redundancies

Efficient Discovery of XML Data Redundancies Efficient Discovery of XML Data Redundancies Cong Yu Department of EECS University of Michigan congy@eecs.umich.edu H. V. Jagadish Department of EECS University of Michigan jag@eecs.umich.edu ABSTRACT

More information

A METHOD FOR MINING FUNCTIONAL DEPENDENCIES IN RELATIONAL DATABASE DESIGN USING FCA

A METHOD FOR MINING FUNCTIONAL DEPENDENCIES IN RELATIONAL DATABASE DESIGN USING FCA STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIII, Number 1, 2008 A METHOD FOR MINING FUNCTIONAL DEPENDENCIES IN RELATIONAL DATABASE DESIGN USING FCA KATALIN TUNDE JANOSI RANCZ AND VIORICA VARGA Abstract.

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms Trees Sidra Malik sidra.malik@ciitlahore.edu.pk Tree? In computer science, a tree is an abstract model of a hierarchical structure A tree is a finite set of one or more nodes

More information

Foundations of Computer Science Spring Mathematical Preliminaries

Foundations of Computer Science Spring Mathematical Preliminaries Foundations of Computer Science Spring 2017 Equivalence Relation, Recursive Definition, and Mathematical Induction Mathematical Preliminaries Mohammad Ashiqur Rahman Department of Computer Science College

More information

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures. Trees Q: Why study trees? : Many advance DTs are implemented using tree-based data structures. Recursive Definition of (Rooted) Tree: Let T be a set with n 0 elements. (i) If n = 0, T is an empty tree,

More information

Schema Refinement: Dependencies and Normal Forms

Schema Refinement: Dependencies and Normal Forms Schema Refinement: Dependencies and Normal Forms Grant Weddell Cheriton School of Computer Science University of Waterloo CS 348 Introduction to Database Management Spring 2016 CS 348 (Intro to DB Mgmt)

More information

Approximate Functional Dependencies for XML Data

Approximate Functional Dependencies for XML Data Approximate Functional Dependencies for XML Data Fabio Fassetti and Bettina Fazzinga DEIS - Università della Calabria Via P. Bucci, 41C 87036 Rende (CS), Italy {ffassetti,bfazzinga}@deis.unical.it Abstract.

More information

Schema Refinement: Dependencies and Normal Forms

Schema Refinement: Dependencies and Normal Forms Schema Refinement: Dependencies and Normal Forms M. Tamer Özsu David R. Cheriton School of Computer Science University of Waterloo CS 348 Introduction to Database Management Fall 2012 CS 348 Schema Refinement

More information

Schema Refinement: Dependencies and Normal Forms

Schema Refinement: Dependencies and Normal Forms Schema Refinement: Dependencies and Normal Forms Grant Weddell David R. Cheriton School of Computer Science University of Waterloo CS 348 Introduction to Database Management Spring 2012 CS 348 (Intro to

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

TREES. Trees - Introduction

TREES. Trees - Introduction TREES Chapter 6 Trees - Introduction All previous data organizations we've studied are linear each element can have only one predecessor and successor Accessing all elements in a linear sequence is O(n)

More information

Data Structure. IBPS SO (IT- Officer) Exam 2017

Data Structure. IBPS SO (IT- Officer) Exam 2017 Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data

More information

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Part XII Mapping XML to Databases Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Outline of this part 1 Mapping XML to Databases Introduction 2 Relational Tree Encoding Dead Ends

More information

An undirected graph is a tree if and only of there is a unique simple path between any 2 of its vertices.

An undirected graph is a tree if and only of there is a unique simple path between any 2 of its vertices. Trees Trees form the most widely used subclasses of graphs. In CS, we make extensive use of trees. Trees are useful in organizing and relating data in databases, file systems and other applications. Formal

More information

Ryan Marcotte CS 475 (Advanced Topics in Databases) March 14, 2011

Ryan Marcotte  CS 475 (Advanced Topics in Databases) March 14, 2011 Ryan Marcotte www.cs.uregina.ca/~marcottr CS 475 (Advanced Topics in Databases) March 14, 2011 Outline Introduction to XNF and motivation for its creation Analysis of XNF s link to BCNF Algorithm for converting

More information

CSI33 Data Structures

CSI33 Data Structures Outline Department of Mathematics and Computer Science Bronx Community College November 13, 2017 Outline Outline 1 C++ Supplement.1: Trees Outline C++ Supplement.1: Trees 1 C++ Supplement.1: Trees Uses

More information

7.1 Introduction. A (free) tree T is A simple graph such that for every pair of vertices v and w there is a unique path from v to w

7.1 Introduction. A (free) tree T is A simple graph such that for every pair of vertices v and w there is a unique path from v to w Chapter 7 Trees 7.1 Introduction A (free) tree T is A simple graph such that for every pair of vertices v and w there is a unique path from v to w Tree Terminology Parent Ancestor Child Descendant Siblings

More information

An Effective and Efficient Approach for Keyword-Based XML Retrieval. Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova

An Effective and Efficient Approach for Keyword-Based XML Retrieval. Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova An Effective and Efficient Approach for Keyword-Based XML Retrieval Xiaoguang Li, Jian Gong, Daling Wang, and Ge Yu retold by Daryna Bronnykova Search on XML documents 2 Why not use google? Why are traditional

More information

Generic Schema Matching with Cupid

Generic Schema Matching with Cupid Generic Schema Matching with Cupid J. Madhavan, P. Bernstein, E. Rahm Overview Schema matching problem Cupid approach Experimental results Conclusion presentation by Viorica Botea The Schema Matching Problem

More information

FCA-based Search for Duplicate objects in Ontologies

FCA-based Search for Duplicate objects in Ontologies FCA-based Search for Duplicate objects in Ontologies Dmitry A. Ilvovsky and Mikhail A. Klimushkin National Research University Higher School of Economics, School of Applied Mathematics and Informational

More information

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010 Uses for About Binary January 31, 2010 Uses for About Binary Uses for Uses for About Basic Idea Implementing Binary Example: Expression Binary Search Uses for Uses for About Binary Uses for Storage Binary

More information

March 20/2003 Jayakanth Srinivasan,

March 20/2003 Jayakanth Srinivasan, Definition : A simple graph G = (V, E) consists of V, a nonempty set of vertices, and E, a set of unordered pairs of distinct elements of V called edges. Definition : In a multigraph G = (V, E) two or

More information

Definition of Graphs and Trees. Representation of Trees.

Definition of Graphs and Trees. Representation of Trees. Definition of Graphs and Trees. Representation of Trees. Chapter 6 Definition of graphs (I) A directed graph or digraph is a pair G = (V,E) s.t.: V is a finite set called the set of vertices of G. E V

More information

Introduction to Computers and Programming. Concept Question

Introduction to Computers and Programming. Concept Question Introduction to Computers and Programming Prof. I. K. Lundqvist Lecture 7 April 2 2004 Concept Question G1(V1,E1) A graph G(V, where E) is V1 a finite = {}, nonempty E1 = {} set of G2(V2,E2) vertices and

More information

Trees. Tree Structure Binary Tree Tree Traversals

Trees. Tree Structure Binary Tree Tree Traversals Trees Tree Structure Binary Tree Tree Traversals The Tree Structure Consists of nodes and edges that organize data in a hierarchical fashion. nodes store the data elements. edges connect the nodes. The

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

Organizing Spatial Data

Organizing Spatial Data Organizing Spatial Data Spatial data records include a sense of location as an attribute. Typically location is represented by coordinate data (in 2D or 3D). 1 If we are to search spatial data using the

More information

Algorithmic Aspects of Communication Networks

Algorithmic Aspects of Communication Networks Algorithmic Aspects of Communication Networks Chapter 5 Network Resilience Algorithmic Aspects of ComNets (WS 16/17): 05 Network Resilience 1 Introduction and Motivation Network resilience denotes the

More information

CSE 214 Computer Science II Introduction to Tree

CSE 214 Computer Science II Introduction to Tree CSE 214 Computer Science II Introduction to Tree Fall 2017 Stony Brook University Instructor: Shebuti Rayana shebuti.rayana@stonybrook.edu http://www3.cs.stonybrook.edu/~cse214/sec02/ Tree Tree is a non-linear

More information

UNIT 3 DATABASE DESIGN

UNIT 3 DATABASE DESIGN UNIT 3 DATABASE DESIGN Objective To study design guidelines for relational databases. To know about Functional dependencies. To have an understanding on First, Second, Third Normal forms To study about

More information

XML Data Exchange. Marcelo Arenas P. Universidad Católica de Chile. Joint work with Leonid Libkin (U. of Toronto)

XML Data Exchange. Marcelo Arenas P. Universidad Católica de Chile. Joint work with Leonid Libkin (U. of Toronto) XML Data Exchange Marcelo Arenas P. Universidad Católica de Chile Joint work with Leonid Libkin (U. of Toronto) Data Exchange in Relational Databases Data exchange has been extensively studied in the relational

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information

Chapter 10: Trees. A tree is a connected simple undirected graph with no simple circuits.

Chapter 10: Trees. A tree is a connected simple undirected graph with no simple circuits. Chapter 10: Trees A tree is a connected simple undirected graph with no simple circuits. Properties: o There is a unique simple path between any 2 of its vertices. o No loops. o No multiple edges. Example

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Graph Algorithms Using Depth First Search

Graph Algorithms Using Depth First Search Graph Algorithms Using Depth First Search Analysis of Algorithms Week 8, Lecture 1 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Graph Algorithms Using Depth

More information

Removing XML Data Redundancies Using Functional and Equality-Generating Dependencies

Removing XML Data Redundancies Using Functional and Equality-Generating Dependencies Removing XML Data Redundancies Using Functional and Equality-Generating Dependencies Junhu Wang 1 Rodney Topor 2 1 INT, Griffith University, Gold Coast, Australia J.Wang@griffith.edu.au 2 CIT, Griffith

More information

Extending E-R for Modelling XML Keys

Extending E-R for Modelling XML Keys Extending E-R for Modelling XML Keys Martin Necasky Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic martin.necasky@mff.cuni.cz Jaroslav Pokorny Faculty of Mathematics and

More information

Trees. Eric McCreath

Trees. Eric McCreath Trees Eric McCreath 2 Overview In this lecture we will explore: general trees, binary trees, binary search trees, and AVL and B-Trees. 3 Trees Trees are recursive data structures. They are useful for:

More information

Trees 11/15/16. Chapter 11. Terminology. Terminology. Terminology. Terminology. Terminology

Trees 11/15/16. Chapter 11. Terminology. Terminology. Terminology. Terminology. Terminology Chapter 11 Trees Definition of a general tree A general tree T is a set of one or more nodes such that T is partitioned into disjoint subsets: A single node r, the root Sets that are general trees, called

More information

CS 338 Functional Dependencies

CS 338 Functional Dependencies CS 338 Functional Dependencies Bojana Bislimovska Winter 2016 Outline Design Guidelines for Relation Schemas Functional Dependency Set and Attribute Closure Schema Decomposition Boyce-Codd Normal Form

More information

Trees. Truong Tuan Anh CSE-HCMUT

Trees. Truong Tuan Anh CSE-HCMUT Trees Truong Tuan Anh CSE-HCMUT Outline Basic concepts Trees Trees A tree consists of a finite set of elements, called nodes, and a finite set of directed lines, called branches, that connect the nodes

More information

Data Structure - Binary Tree 1 -

Data Structure - Binary Tree 1 - Data Structure - Binary Tree 1 - Hanyang University Jong-Il Park Basic Tree Concepts Logical structures Chap. 2~4 Chap. 5 Chap. 6 Linear list Tree Graph Linear structures Non-linear structures Linear Lists

More information

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam CS 441 Discrete Mathematics for CS Lecture 26 Graphs Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Final exam Saturday, April 26, 2014 at 10:00-11:50am The same classroom as lectures The exam

More information

Tree. A path is a connected sequence of edges. A tree topology is acyclic there is no loop.

Tree. A path is a connected sequence of edges. A tree topology is acyclic there is no loop. Tree A tree consists of a set of nodes and a set of edges connecting pairs of nodes. A tree has the property that there is exactly one path (no more, no less) between any pair of nodes. A path is a connected

More information

Informal Design Guidelines for Relational Databases

Informal Design Guidelines for Relational Databases Outline Informal Design Guidelines for Relational Databases Semantics of the Relation Attributes Redundant Information in Tuples and Update Anomalies Null Values in Tuples Spurious Tuples Functional Dependencies

More information

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9 XML databases Jan Chomicki University at Buffalo Jan Chomicki (University at Buffalo) XML databases 1 / 9 Outline 1 XML data model 2 XPath 3 XQuery Jan Chomicki (University at Buffalo) XML databases 2

More information

Graph and Digraph Glossary

Graph and Digraph Glossary 1 of 15 31.1.2004 14:45 Graph and Digraph Glossary A B C D E F G H I-J K L M N O P-Q R S T U V W-Z Acyclic Graph A graph is acyclic if it contains no cycles. Adjacency Matrix A 0-1 square matrix whose

More information

DATA MODELS FOR SEMISTRUCTURED DATA

DATA MODELS FOR SEMISTRUCTURED DATA Chapter 2 DATA MODELS FOR SEMISTRUCTURED DATA Traditionally, real world semantics are captured in a data model, and mapped to the database schema. The real world semantics are modeled as constraints and

More information

CS350: Data Structures Red-Black Trees

CS350: Data Structures Red-Black Trees Red-Black Trees James Moscola Department of Engineering & Computer Science York College of Pennsylvania James Moscola Red-Black Tree An alternative to AVL trees Insertion can be done in a bottom-up or

More information

Improving web search with FCA

Improving web search with FCA Improving web search with FCA Radim BELOHLAVEK Jan OUTRATA Dept. Systems Science and Industrial Engineering Watson School of Engineering and Applied Science Binghamton University SUNY, NY, USA Dept. Computer

More information

Programming II (CS300)

Programming II (CS300) 1 Programming II (CS300) Chapter 11: Binary Search Trees MOUNA KACEM mouna@cs.wisc.edu Fall 2018 General Overview of Data Structures 2 Introduction to trees 3 Tree: Important non-linear data structure

More information

Lecture 32. No computer use today. Reminders: Homework 11 is due today. Project 6 is due next Friday. Questions?

Lecture 32. No computer use today. Reminders: Homework 11 is due today. Project 6 is due next Friday. Questions? Lecture 32 No computer use today. Reminders: Homework 11 is due today. Project 6 is due next Friday. Questions? Friday, April 1 CS 215 Fundamentals of Programming II - Lecture 32 1 Outline Introduction

More information

Section 5.5. Left subtree The left subtree of a vertex V on a binary tree is the graph formed by the left child L of V, the descendents

Section 5.5. Left subtree The left subtree of a vertex V on a binary tree is the graph formed by the left child L of V, the descendents Section 5.5 Binary Tree A binary tree is a rooted tree in which each vertex has at most two children and each child is designated as being a left child or a right child. Thus, in a binary tree, each vertex

More information

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS TRIE BASED METHODS FOR STRING SIMILARTIY JOINS Venkat Charan Varma Buddharaju #10498995 Department of Computer and Information Science University of MIssissippi ENGR-654 INFORMATION SYSTEM PRINCIPLES RESEARCH

More information

MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct.

MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct. MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct. In linked list the elements are necessarily to be contiguous In linked list the elements may locate at far positions

More information

CSE 230 Intermediate Programming in C and C++ Binary Tree

CSE 230 Intermediate Programming in C and C++ Binary Tree CSE 230 Intermediate Programming in C and C++ Binary Tree Fall 2017 Stony Brook University Instructor: Shebuti Rayana shebuti.rayana@stonybrook.edu Introduction to Tree Tree is a non-linear data structure

More information

Conceptual modeling of entities and relationships using Alloy

Conceptual modeling of entities and relationships using Alloy Conceptual modeling of entities and relationships using Alloy K. V. Raghavan Indian Institute of Science, Bangalore Conceptual modeling What is it? Capture requirements, other essential aspects of software

More information

BayesTH-MCRDR Algorithm for Automatic Classification of Web Document

BayesTH-MCRDR Algorithm for Automatic Classification of Web Document BayesTH-MCRDR Algorithm for Automatic Classification of Web Document Woo-Chul Cho and Debbie Richards Department of Computing, Macquarie University, Sydney, NSW 2109, Australia {wccho, richards}@ics.mq.edu.au

More information

EE 368. Weeks 5 (Notes)

EE 368. Weeks 5 (Notes) EE 368 Weeks 5 (Notes) 1 Chapter 5: Trees Skip pages 273-281, Section 5.6 - If A is the root of a tree and B is the root of a subtree of that tree, then A is B s parent (or father or mother) and B is A

More information

Gestão e Tratamento da Informação

Gestão e Tratamento da Informação Gestão e Tratamento da Informação Web Data Extraction: Automatic Wrapper Generation Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2010/2011 Outline Automatic Wrapper Generation

More information

birds fly S NP N VP V Graphs and trees

birds fly S NP N VP V Graphs and trees birds fly S NP VP N birds V fly S NP NP N VP V VP S NP VP birds a fly b ab = string S A B a ab b S A B A a B b S NP VP birds a fly b ab = string Grammar 1: Grammar 2: A a A a A a B A B a B b A B A b Grammar

More information

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.)

Chapter 10. Normalization. Chapter Outline. Chapter Outline(contd.) Chapter 10 Normalization Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant Information in Tuples and Update Anomalies 1.3 Null

More information

Tree Structures. A hierarchical data structure whose point of entry is the root node

Tree Structures. A hierarchical data structure whose point of entry is the root node Binary Trees 1 Tree Structures A tree is A hierarchical data structure whose point of entry is the root node This structure can be partitioned into disjoint subsets These subsets are themselves trees and

More information

(2,4) Trees. 2/22/2006 (2,4) Trees 1

(2,4) Trees. 2/22/2006 (2,4) Trees 1 (2,4) Trees 9 2 5 7 10 14 2/22/2006 (2,4) Trees 1 Outline and Reading Multi-way search tree ( 10.4.1) Definition Search (2,4) tree ( 10.4.2) Definition Search Insertion Deletion Comparison of dictionary

More information

Database Design Principles

Database Design Principles Database Design Principles CPS352: Database Systems Simon Miner Gordon College Last Revised: 2/11/15 Agenda Check-in Design Project ERD Presentations Database Design Principles Decomposition Functional

More information

Garbage Collection: recycling unused memory

Garbage Collection: recycling unused memory Outline backtracking garbage collection trees binary search trees tree traversal binary search tree algorithms: add, remove, traverse binary node class 1 Backtracking finding a path through a maze is an

More information

Distributed Database System. Project. Query Evaluation and Web Recognition in Document Databases

Distributed Database System. Project. Query Evaluation and Web Recognition in Document Databases 74.783 Distributed Database System Project Query Evaluation and Web Recognition in Document Databases Instructor: Dr. Yangjun Chen Student: Kang Shi (6776229) August 1, 2003 1 Abstract A web and document

More information

Efficient Access to Non-Sequential Elements of a Search Tree

Efficient Access to Non-Sequential Elements of a Search Tree Efficient Access to Non-Sequential Elements of a Search Tree Lubomir Stanchev Computer Science Department Indiana University - Purdue University Fort Wayne Fort Wayne, IN, USA stanchel@ipfw.edu Abstract

More information

Announcements (January 20) Relational Database Design. Database (schema) design. Entity-relationship (E/R) model. ODL (Object Definition Language)

Announcements (January 20) Relational Database Design. Database (schema) design. Entity-relationship (E/R) model. ODL (Object Definition Language) Announcements (January 20) 2 Relational Database Design Review for Codd paper due tonight via email Follow instructions on course Web site Reading assignment for next week (Ailamaki et al., VLDB 2001)

More information

Binary Trees, Binary Search Trees

Binary Trees, Binary Search Trees Binary Trees, Binary Search Trees Trees Linear access time of linked lists is prohibitive Does there exist any simple data structure for which the running time of most operations (search, insert, delete)

More information

Chapter 2 Entity-Relationship Data Modeling: Tools and Techniques. Fundamentals, Design, and Implementation, 9/e

Chapter 2 Entity-Relationship Data Modeling: Tools and Techniques. Fundamentals, Design, and Implementation, 9/e Chapter 2 Entity-Relationship Data Modeling: Tools and Techniques Fundamentals, Design, and Implementation, 9/e Three Schema Model ANSI/SPARC introduced the three schema model in 1975 It provides a framework

More information

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems

Functional Dependencies and Normalization for Relational Databases Design & Analysis of Database Systems Functional Dependencies and Normalization for Relational Databases 406.426 Design & Analysis of Database Systems Jonghun Park jonghun@snu.ac.kr Dept. of Industrial Engineering Seoul National University

More information

Data Abstractions. National Chiao Tung University Chun-Jen Tsai 05/23/2012

Data Abstractions. National Chiao Tung University Chun-Jen Tsai 05/23/2012 Data Abstractions National Chiao Tung University Chun-Jen Tsai 05/23/2012 Concept of Data Structures How do we store some conceptual structure in a linear memory? For example, an organization chart: 2/32

More information

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree BUSINESS INTELLIGENCE SSAS - SQL Server Analysis Services Business Informatics Degree 2 BI Architecture SSAS: SQL Server Analysis Services 3 It is both an OLAP Server and a Data Mining Server Distinct

More information

Trees, Part 1: Unbalanced Trees

Trees, Part 1: Unbalanced Trees Trees, Part 1: Unbalanced Trees The first part of this chapter takes a look at trees in general and unbalanced binary trees. The second part looks at various schemes to balance trees and/or make them more

More information

V. Database Design CS448/ How to obtain a good relational database schema

V. Database Design CS448/ How to obtain a good relational database schema V. How to obtain a good relational database schema Deriving new relational schema from ER-diagrams Normal forms: use of constraints in evaluating existing relational schema CS448/648 1 Translating an E-R

More information

A Structural Numbering Scheme for XML Data

A Structural Numbering Scheme for XML Data A Structural Numbering Scheme for XML Data Alfred M. Martin WS2002/2003 February/March 2003 Based on workout made during the EDBT 2002 Workshops Dao Dinh Khal, Masatoshi Yoshikawa, and Shunsuke Uemura

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1 Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.

More information

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2

Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2 Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 10-2 Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while 1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling

More information

Hamilton paths & circuits. Gray codes. Hamilton Circuits. Planar Graphs. Hamilton circuits. 10 Nov 2015

Hamilton paths & circuits. Gray codes. Hamilton Circuits. Planar Graphs. Hamilton circuits. 10 Nov 2015 Hamilton paths & circuits Def. A path in a multigraph is a Hamilton path if it visits each vertex exactly once. Def. A circuit that is a Hamilton path is called a Hamilton circuit. Hamilton circuits Constructing

More information

CSI33 Data Structures

CSI33 Data Structures Outline Department of Mathematics and Computer Science Bronx Community College October 19, 2016 Outline Outline 1 Chapter 7: Trees Outline Chapter 7: Trees 1 Chapter 7: Trees Uses Of Trees Chapter 7: Trees

More information

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang)

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang) Bioinformatics Programming EE, NCKU Tien-Hao Chang (Darby Chang) 1 Tree 2 A Tree Structure A tree structure means that the data are organized so that items of information are related by branches 3 Definition

More information

Trees Rooted Trees Spanning trees and Shortest Paths. 12. Graphs and Trees 2. Aaron Tan November 2017

Trees Rooted Trees Spanning trees and Shortest Paths. 12. Graphs and Trees 2. Aaron Tan November 2017 12. Graphs and Trees 2 Aaron Tan 6 10 November 2017 1 10.5 Trees 2 Definition Definition Definition: Tree A graph is said to be circuit-free if, and only if, it has no circuits. A graph is called a tree

More information

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases

Chapter 10. Chapter Outline. Chapter Outline. Functional Dependencies and Normalization for Relational Databases Chapter 10 Functional Dependencies and Normalization for Relational Databases Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant

More information

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW XML and Databases Lecture 10 XPath Evaluation using RDBMS Sebastian Maneth NICTA and UNSW CSE@UNSW -- Semester 1, 2009 Outline 1. Recall pre / post encoding 2. XPath with //, ancestor, @, and text() 3.

More information

Tree. Virendra Singh Indian Institute of Science Bangalore Lecture 11. Courtesy: Prof. Sartaj Sahni. Sep 3,2010

Tree. Virendra Singh Indian Institute of Science Bangalore Lecture 11. Courtesy: Prof. Sartaj Sahni. Sep 3,2010 SE-286: Data Structures t and Programming Tree Virendra Singh Indian Institute of Science Bangalore Lecture 11 Courtesy: Prof. Sartaj Sahni 1 Trees Nature Lover sviewofatree leaves branches root 3 Computer

More information

Range CUBE: Efficient Cube Computation by Exploiting Data Correlation

Range CUBE: Efficient Cube Computation by Exploiting Data Correlation Range CUBE: Efficient Cube Computation by Exploiting Data Correlation Ying Feng Divyakant Agrawal Amr El Abbadi Ahmed Metwally Department of Computer Science University of California, Santa Barbara Email:

More information

Data mining, 4 cu Lecture 6:

Data mining, 4 cu Lecture 6: 582364 Data mining, 4 cu Lecture 6: Quantitative association rules Multi-level association rules Spring 2010 Lecturer: Juho Rousu Teaching assistant: Taru Itäpelto Data mining, Spring 2010 (Slides adapted

More information

Course on Database Design Carlo Batini University of Milano Bicocca Part 5 Logical Design

Course on Database Design Carlo Batini University of Milano Bicocca Part 5 Logical Design Course on Database Design Carlo atini University of Milano icocca Part 5 Logical Design 1 Carlo atini, 2015 This work is licensed under the Creative Commons Attribution NonCommercial NoDerivatives 4.0

More information

Data Structure Lecture#10: Binary Trees (Chapter 5) U Kang Seoul National University

Data Structure Lecture#10: Binary Trees (Chapter 5) U Kang Seoul National University Data Structure Lecture#10: Binary Trees (Chapter 5) U Kang Seoul National University U Kang (2016) 1 In This Lecture The concept of binary tree, its terms, and its operations Full binary tree theorem Idea

More information

Bastian Wormuth. Version About this Manual

Bastian Wormuth. Version About this Manual Elba User Manual Table of Contents Bastian Wormuth Version 0.1 1 About this Manual...1 2 Overview...2 3 Starting Elba...3 4 Establishing the database connection... 3 5 Elba's Main Window... 5 6 Creating

More information

Database Management System 15

Database Management System 15 Database Management System 15 Trivial and Non-Trivial Canonical /Minimal School of Computer Engineering, KIIT University 15.1 First characterize fully the data requirements of the prospective database

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

Trees. (Trees) Data Structures and Programming Spring / 28

Trees. (Trees) Data Structures and Programming Spring / 28 Trees (Trees) Data Structures and Programming Spring 2018 1 / 28 Trees A tree is a collection of nodes, which can be empty (recursive definition) If not empty, a tree consists of a distinguished node r

More information

Data Structures and Algorithms (DSA) Course 9 Lists / Graphs / Trees. Iulian Năstac

Data Structures and Algorithms (DSA) Course 9 Lists / Graphs / Trees. Iulian Năstac Data Structures and Algorithms (DSA) Course 9 Lists / Graphs / Trees Iulian Năstac Recapitulation It is considered the following type: typedef struct nod { ; struct nod *next; } NOD; 2 Circular

More information

Indexing Keys in Hierarchical Data

Indexing Keys in Hierarchical Data University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2001 Indexing Keys in Hierarchical Data Yi Chen University of Pennsylvania Susan

More information