Learning n-ary tree-pattern queries for web information extraction

Benjamin Habegger
Grappa / INRIA Mostrare Project
Université Charles de Gaulle Lille 3
BP 60149, Villeneuve d'Ascq Cedex, FRANCE
habegger@grappa.univ-lille3.fr

July 28, 2006

Abstract

Web information extraction consists in building patterns that extract specific information from the documents of a given Web source. Up to now, most existing techniques use string-based representations of documents as well as string-based patterns. Tree representations naturally overcome the limitations of string-based approaches. While some tree-based approaches exist, they are either limited to learning unary queries or build n-ary queries by composing unary queries. In this paper we study tree patterns as an n-ary extraction language and propose an algorithm capable of learning such queries. The learning algorithm we propose computes the most information-conservative tree pattern that generalizes two input trees. Tree patterns have the double advantage of working explicitly with the tree structure of HTML/XML documents and of expressing n-ary queries. As our experiments will show, tree patterns can express many extraction tasks. They are also closely related to the now standard XPath language and are therefore easily understood by human experts.

1 Introduction

The main motivation for web information extraction is to provide structured access to the data made available on the web. Once extracted and restructured, the data can be used by any other data-based application such as mediators, online agents, etc. Techniques that automate pattern generation are needed because the web is in constant change and hand-crafting patterns is known to be a tedious task.
There has been much research on how to efficiently build programs (called wrappers) capable of automatically extracting data from web sites (e.g. [8, 12, 3, 14, 18, 9]). Most existing techniques [11, 13, 8] use a string representation for both the documents and the learned patterns. In the context of n-ary extraction, these methods have only proven to be efficient for data which is in a tabular format and where the values to be extracted are relatively close to each other (i.e. few tokens are found between them). Much data on the web does not, however, follow such a tabular format: the values of some attributes may be shared among the extracted tuples, unwanted data may appear between the values to be extracted, etc. String-based representations make it difficult to learn patterns in such cases. Figure 1 gives an example where some data (the city) is shared among the tuples of the relation to be extracted (see figure 2). Even though this is a simple artificial case, most string-based methods cannot cope with such data.

The natural tree structure of HTML documents can be used in cases where string-based techniques fail. Figure 4 gives the tree representation of the document of figure 1. The tree pattern given in figure 6 correctly extracts the desired relation. To our knowledge, only a few works using machine learning for information extraction explicitly use a tree-based document representation [3, 12]. In both cases, a tree transducer which only extracts single nodes is learned from a given set of example documents. [5] use an interesting attribute/value encoding of tree nodes. The encoding, however, only uses the local context of the nodes to be extracted.

Tree patterns directly and naturally express n-ary queries. Furthermore, they are closely related to widely used semi-structured query languages such as XPath [23]. Tree patterns are also more easily readable by human users than many other formalisms such as tree transducers. For these reasons, it is interesting to develop techniques capable of learning tree patterns. There are multiple ways to define tree patterns, depending both on the relations they are allowed to express and on how they are to be matched by a tree.
These different settings have important consequences on the efficiency of both matching and learning, and on the expressivity of the learned patterns. In this paper we propose a weight-based algorithm capable of coping with these different settings and of generating a tree pattern of maximal weight in each of them. We particularly study ordered and unordered injective tree patterns in which child and descendant relationships can be expressed. Our algorithm is capable of generating n-ary extraction patterns in such cases.

The contributions of this paper are the following:
- We introduce a flexible notion of weighted patterns based on the relational representation of trees and patterns.
- We propose a generalization algorithm which builds n-ary information extraction patterns and which can easily be adapted to different settings (different relations and embedding definitions).
- We study learning tree patterns with our algorithm in different settings and evaluate it on different datasets in these settings.

This paper is organized as follows. Section 2 presents the general problem of information extraction, followed by section 3 which presents related work in information extraction and in tree pattern mining. Section 4 presents the background necessary to define tree pattern generalization. The principles of our generalization are given in section 5, followed by the details of our algorithm in section 6. The results of its evaluation are presented in section 7. Finally, we conclude and present future work in section 8.

[Figure 1: The People/Location document]

2 Web Information Extraction

The objective of web information extraction is to extract a user-defined target relation from a set of similarly formatted documents. A program capable of performing such an extraction is called a wrapper. The People/Location document of figure 1 is an example document containing a relation between people and the cities they live in. A wrapper for such a source (in this case consisting of only one document) would be given the HTML code of this document (figure 3) and would return the set of tuples of figure 2. This task is not trivial since the HTML code only describes how the data is to be presented in a browser and not how the data items are related to each other.

Definition 1 Given a set of documents D in a similar format and a target relation R, a wrapper is a program W such that W(D) = R.

Building a wrapper manually requires knowing either a programming or a query language and, most importantly, requires the expertise to determine which regularities in a given document set can be exploited to build an efficient wrapper. Machine learning is an interesting solution for enabling non-expert users to build wrappers.

City (node)      Name (node)
Nantes (7)       Fabrice (9)
Nantes (7)       Nordine (11)
Nantes (7)       Alex (13)
Grenoble (16)    Gwen (18)
Grenoble (16)    Bertrand (20)
Lille (23)       Claire (25)

Figure 2: The (city, name) relation to be extracted

In such a case, the user simply selects (usually by means of a graphical user interface) examples of the items he wishes to extract. In the case of n-ary extraction, he also specifies to which tuple and to which attribute each value belongs. The set of selections associated with a given document is called a labeling. This labeling can be applied to the document it refers to in order to extract the selected tuples. When the document is represented as a tree, we will consider that the labeled elements are nodes of the tree. In the following, for a given document d and a given labeling L, we denote by L(d) the set of node tuples the labeling produces. It should be noted that a labeling does not necessarily specify all the elements to be extracted.

Definition 2 (Wrapper learning problem) Given a set of labeled documents D, build the most specific pattern p such that $L(D) \subseteq Q_p(D)$, where L(D) denotes the set of labeled tuples in D and $Q_p(D)$ denotes the result of applying pattern p to D.

Definition 2 is deliberately general. First, it only takes into account positive examples (i.e. the data to be extracted). This is a recurrent problem in information extraction since there is no consensual definition of what a negative example is. Second, specificity can be defined in different ways depending on the pattern language. A more precise definition of the type of patterns we are learning is given later.

One of the major problems in learning wrappers is that we are faced with multiple contradictory requirements. (1) The number of interactions between the user and the system during the learning process must be reduced to a minimum (ideally fewer than 10 labelings). (2) The time taken to build an extraction pattern must be low. (3) The constructed pattern must be reasonably concise.
(4) The pattern language must be able to cope with multiple types of regularities. These different aspects should therefore be taken into account when evaluating a wrapper learning method.

3 Related Work

There has been much research on wrapper learning in the past years and many systems [14, 11, 8, 12, 7, 15, 5] have been proposed. Few systems allow n-ary extraction in a direct manner. To our knowledge, only IERel [8] and WIEN [13] are capable of directly learning n-ary extraction patterns. However, both are string-based and rely on the strong hypothesis that the data to be extracted is tabular (i.e. the tuples to be extracted do not overlap). WIEN is the first information extraction system developed; it is known to have very limited expressivity and requires many examples. IERel, on the other hand, requires only very few example instances to learn efficient patterns for tabular data.

    <html>
      <head>
        <link rel="stylesheet" type="text/css" href="style.css" />
      </head>
      <body>
        <h1>people list</h1>
        <table>
          <tr><th>nantes</th></tr>
          <tr><td>fabrice</td></tr>
          <tr><td>nordine</td></tr>
          <tr><td>alex</td></tr>
        </table>
        <table>
          <tr><th>grenoble</th></tr>
          <tr><td>gwen</td></tr>
          <tr><td>bertrand</td></tr>
        </table>
        <table>
          <tr><th>lille</th></tr>
          <tr><td>claire</td></tr>
        </table>
      </body>
    </html>

Figure 3: Source of the People/Location document

Some systems learn n-ary queries by composing unary queries. This technique is used by many systems such as Stalker [17]. It supposes that intermediate nodes or surrounding text are either tagged explicitly by the user or discovered by the system. Stalker also requires knowing how the data is organized in the page. While Lixto [2] is a visual wrapper building system which does not use machine learning, it performs n-ary extraction by composing monadic queries. As with Stalker, intermediate nodes need to be selected by the user.

The system PAF [5] transforms the n-ary wrapper learning problem into a classification problem. It uses an attribute/value representation of the nodes of the tree representation of documents. A classifier is built for each component of the tuples. The classifier extracting nodes for the $i$th component has access to information related to the $(i-1)$st components. However, in the case of the PAF system, only the local context of the nodes to be extracted is taken into account.

XPath [23] is a widely used query language for semi-structured data. In some simple cases, a wrapper can simply be expressed by a set of XPath expressions relative

to each other. To extract the (city, name) relation from the document of figure 1, whose source is given in figure 3, each group of instances can be determined by the XPath expression /html/body/table. Given such a table node, the expression tr/th extracts the city attribute of the group of instances. Finally, each match of the XPath expression tr/td extracts a name attribute. The XPath query for which there exists a match (one match being one possible embedding) for each tuple to extract is the following query using branching:

    /html/body/table[tr/th][tr/td]

It should be noted that evaluation algorithms which are linear in query and tree size exist for the core parts of XPath [6, 21].

Our work is also closely related to tree mining. Tree mining consists in finding frequently occurring patterns in a set of trees. It is mostly used for finding frequent queries or for XML document classification. There is a large body of research on tree-pattern mining. For example, the TreeFinder system [22] finds frequent unordered trees based on the notion of tree subsumption, a definition of inclusion which preserves ancestorship. It does not allow child edges or label abstraction. Another interesting tree mining approach is suggested by [1], who propose an incremental frequent ordered tree mining algorithm based on rightmost branch extension. Learning extraction patterns differs from tree mining mostly in that we are required to learn patterns containing extraction variables. We also consider learning multiple relations between nodes and abstracting some node labels, and do not restrict ourselves to relations of a single type (usually either descendant or child relationships) as most tree mining techniques do. In [4] a tree query aggregation algorithm is proposed.
Given a set of tree patterns, their system builds a new pattern, more general than the original patterns, which can be used as a replacement for them. They only consider an unordered and non-injective definition of pattern matching.

Many theoretical results on tree subsumption, tree pattern containment, and tree pattern evaluation should be taken into account when considering learning tree patterns. In [10] complexity results for different tree inclusion problems are reported. They show that unordered injective tree inclusion (preserving labeling and ancestorship) is NP-complete while the ordered version is in PTIME. Recently, [16] showed that the (non-injective) containment problem of the XPath fragment allowing child and descendant relationships, label abstraction and branching together is CoNP-complete. Faced with these results and the previously stated requirements for information extraction, it is necessary to adapt exact methods in order to be efficient. The weight-based approach we propose and the tree cutting heuristics we use overcome these limits while still learning efficient patterns.

4 Background

We model XML and HTML documents as unranked ordered trees whose nodes are labeled with symbols from an alphabet Σ. In this paper, we will consider both unordered and ordered trees. When considering unordered trees, we simply ignore the ordering of the trees.

[Figure 4: The tree representation of the People/Location document. Nodes are numbered in depth-first order:
html 0 (head 1 (link 2 [rel=stylesheet, type=text/css, href=style.css]), body 3 (h1 4 "People list",
  table 5 (tr 6 (th 7 "Nantes"), tr 8 (td 9 "Fabrice"), tr 10 (td 11 "Nordine"), tr 12 (td 13 "Alex")),
  table 14 (tr 15 (th 16 "Grenoble"), tr 17 (td 18 "Gwen"), tr 19 (td 20 "Bertrand")),
  table 21 (tr 22 (th 23 "Lille"), tr 24 (td 25 "Claire"))))]

    <people> {
      for $x in //table
      let $city := $x/tr/th
      return
        for $name in $x/tr/td
        return
          <entry>
            <name>{string($name)}</name>
            <city>{string($city)}</city>
          </entry>
    } </people>

Figure 5: XQuery extracting the relation from the People/Location document

XPath is a widely used standard and is therefore an interesting target language for information extraction. However, XPath is monadic (it cannot express n-ary extraction) and therefore cannot express the target extraction. Tree patterns are a simple extension of XPath which allows a variable to be attached to any node of the tree representation of an XPath expression. This gives us the tree pattern of figure 6, which correctly extracts the desired couples. Formally, tree patterns can be defined as follows:

Definition 3 (Tree Pattern [16]) A tree pattern p is an unranked tree over alphabet Σ with a distinguished subset of edges called descendant edges, and a k-tuple of nodes called the result tuple, for some $k \geq 0$.
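Definition 3 can be made concrete with a small data structure. The following Python sketch is only one possible encoding and is not taken from the paper: the class name `PNode`, the helpers `result_tuple` and `render`, and the choice of storing the edge type on the child node are all illustrative assumptions. It builds the target pattern of figure 6.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WILDCARD = "*"  # assumed spelling of the label wildcard


@dataclass
class PNode:
    """One pattern node, together with the edge that links it to its parent."""
    label: str                     # a symbol of the alphabet, or WILDCARD
    var: Optional[str] = None      # extraction variable attached to the node
    descendant: bool = False       # True: d-edge from the parent, False: c-edge
    children: List["PNode"] = field(default_factory=list)


def result_tuple(node: PNode) -> List[str]:
    """Variables of the pattern, listed in depth-first order."""
    found = [node.var] if node.var is not None else []
    for child in node.children:
        found.extend(result_tuple(child))
    return found


def render(node: PNode) -> str:
    """Informal bracket rendering; '//' marks a descendant edge."""
    text = node.label + (f"{{{node.var}}}" if node.var else "")
    for child in node.children:
        prefix = "//" if child.descendant else ""
        text += f"[{prefix}{render(child)}]"
    return text


# Target pattern of figure 6: html/body/table[tr/th][tr/td] with two variables.
target = PNode("html", children=[
    PNode("body", children=[
        PNode("table", children=[
            PNode("tr", children=[PNode("th", var="city")]),
            PNode("tr", children=[PNode("td", var="name")]),
        ])
    ])
])

print(result_tuple(target))
print(render(target))
```

Storing the edge type on the child keeps the structure a plain tree, since every non-root node has exactly one incoming edge.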

[Figure 6: Target pattern: html 0 (body 1 (table 2 (tr 3 (th 4 [city]), tr 5 (td 6 [name])))). Bracketed names are the variables attached to the nodes.]

Tree patterns are a superset of a fragment (noted $XP^{\{*,//,[]\}}$) of the widely used XPath query language. This fragment allows label wildcards ($*$), descendant expressions (//) and branching expressions ([]). The main difference between tree patterns and expressions of $XP^{\{*,//,[]\}}$ is that tree patterns also allow variables to be attached to nodes, so that multiple values can be extracted simultaneously with a single expression. In the context of information extraction of n-ary data, this aspect comes in very handy.

In the following, for a tree or pattern t we denote by NODES(t) the set of nodes of t. For any given node n, LABEL(n) denotes the label of n, PARENT(n) denotes the parent of n, and CHILDREN(n) denotes the subset of NODES(t) containing the child nodes of n (in the case of a pattern, independently of the type of edge). Furthermore, we denote by < the ordering obtained by walking through the nodes of t in a depth-first manner. For any two nodes n and m, $n \prec m$ denotes that n is a strict ancestor of m. When t is a tree pattern, CEDGES(t) and DEDGES(t) respectively denote the set of child edges (c-edges) of t and the set of descendant edges (d-edges) of t. Finally, for a pattern p, VARIABLES(p) denotes the set of variables contained in p, and for a node n, VAR(n) is the name of the attached variable, if any.

Determining when a tree matches a pattern can be defined through the notion of embedding. Informally, an embedding is a function which maps each node of the pattern to a node of the matching tree in such a way that all the properties described by the pattern are conserved. The strict minimum is that the relations between the nodes are conserved. The following definition formalizes this notion of embedding.

Definition 4 (Embedding) A function e from a pattern p to a tree t is an unordered embedding iff it respects the following requirements:
(1) e is root preserving (i.e. e(ROOT(p)) = ROOT(t))
(2) for all $n \in$ NODES(p): LABEL(n) = $*$ or LABEL(n) = LABEL(e(n))
(3) for each $n, m \in$ NODES(p):
    if (n, m) is a c-edge in p then (e(n), e(m)) is an edge in t
    if (n, m) is a d-edge in p then e(m) is a proper descendant of e(n) in t

Consider the pattern of figure 6 and the tree of figure 4. The function e which maps nodes 0 to 6 of the pattern respectively to nodes 0, 3, 5, 6, 7, 8, 9 of the tree is an embedding.

Requiring the embedding to be injective ensures that distinct nodes of the pattern are mapped to distinct nodes of the tree. It is also natural to require that, for each node n of the pattern, the children of n map into distinct subtrees of e(n), so that each outgoing edge of n implies the existence of a distinct subtree under e(n). This can be obtained by requiring that the mapping conserves the lowest common ancestors.

Definition 5 (Lowest common ancestor) The lowest common ancestor of two nodes n and m of a tree (or pattern) t (noted lca(n, m)) is the unique node z such that:
- z is an ancestor of n and m
- all other ancestors of both n and m are also ancestors of z

In figure 4, lca(6, 15) = 3.

Definition 6 (Injective embedding) An embedding e from a pattern p to a tree t is injective iff it also satisfies the following requirements:
(4) $\forall n, m \in$ NODES(p), $n \neq m \Rightarrow e(n) \neq e(m)$
(5) $\forall n, m \in$ NODES(p), $e(lca(n, m)) = lca(e(n), e(m))$

In many cases, it may also be interesting to conserve the order of the children of a node.

Definition 7 (Ordered injective embedding) An embedding e from a pattern p to a tree t is an ordered injective embedding iff it is an injective embedding which also satisfies the following requirement:
(6) for all $n, m \in$ NODES(p), $n < m$ iff $e(n) < e(m)$

Definition 8 (Match) A tree t is said to match a pattern p iff there exists an embedding e from the nodes of p to the nodes of t.

Of course, the notion of match depends on the type of embedding considered.
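The lowest common ancestor and restriction (5) are easy to check mechanically. The sketch below is an illustration only: the parent-map encoding and the function names are assumptions, and a node is treated as an ancestor of itself so that lca(n, m) is defined when n is above m. The two parent maps transcribe figure 4 and the target pattern of figure 6.

```python
# Parent maps (child -> parent); the root maps to None.
TREE_PARENT = {
    0: None, 1: 0, 2: 1, 3: 0, 4: 3,
    5: 3, 6: 5, 7: 6, 8: 5, 9: 8, 10: 5, 11: 10, 12: 5, 13: 12,
    14: 3, 15: 14, 16: 15, 17: 14, 18: 17, 19: 14, 20: 19,
    21: 3, 22: 21, 23: 22, 24: 21, 25: 24,
}
PATTERN_PARENT = {0: None, 1: 0, 2: 1, 3: 2, 4: 3, 5: 2, 6: 5}

# The embedding given in the text for the target pattern.
E = {0: 0, 1: 3, 2: 5, 3: 6, 4: 7, 5: 8, 6: 9}


def ancestors_or_self(parent, n):
    """Chain n, PARENT(n), ... up to the root."""
    chain = []
    while n is not None:
        chain.append(n)
        n = parent[n]
    return chain


def lca(parent, n, m):
    """First node met when walking up from m that is also above (or equal to) n."""
    above_n = set(ancestors_or_self(parent, n))
    for a in ancestors_or_self(parent, m):
        if a in above_n:
            return a
    raise ValueError("nodes are not in the same tree")


def preserves_lca(e, pattern_parent, tree_parent):
    """Restriction (5): e(lca(n, m)) == lca(e(n), e(m)) for every pair of pattern nodes."""
    nodes = list(pattern_parent)
    return all(
        e[lca(pattern_parent, n, m)] == lca(tree_parent, e[n], e[m])
        for n in nodes
        for m in nodes
    )


print(lca(TREE_PARENT, 6, 15))
print(preserves_lca(E, PATTERN_PARENT, TREE_PARENT))
```

Checking all pairs is quadratic in the pattern size, which is acceptable for a verification helper but is not meant to reflect how the learning algorithm itself proceeds.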
Each embedding from a pattern to a tree gives rise to the extraction of a tuple of nodes. The extracted nodes are the images of the variable nodes of the pattern. If VAR(n) = X then e(n) is extracted as the value of X.
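To illustrate how embeddings produce tuples, the following Python sketch enumerates the unordered, non-injective embeddings of Definition 4 and collects the images of the variable nodes. It is a naive enumeration written for this example, not the paper's algorithm; the dictionary encodings `TREE`, `TEXT` and `PATTERN` and the function names are assumptions.

```python
from itertools import product

# Document tree of figure 4: node -> (label, children).
TREE = {
    0: ("html", [1, 3]), 1: ("head", [2]), 2: ("link", []),
    3: ("body", [4, 5, 14, 21]), 4: ("h1", []),
    5: ("table", [6, 8, 10, 12]),
    6: ("tr", [7]), 7: ("th", []), 8: ("tr", [9]), 9: ("td", []),
    10: ("tr", [11]), 11: ("td", []), 12: ("tr", [13]), 13: ("td", []),
    14: ("table", [15, 17, 19]),
    15: ("tr", [16]), 16: ("th", []), 17: ("tr", [18]), 18: ("td", []),
    19: ("tr", [20]), 20: ("td", []),
    21: ("table", [22, 24]),
    22: ("tr", [23]), 23: ("th", []), 24: ("tr", [25]), 25: ("td", []),
}
TEXT = {7: "Nantes", 9: "Fabrice", 11: "Nordine", 13: "Alex", 16: "Grenoble",
        18: "Gwen", 20: "Bertrand", 23: "Lille", 25: "Claire"}

# Target pattern of figure 6: node -> (label, variable, [(child, edge kind)]).
# Edge kind "c" is a child edge, "d" a descendant edge.
PATTERN = {
    0: ("html", None, [(1, "c")]), 1: ("body", None, [(2, "c")]),
    2: ("table", None, [(3, "c"), (5, "c")]),
    3: ("tr", None, [(4, "c")]), 4: ("th", "city", []),
    5: ("tr", None, [(6, "c")]), 6: ("td", "name", []),
}


def descendants(t):
    """Proper descendants of tree node t."""
    for c in TREE[t][1]:
        yield c
        yield from descendants(c)


def embeddings(p, t):
    """Yield every embedding of the sub-pattern rooted at p that maps p to t."""
    label, _, edges = PATTERN[p]
    if label != "*" and label != TREE[t][0]:
        return  # requirement (2) fails
    options = []
    for child, kind in edges:
        targets = TREE[t][1] if kind == "c" else list(descendants(t))
        options.append([e for u in targets for e in embeddings(child, u)])
    # One independent choice per outgoing edge (requirement (3)).
    for combo in product(*options):
        e = {p: t}
        for sub in combo:
            e.update(sub)
        yield e


def extract(variables):
    """Set of node tuples extracted by the root-preserving embeddings."""
    node_of = {var: n for n, (_, var, _) in PATTERN.items() if var}
    return {tuple(e[node_of[v]] for v in variables) for e in embeddings(0, 0)}


for city, name in sorted(extract(("city", "name"))):
    print(TEXT[city], TEXT[name])
```

Because each outgoing edge is solved independently, two pattern siblings may map to the same tree node; enforcing restrictions (4) and (5) would require an extra compatibility check between the choices made for sibling edges.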

Definition 9 (Extracted tuples) Given a tree pattern p and a tree t, let $(v_1, \ldots, v_n)$ denote the variables of p. The relation extracted from t by p is the set:

$R_p(t) = \{(x_1, \ldots, x_n) \mid \exists e,\ e(v_i) = x_i \text{ and } e \text{ is an embedding from } p \text{ to } t\}$

For example, the previously defined embedding from the pattern of figure 6 to the tree of figure 4 extracts the tuple (7, 9). It should be noted that in the context of information extraction we are actually interested in the contents of the nodes. By considering the contents of nodes 7 and 9 we effectively extract the valid (city, name) tuple (Nantes, Fabrice).

Embeddings determine whether a tree is matched by a pattern. The notion of homomorphism from one tree pattern to another can be defined by adapting condition (3) of definition 4 and adding the condition that variables must be matched.

Definition 10 (Homomorphism) A mapping h from a pattern p to another pattern p' is a homomorphism iff it respects the following requirements:
(0) for all $n \in$ NODES(p): VAR(h(n)) = VAR(n) if VAR(n) is defined
(1) h is root preserving (i.e. h(ROOT(p)) = ROOT(p'))
(2) for all $n \in$ NODES(p): LABEL(n) = $*$ or LABEL(n) = LABEL(h(n))
(3) for each $n, m \in$ NODES(p):
    if (n, m) is a c-edge in p then (h(n), h(m)) is a c-edge in p'
    if (n, m) is a d-edge in p then h(m) is a proper descendant of h(n) in p' (independently of the type of edges)

Let $T_\Sigma$ denote the set of trees which can be constructed over alphabet Σ. We denote by L(p) the subset of trees of $T_\Sigma$ which match p, called the language of p. The partial order on the subsets of $T_\Sigma$ induces a partial order over the pattern space: a pattern p is said to be more general than another pattern p' (noted $p \succeq p'$) iff $L(p') \subseteq L(p)$. Testing whether a pattern p is more general than a pattern p' is called the pattern containment problem. When no abstracted labels are allowed, the patterns only contain c-edges (resp. d-edges), and the embedding is injective, the problem is the same as the subtree (resp. tree) inclusion problem described in [10]. The containment problem for $XP^{\{//,*,[]\}}$ has been proven to be CoNP-complete [16] while remaining in PTIME for $XP^{\{*,[]\}}$, $XP^{\{//,[]\}}$, and $XP^{\{//,*\}}$.

Proposition 1 Given two patterns p and p', if there exists a homomorphism h from NODES(p') to NODES(p) which respects the same constraints as those defined for matching, then $p' \succeq p$.

The proof of proposition 1 is straightforward. Take any tree t matching p with embedding e. Then $e \circ h$ is a valid embedding from p' to t, thus proving that t also matches p'. While the existence of a homomorphism is a sufficient condition for pattern inclusion, it is not always a necessary condition: [16] have shown that for $XP^{\{//,*,[]\}}$ this is not the case. The generalization algorithm we propose in the next section guarantees the existence of a homomorphism from the generalization to the patterns to be generalized. Therefore, the answer to this question is mostly of theoretical interest from the point of view of this work.

Proposition 2 Let p and p' be two patterns such that there exists an injective homomorphism h from p to p'. Let r and r' denote respectively the root nodes of p and p'. Then $|CHILDREN(r)| \leq |CHILDREN(r')|$.

Proposition 2 is a direct consequence of restriction (5). Suppose r has k children and r' has k' children such that k > k'. Suppose, without loss of generality, that h maps each of the k' first (for any ordering) children of r into different subtrees rooted at the k' children of r'. Let $c_{k'+1}$ denote the next non-mapped node. Any mapping of $c_{k'+1}$ will violate condition (5). Indeed, suppose w.l.o.g. that we map it into the same subtree, of root $c'_1$, that the first child $c_1$ has been mapped to. Then the first common ancestor of $c_1$ and $c_{k'+1}$ in p is r, while the first common ancestor of $h(c_1)$ and $h(c_{k'+1})$ in p' is $c'_1$ and not $h(r) = r'$.

Given a set of example labeled trees, we want to build a pattern which will effectively extract the labeled information. Any tree can be seen as a tree pattern which matches itself. Therefore we only need to consider generalizing tree patterns. In this context we wish to build a generalized pattern which matches at least the same set of trees as the initial patterns. We consider the problem of generalizing two tree patterns together. In order to extract only very similar data, we want to keep the patterns as specific as possible. Therefore, given two tree patterns, we wish to build a new tree pattern which is more general than the initial patterns but is kept as specific as possible.

Definition 11 (Least general generalization) The least general generalization (lgg) of two tree patterns $p_1$ and $p_2$ is a pattern p such that:
- $p \succeq p_1$ and $p \succeq p_2$
- no other pattern p', such that $p \succeq p'$, also respects the previous condition

It is well known in the inductive logic programming (ILP) community that the lgg of Horn clauses under θ-subsumption (1) is unique (up to logical equivalence) [19]. However, the size of the generalization is quadratic in the size of the initial clauses and the obtained clause might not be in reduced form. Reducing a clause requires checking θ-subsumption, which is known to be NP-complete. On tree patterns, efficient reduction algorithms have been proposed [20]. However, they do not allow label abstraction. When the homomorphism is required to be injective, the lgg is not unique but is always in reduced form (i.e. there are no redundant nodes). Given these results, there seems to be no best choice. Here we chose to learn injective patterns using a weight-based selection.

(1) A clause C subsumes a clause C' if there exists a substitution (homomorphism) θ such that $C\theta \subseteq C'$.

[Figure 7: Two example trees and their generalization.
(a) t1: div 1 (i 2 (a 3 [X]), sp 4 (a 5 [Y], em 6))
(b) t2: div 1 (b 2 (a 3 [X]), bq 4 (sp 5 (a 6 [Y], em 7)))
(c) generalization of t1 and t2: div (1,1) with a child (2,2) having a child a (3,3) [X], and a descendant sp (4,5) having children a (5,6) [Y] and em (6,7)]

5 Maximal weight generalization

In this section, we introduce the notion of weighted pattern. It partly handles the non-uniqueness of the lgg in the case of injective embeddings. It also introduces some control over which type of information should be preferred when characterizing the data to be extracted.

A tree pattern can be transformed into a set of relational constraints which we will note rel(p) in the following. Each node $n_i$ is transformed into a variable $N_i$, each c-edge $(n_i, n_j)$ is transformed into a constraint $child(N_i, N_j)$, each d-edge $(n_i, n_j)$ is transformed into a constraint $descendant(N_i, N_j)$, and for each $a \in \Sigma$, a constraint $label_a(N_i)$ is added for every node $n_i$ such that LABEL($n_i$) = a. The variables of a pattern can also be considered as extraction constraints. Therefore, for each node $n_i$ where VAR($n_i$) is defined and equal to X we add a constraint $extract_X(N_i)$. Other types of constraints (e.g. leaf, next-sibling, etc.) could have been considered. In the following we denote by $\mathcal{R}$ the set of relational symbols over which a pattern is described. In this paper we consider n-ary patterns over the set:

$CDLX = \{child/2, descendant/2\} \cup \{label_a/1,\ a \in \Sigma\} \cup \{extract_{X_i},\ i \in [1, n]\}$

Definition 12 (Weighting function) A weighting is a function $W : \mathcal{R} \to \mathbb{R}$ which associates a real number to each symbol in $\mathcal{R}$.

Given such a weighting function it is now possible to associate a weight to a pattern.
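The transformation into relational constraints can be sketched as follows. This is an illustrative Python encoding, not the paper's implementation: the dictionary layout, the function name `rel` and the constraint naming scheme (`label_div`, `extract_X`, `N11`, ...) are assumptions, and the pattern is the generalization of figure 7(c) as read from the figure, with node (i, j) named "ij".

```python
# Generalization of figure 7(c): node -> (label, variable, [(child, edge kind)]).
# "*" is the wildcard label: no label constraint is generated for it.
GENERALIZATION = {
    "11": ("div", None, [("22", "child"), ("45", "descendant")]),
    "22": ("*", None, [("33", "child")]),
    "33": ("a", "X", []),
    "45": ("sp", None, [("56", "child"), ("67", "child")]),
    "56": ("a", "Y", []),
    "67": ("em", None, []),
}


def rel(pattern):
    """Relational view of a pattern: a set of (symbol, argument tuple) constraints."""
    constraints = set()
    for n, (label, var, edges) in pattern.items():
        if label != "*":
            constraints.add((f"label_{label}", (f"N{n}",)))
        if var is not None:
            constraints.add((f"extract_{var}", (f"N{n}",)))
        for child, kind in edges:
            # kind is either "child" (c-edge) or "descendant" (d-edge)
            constraints.add((kind, (f"N{n}", f"N{child}")))
    return constraints


for symbol, args in sorted(rel(GENERALIZATION)):
    print(f"{symbol}({', '.join(args)})")
```

A set is used because rel(p) is described as a set of constraints; two distinct nodes with the same label still yield distinct constraints since their node variables differ.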

[Figure 8: Generalization matrix for t1 and t2, indexed by the nodes of t2 (div, b, a, bq, sp, a, em) and the nodes of t1 (div, i, a, sp, a, em)]

Definition 13 (Pattern weight) Let p be a pattern and W a weighting. The weight of p is the sum of the weights of the relations appearing in p:

$w(p) = \sum_{r(\vec{X}) \in rel(p)} W(r)$

It is possible to decompose the weight of a pattern into the sum of a local weight to which is added the sum of the weights of its subtrees.

Definition 14 (Node weight) The weight of a node n is the weight of the subtree rooted at this node. By abuse of notation, we will also denote by w(n) the weight of a node n.

Proposition 3 When $\mathcal{R} = CDLX$ the following hold:
- The weight of a node n is

  $w(n) = w_{local}(n) + w_{extract}(n) + w(child)\,|CEDGES(n)| + w(descendant)\,|DEDGES(n)|$

  where

  $w_{local}(n) = w(label_a)$ if LABEL(n) = a, and 0 otherwise
  $w_{extract}(n) = w(extract_X)$ if VAR(n) = X, and 0 otherwise

- The weight of a pattern p is the weight of its root.

Definition 15 (Maximal weight generalization) A generalization g of two patterns p and p' is of maximal weight if there exists no other generalization g' of the same patterns with a higher weight.
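As a worked check of Definitions 13 and 14, the sketch below computes subtree weights recursively, using the weights the paper gives in section 6 (child 2, descendant 1, label 2, extract 10). The nested-tuple encoding and the function name `node_weight` are assumptions made for this example, and the patterns are the generalization of figure 7(c) and the alternative candidate (4, 4) as described in the text.

```python
# Weights stated in section 6 of the paper.
WEIGHTS = {"child": 2, "descendant": 1, "label": 2, "extract": 10}


def node_weight(node):
    """Weight of the subtree rooted at node = (label, variable, [(edge kind, child), ...])."""
    label, var, edges = node
    w = WEIGHTS["label"] if label != "*" else 0   # a wildcard carries no label constraint
    w += WEIGHTS["extract"] if var is not None else 0
    for kind, child in edges:
        w += WEIGHTS[kind] + node_weight(child)   # edge weight plus the child's subtree
    return w


a_y = ("a", "Y", [])
em = ("em", None, [])

# Candidate (4, 5): keeps the sp(a, em) subtree of both trees.
cand_45 = ("sp", None, [("child", a_y), ("child", em)])
# Candidate (4, 4): sp and bq differ, so the label is lost and both edges become d-edges.
cand_44 = ("*", None, [("descendant", a_y), ("descendant", em)])

# Generalization of figure 7(c).
g = ("div", None, [
    ("child", ("*", None, [("child", ("a", "X", []))])),
    ("descendant", cand_45),
])

print(node_weight(cand_45))                          # internal weight of (4, 5)
print(WEIGHTS["descendant"] + node_weight(cand_45))  # (4, 5) attached by a d-edge
print(WEIGHTS["child"] + node_weight(cand_44))       # (4, 4) attached by a c-edge
print(node_weight(g))
```

The first three values correspond to the figures 20, 21 and 18 quoted in the discussion of figure 7; the weight of the whole generalization is not stated in the text and is only what this encoding yields.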

Proposition 4 For any patterns g, p, p' such that there exist two homomorphisms h and h' from NODES(g) to respectively NODES(p) and NODES(p'), there exists a unique subset S of NODES(p) × NODES(p') such that there is a one-to-one mapping sel from S to NODES(g) with $\forall (x, y) \in S$, $h(sel(x, y)) = x \wedge h'(sel(x, y)) = y$.

Proposition 4 says that any generalization g of two patterns p and p' can be seen as a selection of couples from the Cartesian product of the nodes of p and p'. Therefore, finding a maximal weight generalization of p and p' consists in finding the valid subset of node couples which has maximal weight.

Definition 16 (Underlying pattern) Given two patterns p and p' and a selection $S \subseteq$ NODES(p) × NODES(p'), the underlying pattern g is a tree pattern such that:
- sel is the unique one-to-one mapping from S to NODES(g)
- for all $(x, y) \in S$ we have LABEL(sel(x, y)) = a if LABEL(x) = LABEL(y) = a, and $*$ otherwise
- for all $(x, y) \in S$ where VAR(x) and VAR(y) are defined and equal we have VAR(sel(x, y)) = VAR(x)
- the parent of a node n = sel(x, y) in g is the node n' = sel(x', y') such that (1) $x' \prec x$, (2) $y' \prec y$ and (3) there exists no node n'' = sel(x'', y'') such that $x' \prec x'' \prec x$ and $y' \prec y'' \prec y$
- for each node n = sel(x, y) and its parent n' = sel(x', y'), (n', n) is a c-edge iff (x', x) is a c-edge in p and (y', y) is a c-edge in p'; otherwise (n', n) is a d-edge

Definition 16 describes how to build a pattern from a node selection. Building a pattern in this way keeps it both specific and tree shaped. Indeed, nodes are not linked as descendants when they are already linked as children.

Definition 17 (Injective selection) Given two patterns p and p', a selection of nodes $S \subseteq$ NODES(p) × NODES(p') is injective iff for all $(x, y) \in S$, $((x, y') \in S \Rightarrow y = y')$ and $((x', y) \in S \Rightarrow x = x')$.

Definition 18 (Order conservative selection) Given two patterns p and p', a selection of nodes $S \subseteq$ NODES(p) × NODES(p') is order conservative iff for all $(x, y), (x', y') \in S$, $x < x' \Leftrightarrow y < y'$.
Consider figure 7, where $t_1$ and $t_2$ are two example fragments of HTML trees (bq and sp are short for respectively blockquote and span) and $p$ is their generalization. In each of the two trees we want to extract the nodes labeled X and Y. Both trees $t_1$ and $t_2$ have been numbered. In the case of the pattern, each couple of numbers $(i, j)$ corresponds to the nodes $i$ and $j$, respectively in $t_1$ and $t_2$, to which the pattern node maps.

[Figure 9: Learned pattern for People/Location]

Now consider we have not yet built the pattern. We wish to find the selection $S$ of nodes which allows us to build the pattern with maximal weight. First, we know that the root of the pattern will be required to map to the roots of both trees. Therefore $(1, 1)$ belongs to $S$.

Proposition 5 Given two patterns $p$ and $p'$ and a maximal weight generalization $g$ (with mappings $h$ and $h'$), for any node $n$ in $g$ we have: $h(\mathrm{PARENT}(n)) = \mathrm{PARENT}(h(n))$ or $h'(\mathrm{PARENT}(n)) = \mathrm{PARENT}(h'(n))$.

The intuition behind the proof of proposition 5 is the following. If, for a given node $n = sel(x, y)$ and its parent $n' = sel(x', y')$, neither $x' = \mathrm{PARENT}(x)$ nor $y' = \mathrm{PARENT}(y)$, then there exists a node that we have missed (and whose addition would add to the weight of the pattern). Indeed, there exists a couple $(x'', y'')$, with $x''$ strictly between $x'$ and $x$ and $y''$ strictly between $y'$ and $y$, whose addition would have replaced $n$ as a child of $n'$ with at least the weight of $n$, since $n$ is one of its candidate children.

Now let us consider the possible children of node $sel(1, 1)$ in the pattern of figure 7. In order for the pattern to respect proposition 2, node $(1, 1)$ can only have two children. According to proposition 5, for any $(i, j) \in S$ we only need to consider as candidate children the couples $(k, l)$ where $k$ is a child or descendant of $i$ and $l$ is a child of $j$, or where $k$ is a child of $i$ and $l$ is a descendant of $j$. It is not necessary to consider combining descendants of $i$ and $j$ together, since we are assured the generated pattern would have a smaller weight than the combination of their parents. For example, when building $p$, the couple $(2, 2)$ will do better than the combination $(3, 3)$ since the weight of $(3, 3)$ is included in the weight of $(2, 2)$. Since we are maximizing the global weight, we will keep as many children as possible and therefore for $(1, 1)$ we will keep exactly two of the candidates (recall that we had at most two).

However, not all candidates are compatible with one another. For example, $(3, 2)$ and $(5, 2)$ cannot be taken together since they would both map to node 2 in $t_2$, which is forbidden by restriction (4). Also, $(5, 2)$ and $(6, 4)$ are not compatible since nodes 5 and 6 are both descendants of 4 in $t_1$. This would violate restriction (5), since in the pattern their first common ancestor would be $(1, 1)$, which is mapped to 1 in $t_1$, while the first common ancestor of nodes 5 and 6 is 4 in $t_1$.

To calculate the best candidate children for $(1, 1)$ we first calculate the best weight for each candidate and then select the best compatible subset of candidates. In figure 7, it can easily be seen that the best solution maps the div/i/a branch of $t_1$ to the div/b/a branch of $t_2$. We can also see that when generalizing the other side, we face two competing choices for a child of $(1, 1)$: either we map 4 and 4 together or we map 4 with 5. When scoring both candidates, $(4, 5)$ does better than $(4, 4)$ since it keeps the span(a, em) subtree of both trees, at the cost of losing node 4 in $t_2$ and replacing two child edges by a descendant edge. The best weight obtained by choosing $(4, 5)$ as child of $(1, 1)$ is therefore 21 (3 labels, 2 child edges, 1 descendant edge, 1 variable).
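The weights quoted in this example can be checked with a few lines of arithmetic. The sketch below uses the scoring function of the paper (w(child) = 2, w(descendant) = 1, w(label) = 2, w(extract) = 10); the helper `pattern_weight` and its parameter names are hypothetical and only count the conserved elements listed in the text.

```python
# Weights of the scoring function used in the paper.
WEIGHTS = {"child": 2, "descendant": 1, "label": 2, "extract": 10}


def pattern_weight(labels, child_edges, descendant_edges, variables):
    """Weighted sum of the labels, edges and variables conserved by a pattern."""
    return (labels * WEIGHTS["label"]
            + child_edges * WEIGHTS["child"]
            + descendant_edges * WEIGHTS["descendant"]
            + variables * WEIGHTS["extract"])


# Candidate (4, 5): 3 labels, 2 child edges, 1 descendant edge, 1 variable.
best_45 = pattern_weight(labels=3, child_edges=2, descendant_edges=1, variables=1)
# Same subtree without the incoming descendant edge from (1, 1).
internal_45 = pattern_weight(labels=3, child_edges=2, descendant_edges=0, variables=1)
# Candidate (4, 4): 2 labels, 1 child edge, 2 descendant edges, 1 variable.
best_44 = pattern_weight(labels=2, child_edges=1, descendant_edges=2, variables=1)

print(best_45, internal_45, best_44)  # 21 20 18
```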
Note that the internal weight of the subtree rooted at $(4, 5)$ is however 20, since we do not consider the descendant edge from $(1, 1)$ to $(4, 5)$. The best result obtainable with combination $(4, 4)$ is to lose node 5 in $t_2$, thereby replacing two child edges by two descendant edges. We also lose a label, since node 4 in $t_1$ and node 4 in $t_2$ have different labels. The weight for combination $(4, 4)$ is 18 (1 child edge, 2 labels, 2 descendant edges, 1 variable). Therefore, the best children for $(1, 1)$ are $(2, 2)$ and $(4, 5)$.

6 Pattern learning algorithm

We now give our generic algorithm, which recursively calculates the maximal weight pattern given both a weighting function and a local compatibility test corresponding to the target type of homomorphism (unordered injective, ordered injective). In the following (experiments included), we will consider the scoring function with the following weights: w(child) = 2, w(descendant) = 1, w(label) = 2 and w(extract) = 10. The algorithm calculates the selection $S$ whose underlying pattern is of maximal weight. It recursively walks down the tree, calculates the score for the leaves, and on the way back up selects for each node its best candidate children.

Let $p$ and $p'$ be respectively rooted at nodes $r$ and $r'$. Let $N$ and $N'$ denote the number of nodes in respectively $p$ and $p'$. The pattern $g$ we are seeking to build is such that there exist two homomorphisms $h$ and $h'$ from the nodes of $g$ to the nodes of respectively $p$ and $p'$ and such that its weight $w(g)$ is maximal. Each node in $g$ is required to match exactly one node in each of $p$ and $p'$. Therefore there are $N \times N'$ candidate nodes for the generalization $g$ we are building. Let $n_1, \ldots, n_N$ be the nodes of $p$ and $n'_1, \ldots, n'_{N'}$ be the nodes of $p'$. We will suppose the indices are chosen to reflect the ordering obtained by walking through the tree depth first (i.e. $<$). For short, let us note $n_{i,j} = sel(n_i, n'_j)$ (i.e. the candidate node which would be mapped to $n_i$ in $p$ and $n'_j$ in $p'$). With restriction (1) (root conservation) we already know that $n_{1,1}$ will be a node of $g$. We now need to determine which other candidate nodes will be kept in the pattern.

Algorithm 1: Calculate the subpattern and its weight for a given candidate node

    Input: two patterns $p_1$ and $p_2$ and a candidate node $n_{i,j}$
    Output: the best subpattern rooted at $n_{i,j}$ and its weight
    Let $I_1$ be the indexes of the children of $n_{1i}$ in $p_1$
    Let $I_2$ be the indexes of the children of $n_{2j}$ in $p_2$
    Let $C \leftarrow \emptyset$ be the set of candidate children
    for all $(k, l) \in I_1 \times I_2$ do
        $C \leftarrow C \cup$ best_child($p_1$, $p_2$, $n_{k,l}$)
    end for
    Sort $C$ by decreasing weight (best weight first)
    $C \leftarrow$ optimize($C$)
    if $\mathrm{LABEL}(n_{1i}) = \mathrm{LABEL}(n_{2j})$ then
        lbl $\leftarrow \mathrm{LABEL}(n_{1i})$
        scr $\leftarrow w_l$
    else
        lbl $\leftarrow$ wildcard label
        scr $\leftarrow 0$
    end if
    scr $\leftarrow$ scr $+ \sum_{n_{k,l} \in C}$ weight($n_{k,l}$)
    return tree of weight scr with a root labeled lbl having $C$ as children

As noted previously, the weight of a pattern only depends on its subpatterns. Therefore, the impact of the choice of a node on the global weight of the pattern only depends on the choices for its descendants. Supposing a candidate node $n_{i,j}$ has been kept in $g$, finding the optimal subpattern rooted in $n_{i,j}$ consists in considering its candidate children, calculating the optimal subpatterns rooted at each candidate child, and choosing the best subset of children which respects the matching restriction. The set of candidate children for a node $n_{i,j}$ are the nodes $n_{k,l}$ such that $n_k$ and $n'_l$ are proper descendants of respectively $n_i$ and $n'_j$.
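With depth-first indices, the proper descendants of a node occupy a contiguous index range (a standard property of preorder numbering), so the candidate children of a couple can be enumerated with two ranges. The following sketch is an illustration under assumptions: trees are nested `(label, children)` tuples, the example tree is made up, and the helper names `number_depth_first` and `candidate_children` are hypothetical.

```python
def number_depth_first(tree):
    """Number the nodes of a (label, children) tree in depth-first order.

    Returns (labels, last): labels[i] is the label of node i (1-based) and
    last[i] is the largest index found in the subtree rooted at node i.
    """
    labels, last = [None], [None]  # index 0 is unused so indices start at 1

    def visit(node):
        label, children = node
        i = len(labels)
        labels.append(label)
        last.append(i)
        for child in children:
            visit(child)
        last[i] = len(labels) - 1  # last index reached inside this subtree

    visit(tree)
    return labels, last


def candidate_children(i, j, last1, last2):
    """Couples (k, l) with k a proper descendant of i and l one of j."""
    return [(k, l)
            for k in range(i + 1, last1[i] + 1)
            for l in range(j + 1, last2[j] + 1)]


# Made-up example tree: div(div(i(a)), span(a, em)).
tree = ("div", [("div", [("i", [("a", [])])]),
                ("span", [("a", []), ("em", [])])])
labels, last = number_depth_first(tree)
print(labels[1:])                          # ['div', 'div', 'i', 'a', 'span', 'a', 'em']
print(candidate_children(2, 5, last, last))  # [(3, 6), (3, 7), (4, 6), (4, 7)]
```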
All other nodes would not respect restriction (2) (conserving the child/descendant relationships). Also, since we are maximizing $w(g)$, every child of $n_{i,j}$ will at least be mapped to a child of $n_i$ or to a child of $n'_j$. Indeed, by doing otherwise we would at least lose a child edge which could have been kept simply by choosing the combination of their parent nodes. We can deduce from proposition 2 that any node $n_{i,j}$ in $g$ has at most $\min(|\mathrm{CHILDREN}(n_i)|, |\mathrm{CHILDREN}(n'_j)|)$ children. We can also deduce from proposition 2 that each of the children of $n_{i,j}$ will be mapped to nodes in distinct subtrees of both $n_i$ and $n'_j$. Two child nodes verifying this property are said to be compatible. Therefore, for every node $n_k$ child of $n_i$ and every node $n'_l$ child of $n'_j$, there will be only one best candidate from the set of nodes which map to nodes of the subtrees rooted at $n_k$ and $n'_l$. This implies that there will be a maximum of $|\mathrm{CHILDREN}(n_i)| \times |\mathrm{CHILDREN}(n'_j)|$ candidate children for $n_{i,j}$. Also, the unique candidate for a given combination $(k, l)$ is a combination of $n_k$ or any of its descendants with $n'_l$ or any of its descendants.

Algorithm 2: Calculate the best child for a node given a combination

    Input: two patterns $p_1$ and $p_2$, a parent node $n_{i,j}$ and a node $n_{k,l}$
    Output: the best child $n_{k',l'}$ for $n_{i,j}$ this combination can generate
    Let $s \leftarrow$ weight($p_1$, $p_2$, $n_{k,l}$)
    if $(i, k)$ and $(j, l)$ are both child edges in their respective patterns then
        $s \leftarrow s + w_c$
    else
        $s \leftarrow s + w_d$
    end if
    Let $I_1$ be the indexes of the descendants of $n_{1k}$ in $p_1$
    Let $I_2$ be the indexes of the descendants of $n_{2l}$ in $p_2$
    Let $(k', l') \leftarrow (k, l)$
    Let bs $\leftarrow s$
    for all $k'' \in I_1$ do
        if weight($k''$, $l$) $+ w_d >$ bs then
            Let $(k', l') \leftarrow (k'', l)$
            Let bs $\leftarrow$ weight($k''$, $l$) $+ w_d$
        end if
    end for
    for all $l'' \in I_2$ do
        if weight($k$, $l''$) $+ w_d >$ bs then
            Let $(k', l') \leftarrow (k, l'')$
            Let bs $\leftarrow$ weight($k$, $l''$) $+ w_d$
        end if
    end for
    return $n_{k',l'}$

Calculating the best set of children for node $n_{i,j}$ can be done in two steps: (1) for each child combination $(k, l)$, calculate the best child for $n_{i,j}$ for this combination and (2) select the best set of compatible children from the set of candidates. The compatibility test can simply be done by checking that both components of the combinations from which the candidates come are different, which means that they match different subtrees in both patterns.

Algorithm 1 gives the procedure which calculates the best subpattern for a given node $n_{i,j}$ and its weight. It first calculates the set $C$ of best candidate children, one for each combination, by calling the function best_child given in algorithm 2. It then calls optimize with this set to calculate the subset of compatible children which gives the best weight. Let $k = \min(|\mathrm{CHILDREN}(n_i)|, |\mathrm{CHILDREN}(n'_j)|)$. optimize($C$) chooses the best subset of $C$ of size $k$ whose elements are all compatible. Compatibility depends on the type of selection we are considering: either injective (definition 17) or order conservative (definition 18). In our implementation, this compatibility function is given as a parameter.

Algorithm 2 calculates the best child and its weight given a parent node $n_{i,j}$ and a combination $n_{k,l}$. The first candidate is the combination node itself.
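The recursion of Algorithms 1 and 2 can be condensed into a compact, runnable sketch. It is not the paper's OCaml implementation and rests on several assumptions: both inputs are plain trees (every edge is a child edge), only the unordered injective case is handled, `optimize` is a brute-force search over assignments rather than a sorted and pruned one, the variable weight is added when both nodes carry the same variable, and all names (`Node`, `weight`, `best_child`, `optimize`) are hypothetical.

```python
from itertools import permutations

W_CHILD, W_DESC, W_LABEL, W_VAR = 2, 1, 2, 10  # weights used in the paper


class Node:
    def __init__(self, label, children=(), var=None):
        self.label, self.children, self.var = label, list(children), var


def descendants(node):
    """Proper descendants of node, in depth-first order."""
    out = []
    for child in node.children:
        out.append(child)
        out.extend(descendants(child))
    return out


def weight(a, b, memo):
    """Best weight of a subpattern rooted at the candidate node (a, b)."""
    key = (id(a), id(b))
    if key not in memo:  # each candidate node is scored only once
        w = W_LABEL if a.label == b.label else 0
        if a.var is not None and a.var == b.var:
            w += W_VAR
        # One best candidate per combination of child subtrees.
        table = [[best_child(ca, cb, memo) for cb in b.children]
                 for ca in a.children]
        memo[key] = w + optimize(table)
    return memo[key]


def best_child(ca, cb, memo):
    """Best weight (incoming edge included) for the combination (ca, cb)."""
    best = weight(ca, cb, memo) + W_CHILD          # the combination itself
    for d in descendants(ca):                      # skip nodes in the first tree
        best = max(best, weight(d, cb, memo) + W_DESC)
    for d in descendants(cb):                      # skip nodes in the second tree
        best = max(best, weight(ca, d, memo) + W_DESC)
    return best


def optimize(table):
    """Best total weight using each child subtree of either side at most once."""
    rows = len(table)
    cols = len(table[0]) if rows else 0
    if rows == 0 or cols == 0:
        return 0
    if rows <= cols:
        return max(sum(table[r][c] for r, c in enumerate(p))
                   for p in permutations(range(cols), rows))
    return max(sum(table[r][c] for c, r in enumerate(p))
               for p in permutations(range(rows), cols))


# Made-up example: div(a[X]) against div(b(a[X])) generalizes to div//a[X].
t1 = Node("div", [Node("a", var="X")])
t2 = Node("div", [Node("b", [Node("a", var="X")])])
print(weight(t1, t2, {}))  # 15 = 2 labels (4) + 1 variable (10) + 1 descendant edge (1)
```

The brute-force `optimize` is exponential in the number of children and is only meant to keep the sketch short; any exact assignment procedure could replace it.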
Algorithm 2 then tries to do better with nodes which would map to node $n_{2l}$ in $p_2$ and to a proper descendant of $n_{1k}$ in $p_1$, and then with nodes which would map to node $n_{1k}$ in $p_1$ and to a proper descendant of $n_{2l}$ in $p_2$.

Since the weight of each candidate node only depends on the choice of its best children, we are only required to calculate it once. In our implementation, we use a matrix where each cell corresponds to a candidate node and stores the weight for this cell. Each line of the matrix corresponds to a node of the first pattern and each column corresponds to a node of the second pattern. By numbering the nodes in depth first order, we have the nice property that for each node with index $i$ there exists an index $j$ such that all nodes with index $k$, $i < k \leq j$, are descendants of the node indexed $i$.

Figure 8 gives the generalization matrix for trees $t_1$ and $t_2$ of figure 7. The scoring function used is defined with $w_c = 2$, $w_d = 1$, $w_l = 2$ and $w_v = 10$. The nodes and labels of $t_1$ appear in the first two columns of the figure and those of $t_2$ in the first two lines. The cells of the matrix contain the maximal weights obtained by associating the node of $t_1$ in the same row with the node of $t_2$ in the same column. The arrows point to the outgoing edges of the pattern giving this best weight. For example, associating node 1 of $t_1$ with node 4 of $t_2$ gives a maximal weight of 4. The weight of 4 is obtained because the subpattern rooted at $(1, 4)$ contains two child edges ($2 \times 2 = 4$), has no conserved labels, and has no descendant edges. In the example, the thicker arrows (red if printed in color) show the output pattern obtained by the generalization algorithm. It is obtained as described in definition 16. This pattern is indeed the one which was anticipated in figure 7. The weight of 41 of the root node $(1, 1)$ of the pattern corresponds to the sum of the outgoing edges ( = 26) to which is added the label weight of 2 (node 1 in $t_1$ and node 1 in $t_2$ both have the same label div), plus a descendant weight of 1 for the edge to $(4, 5)$ plus a child weight of 2 for the edge to $(2, 2)$.

The actual implementation of our algorithms makes use of optimizations. During the construction of the set of best children for a node, it is possible to quickly calculate an upper bound of the weight the remaining child set may lead to. If such an upper bound is lower than the best current solution, a complete evaluation is no longer required. These optimizations allow for a noticeable speed-up of the algorithm.

7 Evaluation

We have implemented our algorithm in OCaml and evaluated it on different datasets. We carried out the evaluation considering both unordered injective embeddings and ordered injective embeddings. In each evaluation, we did a 5-fold cross validation taking one set of documents as example set and the remaining ones for testing².

² k-fold cross validation usually consists in leaving one set out for testing and learning on the rest. In information extraction, few examples are often sufficient and therefore the left-out set is used for learning and the rest for testing.

Source        Relation
bigbook       (name, address)
okra          (name, mail, score)
s20           (file, score, size, type)
LX-0          a 13-ary relation
pagesjaunes   (name, address, city)
amazon        (title, price)

Table 1: Target relations to be extracted in each source

Table 2: Experimental results with unordered embedding. Columns: Source, Good, Wrong, Missed, Rec., Prec. Rows: bigbook, okra, s20, L0-0, L3-0, L8-0, L9-0, pagesjaunes.fr, amazon.com. [The numeric entries are missing from this transcription.]
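The evaluation protocol and the columns of the result tables can be illustrated with a short sketch. Two points are assumptions rather than statements from the paper: the folds are built round-robin, and recall and precision are computed from the Good, Wrong and Missed counts using the usual definitions (Good / (Good + Missed) and Good / (Good + Wrong)). The function names are hypothetical and the counts in the usage lines are made up.

```python
def inverted_folds(documents, k=5):
    """Yield (train, test) pairs where ONE fold is used for learning and the
    remaining k - 1 folds are used for testing."""
    folds = [documents[i::k] for i in range(k)]  # round-robin split (assumption)
    for i, train in enumerate(folds):
        test = [doc for j, fold in enumerate(folds) if j != i for doc in fold]
        yield train, test


def recall_precision(good, wrong, missed):
    """Recall and precision from counts of good, wrong and missed extractions."""
    expected = good + missed    # tuples that should have been extracted
    extracted = good + wrong    # tuples actually returned by the pattern
    recall = good / expected if expected else 0.0
    precision = good / extracted if extracted else 0.0
    return recall, precision


splits = list(inverted_folds(list(range(10)), k=5))
print(len(splits), splits[0])        # 5 splits; the first learns on [0, 5]
print(recall_precision(8, 2, 0))     # made-up counts: perfect recall, lower precision
```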

[Figure 10: Unordered injective pattern learned using GTree for Amazon]

Table 3: Experimental results with ordered embedding. Columns: Source, Good, Wrong, Missed, Rec., Prec. Rows: bigbook, okra, s20, L0-0, L3-0, L8-0, L9-0, pagesjaunes.fr, amazon.com. [The numeric entries are missing from this transcription.]

System     ML     Doc. Rep.              Pat. Rep.              Ext. type
Stalker    yes    string                 string automata        composed unary
IERel      yes    string                 string-based pattern   n-ary
Lixto      some   ELog                   ELog                   composed unary
Squirrel   yes    Tree                   NSTT                   unary
PAF        yes    Tree-based att/value   Decision Tree          seed-based n-ary
GTree      yes    Tree                   Tree Pattern           n-ary

Table 4: Qualitative comparison of different extraction systems

The first two datasets, Bigbook and Okra, come from the RISE information extraction repository. These two datasets are the most referenced sets in the information extraction community. They are, however, known to be easy. The L0-0, L3-0, L8-0 and L9-0 datasets are artificial datasets made available by Marty. They contain the same data with different representations. We only present results for the datasets for which tree patterns over the relations considered in this paper have sufficient expressivity (we plan in future work to add other relations). Finally, we evaluated our approach on two real world datasets: Amazon DVD listings and Pagesjaunes address entries. The unordered pattern obtained for Amazon is given in figure 10.

Table 1 gives the target relation to extract in each dataset. Table 2 presents the results obtained when learning patterns in an unordered injective setting and table 3 presents the results obtained in an ordered injective setting. In all cases we have a very high if not perfect recall. The results for Amazon and Pagesjaunes are particularly encouraging, since these datasets come from existing web sites. However, some cases show a bad precision, which means that some extractions were incorrect. Such results appear in data which have a linear format (i.e. all the tuples follow each other under the same node). In such cases, patterns defined over child and descendant relations alone are not sufficient.
This shows that working only with child and descendant relationships is not sufficient in some cases and that handling negative examples may be required. The algorithms can be adapted to include learning relations such as next-sibling. We are currently integrating these results in our implementation.

The tables show that results in the unordered and ordered cases are very similar. These results seem to suggest that simply respecting ordering does not provide much information on these sources. It should be noted that the ordering considered is similar to linking siblings together with a following-sibling relation. We believe that this ordering may not be strong enough to be informative and that a next-sibling relation might be an interesting replacement.

Comparing our work to other work done in information extraction is difficult for several reasons. The currently referenced datasets (RISE) are either too simple for web information extraction or require natural language information extraction techniques. Therefore there is no real basis for comparison. The availability of the soccer dataset is a first effort in the direction of comparing systems. We will also contribute to this effort by making our datasets (Amazon, Pagesjaunes and others we are currently tagging to automate testing) available for comparison.

Table 4 gives a qualitative comparison between string-based systems known to be n-ary, systems using tree-based representations, and our system GTree. In order, the columns give the system name, whether machine learning is used or not, the document representation, the pattern representation and the type of extraction.

8 Conclusion

In this paper we presented a novel approach to n-ary wrapper generation for information extraction from the web. We proposed to use tree patterns as the extraction language, and gave an algorithm capable of generating such a pattern given a set of example labeled documents. We have presented both the theoretical and practical aspects of the method. This method has been implemented and tested on different data sets. The evaluation shows that the approach is useful in cases where string-based approaches have shown their limits.
Learning tree patterns has many advantages over existing methods: it explicitly takes into account the tree structure of Web documents, it can build patterns that skip nodes, the extracted instances can have multiple attributes, it is not sensitive to node orderings and the generated patterns are easily understandable by a human expert.

In future work we plan to extend the algorithm to include other types of relational constraints. Shortly, we will be adding next-sibling and following-sibling types of constraints. This should enable the method to handle linear formats (at least in some cases). Also, here we have only considered positive examples. We plan to introduce negative examples, which would allow us to limit overgeneralizations.


More information

Optimization I : Brute force and Greedy strategy

Optimization I : Brute force and Greedy strategy Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean

More information

Representing and Querying XML with Incomplete Information

Representing and Querying XML with Incomplete Information Representing and Querying XML with Incomplete Information Serge Abiteboul INRIA Joint work with Victor Vianu, UCSD and Luc Segoufin, INRIA Organization Incomplete databases XML Motivations Setting: documents,

More information

B-Trees and External Memory

B-Trees and External Memory Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 and External Memory 1 1 (2, 4) Trees: Generalization of BSTs Each internal node

More information

Introduction to Optimization

Introduction to Optimization Introduction to Optimization Greedy Algorithms October 28, 2016 École Centrale Paris, Châtenay-Malabry, France Dimo Brockhoff Inria Saclay Ile-de-France 2 Course Overview Date Fri, 7.10.2016 Fri, 28.10.2016

More information

B-Trees and External Memory

B-Trees and External Memory Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 B-Trees and External Memory 1 (2, 4) Trees: Generalization of BSTs Each internal

More information

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches E. Miller R. Libeskind-Hadas D. Barnard W. Chang K. Dresner W. M. Turner

More information

CSE 214 Computer Science II Introduction to Tree

CSE 214 Computer Science II Introduction to Tree CSE 214 Computer Science II Introduction to Tree Fall 2017 Stony Brook University Instructor: Shebuti Rayana shebuti.rayana@stonybrook.edu http://www3.cs.stonybrook.edu/~cse214/sec02/ Tree Tree is a non-linear

More information

Module 11. Directed Graphs. Contents

Module 11. Directed Graphs. Contents Module 11 Directed Graphs Contents 11.1 Basic concepts......................... 256 Underlying graph of a digraph................ 257 Out-degrees and in-degrees.................. 258 Isomorphism..........................

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht

More information

In this lecture, we ll look at applications of duality to three problems:

In this lecture, we ll look at applications of duality to three problems: Lecture 7 Duality Applications (Part II) In this lecture, we ll look at applications of duality to three problems: 1. Finding maximum spanning trees (MST). We know that Kruskal s algorithm finds this,

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Abstract We present two parameterized algorithms for the Minimum Fill-In problem, also known as Chordal

More information

Mining Generalised Emerging Patterns

Mining Generalised Emerging Patterns Mining Generalised Emerging Patterns Xiaoyuan Qian, James Bailey, Christopher Leckie Department of Computer Science and Software Engineering University of Melbourne, Australia {jbailey, caleckie}@csse.unimelb.edu.au

More information

The Resolution Algorithm

The Resolution Algorithm The Resolution Algorithm Introduction In this lecture we introduce the Resolution algorithm for solving instances of the NP-complete CNF- SAT decision problem. Although the algorithm does not run in polynomial

More information

Efficient subset and superset queries

Efficient subset and superset queries Efficient subset and superset queries Iztok SAVNIK Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, 5000 Koper, Slovenia Abstract. The paper

More information

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19 CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Trees. T.U. Cluj-Napoca -DSA Lecture 2 - M. Joldos 1

Trees. T.U. Cluj-Napoca -DSA Lecture 2 - M. Joldos 1 Trees Terminology. Rooted Trees. Traversals. Labeled Trees and Expression Trees. Tree ADT. Tree Implementations. Binary Search Trees. Optimal Search Trees T.U. Cluj-Napoca -DSA Lecture 2 - M. Joldos 1

More information

Paths, Flowers and Vertex Cover

Paths, Flowers and Vertex Cover Paths, Flowers and Vertex Cover Venkatesh Raman, M.S. Ramanujan, and Saket Saurabh Presenting: Hen Sender 1 Introduction 2 Abstract. It is well known that in a bipartite (and more generally in a Konig)

More information

Lecture 3 February 9, 2010

Lecture 3 February 9, 2010 6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees

More information

Algorithmic Aspects of Communication Networks

Algorithmic Aspects of Communication Networks Algorithmic Aspects of Communication Networks Chapter 5 Network Resilience Algorithmic Aspects of ComNets (WS 16/17): 05 Network Resilience 1 Introduction and Motivation Network resilience denotes the

More information

How to use the Dealer Car Search ebay posting tool. Overview. Creating your settings

How to use the Dealer Car Search ebay posting tool. Overview. Creating your settings How to use the Dealer Car Search ebay posting tool Overview The Dealer Car Search ebay posting tool is designed to allow you to easily create an auction for a vehicle that has been loaded into Dealer Car

More information

Reasoning with Patterns to Effectively Answer XML Keyword Queries

Reasoning with Patterns to Effectively Answer XML Keyword Queries Noname manuscript No. (will be inserted by the editor) Reasoning with Patterns to Effectively Answer XML Keyword Queries Cem Aksoy Aggeliki Dimitriou Dimitri Theodoratos the date of receipt and acceptance

More information

Lecture 11: Maximum flow and minimum cut

Lecture 11: Maximum flow and minimum cut Optimisation Part IB - Easter 2018 Lecture 11: Maximum flow and minimum cut Lecturer: Quentin Berthet 4.4. The maximum flow problem. We consider in this lecture a particular kind of flow problem, with

More information

15.4 Longest common subsequence

15.4 Longest common subsequence 15.4 Longest common subsequence Biological applications often need to compare the DNA of two (or more) different organisms A strand of DNA consists of a string of molecules called bases, where the possible

More information

6.001 Notes: Section 31.1

6.001 Notes: Section 31.1 6.001 Notes: Section 31.1 Slide 31.1.1 In previous lectures we have seen a number of important themes, which relate to designing code for complex systems. One was the idea of proof by induction, meaning

More information

Indexing Keys in Hierarchical Data

Indexing Keys in Hierarchical Data University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2001 Indexing Keys in Hierarchical Data Yi Chen University of Pennsylvania Susan

More information

CSCI2100B Data Structures Trees

CSCI2100B Data Structures Trees CSCI2100B Data Structures Trees Irwin King king@cse.cuhk.edu.hk http://www.cse.cuhk.edu.hk/~king Department of Computer Science & Engineering The Chinese University of Hong Kong Introduction General Tree

More information

Slides for Faculty Oxford University Press All rights reserved.

Slides for Faculty Oxford University Press All rights reserved. Oxford University Press 2013 Slides for Faculty Assistance Preliminaries Author: Vivek Kulkarni vivek_kulkarni@yahoo.com Outline Following topics are covered in the slides: Basic concepts, namely, symbols,

More information

Midterm Examination CS540-2: Introduction to Artificial Intelligence

Midterm Examination CS540-2: Introduction to Artificial Intelligence Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search

More information

Matching Algorithms for User Notification in Digital Libraries

Matching Algorithms for User Notification in Digital Libraries Matching Algorithms for User Notification in Digital Libraries H. Belhaj Frej, P. Rigaux 2 and N. Spyratos Abstract We consider a publish/subscribe system for digital libraries which continuously evaluates

More information

Trees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington

Trees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington Trees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington 1 Trees Trees are a natural data structure for representing specific data. Family trees. Organizational

More information

Answering Aggregate Queries Over Large RDF Graphs

Answering Aggregate Queries Over Large RDF Graphs 1 Answering Aggregate Queries Over Large RDF Graphs Lei Zou, Peking University Ruizhe Huang, Peking University Lei Chen, Hong Kong University of Science and Technology M. Tamer Özsu, University of Waterloo

More information

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY General definitions; Representations; Graph Traversals; Topological sort; Graphs definitions & representations Graph theory is a fundamental tool in sparse

More information

FOUR EDGE-INDEPENDENT SPANNING TREES 1

FOUR EDGE-INDEPENDENT SPANNING TREES 1 FOUR EDGE-INDEPENDENT SPANNING TREES 1 Alexander Hoyer and Robin Thomas School of Mathematics Georgia Institute of Technology Atlanta, Georgia 30332-0160, USA ABSTRACT We prove an ear-decomposition theorem

More information

Lecture 19 Thursday, March 29. Examples of isomorphic, and non-isomorphic graphs will be given in class.

Lecture 19 Thursday, March 29. Examples of isomorphic, and non-isomorphic graphs will be given in class. CIS 160 - Spring 2018 (instructor Val Tannen) Lecture 19 Thursday, March 29 GRAPH THEORY Graph isomorphism Definition 19.1 Two graphs G 1 = (V 1, E 1 ) and G 2 = (V 2, E 2 ) are isomorphic, write G 1 G

More information

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18 CS 124 Quiz 2 Review 3/25/18 1 Format You will have 83 minutes to complete the exam. The exam may have true/false questions, multiple choice, example/counterexample problems, run-this-algorithm problems,

More information

1 The range query problem

1 The range query problem CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition

More information

Backtracking. Chapter 5

Backtracking. Chapter 5 1 Backtracking Chapter 5 2 Objectives Describe the backtrack programming technique Determine when the backtracking technique is an appropriate approach to solving a problem Define a state space tree for

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17 01.433/33 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/2/1.1 Introduction In this lecture we ll talk about a useful abstraction, priority queues, which are

More information

Minimal Dominating Sets in Graphs: Enumeration, Combinatorial Bounds and Graph Classes

Minimal Dominating Sets in Graphs: Enumeration, Combinatorial Bounds and Graph Classes Minimal Dominating Sets in Graphs: Enumeration, Combinatorial Bounds and Graph Classes J.-F. Couturier 1 P. Heggernes 2 D. Kratsch 1 P. van t Hof 2 1 LITA Université de Lorraine F-57045 Metz France 2 University

More information

DETAILS OF SST-SWAPTREE ALGORITHM

DETAILS OF SST-SWAPTREE ALGORITHM A DETAILS OF SST-SWAPTREE ALGORITHM Our tree construction algorithm begins with an arbitrary tree structure and seeks to iteratively improve it by making moves within the space of tree structures. We consider

More information

Stochastic propositionalization of relational data using aggregates

Stochastic propositionalization of relational data using aggregates Stochastic propositionalization of relational data using aggregates Valentin Gjorgjioski and Sašo Dzeroski Jožef Stefan Institute Abstract. The fact that data is already stored in relational databases

More information

Backtracking and Branch-and-Bound

Backtracking and Branch-and-Bound Backtracking and Branch-and-Bound Usually for problems with high complexity Exhaustive Search is too time consuming Cut down on some search using special methods Idea: Construct partial solutions and extend

More information

Conflict Graphs for Combinatorial Optimization Problems

Conflict Graphs for Combinatorial Optimization Problems Conflict Graphs for Combinatorial Optimization Problems Ulrich Pferschy joint work with Andreas Darmann and Joachim Schauer University of Graz, Austria Introduction Combinatorial Optimization Problem CO

More information

Disjoint Support Decompositions

Disjoint Support Decompositions Chapter 4 Disjoint Support Decompositions We introduce now a new property of logic functions which will be useful to further improve the quality of parameterizations in symbolic simulation. In informal

More information

PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet

PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet IEICE TRANS. FUNDAMENTALS, VOL.E8??, NO. JANUARY 999 PAPER Constructing the Suffix Tree of a Tree with a Large Alphabet Tetsuo SHIBUYA, SUMMARY The problem of constructing the suffix tree of a tree is

More information

Paths, Flowers and Vertex Cover

Paths, Flowers and Vertex Cover Paths, Flowers and Vertex Cover Venkatesh Raman M. S. Ramanujan Saket Saurabh Abstract It is well known that in a bipartite (and more generally in a König) graph, the size of the minimum vertex cover is

More information

Multi-relational Decision Tree Induction

Multi-relational Decision Tree Induction Multi-relational Decision Tree Induction Arno J. Knobbe 1,2, Arno Siebes 2, Daniël van der Wallen 1 1 Syllogic B.V., Hoefseweg 1, 3821 AE, Amersfoort, The Netherlands, {a.knobbe, d.van.der.wallen}@syllogic.com

More information

Efficient Filtering of XML Documents with XPath Expressions

Efficient Filtering of XML Documents with XPath Expressions The VLDB Journal manuscript No. (will be inserted by the editor) Efficient Filtering of XML Documents with XPath Expressions Chee-Yong Chan, Pascal Felber?, Minos Garofalakis, Rajeev Rastogi Bell Laboratories,

More information

Labeling Dynamic XML Documents: An Order-Centric Approach

Labeling Dynamic XML Documents: An Order-Centric Approach 1 Labeling Dynamic XML Documents: An Order-Centric Approach Liang Xu, Tok Wang Ling, and Huayu Wu School of Computing National University of Singapore Abstract Dynamic XML labeling schemes have important

More information

ARELAY network consists of a pair of source and destination

ARELAY network consists of a pair of source and destination 158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 55, NO 1, JANUARY 2009 Parity Forwarding for Multiple-Relay Networks Peyman Razaghi, Student Member, IEEE, Wei Yu, Senior Member, IEEE Abstract This paper

More information

Integer Programming Theory

Integer Programming Theory Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x

More information

EE 368. Weeks 5 (Notes)

EE 368. Weeks 5 (Notes) EE 368 Weeks 5 (Notes) 1 Chapter 5: Trees Skip pages 273-281, Section 5.6 - If A is the root of a tree and B is the root of a subtree of that tree, then A is B s parent (or father or mother) and B is A

More information