Inference of node replacement graph grammars

Size: px
Start display at page:

Download "Inference of node replacement graph grammars"

Transcription

1 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. Intelligent Dt Anlysis (27) 24 IOS Press Inference of node replcement grph grmmrs Jcek P. Kukluk, Lwrence B. Holder nd Dine J. Cook Deprtment of Computer Science nd Engineering, University of Texs t Arlington, Box 95, Arlington, TX 769, USA Received 2 My 26 Revised 2 August 26 Accepted 3 Septemer 26 Astrct. Grph grmmrs comine the reltionl spect of grphs with the itertive nd recursive spects of string grmmrs, nd thus represent n importnt next step in our ility to discover knowledge from dt. In this pper we descrie n pproch to lerning node replcement grph grmmrs. This pproch is sed on previous reserch in frequent isomorphic sugrphs discovery. We extend the serch for frequent sugrphs y checking for overlp mong the instnces of the sugrphs in the input grph. If sugrphs overlp y one node we propose node replcement grmmr production. We lso cn infer hierrchy of productions y compressing portions of grph descried y production nd then infer new productions on the compressed grph. We vlidte this pproch in experiments where we generte grphs from known grmmrs nd mesure how well our system infers the originl grmmr from the generted grph. We lso descrie results on severl rel-world tsks from chemicl mining to XML schem induction. We riefly discuss other grmmr inference systems indicting tht our study extends clsses of lernle grph grmmrs. Keywords: Grmmr induction, grph grmmrs, grph mining. Introduction String grmmrs re fundmentl to linguistics nd computer science. Grph grmmrs cn represent reltions in dt which strings cnnot. Grph grmmrs cn represent hierrchicl structures in dt nd generlize knowledge in grph domins. They hve een pplied s nlyticl tools in physics, iology, nd engineering [7,5]. Our ojective with this reserch is to develop lgorithms for grph grmmr inference cple of lerning recursive productions. We introduce n lgorithm which uilds on previous work in discovering frequent sugrphs in grph [5]. We check if sugrphs overlp nd if they overlp y one node, we use this node nd sugrph structure to propose node replcement grph grmmr. We lso cn infer hierrchy of productions y compressing portions of grph descried y production nd then infer new productions on the compressed grph. We vlidte this pproch in experiments where we generte grphs from known grmmrs nd mesure how well our system infers the originl grmmr from the generted grph. We show how inference error depends on noise, complexity of the grmmr, numer of lels, nd size of input grph. We lso descrie results on severl rel-world tsks Corresponding uthor. Tel.: ; Fx: ; E-mil: holder@cse.ut.edu X/7/$7. 27 IOS Press nd the uthors. All rights reserved

2 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 2 2 J.P. Kukluk et l. / Inference of node replcement grph grmmrs from chemicl mining to XML schem induction. A vst mount of reserch hs een done in string grmmr inference [2]. We found only few studies in grph grmmr inference, which we descrie in the next section. In the reminder of the pper we the first define grph grmmrs. Next we descrie our lgorithm for inferring grph grmmrs. Then, we show experiments to revel the dvntges nd limittions of our method. We close with conclusions nd future directions. 2. Relted work Jeltsch nd Kreowski [9] did theoreticl study of inferring hyperedge replcement grph grmmrs from simple undirected, unleled grphs. Their pper leds through n exmple where from four complete iprtite grphs K 3,,K 3,2,K 3,3,K 3,4, the uthors descrie the inference of grmmr tht cn generte more generl clss of iprtite grphs K 3,n, where n. The uthors define four opertions tht led to finl hyperedge replcement grmmr. The opertions re: INIT, DECOMPOSE, RENAME, nd REDUCE. The INIT opertion will strt the process from grmmr which hs ll smple grphs in its productions nd therefore it genertes only the smple grphs. Then, the DECOMPOSE opertion trnsforms the initil productions into productions tht re more generl ut cn still produce every grph from the smple grphs. RENAME llows for chnging nmes of non-terminl lels. REDUCE removes redundnt productions. Jeltsch nd Kreowski [9] strt the process from grmmr which hs ll the smple grphs in its productions. Then they trnsform the initil productions into productions tht re more generl ut cn still produce every grph from the smple grphs. Their pproch gurntees tht the finl grmmr will generte grphs tht contin ll smple grphs. Otes et l. [7] discuss the prolem of inferring proilities of every grmmr rule for stochstic hyperedge replcement context free grph grmmrs. They cll their progrm Prmeter Estimtion for Grph Grmmrs (PEGG). They ssume tht the grmmr is given. Given structure of grmmr S nd finite set of grphs E generted y grmmr S, they sk wht re the proilities θ ssocited with every rule of the grmmr. Their strtegy is to look for set of prmeters θ tht mximizes the proility p(e S, θ). In terms of similrity to string grmmr inference we consider the Sequitur system developed y Nevill-Mnning nd Witten [6]. Sequitur infers hierrchicl structure y replcing sustrings sed on grmmr rules. The new, compressed string is serched for sustrings which cn e descried y grmmr rules, nd they re then compressed with the grmmr nd the process continues itertively. Similrly, in our pproch we replce the prt of grph descried y the inferred grph grmmr with single node nd we look for grmmr rules on the compressed grph nd repet this process itertively until the grph is fully compressed. The most relevnt work to this reserch is Jonyer et l. s pproch to node-replcement grph grmmr inference [,]. Their system strts y finding frequently occurring sugrphs in the input grphs. Frequent sugrphs re those tht when replced y single nodes minimize the description length of the grph. They check if isomorphic instnces of the sugrphs tht minimize the mesure re connected y one edge. If they re, production S PS is proposed, where P is the frequent sugrph. P nd S re connected y one edge. Our pproch is similr to Jonyer s in tht we lso strt y finding frequently occurring sugrphs, ut we test if the instnces of the sugrphs overlp y one node. Jonyer s method of testing if sugrphs re djcent y one edge limits his grmmrs to description of chins of isomorphic sugrphs connected y one edge. Since n edge of frequent sugrph connecting it to the other isomorphic sugrph cn e included to the sugrph structure, testing sugrphs for overlp llows

3 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 3 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 3 Grmmr S 6% 4% nx S S Generted grph nx nx nx nx nx Grmmr found S S nx Compressed grph nx S Fig.. Jonyerfls pproch to grph grmmr inference finds chin of ptterns in the tree. us to propose clss of grmmrs tht hve more expressive power thn the grph structures covered y Jonyer s grmmrs. For exmple, testing for overlp llows us to propose grmmrs which cn descrie tree structures, while Jonyer s pproch does not llow for tree grmmrs. In Fig. we illustrte the limittion of Jonyer s pproch. We generted tree from grmmr tht hs two productions. The first production is selected 6% of time, nd the second production is terminl which is single vertex nd is selected 4% of time. The grmmr found y Jonyer s pproch, nd resulting compressed grph, re shown on the right side of the figure. The grmmr represents the chin of ptterns connected y one edge in the grph, ut it cnnot represent the tree. The pproch we propose in this pper does not hve this limittion. In our pproch we use the frequent sugrph discovery system Sudue developed y Cook nd Holder [5]. We would like to mention other systems developed to discover frequent sugrphs nd therefore hve the potentil to e modified into system which cn infer grph grmmr. Kurmochi nd Krypis [3] implemented the FSG system for finding ll frequent sugrphs in lrge grph dtses. FSG strts y ll frequent one nd two edge sugrphs. Then, in ech itertion, it genertes cndidte sugrphs y expnding the sugrphs found in the previous itertion y one edge. In every itertion, the lgorithm checks how mny times the cndidte sugrph occurs within n entire grph. The cndidtes, whose frequency is elow user-defined level, re pruned. The lgorithm returns ll sugrphs occurring more frequently thn the given level. In the cndidte genertion phse, computtion costs of testing grphs for isomorphism re reduced y uilding unique code for the grph (cnonicl leling). Yn nd Hn introduced gspn [24], which does not require cndidte genertion to discover frequent sustructures. The uthors comine depth first serch nd lexicogrphic order in their lgorithm. Their lgorithm strts from ll frequent one-edge grphs. The lels on these edges together with lels on incident nodes define code for every such grph. Expnsion of these one-edge grphs mps them to longer codes. The codes re stored in tree structure such tht if α =(,,..., m ) nd β =(,,..., m,), then the β code is child of the α code. Since every grph cn mp to mny codes, the codes in the tree structure re not unique. If there re two codes in the code tree tht mp to the sme grph nd one is smller then the other, the rnch with the smller code is pruned during depth first serch trversl of the code tree. Only the minimum code uniquely defines the grph. Code ordering nd pruning reduces the cost of mtching frequent sugrphs in gspn. Since gspn nd FSG re oth frequent sugrph mining systems, they lso cn e used to infer grph grmmrs with the modifiction to llow sugrph instnces to overlp nd mechnism for

4 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 4 4 J.P. Kukluk et l. / Inference of node replcement grph grmmrs overlp detection. Modifiction of gspn nd FSG where the grph would e compressed with frequent isomorphic sugrphs would llow for uilding hierrchy of productions. Overlpping sustructures re ville s n option in the Sudue system [5]. Also, Sudue llows for identifiction of one sustructure with the est compression score, which we cn modify to identify one grmmr production with the est score, while FSG nd gspn return ll cndidte sugrphs ove user-defined frequency level leving interprettion nd finl selection for the user. 3. Node replcement recursive grph grmmr We give the definition of grph nd grph grmmrs which is relevnt to our implementtion. The defined grph hs lels on vertices nd edges. Every edge of the grph cn e directed or undirected. We emphsize the role of recursive productions in the nme of the grmmr, ecuse the type of inferred productions re such tht the non-terminl lel on the left side of the production ppers one or more times in the node lels of grph on the right side. It is the min chrcteristic of our grmmr productions. Our pproch cn lso infer non-recursive productions. The emedding mechnism of the grmmr consists of connection instructions. Every connection instruction is pir of vertices tht indicte where the production grph cn connect to itself in recursive fshion. Our grph genertor cn generte lrger clss of grph grmmrs thn defined elow. We will descrie the grmmrs used in genertion lter in the pper. A leled grph G is 6-tuple, G =(V, E, µ,ν, η, L), where V is the set of nodes, E V V is the set of edges, µ : V L is function ssigning lels to the nodes, v : E L is function ssigning lels to the edges, η : E {, } is function ssigning direction property to edges ( if undirected, if directed). L is set of lels on nodes nd edges. A node replcement recursive grph grmmr is tuple Gr =(,, Γ,P), where is n lphet of node lels, is n lphet of terminl node lels,, Γ is n lphet of edge lels, which re ll terminls, P is finite set of productions of the form (d, G, C), where d, G is grph, nd there re two types of productions: () recursive productions of the form (d, G, C), with d, G is grph, where there is t lest one node in G leled d. C is n emedding mechnism with set of connection instructions, C V V, where V is the set of nodes of G. A connection instruction (v i,v j ) C implies tht derivtion cn tke plce y replcing v i in one instnce of G with v j in nother instnce of G. All the edges incident to v i re incident to v j. All the edges incident to v j remin unchnged. (2) non-recursive production, there is no node in G leled d (our inference system does not infer n emedding mechnism for these productions). Ech grmmr production cn hve one or more connection instructions. If the grmmr production does not hve connection instruction, it is non-recursive production. Ech connection instruction consists of two integers. They re the numers of vertices in two isomorphic sugrphs. In Fig. 4 we show three isomorphic instnces nd connection instructions. Connection instructions re determined

5 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 5 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 5 from overlp. They show how instnces overlp in the input grph nd cn e used in genertion. We compress portions of the grph descried y productions. Connection instructions show how one instnce connects to its isomorphic copy. They do not show how n instnce is connected to the compressed grph. We intended to give definition of grph grmmr which descries the clss of grmmrs which cn e inferred y our pproch. We do not infer the emedding mechnism of recursive nd non-recursive productions for the compressed grph, ut this is n issue for further theoreticl nd experimentl study. When production is non-recursive, instnces do not overlp nd do not connect to ech other. We do not explicitly give n emedding mechnism for this cse. We introduce the definition of two dt structures used in our lgorithm. Sustructure S of grph G is dt structure which consists of: () grph definition of sustructure S G which is grph isomorphic to sugrph of G (2) list of instnces (I,I 2,...,I n ) where every instnce is sugrph of G isomorphic to S G Recursive sustructure recursivesu is dt structure which consists of: () grph definition of sustructure S G which is grph isomorphic to sugrph of G (2) list of connection instructions which re pirs of integer numers descriing how instnces of the sustructure cn overlp to comprise one instnce of the corresponding grmmr production rule. (3) list of recursive instnces (IR,IR 2,...,IR n ) where every instnce IR k is sugrph of G. Every instnce IR k consist of one or more isomorphic, overlpping y no more thn one vertex, copies of S G. In our definition of sustructure we refer to sugrph isomorphism. However, in our lgorithm we re not solving the sugrph isomorphism prolem. We re using polynomil time em serch to discover sustructures nd grph isomorphism to collect instnces of the sustructures. 4. Hierrchy of grph grmmrs We encountered in existing literture clssifiction of grph grmmrs sed on the emedding mechnism [2]. The emedding mechnism is importnt in the genertion process, ut if we use grph grmmrs in prsing or s tool to mine dt nd visulize common ptterns, the emedding mechnism my hve less importnce or cn e omitted. Without the emedding mechnism the grph grmmr still conveys informtion out grph structures used in productions nd reltions etween them. In Fig. 2 we give the clssifiction of grph grmmrs sed on the type of their productions, not sed on the type of emedding mechnism. The production of the grmmrs in the hierrchy is of the form (d, G, C) where d is the left hnd side of the production, G is grph, nd C is the emedding mechnism. d cn e single node, single edge or grph, nd we respectively cll the grmmr node-, edge- or grph replcement grmmr. If the replcement of d with G does not depend on vertices djcent to d or edges incident to d, nor ny other vertices or edges outside d in grph hosting d, we cll the grmmr context free. Otherwise, the grmmr is context sensitive. We wnted to plce the grph grmmrs we re le to infer in this hierrchy. We circled two of them. Node replcement recursive grph grmmr is the one descried in this pper. The set of grmmrs inferred y Jonyer et l. [,] we cll chin grmmrs. Chin grmmrs descrie grphs or portion of grphs composed from isomorphic sugrphs where every sugrph is djcent to the other y one edge. The productions of chin grmmrs re of the form S PS, where P is the sugrph. P nd S re connected y one edge. Chin grmmrs re suset of node replcement recursive grph grmmrs.

6 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 6 6 J.P. Kukluk et l. / Inference of node replcement grph grmmrs Fig. 2. Hierrchy of grph grmmrs. Node replcement grph grmmrs descrie more generl clss of grph grmmrs thn our lgorithm is le to lern. An exmple of node replcement grph grmmr tht we cnnot lern is grmmr with lternting productions, s in Fig The lgorithm We will first descrie n lgorithm informlly llowing for n intuitive understnding of the ide. The exmple in Fig. 3 shows grph composed of three overlpping sustructures. The lgorithm genertes cndidte sustructures nd evlutes them using the following mesure of compression, size (G) size (S)+size (G S) where G is the input grph, S is sustructure nd G S is the grph derived from G y compressing ech instnce of S into single node. size (g) cn e computed simply y dding the numer of nodes nd edges: size (g) =vertices (g)+edges (g). Another successful mesure of size (g) is the Minimum Description Length (MDL) discussed in detil in [4,9]. Either of these mesures cn e used to guide the serch nd determine the est grph grmmr. In our experiments we used only the size mesure. The compression mesure could lso tke into ccount the connection instructions. We could penlize recursive sustructures for the dditionl informtion connection instruction crry. We did preliminry experiments incorporting connection instruction in the mesure, ut we did not oserve significnt improvements in results nd therefore we decided to leve the mesure unchnged. The grmmr from Fig. 3 consists of grph isomorphic to three overlpping sustructures nd connection instructions. We find connection instructions when we check for overlps. In this exmple there re two connection instructions, 3 nd 4. Hence, in genertion of grph from the grmmr,

7 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 7 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 7 Fig. 3. A grph with overlpping sustructures nd grph grmmr representtion of it Fig. 4. Sustructure nd its instnces while determining connection instructions (continution of the exmple from Fig. 3). in every derivtion step n isomorphic copy of the sugrph definition will e connected to the existing grph y connecting node of the sugrph to either node 3 or node 4 in the existing grph. The grmmr shown on the right in Fig. 3 cnnot only regenerte the grph shown on the left, ut lso generte generliztions of this grph. Generliztion in this exmple mens tht the grmmr descries grphs composed from one or more str looking sustructures of four nodes leled nd. All these sustructures overlp on node with the lel. In our lgorithm we llow instnces to overlp y one node only. Certinly, rel dt cn contin instnces which overlp on more thn one node. We encountered this cse in experiments with chemicl structures. We lso did studies where we llow instnces to overlp y two nodes, which led us to consider inference of edge replcement recursive grph grmmrs. Allowing for lrger overlp is n open reserch issue in inference of new clsses of grph grmmrs which we hve not yet explored. In this pper we limit overlp to only one node. In the lgorithm, we do not llow instnces to grow further thn overlp y one node. Algorithm is our grmmr discovery pseudocode. The function INFER GRAMMAR is similr to the descriptions of Cook et l. [5] for sustructure discovery nd Jonyer et l. [] for discovering grmmrs descriing chins of isomorphic sugrphs connected y one edge. The input to the lgorithm is grph G which cn e one connected grph or set of disconnected grphs. G cn hve directed edges or undirected edges. The lgorithm ssumes lels on nodes nd edges. The lgorithm processes the list of sustructures Q. In Fig. 4 we see n exmple of sustructure definition. A sustructure consists of grph definition nd set of instnces from the input grph tht re isomorphic to the grph definition. The exmple in Fig. 4 is continution of the exmple in Fig. 3. The numers in prentheses refer to nodes of the grph in Fig. 3.

8 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 8 8 J.P. Kukluk et l. / Inference of node replcement grph grmmrs The lgorithm strts (line 3) with list of sustructures where every sustructure is single node nd its instnces re ll nodes in the grph with this node lel. The est sustructure is initilly the first sustructure in the Q list (line 4). In line 8 we extend ech sustructure in Q in ll possile wys y single edge nd node or only y single edge if oth nodes re lredy in the grph definition of the sustructure. We llow instnces to grow nd overlp, ut ny two instnces cn overlp y only one node. We keep ll extended sustructures in newq. We evlute sustructures in newq in line 2. The recursive sustructure recursivesu is evluted long with non-recursive sustructures nd is competing with non-recursive sustructures. The totl numer of sustructures considered is determined y the

9 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 9 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 9 input prmeter Limit. In line 9 we compress G with estsu. Compression replces every instnce of estsu with single node. This node is leled with non-terminl lel. The compressed grph is further processed until it cnnot e compressed ny more. In consecutive itertions estsu cn hve one or more non-terminl lels. It llows us to crete hierrchy of grmmr productions. The input prmeter Bem specifies the width of em serch, i.e., the length of Q. For more detils out the lgorithm see [5,,]. The function RECURSIFY SUBSTRUCTURE tkes sustructure Snd, if instnces of S overlp, proposes recursive sustructure recursivesu. The list of connection instructions nd the list of recursive instnces re two min components of recursivesu. We initilize them in line nd 2. We check for overlp in line 4. Figure 4 ssists us in explining conversion of sustructure S into recursive sustructure. Every instnce grph hs two positive integers ssigned to it. One integer, in prentheses in Fig. 4, is the numer of node in the processed grph G. The second integer is node numer of n instnce grph. The instnces re isomorphic to the sustructure grph definition nd instnce node numers re ssigned to them ccording to this isomorphism. We check for overlp in line 4. Given pir of instnces (I,I 2 ) we exmine if there is node v G, which lso elongs to I ndi 2. We find two overlpping nodes [3,4], exmining node numers in prentheses in the exmple in Fig. 4. Hving the numer of node v G we find corresponding to v two node numers of instnce grphs v I I nd v I I 2 (line 5 nd 6). The pir of integers (v I,v I ) is connection instruction. There re two connection instructions in Fig. 4, -3 nd -4. If (v I,v I ) is not lredy in the list of connections instructions for recursive sustructure, we include it in line 8. The order in which we test instnces in the instnce list for overlp does not mtter. Even if one instnce overlps with two or more different instnces on different nodes, overlp does not depend on the order of instnces. Therefore, the connection instructions found from ech overlp will e the sme, no mtter in which order we trverse the instnce list. We crete the recursive sustructure s instnce list in lines to 3 of RECURSIFY SUBSTRUC- TURE. A recursive instnce is connected sugrph of Gwhich cn e descried y the discovered grmmr production. It mens tht for every suset of instnces {I m, I m+,..., I l } from the instnce list of S, in which union I m I m+... I l is connected grph, we crete one recursive instnce IR k = I m I m+... I l. The recursive instnces re no longer isomorphic s instnces of S nd they vry in size. Recursive sustructures compete with non-recursive. All instnces descried y single recursive production re collectively replced y single node in the evlution process; wheres, for non-recursive productions, ech instnce is replced y single node, one per instnce. This method of tends to prefer recursive productions. Sudue uses heuristic serch whose complexity is polynomil in the size of the input grph [5]. Our modifiction does not chnge the complexity of this lgorithm. The overlp test is the min computtionlly expensive ddition of our grmmr discovery lgorithm. Anlyzing informlly, the numer of nodes of n instnce grph is not lrger thn V, where V is the numer of nodes in the input grph. Checking two instnces for overlp will not tke more thn O(V 2 ) time. The numer of pirs of instnces is no more thn V 2, so the entire overlp test will not tke more thn O(V 4 ) time. 6. Hierrchy of productions In our first exmple from Fig. 3, we descried grmmr with only one production. Now we would like to introduce complex exmple to illustrte the inference of grmmr which descries more generl tree structure. In Fig. 5 we hve tree with ll nodes hving the sme lel. There re two repetitive sugrphs in the tree. One hs three edges leled,, nd c. The other hs two edges

10 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. J.P. Kukluk et l. / Inference of node replcement grph grmmrs (S2) (S2) Fig. 5. The tree (left) nd inferred tree grmmr (right). (S) Fig. 6. An exmple of grph grmmr used in the experiments. with lels x nd y. There re lso three edges K, K2, nd K3 which re not prt of ny repetitive sugrph. In the first itertion we find grmmr production S, ecuse overlpping sugrphs with edges,, nd c score the highest in compressing the grph. Exmining production S, we notice tht node 3 is not involved in connection instructions. It is consistent with the input grph where there re no two sugrphs overlpping on this node. The compressed grph, t this point, contins the node S, edges K, K2, K3 nd sugrphs with edges x nd y. In the second itertion our progrm finds ll overlpping sustructures with edges x nd y nd proposes production S2. Compressing the tree with production S2 results in grph which we use s n initil production S, ecuse the grph cn e compressed no further. In Fig. 5 productions for S nd S2 hve grphs s terminls. We will omit drwing terminl grphs in susequent figures. The tree used in this exmple ws used in our experiments, nd the grmmr on the right in Fig. 5 is the ctul inferred grmmr. 7. Experiments 7.. Methodology Hving our system implemented, we fced the chllenge of evluting its performnce. There re n infinite numer of grmmrs s well s grphs generted from these grmmrs. In our experiments we restricted grmmrs to node replcement grmmrs with two productions. The second production replces non-terminl node with single terminl node. In Fig. 6 we give n exmple of such grmmr. The grmmr on the left is of the form used in genertion. The grmmr on the right is the inferred grmmr in our experiment. The inferred grmmr production is ssumed to hve terminting

11 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. J.P. Kukluk et l. / Inference of node replcement grph grmmrs lterntive with the sme structure s the recursive lterntive, ut with no non-terminls. We omit terminting production in Fig. 6. We ssocite proilities with productions used in genertion. These proilities define how often prticulr production is used in derivtions. Assigning proilities to productions helps us to control the size of the generted grph. Our inference system does not infer proilities. Otes et l. [7] ddresses the prolem of inferring proilities ssuming tht the productions of grmmr re given. We re considering inferring proilities long with productions s future work. We developed grph genertor to generte grphs from known grmmr. We cn generte directed or undirected grphs with lels on nodes nd edges. Our genertor produces grph y replcing non-terminl node of grph until ll nodes nd edges re terminl. The genertion process expnds the grph s long s there re ny non-terminl edges or nodes. Since selection of production is rndom ccording to the proility distriution specified in the input file, the numer of nodes of generted grph is lso rndom. We plce limits on the size of the generted grph with two prmeters: minnodes nd mxnodes. We generte grphs from the grmmr until the numer of nodes is etween minnodes nd mxnodes. We distinguish two different distortion opertions to the grph generted from the grmmr: corruption nd dded noise. Corruption involves the redirection of rndomly selected edges. The numer of edges of grph multiplied y noise gives the numer of redirected edges, where noise is vlue from to. We redirect n edge e =(v,v 2 ) y replcing nodes v nd v 2 with two new, rndomly selected grph nodes v nd v 2. When we dd noise, we do not destroy generted grph structure. We dd new nodes nd new edges with lels ssigned rndomly from lels used in lredy generted grph structure. We compute the numer of dded nodes from the formul (noise/(- noise))*numer of nodes. The numer of dded edges we find from (noise/(- noise))*numer of edges. A new edge connects two nodes selected rndomly from existing nodes of the generted structure nd newly dded nodes. The distortion of grph when we dd noise is from dded edges nd vertices. The higher the noise vlue, the higher the proility tht newly dded nodes will e connected to the rest of the grph. If the noise vlue is low, the new dded nodes might not get connected to the grph nd the distortion to the grph will e only from dded edges. We exmined grmmrs with one, two, nd three non-terminls. The first productions of the grmmrs hve n undirected, connected grph with lels on nodes nd edges on the right side. We use ll possile connected simple grphs with three, four, nd five nodes s the structures of grphs used in the productions. There re twenty nine different simple connected undirected unleled grphs [8]. We show them in Fig. 9. Performing experiments with lrger grph structures requires more computtion time. Experiments with complex grmmrs which involve severl productions we pln to descrie in seprte report. We ttempted severl experiments with lrge socil networks of generted terrorist networks; however, we did not find meningful recursive ptterns in this domin. For these experiments the input file hd 8 grphs with totl numer of vertices of 23,23 nd totl numer of edges of 34,32. Computtions for this grph took 885 seconds. Our grph genertor genertes grphs from the known grmmr tht is sed on one of the twenty-nine grph structures. Then we use our inference system to infer the grmmr from the generted grph. We mesure n error etween the originl nd inferred grmmr. We use MDL s mesure of the complexity of grmmr. Our results descrie the dependency of the grmmr inference error on complexity, noise, numer of lels, nd size of generted grphs MDL s mesure of complexity of grmmr We seek to understnd the reltionship etween grph grmmr inference nd grmmr complexity, nd so need mesure of grmmr complexity. One such mesure is the Minimum Description Length

12 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 2 2 J.P. Kukluk et l. / Inference of node replcement grph grmmrs (MDL) of grph, which is the minimum numer of its necessry to completely descrie the grph. Here we define the MDL mesure, which while not provly miniml, is designed to e ner-miniml encoding of grph. See [5] for more detiled discussion. MDL(grph) = vits + rits + eits, where vits is the numer of its needed to encode the nodes nd node lels of the grph vits =lgv + v lg l u, v is the numer of nodes in the grph; lg v is the numer of its to encode the numer of nodes v in the grph; l u is the numer of unique lels in the grph. rits is the numer of its needed to encode the rows of the djcency mtrix of the grph v ( ) v rits =(v +)lg( +)+ lg ki i= is the mximum numer of s in ny row of the djcency mtrix; k i is the numer of s in row i of the djcency mtrix. eits is the numer of its needed to encode edges given in djcency mtrix eits = e( + lg l u )+(K +)lgm, e is the numer of edges of grph; m is the mximum numer of edges etween ny two nodes; in our grphs m =ecuse grphs re simple, therefore (K +)lgm =; K is numer of sin djcency mtrix of grph, in our grphs K = e. Since ll the grmmrs in our experiments hve two productions nd the second production replces non-terminl with single node, the complexity of the grmmr will vry depending only on the grph on the right side of the first production. We would like our results for one, two nd three non-terminl grmmrs to e comprle; therefore we do not wnt our mesure of complexity of grmmr to e dependent on the numer of non-terminls. In every grph used in the productions we reserve three nodes. We give the sme lel to these nodes. When we generte grph, we replce one, two, or three lels of these nodes with the non-terminl S when we need grmmr with one, two or three nonterminls. However, when we mesure MDL of grph we leve the originl three lels unchnged. In our experiments we lwys use tht sme non-terminl lel. In the generl cse production cn contin different non-terminls. Every non-terminl would need to e counted s different lel of grph nd MDL would increse with incresing numer of non-terminls. Therefore, replcing ll non-terminls with one lel elimintes the influence of the numer of non-terminls on.the MDL mesure, nd we cn compre inference error of the structures of the grphs used in productions cross one, two nd three non-terminls. Next, we give n exmple of clculting the MDL of grph using the grph structure from Fig. 6. The djcency mtrix of the grph nd the MDL clcultion re s follows. c c

13 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 3 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 3 vits =lgv + v lg l u =lg5+5lg9=8.7 rits =(v +)lg( +)+ v i= =(5+)lg(3+)+lg =22.29 ( ) v lg ki ( ( ( ( ( lg +lg +lg +lg ) 3) ) ) ) eits = e( + lg l u )+(K +)lgm = 6( + lg 9) + (6 + ) lg = 25.2 MDL(grph) =vits + rits + eits =65.48 We cn compre this result with n MDL vlue 26.9 of tringle with three vertices, three edges nd four different lels Error We introduce mesure to compre the originl grmmr to the inferred grmmr. Our definition of n error hs two spects. First, there is the structurl difference etween the inferred nd the originl grph used in the productions. Second, there is the difference etween the numer of non-terminls nd the numer of connection instructions. If there is no error, the numer of non-terminls in the originl grmmr is the sme s the numer of connection instructions in the inferred grmmr. We compute the structurl difference etween grphs with n lgorithm for inexct grph mtch initilly proposed y Bunke nd Allermnn [2]. For more detils see lso [5]. We would like our error to e vlue etween nd ; therefore, we normlize the error y hving in the denomintor the sum of the size of the grph used in the originl grmmr nd the numer of non-terminls. We do not llow n error to e lrger thn ; therefore, we tke the minimum of nd our mesure s finl vlue. The restriction tht the error is not lrger thn prohiits unnecessry influence on the verge error tken from severl vlues y inferred grph structure significntly lrger thn the grph used in the originl grmmr. where Error =min (, mtchcost(g,g 2 )+ #CI #NT size(g )+#NT mtchcost(g,g 2 ) is the miniml numer of opertions required to trnsform g to grph isomorphic to g 2,org 2 to grph isomorphic to g. The opertions re: insertion of n edge or node, deletion of node or n edge, or sustitution of node or edge lel. #CI is the numer of inferred connection instructions #NT is the numer of non-terminls in the originl grmmr size(g ) is the sum of the numer of nodes nd edges in the grph used in the grmmr production Most of the time mtchcost(g,g 2 ) is only frction of the size of the originl grmmr. We could choose lrger denomintor to ensure tht error will not e lrger thn. However, in some cses the lerned grmmr ws n ttched pir of the structure in the originl grmmr, s in Fig., where the lnguge of the lerned grmmr is very close to the originl grmmr, ut the structurl ),

14 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 4 4 J.P. Kukluk et l. / Inference of node replcement grph grmmrs error noise MDL error noise MDL error noise MDL Fig. 7. Error s function of noise nd MDL where grph structure ws not corrupted. error noise MDL error noise MDL error noise MDL Fig. 8. Error s function of noise nd MDL where grph structure ws corrupted. difference etween the lerned nd originl grmmrs is lrge. We decided on size(g )+#NT nd limit error to, ecuse it gve clerer picture of the error. Using size(g )+size(g 2 )+#NT s denomintor would lso e vlid mesure, nd e etween nd, ut my unduly penlize for vrints of the trget grmmr Experiment : Error s function of noise nd complexity of grmmr We used twenty nine grphs from Fig. 9 in grmmr productions. We ssigned different lels to nodes nd edges of these grphs except three nodes used for non-terminls. We generted grphs with noise from to.9 in. increments. For every vlue of noise nd MDL we generted thirty grphs from the known grmmr nd inferred the grmmr from the generted grph. We computed the inference error nd verged it over thirty exmples. We generted 87 grphs to plot ech of the three grphs in Fig. 7. The first plot shows results for grmmrs with one non-terminl. The second nd the third plot show results for grmmrs with two nd three non-terminls. We did not corrupt the generted grph structure in experiments in Fig. 7. As noise we dded nodes nd edges to the generted grph structure. We used only the ADD NOISE TO GRAPH function of our genertor. Figure 8 hs the sme results s Fig. 7 with the difference tht we corrupted the grph structure generted from the grmmr nd then we dded nodes nd edges to the grph. We used oth CORRUPT GRAPH STRUCTURE nd ADD NOISE TO GRAPH functions of the genertor to distort the grph. We see trends in the plots in Figs 7 nd 8 Error decreses s MDL increses. A low vlue of MDL is ssocited with smll grphs, with three or four nodes nd few edges. These grphs, when used on the right hnd side of grmmr production, generte grphs with fewer lels thn grmmrs with high MDL. Smller numers of lels in the grph increse the inference error, ecuse everything in the grph looks similr, nd the pproch is more likely to propose nother grmmr which is very different

15 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 5 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 5 Tle 2 Twenty nine simple grphs ordered ccording to incresing verge inference error of six experiments in Figs 7 nd 8. The numers in the tle refer to structures in Fig. 9 Numer of non-terminls Not Corrupted Corrupted Fig. 9. Twenty nine simple connected grphs ordered ccording to non-decresing MDL vlue. thn the originl. As expected, the error increses s the noise increses in experiments with corrupted grph structure. However, there is little dependency of n error from the noise if the grph generted from the grmmr is not corrupted (Fig. 7). We verge the vlue of n error over ten vlues of noise which gives us the vlue we cn ssocite with the grph structure. It llowed us to order grph structures used in the grmmr productions sed on verge inference error. In Fig. 9 we show ll twenty nine connected simple grphs with three, four nd five nodes used in productions ordered in non-decresing MDL vlue of grph structure. We decided on this order ecuse grphs we produce show dependency of error on MDL of grph structure. This order gives us the sense of complexity. In Tle 2 we give n order of grph structures for six experiments with corrupted nd non-corrupted structures nd one, two, nd three non-terminls. The numers in the tle refer to structure numers in Fig. 9. We see in Tle 2 tht grph numer 2 is close to the eginning of the list in ll six experiments. Grphs numer, 2, nd re close to the end of ll six lists. We conclude tht when grph numer 2 is used in the grmmr production, it is the esiest for our inference lgorithm to find the correct grmmr. When grph numer, 2, or is used in the grmmr production nd generted grphs hve noise present, we infer grmmrs with some error. We lso oserve tendency of decresing error with incresing MDL in the grph orders in Tle 2. Grph 29 hs the highest MDL, ecuse it hs the most nodes nd edges. In five experiments grph 29 is closer to the end of the list. Quntittive definition of n error llows us to utomte the process nd perform tests on thousnds of grphs. The error is cused y wrongly inferred grph structure used in the production or numer of connection instructions which is too lrge or too smll. However, there re cses where the inferred grmmr represents the grph well, ut the grph in the production hs different structure. For exmple, we oserved tht the grmmr with MDL = nd grph numer cuses n error even if we infer

16 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 6 6 J.P. Kukluk et l. / Inference of node replcement grph grmmrs riginl grmmr Inferred grmmr (S) Fig.. An inference error where lrger grph structure is proposed. originl grmmr inferred grmmr Fig.. Error where inferred grmmr is reduced to production with single edge. -2 () () (c) Fig. 2. Grph with overlpping squres (), inferred grmmr (), nd grph generted from inferred grmmr (c). the grmmr from grphs with no corruption nd zero noise. The inferred grph structure contins two overlpping copies of the grph used in the originl grmmr production. We illustrte it in Fig.. The structure hs significnt error, yet does sujectively cpture the recursive structure of the originl grmmr Experiment 2: Error s function of numer of lels nd complexity of grmmr We would like to evlute how error depends on the numer of different lels used in grmmr. We restricted grph structures used in productions to grphs with five nodes. Every grph structure we leled with, 2, 3, 4, 5 or 6 different lels. For every vlue of MDL nd numer of lels we generted 3 different grphs from the grmmr nd computed verge error etween them nd the lerned grmmrs. The generted grphs were without corruption nd without noise. We show the results for one, two, nd three non-terminls in Fig. 3. Below the three dimensionl plots, for clrity, we give two dimensionl plots with tringles representing the errors. The lrger nd lighter the tringle the lrger is the error. We see tht the error increses s the numer of different lels decreses. We see on the two dimensionl plots the shift in error towrds grphs with higher MDL when the numer of non-terminls increses. The verge error for grphs with only one lel is or very close to. The most frequent inference error results from the tendency of our lgorithm to propose one-edge grmmrs when inferred from

17 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 7 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 7 one non-terminl two non-terminls three non-terminls # lels # lels MDL MDL MDL # lels Fig. 3. Error s function of MDL nd numer of different lels used in grmmr definition. grph with only one lel. We illustrte this in Fig. where we see production with pentgon using only one lel. The inferred grmmr hs one edge with two connection instructions - nd -2. Since ll the edges in the generted grph hve the sme lel nd ll the nodes hve the sme lel, this grmmr compresses the grph well nd is evluted highly y our compression-sed mesure. However, this one-edge grmmr cnnot generte even single pentgon. An evlution mesure which penlizes grmmrs for n inility to generte n input grph would is the lgorithm wy from single-edge grmmrs nd could correct the one-edge grmmr prolem. However, this pproch would require grph-grmmr prsing, which is computtionlly complex Experiment 3: Error s function of size of grph nd complexity of grmmr We generted grphs from grmmrs with two non-terminls nd noise =.2. The numer of nodes of the generted grphs ws from the intervl [x, x + 2], where we chnge x from 2 to 42. For ech vlue of x nd MDL we generted thirty grphs nd compute verge inference error over them. We show in Fig. 4 the results for corrupted nd not corrupted grph structure. We concluded tht there is no dependency etween the size of smple grph nd inference error Experiment 4: Limittions In Fig. 2 we show n exmple illustrting the limits of our pproch. In Fig. 2() we hve grph consisting of overlpping squres. All lels on nodes re the sme, nd we omit them. The squres do not overlp y one node ut y n edge. Our lgorithm ssumes tht only one node overlps in the instnces of the sustructure nd therefore infers the grmmr shown in Fig. 2(). The inferred

18 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 8 8 J.P. Kukluk et l. / Inference of node replcement grph grmmrs error size 2 () error size 2 () MDL 2 Fig. 4. Error s function of MDL nd size of generted grphs (noise=.2, two non-terminls): () uncorrupted grph structure, () corrupted grph structure. S S * t t4 S t3 * t7 d t6 t2 S t5 S (S) t7 t5 3 t d 4 S t6 2 t4 t2 Connection 5 6 instructions t3 (S) -5 (S) -6 Fig. 5. The grmmr with lternting productions (left) nd inferred grmmr (right). grmmr cn generte chins, n exmple of which is shown in Fig. 4(c). The originl input grph is not in the set of grphs generted y the inferred grmmr. An extension of our method to overlpping edges would llow us to infer the correct grmmr in this exmple. Figure 5 shows nother exmple illustrting the limits of our lgorithm. The first grph in the first production on the left is squre with two non-terminls leled S, nd the grph of the second production is tringle with one non-terminl leled S. Our lgorithm is not designed to find lternting productions of this type. We generted grph from the grmmr on the left, nd the grmmr we inferred is on the right in Fig. 5. The inferred grmmr hs one production in which the grph comines oth the tringle nd squre. The set of grphs generted y lternting squres nd tringles ccording to the grmmr from the left does not mtch exctly the set of grphs of the inferred grmmr. Nevertheless, we consider it n ccurte inference, ecuse the inferred grmmr will descrie the mjority of every grph generted y the originl grmmr. If we were lerning lternting productions, we would need to infer them in seprte itertions nd nlyze connection instructions, not in every itertion s we do now, ut once for ll interctions comined Experiment 5: Chemicl structures As n exmple from the rel-world domin of chemistry, we use four chemicl structures s the input grphs in our next experiment. Figures 6 nd 7 show the structures of the molecules nd the grmmr productions we found in these structures. The first structure in Fig. 6 is the structure of cellulose with hydrogen onding. The second molecule is mcrocyclic gllium croxylte [23]. We found grmmr production with the G-G ond. The grph used in the production definition ppers four times in the

19 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 9 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 9 Chemicl structure cellulose with hydrogen onding Inferred productions R G O C O G O C O R Mcrocyclic gllium croxylte R O C G O G O R O C C O R O G O G O C R R O C O G O C O G R wter-solule tin-sed metllodendrimer Si[CH 2 CH 2 Sn(CH 2 CH 2 CONHCH 2 CH 2 OH) 3 )] 4 OH Fig. 6. Three chemicl structures (left) nd the inferred grmmr production (right). structure. The third structure in Fig. 6 is wter-solule tin-sed metllodendrimer [2]. We inferred two productions. We found production S in the first itertion. Production S hs connection instruction - which mens tht vertex numer is replced y n isomorphic instnce of the right hnd side of the

20 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 2 2 J.P. Kukluk et l. / Inference of node replcement grph grmmrs Fig. 7. The structure of dendronized polymer nd its representtion in hierrchicl grph grmmr productions. S production, nd the connecting vertex in the new instnce of grph is lso vertex. We found the second production fter ll instnces of S were compressed into single vertices. The grph on the right hnd side of production S is grph of chemicl structure compressed with S. In Fig. 7 we hve the structure of dendronized polymer [25]. Its grph grmmr representtion consists of three productions. Zhng et l. descrie severl chemicl structures where the grph we found in production S2 in Fig. 7 ppers two, six, nd fourteen times. Since production S conveys the ide of one or more connected grphs of the S structure, it intends to descrie the entire fmily of chemicl structures descried in Zhng et l. s pper. The grmmr productions we found cpture the underlying motifs of the chemicl structures. They show the repetitive connected components, the sic uilding locks of the structures. We cn serch for such underlying uilding lock motifs in different domins, hoping tht they will improve our understnding of chemicl, iologicl, computer, nd socil networks Experiment 6: Inferring XML schem XML hs ecome common formt for dt exchnge on the Internet. XML schem or DTDs define the structure nd constrints on XML documents. Developers often need to write schem for existing XML documents, nd so they would find schem inference system prcticl. The structure of n XML file is n exmple of reltionl dt with repetitive recursive ptterns we cn represent s grph grmmrs. We developed converter which converts n XML file into tree. We used the Jv implementtion of Document Oject Model (DOM) in our converter. According to [] there re twelve DOM node (S2)

21 Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. 2 J.P. Kukluk et l. / Inference of node replcement grph grmmrs 2 <?xml version="."?> <!-- Smple for p_sustnce26.xml --> <!DOCTYPE PhrmcologiclActionSustnceSet SYSTEM "p_sustnce26.dtd"> <!-- Root element --> <PhrmcologiclActionSustnceSet> <!-- Sustnce (Descriptor) --> <Sustnce> <RecordUI>D536</RecordUI> <RecordNme> <String>Aluminum Hydroxide</String> </RecordNme> <!-- The list of PAs for this sustnce --> <PhrmcologiclActionList> <!-- First PA --> <PhrmcologiclActionOfSustnce> <DescriptorReferredTo> <DescriptorUI>D276</DescriptorUI> <DescriptorNme> <String>Adjuvnts, Immunologic</String> </DescriptorNme> </DescriptorReferredTo> </PhrmcologiclActionOfSustnce>. <!ENTITY % DescriptorReference "(DescriptorUI, DescriptorNme)"> <!ELEMENT PhrmcologiclActionSustnceS et (Sustnce*)> <!ELEMENT Sustnce ((RecordUI,RecordNme), PhrmcologiclActionList)+> <!ELEMENT PhrmcologiclActionList (PhrmcologiclActionOfSustn ce)+> <!ELEMENT PhrmcologiclActionOfSustnc e (DescriptorReferredTo)> <!ELEMENT DescriptorReferredTo (%DescriptorReference;)> <!ELEMENT DescriptorUI (#PCDATA)> <!ELEMENT DescriptorNme (String)> <!ELEMENT RecordUI (#PCDATA) > <!ELEMENT RecordNme (String) > <!ELEMENT String (#PCDATA)> Fig. 8. An XML file descriing phrmcology dt nd the Document Type Definition (DTD). types: Element, Attr, Text, CDATASection, EntityReference, Entity, ProcessingInstruction, Comment, Document, DocumentType, DocumentFrgment, nd Nottion. However in our implementtion we uild directed tree with root node lwys leled DOC nd then the only type of dt we extrct in the exmples elow is Element. From the perspective of our grph grmmr inference system we need pttern which is repeted in the grph, so we eliminted unique text dt (e.g., nmes, crd numers, nd price vlues). We do not ssign lels to the edges of the tree. We selected domins for our experiments where we cn esily identify the mening of results. XML files often contin dt with repetitive structures. An exmple of the eginning of such file with phrmcology dt nd its Document Type Definition (DTD) is show in Fig. 8. This is smple file found on the Ntionl Lirry of Medicine wesite. The tree fter conversion of this file we show in Fig. 9. In Fig. 2 we see the grph grmmr found y our inference system. Productions S nd S2 give the representtion of structure of n XML file which contins phrmcologicl dt. Exmining the DTD we see tht element PhrmcologiclActionSustnceSet contins zero or more Sustnce elements. It is indicted y *. Similrly, element PhrmcologiclActionList contins one or more PhrmcologiclActionOfSustnce elements. It is mrked in the DTD with +. We discover these two concepts in production S nd S2 in Fig. 2. However, our inference system cnnot distinguish etween one or more nd zero or more concepts. In the genertion process we would connect the node leled S2 of one instnce of the production grph to the node with the sme lel of nother instnce. The process would continue until the node leled S2 would e replced y lel PhrmcologiclActionSustnceSet. Production S is included s non-terminl node of production S2. In our present implementtion we do not specify to which node of grph on the right hnd side of S we would connect Sustnce node of S2. Following from prent to child nodes of structure of grphs of S nd S2 we cn find corresponding DTD entries. For exmple PhrmcologiclActionOfSustnce hs only one child DescriptorReferredTo, which corresponds to the DTD entry <!ELEMENT PhrmcologiclActionOfSustnce (DescriptorReferredTo)>. We conclude

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the LR() nlysis Drwcks of LR(). Look-hed symols s eplined efore, concerning LR(), it is possile to consult the net set to determine, in the reduction sttes, for which symols it would e possile to perform reductions.

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants A Heuristic Approch for Discovering Reference Models by Mining Process Model Vrints Chen Li 1, Mnfred Reichert 2, nd Andres Wombcher 3 1 Informtion System Group, University of Twente, The Netherlnds lic@cs.utwente.nl

More information

A dual of the rectangle-segmentation problem for binary matrices

A dual of the rectangle-segmentation problem for binary matrices A dul of the rectngle-segmenttion prolem for inry mtrices Thoms Klinowski Astrct We consider the prolem to decompose inry mtrix into smll numer of inry mtrices whose -entries form rectngle. We show tht

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Graphs with at most two trees in a forest building process

Graphs with at most two trees in a forest building process Grphs with t most two trees in forest uilding process rxiv:802.0533v [mth.co] 4 Fe 208 Steve Butler Mis Hmnk Mrie Hrdt Astrct Given grph, we cn form spnning forest y first sorting the edges in some order,

More information

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table TDDD55 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing, Prt 2 Constructing Prse Tles Prse tle construction Grmmr conflict hndling Ctegories of LR Grmmrs nd Prsers Peter Fritzson, Christoph

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs

SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs SAPPER: Sugrph Indexing nd Approximte Mtching in Lrge Grphs Shijie Zhng, Jiong Yng, Wei Jin EECS Dept., Cse Western Reserve University, {shijie.zhng, jiong.yng, wei.jin}@cse.edu ABSTRACT With the emergence

More information

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism Efficient K-NN Serch in Polyphonic Music Dtses Using Lower Bounding Mechnism Ning-Hn Liu Deprtment of Computer Science Ntionl Tsing Hu University Hsinchu,Tiwn 300, R.O.C 886-3-575679 nhliou@yhoo.com.tw

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Adm Sheffer. Office hour: Tuesdys 4pm. dmsh@cltech.edu TA: Victor Kstkin. Office hour: Tuesdys 7pm. 1:00 Mondy, Wednesdy, nd Fridy. http://www.mth.cltech.edu/~2014-15/2term/m006/

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

documents 1. Introduction

documents 1. Introduction www.ijcsi.org 4 Efficient structurl similrity computtion etween XML documents Ali Aïtelhdj Computer Science Deprtment, Fculty of Electricl Engineering nd Computer Science Mouloud Mmmeri University of Tizi-Ouzou

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

Unit 5 Vocabulary. A function is a special relationship where each input has a single output.

Unit 5 Vocabulary. A function is a special relationship where each input has a single output. MODULE 3 Terms Definition Picture/Exmple/Nottion 1 Function Nottion Function nottion is n efficient nd effective wy to write functions of ll types. This nottion llows you to identify the input vlue with

More information

Notes for Graph Theory

Notes for Graph Theory Notes for Grph Theory These re notes I wrote up for my grph theory clss in 06. They contin most of the topics typiclly found in grph theory course. There re proofs of lot of the results, ut not of everything.

More information

ISG: Itemset based Subgraph Mining

ISG: Itemset based Subgraph Mining ISG: Itemset bsed Subgrph Mining by Lini Thoms, Stynryn R Vlluri, Kmlkr Krlplem Report No: IIIT/TR/2009/179 Centre for Dt Engineering Interntionl Institute of Informtion Technology Hyderbd - 500 032, INDIA

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig CS311H: Discrete Mthemtics Grph Theory IV Instructor: Işıl Dillig Instructor: Işıl Dillig, CS311H: Discrete Mthemtics Grph Theory IV 1/25 A Non-plnr Grph Regions of Plnr Grph The plnr representtion of

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Instructor: Adm Sheffer. TA: Cosmin Pohot. 1pm Mondys, Wednesdys, nd Fridys. http://mth.cltech.edu/~2015-16/2term/m006/ Min ook: Introduction to Grph

More information

4452 Mathematical Modeling Lecture 4: Lagrange Multipliers

4452 Mathematical Modeling Lecture 4: Lagrange Multipliers Mth Modeling Lecture 4: Lgrnge Multipliers Pge 4452 Mthemticl Modeling Lecture 4: Lgrnge Multipliers Lgrnge multipliers re high powered mthemticl technique to find the mximum nd minimum of multidimensionl

More information

Agilent Mass Hunter Software

Agilent Mass Hunter Software Agilent Mss Hunter Softwre Quick Strt Guide Use this guide to get strted with the Mss Hunter softwre. Wht is Mss Hunter Softwre? Mss Hunter is n integrl prt of Agilent TOF softwre (version A.02.00). Mss

More information

ZZ - Advanced Math Review 2017

ZZ - Advanced Math Review 2017 ZZ - Advnced Mth Review Mtrix Multipliction Given! nd! find the sum of the elements of the product BA First, rewrite the mtrices in the correct order to multiply The product is BA hs order x since B is

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

PPS: User Manual. Krishnendu Chatterjee, Martin Chmelik, Raghav Gupta, and Ayush Kanodia

PPS: User Manual. Krishnendu Chatterjee, Martin Chmelik, Raghav Gupta, and Ayush Kanodia PPS: User Mnul Krishnendu Chtterjee, Mrtin Chmelik, Rghv Gupt, nd Ayush Knodi IST Austri (Institute of Science nd Technology Austri), Klosterneuurg, Austri In this section we descrie the tool fetures,

More information

An Efficient Algorithm for Discovering Frequent Subgraphs. Technical Report

An Efficient Algorithm for Discovering Frequent Subgraphs. Technical Report An Efficient Algorithm for Discovering Frequent Sugrphs Technicl Report Deprtment of Computer Science nd Engineering Universit of Minnesot 4-192 EECS Building 200 Union Street SE Minnepolis, MN 55455-0159

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

ASTs, Regex, Parsing, and Pretty Printing

ASTs, Regex, Parsing, and Pretty Printing ASTs, Regex, Prsing, nd Pretty Printing CS 2112 Fll 2016 1 Algeric Expressions To strt, consider integer rithmetic. Suppose we hve the following 1. The lphet we will use is the digits {0, 1, 2, 3, 4, 5,

More information

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1.

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1. Answer on Question #5692, Physics, Optics Stte slient fetures of single slit Frunhofer diffrction pttern. The slit is verticl nd illuminted by point source. Also, obtin n expression for intensity distribution

More information

Section 10.4 Hyperbolas

Section 10.4 Hyperbolas 66 Section 10.4 Hyperbols Objective : Definition of hyperbol & hyperbols centered t (0, 0). The third type of conic we will study is the hyperbol. It is defined in the sme mnner tht we defined the prbol

More information

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST

Outline. Introduction Suffix Trees (ST) Building STs in linear time: Ukkonen s algorithm Applications of ST Suffi Trees Outline Introduction Suffi Trees (ST) Building STs in liner time: Ukkonen s lgorithm Applictions of ST 2 3 Introduction Sustrings String is ny sequence of chrcters. Sustring of string S is

More information

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties, Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

Position Heaps: A Simple and Dynamic Text Indexing Data Structure

Position Heaps: A Simple and Dynamic Text Indexing Data Structure Position Heps: A Simple nd Dynmic Text Indexing Dt Structure Andrzej Ehrenfeucht, Ross M. McConnell, Niss Osheim, Sung-Whn Woo Dept. of Computer Science, 40 UCB, University of Colordo t Boulder, Boulder,

More information

Meaningful Change Detection in Structured Data.

Meaningful Change Detection in Structured Data. Meningful Chnge Detection in Structured Dt Sudrshn S. Chwthe Hector Grci-Molin Computer Science Deprtment, Stnford University, Stnford, Cliforni 94305 fchw,hectorg@cs.stnford.edu Astrct Detecting chnges

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

Approximation by NURBS with free knots

Approximation by NURBS with free knots pproximtion by NURBS with free knots M Rndrinrivony G Brunnett echnicl University of Chemnitz Fculty of Computer Science Computer Grphics nd Visuliztion Strße der Ntionen 6 97 Chemnitz Germny Emil: mhrvo@informtiktu-chemnitzde

More information

An Algorithm for Enumerating All Maximal Tree Patterns Without Duplication Using Succinct Data Structure

An Algorithm for Enumerating All Maximal Tree Patterns Without Duplication Using Succinct Data Structure , Mrch 12-14, 2014, Hong Kong An Algorithm for Enumerting All Mximl Tree Ptterns Without Dupliction Using Succinct Dt Structure Yuko ITOKAWA, Tomoyuki UCHIDA nd Motoki SANO Astrct In order to extrct structured

More information

II. THE ALGORITHM. A. Depth Map Processing

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

6.3 Volumes. Just as area is always positive, so is volume and our attitudes towards finding it.

6.3 Volumes. Just as area is always positive, so is volume and our attitudes towards finding it. 6.3 Volumes Just s re is lwys positive, so is volume nd our ttitudes towrds finding it. Let s review how to find the volume of regulr geometric prism, tht is, 3-dimensionl oject with two regulr fces seprted

More information

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe CSCI 0 fel Ferreir d Silv rfsilv@isi.edu Slides dpted from: Mrk edekopp nd Dvid Kempe LOG STUCTUED MEGE TEES Series Summtion eview Let n = + + + + k $ = #%& #. Wht is n? n = k+ - Wht is log () + log ()

More information

Video-rate Image Segmentation by means of Region Splitting and Merging

Video-rate Image Segmentation by means of Region Splitting and Merging Video-rte Imge Segmenttion y mens of Region Splitting nd Merging Knur Anej, Florence Lguzet, Lionel Lcssgne, Alin Merigot Institute for Fundmentl Electronics, University of Pris South Orsy, Frnce knur.nej@gmil.com,

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES GENEATING OTHOIMAGES FO CLOSE-ANGE OBJECTS BY AUTOMATICALLY DETECTING BEAKLINES Efstrtios Stylinidis 1, Lzros Sechidis 1, Petros Ptis 1, Spiros Sptls 2 Aristotle University of Thessloniki 1 Deprtment of

More information

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012 Dynmic Progrmming Andres Klppenecker [prtilly bsed on slides by Prof. Welch] 1 Dynmic Progrmming Optiml substructure An optiml solution to the problem contins within it optiml solutions to subproblems.

More information

Lily Yen and Mogens Hansen

Lily Yen and Mogens Hansen SKOLID / SKOLID No. 8 Lily Yen nd Mogens Hnsen Skolid hs joined Mthemticl Myhem which is eing reformtted s stnd-lone mthemtics journl for high school students. Solutions to prolems tht ppered in the lst

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

Mobile IP route optimization method for a carrier-scale IP network

Mobile IP route optimization method for a carrier-scale IP network Moile IP route optimiztion method for crrier-scle IP network Tkeshi Ihr, Hiroyuki Ohnishi, nd Ysushi Tkgi NTT Network Service Systems Lortories 3-9-11 Midori-cho, Musshino-shi, Tokyo 180-8585, Jpn Phone:

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

The Complexity of Nonrepetitive Coloring

The Complexity of Nonrepetitive Coloring The Complexity of Nonrepetitive Coloring Dániel Mrx Institut für Informtik Humoldt-Universitt zu Berlin dmrx@informtik.hu-erlin.de Mrcus Schefer Deprtment of Computer Science DePul University mschefer@cs.depul.edu

More information

Statistical classification of spatial relationships among mathematical symbols

Statistical classification of spatial relationships among mathematical symbols 2009 10th Interntionl Conference on Document Anlysis nd Recognition Sttisticl clssifiction of sptil reltionships mong mthemticl symbols Wl Aly, Seiichi Uchid Deprtment of Intelligent Systems, Kyushu University

More information

ON THE DEHN COMPLEX OF VIRTUAL LINKS

ON THE DEHN COMPLEX OF VIRTUAL LINKS ON THE DEHN COMPLEX OF VIRTUAL LINKS RACHEL BYRD, JENS HARLANDER Astrct. A virtul link comes with vriety of link complements. This rticle is concerned with the Dehn spce, pseudo mnifold with oundry, nd

More information

INTRODUCTION TO SIMPLICIAL COMPLEXES

INTRODUCTION TO SIMPLICIAL COMPLEXES INTRODUCTION TO SIMPLICIAL COMPLEXES CASEY KELLEHER AND ALESSANDRA PANTANO 0.1. Introduction. In this ctivity set we re going to introduce notion from Algebric Topology clled simplicil homology. The min

More information

Topological Queries on Graph-structured XML Data: Models and Implementations

Topological Queries on Graph-structured XML Data: Models and Implementations Topologicl Queries on Grph-structured XML Dt: Models nd Implementtions Hongzhi Wng, Jinzhong Li, nd Jizhou Luo Astrct In mny pplictions, dt is in grph structure, which cn e nturlly represented s grph-structured

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

The Complexity of Nonrepetitive Coloring

The Complexity of Nonrepetitive Coloring The Complexity of Nonrepetitive Coloring Dániel Mrx Deprtment of Computer Science nd Informtion Theory Budpest University of Technology nd Econonomics Budpest H-1521, Hungry dmrx@cs.me.hu Mrcus Schefer

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2009/2010 1 st Semester Teste Jnury 9, 2010 Durtion: 2h00 - No extr mteril llowed. This includes notes, scrtch pper, clcultor, etc. - Give your nswers in the ville spce

More information

Compilers Spring 2013 PRACTICE Midterm Exam

Compilers Spring 2013 PRACTICE Midterm Exam Compilers Spring 2013 PRACTICE Midterm Exm This is full length prctice midterm exm. If you wnt to tke it t exm pce, give yourself 7 minutes to tke the entire test. Just like the rel exm, ech question hs

More information

Registering as a HPE Reseller. Quick Reference Guide for new Partners in Asia Pacific

Registering as a HPE Reseller. Quick Reference Guide for new Partners in Asia Pacific Registering s HPE Reseller Quick Reference Guide for new Prtners in Asi Pcific Registering s new Reseller prtner There re five min steps to e new Reseller prtner. Crete your Appliction Copyright 2017 Hewlett

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

MTH 146 Conics Supplement

MTH 146 Conics Supplement 105- Review of Conics MTH 146 Conics Supplement In this section we review conics If ou ne more detils thn re present in the notes, r through section 105 of the ook Definition: A prol is the set of points

More information

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2012 Colin Dewey cdewey@biostt.wisc.edu Gols for Lecture the key concepts to understnd re the following how lrge-scle lignment

More information

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center Resource Overview Quntile Mesure: Skill or Concept: 80Q Multiply two frctions or frction nd whole numer. (QT N ) Excerpted from: The Mth Lerning Center PO Box 99, Slem, Oregon 9709 099 www.mthlerningcenter.org

More information

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley AI Adjcent Fields Philosophy: Logic, methods of resoning Mind s physicl system Foundtions of lerning, lnguge, rtionlity Mthemtics Forml representtion nd proof Algorithms, computtion, (un)decidility, (in)trctility

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl component

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

Semistructured Data Management Part 2 - Graph Databases

Semistructured Data Management Part 2 - Graph Databases Semistructured Dt Mngement Prt 2 - Grph Dtbses 2003/4, Krl Aberer, EPFL-SSC, Lbortoire de systèmes d'informtions réprtis Semi-structured Dt - 1 1 Tody's Questions 1. Schems for Semi-structured Dt 2. Grph

More information

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples COMPUTER SCIENCE 123 Foundtions of Computer Science 6. Tuples Summry: This lecture introduces tuples in Hskell. Reference: Thompson Sections 5.1 2 R.L. While, 2000 3 Tuples Most dt comes with structure

More information

George Boole. IT 3123 Hardware and Software Concepts. Switching Algebra. Boolean Functions. Boolean Functions. Truth Tables

George Boole. IT 3123 Hardware and Software Concepts. Switching Algebra. Boolean Functions. Boolean Functions. Truth Tables George Boole IT 3123 Hrdwre nd Softwre Concepts My 28 Digitl Logic The Little Mn Computer 1815 1864 British mthemticin nd philosopher Mny contriutions to mthemtics. Boolen lger: n lger over finite sets

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

Suffix Tries. Slides adapted from the course by Ben Langmead

Suffix Tries. Slides adapted from the course by Ben Langmead Suffix Tries Slides dpted from the course y Ben Lngmed en.lngmed@gmil.com Indexing with suffixes Until now, our indexes hve een sed on extrcting sustrings from T A very different pproch is to extrct suffixes

More information

Improper Integrals. October 4, 2017

Improper Integrals. October 4, 2017 Improper Integrls October 4, 7 Introduction We hve seen how to clculte definite integrl when the it is rel number. However, there re times when we re interested to compute the integrl sy for emple 3. Here

More information

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds A Sprse Grid Representtion for Dynmic Three-Dimensionl Worlds Nthn R. Sturtevnt Deprtment of Computer Science University of Denver Denver, CO, 80208 sturtevnt@cs.du.edu Astrct Grid representtions offer

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information