Cross-Supervised Synthesis of Web-Crawlers

Size: px
Start display at page:

Download "Cross-Supervised Synthesis of Web-Crawlers"

Transcription

1 Cross-Supervised Synthesis of Web-Crwlers Adi Omri Technion Shron Shohm Acdemic College of Tel Aviv Yffo Ern Yhv Technion ABSTRACT A web-crwler is progrm tht utomticlly nd systemticlly trcks the links of website nd extrcts informtion from its pges. Due to the different formts of websites, the crwling scheme for different sites cn differ drmticlly. Mnully customizing crwler for ech specific site is time consuming nd error-prone. Furthermore, becuse sites periodiclly chnge their formt nd presenttion, crwling schemes hve to be mnully updted nd djusted. In this pper, we present technique for utomtic synthesis of web-crwlers from exmples. The min ide is to use hnd-crfted (possibly prtil) crwlers for some websites s the bsis for crwling other sites tht contin the sme kind of informtion. Techniclly, we use the dt on one site to identify dt on nother site. We then use the identified dt to lern the website structure nd synthesize n pproprite extrction scheme. We iterte this process, s synthesized extrction schemes result in dditionl dt to be used for re-lerning the website structure. We implemented our pproch nd utomticlly synthesized 30 crwlers for websites from nine different ctegories: books, TVs, conferences, universities, cmers, phones, movies, songs, nd hotels. Ctegories nd Subject Descriptors D.1.2 [Progrmming Techniques]: Automtic Progrmming; I.2.2 [Artificil Intelligence]: Progrm Synthesis 1 Introduction A web-crwler is progrm tht utomticlly nd systemticlly trcks the links of website nd extrcts informtion from its pges. One of the chllenges of modern crwlers is to extrct complex structured informtion from different websites, where the informtion on ech site my be represented nd rendered in different mnner nd where ech dt item my hve multiple ttributes. For exmple, price comprison sites use custom crwlers for gthering informtion bout products nd their prices cross the web. These crwlers hve to extrct the structured informtion describing products nd their prices from sites with different formts nd representtions. The differences between sites often force progrmmer to crete customized crwler for ech site, tsk tht is time consuming nd error-prone. Furthermore, websites Permission to mke digitl or hrd copies of ll or prt of this work for personl or clssroom use is grnted without fee provided tht copies re not mde or distributed for profit or commercil dvntge nd tht copies ber this notice nd the full cittion on the first pge. Copyrights for components of this work owned by others thn ACM must be honored. Abstrcting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission nd/or fee. Request permissions from permissions@cm.org. ICSE 16, My 14-22, 2016, Austin, TX, USA c 2016 ACM. ISBN /16/05... $15.00 DOI: my eventully chnge their formt nd presenttion, therefore the crwling schemes hve to be mnully mintined nd djusted. Gol The gol of this work is to utomticlly synthesize webcrwlers for fmily of websites tht contin the sme kind of informtion but my significntly differ on lyout nd formtting. We ssume tht the progrmmer provides one or more hnd-crfted web-crwlers for some of the sites in the fmily, nd would like to utomticlly generte crwlers for other sites in the fmily. For exmple, given fmily of four websites of online book stores (ech contining tens of thousnds of books), nd hnd-crfted crwler for one of them, we utomticlly generte crwlers for the other three. Note tht our gol is not only to extrct dt from web-sites, but to synthesize the progrms tht extrct the dt. Existing Techniques Our work is relted to wrpper induction [24]. The gol of wrpper induction is to utomticlly generte extrction rules for website bsed on the regulrity of pges inside the site. Our min ide is to try nd leverge similr regulrity cross multiple sites. However, becuse different sites my significntly differ on their lyout, we hve to cpture this regulrity t more bstrct level. Towrds tht end, we define n bstrct logicl representtion of website tht llows us to identify commonlity even in the fce of different formtting detils. In contrst to supervised techniques [24, 25, 29, 5], which require lbeled exmples, nd unsupervised techniques [2, 3, 8, 32, 31, 34, 40] tht frequently require mnul nnottion of the extrcted dt, our pproch uses cross supervision, where the lerned extrction rules of one site re used to produce lbeled exmples for lerning the extrction rules in nother site. Our technique uses XPth [7], widely used web documents query lnguge long with regulr expressions. This mkes our resulting extrction schemes humn redble nd esy to modify when needed. There hs been some work on the problem of XPth robustness to site chnges [9, 26, 28], trying to pick the most robust XPth query for extrcting prticulr piece of informtion. While robustness is desirble property, our bility to efficiently synthesize crwler circumvents this chllenge s crwler cn be regenerted whenever site chnges. Our Approch: Cross-Supervised Lerning of Crwling Schemes We present technique for utomticlly synthesizing dt-extrcting crwlers. Our technique is bsed on two observtions: (i) sites with similr content hve dt overlps, nd (ii) in given site, informtion with similr semntics is usully locted in nodes with similr loction in the document tree. Using these observtions, we synthesize dt-extrcting crwlers for group of sites shring the sme type of informtion. Strting from one or more hnd-crfted crwlers which provide reltively smll initil set of crwled dt, we use n itertive pproch to discover dt instnces in new websites nd extrpolte dt ex-

2 trction schemes which re in turn used to extrct new dt. We refer to this process s cross-supervised lerning, s dt from one web-site is repetedly used to guide synthesis in other sites. Our crwlers extrct dt describing different ttributes of items. We introduce the notion of continer to mintin reltionships between different ttributes tht refer to the sme item. We use continers, which re utomticlly selected without ny prior knowledge of the structure of the website, to hndle pges with multiple items, nd to filter out irrelevnt dt. This llows us to synthesize extrction schemes from positive exmples only. Our pproch is sclble nd prcticl: we used cross-supervision to synthesize crwlers for severl product review websites, e.g., tvexp.com, weppir.com, cmexp.com nd phonesum.com. Min Contributions The contributions of this pper re: A frmework for utomtic synthesis of web-crwlers. The min ide is to use hnd-crfted crwlers for number of websites s the bsis for crwling other sites tht contin the sme kind of informtion. A new cross-supervised crwler synthesis lgorithm tht extrpoltes crwling schemes from one web-site to nother. The lgorithm hndles pges with multiple items nd synthesizes crwlers using only positive exmples. An implementtion nd evlution of our pproch, utomticlly synthesizing 30 crwlers for websites from nine different ctegories: books, TVs, conferences, universities, cmers, phones, movies, songs nd hotels. The crwlers tht we synthesize re rel crwlers tht were used to crwl more thn 12, 000 webpges over ll ctegories. 2 Overview 2.1 Motivting Exmple Consider price comprison service for books, which crwls book seller websites nd provides list of sellers nd corresponding prices for ech book. Exmples of such book seller sites include brnesndnoble.com (B&N), blckwell.co.uk (BLACKWELL) nd bebooks.com (ABE). Ech of these sites lists wide collection of books, typiclly presented in templte generted webpges feeding from dtbse. Since these pges re templte generted, they present structured informtion for ech book in formt tht is repeted cross books. By recognizing this repetitive structure for given site, one cn synthesize dt extrction query nd use it to utomticlly extrct the entire book collection. While the formt within single site is typiclly stble, the formts between sites differ considerbly. Fig. 1 shows smll nd simplified frgment of the pge structure on B&N nd BLACKWELL in HTML. Fig. 3 shows prt of the tree representtion of the corresponding sites (s well s of ABE), where D 1,..., D 4 denote different pges. Due to the differences in structure, the dt extrction query cn differ drmticlly. For exmple, in BLACKWELL nd ABE, ech of the pges (D 2, D 3, D 4) presents single book, wheres in B&N the pge D 1 shows list of severl books. The gol of this work is to utomticlly synthesize crwlers for new sites bsed on some existing hnd-crfted crwler(s). For exmple, given crwler for the BLACKWELL site, our technique synthesizes crwler for B&N website. The synthesized crwler is depicted in Fig. 2. We show tht this cn be done despite the significnt differences between the sites BLACKWELL, nd B&N, in terms of HTML structure. We note tht the exmples tht we present in this section re bbrevited nd simplified. For exmple, the rel DOM tree for the B&N pge we show here contins round 1, 000 nodes. The structure of the full trees, nd the XPths required for processing them re more involved thn wht is shown here. brnesndnoble.com <ol clss="result-set box"> <li clss="result box">.. < clss="detils below-xis" > < href="..." dt-bntrck="title_ " clss="title" > THROUGH THE LOOKING GLASS</> < href=".." dt-bntrck="contributor_ " clss="contributor" > Dvid Winston Busch</>... < clss="price-formt"> < href="..." dt-bntrck="pperbck_formt"> <spn clss="formt">pperbck</spn> <spn clss="price">$9.91</spn> </> </> </>...</li> <li clss="result box">.. < clss="detils below-xis" > < href="..." dt-bntrck="title_ " clss="title"> Alice s Adventures in Wonderlnd</> < href=".." dt-bntrck="contributor_ " clss="contributor" >Lewis Crroll</>... < clss="price-formt"> < href="..." dt-bntrck="pperbck_formt"> <spn clss="formt">pperbck</spn> <spn clss="price">$6.49</spn> </> </> </>...</li> </ol> blckwell.com < id="product-biblio"> <h1>through the looking glss</h1> < clss="link_type1" href="/jsp//lewis_crroll"> Dvid Winston Busch </> < clss="price-info" lign="center"> <spn clss="price"> 8.99</spn> </> </> Figure 1: Frgments of webpges with the similr ttribute vlues for book on two different book shopping sites. 2.2 Cross-Supervised Lerning of Crwling Schemes Our min observtion is tht despite the significnt differences in the concrete lyout of websites, the pges of websites tht exhibit the sme product ctegory often shre the sme logicl structure; they present similr ttributes for ech product. For exmple, ech of the pges of Fig. 1 presents the sme importnt ttributes bout the book, including its title, uthor nme nd price. Moreover, there is lrge number of shred products between these websites. The book Through the looking glss is one such exmple for B&N nd BLACKWELL. Our technique exploits dt overlps cross sites in order to lern the concrete structure of new website s bsed on other websites. Specificlly, we identify in s concrete dt extrcted from other sites nd s such lern the structure in which this dt is represented in s. We then use multiple exmples of the structure in which the dt ppers in s in order to generlize nd get n extrction query for s. This enbles our lgorithm to extrpolte crwler for s. We do not require precise mtch of dt cross sites, s our technique lso hndles noisy dt. (For exmple, prices do not hve

3 1 clss MySpider(CrwlSpider): 2 nme = "brnesndnoble" 3 llowed_domins = [" 4 strt_urls = [ 5 (" 6 ] 7 8 rules = ( 9 Rule(LinkExtrctor( 10 llow=("/s/.*"),cllbck="prse_item", follow=true 11 ), 12 ) def prse_item(self, response): 15 sel = Selector(response) 16 rows = sel.xpth( //body///section//ol["result-set box"]/li[@clss="result box"]//[@clss="detils below-xis"] ) 17 for r in rows: 18 item = BooksItem() 19 item[ title ] = r.xpth( 20 //[@clss="title"] 21 ).extrct() 22 item[ uthor ] = r.xpth( 23 //[@clss="contributor"] 24 ).extrct() 25 item[ price ] = r.xpth( 26 //[@clss="price-formt"]//spn[@clss="price"] 27 ).extrct() 28 yield item Figure 2: Crwler for jv books from Brnes&Noble. to be identicl; ny number cn be mtch.) Crwling schemes A crwler, such s the one of Fig. 2, contins some boilerplte code defining the crwler clss nd its opertions. However, the essence of the crwler is its crwling scheme. For exmple, in Fig. 2 the crwling scheme is highlighted in boldfce. A crwling scheme is defined with respect to set of semntic groups, clled ttributes, which define the types of dt to be extrcted. In the books exmple, the ttributes re: book title, uthor nd price. Given set of ttributes, crwling scheme consists of the following two components: (i) A dt extrction query tht defines how to obtin vlues of the ttributes for ech item listed on the site. (ii) A strting point URL nd URL filtering pttern which let the crwler locte relevnt pges nd filter out irrelevnt pges without downloding nd processing them. Our crwlers use XPth s query lnguge for dt extrction. XPth is query lnguge for selecting nodes from n XML document which is bsed on the tree representtion of the XML document, nd provides the bility to nvigte round the tree, selecting nodes by describing their pth from the document tree root node. For exmple, Fig. 4 nd Fig. 5 show the crwling schemes for crwling books from BLACKWELL nd B&N respectively, where the dt extrction query is expressed using XPths. Two-level dt extrction schemes We ssume tht the dt extrction query hs two levels: The first level query is n XPth describing n item continer. Intuitively, continer is sub-tree tht contins ll the ttribute vlues we would like to extrct (defined more formlly in Sec. 5.) For exmple, in Fig. 4, the XPth //body/[@clss="content mincore-shop"]... describes continer of book ttributes on BLACKWELL pges. The second level queries contin n extrction XPth for vlues of ech inidul ttribute. These XPths re reltive to the root of the continer. For exmple, ///h1/ in Fig. 4 is used to pick the node tht hs type h1 (heding 1), contining the book title. Itertive synthesis of crwling schemes Our pproch considers set of websites, nd set of ttributes to extrct. To bootstrp the synthesis process, the user is required to provide the set of websites for which crwler synthesis is desired, s well s crwling scheme for t lest one of these sites. Alterntively, the user cn provide multiple prtil crwling schemes for different sites, tht together cover ll the different item ttributes. The synthesis process strts by invoking the provided extrction scheme(s) on the corresponding sites to obtin n initil set of vlues for ech one of the ttributes. These vlues re then used to locte nodes tht contin ttribute vlues in the document trees of webpges of new sites. The nodes tht contin ttribute vlues revel the structure of pges of the corresponding websites. In prticulr, smllest subtrees tht exhibit ll the ttributes mount to continers. This llows for synthesis of dt extrction schemes for new websites. The newly lerned extrction schemes re used to extrct more vlues nd dd them to the set of vlues of ech ttribute, possibly llowing for dditionl websites to be hndled. This process is repeted until complete extrction schemes re obtined for ll websites, or until no dditionl vlues re extrcted. In our exmple, the lgorithm strts with the dt extrction scheme for BLACKWELL (see Fig. 4), provided by user. It extrcts from D 2 uthor-x, title-x, nd price s vlues of the book title, uthor, nd price ttributes, respectively (see Fig. 3). These vlues re identified in D 1 (B&N) within the subtree of the left most node represented by //body/.../ol["result-set box"] /li[@clss="result box"]/... /[@clss="detils below-xis"], which then points to the ltter node s possible continer. Additionl vlues tken from D 3 nd other pges in BLACKWELL identify dditionl nodes in the B&N tree s ttribute nd continer nodes. Note tht uthor-x is lso found in nother subtree in D 1. However, there re no instnces of the remining ttributes in tht subtree; Therefore, the subtree is not considered continer nd the corresponding node is treted s noise. By identifying the commonlity between the identified continers nd between nodes of the sme ttribute, dt extrction scheme for B&N is synthesized (see below). In the next itertion, the new dt scheme is used to extrct from B&N the vlues uthor-z, title-z nd price s dditionl vlues for book title, uthor, nd price respectively (tht did not exist in BLACKWELL). The new vlues re locted in ABE (see D 4 in Fig. 3), llowing to lern n extrction scheme for ABE s well. XPth synthesis for two-level queries Our pproch synthesizes two level extrction scheme for ech website from set of ttribute nodes nd cndidte continers identified in its webpges. The twolevel query structure is reflected lso in the synthesis process of the extrction scheme. Techniclly, we use two-phse pproch to synthesize the extrction scheme. In ech site, we first generte n XPth query for the continers. We then filter the ttribute nodes keeping only those rechble from continers tht gree with the continer XPth, nd generte XPths for their extrction reltively to the continer nodes. To generte n XPth query for set of nodes (e.g., for the set of continers), we consider the concrete XPth of ech node this is n XPth tht extrcts exctly this node. We unify these concrete XPths by greedy lgorithm tht ims to find the most concrete (most strict) XPth query tht grees with mjority of the concrete XPths. Keeping the unified XPth s concrete s possible prevents the ddition of noise to the extrction scheme. The generted XPths for B&N re depicted in Fig. 5. In this exmple, unifiction is trivil since the XPths re identicl. How-

4 B&N D1 Blckwell D2 D3 Abe D4 body body body body ol clss= result-set box" td td id= bookinfo" uthor-x li clss= result box li clss= result box li clss= buy-options" clss= buy-options" h1 Id = book-title h2 Id = book-uthor clss= bsket.." clss= detils below-xis clss= detils below-xis clss= price-info clss= price-info clss= title clss= price-formt" clss= title clss= price-formt" h1 id= book-title" clss= link_type1 spn clss="price" h1 id= book-title" clss= link_type1 spn clss="price" spn clss="price id= book-price clss= contributor spn clss= price" clss= contributor spn clss= price" spn clss= lrge spn clss= lrge title-x uthor-x price title-z uthor-z price title-x uthor-x price title-y uthor-y price title-z uthor-z price Figure 3: Exmple DOM trees Continer: Title: Author: Price: URL Pttern: Continer: Title: Author: Price: URL Pttern: //body/[@clss="content mincore-shop"] /tble[@clss="min-pge"]/tr/ td[@clss="two-col-right"]/tble/tr/td ///h1/ ///[@clss="link_type1"] //[@id="buy-options"]//spn.*jsp/id/.* Figure 4: Crwling scheme for BLACKWELL. //body///section//ol["result-set box"] /li[@clss="result box"]/ /[@clss="detils below-xis"] //[@clss="title"] //[@clss="contributor"] //[@clss="price-formt"] //spn[@clss="price"] /s/.* Figure 5: Crwling scheme for B&N. ever, if for exmple ech of the continer nodes lbeled in D 1 hd different id s, the id feture would hve been removed during unifiction. Note tht even if the subtree tht contins the noisy instnce of uthor-x in D 1 hd been identified s cndidte continer (e.g., if it hd contined vlues of the other ttributes), it would hve been discrded during the unifiction. URL pttern synthesis In order to synthesize URL pttern for the crwling scheme of new site, we extend the itertive technique used for synthesis of dt extrction schemes; In ech itertion of the lgorithm, for ech website we identify set of pges of interest s pges tht contin ttribute dt. We filter these pges in ccordnce with the filtering of continer nd ttribute nodes. We then unify the URLs of remining pges similrly to XPth unifiction. Fig. 5 depicts the URL pttern generted by our pproch for B&N. This pttern identifies webpges in B&N tht present list of books these re the pges whose structure conforms with the synthesized extrction scheme. Note tht B&N lso presents the sme books in seprte pge ech, but such pges require different crwling scheme. 3 Preliminries In this section we define some terms tht will lter be used to describe our pproch. 3.1 Logicl Structure of Webpges Ech webpge implements some logicl structure. Following [19], we use reltions s logicl description of dt which is independent of its concrete representtion. A reltionl specifiction is set of reltions, where ech reltion is defined by set of column nmes nd set of vlues for the columns. A tuple t = c 1 : d 1, c 2 : d 2,... mps set of columns {c 1, c 2,...} to vlues. A reltion r is set of tuples {t 1, t 2,...} such tht the columns of every t, t r re the sme. For exmple, B&N, BLACKWELL nd ABE described in Section 2 implement reltionl description of list of books, where ech book hs title, n uthor nd price. Then book title, uthor nd price re columns, nd the set of books is modeled s reltion with these columns, where ech tuple is book item. Dt items, ttributes nd instnces We refer to ech tuple of reltion r s dt item. The columns of reltion r re clled ttributes, denoted Att. Ech ttribute defines clss of dt shring semntic similrities, such s mening nd/or extrction scheme. The vlue of ttribute Att in some tuple of r is lso clled n instnce of. The set of ll vlues of ll ttributes is denoted V. Ech ttribute is ssocited with n equivlence reltion tht determines if two vlues re equivlent or not s instnces of. (The notion of equivlence my differ between different ttributes.) By defult (if not specified by the user) we use the bg of words representtion of ech vlue d, denoted W (d), nd use Jccrd similrity function [21], J(d 1, d 2), with threshold of 0.5 s n equivlence indictor between vlues d 1 nd d 2: d 1 d 2 iff J(d 1, d 2) > 0.5 where J(d 1, d 2) = 3.2 Concrete Lyout of Webpges W (d1) W (d2) W (d 1) W (d 2). Techniclly, webpges re documents with structured dt, such s XML or HTML documents. The concrete lyout of the webpge implements its logicl structure, where ttribute instnces re presented s nodes in the DOM tree. XML documents s DOM trees A well formed XML document, describing webpge of some website, cn be represented by DOM tree. A DOM tree is lbeled ordered tree with set of nodes N nd lbeling function tht lbels ech node with set of node fetures (not to be confused with dt ttributes), where some of the fetures might be unspecified. Common node fetures include tg, clss nd id.

5 For exmple, Fig. 3 depicts prt of the tree representtion of pges of B&N, BLACKWELL nd ABE. A node lbeled by, clss=title is node whose tg is, clss is title, nd id is unspecified. Node descriptors A node descriptor is n expression x in some lnguge defining set of nodes in the DOM tree. We use Expr to denote the set of node descriptors. For node descriptor x Expr nd webpge p, we define x p to be the set of nodes described by x from p. When p is cler from the context, we omit it from the nottion. A node descriptor is concrete if it represents exctly one node. We sometimes lso refer to node descriptors s extrction schemes. In this work, we use XPth s specifiction lnguge for node descriptors. 3.3 XPth s Dt Extrction Lnguge XPth [7] is query lnguge for trversing XML documents. XPth expressions (XPths in short) re used to select nodes from the DOM tree representtion of n XML document. An XPth expression is sequence of instructions, x = x 1... x k. Ech instruction x i defines how to obtin the next set of nodes given the set of nodes selected by the prefix x 1... x i 1, where the empty sequence selects the root node only. Roughly speking, ech instruction x i consists of (i) xis defining where to look reltively to the current nodes: t children ( / ), descendnts ( // ), prent, siblings, (ii) node filters describing which tg to look for (these cn be ll, text, comment, etc.), nd (iii) predictes tht cn restrict the selected nodes further, for exmple by referring to vlues of dditionl node fetures (e.g. clss) tht should be mtched. For exmple, the XPth ///*/[@clss="link_type1"] selects ll nodes tht follow sequence of nodes tht cn strt nywhere in the DOM tree, nd hs to consist of node with tg= followed by some node whose fetures re unspecified nd is followed by node with tg= nd clss=link_type1. 4 The Crwler Synthesis Problem In this section we formulte the crwler synthesis problem. A crwler for website cn be ided into two prts: pge crwler, nd dt extrctor. The pge crwler is responsible for grbbing the pges of the site tht contin relevnt informtion. The dt extrctor is responsible for extrcting dt of interest from ech pge. Logicl structure of interest Our work considers websites whose dt-contining webpges shre the following logicl structure: ech webpge describes one min reltion, denoted dt. As such, dt items re tuples of the dt reltion. Further, the set Att of ttributes consists of the columns of the dt reltion. Note tht different concrete lyouts cn implement this simple logicl structure. For exmple, if we consider webpge tht exhibits list of books, then the concrete lyout cn first group books by uthor, nd for ech uthor list the books, or it cn void the prtition bsed on uthors. Further, some websites will present ech book in seprte webpge, wheres others will list severl books in the sme pge. Even for websites tht re structured similrly by the former prmeters, the mpping of ttribute instnces to nodes in the DOM tree cn vry significntly. Pge crwlers A pge crwler for website s is given by URL pttern, denoted U(s), which identifies the set of webpges of interest. These re the webpges of the website tht contin dt of the relevnt kind. We denote by P (s) the set of webpges whose URL mtches U(s). Dt Extrctors Recll tht we consider webpges where instnces of different ttributes re grouped into tuples of some reltion, denoted dt. We re guided by the observtion tht dt in such webpges is typiclly stored in subtrees, where ech subtree contins instnces of ll ttributes for some dt item (i.e., tuple of the dt reltion). We refer to the roots of such subtrees s continers: Continers: A node in the DOM tree whose subtree contins ll the entries of single dt item (i.e., single tuple of dt) is clled continer. Note tht ny ncestor of continer is lso continer. We therefore lso define the notion of best continer to be continer such tht none of its predecessors is continer. Depending on the concrete lyout of the webpge, best continer might correspond to n element in list or in nother dt structure. It might lso be the root of webpge, if ech webpge presents only one dt item. For exmple, in the tree D 1 depicted in Fig. 3, both of the nodes selected by //body/.../[@clss="detils below-xis"] re continers, nd s such so re their ncestors, including the root. However, the ltter re not best continers since they include strict subtrees tht re lso continers. Attribute nodes: A node in the DOM tree tht holds n instnce of n ttribute Att is clled n -ttribute node, or simply n ttribute node when is cler from the context or does not mtter. Dt extrctors: A dt extrctor for the reltion dt over columns Att in some website s cn be described by pir (continer, f) where continer Expr is node descriptor representing continers, nd f : Att Expr is possibly prtil function tht mps ech ttribute nme to node descriptor, with the mening tht this descriptor represents the ttribute nodes reltively to the continer node, i.e., the ttribute descriptor considers the continer node s the root. The dt extrctor is prtil if f is prtil. If continer is empty, it is interpreted s node descriptor tht extrcts the root of the pge. If continer is empty nd f is undefined for every ttribute, we sy tht the dt extrctor is empty. Exmples of dt extrctors pper in Fig. 5 nd Fig. 4. Crwler synthesis The crwler synthesis problem w.r.t. set Att of ttributes is defined s follows. Its input is set S of websites, where ech website s S is ssocited with dt extrctor, denoted E(s), over Att. E(s) might be prtil or even empty. The desired output is pge crwler, long with complete dt extrctor for every s S. 5 Dt Extrctor Synthesis In this section we focus on synthesizing dt extrctors, s first step towrds synthesizing crwlers. We temporrily ssume tht the pge crwler is given, i.e., for ech website we hve the set of webpges of interest, nd present our pproch for synthesizing dt extrctors. We will remove this ssumption lter, nd lso ddress synthesis of the pge crwler, using similr techniques. The input to the dt extrctor synthesis is therefore set S of websites, where ech website s S is ssocited with set of webpges, denoted P (s), nd with dt extrctor, denoted E(s), which might be prtil or even empty. The gol is to synthesize complete dt extrctor for every s S. The min chllenge in synthesizing dt extrctor is identifying the mpping between the logicl structure of webpge, nd its concrete lyout s DOM tree. The key to understnding this mpping mounts to identifying the continer nodes in the DOM tree tht contin ll the ttributes of single dt item (tuple). Once this mpping is lernt, the next step is to cpture it by synthesizing extrction schemes in the form of XPths. The dt extrctor synthesis lgorithm is first described using the generic notion of node descriptors. In Section 5.2 we then instntite it for the cse where node descriptors re provided by XPths. Before we describe our lgorithm, we review its min ingredi-

6 ents. In the following, we use N(p) to denote the set of nodes in the DOM tree of webpge p P (s). Knowledge bse of dt cross websites Our synthesizer mintins knowledge bse O : Att 2 V which consists of set of observed instnces for ech ttribute Att. These re instnces collected cross different websites from S. They enble the synthesizer to locte potentil -ttribute nodes in webpges for which the dt extrctor of is unspecified. Dt to node mpping per website In ddition to the globl knowledge bse, for ech website s S our synthesizer mintins: (i) set N cont (p) N(p) of (cndidte) continer nodes for ech webpge p P (s), nd (ii) set N (p) N(p) of (cndidte) ttribute nodes for ech webpge p P (s) nd ttribute Att. Deriving extrction schemes per website The synthesis lgorithm itertively updtes the continer nd ttribute node sets for ech webpge in P (s), nd ttempts to generte dt extrctor E(s) : Expr (Att Expr) for s by generting node descriptors for the set of continers, nd for ech of the ttributes. The extrction scheme is shred by ll webpges of the website. The updtes of the sets nd the ttempts to generte node descriptors from the sets re interleved, s one cn ffect the other; on the one hnd node descriptors re generted in n ttempt to represent the sets; on the other hnd, once descriptors re generted, elements of the sets tht do not conform to them re removed. While ttribute instnces re used to identify ttribute nodes cross different websites, the synthesis of node descriptors is performed for ech website seprtely nd independently of others (while considering ll of the webpges ssocited with the website). 5.1 Algorithm Algorithm 1 presents our dt extrctor synthesis lgorithm. The lgorithm is itertive, where ech itertion consists of two phses: Phse 1: Dt extrction for knowledge bse extension. Initilly, the sets O() of instnces of ll ttributes Att re empty. In ech itertion, we use yet un-crwled extrction schemes to extrct ttribute nodes in ll webpges of ll websites nd extend the sets O() for every ttribute bsed on the content of the extrcted nodes. At the first itertion, input extrction schemes re used. In lter itertions, we use newly lernt extrction schemes, generted in phse 2 of the previous itertion. Phse 2: Synthesis of dt extrctors. For every website s S for which the extrction scheme is not yet stisfctory, we ttempt to generte n extrction scheme by performing the following steps: (1) Locting ttribute nodes per pge: We trverse ll webpges p P (s) nd for ech ttribute we use the instnces O() collected in phse 1 (from this itertion nd previous ones) to identify potentil -ttribute nodes in p. Techniclly, for every p P (s) we iterte on ll n N(p) nd use the (defult or user-specified) equivlence reltion to decide whether n contins dt tht mtches the ttribute instnces in O(). If so, n is dded to N (p). (2) Locting continer nodes per pge: In every webpge p P (s) we locte potentil continer nodes, nd collect them in N cont (p). A continer is expected to contin instnces of ll ttributes Att. However, since our knowledge of the ttribute instnces is incomplete, we need to lso consider subsets of Att. In ech webpge, we define the best set of ttributes to be the set of ll ttributes whose instnces pper in it. Potentil continers re nodes whose subtree contins ttribute nodes of the best set of ttributes, nd no strict subtree contins nodes of the sme set of ttributes. The ltter ensures tht the continer is best. Techniclly, for every node n N(p) we compute the set of rechble ttributes Att such tht n -ttribute node in N (p) is rechble from n. Nodes n whose set is best nd no other node rechble from n hs the sme set of rechble ttributes re collected in N cont (p). For ech continer node n c N cont (p) we lso mintin its support - the number of ttribute nodes rechble from it. (3) Generting continer descriptor: We consider the concrete node descriptor of every continer node n c N cont (p) in every webpge p P (s). We unify the concrete node descriptors cross ll webpges into single node descriptor, nd use it to updte E(s), relying on the observtion tht continers re typiclly elements of some dt structure nd re therefore ccessed similrly. (4) Filtering ttribute nodes bsed on continer descriptor: We filter the sets N cont (p) of continers in ll webpges to keep only continers tht mtch the unified node descriptor, nd ccordingly filter the sets N (p) of ttribute nodes in ll webpges to contin only nodes tht re rechble from the filtered sets of continers. This step enbles us to utomticlly distinguish the nodes we re interested in from others tht ccidentlly contin ttribute instnces, without ny -priori knowledge. (5) Generting ttribute descriptors: For ech ttribute Att, we consider the concrete node descriptors of ll the nodes in the filtered sets N (p) of ll webpges p P (s), where the concrete node descriptor of n is computed reltively to the continer node whose subtree contins n. For ech ttribute, we find unified node descriptor for these concrete node descriptors, nd use it to updte E(s). Agin, we use the observtion tht continers re structured similrly nd therefore ttribute dt within them is ccessed similrly. Remrk. For successful ppliction of our lgorithm, t lest one extrction scheme should be provided for every ttribute. Our pproch is lso pplicble if user provides set of nnotted webpges insted of set of initil extrction expressions. Section 2 describes running exmple of our lgorithm. Node descriptor unifiction Node descriptors for the continer nd ttributes re generted by unifying concrete node descriptors of the nodes in N cont (p) nd N (p) respectively. Roughly speking, the purpose of the unifiction is to derive node descriptor tht is generl enough to describe s mny of the concrete node descriptors s possible, but lso s concrete s possible in order to introduce s little noise s possible. Concreteness of node descriptor x is mesured by n bstrction score, denoted bs(x). The node descriptor unifiction lgorithm is prmetric in the bstrction score. In Section 5.2, we provide definition of this score when the node descriptors re given by XPths. DEFINITION 5.1. For set X of concrete node descriptors nd weight function support tht ssocites ech x X with its support, the unifiction problem ims to find node descriptor x g, s.t.: support({x X x x g }) support(x) > δ, i.e., x g cptures t lest δ of the 1. totl support of the node descriptors in X. 2. bs(x g) is miniml. In continer descriptor unifiction (step 3), the given node descriptors represent continer nodes. The support of ech descriptor represents the number of ttribute nodes rechble from the continer. In ttribute descriptors unifiction (step 5), the given descriptors represent ttribute nodes for some ttribute, ll of which re rechble from set of continers of interest. The ttribute node descriptors re reltive to the continer nodes. 5.2 Implementtion using sequentil XPths In order to complete the description of our dt extrctor synthesizer, we describe how the ingredients of Algorithm 1 re implemented when node descriptors re given by XPths. Specificlly, our pproch uses sequentil XPths:

7 Algorithm 1: Dt Extrctor Synthesizer Input: set of ttributes Att Input: set of websites S Input: mp E : S (Expr (Att Expr)) mpping website s to dt extrctor E(s) which consists of (possibly empty) continer descriptor s well s (possibly prtil) mpping of ttributes to node descriptors O = [] while there is chnge in O or E do /* Dt extrction phse */ forech s S s.t. E(s) is uncrwled do O = O ExtrctInstnces (Att, P (s), E(s), O) /* Synthesis phse */ forech s S s.t. E(s) is incomplete do /* Locte ttribute nodes */ forech p P (s) do forech Att do N (p) = FindAttNodes (N(p),, O()) /* Locte continer nodes */ forech p P (s) do bestattset = { Att N (p) } forech n N(p) do rechatt[p][n] = { Att n rech(n) : n N (p)} support[p][n] = #{n rech(n) Att : n N (p)} N cont (p) = cndidtes = {n N(p) rechatt[n] = bestattset} forech n cndidtes do forech n children(n) do if n cndidtes then N cont (p) = N cont (p) \ {n} brek /* Generte continer descriptor */ Exprs = {(reltiveexpr(p, emptyexpr, n), support[p][n]) p P (s), n N cont (p)} continerexpr = UnifyExpr (Exprs) FilterAttributeNodes() /* Generte ttribute descriptors */ forech Att do Exprs = {(reltiveexpr(p, continerexpr, n), 1) p P (s), n N (p)} ttexpr[] = UnifyExpr (Exprs) E(s) = (continerexpr, ttexpr) return E Sequentil XPths A pth π in the DOM tree is sequence of nodes n 1,..., n k, where for every 1 i < k, there is n edge from n i to n i+1. Such pth cn nturlly be encoded using n XPth XS(π) = x 1... x k where ech x i strts with /. x 1 my strt with // rther thn / if π does not necessrily strt t the root of the tree. Further, ech x i uses node filters nd predictes to describe the fetures of n i. Therefore, x i cn be described vi equlities f 1 = v 1,..., f m = v m, such tht f j F, where F is the set of node fetures used. We consider F = {tg, clss, id} for simplicity, but our pproch is not limited to these fetures. A feture might be unspecified for n i, in which cse no corresponding equlity will be included in x i. For exmple, let π be the left most pth in D 2 (Fig. 3). Then XS(π) = //body/.../td//h1. XS(π) cn lso be described s sequence tg=body... tg=td tg= tg=h1. We refer to XPths of the bove form s sequentil. The XPths tht our pproch genertes s node descriptors re ll sequentil. Concrete XPths Ech node n in the DOM tree cn be uniquely described by the unique pth, denoted π n, leding from the root to n. The XPth XS(π n) is sequentil XPth such tht XS(π n) {n}, nd XS(π n) is miniml (i.e., every other sequentil XPth tht lso describes n, describes superset of XS(π n) ). We therefore refer to XS(π n) s the concrete XPth of n, denoted XS(n) with buse of nottion. (If we include in F the position of node mong its siblings s n dditionl node feture, nd encode it by n XPth instruction using sibling predictes then we will hve XS(π n) = {n}). Agreement of sequentil XPths We observe tht for sequentil XPths, checking if node n mtches node descriptor x g (i.e. n x g ) cn be done by checking if the concrete XPth XS(n) grees with the XPth x g, where greement is defined s follows. DEFINITION 5.2. Let x = x 1... x k nd x g = x g 1... xg m be sequentil XPths. The instruction x i grees with instruction x g i if whenever some feture is specified in x i, it either hs the sme vlue in x g i or it is unspecified in xg i. The XPth x grees with the XPth x g if m k, nd for every i m, x i grees with x g i. For exmple, //body/.../td/[id=nme1]/h1 grees with both //body/.../td//h1, nd //body/.../td/. Node descriptor unifiction vi XPth unifiction We now describe our solution to the node descriptor unifiction problem in the setting of sequentil XPths. We first define the bstrction score: Abstrction score For sequentil XPth instruction x i we define spec(x i) to be the subset of fetures whose vlue is specified in x i, nd unspec(x i) = F \ spec(x i) is the set of unspecified fetures in x i. We define the bstrction score of x i to be the number of fetures in unspec(x i), tht is, bs(x i) = unspec(x i). For sequentil XPth x = x 1... x k, we define bs(x) to be the sum of bs(x i). Greedy lgorithm for unifiction Algorithm 2 presents our unifiction lgorithm. We use the observtion tht for sequentil XPths, the condition x x g tht ppers in item 1 of the unifiction problem (see Definition 5.1) cn be reduced to checking if the XPth x grees with the XPth x g. Let X be weighted set of sequentil XPths, with weight function support tht ssocites ech XPth in X with its support. Let TS = support(x) denote the totl support of XPths in X. The unifiction lgorithm selects k to be the length of the longest XPth in X. It then constructs unified XPth x g = x g 1,..., xg m top down, from i = 1 to k (possibly stopping t i = m < k). Intuitively, in ech step the lgorithm tries to select the most concrete instruction whose support is high enough. Note tht there is trdeoff between the high-support requirement nd the highconcreteness requirement. We use the threshold s wy to blnce these mesures. At itertion i of the lgorithm, X i 1 is the restriction of X to the XPths whose prefix grees with the prefix x g 1,..., xg i 1 of xg computed so fr (Initilly, X 0 = X). We inspect the i th instructions of ll XPths in X i 1. The corresponding set of instructions is denoted by I i = {x i x X i 1 }. The support of n instruction x B w.r.t. I i is support({x X i 1 x i grees with x B}). To select the most concrete instruction whose support is high enough, we consider predefined order on sets of feture-vlue pirs, where sets tht re considered more concrete (i.e., more specified ) precede sets considered more bstrct. Techniclly, we consider only feture-vlue sets where ech feture hs unique vlue. The order on such sets used in the lgorithm is defined such tht if B 1 > B 2 then B 1 precedes B 2. In prticulr, we mke sure tht sets where ll fetures re specified re first in tht order. For every set B of feture-vlue pirs, ordered by the predefined order, we consider the instruction x B tht is specified exctly on

8 the fetures in B, s defined by B. If its support exceeds δ, we set x g i to xb nd Xi to {x X i 1 x i grees with x B}. Otherwise, x B is not yet stisfctory nd the serch continues with the next B. There is lwys B for which the support of the x B exceeds the threshold, for instnce, the lst set B is lwys the empty set with x B = /*, which grees with ll the concrete XPths in X i 1. If t some itertion I i =, i.e. the XPths in X i 1 re ll of length < i nd therefore there is no next instruction to discover, the lgorithm termintes. Otherwise, it termintes when i = k. EXAMPLE 1. Given the following concrete XPths s n input: cx 1 = /[clss= title ]/spn/[id= t1 ] cx 2 = /[clss= title ]/spn/[id= t2 ] cx 3 = /[clss= note ]/spn/[id= n1 ] The unifiction strts with X 0 = {cx 1, cx 2, cx 3}, nd i = 1. To select x g 1, recll tht the lgorithm first considers the most specific feture-vlue sets (in order to find the most specific instruction). In our exmple it strts from B 1 = {tg=, clss=note} for which x B1 = /[clss= note ]. However, cx 3 is the only XPth in X 0 which grees with x B1. Therefore it hs support of 1/3. We use threshold of δ = 1/2. Thus, the support of x B1 is insufficient. The lgorithm skips to the next option, obtining x B2 = /[clss= title ]. This instruction is s specific s x B1 nd hs sufficient support of 2/3 (it grees with cx 1 nd cx 2). Therefore, for i = 1, the lgorithm selects x g 1 = xb 2 nd X 1 = {cx 1, cx 2}. For i = 2, the lgorithm selects x g 2 = /spn s the most specific instruction, which lso hs support of 2/2 (both cx 1 nd cx 3 from X 1 gree with it). For i = 3, the lgorithm selects x g 3 = / s none of the more specific instructions (/[id= t1 ] or /[id= t2 ]) hs support greter thn δ = 1/2. The resulting unified XPth is x = /[clss="title"]/spn/. Algorithm 2: Top-Down XPth Unifiction Input: set X of sequentil XPths Input: support function support : X N Input: threshold δ TS = support(x) k = mx x X x X 0 = X forech i = 1,..., k do I i = {x i x X i 1 } if I i = then i = i 1 brek forech B F in decresing order of B do support B = FindSupport(x B, X i 1, i, support) if support B > δ TS then x g i = x B X i = {x X i 1 x i grees with x B } brek return x g 1,..., xg i 6 Crwler Synthesis In this section we complete the description of our crwler synthesizer. To do so, we describe the synthesis of pge crwler for ech website s. Recll tht pge crwler corresponds to URL pttern U(s) which defines the webpges of interest. The synthesis of pge crwler is intertwined with the dt extrctor synthesis, nd uses similr unifiction techniques to generte the URL pttern. Initiliztion We ssume tht ech website s S is given by min webpge p min(s). Initilly, the set P (s) of webpges of s is the set of ll webpges obtined by following links in p min(s) nd recursively following links in the resulting pges, where the trversed links re selected bsed on some heuristic function which determines which links re more likely to led to relevnt pges. Itertions We pply the dt extrctor synthesis lgorithm of Section 5 using the sets P (s). At the end of phse 2 of ech itertion, we updte U(s) using the steps described below. At the beginning of phse 1 of the subsequent itertion we then updte P (s) to the set of webpges whose URLs conform with U(s). (6) Filtering webpge sets: Bsed on the observtion tht relevnt webpges of website s hve similr structure, we keep in P (s) only webpges tht contin continer nd ttribute nodes tht mtch the generted E(s) nd re rechble from p min(s) vi such webpges. (7) Generting URL ptterns: For ech webpge p P (s) we consider its URL. We unify the URLs into U(s) by vrition of Algorithm 2 which views URL s sequence of instructions, similrly to sequentil XPth. 7 Evlution In this section we evlute the effectiveness of our pproch. We used it to synthesize dt extrcting web-crwlers for rel-world websites contining structured dt of different ctegories. Our experiments focus on two different spects: (i) the bility to successfully synthesize web-crwlers, nd (ii) the performnce of the resulting web crwlers. 7.1 Experimentl Settings We hve implemented our tool in C#. All experiments rn on mchine with qud core CPU nd 32GB memory. Our experiments were run on 30 different websites, relted to nine different ctegories: books, TVs, conferences, universities, cmers, phones, movies, songs nd hotels. For ech ctegory we selected group of 3-4 known sites, which pper in the first pge of Google serch results. The sites in ech ctegory hve different structure, but they shre t lest some of their instnces, which mkes our pproch pplicble. The complexity of the dt extrcted from different ctegories is lso different. For instnce movie hs four ttributes: title, genre, director nd list of ctors. For book, the set of ttributes consists of title, uthor nd price, while the ttribute set of cmer consists of the nme nd price only. In ech ctegory we used one mnully written crwler nd utomticlly synthesized the others (for the books ctegory we lso experimented with 3 prtil extrction schemes, one for ech ttribute). To synthesize the web crwlers, our tool processed over 12, 000 webpges from the 30 different sites. To evlute the effectiveness of our tool we consider 4 spects of synthesized crwlers: (i) Crwling scheme completeness, (ii) URL filtering, (iii) Continer extrction, nd (iv) Attributes extrction. 7.2 Experiments nd Results Crwling Scheme Completeness A complete crwling scheme defines extrction queries for ll of the dt ttributes. The completeness of the synthesized crwling schemes is n indictor for the success of our pproch in synthesizing crwlers. To mesure completeness, we clculted for ech ctegory the verge number of ttributes covered by the schemes, ided by the number of ttributes of the ctegory. The results re reported in Fig. 6 (left). The results show tht the resulting extrction schemes re mostly complete, with few missing ttribute extrction queries.

9 Figure 6: Results: Crwling scheme completeness (left), URL filtering (middle) nd Attribute extrction (right) for ech ctegory. Figure 7: Attribute extrction precision nd recll, nd crwling scheme completeness, s function of the threshold of Jccrd similrity used to define equivlence between instnces. URL Filtering The bility to locte pges contining dt is n importnt spect of crwler s performnce. To evlute the URL filtering performnce of the synthesized crwlers, we mesure the recll nd precision of the synthesized URL pttern for ech site: recll = Rel Sol Rel precision = Rel Sol Sol To do so, we hve mnully generted two sets of URLs for ech site: one contining URLs for pges tht contin relevnt dt, comprising the Rel set (ground truth), nd nother, denoted Irr, contins mixture of irrelevnt URLs from the sme site. Sol contins the URLs from Rel Irr tht mtch the synthesized URL pttern for the site (i.e., the URLs ccepted by the synthesised URL pttern). A good performing URL filtering pttern should mtch ll the URLs from Rel nd should not mtch ny from Irr. The verge recll nd precision scores of the sites of ech ctegory re clculted nd reported in Fig. 6 (middle). Continer Extrction To check the correctness of the synthesized continer extrction query, we hve mnully reviewed the resulting continer XPths ginst the HTML sources of the relevnt webpges for ech site, to verify tht ech extrcted continer contins exctly one dt item. We found tht the continers lwys contined no more thn one item. However, in few cmers nd songs websites, the continer query ws too specific nd did not extrct some of the continers (this hppened in tbles contining clss= odd in some rows nd clss= even in others), which ffected the recll scores of ttribute extrction. (1) Attributes Extrction We clculte the recll nd precision (see eqution (1)) of the extrction query for ech ttribute. Techniclly, for ech ctegory of sites, we hve mnully written extrction queries for ech ttribute in every one of the ctegory relted sites. For ech ttribute, we used these extrction queries to extrct the instnces of from set of smple pges from ech site. The extrcted instnces re collected in Rel. We hve lso pplied the synthesized extrction queries (s conctention of the continer XPth nd ttribute XPth) to extrct instnces of from the sme pges into Sol. For ech site, the precision nd recll re clculted ccording to eqution (1). The verge (over sites of the sme ctegory) recll nd precision scores of ll ttributes of ech ctegory re reported in Fig. 6 (right). Equivlence Reltion To evlute the effect of the threshold used in the equivlence reltion,, on the synthesized crwlers, we hve mesured the verge completeness, s well s the verge recll nd precision scores of ttribute extrction s function of the threshold. The results pper in Fig. 7. Remrk. The reported ttribute extrction reclls in Fig. 6 nd Fig. 7 re computed bsed on queries for which synthesis succeeded (missing queries ffect only completeness, nd not recll). 7.3 Discussion The completeness of the synthesized extrction schemes is highly dependent upon the bility to identify instnces in pges of some site by comprison to instnces gthered from other sites. For most ctegories, completeness is high. For the conferences ctegory, however, completeness is low. This is due to the use of cronyms in conference nmes (e.g., ICSE) in some sites vs. full nmes (e.g., Interntionl Conference on Softwre Engineering) in others, which mkes it hrd for our syntx-bsed equivlence reltion to identify mtches. This could be improved by using semntic equivlence reltions (such s ESA [12] or W2V [33]). As for the qulity of the resulting extrction schemes nd URL filtering ptterns, most of the ctegories hve perfect recll (Fig. 6). However, some hve slightly lower recll due to our ttempt to keep the synthesized XPths (or regulr expressions, for URL filtering) s concrete s possible while hving mjority greement. This design choice mkes our method tolernt to smll noises in the identified dt instnces, nd prevents such noises from cusing drifting, without negtive exmples. Yet, in some cses, the resulting XPths re too specific nd result in sub-optiml recll. For precision, most ctegories hve good scores, while few hve lower scores. Loss of precision cn be ttributed to the mjoritybsed unifiction nd the lck of negtive exmples. For the books ctegory, for instnce, the synthesized extrction XPth of price for some sites is too generl, since they list multiple price instnces (originl price, discount mount, nd new price). All re listed in

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey

Alignment of Long Sequences. BMI/CS Spring 2012 Colin Dewey Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2012 Colin Dewey cdewey@biostt.wisc.edu Gols for Lecture the key concepts to understnd re the following how lrge-scle lignment

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Semistructured Data Management Part 2 - Graph Databases

Semistructured Data Management Part 2 - Graph Databases Semistructured Dt Mngement Prt 2 - Grph Dtbses 2003/4, Krl Aberer, EPFL-SSC, Lbortoire de systèmes d'informtions réprtis Semi-structured Dt - 1 1 Tody's Questions 1. Schems for Semi-structured Dt 2. Grph

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

INTRODUCTION TO SIMPLICIAL COMPLEXES

INTRODUCTION TO SIMPLICIAL COMPLEXES INTRODUCTION TO SIMPLICIAL COMPLEXES CASEY KELLEHER AND ALESSANDRA PANTANO 0.1. Introduction. In this ctivity set we re going to introduce notion from Algebric Topology clled simplicil homology. The min

More information

MATH 25 CLASS 5 NOTES, SEP

MATH 25 CLASS 5 NOTES, SEP MATH 25 CLASS 5 NOTES, SEP 30 2011 Contents 1. A brief diversion: reltively prime numbers 1 2. Lest common multiples 3 3. Finding ll solutions to x + by = c 4 Quick links to definitions/theorems Euclid

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants A Heuristic Approch for Discovering Reference Models by Mining Process Model Vrints Chen Li 1, Mnfred Reichert 2, nd Andres Wombcher 3 1 Informtion System Group, University of Twente, The Netherlnds lic@cs.utwente.nl

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

UNIT 11. Query Optimization

UNIT 11. Query Optimization UNIT Query Optimiztion Contents Introduction to Query Optimiztion 2 The Optimiztion Process: An Overview 3 Optimiztion in System R 4 Optimiztion in INGRES 5 Implementing the Join Opertors Wei-Png Yng,

More information

How to Design REST API? Written Date : March 23, 2015

How to Design REST API? Written Date : March 23, 2015 Visul Prdigm How Design REST API? Turil How Design REST API? Written Dte : Mrch 23, 2015 REpresenttionl Stte Trnsfer, n rchitecturl style tht cn be used in building networked pplictions, is becoming incresingly

More information

II. THE ALGORITHM. A. Depth Map Processing

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization An Efficient Divide nd Conquer Algorithm for Exct Hzrd Free Logic Minimiztion J.W.J.M. Rutten, M.R.C.M. Berkelr, C.A.J. vn Eijk, M.A.J. Kolsteren Eindhoven University of Technology Informtion nd Communiction

More information

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an Scnner Termintion A scnner reds input chrcters nd prtitions them into tokens. Wht hppens when the end of the input file is reched? It my be useful to crete n Eof pseudo-chrcter when this occurs. In Jv,

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties, Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

Epson Projector Content Manager Operation Guide

Epson Projector Content Manager Operation Guide Epson Projector Content Mnger Opertion Guide Contents 2 Introduction to the Epson Projector Content Mnger Softwre 3 Epson Projector Content Mnger Fetures... 4 Setting Up the Softwre for the First Time

More information

SOME EXAMPLES OF SUBDIVISION OF SMALL CATEGORIES

SOME EXAMPLES OF SUBDIVISION OF SMALL CATEGORIES SOME EXAMPLES OF SUBDIVISION OF SMALL CATEGORIES MARCELLO DELGADO Abstrct. The purpose of this pper is to build up the bsic conceptul frmework nd underlying motivtions tht will llow us to understnd ctegoricl

More information

Lecture T4: Pattern Matching

Lecture T4: Pattern Matching Introduction to Theoreticl CS Lecture T4: Pttern Mtching Two fundmentl questions. Wht cn computer do? How fst cn it do it? Generl pproch. Don t tlk bout specific mchines or problems. Consider miniml bstrct

More information

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012 Dynmic Progrmming Andres Klppenecker [prtilly bsed on slides by Prof. Welch] 1 Dynmic Progrmming Optiml substructure An optiml solution to the problem contins within it optiml solutions to subproblems.

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

OUTPUT DELIVERY SYSTEM

OUTPUT DELIVERY SYSTEM Differences in ODS formtting for HTML with Proc Print nd Proc Report Lur L. M. Thornton, USDA-ARS, Animl Improvement Progrms Lortory, Beltsville, MD ABSTRACT While Proc Print is terrific tool for dt checking

More information

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis On the Detection of Step Edges in Algorithms Bsed on Grdient Vector Anlysis A. Lrr6, E. Montseny Computer Engineering Dept. Universitt Rovir i Virgili Crreter de Slou sin 43006 Trrgon, Spin Emil: lrre@etse.urv.es

More information

vcloud Director Service Provider Admin Portal Guide vcloud Director 9.1

vcloud Director Service Provider Admin Portal Guide vcloud Director 9.1 vcloud Director Service Provider Admin Portl Guide vcloud Director 9. vcloud Director Service Provider Admin Portl Guide You cn find the most up-to-dte technicl documenttion on the VMwre website t: https://docs.vmwre.com/

More information

such that the S i cover S, or equivalently S

such that the S i cover S, or equivalently S MATH 55 Triple Integrls Fll 16 1. Definition Given solid in spce, prtition of consists of finite set of solis = { 1,, n } such tht the i cover, or equivlently n i. Furthermore, for ech i, intersects i

More information

Policy-contingent state abstraction for

Policy-contingent state abstraction for Policy-contingent stte bstrction for hierrchicl MDPs oelle Pineu nd Geoffrey Gordon School of Computer Science Crnegie Mellon University Pittsburgh PA 15213 jpineuggordon @cs.cmu.edu Abstrct Hierrchiclly

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Agilent Mass Hunter Software

Agilent Mass Hunter Software Agilent Mss Hunter Softwre Quick Strt Guide Use this guide to get strted with the Mss Hunter softwre. Wht is Mss Hunter Softwre? Mss Hunter is n integrl prt of Agilent TOF softwre (version A.02.00). Mss

More information

ISG: Itemset based Subgraph Mining

ISG: Itemset based Subgraph Mining ISG: Itemset bsed Subgrph Mining by Lini Thoms, Stynryn R Vlluri, Kmlkr Krlplem Report No: IIIT/TR/2009/179 Centre for Dt Engineering Interntionl Institute of Informtion Technology Hyderbd - 500 032, INDIA

More information

Functor (1A) Young Won Lim 8/2/17

Functor (1A) Young Won Lim 8/2/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

File Manager Quick Reference Guide. June Prepared for the Mayo Clinic Enterprise Kahua Deployment

File Manager Quick Reference Guide. June Prepared for the Mayo Clinic Enterprise Kahua Deployment File Mnger Quick Reference Guide June 2018 Prepred for the Myo Clinic Enterprise Khu Deployment NVIGTION IN FILE MNGER To nvigte in File Mnger, users will mke use of the left pne to nvigte nd further pnes

More information

Misrepresentation of Preferences

Misrepresentation of Preferences Misrepresenttion of Preferences Gicomo Bonnno Deprtment of Economics, University of Cliforni, Dvis, USA gfbonnno@ucdvis.edu Socil choice functions Arrow s theorem sys tht it is not possible to extrct from

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

Reducing Costs with Duck Typing. Structural

Reducing Costs with Duck Typing. Structural Reducing Costs with Duck Typing Structurl 1 Duck Typing In computer progrmming with object-oriented progrmming lnguges, duck typing is lyer of progrmming lnguge nd design rules on top of typing. Typing

More information

Presentation Martin Randers

Presentation Martin Randers Presenttion Mrtin Rnders Outline Introduction Algorithms Implementtion nd experiments Memory consumption Summry Introduction Introduction Evolution of species cn e modelled in trees Trees consist of nodes

More information

CMSC 331 First Midterm Exam

CMSC 331 First Midterm Exam 0 00/ 1 20/ 2 05/ 3 15/ 4 15/ 5 15/ 6 20/ 7 30/ 8 30/ 150/ 331 First Midterm Exm 7 October 2003 CMC 331 First Midterm Exm Nme: mple Answers tudent ID#: You will hve seventy-five (75) minutes to complete

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

Overview. Network characteristics. Network architecture. Data dissemination. Network characteristics (cont d) Mobile computing and databases

Overview. Network characteristics. Network architecture. Data dissemination. Network characteristics (cont d) Mobile computing and databases Overview Mobile computing nd dtbses Generl issues in mobile dt mngement Dt dissemintion Dt consistency Loction dependent queries Interfces Detils of brodcst disks thlis klfigopoulos Network rchitecture

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

Integration. September 28, 2017

Integration. September 28, 2017 Integrtion September 8, 7 Introduction We hve lerned in previous chpter on how to do the differentition. It is conventionl in mthemtics tht we re supposed to lern bout the integrtion s well. As you my

More information

Functor (1A) Young Won Lim 10/5/17

Functor (1A) Young Won Lim 10/5/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

Text mining: bag of words representation and beyond it

Text mining: bag of words representation and beyond it Text mining: bg of words representtion nd beyond it Jsmink Dobš Fculty of Orgniztion nd Informtics University of Zgreb 1 Outline Definition of text mining Vector spce model or Bg of words representtion

More information

Preserving Constraints for Aggregation Relationship Type Update in XML Document

Preserving Constraints for Aggregation Relationship Type Update in XML Document Preserving Constrints for Aggregtion Reltionship Type Updte in XML Document Eric Prdede 1, J. Wenny Rhyu 1, nd Dvid Tnir 2 1 Deprtment of Computer Science nd Computer Engineering, L Trobe University, Bundoor

More information

CSE 401 Midterm Exam 11/5/10 Sample Solution

CSE 401 Midterm Exam 11/5/10 Sample Solution Question 1. egulr expressions (20 points) In the Ad Progrmming lnguge n integer constnt contins one or more digits, but it my lso contin embedded underscores. Any underscores must be preceded nd followed

More information

Data sharing in OpenMP

Data sharing in OpenMP Dt shring in OpenMP Polo Burgio polo.burgio@unimore.it Outline Expressing prllelism Understnding prllel threds Memory Dt mngement Dt cluses Synchroniztion Brriers, locks, criticl sections Work prtitioning

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-169 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

Memory-Optimized Software Synthesis from Dataflow Program Graphs withlargesizedatasamples

Memory-Optimized Software Synthesis from Dataflow Program Graphs withlargesizedatasamples EURSIP Journl on pplied Signl Processing 2003:6, 54 529 c 2003 Hindwi Publishing orportion Memory-Optimized Softwre Synthesis from tflow Progrm Grphs withlrgesizetsmples Hyunok Oh The School of Electricl

More information

Scanner Termination. Multi Character Lookahead

Scanner Termination. Multi Character Lookahead If d.doublevlue() represents vlid integer, (int) d.doublevlue() will crete the pproprite integer vlue. If string representtion of n integer begins with ~ we cn strip the ~, convert to double nd then negte

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples

COMPUTER SCIENCE 123. Foundations of Computer Science. 6. Tuples COMPUTER SCIENCE 123 Foundtions of Computer Science 6. Tuples Summry: This lecture introduces tuples in Hskell. Reference: Thompson Sections 5.1 2 R.L. While, 2000 3 Tuples Most dt comes with structure

More information

Policy-contingent state abstraction for hierarchical MDPs

Policy-contingent state abstraction for hierarchical MDPs Policy-contingent stte bstrction for hierrchicl MDPs Joelle Pineu nd Geoffrey Gordon School of Computer Science Crnegie Mellon University Pittsburgh, PA 15213 jpineu,ggordon@cs.cmu.edu Abstrct Hierrchiclly

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

COMMON HALF YEARLY EXAMINATION DECEMBER 2018

COMMON HALF YEARLY EXAMINATION DECEMBER 2018 li.net i.net li.net i.net li.net i.net li.net i.net li.net i.net li.net i.net li.net i.net li.net i.net li.net i.net.pds.pds COMMON HALF YEARLY EXAMINATION DECEMBER 2018 STD : XI SUBJECT: COMPUTER SCIENCE

More information

Section 10.4 Hyperbolas

Section 10.4 Hyperbolas 66 Section 10.4 Hyperbols Objective : Definition of hyperbol & hyperbols centered t (0, 0). The third type of conic we will study is the hyperbol. It is defined in the sme mnner tht we defined the prbol

More information

USING HOUGH TRANSFORM IN LINE EXTRACTION

USING HOUGH TRANSFORM IN LINE EXTRACTION Stylinidis, Efstrtios USING HOUGH TRANSFORM IN LINE EXTRACTION Efstrtios STYLIANIDIS, Petros PATIAS The Aristotle University of Thessloniki, Deprtment of Cdstre Photogrmmetry nd Crtogrphy Univ. Box 473,

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

pdfapilot Server 2 Manual

pdfapilot Server 2 Manual pdfpilot Server 2 Mnul 2011 by clls softwre gmbh Schönhuser Allee 6/7 D 10119 Berlin Germny info@cllssoftwre.com www.cllssoftwre.com Mnul clls pdfpilot Server 2 Pge 2 clls pdfpilot Server 2 Mnul Lst modified:

More information

CPSC 213. Polymorphism. Introduction to Computer Systems. Readings for Next Two Lectures. Back to Procedure Calls

CPSC 213. Polymorphism. Introduction to Computer Systems. Readings for Next Two Lectures. Back to Procedure Calls Redings for Next Two Lectures Text CPSC 213 Switch Sttements, Understnding Pointers - 2nd ed: 3.6.7, 3.10-1st ed: 3.6.6, 3.11 Introduction to Computer Systems Unit 1f Dynmic Control Flow Polymorphism nd

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Migrating vrealize Automation to 7.3 or March 2018 vrealize Automation 7.3

Migrating vrealize Automation to 7.3 or March 2018 vrealize Automation 7.3 Migrting vrelize Automtion to 7.3 or 7.3.1 15 Mrch 2018 vrelize Automtion 7.3 You cn find the most up-to-dte technicl documenttion on the VMwre website t: https://docs.vmwre.com/ If you hve comments bout

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Stained Glass Design. Teaching Goals:

Stained Glass Design. Teaching Goals: Stined Glss Design Time required 45-90 minutes Teching Gols: 1. Students pply grphic methods to design vrious shpes on the plne.. Students pply geometric trnsformtions of grphs of functions in order to

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Integration. October 25, 2016

Integration. October 25, 2016 Integrtion October 5, 6 Introduction We hve lerned in previous chpter on how to do the differentition. It is conventionl in mthemtics tht we re supposed to lern bout the integrtion s well. As you my hve

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications.

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications. 15-112 Fll 2018 Midterm 1 October 11, 2018 Nme: Andrew ID: Recittion Section: ˆ You my not use ny books, notes, extr pper, or electronic devices during this exm. There should be nothing on your desk or

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

Simrad ES80. Software Release Note Introduction

Simrad ES80. Software Release Note Introduction Simrd ES80 Softwre Relese 1.3.0 Introduction This document descries the chnges introduced with the new softwre version. Product: ES80 Softwre version: 1.3.0 This softwre controls ll functionlity in the

More information

Fall 2018 Midterm 2 November 15, 2018

Fall 2018 Midterm 2 November 15, 2018 Nme: 15-112 Fll 2018 Midterm 2 November 15, 2018 Andrew ID: Recittion Section: ˆ You my not use ny books, notes, extr pper, or electronic devices during this exm. There should be nothing on your desk or

More information

a(e, x) = x. Diagrammatically, this is encoded as the following commutative diagrams / X

a(e, x) = x. Diagrammatically, this is encoded as the following commutative diagrams / X 4. Mon, Sept. 30 Lst time, we defined the quotient topology coming from continuous surjection q : X! Y. Recll tht q is quotient mp (nd Y hs the quotient topology) if V Y is open precisely when q (V ) X

More information

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association 9. CISC - Curriculum & Instruction Steering Committee The Winning EQUATION A HIGH QUALITY MATHEMATICS PROFESSIONAL DEVELOPMENT PROGRAM FOR TEACHERS IN GRADES THROUGH ALGEBRA II STRAND: NUMBER SENSE: Rtionl

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE

CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE CHAPTER III IMAGE DEWARPING (CALIBRATION) PROCEDURE 3.1 Scheimpflug Configurtion nd Perspective Distortion Scheimpflug criterion were found out to be the best lyout configurtion for Stereoscopic PIV, becuse

More information

Unit 5 Vocabulary. A function is a special relationship where each input has a single output.

Unit 5 Vocabulary. A function is a special relationship where each input has a single output. MODULE 3 Terms Definition Picture/Exmple/Nottion 1 Function Nottion Function nottion is n efficient nd effective wy to write functions of ll types. This nottion llows you to identify the input vlue with

More information

Spring 2018 Midterm Exam 1 March 1, You may not use any books, notes, or electronic devices during this exam.

Spring 2018 Midterm Exam 1 March 1, You may not use any books, notes, or electronic devices during this exam. 15-112 Spring 2018 Midterm Exm 1 Mrch 1, 2018 Nme: Andrew ID: Recittion Section: You my not use ny books, notes, or electronic devices during this exm. You my not sk questions bout the exm except for lnguge

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1.

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1. Answer on Question #5692, Physics, Optics Stte slient fetures of single slit Frunhofer diffrction pttern. The slit is verticl nd illuminted by point source. Also, obtin n expression for intensity distribution

More information

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1

a < a+ x < a+2 x < < a+n x = b, n A i n f(x i ) x. i=1 i=1 Mth 33 Volume Stewrt 5.2 Geometry of integrls. In this section, we will lern how to compute volumes using integrls defined by slice nlysis. First, we recll from Clculus I how to compute res. Given the

More information

Topic: Software Model Checking via Counter-Example Guided Abstraction Refinement. Having a BLAST with SLAM. Combining Strengths. SLAM Overview SLAM

Topic: Software Model Checking via Counter-Example Guided Abstraction Refinement. Having a BLAST with SLAM. Combining Strengths. SLAM Overview SLAM Hving BLAST with SLAM Topic: Softwre Model Checking vi Counter-Exmple Guided Abstrction Refinement There re esily two dozen SLAM/BLAST/MAGIC ppers; I will skim. # # Theorem Proving Combining Strengths

More information