Inducing Probabilistic Grammars by Bayesian Model Merging


To appear in ICGI-94

Inducing Probabilistic Grammars by Bayesian Model Merging

Andreas Stolcke    Stephen Omohundro
International Computer Science Institute
1947 Center St., Suite 600
Berkeley, CA
E-mail: {stolcke,om}@icsi.berkeley.edu

Abstract

We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ("Occam's Razor"). The general scheme is illustrated using three types of probabilistic grammars: Hidden Markov models, class-based n-grams, and stochastic context-free grammars.

1 Introduction

Probabilistic modeling has become increasingly important for applications such as speech recognition, information retrieval, machine translation, and biological sequence processing. The types of models used vary widely, ranging from simple n-grams to Hidden Markov Models (HMMs) and stochastic context-free grammars (SCFGs). A central problem for these applications is to find suitable models from a corpus of samples. Most common probabilistic models can be characterized by two parts: a discrete structure (e.g., the topology of an HMM, the context-free backbone of an SCFG), and a set of continuous parameters which determine the probabilities for the words, sentences, etc. described by the grammar. Given the discrete structure, the continuous parameters can usually be fit using standard methods, such as likelihood maximization. In the case of models with hidden variables (HMMs, SCFGs) estimation typically involves expectation maximization (EM) (Baum et al. 1970; Dempster et al. 1977; Baker 1979).

In this paper we address the more difficult first half of the problem: finding the discrete structure of a probabilistic model from training data. This task includes the problems of finding

the topology of an HMM, and finding the set of context-free productions for an SCFG. Our approach is called Bayesian model merging because it performs successive merging operations on the substructures of a model in an attempt to maximize the Bayesian posterior probability of the overall model structure, given the data. In this paper, we give an introduction to Bayesian model merging for probabilistic grammar inference, and demonstrate the approach on various model types. We also report briefly on some of the applications of the resulting learning algorithms, primarily in the area of natural language modeling.

2 Bayesian Model Merging

Model merging (Omohundro 1992) has been proposed as an efficient, robust, and cognitively plausible method for building probabilistic models in a variety of cognitive domains (e.g., vision). The method can be characterized as follows:

Data incorporation: Given a body of data X, build an initial model M_0 by explicitly accommodating each data point individually such that M_0 maximizes the likelihood P(X|M). The size of the initial model will thus grow with the amount of data, and will usually not exhibit significant generalization.

Structure merging: Build a sequence of new models, obtaining M_{i+1} from M_i by applying a generalization or merging operator m that coalesces substructures in M_i,

    M_{i+1} = m(M_i),   i = 0, 1, ...

The merging operation is dependent on the type of model at hand (as will be illustrated below), but it generally has the property that data points previously explained by separate model substructures come to be accounted for by a single, shared structure. The merging process thereby gradually moves from a simple, instance-based model toward one that expresses structural generalizations about the data.

To guide the search for suitable merging operations we need a criterion that trades off the goodness of fit of the data X against the desire for simpler, and therefore more general models. As a formalization of this tradeoff, we use the posterior probability P(M|X) of the model given the data.
According to Bayes' rule,

    P(M|X) = P(M) P(X|M) / P(X),

the posterior is proportional to the product of a prior probability term P(M) and a likelihood term P(X|M) (the denominator P(X) does not depend on M and can therefore be ignored for the purpose of maximization). The likelihood is defined by the model semantics, whereas the prior has to be chosen to express the bias, or prior expectation, as to what the likely models are. This choice is domain-dependent and will be elaborated below.

Finally, we need a search strategy to find models with high (maximal, if possible) posterior probability. A simple approach here is best-first search: Starting with the initial model (which maximizes the likelihood, but usually has a very low prior probability), explore all possible merging steps, and successively choose the one (greedy search) or ones (beam search) that give the greatest
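The greedy variant of this search can be sketched generically. The helpers `merge_candidates`, `apply_merge`, and `log_posterior` are hypothetical placeholders for the model-specific merge operators and posterior scoring discussed below, not the authors' implementation:

```python
def greedy_merge_search(model, merge_candidates, apply_merge, log_posterior):
    """Greedy best-first search: repeatedly apply the merge that most
    increases log P(M|X), and stop when no merge improves the posterior."""
    best_score = log_posterior(model)
    while True:
        candidates = merge_candidates(model)
        if not candidates:
            return model
        # Score every candidate merge of the current model.
        score, merge = max((log_posterior(apply_merge(model, m)), m)
                           for m in candidates)
        if score <= best_score:  # no improvement: stop merging
            return model
        model, best_score = apply_merge(model, merge), score
```

A beam-search variant keeps the n best models per step instead of only the single best one.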

immediate increase in posterior. Stop merging when no further increase is possible (after looking ahead a few steps to avoid simple local maxima). In practice, to keep the working models of manageable size, we can use an on-line version of the merging algorithm, in which the data incorporation and the merging/search stages are interleaved. We now make these concepts concrete for various types of probabilistic grammar.

3 Model merging applied to probabilistic grammars

3.1 Hidden Markov Models

Hidden Markov Models (HMMs) are a probabilistic form of non-deterministic finite-state models (Rabiner & Juang 1986). They allow a particularly straightforward version of the model merging approach.

Data incorporation. For each observed sample create a unique path between the initial and final states by assigning a new state to each symbol token in the sample. For example, given the data X = {ab, abab}, the initial model M_0 is shown at the top of Figure 1.

[Figure 1: Model merging for HMMs. The figure shows the initial model M_0 and the sequence of successively merged models; the state diagrams are not recoverable from this copy.]
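Data incorporation for HMMs is easy to sketch. The following is a minimal illustration (not the authors' code): states are integers, "I" and "F" are the distinguished initial and final states, and probabilities are represented by transition counts:

```python
def initial_hmm(samples):
    """Build M_0: one fresh state per symbol token, giving each sample its
    own unique path from the initial state I to the final state F.
    Returns (transitions, emissions) as count/label dictionaries."""
    I, F = "I", "F"
    transitions = {}   # (from_state, to_state) -> count
    emissions = {}     # state -> emitted symbol
    next_state = 0
    for sample in samples:
        prev = I
        for symbol in sample:
            state = next_state
            next_state += 1
            emissions[state] = symbol
            transitions[(prev, state)] = transitions.get((prev, state), 0) + 1
            prev = state
        transitions[(prev, F)] = transitions.get((prev, F), 0) + 1
    return transitions, emissions
```

For X = {ab, abab} this yields a two-state chain and a four-state chain, as at the top of Figure 1.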

Merging. In a single merging step, two old HMM states are replaced by a single new state, which inherits the union of the transitions and emissions from the old states. Figure 1 shows four successive merges (where each new state is given the smaller of the two indices of its predecessors). The second, third and fifth models in the example have a smaller model structure without changing the generated probability distribution, whereas the fourth model effectively generalizes from the finite sample set to the infinite set {(ab)^n : n > 0}. The crucial point is that each of these models can be found by locally maximizing the posterior probability of the HMM, under a wide range of priors (see below). Also, further merging in the last model structure shown produces a large penalty in the likelihood term, thereby decreasing the posterior. The algorithm thus stops at this point.

Prior distributions. Our approach has been to choose relatively uninformative priors, which spread the prior probability across all possible HMMs without giving explicit preference to particular topologies. A model M is defined by its structure (topology) M_S and its continuous parameter settings θ_M. The prior may therefore be decomposed as

    P(M) = P(M_S) P(θ_M | M_S).

Model structures receive prior probability according to their description length, i.e.,

    P(M_S) ∝ exp(-ℓ(M_S)),

where ℓ(M_S) is the number of bits required to encode M_S, e.g., by listing all transitions and emissions. The prior probabilities for θ_M, on the other hand, are assigned using a Dirichlet distribution for each of the transition and emission multinomial parameters, similar to the Bayesian decision tree induction method of Buntine (1992). (The parameter prior effectively spreads the posterior probability as if a certain number of evenly distributed "virtual" samples had been observed for each transition and emission.) For convenience we assume that the parameters associated with each state are a priori independent.
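The merging step itself can be sketched on a representation of transitions as count dictionaries keyed by state pairs (an illustrative sketch, not the authors' code; emission counts would be pooled in the same way):

```python
def merge_states(transitions, keep, drop):
    """Merge state `drop` into state `keep`: every transition into or out of
    `drop` is redirected to `keep`, and counts of coinciding arcs are summed,
    so the merged state inherits the union of the old transitions."""
    merged = {}
    for (src, dst), count in transitions.items():
        src = keep if src == drop else src
        dst = keep if dst == drop else dst
        merged[(src, dst)] = merged.get((src, dst), 0) + count
    return merged
```

Note that a transition between the two merged states becomes a self-loop on the surviving state, which is exactly how the generalization to {(ab)^n : n > 0} arises in Figure 1.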
There are three intuitive ways of understanding why simple priors like the ones used here lead to higher posterior probabilities for simpler HMMs, other things being equal:

- Smaller topologies have a smaller description length, and hence higher prior probability. This corresponds to the intuition that a larger structure needs to be picked from among a larger range of possible equally sized alternatives, thus making each individual choice less probable a priori.

- Larger models have more parameters, thus making each particular parameter setting less likely (this is the "Occam factor" (Gull 1988)).

- After two states have been merged, the effective amount of data per parameter increases (the evidence for the merged substructures is pooled). This shifts and peaks the posterior distributions for those parameters closer to their maximum likelihood settings.

These principles also apply mutatis mutandis to the other applications of model merging inference.

Posterior computation. Recall that the target of the inference procedure is the model structure, hence the goal is to maximize the posterior

    P(M_S|X) ∝ P(M_S) P(X|M_S).

The mathematical reason why one wants to maximize P(M_S|X), rather than simply P(M|X), is that for inference purposes a model with a high posterior structure represents a better approximation to the Bayes-optimal procedure of averaging over all possible models M, including both structures and parameter settings (see Stolcke & Omohundro (1994:17f.) for details). The evaluation of the second term above involves an integral over the parameter prior,

    P(X|M_S) = ∫ P(θ_M|M_S) P(X|M_S, θ_M) dθ_M,

which can be approximated using the common Viterbi assumption about sample probabilities in HMMs (which in our case tends to be correct due to the way the HMM structures are initially constructed).

Applications and results. We compared the model merging strategy applied to HMMs against the standard Baum-Welch procedure when applied to a fully parameterized, randomly initialized HMM structure. The latter represents one potential approach to the structure finding problem, effectively turning it into a parameter estimation problem, but it faces the problem of local maxima in the parameter space. Also, in a Baum-Welch approach the number of states in the HMM has to be known, guessed or estimated in advance, whereas the merging approach chooses that number adaptively from the data. Both approaches need to be evaluated empirically.

First, we tested the two methods on a few simple regular languages that we turned into HMMs by assigning uniform probabilities to their corresponding finite-state automata. Training proceeded using either random or structure-covering sets of samples. The merging approach reliably inferred these admittedly simple HMM structures. However, the Baum-Welch estimator turned out to be extremely sensitive to the initial parameter settings and failed on more than half of the trials to find a reasonable structure, both with minimal and redundant numbers of states.[1]
Second, we tested merging and Baum-Welch (and a number of other methods) on a set of naturally occurring data that might be modeled by HMMs. The task was to derive phonetic pronunciation models from available transcriptions in the TIMIT speech database. In this case, the Baum-Welch-derived model structures turned out to be close in generalization performance to the slightly better merged models (as measured by cross-entropy on a test set).[2] However, to achieve this performance the Baum-Welch HMMs made use of about twice as many transitions as the more compact merged HMMs, which would have a serious impact on potential applications of such models in speech recognition.

Finally, the HMM merging algorithm was integrated into the training of a medium-scale spoken language understanding system (Wooters & Stolcke 1994). Here, the algorithm also serves the purpose of inducing multi-pronunciation word models from speech data, but it is now coupled with a separate process that estimates the acoustic emission likelihoods for the HMM states. The goal of this setup was to improve the system's performance over a comparable

[1] Case studies of the structures, under-generalizations and overgeneralizations found in this experiment can be found in Stolcke & Omohundro (1994).
[2] It may be argued that this domain is slightly simpler, since it contains, for example, no looping HMM structures.

system that used only the standard single-pronunciation HMMs for each word, while remaining practical in terms of training cost and recognition speed. By using the more complex, merged HMMs the word error was indeed reduced significantly (from 40.6% to 32.1%), indicating that the pronunciation models produced by the merging process were at least adequate for this kind of task.

3.2 Class-based n-gram Models

Brown et al. (1992) describe a method for building class-based n-gram models from data. Such models express the transition probabilities between words not directly in terms of individual word types, but rather between word categories, or classes. Each class, in turn, has fixed emission probabilities for the individual words. One potential advantage of this approach is that it can drastically reduce the number of parameters associated with ordinary n-gram models, by effectively sharing parameters between similarly distributed words.

To infer word classes automatically, Brown et al. (1992) suggest an algorithm that successively merges classes according to a maximum-likelihood criterion, until a target number of classes is reached. From our perspective we can cast their algorithm as an instance of model merging, the essential difference being the non-Bayesian (likelihood-based) criterion guiding the merging and stopping. In fact, in retrospect, class merging in n-gram grammars can be understood as a special case of HMM merging: a class-based n-gram model can be straightforwardly expressed as a special form of HMM in which each class corresponds to a state, and transition probabilities correspond to class n-gram probabilities.

3.3 Stochastic Context-Free Grammars

Based on the model merging approach to HMM induction, we have extended the algorithm to apply to stochastic context-free grammars (SCFGs), the probabilistic generalization of CFGs (Booth & Thompson 1973; Jelinek et al. 1992). A more detailed description of SCFG model merging can be found in Stolcke (1994).

Data incorporation. To incorporate a new sample string into an SCFG we can simply add a top-level production (for the start nonterminal S) that covers the sample precisely.
For example, the grammar at the top of Figure 2 arises from the samples {ab, aabb, aaabbb}. Instead of letting terminal symbols appear in production right-hand sides, we also create one nonterminal for each observed terminal, which simplifies the merging operators.

Merging. The obvious analog of merging HMM states is the merging of nonterminals in an SCFG. This is indeed one of the strategies used to generalize a given SCFG, and it can potentially produce inductive leaps by generating a grammar that generates more than its predecessor, while reducing the size of the grammar. However, the hallmark of context-free grammars are the hierarchical, center-embedding structures they can represent. We therefore introduce a second operator called chunking. It takes a given sequence of nonterminals and abbreviates it using a newly created nonterminal, as illustrated by the sequence AB in the second grammar of Figure 2. In that example, one more chunking step, followed by two merging steps, produces a grammar for the language

    S → AB | AABB | AAABBB
    A → a
    B → b

  Chunk(AB) → X:

    S → X | AXB | AAXBB
    X → AB

  Chunk(AXB) → Y:

    S → X | Y | AYB
    X → AB
    Y → AXB

  Merge S, Y:

    S → X | ASB
    X → AB

  Merge S, X:

    S → AB | ASB

Figure 2: Model merging for SCFGs.

{a^n b^n : n > 0}. (The probabilities in the grammar are implicit in the usage counts for each production, and are not shown in the figure.)

Priors. As before, we split the prior for a grammar M into a contribution for the structural aspects M_S, and one for the continuous parameter settings θ_M. The goal is to maximize the posterior of the structure given the data, P(M_S|X). For P(M_S) we again use a description length-induced distribution, obtained by a simple enumerative encoding of the grammar productions (each occurrence of a nonterminal contributes log N bits to the description length, where N is the number of nonterminals). For P(θ_M|M_S) we observe that the production probabilities associated with a given left-hand side form a multinomial, and so we use symmetrical Dirichlet priors for these parameters.
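The chunking operator of Figure 2 can be sketched on a grammar represented as a list of (lhs, rhs) production pairs (a simplified sketch; production counts and probabilities are omitted):

```python
def chunk(grammar, sequence, new_nt):
    """Abbreviate every occurrence of `sequence` in a right-hand side by the
    new nonterminal, and add the defining production new_nt -> sequence."""
    seq, n = list(sequence), len(sequence)
    rewritten = []
    for lhs, rhs in grammar:
        out, i = [], 0
        while i < len(rhs):
            if rhs[i:i + n] == seq:   # replace one occurrence of the chunk
                out.append(new_nt)
                i += n
            else:
                out.append(rhs[i])
                i += 1
        rewritten.append((lhs, out))
    return rewritten + [(new_nt, seq)]
```

Applied to the initial grammar of Figure 2 with the sequence AB and new nonterminal X, this produces S → X | AXB | AAXBB together with X → AB, matching the second grammar in the figure.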

Language             Samples   Grammar                               Search
Parentheses          8         S → () | (S) | SS                     BF
a^{2n}               5         S → aa | SS                           BF
(ab)^n               5         S → ab | SS                           BF
a^n b^n              5         S → ab | aSb                          BF
wcw^R, w ∈ {a,b}     7         S → c | aSa | bSb                     BS(3)
Addition strings     23        S → a | b | (S) | S+S                 BS(4)
Shape grammar        11        S → adY | aYS;  Y → b | cY            BS(4)
Basic English        25        S → I am A | he T | she T | it T      BS(3)
                                 | they V | you V | we V
                                 | this C | that C
                               T → is A;  V → are A
                               Z → man | woman;  A → there | here
                               C → is a Z | a ZT

Table 1: Test grammars from Cook et al. (1976). Search methods are indicated by BF (best-first) or BS(n) (beam search with width n).

Search. In the case of HMMs, a greedy merging strategy (always pursuing only the locally most promising choice) seems to give generally good results. Unfortunately, this is no longer true in the extended SCFG merging algorithm. The chief reason is that chunking steps typically require several following merging steps and/or additional chunking steps to improve a grammar's posterior score. To account for this complication, we use a more elaborate beam search that considers a number of relatively good grammars in parallel, and stops only after a certain neighborhood of alternative models has been searched without producing further improvements. The experiments reported below use small beam widths (between 3 and 10).

Formal language experiments. We start by examining the performance of the algorithm on example grammars found in the literature on other CFG induction methods. Cook et al. (1976) use a collection of techniques related to ours for inferring probabilistic CFGs from sample distributions, rather than absolute sample counts (see discussion in the next section). These languages and the inferred grammars are summarized in Table 1. They include classic textbook examples of CFGs (the parenthesis language, arithmetic expressions) as well as simple grammars meant to model empirical data. We replicated Cook's results by applying the algorithm to the same small sets of high probability strings as used in Cook et al. (1976). (The number of distinct sample strings is given in the second column of Table 1.)
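The beam search just described (the BS(n) entries in Table 1) can be sketched as follows. `neighbors` and `log_posterior` are hypothetical placeholders for the chunk/merge successor function and the posterior score, and the stopping rule is simplified to a single non-improving round, whereas the paper's version searches a whole neighborhood before giving up:

```python
def beam_merge_search(model, neighbors, log_posterior, width):
    """Keep the `width` best models per round; stop when a round of
    expansions yields no model better than the current best."""
    beam = [model]
    best = model
    while True:
        expanded = [n for m in beam for n in neighbors(m)]
        if not expanded:
            return best
        # Retain only the `width` highest-scoring successors.
        beam = sorted(expanded, key=log_posterior, reverse=True)[:width]
        if log_posterior(beam[0]) <= log_posterior(best):
            return best
        best = beam[0]
```

This keeps several "relatively good" grammars alive in parallel, which is what allows a temporarily unprofitable chunking step to pay off after subsequent merges.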
Since the Bayesian framework makes use of the actual observed sample counts, we scaled these to sum to 50 for each training corpus. The Bayesian merging procedure produced the target grammars in all cases, using different

levels of sophistication in the search strategy (as indicated by column 4 in Table 1). Since Cook's algorithm uses a very different, non-Bayesian formalization of the data fit vs. grammar complexity trade-off, we can conclude that the example grammars must be quite robust to a variety of reasonable implementations of this trade-off. A more difficult language that Cook et al. (1976) list as beyond the scope of their algorithm can also be inferred, using beam search: the palindromes ww^R, w ∈ {a,b}. We attribute this improvement to the more flexible search techniques used.

Natural language syntax. An obvious question arising for SCFG induction algorithms is whether they are sufficient for deriving adequate models from realistic corpora of naturally occurring samples, i.e., to automatically build models for natural language processing applications. Preliminary experiments on such corpora have yielded mixed results, which lead us to conclude that additional methods will be necessary for success in this area. A fundamental problem is that available data will typically be sparse relative to the complexity of the target grammars, i.e., not all constructions will be represented with sufficient coverage to allow the induction of correct generalizations. We are currently investigating techniques to incorporate additional, independent sources of generalization. For example, a part-of-speech tagging phase prior to SCFG induction proper could reduce the work of the merging algorithm considerably.

Given these difficulties with large-scale natural language applications, we have resorted to smaller experiments that try to determine whether certain fundamental structures found in NL grammars can in principle be identified by the Bayesian framework proposed here. In Stolcke (1994) a number of phenomena are examined, including:

Lexical categorization. Nonterminal merging assigns terminal symbols to common nonterminals whenever there is substantial overlap in the contexts in which they occur.
Phrase structure abstraction. Standard phrasal categories such as noun phrases, prepositional and verb phrases are created by chunking because they allow a more compact description of the grammar by abbreviating common collocations, and/or because they allow more succinct generalizations (in combination with merging) to be stated.

Agreement. Co-variation in the forms of co-occurring syntactic or lexical elements (e.g., number agreement between subjects and verbs in English) is induced by merging of nonterminals. However, even in this learning framework it becomes clear that CFGs (as opposed to, say, feature-based grammar formalisms) are an inadequate representation for these phenomena. The usual blow-up in grammar size to represent agreement in CFG form can also cause the wrong phrase structure bracketing to be preferred by the simplicity bias.

Recursive phrase structure. Recursive and iterative productions for phenomena such as embedded relative clauses can be induced using the chunking and merging operators. We conclude with a small grammar exhibiting recursive relative clause embedding, from Langley (1994). The target grammar has the form

    S    --> NP VP
    VP   --> Verb NP
    NP   --> Art Noun
         --> Art Noun RC

    RC   --> Rel VP
    Verb --> saw | heard
    Noun --> cat | dog | mouse
    Art  --> the
    Rel  --> that

with uniform probabilities on all productions. Chunking and merging of 100 random samples produces a grammar that is weakly equivalent to the above grammar. It also produced essentially identical phrase structure, except for a more compact implementation of the recursion through RC:

    S   --> NP VP
    VP  --> V NP
    NP  --> DET N
        --> NP RC
    RC  --> REL VP
    DET --> a | the
    N   --> cat | dog | mouse
    REL --> that
    V   --> heard | saw

4 Related work

Many of the ingredients of the model merging approach have been used separately in a variety of settings. Successive merging of states (or state equivalence class construction) is a technique widely used in algorithms for finite-state automata (Hopcroft & Ullman 1979) and automata learning (Angluin & Smith 1983); a recent application to probabilistic finite-state automata is Carrasco & Oncina (1994).

Bell et al. (1990) and Ron et al. (1994) describe a method for learning deterministic finite-state models that is in a sense the opposite of the merging approach: successive state splitting. In this framework, each state represents a unique suffix of the input, and states are repeatedly refined by extending the suffixes represented, as long as this move improves the model likelihood by a certain minimum amount. The class of models thus learnable is restricted, since each state can make predictions based only on inputs within a bounded distance from the current input, but the approach has other advantages, e.g., the final number of states is typically smaller than for a merging algorithm, since the tendency is to overgeneralize, rather than undergeneralize. We are currently investigating state splitting as a complementary search operator in our merging algorithms.

Horning (1969) first proposed using a Bayesian formulation to capture the trade-off between grammar complexity and data fit. His algorithm, however, is based on searching for the grammar with the highest posterior probability by enumerating all possible grammars (such that one can

tell after a finite number of steps when the optimal grammar has been found). Unfortunately, the enumeration approach proved to be infeasible for practical purposes.

The chunking operation used in SCFG induction is part of a number of algorithms aimed at CFG induction, including Cook et al. (1976), Wolff (1987), and Langley (1994), where it is typically paired with other operations that have effects similar to merging. However, only the algorithm of Cook et al. (1976) has probabilistic CFGs as the target of induction, and therefore merits closer comparison to our approach.

A major conceptual difference of Cook's approach is that it is based on an information-theoretic quality measure that depends only on the relative frequencies of observed samples. The Bayesian approach, on the other hand, explicitly takes into account the absolute frequencies of the data. Thus, the amount of data available, not only its distribution, has an effect on the outcome. For example, having observed the samples a, aa, aaa, aaaa, a model of {a^n : n > 0} is quite likely. On the other hand, if the same samples were observed a hundred times each, with no other additional data, such a conclusion should be intuitively unlikely, although the sample strings themselves and their relative frequencies are unchanged. The Bayesian analysis confirms this intuition: a 100-fold sample frequency entails a 100-fold magnification of the log-likelihood loss incurred for any generalization, which would block the inductive leap to a model of {a^n : n > 0}. Incidentally, one can use sample frequency as a principled device to control the degree of generalization in a Bayesian induction algorithm explicitly (Quinlan & Rivest 1989; Stolcke & Omohundro 1994).

5 Future directions

Since all algorithms presented here are of the generate-and-evaluate kind, they are trivial to integrate with external sources of constraints or information about possible candidate models. External structural constraints can be used to effectively set the prior (and therefore posterior) probability for certain models to zero.
We hope to explore more informed priors and constraints to tackle larger problems, especially in the SCFG domain.

In retrospect, the merging operations used in our probabilistic grammar induction algorithms share a strong conceptual and formal similarity to those used by various induction methods for non-probabilistic grammars (Angluin & Smith 1983; Sakakibara 1990). Those algorithms are typically based on constructing equivalence classes of states based on some criterion of distinguishability. Intuitively, the (difference in) posterior probability used to guide the Bayesian merging process represents a fuzzy, probabilistic version of such an equivalence criterion. This suggests looking for other non-probabilistic induction methods of this kind and adapting them to the Bayesian approach. A promising candidate we are currently investigating is the transducer inference algorithm of Oncina et al. (1993).

6 Conclusions

We have presented a Bayesian model merging framework for inducing probabilistic grammars from samples, by stepwise generalization from a sample-based ad-hoc model through successive merging operators. The framework is quite general and can therefore be instantiated for a variety of standard or novel classes of probabilistic models, as demonstrated here for HMMs and SCFGs.

The HMM merging variant, which is empirically more reliable for structure induction than Baum-Welch estimation, is being used successfully in speech modeling applications. The SCFG version of the merging algorithm generalizes and simplifies a number of related algorithms that have been proposed previously, thus showing how the Bayesian posterior probability criterion can combine data fit and model simplicity in a uniform and principled way. The more complex model search space encountered with SCFGs also highlights the need for relatively sophisticated search strategies.

References

ANGLUIN, D., & C. H. SMITH. 1983. Inductive inference: Theory and methods. ACM Computing Surveys.

BAKER, JAMES K. 1979. Trainable grammars for speech recognition. In Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, ed. by Jared J. Wolf & Dennis H. Klatt. MIT, Cambridge, Mass.

BAUM, LEONARD E., TED PETRIE, GEORGE SOULES, & NORMAN WEISS. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics.

BELL, TIMOTHY C., JOHN G. CLEARY, & IAN H. WITTEN. 1990. Text Compression. Englewood Cliffs, N.J.: Prentice Hall.

BOOTH, TAYLOR L., & RICHARD A. THOMPSON. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers.

BROWN, PETER F., VINCENT J. DELLA PIETRA, PETER V. DESOUZA, JENIFER C. LAI, & ROBERT L. MERCER. 1992. Class-based n-gram models of natural language. Computational Linguistics.

BUNTINE, WRAY. 1992. Learning classification trees. In Artificial Intelligence Frontiers in Statistics: AI and Statistics III, ed. by D. J. Hand. Chapman & Hall.

CARRASCO, RAFAEL C., & JOSÉ ONCINA. 1994. Learning stochastic regular grammars by means of a state merging method. This volume.

COOK, CRAIG M., AZRIEL ROSENFELD, & ALAN R. ARONSON. 1976. Grammatical inference by hill climbing. Information Sciences.

DEMPSTER, A. P., N. M. LAIRD, & D. B. RUBIN. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B.

GULL, S. F. 1988. Bayesian inductive inference and maximum entropy.
In Maximum Entropy and Bayesian Methods in Science and Engineering, Volume 1: Foundations, ed. by G. J. Erickson & C. R. Smith. Dordrecht: Kluwer.

HOPCROFT, JOHN E., & JEFFREY D. ULLMAN. 1979. Introduction to Automata Theory, Languages, and Computation. Reading, Mass.: Addison-Wesley.

HORNING, JAMES JAY. 1969. A study of grammatical inference. Technical Report CS 139, Computer Science Department, Stanford University, Stanford, CA.

JELINEK, FREDERICK, JOHN D. LAFFERTY, & ROBERT L. MERCER. 1992. Basic methods of probabilistic context free grammars. In Speech Recognition and Understanding. Recent Advances, Trends, and Applications, ed. by Pietro Laface & Renato De Mori, volume F75 of NATO Advanced Sciences Institutes Series. Berlin: Springer Verlag. Proceedings of the NATO Advanced Study Institute, Cetraro, Italy, July.

LANGLEY, PAT. 1994. Simplicity and representation change in grammar induction. Unpublished mss.

OMOHUNDRO, STEPHEN M. 1992. Best-first model merging for dynamic learning and recognition. Technical Report, International Computer Science Institute, Berkeley, CA.

ONCINA, JOSÉ, PEDRO GARCÍA, & ENRIQUE VIDAL. 1993. Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence.

QUINLAN, J. ROSS, & RONALD L. RIVEST. 1989. Inferring decision trees using the minimum description length principle. Information and Computation.

RABINER, L. R., & B. H. JUANG. 1986. An introduction to hidden Markov models. IEEE ASSP Magazine.

RON, DANA, YORAM SINGER, & NAFTALI TISHBY. 1994. The power of amnesia. In Advances in Neural Information Processing Systems 6, ed. by Jack Cowan, Gerald Tesauro, & Joshua Alspector. San Mateo, CA: Morgan Kaufmann.

SAKAKIBARA, YASUBUMI. 1990. Learning context-free grammars from structural data in polynomial time. Theoretical Computer Science.

STOLCKE, ANDREAS. 1994. Bayesian Learning of Probabilistic Language Models. Berkeley, CA: University of California dissertation.

STOLCKE, ANDREAS, & STEPHEN OMOHUNDRO. 1994. Best-first model merging for hidden Markov model induction. Technical Report, International Computer Science Institute, Berkeley, CA.

WOLFF, J. G. 1987. Cognitive development as optimisation. In Computational models of learning, ed. by L. Bolc. Berlin: Springer Verlag.

WOOTERS, CHUCK, & ANDREAS STOLCKE. 1994. Multiple-pronunciation lexical modeling in a speaker-independent speech understanding system.
In Proceedings International Conference on Spoken Language Processing, Yokohama.


More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search.

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search. CS 88: Artificil Intelligence Fll 00 Lecture : A* Serch 9//00 A* Serch rph Serch Tody Heuristic Design Dn Klein UC Berkeley Multiple slides from Sturt Russell or Andrew Moore Recp: Serch Exmple: Pncke

More information

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism

Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism Efficient K-NN Serch in Polyphonic Music Dtses Using Lower Bounding Mechnism Ning-Hn Liu Deprtment of Computer Science Ntionl Tsing Hu University Hsinchu,Tiwn 300, R.O.C 886-3-575679 nhliou@yhoo.com.tw

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Principles nd Prdigms Chpter 11 (version April 7, 2008) Mrten vn Steen Vrije Universiteit Amsterdm, Fculty of Science Dept. Mthemtics nd Computer Science Room R4.20. Tel: (020) 598 7784

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley AI Adjcent Fields Philosophy: Logic, methods of resoning Mind s physicl system Foundtions of lerning, lnguge, rtionlity Mthemtics Forml representtion nd proof Algorithms, computtion, (un)decidility, (in)trctility

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007 CS 88: Artificil Intelligence Fll 2007 Lecture : A* Serch 9/4/2007 Dn Klein UC Berkeley Mny slides over the course dpted from either Sturt Russell or Andrew Moore Announcements Sections: New section 06:

More information

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the LR() nlysis Drwcks of LR(). Look-hed symols s eplined efore, concerning LR(), it is possile to consult the net set to determine, in the reduction sttes, for which symols it would e possile to perform reductions.

More information

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM

A Comparison of the Discretization Approach for CST and Discretization Approach for VDM Interntionl Journl of Innovtive Reserch in Advnced Engineering (IJIRAE) Volume1 Issue1 (Mrch 2014) A Comprison of the Discretiztion Approch for CST nd Discretiztion Approch for VDM Omr A. A. Shib Fculty

More information

OUTPUT DELIVERY SYSTEM

OUTPUT DELIVERY SYSTEM Differences in ODS formtting for HTML with Proc Print nd Proc Report Lur L. M. Thornton, USDA-ARS, Animl Improvement Progrms Lortory, Beltsville, MD ABSTRACT While Proc Print is terrific tool for dt checking

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

Lexical Analysis: Constructing a Scanner from Regular Expressions

Lexical Analysis: Constructing a Scanner from Regular Expressions Lexicl Anlysis: Constructing Scnner from Regulr Expressions Gol Show how to construct FA to recognize ny RE This Lecture Convert RE to n nondeterministic finite utomton (NFA) Use Thompson s construction

More information

A dual of the rectangle-segmentation problem for binary matrices

A dual of the rectangle-segmentation problem for binary matrices A dul of the rectngle-segmenttion prolem for inry mtrices Thoms Klinowski Astrct We consider the prolem to decompose inry mtrix into smll numer of inry mtrices whose -entries form rectngle. We show tht

More information

Graphs with at most two trees in a forest building process

Graphs with at most two trees in a forest building process Grphs with t most two trees in forest uilding process rxiv:802.0533v [mth.co] 4 Fe 208 Steve Butler Mis Hmnk Mrie Hrdt Astrct Given grph, we cn form spnning forest y first sorting the edges in some order,

More information

Inference of node replacement graph grammars

Inference of node replacement graph grammars Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. Intelligent Dt Anlysis (27) 24 IOS Press Inference of node replcement grph grmmrs Jcek P. Kukluk, Lwrence B. Holder nd Dine J. Cook Deprtment of Computer

More information

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants A Heuristic Approch for Discovering Reference Models by Mining Process Model Vrints Chen Li 1, Mnfred Reichert 2, nd Andres Wombcher 3 1 Informtion System Group, University of Twente, The Netherlnds lic@cs.utwente.nl

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

Statistical classification of spatial relationships among mathematical symbols

Statistical classification of spatial relationships among mathematical symbols 2009 10th Interntionl Conference on Document Anlysis nd Recognition Sttisticl clssifiction of sptil reltionships mong mthemticl symbols Wl Aly, Seiichi Uchid Deprtment of Intelligent Systems, Kyushu University

More information

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table TDDD55 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing, Prt 2 Constructing Prse Tles Prse tle construction Grmmr conflict hndling Ctegories of LR Grmmrs nd Prsers Peter Fritzson, Christoph

More information

Mobile IP route optimization method for a carrier-scale IP network

Mobile IP route optimization method for a carrier-scale IP network Moile IP route optimiztion method for crrier-scle IP network Tkeshi Ihr, Hiroyuki Ohnishi, nd Ysushi Tkgi NTT Network Service Systems Lortories 3-9-11 Midori-cho, Musshino-shi, Tokyo 180-8585, Jpn Phone:

More information

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus

Unit #9 : Definite Integral Properties, Fundamental Theorem of Calculus Unit #9 : Definite Integrl Properties, Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl

More information

CSEP 573 Artificial Intelligence Winter 2016

CSEP 573 Artificial Intelligence Winter 2016 CSEP 573 Artificil Intelligence Winter 2016 Luke Zettlemoyer Problem Spces nd Serch slides from Dn Klein, Sturt Russell, Andrew Moore, Dn Weld, Pieter Abbeel, Ali Frhdi Outline Agents tht Pln Ahed Serch

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Adm Sheffer. Office hour: Tuesdys 4pm. dmsh@cltech.edu TA: Victor Kstkin. Office hour: Tuesdys 7pm. 1:00 Mondy, Wednesdy, nd Fridy. http://www.mth.cltech.edu/~2014-15/2term/m006/

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course L. Yroslvsky. Fundmentls of Digitl Imge Processing. Course 0555.330 Lecture. Imge enhncement.. Imge enhncement s n imge processing tsk. Clssifiction of imge enhncement methods Imge enhncement is processing

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Efficient Rerouting Algorithms for Congestion Mitigation

Efficient Rerouting Algorithms for Congestion Mitigation 2009 IEEE Computer Society Annul Symposium on VLSI Efficient Rerouting Algorithms for Congestion Mitigtion M. A. R. Chudhry*, Z. Asd, A. Sprintson, nd J. Hu Deprtment of Electricl nd Computer Engineering

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl component

More information

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association

9 4. CISC - Curriculum & Instruction Steering Committee. California County Superintendents Educational Services Association 9. CISC - Curriculum & Instruction Steering Committee The Winning EQUATION A HIGH QUALITY MATHEMATICS PROFESSIONAL DEVELOPMENT PROGRAM FOR TEACHERS IN GRADES THROUGH ALGEBRA II STRAND: NUMBER SENSE: Rtionl

More information

MATH 25 CLASS 5 NOTES, SEP

MATH 25 CLASS 5 NOTES, SEP MATH 25 CLASS 5 NOTES, SEP 30 2011 Contents 1. A brief diversion: reltively prime numbers 1 2. Lest common multiples 3 3. Finding ll solutions to x + by = c 4 Quick links to definitions/theorems Euclid

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. Example: Pancake Problem. Example: Pancake Problem

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. Example: Pancake Problem. Example: Pancake Problem Announcements Project : erch It s live! Due 9/. trt erly nd sk questions. It s longer thn most! Need prtner? Come up fter clss or try Pizz ections: cn go to ny, ut hve priority in your own C 88: Artificil

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

TO REGULAR EXPRESSIONS

TO REGULAR EXPRESSIONS Suject :- Computer Science Course Nme :- Theory Of Computtion DA TO REGULAR EXPRESSIONS Report Sumitted y:- Ajy Singh Meen 07000505 jysmeen@cse.iit.c.in BASIC DEINITIONS DA:- A finite stte mchine where

More information

ASTs, Regex, Parsing, and Pretty Printing

ASTs, Regex, Parsing, and Pretty Printing ASTs, Regex, Prsing, nd Pretty Printing CS 2112 Fll 2016 1 Algeric Expressions To strt, consider integer rithmetic. Suppose we hve the following 1. The lphet we will use is the digits {0, 1, 2, 3, 4, 5,

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona CSc 453 Compilers nd Systems Softwre 4 : Lexicl Anlysis II Deprtment of Computer Science University of Arizon collerg@gmil.com Copyright c 2009 Christin Collerg Implementing Automt NFAs nd DFAs cn e hrd-coded

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

documents 1. Introduction

documents 1. Introduction www.ijcsi.org 4 Efficient structurl similrity computtion etween XML documents Ali Aïtelhdj Computer Science Deprtment, Fculty of Electricl Engineering nd Computer Science Mouloud Mmmeri University of Tizi-Ouzou

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

Intermediate Information Structures

Intermediate Information Structures CPSC 335 Intermedite Informtion Structures LECTURE 13 Suffix Trees Jon Rokne Computer Science University of Clgry Cnd Modified from CMSC 423 - Todd Trengen UMD upd Preprocessing Strings We will look t

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08 CS412/413 Introduction to Compilers Tim Teitelum Lecture 4: Lexicl Anlyzers 28 Jn 08 Outline DFA stte minimiztion Lexicl nlyzers Automting lexicl nlysis Jlex lexicl nlyzer genertor CS 412/413 Spring 2008

More information

6.3 Volumes. Just as area is always positive, so is volume and our attitudes towards finding it.

6.3 Volumes. Just as area is always positive, so is volume and our attitudes towards finding it. 6.3 Volumes Just s re is lwys positive, so is volume nd our ttitudes towrds finding it. Let s review how to find the volume of regulr geometric prism, tht is, 3-dimensionl oject with two regulr fces seprted

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

CMPSC 470: Compiler Construction

CMPSC 470: Compiler Construction CMPSC 47: Compiler Construction Plese complete the following: Midterm (Type A) Nme Instruction: Mke sure you hve ll pges including this cover nd lnk pge t the end. Answer ech question in the spce provided.

More information

George Boole. IT 3123 Hardware and Software Concepts. Switching Algebra. Boolean Functions. Boolean Functions. Truth Tables

George Boole. IT 3123 Hardware and Software Concepts. Switching Algebra. Boolean Functions. Boolean Functions. Truth Tables George Boole IT 3123 Hrdwre nd Softwre Concepts My 28 Digitl Logic The Little Mn Computer 1815 1864 British mthemticin nd philosopher Mny contriutions to mthemtics. Boolen lger: n lger over finite sets

More information

The dictionary model allows several consecutive symbols, called phrases

The dictionary model allows several consecutive symbols, called phrases A dptive Huffmn nd rithmetic methods re universl in the sense tht the encoder cn dpt to the sttistics of the source. But, dpttion is computtionlly expensive, prticulrly when k-th order Mrkov pproximtion

More information

Spectral Analysis of MCDF Operations in Image Processing

Spectral Analysis of MCDF Operations in Image Processing Spectrl Anlysis of MCDF Opertions in Imge Processing ZHIQIANG MA 1,2 WANWU GUO 3 1 School of Computer Science, Northest Norml University Chngchun, Jilin, Chin 2 Deprtment of Computer Science, JilinUniversity

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

Parallel Square and Cube Computations

Parallel Square and Cube Computations Prllel Squre nd Cube Computtions Albert A. Liddicot nd Michel J. Flynn Computer Systems Lbortory, Deprtment of Electricl Engineering Stnford University Gtes Building 5 Serr Mll, Stnford, CA 945, USA liddicot@stnford.edu

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2009/2010 1 st Semester Teste Jnury 9, 2010 Durtion: 2h00 - No extr mteril llowed. This includes notes, scrtch pper, clcultor, etc. - Give your nswers in the ville spce

More information

Lecture T4: Pattern Matching

Lecture T4: Pattern Matching Introduction to Theoreticl CS Lecture T4: Pttern Mtching Two fundmentl questions. Wht cn computer do? How fst cn it do it? Generl pproch. Don t tlk bout specific mchines or problems. Consider miniml bstrct

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Instructor: Adm Sheffer. TA: Cosmin Pohot. 1pm Mondys, Wednesdys, nd Fridys. http://mth.cltech.edu/~2015-16/2term/m006/ Min ook: Introduction to Grph

More information

Registering as a HPE Reseller. Quick Reference Guide for new Partners in Asia Pacific

Registering as a HPE Reseller. Quick Reference Guide for new Partners in Asia Pacific Registering s HPE Reseller Quick Reference Guide for new Prtners in Asi Pcific Registering s new Reseller prtner There re five min steps to e new Reseller prtner. Crete your Appliction Copyright 2017 Hewlett

More information

Position Heaps: A Simple and Dynamic Text Indexing Data Structure

Position Heaps: A Simple and Dynamic Text Indexing Data Structure Position Heps: A Simple nd Dynmic Text Indexing Dt Structure Andrzej Ehrenfeucht, Ross M. McConnell, Niss Osheim, Sung-Whn Woo Dept. of Computer Science, 40 UCB, University of Colordo t Boulder, Boulder,

More information

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona Implementing utomt Sc 5 ompilers nd Systems Softwre : Lexicl nlysis II Deprtment of omputer Science University of rizon collerg@gmil.com opyright c 009 hristin ollerg NFs nd DFs cn e hrd-coded using this

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

An Algorithm for Enumerating All Maximal Tree Patterns Without Duplication Using Succinct Data Structure

An Algorithm for Enumerating All Maximal Tree Patterns Without Duplication Using Succinct Data Structure , Mrch 12-14, 2014, Hong Kong An Algorithm for Enumerting All Mximl Tree Ptterns Without Dupliction Using Succinct Dt Structure Yuko ITOKAWA, Tomoyuki UCHIDA nd Motoki SANO Astrct In order to extrct structured

More information

ZZ - Advanced Math Review 2017

ZZ - Advanced Math Review 2017 ZZ - Advnced Mth Review Mtrix Multipliction Given! nd! find the sum of the elements of the product BA First, rewrite the mtrices in the correct order to multiply The product is BA hs order x since B is

More information

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig CS311H: Discrete Mthemtics Grph Theory IV Instructor: Işıl Dillig Instructor: Işıl Dillig, CS311H: Discrete Mthemtics Grph Theory IV 1/25 A Non-plnr Grph Regions of Plnr Grph The plnr representtion of

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

On Achieving Optimal Throughput with Network Coding

On Achieving Optimal Throughput with Network Coding In IEEE INFOCOM On Achieving Optiml Throughput with Network Coding Zongpeng Li, Bochun Li, Dn Jing, Lp Chi Lu Astrct With the constrints of network topologies nd link cpcities, chieving the optiml end-to-end

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

Text mining: bag of words representation and beyond it

Text mining: bag of words representation and beyond it Text mining: bg of words representtion nd beyond it Jsmink Dobš Fculty of Orgniztion nd Informtics University of Zgreb 1 Outline Definition of text mining Vector spce model or Bg of words representtion

More information

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center Resource Overview Quntile Mesure: Skill or Concept: 80Q Multiply two frctions or frction nd whole numer. (QT N ) Excerpted from: The Mth Lerning Center PO Box 99, Slem, Oregon 9709 099 www.mthlerningcenter.org

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information