OpenNWA: Nested-Word Automata

Size: px
Start display at page:

Download "OpenNWA: Nested-Word Automata"

Transcription

1 OpenNWA: Nested-Word Automt Evn Driscoll 1 Adity Thkur 1 Amnd Burton 1 Thoms Reps 1,2 1 University of Wisconsin Mdison Computer Sciences Deprtment {driscoll,di,urton,reps}@cs.wisc.edu 2 GrmmTech, Inc. Astrct Nested-word utomt (NWAs) re lnguge formlism tht helps ridge the gp etween finite-stte utomt nd pushdown utomt. NWAs cn express some context-free properties, such s prenthesis mtching, yet retin ll the desirle closure chrcteristics of finite-stte utomt. This document descries OpenNWA, C++ implementtion of NWAs. Open- NWA provides the expected utomt-theoretic opertions, such s intersection, determiniztion, nd complementtion, s well s query opertions. It is pckged with WALi W eighted Automt Lirry nd interopertes closely with the weighted pushdown system portions of WALi. The OpenNWA lirry is ville from It is pckged with the W eighted Automt Lirry, WALi. WALi 4.0 is pired with the first pulic relese of OpenNWA, which is wht this document descries. Building instructions cn e found in README.txt in the root directory of the OpenNWA/WALi distriution. We ssume the reder is fmilir with nested words nd nested-word utomt; if not, App. A provides the definition we use nd Alur et l. [2, 3] descrie them in detil. Our implementtion corresponds to the definition from the DLT 2006 pper [2]; reltive to follow-up work [3], this definition removes the distinction etween the mchine s liner nd hierrchicl sttes, nd on cll trnsition lwys stcks the source stte. In ddition, we only require tht the finl liner stte e ccepting for n NWA to ccept word. (Following [3], this specific vrint is wekly-hierrchicl, linerly-ccepting NWA.) The terminology used in the interfce of the pckge is gered towrd progrm nlysis, so we use terms such s cll site nd return site to refer to sttes of the NWA even though they tke on different menings in different contexts. In contrst to the stndrd definitions, we llow internl ε trnsitions (ut not cll or return ε trnsitions). We lso llow wild trnsitions, which mtch ny single symol. (Wilds cn pper on clls nd returns.) Supported y NSF under grnts CCF-{ , , }, y ONR under grnts N00014-{ , 10-M-0251}, y ARL under grnt W911NF , nd y AFRL under grnts FA nd FA C Any opinions, findings, nd conclusions or recommendtions expressed in this mteril re those of the uthor(s) nd do not necessrily reflect the views of the sponsoring gencies. 1

2 Contents 1 Lirry Overview NWA core clsses NWA non-memer functions Supporting fetures from WALi Simple exmple use of the lirry The NestedWord clss 8 3 The Nw clss Construction, copying, ssignment, nd clering Simple mnipultions Client Informtion The opennw::query nmespce Querying informtion out n utomton s trnsitions Querying other structurl spects of n utomton Querying properties of n utomton s lnguge NWA seriliztion Prser Exmples NWA description formt Building NWAs from other NWAs (nmespce opennw::construct) Union Intersection Conctention Kleene str Reverse Determinize Complement Conversions eween WPDSs nd NWAs (nmespce opennw::nw pds) WPDS to NWA NWA to WPDS Forwrds flow stcking clls Bckwrds flow stcking clls Forwrds flow stcking returns Bckwrds flow stcking returns A Nested-Word Automt 40 B Determinize 42 C Tles 44 2

3 1 Lirry Overview The core of the NWA lirry is in the nmespce opennw. It includes the clsses NestedWord, Nw, nd others. There re lrge numer of non-memer functions tht operte on NWAs. These re divided into three su-nmespces: opennw::query, opennw::construct, nd opennw::nw pds. There is lso n experimentl prser in opennw. Finlly, there re some opertions nd clsses provided y WALi tht OpenNWA uses. This mostly consists of the key-hndling code, ut lso includes the WPDS portions of WALi. Throughout this document, include pths re reltive to the Source directory from the WALi/OpenNWA distriution. Section 1.4 illustrtes simple smple use of the lirry. 1.1 NWA core clsses These types live in nmespce opennw: NestedWord Implements single nested word. Currently the only opertion tht cn e performed using one is checking whether NestedWord is memer of n Nw s lnguge. Defined in opennw/nestedword.hpp. See 2. Nw Implements nested-word utomton. Defined in opennw/nw.hpp with forwrd declrtion in opennw/nwfwd.hpp. See 3. ( 5 lso discusses memer functions relted to serilizing n NWA.) ref ptr<t> This is reference-counted, intrusive smrt-pointer templte. ( Intrusive mens tht the pointed-to type must e modified to include count field used y the pointer. This is most conveniently done y suclssing wli::countle.) Defined in wli/ref ptr.hpp. However, using wli::ref ptr declrtion rings this type into the opennw nmespce s well; we suggest preferring the ltter version. In ddition, there re typedefs of oth ref ptr<nw> nd ref ptr<nestedword>, nd we suggest preferring those nmes. NwRefPtr NwRefPtr is typedef of ref ptr<nw>. Defined in opennw/nwfwd.hpp. Stte nd Symol These re oth typedefs of wli::key (see elow). Both re defined in opennw/nwfwd.hpp. Note: the use of keys for oth sttes nd symols mens tht they cn e confused from types perspective, so use cution tht this does not hppen. In ddition, future version of the lirry my chnge Stte nd Symol to e distinct types. StteSet nd SymolSet These re typedefs of std::sets of the corresponding type. (Client code should not rely on this fct; it could chnge to e n unordered mp or other similr continer.) Both re defined in opennw/nwfwd.hpp. ClientInfo This clss holds client-specific informtion tht is ssocited with stte. Client code cn suclss ClientInfo nd use functions of the Nw clss to ttch instnces to sttes. To use, include opennw/clientinfo.hpp. See 3.3. WeightGen This strct clss descries how to ssign weights to trnsitions of the NWA, nd is used oth when converting n NWA to PDS nd when doing prestr/poststr queries on the NWA directly. To use, include opennw/weightgen.hpp. See

4 1.2 NWA non-memer functions The lirry provides lrge numer of non-memer functions for operting on NWAs. These re prtitioned into the following nmespces: opennw::query This nmespce provides functions for querying n utomton. The functions in this nmespce re: Functions tht query trnsitions of the NWA, returning informtion out one of the sttes or the symol in trnsitions tht meet some criteri. (For exmple, give me ll sttes tht pper s the trget stte of n internl trnsition with this source stte. ) See 4.1. Determining whether two NWAs hve ny sttes in common. See 4.2. Determining whether n Nw is deterministic. See 4.2. Determining whether NestedWord is memer of the lnguge of n NWA. See 4.3. Determining whether the lnguge of n NWA is empty. See 4.3. Determining whether the lnguges of two NWAs re equl, or if one is suset of the other. See 4.3. Prestr nd poststr opertions on n NWA. See 4.3. opennw::construct This nmespce provides functions for constructing n NWA. Functions in this nmespce re stndrd, utomt-theoretic opertions such s intersection, union, conctention, Kleene str, nd reversl. See 6. opennw::nw pds This nmespce provides functions for converting etween NWAs nd WPDSs, nd relted functions. (Note tht construction of n NWA from WPDS is in this nmespce, not in construct.) See 7. opennw This nmespce, in ddition to the items lredy discussed, provides n experimentl prser for seriliztion formt of NWAs. See Supporting fetures from WALi The NWA lirry uses these portions of the stndrd WALi lirry. Unless otherwise specified, they re in nmespce wli. See [5]. Key A Key is unique identifier nming sttes nd symols in n NWA or NestedWord. It is ctully typedef of n integer, though client code should not rely on this precise type. Declred in wli/key.hpp. Key getkey(...) This function produces key from some input. If the input hs not een seen efore, it returns new key; if it hs, then it returns the sme key s efore. (One cn think of this s trnslting whtever unique identifier is known y the client code to the Key needed y OpenNWA nd WALi.) Overlods of this function re provided for the following types: std::string const & chr * int std::set<key> const & Key, Key (this is two-rgument version of getkey) key src t The finl overlod, for key src t, is ref ptr to KeySource oject. KeySource is n strct clss tht client code cn suclss to provide trnsltion to Keys for ritrry types. 4

5 All overlods re declred in wli/key.hpp. std::string key2str(key k) key2str is n inverse of getkey, except tht it lwys returns string representtion. If the key ws creted with getkey(std::string const &) in the first plce, this returns copy of the originl string. 1 Declred in wli/key.hpp. wli::wf::wfa This is weighted finite utomton. It is used in NWA poststr nd prestr queries in much the sme mnner s in WPDS poststr nd prestr queries. Defined in wli/wf/wfa.hpp. wli::wpds::wpds This is weighted pushdown system. NWAs use WPDSs ehind the scenes when doing poststr nd prestr queries, nd the lirry provides conversion routines etween the two utomt types. Defined in wli/wpds/wpds.hpp. 1.4 Simple exmple use of the lirry The following is n illustrtion of sic opertions of the lirry. The full code is ville in the file Doc/NWA tex/exmple/exmple.cpp in the OpenNWA distriution. // This progrm will wlk through creting the NWA shown in Figure 4 of the // ssocited documenttion, nd reverse it (to get the result of Figure 12). // We then mke NestedWord of the one word in Figure 4 s lnguge nd // nother NestedWord of the one word in Figure 12 s lnguge, then test tht // ech is memer of just the pproprite NWA. #include <iostrem> using std::cout; #include "wli/nw/nwa.hpp" #include "wli/nw/nestedword.hpp" #include "wli/nw/construct/reverse.hpp" #include "wli/nw/query/lnguge.hpp" using wli::getkey; using nmespce wli::nw; using wli::nw::construct::reverse; using wli::nw::query::lngugecontins; // These symols re used in the NWA nd oth words Symol const sym_ = getkey(""); Symol const sym_ = getkey(""); Symol const sym_cll = getkey("cll"); Symol const sym_ret = getkey("ret"); /// Cretes the NWA shown in Figure 4 of the Wli NWA documenttion NWA crete_figure_4() { NWA out; // Trnslte the nmes of the sttes then symols to Wli identifiers Stte strt = getkey("strt"); 1 For ints, it is the textul representtion of the originl numer. For set<key>, it is the set printed in stndrd set nottion. For pir of keys, it is the pir printed s n ordered pir. In generl, it is the result of clling key src t::print. See the print function in ech of the files wli/*source.cpp. 5

6 Stte cll = getkey("cll"); Stte entry = getkey("entry"); Stte stte = getkey("stte"); Stte exit = getkey("exit"); Stte return_ = getkey("return"); Stte finish = getkey("finish"); // Add the trnsitions out.ddinternltrns(strt, sym_, cll); out.ddclltrns(cll, sym_cll, entry); out.ddinternltrns(entry, sym_, stte); out.ddinternltrns(stte, sym_, exit); out.ddreturntrns(exit, cll, sym_ret, return_); out.ddinternltrns(return_, sym_, finish); // Set the initil nd finl sttes out.ddinitilstte(strt); out.ddfinlstte(finish); } return out; /// Cretes (the one) word in the lnguge of Figure 4 s NWA NestedWord crete_forwrds_word() { NestedWord out; out.ppendinternl(sym_); out.ppendcll(sym_cll); out.ppendinternl(sym_); out.ppendinternl(sym_); out.ppendreturn(sym_ret); out.ppendinternl(sym_); return out; } /// Cretes (the one) word in the lnguge of the reverse of Figure 4 s NWA NestedWord crete_ckwrds_word() {... } int min() { // Crete the NWA nd reversed NWA NWA fig4 = crete_figure_4(); NWARefPtr fig4_reversed = reverse(fig4); // These re the words we re testing NestedWord forwrds_word = crete_forwrds_word(); NestedWord ckwrds_word = crete_ckwrds_word(); // Now do the tests cout << "fig4 contins:\n" 6

7 << " forwrds_word [expect 1] : " << lngugecontins(fig4, forwrds_word) << "\n" << " ckwrds_word [expect 0] : " << lngugecontins(fig4, ckwrds_word) << "\n" } << "fig4_reversed contins:\n" << " forwrds_word [expect 0] : " << lngugecontins(*fig4_reversed, forwrds_word) << "\n" << " ckwrds_word [expect 1] : " << lngugecontins(*fig4_reversed, ckwrds_word) << "\n"; 7

8 2 The NestedWord clss The clss opennw::nestedword provides support for uilding nd iterting over nested words; currently there is no support for modifying them (other thn ppending). The representtion used y this clss is closer to tht of word in visily-pushdown lnguge [3]. It holds the liner contents of word, ut does not store the nesting reltion explicitly. Insted, ech position in the word is nnotted with whether it is n internl, cll, or return position. The nesting reltion is induced y the mtchings etween clls nd returns. A position in word is represented y the (nested) structure NestedWord::Position. Position itself declres n enumertion Type, which hs the possile vlues CllType, InternlType, or ReturnType. A Position oject hs two (pulic) fields: Symol symol nd Position::Type type; these hold the symol t tht position nd the type of the position. The NestedWord clss hs just seven functions: void NestedWord::ppendInternl(Symol s) void NestedWord::ppendCll(Symol s) void NestedWord::ppendReturn(Symol s) These ppend the symol s to the liner word, nd nnotte tht position s n internl, cll, or return position, respectively. void NestedWord::ppend(Position p) Appends p to the word. size t NestedWord::size() const Returns the length of the word. NestedWord::const itertor NestedWord::egin() const NestedWord::const itertor NestedWord::end() const These return n itertor to the strt or end of the nested word. The type returned y dereferencing these itertors is Position oject. (There is no non-const version of these functions.) The only opertion the lirry currently supports on NestedWords, esides uilding them, is checking whether nested word is in n NWA s lnguge. This is done y the function opennw::query::lngugecontins(nw const &, NestedWord const &); see

9 3 The Nw clss 3.1 Construction, copying, ssignment, nd clering The NWA provides two constructors: the defult constructor nd the copy constructor. The defult constructor cretes n empty NWA. Therefter, client code cn dd or remove sttes, dd or remove symols, dd or remove trnsitions, nd set the sttus of certin sttes s initil or finl. The following functions re sic NWA opertions: Nw::Nw() Constructs n empty NWA. Nw::Nw(Nw const & other) Copies other; the utomt will not shre structure. Any client informtion tht is present is cloned. Nw& Nw::opertor=(Nw const & other) Assigns other to this; the utomt will not shre structure. Client informtion is cloned. ool Nw::opertor==(Nw const & other) Determines whether the two utomt re structurlly equl tht is, they contin exctly the sme set of sttes (including initil nd finl sttes), symols, nd trnsitions nd returns the result. (To test lnguge equlity, use the lngugeequls function covered in 4.) void Nw::cler() Removes ll sttes, symols, nd trnsitions from the NWA. 3.2 Simple mnipultions An Nw oject trcks the set of sttes, symols, nd trnsitions in the utomton. It lso trcks the set of initil nd ccepting sttes. Counting ech kind of trnsition seprtely, this gives 7 kinds of entities tht client code my need to mnipulte. For ech kind of entity, there re Nw memer functions to dd nd remove single entity, check whether prticulr entity is in the utomton, count the numer of entities of prticulr type, remove ll entities of prticulr type, nd retrieve the set of entities in the NWA. In ddition, there re memer functions for counting nd clering ll trnsitions, regrdless of the type. The nmes of these functions re very regulr, ut T. 1 (App. C) gives list. The only potentil difficulties re in the interctions etween the functions. Nturlly, removing stte or symol lso removes ll trnsitions involving it; likewise, clering ll sttes or clering ll symols lso clers ll trnsitions. (Clling removeinitilstte or removefinlstte does not cuse ripple effects.) Adding trnsition implicitly dds the sttes nd symol involved in tht trnsition if they re not lredy present; hence it is not necessry to explicitly dd ll sttes nd symols efore dding trnsitions. (It is quite resonle to uild n NWA y just dding initil sttes, finl sttes, nd trnsitions.) The system supports two met-symols: opennw::epsilon (ε) nd opennw::wild (@). EPSILON is the stndrd ε from utomt theory; it denotes tht trnsition cn e trversed without mtching nd consuming n input symol. Epsilon symols cnnot lel cll or return trnsitions. WILD is the ny symol; it mtches ny single symol. Becuse the NWA lphet is not fixed, the ctul symols tht WILD stnds for is fluid. (This is useful if you do not know ll possile symols t the time you construct the NWA.) Note: EPSILON nd WILD re not explicit elements of Σ; see T. 1, footnote 5. 9

10 3.3 Client Informtion Ech stte in the NWA cn e ssocited with some client-specific informtion. To utilize this functionlity, client code must suclss ClientInfo nd override ClientInfo * clone() const. Once done, client informtion cn e ttched or retrieved using the following two functions (oth memers of Nw): ref ptr<clientinfo> Nw::getClientInfo(Stte st) const Returns the client-specific informtion ssocited with the given stte, st. void Nw::setClientInfo(Stte st, ref ptr<clientinfo> ci) Sets the client-specific informtion, ci, ssocited with the given stte, st. Client informtion is trcked through the use of ref ptrs, so the progrmmer must consider the stndrd lising nd lifetime considertions imposed y using smrt pointers. (ClientInfo supplies the count field needed y ref ptr.) Opertions tht clone client informtion, such s copying n NWA, cn lso result in chnges in lising: if the client informtion for sttes p nd q re lised in NWA N tht is then copied to NWA N, the copies of p nd q in N will hve seprte copies of the client informtion. In ddition, you my wish to suclss Nw itself. There re severl virtul helper functions tht re clled during intersection nd determiniztion to compute the client info for the resulting utomton; these cn e overriden to customize the ehvior. (This design is n instnce of the templte method design pttern.) The list of these helper methods is: intersectclientinfointernl intersectclientinfocll intersectclientinforeturn mergeclientinfo mergeclientinfointernl mergeclientinfocll mergeclientinforeturn (The first three re used y intersection nd the remining four y determiniztion.) In ddition, stteintersect nd trnsitionintersect cn further customize the ehvior of intersection, including computtion of client informtion. 6.2 nd 6.6 hve more informtion on intersection nd determiniztion, nd discuss these functions further. 10

11 4 The opennw::query nmespce The opennw::query nmespce provides functions relted to querying oth the structure nd lnguge of n utomton. In this section, we first discuss how to retrieve informtion from n utomton s trnsitions ( 4.1) then move on to other queries out n utomton proper. Following tht, we discuss queries out the lnguge of n utomton ( 4.3). 4.1 Querying informtion out n utomton s trnsitions The opennw::query nmespce provides lrge numer of functions for retrieving informtion out n Nw s trnsitions. The questions nswered y these functions re of the form, e.g., wht re ll symols tht pper on return trnsition where the cll predecessor stte is s1 nd the return site stte is s1? Functions return informtion either out ll trnsitions or out trnsitions of prticulr type (internl, cll, or return). The nmes of these functions hve regulr ptterns sed on the informtion tht is known nd desired, ut the rules for producing the nmes re perhps not s simple s they could e. Becuse of this, Ts. 2 5 in App. C provide quick reference for the functions. The tles do not hve ll the informtion one needs to know in order to cll them, nd the entries use some shorthnds; ut they provide enough informtion for specifics to e looked up in the corresponding heder or Doxygen documenttion. The cption of ech tle gives the heder tht contins the functions listed. In the interest of spce, the types of the function rguments re omitted, ut they re likely to e wht you expect. (The numer of rguments is lso lwys sufficient to uniquely identify the function ecuse ll types re just wli::keys nywy.) Almost ll functions return StteSet, SymolSet, or std::set of pirs of stte nd symol. (Those tht return set of pirs re mrked specilly.) 4.2 Querying other structurl spects of n utomton The NWA lirry two dditionl functions for querying ttriutes of NWAs themselves. These functions re declred in the heder opennw/query/utomton.hpp (nd, of course, re defined in the nmespce opennw::query). ool isdeterministic(nw const & nw) Computes whether the given NWA is deterministic nd returns the result. Wrning: The result is not cched in the NWA, nd s presently-implemented, this function is very inefficient. Improvements to this will come in lter version of the lirry. ool sttesoverlp(nw const &, Nw const & ) Computes whether the two NWAs shre ny sttes nd returns the result. size t numcllsites(nw const & ) size t numentrysites(nw const & ) size t numexitsites(nw const & ) size t numreturnsites(nw const & ) Returns the indicted sttistic of the given utomton. A cll site is stte which is in the cll position of cll trnsition, etc. 4.3 Querying properties of n utomton s lnguge The NWA lirry hs severl functions for querying properties of the lnguge of n NWA (or how the lnguges of two NWAs relte), nd it lso supports the poststr nd prestr queries used in progrm nlysis. 11

12 The following functions perform stndrd lnguge-theoretic queries. They re locted in nmespce opennw::query nd re declred in opennw/query/lnguge.hpp: ool lngugecontins(nw const & nw, NestedWord const & word) Computes whether word is memer of nw s lnguge, nd returns the result. (It simultes the NWA in nondeterministic fshion; the NWA is not determinized to nswer this query.) The worst-cse running time is O( word w log w), where w is the width of the nondeterminism of the NWA (i.e. the mximum numer of simultneous configurtions tht re possile when reding word) 2. ool lngugeisempty(nw const & nw) Computes whether the NWA s lnguge is empty, nd returns the result. ref ptr<nestedword> getsomeacceptedword(nw const & nw) If nw s lnguge is empty, returns NULL. Otherwise returns n ritrry word in L(nw). (If nw ccepts unlnced words, the return vlue my e unlnced.) ref ptr<nestedword> getsomeshortestacceptedword(nw const & nw) If nw s lnguge is empty, returns NULL. Otherwise returns n ritrry shortest word in L(nw). (If nw ccepts unlnced words, the return vlue my e unlnced.) ool lngugesuseteq(nw const & left, Nw const & right) Computes whether L(left) L(right) nd returns the result. Since this procedure must determinize right, the worst-cse running time is O(2 n2 n 2 m) where n is the numer of sttes in right nd m is the numer of sttes in left. ool lngugeequls(nw const &, Nw const & ) Performs lngugesuseteq(, ) && lngugesuseteq(, ). This determinizes oth mchines; the worst-cse running time is O(2 n2 n 2 m + 2 m2 m 2 n). There re lso functions for performing poststr nd prestr query on n NWA. These work y converting the NWA to WPDS (see 7 nd, in prticulr, 7.2) using the function NwToWpdsClls then running the query on the PDS. Conceptully, the trnsition system queried y prestr nd poststr is the spce of configurtions of the NWA. A configurtion is pir q, s, where q is the current stte nd s is the stck of unmtched cll sites. (More exctly, it is the stck of sttes the mchine ws in efore reding symol in cll position tht hs not yet een mtched.) In relity, the trnsition system is the configurtion spce of the WPDS resulting from clling NwToWpdsClls (see 7 nd 7.2.1); tht spce is p, t, where p is WPDS stte nd t is the stck of unmtched cll sites with the NWA s current stte q dded on top. There re two vrints of ech function. One returns the WFA tht results from the query, nd the other stores the WFA in n output prmeter. The WFA type is wli::wf::wfa. The functions tht perform poststr nd prestr re memers of Nw. (This interfce will likely chnge in lter relese, in which cse the equivlent functionlity will e moved into nmespce query nd these versions will e deprected.) They re: WFA Nw::prestr(WFA const & input, WeightGen & wg) void Nw::prestr(WFA const & input, WFA & output, WeightGen & wg) WFA Nw::poststr(WFA const & input, WeightGen & wg) void Nw::poststr(WFA const & input, WFA & output, WeightGen & wg) Computes the prestr or poststr of the configurtions specified y Computes the poststr of the configurtions specified y input, using the weights generted y wg. Either returns the result or stores it in the prmeter output. 2 We count stte lookups, dditions, etc. s constnt time even though they re ctully logrithmic, nd occsionlly liner. 12

13 5 NWA seriliztion It is possile to serilize n NWA in two different supported formts. The first is description lnguge of our own cretion, descried in most of the rest of this section. We lso provide prser for this formt, with minor cvets. The grmmr tht the prser uses is defined formlly in 5.3. The second is output in Grphviz Dot formt. The output is nturl. The color of n edge denotes the type of trnsition: lck edges re internl trnsitions, green edges re cll trnsitions, nd red edges re sets of return trnsitions with the sme exit site, return site, nd symol. Return trnsitions re leled with the symol for its trnsitions nd the list of cll predecessors. Becuse it is possile to wind up with extremely long key nmes (especilly in determinized NWA) tht cn essentilly destroy the ility of Dot to render useful output, y defult ny stte nmes longer thn 20 chrcters re replced y the numericl vlue of the key. A cll to print dot cn specify different ehvior, however. All figures lter in the document hve een creted vi the Dot output (with miniml hnd editing in some cses). There is memer function to produce ech type of output: std::ostrem& Nw::print(std::ostrem & strem) const Writes the custom output descried in this section to strem, returning strem. (This method is inherited from the wli::printle strct clss.) std::ostrem& Nw::print dot(ostrem & os, std::string const & n, ool rev = true) const Outputs Dot description of the NWA to the strem os, returning os. The nme of the grph (digrph "n" {...}) is given y n. Finlly, if rev is set to flse, the forementioned revition step ove for stte nmes is not done. Use this freedom t your peril. 5.1 Prser The NWA lirry contins semi-experimentl prser for descriptions of NWAs. The formt is descried elow, nd mtches the output of Nw::print(std::ostrem &), provided tht the result of clling key2str on the stte nd symol keys produces strings tht mtch the description of ctul-nmes in this grmmr. Note: This is experimentl code; in prticulr, the error-hndling in it is siclly non-existent. We detect when the input does not mtch the grmmr, ut we sometimes signl such input filures with ssertion violtions. In future version of the lirry, we will report errors more grcefully, lmost certinly through exceptions. There re two functions tht prse NWA descriptions, oth declred in opennw/nwprser.hpp in nmespce opennw: NwRefPtr red nw(std::istrem & is, std::string * nme = NULL) Reds description of single NWA from is nd returns it. The seriliztion formt llows n optionl nme for n NWA; if one is specified nd nme is non-null, then the nme is stored in the string pointed to y nme. std::mp<std::string, NwRefPtr> red nw proc set(std::istrem &) Reds the entire input strem, returning mp from the nme of ech utomton description to the prsed NWA. (This form ws originlly creted to red NWAs for n entire progrm formtted with ech procedure in seprte NWA. The nme of ech NWA ws the nme of the procedure in the originl progrm.) The return type is typedefed to the nme ProcedureMp, defined in the sme heder nd nmespce. 13

14 Q0: Strt Qf: Finish Delt_i: (Strt,, Cll) Delt_i: (Entry,, Stte) Delt_i: (Stte,, Exit) Delt_i: (Return,, Finish) Delt_c: (Cll, cll, Entry) Delt_r: (Exit, Cll, ret, Return) Figure 1: Serilized form of the NWA depicted in Fig Exmples Figs. 1 nd 2 show exmples of the NWA seriliztion formt, illustrting oth the generl form nd some of the flexiilities in wht precise formt is ccepted. 5.3 NWA description formt This section descries the grmmr of the file formt. Note: Some chrcteristics in the description elow hve the following phrse s prt of their description: you should not rely on this. In such cses, we reserve the right to chnge this ehvior in future versions. The grmmr for n NWA is s follows. In order to ccept couple of slightly different output formts we hve used in the pst, there re some choices in whether rces re present nd other spects. A single input strem cn contin either single NWA (just series of locks) or multiple NWAs. If you would like to descrie multiple NWAs, ech individul strts with the literl nw. nw-description ::= ( nw nme? :?)? {? lock + }? Brces re required if nw is present nd the nme is sent in order to distinguish the first lock heder (Q:, Q0:, or Qf:) from the nme of the NWA. An NWA is sequence of locks; ech lock specifies one or more sttes, symols, or trnsitions. lock ::= stte-lock sigm-lock delt-lock There cn e more thn one lock of given type; e.g. it is perfectly fine to specify ll trnsitions in one lock (s in Fig. 2), one lock per trnsition (s in Fig. 1), or ny mixture. The lock heder specifies wht kind of lock it is, nd the ody is comm-seprted list of whtever entity the lock heder specifies (e.g., sttes). The ody my e surrounded y curly rces, ut they re not required unless the ody is empty. Stte locks cn specify sttes, initil sttes, or ccepting sttes; these re denoted y the lock heders Q:, Q0:, nd Qf:, respectively. 14

15 Q: { Begin (=2), End (=3), Entry (=4), Exit (=6), ( Begin, prime ) (=10), ( End, prime ) (=11), ( Entry, prime ) (=12), ( Exit, prime ) (=13)} Q0: { Begin (=2), ( Begin, prime ) (=10)} Qf: { End (=3), ( Begin, prime ) (=10)} Sigm: { cll,, ret} Delt_c: { (Begin (=2), cll, Entry (=4) ), (( Begin, prime ) (=10), cll, Entry (=4) ) } Delt_i: { (Entry (=4),, Exit (=6) ), (( Entry, prime ) (=12),, ( Exit, prime ) (=13) ) } Delt_r: { (Exit (=6), Begin (=2), ret, End (=3) ), (Exit (=6), Begin (=2), ret, ( Begin, prime ) (=10) ), (Exit (=6), ( Begin, prime ) (=10), ret, ( Begin, prime ) (=10) ), (Exit (=6), ( Begin, prime ) (=10), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), Begin (=2), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), Begin (=2), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), End (=3), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), End (=3), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), Entry (=4), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), Entry (=4), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), Exit (=6), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), Exit (=6), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), ( Begin, prime ) (=10), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), ( Begin, prime ) (=10), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), ( End, prime ) (=11), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), ( End, prime ) (=11), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), ( Entry, prime ) (=12), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), ( Entry, prime ) (=12), ret, ( End, prime ) (=11) ), (( Exit, prime ) (=13), ( Exit, prime ) (=13), ret, ( Begin, prime ) (=10) ), (( Exit, prime ) (=13), ( Exit, prime ) (=13), ret, ( End, prime ) (=11) ) } Figure 2: Serilized form of the NWA depicted in Fig

16 stte-lock ::= Q: nme-list Q0: nme-list Qf: nme-list (It is not necessry to explicitly list ll sttes; if stte is listed in trnsition, it is implicitly dded to the mchine s needed. Thus, it is quite resonle to hve mchine description with no Q: locks.) A nme-list is simply comm-seprte list of nmes of sttes. nme-list ::= {? ( nme (, nme ) )? }? Recll tht if the list is empty, it must e followed y } (or EOF) to distinguish the next lock heder from the nme of stte. The grmmr for nme will e specified elow. Symol locks re just like nme locks, except tht they specify symol nmes. The lock heder is simply sigm. sigm-lock ::= sigm: nme-list A trnsition lock cn list internl, cll, or return trnsitions, denoted y the lock heders Delt i, Delt c, nd Delt r, respectively. delt-lock ::= Delt i: triple-list Delt c: triple-list Delt r: qud-list The odies in ech cse re simply comm-seprted list of triples or quds (like nme-list, if these re empty, they must e followed y } ): triple-list ::= {? ( triple (, triple ) )? }? qud-list ::= {? ( qud (, qud ) )? }? nd ech triple (resp., qud) is 3-tuple (4-tuple) of nmes: triple ::= ( nme, nme, nme ) qud ::= ( nme, nme, nme, nme ) All tht remins is to define the grmmr for nme. Becuse the formt grew orgniclly nd we wished to keep comptiility with existing log files (which could previously not e utomticlly prsed), the nme terminl is it convoluted. The nme consists of the ctul nme of the stte or symol in question, followed y n optionl prenthesis-delimited token tht is ignored. (In the output of print(), the nme is the result of key2str nd the optionl token is the ctul numeric vlue of the key.) Before we discuss the grmmr of nme, the following shows some exmples of strings tht re nmes: 123 xyz 2y 123 (=4) xyz (=42) 2y (=o3oth) (,)(x,y) (, ) (=32) <1, 2> nd strings which re not nmes: hello world hello world (=32) (unmtched 16

17 The ctul grmmr is: nme ::= ctul-nme dummy-token? dummy-token ::= ( ( { ( }) ) The ctul-nme portion generlly ehves like stndrd progrmming-lnguge identifier, ut with one importnt difference: it cn contin lnced prentheses, nd ny chrcters re llowed within set of prentheses. (We ctully llow lrger set of chrcters thn is typicl, ut you should not rely on this. Surround your nmes with prentheses if you use specil chrcters.) We cn now define the grmmr of ctul-nme. ctul-nme ::= unit + unit ::= norml-chr pren-group norml-chr ::= chrcter other thn whitespce,,, or pren (std::isspce is used to determine whether chrcter is whitespce.) left-pren ::= ( { [ < right-pren ::= ) } ] > non-pren ::= chrcter other thn ny of the ove pren-group ::= left-pren ( non-pren pren-group ) right-pren (Finlly, we do not currently demnd tht the types of prens mtch up: e.g., (1, 2, 3] is vlid nme. You should not rely on this feture.) 17

18 6 Building NWAs from other NWAs (nmespce opennw::construct) The lirry provides functions for performing utomt-theoretic opertions upon NWAs. The supported opertions re union, intersection, conctention, reversl, Kleene str, complement, nd determiniztion. The lirry supports two interfces to ech of these opertions. In one, the opertion lloctes n NWA with new, performs the construction, nd returns NwRefPtr to the result. In the other, the opertion tkes reference to n NWA, clers it, nd constructs the result in-plce. The first form is usully more convenient to use, nd cretes mini lnguge for set expressions; the second form mkes it possile to store the result of n opertion in suclss of Nw, or n existing Nw oject. Ech of these functions is in the nmespce opennw::construct: NwRefPtr unionnw(nw const &, Nw const & ) void unionnw(nw & out, Nw const &, Nw const & ) Computes the union of the NWAs nd, either returning the result or storing it in out. See Section 6.1. (This function is clled unionnw insted of union ecuse the ltter is C++ keyword.) The two NWAs must not hve ny sttes in common, nd the output will e nondeterministic even if oth input NWAs re deterministic. The running time 3 is O( Q + Q + δ + δ ). NwRefPtr intersect(nw const &, Nw const & ) void intersect(nw & out, Nw const &, Nw const & ) Computes the intersection of the NWAs nd, either returning the result or storing it in out. See Section 6.2. If oth input NWAs re deterministic, the output will e deterministic. The worst-cse running time is O( Q Q d d ), where d is the mximum out-degree of stte in nd d is the mximum out-degree of stte in. NwRefPtr conct(nw const & left, Nw const & right) void conct(nw & out, Nw const & left, Nw const & right) Computes the conctention of the NWAs left nd right, either returning the result or storing it in out. See Section 6.3. The output utomton will e nondeterministic. The running time is O( Q + Q + δ + δ + Q δ r ). NwRefPtr str(nw const & orig) void str(nw & out, Nw const & orig) Computes the Kleene str of the NWA orig, either returning the result or storing it in out. See Section 6.4. The output utomton will e nondeterministic. The running time is O( Q + δ + Q 0 Q f δ c ). NwRefPtr reverse(nw const & orig) void reverse(nw & out, Nw const & orig) Computes the NWA tht ccepts the reverse of ech nested word ccepted y the NWA orig, either returning the result or storing it in out. See Section 6.5. The running time is O( Q + δ ). NwRefPtr determinize(nw const & orig) NwRefPtr determinize(nw & out, Nw const & orig) Computes determiniztion of orig, either returning the result or storing it in out. See Section 6.6. The worst-cse running time for this lgorithm is t lest O( Q 3 2 Q 2 ). 3 We count stte lookups, dditions, etc. s constnt time even though they re ctully logrithmic, nd occsionlly liner. 18

19 Begin Strt Strt Stte2 Cll cll Begin Cll cll Stte3 Entry Stte2 Entry Stte4 Stte Stte3 Stte End Exit Exit Stte4 ret: Cll, ret: Cll, Return Return End Finish Figure 3: NWA. Finish An exmple Figure 4: A second exmple NWA. Figure 5: The NWA resulting from the union of the NWA in Figure 3 nd the NWA in Figure 4. Note tht there re two initil sttes. NwRefPtr complement(nw const & orig) void complement(nw & out, Nw const & orig) Computes the complement of the determiniztion of orig, either returning the result or storing it in out. See Section 6.7. complement() determinizes orig, so its worst-cse running time is lso t lest O( Q 3 2 Q 2 ). As mentioned ove, these functions crete smll lnguge of set expressions. For instnce, to compute n utomton M whose lnguge is A (B C) (where A, B, nd C re NwRefPtrs), one cn write NwRefPtr M = unionnw(*a, *str(*intersect(*b, *C))) 6.1 Union The union of two NWAs is constructed y tking the union of ech of the components of the NWAs. (In prticulr, it does not do cross-product construction, nd will lwys produce nondeterministic utomton s result s long s oth mchines hve t lest one initil stte.) Formlly, the union of NWAs N = (Q N, Σ N, Q 0,N, δ N, F N ) nd M = (Q M, Σ M, Q 0,M, δ M, F M ) is N M = (Q N Q M, Σ N Σ M, Q 0,N Q 0,M, δ N δ M, F N F M ). As n exmple, Fig. 5 illustrtes the union of Fig. 3 nd Fig

20 (key#6,9) Strt (key#8,8) Cll cll (key#4,4) cll Entry (key#2,2) Exit ret: Cll, Return Finish Figure 6: Simple NWA to intersect with the NWA in Figure 3. Figure 7: The NWA resulting from the intersection of the NWA in Figure 3 nd the NWA in Figure 6. Note tht ecuse ech of the input NWAs only ccepts single word nd those words re different, the lnguge of the intersection is empty. The NWA uilt up y intersect() got s fr s it could. (key#2,3) is the pir of sttes (strt,strt) from the two input utomt. The stte sets of the NWAs must not overlp, i.e., Q 1 Q 2 =. Client code should not rely on this condition eing checked or ny prticulr ehvior occurring if it does not hold. Client informtion is copied directly from the originl NWAs using ClientInfo::clone(). 6.2 Intersection The intersection of two NWAs is computed in the stndrd cross-product fshion, using worklist lgorithm to only compute those sttes tht re rechle. The lgorithm trverses the originl NWAs strting t the initil sttes nd incrementlly dds trnsitions for ech pir of intersectle trnsitions tht re encountered. By defult, trnsitions re intersectle when the trnsitions re the sme kind (internl, cll, or return) nd the symols on the edge re identicl or one is wild. For exmple, Fig. 6 shows the intersection of Fig. 3 nd Fig. 6, nd Fig. 9 shows the intersection of the two NWAs in Fig. 8. It is possile to customize wht symols re considered equivlent, or otherwise impose constrints on wht trnsitions cn e intersected, y overriding the trnsitionintersect function in suclss of Nw. In ddition, it is possile to impose dditionl constrints on wht sttes cn e comined y overriding stteintersect. Both functions lso produce the result of the intersection: trnsitionintersect produces the symol tht will e used on the resulting edge, nd stteintersect produces the stte tht will e used s the trget. 20

21 Finish Strt Cll 1 cll Entry 1 Return ret: Cll, Cll 2 Exit Stte 1 cll Entry 2 Stte Exit 1 Entry cll Stte 2 ret: Cll 1, Cll Exit 2ret: Cll 2, Return 2 Finish Return 1 Strt Ded End Figure 8: Two complex NWAs to intersect. (key#4,18) (key#2,2) (key#11,19) cll (key#13,21) (key#12,20) (key#14,22) ret: (key#11,19), (key#15,23) (key#3,3) Figure 9: The NWA resulting from the intersection of the NWAs in Figure 8. 21

22 The defult ehvior of trnsitionintersect is tht two trnsitions re intersectle if neither symol is epsilon nd either the symols re the sme or t lest one of the symols is wild. (Epsilon trnsitions re delt with in intersect() itself. If you override trnsitionintersect, it should return flse if either input is epsilon.) The defult ehvior of stteintersect is tht ny two sttes cn e comined, the resulting stte is leled with key tht is uniquely generted from the pir of the keys of the two sttes under considertion, nd the client informtion ssocited with the resulting stte is null. Client informtion is initilly generted y the helper method stteintersect, ut cn e ltered through the use of the helper methods intersectclientinfointernl, intersectclientinfocll, nd intersectclientinforeturn, which re invoked y intersect() s trnsitions of the three different kinds involving the ssocited stte re dded. The defult ehvior of these three functions is to perform no chnges to the ClientInfo. These methods cn e overridden in suclsses to specify lterntive ehviors. The following opertions re virtul functions in clss Nw nd re intended to e overridden to customize ehvior: ool stteintersect(nw const & first, Stte stte1, Nw const & second, Stte stte2, Stte & resst, ref ptr<clientinfo> & resci) Determines whether the given sttes cn e comined nd, if so, cretes the comined stte. If the two sttes re incomptle, returns flse. If the two sttes re comptle, it computes the key of the comined stte (storing it in resst nd the client informtion (storing it in resci), then returns true. ool trnsitionintersect(nw const & first, Symol sym1, Nw const & second, Symol sym2, Symol & ressym ) Determines whether the given symols re considered to e equivlent for the purposes of intersection. If so, it computes the symol tht should e ssocited with the comined trnsition (storing it in ressym) nd returns true. If not, returns flse. void intersectclientinfointernl(nw const & first, Stte src1, Stte tgt1, Nw const & second, Stte src2, Stte tgt2, Symol ressym, Stte resst ) void intersectclientinfocll(nw const & first,stte cll1, Stte entry1, Nw const & second, Stte cll2, Stte entry2, Symol ressym, Stte resst ) void intersectclientinforeturn(nw const & first, Stte exit1, Stte cll1,stte ret1, Nw const & second, Stte exit2,stte cll2, Stte ret2, Symol ressym, Stte resst ) Clled fter trnsition of the corresponding type is dded to the utomton. It is intended to e used to lter the client informtion ssocited with resst given the endpoints of the new trnsition. 6.3 Conctention The conctention of two NWAs is constructed y tking the union of ll sttes nd trnsitions of the two utomt, nd dding internl epsilon trnsitions from ech finl stte of the first NWA to ech initil stte of the second NWA. In the resulting NWA, the initil sttes re the initil sttes from the first NWA, nd the finl sttes re the finl sttes of the second NWA. 22

23 However, the conctention construction is it more complicted thn just this, ecuse we need to ddress the issue of n unlnced-left word eing conctented with n unlnced-right word. (Recll the definition of wht hppens when n NWA reds pending return: it is llowed to tke return trnsition where the cll predecessor is n initil stte of the utomton.) To del properly with the cse where the first opernd genertes strings with pending clls nd the second opernd genertes strings with pending returns, the trnsitions from the second utomton re ugmented in the following mnner. When computing the conctention of left nd right, for ech trnsition (q, p 0,, q ) δr right where p 0 Q right 0 (these re trnsitions tht right cn tke when reding pending return) nd ech p Q left, we dd (q, p,, q ) to δ r of the result. (It is ctully only necessry to dd such trnsition for sttes p tht either pper in the cll position of cll trnsition in left or re in Q left 0.) The stte sets of the NWAs must not overlp, i.e., Q 1 Q 2 =. Client code should not rely on this condition eing checked. Client informtion is copied directly from the originl NWAs using ClientInfo::clone(). Fig. 12 shows the result of conctenting Fig. 3 nd Fig Kleene str Like conctention, Kleene-str is complicted in the cse of NWAs ecuse of the ility to hve unlnced words in the utomton s lnguge. Reltive to stndrd FAs, in the conctention construction we only needed to dd extr trnsitions; in this construction, we must dd dditionl sttes s well. The NWA resulting from performing Kleene-Str on the NWA shown in Fig. 13 is shown in Fig. 14. The construction presented in [3] hs minor error. The error is nlogous to not dding distinguished strt stte in the trditionl Thompson construction, 4 nd in fct cn e exhiited using the sme exmple (it is not necessry to use NWA clls or returns). Alur confirmed tht our interprettion of the construction in [3] is correct [1]. Below, we present version tht uses ε-trnsitions, nd thus it looks it different from the version in [3]. When computing R = A for some NWA A, the resulting NWA hs two copies of A. These re denoted y primed nd unprimed version of sttes from A in the definition elow. Suppose tht R is reding word w = w 1 w 2 w n, where ech w i L(A). R egins in strt stte of A. Henceforth it mintins the following invrint on the stte tht R is in with respect to the portion of w red so fr: if the next symol σ is in return position, then tht symol is pending return in the current w i iff R is in the A portion, i.e., if the current stte is primed. (Note tht this return only needs to e pending in the current w i. In the full string w, σ my mtch cll in n erlier w j, or it my e pending in the whole string.) Internl trnsitions thus keep R in the sme copy of A, nd cll trnsitions lwys tke R to the unprimed copy of A (ecuse if it then reds return, the return will mtch tht cll). Return trnsitions cn trget either copy of A: if the cll predecessor is unprimed, then the trget will e unprimed; if the cll predecessor is primed, then the trget will e primed. The description ove descries R s opertion under norml conditions. If R is in finl stte (either primed or unprimed) of the utomton A, it is lso llowed to guess 4 The initil stte of the utomton A must ccept, ecuse ε is in L ; nd ecuse of this property it is incorrect to merely dd epsilon trnsitions from the old finl sttes to the old initil sttes. If you do this nd there is cycle from the initil stte ck to itself (for exmple, self-loop on the initil stte), the word corresponding to tht pth would e ccepted even though it should not e. 23

24 Begin Stte2 End Strt Figure 10: Simple NWA to conctente onto the NWA in Figure 3. Cll cll Entry Finish Return Stte ret Exit ret: Cll, Exit Stte Return Finish Entry cll: Return, * Begin Cll Stte2 Strt Figure 11: The NWA resulting from performing reverse on the NWA in Figure 3. End Figure 12: The NWA resulting from the conctention of the NWA in Figure 3 with the NWA in Figure

25 Begin cll Entry Exit ret: Begin, End Figure 13: An NWA on which to perform Kleene-Str. Begin End cll Entry ret: Begin, Exit ret: ( Begin, prime ), cll ret: Begin, ( Begin, prime ) ( End, prime ) ( Begin, prime ) ret: (ny) ret: (ny) ( Exit, prime ) ( Entry, prime ) Figure 14: The NWA resulting from performing Kleene-Str on the NWA in Figure

26 tht it should restrt y tking n epsilon trnsition to distinguished strt nd finl stte q 0. This guess is correct if it just red the lst chrcter in w i (mking the next chrcter the first one in w i+1 ). Note tht q 0 only hs trnsitions to the A portion of R, mintining the invrint. The reson for the two copies of A comes into ply when R reds return σ while in the A portion. By the invrint, σ is pending in the current w i. In the originl utomton A, the trnsitions tht the mchine cn use re return trnsitions (q, r, σ, p) where the cll predecessor r is in Q 0. We need to mke sure tht R cn tke those sme trnsitions. There re two cses we need to consider: 1. For the cses where σ is pending in the whole string w, we need to hve version of the return trnsition with q 0 in the cll-predecessor position, so we dd (q, q 0, σ, p ). 2. For the cses where σ is mtched with cll in some erlier w j, it does not mtter wht stte the mchine ws in efore tht cll; thus we dd (q, s, σ, p ) for ech stte s in Q Q. Norml opertion corresponds to the trnsitions introduced y the Internl, Cll, nd Loclly-Mtched-Return inference rules given elow. The source sttes of oth trnsitions dded y Loclly-Mtched-Return re unprimed, ecuse if the current symol is return tht mtches cll in the sme w i, R will e in the A portion y the invrint. The Restrt rule llows R to restrt, nd the Strt rule llows R to get from q 0 to the A portion; these trnsitions trget the A portion ecuse there hve een no clls red in the current w i, nd thus return symol would e pending. The Glolly-Pending-Return rule ddresses the sitution where the current symol is return tht is pending in the whole string w. (This is the first cse in the previous prgrph.) The Loclly-Pending-Return rule ddresses the sitution where the current symol is return tht is pending in the current w i ut mtches cll from previous w j. Formlly, if the originl NWA is (Q, Σ, Q 0, δ, Q f ), then the result of performing Kleene-Str on tht NWA is (Q, Σ, Q 0, δ, Q f ). The sets of sttes re defined y Q = Q Q {q 0 } (with Q = {q q Q} nd q 0 Q), nd Q 0 = Q f = {q 0}. The trnsitions in δ re defined y the following rules: (q, σ, p) δ i Internl (q, σ, p) δi (q, σ, p ) δi (q, σ, p) δ c (q, σ, p) δc (q, σ, p) δc Cll (q, r, σ, p) δ r Loclly-Mtched-Return (q, r, σ, p) δr (q, r, σ, p ) δr q Q 0 Strt (q 0, ε, q ) δi q Q f (q, ε, q 0 ) δi (q, ε, q 0 ) δi Retrt Glolly-Pending-Return (q, r, σ, p) δ r r Q 0 (q, q 0, σ, p ) δ r Loclly-Pending-Return (q, r, σ, p) δ r r Q 0 s Q Q (q, s, σ, p ) δ r Client informtion is copied directly from the originl NWA (using ClientInfo::clone()) such tht for ech q Q, q nd q hve (different copies of) the sme client informtion. Note: The key for stte q is generted from the key for stte q using the expression getkey(q, getkey("prime")). The input utomton to the Kleene str function must not lredy contin oth q nd q for ny q. 26

Fig.25: the Role of LEX

Fig.25: the Role of LEX The Lnguge for Specifying Lexicl Anlyzer We shll now study how to uild lexicl nlyzer from specifiction of tokens in the form of list of regulr expressions The discussion centers round the design of n existing

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5 CS321 Lnguges nd Compiler Design I Winter 2012 Lecture 5 1 FINITE AUTOMATA A non-deterministic finite utomton (NFA) consists of: An input lphet Σ, e.g. Σ =,. A set of sttes S, e.g. S = {1, 3, 5, 7, 11,

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) * Pln for Tody nd Beginning Next week Interpreter nd Compiler Structure, or Softwre Architecture Overview of Progrmming Assignments The MeggyJv compiler we will e uilding. Regulr Expressions Finite Stte

More information

Topic 2: Lexing and Flexing

Topic 2: Lexing and Flexing Topic 2: Lexing nd Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennrt Beringer 1 2 The Compiler Lexicl Anlysis Gol: rek strem of ASCII chrcters (source/input) into sequence of

More information

Lexical Analysis: Constructing a Scanner from Regular Expressions

Lexical Analysis: Constructing a Scanner from Regular Expressions Lexicl Anlysis: Constructing Scnner from Regulr Expressions Gol Show how to construct FA to recognize ny RE This Lecture Convert RE to n nondeterministic finite utomton (NFA) Use Thompson s construction

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

ASTs, Regex, Parsing, and Pretty Printing

ASTs, Regex, Parsing, and Pretty Printing ASTs, Regex, Prsing, nd Pretty Printing CS 2112 Fll 2016 1 Algeric Expressions To strt, consider integer rithmetic. Suppose we hve the following 1. The lphet we will use is the digits {0, 1, 2, 3, 4, 5,

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08 CS412/413 Introduction to Compilers Tim Teitelum Lecture 4: Lexicl Anlyzers 28 Jn 08 Outline DFA stte minimiztion Lexicl nlyzers Automting lexicl nlysis Jlex lexicl nlyzer genertor CS 412/413 Spring 2008

More information

Lexical analysis, scanners. Construction of a scanner

Lexical analysis, scanners. Construction of a scanner Lexicl nlysis scnners (NB. Pges 4-5 re for those who need to refresh their knowledge of DFAs nd NFAs. These re not presented during the lectures) Construction of scnner Tools: stte utomt nd trnsition digrms.

More information

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata CS 432 Fll 2017 Mike Lm, Professor (c)* Regulr Expressions nd Finite Automt Compiltion Current focus "Bck end" Source code Tokens Syntx tree Mchine code chr dt[20]; int min() { flot x = 42.0; return 7;

More information

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the LR() nlysis Drwcks of LR(). Look-hed symols s eplined efore, concerning LR(), it is possile to consult the net set to determine, in the reduction sttes, for which symols it would e possile to perform reductions.

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona CSc 453 Compilers nd Systems Softwre 4 : Lexicl Anlysis II Deprtment of Computer Science University of Arizon collerg@gmil.com Copyright c 2009 Christin Collerg Implementing Automt NFAs nd DFAs cn e hrd-coded

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 3: Lexer genertors Viktor Leijon Slides lrgely y John Nordlnder with mteril generously provided y Mrk P. Jones. 1 Recp: Hndwritten Lexers: Don t require sophisticted

More information

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona Implementing utomt Sc 5 ompilers nd Systems Softwre : Lexicl nlysis II Deprtment of omputer Science University of rizon collerg@gmil.com opyright c 009 hristin ollerg NFs nd DFs cn e hrd-coded using this

More information

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table TDDD55 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing, Prt 2 Constructing Prse Tles Prse tle construction Grmmr conflict hndling Ctegories of LR Grmmrs nd Prsers Peter Fritzson, Christoph

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an Scnner Termintion A scnner reds input chrcters nd prtitions them into tokens. Wht hppens when the end of the input file is reched? It my be useful to crete n Eof pseudo-chrcter when this occurs. In Jv,

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs. Lecture 5 Wlks, Trils, Pths nd Connectedness Reding: Some of the mteril in this lecture comes from Section 1.2 of Dieter Jungnickel (2008), Grphs, Networks nd Algorithms, 3rd edition, which is ville online

More information

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming

Lecture 10 Evolutionary Computation: Evolution strategies and genetic programming Lecture 10 Evolutionry Computtion: Evolution strtegies nd genetic progrmming Evolution strtegies Genetic progrmming Summry Negnevitsky, Person Eduction, 2011 1 Evolution Strtegies Another pproch to simulting

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

PPS: User Manual. Krishnendu Chatterjee, Martin Chmelik, Raghav Gupta, and Ayush Kanodia

PPS: User Manual. Krishnendu Chatterjee, Martin Chmelik, Raghav Gupta, and Ayush Kanodia PPS: User Mnul Krishnendu Chtterjee, Mrtin Chmelik, Rghv Gupt, nd Ayush Knodi IST Austri (Institute of Science nd Technology Austri), Klosterneuurg, Austri In this section we descrie the tool fetures,

More information

CMPSC 470: Compiler Construction

CMPSC 470: Compiler Construction CMPSC 47: Compiler Construction Plese complete the following: Midterm (Type A) Nme Instruction: Mke sure you hve ll pges including this cover nd lnk pge t the end. Answer ech question in the spce provided.

More information

CS 340, Fall 2014 Dec 11 th /13 th Final Exam Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string.

CS 340, Fall 2014 Dec 11 th /13 th Final Exam Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string. CS 340, Fll 2014 Dec 11 th /13 th Finl Exm Nme: Note: in ll questions, the specil symol ɛ (epsilon) is used to indicte the empty string. Question 1. [5 points] Consider the following regulr expression;

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy RecogniNon of Tokens if expressions nd relnonl opertors if è if then è then else è else relop è

More information

Scanner Termination. Multi Character Lookahead

Scanner Termination. Multi Character Lookahead If d.doublevlue() represents vlid integer, (int) d.doublevlue() will crete the pproprite integer vlue. If string representtion of n integer begins with ~ we cn strip the ~, convert to double nd then negte

More information

COS 333: Advanced Programming Techniques

COS 333: Advanced Programming Techniques COS 333: Advnced Progrmming Techniques Brin Kernighn wk@cs, www.cs.princeton.edu/~wk 311 CS Building 609-258-2089 (ut emil is lwys etter) TA's: Junwen Li, li@cs, CS 217,258-0451 Yong Wng,yongwng@cs, CS

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

CSE 401 Midterm Exam 11/5/10 Sample Solution

CSE 401 Midterm Exam 11/5/10 Sample Solution Question 1. egulr expressions (20 points) In the Ad Progrmming lnguge n integer constnt contins one or more digits, but it my lso contin embedded underscores. Any underscores must be preceded nd followed

More information

Regular Expressions and Automata using Miranda

Regular Expressions and Automata using Miranda Regulr Expressions nd Automt using Mirnd Simon Thompson Computing Lortory Univerisity of Kent t Cnterury My 1995 Contents 1 Introduction ::::::::::::::::::::::::::::::::: 1 2 Regulr Expressions :::::::::::::::::::::::::::::

More information

Example: Source Code. Lexical Analysis. The Lexical Structure. Tokens. What do we really care here? A Sample Toy Program:

Example: Source Code. Lexical Analysis. The Lexical Structure. Tokens. What do we really care here? A Sample Toy Program: Lexicl Anlysis Red source progrm nd produce list of tokens ( liner nlysis) source progrm The lexicl structure is specified using regulr expressions Other secondry tsks: (1) get rid of white spces (e.g.,

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grmmrs Descriing Lnguges We've seen two models for the regulr lnguges: Finite utomt ccept precisely the strings in the lnguge. Regulr expressions descrie precisely the strings in the lnguge.

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards

A Tautology Checker loosely related to Stålmarck s Algorithm by Martin Richards A Tutology Checker loosely relted to Stålmrck s Algorithm y Mrtin Richrds mr@cl.cm.c.uk http://www.cl.cm.c.uk/users/mr/ University Computer Lortory New Museum Site Pemroke Street Cmridge, CB2 3QG Mrtin

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Dt Mining y I. H. Witten nd E. Frnk Simplicity first Simple lgorithms often work very well! There re mny kinds of simple structure, eg: One ttriute does ll the work All ttriutes contriute eqully

More information

Finite Automata. Lecture 4 Sections Robb T. Koether. Hampden-Sydney College. Wed, Jan 21, 2015

Finite Automata. Lecture 4 Sections Robb T. Koether. Hampden-Sydney College. Wed, Jan 21, 2015 Finite Automt Lecture 4 Sections 3.6-3.7 Ro T. Koether Hmpden-Sydney College Wed, Jn 21, 2015 Ro T. Koether (Hmpden-Sydney College) Finite Automt Wed, Jn 21, 2015 1 / 23 1 Nondeterministic Finite Automt

More information

Should be done. Do Soon. Structure of a Typical Compiler. Plan for Today. Lab hours and Office hours. Quiz 1 is due tonight, was posted Tuesday night

Should be done. Do Soon. Structure of a Typical Compiler. Plan for Today. Lab hours and Office hours. Quiz 1 is due tonight, was posted Tuesday night Should e done L hours nd Office hours Sign up for the miling list t, strting to send importnt info to list http://groups.google.com/group/cs453-spring-2011 Red Ch 1 nd skim Ch 2 through 2.6, red 3.3 nd

More information

Agenda & Reading. Class Exercise. COMPSCI 105 SS 2012 Principles of Computer Science. Arrays

Agenda & Reading. Class Exercise. COMPSCI 105 SS 2012 Principles of Computer Science. Arrays COMPSCI 5 SS Principles of Computer Science Arrys & Multidimensionl Arrys Agend & Reding Agend Arrys Creting & Using Primitive & Reference Types Assignments & Equlity Pss y Vlue & Pss y Reference Copying

More information

box Boxes and Arrows 3 true 7.59 'X' An object is drawn as a box that contains its data members, for example:

box Boxes and Arrows 3 true 7.59 'X' An object is drawn as a box that contains its data members, for example: Boxes nd Arrows There re two kinds of vriles in Jv: those tht store primitive vlues nd those tht store references. Primitive vlues re vlues of type long, int, short, chr, yte, oolen, doule, nd flot. References

More information

Functor (1A) Young Won Lim 10/5/17

Functor (1A) Young Won Lim 10/5/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

TO REGULAR EXPRESSIONS

TO REGULAR EXPRESSIONS Suject :- Computer Science Course Nme :- Theory Of Computtion DA TO REGULAR EXPRESSIONS Report Sumitted y:- Ajy Singh Meen 07000505 jysmeen@cse.iit.c.in BASIC DEINITIONS DA:- A finite stte mchine where

More information

12 <= rm <digit> 2 <= rm <no> 2 <= rm <no> <digit> <= rm <no> <= rm <number>

12 <= rm <digit> 2 <= rm <no> 2 <= rm <no> <digit> <= rm <no> <= rm <number> DDD16 Compilers nd Interpreters DDB44 Compiler Construction R Prsing Prt 1 R prsing concept Using prser genertor Prse ree Genertion Wht is R-prsing? eft-to-right scnning R Rigthmost derivtion in reverse

More information

Operator Precedence. Java CUP. E E + T T T * P P P id id id. Does a+b*c mean (a+b)*c or

Operator Precedence. Java CUP. E E + T T T * P P P id id id. Does a+b*c mean (a+b)*c or Opertor Precedence Most progrmming lnguges hve opertor precedence rules tht stte the order in which opertors re pplied (in the sence of explicit prentheses). Thus in C nd Jv nd CSX, +*c mens compute *c,

More information

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7.

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7. CS 241 Fll 2017 Midterm Review Solutions Octoer 24, 2017 Contents 1 Bits nd Bytes 1 2 MIPS Assemly Lnguge Progrmming 2 3 MIPS Assemler 6 4 Regulr Lnguges 7 5 Scnning 9 1 Bits nd Bytes 1. Give two s complement

More information

COS 333: Advanced Programming Techniques

COS 333: Advanced Programming Techniques COS 333: Advnced Progrmming Techniques How to find me wk@cs, www.cs.princeton.edu/~wk 311 CS Building 609-258-2089 (ut emil is lwys etter) TA's: Mtvey Arye (rye), Tom Jlin (tjlin), Nick Johnson (npjohnso)

More information

OUTPUT DELIVERY SYSTEM

OUTPUT DELIVERY SYSTEM Differences in ODS formtting for HTML with Proc Print nd Proc Report Lur L. M. Thornton, USDA-ARS, Animl Improvement Progrms Lortory, Beltsville, MD ABSTRACT While Proc Print is terrific tool for dt checking

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

ECE 468/573 Midterm 1 September 28, 2012

ECE 468/573 Midterm 1 September 28, 2012 ECE 468/573 Midterm 1 September 28, 2012 Nme:! Purdue emil:! Plese sign the following: I ffirm tht the nswers given on this test re mine nd mine lone. I did not receive help from ny person or mteril (other

More information

Lists in Lisp and Scheme

Lists in Lisp and Scheme Lists in Lisp nd Scheme Lists in Lisp nd Scheme Lists re Lisp s fundmentl dt structures, ut there re others Arrys, chrcters, strings, etc. Common Lisp hs moved on from eing merely LISt Processor However,

More information

Java CUP. Java CUP Specifications. User Code Additions. Package and Import Specifications

Java CUP. Java CUP Specifications. User Code Additions. Package and Import Specifications Jv CUP Jv CUP is prser-genertion tool, similr to Ycc. CUP uilds Jv prser for LALR(1) grmmrs from production rules nd ssocited Jv code frgments. When prticulr production is recognized, its ssocited code

More information

Quiz2 45mins. Personal Number: Problem 1. (20pts) Here is an Table of Perl Regular Ex

Quiz2 45mins. Personal Number: Problem 1. (20pts) Here is an Table of Perl Regular Ex Long Quiz2 45mins Nme: Personl Numer: Prolem. (20pts) Here is n Tle of Perl Regulr Ex Chrcter Description. single chrcter \s whitespce chrcter (spce, t, newline) \S non-whitespce chrcter \d digit (0-9)

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Assignment 4. Due 09/18/17

Assignment 4. Due 09/18/17 Assignment 4. ue 09/18/17 1. ). Write regulr expressions tht define the strings recognized by the following finite utomt: b d b b b c c b) Write FA tht recognizes the tokens defined by the following regulr

More information

Lexical Analysis and Lexical Analyzer Generators

Lexical Analysis and Lexical Analyzer Generators 1 Lexicl Anlysis nd Lexicl Anlyzer Genertors Chpter 3 COP5621 Compiler Construction Copyright Roert vn Engelen, Florid Stte University, 2007-2009 2 The Reson Why Lexicl Anlysis is Seprte Phse Simplifies

More information

Typing with Weird Keyboards Notes

Typing with Weird Keyboards Notes Typing with Weird Keyords Notes Ykov Berchenko-Kogn August 25, 2012 Astrct Consider lnguge with n lphet consisting of just four letters,,,, nd. There is spelling rule tht sys tht whenever you see n next

More information

Sample Midterm Solutions COMS W4115 Programming Languages and Translators Monday, October 12, 2009

Sample Midterm Solutions COMS W4115 Programming Languages and Translators Monday, October 12, 2009 Deprtment of Computer cience Columbi University mple Midterm olutions COM W4115 Progrmming Lnguges nd Trnsltors Mondy, October 12, 2009 Closed book, no ids. ch question is worth 20 points. Question 5(c)

More information

Functor (1A) Young Won Lim 8/2/17

Functor (1A) Young Won Lim 8/2/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

Compilers Spring 2013 PRACTICE Midterm Exam

Compilers Spring 2013 PRACTICE Midterm Exam Compilers Spring 2013 PRACTICE Midterm Exm This is full length prctice midterm exm. If you wnt to tke it t exm pce, give yourself 7 minutes to tke the entire test. Just like the rel exm, ech question hs

More information

Lab 1 - Counter. Create a project. Add files to the project. Compile design files. Run simulation. Debug results

Lab 1 - Counter. Create a project. Add files to the project. Compile design files. Run simulation. Debug results 1 L 1 - Counter A project is collection mechnism for n HDL design under specifiction or test. Projects in ModelSim ese interction nd re useful for orgnizing files nd specifying simultion settings. The

More information

Mid-term exam. Scores. Fall term 2012 KAIST EE209 Programming Structures for EE. Thursday Oct 25, Student's name: Student ID:

Mid-term exam. Scores. Fall term 2012 KAIST EE209 Programming Structures for EE. Thursday Oct 25, Student's name: Student ID: Fll term 2012 KAIST EE209 Progrmming Structures for EE Mid-term exm Thursdy Oct 25, 2012 Student's nme: Student ID: The exm is closed book nd notes. Red the questions crefully nd focus your nswers on wht

More information

stack of states and grammar symbols Stack-Bottom marker C. Kessler, IDA, Linköpings universitet. 1. <list> -> <list>, <element> 2.

stack of states and grammar symbols Stack-Bottom marker C. Kessler, IDA, Linköpings universitet. 1. <list> -> <list>, <element> 2. TDDB9 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing Updted/New slide mteril 007: Pushdown Automton for LR-Prsing Finite-stte pushdown utomton contins lterntingly sttes nd symols in NUΣ

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Progrmming Lnguges h"p://www.di.unipi.it/~ndre/did2c/plp- 14/ Prof. Andre Corrdini Deprtment of Computer Science, Pis Lesson 5! Gener;on of Lexicl Anlyzers Creting Lexicl Anlyzer with Lex

More information

Fall Compiler Principles Lecture 1: Lexical Analysis. Roman Manevich Ben-Gurion University of the Negev

Fall Compiler Principles Lecture 1: Lexical Analysis. Roman Manevich Ben-Gurion University of the Negev Fll 2016-2017 Compiler Principles Lecture 1: Lexicl Anlysis Romn Mnevich Ben-Gurion University of the Negev Agend Understnd role of lexicl nlysis in compiler Regulr lnguges reminder Lexicl nlysis lgorithms

More information

A dual of the rectangle-segmentation problem for binary matrices

A dual of the rectangle-segmentation problem for binary matrices A dul of the rectngle-segmenttion prolem for inry mtrices Thoms Klinowski Astrct We consider the prolem to decompose inry mtrix into smll numer of inry mtrices whose -entries form rectngle. We show tht

More information

Theory of Computation CSE 105

Theory of Computation CSE 105 $ $ $ Theory of Computtion CSE 105 Regulr Lnguges Study Guide nd Homework I Homework I: Solutions to the following problems should be turned in clss on July 1, 1999. Instructions: Write your nswers clerly

More information

The Greedy Method. The Greedy Method

The Greedy Method. The Greedy Method Lists nd Itertors /8/26 Presenttion for use with the textook, Algorithm Design nd Applictions, y M. T. Goodrich nd R. Tmssi, Wiley, 25 The Greedy Method The Greedy Method The greedy method is generl lgorithm

More information

Discussion 1 Recap. COP4600 Discussion 2 OS concepts, System call, and Assignment 1. Questions. Questions. Outline. Outline 10/24/2010

Discussion 1 Recap. COP4600 Discussion 2 OS concepts, System call, and Assignment 1. Questions. Questions. Outline. Outline 10/24/2010 COP4600 Discussion 2 OS concepts, System cll, nd Assignment 1 TA: Hufeng Jin hj0@cise.ufl.edu Discussion 1 Recp Introduction to C C Bsic Types (chr, int, long, flot, doule, ) C Preprocessors (#include,

More information

Graphs with at most two trees in a forest building process

Graphs with at most two trees in a forest building process Grphs with t most two trees in forest uilding process rxiv:802.0533v [mth.co] 4 Fe 208 Steve Butler Mis Hmnk Mrie Hrdt Astrct Given grph, we cn form spnning forest y first sorting the edges in some order,

More information

Symbol Table management

Symbol Table management TDDD Compilers nd interpreters TDDB44 Compiler Construction Symol Tles Symol Tles in the Compiler Symol Tle mngement source progrm Leicl nlysis Syntctic nlysis Semntic nlysis nd Intermedite code gen Code

More information

Reference types and their characteristics Class Definition Constructors and Object Creation Special objects: Strings and Arrays

Reference types and their characteristics Class Definition Constructors and Object Creation Special objects: Strings and Arrays Objects nd Clsses Reference types nd their chrcteristics Clss Definition Constructors nd Object Cretion Specil objects: Strings nd Arrys OOAD 1999/2000 Cludi Niederée, Jochim W. Schmidt Softwre Systems

More information

Compilation

Compilation Compiltion 0368-3133 Lecture 2: Lexicl Anlysis Nom Rinetzky 1 2 Lexicl Anlysis Modern Compiler Design: Chpter 2.1 3 Conceptul Structure of Compiler Compiler Source text txt Frontend Semntic Representtion

More information

CS 241 Week 4 Tutorial Solutions

CS 241 Week 4 Tutorial Solutions CS 4 Week 4 Tutoril Solutions Writing n Assemler, Prt & Regulr Lnguges Prt Winter 8 Assemling instrutions utomtilly. slt $d, $s, $t. Solution: $d, $s, nd $t ll fit in -it signed integers sine they re 5-it

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Principles nd Prdigms Chpter 11 (version April 7, 2008) Mrten vn Steen Vrije Universiteit Amsterdm, Fculty of Science Dept. Mthemtics nd Computer Science Room R4.20. Tel: (020) 598 7784

More information

cisc1110 fall 2010 lecture VI.2 call by value function parameters another call by value example:

cisc1110 fall 2010 lecture VI.2 call by value function parameters another call by value example: cisc1110 fll 2010 lecture VI.2 cll y vlue function prmeters more on functions more on cll y vlue nd cll y reference pssing strings to functions returning strings from functions vrile scope glol vriles

More information

The dictionary model allows several consecutive symbols, called phrases

The dictionary model allows several consecutive symbols, called phrases A dptive Huffmn nd rithmetic methods re universl in the sense tht the encoder cn dpt to the sttistics of the source. But, dpttion is computtionlly expensive, prticulrly when k-th order Mrkov pproximtion

More information

Virtual Machine (Part I)

Virtual Machine (Part I) Hrvrd University CS Fll 2, Shimon Schocken Virtul Mchine (Prt I) Elements of Computing Systems Virtul Mchine I (Ch. 7) Motivtion clss clss Min Min sttic sttic x; x; function function void void min() min()

More information

Notes for Graph Theory

Notes for Graph Theory Notes for Grph Theory These re notes I wrote up for my grph theory clss in 06. They contin most of the topics typiclly found in grph theory course. There re proofs of lot of the results, ut not of everything.

More information

Some Thoughts on Grad School. Undergraduate Compilers Review and Intro to MJC. Structure of a Typical Compiler. Lexing and Parsing

Some Thoughts on Grad School. Undergraduate Compilers Review and Intro to MJC. Structure of a Typical Compiler. Lexing and Parsing Undergrdute Compilers Review nd Intro to MJC Announcements Miling list is in full swing Tody Some thoughts on grd school Finish prsing Semntic nlysis Visitor pttern for bstrct syntx trees Some Thoughts

More information

Lecture T4: Pattern Matching

Lecture T4: Pattern Matching Introduction to Theoreticl CS Lecture T4: Pttern Mtching Two fundmentl questions. Wht cn computer do? How fst cn it do it? Generl pproch. Don t tlk bout specific mchines or problems. Consider miniml bstrct

More information

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016 Applied Dtses Lecture 13 Online Pttern Mtching on Strings Sestin Mneth University of Edinurgh - Ferury 29th, 2016 2 Outline 1. Nive Method 2. Automton Method 3. Knuth-Morris-Prtt Algorithm 4. Boyer-Moore

More information

Deterministic. Finite Automata. And Regular Languages. Fall 2018 Costas Busch - RPI 1

Deterministic. Finite Automata. And Regular Languages. Fall 2018 Costas Busch - RPI 1 Deterministic Finite Automt And Regulr Lnguges Fll 2018 Costs Busch - RPI 1 Deterministic Finite Automton (DFA) Input Tpe String Finite Automton Output Accept or Reject Fll 2018 Costs Busch - RPI 2 Trnsition

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-188 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

CMSC 331 First Midterm Exam

CMSC 331 First Midterm Exam 0 00/ 1 20/ 2 05/ 3 15/ 4 15/ 5 15/ 6 20/ 7 30/ 8 30/ 150/ 331 First Midterm Exm 7 October 2003 CMC 331 First Midterm Exm Nme: mple Answers tudent ID#: You will hve seventy-five (75) minutes to complete

More information

Intermediate Information Structures

Intermediate Information Structures CPSC 335 Intermedite Informtion Structures LECTURE 13 Suffix Trees Jon Rokne Computer Science University of Clgry Cnd Modified from CMSC 423 - Todd Trengen UMD upd Preprocessing Strings We will look t

More information