Operator Precedence

Most programming languages have operator precedence rules that state the order in which operators are applied (in the absence of explicit parentheses). Thus in C and Java and CSX, a+b*c means compute b*c, then add in a. These operator precedence rules can be incorporated directly into a CFG. Consider

    E → E + T | T
    T → T * P | P
    P → id | ( E )

Does a+b*c mean (a+b)*c or a+(b*c)? The grammar tells us! Look at the derivation tree:

            E
          / | \
         E  +  T
         |   / | \
         T  T  *  P
         |  |     |
         P  P     id
         |  |
         id id

The other grouping can't be obtained unless explicit parentheses are used. (Why?)

Java CUP

Java CUP is a parser-generation tool, similar to Yacc. CUP builds a Java parser for LALR(1) grammars from production rules and associated Java code fragments. When a particular production is recognized, its associated code fragment is executed (typically to build an AST).

CUP generates a Java source file, parser.java. It contains a class parser, with a method

    Symbol parse()

The Symbol returned by the parser is associated with the grammar's start symbol and contains the AST for the whole source program. The file sym.java is also built for use with a JLex-built scanner (so that both scanner and parser use the same token codes). If an unrecovered syntax error occurs, an Exception is thrown by the parser.

CUP and Yacc accept exactly the same class of grammars, LALR(1). This class covers virtually all LL(1) grammars used in practice (every LL(1) grammar is LR(1), though not every one is LALR(1)), plus many useful non-LL(1) grammars. CUP is called as

    java java_cup.Main < file.cup
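The precedence grammar E/T/P from the start of this section can be checked by hand. Below is a minimal Java sketch, not CUP output: a hand-written recursive-descent parser for E, T and P that prints its input fully parenthesized. Because recursive descent cannot use the left-recursive rules E → E + T and T → T * P directly, they are rewritten as loops that fold operands to the left (CUP's LALR(1) parser, by contrast, handles the left-recursive rules as written). The class name PrecedenceDemo and the restriction to single-letter identifiers are assumptions made for illustration.

```java
// Hand-written recursive-descent sketch of the precedence grammar
//   E -> E + T | T     T -> T * P | P     P -> id | ( E )
// Left recursion is replaced by loops that fold to the left, so + and *
// stay left-associative. Identifiers are single letters (an assumption).
class PrecedenceDemo {
    private final String src;
    private int pos = 0;

    private PrecedenceDemo(String src) {
        this.src = src.replace(" ", "");
    }

    /** Returns the input fully parenthesized according to the grammar. */
    static String group(String input) {
        PrecedenceDemo p = new PrecedenceDemo(input);
        String result = p.parseE();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected '" + p.src.charAt(p.pos) + "'");
        }
        return result;
    }

    // E -> T { + T }
    private String parseE() {
        String left = parseT();
        while (peek() == '+') {
            pos++;
            left = "(" + left + "+" + parseT() + ")";
        }
        return left;
    }

    // T -> P { * P }
    private String parseT() {
        String left = parseP();
        while (peek() == '*') {
            pos++;
            left = "(" + left + "*" + parseP() + ")";
        }
        return left;
    }

    // P -> id | ( E )
    private String parseP() {
        char c = peek();
        if (Character.isLetter(c)) {
            pos++;
            return String.valueOf(c);
        }
        if (c == '(') {
            pos++;
            String inner = parseE();
            expect(')');
            return inner;
        }
        throw new IllegalArgumentException("expected id or '(' at position " + pos);
    }

    // Current character, or '\0' at end of input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "'");
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(group("a+b*c"));   // (a+(b*c))
        System.out.println(group("(a+b)*c")); // ((a+b)*c)
        System.out.println(group("a+b+c"));   // ((a+b)+c)
    }
}
```

Because * is handled one level deeper (in T) than + (in E), b*c is grouped before + ever sees it; the only route to (a+b)*c is back through P → ( E ), i.e., explicit parentheses.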
Java CUP Specifications

Java CUP specifications are of the form:
    Package and import specifications
    User code additions
    Terminal and non-terminal declarations
    A context-free grammar, augmented with Java code fragments

Package and Import Specifications

You define a package name as:
    package name;
You add imports to be used as:
    import java_cup.runtime.*;

User Code Additions

You may define Java code to be included within the generated parser:

    action code {: /* java code */ :};
This code is placed within the generated action class (which holds user-specified production actions).

    parser code {: /* java code */ :};
This code is placed within the generated parser class.

    init with {: /* java code */ :};
This code is used to initialize the generated parser.

    scan with {: /* java code */ :};
This code is used to tell the generated parser how to get tokens from the scanner.

Terminal and Non-terminal Declarations

You define the terminal symbols you will use as:
    terminal classname name1, name2, ...;
classname is a class used by the scanner for tokens (CSXToken, CSXIdentifierToken, etc.).

You define the non-terminal symbols you will use as:
    non terminal classname name1, name2, ...;
classname is the class for the AST node associated with the non-terminal (stmtNode, exprNode, etc.).

Production Rules

Production rules are of the form
    name ::= name1 name2 ... action ;
or
    name ::= name1 name2 ... action1
           | name3 name4 ... action2
           | ...
           ;
Names are the names of terminals or non-terminals, as declared earlier. Actions are Java code fragments, of the form
    {: /* java code */ :}
The Java object associated with a symbol (a token or AST node) may be named by adding an :id suffix to a terminal or non-terminal in a rule.
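To make the terminal declarations concrete, here is a hypothetical sketch of the token classes a scanner might hand to the parser. Only the field names linenum, colnum and identifierText are taken from the CSX-lite action code; the constructors and the TokenDemo class with its methods are assumptions. Because IDENTIFIER is declared with class CSXIdentifierToken, a label such as i in IDENTIFIER:i can be used inside the action as a value of that class.

```java
// Hypothetical token classes (a sketch). Only the field names linenum,
// colnum and identifierText come from the CSX-lite action code.
class CSXToken {
    final int linenum; // line on which the token starts
    final int colnum;  // column at which the token starts

    CSXToken(int linenum, int colnum) {
        this.linenum = linenum;
        this.colnum = colnum;
    }
}

// Identifier tokens additionally carry their spelling.
class CSXIdentifierToken extends CSXToken {
    final String identifierText;

    CSXIdentifierToken(String identifierText, int linenum, int colnum) {
        super(linenum, colnum);
        this.identifierText = identifierText;
    }
}

class TokenDemo {
    // Inside an action for IDENTIFIER:i, the label i has the declared
    // terminal class, so its fields can be read directly.
    static String describe(CSXIdentifierToken i) {
        return i.identifierText + " at " + i.linenum + ":" + i.colnum;
    }

    // A CUP Symbol keeps the scanner's token in an Object field named value,
    // hence the cast in the syntax_error override of the CSX-lite spec.
    static int lineOf(Object symbolValue) {
        return ((CSXToken) symbolValue).linenum;
    }

    public static void main(String[] args) {
        CSXIdentifierToken a = new CSXIdentifierToken("a", 1, 3);
        System.out.println(describe(a)); // a at 1:3
        System.out.println(lineOf(a));   // 1
    }
}
```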
RESULT names the left-hand side non-terminal. The Java classes of the symbols are defined in the terminal and non-terminal declaration sections. For example,

    prog ::= LBRACE:l stmts:s RBRACE
        {: RESULT = new csxLiteNode(s, l.linenum, l.colnum); :}
        ;

This corresponds to the production
    prog → { stmts }
The left brace is named l; the stmts non-terminal is called s. In the action code, a new csxLiteNode is created and assigned to prog. It is constructed from the AST node associated with s. Its line and column numbers are those given to the left brace, l (by the scanner).

To tell CUP what non-terminal to use as the start symbol (prog in our example), we use the directive:
    start with prog;

Example

Let's look at the CUP specification for CSX-lite. Recall its CFG is

    program → { stmts }
    stmts   → stmt stmts
            | λ
    stmt    → id = expr ;
            | if ( expr ) stmt
    expr    → expr + id
            | expr - id
            | id

The corresponding CUP specification is:

    /***
     This Is A Java CUP Specification For CSX-lite, a Small Subset
     of The CSX Language, Used In Cs536
     ***/

    /* Preliminaries to set up and use the scanner. */

    import java_cup.runtime.*;
    parser code {:
      public void syntax_error(Symbol cur_token) {
        report_error("CSX syntax error at line " +
            String.valueOf(((CSXToken) cur_token.value).linenum), null);
      }
    :};
    init with {: :};
    scan with {:
      return Scanner.next_token();
    :};
    /* Terminals (tokens returned by the scanner). */
    terminal CSXIdentifierToken IDENTIFIER;
    terminal CSXToken SEMI, LPAREN, RPAREN, ASG, LBRACE, RBRACE;
    terminal CSXToken PLUS, MINUS, rw_if;

    /* Non terminals */
    non terminal csxLiteNode prog;
    non terminal stmtsNode   stmts;
    non terminal stmtNode    stmt;
    non terminal exprNode    exp;
    non terminal             ident;

    start with prog;

    prog  ::= LBRACE:l stmts:s RBRACE
              {: RESULT = new csxLiteNode(s, l.linenum, l.colnum); :}
            ;
    stmts ::= stmt:s1 stmts:s2
              {: RESULT = new stmtsNode(s1, s2, s1.linenum, s1.colnum); :}
            |
              {: RESULT = stmtsNode.NULL; :}
            ;
    stmt  ::= ident:id ASG exp:e SEMI
              {: RESULT = new asgNode(id, e, id.linenum, id.colnum); :}
            | rw_if:i LPAREN exp:e RPAREN stmt:s
              {: RESULT = new ifThenNode(e, s, stmtNode.NULL, i.linenum, i.colnum); :}
            ;
    exp   ::= exp:leftval PLUS:op ident:rightval
              {: RESULT = new binaryOpNode(leftval, sym.PLUS, rightval, op.linenum, op.colnum); :}
            | exp:leftval MINUS:op ident:rightval
              {: RESULT = new binaryOpNode(leftval, sym.MINUS, rightval, op.linenum, op.colnum); :}
            | ident:i
              {: RESULT = i; :}
            ;
    ident ::= IDENTIFIER:i
              {: RESULT = new ( new (i.identifierText, i.linenum, i.colnum),
                                exprNode.NULL, i.linenum, i.colnum); :}
            ;

Let's parse { a = b ; }

First, a is parsed using

    ident ::= IDENTIFIER:i
        {: RESULT = new ( new (i.identifierText, i.linenum, i.colnum),
                          exprNode.NULL, i.linenum, i.colnum); :}

We build

    [Figure: AST subtree for the identifier a]
Next, b is parsed using

    ident ::= IDENTIFIER:i
        {: RESULT = new ( new (i.identifierText, i.linenum, i.colnum),
                          exprNode.NULL, i.linenum, i.colnum); :}

We build

    [Figure: AST subtree for the identifier b]

Then b's subtree is recognized as an exp:

    | ident:i
        {: RESULT = i; :}

Now the assignment statement is recognized:

    stmt ::= ident:id ASG exp:e SEMI
        {: RESULT = new asgNode(id, e, id.linenum, id.colnum); :}

We build

    [Figure: an asgNode whose children are the subtrees for a and b]

The stmts → λ production is matched (indicating that there are no more statements in the program). CUP matches

    stmts ::=
        {: RESULT = stmtsNode.NULL; :}

and we build

    [Figure: nullStmtsNode]

Next, stmts → stmt stmts is matched using

    stmts ::= stmt:s1 stmts:s2
        {: RESULT = new stmtsNode(s1, s2, s1.linenum, s1.colnum); :}

This builds

    [Figure: a stmtsNode whose children are the asgNode and a nullStmtsNode]

As the last step of the parse, the parser matches

    program → { stmts }

using the CUP rule

    prog ::= LBRACE:l stmts:s RBRACE
        {: RESULT = new csxLiteNode(s, l.linenum, l.colnum); :}
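The order of the reductions just traced can be replayed by hand. This Java sketch is not generated by CUP and does not use its runtime: each call to reduce records the production being reduced and returns the node its action would build. A single hypothetical Node class (a label plus children, with no line or column numbers) stands in for the CSX-lite AST classes, and the labels ident:a and ident:b merely stand for the identifier subtrees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-simulated replay (not CUP output) of the reductions for { a = b ; }.
// Node is a hypothetical, simplified stand-in for the CSX-lite AST classes.
class ReductionDemo {
    static final class Node {
        final String label;
        final List<Node> kids;

        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }

        @Override
        public String toString() {
            if (kids.isEmpty()) return label;
            StringBuilder sb = new StringBuilder(label).append('(');
            for (int k = 0; k < kids.size(); k++) {
                if (k > 0) sb.append(", ");
                sb.append(kids.get(k));
            }
            return sb.append(')').toString();
        }
    }

    // Records the production being reduced and returns what its action builds.
    private static Node reduce(List<String> trace, String production, Node result) {
        trace.add(production);
        return result;
    }

    // Reductions in the order an LALR(1) parser performs them for { a = b ; }.
    static Node parseExample(List<String> trace) {
        Node a = reduce(trace, "ident ::= IDENTIFIER", new Node("ident:a"));
        Node b = reduce(trace, "ident ::= IDENTIFIER", new Node("ident:b"));
        Node e = reduce(trace, "exp ::= ident", b); // RESULT = i
        Node asg = reduce(trace, "stmt ::= ident ASG exp SEMI", new Node("asgNode", a, e));
        Node none = reduce(trace, "stmts ::= (empty)", new Node("nullStmtsNode"));
        Node stmts = reduce(trace, "stmts ::= stmt stmts", new Node("stmtsNode", asg, none));
        return reduce(trace, "prog ::= LBRACE stmts RBRACE", new Node("csxLiteNode", stmts));
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        Node ast = parseExample(trace);
        trace.forEach(System.out::println);
        System.out.println(ast);
        // csxLiteNode(stmtsNode(asgNode(ident:a, ident:b), nullStmtsNode))
    }
}
```

Children always exist before their parent: the λ-production for stmts is reduced only after the assignment statement, once the parser sees } and knows no further statement follows.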
The final AST returned by the parser is

    [Figure: csxLiteNode → stmtsNode → (asgNode, nullStmtsNode)]

Errors in Context-Free Grammars

Context-free grammars can contain errors, just as programs do. Some errors are easy to detect and fix; others are more subtle.

In context-free grammars we start with the start symbol, and apply productions until a terminal string is produced. Some context-free grammars may contain useless non-terminals. Non-terminals that are unreachable (from the start symbol) or that derive no terminal string are considered useless. Useless non-terminals (and productions that involve them) can be safely removed from a grammar without changing the language defined by the grammar. A grammar containing useless non-terminals is said to be non-reduced. After useless non-terminals are removed, the grammar is reduced. Consider

    S → A B | x
    B → b
    A → a A
    C → d

Which non-terminals are unreachable? Which derive no terminal string?
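Both questions can be answered mechanically. The Java sketch below computes the non-terminals that derive some terminal string (by fixed-point iteration) and those reachable from the start symbol (by graph search), then reduces the grammar. The encoding (one uppercase letter per non-terminal, one lowercase letter per terminal) and the class name UselessSymbols are assumptions made for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: find useless non-terminals in a small CFG. Encoding assumption:
// uppercase letters are non-terminals, lowercase letters are terminals,
// and each right-hand side is a string of symbols ("" would mean λ).
class UselessSymbols {

    // Non-terminals that derive some terminal string (fixed-point iteration).
    static Set<Character> generating(Map<Character, List<String>> g) {
        Set<Character> gen = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Character, List<String>> e : g.entrySet()) {
                if (gen.contains(e.getKey())) continue;
                for (String rhs : e.getValue()) {
                    if (onlyGenerating(rhs, gen)) {
                        gen.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return gen;
    }

    // True if every non-terminal in rhs is already known to be generating.
    private static boolean onlyGenerating(String rhs, Set<Character> gen) {
        for (char c : rhs.toCharArray()) {
            if (Character.isUpperCase(c) && !gen.contains(c)) return false;
        }
        return true;
    }

    // Non-terminals reachable from the start symbol (depth-first search).
    static Set<Character> reachable(Map<Character, List<String>> g, char start) {
        Set<Character> seen = new TreeSet<>();
        Deque<Character> work = new ArrayDeque<>();
        seen.add(start);
        work.push(start);
        while (!work.isEmpty()) {
            char n = work.pop();
            for (String rhs : g.getOrDefault(n, Collections.<String>emptyList())) {
                for (char c : rhs.toCharArray()) {
                    if (Character.isUpperCase(c) && seen.add(c)) work.push(c);
                }
            }
        }
        return seen;
    }

    // Drop non-generating symbols (and productions using them), then unreachable ones.
    static Map<Character, List<String>> reduce(Map<Character, List<String>> g, char start) {
        Set<Character> gen = generating(g);
        Map<Character, List<String>> kept = new TreeMap<>();
        for (Map.Entry<Character, List<String>> e : g.entrySet()) {
            if (!gen.contains(e.getKey())) continue;
            List<String> rhss = new ArrayList<>();
            for (String rhs : e.getValue()) {
                if (onlyGenerating(rhs, gen)) rhss.add(rhs);
            }
            kept.put(e.getKey(), rhss);
        }
        kept.keySet().retainAll(reachable(kept, start));
        return kept;
    }

    // S -> A B | x,  B -> b,  A -> a A,  C -> d
    static Map<Character, List<String>> example() {
        Map<Character, List<String>> g = new TreeMap<>();
        g.put('S', Arrays.asList("AB", "x"));
        g.put('B', Arrays.asList("b"));
        g.put('A', Arrays.asList("aA"));
        g.put('C', Arrays.asList("d"));
        return g;
    }

    public static void main(String[] args) {
        Map<Character, List<String>> g = example();
        System.out.println("generating: " + generating(g));     // [B, C, S]
        System.out.println("reachable:  " + reachable(g, 'S')); // [A, B, S]
        System.out.println("reduced:    " + reduce(g, 'S'));    // {S=[x]}
    }
}
```

The order of the two passes matters. Removing unreachable symbols first would drop only C; removing A afterwards (it derives no terminal string), together with S → A B, would then leave B → b behind even though B is no longer reachable. Removing non-generating symbols first and unreachable ones second yields the reduced grammar S → x.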