Principles of Programming Languages
Topic: Formal Languages II
CS 34,LS, LTM, BR: Formal Languages II
Review
A grammar can be ambiguous, i.e., there is more than one parse tree for the same string of terminals.
In a PL we want to base meaning on the parse, so ambiguous parse -> ambiguous meaning.
Ambiguity arises especially in grammars for expressions:
  precedence: x + y * z can parse as (x + y) * z or as x + (y * z)
  associativity: x - y - z can parse as (x - y) - z or as x - (y - z)
Review
Solution: encode precedence and associativity in the grammar.
Use one non-terminal for each level of precedence:
  + -    <term>
  * /    <factor>
For each non-terminal:
  <nt> ::= <nt2>
  <nt> ::= <nt> + <nt2>     (nt2 higher precedence) (left associative)
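A minimal sketch of how a layered grammar fixes the grouping. It assumes a hypothetical three-level grammar (`<expr>` for + and -, `<term>` for * and /, `<factor>` for single-letter variables); the function names are illustrative. The left-recursive rule `<nt> ::= <nt> + <nt2>` is written as a loop, which produces the same left-associative grouping.

```python
import re

def tokenize(text):
    """Split the input into single-letter variables and operator symbols."""
    return re.findall(r"[a-z]|[-+*/]", text)

def parse_factor(tokens, pos):
    # <factor> ::= variable   (assumed: one lowercase letter)
    return tokens[pos], pos + 1

def parse_term(tokens, pos):
    # <term> ::= <factor> | <term> (* | /) <factor>, written as a loop
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        left = f"({left} {op} {right})"
    return left, pos

def parse_expr(tokens, pos=0):
    # <expr> ::= <term> | <expr> (+ | -) <term>, written as a loop
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        left = f"({left} {op} {right})"
    return left, pos

def parse(text):
    """Return the parse as a fully parenthesized string."""
    tree, _ = parse_expr(tokenize(text))
    return tree
```

With this layering, `parse("x + y * z")` groups the multiplication first and `parse("x - y - z")` groups to the left, so each string has exactly one parse.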
Review
Context Free Grammars (CFGs) are used to specify the overall structure of a programming language:
  if/then/else, ...
  brackets: ( ), { }, begin/end, ...
Regular Grammars (RGs) are used to specify the structure of tokens:
  identifiers, numbers, keywords, ...
Note: The recognition problems for CFGs and RGs require different computational models (more on this later).
Extended BNF (EBNF)
Write nonterminals as in BNF. (Variant: Write them with initial capital letters, or using a different font.)
Use additional metasymbols as shortcuts:
  { } means repeat the enclosed text zero or more times
  [ ] means the enclosed text is optional
  ( ) is used for grouping, usually with the alternation symbol, e.g., (...).
If { }, [ ], or ( ) are used as terminal symbols in the language being defined, then they must be quoted. (Variant: They must be underlined.)
Formal Language Theory
Offers a way to describe computational problems formulated as language recognition problems
Enables proofs of the relative difficulty of certain computational problems
Provides a mechanism to aid description of programming language constructs
  Regular expressions ~ PL tokens (e.g., keywords)
  Finite state automata (FSAs)
  Context-free grammars ~ PL statements
Formal Language Theory
Recognizers for languages become more complex as the languages themselves become more complex
  Simple constructs correspond to FSAs
    Keywords, numerical constants
  More complex constructs correspond to Push-down Automata
    If statements, looping statements, declarations
  Even more complex constructs correspond to more complex automata
    Type checking of use with declared type
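Nested constructs need more than a finite state automaton because the recognizer must remember an unbounded number of open brackets; a push-down automaton does this with a stack. A minimal sketch of that idea, assuming only ( ) and { } matter and every other character is ignored (the name `balanced` is illustrative):

```python
PAIRS = {")": "(", "}": "{"}  # closing bracket -> the opener it must match

def balanced(text):
    """Return True if every bracket in text is properly nested and closed."""
    stack = []
    for ch in text:
        if ch in "({":
            stack.append(ch)          # push: remember the open bracket
        elif ch in PAIRS:
            # pop: the most recent open bracket must match this closer
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack                  # accept only if nothing is left open
```

The stack can grow with the nesting depth, which is exactly the memory a fixed, finite set of states cannot provide.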
Regular Expressions
Formalism for describing simple PL constructs
  reserved words
  identifiers
  numbers
Simplest sort of structure
Recognized by a finite state automaton
Defined recursively
Regular Expressions
PL construct           RE Notation            Language
an empty RE                                   { }
symbol a               a                      {a}
null symbol            ε                      {ε}
R,S regular exprs      R | S                  L_R ∪ L_S
a,b terminals          a | b (alternation)    {a,b}
R,S regular exprs      RS                     L_R L_S
a,b terminals          ab (concatenation)     {ab}
Regular Expressions
PL construct           RE Notation    Language
R,S regular exprs      R*             {ε} ∪ L_R ∪ L_R L_R ∪ L_R L_R L_R ...
a                      a*             {ε, a, aa, aaa, ...}
R,S regular exprs      R+             L_R ∪ L_R L_R ∪ L_R L_R L_R ...
a                      a+             {a, aa, aaa, ...}
Note: εa = aε = a
Precedence is {* +}, then concatenation, then |, from high to low (all are left associative operators)
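The languages in these tables can be checked by brute force. The sketch below is an illustration, not part of the slides: it uses Python's `re` module, whose `|`, `*` and `+` operators match the notation here, and a hypothetical helper `language` that lists every string up to a length bound that the RE describes.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """List every string over alphabet, up to max_len symbols, described by pattern."""
    matches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):   # the whole string must match
                matches.append(s)
    return matches
```

For example, `language("a*", "a", 3)` includes the empty string while `language("a+", "a", 3)` does not, which is the only difference between the two operators.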
RE Examples
  2              {,2}
  * 2            {2,,,,, }
  2 *            {, 2, 22, 222, }
  2 * +          {,,,,,2,22, }
  ( 2) *         {,,2,2,,2,22, }
  ( ) *          Binary numbers that end in
REs for PLs
Let letter stand for a | b | c | ... | z and digit stand for 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (Louden uses [0-9]).
  letter (letter | digit)* is identifier       [a-z]([a-z] | [0-9])*
  digit+ is an integer constant                [0-9]+
  digit* . digit+ is real number               [0-9]* \. [0-9]+
Which identifiers are described by letter (letter | digit)*?
  ABC   C   B%   X
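The three token definitions translate almost directly into Python `re` patterns. This is a sketch under the slide's definitions (letter is lowercase a to z only); the `classify` helper and the sample lexemes in the usage note are illustrative, not taken from the slides.

```python
import re

# Patterns written in the bracket notation from the slide.
IDENTIFIER = re.compile(r"[a-z]([a-z]|[0-9])*")
INTEGER = re.compile(r"[0-9]+")
REAL = re.compile(r"[0-9]*\.[0-9]+")

def classify(lexeme):
    """Return the token class whose RE describes the whole lexeme, or None."""
    for name, pattern in (("identifier", IDENTIFIER),
                          ("integer", INTEGER),
                          ("real", REAL)):
        if pattern.fullmatch(lexeme):
            return name
    return None
```

Under these definitions `classify("x1")` is an identifier, while `classify("1x")`, `classify("b%")` and `classify("ABC")` are rejected: an identifier must start with a letter, may contain only letters and digits, and letter covers only lowercase here.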
Examples
Which of the following are legal real numbers described by digit* . digit+?
  .5.5 2 4. 6.3.2
We can see that simple PL constructs can be defined as regular expressions.
Can you define a number in scientific notation as an RE? (e.g., .25+2)
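One possible answer to the scientific-notation question, offered as a sketch rather than the intended solution. It assumes the mantissa has the real-number form digit* . digit+ and that the exponent is the letter E, an optional sign, and digit+; both choices and the name `is_scientific` are assumptions.

```python
import re

# mantissa: digit* . digit+      exponent: E (+ | -)? digit+
SCIENTIFIC = re.compile(r"[0-9]*\.[0-9]+E[+-]?[0-9]+")

def is_scientific(text):
    """Return True if the whole text is a number in the assumed notation."""
    return SCIENTIFIC.fullmatch(text) is not None
```

A different convention (lowercase e, an optional fraction, an optional exponent) would need a different RE; the point is only that the construct stays regular.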
Finite State Automaton (FA)
Described by <set of states, labelled transitions, start state, final state(s)>
Example: <{S,S,S2}, {S ---> S, S ---> S2}, S, {S,S2}>
[State diagram of the example automaton: three states, with the start state marked]
Finite State Automaton (FA)
An FA accepts or recognizes an input string iff there is a path from its start state to a final state such that the labels on the path are the terminals in that string.
[State diagram and transition table: one row per state, one column per input symbol]
What strings are recognized?
Finite State Automaton (FA)
Binary numbers containing a pair of adjacent s:
[State diagram and transition table for FA1: states S0, S1, S2, S3]
Recognizes: ( ) * ( ) *
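A minimal table-driven sketch of how an FA consumes its input. The concrete automaton is an assumption: it takes the adjacent pair to be 11, uses state names S0 to S3 with S3 as the only final state, and fills in one transition table consistent with that reading.

```python
# Assumed transition table: TRANSITIONS[state][input symbol] -> next state
TRANSITIONS = {
    "S0": {"0": "S1", "1": "S2"},
    "S1": {"0": "S1", "1": "S2"},
    "S2": {"0": "S1", "1": "S3"},   # a second 1 in a row reaches S3
    "S3": {"0": "S3", "1": "S3"},   # once the pair is seen, stay in S3
}
START, FINAL = "S0", {"S3"}

def accepts(string):
    """Follow one transition per input symbol; accept if we end in a final state."""
    state = START
    for symbol in string:
        if symbol not in TRANSITIONS[state]:
            return False            # symbol outside the alphabet {0, 1}
        state = TRANSITIONS[state][symbol]
    return state in FINAL
```

Because the automaton is deterministic, the path is unique, so recognition takes one table lookup per input symbol.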
Finite State Automaton (FA)
Binary numbers containing a pair of adjacent s:
[State diagram for FA2: states A, B, C]
Recognizes: ( ) * ( ) *
FA1 and FA2 recognize the same set of strings, i.e., the same language!
Therefore, FAs are not unique.
Finite State Automaton (FA)
Exponent in scientific notation:
[State diagram: states S0, S1, S2, S3; transitions labeled E, "+,-", and digit]
Recognizes: E (+ | -) digit+ | E digit+
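A sketch of an FA for E (+ | -) digit+ | E digit+ in which each input character is first mapped to a symbol class, so one "digit" transition stands for ten characters. The wiring of states S0 to S3 is an assumption derived from the RE, and all names are illustrative.

```python
def symbol_class(ch):
    """Map a character to the label used on the transitions."""
    if ch == "E":
        return "E"
    if ch in "+-":
        return "sign"
    if ch in "0123456789":
        return "digit"
    return None

# Assumed transitions: (state, label) -> next state
TRANSITIONS = {
    ("S0", "E"): "S1",
    ("S1", "sign"): "S2",
    ("S1", "digit"): "S3",   # the sign is optional
    ("S2", "digit"): "S3",
    ("S3", "digit"): "S3",   # digit+ : loop on further digits
}

def accepts_exponent(text):
    state = "S0"
    for ch in text:
        state = TRANSITIONS.get((state, symbol_class(ch)))
        if state is None:        # no transition for this label: reject
            return False
    return state == "S3"         # S3 is the only final state
```

Missing entries in the table act as an implicit dead state, which keeps the table as small as the diagram.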
Finite State Automaton (FA)
Binary numbers which begin and end with a :
[State diagram: states S0, S1, S2]
Recognizes: ( ) *
Finite State Automaton (FA)
Binary numbers containing at least one digit, in which all the s precede all the s:
[State diagram: states S0, S1, S2]
Recognizes: + + *
Practical Uses of REs
As grammar for PL tokens (identifier, float constant, etc.)
In tools for finding/changing in programs & text
  grep
  sed
  awk
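The finding and changing tasks can be imitated with Python's `re` module. This is an analogy for what such tools do with an RE, not a description of how grep or sed are implemented; the function names are illustrative.

```python
import re

def grep(pattern, lines):
    """Finding: keep the lines that contain a match for the RE."""
    return [line for line in lines if re.search(pattern, line)]

def substitute(pattern, replacement, text):
    """Changing: replace every match of the RE in the text."""
    return re.sub(pattern, replacement, text)
```

For instance, `grep(r"[0-9]+", lines)` keeps only the lines that contain an integer constant.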
Tasks for REs and FAs
Recognition of a string
  Is this given string in the language described (recognized) by this given RE (FA)?
Description of a language
  Given an RE (FA), what language does it describe (recognize)?
Codification of a language
  Given a language, find an RE and an FA that correspond to it
Tasks for REs and FAs
Recognition of a string
  Given *, which of these strings is described by it: ,,,,
  Given the following FA, which of these strings is recognized by it: ,,,,
  [State diagram: two states]
Tasks for REs and FAs
Description of a language
  What language is described by the following RE: () + () +
  What language is recognized by the following FA:
  [State diagram: states S0, S1, S2, S3, S4]
Tasks for REs and FAs
Description of a language
  What language is recognized by the following FA:
  [State diagram: states S0, S1, S2, S3, S4]
Tasks for REs and FAs
Codification of a language
  Complex constants are parenthesized pairs of integers
  Let digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  The RE for complex constants is: ( digit+ , digit+ )
  The FA for complex constants is:
  [State diagram: states S0 through S5; transitions labeled (, digit, ",", and )]
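The RE and the FA for complex constants can be written side by side and compared. The sketch assumes no whitespace inside the constant and one plausible wiring of states S0 to S5 (S5 final); the names are illustrative.

```python
import re

# RE: ( digit+ , digit+ )   with the parentheses as quoted terminals
COMPLEX_RE = re.compile(r"\([0-9]+,[0-9]+\)")

# Assumed FA: (state, label) -> next state
DELTA = {
    ("S0", "("): "S1",
    ("S1", "digit"): "S2", ("S2", "digit"): "S2",   # first digit+
    ("S2", ","): "S3",
    ("S3", "digit"): "S4", ("S4", "digit"): "S4",   # second digit+
    ("S4", ")"): "S5",
}

def accepts_complex(text):
    state = "S0"
    for ch in text:
        label = "digit" if ch in "0123456789" else ch
        state = DELTA.get((state, label))
        if state is None:
            return False
    return state == "S5"
```

On any input the two should agree, e.g. `accepts_complex(s) == bool(COMPLEX_RE.fullmatch(s))`, which is a quick way to check a hand-built FA against its RE.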
Nondeterministic FAs
Examples so far have been deterministic finite state automata (DFAs).
To construct nondeterministic finite state automata (NFAs):
  Allow more than one transition with the same label out of a state.
  Allow ε transitions.
Recognize an input string iff there is some path from the start state to a final state such that the labels on the path are the terminals in the string.
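"Some path" can be checked without backtracking by tracking the set of all states the NFA could be in. The sketch below does that; the example automaton (states Q0 to Q3, recognizing strings over {c, d} that end in dd) and the use of the empty string "" as the ε label are assumptions made for illustration.

```python
def epsilon_closure(nfa, states):
    """All states reachable from `states` using only ε transitions."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get("", ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(nfa, start, finals, text):
    """Accept iff some path labeled by text ends in a final state."""
    current = epsilon_closure(nfa, {start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= set(nfa.get(s, {}).get(ch, ()))
        current = epsilon_closure(nfa, moved)
    return bool(current & finals)

# Hypothetical NFA: two transitions labeled d leave Q0, and Q2 has an ε move.
EXAMPLE = {
    "Q0": {"c": {"Q0"}, "d": {"Q0", "Q1"}},
    "Q1": {"d": {"Q2"}},
    "Q2": {"": {"Q3"}},
    "Q3": {},
}
```

Calling `nfa_accepts(EXAMPLE, "Q0", {"Q3"}, "cdd")` follows all candidate paths at once, so the guess of when the final dd starts never has to be undone.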
DFAs vs. NFAs
Regular Expression: ( ) * ( ) *
DFA: [state diagram: states A, B, C]
NFA: [state diagram: states X, Y, Z]
DFAs vs. NFAs
Regular Expression: + +
DFA: [state diagram: states S0, S1, S2, S3]
NFA: [state diagram: states S0, S1, S2, S3]
Constructing FAs from REs
There is a systematic translation from a Regular Expression (RE) to a Nondeterministic Finite State Automaton (NFA).
There is a translation from the resulting NFA into a Deterministic Finite State Automaton (DFA) that recognizes the same language.
This process can be automated!
RE to NFA
For a in alphabet, construct: [diagram: start state with one transition labeled a]
For ε, construct: [diagram]
For s,t REs, construct s | t: [diagram combining N(s) and N(t), with an example]
RE to NFA
For s,t REs, construct st: [diagram: N(s) followed by N(t), with an example]
For s RE, construct s*: [diagram: N(s) with added transitions, with an example]
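The RE-to-NFA cases can be sketched as one small constructor per operator, in the style usually called Thompson's construction. This is an illustration under assumptions, not necessarily the exact machines drawn on the slides: every NFA has one start and one final state, ε edges carry the label None, and concatenation links the two machines with an ε edge instead of merging states.

```python
from itertools import count

_ids = count()  # source of fresh state numbers

class NFA:
    def __init__(self, start, final, edges):
        # edges: list of (source, label, destination); label None means ε
        self.start, self.final, self.edges = start, final, edges

def symbol(a):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, a, f)])

def alternate(n1, n2):            # s | t
    s, f = next(_ids), next(_ids)
    extra = [(s, None, n1.start), (s, None, n2.start),
             (n1.final, None, f), (n2.final, None, f)]
    return NFA(s, f, n1.edges + n2.edges + extra)

def concat(n1, n2):               # st
    return NFA(n1.start, n2.final,
               n1.edges + n2.edges + [(n1.final, None, n2.start)])

def star(n):                      # s*
    s, f = next(_ids), next(_ids)
    extra = [(s, None, n.start), (n.final, None, f),
             (s, None, f),              # skip N(s): zero repetitions
             (n.final, None, n.start)]  # loop back: more repetitions
    return NFA(s, f, n.edges + extra)

def closure(nfa, states):
    """Add every state reachable through ε edges."""
    result, changed = set(states), True
    while changed:
        changed = False
        for src, label, dst in nfa.edges:
            if label is None and src in result and dst not in result:
                result.add(dst)
                changed = True
    return result

def accepts(nfa, text):
    current = closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(nfa, moved)
    return nfa.final in current
```

For example, `concat(concat(star(alternate(symbol("c"), symbol("d"))), symbol("d")), symbol("d"))` builds an NFA for (c | d)* dd. Each sub-machine should be built fresh for each use, because reusing one NFA object twice would share its states.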
Example
Build the NFA for complex numbers using this RE: ( digit+ , digit+ ).
[Diagrams: NFA for digit and NFA for digit+]
Note this is the same as the Kleene * machine except for the bottom transition.
Example
[Diagrams: NFA for "digit+ ," and NFA for "digit+ , digit+"]
Example
[Diagram: NFA for ( digit+ , digit+ )]
Q: Can we make this NFA more efficient by converting it into a DFA?
NFA to DFA
RE: (c | d)* dd
[Diagram: NFA with states S0, S1, S2 and transitions labeled "c,d", d, and d]
Idea: look for sets of states with the same transitions. Let one state in the DFA represent a set of states in the NFA.
  S0 on c to {S0}
  S0 on d to {S0,S1}
  {S0,S1} on c to {S0}
  {S0,S1} on d to {S0,S1,S2}
  {S0,S1,S2} on c to {S0}
  {S0,S1,S2} on d to {S0,S1,S2}
[Diagram: resulting DFA with states {S0}, {S0,S1}, {S0,S1,S2}]
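The "sets of states" idea is usually called the subset construction. A minimal sketch follows, assuming an NFA for (c | d)* dd with states S0, S1, S2 and no ε transitions (an ε-closure step would be needed otherwise); the dictionary encoding and function name are illustrative.

```python
def subset_construction(nfa, start, alphabet):
    """Build DFA transitions whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa, worklist = {}, [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue                      # already expanded
        dfa[current] = {}
        for symbol in alphabet:
            # union of the moves of every NFA state in the current set
            target = frozenset(t for s in current
                               for t in nfa.get(s, {}).get(symbol, ()))
            dfa[current][symbol] = target
            if target not in dfa:
                worklist.append(target)
    return dfa

# Assumed NFA: S0 loops on c and d, and two d moves lead to S2.
NFA = {
    "S0": {"c": {"S0"}, "d": {"S0", "S1"}},
    "S1": {"d": {"S2"}},
    "S2": {},
}
```

Running `subset_construction(NFA, "S0", "cd")` yields three DFA states, {S0}, {S0,S1} and {S0,S1,S2}; a DFA state is final when its set contains an NFA final state, here S2.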
NFA to DFA
[Diagram: NFA for ( digit+ , digit+ ) with states S0 through S9]
Idea: look for sets of states with the same transitions. Let one state in the DFA represent a set of states in the NFA.
  From S0 with ( to {S1}.
  From S1 with digit to {S2,S3,S4}
  From S2,S3 with digit to {S2,S3,S4}
  From S3,S4 with , to {S5}
  From S5 with digit to {S6,S7,S8}
  From S6,S7 with digit to {S6,S7,S8}
  From S7,S8 with ) to {S9}
NFA to DFA
[Diagram: resulting DFA with states S0, S1, S234, S5, S678, S9 and transitions labeled (, digit, ",", digit, and ), with digit loops on S234 and S678]
  From S0 with ( to {S1}.
  From S1 with digit to {S2,S3,S4}
  From S2,S3 with digit to {S2,S3,S4}
  From S3,S4 with , to {S5}
  From S5 with digit to {S6,S7,S8}
  From S6,S7 with digit to {S6,S7,S8}
  From S7,S8 with ) to {S9}
The Chomsky Hierarchy
Type 0: Arbitrary Languages
  Recognized by: Turing Machines
Type 1: Context Sensitive Languages
Type 2: Context Free Languages
  Generated by: Context Free Grammars
  Recognized by: Push-Down Automata
Type 3: Regular Languages
  Generated by: Regular Grammars
  Described by: Regular Expressions
  Recognized by: Finite State Automata
Context Free Languages
Recognized by:
  Push-down automata (in theory)
  Parsers (in practice)
Parsing Strategies:
  Bottom-up
  Top-down
Parsing Algorithms:
  Nondeterministic
  Deterministic
[Parse tree for x + 3 * y: <expr> at the root, with <term>, <factor>, <var>, and <num> nodes below]