chaptersolutions (USA) 2010/9/1 13:17 page 2 #3 2
Chapter 3

3.1 from page 106
The token sequence for the first four lines (by line) is:
ID(main), LPAREN, RPAREN, LBRACE
CONST, ID(float), ID(payment), ASSIGN, FLOATNUM(384.00), SEMICOLON
ID(float), ID(bal), SEMICOLON
ID(int), ID(month), ASSIGN, INTNUM(0), SEMICOLON
All of the occurrences of ID require additional information, as do the numbers. The attached information is shown in parentheses in the lines above.

3.3 from page 106
(a) (ab a) (ba b)
(b) a a((c bc)da)
(c) λ (ab c)

3.5 from page 107
Let DNOTZ be the set of digits from 1 to 9 and D be the set of digits from 0 to 9.
(0 ( (λ 0) DNOTZ D ).(0 (D DNOTZ (λ 0))
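The token sequence in the 3.1 answer can be reproduced with a small table-driven scanner. This is a minimal Python sketch under stated assumptions: the source text is a hypothetical reconstruction of the first four lines from the tokens listed above, and the token specification is also an assumption. Following the answer, only `const` is treated as a keyword, while `float` and `int` are scanned as ordinary IDs. Python's `re` alternation takes the first alternative that matches rather than the longest, so FLOATNUM is listed before INTNUM.

```python
import re

# Hypothetical token specification for the 3.1 fragment. Order matters:
# re tries alternatives left to right, so FLOATNUM must precede INTNUM.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("FLOATNUM", r"\d+\.\d+"),
    ("INTNUM", r"\d+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACE", r"\{"),
    ("ASSIGN", r"="),
    ("SEMICOLON", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# As in the answer, only `const` is reserved; `float` and `int` stay IDs.
KEYWORDS = {"const": "CONST"}
# Token classes whose lexeme is attached as additional information.
WITH_VALUE = {"ID", "INTNUM", "FLOATNUM"}


def scan(text):
    """Return the token sequence for `text` as strings such as 'ID(main)'."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "ID" and lexeme in KEYWORDS:
            tokens.append(KEYWORDS[lexeme])
        elif kind in WITH_VALUE:
            tokens.append(f"{kind}({lexeme})")
        else:
            tokens.append(kind)
    return tokens


# Hypothetical reconstruction of the first four lines of the exercise program.
SOURCE = "main() {\n  const float payment = 384.00;\n  float bal;\n  int month = 0;\n"
for line in SOURCE.splitlines():
    print(", ".join(scan(line)))
```

Run line by line, this prints the four token lines given in the answer.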
3.7 from page 107
AlmostReserved tokens could be used to recover from syntax errors caused by misspelled reserved words. The grammar used to drive the parser must be rewritten to treat the token classes AlmostReserved and Identifier as interchangeable until an error is detected. If the parsing error occurs in a state that would accept a reserved word and the input token is AlmostReserved, the value of the token is checked to see whether it is a variant of the expected reserved word. If so, the reserved word can be assumed to be the intended input token, and parsing can be restarted from the state the parser was in before the error.
AlmostReserved tokens would have to be recognized by an extension of one of the reserved-word lookup techniques described at the end of Section 3.7.1 on page 79. Because so many identifiers are only a single-character change away from some reserved word, recognizing them by generating a regular expression for each possibility would greatly increase the number of states in the scanner tables.

3.11 from page 108
(a) Since the scanner must be able to look ahead over an arbitrary number of characters, it must be able to back up over any number of characters when an error occurs, until it reaches a state (if any) in which a token can be recognized. The characters beyond the accepted token that were examined before the error occurred must be saved for reprocessing when the next token is requested.
(b) The specification for matching the keyword DO must include a context clause requiring that it be followed by a string of digits (a statement label in Fortran), an identifier, an equal sign, an integer (the starting value of the loop), and then a comma.
While the lookahead for this context is being performed, all of the characters processed must be saved so that scanning can be restarted once it has been determined whether or not the DO token is to be recognized. If the lookahead fails, scanning continues with the character after the O. The scanner then constructs a longer identifier by appending the characters that appear before = to DO.

3.14 from page 109
Since distinct states must be present in the FA to recognize each pair of left and right brackets, an FA with n states can recognize strings in the set only for values of k up to n/2. Half of its states must be reachable via transitions labeled with left brackets, and the other half must be reachable via transitions labeled with the matching right brackets. If such an FA is presented with a string that begins with n/2 + 1 left brackets, it cannot have a transition that allows it to accept the last left bracket, since that would leave it with fewer than n/2 states reachable via right-bracket transitions.
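For 3.7, the check that an AlmostReserved lexeme is a variant of the expected reserved word might look like the following Python sketch. It assumes that "one character change" means a single substitution, insertion, or deletion (transpositions such as "whlie" are not covered); the reserved-word list and all function names are hypothetical.

```python
# Hypothetical reserved-word list for illustration only.
RESERVED = {"if", "else", "while", "return", "const"}


def one_edit_apart(a, b):
    """True if b can be obtained from a by exactly one substitution,
    insertion, or deletion of a single character."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (a substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: deleting some character of the longer string
    # must yield the shorter one.
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def is_almost_reserved(lexeme):
    """Scanner side: does the lexeme look like a misspelled reserved word?"""
    return lexeme not in RESERVED and any(one_edit_apart(lexeme, w) for w in RESERVED)


def repair(lexeme, expected):
    """Parser side: after a syntax error in a state that expects the reserved
    word `expected`, return it if `lexeme` is a variant of it, else None."""
    if expected in RESERVED and one_edit_apart(lexeme, expected):
        return expected
    return None
```

Checking lexemes against the word list at lookup time, rather than adding states for every misspelling, reflects the point made in the answer about scanner table size.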
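The backup described in 3.11(a) can be sketched in Python with a tiny hypothetical DFA in which "ab" is a prefix of the token "abc" but is not itself a token, so scanning "ab" followed by anything other than "c" forces the scanner to back up to the last accepting position. Because the whole input is held in a string here, saving the examined characters amounts to resetting an index; a scanner reading from a stream would push them into a buffer instead. The DFA and the function names are assumptions for illustration.

```python
# Hypothetical DFA recognizing three token classes: A = "a", B = "b",
# ABC = "abc". Missing entries are transitions to an error (dead) state.
TRANS = {(0, "a"): 1, (0, "b"): 4, (1, "b"): 2, (2, "c"): 3}
ACCEPT = {1: "A", 3: "ABC", 4: "B"}


def next_token(text, pos):
    """Return (token, lexeme, next_pos) for the longest token starting at pos."""
    state, i = 0, pos
    last_accept = None  # (token class, end offset) of the longest match so far
    while i < len(text) and (state, text[i]) in TRANS:
        state = TRANS[(state, text[i])]
        i += 1
        if state in ACCEPT:
            last_accept = (ACCEPT[state], i)
    if last_accept is None:
        raise SyntaxError(f"no token starts at offset {pos}")
    token, end = last_accept
    # Back up: text[end:i] was examined but is not part of this token;
    # returning `end` leaves those characters to be rescanned next time.
    return token, text[pos:end], end


def scan_with_backup(text):
    pos, tokens = 0, []
    while pos < len(text):
        token, lexeme, pos = next_token(text, pos)
        tokens.append((token, lexeme))
    return tokens
```

For example, in "abcab" the second "ab" is followed by end of input, so the scanner backs up over the "b" and rescans it as a separate token.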
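The DO context clause in 3.11(b) can be approximated in Python with a regular expression applied after blanks are removed, since blanks are not significant in Fortran. This is a sketch under the answer's own simplification that the starting value is an integer; the function name and the shape of its return value are hypothetical.

```python
import re

# Context clause from (b): DO, a statement label (digits), an identifier,
# '=', an integer starting value, then ','.
DO_CONTEXT = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=(\d+),")


def first_token(statement):
    """Classify the start of a Fortran statement that begins with DO."""
    text = statement.replace(" ", "").upper()  # blanks are insignificant
    m = DO_CONTEXT.match(text)
    if m:
        # Lookahead succeeded: DO is the keyword. The label and loop
        # variable are returned only for illustration.
        return ("DO", m.group(1), m.group(2))
    # Lookahead failed: rescan the saved characters. Everything before '='
    # (starting with D and O) forms one longer identifier.
    return ("ID", text.split("=", 1)[0])
```

The classic pair shows both outcomes: "DO 10 I = 1, 100" starts a loop, while "DO 10 I = 1.100" is an assignment to the identifier DO10I.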
3.17 from page 110
Since by definition an NFA may include transitions to multiple states for a given input symbol, the transition table used for DFAs must be generalized so that each entry can hold a set of successor states. In addition, a row must be added to the table to hold the transitions from each state labeled with λ. The information in this row is needed to mimic the operation of CLOSE during scanning.
Scanning with an NFA might be attractive when processing a language in which a programmer can define new syntactic constructs, including new tokens. The scanner would run more slowly than a standard DFA-based scanner, but the cost of executing the MAKEDETERMINISTIC algorithm during compilation would be avoided.

3.19 from page 110
Rev(R) is a regular set, since a backward path must exist through the DFA that defines R for each of the strings in Rev(R). An NFA that recognizes Rev(R) can be constructed by reversing the transitions and swapping the roles of the initial and accepting states of the original DFA. (If the NFA must have a single initial state, a new one with λ-transitions to each former accepting state can be added.) The NFA can then be converted into a DFA using MAKEDETERMINISTIC in Section 3.8.2 on page 94.

3.21 from page 110
The characters of an integer literal are converted to an integer by multiplying the value so far by 10 as each new digit is processed and then adding the value of the new digit. An overflow occurs when either the multiplication by 10 or the addition of the digit value produces a result that exceeds the maximum int value. The appropriate check must be performed before each of these operations so that an error message is produced before the overflow occurs.

3.24 from page 111
(a) Double is a regular set if the vocabulary consists of only a single letter a. Any nonempty even-length string must be in the set defined by Double, and a language consisting of only even-length strings is easily specified:
(aa)+
(b) Double is not a regular set if the vocabulary consists of two letters.
To see why, suppose a DFA with n states recognizes Double over {a, b}, and consider the n + 1 strings a^i b for 0 ≤ i ≤ n. Two of them, a^i b and a^j b with i ≠ j, must leave the DFA in the same state. Appending a^i b to each then yields a^i b a^i b, which is in Double, and a^j b a^i b, which is not, yet the DFA must either accept both or reject both. A finite automaton cannot remember an unboundedly long first half in order to compare it with the second half.
Note that the answer in (a) does not attempt to define the two identical substrings. Rather, it takes advantage of the fact that any even-length string over a single-character vocabulary can be decomposed into two identical halves, and it constructs strings only out of pairs of a's.
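For 3.17, the following Python sketch shows how a generalized NFA table can drive scanning directly. Each table entry maps a (state, symbol) pair to a set of successor states, and the empty string '' stands in for λ, playing the role of the extra λ row; `close` mimics CLOSE. The representation, the function names, and the example NFA (which accepts a* | b) are assumptions for illustration.

```python
def close(states, table):
    """λ-closure: every state reachable from `states` using only λ ('') moves."""
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in table.get((s, ""), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result


def nfa_accepts(text, table, start, accepting):
    """Simulate the NFA on `text`, tracking the set of states it could be in."""
    current = close({start}, table)
    for ch in text:
        moved = set()
        for s in current:
            moved |= table.get((s, ch), set())
        current = close(moved, table)
        if not current:
            return False  # every path has died
    return bool(current & accepting)


# Hypothetical NFA for a* | b: state 0 has λ-moves to the two alternatives.
TABLE = {(0, ""): {1, 2}, (1, "a"): {1}, (2, "b"): {3}}
ACCEPTING = {1, 3}
```

The slowdown mentioned in the answer is visible in the inner loop: each input character may touch every state in `current` plus a closure computation, instead of a single table lookup as in a DFA.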
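The reversal construction in 3.19 can be sketched in Python as follows. Each DFA edge p --c--> q becomes q --c--> p, the DFA's accepting states become start states, and the DFA's start state becomes the only accepting state. Instead of adding a new start state with λ-transitions, this simulator simply accepts a set of start states. The dictionary representation, function names, and example DFA are assumptions.

```python
def reverse_dfa(dfa, start, accepting):
    """Build an NFA for Rev(R) from a DFA for R.

    `dfa` maps (state, symbol) -> state. Returns (nfa, start_states,
    accepting_states), where `nfa` maps (state, symbol) -> set of states.
    """
    nfa = {}
    for (p, c), q in dfa.items():
        nfa.setdefault((q, c), set()).add(p)  # reverse the edge p --c--> q
    return nfa, set(accepting), {start}


def run_nfa(text, nfa, starts, accepting):
    """Simulate an NFA without λ-moves that may have several start states."""
    current = set(starts)
    for ch in text:
        current = {q for s in current for q in nfa.get((s, ch), ())}
    return bool(current & accepting)


# Hypothetical DFA for R = strings over {a, b} that begin with "ab".
DFA = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
REV = reverse_dfa(DFA, 0, {2})
```

With this DFA, Rev(R) is the set of strings ending in "ba", so `run_nfa("aaba", *REV)` succeeds while `run_nfa("ab", *REV)` does not.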
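The checks required in 3.21 can be arranged so that the test itself never overflows. Python integers do not overflow, so this sketch simulates a limit with a hypothetical `INT_MAX` constant, assuming a 32-bit two's-complement int; in a language such as C the same comparisons would be safe because they only divide or subtract.

```python
INT_MAX = 2**31 - 1  # assumption: a 32-bit two's-complement int


def convert_int_literal(digits):
    """Convert a string of decimal digits to an int, reporting overflow
    before it would happen rather than after."""
    value = 0
    for ch in digits:
        d = ord(ch) - ord("0")
        # Before multiplying: value * 10 > INT_MAX exactly when
        # value > INT_MAX // 10.
        if value > INT_MAX // 10:
            raise OverflowError(f"integer literal {digits} exceeds {INT_MAX}")
        value *= 10
        # Before adding: value + d > INT_MAX exactly when value > INT_MAX - d.
        if value > INT_MAX - d:
            raise OverflowError(f"integer literal {digits} exceeds {INT_MAX}")
        value += d
    return value
```

The literal 2147483647 converts successfully, while 2147483648 is rejected by the second check just before the final addition.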
3.25 from page 111
(a) Seq(x, y) = (x(yx)*(y | λ)) | (y(xy)*(x | λ))
(b) S=a((b Seq(a, c)) (b λ)) ((c Seq(a, b)) (c λ))

3.27 from page 112
This algorithm has the same structure as MAKEDETERMINISTIC in Figure 3.23 on page 95. The only difference is that it does not merge states other than those reached by λ-transitions.

function REMOVELAMBDATRANS( N ) returns NFA
    NewN.StartState ← RECORDSTATE({ N.StartState })
    foreach S ∈ WorkList do
        WorkList ← WorkList − { S }
        foreach c ∈ Σ do
            NewN.T(S, c) ← RECORDSTATE({ N.T(s, c) })
    NewN.AcceptStates ← { S ∈ NewN.States | S ∩ N.AcceptStates ≠ ∅ }
    return (NewN)

function CLOSE( S, T ) returns Set
    ans ← S
    repeat
        changed ← false
        foreach s ∈ ans do
            foreach t ∈ T(s, λ) do
                if t ∉ ans then
                    ans ← ans ∪ { t }
                    changed ← true
    until not changed
    return (ans)

function RECORDSTATE( s ) returns Set
    s ← CLOSE(s, N.T)
    if s ∉ NewN.States then
        NewN.States ← NewN.States ∪ { s }
        WorkList ← WorkList ∪ { s }
    return (s)

3.29 from page 112
A DFA consisting of just one state can accept only the empty string unless it has a transition on one or more symbols that loops
back to its single state. The presence of such a loop enables the DFA to accept strings of length 1 (that is, n), 2 (2n), or any other length. Adding states to this DFA produces an analogous situation. A DFA with n states can accept strings only up to length n − 1 unless it contains a loop, because reading n symbols passes through n + 1 states, so some state must repeat. The presence of a loop means that a DFA can accept strings of any length, and thus a DFA of n states with a loop on a path from its start state to an accepting state must be able to accept strings of length 2n and greater.
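Assuming Seq(x, y) in 3.25(a) denotes the nonempty strings in which x and y strictly alternate, the expression can be checked by brute force in Python. Here | is written as regex alternation and (y | λ) as an optional group; the helper names are hypothetical.

```python
import re
from itertools import product


def seq_pattern(x, y):
    """Seq(x, y) = (x(yx)*(y | λ)) | (y(xy)*(x | λ)) as a Python regex."""
    x, y = re.escape(x), re.escape(y)
    return f"(?:{x}(?:{y}{x})*(?:{y})?)|(?:{y}(?:{x}{y})*(?:{x})?)"


def alternates(s):
    """Reference definition: nonempty, and no two adjacent symbols are equal."""
    return len(s) > 0 and all(p != q for p, q in zip(s, s[1:]))


def check(max_len=8):
    """Compare the regex with the reference definition on every string over
    {a, b} of length up to max_len; return a counterexample or None."""
    pattern = re.compile(seq_pattern("a", "b"))
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if bool(pattern.fullmatch(s)) != alternates(s):
                return s
    return None
```

An exhaustive check over short strings is not a proof, but it quickly catches a misplaced star or a missing λ alternative.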
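For 3.27, here is a Python sketch of one way to realize the stated idea: every state of the new NFA is the λ-closure of a single original state, so states are merged only along λ-transitions, and each new transition on c leads to the closures of the individual target states. This reading of the NewN.T assignment, the dictionary representation ('' for λ), and all function names are assumptions, not the solution's own code.

```python
def lambda_close(states, table):
    """CLOSE: the states reachable from `states` via λ ('') transitions."""
    result, stack = set(states), list(states)
    while stack:
        for t in table.get((stack.pop(), ""), ()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return frozenset(result)


def remove_lambda_trans(table, start, accepting, alphabet):
    """Return (new_table, new_start, new_accepting) for an equivalent NFA with
    no λ-transitions; each new state is the closure of one old state."""
    def record(q):
        # Plays the role of RECORDSTATE: close the state, queue it if new.
        s = lambda_close({q}, table)
        if s not in states:
            states.add(s)
            work.append(s)
        return s

    states, work = set(), []
    new_start = record(start)
    new_table = {}
    while work:
        S = work.pop()
        for c in alphabet:
            targets = {record(t) for s in S for t in table.get((s, c), ())}
            if targets:
                new_table[(S, c)] = targets
    new_accepting = {S for S in states if S & accepting}
    return new_table, new_start, new_accepting
```

Applied to an NFA for a* | b with λ-moves from state 0 to states 1 and 2, the new start state is the closure {0, 1, 2}, with a-transitions to {1} and b-transitions to {3}, and no λ entries remain.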