Chapter 3

3.1 from page 106

The token sequence for the first four lines (by line) is:

ID(main),LPAREN,RPAREN,LBRACE
CONST,ID(float),ID(payment),ASSIGN,FLOATNUM(384.00),SEMICOLON
ID(float),ID(bal),SEMICOLON
ID(int),ID(month),ASSIGN,INTNUM(0),SEMICOLON

All of the occurrences of ID require additional information, as do the numbers. The attached information is shown in parentheses in the lines above. (A sketch of a token record that carries such an attribute follows solution 3.5 below.)

3.3 from page 106

(a) (ab a) (ba b)

(b) a a((c bc)da)

(c) λ (ab c)

3.5 from page 107

Let DNOTZ be the set of digits from 1 to 9 and D be the set of digits from 0 to 9.

(0 ( (λ 0) DNOTZ D ).(0 (D DNOTZ (λ 0))
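To make the remark in 3.1 about attached information concrete, here is a minimal sketch of a token record that pairs a token kind with an optional attribute. The class, enum, and field names are assumptions made for this sketch, not the names used in the book.

    // A token kind plus an optional attribute such as an identifier's
    // spelling or a literal's text (names are illustrative only).
    public class Token {
        public enum Kind { ID, CONST, ASSIGN, INTNUM, FLOATNUM,
                           LPAREN, RPAREN, LBRACE, SEMICOLON }

        public final Kind kind;
        public final String attribute;   // e.g. "main" for ID, "384.00" for FLOATNUM

        public Token(Kind kind, String attribute) {
            this.kind = kind;
            this.attribute = attribute;
        }

        public Token(Kind kind) { this(kind, null); }

        @Override public String toString() {
            return attribute == null ? kind.toString()
                                     : kind + "(" + attribute + ")";
        }
    }

With this representation, the first line above would be built as new Token(Kind.ID, "main"), new Token(Kind.LPAREN), new Token(Kind.RPAREN), new Token(Kind.LBRACE), which prints back as ID(main),LPAREN,RPAREN,LBRACE.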

3.7 from page 107

AlmostReserved tokens could be used to recover from syntax errors caused by misspelling reserved keywords. The grammar used to drive the parser must be rewritten to treat the token classes AlmostReserved and Identifier as interchangeable until an error is recognized. If the parsing error occurred in a state that would accept a reserved word and the input token was AlmostReserved, then the value of the token would be checked to see whether it was a variant of the expected reserved word. If so, the reserved word could be assumed to be the intended input token and parsing could be restarted from the state it was in before the error. AlmostReserved tokens would have to be recognized by using an extension of one of the reserved-word lookup techniques described at the end of Section 3.7.1 on page 79. Because a large number of identifiers are only one character change removed from reserved words, recognizing them by generating regular expressions for each of the possibilities would cause a huge increase in the number of states in the scanner tables.

3.11 from page 108

(a) Since the scanner must be capable of looking ahead over an arbitrary number of characters, it must be able to back up over any number of characters when an error occurs, until it finds a state (if any) in which a token can be recognized. The characters beyond the accepted token that were examined before the error occurred must be saved for reprocessing when the next token is requested. (A sketch of this back-up scheme follows solution 3.14 below.)

(b) The specification for matching the keyword DO must include a context clause requiring that it be followed by a string of digits (a statement label in Fortran), an identifier, an equal sign, an integer (the starting value of the loop), and then a comma. While the necessary lookahead is being done to check for this context, all of the characters processed must be saved in order to restart the scanning after it is determined whether or not the DO token is to be recognized. If the lookahead fails, scanning will continue with the character after the O. The scanner will then construct a longer identifier by appending the characters that appear before the = to DO.

3.14 from page 109

Since distinct states must be present in the FA to recognize each pair of left and right brackets, an FA with n states will be able to recognize strings in the set for any value of k up to n/2. Half of its states will be reachable via transitions labeled with left brackets, and the other half must be reachable via transitions labeled by the matching right brackets. If such an FA is presented with a string that begins with n/2 + 1 left brackets, it cannot have a transition that allows it to accept the last left bracket, since that would leave it with fewer than n/2 states reachable via right-bracket transitions.
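The back-up scheme described in 3.11(a) can be sketched as follows. The Dfa interface, its state numbering, and the pushback buffer are assumptions made for this sketch rather than anything from the book; the point is only the mechanism the solution describes: remember the most recent accepting position and, when the automaton blocks, return that token and save the characters examined beyond it for reprocessing.

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class MaxMunchScanner {
        // Hypothetical DFA interface; any table-driven implementation would do.
        interface Dfa {
            int START = 0;
            int DEAD  = -1;
            int step(int state, char c);      // next state, or DEAD
            boolean accepting(int state);
            String tokenKind(int state);      // token recognized in an accepting state
        }

        private final Dfa dfa;
        private final String input;
        private int pos = 0;
        private final Deque<Character> pushback = new ArrayDeque<>();

        MaxMunchScanner(Dfa dfa, String input) { this.dfa = dfa; this.input = input; }

        private int read() {
            if (!pushback.isEmpty()) return pushback.pop();
            return pos < input.length() ? input.charAt(pos++) : -1;
        }

        public String nextToken() {
            StringBuilder seen = new StringBuilder();
            int state = Dfa.START;
            int lastAcceptLen = -1;           // length of the longest accepted prefix so far
            String lastKind = null;

            int c;
            while ((c = read()) != -1) {
                int next = dfa.step(state, (char) c);
                if (next == Dfa.DEAD) { pushback.push((char) c); break; }
                seen.append((char) c);
                state = next;
                if (dfa.accepting(state)) {
                    lastAcceptLen = seen.length();
                    lastKind = dfa.tokenKind(state);
                }
            }
            // Characters examined beyond the accepted token are saved for the
            // next call; pushing them in reverse restores their original order.
            int keep = Math.max(lastAcceptLen, 0);
            for (int i = seen.length() - 1; i >= keep; i--)
                pushback.push(seen.charAt(i));
            if (lastAcceptLen < 0) return null;   // no token could be recognized
            return lastKind + "(" + seen.substring(0, lastAcceptLen) + ")";
        }
    }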

3.17 from page 110

Since by definition an NFA may include transitions to multiple states for a given input symbol, the transition table used for DFAs must be generalized so that an entry can hold multiple successor states for a given input symbol. In addition, a row must be added to the table to hold the λ-transitions from each state. The information in this row is needed in order to mimic the operation of CLOSE during the scanning process. Scanning using an NFA might be attractive when processing a language in which a programmer can define new syntactic constructs, including new tokens. The scanner would run more slowly than a standard scanner using DFAs, but the cost of executing the MAKEDETERMINISTIC algorithm during compilation would be avoided.

3.19 from page 110

Rev(R) is a regular set, since a backwards path must exist through the DFA that defines R for each of the strings in Rev(R). An NFA recognizing Rev(R) can be constructed by reversing the transitions and exchanging the roles of the initial and accepting states in the original DFA. The NFA can then be converted into a DFA using MAKEDETERMINISTIC in Section 3.8.2 on page 94. (A sketch of this construction follows solution 3.24 below.)

3.21 from page 110

The characters of an integer literal are converted to an integer by multiplying the value accumulated so far by 10 as each new digit is processed and then adding the value of the new digit. An overflow will occur when either the multiplication by 10 or the addition of the digit value produces a result that exceeds the maximum int value. Appropriate checking must be performed before each of these operations so that an error message can be produced before an overflow occurs. (A sketch of this check also follows solution 3.24 below.)

3.24 from page 111

(a) Double is a regular set if the vocabulary consists of only a single letter a. Any even-length string must be in the set defined by Double, and a language consisting of only even-length strings can be specified easily: (aa)+

(b) Double is not a regular set if the vocabulary consists of two letters. It is possible to define a set of strings consisting of identical repeating pieces only if all of the strings in the set can be enumerated. Since regular sets defined using the * operator include infinitely many strings, the members of a regular set are not in general enumerable. Note that the answer in (a) does not attempt to define the two identical substrings. Rather, it simply takes advantage of the fact that any even-length string constructed from a vocabulary of only a single character must be decomposable into two identical parts, and it only constructs strings out of pairs of a's.
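The reversal construction from 3.19 (referenced above) can be sketched as below. The map-based representation of the transition tables is an assumption made for this illustration, not the book's data structure.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class ReverseDfa {
        // dfaTrans.get(s).get(c) is the DFA's successor of state s on character c.
        // The result maps each state to its sets of NFA successors, per character.
        static Map<Integer, Map<Character, Set<Integer>>> reverse(
                Map<Integer, Map<Character, Integer>> dfaTrans) {
            Map<Integer, Map<Character, Set<Integer>>> nfaTrans = new HashMap<>();
            for (Map.Entry<Integer, Map<Character, Integer>> e : dfaTrans.entrySet()) {
                int from = e.getKey();
                for (Map.Entry<Character, Integer> t : e.getValue().entrySet()) {
                    char c = t.getKey();
                    int to = t.getValue();
                    // The DFA edge from --c--> to becomes the NFA edge to --c--> from.
                    nfaTrans.computeIfAbsent(to, k -> new HashMap<>())
                            .computeIfAbsent(c, k -> new HashSet<>())
                            .add(from);
                }
            }
            return nfaTrans;
        }
    }

To finish the construction, the DFA's accepting states become the NFA's start states (or the targets of λ-transitions from a single new start state), the DFA's start state becomes the sole accepting state, and MAKEDETERMINISTIC then yields a DFA for Rev(R).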
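Likewise, the overflow checks from 3.21 (also referenced above) can be sketched as follows, assuming Java's int as the target type; the class and method names are illustrative. Each check is made before the operation that could overflow, so the error is reported without ever computing an out-of-range value.

    public class IntLiteral {
        static int convert(String digits) {
            int value = 0;
            for (int i = 0; i < digits.length(); i++) {
                int d = digits.charAt(i) - '0';
                // Check before multiplying by 10 ...
                if (value > Integer.MAX_VALUE / 10)
                    throw new NumberFormatException("integer literal overflows: " + digits);
                value *= 10;
                // ... and before adding the next digit's value.
                if (value > Integer.MAX_VALUE - d)
                    throw new NumberFormatException("integer literal overflows: " + digits);
                value += d;
            }
            return value;
        }
    }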

3.25 from page 111

(a) Seq(x, y) = (x(yx)*(y | λ)) | (y(xy)*(x | λ))

(b) S = a((b Seq(a, c)) (b λ)) ((c Seq(a, b)) (c λ))

3.27 from page 112

This algorithm has the same structure as MAKEDETERMINISTIC in Figure 3.23 on page 95. The only difference is that it does not merge states other than those reached by λ-transitions.

function REMOVELAMBDATRANS( N ) returns NFA
    NewN.StartState ← RECORDSTATE( { N.StartState } )
    foreach S ∈ WorkList do
        WorkList ← WorkList − { S }
        foreach c ∈ Σ do
            NewN.T( S, c ) ← RECORDSTATE( { t | s ∈ S, t ∈ N.T( s, c ) } )
    NewN.AcceptStates ← { S ∈ NewN.States | S ∩ N.AcceptStates ≠ ∅ }
    return (NewN)

function CLOSE( S, T ) returns Set
    ans ← S
    repeat
        changed ← false
        foreach s ∈ ans do
            foreach t ∈ T( s, λ ) do
                if t ∉ ans then
                    ans ← ans ∪ { t }
                    changed ← true
    until not changed
    return (ans)

function RECORDSTATE( s ) returns Set
    s ← CLOSE( s, N.T )
    if s ∉ NewN.States then
        NewN.States ← NewN.States ∪ { s }
        WorkList ← WorkList ∪ { s }
    return (s)
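For comparison, CLOSE can be written compactly in Java as below. This sketch uses a worklist rather than the repeat-until-nothing-changes loop above (an equivalent but more efficient formulation), and the map-of-sets representation of the λ-transition table is an assumption made for the illustration.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class LambdaClosure {
        // lambdaTrans.get(s) is the set of states reachable from s by one λ-transition.
        static Set<Integer> close(Set<Integer> states, Map<Integer, Set<Integer>> lambdaTrans) {
            Set<Integer> closure = new HashSet<>(states);
            Deque<Integer> work = new ArrayDeque<>(states);
            while (!work.isEmpty()) {
                int s = work.pop();
                for (int t : lambdaTrans.getOrDefault(s, Set.of())) {
                    if (closure.add(t))       // true only for newly reached states
                        work.push(t);
                }
            }
            return closure;
        }
    }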

3.29 from page 112

A DFA consisting of just one state can only accept the empty string unless it has a transition on one or more symbols that loops back to its single state. The presence of such a loop enables the DFA to accept strings of length 1 (here n), of length 2 (here 2n), or of any other length. Adding states to this DFA produces an analogous situation. A DFA with n states can accept strings only up to length n − 1 unless it contains a loop. The presence of a loop means that the DFA can accept strings of any length, and thus a DFA of n states with a loop must be able to accept strings of length 2n and greater.