Lexical Analysis - 2

Size: px
Start display at page:

Download "Lexical Analysis - 2"

Transcription

1 Lexical Analysis - 2 More regular expressions Finite Automata NFAs and DFAs Scanners JLex - a scanner generator 1

2 Regular Expressions in JLex Symbol - Meaning. Matches a single character (not newline) * Matches 0 or more copies of preceding RE + Matches 1 or more copies of preceding RE? Matches 0 or 1 occurrence of an RE ^ Everything it quotes is matches EXACTLY Matches the beginning of a line $ Matches the end of a line [ ] Character class = matches any character listed; [^ ] implies a match of any character NOT listed ( ) Groups a series of REs into a new RE 2

3 REs in JLex Symbol - Meaning { } Control for repeated matching a specific number of times; a{1,3} means match 1,2, or 3 instances of a \ Used to match a metacharacter or control character; \n matches newline; \* is the character * Means match either the RE proceeding it OR the RE following it RE1/RE2 Means match RE1 but only when followed by RE2; 0/1 will match the 0 in 01 but not in 02 3

4 Exercise Problem: write an RE to match a quoted string such as Hello. Need to decide if a quoted string can go across more than 1 line of text Note: JLex REs are line-input-oriented, unlike formal REs Also, JLex REs make the longest matches possible within a string of characters If multiple REs are given to JLex and several match the same longest expression, the first matching RE is used. 4

5 Finite Automata Automata that recognize strings defined by a regular expression (States, Input symbols, Transitions, Start_state, Set of Final_states) Transitions between states occur on specific input symbols Deterministic automata have only 1 transition per state on a specific input and do not allow transition on the empty string 5

6 Finite Automata Language recognized by automaton is set of strings it accepts by starting in the start state, using transitions corresponding to input symbols in the input string, and processing all input and finishing in a final state RE for integers: A B [ 0-9 ] + Start state Final state 6

7 FAs Nondeterministic finite automaton <{states}, {input symbols} (terminal symbols of a grammar) Transition function ((state,input)--> state), Start state {Final states}> NFA allows more than 1 transition on the same input symbol and/or transitions on ε Deterministic FA allows only 1 transition per input symbol and no ε transitions 7

8 Theoretical results: FAs Set of languages recognizable by NFAs is same as those recognizable by DFAs. There is an algorithm to check for equivalence of two languages recognized by 2 different FAs. RE for reals: ([0-9] + \. [0-9]*) ([0-9] * \. [0-9] + ) Shown: [0-9] +. 0,1,,9 0,1,,9. 0,1,,9 0,1,,9. NFA DFA 8

9 Practical FAs Encode transitions as a table Each column is an input symbol Each row is a state Entry at (s1,i1) is state to transition to when in state s1 and see input i1 Scanner has to try to find longest match in input to a possible token May have to look beyond end of token to do this! 9

10 RE to NFA Conversion Straightforward translation using composition operators of REs For RE ε, s ε f For RE a, terminal symbol, s a f For w,t REs with corresponding NFAs N(w), N(t), w t yields, s ε ε s N(w) s N(t) f f ε ε f 10

11 RE to NFA Conversion For w,t REs with corresponding NFAs N(w), N(t), w t yields, For w RE, w* yields, For (w) RE, use N(w). s s f f N(w) N(t) ε ε ε N(w) f s f ε 11

12 RE to NFA Conversion These drawings follow the Aho, Sethi, Ullman Compiler text and are equivalent to those in Appel For w + use fact that w + = w w* For w? use fact that w? = w ε [abc] = a b c For abc use fact that abc = a b c 12

13 How does an NFA compute? Start off in the start state Compute set S of all states reachable on ε transitions. Given next input symbol is a, calculate set of states T, reachable as transition(s,a) where s S Repeat steps 2,3 until input is exhausted. If final set of states contains a final state, then string has been recognized. 13

14 NFA to DFA Conversion Deterministic computation is desirable if we want a write a scanner as a program Need to convert NFA to equivalent DFA Then can simulate DFA recognition process using tables in program to describe transitions If process ends up in a final state, a token has been recognized 14

15 NFA to DFA Conversion Intuition: whenever there is an ε transition out of a state s, the NFA may go to any of the states reachable in this manner without consuming any input symbols. Call these states the ε-closure of state s. By looking at ε-closures, we form sets of related states in the NFA; these become states in the corresponding DFA Edges in the DFA correspond to sets of edges in the NFA (connecting different ε-closure sets of states) 15

16 NFA to DFA Conversion DFA derived is not the most efficient (smallest possible), but is usually of practical size There are ways of obtaining an optimal DFA by minimizing the numbers of states 16

17 Conversion Algorithm Need two primitive functions ε closure(t), for T a set of states in the NFA Returns a set of NFA states reachable from state s T by ε-transitions move(t, a), for T a set of states in the NFA returns a set of NFA states to which there is a transition on a from some NFA state s T Build set of states (D) and transitions (Dtrans) for the DFA 17

18 Conversion Algorithm Assume all states in NFA are unmarked initially. Let S = ε-closure(start state of NFA). Let D = {S}. while an unmarked state T D do Mark T; input symbols a do { U = ε-closure (move (T, a)); if U D then {add unmarked U to D}; Dtrans(T,a) = U; } endwhile 18

19 Possible Problems Theoretically, for NFA having n states can get DFA with 2 n states, but this doesn t happen in practice. Token is recognized if the ending state of the DFA contains an original final state of the NFA. In case of choice, use final state which represents the earliest rule in the list of productions for tokens 19

20 Optimal DFA There are algorithms for constructing the minimal (smallest) DFA Idea: Assume every state can transition on every input (can create an error state to do this). Try to prove that computation starting at states T and S differs on at least 1 input. If cannot find such an input can merge S and T. Resulting state has the union of their transitions. (ASU, pp141ff) 20

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions Last lecture CMSC330 Finite Automata Languages Sets of strings Operations on languages Regular expressions Constants Operators Precedence 1 2 Finite automata States Transitions Examples Types This lecture

More information

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions. Finite automata We have looked at using Lex to build a scanner on the basis of regular expressions. Now we begin to consider the results from automata theory that make Lex possible. Recall: An alphabet

More information

Lexical Analysis. Prof. James L. Frankel Harvard University

Lexical Analysis. Prof. James L. Frankel Harvard University Lexical Analysis Prof. James L. Frankel Harvard University Version of 5:37 PM 30-Jan-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights reserved. Regular Expression Notation We will develop a

More information

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata Formal Languages and Compilers Lecture IV: Regular Languages and Finite Automata Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/

More information

Lexical Analysis. Implementation: Finite Automata

Lexical Analysis. Implementation: Finite Automata Lexical Analysis Implementation: Finite Automata Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs)

More information

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1} CSE 5 Homework 2 Due: Monday October 6, 27 Instructions Upload a single file to Gradescope for each group. should be on each page of the submission. All group members names and PIDs Your assignments in

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5 CS 536 Spring 2015 1 Multi Character Lookahead We may allow finite automata to look beyond the next input character.

More information

Lexical Error Recovery

Lexical Error Recovery Lexical Error Recovery A character sequence that can t be scanned into any valid token is a lexical error. Lexical errors are uncommon, but they still must be handled by a scanner. We won t stop compilation

More information

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1 CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

More information

CSE302: Compiler Design

CSE302: Compiler Design CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 13, 2007 Outline Recap

More information

Compiler phases. Non-tokens

Compiler phases. Non-tokens Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree

More information

[Lexical Analysis] Bikash Balami

[Lexical Analysis] Bikash Balami 1 [Lexical Analysis] Compiler Design and Construction (CSc 352) Compiled By Central Department of Computer Science and Information Technology (CDCSIT) Tribhuvan University, Kirtipur Kathmandu, Nepal 2

More information

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

More information

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions CSE45 Translation of Programming Languages Lecture 2: Automata and Regular Expressions Finite Automata Regular Expression = Specification Finite Automata = Implementation A finite automaton consists of:

More information

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles CSE45 Translation of Programming Languages Automata, Simple Language Design Principles Finite Automata State Graphs A state: The start state: An accepting state: A transition: a A Simple Example A finite

More information

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

More information

Lexical Error Recovery

Lexical Error Recovery Lexical Error Recovery A character sequence that can t be scanned into any valid token is a lexical error. Lexical errors are uncommon, but they still must be handled by a scanner. We won t stop compilation

More information

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Lexical Analysis Chapter 1, Section 1.2.1 Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens

More information

2. Syntax and Type Analysis

2. Syntax and Type Analysis Content of Lecture Syntax and Type Analysis Lecture Compilers Summer Term 2011 Prof. Dr. Arnd Poetzsch-Heffter Software Technology Group TU Kaiserslautern Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type

More information

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

More information

Lexical Analyzer Scanner

Lexical Analyzer Scanner Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain 1 2 Compiler Construction F6S Lecture - 2 1 3 4 Compiler Construction F6S Lecture - 2 2 5 #include.. #include main() { char in; in = getch ( ); if ( isalpha (in) ) in = getch ( ); else error (); while

More information

Lexical Analyzer Scanner

Lexical Analyzer Scanner Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

Syntax and Type Analysis

Syntax and Type Analysis Syntax and Type Analysis Lecture Compilers Summer Term 2011 Prof. Dr. Arnd Poetzsch-Heffter Software Technology Group TU Kaiserslautern Prof. Dr. Arnd Poetzsch-Heffter Syntax and Type Analysis 1 Content

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

Compiler Construction

Compiler Construction Compiler Construction Lecture 2: Lexical Analysis I (Introduction) Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/

More information

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program

More information

CT32 COMPUTER NETWORKS DEC 2015

CT32 COMPUTER NETWORKS DEC 2015 Q.2 a. Using the principle of mathematical induction, prove that (10 (2n-1) +1) is divisible by 11 for all n N (8) Let P(n): (10 (2n-1) +1) is divisible by 11 For n = 1, the given expression becomes (10

More information

Zhizheng Zhang. Southeast University

Zhizheng Zhang. Southeast University Zhizheng Zhang Southeast University 2016/10/5 Lexical Analysis 1 1. The Role of Lexical Analyzer 2016/10/5 Lexical Analysis 2 2016/10/5 Lexical Analysis 3 Example. position = initial + rate * 60 2016/10/5

More information

CS308 Compiler Principles Lexical Analyzer Li Jiang

CS308 Compiler Principles Lexical Analyzer Li Jiang CS308 Lexical Analyzer Li Jiang Department of Computer Science and Engineering Shanghai Jiao Tong University Content: Outline Basic concepts: pattern, lexeme, and token. Operations on languages, and regular

More information

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur Finite Automata Dr. Nadeem Akhtar Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur PhD Laboratory IRISA-UBS University of South Brittany European University

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS 2010: Compilers Lexical Analysis: Finite State Automata Dr. Licia Capra UCL/CS REVIEW: REGULAR EXPRESSIONS a Character in A Empty string R S Alternation (either R or S) RS Concatenation (R followed by

More information

Lexical Analysis. Lecture 3-4

Lexical Analysis. Lecture 3-4 Lexical Analysis Lecture 3-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 3-4 1 Administrivia I suggest you start looking at Python (see link on class home page). Please

More information

Converting a DFA to a Regular Expression JP

Converting a DFA to a Regular Expression JP Converting a DFA to a Regular Expression JP Prerequisite knowledge: Regular Languages Deterministic Finite Automata Nondeterministic Finite Automata Regular Expressions Conversion of Regular Expression

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing CS 432 Fall 2017 Mike Lam, Professor Finite Automata Conversions and Lexing Finite Automata Key result: all of the following have the same expressive power (i.e., they all describe regular languages):

More information

Monday, August 26, 13. Scanners

Monday, August 26, 13. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

CS 310: State Transition Diagrams

CS 310: State Transition Diagrams CS 30: State Transition Diagrams Stefan D. Bruda Winter 207 STATE TRANSITION DIAGRAMS Finite directed graph Edges (transitions) labeled with symbols from an alphabet Nodes (states) labeled only for convenience

More information

Wednesday, September 3, 14. Scanners

Wednesday, September 3, 14. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

Languages and Compilers

Languages and Compilers Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:

More information

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012 Scanners Xiaokang Qiu Purdue University ECE 468 Adapted from Kulkarni 2012 August 24, 2016 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved

More information

Writing a Lexical Analyzer in Haskell (part II)

Writing a Lexical Analyzer in Haskell (part II) Writing a Lexical Analyzer in Haskell (part II) Today Regular languages and lexicographical analysis part II Some of the slides today are from Dr. Saumya Debray and Dr. Christian Colberg This week PA1:

More information

Non-deterministic Finite Automata (NFA)

Non-deterministic Finite Automata (NFA) Non-deterministic Finite Automata (NFA) CAN have transitions on the same input to different states Can include a ε or λ transition (i.e. move to new state without reading input) Often easier to design

More information

Front End: Lexical Analysis. The Structure of a Compiler

Front End: Lexical Analysis. The Structure of a Compiler Front End: Lexical Analysis The Structure of a Compiler Constructing a Lexical Analyser By hand: Identify lexemes in input and return tokens Automatically: Lexical-Analyser generator We will learn about

More information

Theory of Computations Spring 2016 Practice Final Exam Solutions

Theory of Computations Spring 2016 Practice Final Exam Solutions 1 of 8 Theory of Computations Spring 2016 Practice Final Exam Solutions Name: Directions: Answer the questions as well as you can. Partial credit will be given, so show your work where appropriate. Try

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna

More information

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es Computer Science Department Carlos III University of Madrid Leganés (Spain) OUTLINE Introduction: Definitions The role of the Lexical Analyzer Scanner Implementation

More information

6 NFA and Regular Expressions

6 NFA and Regular Expressions Formal Language and Automata Theory: CS21004 6 NFA and Regular Expressions 6.1 Nondeterministic Finite Automata A nondeterministic finite automata (NFA) is a 5-tuple where 1. is a finite set of states

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Conceptual Structure of a Compiler Source code x1 := y2

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

Lexical Analysis. Chapter 2

Lexical Analysis. Chapter 2 Lexical Analysis Chapter 2 1 Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Lecture 4 (Modified by Professor Vijay Ganesh) Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested

More information

Optimizing Finite Automata

Optimizing Finite Automata Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states

More information

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) TD parsing - LL(1) Parsing First and Follow sets Parse table construction BU Parsing Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) Problems with SLR Aho, Sethi, Ullman, Compilers

More information

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

More information

2. Lexical Analysis! Prof. O. Nierstrasz!

2. Lexical Analysis! Prof. O. Nierstrasz! 2. Lexical Analysis! Prof. O. Nierstrasz! Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes.! http://www.cs.ucla.edu/~palsberg/! http://www.cs.purdue.edu/homes/hosking/!

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1 CSE 401 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2010 1/8/2010 2002-10 Hal Perkins & UW CSE B-1 Agenda Quick review of basic concepts of formal grammars Regular

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular

More information

CMSC 350: COMPILER DESIGN

CMSC 350: COMPILER DESIGN Lecture 11 CMSC 350: COMPILER DESIGN see HW3 LLVMLITE SPECIFICATION Eisenberg CMSC 350: Compilers 2 Discussion: Defining a Language Premise: programming languages are purely formal objects We (as language

More information

CSE 105 THEORY OF COMPUTATION

CSE 105 THEORY OF COMPUTATION CSE 105 THEORY OF COMPUTATION Spring 2016 http://cseweb.ucsd.edu/classes/sp16/cse105-ab/ Today's learning goals Sipser Ch 3.2, 3.3 Define variants of TMs Enumerators Multi-tape TMs Nondeterministic TMs

More information

Implementation of Lexical Analysis. Lecture 4

Implementation of Lexical Analysis. Lecture 4 Implementation of Lexical Analysis Lecture 4 1 Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested It is easier to modify a working

More information

3.15: Applications of Finite Automata and Regular Expressions. Representing Character Sets and Files

3.15: Applications of Finite Automata and Regular Expressions. Representing Character Sets and Files 3.15: Applications of Finite Automata and Regular Expressions In this section we consider three applications of the material from Chapter 3: searching for regular expressions in files; lexical analysis;

More information

Alternation. Kleene Closure. Definition of Regular Expressions

Alternation. Kleene Closure. Definition of Regular Expressions Alternation Small finite sets are conveniently represented by listing their elements. Parentheses delimit expressions, and, the alternation operator, separates alternatives. For example, D, the set of

More information

UNIT II LEXICAL ANALYSIS

UNIT II LEXICAL ANALYSIS UNIT II LEXICAL ANALYSIS 2 Marks 1. What are the issues in lexical analysis? Simpler design Compiler efficiency is improved Compiler portability is enhanced. 2. Define patterns/lexeme/tokens? This set

More information

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es Introduction: Definitions Lexical analysis or scanning: To read from left-to-right a source

More information

Regular Languages and Regular Expressions

Regular Languages and Regular Expressions Regular Languages and Regular Expressions According to our definition, a language is regular if there exists a finite state automaton that accepts it. Therefore every regular language can be described

More information

CSE 105 THEORY OF COMPUTATION

CSE 105 THEORY OF COMPUTATION CSE 105 THEORY OF COMPUTATION Spring 2017 http://cseweb.ucsd.edu/classes/sp17/cse105-ab/ Today's learning goals Sipser Ch 1.2, 1.3 Design NFA recognizing a given language Convert an NFA (with or without

More information

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

Theory of Computation Dr. Weiss Extra Practice Exam Solutions Name: of 7 Theory of Computation Dr. Weiss Extra Practice Exam Solutions Directions: Answer the questions as well as you can. Partial credit will be given, so show your work where appropriate. Try to be

More information

Theory of Programming Languages COMP360

Theory of Programming Languages COMP360 Theory of Programming Languages COMP360 Sometimes it is the people no one imagines anything of, who do the things that no one can imagine Alan Turing What can be computed? Before people even built computers,

More information

Structure of Programming Languages Lecture 3

Structure of Programming Languages Lecture 3 Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis

More information

Lexical Analysis. Lecture 2-4

Lexical Analysis. Lecture 2-4 Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 2 1 Administrivia Moving to 60 Evans on Wednesday HW1 available Pyth manual available on line.

More information

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] 1 What is Lexical Analysis? First step of a compiler. Reads/scans/identify the characters in the program and groups

More information

Week 2: Syntax Specification, Grammars

Week 2: Syntax Specification, Grammars CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62 Words and Sentences

More information

lec3:nondeterministic finite state automata

lec3:nondeterministic finite state automata lec3:nondeterministic finite state automata 1 1.introduction Nondeterminism is a useful concept that has great impact on the theory of computation. When the machine is in a given state and reads the next

More information

Automating Construction of Lexers

Automating Construction of Lexers Automating Construction of Lexers Regular Expression to Programs Not all regular expressions are simple. How can we write a lexer for (a*b aaa)? Tokenizing aaaab Vs aaaaaa Regular Expression Finite state

More information

CS415 Compilers. Lexical Analysis

CS415 Compilers. Lexical Analysis CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework

More information

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution Exam Facts Format Wednesday, July 20 th from 11:00 a.m. 1:00 p.m. in Gates B01 The exam is designed to take roughly 90

More information

Lexical Analysis. Introduction

Lexical Analysis. Introduction Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

More information

Decision Properties of RLs & Automaton Minimization

Decision Properties of RLs & Automaton Minimization Decision Properties of RLs & Automaton Minimization Martin Fränzle formatics and Mathematical Modelling The Technical University of Denmark Languages and Parsing MF Fall p./ What you ll learn. Decidable

More information

Lecture 4: Syntax Specification

Lecture 4: Syntax Specification The University of North Carolina at Chapel Hill Spring 2002 Lecture 4: Syntax Specification Jan 16 1 Phases of Compilation 2 1 Syntax Analysis Syntax: Webster s definition: 1 a : the way in which linguistic

More information

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer As the first phase of a compiler, the main task of the lexical analyzer is to read the input

More information

Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis Chapter 3 Lexical Analysis Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata Design of lexical analyzer generator The role of lexical

More information

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG) CS5371 Theory of Computation Lecture 8: Automata Theory VI (PDA, PDA = CFG) Objectives Introduce Pushdown Automaton (PDA) Show that PDA = CFG In terms of descriptive power Pushdown Automaton (PDA) Roughly

More information

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States TDDD16 Compilers and Interpreters TDDB44 Compiler Construction LR Parsing, Part 2 Constructing Parse Tables Parse table construction Grammar conflict handling Categories of LR Grammars and Parsers An NFA

More information

Compiler course. Chapter 3 Lexical Analysis

Compiler course. Chapter 3 Lexical Analysis Compiler course Chapter 3 Lexical Analysis 1 A. A. Pourhaji Kazem, Spring 2009 Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata

More information

Finite Automata and Scanners

Finite Automata and Scanners Finite Automata and Scanners A finite automaton (FA) can be used to recognize the tokens specified by a regular expression. FAs are simple, idealized computers that recognize strings belonging to regular

More information

DFA: Automata where the next state is uniquely given by the current state and the current input character.

DFA: Automata where the next state is uniquely given by the current state and the current input character. Chapter : SCANNING (Lexical Analysis).3 Finite Automata Introduction to Finite Automata Finite automata (finite-state machines) are a mathematical way of descriing particular kinds of algorithms. A strong

More information

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages Regular Languages MACM 3 Formal Languages and Automata Anoop Sarkar http://www.cs.sfu.ca/~anoop The set of regular languages: each element is a regular language Each regular language is an example of a

More information

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward Lexical Analysis COMP 524, Spring 2014 Bryan Ward Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others The Big Picture Character Stream Scanner

More information

CS2 Language Processing note 3

CS2 Language Processing note 3 CS2 Language Processing note 3 CS2Ah 5..4 CS2 Language Processing note 3 Nondeterministic finite automata In this lecture we look at nondeterministic finite automata and prove the Conversion Theorem, which

More information

Figure 2.1: Role of Lexical Analyzer

Figure 2.1: Role of Lexical Analyzer Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016 Lecture 15 Ana Bove May 23rd 2016 More on Turing machines; Summary of the course. Overview of today s lecture: Recap: PDA, TM Push-down

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3 CS 536 Spring 2015 1 Scanning A scanner transforms a character stream into a token stream. A scanner is sometimes

More information

Formal Languages. Formal Languages

Formal Languages. Formal Languages Regular expressions Formal Languages Finite state automata Deterministic Non-deterministic Review of BNF Introduction to Grammars Regular grammars Formal Languages, CS34 Fall2 BGRyder Formal Languages

More information

Lexical Analysis. Finite Automata. (Part 2 of 2)

Lexical Analysis. Finite Automata. (Part 2 of 2) # Lexical Analysis Finite Automata (Part 2 of 2) PA0, PA Although we have included the tricky file ends without a newline testcases in previous years, students made good cases against them (e.g., they

More information

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama Print family (or last) name: Print given (or first) name: I have read and understand all of

More information