ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos

Similar documents
ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Definition of Regular Expression

Principles of Programming Languages

Lexical Analysis: Constructing a Scanner from Regular Expressions

Finite Automata. Lecture 4 Sections Robb T. Koether. Hampden-Sydney College. Wed, Jan 21, 2015

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Lexical Analysis and Lexical Analyzer Generators

Fig.25: the Role of LEX

Topic 2: Lexing and Flexing

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

Dr. D.M. Akbar Hussain

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Languages. L((a (b)(c))*) = { ε,a,bc,aa,abc,bca,... } εw = wε = w. εabba = abbaε = abba. (a (b)(c)) *

Lexical analysis, scanners. Construction of a scanner

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08

Recognition of Tokens

CS321 Languages and Compiler Design I. Winter 2012 Lecture 5

Example: Source Code. Lexical Analysis. The Lexical Structure. Tokens. What do we really care here? A Sample Toy Program:

Reducing a DFA to a Minimal DFA

Deterministic. Finite Automata. And Regular Languages. Fall 2018 Costas Busch - RPI 1

CS 432 Fall Mike Lam, Professor a (bc)* Regular Expressions and Finite Automata

Should be done. Do Soon. Structure of a Typical Compiler. Plan for Today. Lab hours and Office hours. Quiz 1 is due tonight, was posted Tuesday night

Lexical Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

CMPSC 470: Compiler Construction

Assignment 4. Due 09/18/17

Compilers Spring 2013 PRACTICE Midterm Exam

Theory of Computation CSE 105

COMP 423 lecture 11 Jan. 28, 2008

CS 340, Fall 2016 Sep 29th Exam 1 Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string.

CS 241. Fall 2017 Midterm Review Solutions. October 24, Bits and Bytes 1. 3 MIPS Assembler 6. 4 Regular Languages 7.

TO REGULAR EXPRESSIONS

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CMPT 379 Compilers. Lexical Analysis

CS 430 Spring Mike Lam, Professor. Parsing

CS 340, Fall 2014 Dec 11 th /13 th Final Exam Note: in all questions, the special symbol ɛ (epsilon) is used to indicate the empty string.

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Midterm I Solutions CS164, Spring 2006

ECE 468/573 Midterm 1 September 28, 2012

Quiz2 45mins. Personal Number: Problem 1. (20pts) Here is an Table of Perl Regular Ex

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

LEX5: Regexps to NFA. Lexical Analysis. CMPT 379: Compilers Instructor: Anoop Sarkar. anoopsarkar.github.io/compilers-class

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS 1 COMPUTATION & LOGIC INSTRUCTIONS TO CANDIDATES

If you are at the university, either physically or via the VPN, you can download the chapters of this book as PDFs.

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

Suffix Tries. Slides adapted from the course by Ben Langmead

What are suffix trees?

CS 321 Programming Languages and Compilers. Bottom Up Parsing

CS 241 Week 4 Tutorial Solutions

Compilation

Intermediate Information Structures

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

Fall Compiler Principles Lecture 1: Lexical Analysis. Roman Manevich Ben-Gurion University of the Negev

CS481: Bioinformatics Algorithms

Scanner Termination. Multi Character Lookahead

CS201 Discussion 10 DRAWTREE + TRIES

Typing with Weird Keyboards Notes

this grammar generates the following language: Because this symbol will also be used in a later step, it receives the

Today. CS 188: Artificial Intelligence Fall Recap: Search. Example: Pancake Problem. Example: Pancake Problem. General Tree Search.

2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively

2014 Haskell January Test Regular Expressions and Finite Automata

CSE302: Compiler Design

Homework. Context Free Languages III. Languages. Plan for today. Context Free Languages. CFLs and Regular Languages. Homework #5 (due 10/22)

1.1. Interval Notation and Set Notation Essential Question When is it convenient to use set-builder notation to represent a set of numbers?

CSCE 531, Spring 2017, Midterm Exam Answer Key

Scanner Termination. Multi Character Lookahead. to its physical end. Most parsers require an end of file token. Lex and Jlex automatically create an

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Lecture T1: Pattern Matching

Ma/CS 6b Class 1: Graph Recap

Stack. A list whose end points are pointed by top and bottom

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Fall Compiler Principles Lecture 1: Lexical Analysis. Roman Manevich Ben-Gurion University

Suffix trees, suffix arrays, BWT

CSE 401 Midterm Exam 11/5/10 Sample Solution

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. Example: Pancake Problem. Example: Pancake Problem

2 Computing all Intersections of a Set of Segments Line Segment Intersection

COMBINATORIAL PATTERN MATCHING

Ma/CS 6b Class 1: Graph Recap

Lexical Analysis. Role, Specification & Recognition Tool: LEX Construction: - RE to NFA to DFA to min-state DFA - RE to DFA

Some Thoughts on Grad School. Undergraduate Compilers Review and Intro to MJC. Structure of a Typical Compiler. Lexing and Parsing

Compiler Construction D7011E

Regular Expressions and Automata using Miranda

Mid-term exam. Scores. Fall term 2012 KAIST EE209 Programming Structures for EE. Thursday Oct 25, Student's name: Student ID:

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence

10.5 Graphing Quadratic Functions

Rational Numbers---Adding Fractions With Like Denominators.

Sample Midterm Solutions COMS W4115 Programming Languages and Translators Monday, October 12, 2009

Slides for Data Mining by I. H. Witten and E. Frank

CMSC 331 First Midterm Exam

ASTs, Regex, Parsing, and Pretty Printing

Here is an example where angles with a common arm and vertex overlap. Name all the obtuse angles adjacent to

Lists in Lisp and Scheme

Regular Expression Matching with Multi-Strings and Intervals. Philip Bille Mikkel Thorup

acronyms possibly used in this test: CFG :acontext free grammar CFSM :acharacteristic finite state machine DFA :adeterministic finite automata

LING/C SC/PSYC 438/538. Lecture 21 Sandiway Fong

Information Retrieval and Organisation

Matrices and Systems of Equations

12-B FRACTIONS AND DECIMALS

Fall 2018 Midterm 1 October 11, ˆ You may not ask questions about the exam except for language clarifications.

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016

Transcription:

ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy

RecogniNon of Tokens if expressions nd relnonl opertors if è if then è then else è else relop è < <= = <> > >= id è letter(letter digit)* num è digit+(.digit+)?(e(+ -)?digit+)? Trim whitespce delim è lnk t newline ws è delim+

TrnsiNon Digrm Διάγραμμα Μτάβασης Intermedite visul representnon The grph depicts how the pointer moves from chrcter to chrcter Circles re clled sttes They represent the pointer s posinons Edges leving stte s hve lels indicnng the chrcters required for moving to the next stte Other is specil (refers to ny chrcter tht is not indicted y ny of the other edges leving s) strt > = 0 6 7 * denotes sttes on which input retrcnon must tke plce (i.e., the pointer is moved ck to the strt of the digrm). other 8 *

TrnsiNon Digrm if expressions strt < = 0 1 2 return(relop, LE) > other 3 return(relop, NE) = 4 * return(relop, LT) EQ: equl LE: less or equl LT: less thn NE: not equl GE: greter or equl GT: grter thn > 5 6 return(relop, EQ) = 7 other 8 * return(relop, GE) return(relop, GT)

Keywords nd IdenNfiers Keywords is specil cse of idennfiers Once n idennfier is recognized we cn check if it is keyword le9er or digit strt le9er other 0 10 11 * return(get_token(), instll_id())

Unsigned numers digit digit digit other strt digit. digit E + or - digit * 12 13 14 15 16 17 18 19 E digit digit digit Recognizes 12.3E4 (digits frc>on? exponent?) strt digit 20 21 digit. digit 22 23 other * 24 Recognizes 12.3 (digits frc>on) strt digit 25 26 other * 27 Recognizes 12 (digits)

Finite Automt Ππρασμένα Αυτόματα Recognizer for lnguge A progrm tht tkes s input string x nd nswers yes if x is sentence of the lnguge nd no otherwise. Compile regulr expressions to recognizers Construct generlized trnsi>on digrm clled finite utomton Two clsses of finite utomt DeterminisNc, DFA (νττρμινιστικό) Non-determinisNc, NFA (μη-νττρμινιστικό)

DFAs nd NFAs Both DFA nd n NFA re cple of recognizing precisely the regulr sets Time-spce trde-off DFAs implement fster recognizers DFAs re igger (more sttes, more memory) Regulr expressions cn e compiled in oth DFA nd n NFA

NFA MthemNcl model tht consists of 1. set of sttes S 2. set of input symols Σ (the input symol lphet) 3. trnsinon funcnons move tht mps sttesymol pirs to sets of sttes 4. stte s 0 tht is disnnguished s the strt (or ininl) stte 5. set of sttes F disnnguished s ccepqng (or finl) sttes

NFA for ( )* Sttes: {0, 1, 2, 3} Symol lphet: {, } Strt stte: 0 AccepNng stte: 3 strt 0 1 2 3 An NFA looks like trnsi>on digrm, ut the sme chrcter cn lel two or more trnsi>ons out of one stte: Exmple: cn trnsit control: from Stte 0 to Stte 0 from Stte 0 to Stte 1 Also: edges cn e lel y the specil symol

ImplementNon using TrnsiNon Tle STATE INPUT SYMBOL 0 {0, 1} {0} 1 - {2} 2 - {3} If I m in stte 0 nd the input chrcter is, then I cn move to sttes 0 or 1 If I m in stte 0 nd the input chrcter is, then I cn move to stte 0 If I m in stte 1 nd the input chrcter is, then there is no stte to move If I m in stte 1 nd the input chrcter is, then I cn move to stte 2 strt 0 1 2 3

Accepted input strings ( )* Accepted input strings:,,,, 0 0 1 2 3 Severl other sequences of moves my e mde on the input string, ut none of the others hppened to end in n ccepnng stte: 0 0 0 0 0 strt 0 1 2 3

NFA for * * 1 2 strt 0 3 4

DFA 1. no stte hs n -trnsinon, i.e., trnsinon on input, 2. For ech stte s nd input symol, there is t most one edge leled leving s 0 1 You cn t hve leving stte 0 nd eing le to rech two sttes, i.e., stte 0 nd stte 1

DFA for ( )* strt 0 1 2 3 Recll the NFA version: strt 0 1 2 3

DFA is esy to code s := s 0 c := nextchr while c!= eof do s := move(s, c) c := nextchr end if s in F then return yes else return no

Wht do we do? NFAs re esy to conceive nd drw MulNple edges on the sme chrcters leving one stte cn cuse miguity (αμφισημιά) Mny pths tht spell out the sme input string Hrd to code DFAs re esy to implement in computer progrm

Suset ConstrucQon CONVERSION OF AN NFA INTO A DFA

OperNons OPERATION -closure(s) DESCRIPTION Set of NFA sttes rechle from NFA stte s on -trnsinons lone. -closure(t) Set of NFA sttes rechle from some NFA stte s in T on - trnsinons lone. move(t, ) Set of NFA sttes to which there is trnsinon on input symol from some NFA stte s in T. Not>on: s n NFA stte, T set of NFA sttes

Exmples move({1, 2}, ) = 2 1 2 strt 0 3 4 -closure(0) = {0, 1, 2, 3} -closure(1) = {1, 2} -closure(2) = {2} -closure(3) = {3} -closure(4) = {4}

The suset construcnon initilly, -closure(s0) is the only stte in Dsttes nd it is unmrked; while there is n unmrked stte T in Dsttes do egin mrk T for ech input symol do egin U = -closure(move(t,)) if U is not in Dsttes then dd U s n unmrked stte to Dsttes; Dtrn(T,) := U end for end while

-closure(t) push ll sttes in T onto stcki initilize -closure(t) to T; while stck is not empty do egin pop t for ech stte u with n edge from t to u leled do if u not in -closure(t) end if end for end while dd u to -closure(t) push u

Exmple IniNl NFA, for ( )* strt 2 3 0 1 6 7 8 9 10 4 5

Equivlent DFA C strt A B No trnsinons No two edges with the sme symol leving one stte Esy to trnsform to computer progrm D E

Step 1 The strt stte of the equivlent DFA is -closure(0) A = {0, 1, 2, 4, 7}, these re exctly the sttes rechle from stte 0 vi pth in which every edge is leled

Step 2 The input symol is {, }, we mrk A, nd compute -closure(move(a, )) move(a, ) is the set of sttes of the NFA hving trnsinons on from memers of A, tht is sttes 2 nd 7 (moving to 3 nd 8) -closure(move({0, 1, 2, 4, 7}, )) = -closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} This is B = {1, 2, 3, 4, 6, 7, 8}

Step 3 Among the sttes in A, only 4 hs trnsinon on to 5 the DFA hs trnsinon from A to C, nd C = -closure({5}) = {1, 2, 4, 5, 6, 7}

Step 4 We mrk the new sets B nd C, nd we repet Step 1-3

Repet steps UnNl ll sets of the DFA re mrked Finl sets A = {0, 1, 2, 4, 7} B = {1, 2, 3, 4, 6, 7, 8} C = {1, 2, 4, 5, 6, 7} D = {1, 2, 4, 5, 6, 7, 9} E = {1, 2, 3, 5, 6, 7, 10}

TrnsiNon Tle for DFA STATE INPUT SYMBOL A B C B B D C B C D B E E B C

strt NFA 2 3 0 1 6 7 8 9 10 4 5 C DFA strt A B D E