ΕΠΛ323 - Theory and Practice of Compilers

Lecture 3: Lexical Analysis
Elias Athanasopoulos, eliasathan@cs.ucy.ac.cy

Recognition of Tokens
if expressions and relational operators:
    if    → if
    then  → then
    else  → else
    relop → < | <= | = | <> | > | >=
    id    → letter (letter | digit)*
    num   → digit+ (. digit+)? (E (+ | -)? digit+)?
Trim whitespace:
    delim → blank | tab | newline
    ws    → delim+
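The regular definitions above can be tried out directly with Python's `re` module. This is only a sketch: the names `TOKEN_SPEC`, `KEYWORDS` and `tokenize` are illustrative and not part of the lecture, and `letter` is assumed to mean an ASCII letter.

```python
import re

# Patterns transcribed from the regular definitions (hypothetical names).
TOKEN_SPEC = [
    ("ws",    r"[ \t\n]+"),                              # delim+
    ("num",   r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),  # digits fraction? exponent?
    ("id",    r"[A-Za-z][A-Za-z0-9]*"),                  # letter (letter | digit)*
    ("relop", r"<=|<>|>=|<|=|>"),                        # longer operators first
]
KEYWORDS = {"if", "then", "else"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                      # whitespace is trimmed, not returned
        if kind == "id" and lexeme in KEYWORDS:
            kind = lexeme                 # keywords are matched as identifiers first
        tokens.append((kind, lexeme))
    return tokens
```

For example, `tokenize("if x1 <= 12.3E4 then y")` yields the keyword `if`, the identifier `x1`, the relop `<=`, the number `12.3E4`, the keyword `then` and the identifier `y`.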

Transition Diagram (Διάγραμμα Μετάβασης)
An intermediate visual representation.
The graph depicts how the pointer moves from character to character.
Circles are called states. They represent the pointer's positions.
Edges leaving a state s have labels indicating the characters required for moving to the next state.
"other" is special: it refers to any character that is not indicated by any of the other edges leaving s.
* denotes states on which input retraction must take place (i.e., the pointer is moved back one character, because the character just read is not part of the token).
Transition diagram for >=:
    start → 0
    0 --[>]--> 6
    6 --[=]--> 7
    6 --[other]--> 8 *

Transition Diagram: relational operators
    start → 0
    0 --[<]--> 1
    1 --[=]--> 2          return(relop, LE)
    1 --[>]--> 3          return(relop, NE)
    1 --[other]--> 4 *    return(relop, LT)
    0 --[=]--> 5          return(relop, EQ)
    0 --[>]--> 6
    6 --[=]--> 7          return(relop, GE)
    6 --[other]--> 8 *    return(relop, GT)
EQ: equal, LE: less or equal, LT: less than, NE: not equal, GE: greater or equal, GT: greater than
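A transition diagram like this one maps almost mechanically onto nested conditionals. The sketch below is one possible hand coding, not the lecture's own; the function name `relop` and its return shape `(attribute, next_position)` are assumptions made for illustration. Retraction in the starred states shows up as advancing by one character instead of two.

```python
def relop(text, pos=0):
    """Follow the relational-operator diagram starting at text[pos].

    Returns (attribute, next_pos), or None if no relop starts at pos.
    """
    def peek(i):
        # An empty string stands in for end of input.
        return text[i] if i < len(text) else ""

    c = peek(pos)
    if c == "<":                       # state 1
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("LE", pos + 2)     # state 2
        if nxt == ">":
            return ("NE", pos + 2)     # state 3
        return ("LT", pos + 1)         # state 4*: "other" is not consumed
    if c == "=":
        return ("EQ", pos + 1)         # state 5
    if c == ">":                       # state 6
        if peek(pos + 1) == "=":
            return ("GE", pos + 2)     # state 7
        return ("GT", pos + 1)         # state 8*: "other" is not consumed
    return None                        # no edge leaves state 0 on this character
```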

Keywords and Identifiers
Keywords are a special case of identifiers.
Once an identifier is recognized, we can check whether it is a keyword.
    start → 0
    0 --[letter]--> 10
    10 --[letter or digit]--> 10
    10 --[other]--> 11 *    return(get_token(), install_id())

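Unsigned numbers
Recognizes 12.3E4 (digits fraction? exponent?):
    start → 12
    12 --[digit]--> 13
    13 --[digit]--> 13
    13 --[.]--> 14
    13 --[E]--> 16
    14 --[digit]--> 15
    15 --[digit]--> 15
    15 --[E]--> 16
    16 --[+ or -]--> 17
    16 --[digit]--> 18
    17 --[digit]--> 18
    18 --[digit]--> 18
    18 --[other]--> 19 *
Recognizes 12.3 (digits fraction):
    start → 20
    20 --[digit]--> 21
    21 --[digit]--> 21
    21 --[.]--> 22
    22 --[digit]--> 23
    23 --[digit]--> 23
    23 --[other]--> 24 *
Recognizes 12 (digits):
    start → 25
    25 --[digit]--> 26
    26 --[digit]--> 26
    26 --[other]--> 27 *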

Finite Automata (Πεπερασμένα Αυτόματα)
A recognizer for a language is a program that takes as input a string x and answers yes if x is a sentence of the language and no otherwise.
Compile regular expressions to recognizers.
Construct a generalized transition diagram called a finite automaton.
Two classes of finite automata:
    Deterministic, DFA (ντετερμινιστικό)
    Non-deterministic, NFA (μη-ντετερμινιστικό)

DFAs and NFAs
Both DFAs and NFAs are capable of recognizing precisely the regular sets.
Time-space trade-off:
    DFAs implement faster recognizers
    DFAs can be bigger (more states, more memory)
Regular expressions can be compiled into both a DFA and an NFA.

NFA
A mathematical model that consists of:
1. a set of states S
2. a set of input symbols Σ (the input symbol alphabet)
3. a transition function move that maps state-symbol pairs to sets of states
4. a state s0 that is distinguished as the start (or initial) state
5. a set of states F distinguished as accepting (or final) states

NFA for (a|b)*abb
States: {0, 1, 2, 3}
Symbol alphabet: {a, b}
Start state: 0
Accepting state: 3
    start → 0
    0 --[a]--> 0
    0 --[b]--> 0
    0 --[a]--> 1
    1 --[b]--> 2
    2 --[b]--> 3
An NFA looks like a transition diagram, but the same character can label two or more transitions out of one state.
Example: a can transfer control
    from State 0 to State 0
    from State 0 to State 1
Also: edges can be labeled by the special symbol ε.

Implementation using Transition Table
STATE   INPUT SYMBOL
        a         b
0       {0, 1}    {0}
1       -         {2}
2       -         {3}
If I'm in state 0 and the input character is a, then I can move to states 0 or 1.
If I'm in state 0 and the input character is b, then I can move to state 0.
If I'm in state 1 and the input character is a, then there is no state to move to.
If I'm in state 1 and the input character is b, then I can move to state 2.
    start → 0
    0 --[a]--> 0
    0 --[b]--> 0
    0 --[a]--> 1
    1 --[b]--> 2
    2 --[b]--> 3
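One way to use such a table in code is to track the set of states the NFA could be in after each character. The following is a minimal sketch under that assumption; `NFA_MOVE` and `nfa_accepts` are names chosen here for illustration.

```python
# Transition table of the NFA for (a|b)*abb, copied from the table above.
# A missing entry means "no state to move to".
NFA_MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}

def nfa_accepts(word):
    """Return True if some sequence of moves on word ends in an accepting state."""
    current = {START}
    for ch in word:
        # Follow every possible move from every state we might be in.
        current = {t for s in current for t in NFA_MOVE.get((s, ch), set())}
    return bool(current & ACCEPTING)
```

Because all alternatives are followed at once, no backtracking is needed; the cost is that each step handles a set of states rather than a single one.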

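Accepted input strings for (a|b)*abb
Accepted input strings include abb, aabb, babb, and aaabb.
For example, aabb is accepted by the sequence of moves 0 --[a]--> 0 --[a]--> 1 --[b]--> 2 --[b]--> 3.
Several other sequences of moves may be made on the input string aabb, but none of the others happens to end in an accepting state, for example: 0 --[a]--> 0 --[a]--> 0 --[b]--> 0 --[b]--> 0.
    start → 0
    0 --[a]--> 0
    0 --[b]--> 0
    0 --[a]--> 1
    1 --[b]--> 2
    2 --[b]--> 3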
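To see the competing move sequences concretely, one can enumerate every complete path the NFA can take on an input. This is an illustrative sketch only; `NFA_MOVE` restates the transition table of the NFA for (a|b)*abb, and `paths` is a hypothetical helper.

```python
# Transition table of the NFA for (a|b)*abb (start state 0, accepting state 3).
NFA_MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}

def paths(word, state=0):
    """List every state sequence that consumes all of word, starting in state."""
    if not word:
        return [[state]]
    result = []
    for nxt in sorted(NFA_MOVE.get((state, word[0]), ())):
        # Sequences that get stuck before the end of word contribute nothing.
        result += [[state] + rest for rest in paths(word[1:], nxt)]
    return result
```

On the input aabb this gives exactly two complete sequences, 0 0 0 0 0 and 0 0 1 2 3, and only the second ends in the accepting state 3.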

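NFA for aa*|bb*
[Diagram: NFA with start state 0, one branch through states 1 and 2, and another branch through states 3 and 4.]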

DFA
1. No state has an ε-transition, i.e., a transition on input ε.
2. For each state s and input symbol a, there is at most one edge labeled a leaving s.
For example, you cannot have a leaving state 0 and reaching two states, i.e., state 0 and state 1.

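DFA for (a|b)*abb
[Diagram: DFA with states 0-3 and start state 0.]
Recall the NFA version:
    start → 0
    0 --[a]--> 0
    0 --[b]--> 0
    0 --[a]--> 1
    1 --[b]--> 2
    2 --[b]--> 3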

DFA is easy to code
    s := s0
    c := nextchar
    while c != eof do
        s := move(s, c)
        c := nextchar
    end
    if s in F then return "yes"
    else return "no"
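The loop above translates almost line for line into Python. In this sketch the transition table is an assumption: it is the usual textbook DFA for (a|b)*abb with start state 0 and accepting state 3, since the edge labels of the slide's diagram are not available here. The names `DFA_MOVE` and `dfa_accepts` are illustrative.

```python
# Assumed DFA for (a|b)*abb: exactly one next state per (state, symbol) pair.
DFA_MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}

def dfa_accepts(word, start=0, accepting=frozenset({3})):
    """Run the DFA on word and report whether it stops in an accepting state."""
    s = start
    for c in word:                    # the loop ends when the input is exhausted (eof)
        if (s, c) not in DFA_MOVE:    # symbol outside the alphabet: reject
            return False
        s = DFA_MOVE[(s, c)]          # s := move(s, c)
    return s in accepting             # "yes" if s in F, otherwise "no"
```

Each input character costs a single table lookup, which is the speed advantage of a DFA over simulating an NFA.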

What do we do?
NFAs are easy to conceive and draw.
    Multiple edges on the same character leaving one state can cause ambiguity (αμφισημία)
    Many paths can spell out the same input string
    Hard to code
DFAs are easy to implement in a computer program.

Subset Construction
CONVERSION OF AN NFA INTO A DFA

Operations
OPERATION       DESCRIPTION
ε-closure(s)    Set of NFA states reachable from NFA state s on ε-transitions alone.
ε-closure(T)    Set of NFA states reachable from some NFA state s in T on ε-transitions alone.
move(T, a)      Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.
Notation: s is an NFA state, T is a set of NFA states.

Examples
[Diagram: NFA with states 0-4 and start state 0.]
    move({1, 2}, a) = {2}
    ε-closure(0) = {0, 1, 2, 3}
    ε-closure(1) = {1, 2}
    ε-closure(2) = {2}
    ε-closure(3) = {3}
    ε-closure(4) = {4}

Example
Initial NFA, for (a|b)*abb
[Diagram: NFA with states 0-10 and start state 0.]

Equivalent DFA
[Diagram: DFA with states A, B, C, D, E and start state A.]
No ε-transitions.
No two edges with the same symbol leaving one state.
Easy to transform into a computer program.

Step 1
The start state of the equivalent DFA is ε-closure(0).
A = {0, 1, 2, 4, 7}; these are exactly the states reachable from state 0 via a path in which every edge is labeled ε.

Step 2
The input symbol alphabet is {a, b}. We mark A and compute ε-closure(move(A, a)).
move(A, a) is the set of NFA states reached on a from members of A. Only states 2 and 7 have transitions on a (moving to 3 and 8), so move(A, a) = {3, 8}.
ε-closure(move({0, 1, 2, 4, 7}, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8}
This is B = {1, 2, 3, 4, 6, 7, 8}, and the DFA has a transition on a from A to B.

Step 3
Among the states in A, only 4 has a transition on b, to 5.
The DFA has a transition on b from A to C, and C = ε-closure({5}) = {1, 2, 4, 5, 6, 7}.

Step 4
The new sets B and C are added as unmarked states. We mark each of them in turn and repeat Steps 2-3.

Repeat steps
Until all sets of the DFA are marked.
Final sets:
    A = {0, 1, 2, 4, 7}
    B = {1, 2, 3, 4, 6, 7, 8}
    C = {1, 2, 4, 5, 6, 7}
    D = {1, 2, 4, 5, 6, 7, 9}
    E = {1, 2, 4, 5, 6, 7, 10}

Transition Table for DFA
STATE   INPUT SYMBOL
        a    b
A       B    C
B       B    D
C       B    C
D       B    E
E       B    C

[Diagrams: the NFA (states 0-10, start state 0) and the equivalent DFA (states A-E, start state A) side by side.]

The subset construction
    initially, ε-closure(s0) is the only state in Dstates and it is unmarked;
    while there is an unmarked state T in Dstates do begin
        mark T
        for each input symbol a do begin
            U := ε-closure(move(T, a))
            if U is not in Dstates then
                add U as an unmarked state to Dstates;
            Dtran(T, a) := U
        end for
    end while

ε-closure(T)
    push all states in T onto stack
    initialize ε-closure(T) to T;
    while stack is not empty do begin
        pop t
        for each state u with an edge from t to u labeled ε do
            if u not in ε-closure(T)
                add u to ε-closure(T)
                push u
            end if
        end for
    end while
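Both algorithms fit in a short Python sketch. The NFA edge list below is an assumption: it is the standard 11-state construction for (a|b)*abb, chosen so that it agrees with the sets computed in Steps 1-3 of the worked example. The names `EPS`, `NFA`, `eps_closure`, `move` and `subset_construction` are illustrative.

```python
EPS = ""  # label used in this sketch for ε-edges

# Assumed NFA for (a|b)*abb: state -> {label: set of target states}.
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}

def eps_closure(states):
    """Stack-based ε-closure(T), as in the pseudocode above."""
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in NFA[t].get(EPS, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)

def move(states, symbol):
    """NFA states reachable on symbol from some state in states."""
    return {u for s in states for u in NFA[s].get(symbol, ())}

def subset_construction(start, alphabet):
    """Return (Dstates in discovery order, Dtran as a dict)."""
    start_set = eps_closure({start})
    dstates = [start_set]
    dtran = {}
    unmarked = [start_set]
    while unmarked:
        T = unmarked.pop(0)            # taking T off the worklist marks it
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.append(U)      # new DFA state, still unmarked
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran
```

Calling `subset_construction(0, "ab")` discovers five DFA states in the order A, B, C, D, E, with E = {1, 2, 4, 5, 6, 7, 10}. For this particular NFA every move is non-empty, so no dead state appears; a general implementation would need to decide how to treat the empty set.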