
CS453 Lecture: Regular Expressions and Transition Diagrams

Plan for Today and Beginning Next Week
- Interpreter and Compiler Structure, or Software Architecture
- Overview of Programming Assignments: the MeggyJava compiler we will be building
- Regular Expressions
- Finite State Machines
  - DFAs: Deterministic Finite Automata
  - Complications
  - NFAs: Non-Deterministic Finite State Automata
  - From Regular Expressions to NFAs
  - From NFAs to DFAs

Structure of a Typical Compiler
Analysis: character stream → lexical analysis → tokens ("words") → syntactic analysis → AST ("sentences") → semantic analysis → annotated AST → interpreter
Synthesis: IR code generation → IR → optimization → IR → code generation → target language

Example MeggyJava program:

    import meggy.Meggy;

    class PA3Flower {
        public static void main(String[] whatever) {
            {
                // Upper left petal, clockwise
                Meggy.setPixel( (byte)2, (byte)4, Meggy.Color.VIOLET );
                Meggy.setPixel( (byte)2, (byte)1, Meggy.Color.VIOLET );
            }
        }
    }

Atmel assembly for the Meggy.setPixel() call:

    main:
        call _Z18MeggyJrSimpleSetupv
        # Push constant 2 onto stack
        ldi r24, 2
        push r24
        # Push constant 4 onto stack
        ldi r24, 4
        push r24
        # Push Meggy.Color.VIOLET onto the stack.
        ldi r22, 6
        push r22
        # Pop the arguments into registers in reverse order.
        pop r20
        pop r22
        pop r24
        call _Z6DrawPxhhh
        call _Z12DisplaySlatev

About the Slides on Languages and Finite Automata
Slides originally developed by Prof. Costas Busch (2004). Many thanks to Prof. Busch for developing the original slide set.
Adapted with permission by Prof. Dan Massey (Spring 2007). Subsequent modifications: many thanks to Prof. Massey for the CS 301 slides.
Adapted with permission by Prof. Michelle Strout (Spring 2011) for use in CS 453.
Adapted by Wim Bohm (added regular expr → NFA → DFA, Spring 2012).

Languages
A language is a set of strings (sometimes called sentences).
String: a finite sequence of letters, e.g. cat, dog, house, ...
Strings are defined over a fixed alphabet: $\Sigma = \{a, b, c, \ldots, z\}$

Empty String
A string with no letters: ε (sometimes λ is used).
Observations: $|\varepsilon| = 0$ and $\varepsilon w = w \varepsilon = w$, e.g. $\varepsilon abba = abba \varepsilon = abba$.

Regular Expressions
Regular expressions describe regular languages. You have probably seen them in OSs and editors.
Example: (a | (b)(c))* describes the language L((a | (b)(c))*) = {ε, a, bc, aa, abc, bca, ...}
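A quick way to check membership in the example language is to hand the same expression to a regex engine. This minimal sketch uses Python's standard `re` module, which writes choice as `|`. `fullmatch` makes the whole string match, so a string is accepted only if it belongs to the language. The helper name `in_language` is hypothetical.

```python
import re

# (a | (b)(c))* in Python's re syntax: "|" is choice, juxtaposition is
# concatenation, "*" is repetition (0 or more).
PATTERN = re.compile(r"(a|(b)(c))*")


def in_language(w: str) -> bool:
    """Return True if the whole string w is in L((a | (b)(c))*)."""
    return PATTERN.fullmatch(w) is not None


if __name__ == "__main__":
    for w in ["", "a", "bc", "aa", "abc", "bca", "b", "cb"]:
        print(repr(w), in_language(w))
```

The empty string is accepted because `*` allows zero repetitions, mirroring ε ∈ L.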

Recursive Definition for Specifying Regular Expressions
Primitive regular expressions: ∅, ε, α, where α ∈ Σ, some alphabet.
Given regular expressions r1 and r2, the following are regular expressions:
    r1 | r2,   r1 r2 (also written (r1)(r2)),   r1*,   (r1)

Regular operators
- choice: A | B, a string from L(A) or from L(B)
- concatenation: A B, a string from L(A) followed by a string from L(B)
- repetition: A*, 0 or more concatenations of strings from L(A); A+, 1 or more
- grouping: ( A )
Concatenation has precedence over choice: A | B C is read as A | (B C), not as (A | B) C.

More syntactic sugar, used in scanner generators:
- [abc] means a or b or c
- [\t\n ] means tab, newline, or space
- [a-z] means a, b, c, ..., or z

Example Regular Expressions and Regular Definitions
Regular definition: name : regular expression
The name can then be used in other regular expressions.
- Keywords: print, while
- Operations: +, -, *
- Identifiers:
    let : [a-zA-Z]   // choose from a to z or A to Z
    dig : [0-9]
    id : let (let | dig)*
- Numbers: dig+ = dig dig*
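Regular definitions map naturally onto string constants that are spliced into larger patterns. The sketch below uses Python's `re` module to show that idea. The constant names `LET`, `DIG`, `ID`, `NUMBER`, `NUMBER_EXPANDED` and the helper `matches` are illustrative stand-ins for let, dig, id and the number definition. `(?:...)` is Python's non-capturing group, used here purely for grouping.

```python
import re

# Each later definition is built from earlier names, mirroring
# "name : regular expression".
LET = r"[a-zA-Z]"                    # let : [a-zA-Z]
DIG = r"[0-9]"                       # dig : [0-9]
ID = rf"{LET}(?:{LET}|{DIG})*"       # id : let (let | dig)*
NUMBER = rf"{DIG}+"                  # dig+
NUMBER_EXPANDED = rf"{DIG}{DIG}*"    # dig dig*, the same language as dig+


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text is described by pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    print(ID)                          # the expanded identifier pattern
    print(matches(ID, "PA3Flower"), matches(ID, "3d"))
```

Note that `matches(ID, "while")` is also true. A keyword matches the identifier definition, which is exactly the conflict that priority rules in scanner generators resolve.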

Finite Accepter
Input String → Finite Automaton → Output: "Accept" or "Reject"
Transition Graph of a Finite Accepter: an initial state, states, transitions, and a final ("accept") state.
Initial Configuration: the machine is in the initial state $q_0$, about to read the input string.
Reading the Input

Input finished. Output: accept

String Rejection

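Input finished. Output: reject

The Empty String
Starting in $q_0$, the input is finished before any move is made. Output: reject.

Another Example (starting in $q_0$). Output: reject. Would it be possible to accept the empty string? Only if the initial state $q_0$ is also a final state.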

Input finished. Output: accept

Rejection

Input finished. Output: reject. Which strings are accepted?

Formalities
Deterministic Finite Accepter (DFA): $M = (Q, \Sigma, \delta, q_0, F)$, where
- $Q$: set of states
- $\Sigma$: input alphabet
- $\delta$: transition function
- $q_0$: initial state
- $F$: set of final (accepting) states
Input Alphabet: $\Sigma = \{a, b\}$
Set of States: $Q = \{q_0, q_1, q_2, q_3, q_4, q_5\}$

Initial State: $q_0$
Set of Final States: $F = \{q_4\}$
Transition Function: $\delta : Q \times \Sigma \to Q$, e.g. $\delta(q_0, a) = q_1$

$\delta(q_0, b) = q_5$, $\delta(q_2, b) = q_3$
Transition function as a table ($q_5$ is a trap state: once entered, it is never left):
    δ  | a  | b
    q0 | q1 | q5
    q1 | q5 | q2
    q2 | q5 | q3
    q3 | q4 | q5
    q4 | q5 | q5
    q5 | q5 | q5
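A DFA is executed by table lookup: one transition per input symbol, followed by a check of whether the last state is in F. This sketch assumes the transition table is that of the accepter for exactly the string abba, the running example, with q5 as a trap state. Only δ(q0, a), δ(q0, b) and δ(q2, b) are stated explicitly; the remaining entries are that assumption. The names `DELTA`, `SIGMA`, `START`, `FINAL` and `accepts` are hypothetical.

```python
# Assumed transition table: the accepter for the single string "abba",
# with q5 as a trap state.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q5",
    ("q1", "a"): "q5", ("q1", "b"): "q2",
    ("q2", "a"): "q5", ("q2", "b"): "q3",
    ("q3", "a"): "q4", ("q3", "b"): "q5",
    ("q4", "a"): "q5", ("q4", "b"): "q5",
    ("q5", "a"): "q5", ("q5", "b"): "q5",
}
SIGMA = {"a", "b"}
START = "q0"
FINAL = {"q4"}


def accepts(w: str) -> bool:
    """Run the DFA on w one symbol at a time; accept iff it ends in a final state."""
    state = START
    for symbol in w:
        if symbol not in SIGMA:
            raise ValueError(f"{symbol!r} is not in the input alphabet")
        state = DELTA[(state, symbol)]   # exactly one next state: deterministic
    return state in FINAL
```

Because every (state, symbol) pair has exactly one entry, the run never needs to guess. That is the property that makes DFAs easy to execute.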

Complications

Complication 1. "1234" is a NUMBER, but what about the 123 in 1234, or the 23, etc.? Also, the scanner must recognize many tokens, not one, only stopping at end of file.
So: recognize the largest string defined by some regular expression, and only stop getting more input when there is no longer a match. This introduces the need to reconsider a character, as it is the first character of the next token. E.g. fname(a,bcd); would be scanned as ID OPEN ID COMMA ID CLOSE SEMI EOF: scanning fname would consume the (, which would be put back and then recognized as OPEN.

Complication 2. "if" is a keyword or reserved word IF, but "if" is also defined by the regular expression for identifier ID, and we want to recognize IF.
So: have some way of determining which token (IF or ID) is recognized. This can be done using priority: e.g. in scanner generators an earlier definition has higher priority than a later one. By putting the definition for IF before the definition for ID in the input for the scanner generator, we get the desired result.

Complication 3. We want to discard white space and comments and not bother the parser with these.
So: in scanner generators, we can specify white space using a regular expression, e.g. [\t\n ], and return no token, i.e. move on to the next token. We specify comments using a (NASTY) regular expression and again return no token.

Complication 4. "123" is a NUMBER, but so is "235" and so is "0", just as "a" is an ID and so is "bcd". We want to recognize a token, but add attributes to it.
So: scanners return Symbols, not tokens. A Symbol is a (token, tokenvalue) pair, e.g. (NUMBER, 123) or (ID, "a"). Often more information is added to a symbol, e.g. line number and position (as we will do in MeggyJava).
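The four fixes above can be combined in a small scanner: longest match, priority by definition order, skipping white space, and returning (token, value) symbols. This is a hypothetical sketch, not the MeggyJava scanner. It tries each pattern with Python's `re` at the current position and keeps the longest match, breaking ties by list order. A regex match only inspects characters, so the "put back" of Complication 1 happens implicitly: the `(` after `fname` is never consumed by ID. Comments are left out for brevity, and the token names are assumptions.

```python
import re

# Hypothetical token definitions, in priority order: when two definitions match
# lexemes of the same (longest) length, the earlier one wins, so IF beats ID on "if".
TOKEN_SPECS = [
    ("IF", r"if"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("OPEN", r"\("),
    ("CLOSE", r"\)"),
    ("COMMA", r","),
    ("SEMI", r";"),
    ("SKIP", r"[ \t\n]+"),  # white space: matched, but no symbol is returned
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPECS]


def scan(text):
    """Scan the whole input into (token, value) symbols, ending with EOF."""
    symbols = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(text, pos)      # anchored at pos
            if m and m.end() > best_end:    # strictly longer: ties keep earlier spec
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no token matches {text[pos]!r} at position {pos}")
        lexeme = text[pos:best_end]
        if best_name == "NUMBER":
            symbols.append(("NUMBER", int(lexeme)))   # attribute: numeric value
        elif best_name != "SKIP":
            symbols.append((best_name, lexeme))       # attribute: the lexeme
        pos = best_end
    symbols.append(("EOF", None))
    return symbols


if __name__ == "__main__":
    print(scan("fname(a,bcd);"))
```

On "iffy", IF matches only two characters while ID matches four, so longest match yields an ID. On "if", both match two characters and list order picks IF.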

(Non) Deterministic Finite State Automata
A Deterministic Finite State Automaton (DFA) has disjoint character sets on the edges leaving each state, i.e. the choice of which state is next is deterministic. A Non-deterministic Finite State Automaton (NFA) does NOT: the character sets on its edges can overlap (non-empty intersection), and some edges can be labeled ε, the empty string, and are taken without consuming a character.
NFAs are used in the translation from regular expressions to FSAs. E.g. when we combine the regular expression for IF with the regular expression for ID by just merging the two transition graphs, we get an NFA. NFAs are a first step in creating a DFA for a scanner; the NFA is then transformed into a DFA.

From regular expressions to NFAs (figure: one NFA fragment per form of regexp)
- a (simple letter): a single edge labeled a
- ε (empty string): a single edge labeled ε
- A B (concatenation): concatenate the NFAs, so the accept state of the NFA for A leads into the NFA for B
- A | B: split into the NFAs for A and B, then merge them
- A*: build a loop around the NFA for A

The Problem
DFAs are easy to execute (table-driven interpretation). NFAs are easy to build from regular expressions, but hard to execute: we would need some form of guessing, implemented by backtracking.
To build a DFA from an NFA we avoid the backtracking by taking all choices in the NFA at once: a move with a character or ε gets us to a set of states in the NFA, which will become one state in the DFA. We keep doing this until we have exhausted all possibilities. This mechanism is called transitive closure. (It ends because there is only a finite set of subsets of NFA states: at most $2^n$ for an NFA with $n$ states.)

Example: IF and ID
    let : [a-z]
    dig : [0-9]
    tok : if | id
    if : i f
    id : let (let | dig)*
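The regexp-to-NFA rules can be written as small builder functions, in the style of Thompson's construction. This sketch rests on some assumptions. States are integers. Edges are (source, label, target) triples, with `None` standing for ε. Concatenation joins the two fragments with an ε-edge rather than merging states, which accepts the same language. The class `NFA` and its method names are hypothetical. `accepts` simulates the NFA by tracking all reachable states at once instead of guessing and backtracking.

```python
from itertools import count

EPS = None  # label used for ε-edges


class NFA:
    """Every builder returns a fragment (start, accept); edges are shared."""

    def __init__(self):
        self.edges = []                 # (source, label, target) triples
        self._ids = count()

    def _new(self):
        return next(self._ids)

    def letter(self, c):                # a: one edge labeled a
        s, f = self._new(), self._new()
        self.edges.append((s, c, f))
        return s, f

    def empty(self):                    # ε: one edge labeled ε
        s, f = self._new(), self._new()
        self.edges.append((s, EPS, f))
        return s, f

    def concat(self, a, b):             # A B: accept of A leads into start of B
        self.edges.append((a[1], EPS, b[0]))
        return a[0], b[1]

    def choice(self, a, b):             # A | B: split, then merge
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, a[0]), (s, EPS, b[0]),
                       (a[1], EPS, f), (b[1], EPS, f)]
        return s, f

    def star(self, a):                  # A*: loop around A, or skip it
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, a[0]), (a[1], EPS, f),
                       (a[1], EPS, a[0]), (s, EPS, f)]
        return s, f

    def _closure(self, states):
        """All states reachable from `states` using only ε-edges."""
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in self.edges:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, frag, w):
        """Simulate by tracking the set of all states the NFA could be in."""
        current = self._closure({frag[0]})
        for c in w:
            moved = {dst for src, label, dst in self.edges
                     if src in current and label == c}
            current = self._closure(moved)
        return frag[1] in current


if __name__ == "__main__":
    n = NFA()
    # (a | (b)(c))*
    frag = n.star(n.choice(n.letter("a"), n.concat(n.letter("b"), n.letter("c"))))
    print([w for w in ["", "a", "bc", "abca", "b", "cb"] if n.accepts(frag, w)])
```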

Example: NFA for IF and ID
Figure: start state 1; the path 1 -i-> 2 -f-> 3 ends in state 3, which accepts IF; states 4 to 8 recognize ID (edges on a-z), with state 8 accepting ID. IF has priority over ID.
From 1, with ε we can get to states 1 and 4: the set {1, 4} is called an ε-closure. We can now simulate the behavior of the NFA and build a table for the DFA, making character moves plus ε-closures.

NFA simulation scanning "in"
    DFA state | NFA states | Move | Next
    1         | {1,4}      | i    | {2,5,6,8}
    2         | {2,5,6,8}  | n    | {6,7,8}
Only one of the states in {6,7,8} is an accepting state, an ID accepting state, so "in" is an ID.

NFA simulation scanning "if"
    DFA state | NFA states | Move | Next
    1         | {1,4}      | i    | {2,5,6,8}
    2         | {2,5,6,8}  | f    | {3,6,7,8}
Two of the states in {3,6,7,8} are accepting: an IF accepting state (3) and an ID accepting state (8). IF has priority over ID, so "if" is an IF.

Definitions: edge(s, c) and closure
edge(s, c): the set of all NFA states reachable from state s by following an edge labeled with character c.
closure(S): the set of all states reachable from the set S without consuming a character, i.e. by following only ε-edges. Formally, $\mathrm{closure}(S)$ is the smallest set $T$ such that $T = S \cup \bigcup_{s \in T} \mathrm{edge}(s, \varepsilon)$. It can be computed by iteration:
    T = S
    repeat
        T' = T
        forall s in T': T = T ∪ edge(s, ε)
    until T == T'
This transitive closure algorithm terminates because there is a finite number of states in the NFA.
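edge, closure and the NFA simulation can be coded directly. The state numbers follow the slides. The exact edge list is an assumption, reconstructed so that it reproduces the closures in the two simulation tables: {1,4}, {2,5,6,8}, {6,7,8} and {3,6,7,8}. `simulate` is a hypothetical helper that returns the state set after each character. The `closure` loop mirrors the repeat/until pseudocode.

```python
import string

EPS = None
LET = set(string.ascii_lowercase)   # let : [a-z]
DIG = set(string.digits)            # dig : [0-9]

# Assumed edge list for the IF/ID NFA (states 1..8); 3 accepts IF, 8 accepts ID.
EDGES = [
    (1, {"i"}, 2), (2, {"f"}, 3),   # if : i f
    (1, EPS, 4), (4, LET, 5),       # id: first letter
    (5, EPS, 6), (5, EPS, 8),
    (6, LET | DIG, 7),              # id: each further letter or digit
    (7, EPS, 6), (7, EPS, 8),
]


def edge(s, c):
    """States reachable from s by one edge labeled c (c may be EPS)."""
    if c is EPS:
        return {t for src, label, t in EDGES if src == s and label is EPS}
    return {t for src, label, t in EDGES
            if src == s and label is not EPS and c in label}


def closure(S):
    """Repeat T = T ∪ edge(s, EPS) for s in T until T stops changing."""
    T = set(S)
    while True:
        T_prev = set(T)
        for s in T_prev:
            T |= edge(s, EPS)
        if T == T_prev:
            return T


def simulate(word, start=1):
    """Return the NFA state sets visited while reading word."""
    d = closure({start})
    trace = [d]
    for c in word:
        d = closure(set().union(*(edge(s, c) for s in d)))
        trace.append(d)
    return trace


if __name__ == "__main__":
    print(simulate("in"))
    print(simulate("if"))
```

An empty state set means the NFA is stuck, so no token can begin with the characters read so far.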

DFAedge and NFA Simulation
Suppose we are in DFA state d = {si, sk, sl}. By moving with character c from d we reach a set of new NFA states. Call this set DFAedge(d, c); it is a new or an already existing DFA state:
    $\mathrm{DFAedge}(d, c) = \mathrm{closure}\left(\bigcup_{s \in d} \mathrm{edge}(s, c)\right)$
NFA simulation: let the input string be c1 ... ck.
    d = closure({s1})    // s1 is the start state of the NFA
    for i from 1 to k
        d = DFAedge(d, ci)

Constructing a DFA with closure and DFAedge
- state d1 = closure(s1), the closure of the start state of the NFA
- make new states by moving from existing states with a character c, using DFAedge(d, c); record these in the transition table
- mark accepts in the transition table: d accepts if it contains an accepting NFA state; decide by priority if it contains more than one
- instead of single characters, use non-overlapping (DFA) character classes to keep the table manageable

NFA to DFA (let's build it)
Figure: the DFA built from the IF/ID NFA, with states 1: {1,4}; 2: {2,5,6,8} (accepts ID); 3: {3,6,7,8} (accepts IF); 4: {6,7,8} (accepts ID); 5: {5,6,8} (accepts ID). Edges: 1 -i-> 2, 1 -[a-h j-z]-> 5, 2 -f-> 3, 2 -[a-e g-z 0-9]-> 4, and 3, 4, 5 -[a-z 0-9]-> 4.
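The subset construction puts DFAedge to work. Starting from closure(s1), it keeps expanding unexplored DFA states until no new state sets appear. It then labels each DFA state with the highest-priority token among its accepting NFA states. This is a minimal sketch that reuses the assumed IF/ID NFA edge list. For simplicity it iterates over single characters rather than character classes, and its DFA state numbering follows discovery order, which need not match the slides. `build_dfa` and `run` are hypothetical names.

```python
import string

EPS = None
LET = frozenset(string.ascii_lowercase)
DIG = frozenset(string.digits)

# Same assumed IF/ID NFA as in the simulation: 3 accepts IF, 8 accepts ID.
EDGES = [
    (1, {"i"}, 2), (2, {"f"}, 3),
    (1, EPS, 4), (4, LET, 5),
    (5, EPS, 6), (5, EPS, 8),
    (6, LET | DIG, 7),
    (7, EPS, 6), (7, EPS, 8),
]
ACCEPT = {3: "IF", 8: "ID"}
PRIORITY = ["IF", "ID"]             # IF has priority over ID
ALPHABET = sorted(LET | DIG)


def edge(s, c):
    if c is EPS:
        return {t for src, label, t in EDGES if src == s and label is EPS}
    return {t for src, label, t in EDGES
            if src == s and label is not EPS and c in label}


def closure(S):
    T, stack = set(S), list(S)
    while stack:
        for t in edge(stack.pop(), EPS) - T:
            T.add(t)
            stack.append(t)
    return frozenset(T)


def dfa_edge(d, c):
    """DFAedge(d, c) = closure(union of edge(s, c) for s in d)."""
    return closure(set().union(*(edge(s, c) for s in d)))


def build_dfa(start=1):
    """Return (states, transitions, accepts) for the subset-constructed DFA."""
    states = [closure({start})]     # d1 = closure(s1)
    trans = {}                      # (state index, char) -> state index
    i = 0
    while i < len(states):          # states[i:] have not been expanded yet
        for c in ALPHABET:
            e = dfa_edge(states[i], c)
            if not e:
                continue            # no move on c: error entry left empty
            if e not in states:
                states.append(e)
            trans[(i, c)] = states.index(e)
        i += 1
    accepts = {}
    for j, d in enumerate(states):
        tokens = {ACCEPT[s] for s in d if s in ACCEPT}
        if tokens:                  # several accepting states: use priority
            accepts[j] = min(tokens, key=PRIORITY.index)
    return states, trans, accepts


DFA = build_dfa()


def run(word):
    """Return the token recognized for the whole word, or None."""
    states, trans, accepts = DFA
    i = 0
    for c in word:
        if (i, c) not in trans:
            return None
        i = trans[(i, c)]
    return accepts.get(i)
```

The construction ends with five DFA states, the same state sets as in the figure.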

The transition table for IF and ID
    p | NFAstates(p) | transitions                              | ACPT
    1 | {1,4}        | i → {2,5,6,8};  a-h, j-z → {5,6,8}        | -
    2 | {2,5,6,8}    | f → {3,6,7,8};  a-e, g-z, 0-9 → {6,7,8}   | ID
    3 | {3,6,7,8}    | a-z, 0-9 → {6,7,8}                        | IF
    4 | {6,7,8}      | a-z, 0-9 → {6,7,8}                        | ID
    5 | {5,6,8}      | a-z, 0-9 → {6,7,8}                        | ID

Homework 1
Build an NFA and a DFA for integer and float literals:
    dot : .
    dig : [0-9]
    int-lit : dig+
    float-lit : dig* dot dig+
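Here is one possible shape for a table-driven DFA for these two definitions, offered as a sketch rather than the expected homework solution. The state names `start`, `int`, `dot`, `frac` and the token names `INT_LIT`, `FLOAT_LIT` are hypothetical. Following the character-class idea, each character is first mapped to the class dig or dot, and the table is indexed by (state, class).

```python
DIGITS = set("0123456789")

# Hypothetical DFA for   int-lit : dig+   and   float-lit : dig* dot dig+
#   start --dig--> int  --dig--> int      (int accepts int-lit)
#   start --dot--> dot,  int --dot--> dot
#   dot   --dig--> frac --dig--> frac     (frac accepts float-lit)
TRANS = {
    ("start", "dig"): "int",
    ("start", "dot"): "dot",
    ("int", "dig"): "int",
    ("int", "dot"): "dot",
    ("dot", "dig"): "frac",
    ("frac", "dig"): "frac",
}
ACCEPT = {"int": "INT_LIT", "frac": "FLOAT_LIT"}


def char_class(c):
    """Map a character to its DFA character class (None if neither)."""
    if c in DIGITS:
        return "dig"
    if c == ".":
        return "dot"
    return None


def classify(text):
    """Return INT_LIT, FLOAT_LIT, or None if the DFA rejects text."""
    state = "start"
    for c in text:
        state = TRANS.get((state, char_class(c)))
        if state is None:
            return None             # missing table entry: reject
    return ACCEPT.get(state)
```

Under this definition "12." is rejected (it ends in the non-accepting `dot` state), while ".5" is a float literal because dig* allows zero leading digits.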