Compila(on Lecture 2: Lexical Analysis Syntax Analysis (1): CFLs, CFGs, PDAs. Noam Rinetzky

Size: px
Start display at page:

Download "Compila(on Lecture 2: Lexical Analysis Syntax Analysis (1): CFLs, CFGs, PDAs. Noam Rinetzky"

Transcription

1 Compila(on Lecture 2: Lexical Analysis Syntax Analysis (1): CFLs, CFGs, PDAs Noam Rinetzky 1

2 2

3 3

4 What is a Compiler? source language target language Source text txt Executale code exe Compiler 4

5 Compiler vs. Interpreter 5

6 Toy compiler 6

7 The Real Anatomy of a Compiler 7

8 The Real Anatomy of a Compiler Source text txt Process text input characters Lexical Syntax tokens Analysis Analysis Annotated AST AST Sem. Analysis Intermediate code generation IR Intermediate code optimization IR Code generation Symolic Instructions Target code optimization SI Machine code generation MI Write executale output Executale code exe 8

9 Lexical Analysis Lexical analyzers are also known as scanners 9

10 Lexical Analysis: from Text to Tokens txt x = * 4*a*c Token Stream <ID, x > <EQ> <ID, > <MULT> <ID, > <MINUS> <INT,4> <MULT> <ID, a > <MULT> <ID, c > Lexical Analysis Syntax Analysis Sem. Analysis Inter. Rep. Code Gen. 10

11 What does Lexical Analysis do? Scan the input Par((ons the text into stream of tokens Numers Iden(fiers Keywords Punctua(on word in the source language meaningful to the syntac(cal analysis Tokens usually represented as (kind, value) Defined using regular expressions * 11

12 What does Lexical Analysis do? Language: fully parenthesized expressions Context free language Regular languages Expr Num LP Expr Op Expr RP Num Dig Dig Num Dig LP ( RP ) Op + * ( ( ) * 19 ) 12

13 What does Lexical Analysis do? Language: fully parenthesized expressions Context free language Regular languages Expr Num LP Expr Op Expr RP Num Dig Dig Num Dig LP ( RP ) Op + * ( ( ) * 19 ) LP LP Num Op Num RP Op Num RP 13

14 From scanning to parsing program text ((23 + 7) * x) Lexical Analyzer token stream ( LP ( LP 23 Num + OP 7 Num ) RP * OP? Id ) RP Grammar: Expr... Id Id a... z syntax error Parser Op(*) valid Astract Syntax Tree Op(+) Id(?) Num(23) Num(7) 14

15 Some asic terminology Lexeme (aka symol) - a series of lecers separated from the rest of the program according to a conven(on (space, semi- column, comma, etc.) Pacern - a rule specifying a set of strings. Example: an iden(fier is a string that starts with a lecer and con(nues with lecers and digits (Usually) a regular expression Token - a pair of (pacern, acriutes) 15

16 Example void match0(char *s) /* find a zero */ { if (!strncmp(s, 0.0, 3)) return 0. ; } VOID ID(match0) LPAREN CHAR DEREF ID(s) RPAREN LBRACE IF LPAREN NOT ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN RETURN REAL(0.0) SEMI RBRACE EOF 16

17 Regular languages Formal languages Σ = finite set of lecers Word = sequence of lecer Language = set of words Regular languages defined equivalently y Regular expressions Finite- state automata 17

18 Example: Integers w/o Leading Zeros Digit = Digit0 = 0 Digit Pos = Digit Digit0* Integer = 0 Pos - Pos 18

19 Challenge: Amiguity If = if Id = Lecer (Lecer Digit)* if is a valid iden(fiers what should it e? iffy is also a valid iden(fier Solu(on Longest matching token Break (es using order of defini(ons Keywords should appear efore iden(fiers 19

20 Building a Scanner Take I Input: String Output: Sequence of tokens 20

21 Building a Scanner Take I Token nexttoken() { char c ; loop: c = getchar(); switch (c){ case ` `: goto loop ; case `;`: return SemiColumn; case `+`: c = getchar() ; switch (c) { case `+': return PlusPlus ; case '= return PlusEqual; default: ungetc(c); return Plus; }; case `<`: case `w`: } 21

22 There must e a ecer way! 22

23 A ecer way AutomaGcally generate a scanner Define tokens using regular expressions Use finite- state automata for detec(on 23

24 Reg- exp vs. automata Regular expressions are declara(ve Good for humans Not executale Automata are opera(ve Define an algorithm for deciding whether a given word is in a regular language Not a natural nota(on for humans 24

25 Overview Define tokens using regular expressions Construct a nondeterminis(c finite- state automaton (NFA) from regular expression Determinize the NFA into a determinis(c finite- state automaton (DFA) DFA can e directly used to iden(fy tokens 25

26 Automata theory: a ird s- eye view 26

27 Determinis(c Automata (DFA) M = (Σ, Q, δ, q 0, F) Σ - alphaet Q finite set of state q 0 Q ini(al state F Q final states δ : Q Σ à Q - transi(on func(on For a word w, M reach some state x M accepts w if x F 27

28 DFA in pictures An automaton is defined y states and transi(ons transition a, accepting state a c start start state,c a,,c a,,c 28

29 Accep(ng Words Words are read leq- to- right a c a c start Missing transi(on = non- acceptance Stuck state 29

30 Accep(ng Words Words are read leq- to- right a c a c start 30

31 Accep(ng Words Words are read leq- to- right a c a c start 31

32 Accep(ng Words Words are read leq- to- right a c a c start 32

33 Rejec(ng Words Words are read leq- to- right c a c start 33

34 Rejec(ng Words Missing transi(on means non- acceptance c a c start 34

35 Non- determinis(c Automata (NFA) M = (Σ, Q, δ, q 0, F) Σ - alphaet Q finite set of state q 0 Q ini(al state F Q final states δ : Q (Σ {ε}) 2 Q - transi(on func(on DFA: δ : Q Σ à Q For a word w, M can reach a numer of states X M accepts w if X M {} Possile: X = {} Possile ε- transi(ons 35

36 NFA Allow mul(ple transi(ons from given state laeled y same lecer a c start a c 36

37 Accep(ng words a c a c start a c 37

38 Accep(ng words Maintain set of states a c a c start a c 38

39 Accep(ng words a c a c start a c 39

40 Accep(ng words Accept word if reached an accep(ng state a c a c start a c 40

41 NFA+Є automata Є transi(ons can fire without reading the input start a c Є 41

42 NFA+Є run example a c start a c Є 42

43 NFA+Є run example Now Є transi(on can non- determinis(cally take place a c start a c Є 43

44 NFA+Є run example a c start a c Є 44

45 NFA+Є run example a c start a c Є 45

46 NFA+Є run example Є transi(ons can fire without reading the input a c start a c Є 46

47 Word accepted NFA+Є run example a c start a c Є 47

48 From regular expressions to NFA Step 1: assign expression names and otain pure regular expressions R 1 R m Step 2: construct an NFA M i for each regular expression R i Step 3: comine all M i into a single NFA Amiguity resolu8on: prefer longest accep8ng word 48

49 From reg. exp. to automata Theorem: there is an algorithm to uild an NFA +Є automaton for any regular expression Proof: y induc8on on the structure of the regular expression start 49

50 Basic constructs R = ε R = a a R = φ 50

51 Composi(on R = R1 R2 ε M1 ε ε M2 ε R = R1R2 ε M1 ε M2 ε 51

52 Repe((on R = R1* ε ε M1 ε ε 52

53 53

54 Naïve approach Try each automaton separately Given a word w: Try M 1 (w) Try M 2 (w) Try M n (w) Requires rese{ng aqer every acempt 54

55 Actually, we comine automata comines a a a*+ aa 0 ε ε ε ε 1 a 2 3 a a a a a*+ a a aa 55

56 Corresponding DFA a*+ 8 7 a 6 8 a a*+ 0# ε# ε# ε# ε# a# 1# 2# 3# a# 4# # 7# 8# 9# a# a# 10# a# # # a*+# # 5# 11# # a# a# 6# # 12# 13# aa# 75# a a a aa a a*+ 56

57 Scanning with DFA Run un(l stuck Rememer last accepgng state Go ack to accep(ng state Return token 57

58 Amiguity resolu(on a# 1# 2# a# ε# ε# 3# a# 4# # 5# # 6# a# 0# ε# a# # 7# 8# # a*+# ε# 9# a# 10# # 11# a# 12# # 13# aa# 75# Longest word Tie- reaker ased on order of rules when words have same length 58

59 Examples a# 1# 2# a# a*+ 8 7 a 6 8 a a*+ 0# ε# ε# ε# ε# 3# a# 4# # 7# 8# 9# a# a# 10# # # a*+# # 5# 11# # a# a# 6# # 12# 13# aa# 75# a a a aa a a*+ aaa: gets stuck aqer aa in state 12, acks up to state (5 8 11) pacern is a*+, token is a Tokens: <a*+, a> <a,a><a,a> 59

60 Examples a# 1# 2# a# a*+ 8 7 a 6 8 a a*+ 0# ε# ε# ε# ε# 3# a# 4# # 7# 8# 9# a# a# 10# # # a*+# # 5# 11# # a# a# 6# # 12# 13# aa# 75# a a a aa a a*+ aa: stops aqer second in (6 8), token is a ecause it comes first in spec Tokens: <a, a> <a,a> 60

61 Summary of Construc(on Descrie tokens as regular expressions Decide acriutes (values) to save for each token Regular expressions turned into a DFA Also, records which acriutes (values) to keep Lexical analyzer simulates the run of an automata with the given transi(on tale on any input string 61

62 A Few Remarks Turning an NFA to a DFA is expensive, ut Exponen(al in the worst case In prac(ce, works fine The construc(on is done once per- language At Compiler construc(on (me Not at compila(on (me 62

63 Implementa(on 63

64 if xy, i, zs98 3,32, , comm\n \n, \t, Implementa(on y Example if { return IF; } [a- z][a- z0-9]* { return ID; } [0-9]+ { return NUM; } [0-9]. [0-9]+ [0-9]*. [0-9]+ { return REAL; } (\- \- [a- z]*\n) ( \n \t) { ; }. { error(); } ID 2 3 IF ID error REAL 0 1 NUM REAL 12 w.s. error error w.s. 64

65 ID 2 3 IF ID error REAL 0 1 NUM REAL 12 w.s. error error w.s. int edges[][256]= { /*, 0, 1, 2, 3,..., -, e, f, g, h, i, j,... */ /* state 0 */ {0,, 0, 0,, 0, 0, 0, 0, 0,..., 0, 0, 0, 0, 0, 0}, /* state 1 */ {13,, 7, 7, 7, 7,, 9, 4, 4, 4, 4, 2, 4,, 13, 13}, /* state 2 */ {0,, 4, 4, 4, 4,..., 0, 4, 3, 4, 4, 4, 4,, 0, 0}, /* state 3 */ {0,, 4, 4, 4, 4,, 0, 4, 4, 4, 4, 4, 4,, 0, 0}, /* state 4 */ {0,, 4, 4, 4, 4,, 0, 4, 4, 4, 4, 4, 4,, 0, 0}, /* state 5 */ {0,, 6, 6, 6, 6,, 0, 0, 0, 0, 0, 0, 0,, 0, 0}, /* state 6 */ {0,, 6, 6, 6, 6,, 0, 0, 0, 0, 0, 0, 0,..., 0, 0}, /* state 7 */ /* state */... /* state 13 */ {0,, 0, 0, 0, 0,, 0, 0, 0, 0, 0, 0, 0,, 0, 0} }; 65

66 Pseudo Code for Scanner char* input = ; Token nexttoken() { lastfinal = 0; currentstate = 1 ; inputpositionatlastfinal = input; currentposition = input; while (not(isdead(currentstate))) { nextstate = edges[currentstate][*currentposition]; if (isfinal(nextstate)) { lastfinal = nextstate ; inputpositionatlastfinal = currentposition; } currentstate = nextstate; advance currentposition; } input = inputpositionatlastfinal + 1; return action[lastfinal]; } 66

67 Example ID 2 3 IF ID error REAL 0 1 NUM REAL 12 w.s. error error w.s. Input: if - - not- a- com 2 lanks 67

68 ID 2 3 IF ID error REAL final state input 0 1 NUM REAL 0 1 if - - not- a- com 12 w.s. error error w.s. 2 2 if - - not- a- com 3 3 if - - not- a- com return IF 3 0 if - - not- a- com 68

69 final state input not-a-com not-a-com not-a-com found whitespace not-a-com 69

70 ID 2 3 IF ID error REAL final state input not- a- com 12 w.s. error NUM REAL error w.s not- a- com not- a- com not- a- com error not- a- com not- a- com 70

71 ID 2 3 IF ID error REAL final state input 0 1 NUM REAL not- a- com 12 w.s. error error w.s not- a- com not- a- com error not- a- com not- a- com 71

72 Concluding remarks Efficient scanner Minimiza(on Error handling Automa(c crea(on of lexical analyzers 72

73 Efficient Scanners Efficient state representa(on Input uffering Using switch and gotos instead of tales 73

74 Minimiza(on Create a non- determinis(c automaton (NDFA) from every regular expression Merge all the automata using epsilon moves (like the construc(on) Construct a determinis(c finite automaton (DFA) State priority Minimize the automaton separate accep(ng states y token kinds 74

75 Example if { return IF; } [a- z][a- z0-9]* { return ID; } [0-9]+ { return NUM; } IF ID error NUM Modern compiler implementa(on in ML, Andrew Appel, (c)1998, Figures 2.7,2.8 75

76 Example if { return IF; } [a- z][a- z0-9]* { return ID; } [0-9]+ { return NUM; } IF ID ID IF ID ID error NUM NUM NUM error Modern compiler implementa(on in ML, Andrew Appel, (c)1998, Figures 2.7,2.8 76

77 Example ID IF if { return IF; } [a- z][a- z0-9]* ID { return ID; } ID IF [0-9]+ { return NUM; } ID ID ID IF error IF NUM ID NUM NUM NUM NUM error error NUM error Modern compiler implementa(on in ML, Andrew Appel, (c)1998, Figures 2.7,2.8 77

78 Example ID IF if { return IF; } [a- z][a- z0-9]* { return ID; } ID [0-9]+ { return NUM; } ID IF ID NUM NUM error NUM error Modern compiler implementa(on in ML, Andrew Appel, (c)1998, Figures 2.7,2.8 78

79 Error Handling Many errors cannot e iden(fied at this stage Example: fi (a==f(x)). Should fi e if? Or is it a rou(ne name? We will discover this later in the analysis At this point, we just create an iden(fier token Some(mes the lexeme does not match any pacern Easiest: eliminate lecers un(l the eginning of a legi(mate lexeme Alterna(ves: eliminate/add/replace one lecer, replace order of two adjacent lecers, etc. Goal: allow the compila(on to con(nue Prolem: errors that spread all over 79

80 Automa(cally generated scanners Use of Program- Genera(ng Tools Specifica(on è Part of compiler Compiler- Compiler regular expressions JFlex input program scanner Stream of tokens 80

81 Use of Program- Genera(ng Tools Input: regular expressions and ac(ons Ac(on = Java code Output: a scanner program that Produces a stream of tokens Invoke ac(ons when pacern is matched regular expressions JFlex input program scanner Stream of tokens 81

82 Line Coun(ng Example Create a program that counts the numer of lines in a given input text file 82

83 Crea(ng a Scanner using Flex int num_lines = 0; %% \n ++num_lines;. ; %% main() { yylex(); printf( "# of lines = %d\n", num_lines); } 83

84 Crea(ng a Scanner using Flex int num_lines = 0; %% \n ++num_lines;. ; %% main() { yylex(); printf( "# of lines = %d\n", num_lines); } ini(al \n ^\n newline other 84

85 %% %% JFLex Spec File User code: Copied directly to Java file JFlex direc(ves: macros, state names Lexical analysis rules: Op(onal state, regular expression, ac(on How to reak input to tokens Ac(on when token matched Possile source of javac errors down the road DIGIT= [0-9] LETTER= [a- za- Z] YYINITIAL {LETTER} ({LETTER} {DIGIT})* 85

86 Crea(ng a Scanner using JFlex import java_cup.runtime.*; %% %cup %{ private int linecounter = 0; %} %eofval{ System.out.println("line numer=" + linecounter); return new Symol(sym.EOF); %eofval} NEWLINE=\n %% {NEWLINE} { linecounter++; } [^{NEWLINE}] { } 86

87 Catching errors What if input doesn t match any token defini(on? Trick: Add a catch- all rule that matches any character and reports an error Add aqer all other rules 87

88 A JFlex specifica(on of C Scanner import java_cup.runtime.*; %% %cup %{ private int linecounter = 0; %} Letter= [a- za- Z_] Digit= [0-9] %% \t { } \n { linecounter++; } ; { return new Symol(sym.SemiColumn);} ++ { return new Symol(sym.PlusPlus); } += { return new Symol(sym.PlusEq); } + { return new Symol(sym.Plus); } while { return new Symol(sym.While); } {Letter}({Letter} {Digit})* { return new Symol(sym.Id, yytext() ); } <= { return new Symol(sym.LessOrEqual); } < { return new Symol(sym.LessThan); } 88

89 Missing Crea(ng a lexical analysis y hand Tale compression Symol Tales Nested Comments Handling Macros 89

90 Lexical Analysis: What Input: program text (file) Output: sequence of tokens 90

91 Lexical Analysis: How Define tokens using regular expressions Construct a nondeterminis(c finite- state automaton (NFA) from regular expression Determinize the NFA into a determinis(c finite- state automaton (DFA) DFA can e directly used to iden(fy tokens 91

92 Lexical Analysis: Why Read input file Iden(fy language keywords and standard iden(fiers Handle include files and macros Count line numers Remove whitespaces Report illegal symols [Produce symol tale] 92

93 Syntax Analysis (1) Context Free Languages Context Free Grammars Pushdown Automata 93

94 The Real Anatomy of a Compiler Source text txt Process text input characters Lexical Syntax tokens Analysis Analysis Annotated AST AST Sem. Analysis Intermediate code generation IR Intermediate code optimization IR Code generation Symolic Instructions Target code optimization SI Machine code generation MI Write executale output Executale code exe 94

95 Frontend: Scanning & Parsing program text ((23 + 7) * x) Lexical Analyzer token stream ( LP ( LP 23 Num + OP 7 Num ) RP * OP x Id ) RP Grammar: E... Id Id a... z syntax error Parser Op(*) valid Astract Syntax Tree Op(+) Id() Num(23) Num(7) 95

96 From scanning to parsing program text ((23 + 7) * x) Lexical Analyzer token stream ( LP ( LP 23 Num + OP 7 Num ) RP * OP x Id ) RP Grammar: E... Id Id a... z syntax error Parser Op(*) valid Astract Syntax Tree Op(+) Id() Num(23) Num(7) 96

97 Parsing Construct a structured representa(on of the input Challenges How do you descrie the programming language? How do you check validity of an input? Is a sequence of tokens a valid program in the language? How do you construct the structured representa(on? Where do you report an error? 97

98 Some founda(ons 98

99 Context free languages (CFLs) L 01 = { 0 n 1 n n > 0 } L polyndrom = {pp p Σ*, p =reverse(p)} L polyndrom# = {p#p p Σ*, p =reverse(p), # Σ} 99

100 Context free grammars (CFG) V non terminals (syntac(c variales) T terminals (tokens) P deriva(on rules Each rule of the form V t (T 4 V)* S start symol G = (V,T,P,S) 100

101 What can CFGs do? Recognize CFLs S t 0T1 T t 0T1 ℇ 101

102 Recognizing CFLs Context Free Grammars (CFG) ~ language- defining power Nondeterminis(c push down automata (PDA) 102

103 Pushdown Automata (PDA) Nondeterminis(c PDAs define all CFLs Determinis(c PDAs model parsers. Most programming languages have a determinis(c PDA Efficient implementa(on 103

104 Intui(on: PDA An ε- NFA with the addi(onal power to manipulate one stack Top X Y IF $ stack control (ε- NFA) 104

105 Intui(on: PDA Think of an ε- NFA with the addi(onal power that it can manipulate a stack PDA moves are determined y: The current state (of its ε- NFA ) The current input symol (or ε) The current symol on top of its stack 105

106 Intui(on: PDA Current input if (oops) then stat:= lah else aort Top X Y IF $ stack control (ε- NFA) 106

107 Intui(on: PDA Moves: Change state Replace the top symol y 0 n symols 0 symols = pop ( reduce ) 0 < symols = sequence of pushes ( shiq ) Nondeterminis(c choice of next move 107

108 PDA Formalism PDA = (Q, Σ, Γ, δ, q 0, $, F): Q: finite set of states Σ: Input symols alphaet Γ: stack symols alphaet δ: transi(on func(on q 0 : start state $: start symol F: set of final states 108

109 The Transi(on Func(on δ(q, a, X) = { (p 1, σ 1 ),,(p n, σ n )} Input: triplet A state q Q An input symol a Σ or ε A stack symol X Γ Output: set of 0 k ac(ons of the form (p, σ) A state p Q σ a sequence X 1 X n Γ* of stack symols 109

110 Ac(ons of the PDA Say (p, σ) δ(q, a, X) If the PDA is in state q and X is the top symol and a is at the front of the input Then it can Change the state to p. Remove a from the front of the input (ut a may e ε). Replace X on the top of the stack y σ. 110

111 Example: Determinis(c PDA Design a PDA to accept {0 n 1 n n > 1}. The states: q = We have not seen 1 so far start state p = we have seen at least one 1 and no 0s since f = final state; accept. 111

112 Example: Stack Symols $ = start symol. Also marks the ocom of the stack, Indicates when we have counted the same numer of 1 s as 0 s. X = counter used to count the numer of 0s we saw 112

113 Example: Transi(ons δ(q, 0, $) = {(q, X$)}. δ(q, 0, X) = {(q, XX)}. These two rules cause one X to e pushed onto the stack for each 0 read from the input. δ(q, 1, X) = {(p, ε)}. When we see a 1, go to state p and pop one X. δ(p, 1, X) = {(p, ε)}. Pop one X per 1. δ(p, ε, $) = {(f, $)}. Accept at ocom. 113

114 Ac(ons of the Example PDA q $ 114

115 Ac(ons of the Example PDA q X $ 115

116 Ac(ons of the Example PDA q X X $ 116

117 Ac(ons of the Example PDA q X X X $ 117

118 Ac(ons of the Example PDA p X X $ 118

119 Ac(ons of the Example PDA p X $ 119

120 Ac(ons of the Example PDA p $ 120

121 Ac(ons of the Example PDA f $ 121

122 Example: Non Determinis(c PDA A PDA that accepts palindromes L {pp Σ* p =reverse(p)} 122

123 123

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1.

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1. Lexical Analysis Textbook:Modern Compiler Design Chapter 2.1 http://www.cs.tau.ac.il/~msagiv/courses/wcc11-12.html 1 A motivating example Create a program that counts the number of lines in a given input

More information

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1 Lexical Analysis Textbook:Modern Compiler Design Chapter 2.1 A motivating example Create a program that counts the number of lines in a given input text file Solution (Flex) int num_lines = 0; %% \n ++num_lines;.

More information

Compilation

Compilation Compilation 0368-3133 Lecture 1: Introduction Lexical Analysis Noam Rinetzky 1 2 Admin Lecturer: Noam Rinetzky maon@tau.ac.il http://www.cs.tau.ac.il/~maon T.A.: Oren Ish Shalom Textbooks: Modern Compiler

More information

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

More information

Compila(on (Semester A, 2013/14)

Compila(on (Semester A, 2013/14) Compila(on 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top- Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv, Jeff Ullman, Eran

More information

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised: EDA180: Compiler Construc6on Context- free grammars Görel Hedin Revised: 2013-01- 28 Compiler phases and program representa6ons source code Lexical analysis (scanning) Intermediate code genera6on tokens

More information

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley

More information

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool Module 8 - Lexical Analyzer Generator This module discusses the core issues in designing a lexical analyzer generator from basis or using a tool. The basics of LEX tool are also discussed. 8.1 Need for

More information

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Lexical Analysis Chapter 1, Section 1.2.1 Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens

More information

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington CSE 5317 Lecture 4: Scanning and parsing 28 Jan 2010 Nate Nystrom University of Texas at Arlington Administrivia hcp://groups.google.com/group/uta- cse- 3302 I will add you to the group soon TA Derek White

More information

EDAN65: Compilers, Lecture 02 Regular expressions and scanning. Görel Hedin Revised:

EDAN65: Compilers, Lecture 02 Regular expressions and scanning. Görel Hedin Revised: EDAN65: Compilers, Lecture 02 Regular expressions and scanning Görel Hedin Revised: 2014-09- 01 Course overview Regular expressions Context- free grammar ARribute grammar Lexical analyzer (scanner) SyntacIc

More information

Lexical analysis. Concepts. Lexical analysis in perspec>ve

Lexical analysis. Concepts. Lexical analysis in perspec>ve 4a Lexical analysis Concepts Overview of syntax and seman3cs Step one: lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc.

More information

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1 CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

More information

Compila(on Lecture 1: Introduc(on Lexical Analysis. Noam Rinetzky

Compila(on Lecture 1: Introduc(on Lexical Analysis. Noam Rinetzky Compila(on 0368-3133 Lecture 1: Introduc(on Lexical Analysis Noam Rinetzky 1 2 Admin Lecturer: Noam Rinetzky maon@tau.ac.il h.p://www.cs.tau.ac.il/~maon T.A.: Orr Tamir Textbooks: Modern Compiler Design

More information

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast! Compiler Passes Analysis of input program (front-end) character stream

More information

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

More information

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console Scanning 1 read Interpreter Scanner request token Parser send token Console I/O send AST Tree Walker 2 Scanner This process is known as: Scanning, lexing (lexical analysis), and tokenizing This is the

More information

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by The term languages to mean any set of string formed from some specific alphaet. The notation of concatenation can also e applied to languages. If L and M are languages, then L.M is the language consisting

More information

Compiler course. Chapter 3 Lexical Analysis

Compiler course. Chapter 3 Lexical Analysis Compiler course Chapter 3 Lexical Analysis 1 A. A. Pourhaji Kazem, Spring 2009 Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata

More information

T.E. (Computer Engineering) (Semester I) Examination, 2013 THEORY OF COMPUTATION (2008 Course)

T.E. (Computer Engineering) (Semester I) Examination, 2013 THEORY OF COMPUTATION (2008 Course) *4459255* [4459] 255 Seat No. T.E. (Computer Engineering) (Semester I) Examination, 2013 THEY OF COMPUTATION (2008 Course) Time : 3 Hours Max. Marks : 100 Instructions : 1) Answers to the two Sections

More information

Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis Chapter 3 Lexical Analysis Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata Design of lexical analyzer generator The role of lexical

More information

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

More information

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1 CSE 401 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2010 1/8/2010 2002-10 Hal Perkins & UW CSE B-1 Agenda Quick review of basic concepts of formal grammars Regular

More information

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

More information

Lexical Analysis. Introduction

Lexical Analysis. Introduction Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

More information

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

More information

Figure 2.1: Role of Lexical Analyzer

Figure 2.1: Role of Lexical Analyzer Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Recap: First-Longest-Match Analysis Outline of Lecture

More information

Lexical and Syntax Analysis

Lexical and Syntax Analysis Lexical and Syntax Analysis In Text: Chapter 4 N. Meng, F. Poursardar Lexical and Syntactic Analysis Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input

More information

Lexical Analysis. Chapter 2

Lexical Analysis. Chapter 2 Lexical Analysis Chapter 2 1 Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

CMSC 350: COMPILER DESIGN

CMSC 350: COMPILER DESIGN Lecture 11 CMSC 350: COMPILER DESIGN see HW3 LLVMLITE SPECIFICATION Eisenberg CMSC 350: Compilers 2 Discussion: Defining a Language Premise: programming languages are purely formal objects We (as language

More information

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

More information

Compiler phases. Non-tokens

Compiler phases. Non-tokens Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-17/cc/ Recap: First-Longest-Match Analysis The Extended Matching

More information

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) Introduction This semester, through a project split into 3 phases, we are going

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

Related Course Objec6ves

Related Course Objec6ves Syntax 9/18/17 1 Related Course Objec6ves Develop grammars and parsers of programming languages 9/18/17 2 Syntax And Seman6cs Programming language syntax: how programs look, their form and structure Syntax

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 2: Syntax Analysis Zheng (Eddy) Zhang Rutgers University January 22, 2018 Announcement First recitation starts this Wednesday Homework 1 will be release

More information

CSE 401/M501 Compilers

CSE 401/M501 Compilers CSE 401/M501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Spring 2018 UW CSE 401/M501 Spring 2018 B-1 Administrivia No sections this week Read: textbook ch. 1 and sec. 2.1-2.4

More information

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/} Class 5 Lex Spec Example delim [ \t\n] ws {delim}+ letter [A-Aa-z] digit [0-9] id {letter}({letter} {digit})* number {digit}+(\.{digit}+)?(e[+-]?{digit}+)? %% {ws} {/*no action and no return*?} if {return(if);}

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular

More information

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a EDA180: Compiler Construc6on Top- down parsing Görel Hedin Revised: 2013-01- 30a Compiler phases and program representa6ons source code Lexical analysis (scanning) Intermediate code genera6on tokens intermediate

More information

Simple Lexical Analyzer

Simple Lexical Analyzer Lecture 7: Simple Lexical Analyzer Dr Kieran T. Herley Department of Computer Science University College Cork 2017-2018 KH (03/10/17) Lecture 7: Simple Lexical Analyzer 2017-2018 1 / 1 Summary Use of jflex

More information

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create

More information

CSc 453 Lexical Analysis (Scanning)

CSc 453 Lexical Analysis (Scanning) CSc 453 Lexical Analysis (Scanning) Saumya Debray The University of Arizona Tucson Overview source program lexical analyzer (scanner) tokens syntax analyzer (parser) symbol table manager Main task: to

More information

Lexical Analysis. Lecture 2-4

Lexical Analysis. Lecture 2-4 Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 2 1 Administrivia Moving to 60 Evans on Wednesday HW1 available Pyth manual available on line.

More information

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language? The Front End Source code Front End IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed

More information

Compilation

Compilation Compiltion 0368-3133 Lecture 2: Lexicl Anlysis Nom Rinetzky 1 2 Lexicl Anlysis Modern Compiler Design: Chpter 2.1 3 Conceptul Structure of Compiler Compiler Source text txt Frontend Semntic Representtion

More information

DFA: Automata where the next state is uniquely given by the current state and the current input character.

DFA: Automata where the next state is uniquely given by the current state and the current input character. Chapter : SCANNING (Lexical Analysis).3 Finite Automata Introduction to Finite Automata Finite automata (finite-state machines) are a mathematical way of descriing particular kinds of algorithms. A strong

More information

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Compilers Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Lexical Analyzer (Scanner) 1. Uses Regular Expressions to define tokens 2. Uses Finite Automata to recognize tokens

More information

Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev

Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev Fall 2016-2017 Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University of the Negev Tentative syllabus Front End Intermediate Representation Optimizations Code Generation Scanning

More information

Compilation 2014 Warm-up project

Compilation 2014 Warm-up project Compilation 2014 Warm-up project Aslan Askarov aslan@cs.au.dk Revised from slides by E. Ernst Straight-line Programming Language Toy programming language: no branching, no loops Skip lexing and parsing

More information

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones Lexical Analysis - An Introduction Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

More information

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012 Scanners Xiaokang Qiu Purdue University ECE 468 Adapted from Kulkarni 2012 August 24, 2016 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved

More information

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR Pune Vidyarthi Griha s COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR By Prof. Anand N. Gharu (Assistant Professor) PVGCOE Computer Dept.. 22nd Jan 2018 CONTENTS :- 1. Role of lexical analysis 2.

More information

A Simple Syntax-Directed Translator

A Simple Syntax-Directed Translator Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called

More information

CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages. Lecture 3 CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information

More information

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string

More information

Monday, August 26, 13. Scanners

Monday, August 26, 13. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

Lexical Analysis. Lecture 3-4

Lexical Analysis. Lecture 3-4 Lexical Analysis Lecture 3-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 3-4 1 Administrivia I suggest you start looking at Python (see link on class home page). Please

More information

Wednesday, September 3, 14. Scanners

Wednesday, September 3, 14. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

Introduc)on. Tristan Naumann. Sahar Hasan. Steve Hehl. Brian Lu

Introduc)on. Tristan Naumann. Sahar Hasan. Steve Hehl. Brian Lu Introduc)on Tristan Naumann Sahar Hasan Steve Hehl Brian Lu Introduc)on Friendly Interac)ve Recursion Educator (FIRE) Recursion is one of the most difficult topics for novice programmers to understand

More information

Lexical Analysis. Implementation: Finite Automata

Lexical Analysis. Implementation: Finite Automata Lexical Analysis Implementation: Finite Automata Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs)

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Conceptual Structure of a Compiler Source code x1 := y2

More information

Lexical and Parser Tools

Lexical and Parser Tools Lexical and Parser Tools CSE 413, Autumn 2005 Programming Languages http://www.cs.washington.edu/education/courses/413/05au/ 7-Dec-2005 cse413-20-tools 2005 University of Washington 1 References» The Lex

More information

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky Compilation 0368-3133 Lecture 3: Syntax Analysis: Top-Down parsing Noam Rinetzky 1 Recursive descent parsing Define a function for every nonterminal Every function work as follows Find applicable production

More information

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing Also known as Shift-Reduce parsing More powerful than top down Don t need left factored grammars Can handle left recursion Attempt to construct parse tree from an input string eginning at leaves and working

More information

Parsing and Pattern Recognition

Parsing and Pattern Recognition Topics in IT 1 Parsing and Pattern Recognition Week 10 Lexical analysis College of Information Science and Engineering Ritsumeikan University 1 this week mid-term evaluation review lexical analysis its

More information

CS415 Compilers. Lexical Analysis

CS415 Compilers. Lexical Analysis CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework

More information

LECTURE 6 Scanning Part 2

LECTURE 6 Scanning Part 2 LECTURE 6 Scanning Part 2 FROM DFA TO SCANNER In the previous lectures, we discussed how one might specify valid tokens in a language using regular expressions. We then discussed how we can create a recognizer

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures) CS606- Compiler Construction MidTerm Papers Solved MCQS with Reference (1 to 22 lectures) by Arslan Arshad (Zain) FEB 21,2016 0300-2462284 http://lmshelp.blogspot.com/ Arslan.arshad01@gmail.com AKMP01

More information

CS 403: Scanning and Parsing

CS 403: Scanning and Parsing CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

More information

Topic 3: Syntax Analysis I

Topic 3: Syntax Analysis I Topic 3: Syntax Analysis I Compiler Design Prof. Hanjun Kim CoreLab (Compiler Research Lab) POSTECH 1 Back-End Front-End The Front End Source Program Lexical Analysis Syntax Analysis Semantic Analysis

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 2: Lexical analysis Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 Basics of Lexical Analysis: 2 Some definitions:

More information

Writing a Lexical Analyzer in Haskell (part II)

Writing a Lexical Analyzer in Haskell (part II) Writing a Lexical Analyzer in Haskell (part II) Today Regular languages and lexicographical analysis part II Some of the slides today are from Dr. Saumya Debray and Dr. Christian Colberg This week PA1:

More information

1.0 Languages, Expressions, Automata

1.0 Languages, Expressions, Automata .0 Languages, Expressions, Automata Alphaet: Language: a finite set, typically a set of symols. a particular suset of the strings that can e made from the alphaet. ex: an alphaet of digits = {-,0,,2,3,4,5,6,7,8,9}

More information

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

More information

Lexical Analyzer Scanner

Lexical Analyzer Scanner Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

More information

CS 4120 Introduction to Compilers

CS 4120 Introduction to Compilers CS 4120 Introduction to Compilers Andrew Myers Cornell University Lecture 6: Bottom-Up Parsing 9/9/09 Bottom-up parsing A more powerful parsing technology LR grammars -- more expressive than LL can handle

More information

8 ε. Figure 1: An NFA-ǫ

8 ε. Figure 1: An NFA-ǫ 0 1 2 3 4 a 6 5 7 8 9 10 LECTURE 27 Figure 1: An FA-ǫ 12.1 ǫ Transitions In all automata that we have seen so far, every time that it has to change from one state to another, it must use one input symol.

More information

Lexical Analyzer Scanner

Lexical Analyzer Scanner Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

More information

Lecture 9 CIS 341: COMPILERS

Lecture 9 CIS 341: COMPILERS Lecture 9 CIS 341: COMPILERS Announcements HW3: LLVM lite Available on the course web pages. Due: Monday, Feb. 26th at 11:59:59pm Only one group member needs to submit Three submissions per group START

More information

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

More information

Part 5 Program Analysis Principles and Techniques

Part 5 Program Analysis Principles and Techniques 1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

More information

Compilers. Bottom-up Parsing. (original slides by Sam

Compilers. Bottom-up Parsing. (original slides by Sam Compilers Bottom-up Parsing Yannis Smaragdakis U Athens Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Bottom-Up Parsing More general than top-down parsing And just as efficient Builds

More information

Front End. Hwansoo Han

Front End. Hwansoo Han Front nd Hwansoo Han Traditional Two-pass Compiler Source code Front nd IR Back nd Machine code rrors High level functions Recognize legal program, generate correct code (OS & linker can accept) Manage

More information

CSE 401 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter UW CSE 401 Winter 2015 B-1

CSE 401 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter UW CSE 401 Winter 2015 B-1 CSE 401 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2015 UW CSE 401 Winter 2015 B-1 Administrivia Read: textbook ch. 1 and sec. 2.1-2.4 First homework: out by end of

More information

Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis. Lecture 3. January 10, 2018 Lexical Analysis Lecture 3 January 10, 2018 Announcements PA1c due tonight at 11:50pm! Don t forget about PA1, the Cool implementation! Use Monday s lecture, the video guides and Cool examples if you re

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

Lexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool

Lexical Analysis. Implementing Scanners & LEX: A Lexical Analyzer Tool Lexical Analysis Implementing Scanners & LEX: A Lexical Analyzer Tool Copyright 2016, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California

More information

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08 CS412/413 Introduction to Compilers Tim Teitelbaum Lecture 2: Lexical Analysis 23 Jan 08 Outline Review compiler structure What is lexical analysis? Writing a lexer Specifying tokens: regular expressions

More information

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers. Part III : Parsing From Regular to Context-Free Grammars Deriving a Parser from a Context-Free Grammar Scanners and Parsers A Parser for EBNF Left-Parsable Grammars Martin Odersky, LAMP/DI 1 From Regular

More information