Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

1 Lexical Analysis COMP 524, Spring 2014 Bryan Ward Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others

2 The Big Picture
Character stream
  → Scanner (lexical analysis) → Token stream
  → Parser (syntax analysis) → Parse tree
  → Semantic analysis & intermediate code generation → Abstract syntax tree
  → Machine-independent optimization (optional) → Modified intermediate form
  → Target code generation → Machine language
  → Machine-specific optimization (optional) → Modified target language
Lexical analysis: grouping consecutive characters that belong together. Turn the stream of individual characters into a stream of tokens that have individual meaning.

6 Source Program
The compiler reads the program from a file. Input as a character stream.
Source File = * f o o ;
Compilation requires analysis of program structure: identify subroutines, classes, methods, etc.
Thus, the first step is to find units of meaning.

9 Tokens
Source File = * f o o ;
Not every character has an individual meaning.
In Java, a + can have two interpretations:
  A single + means addition.
  A ++ sequence means increment.
A sequence of characters that has an atomic meaning is called a token.
The compiler must identify all input tokens.
Human analogy: to understand the meaning of an English sentence, we do not look at individual characters. Rather, we look at individual words. Human word = program token.

10 Tokens
Source File = * f o o ;
Scanning the source line from left to right yields these tokens:
  Operator: Assignment
  Integer Literal
  Operator: Minus
  Integer Literal
  Operator: Multiplication
  Identifier: foo
  Statement separator/terminator
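
The token classes listed above can be illustrated with a small scanner. This is a minimal sketch, not the course's implementation: the token names, the patterns, and the sample statement `x = 3 - 4 * foo;` are assumptions chosen for illustration. It uses Python's `re` module and tries `++` before `+` so that the increment operator is recognized as one token.

```python
import re

# Hypothetical token classes; "++" is listed before "+" so it wins.
TOKEN_SPEC = [
    ("INT_LITERAL", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("INCREMENT", r"\+\+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("ASSIGN", r"="),
    ("TERMINATOR", r";"),
    ("SKIP", r"\s+"),  # whitespace separates tokens but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Turn a character stream into a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token in tokenize("x = 3 - 4 * foo;"):
    print(token)
```

Note that identifying each lexeme requires no knowledge of what the statement means; that is left to the later phases.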

21 Lexical vs. Syntactical Analysis
Why have a separate lexical analysis phase?
In theory, token discovery (lexical analysis) could be done as part of the structure discovery (syntactical analysis, parsing).
However, this is impractical.
It is much easier (and much more efficient) to express the syntax rules in terms of tokens.
Thus, lexical analysis is made a separate step because it greatly simplifies the subsequently performed syntactical analysis.

24 Example: Java Language Specification
Token specification (lexical structure): these strings mean something, but knowledge of the exact meaning is not required to identify them.
  The following 37 tokens are the operators, formed from ASCII characters:
  Operator: one of
    =   >   <   !   ~   ?   :
    ==  <=  >=  !=  &&  ||  ++  --
    +   -   *   /   &   |   ^   %   <<  >>  >>>
    +=  -=  *=  /=  &=  |=  ^=  %=  <<=  >>=  >>>=
Syntactical structure: meaning is given by where they can occur in the program (grammar) and by language semantics.
  UnaryExpression:
    PreIncrementExpression
    PreDecrementExpression
    + UnaryExpression
    - UnaryExpression
    UnaryExpressionNotPlusMinus
  PreIncrementExpression:
    ++ UnaryExpression
  UnaryExpressionNotPlusMinus:
    PostfixExpression
    ~ UnaryExpression
    ! UnaryExpression
    CastExpression

28 Lexical Analysis
The need to identify tokens raises two questions.
How can we specify the tokens of a language?
How can we recognize tokens in a character stream?
Token specification: regular expressions (language design and specification).
Token recognition: deterministic finite automata (DFA) (language implementation).
DFA construction (several steps) leads from the regular expressions to the DFA.

37 Regular Expression Rules
Base case: a regular expression (RE) is either
  a character (e.g., 0, 1, ...), or
  the empty string (i.e., ε).
A compound RE is constructed by
  alternation: two REs separated by | next to each other (e.g., 1 | 0),
  parentheses (in order to avoid ambiguity),
  concatenation: two REs next to each other (e.g., (1)(0 | 1)).

49 What Does This Regular Expression Match?
0x(1 | 2 | ... | 9 | a | b | c | d | e | f)(0 | 1 | ... | 9 | a | b | c | d | e | f)*
  0x1?         Yes
  0x0?         No
  0xdeadbeef?  Yes
  0x?          No
  0x01?        No
Recognizes positive hexadecimal constants without leading zeros.
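
The five answers above can be checked mechanically. The sketch below assumes the expression reads as "0x, then one non-zero hexadecimal digit, then zero or more hexadecimal digits", which is what the slide's conclusion (no leading zeros) implies. The character classes `[1-9a-f]` and `[0-9a-f]` are Python shorthand for the long alternations; the function name is hypothetical.

```python
import re

# Assumed translation of the slide's RE into Python regex syntax.
HEX_CONSTANT = re.compile(r"0x[1-9a-f][0-9a-f]*")


def is_hex_constant(s):
    """True if the whole string s is matched by the expression."""
    return HEX_CONSTANT.fullmatch(s) is not None


for candidate in ["0x1", "0x0", "0xdeadbeef", "0x", "0x01"]:
    print(candidate, "Yes" if is_hex_constant(candidate) else "No")
```

`fullmatch` is used rather than `match` because the question is whether the entire string belongs to the language, not whether some prefix does.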

53 Example
Can we create a regular expression corresponding to the City, State ZIP-code line in mailing addresses?
E.g.: Chapel Hill, NC
      Beverly Hills, CA 90210

55 Grammars and Languages
A regular grammar is a kind of grammar.
  A grammar describes the structure of strings.
  A string that matches a grammar G's structure is said to be in the language L(G) (which is a set).
A grammar is a set of productions:
  Rules to obtain (produce) a string that is in L(G) via repeated substitutions.
  There are many grammar classes (see COMP 455).
  Two are commonly used to describe programming languages: regular grammars for tokens and context-free grammars for syntax.

56 Grammar 101
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
non_neg_number → (0 | natural_number) ( (. digit* non_zero_digit) | ε )

Productions: A → B is called a production.
Non-terminals: the name on the left is called a non-terminal symbol.
Terminals: the symbols on the right are either terminal or non-terminal symbols. A terminal symbol is just a character.
Definition: → means "is a" or "replace with".
Choice: | denotes "or".
Example: thus, the first production means: a digit is a 0 or 1 or 2 or ... or 9.
Optional repetition: * denotes zero or more of a symbol.
Sequence: two symbols next to each other means "followed by".
Example: thus, the third production means: a natural number is a non-zero digit followed by zero or more digits.
Epsilon: ε is a special terminal that means "empty". It corresponds to the empty string.
Example: so, what does the last production mean? A non-negative number is a 0 or a natural number, followed by either nothing or a ".", followed by zero or more digits, followed by (exactly one) non-zero digit.
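
To see what the last production accepts, the four productions can be written as Python regular expressions, one string per non-terminal. This is a sketch under the assumption that `digit` stands for 0 through 9 and `non_zero_digit` for 1 through 9; the optional group `(...)?` plays the role of `| ε`. The helper name is hypothetical.

```python
import re

# One Python string per non-terminal, built bottom-up by substitution.
DIGIT = r"[0-9]"
NON_ZERO_DIGIT = r"[1-9]"
NATURAL_NUMBER = rf"{NON_ZERO_DIGIT}{DIGIT}*"
# (0 | natural_number) followed by either nothing or ". digit* non_zero_digit"
NON_NEG_NUMBER = rf"(?:0|{NATURAL_NUMBER})(?:\.{DIGIT}*{NON_ZERO_DIGIT})?"


def is_non_neg_number(s):
    """True if the whole string s is derivable from non_neg_number."""
    return re.fullmatch(NON_NEG_NUMBER, s) is not None


for s in ["0", "42", "3.14", "0.5", "3.10", "007", "3."]:
    print(repr(s), is_non_neg_number(s))
```

The grammar rejects trailing zeros after the decimal point (`3.10`) and leading zeros (`007`), because the fractional part must end in a non-zero digit and a natural number must start with one.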

78 Regular Grammar Rules
Very similar to regular expression rules!
Terminals: a terminal is either
  a character (e.g., 0, 1, ...), or
  the empty string (i.e., ε).
Non-terminals: can be constructed using
  alternation: two REs or non-terminals separated by | next to each other (e.g., letter | digit),
  parentheses (in order to avoid ambiguity),
  concatenation: two REs or non-terminals next to each other (e.g., letter letter),
  optional repetition: an RE or non-terminal followed by * (the Kleene star) to denote zero or more occurrences (e.g., digit*).
A non-terminal is NEVER defined in terms of itself, not even indirectly! Thus, regular grammars cannot define recursive statements.

81 Example
Let's create a regular grammar corresponding to the City, State ZIP-code line in mailing addresses.
E.g.: Chapel Hill, NC
      Beverly Hills, CA 90210
city_line → city, state_abbrev zip_code
city → letter (letter | ␣ letter)*
state_abbrev → AL | AK | AS | AZ | ... | WY
zip_code → digit digit digit digit digit (extra | ε)
extra → - digit digit digit digit
digit → 0 | 1 | ... | 9
letter → A | B | C | ... | ö
Creating a regular expression from a regular grammar is mechanical and easy. Just take the most general non-terminal and keep substituting until you get down to terminals. The lack of recursion means that you won't get into an infinite loop.
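
The substitution procedure can be sketched in a few lines. Everything about the encoding here is an assumption made for illustration: production bodies are written in Python regex syntax, non-terminals are referenced as `<name>`, the state list is abbreviated to a handful of entries, `letter` is restricted to ASCII, and single spaces are assumed after the comma and before the ZIP code. The function `expand` is hypothetical.

```python
import re

# Hypothetical encoding of the grammar; <name> refers to a non-terminal.
GRAMMAR = {
    "city_line": r"<city>, <state_abbrev> <zip_code>",
    "city": r"<letter>(?:<letter>| <letter>)*",
    "state_abbrev": r"AL|AK|AS|AZ|CA|NC|WY",  # abbreviated list
    "zip_code": r"<digit>{5}(?:<extra>)?",    # (extra | ε) becomes an optional group
    "extra": r"-<digit>{4}",
    "digit": r"[0-9]",
    "letter": r"[A-Za-z]",                    # ASCII only in this sketch
}


def expand(symbol, grammar):
    """Substitute non-terminals until only terminals remain.

    Terminates because a regular grammar never defines a non-terminal
    in terms of itself, directly or indirectly.
    """
    def substitute(match):
        return "(?:" + expand(match.group(1), grammar) + ")"

    return re.sub(r"<(\w+)>", substitute, grammar[symbol])


CITY_LINE = re.compile(expand("city_line", GRAMMAR))
print(CITY_LINE.pattern)
print(CITY_LINE.fullmatch("Beverly Hills, CA 90210") is not None)
```

Each substituted body is wrapped in a non-capturing group so that alternation inside a non-terminal (such as the state list) does not leak into its surroundings, which is the role parentheses play in the rules above.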

84 Regular Sets and Finite Automata
If a grammar G is a regular grammar, then the language L(G) is called a regular set.
Equivalently, the language accepted by a regular expression is a regular set.
Fundamental equivalence: for every regular set L(G), there exists a deterministic finite automaton (DFA) that accepts a string S if and only if S ∈ L(G). (See COMP 455 for proof.)

85 DFA 101
Deterministic finite automaton:
  Has a finite number of states.
  Exactly one start state.
  One or more final states.
  Transitions: define how the automaton switches between states (given an input symbol).
Example DFA with states A (start), B, and C (final):
  A: on 0 stay in A; on 1 go to B.
  B: on 0 stay in B; on 1 go to C.
  C: on 0 or 1 stay in C.
Start state: A.
Intermediate state (neither start nor final): B.
Final state (indicated by double border): C.
Transition: given an input of 1, if the DFA is in state A, then transition to state B (and consume the input).
Self transition: given an input of 0, if the DFA is in state A, then stay in state A (and consume the input).
Multiple transitions: given an input of either 0 or 1, if the DFA is in state C, then stay in state C (and consume the input).
Transitions must be unambiguous: for each state and each input, there exists only one transition. This is what makes the DFA deterministic. A fragment in which state X has a transition on 1 to both Z and Y is not a legal DFA!

95 DFA String Processing
String processing:
  Initially in the start state.
  Sequentially make one transition for each character in the input string.
A DFA either accepts or rejects a string:
  Reject if a character is encountered for which no transition is defined in the current state.
  Reject if end of input is reached and the DFA is not in a final state.
  Accept if end of input is reached and the DFA is in a final state.

96 DFA Example
Input: 1 0
Initially, the DFA is in the start state A.
The first input character is 1. This causes a transition to state B.
The next input character is 0. This causes a self transition in state B.
The end of the input is reached, but the DFA is not in a final state: the string 10 is rejected!
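
The trace above can be reproduced with a short simulation. This is a minimal sketch: the dictionary layout and the function name `run_dfa` are assumptions, while the states and transitions are those of the example DFA (A on 0 stays, on 1 goes to B; B on 0 stays, on 1 goes to C; C stays on 0 and 1; C is final).

```python
# Transition table of the example DFA, one inner dict per state.
TRANSITIONS = {
    "A": {"0": "A", "1": "B"},
    "B": {"0": "B", "1": "C"},
    "C": {"0": "C", "1": "C"},
}
START = "A"
FINAL = {"C"}


def run_dfa(text):
    """Return (accepted, trace), where trace lists the states visited."""
    state = START
    trace = [state]
    for ch in text:
        nxt = TRANSITIONS[state].get(ch)
        if nxt is None:
            # No transition defined for this character: reject.
            return False, trace
        state = nxt
        trace.append(state)
    # End of input: accept only if the DFA is in a final state.
    return state in FINAL, trace


print(run_dfa("10"))
print(run_dfa("0110"))
```

For the input `10` the trace is A, B, B and the string is rejected, matching the walk-through; `0110` ends in C and is accepted.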

101 DFA-Equivalent Regular Expression
What's the RE such that the RE's language is exactly the set of strings that is accepted by this DFA?
0*10*1(1 | 0)*
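
One way to gain confidence in this answer is to compare the DFA and the RE on every binary string up to some length. The sketch below is a brute-force check, not a proof; the helper names and the bound of 10 are arbitrary choices, and the transition table restates the example DFA (states A, B, C with C final).

```python
import re
from itertools import product

TRANSITIONS = {
    "A": {"0": "A", "1": "B"},
    "B": {"0": "B", "1": "C"},
    "C": {"0": "C", "1": "C"},
}
CANDIDATE_RE = re.compile(r"0*10*1(1|0)*")


def dfa_accepts(text):
    """Run the example DFA; C is the only final state."""
    state = "A"
    for ch in text:
        state = TRANSITIONS[state][ch]
    return state == "C"


def first_disagreement(max_len):
    """Return a string on which DFA and RE differ, or None if there is none."""
    for length in range(max_len + 1):
        for chars in product("01", repeat=length):
            s = "".join(chars)
            if dfa_accepts(s) != (CANDIDATE_RE.fullmatch(s) is not None):
                return s
    return None


print(first_disagreement(10))
```

Both descriptions capture the same set: binary strings that contain at least two 1s.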

106 Recognizing Tokens with a DFA
Table-driven implementation. DFAs can be represented as a 2-dimensional table.

  Current State   On 0              On 1              Note
  A               transition to A   transition to B   start
  B               transition to B   transition to C
  C               transition to C   transition to C   final

  currentstate = start state
  while end of input not yet reached:
      c = get next input character
      if transitiontable[currentstate][c] ≠ null:
          currentstate = transitiontable[currentstate][c]
      else:
          reject input
  if currentstate is final:
      accept input
  else:
      reject input

This accepts exactly one token in the input. A real lexer must detect multiple successive tokens. This can be achieved by resetting to the start state. But what happens if the suffix of one token is the prefix of another? (See Chapter 2 for a solution.)
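
A common way to handle successive tokens, sketched here as an assumption rather than as the solution the slides refer to, is the longest-match rule: run the DFA as far as it can go, remember the last final state passed, emit that token, and reset to the start state. The DFA below is hypothetical (integers, identifiers, `+`, and `++`), as are all state and token names.

```python
def char_class(ch):
    """Map a character to the column of the transition table it selects."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch == "+":
        return "+"
    return None


# Hypothetical table-driven DFA for a tiny token set.
TABLE = {
    "start": {"digit": "int", "letter": "id", "+": "plus"},
    "int": {"digit": "int"},
    "id": {"letter": "id", "digit": "id"},
    "plus": {"+": "plusplus"},
    "plusplus": {},
}
FINAL = {"int": "INT", "id": "IDENT", "plus": "PLUS", "plusplus": "INCREMENT"}


def scan(text):
    """Split text into tokens, always taking the longest possible match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, i = "start", pos
        last_final = None  # (token name, end index) of the longest match so far
        while i < len(text):
            nxt = TABLE[state].get(char_class(text[i]))
            if nxt is None:
                break
            state, i = nxt, i + 1
            if state in FINAL:
                last_final = (FINAL[state], i)
        if last_final is None:
            raise ValueError(f"no token starts at position {pos}")
        name, end = last_final
        tokens.append((name, text[pos:end]))
        pos = end  # reset to the start state just after the emitted token
    return tokens


print(scan("a+++b1 42"))
```

With this rule, `+++` is split into `++` followed by `+`: the scanner prefers the longer token when the prefix of one token is also a complete shorter token.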

110 Lexical Analysis
The need to identify tokens raises two questions.
How can we specify the tokens of a language? With regular expressions.
How can we recognize tokens in a character stream? With DFAs.
Token specification (regular expressions) → DFA construction → token recognition (deterministic finite automata, DFA).
DFA construction has no single-step algorithm: we first need to construct a non-deterministic finite automaton.

118 Non-Deterministic Finite Automaton (NFA)
Like a DFA, but less restrictive:
  Transitions do not have to be unique: each state may have multiple ambiguous transitions for the same input symbol. (Hence, it can be non-deterministic.)
  Epsilon transitions do not consume any input. (They correspond to the empty string.)
Note that every DFA is also an NFA.
Acceptance rule: accepts an input string if there exists a series of transitions such that the NFA is in a final state when the end of input is reached.
Inherent parallelism: all possible paths are explored simultaneously.
Figure: a legal NFA fragment (states X, Z, Y, V; edge labels 1, 1, ε).
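
The acceptance rule and the "all paths simultaneously" idea can be sketched by tracking the set of states the NFA could be in. The NFA below is hypothetical and is not the fragment from the figure: it accepts binary strings that end in 1, has two transitions on 1 out of state S, and one ε transition from P to the final state F. The empty string is used as the ε label, which is an encoding choice of this sketch.

```python
EPS = ""  # label used for epsilon transitions in this sketch

# Hypothetical NFA: (state, symbol) -> set of possible next states.
NFA = {
    ("S", "0"): {"S"},
    ("S", "1"): {"S", "P"},  # ambiguous: two transitions on the same input
    ("P", EPS): {"F"},       # epsilon transition, consumes no input
}
START = "S"
FINAL = {"F"}


def eps_closure(states):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(text):
    """Follow every possible path at once by tracking a set of states."""
    current = eps_closure({START})
    for ch in text:
        moved = set()
        for s in current:
            moved |= NFA.get((s, ch), set())
        current = eps_closure(moved)
    # Accept if at least one path ends in a final state.
    return bool(current & FINAL)


for s in ["1", "01", "10", "0011", ""]:
    print(repr(s), nfa_accepts(s))
```

Tracking sets of states in this way is also the idea behind turning an NFA into a DFA: each distinct set can serve as one state of a deterministic automaton.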


Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Syntax Analysis COMP 524: Programming Languages Srinivas Krishnan January 25, 2011 Based in part on slides and notes by Bjoern Brandenburg, S. Olivier and A. Block. 1 The Big Picture Character Stream Token

More information

CSCI312 Principles of Programming Languages!

CSCI312 Principles of Programming Languages! CSCI312 Principles of Programming Languages!! Chapter 3 Regular Expression and Lexer Xu Liu Recap! Copyright 2006 The McGraw-Hill Companies, Inc. Clite: Lexical Syntax! Input: a stream of characters from

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

More information

CMSC 350: COMPILER DESIGN

CMSC 350: COMPILER DESIGN Lecture 11 CMSC 350: COMPILER DESIGN see HW3 LLVMLITE SPECIFICATION Eisenberg CMSC 350: Compilers 2 Discussion: Defining a Language Premise: programming languages are purely formal objects We (as language

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! Any questions about the syllabus?! Course Material available at www.cs.unic.ac.cy/ioanna! Next time reading assignment [ALSU07]

More information

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1 CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

Lexical Analysis. Introduction

Lexical Analysis. Introduction Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

More information

Non-deterministic Finite Automata (NFA)

Non-deterministic Finite Automata (NFA) Non-deterministic Finite Automata (NFA) CAN have transitions on the same input to different states Can include a ε or λ transition (i.e. move to new state without reading input) Often easier to design

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis dministrivia Lexical nalysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Moving to 6 Evans on Wednesday HW available Pyth manual available on line. Please log into your account and electronically

More information

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR Pune Vidyarthi Griha s COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR By Prof. Anand N. Gharu (Assistant Professor) PVGCOE Computer Dept.. 22nd Jan 2018 CONTENTS :- 1. Role of lexical analysis 2.

More information

Introduction to Parsing. Lecture 8

Introduction to Parsing. Lecture 8 Introduction to Parsing Lecture 8 Adapted from slides by G. Necula Outline Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Languages and Automata Formal languages

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

Lexical Scanning COMP360

Lexical Scanning COMP360 Lexical Scanning COMP360 Captain, we re being scanned. Spock Reading Read sections 2.1 3.2 in the textbook Regular Expression and FSA Assignment A new assignment has been posted on Blackboard It is due

More information

Theory and Compiling COMP360

Theory and Compiling COMP360 Theory and Compiling COMP360 It has been said that man is a rational animal. All my life I have been searching for evidence which could support this. Bertrand Russell Reading Read sections 2.1 3.2 in the

More information

Optimizing Finite Automata

Optimizing Finite Automata Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states

More information

A simple syntax-directed

A simple syntax-directed Syntax-directed is a grammaroriented compiling technique Programming languages: Syntax: what its programs look like? Semantic: what its programs mean? 1 A simple syntax-directed Lexical Syntax Character

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

CSCE 314 Programming Languages

CSCE 314 Programming Languages CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee 1 What Is a Programming Language? Language = syntax + semantics The syntax of a language is concerned with the form of a program: how

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

More information

COP 3402 Systems Software Syntax Analysis (Parser)

COP 3402 Systems Software Syntax Analysis (Parser) COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis

More information

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;} Compiler Construction Grammars Parsing source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2012 01 23 2012 parse tree AST builder (implicit)

More information

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions CSE45 Translation of Programming Languages Lecture 2: Automata and Regular Expressions Finite Automata Regular Expression = Specification Finite Automata = Implementation A finite automaton consists of:

More information

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console Scanning 1 read Interpreter Scanner request token Parser send token Console I/O send AST Tree Walker 2 Scanner This process is known as: Scanning, lexing (lexical analysis), and tokenizing This is the

More information

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata Outline 1 2 Regular Expresssions Lexical Analysis 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA 6 NFA to DFA 7 8 JavaCC:

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Conceptual Structure of a Compiler Source code x1 := y2

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grammars Lecture 7 http://webwitch.dreamhost.com/grammar.girl/ Outline Scanner vs. parser Why regular expressions are not enough Grammars (context-free grammars) grammar rules derivations

More information

Lexical Analysis. Finite Automata

Lexical Analysis. Finite Automata #1 Lexical Analysis Finite Automata Cool Demo? (Part 1 of 2) #2 Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical

More information

Compiling Regular Expressions COMP360

Compiling Regular Expressions COMP360 Compiling Regular Expressions COMP360 Logic is the beginning of wisdom, not the end. Leonard Nimoy Compiler s Purpose The compiler converts the program source code into a form that can be executed by the

More information

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

More information

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Outline Limitations of regular languages Introduction to Parsing Parser overview Lecture 8 Adapted from slides by G. Necula Context-free grammars (CFG s) Derivations Languages and Automata Formal languages

More information

Lexical Analysis 1 / 52

Lexical Analysis 1 / 52 Lexical Analysis 1 / 52 Outline 1 Scanning Tokens 2 Regular Expresssions 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA

More information

CS 403: Scanning and Parsing

CS 403: Scanning and Parsing CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

More information

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation Outline Introduction to Parsing Lecture 8 Adapted from slides by G. Necula and R. Bodik Limitations of regular languages Parser overview Context-free grammars (CG s) Derivations Syntax-Directed ranslation

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

More information

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string

More information

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1 Introduction to Automata Theory BİL405 - Automata Theory and Formal Languages 1 Automata, Computability and Complexity Automata, Computability and Complexity are linked by the question: What are the fundamental

More information

CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

More information

Parsing II Top-down parsing. Comp 412

Parsing II Top-down parsing. Comp 412 COMP 412 FALL 2018 Parsing II Top-down parsing Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled

More information

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2 Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

More information

Introduction to Lexing and Parsing

Introduction to Lexing and Parsing Introduction to Lexing and Parsing ECE 351: Compilers Jon Eyolfson University of Waterloo June 18, 2012 1 Riddle Me This, Riddle Me That What is a compiler? 1 Riddle Me This, Riddle Me That What is a compiler?

More information

CS164: Midterm I. Fall 2003

CS164: Midterm I. Fall 2003 CS164: Midterm I Fall 2003 Please read all instructions (including these) carefully. Write your name, login, and circle the time of your section. Read each question carefully and think about what s being

More information

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis CSE450 Translation of Programming Languages Lecture 4: Syntax Analysis http://xkcd.com/859 Structure of a Today! Compiler Source Language Lexical Analyzer Syntax Analyzer Semantic Analyzer Int. Code Generator

More information

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens Syntactic Analysis CS45H: Programming Languages Lecture : Lexical Analysis Thomas Dillig Main Question: How to give structure to strings Analogy: Understanding an English sentence First, we separate a

More information

Lexical Analysis. Finite Automata

Lexical Analysis. Finite Automata #1 Lexical Analysis Finite Automata Cool Demo? (Part 1 of 2) #2 Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical

More information

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language. UNIT I LEXICAL ANALYSIS Translator: It is a program that translates one language to another Language. Source Code Translator Target Code 1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System

More information

CSE302: Compiler Design

CSE302: Compiler Design CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 01, 2007 Outline Recap

More information

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es Computer Science Department Carlos III University of Madrid Leganés (Spain) OUTLINE Introduction: Definitions The role of the Lexical Analyzer Scanner Implementation

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

More information

CS415 Compilers. Lexical Analysis

CS415 Compilers. Lexical Analysis CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework

More information

A Simple Syntax-Directed Translator

A Simple Syntax-Directed Translator Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called

More information

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 1. Introduction Parsing is the task of Syntax Analysis Determining the syntax, or structure, of a program. The syntax is defined by the grammar rules

More information

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program

More information

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below. UNIT I Translator: It is a program that translates one language to another Language. Examples of translator are compiler, assembler, interpreter, linker, loader and preprocessor. Source Code Translator

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016 Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016 Lecture 15 Ana Bove May 23rd 2016 More on Turing machines; Summary of the course. Overview of today s lecture: Recap: PDA, TM Push-down

More information

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010 Compiler Design. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 1, 010 Contents In these slides we will see 1.Introduction, Concepts and Notations.Regular Expressions, Regular

More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

More information

Building Compilers with Phoenix

Building Compilers with Phoenix Building Compilers with Phoenix Syntax-Directed Translation Structure of a Compiler Character Stream Intermediate Representation Lexical Analyzer Machine-Independent Optimizer token stream Intermediate

More information

Cunning Plan. Informal Sketch of Lexical Analysis. Issues in Lexical Analysis. Specifying Lexers

Cunning Plan. Informal Sketch of Lexical Analysis. Issues in Lexical Analysis. Specifying Lexers Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical Analysis Lookahead Ambiguity Specifying Lexers Regular Expressions

More information

CSE302: Compiler Design

CSE302: Compiler Design CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 20, 2007 Outline Recap

More information

Compiler Construction

Compiler Construction Compiler Construction Lecture 2: Lexical Analysis I (Introduction) Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/

More information

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2 Banner number: Name: Midterm Exam CSCI 336: Principles of Programming Languages February 2, 23 Group Group 2 Group 3 Question. Question 2. Question 3. Question.2 Question 2.2 Question 3.2 Question.3 Question

More information

CT32 COMPUTER NETWORKS DEC 2015

CT32 COMPUTER NETWORKS DEC 2015 Q.2 a. Using the principle of mathematical induction, prove that (10 (2n-1) +1) is divisible by 11 for all n N (8) Let P(n): (10 (2n-1) +1) is divisible by 11 For n = 1, the given expression becomes (10

More information

CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages. Lecture 3 CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information

More information

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es Introduction: Definitions Lexical analysis or scanning: To read from left-to-right a source

More information

UVa ID: NAME (print): CS 4501 LDI Midterm 1

UVa ID: NAME (print): CS 4501 LDI Midterm 1 CS 4501 LDI Midterm 1 Write your name and UVa ID on the exam. Pledge the exam before turning it in. There are nine (9) pages in this exam (including this one) and six (6) questions, each with multiple

More information

Compiler Construction

Compiler Construction Compiler Construction Exercises 1 Review of some Topics in Formal Languages 1. (a) Prove that two words x, y commute (i.e., satisfy xy = yx) if and only if there exists a word w such that x = w m, y =

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 2: Syntax Analysis Zheng (Eddy) Zhang Rutgers University January 22, 2018 Announcement First recitation starts this Wednesday Homework 1 will be release

More information

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) Introduction This semester, through a project split into 3 phases, we are going

More information

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS 2010: Compilers Lexical Analysis: Finite State Automata Dr. Licia Capra UCL/CS REVIEW: REGULAR EXPRESSIONS a Character in A Empty string R S Alternation (either R or S) RS Concatenation (R followed by

More information

This book is licensed under a Creative Commons Attribution 3.0 License

This book is licensed under a Creative Commons Attribution 3.0 License 6. Syntax Learning objectives: syntax and semantics syntax diagrams and EBNF describe context-free grammars terminal and nonterminal symbols productions definition of EBNF by itself parse tree grammars

More information

Principles of Programming Languages COMP251: Syntax and Grammars

Principles of Programming Languages COMP251: Syntax and Grammars Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2007

More information

Briefly describe the purpose of the lexical and syntax analysis phases in a compiler.

Briefly describe the purpose of the lexical and syntax analysis phases in a compiler. Name: Midterm Exam PID: This is a closed-book exam; you may not use any tools besides a pen. You have 75 minutes to answer all questions. There are a total of 75 points available. Please write legibly;

More information

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] 1 What is Lexical Analysis? First step of a compiler. Reads/scans/identify the characters in the program and groups

More information

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5 Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important

More information

ECS 120 Lesson 7 Regular Expressions, Pt. 1

ECS 120 Lesson 7 Regular Expressions, Pt. 1 ECS 120 Lesson 7 Regular Expressions, Pt. 1 Oliver Kreylos Friday, April 13th, 2001 1 Outline Thus far, we have been discussing one way to specify a (regular) language: Giving a machine that reads a word

More information

Figure 2.1: Role of Lexical Analyzer

Figure 2.1: Role of Lexical Analyzer Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

More information

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; } Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas

More information

Languages and Compilers

Languages and Compilers Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:

More information

COL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20

COL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20 COL728 Minor1 Exam Compiler Design Sem II, 2016-17 Answer all 5 questions Max. Marks: 20 1. Short questions a. Show that every regular language is also a context-free language [2] We know that every regular

More information

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc. Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will

More information

Front End: Lexical Analysis. The Structure of a Compiler

Front End: Lexical Analysis. The Structure of a Compiler Front End: Lexical Analysis The Structure of a Compiler Constructing a Lexical Analyser By hand: Identify lexemes in input and return tokens Automatically: Lexical-Analyser generator We will learn about

More information

Week 2: Syntax Specification, Grammars

Week 2: Syntax Specification, Grammars CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62 Words and Sentences

More information

Chapter 3. Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:

More information