Week 2: Syntax Specification, Grammars


1 CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62

Words and Sentences

Recall:
  Language = { Sentences }
  Sentence = Sequence of words
  Grammar  = Rules for constructing sentences

Questions:
  1. What information about words does a grammar need?
  2. Do we need rules for specifying words?

Words and Sentences

Let's look at an example.

Grammar:
  Sentence   = NounPhrase VerbPhrase
  NounPhrase = Noun | Article Noun
  VerbPhrase = Verb | Verb PrepPhrase

Sentences:
  (1) Dogs run. Birds fly. Fish swim.
  (2) Fish run. Dogs fly. Birds swim.

Observation: Both groups of sentences are syntactically valid. Grammars do not deal with individual words; they deal with categories of words.
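The observation can be illustrated with a small sketch. It assumes a hypothetical mini-lexicon (LEXICON) and checks only the Noun | Article Noun and plain Verb alternatives of the grammar; the function names are illustrative, not from the slides. The checker never looks at the words themselves, only at their categories, which is why "Fish run." passes just as "Dogs run." does.

```python
# Hypothetical mini-lexicon: maps each word to its category (part of speech).
LEXICON = {
    "dogs": "Noun", "birds": "Noun", "fish": "Noun",
    "run": "Verb", "fly": "Verb", "swim": "Verb",
    "the": "Article",
}

def categories(sentence):
    """Replace each word by its category; the grammar sees only these."""
    words = sentence.rstrip(".").lower().split()
    return [LEXICON[w] for w in words]

def is_sentence(sentence):
    """Check the shape NounPhrase VerbPhrase, where
    NounPhrase = Noun | Article Noun and VerbPhrase = Verb.
    (The PrepPhrase alternative is left out of this sketch.)"""
    cats = categories(sentence)
    return cats in (["Noun", "Verb"], ["Article", "Noun", "Verb"])
```

Both "Dogs run." and "Fish run." are accepted, while "Run dogs." is rejected because its category sequence is Verb Noun.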

Words and Sentences

For English, categories of words = parts of speech, i.e. nouns, pronouns, verbs, adjectives, adverbs, conjunctions, prepositions, and interjections.

For PLs, categories of words = token types, e.g. keywords, identifiers, literals, operators, etc.

Tokens

A program is built from tokens.

    /* A toy C program */
    int main(void) {
      int a, b, s;
      printf("enter two integers: ");
      scanf("%d %d", &a, &b);
      s = a + 2 * b;
      printf("%d + 2*%d = %d\n", a, b, s);
    }

Token = Smallest program entity that is meaningful to the language

Common Token Types

Keywords: int, void, printf, scanf, ...
  - A finite set of language-defined names. (In standard C, printf and scanf are library function names, i.e. identifiers, rather than keywords; they are grouped here as predefined names.)
  - Each keyword is typically given a token type of its own, since keywords are not interchangeable in a grammar.

Identifiers: main, a, b, s, ...
  - A potentially infinite set of user-defined names.
  - All IDs form a single token type.

Operators and Delimiters: (, ), {, }, ,, ;, =, +, *, ...
  - Each operator/delimiter is typically a separate token type.

Literals:
  Strings:  "enter two integers: ", "%d %d", ...
  Integers: 2, 365, ...
  Doubles:  0.5, 3.14, ...
  - Each literal group forms a single token type.
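As a rough illustration of these token types, the following sketch splits a fragment of the toy program into (type, lexeme) pairs using Python's re module. The type names (INT, VOID, ID, INTLIT, STRLIT, OP, DELIM) are assumptions made for this example, and operators and delimiters are lumped into two types for brevity, whereas a real lexer would usually give each its own type.

```python
import re

# Hypothetical token types; keywords are listed before ID so they win.
TOKEN_SPEC = [
    ("INT",    r"int\b"),
    ("VOID",   r"void\b"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INTLIT", r"[0-9]+"),
    ("STRLIT", r'"[^"\n]*"'),
    ("OP",     r"[-+*/=&]"),
    ("DELIM",  r"[(){},;]"),
    ("SKIP",   r"\s+"),          # whitespace: matched, then dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token type, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For the statement s = a + 2 * b; this yields three ID tokens, one INTLIT token, three operator tokens, and one delimiter token.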

Lexemes

Lexeme = A particular, meaningful sequence of characters from a program

Each token is associated with a lexeme.

A token type may correspond to a single lexeme:
  - This is the case for individual keywords, operators, and delimiters.
  - In these cases, the token type is often named the same as the lexeme.

Or it may correspond to a collection of lexemes:
  - This is the case for IDs and literals.
  - In these cases, the lexemes typically share a common characteristic, e.g. integer literals are all sequences of digits.

Non-Token Lexemes

Whitespace: typically has no significance in the language other than to separate tokens (space, tab, newline, ...).
Comments: have no effect on a program's semantics; they take various forms depending on the language.
Illegal characters: characters that do not belong to the language's alphabet, e.g. # ^ $
Ill-formed tokens: not all sequences of legal characters form valid tokens, e.g. 0789, 0xAG, "abc<eof>

Whitespace and comments are filtered out during lexical analysis, while illegal characters and ill-formed tokens are lexical errors, which should be caught and reported.

Common Comment Types

Single line:
  //   C, C++, Java
  --   Haskell
  ;    Lisp, Scheme
  C    Fortran
  #    csh, bash, make

Non-nesting block:
  /* C, C++, Java */

Nesting block:
  (* Pascal, ML (* allow nesting *) *)
  {- Haskell {- as well -} -}

Token Representation

A token is typically represented as an object with attributes:
  Type:     the key information used by a grammar.
  Lexeme:   distinguishes individual members of a group.
  Value:    for numerical tokens. It is common for a compiler to convert a numerical token's lexeme to a value and store it in the token object (in addition to or instead of the lexeme).
  Position: line and column numbers (for diagnostic use).
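One possible shape for such a token object is sketched below as a Python dataclass. The class name Token, its field names, and the helper make_intlit are assumptions chosen for illustration; the slides do not prescribe a layout.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Token:
    type: str                     # token type, e.g. "ID" or "INTLIT"
    lexeme: str                   # the exact characters from the source
    line: int                     # 1-based line number, for diagnostics
    column: int                   # 1-based column number, for diagnostics
    value: Optional[int] = None   # numeric value, for numerical tokens only

def make_intlit(lexeme, line, column):
    """Build an INTLIT token, converting the decimal lexeme to a value."""
    return Token("INTLIT", lexeme, line, column, int(lexeme))
```

Here the value is stored in addition to the lexeme; a compiler that only needs the number could drop the lexeme field for numerical tokens.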

Rules for Specifying Words?

For English, the vocabulary (about 172,000 words) is fixed:
  - There is no need to create new words.
  - Given a word, it is not hard to find the part(s) of speech it belongs to: just look it up in a dictionary.

For PLs, the vocabulary is not fixed:
  - The programmer needs to know how to create new tokens.
  - The compiler needs to know how to recognize tokens (and non-tokens) without a dictionary; the information must come from the lexemes themselves.

We need a mechanism to precisely define tokens.

Patterns

Pattern = A description of lexical features for a set of character sequences

Examples:
  Sequence of digits:                  1234, 503, ...
  Sequence within quotes:              "1234", "abc", ...
  Sequence starting with the letter a: a1234, abc, ...

For a PL, lexemes belonging to the same token type typically share a common pattern.

The preferred tool for precisely specifying patterns is the regular expression.

Review: Regular Expressions

An RE defines strings over a finite alphabet with three operations:
  |                 alternate
  (juxtaposition)   concatenate (the operator can be omitted)
  *                 repeat 0 or more times

Additional operations are available:
  +   repeat 1 or more times
  ~   complement w.r.t. an alphabet (or an expression)
  -   difference
  ... (and more) ...

Examples: abc, ab|c*, (a|b)(c|d), ab[~ab]*, a*b*c*-aaa

Complement of an expression is typically not available in RE tools due to its complexity.

Specifying Token Patterns with RE

Tokens with a single lexeme, such as keywords, operators, and delimiters, are easy:

    Int   = "int"
    Void  = "void"
    PLUS  = "+"
    MINUS = "-"
    ...

The ID pattern has one small issue: an ID cannot have the same spelling as a keyword. We can use the difference operation to solve this problem:

    Digit  = [0-9]
    Letter = [A-Za-z]
    KWD    = Int | Void | ...
    ID     = Letter (Letter | Digit | "_")* - KWD

Compilers typically use a different approach.
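The slide does not say what the different approach is. One common technique, assumed here, is to scan every name with the plain ID pattern (no difference operator) and then look the lexeme up in a keyword table. The table contents and the function name classify_name are hypothetical.

```python
import re

# Hypothetical keyword table: lexeme -> keyword token type.
KEYWORDS = {"int": "Int", "void": "Void"}

# Plain ID pattern: Letter (Letter | Digit | "_")*, without "- KWD".
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def classify_name(lexeme):
    """Return the keyword's token type if the lexeme is a keyword, else 'ID'."""
    if ID_PATTERN.fullmatch(lexeme) is None:
        raise ValueError(f"{lexeme!r} does not match the ID pattern")
    return KEYWORDS.get(lexeme, "ID")
```

The lookup achieves the effect of ID = ... - KWD without needing a difference operator in the RE tool, and adding a keyword only means adding a table entry.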

Specifying Token Patterns with RE

String literals are straightforward:

    StrLit = "\"" (~["\"","\n"])+ "\""

Numerical literals can be a little tricky:
  - For integers, there can be multiple forms: decimal, octal, and hex.
  - For floating-point numbers, either the integral part or the fractional part can be empty, e.g. 123. and .123, yet they can't both be empty at the same time.

    OctLit  = "0" ([0-7])+
    HexLit  = "0x" ([0-9a-f])+
    DecLit  = Digit+ - OctLit
    IntLit  = DecLit | OctLit | HexLit
    RealLit = Digit+ "." Digit* | Digit* "." Digit+
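The numerical patterns can be tried out with Python's re module, as in the sketch below. The difference in DecLit = Digit+ - OctLit is emulated by testing the octal pattern first; the constant names and classify_number are assumptions made for this example.

```python
import re

OCT_LIT  = re.compile(r"0[0-7]+")                       # "0" ([0-7])+
HEX_LIT  = re.compile(r"0x[0-9a-f]+")                   # "0x" ([0-9a-f])+
DIGITS   = re.compile(r"[0-9]+")                        # Digit+
REAL_LIT = re.compile(r"[0-9]+\.[0-9]*|[0-9]*\.[0-9]+")  # two alternatives

def classify_number(lexeme):
    """Return the literal kind of a complete lexeme, or None if none fits."""
    if OCT_LIT.fullmatch(lexeme):
        return "OctLit"
    if HEX_LIT.fullmatch(lexeme):
        return "HexLit"
    if DIGITS.fullmatch(lexeme):
        return "DecLit"      # reached only when OctLit did not match
    if REAL_LIT.fullmatch(lexeme):
        return "RealLit"
    return None
```

The two RealLit alternatives each require at least one digit on one side of the point, so a lone "." matches neither.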

Is RE Perfect for Token Specification?

Consider C's non-nesting block comments:

    A block comment begins with the characters /* and ends with the first subsequent occurrence of the characters */.

The following is a direct encoding of the definition in RE:

    BlockComment = "/*" (~("*/" | <EOF>))* "*/"

It uses the extended complement operator. While theoretically correct, most RE processing systems don't allow it. (Why?) It is harder to define BlockComment without using this operator.

Now consider the pattern of ML's nesting block comments:

    Comments in Standard ML begin with (* and end with *). Comments can be nested, which means that all (* tags must end with a *) tag.

Question: Can you write a regular expression for it?
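Matching nested comments requires keeping track of how many (* are still open, and a finite automaton has no way to count to an unbounded depth. Lexers therefore usually handle this case with a little hand-written code. The sketch below is one such scanner; the function name skip_nested_comment and its interface are assumptions for illustration.

```python
def skip_nested_comment(text, start):
    """Return the index just past the ML-style comment that opens at `start`.

    Raises ValueError if no comment opens there, or if the input ends
    before every (* has been closed by a matching *).
    """
    if not text.startswith("(*", start):
        raise ValueError("no comment opens at this position")
    depth = 0          # number of currently open comments
    i = start
    while i < len(text):
        if text.startswith("(*", i):
            depth += 1
            i += 2
        elif text.startswith("*)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")
```

The depth counter is exactly the piece of state that a regular expression cannot express.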

Review: Finite Automata

For every regular set L, there exists a finite automaton that accepts L.

Example: aa*|bb*
  [Figure: start state 0 with ϵ-transitions to states 1 and 3; state 1 goes to state 2 on a, and 2 loops on a; state 3 goes to state 4 on b, and 4 loops on b]

Circles represent states; double circles represent final states. Labeled edges represent transitions.

A finite automaton accepts a string x if there is a path from the start state to a final state labeled by the characters of x. A finite automaton accepts a language L if it accepts exactly the strings of L.

NFA vs. DFA

RE: aa*|bb*

NFA:
  [Figure: start state 0 with ϵ-transitions to states 1 and 3; state 1 goes to state 2 on a, and 2 loops on a; state 3 goes to state 4 on b, and 4 loops on b]
  - Transitions may be labeled by ϵ.
  - Multiple transitions from the same state may be labeled by the same symbol.

DFA:
  [Figure: start state 0 goes to state 1 on a and to state 2 on b; state 1 loops on a, state 2 loops on b]
  - No ϵ-transitions.
  - Each state has at most one transition labeled by the same symbol.
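To make the difference concrete, here is a minimal sketch that simulates an NFA for aa*|bb* by tracking the set of states it could be in. The state numbering and the choice of final states (2 and 4) are assumptions read off the figure, and the empty string is used as the label for ϵ-transitions.

```python
EPS = ""   # label used for epsilon-transitions in this sketch

# Assumed NFA: 0 -eps-> 1 -a-> 2 (loop on a), 0 -eps-> 3 -b-> 4 (loop on b).
NFA = {
    (0, EPS): {1, 3},
    (1, "a"): {2}, (2, "a"): {2},
    (3, "b"): {4}, (4, "b"): {4},
}
START, FINALS = 0, {2, 4}

def eps_closure(states):
    """All states reachable from `states` using only epsilon-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(text):
    """Follow every possible path at once by tracking a set of states."""
    current = eps_closure({START})
    for ch in text:
        moved = set()
        for s in current:
            moved |= NFA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & FINALS)
```

A DFA needs no such set: it is in exactly one state at a time, which is why it is the form normally used in the final lexer.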

Transition Graph vs. Table Form

RE: (a|b)*abb

[Figure: an NFA and a DFA for (a|b)*abb, each shown both as a transition graph and as a transition table indexed by state and input symbol]

Using FA to Recognize Tokens

Assume tokens are defined by RE. Steps:
  1. Construct an NFA based on the token REs.
  2. Convert the NFA to a DFA.
  3. (Optional) Minimize the DFA's states.
  4. Implement the DFA as a lexer program.

We'll use (a|b)*abb as an example.

Step 1. RE → NFA

[Figure: NFAs built with ϵ-transitions for a|b, for (a|b)*, and for abb, then combined into a single NFA for (a|b)*abb]

Step 2. NFA → DFA

[Figure: the NFA for (a|b)*abb from Step 1, with the DFA built up state by state underneath]

  - Find the ϵ-closure for the start state 0. Each state in the DFA corresponds to a set of states in the NFA.
  - Find the target states corresponding to symbols a and b. For each symbol, the DFA's transition simulates all possible transitions in the NFA.
  - Repeat the process for the new states.
  - Finally, when no new states appear, the result is the final DFA.

[Figure: the final DFA, with states 0 to 4, shown as a transition graph and as a transition table]
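The procedure above is commonly called the subset construction. The sketch below applies it to an NFA for (a|b)*abb; the 11-state NFA and its numbering follow the usual textbook layout and are an assumption, since the slide's own numbering is not recoverable here. DFA states are represented directly as frozensets of NFA states.

```python
from collections import deque

EPS = ""   # label used for epsilon-transitions in this sketch

# Assumed NFA for (a|b)*abb (textbook-style numbering, final state 10).
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}
NFA_START, NFA_FINAL, ALPHABET = 0, 10, "ab"

def eps_closure(states):
    """All NFA states reachable via epsilon-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction():
    """Return (start, table, finals); each DFA state is a set of NFA states."""
    start = eps_closure({NFA_START})
    table, queue = {}, deque([start])
    while queue:
        state = queue.popleft()
        if state in table:
            continue                      # already processed
        table[state] = {}
        for ch in ALPHABET:
            moved = set()
            for s in state:
                moved |= NFA.get((s, ch), set())
            if moved:
                target = eps_closure(moved)
                table[state][ch] = target
                queue.append(target)      # may be a new DFA state
    finals = {state for state in table if NFA_FINAL in state}
    return start, table, finals

def dfa_accepts(text):
    """Run the constructed DFA on the input text."""
    start, table, finals = subset_construction()
    state = start
    for ch in text:
        state = table[state].get(ch)
        if state is None:
            return False
    return state in finals
```

For this NFA the construction stops after discovering five DFA states, one of which is final.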

Step 4. DFA → Lexer Program

[Figure: the DFA for (a|b)*abb, as a transition graph and as a transition table]

A DFA implementing token REs can easily be turned into a program that answers yes or no on whether a given input sequence represents a token: traverse the DFA with the input and see if it ends in a final state.

Examples:
  abb: yes, since it reaches a final state.
  baa: no, since it does not reach a final state.
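A table-driven version of this yes/no program might look like the sketch below. The transition table is the standard five-state DFA for (a|b)*abb; the state numbering (0 as start, 4 as the only final state) is an assumption chosen to be consistent with the figure.

```python
# Assumed transition table for the DFA of (a|b)*abb.
DFA = {
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 3},
    2: {"a": 1, "b": 2},
    3: {"a": 1, "b": 4},
    4: {"a": 1, "b": 2},
}
START, FINALS = 0, {4}

def accepts(text):
    """Traverse the DFA with the input; accept if it ends in a final state."""
    state = START
    for ch in text:
        if ch not in DFA[state]:
            return False          # no transition: input is not a token
        state = DFA[state][ch]
    return state in FINALS
```

With this table, abb ends in state 4 and is accepted, while baa ends in state 1 and is rejected.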

Step 4. DFA → Lexer Program (cont.)

However, if the input sequence may represent multiple tokens, there is a problem: when should the program report yes?
  - Should it report as soon as the transitions reach a final state?
  - Or should it keep reading more input before reporting?

Examples:
  abbabb: one or two tokens?
  abbab: one token plus some garbage, or a single piece of garbage?

Longest Match/Maximal Munch Rule

A widely used convention for token recognition:

  - If multiple lexemes can be formed from the same input point, the longest one is chosen.
      i n t ...: one ID token
  - Whitespace (including tab, newline, comment, etc.) automatically terminates a lexeme.
      i n t ...: one ID token followed by one INTLIT token
  - (Priority/First-Match Rule) If the longest lexeme can be matched by multiple token types, the token type whose definition appears first in the token specification file has priority.
      begin: keyword or ID?
  - (Corollary) In a token definition file, always define keywords before ID.
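The two rules can be combined in a few lines, as in the sketch below: at each input point every pattern is tried, the longest match wins, and a tie goes to the pattern listed first. The token names (BEGIN, INT, ID, INTLIT) and the functions next_token and tokenize are assumptions made for this example.

```python
import re

# Listing order matters only for ties: keywords come before ID.
TOKEN_SPEC = [
    ("BEGIN",  re.compile(r"begin")),
    ("INT",    re.compile(r"int")),
    ("ID",     re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
    ("INTLIT", re.compile(r"[0-9]+")),
]
WHITESPACE = re.compile(r"\s+")

def next_token(text, pos):
    """Return (type, lexeme, end) for the longest token starting at pos."""
    best_type, best_end = None, pos
    for tok_type, pattern in TOKEN_SPEC:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier definition on a tie (first-match rule).
        if m and m.end() > best_end:
            best_type, best_end = tok_type, m.end()
    if best_type is None:
        raise ValueError(f"lexical error at offset {pos}")
    return best_type, text[pos:best_end], best_end

def tokenize(text):
    """Split text into (type, lexeme) pairs; whitespace ends a lexeme."""
    tokens, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        tok_type, lexeme, pos = next_token(text, pos)
        tokens.append((tok_type, lexeme))
    return tokens
```

With this specification, begin is reported as a keyword because BEGIN and ID tie on length and BEGIN is defined first, while beginning is an ID because the ID match is longer.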

94 Longest Match/Maximal Munch Rule: Exercises
Show the token sequence for each of the following Java inputs. Each input consists of x and y separated by + characters, in some cases with white space between them.
1. ID(x) "++" "+" ID(y)
2. ID(x) "++" "++" ID(y)
3. ID(x) "++" "++" "+" ID(y)
4. ID(x) "++" "++" "+" ID(y)
5. ID(x) "++" "+" "++" ID(y)
6. ID(x) "+" "+" "+" "++" ID(y)
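A small sketch shows how maximal munch produces token sequences of this kind. The function `munch` and the sample inputs are assumptions made for illustration; in particular, the placement of white space in the inputs is a guess that happens to reproduce the listed answers, not the original exercise text.

```python
OPERATORS = ["++", "+"]   # the only operators this sketch knows

def munch(text):
    """Tokenize identifiers and +/++ operators with the longest-match rule."""
    tokens, pos = [], 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():                      # white space ends the current lexeme
            pos += 1
        elif ch.isalpha():
            end = pos
            while end < len(text) and text[end].isalnum():
                end += 1
            tokens.append(f"ID({text[pos:end]})")
            pos = end
        else:
            # Try the operators longest first, so "++" is preferred over "+".
            for op in sorted(OPERATORS, key=len, reverse=True):
                if text.startswith(op, pos):
                    tokens.append(f'"{op}"')
                    pos += len(op)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r}")
    return tokens

for source in ["x+++y", "x++++y", "x+++++y", "x+++ ++y", "x+ + + ++y"]:
    print(source, "->", " ".join(munch(source)))
```

Note that `x+++y` is split as `x ++ + y`, never as `x + ++ y`: the lexer commits to the longest lexeme without regard to what the parser would prefer.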

98 Dealing with Difficult Patterns
An alternative way of finding a regular expression for a pattern is to find a finite automaton first, then convert the FA to an RE.
Example: Find a regular expression for the set of sequences of digits whose values are multiples of 3. E.g., 3 and 060 are in the set, while 11, 22, and 2345 are not.
Analysis: On the surface, the set seems to be defined through a semantic property, i.e., the value of the sequence. However, mathematical knowledge tells us that this particular semantic property can be mapped to an equivalent syntactic property:
A sequence of digits whose value is a multiple of 3 ⇔ the sum of the sequence's digits is a multiple of 3
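The equivalence can be justified in one line. The symbols below are introduced only for this sketch: $n$ is the value of the sequence and $d_k \dots d_1 d_0$ are its digits.

```latex
n = \sum_{i=0}^{k} d_i \, 10^{i},
\qquad
10 \equiv 1 \pmod{3}
\;\Longrightarrow\;
10^{i} \equiv 1 \pmod{3}
\;\Longrightarrow\;
n \equiv \sum_{i=0}^{k} d_i \pmod{3}.
```

So $n$ and its digit sum always leave the same remainder when divided by 3, and only the remainder of each digit (0, 1, or 2) matters.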

100 Dealing with Difficult Patterns (cont.)
Even with the new insight, it is still not easy to come up with a regular expression. (It's a good exercise to try.)
Now, let's think from a finite automaton's point of view. With respect to the property of a sequence's value being a multiple of 3, an arbitrary sequence of digits can only be in one of three states:
S1. Its value is a multiple of 3 (so is the sum of its digits)
S2. Its value is a multiple of 3 plus 1
S3. Its value is a multiple of 3 plus 2

103 Dealing with Difficult Patterns (cont.)
With this information, we can define the following DFA for recognizing the set:
[Figure: DFA with a start state and states 1, 2, 3; transitions are labeled with the digit classes 0,3,6,9 / 1,4,7 / 2,5,8.]
If the goal is to implement a lexer, we can go directly from here.
If the goal is to find an RE, we can follow a conversion algorithm.
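Going "directly from here" to code might look like the following table-driven sketch. It is an assumption-laden illustration rather than a transcription of the slide's diagram: the states are the remainders 0, 1, 2 plus a separate `"start"` state (so the empty string is rejected), and the start state is assumed to have a transition for every digit class.

```python
# Map every digit to the remainder class it belongs to:
# 0,3,6,9 -> 0    1,4,7 -> 1    2,5,8 -> 2
CLASSES = {"0369": 0, "147": 1, "258": 2}
STEP = {digit: k for digits, k in CLASSES.items() for digit in digits}

def is_multiple_of_3(text):
    """Run the remainder-tracking DFA over a string of digits."""
    state = "start"
    for ch in text:
        if ch not in STEP:               # not a decimal digit: reject
            return False
        previous = 0 if state == "start" else state
        state = (previous + STEP[ch]) % 3
    return state == 0                    # accepting state: remainder 0

for sample in ["3", "060", "11", "22", "2345", ""]:
    print(repr(sample), is_multiple_of_3(sample))
```

The function never computes the numeric value of the input; it only tracks which of the three states the prefix read so far is in.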

107 Converting a DFA to an RE
Method: The state removal method.
Step 1. (Preparation) If the start state has an incoming edge from another state, create a new start state with an ϵ transition connecting to the original start state. If there are multiple final states, or a final state with an outgoing edge, create a new final state with ϵ transitions connecting from all the original final states.
Step 2. (Elimination) Pick a state. For every pair of incoming and outgoing edges (excluding self-loops), consolidate the labels on the path into a single label, and create a direct edge with that label. Eliminate the state. Repeat this step until done.
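The two steps can be sketched as a short Python function. Everything named here is hypothetical: `dfa_to_regex`, the state names `"q0"`, `"r0"`, `"r1"`, `"r2"`, and the reserved names `"S"` and `"F"` for the new start and final states. As a simplification, the sketch always adds a fresh start and final state instead of checking whether Step 1 requires them, and it emits Python `re` syntax, so alternation is written `|`.

```python
import re
from itertools import product

def union(a, b):
    """Alternation of two edge labels; None stands for 'no edge'."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def star(a):
    """Kleene star; a missing self-loop contributes the empty string."""
    return "" if a is None else f"(?:{a})*"

def dfa_to_regex(states, start, finals, edges):
    """edges maps (source, target) to a regex label; returns one regex."""
    edges = dict(edges)
    # Step 1 (preparation): fresh start "S" and final "F" with epsilon edges.
    edges[("S", start)] = ""
    for f in finals:
        edges[(f, "F")] = ""
    # Step 2 (elimination): remove the original states one at a time.
    for q in states:
        loop = star(edges.get((q, q)))
        incoming = [(p, r) for (p, d), r in edges.items() if d == q and p != q]
        outgoing = [(d, r) for (s, d), r in edges.items() if s == q and d != q]
        for (p, r_in), (d, r_out) in product(incoming, outgoing):
            # Path p -> q -> d becomes one direct edge p -> d.
            edges[(p, d)] = union(edges.get((p, d)), r_in + loop + r_out)
        edges = {k: r for k, r in edges.items() if q not in k}
    return edges.get(("S", "F"))

# Remainder-tracking DFA for multiples of 3 (an assumed encoding).
CLASS = {0: "[0369]", 1: "[147]", 2: "[258]"}
edges = {}
for k, label in CLASS.items():
    edges[("q0", f"r{k}")] = label
    for r in range(3):
        edges[(f"r{r}", f"r{(r + k) % 3}")] = label
regex = dfa_to_regex(["q0", "r0", "r1", "r2"], "q0", {"r0"}, edges)
print(len(regex), bool(re.fullmatch(regex, "060")), bool(re.fullmatch(regex, "2345")))
```

The resulting expression depends on the elimination order and is usually far from minimal, which is typical of this method.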

109 Converting a DFA to an RE
1. Preparation.
$L_0$ = 0,3,6,9    $L_1$ = 1,4,7    $L_2$ = 2,5,8
[Figure: the DFA with its edges relabeled $L_0$, $L_1$, $L_2$, shown before and after the preparation step, which adds an ϵ transition.]

111 Converting a DFA to an RE
2. Eliminating State 2.
[Figure: the automaton before and after removing state 2.]
Edge labels after the removal:
self-loop on state 1: $L_0 + L_1 L_0^* L_2$
self-loop on state 3: $L_0 + L_2 L_0^* L_1$
state 1 to state 3: $L_2 + L_1 L_0^* L_1$
state 3 to state 1: $L_1 + L_2 L_0^* L_2$

113 Converting a DFA to an RE
3. Eliminating State 3.
[Figure: the automaton before and after removing state 3.]
Self-loop on state 1 after the removal:
$L_0 + L_1 L_0^* L_2 + (L_2 + L_1 L_0^* L_1)(L_0 + L_2 L_0^* L_1)^*(L_1 + L_2 L_0^* L_2)$

115 Converting a DFA to an RE
4. Eliminating State 1.
[Figure: the automaton before and after removing state 1; a single edge from the start state remains.]
Resulting regular expression:
$L_0 \,(L_0 + L_1 L_0^* L_2 + (L_2 + L_1 L_0^* L_1)(L_0 + L_2 L_0^* L_1)^*(L_1 + L_2 L_0^* L_2))^*$
where $L_0$ = 0,3,6,9    $L_1$ = 1,4,7    $L_2$ = 2,5,8
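The result can be checked mechanically. The sketch below writes the expression in Python `re` syntax (character classes for the $L_i$, `|` for the slide's `+`); the names `X`, `AS_TRANSCRIBED`, and `ONE_OR_MORE` are assumptions made for this illustration. Because the expression as transcribed begins with a single $L_0$, it only matches strings whose first digit is 0, 3, 6, or 9 (30 but not 12). Repeating the parenthesized part one or more times is a variant that covers every multiple of 3.

```python
import re

L0, L1, L2 = "[0369]", "[147]", "[258]"

# The parenthesized loop body of the expression.
X = (f"(?:{L0}|{L1}{L0}*{L2}"
     f"|(?:{L2}|{L1}{L0}*{L1})(?:{L0}|{L2}{L0}*{L1})*(?:{L1}|{L2}{L0}*{L2}))")

AS_TRANSCRIBED = re.compile(f"{L0}{X}*")   # L0 followed by zero or more X
ONE_OR_MORE = re.compile(f"{X}+")          # variant: one or more X

for sample in ["3", "060", "30", "12", "11", "2345"]:
    print(sample,
          bool(AS_TRANSCRIBED.fullmatch(sample)),
          bool(ONE_OR_MORE.fullmatch(sample)))
```

Both patterns reject 11, 22, and 2345; they differ only on multiples of 3 whose first digit is not in $L_0$.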

116 Context-Free Grammars (BNF Notation)
BNF = Backus-Naur Form
Common convention: Nonterminals begin with upper-case letters. Literal terminals are quoted. Other terminals begin with lower-case letters.
Program → "begin" StmtList "end"
StmtList → Statement StmtList
StmtList → Statement
Statement → Assignment
Statement → ReadStmt
Statement → WriteStmt
Assignment → id ":=" Expression ";"
ReadStmt → "read" "(" IdList ")" ";"
WriteStmt → "write" "(" ExprList ")" ";"
IdList → id "," IdList
IdList → id
ExprList → Expression "," ExprList
ExprList → Expression
Expression → Primary "+" Expression
Expression → Primary "-" Expression
Expression → Primary
Primary → "(" Expression ")"
Primary → id
Primary → integer

117 Context-Free Grammars (EBNF Notation)
EBNF = Extended BNF
Common convention: | for selection, ( ) for grouping, [ ] for an optional construct, { } for zero or more repetitions
Program → "begin" StmtList "end"
StmtList → Statement {Statement}
Statement → Assignment | ReadStmt | WriteStmt
Assignment → id ":=" Expression ";"
ReadStmt → "read" "(" IdList ")" ";"
WriteStmt → "write" "(" ExprList ")" ";"
IdList → id {"," id}
ExprList → Expression {"," Expression}
Expression → Primary {("+" | "-") Primary}
Primary → "(" Expression ")" | id | integer

118 Equivalence Between EBNF and BNF
Extended BNF does not have extra power over BNF. In fact, any grammar in EBNF can be transformed into one in proper BNF:
A → α [β] γ   becomes   A → α B γ,   B → β,   B → ϵ
A → α {β} γ   becomes   A → α B γ,   B → β B,   B → ϵ
where B is a new nonterminal.
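The two rewriting rules are mechanical enough to code directly. The following is a minimal sketch under assumptions of its own: a right-hand side is a Python list of symbols, an optional part is the tuple `("opt", items)`, a repetition is `("rep", items)`, and the fresh nonterminals are named `B1`, `B2`, ... (assumed not to clash with existing names). Selection and grouping are not handled.

```python
from itertools import count

def ebnf_to_bnf(rules):
    """Rewrite [ ] and { } constructs into plain productions.

    rules: list of (lhs, rhs) pairs; returns a list of (lhs, rhs) pairs
    in which every rhs is a flat list of symbols ([] means epsilon).
    """
    fresh = count(1)
    out = []

    def lower(items):
        result = []
        for item in items:
            if isinstance(item, str):
                result.append(item)
                continue
            kind, body = item
            b = f"B{next(fresh)}"            # new nonterminal B
            inner = lower(body)              # handle nested constructs first
            if kind == "opt":
                out.append((b, inner))       # B -> beta
            else:
                out.append((b, inner + [b])) # B -> beta B
            out.append((b, []))              # B -> epsilon
            result.append(b)
        return result

    for lhs, rhs in rules:
        out.append((lhs, lower(rhs)))
    return out

# IdList -> id {"," id}
for lhs, rhs in ebnf_to_bnf([("IdList", ["id", ("rep", [",", "id"])])]):
    print(lhs, "->", " ".join(rhs) or "ϵ")
```

For the `IdList` rule this yields `B1 -> , id B1`, `B1 -> ϵ`, and `IdList -> id B1`, which matches the repetition rule with α = id, β = `, id`, and γ empty.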

119 RE vs. CFG
Every set that can be described by a regular expression can also be described by a context-free grammar.
Example: The RE (a|b)*abb can be described by the CFG
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ϵ
[Figure: the corresponding finite automaton.]
Regular grammars are just CFGs with their productions limited to the following two forms: A → aB and C → c
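One way to convince yourself that the grammar and the RE describe the same set is to test them against each other on all short strings. In this sketch the dictionary `GRAMMAR` and the function `derives` are hypothetical names; each production is stored as a (terminal, next nonterminal) pair, with `("", None)` standing for an ϵ production.

```python
import re
from itertools import product

GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [("", None)],                   # A3 -> epsilon
}

def derives(text, start="A0"):
    """True if the grammar can derive text, tracking all possible nonterminals."""
    current = {start}
    for ch in text:
        current = {nxt
                   for nt in current
                   for terminal, nxt in GRAMMAR[nt]
                   if terminal == ch}
    # The derivation can finish only from a nonterminal with an epsilon production.
    return any(("", None) in GRAMMAR[nt] for nt in current)

mismatches = [
    "".join(chars)
    for n in range(9)
    for chars in product("ab", repeat=n)
    if derives("".join(chars)) != bool(re.fullmatch("(a|b)*abb", "".join(chars)))
]
print(mismatches)   # an empty list means the two agree on all strings up to length 8
```

Tracking a set of nonterminals is the same idea as simulating a nondeterministic finite automaton whose states are the grammar's nonterminals.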

131 Derivations
A derivation step replaces an occurrence of the lhs of a production by its rhs.
Example:
1. E → E+T
2. E → T
3. T → T*P
4. T → P
5. P → id
E ⇒₁ E+T ⇒₂ T+T ⇒₃ T*P+T ⇒₄ P*P+T ⇒₅ id*P+T ⇒₅ id*id+T ⇒₄ id*id+P ⇒₅ id*id+id
The set of sentences derived from the start symbol S of a grammar G defines the language of the grammar, denoted L(G):
$L(G) = \{\, x \in V_t^* \mid S \Rightarrow^+ x \,\}$
Note: Derivations are defined with respect to proper productions. They don't work with grammars in EBNF form. While the rules in an EBNF grammar are of the form lhs → rhs, they are not proper productions.
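The example derivation always rewrites the leftmost nonterminal, so it can be replayed from the list of production numbers alone. The sketch below assumes that convention; `PRODUCTIONS`, `NONTERMINALS`, and `leftmost_derivation` are names chosen for this illustration.

```python
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "P"]),
    4: ("T", ["P"]),
    5: ("P", ["id"]),
}
NONTERMINALS = {"E", "T", "P"}

def leftmost_derivation(start, steps):
    """Apply the numbered productions, each to the leftmost nonterminal."""
    form = [start]
    history = [" ".join(form)]
    for number in steps:
        lhs, rhs = PRODUCTIONS[number]
        # Position of the leftmost nonterminal in the sentential form.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"production {number} does not apply to {form[i]}")
        form[i:i + 1] = rhs                 # one derivation step
        history.append(" ".join(form))
    return history

history = leftmost_derivation("E", [1, 2, 3, 4, 5, 5, 4, 5])
print(" => ".join(history))
```

Each entry of `history` is one sentential form; the last one contains only terminals and is therefore a sentence of L(G).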

133 Parse Tree
A.k.a. concrete syntax tree
A tree representation of complete derivations, i.e., from the start symbol to a sentence.
1. E → E+T
2. E → T
3. T → T*P
4. T → P
5. P → id
E ⇒₁ E+T ⇒₂ T+T ⇒₃ T*P+T ⇒₄ P*P+T ⇒₅ id*P+T ⇒₅ id*id+T ⇒₄ id*id+P ⇒₅ id*id+id
Parse tree (children are indented under their parent):
E
  E
    T
      T
        P
          id1
      *
      P
        id2
  +
  T
    P
      id3
root node: start symbol
leaf nodes: terminals
non-leaf nodes: nonterminals
non-leaf nodes with children: productions
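The correspondence between tree parts and grammar parts can be made concrete with a small data structure. In this sketch a leaf is a string and an interior node is a `(nonterminal, children)` pair; the representation and the helper names are assumptions for illustration only.

```python
GRAMMAR = {
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "P")), ("T", ("P",)),
    ("P", ("id",)),
}

# Parse tree for id * id + id.
TREE = ("E", [
    ("E", [("T", [("T", [("P", ["id"])]), "*", ("P", ["id"])])]),
    "+",
    ("T", [("P", ["id"])]),
])

def symbol(node):
    """Grammar symbol at a node: the string itself or the node's nonterminal."""
    return node if isinstance(node, str) else node[0]

def frontier(node):
    """Leaves from left to right: the derived sentence."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1] for leaf in frontier(child)]

def productions_used(node):
    """One production per non-leaf node: its symbol and its children's symbols."""
    if isinstance(node, str):
        return []
    lhs, children = node
    used = [(lhs, tuple(symbol(c) for c in children))]
    for child in children:
        used.extend(productions_used(child))
    return used

print(" ".join(frontier(TREE)))
print(all(p in GRAMMAR for p in productions_used(TREE)))
```

The tree has eight non-leaf nodes, one for each step of the eight-step derivation.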

137 Multiple Derivations
The same sentence often can be derived by different sequences of productions.
Example:
1. E → E+T
2. E → T
3. T → T*P
4. T → P
5. P → id
Three different derivations:
1. E ⇒₁ E+T ⇒₂ T+T ⇒₃ T*P+T ⇒₄ P*P+T ⇒₅ id*P+T ⇒₅ id*id+T ⇒₄ id*id+P ⇒₅ id*id+id
2. E ⇒₁ E+T ⇒₂ T+T ⇒₃ T*P+T ⇒₄ P*P+T ⇒₄ P*P+P ⇒₅ P*P+id ⇒₅ P*id+id ⇒₅ id*id+id
3. E ⇒₁ E+T ⇒₄ E+P ⇒₅ E+id ⇒₂ T+id ⇒₃ T*P+id ⇒₄ P*P+id ⇒₅ id*P+id ⇒₅ id*id+id

144 Multiple Derivations
For unambiguous grammars, different derivations correspond to different ways of building up the same parse tree.
1. E → E+T
2. E → T
3. T → T*P
4. T → P
5. P → id
E ⇒₁ E+T ⇒₂ T+T ⇒₃ T*P+T ⇒₄ P*P+T ⇒₅ id*P+T ⇒₅ id*id+T ⇒₄ id*id+P ⇒₅ id*id+id
[Figure: the parse tree for id*id+id, built up one node at a time in the order of the derivation steps.]
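The claim can be tried out by growing a tree from a derivation. In this sketch each step is a pair (production number, which open occurrence of its lhs to expand, counted from the left starting at 0); the `Node` class, `build_tree`, and the step encoding are assumptions for illustration. The three step lists encode three different derivations of id*id+id.

```python
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "P"]),
    4: ("T", ["P"]),
    5: ("P", ["id"]),
}

class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = None              # None until the node is expanded

    def as_tuple(self):
        if self.children is None:
            return self.symbol
        return (self.symbol, tuple(c.as_tuple() for c in self.children))

def build_tree(start, steps):
    """Grow a parse tree by replaying derivation steps on its frontier."""
    root = Node(start)
    frontier = [root]                     # current sentential form, as nodes
    for number, occurrence in steps:
        lhs, rhs = PRODUCTIONS[number]
        positions = [i for i, n in enumerate(frontier) if n.symbol == lhs]
        i = positions[occurrence]
        node = frontier[i]
        node.children = [Node(s) for s in rhs]
        frontier[i:i + 1] = node.children # the derivation step
    return root.as_tuple()

derivation_1 = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (5, 0), (4, 0), (5, 0)]
derivation_2 = [(1, 0), (2, 0), (3, 0), (4, 0), (4, 0), (5, 2), (5, 1), (5, 0)]
derivation_3 = [(1, 0), (4, 0), (5, 0), (2, 0), (3, 0), (4, 0), (5, 0), (5, 0)]

trees = [build_tree("E", d) for d in (derivation_1, derivation_2, derivation_3)]
print(trees[0] == trees[1] == trees[2])
```

The derivations differ only in the order in which nodes are expanded; the finished trees are identical.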


More information

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

More information

CS415 Compilers. Lexical Analysis

CS415 Compilers. Lexical Analysis CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back

More information

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2 Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

More information

Lecture 4: Syntax Specification

Lecture 4: Syntax Specification The University of North Carolina at Chapel Hill Spring 2002 Lecture 4: Syntax Specification Jan 16 1 Phases of Compilation 2 1 Syntax Analysis Syntax: Webster s definition: 1 a : the way in which linguistic

More information

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;} Compiler Construction Grammars Parsing source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2012 01 23 2012 parse tree AST builder (implicit)

More information

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

More information

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language? The Front End Source code Front End IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed

More information

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012 Scanners Xiaokang Qiu Purdue University ECE 468 Adapted from Kulkarni 2012 August 24, 2016 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved

More information

Lexical Analyzer Scanner

Lexical Analyzer Scanner Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

More information

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

More information

CS308 Compiler Principles Lexical Analyzer Li Jiang

CS308 Compiler Principles Lexical Analyzer Li Jiang CS308 Lexical Analyzer Li Jiang Department of Computer Science and Engineering Shanghai Jiao Tong University Content: Outline Basic concepts: pattern, lexeme, and token. Operations on languages, and regular

More information

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Compilers Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Lexical Analyzer (Scanner) 1. Uses Regular Expressions to define tokens 2. Uses Finite Automata to recognize tokens

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

COP 3402 Systems Software Syntax Analysis (Parser)

COP 3402 Systems Software Syntax Analysis (Parser) COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis

More information

Principles of Programming Languages COMP251: Syntax and Grammars

Principles of Programming Languages COMP251: Syntax and Grammars Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2007

More information

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3 Programming Language Specification and Translation ICOM 4036 Fall 2009 Lecture 3 Some parts are Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics

More information

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward Lexical Analysis COMP 524, Spring 2014 Bryan Ward Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others The Big Picture Character Stream Scanner

More information

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console Scanning 1 read Interpreter Scanner request token Parser send token Console I/O send AST Tree Walker 2 Scanner This process is known as: Scanning, lexing (lexical analysis), and tokenizing This is the

More information

Compiler Construction

Compiler Construction Compiler Construction Exercises 1 Review of some Topics in Formal Languages 1. (a) Prove that two words x, y commute (i.e., satisfy xy = yx) if and only if there exists a word w such that x = w m, y =

More information

Week 3: Compilers and Interpreters

Week 3: Compilers and Interpreters CS320 Principles of Programming Languages Week 3: Compilers and Interpreters Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 3: Compilers and Interpreters 1/ 52 Programming Language

More information

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley

More information

Implementation of Lexical Analysis

Implementation of Lexical Analysis Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation

More information

2. Lexical Analysis! Prof. O. Nierstrasz!

2. Lexical Analysis! Prof. O. Nierstrasz! 2. Lexical Analysis! Prof. O. Nierstrasz! Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes.! http://www.cs.ucla.edu/~palsberg/! http://www.cs.purdue.edu/homes/hosking/!

More information

A Simple Syntax-Directed Translator

A Simple Syntax-Directed Translator Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called

More information

Syntax Intro and Overview. Syntax

Syntax Intro and Overview. Syntax Syntax Intro and Overview CS331 Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal

More information

Monday, August 26, 13. Scanners

Monday, August 26, 13. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

Context-Free Grammars

Context-Free Grammars Context-Free Grammars Lecture 7 http://webwitch.dreamhost.com/grammar.girl/ Outline Scanner vs. parser Why regular expressions are not enough Grammars (context-free grammars) grammar rules derivations

More information

Programming Language Syntax and Analysis

Programming Language Syntax and Analysis Programming Language Syntax and Analysis 2017 Kwangman Ko (http://compiler.sangji.ac.kr, kkman@sangji.ac.kr) Dept. of Computer Engineering, Sangji University Introduction Syntax the form or structure of

More information

Syntax and Grammars 1 / 21

Syntax and Grammars 1 / 21 Syntax and Grammars 1 / 21 Outline What is a language? Abstract syntax and grammars Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types What is a language? 2 / 21 What is a language?

More information

Wednesday, September 3, 14. Scanners

Wednesday, September 3, 14. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on

More information

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built) Programming languages must be precise Remember instructions This is unlike natural languages CS 315 Programming Languages Syntax Precision is required for syntax think of this as the format of the language

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program

More information

Wednesday, August 31, Parsers

Wednesday, August 31, Parsers Parsers How do we combine tokens? Combine tokens ( words in a language) to form programs ( sentences in a language) Not all combinations of tokens are correct programs (not all sentences are grammatically

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Syntax Analysis Parsing Syntax Or Structure Given By Determines Grammar Rules Context Free Grammar 1 Context Free Grammars (CFG) Provides the syntactic structure: A grammar is quadruple (V T, V N, S, R)

More information

ICOM 4036 Spring 2004

ICOM 4036 Spring 2004 Language Specification and Translation ICOM 4036 Spring 2004 Lecture 3 Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics Structure of a Compiler

More information

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis dministrivia Lexical nalysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Moving to 6 Evans on Wednesday HW available Pyth manual available on line. Please log into your account and electronically

More information

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Describing Syntax and Semantics

Describing Syntax and Semantics Describing Syntax and Semantics Introduction Syntax: the form or structure of the expressions, statements, and program units Semantics: the meaning of the expressions, statements, and program units Syntax

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

Lexical Analysis. Finite Automata

Lexical Analysis. Finite Automata #1 Lexical Analysis Finite Automata Cool Demo? (Part 1 of 2) #2 Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical

More information

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; } Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas

More information

ECE251 Midterm practice questions, Fall 2010

ECE251 Midterm practice questions, Fall 2010 ECE251 Midterm practice questions, Fall 2010 Patrick Lam October 20, 2010 Bootstrapping In particular, say you have a compiler from C to Pascal which runs on x86, and you want to write a self-hosting Java

More information

Lexical Analysis. Finite Automata

Lexical Analysis. Finite Automata #1 Lexical Analysis Finite Automata Cool Demo? (Part 1 of 2) #2 Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical

More information

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; } Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Conceptual Structure of a Compiler Source code x1 := y2

More information

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars CMSC 330: Organization of Programming Languages Context Free Grammars 1 Architecture of Compilers, Interpreters Source Analyzer Optimizer Code Generator Abstract Syntax Tree Front End Back End Compiler

More information

Formal Languages. Formal Languages

Formal Languages. Formal Languages Regular expressions Formal Languages Finite state automata Deterministic Non-deterministic Review of BNF Introduction to Grammars Regular grammars Formal Languages, CS34 Fall2 BGRyder Formal Languages

More information