Lexical Analysis. COMP 524, Spring 2014 Bryan Ward
1 Lexical Analysis COMP 524, Spring 2014 Bryan Ward Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others
2 The Big Picture
  Character stream
  -> Scanner (lexical analysis) -> Token stream
  -> Parser (syntax analysis) -> Parse tree
  -> Semantic analysis & intermediate code generation -> Abstract syntax tree
  -> Machine-independent optimization (optional) -> Modified intermediate form
  -> Target code generation -> Machine language
  -> Machine-specific optimization (optional) -> Modified target language
3 The Big Picture: Lexical Analysis. Lexical analysis groups consecutive characters that belong together: it turns the stream of individual characters into a stream of tokens that have individual meaning.
4-6 Source Program. The compiler reads the program from a file; the input arrives as a character stream. Compilation requires analysis of program structure: identifying subroutines, classes, methods, etc. Thus, the first step is to find units of meaning. (Figure: a source file shown one character at a time; the characters that survive in this transcript are = * f o o ;)
7-9 Tokens. Not every character has an individual meaning. In Java, a + can have two interpretations: a single + means addition, while a ++ sequence means increment. A sequence of characters that has an atomic meaning is called a token. The compiler must identify all input tokens.
Human analogy: to understand the meaning of an English sentence, we do not look at individual characters. Rather, we look at individual words. Human word = program token.
10-16 Tokens. The slides step through the source file one token at a time and label each token in turn:
  =                        Operator: Assignment
  (digits not preserved)   Integer Literal
  -                        Operator: Minus
  (digits not preserved)   Integer Literal
  *                        Operator: Multiplication
  foo                      Identifier: foo
  ;                        Statement separator/terminator
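The token-by-token reading above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's scanner: the token names, the `tokenize` helper, and the sample input `x = 3 - 4 * foo;` are all made up for this sketch, since the literals on the slide are not preserved.

```python
import re

# Hypothetical token classes for a tiny expression language (illustrative only).
TOKEN_SPEC = [
    ("INT_LITERAL", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INCREMENT", r"\+\+"),   # listed before PLUS so that "++" becomes one token
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("ASSIGN", r"="),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),         # whitespace separates tokens but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on a bad character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for kind, lexeme in tokenize("x = 3 - 4 * foo;"):
    print(kind, lexeme)
```

Note how `a++` yields a single INCREMENT token while `a + b` yields a PLUS token: the same character takes part in two different tokens, which is the point made about + in Java.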
17-21 Lexical vs. Syntactical Analysis. Why have a separate lexical analysis phase? In theory, token discovery (lexical analysis) could be done as part of structure discovery (syntactical analysis, parsing). However, this is impractical: it is much easier (and much more efficient) to express the syntax rules in terms of tokens. Thus, lexical analysis is made a separate step because it greatly simplifies the subsequent syntactical analysis.
22-24 Example: Java Language Specification
Lexical Structure (token specification). These strings mean something, but knowledge of the exact meaning is not required to identify them.
  The following 37 tokens are the operators, formed from ASCII characters:
  Operator: one of
    =   >   <   !   ~   ?   :
    ==  <=  >=  !=  &&  ||  ++  --
    +   -   *   /   &   |   ^   %   <<   >>   >>>
    +=  -=  *=  /=  &=  |=  ^=  %=  <<=  >>=  >>>=
Syntactical Structure. Meaning is given by where the tokens can occur in the program (grammar) and by the language semantics.
  UnaryExpression:
    PreIncrementExpression
    PreDecrementExpression
    + UnaryExpression
    - UnaryExpression
    UnaryExpressionNotPlusMinus
  PreIncrementExpression:
    ++ UnaryExpression
  UnaryExpressionNotPlusMinus:
    PostfixExpression
    ~ UnaryExpression
    ! UnaryExpression
    CastExpression
25-28 Lexical Analysis. The need to identify tokens raises two questions. How can we specify the tokens of a language? How can we recognize tokens in a character stream?
  Token specification: regular expressions (language design and specification).
  Token recognition: deterministic finite automata (DFA) (language implementation).
  DFA construction, which leads from the regular expressions to the DFA, takes several steps.
29-37 Regular Expression Rules. Base case: a regular expression (RE) is either a character (e.g., 0, 1, ...) or the empty string (i.e., ε). A compound RE is constructed by
  alternation: two REs separated by | (e.g., 1 | 0),
  concatenation: two REs next to each other (e.g., (1)(0 | 1)),
  optional repetition: an RE followed by * (the Kleene star) to denote zero or more occurrences (e.g., (0 | 1)*),
  parentheses (in order to avoid ambiguity).
38-49 What Does This Regular Expression Match?
  0x(1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f)(0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f)*
  0x1?          Yes
  0x0?          No
  0xdeadbeef?   Yes
  0x?           No
  0x01?         No
It recognizes positive hexadecimal constants without leading zeros.
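The five answers can be checked mechanically. The sketch below uses Python's `re` module and assumes, as the answers imply, that the first group lists the non-zero hexadecimal digits and the second group lists all of them; the character classes `[1-9a-f]` and `[0-9a-f]` are just shorthand for those alternations, and `is_hex_constant` is a name invented here.

```python
import re

# First digit must be non-zero (1-9, a-f); the remaining digits may include 0.
HEX_CONST = re.compile(r"0x[1-9a-f][0-9a-f]*")

def is_hex_constant(s):
    """True if the whole string s (not just a prefix) matches the pattern."""
    return HEX_CONST.fullmatch(s) is not None

for s in ["0x1", "0x0", "0xdeadbeef", "0x", "0x01"]:
    print(s, is_hex_constant(s))
```

`fullmatch` is used rather than `match` because the question is whether the entire string belongs to the language, not whether some prefix of it does.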
50-53 Example. Can we create a regular expression corresponding to the City, State ZIP-code line in mailing addresses? E.g.: Chapel Hill, NC [ZIP code missing from the transcript]; Beverly Hills, CA 90210.
54-55 Grammars and Languages. A regular grammar is a kind of grammar. A grammar describes the structure of strings. A string that matches a grammar G's structure is said to be in the language L(G) (which is a set). A grammar is a set of productions: rules to obtain (produce) a string that is in L(G) via repeated substitutions. There are many grammar classes (see COMP 455). Two are commonly used to describe programming languages: regular grammars for tokens and context-free grammars for syntax.
56-68 Grammar 101
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  natural_number → non_zero_digit digit*
  non_neg_number → (0 | natural_number) ( (. digit* non_zero_digit) | ε )
Productions: A → B is called a production.
Non-terminals: the name on the left is called a non-terminal symbol.
Terminals: the symbols on the right are either terminal or non-terminal symbols. A terminal symbol is just a character.
Definition: → means "is a" or "replace with".
Choice: | denotes "or". Thus, the first production means: a digit is a 0 or 1 or 2 or ... or 9.
Optional repetition: * denotes zero or more of a symbol.
Sequence: two symbols next to each other means "followed by". Thus, the third production means: a natural number is a non-zero digit followed by zero or more digits.
Epsilon: ε is a special terminal that means "empty". It corresponds to the empty string.
So, what does the last production mean? A non-negative number is a 0 or a natural number, followed by either nothing or a ".", followed by zero or more digits, followed by (exactly one) non-zero digit.
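One way to get a feel for the number grammar is to write each production as a regular-expression fragment built from the fragments before it. The following Python sketch does that; it assumes the productions are digit, non_zero_digit, natural_number, and non_neg_number as read from the slides, and the helper name `is_non_neg_number` is invented for the example.

```python
import re

# Each non-terminal becomes a regex fragment built only from earlier fragments.
digit = r"[0-9]"                 # digit -> 0 | 1 | ... | 9
non_zero_digit = r"[1-9]"        # non_zero_digit -> 1 | ... | 9
natural_number = f"{non_zero_digit}{digit}*"
# non_neg_number -> (0 | natural_number) ( (. digit* non_zero_digit) | epsilon )
non_neg_number = f"(?:0|{natural_number})(?:\\.{digit}*{non_zero_digit})?"

def is_non_neg_number(s):
    """True if the whole string s is derivable from non_neg_number."""
    return re.fullmatch(non_neg_number, s) is not None

for s in ["0", "42", "3.14", "10.05", "007", "1.50", "1."]:
    print(s, is_non_neg_number(s))
```

The grammar deliberately rejects leading zeros (`007`) and trailing zeros after the decimal point (`1.50`), so every accepted number has exactly one spelling.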
69-78 Regular Grammar Rules. Very similar to regular expression rules!
Terminals: a terminal is either a character (e.g., 0, 1, ...) or the empty string (i.e., ε).
Non-terminals can be constructed using
  alternation: two REs or non-terminals separated by | (e.g., letter | digit),
  parentheses (in order to avoid ambiguity),
  concatenation: two REs or non-terminals next to each other (e.g., letter letter),
  optional repetition: an RE or non-terminal followed by * (the Kleene star) to denote zero or more occurrences (e.g., digit*).
A non-terminal is NEVER defined in terms of itself, not even indirectly! Thus, regular grammars cannot define recursive statements.
79-81 Example. Let's create a regular grammar corresponding to the City, State ZIP-code line in mailing addresses. E.g.: Chapel Hill, NC [ZIP code missing from the transcript]; Beverly Hills, CA 90210.
  city_line → city , state_abbrev zip_code
  city → letter (letter | ␣ letter)*
  state_abbrev → AL | AK | AS | AZ | ... | WY
  zip_code → digit digit digit digit digit (extra | ε)
  extra → - digit digit digit digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  letter → A | B | C | ... | ö
(Here ␣ stands for a single space.)
Creating a regular expression from a regular grammar is mechanical and easy. Just take the most general non-terminal and keep substituting until you get down to terminals. The lack of recursion means that you won't get into an infinite loop.
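The "keep substituting" recipe can be sketched directly. The Python below is a minimal illustration under several assumptions that are not from the slides: productions are stored as regex-syntax strings in a dictionary named `GRAMMAR`, spaces in the address line are literal spaces, the state list is cut down to a handful of abbreviations, letters are restricted to ASCII, and the ZIP codes in the usage loop are sample values.

```python
import re

# Hypothetical encoding: each production maps a non-terminal to a regex-syntax
# right-hand side that may mention other non-terminals by name.
GRAMMAR = {
    "city_line": r"city, state_abbrev zip_code",
    "city": r"letter(letter| letter)*",
    "state_abbrev": r"(AL|AK|AS|AZ|CA|NC|WY)",   # abbreviated list (assumption)
    "zip_code": r"digit{5}(extra)?",
    "extra": r"-digit{4}",
    "letter": r"[A-Za-z]",                       # ASCII only (simplification)
    "digit": r"[0-9]",
}
# Matches any non-terminal name that appears as a whole word.
NAME = re.compile(r"\b(" + "|".join(GRAMMAR) + r")\b")

def expand(symbol):
    """Substitute non-terminals until only plain regex syntax remains."""
    rhs = GRAMMAR[symbol]
    while NAME.search(rhs):   # terminates because no production is recursive
        rhs = NAME.sub(lambda m: "(?:" + GRAMMAR[m.group(1)] + ")", rhs)
    return rhs

CITY_LINE = re.compile(expand("city_line"))

for line in ["Chapel Hill, NC 27599", "Beverly Hills, CA 90210", "Chapel Hill NC 27599"]:
    print(line, CITY_LINE.fullmatch(line) is not None)
```

If a production referred to itself, even indirectly, the `while` loop in `expand` would never finish, which is exactly why the no-recursion rule matters for this construction.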
82-84 Regular Sets and Finite Automata. If a grammar G is a regular grammar, then the language L(G) is called a regular set. Equivalently, the language accepted by a regular expression is a regular set. Fundamental equivalence: for every regular set L(G), there exists a deterministic finite automaton (DFA) that accepts a string S if and only if S ∈ L(G). (See COMP 455 for proof.)
85-92 DFA 101. A deterministic finite automaton
  has a finite number of states,
  has exactly one start state,
  has one or more final states, and
  has transitions, which define how the automaton switches between states (given an input symbol).
Example DFA (figure): states A (start), B, and C (final). A loops on 0 and moves to B on 1; B loops on 0 and moves to C on 1; C loops on both 0 and 1.
  Start state: A.
  Intermediate state (neither start nor final): B.
  Final state (indicated by a double border): C.
  Transition: given an input of 1, if the DFA is in state A, then it transitions to state B (and consumes the input).
  Self transition: given an input of 0, if the DFA is in state A, then it stays in state A (and consumes the input).
  Multiple transitions: given an input of either 0 or 1, if the DFA is in state C, then it stays in state C (and consumes the input).
Transitions must be unambiguous: for each state and each input, there exists at most one transition. This is what makes the DFA deterministic. A fragment in which state X has two transitions on input 1, one to Z and one to Y, is not a legal DFA!
93-95 DFA String Processing. (Figure: the example DFA with states A, B, and C.)
String processing: the DFA is initially in the start state and sequentially makes one transition for each character in the input string.
A DFA either accepts or rejects a string.
  Reject if a character is encountered for which no transition is defined in the current state.
  Reject if the end of input is reached and the DFA is not in a final state.
  Accept if the end of input is reached and the DFA is in a final state.
96-98 DFA Example. Input: 10. (Figure: the example DFA with the current state and the current input character highlighted.)
  Initially, the DFA is in the start state A. The first input character is 1. This causes a transition to state B.
  The next input character is 0. This causes a self transition in state B.
  The end of the input is reached, but the DFA is not in a final state: the string 10 is rejected!
99-101 DFA-Equivalent Regular Expression. (Figure: the example DFA with states A, B, and C.) What's the RE such that the RE's language is exactly the set of strings that is accepted by this DFA?
  0*10*1(1 | 0)*
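A brute-force check can make the claimed equivalence more convincing. Reading the example DFA as "A moves to B on the first 1, B moves to C on the second 1, and C absorbs everything", the DFA accepts exactly the binary strings that contain at least two 1s. The Python sketch below compares that property with the regular expression on every binary string up to length 10; the function names are invented for the example, and an exhaustive test up to a bound is evidence, not a proof.

```python
import re
from itertools import product

RE = re.compile(r"0*10*1(1|0)*")

def in_dfa_language(s):
    """The example DFA ends in final state C exactly when s has at least two 1s."""
    return s.count("1") >= 2

def all_binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)

mismatches = [s for s in all_binary_strings(10)
              if (RE.fullmatch(s) is not None) != in_dfa_language(s)]
print(mismatches)
```

An empty list means the two descriptions agree on all 2047 strings tested.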
102-106 Recognizing Tokens with a DFA. (Figure: the example DFA with states A, B, and C.)
Table-driven implementation: DFAs can be represented as a 2-dimensional table.
  Current State   On 0              On 1              Note
  A               transition to A   transition to B   start
  B               transition to B   transition to C
  C               transition to C   transition to C   final
Pseudocode:
  currentstate = start state
  while end of input not yet reached:
      c = get next input character
      if transitiontable[currentstate][c] ≠ null:
          currentstate = transitiontable[currentstate][c]
      else:
          reject input
  if currentstate is final:
      accept input
  else:
      reject input
This accepts exactly one token in the input. A real lexer must detect multiple successive tokens. This can be achieved by resetting to the start state. But what happens if the suffix of one token is the prefix of another? (See Chapter 2 for a solution.)
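The table-driven idea translates almost line for line into Python. The sketch below is one possible rendering, assuming a dictionary of dictionaries for the transition table; the names `TABLE`, `START`, `FINAL`, and `accepts` are chosen for this example. Like the pseudocode, it decides a single string and does not address splitting the input into successive tokens.

```python
# Transition table for the three-state example: TABLE[state][symbol] -> next state.
TABLE = {
    "A": {"0": "A", "1": "B"},
    "B": {"0": "B", "1": "C"},
    "C": {"0": "C", "1": "C"},
}
START = "A"
FINAL = {"C"}

def accepts(text):
    """Run the DFA over text; True only if it ends in a final state."""
    state = START
    for c in text:
        nxt = TABLE[state].get(c)   # None plays the role of "null" in the table
        if nxt is None:
            return False            # no transition defined: reject
        state = nxt
    return state in FINAL

for s in ["10", "101", "0110", ""]:
    print(repr(s), accepts(s))
```

Each input character costs one table lookup, so the running time is linear in the length of the input regardless of how many states the automaton has.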
107-110 Lexical Analysis. The need to identify tokens raises two questions. How can we specify the tokens of a language? With regular expressions. How can we recognize tokens in a character stream? With DFAs. Getting from the token specification (regular expressions) to token recognition (a DFA) requires DFA construction. There is no single-step algorithm: we first need to construct a non-deterministic finite automaton (NFA).
111-118 Non-Deterministic Finite Automaton (NFA). Like a DFA, but less restrictive:
  Transitions do not have to be unique: each state may have multiple ambiguous transitions for the same input symbol. (Hence, it can be non-deterministic.)
  Epsilon transitions do not consume any input. (They correspond to the empty string.)
  Note that every DFA is also an NFA.
Acceptance rule: an NFA accepts an input string if there exists a series of transitions such that the NFA is in a final state when the end of input is reached.
Inherent parallelism: all possible paths are explored simultaneously.
(Figure: a legal NFA fragment in which state X has two transitions labeled 1 and one transition labeled ε, leading to states Z, Y, and V.)
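The "all paths at once" view can be simulated by tracking the set of states the NFA could currently be in. The Python sketch below uses a small hypothetical NFA, not the fragment from the figure: it accepts binary strings that end in 01, has one ε transition, and has one state with two transitions on the same symbol. All names (`DELTA`, `epsilon_closure`, `nfa_accepts`) are assumptions made for this example.

```python
# Hypothetical NFA (not the slide's fragment): accepts binary strings ending in "01".
# DELTA[state][symbol] is a set of successor states; "" marks an epsilon transition.
DELTA = {
    "I": {"": {"S"}},                    # epsilon move, consumes no input
    "S": {"0": {"S", "P"}, "1": {"S"}},  # two choices on 0: the non-deterministic step
    "P": {"1": {"Q"}},
    "Q": {},
}
START, FINAL = "I", {"Q"}

def epsilon_closure(states):
    """All states reachable from the given states using only epsilon transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in DELTA[s].get("", ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(text):
    """Follow every possible path at once by tracking the set of current states."""
    current = epsilon_closure({START})
    for c in text:
        moved = set()
        for s in current:
            moved |= DELTA[s].get(c, set())
        current = epsilon_closure(moved)
    return bool(current & FINAL)   # accept if any path ended in a final state

for s in ["01", "1101", "10", ""]:
    print(repr(s), nfa_accepts(s))
```

Because each step maps one set of states to another set of states, the simulation itself is deterministic, which hints at why an equivalent DFA can be constructed from an NFA.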
More informationCS Lecture 2. The Front End. Lecture 2 Lexical Analysis
CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture
More informationCMSC 350: COMPILER DESIGN
Lecture 11 CMSC 350: COMPILER DESIGN see HW3 LLVMLITE SPECIFICATION Eisenberg CMSC 350: Compilers 2 Discussion: Defining a Language Premise: programming languages are purely formal objects We (as language
More informationCOMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! Any questions about the syllabus?! Course Material available at www.cs.unic.ac.cy/ioanna! Next time reading assignment [ALSU07]
More informationCSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1
CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions
More informationMIT Specifying Languages with Regular Expressions and Context-Free Grammars
MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely
More informationLexical Analysis. Introduction
Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies
More informationNon-deterministic Finite Automata (NFA)
Non-deterministic Finite Automata (NFA) CAN have transitions on the same input to different states Can include a ε or λ transition (i.e. move to new state without reading input) Often easier to design
More informationMIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology
MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure
More informationImplementation of Lexical Analysis
Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation
More informationAdministrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis
dministrivia Lexical nalysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Moving to 6 Evans on Wednesday HW available Pyth manual available on line. Please log into your account and electronically
More informationCOLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR
Pune Vidyarthi Griha s COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR By Prof. Anand N. Gharu (Assistant Professor) PVGCOE Computer Dept.. 22nd Jan 2018 CONTENTS :- 1. Role of lexical analysis 2.
More informationIntroduction to Parsing. Lecture 8
Introduction to Parsing Lecture 8 Adapted from slides by G. Necula Outline Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Languages and Automata Formal languages
More informationUNIT -2 LEXICAL ANALYSIS
OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce
More informationAbout the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design
i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target
More informationLexical Scanning COMP360
Lexical Scanning COMP360 Captain, we re being scanned. Spock Reading Read sections 2.1 3.2 in the textbook Regular Expression and FSA Assignment A new assignment has been posted on Blackboard It is due
More informationTheory and Compiling COMP360
Theory and Compiling COMP360 It has been said that man is a rational animal. All my life I have been searching for evidence which could support this. Bertrand Russell Reading Read sections 2.1 3.2 in the
More informationOptimizing Finite Automata
Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states
More informationA simple syntax-directed
Syntax-directed is a grammaroriented compiling technique Programming languages: Syntax: what its programs look like? Semantic: what its programs mean? 1 A simple syntax-directed Lexical Syntax Character
More informationCSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions
CSE 413 Programming Languages & Implementation Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory
More informationCSCE 314 Programming Languages
CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee 1 What Is a Programming Language? Language = syntax + semantics The syntax of a language is concerned with the form of a program: how
More informationImplementation of Lexical Analysis
Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation
More informationFormal Languages and Compilers Lecture VI: Lexical Analysis
Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal
More informationCOP 3402 Systems Software Syntax Analysis (Parser)
COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis
More informationParsing. source code. while (k<=n) {sum = sum+k; k=k+1;}
Compiler Construction Grammars Parsing source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2012 01 23 2012 parse tree AST builder (implicit)
More informationCSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions
CSE45 Translation of Programming Languages Lecture 2: Automata and Regular Expressions Finite Automata Regular Expression = Specification Finite Automata = Implementation A finite automaton consists of:
More informationInterpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console
Scanning 1 read Interpreter Scanner request token Parser send token Console I/O send AST Tree Walker 2 Scanner This process is known as: Scanning, lexing (lexical analysis), and tokenizing This is the
More informationOutline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata
Outline 1 2 Regular Expresssions Lexical Analysis 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA 6 NFA to DFA 7 8 JavaCC:
More informationCompiler Construction
Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Conceptual Structure of a Compiler Source code x1 := y2
More informationContext-Free Grammars
Context-Free Grammars Lecture 7 http://webwitch.dreamhost.com/grammar.girl/ Outline Scanner vs. parser Why regular expressions are not enough Grammars (context-free grammars) grammar rules derivations
More informationLexical Analysis. Finite Automata
#1 Lexical Analysis Finite Automata Cool Demo? (Part 1 of 2) #2 Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical
More informationCompiling Regular Expressions COMP360
Compiling Regular Expressions COMP360 Logic is the beginning of wisdom, not the end. Leonard Nimoy Compiler s Purpose The compiler converts the program source code into a form that can be executed by the
More informationTHE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES
THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language
More informationOutline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)
Outline Limitations of regular languages Introduction to Parsing Parser overview Lecture 8 Adapted from slides by G. Necula Context-free grammars (CFG s) Derivations Languages and Automata Formal languages
More informationLexical Analysis 1 / 52
Lexical Analysis 1 / 52 Outline 1 Scanning Tokens 2 Regular Expresssions 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA
More informationCS 403: Scanning and Parsing
CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax
More informationOutline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation
Outline Introduction to Parsing Lecture 8 Adapted from slides by G. Necula and R. Bodik Limitations of regular languages Parser overview Context-free grammars (CG s) Derivations Syntax-Directed ranslation
More informationIntroduction to Lexical Analysis
Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples
More informationprogramming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs
Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning
More informationLexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata
Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string
More informationIntroduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1
Introduction to Automata Theory BİL405 - Automata Theory and Formal Languages 1 Automata, Computability and Complexity Automata, Computability and Complexity are linked by the question: What are the fundamental
More informationCSE 3302 Programming Languages Lecture 2: Syntax
CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:
More informationParsing II Top-down parsing. Comp 412
COMP 412 FALL 2018 Parsing II Top-down parsing Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled
More informationFormal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2
Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence
More informationIntroduction to Lexing and Parsing
Introduction to Lexing and Parsing ECE 351: Compilers Jon Eyolfson University of Waterloo June 18, 2012 1 Riddle Me This, Riddle Me That What is a compiler? 1 Riddle Me This, Riddle Me That What is a compiler?
More informationCS164: Midterm I. Fall 2003
CS164: Midterm I Fall 2003 Please read all instructions (including these) carefully. Write your name, login, and circle the time of your section. Read each question carefully and think about what s being
More informationCSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis
CSE450 Translation of Programming Languages Lecture 4: Syntax Analysis http://xkcd.com/859 Structure of a Today! Compiler Source Language Lexical Analyzer Syntax Analyzer Semantic Analyzer Int. Code Generator
More informationSyntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens
Syntactic Analysis CS45H: Programming Languages Lecture : Lexical Analysis Thomas Dillig Main Question: How to give structure to strings Analogy: Understanding an English sentence First, we separate a
More informationLexical Analysis. Finite Automata
#1 Lexical Analysis Finite Automata Cool Demo? (Part 1 of 2) #2 Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical
More informationCOMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.
UNIT I LEXICAL ANALYSIS Translator: It is a program that translates one language to another Language. Source Code Translator Target Code 1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System
More informationCSE302: Compiler Design
CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 01, 2007 Outline Recap
More informationDavid Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)
David Griol Barres dgriol@inf.uc3m.es Computer Science Department Carlos III University of Madrid Leganés (Spain) OUTLINE Introduction: Definitions The role of the Lexical Analyzer Scanner Implementation
More informationCOP4020 Programming Languages. Syntax Prof. Robert van Engelen
COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and
More informationCS415 Compilers. Lexical Analysis
CS415 Compilers Lexical Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 7 1 Announcements First project and second homework
More informationA Simple Syntax-Directed Translator
Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called
More informationChapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1
Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 1. Introduction Parsing is the task of Syntax Analysis Determining the syntax, or structure, of a program. The syntax is defined by the grammar rules
More informationAnnouncements! P1 part 1 due next Tuesday P1 part 2 due next Friday
Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday 1 Finite-state machines CS 536 Last time! A compiler is a recognizer of language S (Source) a translator from S to T (Target) a program
More information1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.
UNIT I Translator: It is a program that translates one language to another Language. Examples of translator are compiler, assembler, interpreter, linker, loader and preprocessor. Source Code Translator
More informationFinite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016
Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016 Lecture 15 Ana Bove May 23rd 2016 More on Turing machines; Summary of the course. Overview of today s lecture: Recap: PDA, TM Push-down
More informationCompiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010
Compiler Design. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 1, 010 Contents In these slides we will see 1.Introduction, Concepts and Notations.Regular Expressions, Regular
More informationCOP4020 Programming Languages. Syntax Prof. Robert van Engelen
COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up
More informationBuilding Compilers with Phoenix
Building Compilers with Phoenix Syntax-Directed Translation Structure of a Compiler Character Stream Intermediate Representation Lexical Analyzer Machine-Independent Optimizer token stream Intermediate
More informationCunning Plan. Informal Sketch of Lexical Analysis. Issues in Lexical Analysis. Specifying Lexers
Cunning Plan Informal Sketch of Lexical Analysis LA identifies tokens from input string lexer : (char list) (token list) Issues in Lexical Analysis Lookahead Ambiguity Specifying Lexers Regular Expressions
More informationCSE302: Compiler Design
CSE302: Compiler Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University February 20, 2007 Outline Recap
More informationCompiler Construction
Compiler Construction Lecture 2: Lexical Analysis I (Introduction) Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/
More informationMidterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2
Banner number: Name: Midterm Exam CSCI 336: Principles of Programming Languages February 2, 23 Group Group 2 Group 3 Question. Question 2. Question 3. Question.2 Question 2.2 Question 3.2 Question.3 Question
More informationCT32 COMPUTER NETWORKS DEC 2015
Q.2 a. Using the principle of mathematical induction, prove that (10 (2n-1) +1) is divisible by 11 for all n N (8) Let P(n): (10 (2n-1) +1) is divisible by 11 For n = 1, the given expression becomes (10
More informationCS 314 Principles of Programming Languages. Lecture 3
CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information
More informationComputer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres
Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es Introduction: Definitions Lexical analysis or scanning: To read from left-to-right a source
More informationUVa ID: NAME (print): CS 4501 LDI Midterm 1
CS 4501 LDI Midterm 1 Write your name and UVa ID on the exam. Pledge the exam before turning it in. There are nine (9) pages in this exam (including this one) and six (6) questions, each with multiple
More informationCompiler Construction
Compiler Construction Exercises 1 Review of some Topics in Formal Languages 1. (a) Prove that two words x, y commute (i.e., satisfy xy = yx) if and only if there exists a word w such that x = w m, y =
More informationCS 314 Principles of Programming Languages
CS 314 Principles of Programming Languages Lecture 2: Syntax Analysis Zheng (Eddy) Zhang Rutgers University January 22, 2018 Announcement First recitation starts this Wednesday Homework 1 will be release
More informationCS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)
CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square) Introduction This semester, through a project split into 3 phases, we are going
More information2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS
2010: Compilers Lexical Analysis: Finite State Automata Dr. Licia Capra UCL/CS REVIEW: REGULAR EXPRESSIONS a Character in A Empty string R S Alternation (either R or S) RS Concatenation (R followed by
More informationThis book is licensed under a Creative Commons Attribution 3.0 License
6. Syntax Learning objectives: syntax and semantics syntax diagrams and EBNF describe context-free grammars terminal and nonterminal symbols productions definition of EBNF by itself parse tree grammars
More informationPrinciples of Programming Languages COMP251: Syntax and Grammars
Principles of Programming Languages COMP251: Syntax and Grammars Prof. Dekai Wu Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China Fall 2007
More informationBriefly describe the purpose of the lexical and syntax analysis phases in a compiler.
Name: Midterm Exam PID: This is a closed-book exam; you may not use any tools besides a pen. You have 75 minutes to answer all questions. There are a total of 75 points available. Please write legibly;
More informationCS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]
CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] 1 What is Lexical Analysis? First step of a compiler. Reads/scans/identify the characters in the program and groups
More informationIntroduction to Parsing. Lecture 5
Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important
More informationECS 120 Lesson 7 Regular Expressions, Pt. 1
ECS 120 Lesson 7 Regular Expressions, Pt. 1 Oliver Kreylos Friday, April 13th, 2001 1 Outline Thus far, we have been discussing one way to specify a (regular) language: Giving a machine that reads a word
More informationFigure 2.1: Role of Lexical Analyzer
Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer
More informationfor (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }
Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas
More informationLanguages and Compilers
Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:
More informationCOL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20
COL728 Minor1 Exam Compiler Design Sem II, 2016-17 Answer all 5 questions Max. Marks: 20 1. Short questions a. Show that every regular language is also a context-free language [2] We know that every regular
More informationSyntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.
Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will
More informationFront End: Lexical Analysis. The Structure of a Compiler
Front End: Lexical Analysis The Structure of a Compiler Constructing a Lexical Analyser By hand: Identify lexemes in input and return tokens Automatically: Lexical-Analyser generator We will learn about
More informationWeek 2: Syntax Specification, Grammars
CS320 Principles of Programming Languages Week 2: Syntax Specification, Grammars Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 2: Syntax Specification, Grammars 1/ 62 Words and Sentences
More informationChapter 3. Describing Syntax and Semantics
Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:
More information