CS 314 Principles of Programming Languages

Similar documents
CS 314 Principles of Programming Languages. Lecture 3

Formal Languages and Compilers Lecture VI: Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CSc 453 Lexical Analysis (Scanning)

CPSC 434 Lecture 3, Page 1

Part 5 Program Analysis Principles and Techniques

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

CS415 Compilers. Lexical Analysis

Lexical Analysis. Introduction

Week 2: Syntax Specification, Grammars

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Academic Formalities. CS Language Translators. Compilers A Sangam. What, When and Why of Compilers

CS 314 Principles of Programming Languages

Lexical Scanning COMP360

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Compiling Regular Expressions COMP360

Non-deterministic Finite Automata (NFA)

CSCE 314 Programming Languages

CSE 3302 Programming Languages Lecture 2: Syntax

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Theory and Compiling COMP360

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Formal Languages. Formal Languages

CSCI312 Principles of Programming Languages!

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Compiler course. Chapter 3 Lexical Analysis

CSE302: Compiler Design

Zhizheng Zhang. Southeast University

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lecture 2-4

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Chapter 3 Lexical Analysis

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis 1 / 52

Optimizing Finite Automata

Lecture 4: Syntax Specification

2. Lexical Analysis! Prof. O. Nierstrasz!

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

KEY. A 1. The action of a grammar when a derivation can be found for a sentence. Y 2. program written in a High Level Language

Lecture 3: CUDA Programming & Compiler Front End

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

Introduction to Lexing and Parsing

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington

Compiler Construction LECTURE # 3

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Compiler Design Overview. Compiler Design 1

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CST-402(T): Language Processors

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Implementation of Lexical Analysis

Introduction to Lexical Analysis

CS308 Compiler Principles Lexical Analyzer Li Jiang

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Figure 2.1: Role of Lexical Analyzer

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Lexical Analyzer Scanner

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Lexical Analyzer Scanner

Chapter 2 :: Programming Language Syntax

Lexical Analysis. Chapter 2

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

COP 3402 Systems Software Syntax Analysis (Parser)

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Lexical and Syntax Analysis

Structure of a Compiler: Scanner reads a source, character by character, extracting lexemes that are then represented by tokens.

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Implementation of Lexical Analysis

Front End. Hwansoo Han

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Chapter 3. Parsing #1

CSE 401/M501 Compilers

Regular Expressions & Automata

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CMSC 330: Organization of Programming Languages. Context Free Grammars

CS606- compiler instruction Solved MCQS From Midterm Papers

Writing a Lexical Analyzer in Haskell (part II)

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Compilers and Interpreters

CPS 506 Comparative Programming Languages. Syntax Specification

Lexical Analysis. Lecture 3. January 10, 2018

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

CS 441G Fall 2018 Exam 1 Matching: LETTER

Transcription:

CS 314 Principles of Programming Languages Lecture 2: Syntax Analysis Zheng (Eddy) Zhang Rutgers University January 22, 2018

Announcement First recitation starts this Wednesday Homework 1 will be release tomorrow My office hour: Wednesday 5pm to 6pm, CoRE 310. 2

Syntax and Semantics of Prog. Languages Syntax: Describes what a legal program looks like Semantics: Describes what a correct (legal) program means A formal language is a (possibly infinite) set of sentences (finite sequences of symbols) over a finite alphabet Σ of (terminal) symbols: L Σ Examples: L = { identifiers of length 2} with Σ = {a, b, c} L = {strings of only 1s and only 0s} L = {strings starting with $ and ending with #, and any combination of 0s and 1s in between } L = {all syntactically correct Java programs} 3

Syntax and Semantics: How does it work? Syntactic representation of values What do the following syntactic expressions have in common? X 1010 A $ # 3+ 19 (2 6) 4

Syntax and Semantics: How does it work? Syntactic representation of values What do the following syntactic expressions have in common? X 1010 A $ # 3+ 19 (2 6) Answer: They are possible representations of the integer value 10 (written as a decimal number) What is computation? Possible answer: A (finite) sequence of syntactic manipulations of value representations ending in a normal form which is called the result. Normal forms cannot be manipulated any further. 5

Syntax and Semantics: How does it work? Here is a game (rewrite system): Input: Sequence of characters starting with $ and ending with #, and any combination of 0s and 1s in between. Rules: You may replace a character pattern X at any position within the character sequence on the left-hand-side by the pattern Y on the righthand-side: X Y: Rule 1 $1 1& Rule 2 $0 0$ Rule 3 &1 1$ Rule 4 &0 0& Rule 5 Rule 6 $# A &# B Replace patterns using the rules as often as you can. When you cannot replace a pattern any more, stop. 6

Syntax and Semantics: How does it work? Example input: Rule 1 $1 1& $ 0 0 # Rule 2 $0 0$ $ 0 0 # is rewritten as 0 $ 0 # by rule 2 Rule 3 &1 1$ 0 $ 0 # is rewritten as 0 0 $ # by rule 2 Rule 4 &0 0& 0 0 $ # is rewritten as 0 0 A by rule 5 Rule 5 $# A no more rules can be applied (STOP) Rule 6 &# B More examples: $ 0 1 1 0 1 # $ 1 0 1 0 0 # $ 1 1 0 0 1 # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 7

Syntax and Semantics: How does it work? Example input: Rule 1 $1 1& $ 0 0 # Rule 2 $0 0$ $ 0 0 # is rewritten as 0 $ 0 # by rule 2 Rule 3 &1 1$ 0 $ 0 # is rewritten as 0 0 $ # by rule 2 Rule 4 &0 0& 0 0 $ # is rewritten as 0 0 A by rule 5 Rule 5 $# A no more rules can be applied (STOP) Rule 6 &# B More examples: $ 0 1 1 0 1 # $ 1 0 1 0 0 # $ 1 1 0 0 1 # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 8

Syntax and Semantics: How does it work? Example input: Rule 1 $1 1& $ 0 0 # Rule 2 $0 0$ $ 0 0 # is rewritten as 0 $ 0 # by rule 2 Rule 3 &1 1$ 0 $ 0 # is rewritten as 0 0 $ # by rule 2 Rule 4 &0 0& 0 0 $ # is rewritten as 0 0 A by rule 5 Rule 5 $# A no more rules can be applied (STOP) Rule 6 &# B More examples: $ 0 1 1 0 1 # $ 1 0 1 0 0 # $ 1 1 0 0 1 # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 9

Syntax and Semantics: How does it work? Example input: Rule 1 $1 1& $ 0 0 # Rule 2 $0 0$ $ 0 0 # is rewritten as 0 $ 0 # by rule 2 Rule 3 &1 1$ 0 $ 0 # is rewritten as 0 0 $ # by rule 2 Rule 4 &0 0& 0 0 $ # is rewritten as 0 0 A by rule 5 Rule 5 $# A no more rules can be applied (STOP) Rule 6 &# B More examples: $ 0 1 1 0 1 # $ 1 0 1 0 0 # $ 1 1 0 0 1 # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 10

Syntax and Semantics: How does it work? Example input: Rule 1 $1 1& $ 0 0 # Rule 2 $0 0$ $ 0 0 # is rewritten as 0 $ 0 # by rule 2 Rule 3 &1 1$ 0 $ 0 # is rewritten as 0 0 $ # by rule 2 Rule 4 &0 0& 0 0 $ # is rewritten as 0 0 A by rule 5 Rule 5 $# A no more rules can be applied (STOP) Rule 6 &# B More examples: $ 0 1 1 0 1 # $ 1 0 1 0 0 # $ 1 1 0 0 1 # Questions: Can we get different results for the same input string? Does all this have a meaning (semantics), or are we just pushing symbols? 11

Syntax Without Semantics? The green apple is colorless. 12

Syntax Without Semantics? The green apple is colorless. Syntactically correct, but semantically incorrect! 13

Review: Compilers Source Code Compiler Machine Code Error Implications: Recognize legal (and illegal) programs Generate correct code Manage storage of all variables and code Need format for object (or assembly) code Big step up from assembler higher level notations 14

Traditional Two-Pass Compilers Pass: reading and writing entire program Source Code Front End IR Back End Machine Code Implications: Intermediate representation (IR) Front end maps legal code into IR Back end maps IR onto target machine Simplify retargeting Allows multiple front ends Multiple passes better code Front end is O(n) Back end is NP-Complete Errors 15

Front End Source Code Tokens Scanner Parser IR Front End Responsibilities: Recognize legal programs Report errors Produce IR Preliminary storage map Errors Parser: syntax & semantic analyzer, IR code generator (syntax-directed translator) Shape the code for the back end Much of front end construction can be automated 16

Syntax and Semantics of Prog. Languages The syntax of programming languages is often defined in two layers: tokens and sentences tokens - basic units of the language Question: How to spell a token (word)? Answer: Regular expressions sentences - legal combinations of tokens in the language Question: How to build correct sentences with tokens? Answer: (context - free) grammars (CFG) E.g., Backus-Naur form (BNF) is a formalism used to express the syntax of programming languages. 17

Formalisms for Lexical and Syntactic Analysis Two issues in Formal Languages: Language Specification formalism to describe what a valid program (sentence) looks like. Language Recognition formalism to describe a machine and an algorithm that can verify that a program is valid or not. 18

Formalisms for Lexical and Syntactic Analysis Two issues in Formal Languages: Language Specification formalism to describe what a valid program (sentence) looks like. Language Recognition formalism to describe a machine and an algorithm that can verify that a program is valid or not. 1. Lexical Analysis: Converts source code into sequence of tokens. Regular grammar/expression to specify. Finite automata to recognize. 2. Syntax Analysis: Structures tokens into parse tree. Context-free grammars to specify. Push-down automata to recognize (will be covered in CS 415) 19

Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 20

Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 21

Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 22

Lexical Analysis ( Scott 2.1, 2.2) Character sequence i f a < = b t h e n c : = d scanner if id <a> <= id <b> then id <c> := Token sequence num <d> 23

Lexical Analysis ( Scott 2.1, 2.2) Tokens (Terminal Symbols of CFG, Words of Lang.) Smallest atomic units of syntax Used to build all the other constructs Example, C: keywords: for if goto volatile = * / - < > == <= >= <> ( ) [ ] ; :=., number: (Example: 3.14 28 ) identifier: (Example: b square addentry ) 24

Lexical Analysis (cont. ) Identifiers Names of variables, etc. Sequence of terminals of restricted form; Example, in C language: A31, but not 1A3 Upper/lower case sensitive? Keywords Special identifiers which represent tokens in the language May be reserved (reserved words) or not - E.g., C: if is reserved. Delimiters When does character string for token end? Example: identifiers are longest possible character sequence that does not include a delimiter. Most languages have more delimiters such as, new line, keywords,... 25

Regular Expressions A syntax (notation) to specify regular languages. RE r Language L(r) a {a} ϵ r s {ϵ} L(r) L(s) rs {rs r L(r), s L(s)} r+ L(r) L(rr) L(rrr) (any number of L(r) s concatenated) r* (r* = r + ϵ) {ϵ} L(r) L(rr) L(rrr)... (s) L(s) 26

Regular Expressions A syntax (notation) to specify regular languages. RE r r s Language L(r) L(r) L(s) rs {rs r L(r), s L(s)} r+ L(r) L(rr) L(rrr) r* (r* = r + ϵ) {ϵ} L(r) L(rr) L(rrr)... (any number of r s concatenated) (s) L(s) 27

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 28

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 29

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 30

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 31

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 32

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 33

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 34

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 35

Examples of Expressions Solution RE Language a bc (b c)a a ϵ a* b {a, bc} {ba, ca} {a} {ϵ, a, aa, aaa, aaaa, } {b} ab* {a, ab, abb, abbb, abbbb,...} ab* c + {a, ab, abb, abbb, abbbb,... } {c,cc,ccc, } (a b)* {ϵ, a, b, aa, ab, ba, bb, aaa, aab,...} (0 1)*0 binary numbers ending in 0 36

Regular Expressions for Programming Languages Let letter stand for A B C... Z Let digit stand for 0 1 2... 9 integer constant: identifier: real constant: 37

Recognizers for Regular Expressions Example 1: integer constant RE: digit + digit FSA: S0 digit S2 38

Recognizers for Regular Expressions(Cont.) Example 2: identifier RE: letter(letter digit) * letter FSA: S0 letter S2 digit 39

Recognizers for Regular Expressions(Cont.) Example 3: Real constant RE: digit*.digit + digit digit FSA: S0. S1 digit S2 40

Finite State Automata 0 S1 0 S0 S3 1 S2 1 A Finite-State Automaton is a quadruple: < S, s, F, T > S is a set of states, e.g., {S0, S1, S2, S3} s is the start state, e.g., S0 F is a set of final states, e.g., {S3} T is a set of labeled transitions, of the form (state, input) state [i.e., S Σ S] 41

Finite State Automata Transitions can be represented using a transition table: S0 0 1 S1 S2 0 1 S3 State 0 1 S0 S1 S2 S1 S3 - S2 - S3 Input An FSA accepts or recognizes an input string N iff there is some path from start state to a final state such that the labels on the path spell N. Lack of entry in the table (or no arc for a given character) indicates an error reject. 42

Finite State Automata Transitions can be represented using a transition table: S0 0 1 S1 S2 0 1 S3 State 0 1 S0 S1 S2 S1 S3 - S2 - S3 Input An FSA accepts or recognizes an input string N iff there is some path from start state to a final state such that the labels on the path spell N. Lack of entry in the table (or no arc for a given character) indicates an error reject. Error 43

Practical Recognizers Recognizer should be a deterministic finite automaton (DFA) Read until the end of a token Report errors (error recovery) identifier letter (a b c... z A B C... Z) digit (0 1 2 3 4 5 6 7 8 9) id letter (letter digit) Recognizer for identifier: (transition diagram) letter other S0 S1 S2 digit other letter/digit error S3 error 44

Implementation: Table for the recognizer Two tables control the recognizer. char_class: a - z A - Z 0-9 other value letter letter digit other next_state: class S0 S1 S2 S3 letter S1 S1 digit S3 S1 other S3 S2 To change languages, we can just change tables. 45

Implementation: Code for the recognizer letter other S0 S1 S2 S3 digit other error letter/digit class S0 S1 S2 S3 letter S1 S1 digit S3 S1 other S3 S2 error char next_char(); state S0; done false; while( not done ) { class char_class[char]; state next_state[class,state]; switch(state) { case S1: /* building an id */ token_value token_value + char; char next_char(); if (char == EOF) done = true; break; case S2: /* error state */ case S3: /* error state */ token type = error; done = true; break; } } return token_type; 46

Improved efficiency Table driven implementation is slow relative to direct code. Each state transition involves: 1. Classifying the input character 2. Finding the next state 3. An assignment to the state variable 4. A trip through the case statement logic 5. A branch (while loop) We can do better by encoding the state table in the scanner code 1. Classify the input character 2. Test character class locally 3. Branch directly to next state This takes many fewer instructions. 47

Implementation: Faster scanning S0: char next_char(); token value ; class char_class[char]; if (class!= letter) goto S3; S1: token value token_value + char; char next_char(); class char_class[char]; if (class!= other) goto S1; if (char == DELIMITER ) token type = identifier; return token_type; goto S2; S2: S3: token type error; return token type; class S0 S1 S2 S3 letter S1 S1 digit S3 S1 other S3 S2 48

Implementation: Faster scanning S0: char next_char(); token value ; class char_class[char]; if (class!= letter) goto S3; S1: token value token_value + char; char next_char(); class char_class[char]; if (class!= other) goto S1; if (char == DELIMITER ) token type = identifier; return token_type; goto S2; S2: S3: token type error; return token type; char next_char(); state class S0; S0 S1 S2 S3 done false; while( letter not done S1 ) { S1 class digit char_class[char]; S3 S1 state next_state[class,state]; switch(state) other S3{ S2 case S1: /* building an id */ token_value token_value + char; char next_char(); if (char == EOF) done = true; break; case S2: /* error state */ case S3: /* error state */ token type = error; done = true; break; } } return token_type; Faster code Original code 49

Next Lecture Things to do: Read Scott, Chapters 2.3-2.5 50