
Programming Languages, 2nd edition, Tucker and Noonan
Chapter 3: Lexical and Syntactic Analysis
"Syntactic sugar causes cancer of the semicolon." -- A. Perlis

Contents" 3.1 Chomsky Hierarchy" 3.2 Lexical Analysis" 3.3 Syntactic Analysis"

Lexical Analysis" Purpose: transform program representation Input: printable Ascii characters Output: tokens Discard: whitespace, comments Defn: A token is a logically cohesive sequence of characters representing a single symbol.

Example Tokens" Identifiers Literals: 123, 5.67, 'x', true Keywords: bool char... Operators: + - * /... Punctuation: ;, ( ) { }

Other Sequences" Whitespace: space tab Comments // any-char* end-of-line End-of-line End-of-file

Why a Separate Phase?" Simpler, faster machine model than parser 75% of time spent in lexer for non-optimizing compiler Differences in character sets End of line convention differs

Regular Expressions" RegExpr Meaning x a character x \x an escaped character, e.g., \n { name } a reference to a name M N M or N M N M followed by N M* zero or more occurrences of M

RegExpr    Meaning
M+         one or more occurrences of M
M?         zero or one occurrence of M
[aeiou]    the set of vowels
[0-9]      the set of digits
.          any single character

Clite Lexical Syntax" Category Definition anychar [ -~] Letter [a-za-z] Digit [0-9] Whitespace [ \t] Eol \n Eof \004

Category      Definition
Keyword       bool | char | else | false | float | if | int | main | true | while
Identifier    {Letter}({Letter} | {Digit})*
integerLit    {Digit}+
floatLit      {Digit}+\.{Digit}+
charLit       '{anyChar}'

Category      Definition
Operator      = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
Separator     ; | , | { | } | ( | )
Comment       // ({anyChar} | {Whitespace})* {Eol}
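As a rough illustration (not from the text), the identifier and literal definitions above map directly onto java.util.regex patterns; the class and field names below are ours, for demonstration only.

import java.util.regex.Pattern;

// A minimal sketch: the Clite token categories above written as
// java.util.regex patterns. Names are illustrative only.
public class CliteTokenPatterns {
    static final Pattern IDENTIFIER  = Pattern.compile("[a-zA-Z]([a-zA-Z]|[0-9])*");
    static final Pattern INTEGER_LIT = Pattern.compile("[0-9]+");
    static final Pattern FLOAT_LIT   = Pattern.compile("[0-9]+\\.[0-9]+");
    static final Pattern CHAR_LIT    = Pattern.compile("'[ -~]'");

    public static void main(String[] args) {
        System.out.println(IDENTIFIER.matcher("a2i").matches());   // true
        System.out.println(FLOAT_LIT.matcher("5.67").matches());   // true
        System.out.println(INTEGER_LIT.matcher("5.67").matches()); // false
    }
}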

Generators" Input: usually regular expression Output: table (slow), code C/C++: Lex, Flex Java: JLex

Finite State Automata" Set of states: representation graph nodes Input alphabet + unique end symbol State transition function Labelled (using alphabet) arcs in graph Unique start state One or more final states

Deterministic FSA" Defn: A finite state automaton is deterministic if for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol.

A Finite State Automaton for Identifiers
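The figure itself is not reproduced in this transcription; per the Example trace below, it has a start state S, a state I entered on a Letter with a self-loop on Letter or Digit, and a final state F reached on the end symbol. The Java sketch below is ours and simplifies the picture by treating exhaustion of the input as the end symbol, so acceptance is checked at end of input instead of in an explicit state F.

// A minimal sketch of the identifier automaton: S -> I on a letter,
// then a self-loop on I for letters and digits. Accept if the whole
// input is consumed and we halt in state I.
public class IdentifierFSA {
    private enum State { S, I, ERROR }

    public static boolean accepts(String input) {
        State state = State.S;
        for (char c : input.toCharArray()) {
            switch (state) {
                case S:
                    state = isLetter(c) ? State.I : State.ERROR;
                    break;
                case I:
                    state = (isLetter(c) || isDigit(c)) ? State.I : State.ERROR;
                    break;
                default:
                    return false;          // error state: reject immediately
            }
        }
        return state == State.I;
    }

    private static boolean isLetter(char c) {
        return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(accepts("a2i"));  // true
        System.out.println(accepts("2ai"));  // false
    }
}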

Definitions" A configuration on an fsa consists of a state and the remaining input. A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it. If no such arc, then: If no input and state is final, then accept. Otherwise, error.

An input is accepted if, starting with the start state, the automaton consumes all the input and halts in a final state.

Example" (S, a2i$) (I, 2i$) (I, i$) (I, $) (F, ) Thus: (S, a2i$) * (F, )

Some Conventions" Explicit terminator used only for program as a whole, not each token. An unlabeled arc represents any other valid input symbol. Recognition of a token ends in a final state. Recognition of a non-token transitions back to start state.

Recognition of the end symbol (end of file) ends in a final state.
The automaton must be deterministic.
Drop keywords; handle them separately (see the lookup sketch below).
Must consider all sequences with a common prefix together.
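A minimal sketch of the "drop keywords; handle separately" convention, assuming the Clite keyword list from the lexical syntax table: scan every keyword-or-identifier with the same identifier automaton, then look the spelling up in a keyword table. The KeywordFilter class and classify method below are ours, not the textbook's.

import java.util.Set;

// Keyword lookup applied after the identifier automaton has produced a spelling.
public class KeywordFilter {
    private static final Set<String> KEYWORDS = Set.of(
        "bool", "char", "else", "false", "float",
        "if", "int", "main", "true", "while");

    public static String classify(String spelling) {
        return KEYWORDS.contains(spelling) ? "Keyword " + spelling
                                           : "Identifier " + spelling;
    }

    public static void main(String[] args) {
        System.out.println(classify("while"));  // Keyword while
        System.out.println(classify("a2i"));    // Identifier a2i
    }
}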

Lexer Code" Parser calls lexer whenever it needs a new token. Lexer must remember where it left off. Greedy consumption goes 1 character too far peek function pushback function no symbol consumed by start state

From Design to Code" private char ch = ;" public Token next ( ) {" } "do {" " "switch (ch) {" " " "..." " "}" "} while (true);"

Remarks" Loop only exited when a token is found Loop exited via a return statement. Variable ch must be global. Initialized to a space character. Exact nature of a Token irrelevant to design.

Translation Rules" Traversing an arc from A to B: If labeled with x: test ch == x If unlabeled: else/default part of if/switch. If only arc, no test need be performed. Get next character if A is not start state

A node with an arc to itself becomes a do-while loop, whose condition corresponds to whichever arc is labeled.

Otherwise the move is translated to an if/switch:
  Each arc is a separate case.
  The unlabeled arc is the default case.
A sequence of transitions becomes a sequence of translated statements.

A complex diagram is translated by boxing its components so that each box is one node. Translate each box using an outside-in strategy.

private boolean isLetter(char c) {
    return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z';
}

private String concat(String set) {
    StringBuffer r = new StringBuffer("");
    do {
        r.append(ch);
        ch = nextChar();
    } while (set.indexOf(ch) >= 0);
    return r.toString();
}

public Token next( ) {" do { if (isletter(ch) { // ident or keyword" String spelling = concat(letters+digits);" return Token.keyword(spelling);" } else if (isdigit(ch)) { // int or float literal" String number = concat(digits);" if (ch!=. ) " return Token.mkIntLiteral(number);" number += concat(digits);" return Token.mkFloatLiteral(number);

        } else switch (ch) {
            case ' ': case '\t': case '\r': case eolnCh:
                ch = nextChar(); break;
            case eofCh: return Token.eofTok;
            case '+': ch = nextChar();
                return Token.plusTok;
            case '&': check('&'); return Token.andTok;
            case '=': return chkOpt('=', Token.assignTok,
                                    Token.eqeqTok);

Source" " " " " " Tokens" // a first program" // with 2 comments" int main ( ) {" char c;" int i;" c = 'h';" i = c + 3;" } // main! int" main" (" )" {" char" Identifier " ;" "c"