CPSC 434 Lecture 3, Page 1

Similar documents
CS 314 Principles of Programming Languages. Lecture 3

Part 5 Program Analysis Principles and Techniques

CS 314 Principles of Programming Languages

2. Lexical Analysis! Prof. O. Nierstrasz!

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Structure of Programming Languages Lecture 3

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

CSc 453 Lexical Analysis (Scanning)

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Lexical Analysis. Introduction

Introduction to Lexical Analysis

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Academic Formalities. CS Language Translators. Compilers A Sangam. What, When and Why of Compilers

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Dr. D.M. Akbar Hussain

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CS415 Compilers. Lexical Analysis

Lexical Analyzer Scanner

Lexical Analyzer Scanner

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Implementation of Lexical Analysis

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Implementation of Lexical Analysis

Lexical Analysis (ASU Ch 3, Fig 3.1)

Writing a Lexical Analyzer in Haskell (part II)

Chapter 3 Lexical Analysis

Compiler course. Chapter 3 Lexical Analysis

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

CSE 3302 Programming Languages Lecture 2: Syntax

Alternation. Kleene Closure. Definition of Regular Expressions

Lexical Analysis. Chapter 2

Lexical Analysis. Lecture 3-4

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

LECTURE 6 Scanning Part 2

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Lexical Analysis. Lecture 2-4

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Lexical Analysis - 1. A. Overview A.a) Role of Lexical Analyzer

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Lexical Analysis 1 / 52

CSC 467 Lecture 3: Regular Expressions

Crafting a Compiler with C (V) Scanner generator

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

Chapter 3: Lexical Analysis

Lexical Analysis. Lecture 3. January 10, 2018

UNIT II LEXICAL ANALYSIS

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

Chapter 2 :: Programming Language Syntax

Monday, August 26, 13. Scanners

Wednesday, September 3, 14. Scanners

An introduction to Flex

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Lecture 3: CUDA Programming & Compiler Front End

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Figure 2.1: Role of Lexical Analyzer

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Formal Languages and Compilers Lecture VI: Lexical Analysis

Zhizheng Zhang. Southeast University

More Examples. Lex/Flex/JLex

1. Lexical Analysis Phase

Compiler phases. Non-tokens

Lexical Analysis - 2

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Compilers CS S-01 Compiler Basics & Lexical Analysis

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Compiler Construction

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2

Implementation of Lexical Analysis

Introduction to Lexical Analysis

CSE302: Compiler Design

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Problem: Read in characters and group them into tokens (words). Produce a program listing. Do it efficiently.

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Compilers and Interpreters

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

UNIT -2 LEXICAL ANALYSIS

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

CSCI312 Principles of Programming Languages!

Languages and Compilers

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Compiler Construction

Implementation of Lexical Analysis

Compiler Construction

Dixita Kagathara Page 1

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

[Lexical Analysis] Bikash Balami

Lexical Analysis. Implementation: Finite Automata

Transcription:

Front end source code tokens scanner parser il errors Responsibilities: recognize legal procedure report errors produce il preliminary storage map shape the code for the back end Much of front end construction can be automated CPSC 434 Lecture 3, Page 1

Scanner source code tokens scanner parser il errors A scanner must recognize various parts of the language's syntax. Input is separated into tokens based on lexical analysis. x = x + y; becomes <id, x> = <id, x> + <id, y> ; Legal tokens are usually specied by regular expressions (REs). CPSC 434 Lecture 3, Page 2

Regular expressions Regular expressions represent languages. Languages are sets of strings. Operations include Kleene closure, concatenation, and union. Regular expression (a) (a) j (b) (a)(b) (a) (a) + Language f "a" g f "a", "b" g f "ab" g f "", "a", "aa",... g f "a", "aa",... g We assume closure, concatenation, union as the order of precedence. ab j cd = (ab) j (c(d )) = f "ab", "c", "cd", "cdd",... g a(bc) = f "a", "abc", "abcbc",... g CPSC 434 Lecture 3, Page 3

Recognizers From a regular expression, we can construct a deterministic nite automaton (dfa). Recognizer for identier: letter letter digit other S0 S1 S2 digit other accept S3 error identier letter! (a j b j c j ::: j z j A j B j C j ::: j Z) digit! (0 j 1 j 2 j 3 j 4 j 5 j 6 j 7 j 8 j 9) id! letter (letter j digit) CPSC 434 Lecture 3, Page 4

Code for the recognizer char next char(); state S0; /* code for S0 */ done false; token value "" /* empty string */ while( not done ) f g class char class[char]; state next state[class,state]; switch(state) f g case S1: /* building an id */ token value char break; next char(); token value + char; case S2: /* accept state */ token type = identifier; done = true; break; case S3: /* error */ token type = error; done = true; break; return token type; CPSC 434 Lecture 3, Page 5

Tables for the recognizer Two tables control the recognizer. char class: a? z A? Z 0? 9 other value letter letter digit other next state: class S0 S1 S2 S3 letter S1 S1 digit S3 S1 other S3 S2 To change languages, we can just change tables. CPSC 434 Lecture 3, Page 6

Improved eciency Table driven implementation is slow relative to direct code. Each state transition involves: 1. classifying the input character 2. nding the next state 3. an assignment to the state variable 4. a branch 5. a trip through the case statement logic We can do better by \encoding" the state table in the scanner code. 1. classify the input character 2. test character class locally 3. branch directly to next state This takes many fewer instructions per cycle. CPSC 434 Lecture 3, Page 7

Faster scanning token type error; char next char(); class char class[char]; if (class!= letter) goto S4; S1: token value char; char next char(); class char class[char]; if (class == other) goto S3; S2: token value token value + char; char next char(); class char class[char]; if (class!= other) goto S2; S3: token type = identifier; return token type; S4: class char class[char]; return token type; CPSC 434 Lecture 3, Page 8

So what is hard? Language features that can cause problems: reserved words PL/I had no reserved words if then then then = else; else else = then; signicant blanks FORTRAN and Algol68 ignore blanks do 10 i = 1,25 do 10 i = 1.25 string constants special characters in strings newline, tab, quote, comment delimiter nite closures some languages limit identier lengths adds states to count length FORTRAN 66! 6 characters These problems can be swept under the rug (avoided) by intelligent language design. CPSC 434 Lecture 3, Page 9

Lexical errors What is a lexical error? 1234G6 illegal character What should the scanner do? report the error try to correct it? Error correction techniques minimum distance corrections hard token recovery skip until match CPSC 434 Lecture 3, Page 10

How bad can it get? 1 INTEGERFUNCTIONA 2 PARAMETER(A=6,B=2) 3 IMPLICIT CHARACTER*(A-B)(A-B) 4 INTEGER FORMAT(10),IF(10),DO9E1 5 100 FORMAT(4H)=(3) 6 200 FORMAT(4 )=(3) 7 DO9E1=1 8 DO9E1=1,2 9 IF(X)=1 10 IF(X)H=1 11 IF(X)300,200 12 300 CONTINUE 13 END C this is a comment $ FILE(1) 14 END Example due to Dr. F.K. Zadeck of IBM Corporation CPSC 434 Lecture 3, Page 11

Automatic construction Scanner generators automatically construct code from regular expression-like descriptions. construct a dfa use state minimization techniques emit code for the scanner (table driven or direct code ) A key issue in automation is an interface to the parser. lex is a scanner generator supplied with UNIX. emits C code for scanner provides macro denitions for each token (used in the parser) flex is a modern version of lex. CPSC 434 Lecture 3, Page 12