CSCI-GA Compiler Construction Lecture 4: Lexical Analysis I. Hubertus Franke

CSCI-GA.2130-001 Compiler Construction Lecture 4: Lexical Analysis I Hubertus Franke frankeh@cs.nyu.edu

Role of the Lexical Analyzer
- Remove comments and white space (aka scanning)
- Macro expansion
- Read input characters from the source program
- Group them into lexemes
- Produce as output a sequence of tokens
- Interact with the symbol table
- Correlate error messages generated by the compiler with the source program
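Two of these duties, removing comments and white space and correlating error messages with the source, can be sketched together. The sketch below assumes C-style comments and tracks a line number while skipping; the name `skip_ignorable` is hypothetical and not taken from the lecture.

```python
import re

# Whitespace, // line comments, and /* block comments */ (C-style, assumed here).
SKIPPABLE = re.compile(r"\s+|//[^\n]*|/\*.*?\*/", re.DOTALL)

def skip_ignorable(text, pos, line):
    """Advance past whitespace and comments, counting newlines on the way."""
    while True:
        m = SKIPPABLE.match(text, pos)
        if m is None:
            return pos, line
        line += m.group().count("\n")   # keep the line number for error messages
        pos = m.end()

source = "  // note\n/* two\nlines */ x = 1"
pos, line = skip_ignorable(source, 0, 1)
print(f"first lexeme starts at line {line}: {source[pos:]!r}")
```

Keeping the line counter in the scanner is one simple way to let later phases report errors against the original source position.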

Scanner-Parser Interaction

Why Separate Lexical and Syntactic Analysis?
- Simplicity of design
- Improved compiler efficiency: allows us to use specialized techniques for the lexer that are not suitable for the parser
- Higher portability: input-device-specific peculiarities are restricted to the lexer

Some Definitions
- Token: a pair consisting of
  - Token name: abstract symbol representing a lexical unit [affects parsing decisions]
  - Optional attribute value [influences translation after parsing]
- Pattern: a description of the form that the lexemes of a token may take
- Lexeme: a sequence of characters in the source program that matches a pattern
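The three terms can be told apart in a few lines of code. This is a minimal sketch, assuming an identifier pattern written as a Python regular expression; the `Token` type and `ID_PATTERN` name are illustrative, not part of the lecture.

```python
import re
from typing import NamedTuple, Optional

class Token(NamedTuple):
    name: str                  # abstract symbol used by the parser, e.g. "id"
    attribute: Optional[str]   # optional value used by later phases

# Pattern: a description of the form the lexemes of a token may take.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

source = "count = count + 1"
match = ID_PATTERN.match(source)   # try the pattern at the start of the input
lexeme = match.group()             # lexeme: the characters that matched
token = Token("id", lexeme)        # token: name plus attribute value
print(token)
```

Here the parser would only look at `token.name`, while `token.attribute` carries the information needed after parsing.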

Pattern: Token Classes
- One token per keyword
- Tokens for the operators
- One token representing all identifiers
- Tokens representing constants (e.g. numbers)
- Tokens for punctuation symbols
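The token classes above could be written down as one pattern per class. The following is a sketch only: the keyword list, operator set, and punctuation set are assumptions for a small C-like language, and `tokenize` is a hypothetical helper name.

```python
import re

# Hypothetical token classes for a small C-like language (an assumption of this sketch).
TOKEN_SPEC = [
    ("number",  r"\d+"),                          # constants
    ("keyword", r"(?:if|else|while|return)\b"),   # one token per keyword
    ("id",      r"[A-Za-z_]\w*"),                 # one token for all identifiers
    ("op",      r"<=|>=|==|!=|[+\-*/<>=]"),       # operators
    ("punct",   r"[(){};,]"),                     # punctuation symbols
    ("ws",      r"\s+"),                          # skipped, produces no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token name, attribute) pairs for the whole input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"no pattern matches at position {pos}: {text[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "keyword":
            tokens.append((lexeme, None))   # the keyword is its own token name
        elif kind != "ws":
            tokens.append((kind, lexeme))   # token name plus the lexeme as attribute
        pos = m.end()
    return tokens

print(tokenize("if (x <= 10) return y;"))
```

The keyword pattern is listed before the identifier pattern and ends in `\b`, so `if` becomes a keyword token while `iffy` is still recognized as an identifier.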

Example

Dealing With Errors
- The lexical analyzer is unable to proceed: no pattern matches
- Panic mode recovery: delete successive characters from the remaining input until a token is found
- Insert a missing character
- Delete a character
- Replace a character by another
- Transpose two adjacent characters
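Panic mode recovery is the simplest of these strategies to sketch. The example below assumes a tiny token set (numbers, identifiers, a few operators); the function name and the error record format are hypothetical.

```python
import re

# A deliberately small pattern set, assumed only for this illustration.
PATTERN = re.compile(r"(?P<number>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[+\-*/=])|(?P<ws>\s+)")

def tokenize_with_panic_mode(text):
    """Return (tokens, errors); errors lists (position, deleted characters)."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        m = PATTERN.match(text, pos)
        if m is None:
            # Panic mode: delete successive characters until some pattern matches.
            start = pos
            while pos < len(text) and PATTERN.match(text, pos) is None:
                pos += 1
            errors.append((start, text[start:pos]))
            continue
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors

tokens, errors = tokenize_with_panic_mode("x = 3 @# y")
print(tokens)
print(errors)
```

Recording the deleted characters with their position lets the compiler report the error and still continue scanning the rest of the input.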

Example What tokens will be generated from the above C program?

Buffering Issue
- The lexical analyzer may need to look at least one character ahead to make a token decision
- Buffering reduces the overhead required to process a single character
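Both points can be illustrated with a small buffered reader that offers one character of lookahead. This is a sketch under assumptions: the class `CharBuffer`, its block size, and the `=` versus `==` example are illustrative choices, not the buffering scheme used in the lecture.

```python
import io

class CharBuffer:
    """Reads the source in blocks so that each character does not cost a separate read."""

    def __init__(self, stream, block_size=4096):
        self.stream = stream
        self.block_size = block_size
        self.block = ""
        self.pos = 0

    def _fill(self):
        # Reload only when the current block is exhausted.
        if self.pos >= len(self.block):
            self.block = self.stream.read(self.block_size)
            self.pos = 0

    def peek(self):
        """Look at the next character without consuming it ('' at end of input)."""
        self._fill()
        return self.block[self.pos] if self.block else ""

    def advance(self):
        """Consume and return the next character ('' at end of input)."""
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch

def scan_equals(buf):
    """Decide between '=' and '==' using one character of lookahead."""
    if buf.advance() != "=":
        raise ValueError("expected '='")
    if buf.peek() == "=":
        buf.advance()
        return "EQ"
    return "ASSIGN"

buf = CharBuffer(io.StringIO("===x"), block_size=2)
print(scan_equals(buf), scan_equals(buf), repr(buf.peek()))
```

The tiny block size in the usage line is only there to show that lookahead keeps working across a block boundary.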

Token Specification
- We need a formal way to specify patterns: regular expressions
- Alphabet: any finite set of symbols
- String over an alphabet: a finite sequence of symbols drawn from that alphabet
- Language: a countable set of strings over some fixed alphabet

Operations
- Zero or one instance: r? is equivalent to r | ε
- Character class: a | b | c | ... | z can be replaced by [a-z]; a | c | d | h can be replaced by [acdh]
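These shorthands can be checked experimentally. The sketch below compares each shorthand with its long form on a handful of sample strings using Python's `re` module, where an empty alternative plays the role of ε. It is a spot check on samples, not a proof of equivalence, and `same_language` is a hypothetical helper.

```python
import re
import string

def same_language(pattern_a, pattern_b, samples):
    """Check that two patterns accept exactly the same strings among the samples."""
    a, b = re.compile(pattern_a), re.compile(pattern_b)
    return all(bool(a.fullmatch(s)) == bool(b.fullmatch(s)) for s in samples)

samples = ["", "a", "c", "d", "h", "m", "z", "aa", "A", "1"]

# r? versus r | ε (written as an empty alternative)
print(same_language(r"a?", r"(?:a|)", samples))
# [acdh] versus a | c | d | h
print(same_language(r"[acdh]", r"a|c|d|h", samples))
# [a-z] versus a | b | ... | z
print(same_language(r"[a-z]", "|".join(string.ascii_lowercase), samples))
```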

Operations
- Trailing context: exp1/exp2
- Match exp1 only if it is followed by exp2
- exp2 is NOT consumed and remains in the input to be returned in subsequent tokens
- Only one / is permitted per pattern
- Example: a/b matches a in the string ab, but will not match anything in a or ac
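Python's `re` module has no `/` operator, but a lookahead assertion `(?=...)` behaves similarly for this simple case: it requires the following text to match without consuming it. The sketch below uses it as a stand-in for `a/b`; the helper name is hypothetical.

```python
import re

# a(?=b) approximates the trailing-context pattern a/b: match "a" only if "b" follows.
TRAILING = re.compile(r"a(?=b)")

def match_with_trailing_context(text):
    """Return (lexeme, remaining input) or None if the pattern does not match."""
    m = TRAILING.match(text)
    return (m.group(), text[m.end():]) if m else None

for text in ["ab", "a", "ac"]:
    print(repr(text), "->", match_with_trailing_context(text))
```

In the matching case the remaining input still starts with `b`, which shows that the trailing context was checked but not consumed.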

Examples
Which language is generated by each of the following?
- (a | b)(a | b)
- a*
- (a | b)*
- a | a*b
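One way to explore such questions is to enumerate short strings over {a, b} and keep those that a pattern accepts. The sketch below reads the expressions in the exercise as (a|b)(a|b), a*, (a|b)* and a|a*b, which is an assumption about the original slide; `language` is a hypothetical helper that lists only strings up to a given length.

```python
import re
from itertools import product

def language(pattern, max_len=3):
    """List the strings over {a, b} of length <= max_len accepted by the pattern."""
    regex = re.compile(pattern)
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return [w for w in words if regex.fullmatch(w)]

for pattern in ["(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]:
    print(pattern, "->", language(pattern))
```

The listing is finite by construction, so it only suggests the shape of each language: all strings of length two, all strings of a's, all strings over {a, b}, and the string a together with zero or more a's followed by one b.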

Example
How can we represent a number that is an integer with optional fractional and exponential parts? Let's analyze some possible solutions.
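One candidate solution follows the usual textbook shape: digits, then an optional fraction, then an optional exponent. The sketch below is one possible answer, not necessarily the one developed in the lecture; the uppercase `E` marker and the requirement of digits on both sides of the decimal point are assumptions.

```python
import re

# digits (. digits)? (E (+|-)? digits)?
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

def is_number(text):
    """True if the whole string is a number under the pattern above."""
    return NUMBER.fullmatch(text) is not None

for text in ["42", "3.14", "6.02E23", "1E-5", "3.", ".5", "E10"]:
    print(repr(text), is_number(text))
```

Under these assumptions `3.` and `.5` are rejected; allowing them would need a different pattern, which is the kind of trade-off worth discussing when comparing solutions.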

Examples
Write a regular definition for all strings of lowercase letters in which the letters are in ascending order.
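A possible answer is a* b* ... z*, which can be built programmatically and tested. This sketch assumes that "ascending" allows repeated letters and the empty string; if each letter may appear at most once, `?` would replace `*`.

```python
import re
import string

# a*b*c*...z*: each letter may repeat, but letters never go backwards.
ASCENDING = re.compile("".join(letter + "*" for letter in string.ascii_lowercase))

def is_ascending(text):
    """True if the whole string consists of lowercase letters in ascending order."""
    return ASCENDING.fullmatch(text) is not None

for text in ["abc", "aabbz", "ba", "abca"]:
    print(repr(text), is_ascending(text))
```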

Token Recognition

Implementation: Transition Diagrams
- An intermediate step in constructing a lexical analyzer
- Convert patterns into flowcharts called transition diagrams
- Nodes or circles: called states
- Edges: directed from one state to another, labeled by symbols

- Initial state
- Accepting or final state
- Actions associated with the final state
- * (on an accepting state) means: retract the forward pointer
- Examine the symbol table for the lexeme found; return whatever token name is there
- Place the ID in the symbol table if it is not there; return a pointer to the symbol table entry

Reserved Words and Identifiers
- Install the reserved words in the symbol table initially
OR
- Create a transition diagram for each keyword, then prioritize the tokens so that keywords have higher preference
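The first option can be sketched with a dictionary that is preloaded with the reserved words. The class and method names (`SymbolTable`, `install_id`, `get_token`) and the keyword list are assumptions made for this illustration.

```python
class SymbolTable:
    def __init__(self, reserved_words):
        # Install the reserved words up front; each keyword is its own token name.
        self.entries = {word: {"lexeme": word, "token": word} for word in reserved_words}

    def install_id(self, lexeme):
        """Place the lexeme in the table as an id if it is not there; return its entry."""
        return self.entries.setdefault(lexeme, {"lexeme": lexeme, "token": "id"})

    def get_token(self, lexeme):
        """Return whatever token name the table holds for an installed lexeme."""
        return self.entries[lexeme]["token"]

table = SymbolTable(["if", "then", "else"])
for lexeme in ["if", "count", "else", "count"]:
    table.install_id(lexeme)
    print(lexeme, "->", table.get_token(lexeme))
```

Because the keywords are already present, the scanner can use a single identifier pattern and let the table lookup decide whether a lexeme is a keyword or an ordinary identifier.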

Implementation of Transition Diagram
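A transition diagram can be simulated with a state variable and a forward pointer. The sketch below does this for relational operators; the state numbers, the token and attribute names, and the use of `<>` for "not equal" are assumptions, and the function is not the lecture's own code.

```python
def get_relop(text, start):
    """Simulate a transition diagram for <, <=, <>, =, >, >=.

    Returns (token name, attribute, next position), or None if no relop starts here.
    """
    state = 0
    forward = start
    while True:
        ch = text[forward] if forward < len(text) else ""   # "" stands for end of input
        forward += 1                  # following an edge consumes the character read
        if state == 0:
            if ch == "<":
                state = 1
            elif ch == "=":
                return ("relop", "EQ", forward)
            elif ch == ">":
                state = 6
            else:
                return None           # fail: this diagram does not apply
        elif state == 1:
            if ch == "=":
                return ("relop", "LE", forward)
            if ch == ">":
                return ("relop", "NE", forward)
            forward -= 1              # accepting state reached on "other": retract
            return ("relop", "LT", forward)
        elif state == 6:
            if ch == "=":
                return ("relop", "GE", forward)
            forward -= 1              # retract the forward pointer
            return ("relop", "GT", forward)

for text in ["<=x", "<x", "<>", ">", "a"]:
    print(repr(text), "->", get_relop(text, 0))
```

The retract step is what allows `<` to be accepted only after the scanner has seen that the next character is neither `=` nor `>`, while leaving that character for the next token.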

Using All Transition Diagrams: The Big Picture
- Arrange for the transition diagrams for each token to be tried sequentially
- Run the transition diagrams in parallel
- Combine all transition diagrams into one
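A rough approximation of running the diagrams side by side is to try every pattern at the current position and keep the longest match, letting earlier rules win ties. In the sketch below regular expressions stand in for the individual transition diagrams; the rule list and the name `next_token` are assumptions.

```python
import re

# Each regex stands in for one transition diagram; earlier rules win ties.
RULES = [
    ("if",     re.compile(r"if")),
    ("id",     re.compile(r"[A-Za-z_]\w*")),
    ("number", re.compile(r"\d+")),
    ("relop",  re.compile(r"<=|>=|<>|<|>|=")),
]

def next_token(text, pos):
    """Return (token name, lexeme, next position) for the longest match, or None."""
    best = None
    for name, regex in RULES:
        m = regex.match(text, pos)
        # Strictly longer matches replace the current best, so ties keep the earlier rule.
        if m and (best is None or m.end() > best[2]):
            best = (name, m.group(), m.end())
    return best

for text in ["if x", "iffy", "<=1", "@"]:
    print(repr(text), "->", next_token(text, 0))
```

The tie-breaking rule is what makes `if` a keyword, while the longest-match rule is what makes `iffy` an identifier rather than a keyword followed by `fy`.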

The First Part of the Project
Structure of a Lex program:
declarations
%%
translation rules
%%
auxiliary functions
- Anything between the %{ and %} marks in the declarations section is copied as is into lex.yy.c
- A name in braces in a pattern means that the pattern is defined elsewhere, in the declarations section
- Each translation rule has the form: pattern { actions }

Lecture of Today
- Sections 3.1 to 3.5
- First part of the project assigned