CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]


What is Lexical Analysis?

Lexical analysis is the first step of a compiler. It reads (scans) the characters of the program and groups them into tokens.
Tokens are of the form <token-name, attribute-value> or <token-name>.
Lexeme: the actual sequence of characters in the program that matches the pattern of a token, i.e. an instance of that token.

Example 1: a=b+c becomes <id,1> <=> <id,2> <+> <id,3>
5 lexemes (a, =, b, +, c), 5 tokens.
Identifiers are stored in a symbol table; the attribute value of an id token (1, 2, 3 above) points to its entry in that table.

What is Lexical Analysis? (continued)

Example 2: Position = initial + rate * 60 becomes <id,1> <=> <id,2> <+> <id,3> <*> <60>
White space is deleted during lexical analysis, so the token stream contains no white space.

How to identify tokens?

By finding patterns, because different kinds of tokens have different patterns. A pattern describes the forms that the lexemes of a token can take.
Examples of patterns:
Pattern 1: keywords: if, else, ...
Pattern 2: operators: +, -, %, *, >=, <>, ==, <=, ...
Pattern 3: variables: a, xyz, a_b, p, q2, _e, _001abc, ...
Pattern 4: numbers: 23, 3.45, -7, 0, ...
Other patterns ...

Example: if (speed > 130) System.out.println("Ticket 300 SAR");
This Java code has many token patterns: if, (, speed, >, 130, ), System, ., out, ., println, (, "Ticket 300 SAR", ), ;
(In Java, System, out and println are identifiers, not keywords.)
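The grouping of characters into tokens described above can be sketched in C. This is only a minimal illustration, not the lecture's own code: the names tokenize, intern and emit, the table sizes, and the rule that every other character is a one-character operator are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 32   /* assumed capacity of the identifier table */
#define MAX_LEN 32   /* assumed maximum lexeme length, including '\0' */

/* Hypothetical symbol table: identifier names in order of first appearance. */
static char ids[MAX_IDS][MAX_LEN];
static int id_count = 0;

/* Return the 1-based table index of name, adding it if it is new. */
static int intern(const char *name) {
    for (int i = 0; i < id_count; i++)
        if (strcmp(ids[i], name) == 0)
            return i + 1;
    assert(id_count < MAX_IDS);
    strcpy(ids[id_count], name);
    return ++id_count;
}

/* Append one token to out, separated from the previous token by a space. */
static void emit(char *out, size_t out_size, const char *token) {
    size_t used = strlen(out);
    assert(used + strlen(token) + 2 <= out_size);
    if (used > 0)
        strcat(out, " ");
    strcat(out, token);
}

/* Scan src and write its token stream into out. */
void tokenize(const char *src, char *out, size_t out_size) {
    assert(out_size > 0);
    out[0] = '\0';
    id_count = 0;                      /* start with an empty symbol table */
    size_t i = 0;
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        char lexeme[MAX_LEN];
        char token[MAX_LEN + 16];
        size_t n = 0;

        if (isspace(c)) {              /* white space is dropped */
            i++;
        } else if (isalpha(c) || c == '_') {   /* identifier */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') {
                assert(n < MAX_LEN - 1);
                lexeme[n++] = src[i++];
            }
            lexeme[n] = '\0';
            snprintf(token, sizeof token, "<id,%d>", intern(lexeme));
            emit(out, out_size, token);
        } else if (isdigit(c)) {       /* unsigned integer literal */
            while (isdigit((unsigned char)src[i])) {
                assert(n < MAX_LEN - 1);
                lexeme[n++] = src[i++];
            }
            lexeme[n] = '\0';
            snprintf(token, sizeof token, "<%s>", lexeme);
            emit(out, out_size, token);
        } else {                       /* assumed: any other character is an operator */
            snprintf(token, sizeof token, "<%c>", c);
            emit(out, out_size, token);
            i++;
        }
    }
}
```

Called on the string "Position = initial + rate * 60" with a 128-byte buffer, tokenize leaves <id,1> <=> <id,2> <+> <id,3> <*> <60> in the buffer. Keywords, multi-character operators and string literals are deliberately left out to keep the sketch short.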

Regular Expressions

How to identify token patterns? Answer: by regular expressions.
Regular expression [we already learned this in CS 301: Theory of Computation]:
- ε is a regular expression denoting {ε}, the set that contains only the empty string.
- Every symbol a is a regular expression denoting {a}.
- If r1 and r2 are two regular expressions, then:
  r1* denotes zero or more occurrences of r1
  r1+ denotes one or more occurrences of r1
  r1 r2 denotes concatenation
  r1 | r2 denotes either r1 or r2

Example: regular expression for integers
Suppose that, in a programming language, integers are like these: 2, 999, -50, +34, -00023, +0, -0, +000
So the regular expression for them is:
integer → (+ | - | ε) (0 | 1 | 2 | ... | 9)+

Example: regular expression for decimals
Suppose that decimals are like these: 0.0, 003.922, +4.001, -44.000
So the regular expression for them is:
decimal → integer . (0 | 1 | 2 | ... | 9)+

Regular Expressions (continued)

Example: regular expression for identifiers of the C language
Identifiers (the names of variables, functions, and so on) in C are like these (similar to Java): a, a_1, a2, _, _p, y98, Masud_Hasan, ...
Identifiers are used in statements like this: a = a + 1; _p = b*a_1;
But -55, 23masud, 0, 2 are not identifiers.
An identifier must start with a letter or _, after which a letter, _, or digit can repeat any number of times.
So the regular expression for an identifier is:
id → (letter | _) (letter | _ | digit)*
But we also need to say what letter and digit are. The complete regular definition is:
letter → A | B | C | ... | Z | a | b | c | ... | z
digit → 0 | 1 | 2 | ... | 9
id → (letter | _) (letter | _ | digit)*

Example: regular expression for white space
ws → (blank | tab | newline)+
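The regular definitions for id, integer and decimal can be checked by hand-written matchers. The following is a sketch under stated assumptions: the function names are invented for illustration, and "letter" is taken to mean whatever isalpha accepts in the default C locale (A to Z and a to z).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* letter | _ */
static bool is_letter_or_underscore(char c) {
    return isalpha((unsigned char)c) || c == '_';
}

/* id -> (letter | _) (letter | _ | digit)* */
bool is_identifier(const char *s) {
    if (!is_letter_or_underscore(s[0]))
        return false;                     /* also rejects the empty string */
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!is_letter_or_underscore(s[i]) && !isdigit((unsigned char)s[i]))
            return false;
    return true;
}

/* Match (+ | - | epsilon) digit+ at the start of s.
   Returns a pointer just past the match, or NULL if there is no match. */
static const char *match_integer(const char *s) {
    if (*s == '+' || *s == '-')
        s++;                              /* optional sign */
    if (!isdigit((unsigned char)*s))
        return NULL;                      /* at least one digit is required */
    while (isdigit((unsigned char)*s))
        s++;
    return s;
}

/* integer -> (+ | - | epsilon) digit+ */
bool is_integer(const char *s) {
    const char *end = match_integer(s);
    return end != NULL && *end == '\0';
}

/* decimal -> integer . digit+ */
bool is_decimal(const char *s) {
    const char *p = match_integer(s);
    if (p == NULL || *p != '.')
        return false;
    p++;                                  /* skip the decimal point */
    if (!isdigit((unsigned char)*p))
        return false;
    while (isdigit((unsigned char)*p))
        p++;
    return *p == '\0';
}
```

With these definitions, is_identifier accepts a_1 and Masud_Hasan but rejects 23masud, is_integer accepts -00023 and +0, and is_decimal accepts 003.922 but rejects 12 because the fractional part is mandatory in the definition above.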

Transition Diagrams

Transition diagrams: an intermediate step after the regular expression. They are pictorial, like a graph, and give an easier way to understand patterns. A transition diagram is an accepter (it says yes if the input matches).

Example: transition diagram for relational operators (relop)
Relational operators (relop) in Pascal are: =, <, >, <>, <=, >=
[Figure: transition diagram for relop]
- Circle means state
- Double circle means accepting state
Equivalent regular expression:
relop → < | > | = | <> | <= | >=

Finite Automata (learned in CS 301)

Finite automata: they are graphs, like transition diagrams, but they are recognizers, which means they say yes or no.
If the string is finished and the automaton is in a final state, then YES.
If the string is finished and the state is not final, or the automaton cannot move on before the string is finished (even from a final state), then NO.
Two types: NFA and DFA. They are equivalent in power: for any regular expression there are an equivalent NFA and an equivalent DFA.
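A transition diagram translates almost directly into a switch over the current state. The sketch below does this for the Pascal relational operators. The state numbering, the Relop enum and the function name scan_relop are assumptions, since the slide's diagram is not reproduced here.

```c
/* Token kinds for the Pascal relational operators (names are illustrative). */
typedef enum {
    RELOP_NONE,  /* no relational operator at this position */
    RELOP_EQ,    /* =  */
    RELOP_LT,    /* <  */
    RELOP_GT,    /* >  */
    RELOP_NE,    /* <> */
    RELOP_LE,    /* <= */
    RELOP_GE     /* >= */
} Relop;

/* Recognize the longest relational operator at the start of s.
   Stores the lexeme length (0, 1 or 2) in *length. */
Relop scan_relop(const char *s, int *length) {
    int state = 0;   /* 0: start, 1: seen '<', 2: seen '>' */
    int i = 0;
    for (;;) {
        char c = s[i];
        switch (state) {
        case 0:
            if (c == '<') { state = 1; i++; }
            else if (c == '>') { state = 2; i++; }
            else if (c == '=') { *length = 1; return RELOP_EQ; }
            else { *length = 0; return RELOP_NONE; }
            break;
        case 1:      /* after '<': look one character ahead */
            if (c == '=') { *length = 2; return RELOP_LE; }
            if (c == '>') { *length = 2; return RELOP_NE; }
            *length = 1;          /* the next character is not consumed */
            return RELOP_LT;
        case 2:      /* after '>' */
            if (c == '=') { *length = 2; return RELOP_GE; }
            *length = 1;
            return RELOP_GT;
        }
    }
}
```

The returns in states 1 and 2 that report length 1 correspond to accepting states that give the lookahead character back to the input: for "<5" the operator is < and the 5 is left for the next token.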

NFA

Example: NFA for (a | b)*abb
[Figure: NFA for (a | b)*abb]
Try some examples for this NFA:
YES: aaaabb, ababababb, abb, ...
NO: a, bb, b, abab, abbbb, ...

NFA (continued)

Example: NFA for aa* | bb*
[Figure: NFA for aa* | bb*, with ε-transitions]
ε means a transition that consumes no input. You can add ε anywhere in a string, any number of times.
Try some examples for this NFA:
YES: b, a, aa, bb, aaa, bbb, ... For example, εa = a, so YES.
NO: ab, bba, ba, abab, ...
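An NFA can be in several states at once, so a program that simulates it tracks a set of states. The sketch below assumes the usual four-state NFA for (a | b)*abb (state 0 loops on a and b, then a, b, b lead through states 1, 2 and 3, and state 3 is accepting). The slide's own diagram is not reproduced here, so this numbering is an assumption, as is the function name.

```c
#include <stdbool.h>

/* Simulate the assumed NFA for (a | b)*abb.
   Bit k of a state set is 1 when the NFA may currently be in state k. */
bool nfa_accepts(const char *s) {
    unsigned current = 1u << 0;              /* start in the set {0} */
    for (; *s != '\0'; s++) {
        unsigned next = 0;
        if (*s != 'a' && *s != 'b')
            return false;                    /* symbol outside the alphabet */
        if (current & (1u << 0)) {
            next |= 1u << 0;                 /* 0 --a,b--> 0 */
            if (*s == 'a')
                next |= 1u << 1;             /* 0 --a--> 1 (the nondeterministic choice) */
        }
        if ((current & (1u << 1)) && *s == 'b')
            next |= 1u << 2;                 /* 1 --b--> 2 */
        if ((current & (1u << 2)) && *s == 'b')
            next |= 1u << 3;                 /* 2 --b--> 3 */
        current = next;
    }
    return (current & (1u << 3)) != 0;       /* YES only if state 3 is reachable at the end */
}
```

On abbbb, for instance, the state set is {0,1}, {0,2}, {0,3}, {0}, {0} after the successive symbols, so the answer is NO even though state 3 was visited along the way.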

DFA

Example: NFA for (a | b)*abb (from previous slide)
[Figure: NFA for (a | b)*abb]
Equivalent DFA for (a | b)*abb
[Figure: DFA for (a | b)*abb]
NFAs are easier to construct by hand. DFAs are harder to construct.
Try some examples for this DFA:
YES: aaaabb, ababababb, abb, ...
NO: a, bb, b, abab, ...

Write Program for Lexical Analyzer

The lexical analyzer is itself a program. In fact, the whole compiler is a program.
However, this program must be written in a language that already exists.
For example, if you want to write a new programming language D now, in the year 2016, then its lexical analyzer (and the other parts of the compiler of D) must be written in a language that is available now, such as C, C++, Java, Python, etc.
The algorithm of the lexical analyzer program is based on the NFA or DFA.
The program scans the input program (the D language program), identifies the tokens, and puts them in the symbol table.
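A DFA has exactly one move per state and input symbol, so it can be run from a transition table. The sketch below assumes the standard four-state DFA for (a | b)*abb. The slide's diagram is not reproduced here, so the state numbers, the table and the function name are assumptions.

```c
#include <stdbool.h>

/* Run the assumed DFA for (a | b)*abb.
   State k means "the last k symbols read match the first k symbols of abb". */
bool dfa_accepts(const char *s) {
    /* delta[state][0] is the move on 'a', delta[state][1] the move on 'b'. */
    static const int delta[4][2] = {
        {1, 0},   /* state 0 */
        {1, 2},   /* state 1: seen a   */
        {1, 3},   /* state 2: seen ab  */
        {1, 0}    /* state 3: seen abb (accepting) */
    };
    int state = 0;
    for (; *s != '\0'; s++) {
        if (*s != 'a' && *s != 'b')
            return false;                 /* symbol outside the alphabet */
        state = delta[state][*s == 'b'];  /* exactly one next state */
    }
    return state == 3;
}
```

Unlike an NFA simulation, the loop keeps a single integer rather than a set of states, which is why generated lexers are usually driven by a DFA table.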

Overall Picture of Lexical Analyzer of a Language like C, C++, Java, ...

- Identify all possible patterns in the language: letter, digit, number, keywords, arithmetic operators, logical operators, ...
- For each of them, write a regular expression.
- For each pattern, construct an NFA or DFA. Draw transition diagrams for convenience and as an intermediate step. Combine them in one diagram.
- Write the program for the lexical analyzer accordingly, in a different, already existing language, to identify the tokens.
- Take the input program (C or C++ or Java ...) as a string.
- Run the lexical analyzer program, identify the tokens, and store them in symbol tables.

Example: Overall Picture of Lexical Analyzer of a Language

- Suppose that we want to write a new programming language D. It has only two keywords: KSA and 123. We want to write its compiler in C. As a first step, we write only the lexical analyzer.
- The two regular expressions are: KSA and 123.
- Combined regular expression is: KSA | 123.
- NFA: [Figure: separate NFAs for KSA and for 123, then the combined NFA. For KSA the path is Start, K, state 1, S, state 2, A, state 3.]

Example: Overall Picture of Lexical Analyzer of a Language (continued)

- Suppose that the input D program is stored in the string input.
- Following is a sample C code for the lexical analyzer of D:

    /* Recognize one keyword at the start of input. */
    if (input[0] == 'K' && input[1] == 'S' && input[2] == 'A')
        symbol_table[0] = "KSA";
    else if (input[0] == '1' && input[1] == '2' && input[2] == '3')
        symbol_table[0] = "123";

This sample looks only at the first three characters; a complete analyzer repeats the same test at each following position of the input.

Example 1: my_program_1.d contains KSA123123
Symbol table:
0  KSA
1  123
2  123

Example 2: my_program_2.d contains KSS
Symbol table: empty
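To handle an input such as KSA123123, the keyword test has to be repeated along the whole string. The following is a minimal sketch of that loop, not the lecture's own code. The function name lex_d, the capacity parameter, and the decision to report an empty table as soon as any part of the input is not a keyword are assumptions chosen to match the two examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split input into the keywords KSA and 123 of the toy language D.
   Stores the tokens in symbol_table and returns how many were found.
   Assumption: if any part of the input is not a keyword, the whole
   program is rejected and 0 is returned (symbol table treated as empty). */
int lex_d(const char *input, const char *symbol_table[], int capacity) {
    int count = 0;
    size_t pos = 0;
    while (input[pos] != '\0') {
        const char *token = NULL;
        /* strncmp stops at the terminating '\0', so short tails are safe. */
        if (strncmp(input + pos, "KSA", 3) == 0)
            token = "KSA";
        else if (strncmp(input + pos, "123", 3) == 0)
            token = "123";
        if (token == NULL)
            return 0;                    /* no pattern matches here */
        assert(count < capacity);
        symbol_table[count++] = token;
        pos += 3;                        /* both keywords are 3 characters long */
    }
    return count;
}
```

With an eight-entry table, lex_d on KSA123123 returns 3 and fills entries 0 to 2 with KSA, 123 and 123, while lex_d on KSS returns 0. A real lexical analyzer would more likely report the position of the error than discard the tokens already found.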