Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Similar documents
2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively

Lexical Analysis (ASU Ch 3, Fig 3.1)

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Front End: Lexical Analysis. The Structure of a Compiler

Zhizheng Zhang. Southeast University

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Chapter 3 Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis

CS308 Compiler Principles Lexical Analyzer Li Jiang

ECS 120 Lesson 7 Regular Expressions, Pt. 1

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

1. Lexical Analysis Phase

Figure 2.1: Role of Lexical Analyzer

Compiler course. Chapter 3 Lexical Analysis

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Lexical Analyzer Scanner

UNIT -2 LEXICAL ANALYSIS

Lexical Analyzer Scanner

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Lexical Analysis - 1. A. Overview A.a) Role of Lexical Analyzer

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Lexical Analysis 1 / 52

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

CSc 453 Lexical Analysis (Scanning)

CSE302: Compiler Design

Lexical and Syntax Analysis

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z)

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by

CSCI-GA Compiler Construction Lecture 4: Lexical Analysis I. Hubertus Franke

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Analysis. Prof. James L. Frankel Harvard University

CS6660 COMPILER DESIGN L T P C

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Dr. D.M. Akbar Hussain

Assignment 1 (Lexical Analyzer)

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Structure of Programming Languages Lecture 3

The Language for Specifying Lexical Analyzer

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Chapter 3: Lexical Analysis

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Assignment 1 (Lexical Analyzer)

Compiler Construction LECTURE # 3

Compilers CS S-01 Compiler Basics & Lexical Analysis

CSC 467 Lecture 3: Regular Expressions

UNIT I- LEXICAL ANALYSIS. 1.Interpreter: It is one of the translators that translate high level language to low level language.

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science

CS 403: Scanning and Parsing

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

CS 314 Principles of Programming Languages. Lecture 3

Slides for Faculty Oxford University Press All rights reserved.

Lexical Analysis. Introduction

Compiler Construction

UNIT II LEXICAL ANALYSIS

Introduction to Lexical Analysis

Lexical Analysis. Chapter 2

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

2. Lexical Analysis! Prof. O. Nierstrasz!

CS 314 Principles of Programming Languages

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Projects for Compilers

Buffering Techniques: Buffer Pairs and Sentinels

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Lexical and Syntax Analysis

Implementation of Lexical Analysis

Group A Assignment 3(2)

Edited by Himanshu Mittal. Lexical Analysis Phase

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

Lexical Analysis. Lecture 3. January 10, 2018

CPS 506 Comparative Programming Languages. Syntax Specification

Chapter Seven: Regular Expressions

Lexical Analysis. Lecture 2-4

CMSC 132: Object-Oriented Programming II

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Introduction. Introduction. Introduction. Lexical Analysis. Lexical Analysis 4/2/2019. Chapter 4. Lexical and Syntax Analysis.

CSE 401 Midterm Exam Sample Solution 2/11/15

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

2.2 Syntax Definition

Week 2: Syntax Specification, Grammars

Notes for Comp 454 Week 2

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Compilers CS S-01 Compiler Basics & Lexical Analysis

Proof Techniques Alphabets, Strings, and Languages. Foundations of Computer Science Theory

Lexical Analysis. Finite Automata

Transcription:

Sukree Sinthupinyo 1 1 Department of Computer Engineering Chulalongkorn University 14 July 2012

Outline Introduction 1 Introduction 2 3 4 Transition Diagrams

Learning Objectives Understand definition of lexeme, token, etc. Know a method which transforms string into token Know syntax of regular expression Know concept of transition diagram and code implemented from the diagram

First step Introduction The main task is to read the input characters of the source program and export a sequence of tokens. It also interacts with the symbol as well.

First step Introduction The lexical analyzer must Strip out comments and whitespace. Correlate error messages generated by the compiler with the source program

Tokens, Patterns, and Lexemes A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit. A pattern is a description of the form that the lexemes of a token may take. For the keyword, the pattern is just the sequence of characters that form the keyword. For identifiers and some other tokens, the pattern is a more complex structure. A lexeme is a sequence of characters in the source program that matches the pattern for a token.

Tokens, Patterns, and Lexemes printf("total = %d\n", score); printf and score are lexemes matching the pattern for token id "Total = %d\n" is a lexeme matching literal

Examples of tokens Token Informal Description Sample Lexemes if characters i, f if else characters e, l, s, e else comparison < or > or <= or >= or == or!= <=,!= id letter followed by letters and digits pi,score,d2 number any numeric constant 3.14,6.02e23 literal anything but ", surrounded by " s "core"

General concept of tokens in many programming language One token for each keyword. The pattern for a keyword is the same as the keyword itself. Tokens for the operators One token representing all identifiers One or more tokens representing constants, such as numbers and literal strings. Tokens for each punctuation symbol, such as left and right parentheses, comma, and semi colon.

Attributes for Tokens Token must have an attribute associated with. For example, an id must associate with information about identifier; e.g., its lexeme, its type, and the location at which it is first found, is kept in the symbol table.

An Example of Attributes for Tokens E = M * C ** 2 <id, pointer to symbol-table entry for E> <assign_op> <id, pointer to symbol-table entry for M> <mult_op> <id, pointer to symbol-table entry for C> <exp_op> <number, integer value 2 >

String and Language A string over an alphabet is a finite sequence of symbols drawn from that alphabet. The length of string s is usually written s. The empty string is denoted ɛ. A language is any countable set of strings over some fixed alphabet. Concatenation of string x and y is the string formed by appending y to x. For example, if x = dog and y = house, then xy = doghouse. If we think of concatenation as a product, we can define the "exponentiation" of strings as follows. Define s 0 to be ɛ, and for all i > 0, define s i to be s i 1 s. Since ɛs = s, it follows that s i = s. Then s 2 = ss,s 3 = sss, and so on.

Operations on Languages

Example Introduction Let L be the set of letters A,B,...,Z,a,b,...,z. D be the set of digits 0,1,...,9. L D is the set of letters and digits with 62 strings of length one. LD is the set of 520 strings of length two. L 4 is the set of all 4-letter strings. L is the set of all strings of letter, including ɛ. L(L D) is the set of all strings of letters and digits beginning with a letter. D + is the set of all strings of one or more digits.

Outline Introduction 1 Introduction 2 3 4 Transition Diagrams

If we want to describe the set of valid C identifiers, we can use the language L(L D) with the underscore included among the letters. If letter_ denotes any letter of the underscore, and digit stands for any digit, then we could describe the language of C identifiers by: letter_(letter_ digit) where denotes union, the parentheses are used to group subexpressions.

Language L(r) is defined recursively from the languages denoted by r s subexpressions using alphabet set. BASIS: There are two rules that form the basis: 1 ɛ is a regular expression, and L(ɛ) is {ɛ}, that is, the language whose sole member is the empty string. 2 If a is a symbol in, the a is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with a in its one position.

INDUCTION: The are four parts to the induction whereby larger expressions are built from the smaller one. Suppose r and s are regular expression denoting languages L(r) and L(s), respectively. 1 (r) (s) denotes L(r) L(s). 2 (r)(s) denotes L(r)L(s). 3 (r) denotes L(r)). 4 (r) denotes L(r). The precedence of operator is, concatenation, and. So (a) ((b) (c)) can be written as a b c

Example Let = {a, b} a b denotes the language {a, b} (a b)(a b) denotes {aa, ab, ba, bb} a denotes {a, aa, aaa,... }. (a b) denotes {ɛ, a, b, aa, ab, ba, bb, aaa,...} a a b denotes {a, b, ab, aab, aaab,...}

Definitions Introduction Regular definition is a sequence of the form d 1 r 1 d 2 r 2... d n r n

Regular Definition Example C identifiers are strings of letters, digits, and underscore. letter_ A B... Z a b... z _ digit 0 1... 9 id letter_(letter_ digit)

Extensions of +: One or more instances?: Zero or one instances [a 1 a 2... a n ]: a 1 a 2... a n or a 1 a n letter_ [A Za z_] digit [0 9] id letter_(letter_ digit)

Example Introduction Transition Diagrams

Example Introduction Transition Diagrams

Transition Diagrams Tokens, Patterns, and Attribute Values

Outline Introduction Transition Diagrams 1 Introduction 2 3 4 Transition Diagrams

Transition Diagram for relop Transition Diagrams

Example Code for relop Transition Diagrams