Compiler Design. Lexical Analysis

Size: px
Start display at page:

Download "Compiler Design. Lexical Analysis"

Transcription

1 Compiler Design Lexical Analysis

2 What it Lexical Analysis It is the phase where the compiler reads the text from device source program lexical analyzer token get next token parser symbol table Reading has to be character by character, but buffering is helps

3 What it Lexical Analysis It detects valid tokens, which are equivalent to words in a text For example, from the following code segment, If xval <= yval then result := yval The lexical analyzer will detect the following unbreakable meaning ful components if, xval, <=, yval, then, result, :=, yval

4 What it Lexical Analysis It will check if these meaningful components, which are like words, are valid from the point of view of the given language. If a components is valid according to the language, then Lexical Analyser will determine its type. Or Token. In the example, tokens are identified as follows: if == keyword <= == logical operator xval == identifier := == assignment operator yval == identifier result == identifier

5 What it Lexical Analysis Question: What is a valid token? Answer: There will be set rules to define valid tokens. For example, rule an identifier is usually like starts with alphabet and followed by a combination of alphabet and digit or nothing So, xval is a valid identifier. But, if we write 9xval, it won t be a valid identifier. In fact, generally, it will be valid nothing. No keyword or operator or numeric value.

6 Valid Tokens Valid operators : fixed strings =, <, <=, >, >= Valid numeric: there is a pattern Starts with digit Followed by digit There may be a decimal Digits after decimal E E-5 Then there may be an E (exponent) followed by signed or unsigned integer

7 File Reading is overhead Most expensive phase of Compiler because it reads the text from device extensive Input operation Though it processes the input character by character for matching set patterns Better to read a block (say 1024 bytes) at a time and place it in a buffer. Afterwards process from buffer. Why? Every read involves a system call, therefore context switch. It is more time saving to have one system call per 1024 characters than one per character.

8 File Reading is overhead So, the following happen in a loop for each character. 1. Reads a character from the disk (System Call) 2. Compares with a pattern (User mode) 3. changes state (User Mode) 4. Go to 1 So, we see that for every input character read there is a system call. It means context changes from user to system. After the read, the context is changed from system to user again.

9 What it Lexical Analysis Context Change being an overhead, for every input character we are incurring 2 such overheads. If we could read, say 1024 characters at once, through a single system call, it will save context switch time by 1024 times.

10 Lexical Analyzer tasks: Loop 1 to 2 until end of file 1. Read a character from disk THIS IS A SYSTEM CALL CPU changes context to OS, thus saving PCB of user process 2. Match pattern Here the CPU goes back to the user mode, changing context, saving PCB of OS and retrieving PCB of the user process So, if there is a file of 10,000 characters, context change will take place 20,000 times.

11 User Process OS Pattern match Pattern match Pattern match Context switch Context switch Context switch Context switch Context switch Context switch Read a character Read a character Read a character Instead, if we could 2048 context switches for 1024 characters!!!!

12 User Process OS Context switch Read 1024 Read 1024 characters characters Only 2 context switches instead of 2048 Pattern match Context switch

13 How it happens using a buffer Base n newval5=oldval*12 Forward

14 How it happens using a buffer e Base n newval=oldval*12 Forward

15 How it happens using a buffer ew Base n newval=oldval*12 Forward

16 How it happens using a buffer ewv Base n newval=oldval*12 Forward

17 How it happens using a buffer ewva Base n newval=oldval*12 Forward

18 How it happens using a buffer ewval Base n newval5=oldval*12 Forward

19 How it happens using a buffer ewval5 Base n newval5=oldval*12 Forward

20 How it happens using a buffer ewval5 Base n newval=oldval*12 = Forward

21 How it happens using a buffer ewval5 Base n newval=oldval*12 Forward = Retract; Return (Gettoken ) The string between BP and FP is the next token (if it has adhered to any rule)

22 How it happens using a buffer Base newval=oldval*12 Forward = Retract; Return (Gettoken ) BP is sprung forward to FP; Now the next token will be found.

23 Okay, input characters are read in blocks and put in a buffer / array in main memory. The characters are scanned one by one. What follows, is that when the last character in the buffer has been read and processed, the buffer needs to be reloaded Consider a scenario where the buffer ends before the variable/ identifier oldval is complete. Base newval = old Forward

24 Obviously, next block has to be read into the buffer As a result, the current buffer is overwritten. Base val * 12. Forward Therefore, previous content of the buffer is lost. BP doesn t point to the earlier content.

25 Way out : Two buffers, or split buffers to be reloaded alternatively Initially, both buffers are empty. Base Forward

26 Way out : Two buffers, or split buffers to be reloaded alternatively Read the first one. Base newval = old Forward

27 Way out : Two buffers, or split buffers to be reloaded alternatively Read the first one. Base newval = old Forward

28 Way out : Two buffers, or split buffers to be reloaded alternatively Read the first one. Keep on scanning and processing till the FP reaches the last character. That means, the lexical analyzer is inside the last potential token. Base newval = old Forward

29 Way out : Two buffers, or split buffers to be reloaded alternatively Read the next block and reload the second buffer. Base newval = old val * 12. Forward

30 Way out : Two buffers, or split buffers to be reloaded alternatively Read the next block and reload the second buffer. Move the forward pointer by one character. It therefore enters the second buffer. Continue processing as if it is a single buffer. Base newval = old val * 12. Forward

31 Way out : Two buffers, or split buffers to be reloaded alternatively Read the next block and reload the second buffer. Move the forward pointer by one character. It therefore enters the second buffer. Continue processing as if it is a single buffer. Base newval = old val * 12. Forward

32 Necessity of buffering (A) To read a block of characters at a time, thus reducing context switching time and I/O time as well. Otherwise, the process would be : Loop: I/O request to read next character /* System Call */ execute processing logic /* User mode */ go back to Loop. There would be 1024 system calls for reading 1024 characters. Also, separately requesting the disk controller to read each character. Using block read and buffering, the one system call and I/O request is issued per block of characters, for example, 1024 characters.

33 Necessity of buffering (B) Some times, the lexical analyzer might have to look ahead in order to identify a token. Example. There are two similar looking FORTRAN statements, along with their meaning. (i) DO 5 I = 1.25 /* DO5I is a variable. Set its value to 1.25 */ /* FORTRAN allows spaces in variable name */ (ii) DO 5 I = 1,25 /* Execute line#5 for I = 1 to 25 */ BP FP Look ahead The difference of the two is in the. and, between 1 and 25. Lexical analyzer will have to read forward (look ahead), halting the FP to detect the presence of a,. After that, it goes back to FP.

34 Limitation of Buffer Pair (1) If look ahead character is beyond the end of buffer Such as in PL/1 DECLARE (ARG1,ARG2,.ARGn) To determine if DECLARE is an array or key word, it has look ahead till the closing ) (2) With every character scan, lexical analyzer has to check if it is end of block. Since there are two buffers, it has to check two times, if it is end of buffer 1 or buffer 2. Algorithm and its alternative are furnished in the following two slides.

35 Algorithm With every character read, the program has to check end of buffer. FWD = End of 1 st Half? NO YES Reload 2 nd Half Reload 1st Half YES FWD = End of 2 nd Half? Move FWD to the beginning of the 1 st Half NO FWD := FWD + 1 Back to loop Check 1st buffer. If not end, then check 2nd buffer

36 Alternative (more efficient) FWD = $? NO Back to loop Use sentinel value for end of buffer. Put a $ at the end of each buffer half. Reload 1st Half Move FWD to the beginning of the 1 st Half $ $ YES It must be the end of 1 st half YES FWD = End of 2 nd Half? NO Reload 2nd Half Back to loop Happens once out of 1024 Though it looks like two checks, but the second check comes only when FWD = $ Which happens once for each buffer. If each buffer is of 1024 characters, then for 1023 times the second check does not happen

37 Specification of Tokens Regular expressions are used for recognizing patterns Let s understand how Regular Pattern is represented. First of all, a single character is Regular Language in itself. (1) Union Operation on Regular Languages Alphabet A is itself a Regular Language. Likewise, B and also C and so on. Therefore, {A,B,C,.Z} will be regular since it is the union of regular languages.

38 Specification of Tokens Say, L = {A, B, C,..Z,a,b,c,.z} /* all the letters */ and D = {0,1,2,3,4,5,6,7,8,9} /* all the digits*/ Then UNION: L U D = {A, B, C,..Z,a,b,c,.z, 0,1,2,3,4,5,6,7,8,9} is also a regular language, representing the set of all letters and digits. (2) Concatenation L. D consists of each element of L concatenated with each element of D, thus resulting in the set {A0, A1, A2.A9,B0,B1,B2 B9..z9} Which is also regular

39 Specification of Tokens (2) Exponent / Kleene Closure L is regular. So, the concatenation, L L, which can be written as (L ) 2 is also regular Similarly, (L ) 3, (L ) 4, (L ) 5, etc. are all regular. And then, ɛ is regular and (L ) 0 = ɛ Therefore, (L ) 0 + (L ) 1 + (L ) 2 + (L ) 3 + (L ) 4 +. Any number of elements Is regular. This is called Kleene Closure (L )*

40 Specification of Tokens (2) Transpose / Reverse Reverse of regular string is also regular. Set of reverses of all strings in a regular language is also regular So, {A0,A1,A2} being regular, {0A,1A,2A} is also regular.

41 Specification of Tokens Regular Expressions A regular language is represented by a regular expression. As mentioned before, a single character is a regular expression. Like regular languages, union, concatenation, Kleene closure and reversal of a regular expression result in regular expressions. So, if R1 and R2 are two regular expressions, then following are also: R1+R2 R1.R2 R1* alternatively, R1 R2 And any combination of the above three

42 Specification of Tokens Example: Identifiers in Pascal Regular Expressions Starts with an alphabet followed by any number of alphabets and digits in any order. Letter A B C. Z a b c z Digit Id Letter. (Letter Digit ) * Note: (1) Letter and Digit have to be defined before Id. (2) Character class notation can also be used to declare Letter and Digit. Letter [A-Za-z] Digit [0-9]

43 Specification of Tokens Regular Expressions Example: Floating point numbers Rule: Digits, optionally separated by a., optionally followed by E and signed/unsigned digit Digit [0-9] Digits Digit.(Digit )* Optional_Fraction. Digits ɛ Optional_Exponent (E (+ - ɛ) digits) ɛ Floating _point_number Digits. Optional_Fraction. Optional_Exponent

CSc 453 Lexical Analysis (Scanning)

CSc 453 Lexical Analysis (Scanning) CSc 453 Lexical Analysis (Scanning) Saumya Debray The University of Arizona Tucson Overview source program lexical analyzer (scanner) tokens syntax analyzer (parser) symbol table manager Main task: to

More information

UNIT II LEXICAL ANALYSIS

UNIT II LEXICAL ANALYSIS UNIT II LEXICAL ANALYSIS 2 Marks 1. What are the issues in lexical analysis? Simpler design Compiler efficiency is improved Compiler portability is enhanced. 2. Define patterns/lexeme/tokens? This set

More information

Figure 2.1: Role of Lexical Analyzer

Figure 2.1: Role of Lexical Analyzer Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

More information

Chapter 3: Lexical Analysis

Chapter 3: Lexical Analysis Chapter 3: Lexical Analysis A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of tokens of the source language, and then to hand translate the diagram into

More information

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

More information

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z)

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z) G22.2130-001 Compiler Construction Lecture 4: Lexical Analysis Mohamed Zahran (aka Z) mzahran@cs.nyu.edu Role of the Lexical Analyzer Remove comments and white spaces (aka scanning) Macros expansion Read

More information

Lexical Analyzer Scanner

Lexical Analyzer Scanner Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna

More information

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

CD Assignment I. 1. Explain the various phases of the compiler with a simple example. CD Assignment I 1. Explain the various phases of the compiler with a simple example. The compilation process is a sequence of various phases. Each phase takes input from the previous, and passes the output

More information

Lexical Analyzer Scanner

Lexical Analyzer Scanner Lexical Analyzer Scanner ASU Textbook Chapter 3.1, 3.3, 3.4, 3.6, 3.7, 3.5 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Main tasks Read the input characters and produce

More information

Lexical Analysis - 1. A. Overview A.a) Role of Lexical Analyzer

Lexical Analysis - 1. A. Overview A.a) Role of Lexical Analyzer CMPSC 470 Lecture 02 Topics: Regular Expression Transition Diagram Lexical Analyzer Implementation A. Overview A.a) Role of Lexical Analyzer Lexical Analysis - 1 Lexical analyzer does: read input character

More information

Zhizheng Zhang. Southeast University

Zhizheng Zhang. Southeast University Zhizheng Zhang Southeast University 2016/10/5 Lexical Analysis 1 1. The Role of Lexical Analyzer 2016/10/5 Lexical Analysis 2 2016/10/5 Lexical Analysis 3 Example. position = initial + rate * 60 2016/10/5

More information

CSCI-GA Compiler Construction Lecture 4: Lexical Analysis I. Hubertus Franke

CSCI-GA Compiler Construction Lecture 4: Lexical Analysis I. Hubertus Franke CSCI-GA.2130-001 Compiler Construction Lecture 4: Lexical Analysis I Hubertus Franke frankeh@cs.nyu.edu Role of the Lexical Analyzer Remove comments and white spaces (aka scanning) Macros expansion Read

More information

1. Lexical Analysis Phase

1. Lexical Analysis Phase 1. Lexical Analysis Phase The purpose of the lexical analyzer is to read the source program, one character at time, and to translate it into a sequence of primitive units called tokens. Keywords, identifiers,

More information

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below. UNIT I Translator: It is a program that translates one language to another Language. Examples of translator are compiler, assembler, interpreter, linker, loader and preprocessor. Source Code Translator

More information

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

ECS 120 Lesson 7 Regular Expressions, Pt. 1

ECS 120 Lesson 7 Regular Expressions, Pt. 1 ECS 120 Lesson 7 Regular Expressions, Pt. 1 Oliver Kreylos Friday, April 13th, 2001 1 Outline Thus far, we have been discussing one way to specify a (regular) language: Giving a machine that reads a word

More information

Chapter 3 Lexical Analysis

Chapter 3 Lexical Analysis Chapter 3 Lexical Analysis Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata Design of lexical analyzer generator The role of lexical

More information

Lexical Analysis (ASU Ch 3, Fig 3.1)

Lexical Analysis (ASU Ch 3, Fig 3.1) Lexical Analysis (ASU Ch 3, Fig 3.1) Implementation by hand automatically ((F)Lex) Lex generates a finite automaton recogniser uses regular expressions Tasks remove white space (ws) display source program

More information

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language. UNIT I LEXICAL ANALYSIS Translator: It is a program that translates one language to another Language. Source Code Translator Target Code 1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System

More information

DVA337 HT17 - LECTURE 4. Languages and regular expressions

DVA337 HT17 - LECTURE 4. Languages and regular expressions DVA337 HT17 - LECTURE 4 Languages and regular expressions 1 SO FAR 2 TODAY Formal definition of languages in terms of strings Operations on strings and languages Definition of regular expressions Meaning

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012 Scanners Xiaokang Qiu Purdue University ECE 468 Adapted from Kulkarni 2012 August 24, 2016 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved

More information

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore (Refer Slide Time: 00:20) Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 4 Lexical Analysis-Part-3 Welcome

More information

Monday, August 26, 13. Scanners

Monday, August 26, 13. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

Wednesday, September 3, 14. Scanners

Wednesday, September 3, 14. Scanners Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can

More information

Compiler course. Chapter 3 Lexical Analysis

Compiler course. Chapter 3 Lexical Analysis Compiler course Chapter 3 Lexical Analysis 1 A. A. Pourhaji Kazem, Spring 2009 Outline Role of lexical analyzer Specification of tokens Recognition of tokens Lexical analyzer generator Finite automata

More information

COS 140: Foundations of Computer Science

COS 140: Foundations of Computer Science COS 140: Foundations of Computer Science Variables and Primitive Data Types Fall 2017 Introduction 3 What is a variable?......................................................... 3 Variable attributes..........................................................

More information

Buffering Techniques: Buffer Pairs and Sentinels

Buffering Techniques: Buffer Pairs and Sentinels Week 3 Lexical Analysis Tasks of Lexical Analysis Why separating lexical analysis and parsing? Tokens, Patterns and Lexemes Complex tokens like identifier and numeral are described using regularexpression

More information

CSC 467 Lecture 3: Regular Expressions

CSC 467 Lecture 3: Regular Expressions CSC 467 Lecture 3: Regular Expressions Recall How we build a lexer by hand o Use fgetc/mmap to read input o Use a big switch to match patterns Homework exercise static TokenKind identifier( TokenKind token

More information

Lexical Analysis. Introduction

Lexical Analysis. Introduction Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

More information

A Pascal program. Input from the file is read to a buffer program buffer. program xyz(input, output) --- begin A := B + C * 2 end.

A Pascal program. Input from the file is read to a buffer program buffer. program xyz(input, output) --- begin A := B + C * 2 end. A Pascal program program xyz(input, output); var A, B, C: integer; begin A := B + C * 2 end. Input from the file is read to a buffer program buffer program xyz(input, output) --- begin A := B + C * 2 end.

More information

C Language, Token, Keywords, Constant, variable

C Language, Token, Keywords, Constant, variable C Language, Token, Keywords, Constant, variable A language written by Brian Kernighan and Dennis Ritchie. This was to be the language that UNIX was written in to become the first "portable" language. C

More information

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by The term languages to mean any set of string formed from some specific alphaet. The notation of concatenation can also e applied to languages. If L and M are languages, then L.M is the language consisting

More information

UNIT I- LEXICAL ANALYSIS. 1.Interpreter: It is one of the translators that translate high level language to low level language.

UNIT I- LEXICAL ANALYSIS. 1.Interpreter: It is one of the translators that translate high level language to low level language. INRODUCTION TO COMPILING UNIT I- LEXICAL ANALYSIS Translator: It is a program that translates one language to another. Types of Translator: 1.Interpreter 2.Compiler 3.Assembler source code Translator target

More information

Part 5 Program Analysis Principles and Techniques

Part 5 Program Analysis Principles and Techniques 1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

More information

2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively

2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively Regular expressions: a regular expression is built up out of simpler regular expressions using a set of defining rules. Regular expressions allows us to define tokens of programming languages such as identifiers.

More information

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization Second round: the scanner Lexical analysis Syntactical analysis Semantical analysis Intermediate code generation Optimization Code generation Target specific optimization Lexical analysis (Chapter 3) Why

More information

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering TEST 1 Date : 24 02 2015 Marks : 50 Subject & Code : Compiler Design ( 10CS63) Class : VI CSE A & B Name of faculty : Mrs. Shanthala P.T/ Mrs. Swati Gambhire Time : 8:30 10:00 AM SOLUTION MANUAL 1. a.

More information

Programming in C++ 4. The lexical basis of C++

Programming in C++ 4. The lexical basis of C++ Programming in C++ 4. The lexical basis of C++! Characters and tokens! Permissible characters! Comments & white spaces! Identifiers! Keywords! Constants! Operators! Summary 1 Characters and tokens A C++

More information

COS 140: Foundations of Computer Science

COS 140: Foundations of Computer Science COS 140: Foundations of Variables and Primitive Data Types Fall 2017 Copyright c 2002 2017 UMaine School of Computing and Information S 1 / 29 Homework Reading: Chapter 16 Homework: Exercises at end of

More information

Dixita Kagathara Page 1

Dixita Kagathara Page 1 2014 Sem - VII Lexical Analysis 1) Role of lexical analysis and its issues. The lexical analyzer is the first phase of compiler. Its main task is to read the input characters and produce as output a sequence

More information

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language? The Front End Source code Front End IR Back End Machine code Errors The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed

More information

CS308 Compiler Principles Lexical Analyzer Li Jiang

CS308 Compiler Principles Lexical Analyzer Li Jiang CS308 Lexical Analyzer Li Jiang Department of Computer Science and Engineering Shanghai Jiao Tong University Content: Outline Basic concepts: pattern, lexeme, and token. Operations on languages, and regular

More information

CSc 453 Compilers and Systems Software

CSc 453 Compilers and Systems Software CSc 453 Compilers and Systems Software 3 : Lexical Analysis I Christian Collberg Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2009 Christian Collberg August 23, 2009

More information

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer As the first phase of a compiler, the main task of the lexical analyzer is to read the input

More information

Compiler Design Concepts. Syntax Analysis

Compiler Design Concepts. Syntax Analysis Compiler Design Concepts Syntax Analysis Introduction First task is to break up the text into meaningful words called tokens. newval=oldval+12 id = id + num Token Stream Lexical Analysis Source Code (High

More information

[Lexical Analysis] Bikash Balami

[Lexical Analysis] Bikash Balami 1 [Lexical Analysis] Compiler Design and Construction (CSc 352) Compiled By Central Department of Computer Science and Information Technology (CDCSIT) Tribhuvan University, Kirtipur Kathmandu, Nepal 2

More information

CS 403: Scanning and Parsing

CS 403: Scanning and Parsing CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

More information

Compiler Construction LECTURE # 3

Compiler Construction LECTURE # 3 Compiler Construction LECTURE # 3 The Course Course Code: CS-4141 Course Title: Compiler Construction Instructor: JAWAD AHMAD Email Address: jawadahmad@uoslahore.edu.pk Web Address: http://csandituoslahore.weebly.com/cc.html

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions 1 Agenda Overview of language recognizers Basic concepts of formal grammars Scanner Theory

More information

UNIT III & IV. Bottom up parsing

UNIT III & IV. Bottom up parsing UNIT III & IV Bottom up parsing 5.0 Introduction Given a grammar and a sentence belonging to that grammar, if we have to show that the given sentence belongs to the given grammar, there are two methods.

More information

Pioneering Compiler Design

Pioneering Compiler Design Pioneering Compiler Design NikhitaUpreti;Divya Bali&Aabha Sharma CSE,Dronacharya College of Engineering, Gurgaon, Haryana, India nikhita.upreti@gmail.comdivyabali16@gmail.com aabha6@gmail.com Abstract

More information

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

More information

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A 1. What is a compiler? (A.U Nov/Dec 2007) A compiler is a program that reads a program written in one language

More information

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements Programming, Data Structures and Algorithms Prof. Shankar Balachandran Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 05 I/O statements Printf, Scanf Simple

More information

CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages. Lecture 3 CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information

More information

Crafting a Compiler with C (V) Scanner generator

Crafting a Compiler with C (V) Scanner generator Crafting a Compiler with C (V) 資科系 林偉川 Scanner generator Limit the effort in building a scanner to specify which tokens the scanner is to recognize Some generators do not produce an entire scanner; rather,

More information

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Lexical Analysis Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

More information

A simple syntax-directed

A simple syntax-directed Syntax-directed is a grammaroriented compiling technique Programming languages: Syntax: what its programs look like? Semantic: what its programs mean? 1 A simple syntax-directed Lexical Syntax Character

More information

Lexical Analysis/Scanning

Lexical Analysis/Scanning Compiler Design 1 Lexical Analysis/Scanning Compiler Design 2 Input and Output The input is a stream of characters (ASCII codes) of the source program. The output is a stream of tokens or symbols corresponding

More information

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

More information

VIVA QUESTIONS WITH ANSWERS

VIVA QUESTIONS WITH ANSWERS VIVA QUESTIONS WITH ANSWERS 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the

More information

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley

More information

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Lexical Analysis Chapter 1, Section 1.2.1 Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens

More information

The Language for Specifying Lexical Analyzer

The Language for Specifying Lexical Analyzer The Language for Specifying Lexical Analyzer We shall now study how to build a lexical analyzer from a specification of tokens in the form of a list of regular expressions The discussion centers around

More information

Assignment 1 (Lexical Analyzer)

Assignment 1 (Lexical Analyzer) Assignment 1 (Lexical Analyzer) Compiler Construction CS4435 (Spring 2015) University of Lahore Maryam Bashir Assigned: Saturday, March 14, 2015. Due: Monday 23rd March 2015 11:59 PM Lexical analysis Lexical

More information

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console Scanning 1 read Interpreter Scanner request token Parser send token Console I/O send AST Tree Walker 2 Scanner This process is known as: Scanning, lexing (lexical analysis), and tokenizing This is the

More information

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 1. Introduction Parsing is the task of Syntax Analysis Determining the syntax, or structure, of a program. The syntax is defined by the grammar rules

More information

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1 CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

More information

CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

More information

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University Sukree Sinthupinyo 1 1 Department of Computer Engineering Chulalongkorn University 14 July 2012 Outline Introduction 1 Introduction 2 3 4 Transition Diagrams Learning Objectives Understand definition of

More information

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

More information

CS122 Lecture 15 Winter Term,

CS122 Lecture 15 Winter Term, CS122 Lecture 15 Winter Term, 2017-2018 2 Transaction Processing Last time, introduced transaction processing ACID properties: Atomicity, consistency, isolation, durability Began talking about implementing

More information

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata Outline 1 2 Regular Expresssions Lexical Analysis 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA 6 NFA to DFA 7 8 JavaCC:

More information

Lecture 4: Syntax Specification

Lecture 4: Syntax Specification The University of North Carolina at Chapel Hill Spring 2002 Lecture 4: Syntax Specification Jan 16 1 Phases of Compilation 2 1 Syntax Analysis Syntax: Webster s definition: 1 a : the way in which linguistic

More information

CPS 506 Comparative Programming Languages. Syntax Specification

CPS 506 Comparative Programming Languages. Syntax Specification CPS 506 Comparative Programming Languages Syntax Specification Compiling Process Steps Program Lexical Analysis Convert characters into a stream of tokens Lexical Analysis Syntactic Analysis Send tokens

More information

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built) Programming languages must be precise Remember instructions This is unlike natural languages CS 315 Programming Languages Syntax Precision is required for syntax think of this as the format of the language

More information

Standard 11. Lesson 9. Introduction to C++( Up to Operators) 2. List any two benefits of learning C++?(Any two points)

Standard 11. Lesson 9. Introduction to C++( Up to Operators) 2. List any two benefits of learning C++?(Any two points) Standard 11 Lesson 9 Introduction to C++( Up to Operators) 2MARKS 1. Why C++ is called hybrid language? C++ supports both procedural and Object Oriented Programming paradigms. Thus, C++ is called as a

More information

Lexical Analysis 1 / 52

Lexical Analysis 1 / 52 Lexical Analysis 1 / 52 Outline 1 Scanning Tokens 2 Regular Expresssions 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA

More information

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2] 1 What is Lexical Analysis? First step of a compiler. Reads/scans/identify the characters in the program and groups

More information

Datatypes, Variables, and Operations

Datatypes, Variables, and Operations Datatypes, Variables, and Operations 1 Primitive Type Classification 2 Numerical Data Types Name Range Storage Size byte 2 7 to 2 7 1 (-128 to 127) 8-bit signed short 2 15 to 2 15 1 (-32768 to 32767) 16-bit

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular

More information

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08 CS412/413 Introduction to Compilers Tim Teitelbaum Lecture 2: Lexical Analysis 23 Jan 08 Outline Review compiler structure What is lexical analysis? Writing a lexer Specifying tokens: regular expressions

More information

VHDL Lexical Elements

VHDL Lexical Elements 1 Design File = Sequence of Lexical Elements && Separators (a) Separators: Any # of Separators Allowed Between Lexical Elements 1. Space character 2. Tab 3. Line Feed / Carriage Return (EOL) (b) Lexical

More information

Procedures, Parameters, Values and Variables. Steven R. Bagley

Procedures, Parameters, Values and Variables. Steven R. Bagley Procedures, Parameters, Values and Variables Steven R. Bagley Recap A Program is a sequence of statements (instructions) Statements executed one-by-one in order Unless it is changed by the programmer e.g.

More information

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones Lexical Analysis - An Introduction Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Conceptual Structure of a Compiler Source code x1 := y2

More information

(Refer Slide Time: 00:23)

(Refer Slide Time: 00:23) In this session, we will learn about one more fundamental data type in C. So, far we have seen ints and floats. Ints are supposed to represent integers and floats are supposed to represent real numbers.

More information

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4 CS321 Languages and Compiler Design I Winter 2012 Lecture 4 1 LEXICAL ANALYSIS Convert source file characters into token stream. Remove content-free characters (comments, whitespace,...) Detect lexical

More information

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. COMPILER DESIGN 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the target

More information

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

More information

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata Formal Languages and Compilers Lecture IV: Regular Languages and Finite Automata Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/

More information

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington CSE 5317 Lecture 4: Scanning and parsing 28 Jan 2010 Nate Nystrom University of Texas at Arlington Administrivia hcp://groups.google.com/group/uta- cse- 3302 I will add you to the group soon TA Derek White

More information

Grammars and Parsing, second week

Grammars and Parsing, second week Grammars and Parsing, second week Hayo Thielecke 17-18 October 2005 This is the material from the slides in a more printer-friendly layout. Contents 1 Overview 1 2 Recursive methods from grammar rules

More information

Languages and Compilers

Languages and Compilers Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 4. Lexical Analysis (Scanning) Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office: TA-121 For

More information

Fundamentals of C Programming

Fundamentals of C Programming Introduction, Constants, variables, keywords, Comments, I/O Functions, Data types, Compilation & Execution Tejalal Choudhary tejalal.choudhary@gmail.com, tejalal.choudhary@sdbct.ac.in Department of Computer

More information

Lecture 3: Lexical Analysis

Lecture 3: Lexical Analysis Lecture 3: Lexical Analysis COMP 524 Programming Language Concepts tephen Olivier January 2, 29 Based on notes by A. Block, N. Fisher, F. Hernandez-Campos, J. Prins and D. totts Goal of Lecture Character

More information