Group A Assignment 3 (2)

SNJB's Late Sau. KBJ College Of Engineering, Chandwad

Att (2) Perm (3) Oral (5) Total (10) Sign

Title of Assignment: Lexical analyzer using LEX.

3.1.1 Problem Definition: Lexical analyzer for a sample language using LEX.

3.1.2 Prerequisite: Lex, compiler construction.

3.1.3 Relevant Theory / Literature Survey:

3.1.3.1 Compiler

A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. Programming languages are notations for describing computations, so before execution a program has to be converted into the machine-understandable form, machine language. This translation is done by the compiler, which should also report the presence of errors in the source program. This can be represented diagrammatically. If the target program is executable, it can be called by the user to process inputs and produce outputs.

An interpreter is similar to a compiler, except that it directly executes the program on the supplied inputs to produce the output. A compiled program is usually faster, but an interpreter gives better diagnostics, since execution proceeds step by step. Java uses a hybrid scheme: source code is compiled to bytecode, which a virtual machine then interprets.

Compilation has two parts. The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation and optimizes it. Detailed structure showing the different phases of a compiler is given below. While compiling, each word of the code is separated and then converted to object code. In programming, the words are formed by keywords, identifiers, and operators.

The different phases of the compiler are:

1. Lexical Analysis: This is a linear analysis. The scanner reads a character stream and converts it into a token stream: identifiers, numbers, operators, keywords, punctuation symbols, etc. White space (space, tab, newline, formfeed) and comments are ignored. These tokens are the basic entities of the language; the character string associated with a token is called its lexeme. The scanner also produces error messages and stores information in the symbol table.

2. Syntax Analysis: This is a hierarchical analysis, also called parsing. Syntax refers to the structure, or grammar, of the language. The parser groups tokens into grammatical phrases corresponding to the structure of the language, and a parse tree is generated. Syntax errors are detected with the help of this parse tree.

3. Semantic Analysis: Semantics refers to meaning. This phase converts the parse tree into an abstract syntax tree, which is less dependent on the particulars of any specific language; parsing and the construction of the abstract syntax tree are often done at the same time. The parse is controlled by the language definition, and the output of a successful parse is an equivalent abstract syntax tree. Many different languages can be parsed into the same abstract syntax, making the following phases somewhat language independent. In a typed language, the abstract syntax tree is also type checked.

4. Intermediate Code Generation: This phase produces an intermediate representation (IR), a notation that is not tied to any particular source or target language. From this point on, the same compiler units can be used for any source language, and the representation can be translated into many different assembly languages. It is a program for an abstract machine, and it is easy both to produce and to translate. Examples of intermediate codes are three-address code, postfix notation, and directed acyclic graphs. For example, a three-address form of position = initial + rate * 60 is: t1 = rate * 60; t2 = initial + t1; position = t2.

5. Code Optimization: Code optimization is the process of modifying the intermediate code to improve its efficiency. Removing redundant or unreachable code, propagating constant values, and optimizing loops are some of the methods by which this is achieved.

6. Code Generation: This phase generates the target code, which is relocatable. Allocating memory for each variable and translating intermediate instructions into machine instructions are functions of this phase.

Symbol Table: This is a data structure with a record for each identifier used in the program, including variables, user-defined type names, functions, formal arguments, etc. Attributes of a record include storage size, type, scope (within which language blocks it is visible), and the number and types of arguments. Possible structures used for its implementation are arrays, linked lists, binary search trees, and hash tables.

Error Handling: Each analysis phase may produce errors. Error messages should be meaningful and should indicate the location of the error in the source file. Ideally, the compiler should recover and report as many errors as possible rather than die the first time it encounters a problem.

3.1.3.2 Lexical Analysis

This is a linear analysis. The scanner reads a character stream and converts it into a sequence of symbols called lexical tokens, or just tokens: identifiers, numbers, operators, keywords, punctuation symbols, etc. White space (space, tab, newline, formfeed) and comments are ignored. These tokens are the basic entities of the language; the character string associated with a token is called its lexeme. The scanner produces error messages and stores information in the symbol table. The purpose of producing these tokens is usually to forward them as input to another program, such as a parser. The block diagram of a lexical analyzer is given below.

For each lexeme, the lexical analyzer produces as output a token of the form <token-name, attribute-value> that it passes on to the subsequent phase, syntax analysis. The first component, token-name, is an abstract symbol used during syntax analysis; the second component, attribute-value, points to an entry in the symbol table for this token. Information from the symbol-table entry is needed for semantic analysis and code generation.

For example, suppose a source program contains the assignment statement

position = initial + rate * 60

The characters in this statement are grouped into the following lexemes and mapped into the following tokens, which are passed on to the syntax analyzer:

1. position is a lexeme that is mapped into the token <id,1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
2. The assignment symbol = is a lexeme that is mapped into the token <=>. This token needs no attribute value.
3. initial is a lexeme that is mapped into the token <id,2>, where 2 points to the symbol-table entry for initial.
4. + is a lexeme that is mapped into the token <+>.
5. rate is a lexeme that is mapped into the token <id,3>, where 3 points to the symbol-table entry for rate.
6. * is a lexeme that is mapped into the token <*>.
7. 60 is a lexeme that is mapped into the token <60>.

Finally, we get <id,1> <=> <id,2> <+> <id,3> <*> <60>.

3.1.3.3 Specification of Tokens

Alphabet: A finite set of symbols, e.g., L = {A, B, ..., Z, a, b, ..., z}, D = {0, 1, ..., 9}.

String: A finite sequence of symbols drawn from an alphabet, e.g., aba, abba, aac.

Language: A set of strings over a fixed alphabet, e.g., L = {awa : w ∈ {a, b}*}, the strings that begin and end with a.

3.1.3.4 Regular Expressions

A regular expression is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are used by many text editors and utilities to search and manipulate bodies of text based on certain patterns, and many programming languages support them for string manipulation. For example, Perl and Tcl have a powerful regular-expression engine built directly into their syntax. If r is a regular expression, then L(r) denotes the language described by r.

The following are some of the operations that we can perform with regular expressions:

Alternation: A vertical bar separates alternatives. E.g., gray|grey matches {gray, grey}.

Grouping: Parentheses are used to define the scope and precedence of the operators. E.g., gr(a|e)y also matches {gray, grey}.

Quantification: A quantifier after a character or group specifies how often the preceding expression is allowed to occur. The most common quantifiers are ?, *, and +.

? : The question mark indicates 0 or 1 occurrences of the previous expression. E.g., colou?r matches {color, colour}.

* : The asterisk indicates 0, 1, or any number of occurrences of the previous expression. E.g., go*gle matches {ggle, gogle, google, ...}.

+ : The plus sign indicates at least 1 occurrence of the previous expression. E.g., go+gle matches {gogle, google, ...} but not ggle.

Examples

1. a|b* denotes {ε, a, b, bb, bbb, ...}

2. (a|b)* denotes the set of all strings consisting of any number of a and b symbols, including the empty string.
3. b*(ab*)* denotes the same set.
4. ab*(c|ε) denotes the set of strings starting with a, then zero or more b's, and finally optionally a c.
5. (aa|ab(bb)*ba)*(b|ab(bb)*a)(a(bb)*a|(b|a(bb)*ba)(aa|ab(bb)*ba)*(b|ab(bb)*a))* denotes the set of all strings which contain an even number of a's and an odd number of b's.

3.1.3.5 Algebraic Properties

Let r, s, and t be regular expressions. Then:

r|s = s|r                  (alternation is commutative)
r|(s|t) = (r|s)|t          (alternation is associative)
(rs)t = r(st)              (concatenation is associative)
r(s|t) = rs|rt             (concatenation distributes over alternation)
εr = r, rε = r             (ε is the identity element for concatenation)
r* = (r|ε)*
r** = r*

3.1.3.6 Regular Definitions

A regular definition gives names to certain regular expressions and uses those names in other regular expressions. Regular definitions are sequences of definitions of the form

d1 → r1
d2 → r2
...
dn → rn

where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1}.

Examples

1. Pascal identifiers

letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | 2 | ... | 9
id → letter (letter | digit)*

2. Pascal numbers

digit → 0 | 1 | 2 | ... | 9
digits → digit digit*
optional-fraction → . digits | ε
optional-exponent → (E (+ | - | ε) digits) | ε
num → digits optional-fraction optional-exponent

3.1.3.7 Lex

The steps followed by Lex: a specification file is translated by the Lex compiler into a C source file, lex.yy.c, which implements the scanner; that file is then compiled by a C compiler into an executable that reads an input stream and performs the action associated with each matched pattern.

EXAMPLES

1. A lexer to print out all numbers in a file.

%{
#include <stdio.h>
%}

%%
[0-9]+   { printf("%s\n", yytext); }
.|\n     ;
%%
main() { yylex(); }

2. A lexer to print out all HTML tags in a file.

%{
#include <stdio.h>
%}
%%
"<"[^>]*>   { printf("value: %s\n", yytext); }
.|\n        ;
%%
main() { yylex(); }

3. A lexer to do the word-count function of the wc command in UNIX. It prints the number of lines, words, and characters in a file. Note the use of definitions for patterns.

%{
int c = 0, w = 0, l = 0;
%}
word  [^ \t\n]+
eol   \n
%%
{word}  { w++; c += yyleng; }
{eol}   { c++; l++; }

.       { c++; }
%%
main() { yylex(); printf("%d %d %d\n", l, w, c); }

4. A lexer classifying tokens as words, numbers, or "other".

%{
int tokencount = 0;
%}
%%
[a-zA-Z]+      { printf("%d WORD \"%s\"\n", ++tokencount, yytext); }
[0-9]+         { printf("%d NUMBER \"%s\"\n", ++tokencount, yytext); }
[^a-zA-Z0-9]+  { printf("%d OTHER \"%s\"\n", ++tokencount, yytext); }
%%
main() { yylex(); }

3.1.4 Assignment Questions:

1. What are tokens?
2. What is lexical analysis?
3. How can tokens be represented in a language?
4. How are tokens recognized?
5. What is the significance of yywrap(), yylex(), and the yytext variable?
6. How is a specification given to the LEX tool?
7. What is Lex?
8. What are the different phases of a compiler?
9. Explain lexical analysis.