DOID: A Lexical Analyzer for Understanding Mid-Level Compilation Processes


www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issue 12 Dec. 2016, Page No. 19507-19511

DOID: A Lexical Analyzer for Understanding Mid-Level Compilation Processes

Oyebode Idris 1 and Adedoyin Olayinka Ajayi 2
1 Ekiti State University, Department of Computer Science, Ado Ekiti, Nigeria oyebodeidris@gmail.com
2 Ekiti State University, Department of Computer Science, Ado Ekiti, Nigeria dedoyyin@gmail.com

Abstract: This research was undertaken to give students in the local Nigerian environment hands-on, practical and direct knowledge of compiler construction and automata theory, with the motivation of making the result the country's first programming language. The project was carried out using the Lex/Flex lexical analyzer generator and the C language. Compiler construction and automata theory have been among the tougher subjects for students in the Nigerian region. The purpose of this research is to help our programmers better understand the processes involved in program compilation, and to capture the interest of non-programmers, who may become fascinated by the elegance of programming when it is presented at the highest level, which is more like speaking to a fellow human. Locals and students will also be more interested in the subject matter when they learn a language developed in their own country.

Keywords: Lexical Analyzer, Compiler Construction, Lex/Flex, Automata Theory, Programming Language, C Language.

1. Introduction

The evolution of programming languages started with the mnemonic assembly languages of the early 1950s and took its first major step towards high-level languages in the latter half of the 1950s with the development of Fortran, Cobol and Lisp. The philosophy behind these languages was to create higher-level notations with which programmers could more easily write numerical computations, business applications, and symbolic programs. These languages were so successful that they are still in use today.
More on the evolution of programming languages and their classifications can be found in [1].

Lexical analysis, also called scanning, is the first phase of a compiler. A compiler is a program that reads a program in one language - the source language - and translates it into an equivalent program in another language - the target language [1]. The character streams that make up the source program are read and grouped into lexemes. This produces tokens, which are passed to the syntax analysis phase. Other stages of compilation include semantic analysis, intermediate code generation and code optimization. Lexical analysis also performs other work, such as removing unnecessary blanks and comments, labeling lexemes with the tokens passed on for syntax analysis, and updating the symbol table with identifiers and numbers.

A lexical analyzer performs lexical analysis; that is, it groups input character streams into lexemes [1]. It is a very important part of a compiler: without it, all other phases are useless, because it feeds the syntax analysis and later phases. A lexical analyzer can be developed by using a lexical analyzer/scanner generator (e.g. Flex), by writing it by hand in assembly language (good for efficiency), or by writing it by hand in a high-level language. The proposed analyzer, called DoId, is built using a lexical analyzer generator. All existing programming language compilers have lexical analyzers embedded in them, but the analyzer is not visible to the outside environment, except for the Tiny Language scanner and perhaps a few others.

2. Related Literature

The Compiler Project on MicroJava [2] is a small compiler for a Java-like language. The project has three levels, as follows:

i. Level 1 requires implementing a scanner and a parser for the language MicroJava.
ii. Level 2 deals with symbol table handling and type checking.
iii. Level 3 deals with code generation for the MicroJava Virtual Machine.

The project was implemented in Java using Sun Microsystems' Java Development Kit [3] or another development environment, and it was very inspiring in the development of this research. Our research makes use of the Flex compiler [4, 6].

A new approach, the GLAP model, for the design and time-complexity analysis of a lexical analyzer was also proposed in [5]. In that model, different steps of the tokenizer (the generation of tokens through lexemes) and a better input-system implementation were introduced.

3. Methods

This lexical analyzer was implemented with the Flex compiler, which is a lexical analyzer generator. The Flex specifications are placed in a file with a ".l" extension, which includes three sections [6]. The first is the optional definition section, which can contain definitions for text replacements, global C code to be used by actions, etc. It may contain a literal block, which is C code delimited by %{ and %}, holding variable declarations and function prototypes. The second section is where the regular expressions and their actions are placed. The actions are C code, and the rules section has the form:

r1    action1
r2    action2
...
rn    actionn

i. r1 is a regular expression and action1 is a C code fragment.
ii. When r1 matches an input string, action1 is executed.
iii. action1 should be enclosed in {} if it contains more than one statement.

The third section, also optional, consists of further C code. The overall layout is:

definition section
%%
rules section
%%
auxiliary functions

The rules section, which is the main section, comprises patterns that define the set of lexemes corresponding to each token. The patterns are written as regular expressions, which define regular languages.
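A minimal specification in this three-section layout might look like the following sketch. The token codes and patterns here are illustrative only; they are not the actual DoId specification, which is not reproduced in full in this paper.

```lex
%{
/* illustrative token codes; a real scanner would share these with the parser */
enum { INT_VAL = 258, IDENTIFIER = 259 };
int yylval;   /* value slot filled in by actions */
%}

DIGIT   [0-9]
LETTER  [a-zA-Z]

%%
{DIGIT}+                        { yylval = atoi(yytext); return INT_VAL; }
{LETTER}({LETTER}|{DIGIT}|_)*   { return IDENTIFIER; }
[ \t\n]+                        { /* skip blanks, as described above */ }
.                               { printf("unknown character\n"); }
%%

int yywrap(void) { return 1; }
```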
A regular language is a subset of all languages, namely one that can be defined by a regular expression. The DoId analyzer is run by first processing the Flex file (the ".l" file), which generates a scanner saved as lex.yy.c. This scanner is then compiled; it contains a function yylex() which is run repeatedly on the input. A default main() routine is obtained from the Lex library [7] by linking with the -ll option. The scanner is then run and accepts any valid input as specified in the DoId specification. The steps are shown below:

Commands
flex DoId.l                     # creates the lex.yy.c file
gcc lex.yy.c -ll -o DoId.exe    # compiles lex.yy.c into the executable
DoId.exe < samplefile.in        # DOID scans a sample file

These commands are executed on a command line. The DoId analyzer can be invoked by a parser via the function yylex(), which returns a token, or zero at end of input. Some tokens have further information associated with them:

1. Tokens that indicate numbers (INTEGER and FLOAT) are first saved in the yytext variable as strings; they are then converted to actual numbers and saved in the global variable yylval before being returned as INT_VAL and FLOAT_VAL.

2. Tokens that indicate character strings (CHAR and STRING) return the length of the string in yylval and leave the string itself accessible in the global variable stringbuf. Escape sequences are processed before the string is returned. The string can contain embedded nulls and is not null-terminated.
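The number-token handling in point 1 can be sketched in C as follows. The token codes and the shape of yylval here are assumptions for illustration; in the real scanner these names come from the Flex-generated code and the parser's token definitions.

```c
#include <stdlib.h>

/* illustrative token codes and globals mirroring the description above */
enum { INT_VAL = 258, FLOAT_VAL = 259 };
double yylval;       /* value slot filled before a token is returned */
char yytext[64];     /* matched lexeme, as Flex would provide it */

/* convert the lexeme in yytext to a number and return the token code */
int return_number(int is_float) {
    if (is_float) {
        yylval = strtod(yytext, NULL);          /* e.g. "-123.45" */
        return FLOAT_VAL;
    }
    yylval = (double)strtol(yytext, NULL, 10);  /* e.g. "1234" */
    return INT_VAL;
}
```

A rule such as `[0-9]+  { return return_number(0); }` would then hand the parser both the token code and, through yylval, the converted value.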

They are returned as CHAR_VAL and STRING_VAL respectively.

3. Tokens that indicate identifiers collect the identifier string, null-terminate it, and make a privately allocated copy of it. A pointer to this copy is returned in yylval. The scanner/lexical analyzer examines the symbol table to determine whether the identifier is a user-defined type, and returns IDENTIFIER.

4. Tokens that are reserved words are returned as reserved words, e.g. Din, Dout, while, if, etc.

5. Comments are terminated and are not compiled with the program when it is run.

6. Unidentified tokens produce the error message "unknown character".

The scanner reads the input line by line and checks each line for errors; if an error is encountered on a line, it calls the error function and prints an error message. The finite state machine for some of the tokens can be seen in Figure 1.

Figure 1: Creation of DOID Lexical Analyzer.

3.1. The Program Structure in DOID

DoId programs may consist of a bundle statement and other valid DoId statements only, or a bundle statement together with classes containing valid DoId statements. Either form may include a Use statement, which calls the DoId library for built-in classes. All valid DoId programs must also terminate with an End statement, which signifies the end of the program.

Figure 2: Lexical Analyzer, DOID Architecture (communication with parser)

Figure 2 above shows how the DoId lexical analyzer communicates with a parser.

Sample DoId Programs

A Hello World program in DoId can be written as follows:

//Hello World Program
bundle Hello;
use Doid.swing.console;
public class HelloWorld {
    def main( ) {
        Dout("Hello World!!!");
    }
}
End

It can also be achieved in a much simpler way:

bundle Hello;
Dout("Hello World!!!");
End

Lexical Details

Some of the tokens in DOID are:

i. Integer tokens: a sequence of digits, for example 1234.
ii. Float tokens: digits with a decimal point, or signed numbers, for example -123.45.
iii. Boolean tokens: true and false.
iv. String tokens: ASCII characters in double quotes, for example "Hi there".
v. Identifiers: a letter followed by a sequence of letters, digits and underscores.
vi. Reserved words: for, while, bundle, class, etc.

vii. Operators: these include arithmetic, relational, etc.

Expressions

DoId expressions take the following forms:

a) Variables.
b) Arithmetic operator applications such as +, -, /, *, %; precedence and associativity apply.
c) Relational operator applications such as not-equal, <, >, <=, >=; precedence and associativity also apply.
d) The operators 'and' and 'or' work as conjunction and disjunction.

Algorithm

The regular expressions are equivalent to finite automata. They are converted to an NFA by Flex; the NFA is then converted to an equivalent DFA, which is simulated [1]. An algorithm for the DoId analyzer is as follows:

Algorithm: Analyze(A), where A = input string
Step 1: Is A a valid DOID word?
  i. if yes, go to step 2
  ii. if no, print the error "unknown character"
Step 2: Check whether A is a keyword.
  i. if A is a keyword, return keyword
  ii. if not, go to step 3
Step 3: Check whether A is a special character or operator.
  i. if so, return special character or operator respectively
  ii. if not, go to step 4
Step 4: Check whether A is a valid identifier.
  i. if so, insert the identifier into the symbol table and return identifier
  ii. if not, go to step 5
Step 5: Check whether A is a character or string value.
  i. if yes, return character or string respectively
  ii. if no, go to step 6
Step 6: Check whether A is an integer or floating-point value.
  i. if so, convert the string to a real number, save it, then return the integer or floating-point value respectively
  ii. if not, go to step 7
Step 7: A is a comment; discard it.

The above algorithm works with tokens as its major raw material. A token recognizer is therefore used, which employs the transition diagrams explained in Figures 3, 4 and 5. The transition diagrams for the DoId tokens can be seen below; they are separated to avoid clutter.

Figure 3: Transition diagram for identifier
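The classification performed by Analyze(A) can be sketched in C as follows. The keyword list, helper names and token categories here are illustrative assumptions, not the actual DoId implementation; the sketch also omits the symbol table insertion of step 4 and the string/comment cases.

```c
#include <ctype.h>
#include <string.h>

/* illustrative token categories for the Analyze(A) steps above */
enum { T_ERROR, T_KEYWORD, T_OPERATOR, T_IDENTIFIER, T_NUMBER };

/* a few sample DoId reserved words (step 2); the full set is larger */
static const char *keywords[] = { "bundle", "class", "for", "while", "Din", "Dout", "End" };

static int is_keyword(const char *a) {
    for (size_t i = 0; i < sizeof keywords / sizeof *keywords; i++)
        if (strcmp(a, keywords[i]) == 0) return 1;
    return 0;
}

static int is_identifier(const char *a) {
    /* a letter followed by letters, digits and underscores (step 4) */
    if (!isalpha((unsigned char)a[0])) return 0;
    for (size_t i = 1; a[i]; i++)
        if (!isalnum((unsigned char)a[i]) && a[i] != '_') return 0;
    return 1;
}

static int is_number(const char *a) {
    /* digits with an optional sign and one decimal point (step 6) */
    size_t i = (a[0] == '-' || a[0] == '+') ? 1 : 0;
    int digits = 0, dot = 0;
    for (; a[i]; i++) {
        if (isdigit((unsigned char)a[i])) digits = 1;
        else if (a[i] == '.' && !dot) dot = 1;
        else return 0;
    }
    return digits;
}

/* steps 2-6 collapsed into one classification function */
int analyze(const char *a) {
    if (is_keyword(a))                            return T_KEYWORD;
    if (a[1] == '\0' && strchr("+-*/%<>=", a[0])) return T_OPERATOR;
    if (is_identifier(a))                         return T_IDENTIFIER;
    if (is_number(a))                             return T_NUMBER;
    return T_ERROR; /* step 1 failure: "unknown character" */
}
```

Calling analyze("while") would yield the keyword category, while analyze("count_1") would yield an identifier, matching the order of checks in the algorithm.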
Figure 4: Transition diagram for String

Figure 5: Transition diagram for some DOID Keywords (a)

Figure 5: Transition diagram for some DOID Keywords (b)

Consider an example of the action of the DoId lexical analyzer constructed from the transition diagrams above. If it encounters DOI followed by a blank, the lexical analyzer goes through states 1, 2, 3 and 4, then fails and retracts the input to the identifier transition. It then starts up the second transition diagram at state 56, goes through state 57 two times, goes to state 58 on the blank, retracts the input one position, and then inserts DOI into the symbol table.

4. CONCLUSION AND FUTURE WORK

We have briefly discussed the development of lexical analyzers using a lexical analyzer generator, for a proposed DOID lexical analyzer that offers fewer lexical specifications while still covering those necessary for any standard programming language. We believe that having a small set of lexical specifications improves the speed of computation: because of the small specification set, DOID provides faster computation and is easier to learn. Our implementation will also enhance the way local students view the study of programming languages, the possibility of and need for further research in the field, and the beauty of the desugaring model.

We hope that our work encourages researchers and developers to look further into developing more efficient and effective programming languages, and that it can be used as a basic guide for future research. We hope this will be a platform on which to build a local indigenous programming language.

REFERENCES

[1] AHO, Alfred V., LAM, Monica S., SETHI, Ravi and ULLMAN, Jeffrey D. Compilers: Principles, Techniques and Tools. Addison-Wesley, Boston, 2006.
[2] MÖSSENBÖCK, Hanspeter (2014). MicroJava. http://microjava.com. Last accessed: 21 December, 2014.
[3] Sun Microsystems' Java Development Kit (JDK). Available via: http://java.sun.com/j2se/1.5.0/index.jsp
[4] AABY, Anthony (2004). Compiler Construction using Flex and Bison. Available at: http://foja.dcs.fmph.uniba.sk/kompilatory/docs/compiler.pdf. Last accessed: 21 January, 2015.
[5] BHOWMIK, Biswajit, KUMAR, Abhishek, JHA, Abhishek Kumar and AGRAWAL, Rajesh Kumar (2010). A New Approach of Compiler Design in Context of Lexical Analyzer and Parser Generation for NextGen Languages. International Journal of Computer Applications (0975-8887), Volume 6, No. 11, September 2010.
[6] MUDAWWAR, Muhammed (n.d.). Scanning Practice: Using the Lex Scanner Generator, a TINY Language Scanner. Compiler Design.
[7] LESK, Michael E. and SCHMIDT, Eric (1975). Lex - A Lexical Analyzer Generator. New Jersey: Bell Laboratories.