Parallel Execution of Tasks of Lexical Analyzer using OpenMP on Multi-core Machine

Similar documents
A Parallel Algorithm to Process Harmonic Progression Series Using OpenMP

A Simple Syntax-Directed Translator

1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008

Keywords Lexical Analyzer, Multi Core Machines,Parallel Processing,OpenMP concept,lex and Yacc. Fig-1: Working of Lexical Analyzer

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Application Specific Signal Processors S

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF CSE COURSE PLAN

CA Compiler Construction

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

OpenMP Algoritmi e Calcolo Parallelo. Daniele Loiacono

A simple syntax-directed

CS691/SC791: Parallel & Distributed Computing

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY

1. Lexical Analysis Phase

Compiler Design (40-414)

CSc 453 Lexical Analysis (Scanning)

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

Chapter 4: Multithreaded Programming

UNIT -2 LEXICAL ANALYSIS

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

OpenMP Introduction. CS 590: High Performance Computing. OpenMP. A standard for shared-memory parallel programming. MP = multiprocessing

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

OpenMP. A parallel language standard that support both data and functional Parallelism on a shared memory system

Chapter 3: Lexical Analysis

Distributed Systems + Middleware Concurrent Programming with OpenMP

Introduction to Lexical Analysis

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Compiler course. Chapter 3 Lexical Analysis

MANIPAL INSTITUTE OF TECHNOLOGY (Constituent Institute of MANIPAL University) MANIPAL

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

CS 4201 Compilers 2014/2015 Handout: Lab 1

Parallel Programming in C with MPI and OpenMP

CST-402(T): Language Processors

Chapter 4: Threads. Chapter 4: Threads

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Formal Languages and Compilers Lecture VI: Lexical Analysis

ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:...

Chapter 3 Lexical Analysis

G Compiler Construction Lecture 4: Lexical Analysis. Mohamed Zahran (aka Z)

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Introduction to Compiler Construction

Compiler Construction

Introduction to Compiler Construction

A brief introduction to OpenMP

CSCI-GA Compiler Construction Lecture 4: Lexical Analysis I. Hubertus Franke

Compiler Design Overview. Compiler Design 1

4.8 Summary. Practice Exercises

Topics. Introduction. Shared Memory Parallelization. Example. Lecture 11. OpenMP Execution Model Fork-Join model 5/15/2012. Introduction OpenMP

Recognition of Tokens

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Acknowledgments. Amdahl s Law. Contents. Programming with MPI Parallel programming. 1 speedup = (1 P )+ P N. Type to enter text

COMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues

Parallel Programming in C with MPI and OpenMP

ECE 574 Cluster Computing Lecture 10

Languages and Compilers

Chapter 4: Multi-Threaded Programming

1. INTRODUCTION. Keywords: multicore, openmp, parallel programming, speedup.

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 )

Computer Architecture

OPERATING SYSTEM. Chapter 4: Threads

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Parallel Programming in C with MPI and OpenMP

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN

Introduction to Compiler Construction

EE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California

Projects for Compilers

Little Motivation Outline Introduction OpenMP Architecture Working with OpenMP Future of OpenMP End. OpenMP. Amasis Brauch German University in Cairo

A Pascal program. Input from the file is read to a buffer program buffer. program xyz(input, output) --- begin A := B + C * 2 end.

Compiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis.

The Language for Specifying Lexical Analyzer

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)


A programming language requires two major definitions A simple one pass compiler

Basic programming knowledge (arrays, looping, functions) Basic concept of parallel programming (in OpenMP)

Barbara Chapman, Gabriele Jost, Ruud van der Pas

Lexical Analysis. Finite Automata

15-418, Spring 2008 OpenMP: A Short Introduction

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

COMPILER DESIGN LECTURE NOTES

Compiler Code Generation COMP360

DOID: A Lexical Analyzer for Understanding Mid-Level Compilation Processes

Lexical Analysis. Finite Automata

OpenMPand the PGAS Model. CMSC714 Sept 15, 2015 Guest Lecturer: Ray Chen

Amdahl s Law. AMath 483/583 Lecture 13 April 25, Amdahl s Law. Amdahl s Law. Today: Amdahl s law Speed up, strong and weak scaling OpenMP

Question Bank. 10CS63:Compiler Design

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Parallel Programming

Operating Systems 2 nd semester 2016/2017. Chapter 4: Threads

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Chapter 4: Threads. Operating System Concepts 9 th Edit9on

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

LANGUAGE TRANSLATORS

2068 (I) Attempt all questions.

ECE Spring 2017 Exam 2

Transcription:

Parallel Execution of Tasks of Lexical Analyzer using OpenMP on Multi-core Machine Rohitkumar Rudrappa Wagdarikar* and Rajeshwari Krishnahari Gundla** *-**Computer Science and Engineering Dept, Walchand Institute of Technology,Solapur, Maharashtra,India Email: rohit.wagdarikar@yahoo.com, gundlaradhika9@gmail.com Abstract: Compiler is a system program that translates a source language into target language. The structure of a Compiler is composed of several phases. The first phase is lexical analysis or scanning. This is the only phase which interacts with original source code written by the programmer and perform the various tasks which includes, read the input characters and produce as output sequence of tokens that the parser uses for syntax analysis, and make entry to symbol table, Stripping out from the source program comments and white space in the form of blank, tab, new line characters. Another task is correlating error messages from the compiler with the source program and keeping the track of error by line number. In this paper we present concurrent execution of tasks of lexical analyzer. In a multi-core system, task parallelism is achieved when each core executes a different thread on the same or different data. The threads may execute the same or different code. In this paper we present an approach that lexical analyzer tasks are independently allocated to each core to perform parallelism which improves the performance of the lexical analyzer. Keywords: Concurrent Lexical Analysis, Multi-core Machine, Lexeme-Handler, Whitespace-Handler, Comment-Handler, Newline-Handler, OpenMP. Introduction Compiler is a system program that translates a high level source language into low level target language. The structure of a Compiler is composed of several phases. These phases are divided into two parts: Front and Back. Front contains Lexical analyzer, Syntax Analyzer, Semantic Analyzer and Intermediate Code Generator. Back contains Code Optimizer and Code Generator. As shown in figure 1, First phase of the compiler is lexical analyzer which generates the token from source program. Second phase of compiler is syntax analyzer, Ref. [8] it analyze the code against the production rules specified by context free grammar to detect the error in the code. Third phase of compiler is semantic analyzer Ref. [8] which helps to interpret symbols, their type and their relation with each other. It also verifies the meaning of statement. Fourth phase of compiler is intermediate code generation which generates the intermediate code in the form of three address code, syntax tree and postfix notation. Fifth phase of compiler is code optimizer which performs the optimization on intermediate code to decrease space and time complexity. Last phase of compiler is code generation which generates the target code in the form of low level language (e.g. assembly language). The first phase is lexical analysis or scanning. This is the only phase which interacts with original source code written by the programmer and perform the various tasks which includes, read the input characters and produce as output sequence of tokens that the parser uses for syntax analysis, and make entry to symbol table, Stripping out from the source program comments and white space in the form of blank, tab, new line characters. Another task is correlating error messages from the compiler with the source program and keeping the track of error by line number Ref. [2]. It should be noted that our proposed architecture address the problem of sequential execution of different tasks of lexical analyzer, so proposed architecture concentrate on defining the architecture whose performs the parallel execution of different tasks of lexical analyzer. The reminder of this paper is outlined as follows. After discussion of existing system, we present the architecture of our parallel approach and parallel algorithm for the lexical analyzer. In section 3 we conclude with outlook of future work. Literature Studies As described by Amit Barve, Brijendra Kumar Joshi in Ref. [10], source code is divided into the total number of blocks by total number of CPU, then identify the pivot location. As describe the these authors there will be three different pivot locations, 1.Based on Construct 2. Based on White Space Characters and last one is 3.Based on lines. In the first one, they identify the start of iteration syntax like for loop, while loop etc. in second one it will identify the total number of white spaces from source program and make the blocks and give it to each CPU, and in third one it will identify the total number of lines from source program and make the blocks and assign it to each CPU.

Parallel Execution of Tasks of Lexical Analyzer using OpenMP on Multi-core Machine 9 As shown in figure 2 lexical analyzer takes the input source program into the input buffer and uses Lexeme Beginning Pointer (LBP) and Lexeme Forward Pointer (LFP) to find the lexeme. Initially LBP and LFP are pointed to start of the input buffer, only LFP moves forward and while moving forward it checks for pattern and it uses longest matching rule Ref. [2]. If the pattern matches with lexeme, it extracts the set of characters between LBP and LFP. Based on lexeme found, Lexical analyzer performs following tasks: 1.If extracted lexeme is a token then it generates <token id, value> and if token is identifier or constant then it makes entry to symbol table. 2. If extracted lexeme is comment it removes comment lines and checks for next lexeme. 3. If it finds whitespace then it removes it and checks for next lexeme. 4. If it finds newline, it keeps track of line number which is used by error handler. Figure1. Structure of Compiler As described by Amit Barve, Brijendra Kumar Joshi in Ref. [9], multiple files can be taken as input and these files are placed in a single directory and each file is processing on different core on multi-core system to perform the tasks of lexical analyzer, by using this methodology they are performing parallelism in lexical analyzer. In this paper we present concurrent execution of tasks of lexical analyzer by using OpenMP. OpenMP uses multi-core system. In a multi-core system Ref. [3], task parallelism is achieved when each core executes a different thread on the same or different data. The threads may execute the same or different code. OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing. OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications. OpenMP is an implementation of multithreading, a method of parallelizing whereby a master thread (a series of instructions executed consecutively) forks a specified number of slave threads and the system divides a task among them. The threads then run concurrently, with the runtime environment allocating threads to different processors. In OpenMP each thread executes the parallelized section of code independently. Work-sharing constructs can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way. Proposed System As shown in figure 3 lexical analyzer takes the input source program into the input buffer and uses Lexeme Beginning Pointer (LBPtr) and Lexeme Forward Pointer (LFPtr) to find the lexeme from source program, which is assigned to core 1. White Space Pointer (WSPtr) is used to find white spaces from source program, which is assigned to core 2. Comments Beginning Pointer (CBPtr) and Comments Forward Pointer (CFPtr) are used to find comments from source program, which is assigned to core 3, and New Line Pointer (NLPtr) to keep track of line number, which is assigned to core 4. These pointers are independently working on its assigned core. The architecture diagram represents the tasks of cores as follows: Core 1 Lexeme-Handler It finds the category of tokens. (Keyword, Identifier, Constant, Special Characters, Operators) Core 2 Whitespace-Handler It finds whitespaces and removes it. Core 3 Comment-Handler It finds comments and removes comment lines. Core 4 Newline-Handler It keeps track of line number. As shown in figure 4 lexeme, white space, comments and newline tasks are assigned to core 1, 2, 3 and core 4 respectively. Core 1-Whitespace-Handler uses WSPtr to find the white space. This pointer is pointed to start of the input buffer. WSPtr moves forward till it finds whitespace. Once it finds whitespace, it removes the white space from source program. This process is repeated till the end of the program. Core 2-Lexeme-Handler uses LBPtr and LFPtr. These pointers are pointed to start of the input buffer, only LFP moves forward and while moving forward it checks for pattern and it uses longest matching rule Ref. [2]. If the pattern matches with lexeme, it extracts the set of characters between LBPtr and LFPtr. If

10 Eighth International Conference on Advances in Computer Engineering ACE 2018 Figure2. Existing system of Lexical Analyzer Figure3. Architecture of Concurrent execution of tasks of Lexical Analyzer extracted lexeme is a token, it generates <token id, value> and if token is identifier or constant then it makes entry to symbol table. This process is repeated till the end of the program. Core 3 Comment-Handler uses CBPtr and CFPtr to find the comments from the source program. These pointers are pointed to start of the input buffer. CBPtr and CFPtr moves forward till they find / character. Once it finds / character, CFPtr pointer moves forward to find / or *. If it finds / next to / which is initiated by CBPtr, then it removes the whole line from the source program or if it finds the * next to / initiated by CBPtr, then CFPtr keeps moving forward till it finds * followed by /. Once it finds * followed by / then statements between CBPtr to CFPtr are removed from the source program. This process is repeated till the end of the program. Core 4 Newline-Handler uses NLPtr to keep the track of new line. NLPtr counts the \n sentinels from source program, this count is used by error handler to correlate the error message with line number.

Parallel Execution of Tasks of Lexical Analyzer using OpenMP on Multi-core Machine 11 Figure4. Proposed System of Lexical Analyzer Lexeme-Handler It finds the category of tokens (Keyword, Identifier, Constant, Special Characters, Operators). Below pseudo code is used to find an identifier. Pseudocode1 LBPtr=0, LFPtr=0; Defining Grammar U [ \_ ] L [a-za-z] D [0-9] WS [ ] ID [U L][L D]+ While(! of Input String ) If (LBPtr==0 && LFPtr==0) While (buff [LFPtr]!= WS) LFPtr++; Lexeme = Extract string between LBPtr to LFPtr; Apply pattern ID to Lexeme

12 Eighth International Conference on Advances in Computer Engineering ACE 2018 If (ID) Make entry to symbol table Whitespace-Handler It finds whitespaces and removes it. Below pseudo code is used to find white space. Pseudocode2 WSPtr=0; Defining Grammar WS [ ] While(! of Input String ) If (buff[wsptr]==ws) Delete White Space WSPtr++; Comment-Handler It finds comments and removes comment lines. Below pseudo code is used to find comments. Pseudocode3 CBPtr=0, CFPtr=0; While(! of Input String ) If (buff [CBPtr] == / ) CFPtr++; If (buff [CFPtr] == / ) Delete entire line; Else if (buff [CFPtr] == * ) While(1) CFPtr ++; if (buff [CFPtr]== * ) CFPtr++; If (buff [CFPtr]== / ) Delete string from buff[cbptr] to buff[cfptr]; Break;

Parallel Execution of Tasks of Lexical Analyzer using OpenMP on Multi-core Machine 13 While (buff [LFPtr]!= WS) LFPtr++; Lexeme = Extract string between LBPtr to LFPtr; Apply pattern ID to Lexeme If (ID) Make entry to symbol table Newline-Handler It keeps track of line number. Below pseudo code is used to find newline. Pseudocode4 NLPtr=0, Count=0; While(! of file ) If (buff[nlptr]== \ ) NLPtr++; If(buff[NLPtr]== n ) Count++; If (IsError) Return line number = count to Error Handler; Parallelism can be achieved in OpenMP by using #pragma omp parallel Ref. [5]. With this code, system can create multiple threads. By default system can create 4 threads; user can create more than 4 threads with the help of #pragma omp parallel num_threads(n) method, where n is an integer. User can get the thread number by omp_get_thread_num() method. #pragma omp sections directive is the way to distribute different tasks to different threads. Syntax of section directive is as follows: #pragma omp parallel s // code for Task 1 // code for Task 2 // code for Task 3 The above mentioned pseudo codes of tasks of lexical analyzer - Pseudo code 1, 2, 3, 4 can be written in OpenMP Sections directive to perform concurrent execution of tasks of lexical analyzer, which is given below:

14 Eighth International Conference on Advances in Computer Engineering ACE 2018 Concurrent execution of tasks of Lexical Analyzer Pseudo code for Concurrent execution of tasks of Lexical Analyzer #pragma omp parallel shared (buff) s // Pseudo code for Identifier tid = omp_get_thread_num(); printf(" Thread %d starts..\n", tid); //Thread 1 s goto pseudocode1 // Pseudo code for White Space tid = omp_get_thread_num(); printf(" Thread %d starts..\n", tid); //Thread 2 s goto pseudocode2 // Pseudo code for Comments tid = omp_get_thread_num(); printf(" Thread %d starts..\n",tid); //Thread 3 s goto pseudocode3 //Pseudo code for Newline tid = omp_get_thread_num(); printf(" Thread %d starts..\n",tid); //Thread 4 s goto pseudocode4 Speedup According to Amdahl s law Ref. [1], execution time is improved if independent code/ tasks are divided into multiple processors and executed parallel / concurrently. In this paper we are dividing tasks of Lexical Analyzer into multiple cores to execute concurrently in order to improve execution time. As mentioned in Ref. [7] Speed up is performance metric which is given as: Where n is size of problem p is number of processors σ (n) is the program s serial part execution time, ϕ(n) is the program s parallel part execution time, and κ(n, p) is the communication time. Efficiency is the ratio of speed up obtained to the number of processors used Ref. [7]. It measures processors utilization. Parallel system efficiency of solving an n-size problem on P processors is given by

Parallel Execution of Tasks of Lexical Analyzer using OpenMP on Multi-core Machine 15 Our algorithm is designed in such a way that operations are performed in parallel/concurrent manner and there is no communication required among cores and no sequential execution. In this paper, sequential operations refer to sequential lexical analysis of all tasks on single processor whereas operations in parallel/concurrent refer to sequential lexical analysis of all tasks independently distributed among available cores. Based on Ref. [4] Ref. [6] and Ref. [7] Parallel/concurrent execution is more efficient than sequential execution. In this paper, tasks of lexical analyzer are distributed independently among different cores where core 1 performs the task of Lexeme-Handler, core 2 performs the task of Whitespace-Handler, core 3 performs the task of Comment-Handler, and core 4 performs the task of Newline-Handler. These tasks are executed concurrently so it improves the efficiency of lexical analyzer phase of compiler. Conclusion and Future Work An algorithm for performing parallel/concurrent execution of tasks of lexical analysis which can use multi-core machine is presented. It is clear that substantial amount of time can be saved in lexical analysis phase by distributing tasks of lexical analyzer across number of cores. So in this paper tasks of lexical analyzer such as token generation, removing white spaces and keeping track of line for error handler are done concurrently by sections directive from OpenMP which improves the speed of lexical analyzer. This speedup is expected to increase further if source code (which is input to lexical analysis phase) is divided on some criteria and processed parallel. We are currently working on further phases of compiler to improve their efficiency in order to improve the overall efficiency of compiler. References [1] Amit Barve, Brijendra Kumar Joshi. Improved Parallel Lexical Analysis Using OpenMP on Multi-core Machines ELSEVIER- Volume 49, 2015, Pages 211-219 [2] Alfred Aho and Jeffrey Ullman, Principles of Compiler Design Second Edition. [3] P. Gepner, M.F. Kowalik Multi-Core Processors: New Way to Achieve High System Performance IEEE- ISBN: 0-7695-2554-7- 16 October 2006 Multi core [4] Michael J. Quinn Parallel Programming in C with MPI and Open MP Published by McGraw-Hill Education (2003) ISBN 10: 0070582017 ISBN 13: 9780070582019 [5] Paul Edmon Introduction to OpenMP HARVARD Faculty of Arts and Sciences, ITC Research computing Associate. Section [6] Mark D. Hill, Michael R. Marty Amdahl's Law in the Multicore Era Published by the IEEE Computer Society July 2008 [7] Alaa Ismail El-Nashar TO PARALLELIZE OR NOT TO PARALLELIZE, SPEED UP ISSUE Published by IJDPS Vol.2, No.2, March 2011 [8] Syntax analyzer, https://www.tutorialspoint.com. [9] Amit Barve, Brijendra Kumar Joshi, Parallel Lexical Analysis of Multiple Files on Multi-Core Machines Published by IJCA Volume 96 No.16, June 2014 [10] Amit Barve, Brijendra Kumar Joshi,, Parallel Lexical Analysis on Multi-Core Machines using Divide and Conquer Published by IEEE, 04 April 2013