
KATHMANDU UNIVERSITY
SCHOOL OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

REPORT ON NON-RECURSIVE PREDICTIVE PARSER

Fourth Year First Semester Compiler Design Project
Final Report submitted in partial fulfillment of the requirements for course COMP 409

Submitted By:
Prajwol Shrestha [09]
Roshan Shrestha [22]
Shweta Shrestha [26]
Ajaya Thapa [28]

Submitted To:
Mr. Sushil Nepal
Department of Computer Science and Engineering

Jan 26, 2014

TABLE OF CONTENTS
1. Introduction
   1.1. Non-Recursive Predictive Parsing
      1.1.1. FIRST
      1.1.2. FOLLOW
      1.1.3. Algorithm to fill the parsing table
      1.1.4. Algorithm of non-recursive predictive parsing
2. Motivation
   2.1. Tools Required
3. Implementation
4. Output
5. Conclusion

1. Introduction

The figure below shows a compiler model in which the parser checks the tokens coming from the lexical analyzer and verifies that the input string can be generated by the grammar of the source language. The parser is responsible for reporting any syntax error it encounters during parsing. It is also expected to recover from commonly occurring errors so that it can continue processing the remainder of the input.

[Fig. Parser Model: source code enters the lexical analyzer, which supplies tokens to the parser on demand (GetNextToken); the parser consults the parse table and symbol table and passes an intermediate representation to the rest of the front end, which produces the output code.]

There are three main kinds of parser:

1. Universal parsers, which can parse any kind of grammar. The Cocke-Younger-Kasami algorithm and Earley's algorithm can be used to build universal parsers.

2. Top-down parsers, which, as the name suggests, build the parse tree from the top (root) to the bottom (leaves). A parse tree is an ordered rooted tree that represents the syntactic structure of a string according to some formal grammar. Examples of top-down parsers are recursive-descent parsers and non-recursive predictive parsers for LL grammars.

3. Bottom-up parsers, which build the parse tree from the bottom (leaves) to the top (root). Examples of bottom-up parsers are shift-reduce parsers, operator-precedence parsers, and the LR family (SLR, CLR, LALR).

Both top-down and bottom-up parsers scan the input symbols from left to right, one symbol at a time.

1.1. Non-Recursive Predictive Parsing

A non-recursive predictive parser can be built by maintaining a stack explicitly. The key problem during predictive parsing is to determine which production to apply for a non-terminal; the non-recursive parser looks up that production in a parsing table, as in the figure below.

[Fig. Non-Recursive Predictive Parser: an input buffer (e.g. a + b $), a stack (X Y Z $), the predictive parsing program, the parsing table M, and the output.]

A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output. The input buffer contains the string to be parsed, followed by $, which marks the end of the input string. The stack contains a sequence of grammar symbols with $ at the bottom; initially, the stack contains the start symbol of the grammar on top of $. The parsing table is a two-dimensional array M[A, a], where A is a non-terminal and a is a terminal or the symbol $.

The parser program behaves as follows. It considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser. There are three possibilities:

1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.

3. If X is a non-terminal, the parser consults entry M[X, a] of the parsing table M. This entry is either an X-production of the grammar or an error entry.

1.1.1 FIRST

The FIRST set of a sequence of symbols u, written FIRST(u), is the set of terminals that can begin the strings derivable from u.

Rule 1: If X is a terminal, then FIRST(X) = {X}.

Rule 2: If X → aα is a production, where a is a terminal and α is any string of terminals and/or non-terminals, then add a to FIRST(X), i.e. FIRST(X) includes {a}. And if X → ε is a production, then add ε to FIRST(X).

Rule 3: If X → Y1 Y2 … Yk is a production where all the Yi are non-terminals, then FIRST(X) includes FIRST(Y1) − {ε}. If FIRST(Y1) contains ε, then FIRST(X) also includes FIRST(Y2) − {ε}; if FIRST(Y2) contains ε as well, then FIRST(Y3) − {ε} is added too. This process of adding the non-ε symbols of FIRST(Yi) to FIRST(X) continues until some FIRST(Yi) does not contain ε. (If every FIRST(Yi) contains ε, then ε is also added to FIRST(X).)

1.1.2 FOLLOW

The FOLLOW set of a non-terminal A is the set of terminal symbols that can appear immediately to the right of A in a valid sentence.

1. Place the endmarker in FOLLOW(S), where S is the start symbol. The endmarker might be end of file, a newline, or a special symbol; it is whatever marks the expected end of input for this grammar. We will typically use $ as the endmarker.

2. For every production A → αBβ, where α and β are strings of grammar symbols, β ≠ ε, and B is a non-terminal, everything in FIRST(β) except ε is placed in FOLLOW(B), i.e. FOLLOW(B) includes FIRST(β) − {ε}.

3. For every production A → αB, or A → αBβ where β ⇒ ε (i.e. ε ∈ FIRST(β)), everything in FOLLOW(A) is added to FOLLOW(B), i.e. FOLLOW(B) includes FOLLOW(A).
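The FIRST rules above can be sketched as a small fixed-point computation. The grammar, symbol conventions (upper-case characters are non-terminals, everything else is a terminal, 'e' stands for ε) and the class name below are illustrative assumptions for this sketch, not the project's NPParser code:

```java
import java.util.*;

public class FirstSets {

    // Computes FIRST for every non-terminal by applying the three rules
    // repeatedly until no set changes (a fixed point is reached).
    static Map<Character, Set<Character>> computeFirst(Map<Character, List<String>> prods) {
        Map<Character, Set<Character>> first = new HashMap<>();
        for (char nt : prods.keySet()) first.put(nt, new HashSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (char nt : prods.keySet())
                for (String rhs : prods.get(nt))
                    changed |= addFirstOf(rhs, prods, first, first.get(nt));
        }
        return first;
    }

    // Adds FIRST(rhs) into 'target'; returns true if 'target' grew.
    static boolean addFirstOf(String rhs, Map<Character, List<String>> prods,
                              Map<Character, Set<Character>> first,
                              Set<Character> target) {
        boolean grew = false;
        for (char sym : rhs.toCharArray()) {
            if (!prods.containsKey(sym)) {      // terminal or 'e' (Rules 1 and 2)
                return target.add(sym) || grew;
            }
            for (char c : first.get(sym))       // Rule 3: FIRST(sym) minus epsilon
                if (c != 'e') grew |= target.add(c);
            if (!first.get(sym).contains('e'))  // stop unless sym can derive epsilon
                return grew;
        }
        return target.add('e') || grew;         // every symbol can vanish
    }

    public static void main(String[] args) {
        // Illustrative grammar: S -> ABC, A -> ab, B -> bB | e, C -> c
        Map<Character, List<String>> g = new LinkedHashMap<>();
        g.put('S', List.of("ABC"));
        g.put('A', List.of("ab"));
        g.put('B', List.of("bB", "e"));
        g.put('C', List.of("c"));
        computeFirst(g).forEach((nt, f) ->
            System.out.println("FIRST(" + nt + ") = " + f));
    }
}
```

For this grammar the computation yields FIRST(S) = {a}, FIRST(A) = {a}, FIRST(B) = {b, ε} and FIRST(C) = {c}.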

1.1.3 Algorithm to fill the parsing table

Input: the FIRST and FOLLOW sets, along with the grammar G.
Output: parsing table M.
Method: for every production of the form A → α, each terminal a in FIRST(α) yields the entry M[A, a] = A → α. If FIRST(α) contains ε, the production is instead added to M[A, b] for every b in FOLLOW(A).

Algorithm:

Compute FIRST and FOLLOW for every non-terminal of the grammar
For every production A → α do {
    For every non-ε member a of FIRST(α) do
        set M[A, a] = A → α
    If FIRST(α) contains ε then
        For every b in FOLLOW(A) do
            set M[A, b] = A → α
}

1.1.4 Algorithm of non-recursive predictive parsing

Input: a string w and a parsing table M for grammar G.
Output: if w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method: initially, the parser is in a configuration in which it has $S on the stack, with S, the start symbol of G, on top, and w$ in the input buffer. The program that uses the predictive parsing table M to produce a parse for the input is as follows:

Push the start symbol onto the stack
Repeat
    Let X be the top stack symbol and a the next input symbol
    If X is a terminal or $ then
        If X = a then
            Pop X from the stack and remove a from the input
        Else Error()
    Else /* X is a non-terminal */
        If M[X, a] = X → Y1 Y2 … Yk then
            Pop X from the stack
            Push Yk, Yk-1, …, Y2, Y1 onto the stack, so that Y1 is on top
        Else Error()
Until X = $ /* until the stack is empty */

The construction of a predictive parser is aided by two functions associated with a grammar G: FIRST and FOLLOW. These functions allow us to fill in the entries of a predictive parsing table for G.
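The parsing loop above can be sketched directly in Java. The grammar (S → aSb | ε, generating aⁿbⁿ) and its hand-built parse table below are illustrative assumptions chosen to keep the sketch small; they are not the project's grammar or the NPParser code:

```java
import java.util.*;

public class PredictiveParser {
    // Parse table M[nonterminal][lookahead] -> RHS ("e" denotes epsilon).
    // For S -> aSb | e: expand on 'a'; take epsilon on 'b' or '$',
    // since both are in FOLLOW(S).
    static final Map<Character, Map<Character, String>> M = Map.of(
        'S', Map.of('a', "aSb", 'b', "e", '$', "e"));

    static boolean parse(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');
        stack.push('S');                          // start symbol on top of $
        String w = input + "$";
        int i = 0;
        while (true) {
            char X = stack.peek(), a = w.charAt(i);
            if (X == '$' && a == '$') return true;    // case 1: accept
            if (!M.containsKey(X)) {                  // X is a terminal (or $)
                if (X != a) return false;             // mismatch: error
                stack.pop(); i++;                     // case 2: match and advance
            } else {                                  // case 3: consult M[X, a]
                String rhs = M.get(X).get(a);
                if (rhs == null) return false;        // empty table cell: error
                stack.pop();
                if (!rhs.equals("e"))                 // push RHS in reverse order
                    for (int k = rhs.length() - 1; k >= 0; k--)
                        stack.push(rhs.charAt(k));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("aabb"));   // in a^n b^n: accepted
        System.out.println(parse("abb"));    // not in a^n b^n: rejected
    }
}
```

Note how the two halves of the loop mirror the algorithm: terminals are matched against the input, non-terminals are replaced by the right-hand side found in the table, pushed in reverse so that its first symbol ends up on top.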

2. Motivation

The language chosen for the implementation of this parser is Java. Java is an open-source, object-oriented platform language for which libraries of many useful classes are readily available. One such class is StringBuffer, which can be used to manipulate strings of variable length at any time. This is the feature that drew us to Java, since a parser needs to build and modify variable-length strings easily, and it motivated us to use Java as our programming language.

2.1 Tools Required

The following tools have been used to complete the project:

Eclipse: Eclipse serves as the Integrated Development Environment (IDE) for the project; the whole programming part was done in it. The project requires several Java libraries, the Java compiler, and a console window to run the program, and Eclipse provides all of these in one place.

Java: Java is a computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere" (WORA), meaning that code that runs on one platform does not need to be recompiled to run on another. The library used in Eclipse was the JRE System Library 7.

3. Implementation

The grammar given to design the parser was:

S → ABC
A → abA | ab
B → b | BC
C → c | cc

In order to generate a non-recursive parser, we had to remove any left recursion that was present, using the following rule: a left-recursive pair of productions A → Aα | β is replaced by

A → βA'
A' → αA' | ε

Hence the production rules used in the program were modified to:

S → ABC
A → abA | ab
B → bK
K → CK | E
C → c | cc

where E represents ε and K is the fresh non-terminal introduced for B.

A single class, NPParser, is created; it defines the production rules, calculates FIRST and FOLLOW, determines and removes the ambiguity of the grammar, creates the parse table, and analyzes the given string using the parser. Several modules have been created within the class to build the parser. The following image lists the variables and methods used during the implementation of the NPParser class.
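The left-recursion removal rule can be sketched as a small routine. The representation (alternatives as plain strings, "e" for ε, a primed name for the fresh non-terminal) is an illustrative assumption for this sketch, not the project's code:

```java
import java.util.*;

// Sketch of immediate left-recursion removal: A -> Aα | β becomes
// A -> βA' and A' -> αA' | ε ("e" denotes epsilon here).
public class LeftRecursion {
    static Map<String, List<String>> eliminate(String nt, List<String> alts) {
        List<String> recursive = new ArrayList<>();     // the α parts of A -> Aα
        List<String> nonRecursive = new ArrayList<>();  // the β parts
        for (String alt : alts) {
            if (alt.startsWith(nt)) recursive.add(alt.substring(nt.length()));
            else nonRecursive.add(alt);
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        if (recursive.isEmpty()) {          // no left recursion: keep as-is
            out.put(nt, alts);
            return out;
        }
        String fresh = nt + "'";            // the new helper non-terminal A'
        List<String> newAlts = new ArrayList<>();
        for (String beta : nonRecursive) newAlts.add(beta + fresh);   // A -> βA'
        out.put(nt, newAlts);
        List<String> freshAlts = new ArrayList<>();
        for (String alpha : recursive) freshAlts.add(alpha + fresh);  // A' -> αA'
        freshAlts.add("e");                                           // A' -> ε
        out.put(fresh, freshAlts);
        return out;
    }

    public static void main(String[] args) {
        // B -> BC | b from the report's grammar:
        System.out.println(eliminate("B", List.of("BC", "b")));
        // prints {B=[bB'], B'=[CB', e]}
    }
}
```

Applied to B → BC | b this yields B → bB' and B' → CB' | ε, which matches the modified grammar above with K in the role of B' and E in the role of ε.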

Fig. The NPParser class

Among the several modules of the NPParser class, the most important are described below:

findfirstnfollow(): This function finds FIRST and FOLLOW. It is built on the algorithms above and computes the FIRST and FOLLOW of each of the terminals and non-terminals; however, only those of the non-terminals are stored, in the variables first[5] and follow[5] for the five non-terminals S, A, B, K and C. Any duplicate entries in the FIRST and FOLLOW sets are removed using the DuplicateEntryRemoval(StringBuffer, int) module.

constructparsetable(StringBuffer): This module constructs the parse table using the table-filling algorithm, once the FIRST and FOLLOW of all the non-terminals have been found. The function also checks for multiple entries in any cell of the table. When multiple entries occurred, a new unambiguous grammar was introduced; for the given grammar, the following LL(1) rules removed the ambiguity. The construction was done manually using GenerateUnambiguousGrammar():

S → ABC
A → ab
B → bK
K → CK | E
C → cC

ParseInput(): The given input buffer to be parsed was abbcc. Applying the parsing algorithm with the stack and the given input, the function checks whether the grammar, expressed as the parsing table, can parse the input. The input was valid for the given grammar.

4. Output

Using the user-defined modules, we obtain the following results from the non-recursive predictive parser:

- The grammar
- The table of A → α, FIRST(A), FOLLOW(A)
- The parse table (with ambiguity)
- The new unambiguous grammar production rules
- The table of A → α, FIRST(A), FOLLOW(A) for the new grammar

- The parse table (new, unambiguous)
- The input and the parsing of the input

The presence of $ alone on both the stack and the input at the end signifies that the given input can be parsed by the given algorithm.

5. Conclusion

The program implements a simple non-recursive predictive parser. It uses a static set of production rules (the unambiguous, non-left-recursive grammar), so the FIRST and FOLLOW computation is limited to the given symbols; the same approach, however, can be applied effectively to other grammars as well. The input symbols can be changed statically, or the program can be extended to ask for inputs dynamically for the parsing process.