Using an LALR(1) Parser Generator

Similar documents
Yacc: A Syntactic Analysers Generator

Syntax Analysis Part IV

Syntax-Directed Translation

Yacc Yet Another Compiler Compiler

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Compiler Lab. Introduction to tools Lex and Yacc

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.

Context-free grammars

Parsing How parser works?

CSCI Compiler Design

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Compiler Construction: Parsing

CSE302: Compiler Design

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

An Introduction to LEX and YACC. SYSC Programming Languages

Compiler Design 1. Yacc/Bison. Goutam Biswas. Lect 8

Lexical and Syntax Analysis

Yacc. Generator of LALR(1) parsers. YACC = Yet Another Compiler Compiler symptom of two facts: Compiler. Compiler. Parser

LECTURE 11. Semantic Analysis and Yacc

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

COMPILER CONSTRUCTION Seminar 02 TDDB44

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lesson 10. CDT301 Compiler Theory, Spring 2011 Teacher: Linus Källberg

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

TDDD55- Compilers and Interpreters Lesson 3

TDDD55 - Compilers and Interpreters Lesson 3

Principles of Programming Languages

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Syntax Analysis The Parser Generator (BYacc/J)

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Lex and Yacc. More Details

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

PRACTICAL CLASS: Flex & Bison

LR Parsing of CFG's with Restrictions

Compiler construction in4020 lecture 5

CS 403: Scanning and Parsing

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

Context-free grammars (CFG s)

Bottom-Up Parsing. Lecture 11-12

Hyacc comes under the GNU General Public License (Except the hyaccpar file, which comes under BSD License)

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CSC 467 Lecture 3: Regular Expressions

Compiler Construction

Chapter 2: Syntax Directed Translation and YACC

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

LR Parsing LALR Parser Generators

Introduction to Lex & Yacc. (flex & bison)

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Compiler Construction Assignment 3 Spring 2018

CS415 Compilers. LR Parsing & Error Recovery

UNIVERSITY OF CALIFORNIA

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

Programming Project II

Compiler construction 2002 week 5

Yacc: Yet Another Compiler-Compiler

Chapter 3: Lexing and Parsing

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

COP5621 Exam 3 - Spring 2005

Bottom-Up Parsing. Lecture 11-12

Syntax-Directed Translation. Introduction

Compiler Construction

Wednesday, August 31, Parsers

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Compiler construction 2005 lecture 5

Compilers. Bottom-up Parsing. (original slides by Sam

UNIT III & IV. Bottom up parsing

Conflicts in LR Parsing and More LR Parsing Types

Ray Pereda Unicon Technical Report UTR-03. February 25, Abstract

Syn S t yn a t x a Ana x lysi y s si 1

Lecture 8: Deterministic Bottom-Up Parsing

HW8 Use Lex/Yacc to Turn this: Into this: Lex and Yacc. Lex / Yacc History. A Quick Tour. if myvar == 6.02e23**2 then f(..!

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

LR Parsing LALR Parser Generators

Monday, September 13, Parsers

Automatic Scanning and Parsing using LEX and YACC

Gechstudentszone.wordpress.com

Semantic actions for declarations and expressions

Lecture 7: Deterministic Bottom-Up Parsing

Semantic actions for declarations and expressions. Monday, September 28, 15

Chapter 3 -- Scanner (Lexical Analyzer)

Syntax Intro and Overview. Syntax

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

Parser Tools: lex and yacc-style Parsing

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Topic 5: Syntax Analysis III

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Lex and Yacc. A Quick Tour

Properties of Regular Expressions and Finite Automata

FROWN An LALR(k) Parser Generator

Transcription:

Using an LALR(1) Parser Generator Yacc is an LALR(1) parser generator Developed by S.C. Johnson and others at AT&T Bell Labs Yacc is an acronym for Yet another compiler compiler Yacc generates an integrated parser, not an entire compiler One popular compatible version of Yacc is Bison Part of the Gnu software distributed by the Free Software Foundation Input to yacc is called yacc specification The.y extention is a convention for yacc specification (example: parser.y) Yacc produces an entire parser module in C Parser module can be compiled and linked to other modules yacc parser.y (command produces y.tab.c) cc c y.tab.c (command produces y.tab.o) y.tab.o can be linked to other object files Yacc: an LALR(1) Parser Generator 1

Yacc Basics The parser generated by yacc is a C function called yyparse( ) yyparse( ) is an LALR(1) parser yyparse( ) calls yylex( ) repeatedly to obtain the next input token The function yylex( ) can be hand-coded in C or generated by lex yyparse( ) returns an integer value 0 is returned if parsing succeeds and end of file is reached 1 is returned if parsing fails due to a syntax error A yacc specification file has three sections: declarations %% productions %% user subroutines The %% separates between sections Yacc: an LALR(1) Parser Generator 2

Example of a Yacc Specification The following is a yacc specification for an expression sequence %token NUMBER 300 %% ExprSeq : ExprSeq, Expr Expr ; Expr : Expr + Term Expr - Term Term ; Term : Term * Factor Term / Factor Factor ; Factor : ( Expr ) NUMBER ; Token Declaration Production Rules Yacc: an LALR(1) Parser Generator 3

Yacc Declarations The first section can include a variety of declarations A literal block is C code delimited by %{ and %} Ordinary C declarations and #include directives are placed in literal block Declarations made in literal block can be used in second and third sections Tokens should be declared in the first section Tokens can either be named or quoted character literals (example, + ) Named tokens must be declared to distinguish them from non-terminals A token declaration is of the form: %token token1 value1 token2 value2... The integer values define the token codes used by the scanner All declared tokens should have positive code values Assignment of code values to tokens is optional Tokens not assigned an explicit code receive an implicit code value Yacc: an LALR(1) Parser Generator 4

Production Rules The productions section defines the grammar that will be parsed Productions are of the form: A : X 1... X n ; A is a non-terminal on the left-hand side of the production X 1... X n are zero or more grammar symbols on the right-hand side A production may span multiple lines and should terminate with a semicolon A sequence of productions with same LHS may be written as: A : X 1... X l ; A : Y 1... Y m ; A : Z 1... Z n ; Equivalently, it may be written as: A : X 1... X l Y 1... Y m Z 1... Z n ; Yacc: an LALR(1) Parser Generator 5

Start Symbol and Auxiliary Code The LHS of first production is assumed to be the start symbol You may also declare the start symbol in the declaration section %start name This will make name as the start symbol Required when start symbol does not appear on LHS of first production Additional code can be provided in third section as necessary Example: yylex() can be implemented in third section Alternatively, yylex() can be generated by lex Error reporting and recovery routines may be added as well Yacc: an LALR(1) Parser Generator 6

Attribute Values and Semantic Actions Every grammar symbol has an associated attribute value An attribute value can represent anything we choose The value of an expression The data type of an expression The translated code Yacc associates an attribute with every token and non-terminal Token attributes are returned by the scanner in the yylval variable Non-terminal attributes are computed while parsing Attribute values are pushed an popped on a semantic stack The semantic stack operates in parallel with the parser stack A semantic action in Yacc is a code fragment delimited by { } Executed when yacc matches a rule in the grammar Semantic Actions can be used to make calls to semantic routines Yacc: an LALR(1) Parser Generator 7

Yacc Specification for a Simple Calculator %{ #include <stdio.h> extern int yylex(); Literal Block %} %token NUMBER 300 %% ExprSeq : ExprSeq, Expr {printf("%d\n",$3);} Expr ; {printf("%d\n",$1);} Expr : Expr + Term {$$ = $1 + $3;} Expr - Term {$$ = $1 - $3;} Term ; {$$ = $1;} Term : Term * Factor {$$ = $1 * $3;} Term / Factor {$$ = $1 / $3;} Factor ; {$$ = $1;} Factor : ( Expr ) {$$ = $2;} NUMBER ; {$$ = $1;} Production Rules and Semantic Actions Yacc: an LALR(1) Parser Generator 8

The $ Notation The $ notation in Yacc is used to represent the attribute values $$ is the attribute value of the left-hand side nonterminal $1, $2,... are the attribute values of right-hand side symbols $1 is attribute value of first symbol, $2 is attribute of second, etc Yacc uses the $ notation to locate attributes on semantic stack Consider A : X 1... X n ; Just before reducing the above production $1 = stack [top n+1], $2 = stack [top n+2],, $n = stack [top] When reducing the above production $n,, $2, $1 are popped from semantic stack $$ is pushed on top of semantic stack in place of $1 top = top n + 1 ; stack [top] = $$ Yacc: an LALR(1) Parser Generator 9

Attribute Data Types and %union Unless explicitly specified, the default type of attributes is int The types of $$, $1, $2, is integer by default Suppose, we want floating-point values for numbers We can change the types of $$, $1, $2, to double by placing #define YYSTYPE double in the literal block The elements of the semantic stack are of type YYSTYPE In general, we may associate different types to different attributes The %union declaration identifies all possible attribute types %union {... field declarations... } The fields of a %union declaration are copied into a C union YYSTYPE is defined to be the C union type Yacc puts the generated C union in the generated output file Yacc: an LALR(1) Parser Generator 10

%union and %type Declarations Example of a %union declaration %union { Operator op; char* name; Treenode* tree; Symbol* sym; } We associate the fields in %union with tokens and nonterminals A %token declaration may specify the attribute type of a token %token <name> ID %token <op> ADDOP MULOP The attribute of ADDOP and MULOP is an op of type Operator The attribute of ID is a name of type char* Type of a nonterminal is specified with a %type declaration %type <tree> Expr Term Factor The attribute of Expr, Term, and Factor is a tree of type Treenode* Yacc: an LALR(1) Parser Generator 11

Generating Syntax Trees for Expressions %union { Operator op; char* name; Treenode* tree; Symbol* sym; } %token <op> ADDOP MULOP %token <name> ID %token <sym> NUMBER %type <tree> E T F %% E : E ADDOP T {$$ = new Treenode($2, $1, $3);}; E : T {$$ = $1;}; T : T MULOP F {$$ = new Treenode($2, $1, $3);}; T : F {$$ = $1;}; F : ( E ) {$$ = $2;}; F : ID {$$ = (Treenode*) idtable.lookup($1);}; F : NUMBER {$$ = (Treenode*) $1;}; Yacc: an LALR(1) Parser Generator 12

Ambiguity and Conflicts in Yacc Parser generators of all varieties reject ambiguous grammars Ambiguous grammars fail to be LR(k) for any value of k Yacc will report conflicts: shift-reduce and reduce-reduce In some cases, a conflict is due to ambiguity in the grammar In other cases, a conflict is a limitation of the LALR(1) method Only one token of lookahead is used by Yacc A grammar may require more than one token of lookahead A shift-reduce conflict occurs when two parses exist One of the parses completes a production rule - the reduce action A second parse shifts a token - the shift action A reduce-reduce conflict occurs when Same lookahead token could complete two different productions Yacc: an LALR(1) Parser Generator 13

Ambiguity and Conflicts cont'd Example of a shift-reduce conflict: E : E + E id ; For the input id + id + id there are two parses: (id + id) + id that uses the reduce action, and id + (id + id) that uses the shift action Yacc always chooses the shift action in a shift-reduce conflict Example of a reduce-reduce conflict: S : X Y ; X : A ; Y : A ; Reduce-reduce conflicts represent mistakes in the grammar Yacc reduces the first production in a reduce-reduce conflict Yacc: an LALR(1) Parser Generator 14

More on Conflicts Having two productions with the same right-hand side does not imply a reduce-reduce conflict the following example does not cause any conflict Lookahead token uniquely determines the production to be reduced S : X b Y c ; X : A ; Y : A ; Some reduce-reduce conflicts are due to the limitations of Yacc Reduce-reduce conflict caused by the limitation of LALR(1) S : X B c Y B d ; X : A ; Y : A ; Yacc: an LALR(1) Parser Generator 15

Using Yacc with Ambiguous Grammars Ambiguity, if controlled, can be of value An ambiguous grammar provides a shorter specification Can be more natural than any equivalent unambiguous grammar Produces more efficient parsers for real programming languages For language constructs like expressions An ambiguous grammar is more natural and more efficient E : E ADDOP E E MULOP E '(' E ')' ID Operator precedence and associativity eliminate the ambiguity Most binary operators, like +,, *, and /, are left-associative Few, such as the exponentiation operator, are right-associative Few, typically the relational operators, do not associate at all Two relational operators cannot be combined at all Yacc: an LALR(1) Parser Generator 16

Operator Precedence and Associativity Yacc provides operator precedence and associativity rules for Eliminating ambiguity and resolving shift-reduce conflicts Example on precedence and associativity of operators: %nonassoc RELOP %left ADDOP %left MULOP %right EXPOP The order of declarations defines precedence of operators RELOP has least precedence and EXPOP has the highest ADDOP has higher precedence than RELOP %left declarations means left-associative %right declarations means right-associative %nonassoc declarations means non-associative Yacc: an LALR(1) Parser Generator 17

Resolving Conflicts The operator precedence and associativity resolve conflicts Given the two productions: E : E op1 E ; E : E op2 E ; Suppose E op1 E is on top of parser stack and next token is op2 If op2 has a higher precedence than op1, we shift If op2 has a lower precedence than op1, we reduce If op2 has an equal precedence to op1, we use associativity If op1 and op2 are left-associative, we reduce If op1 and op2 are right-associative, we shift If op1 and op2 are non-associative, we have a syntax error Yacc: an LALR(1) Parser Generator 18