CSCI Compiler Design

Syntactic Analysis
Automatic Parser Generators: The UNIX YACC Tool

Portions of this lecture were adapted from Prof. Pedro Reis Santos's notes for the 2006 Compilers class taught at IST/UTL in Lisbon, Portugal.

Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California (USC) have explicit permission to make copies of these materials for their personal use.

Outline
- YACC File Format
- Lexical Elements
- Priority and Associativity
- Semantic Actions
- Syntactic Analyzer Environment
- Error Handling
- Debugging
- LALR(1) Grammar Conflicts
- Lex and YACC Example

YACC Input File Format
Similar to the lex input format, with 3 regions separated by %%:
- Declarations: %token, %union, %type, %left, %right, %nonassoc and %start, plus base code declarations between %{ and %}
- Rules: the two sides of each production are separated by : (instead of →), alternative productions are separated by |, and semantic actions are enclosed in braces { and }
- Code: implementation of functions, some of which are declared in the first region
Declarations
- %start: declares the grammar's non-terminal goal symbol (start symbol)
- When absent, the goal symbol is assumed to be the first symbol in the rules section (the left-hand side of the first rule)
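
As a rough illustration of the three regions, a minimal input file might look like the sketch below. The token NUMBER, the non-terminal list and the printed message are made up for this example, and yylex is assumed to come from a separate lex scanner.

```yacc
%{
/* Declarations region: C code between %{ and %} is copied into the parser. */
#include <stdio.h>
int yylex(void);                 /* assumed: provided by a lex scanner */
int yyerror(const char *s);
%}

%token NUMBER                    /* hypothetical terminal symbol */
%start list                      /* goal symbol; optional, since list is the first rule */

%%
/* Rules region: ':' separates the two sides, '|' separates alternatives. */
list : /* empty */
     | list NUMBER               { printf("number\n"); }
     ;
%%
/* Code region: functions declared in the first region. */
int yyerror(const char *s) { fprintf(stderr, "%s\n", s); return 0; }
int main(void) { return yyparse(); }
```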

Lexical Elements, Values and Types
- %token: declares the lexical elements (excluding isolated characters) used in the grammar as terminal symbols (e.g., number or identifier). Several %token declarations may be present, in any order.
- %union: defines the data types that can be saved on the parser stack. These types are associated either with terminal symbols (whose values are produced by the lexical analyzer) or with non-terminal symbols (whose values are produced internally by the syntactic analyzer).
- %type<x>: associates a data type with the declared symbols. x names the field of the %union that represents the symbols' values.
- %token<x>: declares a token and its type in a single declaration.
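
The declarations above could be combined as in the following sketch. The field names i and s, the tokens INT and ID, and the non-terminals stmt and expr are illustrative assumptions; the scanner that fills yylval.i and yylval.s is assumed to exist elsewhere.

```yacc
%{
#include <stdio.h>
int yylex(void);                 /* assumed: provided by a lex scanner */
int yyerror(const char *s);
%}
%union {
    int   i;                     /* value of an integer literal */
    char *s;                     /* text of an identifier */
}
%token <i> INT                   /* %token<x>: token and type in one declaration */
%token <s> ID
%type  <i> expr                  /* values of expr use the field i */
%%
stmt : ID '=' expr ';'           { printf("%s = %d\n", $1, $3); }
     ;
expr : INT                       { $$ = $1; }
     | expr '+' INT              { $$ = $1 + $3; }
     ;
%%
int yyerror(const char *s) { fprintf(stderr, "%s\n", s); return 0; }
```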

Priority and Associativity
The priority and associativity of operators can be imposed by explicit declarations:
- %left: list of terminal symbols (operators) with left associativity
- %right: list of terminal symbols (operators) with right associativity
- %nonassoc: list of terminal symbols (operators) without any associativity
The order of declaration defines the relative priority:
- The first declaration has the lowest priority
- The last declaration has the highest priority
- All symbols in the same declaration have the same priority
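
A small calculator can make the effect of these declarations visible. The sketch below is only an illustration: the token NUM, the '^' operator and the hand-written yylex are assumptions made for this example, not part of the lecture's calculator.

```yacc
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
int yyerror(const char *s);
%}
%token NUM
%left  '+' '-'                   /* declared first: lowest priority, left associative */
%left  '*' '/'                   /* higher priority than '+' and '-' */
%right '^'                       /* declared last: highest priority, right associative */
%%
line : expr '\n'                 { printf("%d\n", $1); }
     ;
expr : expr '+' expr             { $$ = $1 + $3; }
     | expr '-' expr             { $$ = $1 - $3; }
     | expr '*' expr             { $$ = $1 * $3; }
     | expr '/' expr             { $$ = $1 / $3; }
     | expr '^' expr             { int r = 1, k; for (k = 0; k < $3; k++) r *= $1; $$ = r; }
     | NUM
     ;
%%
/* Hand-written scanner: skips blanks, returns NUM for digit runs, else the character. */
int yylex(void) {
    int c = getchar();
    while (c == ' ') c = getchar();
    if (isdigit(c)) {
        yylval = 0;
        while (isdigit(c)) { yylval = yylval * 10 + (c - '0'); c = getchar(); }
        ungetc(c, stdin);
        return NUM;
    }
    return c == EOF ? 0 : c;
}
int yyerror(const char *s) { fprintf(stderr, "%s\n", s); return 0; }
int main(void) { return yyparse(); }
```

The parser accepts one line per run. With these declarations, 8-3-2 should print 3 (left associativity), 2^3^2 should print 512 (right associativity) and 2+3*4 should print 14 (priority).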

Semantic Actions
Values of attributes are referred to by their position in the rule:
- $$: designates the result of the rule, i.e., the left-hand side symbol
- $1: designates the first symbol on the right-hand side of the rule. The remaining symbols are numbered in increasing order.
- $<type>0: designates the first symbol associated with an inherited attribute, i.e., the stack entry just below the rule's own symbols. The type is needed because the symbol is not explicitly represented in the rule and may not even be uniquely identified. Other symbols in the stack are referred to by negative indices.
A semantic action placed in the middle of a rule may return a value that subsequent actions in the same rule can use; such an action occupies a position in the rule. For example, in
    X : Y { $$ = 3; } Z { $$ = $1 * $2 + $3; }
the value of the first action is referred to as $2, and Z becomes $3.
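
The rule from this slide can be tried out with a stub scanner. In the sketch below, the stub yylex and the values 2 and 5 are assumptions chosen only to make the arithmetic visible.

```yacc
%{
#include <stdio.h>
int yylex(void);
int yyerror(const char *s);
%}
%token Y Z
%%
x : Y { $$ = 3; }                /* mid-rule action: its value becomes $2 */
    Z { $$ = $1 * $2 + $3; printf("%d\n", $$); }
  ;
%%
/* Stub scanner (assumption): returns Y with value 2, Z with value 5, then end of input. */
int yylex(void) {
    static int n = 0;
    switch (n++) {
    case 0:  yylval = 2; return Y;
    case 1:  yylval = 5; return Z;
    default: return 0;
    }
}
int yyerror(const char *s) { fprintf(stderr, "%s\n", s); return 0; }
int main(void) { return yyparse(); }
```

With these values the final action computes 2 * 3 + 5 and prints 11.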

Syntactic Analyzer Environment
- The syntactic analyzer is implemented by the int yyparse() function. It returns 0 if no syntactical errors have been found and 1 otherwise
- Whenever an error is found, int yyerror(char*) is invoked and the int yynerrs variable is incremented
- The variables int yystate and int yychar hold, respectively, the analyzer's current state in its LALR(1) table and the code of the current lexical element
- The yacc tool generates the y.tab.c file with the parser code and the corresponding tables. The -d and -v switches create the files y.tab.h (interoperability with lex) and y.output (LALR(1) table in readable format)
- The file dependency imposed by y.tab.h (with the token identification) requires that the parser (using yacc) be generated first and the lexical analyzer (using lex) second.
- When linking, the yacc library should be indicated first with -ly, followed by the lex library with -ll (or -lfl in the case of flex)

YACC Error Handling
- The reserved symbol error allows the parser to handle error conditions by skipping input terminal symbols until a suitable terminal from its follow set is found.
- The YYERROR macro may be used in semantic actions to trigger syntactical errors
- The function yyerror(char *s) is invoked whenever a syntactical error occurs, even if generated by the YYERROR condition (note that Bison's YYERROR starts error recovery without calling yyerror)
- The yyerrok macro can be used in semantic actions to allow the parser to continue after a syntactical error has occurred
- The yyclearin macro can be used in semantic actions to force the analyzer to reread the lookahead lexical element for error recovery
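
A typical use of error and yyerrok is a rule that resynchronizes on a statement terminator. The sketch below assumes a hypothetical token INT and a lex scanner providing yylex; the messages are illustrative.

```yacc
%{
#include <stdio.h>
int yylex(void);                 /* assumed: provided by a lex scanner */
int yyerror(const char *s);
%}
%token INT
%%
lines : /* empty */
      | lines line
      ;
line  : INT ';'                  { printf("ok: %d\n", $1); }
      | error ';'                { fprintf(stderr, "discarded input\n");
                                   yyerrok;   /* resume normal parsing right away */ }
      ;
%%
int yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); return 0; }
int main(void) { return yyparse(); }
```

After a syntax error the parser discards tokens until it can shift the ';' that follows error, so one bad line does not abort the whole input.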

Debugging
- Static analysis: generate the syntactic analyzer using the -v switch (yacc -v gram.y) and inspect the y.output file, which contains the analyzer states and the transitions with the corresponding symbols
- Dynamic analysis: compile the syntactic analyzer with YYDEBUG defined (gcc -c -DYYDEBUG y.tab.c, or #define YYDEBUG 1 in the grammar) and set the yydebug variable to 1 (one) at the beginning of program execution (in main). Note that yydebug only exists if YYDEBUG has been defined
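
The dynamic-analysis setup might look like the sketch below, where the grammar itself is a placeholder and yylex is again assumed to come from a lex scanner. The #if guard reflects the note that yydebug only exists when YYDEBUG is defined.

```yacc
%{
#include <stdio.h>
#define YYDEBUG 1                /* compile the tracing code into the parser */
int yylex(void);                 /* assumed: provided by a lex scanner */
int yyerror(const char *s);
%}
%token INT
%%
list : /* empty */
     | list INT
     ;
%%
int yyerror(const char *s) { fprintf(stderr, "%s\n", s); return 0; }
int main(void) {
#if YYDEBUG
    yydebug = 1;                 /* trace shifts and reductions while parsing */
#endif
    return yyparse();
}
```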

LALR(1) Grammar Conflicts
- Production items indicate the internal parser state. For X → AB there are 3 possible items: X → .AB, X → A.B and X → AB., where the . indicates the parser position in processing this rule
- There is a conflict if, in a given state, the parser may be compelled to either shift or reduce, as suggested by two of the items associated with that state
- YACC reports these conflicts as:
  - Reduce/Reduce: there are two distinct items with the . at the end
  - Shift/Reduce: there is an item with the . in the middle (suggesting a shift) and another item with the . at the end (suggesting a reduction)
- The lookahead token is used to eliminate conflicts in LALR(1) grammars

Reduce/Reduce Conflicts
- By default YACC performs a reduction using the first of the two offending rules in the input file and ignores the other
- Troubleshooting and fixing:
  - Identify, in the y.output file, the offending rules and determine the symbols generating the conflict
  - Try to factor the productions, extracting their common parts
  - Alternatively, group the productions and resolve the ambiguity in the semantic actions (e.g., as in polymorphic operations)
- Example of factoring: the rules S : A | B, A : X Y and B : X Z share the prefix X; after factoring, X is consumed in the rule for S and the remaining rules become A : Y and B : Z

Shift/Reduce Conflicts
- By default yacc shifts rather than reduces
- Troubleshooting and fixing:
  - Identify, in the y.output file, the offending rules and determine the symbols generating the conflict
  - Enforce priorities (e.g., as in expressions or if-else)
  - Enumerate alternatives (e.g., as in function invocation or the for construct)
  - Merge rules, using distributive rules, thus avoiding unnecessary reductions (e.g., as in declarations and instruction lists)
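
For the if-else case, enforcing priorities is commonly done with a pseudo-token and %prec. The following grammar fragment is a sketch of that idiom: the tokens IF, ELSE, EXPR and OTHER and the pseudo-token IFX are assumptions for this example, and the scanner, yyerror and main are assumed to be defined elsewhere.

```yacc
%{
int yylex(void);                 /* assumed: provided by a lex scanner */
int yyerror(const char *s);      /* assumed: defined in another file */
%}
%token IF ELSE EXPR OTHER
%nonassoc IFX                    /* hypothetical pseudo-token: priority of an if without else */
%nonassoc ELSE                   /* declared later, so ELSE has the higher priority */
%%
stmt : IF '(' EXPR ')' stmt %prec IFX
     | IF '(' EXPR ')' stmt ELSE stmt
     | OTHER ';'
     ;
```

Because ELSE has a higher priority than the rule tagged with IFX, the parser shifts ELSE instead of reducing the shorter if, which attaches each else to the nearest if and removes the reported conflict.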

Lex and Yacc
[Figure: build flow. yacc turns compiler.y into y.tab.c (yyparse) and y.tab.h; lex turns compiler.l into lex.yy.c (yylex); cc compiles both into the compiler, which turns the input source file into the compiled output.]

Lex and Yacc
[Figure: run-time structure of the compiler. yylex reads the input source file; yyparse requests the next token from yylex and receives a token; yyparse uses its stack, parse table and actions to produce the compiled output.]

Example: calc.y

    %{
    #include <stdio.h>
    int yylex(void);
    int yyerror(char *e);
    %}
    %union{ int i; }
    %token INT UMINUS
    %type<i> expr INT
    %start calc
    %%
    calc: calc expr ';'         { printf("%d\n", $2); }
        | expr ';'              { printf("%d\n", $1); }
        | error ';'             { printf("discarded input\n"); yyerrok; }
        ;
    expr: expr '+' expr         {$$ = $1 + $3;}
        | expr '-' expr         {$$ = $1 - $3;}
        | expr '*' expr         {$$ = $1 * $3;}
        | expr '/' expr         {$$ = $1 / $3;}
        | expr '%' expr         {$$ = $1 % $3;}
        | '(' expr ')'          {$$ = $2;}
        | '-' expr              {$$ = - $2;}
        | '+' expr %prec UMINUS {$$ = $2;}
        | INT                   {$$ = $1;}
        ;
    %%
    /* Report a syntax error together with the code of the current token. */
    int yyerror(char *e){ fprintf(stderr, "ERROR: %s %d\n", e, yychar); return 0; }
    int main() { yyparse(); return 0; }

y.tab.h

    #define INT 257
    #define UMINUS 258
    typedef union{ int i; } YYSTYPE;
    extern YYSTYPE yylval;

Example: calc.l

    %{
    #include <stdlib.h>   /* atoi */
    #include "y.tab.h"
    int yyerror(char *e);
    %}
    %%
    "-"        return '-';
    "+"        return '+';
    "*"        return '*';
    "/"        return '/';
    "%"        return '%';
    ";"        return ';';
    "("        return '(';
    ")"        return ')';
    [-+/*%;]   return *yytext;
    [0-9]+     {yylval.i = atoi(yytext); return INT;}
    [ \t\n]+   ;
    .          yyerror("unrecognised character");
    %%

The included y.tab.h is the file generated from calc.y; it supplies INT, UMINUS, YYSTYPE and yylval.

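Example: Invoking the Tools and Running

    prompt> yacc -d calc.y
    yacc: 35 shift/reduce conflicts.
    prompt> flex calc.l
    "calc.l", line 17: warning, rule cannot be matched
    prompt> cc lex.yy.c y.tab.c -ly -ll -o calc
    prompt> calc
    1+2;
    3
    3-1;
    2
    <ctrl-d>
    prompt>

yacc reports the conflicts because calc.y declares no operator priorities or associativities; each conflict is resolved by the default shift.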