cmps104a 2002q4 Assignment 3 LALR(1) Parser page 1

$Id: asg3-parser.mm,v 327.1 2002-10-07 13:59:46-07 - - $

1. Summary

Write a main program, string table manager, lexical analyzer, and parser for the language c0 that you will be compiling this quarter. The usage and options were described in the first assignment. Include options to generate the .str file and the .tok file, as before. In addition, dump the abstract syntax tree into the .ast file. The -a option requests that the .ast file be generated. Remember that the -v option turns on all lower-case options, including -a.

Use bison to generate the parser. The main program calls yyparse() with no arguments. The result is 0 for a successful parse and 1 for a failed parse. However, that result does not reflect errors you report yourself, so include your own error handler, which counts errors. Return zero to Unix on success and non-zero if any error messages were generated.

IMPORTANT: You must implement all of the options from previous assignments in this assignment, and options from this assignment must be implemented in later assignments. For debugging, you must implement both L and Y, which turn on, respectively, the scanner's debugger and the parser's debugger.

2. The Metagrammar

When reading the grammar of c0, it is important to distinguish between the grammar and the metagrammar. The metagrammar is the grammar that describes the grammar. You must also use your knowledge of C to fill in what is only implied. The metalanguage redundantly uses fonts and typography to represent concepts for the benefit of those reading this document via simple ASCII text. It looks prettier in the Postscript version. The notation used is:

   x...      Three dots mean that the preceding symbol occurs one or more times.
   [x]       Square brackets indicate that the symbol(s) occur zero times or once.
   [x]...    Square brackets and three dots mean that the symbol(s) occur zero or more times.
   x | y     A stick indicates alternation between its left and right operands.
   while     Symbols representing themselves are written in Courier-Bold.
   symbol    Nonterminal symbols in the grammar are written in lower-case Roman.
   IDENT     Token classes with lexical information are written in upper-case ITALIC.

3. The Grammar

Following is the context-free syntax of c0. You will need to translate it into an LALR(1) grammar acceptable to bison. You may, of course, take advantage of bison's capacity to handle ambiguous grammars. The dangling else problem in the grammar below is to be resolved in the usual way. Operator precedence is given in a separate table.

   program  → [ [decl] ; | function ]...
   decl declobj type function fntype fnobj type declobj * declobj object char int void fntype params fnblock type fnobj * fnobj IDENT
   params   → ( [decl [, decl]...] )
   fnblock  → { [decl ; | stmt]... }
   stmt     → { [stmt]... }
            | while ( expr ) stmt
            | if ( expr ) stmt [ else stmt ]
            | return [expr] ;
            | [expr] ;

   expr     → ( expr )
            | expr BINOP expr
            | UNOP expr
            | IDENT ( [expr [, expr]...] )
            | object
            | CHAR_LIT
            | INT_LIT
            | STRING_LIT
   object   → IDENT [ [ [expr] ] ]

Note that you will not be able to feed the grammar above to bison, because it will not be able to handle BINOP and UNOP as you might expect. You will need to explicitly enumerate all possible rules with operators in them. However, using bison's operator precedence declarations, the number of necessary rules will be reduced. Following is a table of operator precedence and associativity. It is the same as that of C, except that C has more operators.

   Operators     Arity      Associativity   Precedence
   if else       ternary    right           lowest
   =             binary     right           .
   < <= > >=     binary     left            .
   == !=         binary     left            .
   + -           binary     left            .
   * / %         binary     left            .
   + - & *       unary      right           .
   () []         variable   left            highest

4. Semantic Information

Void declarations are syntactically valid but semantically invalid, and will be caught later during symbol table insertion. That is, the declaration «void foo;» should be accepted by the parser, as it is too much trouble to suppress it. Your symbol table handler will figure out that this is wrong. In general, you should be fairly forgiving in the parser: accept things which are not strictly valid, and later apply a semantic check to reject the error.

For example, the grammar above gives the idea that an expression is optional on a return statement. That is not true. It is either required or prohibited, depending on whether the function is void or non-void. However, you cannot determine that until you have a symbol table.

Declarations of objects above imply an indefinitely large number of pointer indirections. Example: «int ****x;» is syntactically valid and the parser will accept it. However, your symbol table manager might generate a semantic error rejecting it. Attempting to make the parser reject this is actually more work than accepting it.

5. Required output to the .ast file

If the -a option is set, then a file with a .ast suffix will be created with a symbolic representation of the Abstract Syntax Tree after the parse is complete. (Note: this is not the parse tree.) This file, like the string table file, can be opened, dumped to, and closed in the same function. This function will call a recursive tree-walker function that will dump to the file using a prefix depth-first walk. For example, if part of the input file is

   int *mul_int( char num[], int int_num )
   {
      num[ 0 ] = num[ 0 ] + int_num;
   }

then part of the output file might be:

    0.000 3e088 {}
   14.003 3ea70    int
   14.007 3eae0       *
   14.008 3eb50          mul_int
   14.015 3ebc0       ()
   14.017 3ec30          char
   14.023 3eca0             []
   14.022 3ed10                num
   14.027 3ed80          int
   14.031 3edf0             int_num
   15.003 3ee60       {}
   16.015 3eed0          =
   16.009 3ef40             []
   16.006 3efb0                num
   16.011 3f020                0
   16.025 3f090             +
   16.019 3f100                []
   16.017 3f170                   num
   16.021 3f1e0                   0
   16.027 3f250                int_num

The first column is the serial number divided by 1000, the second column is the node's address in hexadecimal (%p) format, and the final column is the lexical information from the token node, indented by three times the depth of the node from the root of the tree.

Note that if you examine the .ast files in the samples directory, you will see that they contain more information than you can generate at the present time. The numbers in parentheses following the token are references to the declaration of variables and cannot be generated without the symbol table. The information after that comes from the code generator.

6. Pictures of AST parts

Following are pictures of the abstract syntax trees to be constructed from each reduction. The mathematical structure of the trees is shown, not internal links, so which exact set of pointers you use will depend on your tree implementation. Nodes may have zero, one, two, three, or more than three child nodes. Pic has been used to draw pictures of each of them, but you can only see those pictures in the Postscript version. Pic more or less has a hissyfit when asked to generate ASCII, but makes a valiant, though inadequate, effort. In each of the picture labels, which is all you'll see in the ASCII version, the first word on the line is the root of the tree, and the rest are its child nodes. [1]

6.1 Binary operator

The subscript operator, «[]», behaves as a binary operator when it has an expression in the subscript position.

   BINOP expr expr
   [] IDENT expr

6.2 Unary operator

The unary operators are: «+», «-», «&», and «*». The subscript operator, «[]», behaves as a unary operator when it has no expression in the subscript position.

[1] After all, this isn't rec.arts.ascii or alt.ascii-art.

   UNOP expr
   [] IDENT

6.3 Function call

The function call operator «()» has any number of arguments, but at least one. Its first argument is to the left of the parentheses, and its others, if any, are between the parentheses and separated by commas. Throw away the commas and the right parenthesis. Use the left parenthesis as the operator.

   () IDENT expr expr expr

6.4 Object declaration

Each type mark serves as the root of a declaration subtree. The leaf of such a tree is the identifier. Any pointer indications are at intermediate levels of the tree. See the final diagram of a function for an example.

6.5 Block of declarations and statements

A block of declarations and statements begins and ends with braces and contains sequences of declarations and statements in between. When building a statement, discard the semicolon. When building sequences thereof, discard the trailing «}» and use the leading «{}» as the operator which is the parent of the declarations and statements. Then chain them together in the expected way:

   {} decl decl stmt stmt

6.6 Control structures

The control structures «while» and «if» use the keyword as the root of a tree whose child nodes are arranged in the same way as for a binary operator. The expression at the start of the control structure is the first operand and the statement is the second operand. Note: one and only one statement is allowed, but this may be a block statement with a «{» at the root. In the case of an «if»-«else», the «if» operator is a ternary operator, with the operand of «else» as the third operand. The node «else» itself is discarded. An «if» without an «else» looks just like a «while».

   while expr stmt
   if expr stmt stmt

6.7 Return statement

A «return» statement is either a unary or a nilary operator.

   return
   return expr

6.8 Tree from a function declaration

A function declaration has the same tree as the corresponding variable declaration, except that the parameter list and statement block are hung off the type mark. For example:

   int *foo( int bar, int *baz ) {
      *baz = bar;
      return baz;
   }

would produce the following tree. Note that this tree is somewhat different from similar trees in the sample test data. When there is an inconsistency, follow the specifications here, and not the sample test data.

   int
      *
         foo
      ()
         int
            bar
         int
            *
               baz
      {}
         =
            *
               baz
            bar
         return
            baz

7. Beginning the Grammar

The first rules should be:

   start   : program          { $$ = ast_semantics( $1 ); }
           ;

   program : program decl     { $$ = adopt( $1, $2, NULL ); }
           | program func     { $$ = adopt( $1, $2, NULL ); }
           |                  { $$ = make_root_token( "{" ); }
           ;

That is, the parser will be called to parse a complete program. After this is done, the root node of the parse tree will be passed to the semantic routines. These semantic routines will walk the tree and dump it symbolically into the .ast file. Before doing that, they will walk the tree, annotating it with symbol table attributes (but not until project 4), and then walk it again generating intermediate code (in project 5).

Note that you will have a problem linking top-level constructs together, so you will need to create a root node under which to hang everything else. make_root_token() will just call make_token() from the previous project as a special-case kluge [2]. But before doing so, it must fiddle with yytext. This means that the final AST will be a list of decls and funcs linked by the follow links.

[2] See /afs/cats.ucsc.edu/courses/cmps104a-wm/jargon/, the Jargon file, version 4.2.3. See «kluge».
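The adopt() calls in section 7 and the prefix depth-first dump in section 5 can be sketched in C roughly as follows. This is only one possible arrangement, not the required one: the struct layout, the field names, and the helpers new_node() and dump_ast() are hypothetical, the null-terminated argument list of adopt() is inferred from the calls shown above, and the serial number is assumed to encode line * 1000 + column.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; the assignment leaves the pointer scheme open. */
typedef struct astnode {
    int serial;              /* assumed encoding: line * 1000 + column */
    char *lexinfo;           /* token text, e.g. "=", "num", "{" */
    struct astnode *first;   /* first child, or NULL for a leaf */
    struct astnode *last;    /* last child, so appends need no list walk */
    struct astnode *follow;  /* next sibling (the "follow link") */
} astnode;

/* Allocate a leaf node holding a private copy of the lexical text. */
astnode *new_node(int serial, const char *lexinfo) {
    astnode *node = malloc(sizeof *node);
    assert(node != NULL);
    node->lexinfo = malloc(strlen(lexinfo) + 1);
    assert(node->lexinfo != NULL);
    strcpy(node->lexinfo, lexinfo);
    node->serial = serial;
    node->first = NULL;
    node->last = NULL;
    node->follow = NULL;
    return node;
}

/* Append each child to parent, in order. The argument list ends with a
   null pointer; writing it as (astnode *)NULL keeps the variadic call
   portable. Returns parent so it can be assigned straight to $$. */
astnode *adopt(astnode *parent, ...) {
    va_list args;
    assert(parent != NULL);
    va_start(args, parent);
    for (;;) {
        astnode *child = va_arg(args, astnode *);
        if (child == NULL) break;
        if (parent->last == NULL) parent->first = child;
        else parent->last->follow = child;
        parent->last = child;
    }
    va_end(args);
    return parent;
}

/* Prefix depth-first walk: print the node itself, then each child one
   level deeper. The lexical text is indented three columns per level. */
void dump_ast(FILE *out, const astnode *node, int depth) {
    const astnode *child;
    if (node == NULL) return;
    fprintf(out, "%d.%03d %p %*s%s\n",
            node->serial / 1000, node->serial % 1000,
            (void *)node, depth * 3, "", node->lexinfo);
    for (child = node->first; child != NULL; child = child->follow)
        dump_ast(out, child, depth + 1);
}
```

Under these assumptions, a bison action such as adopt( $1, $2, NULL ) hangs the new decl or func under the root node, and a single call like dump_ast( astfile, root, 0 ) after yyparse() returns writes one line per node. Keeping a last pointer is a convenience chosen here; a design that walks the follow links to find the end of the child list would behave the same.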