Computing Inside The Parser Syntax-Directed Translation. Comp 412

Similar documents
Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Building a Parser Part III

Introduction to Parsing. Comp 412

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

Context-sensitive Analysis Part IV Ad-hoc syntax-directed translation, Symbol Tables, andtypes

Syntax Analysis, III Comp 412

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing II Top-down parsing. Comp 412

Syntax Analysis, III Comp 412

Intermediate Representations

Handling Assignment Comp 412

Bottom-up Parser. Jungsik Choi

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

The Software Stack: From Assembly Language to Machine Code

Intermediate Representations

CS 406/534 Compiler Construction Putting It All Together

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code.

Instruction Selection: Preliminaries. Comp 412

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Implementing Control Flow Constructs Comp 412

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Naming in OOLs and Storage Layout Comp 412

Runtime Support for Algol-Like Languages Comp 412

CS 406/534 Compiler Construction Parsing Part I

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Code Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End

CSCI312 Principles of Programming Languages

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Combining Optimizations: Sparse Conditional Constant Propagation

The View from 35,000 Feet

Alternatives for semantic processing

Procedure and Function Calls, Part II. Comp 412 COMP 412 FALL Chapter 6 in EaC2e. target code. source code Front End Optimizer Back End

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS415 Compilers. Intermediate Represeation & Code Generation

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Runtime Support for OOLs Object Records, Code Vectors, Inheritance Comp 412

Syntactic Analysis. Top-Down Parsing

Code Shape II Expressions & Assignment

Code Shape Comp 412 COMP 412 FALL Chapters 4, 5, 6 & 7 in EaC2e. source code. IR IR target. code. Front End Optimizer Back End

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Syntax-Directed Translation. Lecture 14

Arrays and Functions

Lexical Analysis. Introduction

Grammars. CS434 Lecture 15 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Context-sensitive Analysis. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

MIDTERM EXAM (Solutions)

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CS415 Compilers. Lexical Analysis

The Processor Memory Hierarchy

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Register Allocation. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Chapter 4. Lexical and Syntax Analysis

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Peace cannot be kept by force; it can only be achieved by understanding. Albert Einstein

Context-free grammars (CFG s)

Software II: Principles of Programming Languages

Chapter 4 :: Semantic Analysis

Parsing Part II (Top-down parsing, left-recursion removal)

What do Compilers Produce?

Context-free grammars

CS 406/534 Compiler Construction Parsing Part II LL(1) and LR(1) Parsing

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Lexical and Syntax Analysis. Top-Down Parsing

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis

CST-402(T): Language Processors

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Compiler Construction: Parsing

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Class Information ANNOUCEMENTS

CSCI 565 Compiler Design and Implementation Spring 2014

Introduction to Parsing

Local Optimization: Value Numbering The Desert Island Optimization. Comp 412 COMP 412 FALL Chapter 8 in EaC2e. target code

COP4020 Programming Assignment 2 - Fall 2016

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Compilers and Interpreters

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

A parser is some system capable of constructing the derivation of any sentence in some language L(G) based on a grammar G.

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

4. An interpreter is a program that

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Optimizer. Defining and preserving the meaning of the program

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

Front End. Hwansoo Han

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

COMPILER DESIGN. For COMPUTER SCIENCE

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CS606- compiler instruction Solved MCQS From Midterm Papers

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Chapter 4: LR Parsing

Lab 3, Tutorial 1 Comp 412

CS 415 Midterm Exam Spring 2002

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

Transcription:

COMP 412 FALL 2018 Computing Inside The Parser Syntax-Directed Translation Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. Chapter 4 in EaC2e

Midterm Exam Thursday October 18, @ 7PM Herzstein Amphitheater The Midterm will cover Class content through last class Chapters 1, 2, 3, and 5 in the book Local register allocation material from the Chapter 13 excerpt The format will be Closed book, closed notes, closed devices Designed as a two-hour exam COMP 412, Fall 2018 1

Where are we? Where are we going? In the book: Chapter 3 focused on the membership question Given a grammar G, is some sentence s L(G)? We developed efficient techniques to recognize languages To make useful tools, we need to go well beyond syntax Chapter 5 provides an overview of Intermediate Representations Covered the representation of code in the last two lectures Representation of name spaces coming in the near future Chapter 4 focuses on computing in the parser Strong emphasis on attribute grammars in the written chapter We will ignore that material & take a more ad-hoc, pragmatic approach Lecture will emphasize the applications in compilers Chapters 6, & 7 focus on the translation of language features into compiled code runtime support & code shape We will skip around wildly in these two chapters COMP 412, Fall 2018 2

Now We Know How To Build Parsers What do we want to do with them? Parsers have many applications Obvious answer is to build IR that will be compiled or interpreted Reading markup languages: XML, EPUB, GML & KML, RSS and building data structures to represent the content of various ML documents The COMP 412 answer Reading other data representations to transmit rich and structured data Reading in the results of other language processing tools: e.g., EDIF We use parsers to understand syntax General Schema Read some input Build some data structures to model the input Perform some structured computation on the data structures COMP 412, Fall 2018 3

COMP 412 Is About Compiler Construction What else does a compiler need to know about the input program? Short answer: everything COMP 412, Fall 2018 4

COMP 412 Is About Compiler Construction What else does a compiler need to know about the input program? Compiler must assign storage to every item that needs storage Compiler must generate code for every construct that executes Compiler must plan resource allocation, use, & recycling Compiler must ensure coherent use of values & information (types) The compiler is guided in this effort by the actual code, the language definition, and its knowledge about the runtime environment. Meta Issue: The compiler does its work at compile time The compiler emits code that performs the work at runtime Keeping track of the times (design, compile, run) is critical to your understanding of the material COMP 412, Fall 2018 5

Example: Consistent Use of Values To ensure consistent use of values, PLs introduce type systems Type systems allow the language designer to express constraints that are deeper than syntax Type systems allow the programmer to write down facts & intentions that are deeper than syntax By comparing constraints against facts & expressed intentions, the compiler can often spot subtle and (otherwise) difficult to find errors Type Checking Requires that the language have a well-designed type system Requires (often) that the programmer include additional information Requires that the parser gather some base information on the code Requires that the semantic elaborator apply a set of inference rules Inference rules range from simple (C) to complex (ML, OCAML) Design Compile Compile COMP 412, Fall 2018 6

Example: Overloading of Names To allow reuse of individual names, PLs introduce scoping rules Procedures start with a clean name space, as do blocks in some PLs Programmer can choose names, largely ignoring names used elsewhere Exception: declared names of global values and interfaces The part of speech, identifier, does not encode enough information for the compiler to generate correct code Compilers must recognize, model, & understand the scopes in a program Compile-time and Runtime support Managing the application s name space requires support during translation (at compile time) and during execution (at runtime) Compilers build tables to encode knowledge at compile time Compilers emit code to build, maintain, & use runtime structures that support proper name resolution at runtime COMP 412, Fall 2018 7

Example: Storage Layout To access data, running code needs an address in runtime RAM Running program cannot issue a load or a branch without an address Every item of code or data requires an address Addresses are in the virtual address space of a new process Meta Issue: that process won t exist until a user runs the compiled code Meta Issue: code may run on different hardware and OS versions The compiler must manage all decisions about storage layout, for the entire program (not just one procedure). Storage Layout Semantic elaborator must lay out storage before it generates code For each item of code or data, the compiler must either: Assign a symbolic address to each item, or Implement a scheme whereby its address can be computed COMP 412, Fall 2018 8

Example: Storage Layout The compiler plans storage layout It takes cooperation to run the code Processes Virt. Addr. Space Virt. Addr. Space Virt. Addr. Space Compiler Loader Virt. Addr. Space Virt. Addr. Space Virt. Addr. Space.asm Assembler.o Pulls a.out into the virtual address space and jumps to _main_ Virt. Addr. Space Loader Operating System s Paging Mechanism Virt. Addr. Space Virt. Addr. Space Running code creates programmer s objects Linker a.out Memory Hierarchy Data & Code Application Build Time % a.out Core Functional unit Functional unit Registers Functional unit Functional unit COMP 412, Fall 2018 9

Example: Separate Compilation Programmers write code in modules & combine modules into programs Compiler typically sees one module, or file, at a time Compilation for a module is, largely, context free Compiled code must link with other pre-compiled modules and/or precompiled library routines & system calls Planning The compiler-writer must plan & standardize the ways in which the compiler will translate the source code, to create standard conventions for naming code & data, & for calling other procedures Conventions for naming code and data segments Conventions for calls & returns, for memory use, for system calls, The compiler must emit code to implement those plans Design Compile COMP 412, Fall 2018 10

Example: Storage Layout The earlier picture was overly simple Compiler Compiler Assembler.o Compiler Assembler Individual compile and assemble steps with 1,000 SLOC files, on average, it will take 1,000 compiler-assemble steps to compile a 1,000,000 SLOC application.o Compiler Assembler.o Assembler.o Compiler Assembler Compiler.o Assembler Loader Pulls a.out into the virtual address space and jumps to _main_ Application Build Time.o files & libraries.o Linker a.out COMP 412, Fall 2018 11

Syntax-Directed Translation All of these issues play into SDT Answers depend on computation over values, not parts of speech Questions & answers involve non-local information Questions and answers are deeper than syntax How can we answer these questions? Use formal methods Attribute grammars, rule-based systems Use ad-hoc techniques Symbol tables, ad-hoc code Formalisms work well for scanning & parsing. Real compilers use them. For these context-sensitive issues, ad-hoc techniques dominate practice. COMP 412, Fall 2018 12

Example Computing the value of an unsigned integer Consider the simple grammar 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon One obvious use of the grammar is to convert an ASCII string of the number to its integer value Build computation into parser An easy intro to syntax-directed translation COMP 412, Fall 2018 13

Example Computing the value of an unsigned integer Consider the simple grammar 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon One obvious use of the grammar is to convert an ASCII string of the number to its integer value Build computation into parser An easy intro to syntax-directed translation And its recursive-descent parser boolean Number( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return false; boolean DigitList( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return true; /* epsilon case */ COMP This example 412, Fall is 2018 for illustrative purposes. You don t need a CFG to recognize integers. 14

Example Computing the value of an unsigned integer Consider the simple grammar 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon Any assembly programmer will tell you to accumulate the value, left to right, multiplying by the left-context value by 10 at each step. e.g., 147 = (1 *10 + 4) * 10 + 7 and to get the value of character d by computing d 0 And its recursive-descent parser boolean Number( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return false; boolean DigitList( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return true; /* epsilon case */ COMP The programming 412, Fall 2018 trick, however, is undoubtedly how atoi() works. 15

Example 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon Computing the value of an unsigned integer boolean Number( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return false; The Plan 1. Make Number() and DigitList() take value as an argument and return a (boolean,value) pair boolean DigitList( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return true; /* epsilon case */ COMP 412, Fall 2018 16

Example 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon Computing the value of an unsigned integer boolean Number( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return false; The Plan 1. Make Number() and DigitList() take value as an argument and return a (boolean,value) pair 2. Compute initial digit in Number() boolean DigitList( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return true; /* epsilon case */ COMP 412, Fall 2018 17

Example 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon Computing the value of an unsigned integer boolean Number( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return false; The Plan 1. Make Number() and DigitList() take value as an argument and return a (boolean,value) pair 2. Compute initial digit in Number() 3. Add second & subsequent digits in DigitList() boolean DigitList( ) { if (word = digit) then { word = NextWord( ); return DigitList( ); else return true; /* epsilon case */ COMP 412, Fall 2018 18

Example 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon Computing the value of an unsigned integer pair (boolean, int) Number( value ) { if (word = digit) then { value = ValueOf( digit ); word = NextWord( ); return DigitList( value ); else return (false, invalid value); pair (boolean, int) DigitList( value ) { if (word = digit) then { value = value * 10 + ValueOf( digit ); word = NextWord( ); return DigitList( value ); else return (true, value); The Plan 1. Make Number() and DigitList() take value as an argument and return a (boolean,value) pair 2. Compute initial digit in Number() 3. Add second & subsequent digits in DigitList() Int ValueOf ( char d ) { return (int) d (int) 0 ; COMP Obvious, 412, invented Fall 2018 syntax for a pair 19

Example Computing the value of an unsigned integer Consider the simple grammar 1 Number digit DigitList 2 DigitList digit DigitList 3 epsilon Ok, so it works with recursive descent. Does it work with an LR(1) parser? COMP 412, Fall 2018 20

Example Computing the value of an unsigned integer Consider the left-recursive grammar 1 Number DigitList 2 DigitList DigitList digit 3 digit ACTION GOTO digit EOF DigitList s 0 s 2 1 s 1 s 3 acc s 2 r 3 r 3 s 3 r 2 r 2 We would like to augment the grammar with actions that compute the value of the number Cannot encode this computation into the context-free syntax Relies on the lexeme of each digit, rather than its part of speech COMP 412, Fall 2018 21

Example 1 Number DigitList 2 DigitList DigitList digit 3 digit Parse the number 976 State Lookahead Stack Handle Action digit (9) $ 0 none 0 digit (9) $ 0 none shift 2 2 digit (7) $ 0 9 2 DL digit reduce 3 1 digit (7) $ 0 DL 1 none shift 3 3 digit (6) $ 0 DL 1 7 3 DL DL digit reduce 2 1 digit (6) $ 0 DL 1 none shift 3 3 EOF $ 0 DL 1 6 3 DL DL digit reduce 2 1 EOF $ 0 DL 1 Number DL accept ACTION GOTO digit EOF DigitList s 0 s 2 1 Notice that it reduces the 9 with DL digit, s 1 s 3 acc and the others with DL DL digit. s 2 r 3 r 3 Rightmost derivation in reverse! s COMP 3 r 2 r 2 412, Fall 2018 22

Example Parse the number 976 State Lookahead Stack Handle Action digit (9) $ 0 none 0 digit (9) $ 0 none shift 2 2 digit (7) $ 0 9 2 DL digit reduce 3 1 digit (7) $ 0 DL 1 none shift 3 3 digit (6) $ 0 DL 1 7 3 DL DL digit reduce 2 1 digit (6) $ 0 DL 1 none shift 3 3 EOF $ 0 DL 1 6 3 DL DL digit reduce 2 1 EOF $ 0 DL 1 Number DL accept Two cases that correspond to the actions in our recursive descent parser The leftmost digit should just set value to ValueOf (digit) Subsequent digits should compute value * 10 + ValueOf (digit) Suggests that we perform the calculations when the parser reduces COMP 412, Fall 2018 23

Example Parse the number 976 State Lookahead Stack Handle Action digit (9) $ 0 none 0 digit (9) $ 0 none shift 2 2 digit (7) $ 0 9 2 DL digit reduce 3 1 digit (7) $ 0 DL 1 none shift 3 3 digit (6) $ 0 DL 1 7 3 DL DL digit reduce 2 1 digit (6) $ 0 DL 1 none shift 3 3 EOF $ 0 DL 1 6 3 DL DL digit reduce 2 1 EOF $ 0 DL 1 Number DL accept Computing on reductions Reduction by production 3 should set value to ValueOf(digit) Reduction by 2 should compute value = value * 10 + ValueOf (digit) COMP 412, Fall 2018 24

Example Performing calculations on the reduce actions State Lookahead Stack Calculation Action digit (9) $ 0 0 digit (9) $ 0 shift 2 2 digit (7) $ 0 9 2 value 9 reduce 3 1 digit (7) $ 0 DL 1 shift 3 3 digit (6) $ 0 DL 1 7 3 value value * 10 + 7 reduce 2 1 digit (6) $ 0 DL 1 shift 3 3 EOF $ 0 DL 1 6 3 value value * 10 + 6 reduce 2 1 EOF $ 0 DL 1 accept This style of computation is called Ad Hoc Syntax-Directed Translation Compiler writer provides code snippets for specific productions Parser actions determine specific actions & the overall sequence COMP 412, Fall 2018 25

SDT in Bison or Yacc We specify SDT actions using a simple notation 1 Number DigitList { $$ = $1; 2 DigitList DigitList digit { $$ = $1 * 10 + $2; 3 digit { $$ = $1; Scanner provides the value of digit. In Bison and Yacc, the compiler writer provides production-specific code snippets that execute when the parser reduces by that production Positional notation for the value associated with a symbol $$ is the LHS; $1 is the first symbol in the RHS; $2 the second, Compiler writer can put arbitrary code in these snippets Solve a travelling salesman problem, compute PI to 100 digits, More importantly, they can compute on the lexemes of grammar symbols and on information derived and stored earlier in translation COMP 412, Fall 2018 26

Ad Hoc Syntax-Directed Translation Why do we call this ad hoc syntax-directed translation? We have a formalism to specify syntax-directed translation schemes Attribute grammars (see EaC2e, 4.3) An attribute is a value associated with an instance of a grammar symbol Associated with a node in the parse tree In formalism, compiler writer creates a specification & tools generate the code that performs the actual computation Attribute-grammar evaluator generator (similar to parser generator) Syntax-directed because the specification is written in terms of the grammar and the corresponding parse tree (or syntax tree) Evaluator takes an unattributed tree and produces an attributed tree Attribute grammar systems have strengths & weaknesses They have never quite become popular If you are interested, Section 4.3 in EaC2e is a primer We will focus on ad hoc SDT schemes. They are widely used in practice. COMP 412, Fall 2018 27 Might STOP

Fitting AHSDT into the LR(1) Skeleton Parser stack.push(invalid); stack.push(s 0 ); // initial state word = scanner.next_word(); loop forever { s = stack.top(); if ( ACTION[s,word] == reduce A b ) then { stack.popnum(2* b ); // pop 2* b symbols s = stack.top(); stack.push(a); // push LHS, A stack.push(goto[s,a]); // push next state else if ( ACTION[s,word] == shift s i ) then { stack.push(word); stack.push(s i ); word scanner.next_word(); else if ( ACTION[s,word] == accept & word == EOF ) then break; else throw a syntax error; report success; Actions are taken on reductions Insert a call to Work() before the call to stack.popnum() Work() contains a case statement that switches on the production number Code in Work() can read items from the stack That is why it calls Work() before stack.popnum() COMP 412, Fall 2018 28

Fitting AHSDT into the LR(1) Skeleton Parser stack.push(invalid); stack.push(s 0 ); // initial state word = scanner.next_word(); loop forever { s = stack.top(); if ( ACTION[s,word] == reduce A b ) then { stack.popnum(2* b ); // pop 2* b symbols s = stack.top(); stack.push(a); // push LHS, A stack.push(goto[s,a]); // push next state else if ( ACTION[s,word] == shift s i ) then { stack.push(word); stack.push(s i ); word scanner.next_word(); else if ( ACTION[s,word] == accept & word == EOF ) then break; else throw a syntax error; report success; Passing values between actions Tie values to instances of grammar symbols Equivalent to parse tree nodes We can pass values on the stack Push / pop 3 rather than 2 Work() takes the stack as input (conceptually) and returns the value for the reduction it processes Shift creates initial values COMP 412, Fall 2018 29

stack.push(invalid); stack.push(s0); // initial state word = scanner.next_word(); loop forever { s = stack.top(); if ( ACTION[s,word] == reduce A b ) then { r = Work(stack, A b ) stack.popnum(3* b ); // pop 3* b symbols s = stack.top(); // save exposed state stack.push(a); // push A stack.push (r); // push result of WORK() stack.push(goto[s,a]); // push next state else if ( ACTION[s,word] == shift si ) then { stack.push(word); stack.push(initial Value); stack.push(si); word scanner.next_word(); else if ( ACTION[s,word] == accept & word == EOF ) then break; else throw a syntax error; report success; COMP 412, Fall 2018 Fitting AHSDT into the LR(1) Skeleton Parser Modifications are minor - Insert call to Work() - Change the push() & pop() behavior Same asymptotic behavior as the original algorithm. - 50% more stack space Last obstacle is making it easy to write the code for Work() Note that, in C, the stack has some odd union type. 30

Translating Code Snippets Into Work() For each production, the compiler writer can provide a code snippet { value = value * 10 + digit; We need a scheme to name stack locations. Yacc introduced a simple one that has been widely adopted. $$ refers to the result, which will be pushed on the stack $1 is the first item on the productions right hand side $2 is the second item $3 is the third item, and so on The digits example above becomes { $$ = $1 * 10 + $2; COMP 412, Fall 2018 31

Translating Code Snippets Into Work() How do we implement Work()? Work() takes 2 arguments: the stack and a production number Work() contains a case statement that switches on production number Each case contains the code snippet for a reduction by that production The $1, $2, $3 macros translate into references into the stack The $$ macro translates into the return value if ( ACTION[s,word] == reduce A b ) then { r = Work(stack, A b ) stack.popnum(3* b ); // pop 3* b symbols s = stack.top(); // save exposed state stack.push(a); // push A stack.push (r); // push result of WORK() stack.push(goto[s,a]); // push next state $$ translates to r $i translates to the stack location 3 *( b - i + 1) units down from stacktop Note that b, i, 3, and 1 are all constants so $i can be evaluated to a compiletime constant COMP 412, Fall 2018 32