Yacc Meets C+ + Stephen C. Johnson Ardent Computer Corporation

Similar documents
Single-pass Static Semantic Check for Efficient Translation in YAPL

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Compiler Design (40-414)

A programming language requires two major definitions A simple one pass compiler

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

CST-402(T): Language Processors

Error Recovery during Top-Down Parsing: Acceptable-sets derived from continuation

Stating the obvious, people and computers do not speak the same language.

A Simple Syntax-Directed Translator

A Short Summary of Javali

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Time : 1 Hour Max Marks : 30

Using an LALR(1) Parser Generator

1 Lexical Considerations

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou


COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

COMPILER DESIGN LECTURE NOTES

2.2 Syntax Definition

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Compiler Design Overview. Compiler Design 1

Compilers Project 3: Semantic Analyzer

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

Lexical Considerations


COP5621 Exam 3 - Spring 2005

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

Question Bank. 10CS63:Compiler Design

CS 415 Midterm Exam Spring SOLUTION

NAME CHSM-C++ Concurrent, Hierarchical, Finite State Machine specification language for C++

Experiences in Building a Compiler for an Object-Oriented Language

CS 6353 Compiler Construction Project Assignments

Programming Languages Third Edition

UNIT -2 LEXICAL ANALYSIS

Compilers. Compiler Construction Tutorial The Front-end

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

The Structure of a Syntax-Directed Compiler

Syntax and Grammars 1 / 21

1. Describe History of C++? 2. What is Dev. C++? 3. Why Use Dev. C++ instead of C++ DOS IDE?

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Lexical Considerations

Working of the Compilers

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Language Reference Manual simplicity

10/18/18. Outline. Semantic Analysis. Two types of semantic rules. Syntax vs. Semantics. Static Semantics. Static Semantics.

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

Compiler principles, PS1

COMPILER DESIGN. For COMPUTER SCIENCE

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

CA Compiler Construction

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

Compiler Construction

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Formal Languages and Compilers Lecture I: Introduction to Compilers

Yacc: A Syntactic Analysers Generator

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

SYMBOL TABLE CHAPTER CHAPTER HIGHLIGHTS. 9.1 Operation on Symbol Table. 9.2 Symbol Table Implementation. 9.3 Data Structure for Symbol Table

VIVA QUESTIONS WITH ANSWERS

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Experiences Report on the Implementation of EPTs for GNAT

Syntax Intro and Overview. Syntax

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Grammars and Parsing. Paul Klint. Grammars and Parsing

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Building Compilers with Phoenix

Compilers Crash Course

Syntax. A. Bellaachia Page: 1

Evaluation of Semantic Actions in Predictive Non- Recursive Parsing

On the correctness of template metaprograms

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Outline. Lecture 17: Putting it all together. Example (input program) How to make the computer understand? Example (Output assembly code) Fall 2002

Compilers and Code Optimization EDOARDO FUSELLA

INF5110 Compiler Construction

Introduction to Compiler Design

Typescript on LLVM Language Reference Manual

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

Type Inference Systems. Type Judgments. Deriving a Type Judgment. Deriving a Judgment. Hypothetical Type Judgments CS412/CS413

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Implementing a VSOP Compiler. March 12, 2018

An Object Model for Multiparadigm

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

ER E P M S S I TRANSLATION OF CONDITIONAL COMPIL DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A

COP4020 Programming Assignment 2 Spring 2011

CSCI Compiler Design

CPS 506 Comparative Programming Languages. Syntax Specification

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Syntax-Directed Translation. CS Compiler Design. SDD and SDT scheme. Example: SDD vs SDT scheme infix to postfix trans

Industrial Strength Compiler Construction with Equations

Transcription:

Yacc Meets C+ + Stephen C. Johnson Ardent Computer Corporation ABSTRACT: The fundamçntal notion of attribute grammars lknuth 1978] is that values are associated with the components of a grammar rule; these values may be computed by synthesízing the values of the left component from those of the right components, or inheritíng the values of the right components from those of the left component. The Yacc parser generator, in use for over 15 years, allows attributes to be synthesized; in fact, arbitrary segments of code can be executed as parsing takes place. For the last decade, Yacc has supported arbiftary data types as synthesized values and performed type-checking on these synthesized values. It is natural to think of this synthesis as associating a value of a particular type to a grammar symbol when a grammar rule deriving that symbol is recognized. Languages such as C++ support abstract data types that permit functions as well as values to be associated with objects of a given type. In this framework, it appears natural to extend the idea of computing a value at a grammar rule to that of defining a function at a rule. The definition of the Much of this work was done when the author was employed by AT&T Information Systems. A previous version of this paper was presented at the April 1988 EUUG meeting (London). @ Computing Systems, Vol. I. No.2. Spring 1988 159

function for a given object of a given type depends on the rule used to construct that object. In fact, this notion can be used to generalize both inherited and synthesized attributes; unifying them and allowing even more expressive power. This paper explores these notions, and shows how this rule-based definition of functions allows for easier deflnitions and much more flexibility in some cases. Several examples are given that are hard to express using traditional techniques, but are naturally expressed using this framework. I. Introduction ri/hen we write a grammar rule such as expr : expr t+ t term ; we frequently associate values with the components of the rule and use these values to compute the values of other components. For instance, in the above example we might wish to associate integer values with the symbols expr and term, and include a rule for generating the value of the left-hand expr from the value of the right-hand expr and term. Yalues associated with the left side of a rule that are computed from the values on the right are called synthesized attributes. In other cases, we wish to use values associated with the left side of the rule to compute values of components on the right side of the rule. These values are known as inherited attributes. One importairt example of inherited attributes involves passing symbol table information into expressions and statements. Allowing inherited and synthesized attributes to be specified without restriction makes it difficult to generate parsers automatically from the rule descriptions. For example, it is very 160 Stephen C. Johnson

difficult even to decide whether the defrnitions are potentially circular ljazayeri et al. 19751. Nevertheless, attribute grammars are very expressive, making them attractive both as a basis for editors and compilers [Reps 1984] and as a platform for more extensive computational and database systems [Horowitz & Teitelbaum I e8s l. In a parallel development, languages such as AdarM [United 19831, Modula-2 lwirth 1983], and C++ [Stroustrup 1986] have explored ways of extending traditional program language type structures in much the same fashion. In Ada and C++, the type concept is extended to include not just the values and the data but also the actions that may be performed on these values. The key idea is that an abstract model of the dafatype, and the operations provided on it, is presented to the programmer; the details of the implementation are hidden. Yacc is a parser generator that has been available under UNIX since the early 1970s. The original versions of Yacc permitted only one attribute, of integer type, for each grammar symbol. Later versions allowed other types, including structures, to be computed. However, because the parsing method is bottom up (LALR(I)), and the actions are executed as the rules are recognized, only synthesized attributes can be handled directly. More complex translations must be done by building a parse tree, and then walking this tree doing the desired actions. In Yacc, a unique datatype may be associated with each nonterminal symbol. In this case, every rule deriving that nonterminal must return a value of the defined type. Each token may also have a defined type; in this case, the values are computed by the lexical analyzer. An action computes the value associated with the left side of a rule; this action depends on the particular rule and the values of the components on the right of the rule. In trying to extend Yacc to handle C++, these two streams naturally came together in a prototype tool called y++. Since there was already an association of types with nonterminals, it became natural to ask whether functions could be defrned on these types as well, and what meaning this might have. Some examples proved compelling. Yacc Meets C++ 16l

Given a rule 1.1 Examples expr : expr r+r expr ; we might choose to define a function print), on the nonterminal/type expr. ln the context of this particular rule, print) might be defined as print() { $1.print(); putchar('+r) ; $3.print(); } (As with Yacc, we will use $1, $2, etc. to refer to the components of the right side of the rule, and $$ to refer to the left side). We might also choose to defrne another function, type, oî expr, and this might have a totally different definition: type() { return( exprtypel'+', $1.type(), $3.type()));) Since tokens are only created by the lexical analyzer (and never by rules), functions such as print and expr can be called implicitly as part of a particular rule, and need no special definition mechanism. By allowing these rule-defined functions to have arguments and return values, we get many of the effects of inherited attributes: type(type t) t $1.type(t) ; $3.type(t) ; ) In C+ +, a function that is defined as a member of a class can obtain access to the values in an instance ofthat class; the keyword this allows such values to be explicitly manipulated. In y++, when a function/is defrned on a nonterminal/type X, not only the value, but also the function defrnition itself depends on the rule used to define the instance of x. A nice example is given by the rule: expr : expr r+r expr ; on which we define two functions, polísh and revpolish: pol.ish() { putchar(r+') ; $1.poLish() i $3.pot ish(), l 162 Sæphen C. Johnson

revpotish() t $1.revpol.ish() i $3.revpot.ish() ì putchar(r+1) ; ) These two functions can be similarly defrned for other expressions. Two input line types can be defined as well: I ine : ilprer expr I i ne : ttpostil expr and a function, print), that is defrned as print() t $2.pol.ish() ; ) on the first rule, and print() t $2.revpoIish() ; ] on the second. Then, after a line hasbeen recognized, print will print the expression in either prefrx or postfrx Polish form, depending on the initial keyword of the line. The attribute grammar approach to this would require generating and storing both the prefix and postflx translations, and many intermediate translations as well. The above technique saves both space and time. To summarize this section: we associate datatypes (C++ classes) with nonterminal symbols and tokens. In addition to values, these types have functions associated with them whose defrnition depends on the rule used to recognize a particular instance of the type. This mechanism generalizes both inherited and synthesized attributes; later sections discuss implementation and other applications. 2. Implementation We have prototyped a tool, y++, to explore the semantic and syntactic implications of these ideas. Some features of y++ are: 1. Every grammar symbol, token and nonterminal alike, is associated with a C++ class. 2. Every class has 0 or more values, and 0 or more functions, deflned on it. Yacc Meets C++ 163

3. Every grammar rule may have associated with it 0 or more functions that may be invoked to access and change the values accessible to that rule. These functions may access and change values of the result (left side) of the rule, and the components (right side) of the rule, and invoke other functions defined on the components of the rule. In practice, y** specifications are transformed to C++ programs and compiled. yyparse returns a value that is, in effect, a pointer to the parse tree. After calling yyparse (which causes some input to be read and parsed), the returned value is used to access the functions that are defined on the start symbol; presumably, these cause transformations and output to be done. When yyparse is called, it creates a data structure that represents the parse tree. For tokens, it creates space large enough to hold the values, if any, included in the token. For nonterminals, the space created depends on the rule used to create the particular instance of the nonterminal. The rule' A:BCD; would cause space to be allocated as follows: integer ruie number space for the A vatues pointer to the B vatue pointer to the C value pointer to the D vatue In the case of simple actions, not depending on the rule, we simply index past the value to obtain the components. In the case where the actions depend on the rule number, we generate a conditional based on the stored rule number. In the case where B, C, or D is a token, the value returned from the lexical analyzer is saved instead of a pointer. There are a number of scope issues not yet resolved. If there are two calls to yyparse, for example, does the second parse tree overwrite the first, or do both remain active? The issue of default actions and values is also tricky. There is little point in wasting space on characters and literal keywords returned by the lexical analyzt when these are implicitly known from the rule number; the question is how to recognize this and exploit it. 164 Stephen C. Johnson

A similar issue is the treatment of default functions. If a function is called for a rule that contains no defrnition for that function, should a default definition be assumed? We currently consider this a semantic error and produce a message, semantic error, by analogy with the syntax error message of Yacc. Another issue is error handling. There is a premium in being able to return a sensible structure for any input, even those in error, to allow the user to craft special functions that give particularly good error messages. The exact mechanism by which these error rules might be constructed is still open. Finally, given a data structure representing a parse tree, it is very nice to be able to rewrite it; y+ + should probably provide such operations through a user interface. 2.1 A Simple Example This section sketches how y+ + can be used to make a preprocessor that translates extensions into a base language. We begin with a function ident, defined on every nonterminal symbol of the base language grammar; this function, when called, produces a literal text representation of the rule. For example, for the rule: stat : expr r; I we might define ident as ident() t $2.ident() ; putchar(';') ; ) This grammar can quickly be extended to a preprocessor by simply adding rules for the new constructions, and defining ident on the new rules to translate into the base definition. For example, suppose we wish to augment C with the forever statement. After recognizing the keyword in the lexical analyzer, we add the rule: stat : Itforevertt stat ; and the associated defrnition of ident: ident() { printf("whi Ie(1)") ; $2. ident() ; ] Clearly, translators that required symbol tables, etc., would be Yacc Meets C++ 165

harder, but one could envision a standard C grammar and lexical analyzer being far more reusable in y+ + than in Yacc. 2.2 Impressive Example Giegerich and Wilhelm have discussed the difficulty of generating "short-circuit" evaluation of Boolean expressions using the usual forms of syntax-directed translations (see also Aho et al. [1985]). This becomes relatively straightforward in y++. A function, bool-gen( t, f n /, is defined on the rules involving the shortcircuited operators. / is the label to go to if the expression is true, /the label for false, and n has the "preferred" label, either t or f. The rule expr : expr 0R expr for example would defrne bool-gen as boot-gen( t, f, n ) { int x = get_nebr_label.(); $1.boot-gen( t, x, x ); define-[abel.( x ); $3.bool.-gen( t, f, n ); ) and similarly for AND and NOT. The definition for those expressions not involving short-circuit operations would look like: boo[-gen( t, f, n ) {. /* get the vatue of the expression */ if( t == n ) t.. l* branch if fal.se to [abe[ 'Ít */ ] eise t.. /* branch if true to [abe[ 'tt */ ] ) 3. Conclusion This paper describes a simple extension of parser generators to handle abstract data types; in this way, some translations can be specifred easily that would be more difficult to describe with conventional attribute grammars. 166 Srcphen C. Johnson

Moreover, the notions seem to generalize attribute grammars, while at the same time allowing the ideas of Yacc to be brought to bear on the concepts in C++, or perhaps vice versa. References A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, New York, 1985. Bjarne Stroustrup, The C++ Programming Language, Addison-Wesley, Reading, MA, 1986. R. Giegerich and R. Wilhelm, Counter-one-pass features in one-pass compilation: a formalization using attribute grammars, Information Processing Letters 7(6) pages 279-284. S. Horowitz and T. Teitelbaum, Relations and Attributes: A symbiotic basis for editing environments, Proceedings of SrcrteN 85 Symposium on Language Issues in Programming Environments, Seattle, Vy'A, pages 93-106, June 1985. M. Jazayeri, W. F. Ogden and W. C. Rounds, The intrinsic exponential complexity of the circularity problem for attribute grammars, Communications of the ACM 1S(12) pages 697-706, t975. D. E. Knuth, Semantics of context-free languages, Math. Syst. Theory, 5(l) pages 95-96,March 1971. T. Reps, Generating Language-Based EnvironmenÍs, MIT Press, Cambridge, MA, 1984. United States Department of Defense, Reference Manual for the Ada Progr amming Language, ANSI/MIL-STD- I 8 I 5A- I 983, Feb. I 983, Springer-Verlag, New York. N. Wirth, Programming in MODULA-2, Springer-Verlag, New York, 1983. lsubmitted April 16, 1988; accepted May 23, I98Sl Yøcc Meets C++ 167