A Simple Syntax-Directed Translator

Similar documents
2.2 Syntax Definition

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

A programming language requires two major definitions A simple one pass compiler

A simple syntax-directed

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Principles of Programming Languages COMP251: Syntax and Grammars

This book is licensed under a Creative Commons Attribution 3.0 License

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Part 5 Program Analysis Principles and Techniques

More Assigned Reading and Exercises on Syntax (for Exam 2)

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Syntax Analysis Check syntax and construct abstract syntax tree

Building Compilers with Phoenix

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

B The SLLGEN Parsing System

Intermediate Code Generation

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

CSE 3302 Programming Languages Lecture 2: Syntax

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330: Organization of Programming Languages

CPS 506 Comparative Programming Languages. Syntax Specification

1 Lexical Considerations

CSE 401 Midterm Exam 11/5/10

VIVA QUESTIONS WITH ANSWERS

Introduction to Parsing. Lecture 8

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

EECS 6083 Intro to Parsing Context Free Grammars

CS 314 Principles of Programming Languages

Compiler Design Concepts. Syntax Analysis

COP 3402 Systems Software Syntax Analysis (Parser)

CSE302: Compiler Design

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Group A Assignment 3(2)

Context-Free Grammars

Compilers. Compiler Construction Tutorial The Front-end

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Midterm Solutions COMS W4115 Programming Languages and Translators Wednesday, March 25, :10-5:25pm, 309 Havemeyer

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

CMSC 330: Organization of Programming Languages

SECOND PUBLIC EXAMINATION. Compilers

1. Lexical Analysis Phase

CMSC 330: Organization of Programming Languages

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Lexical Considerations

Compilers Course Lecture 4: Context Free Grammars

Syntax-Directed Translation

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }


Stating the obvious, people and computers do not speak the same language.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CS 403: Scanning and Parsing

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

COMPILERS BASIC COMPILER FUNCTIONS

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

LECTURE 3. Compiler Phases

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

COMPILER DESIGN. For COMPUTER SCIENCE

Chapter 2, Part III Arithmetic Operators and Decision Making

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Principles of Programming Languages COMP251: Syntax and Grammars

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

CMSC 330: Organization of Programming Languages. Context Free Grammars

The SPL Programming Language Reference Manual

UNIT IV INTERMEDIATE CODE GENERATION

Lexical Considerations

The PCAT Programming Language Reference Manual

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Introduction to Parsing. Lecture 5

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Question Bank. 10CS63:Compiler Design

Dr. D.M. Akbar Hussain

Lexical and Syntax Analysis. Top-Down Parsing

NARESHKUMAR.R, AP\CSE, MAHALAKSHMI ENGINEERING COLLEGE, TRICHY Page 1

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

PRINCIPLES OF COMPILER DESIGN

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Optimizing Finite Automata

Lexical and Syntax Analysis

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Formal Languages and Compilers Lecture I: Introduction to Compilers

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Syntax. A. Bellaachia Page: 1

Abstract Syntax Trees & Top-Down Parsing

Chapter 3. Describing Syntax and Semantics

Syntax-Directed Translation

Introduction to Parsing Ambiguity and Syntax Errors

LANGUAGE PROCESSORS. Introduction to Language processor:

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Topic 3: Syntax Analysis I

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Transcription:

Chapter 2 A Simple Syntax-Directed Translator 1-1 Introduction The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called intermediate code. The synthesis phase translates the intermediate code into the target program. Analysis is organized around the "syntax" of the language to be compiled. The syntax of a programming language describes the proper form of its programs The semantics of the language defines what its programs mean; that is, what each program does when it executes. 1-2

Definitions A lexeme is a group of characters formed by the lexical analyzer and having a meaningful sequence. For each lexeme, the lexical analyzer produces as output a token. A token describes a pattern of characters having the same meaning in the source program. (such as identifiers, operators, keywords, numbers, delimiters, and so on) No blank, tab, newline, or comments in grammar 1-3 Constants When a sequence of digits appears in the input stream, the lexical analyzer passes to the parser a token consisting of the terminal num along with an integer-valued attribute computed from the digits. 31+28+60 num, 31 + num, 28 + num, 60 1-4

Keywords and Identifiers Keywords: Fixed character strings used as punctuation marks or to identify constructs Identifiers : A character string forms an identifier only if it is not a keyword. 1-5 Syntax Definition The "context-free grammar (CFG)" or "grammar" for short - is used to specify the syntax of a language. A grammar naturally describes the hierarchical structure of most programming language constructs. Example An if-else statement in Java can have the form: if ( expression ) statement else statement That is, an if-else statement is the concatenation of the keyword if, an opening parenthesis, an expression, a closing parenthesis, a statement, the keyword else, and another statement. 1-6

Syntax Definition ( Cont.) Using the variable expr to denote an expression and the variable stmt to denote a statement, this structuring rule can be expressed as stmt if (expression) stmt else stmt The arrow may be read as "can have the form" Such a rule is called a production. In a production, lexical elements like the keyword if and the parentheses are called terminals. Variables like expr and stmt represent sequences of terminals and are called nonterminals. 1-7 Definition of context-free grammar (CFG) A context-free grammar has four components: 1. {terminal symbols (tokens)} The terminals are the elementary symbols of the language defined by the grammar. 2. {non-terminals (syntactic variables)} Each nonterminal represents a set of strings of terminals 3. {productions}, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production. 4. start symbol : A designation of one of the non-terminals 1-8

A Grammar Example List of digits separated by plus or minus signs Accepts strings such as 9-5+2, 3-1, or 7. 0, 1,, 9, +, - are the terminal symbols list and digit are non-terminals Every line is a production list is the start symbol Grouping: 1-9 A Grammar Example (Cont.) The terminals of the grammar are the symbols The non-terminals are the italicized names list and digit, with list being the start symbol because its productions are given first 1-10

Derivation A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the body of a production for that nonterminal. the language: The terminal strings that can be derived from the start symbol defined by the grammar. 1-11 Parsing A Syntax/Semantic Analyzer (Parser) creates the syntactic structure (generally a parse tree) of the given program. Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar. 1-12

Parse Trees A parse tree shows how the start symbol of a grammar derives a string in the language. Example: If nonterminal A has a production A XYZ, then a parse tree may have: o an interior node labeled A o three children labeled X, Y, and Z, o from left to right: 1-13 Parse Trees Properties 1. The root is labeled by the start symbol. 2. Each leaf is labeled by a terminal or by ε 3. Each interior node is labeled by a nonterminal. 4. If A is the non-terminal labeling some interior node and X 1, X 2, X 3 X n are the labels of the children of that node from left to right, then there must be a production A X 1 X 2 X 3 X n Where X i stands for a symbol that is either a terminal or a nonterminal. Special case, if A ε is a production, then a node labeled A may have a single child labeled ε. 1-14

Example Consider the following Grammar: According to this Grammar, the derivation of 9-5+2 is illustrated by the following tree. Observation Each node in the tree is labeled by a grammar symbol. An interior node and its children correspond to a production; The interior node corresponds to the head of the production, The children corresponds to the body 1-15 Ambiguity Suppose we used a single nonterminal string and did not distinguish between digits and lists. Merging the notion of digit and list into the nonterminal string makes superficial sense, because a single digit is a special case of a list The first grammar does not permit this interpretation. 1-16

Eliminating Ambiguity Operator Associativity: in most programming languages arithmetic operators have left associativity. Example: 9+5-2 = (9+5)-2; 9-5-2 = (9-5)-2 Exception: Assignment operator = has right associativity: a=b=c is equivalent to a=(b=c) Operator Precedence: if an operator has higher precedence, then it will bind to it s operands first. Example: * has higher precedence than +, therefore 9+5*2 = 9+(5*2); 9*5+2 = (9*5)+2; 1-17 Associativity of Operators Strings like a=b=c with a right-associative operator are generated by the following grammar: 1-18

Syntax of Expressions A grammar for arithmetic expressions can be constructed from a table showing the associativity and precedence of operators. Left associative : + - lower precedence Left associative : * / higher precedence We create two nonterminals <exp> and <term> for 2 levels of precedence and an extra nonterminal <factor> for generating the basic units of expressions (digits & parenthesized exp.) 1-19 Syntax of Expressions (Cont.) Now consider the binary operators, * and /, that have the highest precedence. Since these operators associate to the left, the productions are similar to those for lists that associate to the left. Similarly, expr generates lists of terms separated by the additive operators. The resulting grammar is therefore 1-20

Syntax of Expressions (Cont.) With this grammar: an expression is a list of terms separated by either + or - signs, a term is a list of factors separated by * or / signs. a factor is a digit or parenthesized expression. Notice that any parenthesized expression is a factor, so with parentheses we can develop expressions that have arbitrarily deep nesting (and arbitrarily deep trees). 1-21 Syntax-Directed Translation Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar. For example, consider an expression expr generated by the production: Here, expr is the sum of the two sub expressions expr 1 and term. We can translate expr by exploiting its structure, as in the following pseudo-code: 1-22

concepts related to syntax-directed translation Attributes. An attribute is any quantity associated with a programming construct. Examples of attributes: data types of expressions, number of instructions in the generated code, (Syntax- directed) translation schemes. A translation scheme is a notation for attaching program fragments to the productions of a grammar. The program fragments are executed when the production is used during syntax analysis. The combined result of all these fragment executions, in the order induced by the syntax analysis, produces the translation of the program to which this analysis/synthesis process is applied. Syntax-directed translations will be used to translate infix expressions into postfix notation, to evaluate expressions, and 1-23 to build syntax trees for programming constructs Postfix Notation We deal with translation into postfix notation. The postfix notation for an expression E can be defined inductively as follows: 1. If E is a variable or constant, then the postfix notation for E is E itself. 2. If E is an expression of the form E 1 op E 2, where op is any binary operator, then the postfix notation for E is E 1 E 2 op, where E 1, E 2 are the postfix notations for E 1 and E 2 respectively. 3. If E is a parenthesized expression of the form (E 1 ), then the postfix notation for E is the same as the postfix notation for E 1. 1-24

Postfix Notation (Example 1) The postfix notation for (9-5)+2 is 95-2+. o The translations of 9,5,2 are 9,5,2 (rule (1)) o The translation of 9-5 is 95- (rule (2)) o The translation of (9-5) is (9-5) (rule (3)) Having translated the parenthesized subexpression, we may apply rule (2) to the entire expression, with (9-5) in the role of El and 2 in the role of E2, to get the result 95-2+. So the postfix notation for (9-5)+2 is 95-2+ 1-25 Postfix Notation (Example 2) The postfix notation for 9-(5+2)? o The translations of 9,5,2 are 9,5,2 (rule (1)) o The translation of 5+2 is 52+ (rule (2)) So the postfix notation for 9-(5+2) is 952+- Observation No parentheses are needed in postfix notation, because the position and arity (number of arguments) of the operators permits only one decoding of a postfix expression. 1-26

Postfix Notation (Example 3) Calculate the postfix expression 952+-3* Answer Scanning from the left, we first encounter the + sign. Looking to its left we find operands 5 and 2. 5 + 2= 7, replaces 52+, and we have the string 97-3*. Now, the leftmost operator is the -, and its operands are 9 and 7: 9-7=2 Replacing these by the result of the subtraction leaves 23*. The multiplication sign applies to 2 and 3, giving the result 6. 952+-3* = 6 1-27 Synthesized Attributes The idea of associating quantities with programming constructs-for example, values and types with expressions-can be expressed in terms of grammars. We associate attributes with nonterminals and terminals. Then, we attach rules to the productions of the grammar; These rules describe how the attributes are computed at those nodes of the parse tree where the production in question is used to relate a node to its children. 1-28

Attributes A syntax-directed definition o associates attributes with non-terminals and terminals in a grammar o attaches semantic rules to the productions of the grammar. An attribute is said to be synthesized if its value at a parse-tree node is determined from attribute values of its children and itself. Synthesized attributes have the desirable property that they can be evaluated during a single bottom-up traversal of a parse tree 1-29 Annotated Parse Tree Suppose a node N in a parse tree is labeled by the grammar symbol X. We write X.a to denote the value of attribute a of X at that node. A parse tree showing the attribute values at each node is called an annotated parse tree. 1-30

Semantic Rules for Infix to Postfix Fig. 1: annotated parse tree Infix Translation Postfix Translation 1-31