Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Similar documents
Dr. D.M. Akbar Hussain

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

3. Context-free grammars & parsing

Syntax. A. Bellaachia Page: 1

A simple syntax-directed

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

COP 3402 Systems Software Syntax Analysis (Parser)

Theory and Compiling COMP360

A Simple Syntax-Directed Translator

2.2 Syntax Definition

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Syntax. In Text: Chapter 3

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Part 5 Program Analysis Principles and Techniques

CPS 506 Comparative Programming Languages. Syntax Specification

CMSC 330: Organization of Programming Languages

CSE 3302 Programming Languages Lecture 2: Syntax

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

CMPT 755 Compilers. Anoop Sarkar.

CMSC 330: Organization of Programming Languages

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

EECS 6083 Intro to Parsing Context Free Grammars

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Formal Languages. Formal Languages

UNIT I Programming Language Syntax and semantics. Kainjan Sanghavi

CMSC 330: Organization of Programming Languages. Context Free Grammars

Lecture 4: Syntax Specification

Programming Language Definition. Regular Expressions

Habanero Extreme Scale Software Research Project

Lecture 10 Parsing 10.1

Principles of Programming Languages COMP251: Syntax and Grammars

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Languages and Compilers

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Principles of Programming Languages COMP251: Syntax and Grammars

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Defining syntax using CFGs

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics

Introduction to Parsing

Chapter 3. Describing Syntax and Semantics ISBN

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CS 314 Principles of Programming Languages. Lecture 3

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Lexical Scanning COMP360

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

This book is licensed under a Creative Commons Attribution 3.0 License

Context-Free Grammars

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Chapter 3. Syntax - the form or structure of the expressions, statements, and program units

CSCE 314 Programming Languages

CS 314 Principles of Programming Languages

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

ISO/IEC INTERNATIONAL STANDARD. Information technology - Syntactic metalanguage - Extended BNF

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

B The SLLGEN Parsing System

CSE302: Compiler Design

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Introduction to Lexing and Parsing

Programming Languages Third Edition

Lexical Analysis (ASU Ch 3, Fig 3.1)

Compiler Design Concepts. Syntax Analysis

Parsing Combinators: Introduction & Tutorial

Chapter 3. Describing Syntax and Semantics ISBN

LECTURE 3. Compiler Phases

Chapter 3. Describing Syntax and Semantics

Parsing II Top-down parsing. Comp 412

1. Lexical Analysis Phase

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Context-Free Grammars

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Chapter 2 - Programming Language Syntax. September 20, 2017

Part 3. Syntax analysis. Syntax analysis 96

Syntax and Semantics

Programming Languages

Chapter 3: Syntax and Semantics. Syntax and Semantics. Syntax Definitions. Matt Evett Dept. Computer Science Eastern Michigan University 1999

COMPILERS BASIC COMPILER FUNCTIONS

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Transcription:

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1 1. Introduction Parsing is the task of Syntax Analysis Determining the syntax, or structure, of a program. The syntax is defined by the grammar rules of a Context-Free Grammar (CFG) The rules of a context-free grammar are recursive The basic data structure of Syntax Analysis is parse tree or syntax tree The syntactic structure of a language must also be recursive 3.1 The Parsing Process Function of a Parser Takes the sequence of tokens produced by the scanner as its input and produces the syntax tree as its output. Parser Sequence of tokens --------> Syntax-Tree 3.2 Context -Free Grammars (CFG) Basic Concept A context-free grammar is a specification for the syntactic structure of a programming language Similar to the specification of the lexical structure of a language using regular expressions Except involving recursive rules For example: simple integer arithmetic expressions, can be given by the following grammar: exp exp op exp (exp) number op + * 3.2.1 Comparison to Regular Expression Notation 1.Comparing an Example: Consider the simple integer arithmetic expressions The context-free grammar: exp exp op exp (exp) number op + * The regular expression: number = digit digit* digit = 0 1 2 3 4 5 6 7 8 9 2.Basic Regular Expression Rules Three operations: Choice, concatenation, repetition Equal sign represents the definition of a name for a regular expression; Name was written in italics to distinguish it from a sequence of actual characters. 3. Grammars Rules Vertical bar appears as meta-symbol for choice. Concatenation is used as a standard operation. No meta-symbol for repetition (like the * of regular expressions) Use the arrow symbol instead of equality to express the definitions of names Names are written in italic( in a different font) Grammar rules use regular expressions as components The notation was developed by John Backus and adapted by Peter Naur for the Algol60 report Grammar rules in this form are usually said to be in Backus-Naur form, or BNF 1

3.2.2 Specification of Context-Free Grammar Rules 1. Symbols of Grammar Rules Grammar rules are defined over an alphabet, or set of symbols. The symbols are tokens representing strings of characters Using the regular expressions to represent the tokens A token is a fixed symbol, as in the reserved word while or the special symbols such as + or :=, write the string itself in the font used in Chapter 2 Tokens such as identifiers and numbers, representing more than one string, use font in italics, just as though the token is a name for a regular expression. 2. Construction of a CFG rule Given an alphabet, a context-free grammar rule in BNF consists of a string of symbols. The first symbol is a name for a structure. The second symbol is the meta-symbol" ". This symbol is followed by a string of symbols, each of which is either a symbol from the alphabet, a name for a structure, or the metasymbol " ". A grammar rule in BNF is interpreted as follows The rule defines the structure whose name is to the left of the arrow The structure is defined to consist of one of the choices on the right-hand side separated by the vertical bars The sequences of symbols and structure names within each choice defines the layout of the structure For example: exp exp op exp (exp) number op + * The first rule defines an expression structure (with name exp) to consist of an expression followed by an operator and another expression, or an expression within parentheses, or a number. The second rule defines an operator (with name op) to consist of one of the symbols + - *. Our simple example can be written as follows: exp exp op exp exp (exp) exp number op + op op * However we usually write grammar rules so that all choices for each structure are listed in a single rule. 3.2.3 Derivations & Language Defined by a Grammar 1. How Grammar Rules Determine a Language Context-free grammar rules determine the set of syntactically legal strings of token symbols for the structures defined by the rules. For example, the arithmetic expression (34-3)*42 Corresponds to the legal string of seven tokens (number - number ) * number While (34-3*42 is not a legal expression, There is a left parenthesis that is not matched by a right parenthesis and the second choice in the grammar rule for an exp requires that parentheses be generated in pairs. 2

2. Derivations Grammar rules determine the legal strings of token symbols by means of derivations A derivation is a sequence of replacements of structure names by choices on the righthand sides of grammar rules A derivation begins with a single structure name and ends with a string of token symbols At each step in a derivation, a single replacement is made using one choice from a grammar rule The example exp exp op exp (exp) number op + * A derivation for the expression (34-3)*42 Derivation Steps At each step the grammar rule choice used for the replacement is given on the right (1) exp => exp op exp [exp exp op exp] (2) => exp op number [exp number] (3) => exp * number [op * ] (4) => ( exp ) * number [exp ( exp ) ] (5) =>{ exp op exp ) * number [exp exp op exp] (6) => (exp op number) * number [exp number] (7) => (exp - number) * number [op - ] (8) => (number - number) * number [exp number] Derivation steps use a different arrow from the arrow meta-symbol in the grammar rules. 3. Language Defined by a Grammar The set of all strings of token symbols obtained by derivations from the exp symbol is the language defined by the grammar of expressions L(G) = { s exp =>* s } G represents the expression grammar s represents an arbitrary string of token symbols (sometimes called a sentence) The symbols =>* stand for a derivation consisting of a sequence of replacements as described earlier. (The asterisk is used to indicate a sequence of steps, much as it indicates repetition in regular expressions.) Grammar rules are sometimes called productions Because they "produce" the strings in L(G) via derivations Grammar for a Programming Language The grammar for a programming language often defines a structure called program The language of this structure is the set of all syntactically legal programs of the programming language. For example: a BNF for Pascal program program-heading; program-block. program-heading. program-block.. The first rule says that a program consists of a program heading, followed by a semicolon, followed by a program block, followed by a period. 3

4. Symbols in rules Start symbol The most general structure is listed first in the grammar rules. Non-terminals Structure names are also called non-terminals, since they always must be replaced further on in a derivation. Terminals Symbols in the alphabet are called terminals, since they terminate a derivation. Terminals are usually tokens in compiler applications. 5. Examples Example 3.1: The grammar G with the single grammar rule E (E) a This grammar generates the language L(G) = { a,(a),((a)),(((a))), } = { ( n a) n n an integer >= 0 } Derivation for ((a)) E => (E) => ((E)) => ((a)) Example 3.2: The grammar G with the single grammar rule E (E) This grammar generates no strings at all, there is no way we can derive a string consisting only of terminals. Example 3.3: Consider the grammar G with the single grammar rule E E + a a This grammar generates all strings consisting of a's separated by +'s: L(G) ={a, a + a,a + a + a, a + a + a + a,...} Derivation: Example E => E+a => E+a+a => E+a+a+a => finally replace the E on the left using the base E a Example 3.4 Examples of strings in Consider the following this lan guage are extremely simplified grammar of statements: Statement if-stmt other if-stmt if ( exp ) statement if ( exp ) statement else statement exp 0 1 other if (0) other if (1) other if (0) other else other if (1) other else other if (0) if (0) other if (0) if (1) other else other if (1) other else if (0) other else other 5. Recursion The grammar rule: A A a a or A a A a Generates the language {an n an integer >=1 } (the set of all strings of one or more a's) The same language as that generated by the regular expression a+ The string aaaa can be generated by the first grammar rule with the derivation A => Aa => Aaa => Aaaa => aaaa 4

left recursive: The non-terminal A appears as the first symbol on the right-hand side of the rule defining A A A a a right recursive: The non-terminal A appears as the last symbol on the right-hand side of the rule defining A A a A a 6. Examples of Recursion Consider a rule of the form: A A where and represent arbitrary strings and does not begin with A. This rule generates all strings of the form,,,,... (all strings beginning with a, followed by 0 or more ' s). This grammar rule is equivalent in its effect to the regular expression *. Similarly, the right recursive grammar rule A A (where does not end in A) generates all strings,,,,.. To generate the same language as the regular expression a* we must have a notation for a grammar rule that generates the empty string use the epsilon meta-symbol for the empty string empty, called an -production (an "epsilon production"). A grammar that generates a language containing the empty string must have at least one -production. A grammar equivalent to the regular expression a* A A a or A a A Both grammars generate the language { an n an integer >= 0} = L(a*). 7. Examples Example 3.5: A (A) A generates the strings of all "balanced parentheses." For example, the string (( ) (( ))) ( ) generated by the following derivation (the -production is used to make A disappear as needed): A => (A) A => (A)(A)A => (A)(A) =>(A)( ) => ((A)A)( ) =>( ( )A)() => (( ) (A)A ) () => (( )( A ))( ) => (( )((A)A))( ) => (( )(( )A))( ) => (( )(( )))( ) Example 3.6: The statement grammar of Example 3.4 can be written in the following alternative way using an -production: statement if-stmt other if-stmt if ( exp ) statement else-part else-part else statement exp 0 1 The -production indicates that the structure else-part is optional. Example 3.7: Consider the following grammar G for a sequence of statements: stmt-sequence stmt ; stmt-sequence stmt stmt s This grammar generates sequences of one or more statements separated by semicolons (statements have been abstracted into the single terminal s): 5

L(G)= { s, s; s, s; s; s,... ) If allow statement sequences to be empty, write the following grammar G': stmt-sequence stmt ; stmt-sequence stmt s semicolon is a terminator rather than a separator: L(G')= {, s;, s;s;, s;s;s;,... } If allow statement sequences to be empty, but retain the semicolon as a separator, write the grammar as follows: stmt-sequence nonempty-stmt-sequence nonempty-stmt-sequence stmt;nonempty-stmt-sequence stmt stmt s L(G)= {, s, s; s, s; s; s,... ) 6