CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Similar documents
Dr. D.M. Akbar Hussain

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

3. Context-free grammars & parsing

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

EECS 6083 Intro to Parsing Context Free Grammars

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

CSE 3302 Programming Languages Lecture 2: Syntax

CPS 506 Comparative Programming Languages. Syntax Specification

Defining syntax using CFGs

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

It parses an input string of tokens by tracing out the steps in a leftmost derivation.

Syntax. In Text: Chapter 3

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Compiler Principle and Technology. Prof. Dongming LU April 15th, 2019

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Chapter 3. Describing Syntax and Semantics ISBN

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Lexical and Syntax Analysis. Top-Down Parsing

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

A programming language requires two major definitions A simple one pass compiler

CIT 3136 Lecture 7. Top-Down Parsing

Parsing: Derivations, Ambiguity, Precedence, Associativity. Lecture 8. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Introduction to Parsing

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Habanero Extreme Scale Software Research Project

Building Compilers with Phoenix

Principles of Programming Languages COMP251: Syntax and Grammars

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

A Simple Syntax-Directed Translator

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

A simple syntax-directed

Lexical and Syntax Analysis

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Syntax. A. Bellaachia Page: 1

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Describing Syntax and Semantics

Recursive Descent Parsers

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Lecture 10 Parsing 10.1

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CMSC 330: Organization of Programming Languages

Programming Language Syntax and Analysis

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Part 5 Program Analysis Principles and Techniques

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

CMSC 330: Organization of Programming Languages

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Context-free grammars (CFG s)

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

CS 314 Principles of Programming Languages

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Lecture 12: Parser-Generating Tools

COP 3402 Systems Software Syntax Analysis (Parser)

Lecture 4: Syntax Specification

CMSC 330: Organization of Programming Languages

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Parsing II Top-down parsing. Comp 412

Introduction to Parsing. Lecture 5

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Introduction to Parsing. Lecture 5

CMSC 330: Organization of Programming Languages. Context Free Grammars

ECE251 Midterm practice questions, Fall 2010

3. Parsing. Oscar Nierstrasz

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1


CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

CMSC 330: Organization of Programming Languages. Context Free Grammars

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

ICOM 4036 Spring 2004

Lexical and Syntax Analysis (2)

Introduction to Parsing. Lecture 8

F453 Module 7: Programming Techniques. 7.2: Methods for defining syntax

Homework & Announcements

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Chapter 4. Lexical and Syntax Analysis

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Projects for Compilers

Abstract Syntax Trees & Top-Down Parsing

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Syntax-Directed Translation. Lecture 14

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Chapter 2 :: Programming Language Syntax

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Chapter 3. Parsing #1

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Introduction to Parsing Ambiguity and Syntax Errors

Bottom-Up Parsing. Lecture 11-12

Transcription:

CIT3136 - Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Definition of a Context-free Grammar: An alphabet or set of basic symbols (like regular expressions, only now the symbols are whole tokens, not chars), including ε. (Terminals) A set of names for structures (like statement, expression, definition). (Non-terminals) A set of grammar rules expressing the structure of each name. (Productions) A start symbol (the name of the most general structure compilation_unit in C). 4/2/2003 2

Basic Example: Simple integer arithmetic expressions 2 non-terminals exp exp op exp ( exp ) number op + - * 6 terminals In what way does such a CFG differ from a regular expression? digit = 0 1 9 number = digit digit* Recursion! Recursive rules 6 productions (3 on each line) Base rule 4/2/2003 3

CFGs are designed to represent recursive (i.e. nested) structures But consequences are huge: The structure of a matched string is no longer given by just a sequence of symbols (lexeme), but by a tree (parse tree) Recognizers are no longer finite, but may have arbitrary data size, and must have some notion of stack. 4/2/2003 4

Recognition Process is much more complex: Algorithms can use stacks in many different ways. Nondeterminism is much harder to eliminate. Even the number of states can vary with the algorithm (only 2 states necessary if stack is used for state structure. 4/2/2003 5

Major Consequence: Many parsing algorithms, not just one Top down Recursive descent (hand choice) Predictive table-driven, LL (outdated) Bottom up LR and its cousin LALR (machinegenerated choice [Yacc/Bison]) Operator-precedence (outdated) 4/2/2003 6

Structural Issues First! Express matching of a string [ (34-3)*42 ] by a derivation: (1) exp expopexp [exp expopexp] (2) exp op number [exp number] (3) exp * number [op * ] (4) ( exp ) *number [exp ( exp )] (5) ( exp op exp ) *number [exp expopexp] (6) (exp op number) *number [exp number ] (7) (exp - number) *number [op - ] (8) (number - number)*number [exp number ] 4/2/2003 7

Abstract the structure of a derivation to a parse tree: 1 exp 4 exp 3 op 2 exp ( 5 exp ) * number 8 exp 7 op 6 exp number - number 4/2/2003 8

Derivations can vary, even when the parse tree doesn t: A leftmost derivation (Slide 8 was a rightmost): (1) exp exp op exp [exp exp op exp] (2) (exp) op exp [exp ( exp )] (3) (exp op exp) op exp [exp exp op exp] (4) (number op exp) op exp [exp number] (5) (number - exp) op exp [op -] (6) (number - number) op exp [exp number] (7) (number - number) * exp [op *] (8) (number - number) * number [exp number] 4/2/2003 9

A leftmost derivation corresponds to a (top-down) preorder traversal of the parse tree. A rightmost derivation corresponds to a (bottom-up) postorder traversal, but in reverse. Top-down parsers construct leftmost derivations. (LL = Left-to-right traversal of input, constructing a Leftmost derivation) Bottom-up parsers construct rightmost derivations in reverse order. (LR = Left-to-right traversal of input, constructing a Rightmost derivation) 4/2/2003 10

But what if the parse tree does vary?[ exp op exp op exp ] Correct one exp exp exp op exp exp op exp exp op exp * number number - exp op exp number - number number * number The grammar is ambiguous, but why should we care? Semantics! 4/2/2003 11

Principle of Syntax-directed Semantics The parse tree will be used as the basic model; semantic content will be attached to the tree; thus the tree should reflect the structure of the eventual semantics (semanticsbased syntax would be a better term) 4/2/2003 12

Sources of Ambiguity: Associativity and precedence of operators Sequencing Extent of a substructure (dangling else) Obscure recursion (unusual) exp exp exp 4/2/2003 13

Dealing with ambiguity Disambiguating rules Change the grammar (but not the language!) Can all ambiguity be removed? Backtracking can handle it, but the expense is great 4/2/2003 14

Example: integer arithmetic exp exp addop term term addop + - term term mulop factor factor mulop * factor ( exp ) number Precedence cascade 4/2/2003 15

Repetition and Recursion Left recursion: A A x y yxx: A A x A x y Right recursion: A x A y xxy: A x A x A 4/2/2003 16 y

Repetition & Recursion, cont. Sometimes we care which way recursion goes: operator associativity Sometimes we don t: statement and expression sequences Parsing always has to pick a way! The tree may remove this information (see next slide) 4/2/2003 17

Abstract Syntax Trees Express the essential structure of the parse tree only Leave out parens, cascades, and don t-care repetitive associativity Corresponds to actual internal tree structure produced by parser Use sibling lists for don t care repetition: s1 --- s2 --- s3 4/2/2003 18

Previous Example [ (34-3)*42 3)*42 ] * - 42 34 3 4/2/2003 19

Data Structure typedef enum {Plus,Minus,Times} OpKind; typedef enum {OpK,ConstK} ExpKind; typedef struct streenode { ExpKind kind; OpKind op; struct streenode *lchild,*rchild; int val; } STreeNode; typedef STreeNode *SyntaxTree; 4/2/2003 20

Or (using a union): typedef enum {Plus,Minus,Times} OpKind; typedef enum {OpK,ConstK} ExpKind; typedef struct streenode { ExpKind kind; struct streenode *lchild,*rchild; union { OpKind op; int val; } attribute; } STreeNode; typedef STreeNode *SyntaxTree; 4/2/2003 21

Sequence Examples stmt-seq stmt ; stmt-seq stmt one or more stmts separated by a ; stmt-seq stmt ; stmt-seq ε zero or more stmts terminated by a ; stmt-seq stmt-seq ; stmt stmt one or more stmts separated by a ; stmt-seq stmt-seq ; stmt ε zero or more stmts preceded by a ; 4/2/2003 22

Obscure Ambiguity Example Incorrect attempt to add unary minus: exp exp addop term term - exp addop + - term term mulop factor factor mulop * factor ( exp ) number 4/2/2003 23

Ambiguity Example, continued Better: (only one at beg. of an exp) exp exp addop term term - term Or maybe: (many at beg. of term) term - term term1 term1 term1 mulop factor factor Or maybe: (many anywhere) factor ( exp ) number - factor 4/2/2003 24

Dangling else ambiguity statement if-stmt other if-stmt if ( exp ) statement if ( exp )statement else statement exp 0 1 The following string has two parse trees: if(0) if(1) other else other 4/2/2003 25

Parse trees for dangling else: statement statement Correct one if-stmt if-stmt if ( ) else exp statement statement if ( ) exp statement 0 other if-stmt 0 if-stmt if ( ) exp statement if ( ) else exp statement statement 1 other 1 other other 4/2/2003 26

Disambiguating Rule: An else part should always be associated with the nearest if-statement that does not yet have an associated else-part. (Most-closely nested rule: easy to state, but hard to put into the grammar itself.) Note that a bracketing keyword can remove the ambiguity: Bracketing keyword if-stmt if ( exp ) stmt end if ( exp )stmt else stmt end 4/2/2003 27

Extra Notation: So far: Backus-Naur Form (BNF) Metasymbols are ε Extended BNF (EBNF): New metasymbols [ ] and { } ε largely eliminated by these Parens? Maybe yes, maybe no: exp exp (+ -) term term exp exp + term exp - term term 4/2/2003 28

EBNF Metasymbols: Brackets [ ] mean optional (like? in regular expressions): exp term exp term becomes: exp term [ exp ] if-stmt if ( exp ) stmt if ( exp )stmt else stmt becomes: if-stmt if ( exp ) stmt [ else stmt ] Braces { } mean repetition (like * in regexps - see next slide) 4/2/2003 29

Braces in EBNF Replace only left-recursive repetition: exp exp + term term becomes: exp term { + term } Left associativity still implied Watch out for choices: exp exp + term exp - term term is not the same as exp term { + term } term { - term } 4/2/2003 30

Simple Expressions in EBNF exp term { addop term } addop + - term factor { mulop factor } mulop * factor ( exp ) number 4/2/2003 31

Final Notational Option: Syntax Diagrams (from EBNF): exp > term < addop < > ( > exp > ) factor > > number 4/2/2003 32