Context-Free Languages and Parse Trees

Similar documents
Ambiguous Grammars and Compactification

Context-Free Grammars

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Syntax Analysis Part I

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Defining syntax using CFGs

Introduction to Syntax Analysis. The Second Phase of Front-End

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Introduction to Parsing. Lecture 8

Optimizing Finite Automata

Context-Free Grammars and Languages (2015/11)

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

COP 3402 Systems Software Syntax Analysis (Parser)

Dr. D.M. Akbar Hussain

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Homework. Context Free Languages. Before We Start. Announcements. Plan for today. Languages. Any questions? Recall. 1st half. 2nd half.

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Introduction to Syntax Analysis

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Enumerations and Turing Machines

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CMSC 330: Organization of Programming Languages

3. Parsing. Oscar Nierstrasz

Syntax Analysis Check syntax and construct abstract syntax tree

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

3. Context-free grammars & parsing

Lexical and Syntax Analysis

Languages and Compilers

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

Introduction to Parsing. Lecture 5

CMSC 330: Organization of Programming Languages

Lexical and Syntax Analysis. Top-Down Parsing

Compilers Course Lecture 4: Context Free Grammars

4. Lexical and Syntax Analysis

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

CMSC 330: Organization of Programming Languages. Context Free Grammars

Lecture 4: Syntax Specification

Part 5 Program Analysis Principles and Techniques

CS525 Winter 2012 \ Class Assignment #2 Preparation

Introduction to Parsing. Lecture 5

4. Lexical and Syntax Analysis

Lecture 8: Context Free Grammars

Building Compilers with Phoenix

Introduction to Lexing and Parsing

VIVA QUESTIONS WITH ANSWERS

Describing Syntax and Semantics

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Formal Languages. Formal Languages

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Defining Languages GMU

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Normal Forms for CFG s. Eliminating Useless Variables Removing Epsilon Removing Unit Productions Chomsky Normal Form

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

Compiler Design Concepts. Syntax Analysis

CSE302: Compiler Design

EECS 6083 Intro to Parsing Context Free Grammars

Properties of Regular Expressions and Finite Automata

15 212: Principles of Programming. Some Notes on Grammars and Parsing

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CS 314 Principles of Programming Languages



A programming language requires two major definitions A simple one pass compiler

Chapter 4: LR Parsing

Types of parsing. CMSC 430 Lecture 4, Page 1

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Syntax. In Text: Chapter 3

CHAPTER 10 GRAPHS AND TREES. Alessandro Artale UniBZ - artale/

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Compilers and computer architecture From strings to ASTs (2): context free grammars

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CMSC 330: Organization of Programming Languages. Context Free Grammars

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

ICOM 4036 Spring 2004

Compiler Construction 2016/2017 Syntax Analysis

2.2 Syntax Definition

Chapter 4. Lexical and Syntax Analysis

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Context-Free Grammars

Chapter 2 :: Programming Language Syntax

Transcription:

Context-Free Languages and Parse Trees Mridul Aanjaneya Stanford University July 12, 2012 Mridul Aanjaneya Automata Theory 1/ 41

Context-Free Grammars A context-free grammar is a notation for describing languages. It is more powerful than finite automata or regular expressions. It still cannot define all possible languages. Useful for nested structures, e.g., parentheses in programming languages. Mridul Aanjaneya Automata Theory 2/ 41

Context-Free Grammars Basic idea is to use variables to stand for sets of strings (i.e., languages). These variable are defined recursively, in terms of one another. Recursive rules (productions) involve only concatenation. Alternative rules for a variable allow union. Mridul Aanjaneya Automata Theory 3/ 41

Example: CFG for {0 n 1 n n 1} Productions: S 01 S 0S1 Basis: 01 is in the language. Induction: If w is in the language, then so is 0w1. Mridul Aanjaneya Automata Theory 4/ 41

CFG Formalism Terminals: symbols of the alphabet of the language being defined. Variables (nonterminals): a finite set of other symbols, each of which represents a language. Start symbol: the variable whose language is the one being defined. Mridul Aanjaneya Automata Theory 5/ 41

Productions A production has the form variable string of variables and terminals Convention A, B, C,... are variables. a, b, c,... are terminals...., X, Y, Z are either terminals or variables...., w, x, y, z are strings of terminals only. α, β, γ,... are strings of terminals and/or variables. Mridul Aanjaneya Automata Theory 6/ 41

Example: Formal CFG Here is a formal CFG for {0 n 1 n n 1}. Terminals = {0,1}. Variables = {S}. Start symbol = S. Productions = S 01 S 0S1 Mridul Aanjaneya Automata Theory 7/ 41

Derivations - Intuition We derive strings in the language of a CFG by starting with the start symbol, and repeatedly replacing some variable A by the right side of one of its productions. That is, the productions for A are those that have A on the left side of the. Mridul Aanjaneya Automata Theory 8/ 41

Derivations - Formalism We say αaβ αγβ if A γ is a production. Example: S 01; S 0S1. S 0S1 00S11 000111. Mridul Aanjaneya Automata Theory 9/ 41

Iterated Derivation means zero or more derivation steps. Basis: α α for any string α. Induction: if α β and β γ, then α γ. Mridul Aanjaneya Automata Theory 10/ 41

Example: Iterated Derivation S 01; S 0S1. S 0S1 00S11 000111. So S S; S 0S1; S 00S11; S 000111. Mridul Aanjaneya Automata Theory 11/ 41

Sentential Forms Any string of variables and/or terminals derived from the start symbol is called a sentential form. Formally, α is a sentential form iff S α. Mridul Aanjaneya Automata Theory 12/ 41

Language of a Grammar If G is a CFG, then L(G), the language of G is {w S w}. Note: w must be a terminal string, S is the start symbol. Example: G has productions S ε and S 0S1. Note: ε is a valid right hand side. L(G) = {0 n 1 n n 0}. Mridul Aanjaneya Automata Theory 13/ 41

Context-Free Languages A language that is defined by some CFG is called a context-free language. There are CFL s that are not regular languages, such as the example just given. But not all languages are CFL s. Intuition: CFL s can count two things, not three. Mridul Aanjaneya Automata Theory 14/ 41

BNF Notation Grammars for programming languages are often written in Backus-Naur Form (BNF). Variables are words in <... >, e.g., <statement>. Terminals are often multicharacter strings indicated by boldface or underline, e.g., while or WHILE. Mridul Aanjaneya Automata Theory 15/ 41

BNF Notation Symbol ::= is often used for. Symbol is used for or. A shorthand for a list of productions with the same left side. Example: S 0S1 01 is a shorthand for S 0S1 and S 01. Mridul Aanjaneya Automata Theory 16/ 41

BNF Notation: Kleene Closure Symbol... is used for one or more. Example: <digit> ::= 0 1 2 3 4 5 6 7 8 9 <unsigned integer> ::= <digit>... Note: that s not exactly the for RE s. Translation: Replace α... with a new variable A and productions A Aα α. Grammar for unsigned integers can be replaced by: U UD D D 0 1 2 3 4 5 6 7 8 9 Mridul Aanjaneya Automata Theory 17/ 41

BNF Notation: Optional Elements Surround one or more symbols by [...] to make them optional. Example: <statement> ::= if <condition> then <statement> [;else <statement>] Translation: replace [α] by a new variable A with productions A ε. Grammar for if-then-else can be replaced by: S ictsa A ;es ε Mridul Aanjaneya Automata Theory 18/ 41

BNF Notation: Grouping Use {...} to surround a sequence of symbols that need to be treated as a unit. Typically, they are followed by a... for one or more. Example: <statement list> ::= <statement> [{;<statement>}...] Mridul Aanjaneya Automata Theory 19/ 41

BNF Notation: Grouping (Translation) You may, if you wish, create a new variable A for {α}. One production for A: A α. Use A in place of {α}. Mridul Aanjaneya Automata Theory 20/ 41

Example: Grouping L S[{;S}...] Replace L S[A...]; A ;S A stands for {;S}. Then by L SB; B A... ε; A ;S B stands for [A...] (zero or more A s). Finally by L SB; B C ε; C AC A; A ;S. C stands for A... Mridul Aanjaneya Automata Theory 21/ 41

Leftmost and Rightmost Derivations Derivations allow us to replace any of the variables in a string. Leads to many different derivations of the same string. By forcing the leftmost variable (or alternatively, the rightmost variable) to be replaced, we avoid these distinctions without a difference. Mridul Aanjaneya Automata Theory 22/ 41

Leftmost Derivations Say waα lm wβα if w is a string of terminals only and A β is a production. Also, α lm β if α becomes β by a sequence of zero or more lm steps. Mridul Aanjaneya Automata Theory 23/ 41

Example: Leftmost Derivations Balanced parantheses grammar: S SS (S) () S lm SS lm (S)S lm (())S lm (())() Thus, S lm (())() S SS S() (S)() (())() is a derivation, but not a leftmost derivation. Mridul Aanjaneya Automata Theory 24/ 41

Rightmost Derivations Say αaw rm αβw if w is a string of terminals only and A β is a production. Also, α rm β if α becomes β by a sequence of zero or more rm steps. Mridul Aanjaneya Automata Theory 25/ 41

Example: Rightmost Derivations Balanced parantheses grammar: S SS (S) () S rm SS rm S() rm (S)() rm (())() Thus, S rm (())() S SS SSS S()S ()()S ()()() is neither a rightmost derivation nor a leftmost derivation. Mridul Aanjaneya Automata Theory 26/ 41

Parse Trees Parse trees are trees labeled by symbols of a particular CFG. Leaves: labeled by a terminal or ε. Interior nodes: labeled by a variable. Children are labeled by the right side of a production for the parent. Root: must be labeled by the start symbol. Mridul Aanjaneya Automata Theory 27/ 41

Example: Parse Tree S SS (S) () S S S ( S ) ( ) ( ) Mridul Aanjaneya Automata Theory 28/ 41

Yield of a Parse Tree The concatenation of the labels of the leaves in left-to-right order is called the yield of the parse tree. Example: Yield of the given parse tree is (())(). S S S ( S ) ( ) ( ) Mridul Aanjaneya Automata Theory 29/ 41

Parse Trees, Leftmost and Rightmost Derivations For every parse tree, there is a unique leftmost and a unique rightmost derivation. We ll prove: 1 If there is a parse tree with root labeled A and yield w, then A lm w. 2 If A lm w, then there is a parse tree with root A and yield w. Mridul Aanjaneya Automata Theory 30/ 41

Part 1: Basis Induction on the height (length of the longest path from the root) of the tree. Basis: height 1. Tree looks like a A... a 1 n A a 1... a n must be a production. Thus, A lm a 1... a n. Mridul Aanjaneya Automata Theory 31/ 41

Part 1: Induction Assume (1) for trees of height < h, and let this tree have height h: By IH, X i lm w i. Note: If X i is a terminal, then X i = w i. X A... X 1 n w 1 w n Thus, A lm X 1...X n lm w 1X 2...X n lm w 1w 2 X 3...X n lm... lm w 1...w n Mridul Aanjaneya Automata Theory 32/ 41

Part 2 Given a leftmost derivation of a terminal string, we need to prove the existence of a parse tree. The proof is an induction on the length of the derivation. Mridul Aanjaneya Automata Theory 33/ 41

Part 2: Basis If A lm a 1a 2...a n by a one-step derivation, then there must be a parse tree a A... a 1 n Mridul Aanjaneya Automata Theory 34/ 41

Part 2: Induction Assume (2) for derivations of fewer than k > 1 steps, and let A lm w be a k-step derivation. First step is A lm X 1...X n. Key point: w can be divided such that the first portion is derived from X 1, the next is derived from X 2, and so on. If X i is a terminal, then w i = X i. Mridul Aanjaneya Automata Theory 35/ 41

Part 2: Induction That is, X i lm w i for all i such that X i is a variable. And the derivation takes fewer than k steps. By the IH, if X i is a variable, then there is a parse tree with root X i and yield w i. Thus, there is a parse tree X A... X 1 n w 1 w n Mridul Aanjaneya Automata Theory 36/ 41

Parse Trees and Rightmost Derivations The ideas are essentially the mirror image of the proof of the leftmost derivations. Left to the imagination! Mridul Aanjaneya Automata Theory 37/ 41

Parse Trees and Any Derivation The proof that you can obtain a parse tree from a leftmost derivation doesn t really depend on leftmost. First step still has to be A lm X 1...X n. And w can still be divided such that the first portion is derived from X 1, the next is derived from X 2, and so on. Mridul Aanjaneya Automata Theory 38/ 41

Ambiguous Grammars A CFG is ambiguous is there is a string in the language that is the yield of two or more parse trees. Example: S SS (S) () Two parse trees for ()()()! Mridul Aanjaneya Automata Theory 39/ 41

Ambiguity, Leftmost and Rightmost Derivations If there are two different parse trees, they must produce two different leftmost derivations by the construction given in the proof. Conversely, two different leftmost derivations produce different parse trees by the other part of the proof. Likewise for rightmost derivations. Mridul Aanjaneya Automata Theory 40/ 41

Ambiguity, Leftmost and Rightmost Derivations Thus, equivalent definitions for ambiguous grammar are: 1 There is a string in the language that has two different leftmost derivations. 2 There is a string in the language that has two different rightmost derivations. Mridul Aanjaneya Automata Theory 41/ 41