Lecture 8: Context Free Grammars

Dr Kieran T. Herley, Department of Computer Science, University College Cork, 2017-2018 (KH, 12/10/17)

Specifying Non-Regular Languages

Recall: Language Observations
Not every language is regular, e.g. $L = \{a^n b^n : n \text{ a non-negative integer}\}$.
Consider the following recursive rules defining L:
  1. $\varepsilon \in L$
  2. if $\alpha \in L$, then so is $a \alpha b$
Every string derived by repeated application of the above rules is in L. Every string $a^n b^n$ in L can be formed by these rules: apply rule 1 once, then rule 2 n times.
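The two rules can also be run backwards to test membership. A minimal Python sketch; the function name `in_L` is our own, not from the slides: a string is in L if it is empty (rule 1) or has the form a α b with α in L (rule 2).

```python
def in_L(s: str) -> bool:
    """Decide membership in L = {a^n b^n : n >= 0} by undoing rules 1 and 2."""
    if s == "":
        return True  # rule 1: the empty string is in L
    # rule 2: s must look like 'a' + alpha + 'b' with alpha already in L
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return in_L(s[1:-1])
    return False


for w in ["", "ab", "aabb", "aaab", "abba"]:
    print(repr(w), in_L(w))
```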

Context Free Grammars

Idea
Capture the above idea using a context-free grammar (CFG):
  <S> → ε
  <S> → a <S> b
Intuitive Explanation
  - Symbol <S> is a substitutable placeholder
  - Productions are substitution rules
  - The language consists of the strings derivable by starting with <S> and repeatedly applying rules until no placeholders are left

CFG cont'd
  <S> → ε
  <S> → a <S> b
Examples of Derivations
  <S> ⇒ ε
  <S> ⇒ a <S> b ⇒ ab
  <S> ⇒ a <S> b ⇒ aa <S> bb ⇒ aabb
So ε, ab, aabb, ... belong to the language.

Some Terminology
A (context-free) grammar consists of one or more productions, e.g. <S> → a <S> b, where
  - LHS: a single nonterminal (here <S>)
  - RHS: a sequence of one or more symbols (here a <S> b) composed of terminals, nonterminals and εs
  (→ separates the two; sometimes ::= etc. is used instead)
and
  - Terminals are symbols from the underlying alphabet, e.g. {a, b}
  - Nonterminals are placeholder symbols, e.g. <S> (here enclosed in angle brackets for clarity)
  - Start symbol: a nonterminal (<S>); the first nonterminal by default
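As an illustration of the terminology, a grammar can be stored as plain data. This Python sketch makes its own representation choices (not from the slides): a production is a pair (LHS, RHS tuple), the empty tuple stands for ε, nonterminals are taken to be the symbols appearing on some LHS, and the start symbol defaults to the first LHS, as the slide describes. The class name `Grammar` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# A production is (LHS nonterminal, tuple of RHS symbols); () stands for ε.
Production = tuple[str, tuple[str, ...]]


@dataclass
class Grammar:
    productions: list[Production]
    start: Optional[str] = None  # by default, the first nonterminal

    def __post_init__(self) -> None:
        if self.start is None:
            self.start = self.productions[0][0]

    def nonterminals(self) -> set[str]:
        """Symbols that appear as the LHS of some production."""
        return {lhs for lhs, _ in self.productions}

    def terminals(self) -> set[str]:
        """All other symbols occurring on some RHS."""
        nts = self.nonterminals()
        return {sym for _, rhs in self.productions for sym in rhs if sym not in nts}


# <S> -> ε | a <S> b
G = Grammar([("<S>", ()), ("<S>", ("a", "<S>", "b"))])
print(G.start, sorted(G.nonterminals()), sorted(G.terminals()))
```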

Derivations
  <S> → ε
  <S> → a <S> b
Derivation: transformation of the start symbol into a sentence (sequence of terminals) by repeated application of grammar productions, i.e. substitution of the RHS of some production for the nonterminal in its LHS.
Example: <S> ⇒ a <S> b ⇒ aa <S> bb ⇒ aabb
The intermediate stages, e.g. a <S> b, are known as sentential forms.
Definition: the sentences derivable from the start symbol constitute the language defined by the grammar.
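The definition of a derivation suggests a naive way to list sentences: start from the start symbol and keep replacing the leftmost nonterminal by each possible RHS until no placeholders are left. A Python sketch under our own assumptions: <S> is written as `S`, ε as an empty list, and a cap `max_form_len` on the length of sentential forms keeps the search finite (so a sentence whose every derivation passes through a longer form would be missed). The function name `sentences` is hypothetical.

```python
from collections import deque

# Grammar as a dict: nonterminal -> list of alternative RHSs; [] stands for ε.
GRAMMAR = {"S": [[], ["a", "S", "b"]]}


def sentences(grammar, start, max_form_len):
    """Collect the sentences reachable by leftmost derivations whose
    sentential forms never exceed max_form_len symbols."""
    found = set()
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        # locate the leftmost nonterminal, if any
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            found.add("".join(form))  # no placeholders left: a sentence
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new_form) <= max_form_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return found


print(sorted(sentences(GRAMMAR, "S", 7), key=len))
```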

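More Examples and Counterexamples
  <S> → ε
  <S> → a <S> b
  - aaabbb? Yes: <S> ⇒ a <S> b ⇒ aa <S> bb ⇒ aaa <S> bbb ⇒ aaabbb
  - aaab? No: each production adds a's and b's in equal numbers
  - abba? No: every derived string has all of its a's before all of its b's
Upshot: the grammar specifies the language $L = \{a^n b^n : n \ge 0\}$.
Note: if we interpret a as ( and b as ), this captures the set of nested parentheses, e.g. ((())).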

Another Grammar
  <S> → <N>
  <S> → <S> <N>
  <N> → ε
  <N> → ( <S> )
Features
  - Start symbol: <S>
  - Nonterminals: <S>, <N>
  - Terminals: the left and right parenthesis symbols, ( and )

Left Recursion
  <S> → <N>
  <S> → <S> <N>
  <N> → ε
  <N> → ( <S> )
Note: the grammar is left recursive: it embodies rules of the form X → X α (here <S> → <S> <N>).¹
  - This is one of the standard grammar idioms used to express repetition.
  - Some parsing techniques (e.g. top-down, recursive-descent parsing) disfavour left recursion; the grammar can usually be recast to avoid it.
¹ More indirect forms of left recursion are also possible.
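One standard way to recast such a grammar (the slide does not say which method it has in mind) removes immediate left recursion: A → A α | β becomes A → β A′ and A′ → α A′ | ε, which generates the same language. A Python sketch that handles only the direct case X → X α, not the indirect forms of footnote 1; the function name and the convention of naming the new nonterminal by appending `'` are our assumptions.

```python
def remove_immediate_left_recursion(grammar):
    """grammar: dict nonterminal -> list of RHS lists ([] is ε).
    Returns a new dict with no production of the form A -> A alpha."""
    result = {}
    for a, alternatives in grammar.items():
        recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]  # the alphas
        others = [rhs for rhs in alternatives if not rhs or rhs[0] != a]      # the betas
        if not recursive:
            result[a] = [list(rhs) for rhs in alternatives]
            continue
        fresh = a + "'"  # hypothetical naming scheme for the new nonterminal
        result[a] = [beta + [fresh] for beta in others]                   # A  -> beta A'
        result[fresh] = [alpha + [fresh] for alpha in recursive] + [[]]   # A' -> alpha A' | ε
    return result


# <S> -> <N> | <S> <N>,  <N> -> ε | ( <S> )
G = {"S": [["N"], ["S", "N"]], "N": [[], ["(", "S", ")"]]}
for lhs, alts in remove_immediate_left_recursion(G).items():
    print(lhs, "->", " | ".join(" ".join(r) or "ε" for r in alts))
```

For the parentheses grammar this yields S → N S′, S′ → N S′ | ε, and leaves N unchanged.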

Another Grammar cont'd
  <S> → <N>
  <S> → <S> <N>
  <N> → ε
  <N> → ( <S> )
Observation: the first two rules imply
  <S> ⇒ <N>
  <S> ⇒ <S> <N> ⇒ <N> <N>
  <S> ⇒ <S> <N> ⇒ <S> <N> <N> ⇒ <N> <N> <N>
i.e. <S> can roll out a sequence of one or more <N>s, depending on the number of applications of rule 2. This is a standard CFG idiom to specify repetition.

Some More Derivations
  <S> → <N>
  <S> → <S> <N>
  <N> → ε
  <N> → ( <S> )
Some Derivations
  <S> ⇒ <N> ⇒ ε
  <S> ⇒ <N> ⇒ ( <S> ) ⇒ ( <N> ) ⇒ ()
  <S> ⇒ <N> ⇒ ( <S> ) ⇒ ( <N> ) ⇒ (( <S> )) ⇒ (( <N> )) ⇒ (())

Some More Derivations
  <S> → <N>
  <S> → <S> <N>
  <N> → ε
  <N> → ( <S> )
More Derivations
  <S> ⇒ <S> <N> ⇒ <N> <N> ⇒ ( <S> ) <N> ⇒ ( <N> ) <N> ⇒ () <N> ⇒ () ( <S> ) ⇒ () ( <N> ) ⇒ ()()
Upshot: the grammar captures the set of balanced parentheses, as found in validly formatted arithmetic expressions.
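Since the grammar generates exactly the balanced strings, membership can also be checked without the grammar by tracking nesting depth. This counter-based check is a standard alternative technique, not something taken from the slides; the function name `balanced` is hypothetical.

```python
def balanced(s: str) -> bool:
    """True iff s, over the terminals '(' and ')', is balanced."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' before it
                return False
        else:
            return False  # only the two terminals are allowed
    return depth == 0  # every '(' must have been closed


for w in ["", "()", "(())", "()()", "(()", ")("]:
    print(repr(w), balanced(w))
```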

Parse Trees
  <S> → <N>
  <S> → <S> <N>
  <N> → ε
  <N> → ( <S> )
Sentence/Source: ()()()
Parse Tree: a tree representation of a derivation
  - start symbol at the root
  - terminals at the leaves
  - each non-leaf node reflects a production
  - reading the leaves from left to right yields the sentence

Parse Trees
[Figure: parse tree for the sentence ()()()]
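A parse tree can be represented by a small node type. The Python sketch below (class and function names are our own) hand-builds one possible tree for ()()() under the parentheses grammar, using two applications of <S> → <S> <N> followed by <S> → <N>, and checks that reading the leaves left to right, skipping ε, yields the sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # a nonterminal such as "<S>", a terminal, or "ε"
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> str:
    """Read the leaves from left to right; ε leaves contribute nothing."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(leaves(child) for child in node.children)


def pair() -> Node:
    """Subtree for <N> -> ( <S> ), with the inner <S> -> <N> -> ε."""
    inner = Node("<S>", [Node("<N>", [Node("ε")])])
    return Node("<N>", [Node("("), inner, Node(")")])


# <S> -> <S> <N> (twice), then <S> -> <N>: the left-recursive spine of the tree
tree = Node("<S>", [Node("<S>", [Node("<S>", [pair()]), pair()]), pair()])
print(leaves(tree))  # ()()()
```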

Parse Trees cont'd
  - The tree representation encodes the connection between the source and the grammar.
  - Compilers often use such trees to model the detailed structure of the source, for example to drive code generation.

Notational Note
Productions sharing the same LHS can be combined using the symbol | (read "or"). So
  X → α
  X → β
  X → γ
can be abbreviated to
  X → α | β | γ
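The abbreviation is purely notational. As a small sketch, a hypothetical helper `abbreviate` groups an ordered list of productions by LHS and joins the alternatives with `|`.

```python
from collections import defaultdict


def abbreviate(productions):
    """Combine productions sharing an LHS into one line using '|'.
    productions: list of (lhs, rhs_string) pairs, in order."""
    grouped = defaultdict(list)
    for lhs, rhs in productions:
        grouped[lhs].append(rhs)
    # dicts (including defaultdict) keep the order in which LHSs first appear
    return [f"{lhs} → {' | '.join(rhss)}" for lhs, rhss in grouped.items()]


rules = [("X", "α"), ("X", "β"), ("X", "γ")]
print("\n".join(abbreviate(rules)))  # X → α | β | γ
```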

CFGs and Programming Language Syntax
Grammar for Simple Arithmetic Expressions
  <expr>   → <expr> + <term> | <expr> - <term> | <term>
  <term>   → <term> * <factor> | <term> / <factor> | <factor>
  <factor> → NUM | ( <expr> )
Terminal NUM stands for a number (i.e. a sequence of digits).
  - CFGs can be used to specify the syntax of arithmetic expressions and of most programming languages.
  - CFG-based tools allow us to generate a parser capable of recognizing such expressions automatically.
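Tools of the kind the slide mentions generate parsers automatically; a hand-written recursive-descent recognizer is a different technique, shown here only to make the grammar's role concrete. Assumptions: the input is already a list of tokens (`NUM`, operators, parentheses), and the function name `recognize` is hypothetical. Because recursive descent cannot call a left-recursive rule such as <expr> → <expr> + <term> directly, each such rule is realised as a loop, which accepts the same strings.

```python
from typing import Optional


def recognize(tokens: list[str]) -> bool:
    """Return True iff tokens form an <expr> under the expression grammar."""
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok: str) -> None:
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at token {pos}")
        pos += 1

    def expr() -> None:
        # <expr> -> <term> { (+ | -) <term> }: left recursion as a loop
        term()
        while peek() in ("+", "-"):
            expect(peek())
            term()

    def term() -> None:
        # <term> -> <factor> { (* | /) <factor> }
        factor()
        while peek() in ("*", "/"):
            expect(peek())
            factor()

    def factor() -> None:
        # <factor> -> NUM | ( <expr> )
        if peek() == "(":
            expect("(")
            expr()
            expect(")")
        else:
            expect("NUM")

    try:
        expr()
    except SyntaxError:
        return False
    return pos == len(tokens)  # the whole input must be consumed
```

For example, `recognize("NUM + NUM * NUM".split())` returns `True`, while `recognize("NUM NUM".split())` returns `False` because the second NUM is left unconsumed.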

CFGs and Programming Language Syntax
Some Examples of Valid Expressions
  1. NUM
  2. NUM * NUM
  3. NUM + NUM
  4. NUM + NUM * NUM
  5. NUM * (NUM + NUM)

CFGs and Programming Language Syntax: Example Parse Trees
Each figure shows the parse tree of the expression under the arithmetic-expression grammar.
  Example 1: NUM  [Figure: parse tree]
  Example 2: NUM * NUM  [Figure: parse tree]
  Example 3: NUM + NUM  [Figure: parse tree]
  Example 4: NUM + NUM * NUM  [Figure: parse tree]
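A parse tree like those in the examples can be built by a recursive-descent sketch that records the nonterminal at each node. Assumptions (ours, not the slides'): tokens are pre-split strings, trees are nested tuples `(label, child, ...)` with terminals as plain strings, and the names `parse_tree` and `show` are hypothetical. The loops produce the left-leaning shape implied by the left-recursive rules, and because <term> sits below <expr>, `*` ends up deeper in the tree than `+`.

```python
def parse_tree(tokens: list[str]):
    """Build a parse tree (nested tuples) for an <expr>; raise SyntaxError if invalid."""
    pos = 0

    def at(*options: str) -> bool:
        return pos < len(tokens) and tokens[pos] in options

    def take(expected: str) -> str:
        nonlocal pos
        if not at(expected):
            raise SyntaxError(f"expected {expected!r} at token {pos}")
        pos += 1
        return expected  # terminals become leaves

    def expr():
        node = ("<expr>", term())  # <expr> -> <term>
        while at("+", "-"):
            op = take(tokens[pos])
            node = ("<expr>", node, op, term())  # <expr> -> <expr> op <term>
        return node

    def term():
        node = ("<term>", factor())  # <term> -> <factor>
        while at("*", "/"):
            op = take(tokens[pos])
            node = ("<term>", node, op, factor())  # <term> -> <term> op <factor>
        return node

    def factor():
        if at("("):
            return ("<factor>", take("("), expr(), take(")"))  # <factor> -> ( <expr> )
        return ("<factor>", take("NUM"))  # <factor> -> NUM

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def show(node, indent: int = 0) -> None:
    """Print the tree sideways, one node per line."""
    if isinstance(node, str):
        print("  " * indent + node)
        return
    print("  " * indent + node[0])
    for child in node[1:]:
        show(child, indent + 1)


show(parse_tree("NUM + NUM * NUM".split()))
```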

CYK Algorithm
Parsing Algorithm
  <expr>   → <expr> + <term> | <expr> - <term> | <term>
  <term>   → <term> * <factor> | <term> / <factor> | <factor>
  <factor> → NUM | ( <expr> )
For a CFG G and a string s, how do we determine whether $s \in L(G)$?
Could try enumerating all possible derivations, but TGBABW...

CYK Algorithm²
  for i ← 1 to n do
      V[i, 1] ← {A | A → a is a production and the ith symbol of x is a}
  for j ← 2 to n do
      for i ← 1 to n - j + 1 do
          V[i, j] ← {}
          for k ← 1 to j - 1 do
              V[i, j] ← V[i, j] ∪ {A | A → BC is a production, B is in V[i, k] and C is in V[i+k, j-k]}
Computes (in V[i, j]) the set of nonterminals <X> for which a derivation $\langle X \rangle \Rightarrow^* x_i x_{i+1} \cdots x_{i+j-1}$ exists, where $x_i x_{i+1} \cdots x_{i+j-1}$ denotes the substring of the source beginning at $x_i$ and of length j. Hence $x \in L(G)$ exactly when the start symbol is in V[1, n].
² See J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979 (pp. 139-141).
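A direct Python transcription of the pseudocode, keeping the slide's 1-based indices V[i, j] by using a dictionary. Assumptions: the grammar is already in Chomsky Normal Form and is given as a list of (A, rhs) pairs, with rhs either a 1-tuple holding a terminal or a 2-tuple of nonterminals; the example grammar for $\{a^n b^n : n \ge 1\}$ and the function name `cyk` are our own. The three nested loops over j, i and k, each with a scan over the productions, give $O(n^3 \cdot |G|)$ time.

```python
def cyk(grammar, start, x):
    """CYK membership test: True iff the CNF grammar derives x from start.
    V[i, j] = nonterminals deriving the substring of length j starting at i."""
    n = len(x)
    if n == 0:
        return False  # a CNF grammar cannot derive the empty string
    V = {}
    for i in range(1, n + 1):
        V[i, 1] = {A for A, rhs in grammar if rhs == (x[i - 1],)}  # A -> a
    for j in range(2, n + 1):              # substring length
        for i in range(1, n - j + 2):      # start position
            V[i, j] = set()
            for k in range(1, j):          # first part has length k
                for A, rhs in grammar:
                    if len(rhs) == 2 and rhs[0] in V[i, k] and rhs[1] in V[i + k, j - k]:
                        V[i, j].add(A)     # A -> BC
    return start in V[1, n]


# Example CNF grammar (ours): S -> A T | A B,  T -> S B,  A -> a,  B -> b
G = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B")),
     ("A", ("a",)), ("B", ("b",))]
print(cyk(G, "S", "aabb"), cyk(G, "S", "abab"))  # True False
```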

CYK Algorithm
Chomsky Normal Form (CNF)
Any grammar without ε (i.e. whose language does not contain the empty string) can be recast to use only productions of the forms
  <A> → <B> <C>
  <A> → a
where <A>, <B>, <C> are nonterminals and a is a terminal. The transformation is reasonably straightforward, but is not discussed here.
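A quick way to see what CNF permits is a checker for the two allowed shapes. The helper `is_cnf` and the example grammars below are ours: the CNF grammar generates $\{a^n b^n : n \ge 1\}$, i.e. the language of <S> → ε | a <S> b without ε, which CNF cannot express.

```python
def is_cnf(grammar, nonterminals):
    """Check that every production has the form A -> B C or A -> a.
    grammar: list of (A, rhs-tuple) pairs; nonterminals: set of names."""
    for lhs, rhs in grammar:
        if lhs not in nonterminals:
            return False
        two_nonterminals = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        one_terminal = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (two_nonterminals or one_terminal):
            return False  # e.g. A -> ε, A -> a B c, or a unit rule A -> B
    return True


original = [("S", ()), ("S", ("a", "S", "b"))]  # <S> -> ε | a <S> b
cnf = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B")),
       ("A", ("a",)), ("B", ("b",))]
print(is_cnf(original, {"S"}), is_cnf(cnf, {"S", "T", "A", "B"}))  # False True
```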

CYK Algorithm
  - Determines, for any CNF grammar G and string s, whether $s \in L(G)$
  - Can be modified to produce a derivation/parse tree
  - Dynamic programming!
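One way to make the algorithm produce a parse tree, sketched under our own assumptions: alongside each entry of V[i, j], store a back-pointer recording the split k and the pair B C that first justified adding A, then rebuild the tree from V[1, n]. The function name `cyk_parse`, the tuple tree format, and the example CNF grammar for $\{a^n b^n : n \ge 1\}$ are hypothetical.

```python
def cyk_parse(grammar, start, x):
    """CYK with back-pointers. Returns one parse tree as nested tuples
    (A, left, right) or (A, terminal), or None if x is not in L(G)."""
    n = len(x)
    if n == 0:
        return None
    back = {}  # (i, j, A) present <=> A in V[i, j]; value says how A was derived
    for i in range(1, n + 1):
        for A, rhs in grammar:
            if rhs == (x[i - 1],):
                back[i, 1, A] = rhs[0]  # A -> a
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                for A, rhs in grammar:
                    if (len(rhs) == 2 and (i, j, A) not in back
                            and (i, k, rhs[0]) in back
                            and (i + k, j - k, rhs[1]) in back):
                        back[i, j, A] = (k, rhs[0], rhs[1])  # remember the split

    def build(i, j, A):
        entry = back[i, j, A]
        if j == 1:
            return (A, entry)  # leaf production A -> a
        k, B, C = entry
        return (A, build(i, k, B), build(i + k, j - k, C))

    return build(1, n, start) if (1, n, start) in back else None


# Example CNF grammar (ours): S -> A T | A B,  T -> S B,  A -> a,  B -> b
G = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B")),
     ("A", ("a",)), ("B", ("b",))]
print(cyk_parse(G, "S", "aabb"))
```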