Context-Free Grammars

Similar documents
Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Recursive descent parsing

Ambiguous Grammars and Compactification

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

Definition: two derivations are similar if one of them precedes the other.

Homework. Context Free Languages. Before We Start. Announcements. Plan for today. Languages. Any questions? Recall. 1st half. 2nd half.

Principles of Programming Languages COMP251: Syntax and Grammars

CS525 Winter 2012 \ Class Assignment #2 Preparation

Context-Free Languages and Parse Trees

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

CPS 506 Comparative Programming Languages. Syntax Specification

Compilers and computer architecture From strings to ASTs (2): context free grammars

2.2 Syntax Definition

This book is licensed under a Creative Commons Attribution 3.0 License

SWEN 224 Formal Foundations of Programming

Syntax Analysis Check syntax and construct abstract syntax tree

Bottom-Up Parsing. Lecture 11-12

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

More Assigned Reading and Exercises on Syntax (for Exam 2)

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Models of Computation II: Grammars and Pushdown Automata

Programming Languages Third Edition

Bottom-Up Parsing. Lecture 11-12

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Compiler Design Concepts. Syntax Analysis

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax. A. Bellaachia Page: 1

Programming Lecture 3

Introduction to Parsing. Lecture 8

A programming language requires two major definitions A simple one pass compiler

Introduction to Syntax Analysis

CSX-lite Example. LL(1) Parse Tables. LL(1) Parser Driver. Example of LL(1) Parsing. An LL(1) parse table, T, is a twodimensional

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Types of parsing. CMSC 430 Lecture 4, Page 1

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Introduction to Syntax Analysis. The Second Phase of Front-End

Lexical and Syntax Analysis

Introduction to Parsing. Lecture 5

Homework. Announcements. Before We Start. Languages. Plan for today. Chomsky Normal Form. Final Exam Dates have been announced

Tree Parsing. $Revision: 1.4 $

Context-Free Grammars

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Equivalence of NTMs and TMs

Grammar Rules in Prolog

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Chapter 4: Syntax Analyzer

Defining syntax using CFGs

A Simple Syntax-Directed Translator

Lexical and Syntax Analysis. Top-Down Parsing

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

York University CSE 2001 Unit 4.0 Context Free Grammars and Parsers and Context Sensitive Grammars Instructor: Jeff Edmonds

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

PS3 - Comments. Describe precisely the language accepted by this nondeterministic PDA.

CS 536 Midterm Exam Spring 2013

Programming Language Features. CMSC 330: Organization of Programming Languages. Turing Completeness. Turing Machine.

CMSC 330: Organization of Programming Languages

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Introduction to Parsing. Lecture 5

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

Context-Free Grammars and Languages (2015/11)

Related Course Objec6ves

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Grammar Rules in Prolog!!

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

How do LL(1) Parsers Build Syntax Trees?

CMPSCI 250: Introduction to Computation. Lecture #14: Induction and Recursion (Still More Induction) David Mix Barrington 14 March 2013

Programming Languages and Paradigms, J. Fenwick, B. Kurtz, C. Norris

CS 314 Principles of Programming Languages

Table-Driven Top-Down Parsers

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

COSC252: Programming Languages: Semantic Specification. Jeremy Bolton, PhD Adjunct Professor

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

CS 403: Scanning and Parsing

CIS 341 Midterm February 28, Name (printed): Pennkey (login id): SOLUTIONS

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Chapter 2 Syntax Analysis

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Context-Free Grammars

Non-deterministic Finite Automata (NFA)

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

CS164: Midterm I. Fall 2003

Transcription:

Context-Free Grammars An alphabet is a set of symbols finite or infinite. Given an alphabetathe set of all finite sequences with items inais writtena. A language (overa) is a subset finite or infinite of A. Some languages: {[0,0],[0,1],[1,0],[1,1]} a finite language over{0,1} {0,1} ={[],[0],[1],[0,0],[0,1],[1,0],[1,1],[0,0,0], }, an infinite language over{0,1} {[],[0],[1],[0,0],[1,1],[0,0,0],[0,1,0],[1,0,1],[1,1,1], } the infinite set of palindromes over{0,1} {[1],[2],[1, +,1],[1, +,2],[2, +,2],[1, +,1, +,1], } arithmetic expressions using 1, 2, and + As languages are often infinite sets, we need finite descriptions of infinite languages. A context-free language (CFL) is a language defined by a context-free grammar (below). Context free languages have numerous applications in data formats, programming languages, communication protocols, etc. Typeset October 21, 2014 1

Languages that are too complex to be context free are nevertheless often described by first describing a context free language and then imposing additional restrictions E.g. the set of all syntactically correct Java classes is context free class C { int i = 13.5;} The set of all type correct Java classes is not context free. A context free grammar (CFG)(A,N,P,n start ) consists of An alphabeta(i.e. a set of symbols) A finite set of nonterminal symbolsn disjoint from A. A finite set of production rules of the formn α, wheren N andα (N A) A starting nonterminaln start N. Example G 0 = ({0,1,2,3,4,5,6,7,8,9,(,),, },{pn,d},p 0,pn) wherep 0 is 1 {pn (ddd) ddd dddd, d 0,d 1,d 2,d 3,d 4, d 5,d 6,d 7,d 8,d 9} 1 In formal language theory, it is is usual to write sequences just by listing the items. E.g. dḋd instead of[d,d,d]. This creates a usually harmless ambiguity between symbols and sequences of length 1. Catenation is written asst rather thansˆt. The empty sequence is written as eitherɛor as nothing at all rather than[]. Typeset October 21, 2014 2

For a given grammarg = (A,N,P,n start ), define a relation= on sequences as follows 2 α= β if and only if α,β (N A) and there are sequencesγ,δ,θ (N A) and an n N such that α=γnθ and β=γδθ and (n δ) P Ifα= β, we say that we can deriveβ fromαin one step. Each derivation step γnθ = γδθ replaces one occurrence of some nonterminal symbol n with δ where (n δ) P. For example the following are derivation steps forg 0 ddd = 8dd ddd = d0d pn = (ddd) ddd dddd Thus each grammar defines a directed graph in which the nodes are elements of(a N) and, for eachαand β, there is an edge fromαtoβ iffα= β. 2 Using the notation of the rest of the course, we would write the last line of this definition as such thatα=γˆ[n]ˆθ andβ=γˆδˆθ and(n δ) P. Typeset October 21, 2014 3

A derivation is a finite path in this graph. We write α= β to mean there is a derivation that starts atsand ends att. I.e. α= βmeans that we can transformα intoβ via 0 or more derivation steps. For example pn = (dd9) ddd dd0d The language defined by a CFG(A,N,P,n start ) is the set of sequences inα A such thatn start = α. For example, the language defined byg 0 includes the sequence (709) 867 5309 To prove this, all we need to do is show 1 derivation (of the many) frompn. pn = (ddd) ddd dddd= (dd9) ddd dddd = (dd9) ddd dd0d= (d09) ddd dd0d = (709) ddd dd0d= (709) ddd dd0d = (709) ddd dd09= (709) ddd d309 = (709) 8dd d309= (709) 86d d309 = (709) 867 d309= (709) 867 5309 Typeset October 21, 2014 4

Another example. LetG 1 =(A 1,N 1,P 1,block) A 1 ={+,,/,,(,),<,:=,if,then,while,do,else,end} I N, wherei is a finite set of identifiers disjoint from {if,then,while,do,else,end} andn is some finite subset ofn. N 1 ={block,command,exp,comparand,term,factor} andp 1 contains all of the following production rules 3 block ɛ block command block command i:=exp for alli I command ifexpthenblockelseblockendif command whileexpdoblockendwhile exp comparand exp comparand < comparand comparand term comparand term + comparand comparand term comparand term factor term factor term term factor/term factor n for alln N factor i for alli I factor (exp) An example of a string in this language is whilei<ndoj:=j+ii:=i+1endwhile 3 Recall thatɛmeans an empty sequence. Typeset October 21, 2014 5

We can show that whilei<ndoj:=j+ii:=i+1endwhile is in the language by showing a derivation. block = command block = whileexpdoblockendwhileblock. = whilei<ndoj:=j+icommandblockendwhilebloc = whilei<ndoj:=j+ii:=expblockendwhileblock = whilei<ndoj:=j+ii:=comparandblock = whilei<ndoj:=j+ii:=term+comparandblock = whilei<ndoj:=j+ii:=factor+comparandblock = whilei<ndoj:=j+ii:=i+comparandblock = whilei<ndoj:=j+ii:=i+termblock = whilei<ndoj:=j+ii:=i+factorblock = whilei<ndoj:=j+ii:=i+1blockendwhilebloc = whilei<ndoj:=j+ii:=i+1endwhileblock = whilei<ndoj:=j+ii:=i+1endwhile Typeset October 21, 2014 6

Another way to show that a sequence is in a context free language is with a parse tree. Typeset October 21, 2014 7

Let s define parse trees. An ordered tree is a directed tree whose nodes are either leaves (with no children) or branches whose (0 or more) children are arranged in a sequencechildren(t) A parse tree for a CFG(A,N,P,n start ) is a finite ordered tree whose nodes are labelled with symbols froma N, such that the root is labelled withn start and each branch node is labelled with a nonterminal symboln N and has a sequence of children whose labels form a finite sequenceαsuch that(n α) P. Given a parse treetthe sequence of leaves is given by proc leaves(t) iftis a leaf then return[label(t)] else //tis a branch varα:=[] for u children(t) do α:= αˆleaves(u) end for returnα end if end leaves Exercise: Show that, for any parse treet, ifleaves(t)=α, thenn start = α. Exercise: Show that ifn start = α, there is a parse treet such thatα=leaves(t). Typeset October 21, 2014 8

Exercise: Design a CFG for the language of palindromes in{0,1}. Exercise: Design a CFG for the language of valid boolean expressions in{p,q,r,,,,,(,)} Exercise: Design a CFG for the language of valid C++ variable declarations in {int,,[,],(,),;,a,b,c,0,1} wherea,b, andcare identifiers. E.g. int a(int, int ); declaresato be a function, while int( b[10])(); declaresbto be an array of 10 pointers to functions returning int results. Exercise: Look up the definition of regular language. Show that every regular language is a context-free language. Show that some context free languages are not regular languages. Typeset October 21, 2014 9