Context-Free Grammars

Similar documents
Context-Free Grammars

Context-Free Grammars

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

Turing Machines Part Two

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

CSC-461 Exam #2 April 16, 2014

Non-deterministic Finite Automata (NFA)

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE 105 THEORY OF COMPUTATION

CS103 Handout 14 Winter February 8, 2013 Problem Set 5

Multiple Choice Questions

Ambiguous Grammars and Compactification

Homework. Context Free Languages. Before We Start. Announcements. Plan for today. Languages. Any questions? Recall. 1st half. 2nd half.

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Chapter Seven: Regular Expressions

Introduction to Parsing. Lecture 8

CS103 Handout 13 Fall 2012 May 4, 2012 Problem Set 5

CS154 Midterm Examination. May 4, 2010, 2:15-3:30PM

Context-Free Grammars

Programming Lecture 3

CS103 Handout 35 Spring 2017 May 19, 2017 Problem Set 7

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

CS 314 Principles of Programming Languages

Models of Computation II: Grammars and Pushdown Automata

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

Part 5 Program Analysis Principles and Techniques

PS3 - Comments. Describe precisely the language accepted by this nondeterministic PDA.

Context-free grammars (CFGs)

Compiler Construction

Homework. Announcements. Before We Start. Languages. Plan for today. Chomsky Normal Form. Final Exam Dates have been announced

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CS 403: Scanning and Parsing

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

CS143 Handout 20 Summer 2012 July 18 th, 2012 Practice CS143 Midterm Exam. (signed)

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Context-Free Languages and Parse Trees

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CSE302: Compiler Design

Outline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries.

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

CSE 413 Final Exam. June 7, 2011

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CSE 401 Midterm Exam 11/5/10

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3)

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

DVA337 HT17 - LECTURE 4. Languages and regular expressions

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

CSE 105 THEORY OF COMPUTATION

Formal Languages. Formal Languages

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

CS 314 Principles of Programming Languages. Lecture 3

Formal Languages and Compilers Lecture VI: Lexical Analysis

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Decision Properties for Context-free Languages

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Introduction to Parsing. Lecture 5

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

EECS 6083 Intro to Parsing Context Free Grammars


Defining Languages GMU

CS 44 Exam #2 February 14, 2001

CS103 Handout 39 Winter 2018 February 23, 2018 Problem Set 7

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Languages and Compilers

Dr. D.M. Akbar Hussain

Briefly describe the purpose of the lexical and syntax analysis phases in a compiler.

First Midterm Exam CS164, Fall 2007 Oct 2, 2007

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Definition 2.8: A CFG is in Chomsky normal form if every rule. only appear on the left-hand side, we allow the rule S ǫ.

Outline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CS152: Programming Languages. Lecture 2 Syntax. Dan Grossman Spring 2011

Introduction to Parsing. Lecture 5

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CSE 105 THEORY OF COMPUTATION

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

Transcription:

Context-Free Grammars

Describing Languages We've seen two models for the regular languages: Automata accept precisely the strings in the language. Regular expressions describe precisely the strings in the language. Finite automata recognize strings in the language. Perform a computation to determine whether a specific string is in the language. Regular expressions match strings in the language. Describe the general shape of all strings in the language.

Context-Free Grammars A context-free grammar (or CFG) is an entirely different formalism for defining a class of languages. Goal: Give a procedure for listing off all strings in the language. CFGs are best explained by example...

Arithmetic Expressions Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division. Here is one possible CFG: E int E E Op E E (E) Op + Op - Op * Op / E E Op E E Op (E) E Op (E Op E) E * (E Op E) int * (E Op E) int * (int Op E) int * (int Op int) int * (int + int)

Arithmetic Expressions Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division. Here is one possible CFG: E int E E Op E E (E) Op + Op - Op * Op / E E Op E E Op int int Op int int / int

Context-Free Grammars Formally, a context-free grammar is a collection of four objects: A set of nonterminal symbols (also called variables), A set of terminal symbols (the alphabet of the CFG) A set of production rules saying how each nonterminal can be converted by a string of terminals and nonterminals, and A start symbol (which must be a nonterminal) that begins the derivation. E int E E Op E E (E) Op + Op - Op * Op /

Some CFG Notation Capital letters in Bold Red Uppercase will represent nonterminals. i.e. A, B, C, D Lowercase letters in blue monospace will represent terminals. i.e. t, u, v, w Lowercase Greek letters in gray italics will represent arbitrary strings of terminals and nonterminals. i.e. α, γ, ω

A Notational Shorthand E int E E Op E E (E) Op + Op - Op * Op /

A Notational Shorthand E int E Op E (E) Op + - * /

Derivations E E Op E int (E) Op + * - / E E Op E E Op (E) E Op (E Op E) E * (E Op E) int * (E Op E) int * (int Op E) int * (int Op int) int * (int + int) A sequence of steps where nonterminals are replaced by the right-hand side of a production is called a derivation. If string α derives string ω, we write α * ω. In the example on the left, we see E * int * (int + int).

The Language of a Grammar If G is a CFG with alphabet Σ and start symbol S, then the language of G is the set L(G) = { ω Σ* S * ω } That is, L( G) is the set of strings derivable from the start symbol. Note: ω must be in Σ*, the set of strings made from terminals. Strings involving nonterminals aren't in the language.

More Context-Free Grammars Chemicals! C 19 H 14 O 5 S Cu 3 (CO 3 ) 2 (OH) 2 - MnO 4 S 2- Form Cmp Cmp Ion Cmp Term Term Num Cmp Cmp Term Elem (Cmp) Elem H He Li Be B C Ion + - IonNum + IonNum - IonNum 2 3 4... Num 1 IonNum

CFGs for Chemistry Form Cmp Cmp Ion Cmp Term Term Num Cmp Cmp Term Elem (Cmp) Elem H He Li Be B C Ion + - IonNum + IonNum - IonNum 2 3 4... Num 1 IonNum Form Cmp Ion Cmp Cmp Ion Cmp Term Num Ion Term Term Num Ion Elem Term Num Ion Mn Term Num Ion Mn Elem Num Ion MnO Num Ion MnO IonNum Ion MnO 4 Ion MnO - 4

CFGs for Programming Languages BLOCK STMT { STMTS } STMTS ε STMT STMTS STMT EXPR; if (EXPR) BLOCK while (EXPR) BLOCK do BLOCK while (EXPR); BLOCK EXPR var const EXPR + EXPR EXPR EXPR EXPR = EXPR... { } var = var * var; if (var) var = const; while (var) { var = var + const; }

Context-Free Languages A language L is called a context-free language (or CFL) if there is a CFG G such that L = L( G). Questions: What languages are context-free? How are context-free and regular languages related?

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S a*b

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S Ab

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S Ab A Aa ε

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S a(b c*)

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S ax X (b c*)

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S ax X b c*

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S ax X b C

From Regexes to CFGs CFGs don't have the Kleene star, parenthesized expressions, or internal operators. However, we can convert regular expressions to CFGs as follows: S ax X b C C Cc ε

Regular Languages and CFLs Theorem: Every regular language is context-free. Proof Idea: Use the construction from the previous slides to convert a regular expression for L into a CFG for L.

The Language of a Grammar Consider the following CFG G: S asb ε What strings can this generate? a a a a b b b b L(G) = { a n b n n N }

All Languages Regular Languages CFLs

Time-Out for Announcements!

Problem Set Six Problem Set Six goes out right now, is due next Monday at 2:15PM. Explore context-free languages and the limits of regular languages! Problem Set Five was due at the start of class; is due on Wednesday at 2:15PM sharp with a late day.

Extra Practice Problems Based on your feedback, I've assembled a set of eight extra practice problems on the material so far. Available on the course website; solutions will go out later this week.

Midterm Regrades All midterm regrades were due at the start of today's lecture. Brought a regrade to class? Feel free to hand it in to Keith right after class today. That's okay with us. SCPD students deadline is this Wednesday at the start of class.

Asking Questions We're getting a lot of great questions through the staff list and Piazza please keep asking them! If you have Piazza questions that don't give away key insights, consider making them public. There are a lot of great private questions right now! Please avoid asking questions after 10:00PM the night before a problem set comes due we will tend not to answer questions asked after this point.

Second Midterm Exam The second midterm exam is next Thursday, November 13 from 7PM 10PM. Covers material up through and including topics from PS6; focus is on topics from PS4 PS6. Same format as last time: closed-book, closedcomputer, open one page of notes. Location information TBA. Practice midterm next Monday night, November 10, from 7PM 10PM, location TBA. Need to take the exam at an alternate time? Email Maesen ASAP to set up an alternate exam time.

Your Questions!

It has been over a year since I took a CS class. I want to do CS107, but I am worried that my coding is too rusty. What are some good ways to brush up on coding over winter break?

Why is HTML not a regular language? I have always heard not to parse HTML with regular expressions because it is not a regular language, but why is this?

Do you have a dog?

Back to CS103!

Designing CFGs Like designing DFAs, NFAs, and regular expressions, designing CFGs is a craft. When thinking about CFGs: Think recursively: Build up bigger structures from smaller ones. Have a construction plan: Know in what order you will build up the string.

Designing CFGs Let Σ = {a, b} and let L = {w Σ* w is a palindrome } We can design a CFG for L by thinking inductively: Base case: ε, a, and b are palindromes. If ω is a palindrome, then aωa and bωb are palindromes. S ε a b asa bsb

Designing CFGs Let Σ = {(, )} and let L = {w Σ* w is a string of balanced parentheses } We can think about how we will build strings in this language as follows: The empty string is balanced. Any two strings of balanced parentheses can be concatenated. Any string of balanced parentheses can be parenthesized. S SS (S) ε

Designing CFGs: Watch Out! When designing CFGs, remember that each nonterminal can be expanded out independently of the others. Let Σ = {a, } and let L = {an a n n N }. Is the following a CFG for L? S X X X ax ε S X X ax X aax X aa X aa ax aa a

Finding a Build Order Let Σ = {a, } and let L = {an a n n N }. To build a CFG for L, we need to be more clever with how we construct the string. Idea: Build from the ends inward. Gives this grammar: S asa S asa aasaa aaasaaa aaa aaa

Designing CFGs: A Caveat Let Σ = {l, r} and let L = {w Σ* w has the same number of l's and r's } Is this a grammar for L? S lsr rsl ε Can you derive the string lrrl?

Designing CFGs: A Caveat When designing a CFG for a language, make sure that it generates all the strings in the language and never generates a string outside the language. The first of these can be tricky make sure to test your grammars! You'll design your own CFG for this language on the next problem set.

CFG Caveats II Is the following grammar a CFG for the language { a n b n n N }? S asb What strings can you derive? Answer: None! What is the language of the grammar? Answer: Ø When designing CFGs, make sure your recursion actually terminates!

Function Prototypes Let Σ = {void, int, double, name, (, ),,, ;}. Let's write a CFG for C-style function prototypes! Examples: void name(int name, double name); int name(); int name(double name); int name(int, int name, int); void name(void);

Function Prototypes Here's one possible grammar: S Ret name (Args); Ret Type void Type int double Args ε void ArgList ArgList OneArg ArgList, OneArg OneArg Type Type name

Summary of CFG Design Look for recursive structures where they exist it can help guide you toward a solution. Keep the build order in mind often, you'll build two totally different parts of the string concurrently. Use different nonterminals to represent different structures.

Next Time Turing Machines What does a computer with unbounded memory look like? How do you program them?