Context-Free Grammars

Similar documents
Context-Free Grammars

Context-Free Grammars

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330: Organization of Programming Languages. Context Free Grammars

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages

Models of Computation II: Grammars and Pushdown Automata

CMSC 330: Organization of Programming Languages. Context Free Grammars

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Context-free grammars (CFGs)

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Languages and Compilers

COP 3402 Systems Software Syntax Analysis (Parser)

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Compiler Construction

Dr. D.M. Akbar Hussain

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Turing Machines Part Two

Multiple Choice Questions

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Ambiguous Grammars and Compactification

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Introduction to Parsing. Lecture 8

Homework & Announcements

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

14.1 Encoding for different models of computation

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Context-Free Grammars

CS103 Handout 14 Winter February 8, 2013 Problem Set 5

Context-Free Languages and Parse Trees

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Question Bank. 10CS63:Compiler Design

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

This book is licensed under a Creative Commons Attribution 3.0 License

Graph Theory Part Two

Decision Properties for Context-free Languages

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Syntax Analysis Part I

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

EECS 6083 Intro to Parsing Context Free Grammars

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3)

Defining syntax using CFGs

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CS 164 Handout 11. Midterm Examination. There are seven questions on the exam, each worth between 10 and 20 points.

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Theory and Compiling COMP360

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CS 314 Principles of Programming Languages

Homework. Announcements. Before We Start. Languages. Plan for today. Chomsky Normal Form. Final Exam Dates have been announced

2.2 Syntax Definition

HKN CS 374 Midterm 1 Review. Tim Klem Noah Mathes Mahir Morshed

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

HW2 posted. Announcements

CSE302: Compiler Design

Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Homework. Context Free Languages. Before We Start. Announcements. Plan for today. Languages. Any questions? Recall. 1st half. 2nd half.

CSE 105 THEORY OF COMPUTATION

Programming Lecture 3

CS143 Handout 20 Summer 2012 July 18 th, 2012 Practice CS143 Midterm Exam. (signed)

Syntax. In Text: Chapter 3

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

CS103 Handout 13 Fall 2012 May 4, 2012 Problem Set 5

Part 5 Program Analysis Principles and Techniques

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Definition 2.8: A CFG is in Chomsky normal form if every rule. only appear on the left-hand side, we allow the rule S ǫ.

Chapter Seven: Regular Expressions

Chapter 3. Describing Syntax and Semantics ISBN

CS103 Handout 42 Spring 2017 May 31, 2017 Practice Final Exam 1

Context Free Languages and Pushdown Automata

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

15 212: Principles of Programming. Some Notes on Grammars and Parsing

Chapter 3. Describing Syntax and Semantics ISBN

CS103 Handout 29 Winter 2018 February 9, 2018 Inductive Proofwriting Checklist

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Chapter 3. Describing Syntax and Semantics

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

CSC-461 Exam #2 April 16, 2014

JNTUWORLD. Code No: R

Defining Languages GMU

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

Optimizing Finite Automata

CSE 105 THEORY OF COMPUTATION

CSE 105 THEORY OF COMPUTATION

Introduction to Parsing. Lecture 5

CSE 3302 Programming Languages Lecture 2: Syntax

Transcription:

Context-Free Grammars

Describing Languages We've seen two models for the regular languages: Finite automata accept precisely the strings in the language. Regular expressions describe precisely the strings in the language. Finite automata recognize strings in the language. Perform a computation to determine whether a specifc string is in the language. Regular expressions match strings in the language. Describe the general shape of all strings in the language.

Context-Free Grammars A context-free grammar (or CFG) is an entirely diferent formalism for defning a class of languages. Goal: Give a description of a language by recursively describing the structure of the strings in the language. CFGs are best explained by example...

Arithmetic Expressions Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division. Here is one possible CFG: E int E E Op E E (E) Op + Op - Op * Op / E E Op E E Op (E) E Op (E Op E) E * (E Op E) int * (E Op E) int * (int Op E) int * (int Op int) int * (int + int)

Arithmetic Expressions Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division. Here is one possible CFG: E int E E Op E E (E) Op + Op - Op * Op / E E Op E E Op int int Op int int / int

Context-Free Grammars Formally, a context-free grammar is a collection of four items: A set of nonterminal symbols (also called variables), A set of terminal symbols (the alphabet of the CFG) A set of production rules saying how each nonterminal can be replaced by a string of terminals and nonterminals, and A start symbol (which must be a nonterminal) that begins the derivation. E int E E Op E E (E) Op + Op - Op * Op /

Some CFG Notation In today s slides, capital letters in Bold Red Uppercase will represent nonterminals. e.g. A, B, C, D Lowercase letters in blue monospace will represent terminals. e.g. t, u, v, w Lowercase Greek letters in gray italics will represent arbitrary strings of terminals and nonterminals. e.g. α, γ, ω You don't need to use these conventions on your own; just make sure whatever you do is readable.

A Notational Shorthand E int E Op E (E) Op + - * /

Derivations E E Op E int (E) Op + * - / E E Op E E Op (E) E Op (E Op E) E * (E Op E) int * (E Op E) int * (int Op E) int * (int Op int) int * (int + int) A sequence of steps where nonterminals are replaced by the right-hand side of a production is called a derivation. If string α derives string ω, we write α * ω. In the example on the left, we see E * int * (int + int).

The Language of a Grammar If G is a CFG with alphabet Σ and start symbol S, then the language of G is the set L(G) = { ω Σ* S * ω } That is, ( L G) is the set of strings of terminals derivable from the start symbol.

If If G is is a CFG CFG with with alphabet alphabet Σ and and start start symbol symbol S, S, then then the the language of of G is is the the set set L(G) L(G) = { ω Σ* Σ* S * * ω } Consider the the following CFG CFG G over over Σ = {a, {a, b, b, c, c, d}: d}: S Sa Sa dt dt T btb btb c How How many many of of the the following strings are are in in L (( G)? G)? dca dca cad cad bcb bcb dtaa Answer Answer at at PollEv.com/cs103 PollEv.com/cs103 or or text text CS103 CS103 to to 22333 22333 once once to to join, join, then then a a number. number.

Context-Free Languages A language L is called a context-free language (or CFL) if there is a CFG G such that L = L( G). Questions: What languages are context-free? How are context-free and regular languages related?

From Regexes to CFGs CFGs consist purely of production rules of the form A ω. They do not have the regular expression operators * or. However, we can convert regular expressions to CFGs as follows: S a*b

From Regexes to CFGs CFGs consist purely of production rules of the form A ω. They do not have the regular expression operators * or. However, we can convert regular expressions to CFGs as follows: S Ab A Aa ε

From Regexes to CFGs CFGs consist purely of production rules of the form A ω. They do not have the regular expression operators * or. However, we can convert regular expressions to CFGs as follows: S a(b c*)

From Regexes to CFGs CFGs consist purely of production rules of the form A ω. They do not have the regular expression operators * or. However, we can convert regular expressions to CFGs as follows: S ax X b C C Cc ε

Regular Languages and CFLs Theorem: Every regular language is context-free. Proof Idea: Use the construction from the previous slides to convert a regular expression for L into a CFG for L. Problem Set 8 Exercise: Instead, show how to convert a DFA/NFA into a CFG.

The Language of a Grammar Consider the following CFG G: S asb ε What strings can this generate? a a a a b b b b L(G) = { a n b n n N }

Regular Languages CFLs All Languages

Why the Extra Power? Why do CFGs have more power than regular expressions? Intuition: Derivations of strings have unbounded memory. S asb ε a a a a b b b b

Time-Out for Announcements!

Midterm Exam Logistics The next midterm is tonight from 7:00PM 10:00PM. Locations are divvied up by last (family) name: A-I: Go to Cubberley Auditorium. J-Z: Go to Cemex Auditorium. The exam focuses on Lecture 06 13 (binary relations through induction) and PS3 PS5. Finite automata onward is not tested. Topics from earlier in the quarter (proofwriting, frst-order logic, set theory, etc.) are also fair game, but that s primarily because the later material builds on this earlier material. The exam is closed-book, closed-computer, and limitednote. You can bring a double-sided, 8.5 11 sheet of notes with you to the exam, decorated however you d like.

Our Advice Eat dinner tonight. You are not a brain in a jar. You are a rich, complex, beautiful biological system. Please take care of yourself. Read all the questions before diving into them. Tunnel vision can hurt you on an exam. There s evidence that spreading your time out leads to better outcomes. Refect on how far you ve come. How many of these questions would you have been able to understand two months ago? That s the mark that you re learning something!

Three Questions What is something you know now that, at the start of the quarter, you knew you didn t know? What is something you know now that, at the start of the quarter, you didn t know that you didn t know? What is something you don t know that, at the start of the quarter, you didn t know that you didn t know?

Back to CS103!

Designing CFGs Like designing DFAs, NFAs, and regular expressions, designing CFGs is a craft. When thinking about CFGs: Think recursively: Build up bigger structures from smaller ones. Have a construction plan: Know in what order you will build up the string. Store information in nonterminals: Have each nonterminal correspond to some useful piece of information.

Designing CFGs Let Σ = {a, b} and let L = {w Σ* w is a palindrome } We can design a CFG for L by thinking inductively: Base case: ε, a, and b are palindromes. If ω is a palindrome, then aωa and bωb are palindromes. No other strings are palindromes. S ε a b asa bsb

Designing CFGs Let Σ = {(, )} and let L = {w Σ* w is a string of balanced parentheses } Some sample strings in L: ((())) (())() (()())(()()) ((((()))(()))) ε ()()

Designing CFGs Let Σ = {(, )} and let L = {w Σ* w is a string of balanced parentheses } Let's think about this recursively. Base case: the empty string is a string of balanced parentheses. Recursive step: Look at the closing parenthesis that matches the frst open parenthesis. ((()(()))(()))(())((()))

Designing CFGs Let Σ = {(, )} and let L = {w Σ* w is a string of balanced parentheses } Let's think about this recursively. Base case: the empty string is a string of balanced parentheses. Recursive step: Look at the closing parenthesis that matches the frst open parenthesis. ( ( ) ( ( ) ) )( ( ) ) (())(( ( ) ))

Designing CFGs Let Σ = {(, )} and let L = {w Σ* w is a string of balanced parentheses } Let's think about this recursively. Base case: the empty string is a string of balanced parentheses. Recursive step: Look at the closing parenthesis that matches the frst open parenthesis. Removing the frst parenthesis and the matching parenthesis forms two new strings of balanced parentheses. S (S)S ε

Designing CFGs Let Σ = {a, b} and let L = {w Σ* w has the same number of a's and b's } How How many many of of the the following following CFGs CFGs have have language language L? L? S asb bsa ε S abs bas ε S absba basab ε S SbaS SabS ε Answer Answer at at PollEv.com/cs103 PollEv.com/cs103 or or text text CS103 CS103 to to 22333 22333 once once to to join, join, then then a a number. number.

Designing CFGs: A Caveat When designing a CFG for a language, make sure that it generates all the strings in the language and never generates a string outside the language. The frst of these can be tricky make sure to test your grammars! You'll design your own CFG for this language on Problem Set 8.

CFG Caveats II Is the following grammar a CFG for the language { a n b n n N }? S asb What strings in {a, b}* can you derive? Answer: None! What is the language of the grammar? Answer: Ø When designing CFGs, make sure your recursion actually terminates!

Designing CFGs When designing CFGs, remember that each nonterminal can be expanded out independently of the others. Let Σ = {a, } and let L = {an a n n N }. Is the following a CFG for L? S X X X ax ε S X X ax X aax X aa X aa ax aa a

Finding a Build Order Let Σ = {a, } and let L = {a n a n n N }. To build a CFG for L, we need to be more clever with how we construct the string. If we build the strings of a's independently of one another, then we can't enforce that they have the same length. Idea: Build both strings of a's at the same time. Here's one possible grammar based on that idea: S asa S asa aasaa aaasaaa aaa aaa

Function Prototypes Let Σ = {void, int, double, name, (, ),,, ;}. Let's write a CFG for C-style function prototypes! Examples: void name(int name, double name); int name(); int name(double name); int name(int, int name, int); void name(void);

Function Prototypes Here's one possible grammar: S Ret name (Args); Ret Type void Type int double Args ε void ArgList ArgList OneArg ArgList, OneArg OneArg Type Type name Fun question to think about: what changes would you need to make to support pointer types?

Summary of CFG Design Tips Look for recursive structures where they exist: they can help guide you toward a solution. Keep the build order in mind often, you'll build two totally diferent parts of the string concurrently. Usually, those parts are built in opposite directions: one's built left-to-right, the other right-to-left. Use diferent nonterminals to represent diferent structures.

Applications of Context-Free Grammars

CFGs for Programming Languages BLOCK STMT { STMTS } STMTS ε STMT STMTS STMT EXPR; if (EXPR) BLOCK while (EXPR) BLOCK do BLOCK while (EXPR); BLOCK EXPR identifier constant EXPR + EXPR EXPR EXPR EXPR * EXPR...

Grammars in Compilers One of the key steps in a compiler is fguring out what a program means. This is usually done by defning a grammar showing the high-level structure of a programming language. There are certain classes of grammars (LL(1) grammars, LR(1) grammars, LALR(1) grammars, etc.) for which it's easy to fgure out how a particular string was derived. Tools like yacc or bison automatically generate parsers from these grammars. Curious to learn more? Take CS143!

Natural Language Processing By building context-free grammars for actual languages and applying statistical inference, it's possible for a computer to recover the likely meaning of a sentence. In fact, CFGs were frst called phrase-structure grammars and were introduced by Noam Chomsky in his seminal work Syntactic Structures. They were then adapted for use in the context of programming languages, where they were called Backus- Naur forms. Stanford's CoreNLP project is one place to look for an example of this. Want to learn more? Take CS124 or CS224N!

Biography Minute: Noam Chomsky Invented CFGs! Helped found felds of linguistics and cognitive science PC: Hans Peters / Anefo (via Wikimedia) Today, perhaps more well known for political writing than linguistics Made it onto President Nixon s Enemies List Anti-capitalism, anti-imperialism, anti-war Drawing on linguistics expertise, written extensively on state propaganda (Manufacturing Consent)

Next Time Turing Machines What does a computer with unbounded memory look like? How would you program it?