Context-free grammars (CFGs)

Similar documents
HW2 posted. Announcements

Defining syntax using CFGs

CMSC 330: Organization of Programming Languages. Context Free Grammars

Context-Free Grammars

CMSC 330: Organization of Programming Languages

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Defining syntax using CFGs

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

A Simple Syntax-Directed Translator

CS 314 Principles of Programming Languages

CMSC 330: Organization of Programming Languages

Optimizing Finite Automata

Announcements. Working in pairs is only allowed for programming assignments and not for homework problems. H3 has been posted

CMSC 330: Organization of Programming Languages

Theory and Compiling COMP360

COP 3402 Systems Software Syntax Analysis (Parser)

Context-Free Grammars

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Syntax Analysis Part I

Properties of Regular Expressions and Finite Automata

CMSC 330: Organization of Programming Languages. Context Free Grammars

Introduction to Parsing. Lecture 5

CS 406/534 Compiler Construction Parsing Part I

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Ambiguous Grammars and Compactification

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Parsing II Top-down parsing. Comp 412

Context-Free Languages and Parse Trees

Context-Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars

Syntax Analysis Check syntax and construct abstract syntax tree

Lecture 8: Context Free Grammars

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CSX-lite Example. LL(1) Parse Tables. LL(1) Parser Driver. Example of LL(1) Parsing. An LL(1) parse table, T, is a twodimensional

CSE302: Compiler Design

Grammars and Parsing. Paul Klint. Grammars and Parsing

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

CSE 401/M501 18au Midterm Exam 11/2/18. Name ID #

Compilers Course Lecture 4: Context Free Grammars

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Introduction to Parsing. Lecture 8

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

CSE 3302 Programming Languages Lecture 2: Syntax

Languages and Compilers

Introduction to Parsing. Lecture 5

Table-Driven Top-Down Parsers

Part 3. Syntax analysis. Syntax analysis 96

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Introduction to Parsing. Comp 412

2.2 Syntax Definition

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

CS 4120 Introduction to Compilers

Syntax. In Text: Chapter 3

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

3. Parsing. Oscar Nierstrasz

CS 403: Scanning and Parsing

Context-Free Grammars

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CS 536 Midterm Exam Spring 2013

Part 5 Program Analysis Principles and Techniques

Project 1: Scheme Pretty-Printer

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Chapter 4. Lexical and Syntax Analysis

Introduction to Parsing

CSCI312 Principles of Programming Languages!

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Lexical Scanning COMP360

Lexical and Syntax Analysis. Top-Down Parsing

CSE 401 Midterm Exam Sample Solution 2/11/15

Dr. D.M. Akbar Hussain

EECS 6083 Intro to Parsing Context Free Grammars

Habanero Extreme Scale Software Research Project

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CSCI312 Principles of Programming Languages

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Homework & Announcements

Reflection in the Chomsky Hierarchy

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Introduction to Parsing Ambiguity and Syntax Errors

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Syntax Analysis, VI Examples from LR Parsing. Comp 412

1 Parsing (25 pts, 5 each)

Last Time. What do we want? When do we want it? An AST. Now!

B The SLLGEN Parsing System

Programming Language Syntax and Analysis

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

4. Lexical and Syntax Analysis

Transcription:

Context-free grammars (CFGs)

Roadmap Last time RegExp == DFA Jlex: a tool for generating (Java code for) a lexer/scanner Mainly a collection of regexp, action pairs This time CFGs, the underlying abstraction for parsers Next week Java CUP: a tool for generating (Java code for) a parser Mainly a collection of CFG-rule, action pairs regexp : JLex :: CFG : Java CUP

RegExps Are Great! Perfect for tokenizing a language However, they have some limitations Can only define a limited family of languages Cannot use a RegExp to specify all the programming constructs we need No notion of structure Let s explore both of these issues

Limitations of RegExps Cannot handle matching E.g., language of balanced parentheses L () = { ( n ) n where n > 0} No DFA exists for this language Intuition: A given FSM only has a fixed, finite amount of memory For an FSM, memory = the states With a fixed, finite amount of memory, how could an FSM remember how many ( characters it has seen?

Theorem: No RegExp/DFA can describe the language L () Proof by contradiction: Suppose that there exists a DFA A for L () and A has N states A has to accept the string ( N ) N with some path q 0 q 1 q N q 2N+1 By the pigeonhole principle some state has to repeat: q i = q j for some i<j<n Therefore the run q 0 q 1 q i q j+1 q N q 2N+1 is also accepting A accepts the string ( N-(j-i) ) N L (), which is a contradiction!

Limitations of RegExps: No Structure Our Enhanced-RegExp scanner can emit a stream of tokens: X = Y + Z ID ASSIGN ID PLUS ID but this doesn t really enforce any order of operations

The Chomsky Hierarchy LANGUAGE CLASS: power efficiency Recursively enumerable Turing machine Context-Sensitive Context-Free Happy medium? Regular FSM Noam Chomsky

Context Free Grammars (CFGs) A set of (recursive) rewriting rules to generate patterns of strings Can envision a parse tree that keeps structure

CFG: Intuition S ( S ) A rule that says that you can rewrite S to be an S surrounded by a single set of parenthesis Before applying rule S After applying rule S ( S )

Context Free Grammars (CFGs) A CFG is a 4-tuple (N,Σ,P,S) N is a set of non-terminals, e.g., A, B, S, Σ is the set of terminals P is a set of production rules S N is the initial non-terminal symbol ( start symbol )

Context Free Grammars (CFGs) A CFG is a 4-tuple (N,Σ,P,S) N is a set of non-terminals, e.g., A, B, S Σ is the set of terminals P is a set of production rules S (in N) is the initial non-terminal symbol Placeholder / interior nodes in the parse tree Tokens from scanner Rules for deriving strings If not otherwise specified, use the non-terminal that appears on the LHS of the first production as the start

Production Syntax LHS RHS Single nonterminal symbol Expression: Sequence of terminals and nonterminals Examples: S à ( S ) S à ε

Production Shorthand Nonterm expression Nonterm ε equivalently: Nonterm expression ε equivalently: Nonterm expression ε S à ( S ) S à ε S à ( S ) ε S à ( S ) ε

Derivations To derive a string: Start by setting Current Sequence to the start symbol Repeatedly, Find a Nonterminal X in the Current Sequence Find a production of the form X α Apply the production: create a new current sequence in which α replaces X Stop when there are no more non-terminals This process derives a string of terminal symbols

Derivation Syntax We ll use the symbol for derives We ll use the symbol & for derives in one or more steps (also written as & ) We ll use the symbol for derives in zero or more steps (also written as )

An Example Grammar

Terminals begin end semicolon assign id plus An Example Grammar

For readability, bold and lowercase Terminals begin end semicolon assign id plus An Example Grammar

For readability, bold and lowercase Terminals begin Program end boundary semicolon assign id plus An Example Grammar

An Example Grammar For readability, bold and lowercase Terminals begin Program end boundary semicolon Represents ; assign Separates statements id plus

An Example Grammar For readability, bold and lowercase Terminals begin Program end boundary semicolon Represents ; assign Separates statements id plus Represents = in an assignment statement

An Example Grammar For readability, bold and lowercase Terminals begin Program end boundary semicolon Represents ; assign Separates statements id plus Represents = in an assignment statement Identifier / variable name

An Example Grammar For readability, bold and lowercase Terminals begin Program end boundary semicolon Represents ; assign Separates statements id plus Represents = in an assignment statement Identifier / variable name Represents + operator in an expression

For readability, bold and lowercase Terminals begin end semicolon assign id plus An Example Grammar Nonterminals Prog Stmts Stmt Expr

For readability, bold and lowercase Terminals begin end semicolon assign id plus An Example Grammar For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr

For readability, bold and lowercase Terminals begin end semicolon assign id plus An Example Grammar For readability, Italics and UpperCamelCase Nonterminals Prog Root of the parse tree Stmts Stmt Expr

For readability, bold and lowercase Terminals begin end semicolon assign id plus An Example Grammar For readability, Italics and UpperCamelCase Nonterminals Prog Root of the parse tree Stmts List of statements Stmt Expr

For readability, bold and lowercase Terminals begin end semicolon assign id plus An Example Grammar For readability, Italics and UpperCamelCase Nonterminals Prog Root of the parse tree Stmts List of statements Stmt A single statement Expr

For readability, bold and lowercase Terminals begin end semicolon assign id plus An Example Grammar For readability, Italics and UpperCamelCase Nonterminals Prog Root of the parse tree Stmts List of statements Stmt A single statement Expr A mathematical expression

An Example Grammar For readability, bold and lowercase Terminals begin end semicolon assign id plus For readability, Italics and UpperCamelCase Nonterminals Prog Stmts Stmt Expr Defines the syntax of legal programs Productions Prog begin Stmts end Stmts Stmts semicolon Stmt Stmt Stmt id assign Expr Expr id Expr plus id

An Example Grammar For readability, bold and lowercase Terminals begin Program end boundary semicolon Represents ; assign Separates statements id plus Represents = statement Identifier / variable name Represents + expression For readability, Italics and UpperCamelCase Nonterminals Prog Root of the parse tree Stmts List of statements Stmt A single statement Expr An expression Defines the syntax of legal programs Productions Prog begin Stmts end Stmts Stmts semicolon Stmt Stmt Stmt id assign Expr Expr id Expr plus id

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt 4. Stmt id assign Expr 5. Expr id 6. Expr plus id

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt 4. Stmt id assign Expr 5. Expr id 6. Expr plus id Derivation Sequence

Productions 1. Prog begin Stmts end Parse Tree 2. Stmts Stmts semicolon Stmt 3. Stmt 4. Stmt id assign Expr 5. Expr id 6. Expr plus id Derivation Sequence

Productions 1. Prog begin Stmts end Parse Tree 2. Stmts Stmts semicolon Stmt 3. Stmt 4. Stmt id assign Expr 5. Expr id 6. Expr plus id Derivation Sequence Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt 4. Stmt id assign Expr 5. Expr id 6. Expr plus id Parse Tree Prog Derivation Sequence Prog Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt 4. Stmt id assign Expr 5. Expr id 6. Expr plus id Parse Tree Prog Derivation Sequence Prog begin Stmts end 1 Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin 4. Stmt id assign Expr 5. Expr id 6. Expr plus id Parse Tree Prog Stmts end Derivation Sequence Prog begin Stmts end 1 Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin Prog Stmts end 4. Stmt id assign Expr 5. Expr id Stmts semicolon Stmt 6. Expr plus id Parse Tree Derivation Sequence Prog begin Stmts end 1 begin Stmts semicolon Stmt end 2 Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin Prog Stmts end 4. Stmt id assign Expr 5. Expr id Stmts semicolon Stmt 6. Expr plus id Stmt Parse Tree Derivation Sequence Prog begin Stmts end 1 begin Stmts semicolon Stmt end begin Stmt semicolon Stmt end 2 3 Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin Prog Stmts end 4. Stmt id assign Expr 5. Expr id Stmts semicolon Stmt 6. Expr plus id Stmt Parse Tree Derivation Sequence id Prog begin Stmts end 1 2 begin Stmts semicolon Stmt end 3 begin Stmt semicolon Stmt end begin id assign Expr semicolon Stmt end assign Expr 4 Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin Prog Stmts end 4. Stmt id assign Expr 5. Expr id Stmts Parse Tree semicolon Stmt 6. Expr plus id Stmt id assign Expr Derivation Sequence id assign Prog begin Stmts end 1 2 begin Stmts semicolon Stmt end 3 begin Stmt semicolon Stmt end 4 begin id assign Expr semicolon Stmt end begin id assign Expr semicolon id assign Expr end Expr 4 Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin Prog Stmts end 4. Stmt id assign Expr 5. Expr id Stmts Parse Tree semicolon Stmt 6. Expr plus id Stmt id assign Expr Derivation Sequence id assign Prog begin Stmts end 1 2 begin Stmts semicolon Stmt end 3 begin Stmt semicolon Stmt end 4 begin id assign Expr semicolon Stmt end begin id assign Expr semicolon id assign Expr end begin id assign id semicolon id assign Expr end Expr id 4 5 Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin Prog Stmts end 4. Stmt id assign Expr 5. Expr id Stmts Parse Tree semicolon Stmt 6. Expr plus id Stmt id assign Expr Derivation Sequence id assign Expr Prog begin Stmts end 1 2 begin Stmts semicolon Stmt end id 3 begin Stmt semicolon Stmt end 4 4 begin id assign Expr semicolon Stmt end 5 begin id assign Expr semicolon id assign Expr end begin id assign id semicolon id assign Expr end begin id assign id semicolon id assign Expr plus id end 6 Expr plus id Key terminal Nonterminal Rule used

Productions 1. Prog begin Stmts end 2. Stmts Stmts semicolon Stmt 3. Stmt begin Prog Stmts end 4. Stmt id assign Expr 5. Expr id Stmts Parse Tree semicolon Stmt 6. Expr plus id Stmt id assign Expr Derivation Sequence id assign Expr Prog begin Stmts end 1 2 begin Stmts semicolon Stmt end id 3 begin Stmt semicolon Stmt end 4 4 begin id assign Expr semicolon Stmt end 5 begin id assign Expr semicolon id assign Expr end begin id assign id semicolon id assign Expr end begin id assign id semicolon id assign Expr plus id end begin id assign id semicolon id assign id plus id end 5 6 Expr plus id id Key terminal Nonterminal Rule used

A five minute introduction MAKEFILE

Makefiles: Motivation Typing the series of commands to generate our code can be tedious Multiple steps that depend on each other Somewhat complicated commands May not need to rebuild everything Makefiles solve these issues Record a series of commands in a script-like DSL Specify dependency rules and Make generates the results

Makefiles: Basic Structure <target>: <dependency list> (tab) <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java

Makefiles: Basic Structure <target>: <dependency list> (tab) <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java

Makefiles: Basic Structure <target>: <dependency list> (tab) <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java Example.class depends on example.java and IO.class

Makefiles: Basic Structure <target>: <dependency list> (tab) <command to satisfy target> Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java Example.class depends on example.java and IO.class Example.class is generated by javac Example.java

Makefiles: Dependencies Example.class Example.java IO.class Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java IO.java

Makefiles: Dependencies Internal Dependency graph Example.class Example.java IO.class Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java IO.java

Makefiles: Dependencies Internal Dependency graph Example.class Example.java Example Example.class: Example.java IO.class javac Example.java IO.class: IO.java javac IO.java IO.class IO.java A file is rebuilt if one of it s dependencies changes

Makefiles: Variables You can thread common configuration values through your makefile

Makefiles: Variables You can thread common configuration values through your makefile Example JC = /s/std/bin/javac JFLAGS = -g

Makefiles: Variables You can thread common configuration values through your makefile Example JC = /s/std/bin/javac JFLAGS = -g Build for debug

Makefiles: Variables You can thread common configuration values through your makefile Example JC = /s/std/bin/javac JFLAGS = -g Build for debug Example.class: Example.java IO.class $(JC) $(JFLAGS) Example.java IO.class: IO.java $(JC) $(JFLAGS) IO.java

Makefiles: Phony Targets You can run commands via make Write a target with no dependencies (called phony) Will cause it to execute the command every time Example clean: rm f *.class test: java cp. Test.class

Makefiles: Phony Targets You can run commands via make Write a target with no dependencies (called phony) Will cause it to execute the command every time Example clean: rm f *.class test: java cp. Test.class

Makefiles: Phony Targets You can run commands via make Write a target with no dependencies (called phony) Will cause it to execute the command every time Example clean: rm f *.class test: java cp. Test.class

Recap We ve defined context-free grammars More powerful than regular expressions Learned a bit about makefiles Next time: we ll look at grammars in more detail