CMSC 330: Organization of Programming Languages

Similar documents
CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

CMSC 330: Organization of Programming Languages

Part 5 Program Analysis Principles and Techniques

Dr. D.M. Akbar Hussain

CSE 3302 Programming Languages Lecture 2: Syntax

COP 3402 Systems Software Syntax Analysis (Parser)

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CPS 506 Comparative Programming Languages. Syntax Specification

Syntax. In Text: Chapter 3

Principles of Programming Languages COMP251: Syntax and Grammars

A Simple Syntax-Directed Translator

Optimizing Finite Automata

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

A programming language requires two major definitions A simple one pass compiler

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Defining syntax using CFGs

Part 3. Syntax analysis. Syntax analysis 96

Monday, September 13, Parsers

3. Parsing. Oscar Nierstrasz

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CSCE 314 Programming Languages

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Chapter 3. Describing Syntax and Semantics ISBN

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CS 314 Principles of Programming Languages

CSE302: Compiler Design

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Introduction to Syntax Analysis. The Second Phase of Front-End

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Lecture 8: Context Free Grammars

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Context-Free Grammar (CFG)

Wednesday, August 31, Parsers

Introduction to Syntax Analysis

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Context-Free Grammars

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Programming Language Syntax and Analysis

Parsing II Top-down parsing. Comp 412

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

CS 406/534 Compiler Construction Parsing Part I

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Context-Free Grammars

Introduction to Lexing and Parsing

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Syntax Analysis Check syntax and construct abstract syntax tree

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Introduction to Parsing. Lecture 8

Context-Free Grammars

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

EECS 6083 Intro to Parsing Context Free Grammars

Syntax-Directed Translation. Lecture 14

Introduction to Parsing. Lecture 5

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Types of parsing. CMSC 430 Lecture 4, Page 1

Describing Syntax and Semantics

Languages and Compilers

Compilers and computer architecture From strings to ASTs (2): context free grammars

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Introduction to Parsing

A simple syntax-directed

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Properties of Regular Expressions and Finite Automata

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Compiler Construction

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Concepts Introduced in Chapter 4

Context-Free Languages and Parse Trees

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Context-free grammars (CFG s)

Chapter 4. Lexical and Syntax Analysis

Transcription:

CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1

Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back End Compiler / Interpreter 2

Front End Source Parser Static Analyzer Parse Tree Intermediate Representation Front End! Responsible for turning source code (sequence of symbols) into representation of program structure! Parser generates parse trees, which static analyzer processes 3

Parser Architecture Token Source Scanner Parser Stream Parse Tree Parser! Scanner / lexer converts sequences of symbols into tokens (keywords, variable names, operators, numbers, etc.)! Parser (yes, a parser contains a parser!) converts sequences of tokens into parse tree; implements a context free grammar We can t do everything as a regular expression not powerful enough 4

Context Free Grammar (CFG)! A way of describing sets of strings (= languages) The notation [[S]] denotes the language of strings defined by S! Example grammar: S 0S 1S ε String s [[S]] iff s = ε, or s [[S]] such that s = 0s, or s = 1s! Grammar is same as regular expression (0 1)* Generates / accepts the same set of strings 5

Context-Free Grammars (CFGs)! But CFGs can do what regexps cannot S ( S ) ε // represents balanced pairs of ( ) s! In fact, CFGs subsume REs, DFAs, NFAs There is a CFG that generates any regular language But REs are a better notation for regular languages! CFGs can specify programming language syntax CFGs (mostly) describe the parsing process 6

Formal Definition: Context-Free Grammar! A CFG G is a 4-tuple (Σ, N, P, S) Σ alphabet (finite set of symbols, or terminals) Often written in lowercase N a finite, nonempty set of nonterminal symbols Often written in uppercase It must be that N Σ = P a set of productions of the form N (Σ N)* Informally: the nonterminal can be replaced by the string of zero or more terminals / nonterminals to the right of the Can think of productions as rewriting rules (more later) S N the start symbol 7

Backus-Naur Form! Context-free grammar production rules are also called Backus-Naur Form or BNF A production like A B c D is written in BNF as <A> ::= <B> c <D> (Non-terminals written with angle brackets and ::= instead of ) Often used to describe language syntax! BNF was designed by John Backus Chair of the Algol committee in the early 1960s Peter Naur Secretary of the committee, who used this notation to describe Algol in 1962 8

Generating strings! We can think of a grammar as generating strings by rewriting! Example grammar: S 0S 1S ε S 0S // using S 0S 01S // using S 1S 011S // using S 1S 011 // using S ε 9

Accepting strings (informally)! Determining if s [[S]] is called acceptance: goal is to find a rewriting from S that yields s 011 [[S]] according to the previous rewriting A rewriting is some sequence of productions (rewrites) applied starting at the start symbol! Terminology Such a sequence of rewrites is a derivation or parse Discovering the derivation is called parsing 10

Derivations! Notation indicates a derivation of one step + indicates a derivation of one or more steps * indicates a derivation of zero or more steps! Example S 0S 1S ε! For the string 010 S 0S 01S 010S 010 S + 010 S * S 11

Language Generated by Grammar! L(G) (a.k.a. [[G]]), the language defined by G is [[G]] = L(G) = { ω Σ* S + ω } S is the start symbol of the grammar Σ is the alphabet for that grammar! In other words All strings over Σ that can be derived from the start symbol via one or more productions 12

Practice! Try to make a grammar which accepts 0* 1* 0 n 1 n where n 0 0 n 1 m where m n S A B A 0A ε B 1B ε S 0S1 ε S 0S1 0S ε! Give some example strings from this language S 0 1S 0, 10, 110, 1110, 11110, What language is it? 1*0 13

Example: Arithmetic Expressions! E a b c E+E E-E E*E (E) An expression E is either a letter a, b, or c Or an E followed by + followed by an E etc! This describes (or generates) a set of strings {a, b, c, a+b, a+a, a*c, a-(b*a), c*(b + a), }! Example strings not in the language d, c(a), a+, b**c, etc. 14

Formal Description of Example! Formally, the grammar we just showed is Σ = { +, -, *, (, ), a, b, c } // terminals N = { E } // nonterminals P = { E a, E b, E c, // productions E E-E, E E+E, E E*E, E (E) } S = E // start symbol 15

(Non-)Uniqueness of Grammars! Different grammars generate the same set of strings (language)! The following grammar generates the same set of strings as the previous grammar E E+T E-T T T T*P P P (E) a b c 16

Notational Shortcuts! A production is of the form left-hand side (LHS) right hand side (RHS)! If not specified Assume LHS of first listed production is the start symbol! Productions with the same LHS Are usually combined with! If a production has an empty RHS It means the RHS is ε S ABC // S is start symbol A aa b // A b // A ε 17

Practice! Given the grammar S as T T bt U U cu ε Provide derivations for the following strings b ac bbc Does the grammar generate the following? S + ccc S + bab S T bt bu b S as at au acu ac S T bt bbt bbu bbcu bbc Yes No S + bs S + Ta No No 18

Practice! Given the grammar S as T T bt U U cu ε Name language accepted by grammar a*b*c* Give a different grammar accepting language S ABC A aa ε // a* B bb ε // b* C cc ε // c* 19

Parse Trees! Parse tree shows how a string is produced by a grammar Root node is the start symbol Every internal node is a nonterminal Children of an internal node Are symbols on RHS of production applied to nonterminal Every leaf node is a terminal or ε! Reading the leaves left to right Shows the string corresponding to the tree 20

Parse Tree Example S as at au acu ac S as T T bt U U cu ε 21

Parse Trees for Expressions! A parse tree shows the structure of an expression as it corresponds to a grammar E a b c d E+E E-E E*E (E) a a*c c*(b+d) 22

Practice E a b c d E+E E-E E*E (E) Make a parse tree for a*b a+(b-c) d*(d+b)-a (a+b)*(c-d) a+(b-c)*d 23

Ambiguity! A grammar is ambiguous if a string may have multiple derivations Equivalent to multiple parse trees Can be hard to determine 1. S as T T bt U U cu ε 2. S T T T Tx Tx x x 3. S SS () (S) No Yes? 24

Ambiguity (cont.)! Example Grammar: S SS () (S) String: ()()() 2 distinct (leftmost) derivations (and parse trees) S SS SSS ()SS ()()S ()()() S SS ()S ()SS ()()S ()()() 25

CFGs for Programming Languages! Recall that our goal is to describe programming languages with CFGs! We had the following example which describes limited arithmetic expressions E a b c E+E E-E E*E (E)! What s wrong with using this grammar? It s ambiguous! 26

Example: a-b-c E E-E a-e a-e-e a-b-e a-b-c E E-E E-E-E a-e-e a-b-e a-b-c Corresponds to a-(b-c) Corresponds to (a-b)-c 27

Example: a-b*c E E-E a-e a-e*e a-b*e a-b*c E E-E E-E*E a-e*e a-b*e a-b*c * * Corresponds to a-(b*c) Corresponds to (a-b)*c 28

Another Example: If-Then-Else <stmt> <assignment> <if-stmt>... <if-stmt> if (<expr>) <stmt> if (<expr>) <stmt> else <stmt> (Note < > s are used to denote nonterminals)! Consider the following program fragment if (x > y) if (x < z) a = 1; else a = 2; (Note: Ignore newlines) 29

Parse Tree #1! Else belongs to inner if 30

Parse Tree! Else belongs to outer if 31

Dealing With Ambiguous Grammars! Ambiguity is bad Syntax is correct But semantics differ depending on choice Different associativity (a-b)-c vs. a-(b-c) Different precedence Different control flow (a-b)*c vs. a-(b*c) if (if else) vs. if (if) else! Two approaches Rewrite grammar Use special parsing rules Depending on parsing method (learn in CMSC 430) 32

Fixing the Expression Grammar! Require right operand to not be bare expression E E+T E-T E*T T T a b c (E)! Corresponds to left-associativity! Now only one parse tree for a-b-c Find derivation 33

What if We Wanted Right-Associativity?! Left-recursive productions Used for left-associative operators Example E E+T E-T E*T T T a b c (E)! Right-recursive productions Used for right-associative operators Example E T+E T-E T*E T T a b c (E) 34

Parse Tree Shape! The kind of recursion determines the shape of the parse tree left recursion right recursion 35

A Different Problem! How about the string a+b*c? E E+T E-T E*T T T a b c (E)! Doesn t have correct precedence for * When a nonterminal has productions for several operators, they effectively have the same precedence 36

Final Expression Grammar E E+T E-T T lowest precedence operators T T*P P higher precedence P a b c (E) highest precedence (parentheses)! Practice Construct tree and left and and right derivations for a+b*c a*(b+c) a*b+c a-b-c See what happens if you change the last set of productions to P a b c E (E) See what happens if you change the first set of productions to E E +T E-T T P 37

Tips for Designing Grammars 1. Use recursive productions to generate an arbitrary number of symbols A xa ε // Zero or more x s A ya y // One or more y s 2. Use separate non-terminals to generate disjoint parts of a language, and then combine in a production { a*b* } // a s followed by b s S AB A aa ε B bb ε // Zero or more a s // Zero or more b s 38

Tips for Designing Grammars (cont.) 3. To generate languages with matching, balanced, or related numbers of symbols, write productions which generate strings from the middle {a n b n n 0} // N a s followed by N b s S asb ε Example derivation: S asb aasbb aabb {a n b 2n n 0} // N a s followed by 2N b s S asbb ε Example derivation: S asbb aasbbbb aabbbb 39

Tips for Designing Grammars (cont.) 4. For a language that is the union of other languages, use separate nonterminals for each part of the union and then combine { a n (b m c m ) m > n 0} Can be rewritten as { a n b m m > n 0} { a n c m m > n 0} S T V T atb U U Ub b V avc W W Wc c 40

Recall: Steps of Compilation 41

Implementing Parsers! Many efficient techniques for parsing I.e., turning strings into parse trees Examples LL(k), SLR(k), LR(k), LALR(k) Take CMSC 430 for more details! One simple technique: recursive descent parsing This is a top-down parsing algorithm Other algorithms are bottom-up 42

Top-Down Parsing E id = n { L } L E ; L ε (Assume: id is variable name, n is integer) Show parse tree for { x = 3 ; { y = 4 ; } ; } E L E L E L L ε E L ε { x = 3 ; { y = 4 ; } ; } 43

Example: Recursive Descent Parsing! Goal Determine if we can produce the string to be parsed from the grammar's start symbol! Approach Construct a parse tree: starting with start symbol at root, recursively replace nonterminal with RHS of production! Key question: which production to use? There can be many productions for each nonterminal We could try them all, backtracking if we are unsuccessful But this is slow!! Answer: lookahead Keep track of next token on the input string to be processed Use this to guide selection of production 44

Bottom-up Parsing E id = n { L } L E ; L ε Show parse tree for { x = 3 ; { y = 4 ; } ; } Note that final trees constructed are same as for top-down; only order in which nodes are added to tree is different E L E L E L L ε E L ε { x = 3 ; { y = 4 ; } ; } 45

Example: Shift-reduce parsing! Replaces RHS of production with LHS (nonterminal)! Example grammar S aa, A Bc, B b! Example parse abc abc aa S Derivation happens in reverse! Something to look forward to in CMSC 430! Complicated to use; requires tool support Bison, yacc produce shift-reduce parsers from CFGs 46

Tradeoffs! Recursive descent parsers Easy to write The formal definition is a little clunky, but if you follow the code then it s almost what you might have done if you weren't told about grammars formally Fast Can be implemented with a simple table! Shift-reduce parsers handle more grammars But are slower! Most languages use hacked parsers (!) Strange combination of the two 47

What s Wrong With Parse Trees?! Parse trees contain too much information Example Parentheses Extra nonterminals for precedence This extra stuff is needed for parsing! But when we want to reason about languages Extra information gets in the way (too much detail) 48

Abstract Syntax Trees (ASTs)! An abstract syntax tree is a more compact, abstract representation of a parse tree, with only the essential parts parse tree AST 49

Abstract Syntax Trees (cont.)! Intuitively, ASTs correspond to the data structure you d use to represent strings in the language Note that grammars describe trees So do OCaml datatypes (which we ll see later) E a b c E+E E-E E*E (E) 50

The Compilation Process 51