CMPT 755 Compilers. Anoop Sarkar.

Similar documents
Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Context-Free Grammars

Principles of Programming Languages COMP251: Syntax and Grammars

CSE 3302 Programming Languages Lecture 2: Syntax

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Principles of Programming Languages COMP251: Syntax and Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

3. Context-free grammars & parsing

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Defining syntax using CFGs

Dr. D.M. Akbar Hussain

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

EECS 6083 Intro to Parsing Context Free Grammars

COP 3402 Systems Software Syntax Analysis (Parser)

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CSCI312 Principles of Programming Languages!

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Context-free grammars (CFG s)

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Syntax Analysis Part I

Syntax. In Text: Chapter 3

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Syntax Analysis Check syntax and construct abstract syntax tree

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CS 314 Principles of Programming Languages

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

A programming language requires two major definitions A simple one pass compiler

Natural Language Processing

Syntax. A. Bellaachia Page: 1

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Habanero Extreme Scale Software Research Project

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Introduction to Lexical Analysis

Introduction to Syntax Analysis

CMSC 330: Organization of Programming Languages

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

CPS 506 Comparative Programming Languages. Syntax Specification

CSE302: Compiler Design

CSCE 314 Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CMSC 330: Organization of Programming Languages. Context Free Grammars

Syntax Analysis. Chapter 4

Introduction to Syntax Analysis. The Second Phase of Front-End

Part 3. Syntax analysis. Syntax analysis 96

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

Lexical and Syntax Analysis. Top-Down Parsing

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Context free grammars and predictive parsing

CSCI312 Principles of Programming Languages!

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Parsing II Top-down parsing. Comp 412

CMSC 330: Organization of Programming Languages

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

A simple syntax-directed

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Wednesday, August 31, Parsers

Introduction to Parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

A Simple Syntax-Directed Translator

Chapter 3. Describing Syntax and Semantics ISBN

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Normal Forms for CFG s. Eliminating Useless Variables Removing Epsilon Removing Unit Productions Chomsky Normal Form

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Syntax and Grammars 1 / 21


Introduction to Parsing Ambiguity and Syntax Errors

Introduction to Lexing and Parsing

Building Compilers with Phoenix

Describing Syntax and Semantics

Introduction to Parsing Ambiguity and Syntax Errors

Non-deterministic Finite Automata (NFA)

3. Parsing. Oscar Nierstrasz

Syntax and Semantics

Compilers Course Lecture 4: Context Free Grammars

CS 406/534 Compiler Construction Parsing Part I

Lexical and Syntax Analysis

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

CMSC 330: Organization of Programming Languages. Context Free Grammars

Introduction to Parsing. Lecture 8

Homework & Announcements

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Transcription:

CMPT 755 Compilers Anoop Sarkar http://www.cs.sfu.ca/~anoop

Parsing source program Lexical Analyzer token next() Parser parse tree Later Stages Lexical Errors Syntax Errors

Context-free Grammars Set of rules by which valid sentences can be constructed. Example: Sentence Noun Verb Object Noun trees compilers Verb are grow Object on Noun Adjective Adjective slowly interesting What strings can Sentence derive? Syntax only no semantic checking

Derivations of a CFG compilers grow on trees compilers grow on Noun compilers grow Object compilers Verb Object Noun Verb Object Sentence

Derivations and parse trees

Why use grammars for PL? Precise, yet easy-to-understand specification of language Construct parser automatically Detect potential problems Structure and simplify remaining compiler phases Allow for evolution

CFG Notation A reference grammar is a concise description of a context-free grammar For example, a reference grammar can use regular expressions on the right hand sides of CFG rules Can even use ideas like comma-separated lists to simplify the reference language definition

Writing a CFG for a PL First write (or read) a reference grammar of what you want to be valid programs For now, we only worry about the structure, so the reference grammar might choose to overgenerate in certain cases (e.g. bool x = 20; ) Convert the reference grammar to a CFG Certain CFGs might be easier to work with than others (this is the essence of the study of CFGs and their parsing algorithms for compilers)

CFG Notation Normal CFG notation E E * E E E + E Backus Naur notation E ::= E * E E + E (an or-list of right hand sides)

Parse Trees for programs

Arithmetic Expressions E E + E E E * E E ( E ) E - E E id

Leftmost derivations for id + id * id E E + E id + E E id + E * E id + id * E E + E id + id * id id E * E id id

Leftmost derivations for id + id * id E E * E E + E * E id + E * E id + id * E id + id * id E E E * E + E id id id

Ambiguity Grammar is ambiguous if more than one parse tree is possible for some sentences Examples in English: Two sisters reunited after 18 years in checkout counter Ambiguity is not acceptable in PL Unfortunately, it s undecidable to check whether a grammar is ambiguous

Ambiguity Alternatives Massage grammar to make it unambiguous Rely on default parser behavior Augment parser Consider the original ambiguous grammar: E E + E E ( E ) E E * E E - E E id How can we change the grammar to get only one tree for the input id + id * id

Dangling else ambiguity Original Grammar (ambiguous) Stmt if Expr then Stmt else Stmt Stmt if Expr then Stmt Stmt Other Unambiguous grammar Stmt MatchedStmt Stmt UnmatchedStmt MatchedStmt if Expr then MatchedStmt else MatchedStmt MatchedStmt Other UnmatchedStmt if Expr then Stmt UnmatchedStmt if Expr then MatchedStmt else UnmatchedStmt

Dangling else ambiguity Original Grammar (ambiguous) Stmt if Expr then Stmt else Stmt Stmt if Expr then Stmt Stmt Other Modified Grammar (unambiguous?) Stmt if Expr then Stmt Stmt MatchedStmt MatchedStmt if Expr then MatchedStmt else Stmt MatchedStmt Other

Other Ambiguous Grammars Consider the grammar R R R R R R * ( R ) a b What does this grammar generate? What s the parse tree for a b*a Is this grammar ambiguous?

Left Factoring Original Grammar (ambiguous) Stmt if Expr then Stmt else Stmt Stmt if Expr then Stmt Stmt Other Left-factored Grammar (still ambiguous): Stmt if Expr then Stmt OptElse Stmt Other OptElse else Stmt ε

Left Factoring In general, for rules Left factoring is achieved by the following grammar transformation:

Grammar Transformations G is converted to G s.t. L(G ) = L(G) Left Factoring Removing cycles: A + A Removing ε-rules of the form A ε Eliminating left recursion Conversion to normal forms: Chomsky Normal Form, A B C and A a Greibach Normal Form, A a β

Eliminating Left Recursion Simple case, for left-recursive pair of rules: Replace with the following rules: Elimination of immediate left recursion

Eliminating Left Recursion Example: E E + T, E T Without left recursion: E T E 1, E 1 + T E 1, E 1 ε Simple algorithm doesn t work for 2-step recursion: S A a, S b A A c, A S d, A ε

Eliminating Left Recursion Problem CFG: S A a, S b A A c, A S d, A ε Expand possibly left-recursive rules: S A a, S b A A c, A A a d, A b d, A ε Eliminate immediate left-recursion S A a, S b A b d A 1, A A 1, A 1 c A 1, A 1 a d A 1, A 1 ε

Eliminating Left Recursion We cannot use the algorithm if the nonterminal also derives epsilon. Let s see why: A AAa b ε Using the standard lrec removal algorithm: A ba 1 A 1 A 1 AaA 1 ε

Eliminating Left Recursion First we eliminate the epsilon rule: A AAa b ε Since A is the start symbol, create a new start symbol to generate the empty string: A 1 A ε A AAa Aa a b Now we can do the usual lrec algorithm: A 1 A ε A aa 2 ba 2 A 2 AaA 2 aa 2 ε

Non-CF Languages The pumping lemma for CFLs [Bar-Hillel] is similar to the pumping lemma for RLs For a string wuxvy in a CFL for u,v ε and the string is long enough then wu n xv n y is also in the CFL for n 0 Not strong enough to work for every non- CF language (cf. Ogden s Lemma)

Non-CF Languages

CF Languages

Summary CFGs can be used describe PL Derivations correspond to parse trees Parse trees represent structure of programs Ambiguous CFGs exist Some forms of ambiguity can be fixed by changing the grammar Grammars can be simplified by left-factoring Left recursion in a CFG can be eliminated