CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Similar documents
Introduction to Parsing

CS 406/534 Compiler Construction Parsing Part I

EECS 6083 Intro to Parsing Context Free Grammars

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Parsing II Top-down parsing. Comp 412

Introduction to Parsing. Comp 412

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Part 5 Program Analysis Principles and Techniques

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

3. Parsing. Oscar Nierstrasz

Parsing II Top-down parsing. Comp 412

CS 314 Principles of Programming Languages

Goals for course What is a compiler and briefly how does it work? Review everything you should know from 330 and before

Parsing Part II (Top-down parsing, left-recursion removal)

Syntax Analysis, III Comp 412

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Syntax Analysis Check syntax and construct abstract syntax tree

CSCI312 Principles of Programming Languages

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Front End. Hwansoo Han

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis, III Comp 412

Syntax Analysis Part I

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CSE 3302 Programming Languages Lecture 2: Syntax

Syntactic Analysis. Top-Down Parsing

Defining syntax using CFGs

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Compiler Design Concepts. Syntax Analysis

CS415 Compilers. Lexical Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

More on Syntax. Agenda for the Day. Administrative Stuff. More on Syntax In-Class Exercise Using parse trees

Syntax Analysis, VI Examples from LR Parsing. Comp 412

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Syntax. In Text: Chapter 3

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Compilers Course Lecture 4: Context Free Grammars

Lexical Analysis. Introduction

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Computer Science 160 Translation of Programming Languages

CMSC 330: Organization of Programming Languages

Homework & Announcements

CS 406/534 Compiler Construction Parsing Part II LL(1) and LR(1) Parsing

Types of parsing. CMSC 430 Lecture 4, Page 1

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Introduction to Syntax Analysis

Introduction to Syntax Analysis. The Second Phase of Front-End

Habanero Extreme Scale Software Research Project

Dr. D.M. Akbar Hussain

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Languages and Compilers


The View from 35,000 Feet

Syntactic Analysis. Top-Down Parsing. Parsing Techniques. Top-Down Parsing. Remember the Expression Grammar? Example. Example

Lexical and Syntax Analysis. Top-Down Parsing

A Simple Syntax-Directed Translator

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

Introduction to Parsing. Lecture 5

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter UW CSE P 501 Winter 2016 C-1

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Building a Parser Part III

Chapter 4: LR Parsing

CPS 506 Comparative Programming Languages. Syntax Specification

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

CMSC 330: Organization of Programming Languages. Context Free Grammars

Monday, September 13, Parsers

Lexical and Syntax Analysis

Optimizing Finite Automata

Wednesday, August 31, Parsers

Compiler Construction 2016/2017 Syntax Analysis

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

A programming language requires two major definitions A simple one pass compiler

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Context-Free Languages and Parse Trees

COP 3402 Systems Software Syntax Analysis (Parser)

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Building Compilers with Phoenix

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

CMSC 330: Organization of Programming Languages

Introduction to Parsing. Lecture 5

VIVA QUESTIONS WITH ANSWERS

Transcription:

CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Limits of Regular Languages Advantages of Regular Expressions Simple & powerful notation for specifying patterns Automatic construction of fast recognizers Many kinds of syntax can be specified with REs Example an expression grammar Term [a-za-z] ([a-za-z] [0-9]) * Op + - * / Expr ( Term Op ) * Term Of course, this would generate a DFA If REs are so useful Why not use them for everything? 2

Limits of Regular Languages Not all languages are regular RL s CFL s CSL s You cannot construct DFA s to recognize these languages L = { p k q k } (parenthesis languages) L = { wcw r w Σ * } Neither of these is a regular language (nor an RE) But, this is a little subtle. You can construct DFA s for Strings with alternating 0 s and 1 s ( ε 1 ) ( 01 ) * ( ε 0 ) Strings with odd and even number of 0 s and 1 s Strings of bit patterns that represent binary numbers which are divisible by 5 3

Parsing (Syntax Analysis) EAC Chapters 3.1-3.2

Review: The Front End Source code Scanner tokens Parser IR Errors Parser Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness Determines if the input is syntactically well formed Guides checking at deeper levels than syntax Builds an IR representation of the code 5

The Study of Parsing The process of discovering a derivation for some sentence Need a mathematical model of syntax a grammar G Need an algorithm for testing membership in L(G) Need to keep in mind that our goal is building parsers, not studying the mathematics of arbitrary languages Roadmap 1 Context-free grammars and derivations 2 Top-down parsing LL(1) parsers, hand-coded recursive descent parsers 3 Bottom-up parsing Automatically generated LR(1) parsers 6

Specifying Syntax with a Grammar Context-free syntax is specified with a context-free grammar SheepNoise SheepNoise baa baa This CFG defines the set of noises sheep normally make It is written in a variant of Backus Naur form Formally, a grammar is a four tuple, G = (S,N,T,P) S is the start symbol (set of strings in L(G)) N is a set of non-terminal symbols (syntactic variables) T is a set of terminal symbols (words) P is a set of productions or rewrite rules (P : N (N T) + ) 7

Deriving Syntax We can use the SheepNoise grammar to create sentences use the productions as rewriting rules Rule Sentential Form SheepNoise 2 baa Rule Sentential Form SheepNoise 1 SheepNoise baa 2 baa baa Rule Sentential Form SheepNoise 1 SheepNoise baa 1 SheepNoise baa baa 2 baa baa baa And so on... 8

A More Useful Grammar To explore the uses of CFGs,we need a more complex grammar 1 Expr Expr Op Expr 2 number 3 id 4 Op + 5 6 * 7 / Rule Expr Sentential Form 1 Expr Op Expr 3 <id,x> Op Expr 5 <id,x> Expr 1 <id,x> Expr Op Expr 2 <id,x> <num,2> Op Expr 6 <id,x> <num,2> * Expr 3 <id,x> <num,2> * <id,y> Such a sequence of rewrites is called a derivation Process of discovering a derivation is called parsing We denote this derivation: Expr * id num * id 9

Derivations At each step, we choose a non-terminal to replace Different choices can lead to different derivations Two derivations are of interest Leftmost derivation replace leftmost NT at each step; generates left sentential forms ( * lm ) Rightmost derivation replace rightmost NT at each step; generates right sentential forms ( * rm ) These are the two systematic derivations (We don t care about randomly-ordered derivations!) The example on the preceding slide was a leftmost derivation Of course, there is also a rightmost derivation Interestingly, the resulting parse trees may be different 10

Parse Trees Rule in our grammar: E E Op E A single derivation step E E Op E can be represented as a tree structure with the left-hand side non-terminal as the root, and all right-hand side symbols as the children (ordered left to right). E E Op E The entire derivation of a sentence in the language can be represented as a parse tree with the start symbol as its root, and leave nodes that are all terminal symbols. NOTE: The structure of the parse tree has semantic significance! 11

The Two Derivations for x 2 * y Rule Sentential Form Expr 1 Expr Op Expr 3 <id,x> Op Expr 5 <id,x> Expr 1 <id,x> Expr Op Expr 2 <id,x> <num,2> Op Expr 6 <id,x> <num,2> * Expr 3 <id,x> <num,2> * <id,y> Leftmost derivation Rule Sentential Form Expr 1 Expr Op Expr 3 Expr Op <id,y> 6 Expr * <id,y> 1 Expr Op Expr * <id,y> 2 Expr Op <num,2> * <id,y> 5 Expr <num,2> * <id,y> 3 <id,x> <num,2> * <id,y> Rightmost derivation In both cases, Expr * id num * id The two derivations produce different parse trees The parse trees imply different evaluation orders! 12

Derivations and Parse Trees Leftmost derivation Rule Sentential Form G Expr 1 Expr Op Expr 3 <id,x> Op Expr E 5 <id,x> Expr 1 <id,x> Expr Op Expr 2 <id,x> <num,2> Op Expr E Op E 6 <id,x> <num,2> * Expr 3 <id,x> <num,2> * <id,y> x E Op E This evaluates as x ( 2 * y ) 2 * y 13

Derivations and Parse Trees This corresponds to our rightmost derivation. Can we get this with another leftmost derivation as well? G E E Op E E Op E * y This evaluates as ( x 2 ) * y x 2 14

Two Leftmost Derivations for x 2 * y The Difference: Ø Different productions chosen on the second step Rule Sentential Form Expr 1 Expr Op Expr 3 <id,x> Op Expr 5 <id,x> Expr 1 <id,x> Expr Op Expr 2 <id,x> <num,2> Op Expr 6 <id,x> <num,2> * Expr 3 <id,x> <num,2> * <id,y> Rule Sentential Form Expr 1 Expr Op Expr 1 Expr Op Expr Op Expr 3 <id,x> Op Expr Op Expr 5 <id,x> Expr Op Expr 2 <id,x> <num,2> Op Expr 6 <id,x> <num,2> * Expr 3 <id,x> <num,2> * <id,y> Original choice New choice Ø Both derivations succeed in producing x - 2 * y 15

Ambiguous Grammars This grammar allows multiple leftmost derivations for x - 2 * y Hard to automate derivation if > 1 choice The grammar is ambiguous 1 Expr Expr Op Expr 2 " number 3 " id 4 Op + 5 " 6 " * 7 " / Rule Sentential Form Expr 1 Expr Op Expr 1 Expr Op Expr Op Expr 3 <id,x> Op Expr Op Expr 5 <id,x> Expr Op Expr 2 <id,x> <num,2> Op Expr 6 <id,x> <num,2> * Expr 3 <id,x> <num,2> * <id,y> different choice than the first time 16

Derivations and Precedence These two derivations point out a problem with the grammar: It has no notion of precedence, or implied order of evaluation To add precedence Create a non-terminal for each level of precedence Isolate the corresponding part of the grammar Force the parser to recognize high precedence subexpressions first For algebraic expressions Multiplication and division, first Subtraction and addition, next (level one) (level two) 17

Derivations and Precedence Adding the standard algebraic precedence produces: level two level one 1 Goal Expr 2 Expr Expr + Term 3 Expr Term 4 Term 5 Term Term * Factor 6 Term / Factor 7 Factor 8 Factor number 9 id This grammar is slightly larger Takes more rewriting to reach some of the terminal symbols Encodes expected precedence Produces same parse tree under leftmost & rightmost derivations Let s see how it parses x - 2 * y 18

Derivations and Precedence Rule Sentential Form Goal 1 Expr 3 Expr Term 5 Expr Term * Factor 9 Expr Term * <id,y> 7 Expr Factor * <id,y> 8 Expr <num,2> * <id,y> 4 Term <num,2> * <id,y> 7 Factor <num,2> * <id,y> 9 <id,x> <num,2> * <id,y> The rightmost derivation E G E T T T * F F <id,x> F <num,2> <id,y> Its parse tree This produces x ( 2 * y ), along with an appropriate parse tree. Both the leftmost and rightmost derivations give the same expression, because the grammar directly encodes the desired precedence. 19

Ambiguous Grammars Definitions If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous If a grammar has more than one rightmost derivation for a single sentential form, the grammar is ambiguous The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous grammar Classic example the if-then-else problem Stmt if Expr then Stmt if Expr then Stmt else Stmt other stmts This ambiguity is entirely grammatical in nature 20

Ambiguity This sentential form has two derivations if Expr 1 then if Expr 2 then Stmt 1 else Stmt 2 21

Ambiguity Removing the ambiguity Must rewrite the grammar to avoid generating the problem Match each else to innermost unmatched if (common sense rule) 1 Stmt WithElse 2 NoElse 3 WithElse if Expr then WithElse else WithElse 4 OtherStmt 5 NoElse if Expr then Stmt 6 if Expr then WithElse else NoElse With this grammar, the example has only one derivation 22

Ambiguity if Expr 1 then if Expr 2 then Stmt 1 else Stmt 2 Rule Sentential Form Stmt 2 NoElse 5 if Expr then Stmt? if E 1 then Stmt 1 if E 1 then WithElse 3 if E 1 then if Expr then WithElse else WithElse? if E 1 then if E 2 then WithElse else WithElse 4 if E 1 then if E 2 then S 1 else WithElse 4 if E 1 then if E 2 then S 1 else S 2 This binds the else controlling S 2 to the inner if 23

Deeper Ambiguity Ambiguity usually refers to confusion in the CFG Overloading can create deeper ambiguity a = f(17) In many Algol-like languages, f could be either a function or a subscripted variable Disambiguating this one requires context Need values of declarations Really an issue of type, not context-free syntax Requires an extra-grammatical solution (not in CFG) Must handle these with a different mechanism Step outside grammar rather than use a more complex grammar 24

Ambiguity - the Final Word Ambiguity arises from two distinct sources Confusion in the context-free syntax Confusion that requires context to resolve (if-then-else) (overloading) Resolving ambiguity To remove context-free ambiguity, rewrite the grammar Change language (e.g.: if endif) To handle context-sensitive ambiguity takes cooperation Knowledge of declarations, types, Accept a superset of L(G) & check it by other means This is a language design problem Sometimes, the compiler writer accepts an ambiguous grammar Parsing techniques that do the right thing i.e., always select the same derivation See Chapter 4 25

Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production & try to match the input Bad pick may need to backtrack Some grammars are backtrack-free (predictive parsing) Bottom-up parsers (LR(1), operator precedence) Start at the leaves and grow toward root As input is consumed, encode possibilities in an internal state Start in a state valid for legal first tokens Bottom-up parsers handle a large class of grammars 26

Parsing Techniques: Top-down parsers LL(1), recursive descent 1 input symbol lookahead construct leftmost deriviation (forwards) input: read left-to-right S * lm x A β lm x δ β * lm x y S A δ β x y 27

Parsing Techniques: Top-down parsers LL(1), recursive descent 1 input symbol lookahead construct leftmost deriviation (forwards) input: read left-to-right S * lm x A β lm x δ β * lm x y S rule A δ A δ β x y 28

Parsing Techniques: Bottom-up parsers LR(1), operator precedence 1 input symbol lookahead construct rightmost deriviation (backwards) input: read left-to-right S * rm α B y rm α γ y * rm x y S α B γ x y 29

Parsing Techniques: Bottom-up parsers LR(1), operator precedence 1 input symbol lookahead construct rightmost deriviation (backwards) input: read left-to-right S rule B γ S * rm α B y rm α γ y * rm x y α B γ x y 30

Next class Top-down and bottom-up Parsing Read EaC: 3.1 3.3 31