Syntax Analysis Check syntax and construct abstract syntax tree

Similar documents
Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

CS 314 Principles of Programming Languages

Compilers Course Lecture 4: Context Free Grammars

Compiler Design Concepts. Syntax Analysis

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

EECS 6083 Intro to Parsing Context Free Grammars

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

Syntax Analysis Part I

Fall Compiler Principles Context-free Grammars Refresher. Roman Manevich Ben-Gurion University of the Negev

Context-Free Grammars

Part 5 Program Analysis Principles and Techniques

Introduction to Parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Introduction to Parsing Ambiguity and Syntax Errors

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

A Simple Syntax-Directed Translator

Introduction to Parsing. Lecture 5

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Introduction to Parsing Ambiguity and Syntax Errors

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

( ) i 0. Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5.

Principles of Programming Languages COMP251: Syntax and Grammars

Optimizing Finite Automata

Introduction to Parsing. Lecture 5. Professor Alex Aiken Lecture #5 (Modified by Professor Vijay Ganesh)

Introduction to Parsing. Lecture 5

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Outline. Regular languages revisited. Introduction to Parsing. Parser overview. Context-free grammars (CFG s) Lecture 5. Derivations.

Parsing II Top-down parsing. Comp 412

CSE302: Compiler Design

Properties of Regular Expressions and Finite Automata

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

COP 3402 Systems Software Syntax Analysis (Parser)

Introduction to Syntax Analysis

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Defining syntax using CFGs

Introduction to Parsing. Lecture 8

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

3. Parsing. Oscar Nierstrasz

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Introduction to Syntax Analysis. The Second Phase of Front-End

Lecture 8: Context Free Grammars

CMSC 330: Organization of Programming Languages. Context-Free Grammars Ambiguity

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Habanero Extreme Scale Software Research Project

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

2.2 Syntax Definition

Grammars and ambiguity. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 8 1

More on Syntax. Agenda for the Day. Administrative Stuff. More on Syntax In-Class Exercise Using parse trees

CMSC 330: Organization of Programming Languages

A simple syntax-directed

CMSC 330: Organization of Programming Languages

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CMSC 330: Organization of Programming Languages. Context Free Grammars


Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Table-Driven Top-Down Parsers

Concepts Introduced in Chapter 4

Wednesday, August 31, Parsers

Part 3. Syntax analysis. Syntax analysis 96

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

More Assigned Reading and Exercises on Syntax (for Exam 2)

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Parsing - 1. What is parsing? Shift-reduce parsing. Operator precedence parsing. Shift-reduce conflict Reduce-reduce conflict

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

CS 406/534 Compiler Construction Parsing Part I

A programming language requires two major definitions A simple one pass compiler

Chapter 3. Describing Syntax and Semantics ISBN

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Syntax. In Text: Chapter 3

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CMSC 330: Organization of Programming Languages

Introduction to Parsing. Comp 412

Monday, September 13, Parsers

CSCI312 Principles of Programming Languages!

Building Compilers with Phoenix

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

E E+E E E (E) id. id + id E E+E. id E + E id id + E id id + id. Overview. derivations and parse trees. Grammars and ambiguity. ambiguous grammars

Intro To Parsing. Step By Step

Wednesday, September 9, 15. Parsers

Transcription:

Syntax Analysis Check syntax and construct abstract syntax tree if == = ; b 0 a b Error reporting and recovery Model using context free grammars Recognize using Push down automata/table Driven Parsers 1

Limitations of regular languages How to describe language syntax precisely and conveniently. Can regular expressions be used? Many languages are not regular, for example, string of balanced parentheses (((( )))) { ( i ) i i 0 } There is no regular expression for this language A finite automata may repeat states, however, it cannot remember the number of times it has been to a particular state A more powerful language is needed to describe a valid string of tokens 2

Syntax definition Context free grammars <T, N, P, S> T: a set of tokens (terminal symbols) N: a set of non terminal symbols P: a set of productions of the form nonterminal String of terminals & non terminals S: a start symbol A grammar derives strings by beginning with a start symbol and repeatedly replacing a non terminal by the right hand side of a production for that non terminal. The strings that can be derived from the start symbol of a grammar G form the language L(G) defined by the grammar. 3

Examples String of balanced parentheses S ( S ) S Є Grammar list list + digit list digit digit digit 0 1 9 Consists of the language which is a list of digit separated by + or -. 4

Derivation list list + digit list digit + digit digit digit + digit 9 digit + digit 9 5 + digit 9 5 + 2 Therefore, the string 9-5+2 belongs to the language specified by the grammar The name context free comes from the fact that use of a production X does not depend on the context of X 5

Examples Simplified Grammar for C block block { decls statements } statements stmt-list Є stmt list stmt-list stmt ; stmt ; decls decls declaration Є declaration 6

Syntax analyzers Testing for membership whether w belongs to L(G) is just a yes or no answer However the syntax analyzer Must generate the parse tree Handle errors gracefully if string is not in the language Form of the grammar is important Many grammars generate the same language Tools are sensitive to the grammar 7

What syntax analysis cannot do! To check whether variables are of types on which operations are allowed To check whether a variable has been declared before use To check whether a variable has been initialized These issues will be handled in semantic analysis 8

Derivation If there is a production A α then we say that A derives α and is denoted by A α α A β α γ β if A γ is a production If α 1 α 2 α n then α 1 + α n Given a grammar G and a string w of terminals in L(G) we can write S + w * If S α where α is a string of terminals and non terminals of G then we say that α is a sentential form of G 9

Derivation If in a sentential form only the leftmost non terminal is replaced then it becomes leftmost derivation Every leftmost step can be written as waγ lm* wδγ where w is a string of terminals and A δ is a production Similarly, right most derivation can be defined An ambiguous grammar is one that produces more than one leftmost (rightmost) derivation of a sentence 10

Parse tree shows how the start symbol of a grammar derives a string in the language root is labeled by the start symbol leaf nodes are labeled by tokens Each internal node is labeled by a non terminal if A is the label of anode and x 1, x 2, x n are labels of the children of that node then A x 1 x 2 x n is a production in the grammar 11

Example Parse tree for 9-5+2 list list + digit list - digit 2 digit 5 9 12

Ambiguity A Grammar can have more than one parse tree for a string Consider grammar list list+ list list list 0 1 9 String 9-5+2 has two parse trees 13

list list list + list list - list list - list 2 9 list + list 9 5 5 2 14

Ambiguity Ambiguity is problematic because meaning of the programs can be incorrect Ambiguity can be handled in several ways Enforce associativity and precedence Rewrite the grammar (cleanest way) There is no algorithm to convert automatically any ambiguous grammar to an unambiguous grammar accepting the same language Worse, there are inherently ambiguous languages! 15

Ambiguity in Programming Lang. Dangling else problem stmt if expr stmt if expr stmt else stmt For this grammar, the string if e1 if e2 then s1 else s2 has two parse trees 16

if e1 if e2 s1 else s2 stmt if expr stmt else stmt if e1 if e2 s1 else s2 stmt e1 if expr e2 stmt s1 s2 if expr stmt e1 if expr stmt else stmt e2 s1 s2 17

Resolving dangling else problem General rule: match each else with the closest previous unmatched if. The grammar can be rewritten as stmt matched-stmt unmatched-stmt matched-stmt if expr matched-stmt else matched-stmt others unmatched-stmt if expr stmt if expr matched-stmt else unmatched-stmt 18

Associativity If an operand has operator on both the sides, the side on which operator takes this operand is the associativity of that operator In a+b+c b is taken by left + +, -, *, / are left associative ^, = are right associative Grammar to generate strings with right associative operators right letter = right letter letter a b z 19

Precedence String a+5*2 has two possible interpretations because of two different parse trees corresponding to (a+5)*2 and a+(5*2) Precedence determines the correct interpretation. Next, an example of how precedence rules are encoded in a grammar 20

Precedence/Associativity in the Grammar for Arithmetic Expressions Ambiguous E E + E E * E (E) num id 3 + 2 + 5 Unambiguous, with precedence and associativity rules honored E E + T T T T * F F F ( E ) num id 3 + 2 * 5 21

Parsing Process of determination whether a string can be generated by a grammar Parsing falls in two categories: Top-down parsing: Construction of the parse tree starts at the root (from the start symbol) and proceeds towards leaves (token or terminals) Bottom-up parsing: Construction of the parse tree starts from the leaf nodes (tokens or terminals of the grammar) and proceeds towards root (start symbol) 22