COP 3402 Systems Software Syntax Analysis (Parser)

Similar documents
Syntax. In Text: Chapter 3

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Derivations of a CFG. MACM 300 Formal Languages and Automata. Context-free Grammars. Derivations and parse trees

Languages and Compilers

Dr. D.M. Akbar Hussain

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Principles of Programming Languages COMP251: Syntax and Grammars

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Syntax. A. Bellaachia Page: 1

Part 5 Program Analysis Principles and Techniques

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

CSE 3302 Programming Languages Lecture 2: Syntax

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Lecture 4: Syntax Specification

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Introduction to Lexing and Parsing

CPS 506 Comparative Programming Languages. Syntax Specification

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Describing Syntax and Semantics

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330: Organization of Programming Languages

Syntax Analysis Part I

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

EECS 6083 Intro to Parsing Context Free Grammars

CMSC 330: Organization of Programming Languages

Building Compilers with Phoenix

Defining syntax using CFGs

CMSC 330: Organization of Programming Languages. Context Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Principles of Programming Languages COMP251: Syntax and Grammars

Theory and Compiling COMP360

CMSC 330: Organization of Programming Languages. Context Free Grammars

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Introduction to Parsing. Lecture 5

Habanero Extreme Scale Software Research Project

Syntax Analysis Check syntax and construct abstract syntax tree

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

A simple syntax-directed

Chapter 3. Describing Syntax and Semantics ISBN

Introduction to Parsing

Introduction to Syntax Analysis. The Second Phase of Front-End

Lexical and Syntax Analysis. Top-Down Parsing

Lecture 8: Context Free Grammars

Chapter 3. Describing Syntax and Semantics

CSCI312 Principles of Programming Languages!

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CSCI312 Principles of Programming Languages!

Introduction to Syntax Analysis

Lexical and Syntax Analysis

A Simple Syntax-Directed Translator

Compiler Construction Using

Introduction to Parsing. Lecture 8

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Lexical Scanning COMP360

Introduction to Parsing. Lecture 5

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Homework & Announcements

CSCE 314 Programming Languages

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Chapter 3. Describing Syntax and Semantics

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Programming Language Syntax and Analysis

Question Bank. 10CS63:Compiler Design

Specifying Syntax COMP360

Plan for Today. Regular Expressions: repetition and choice. Syntax and Semantics. Context Free Grammars

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Chapter 2 :: Programming Language Syntax

Parsing II Top-down parsing. Comp 412

This book is licensed under a Creative Commons Attribution 3.0 License

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Chapter 3. Describing Syntax and Semantics ISBN

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Chapter 3. Describing Syntax and Semantics

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Context-free grammars (CFG s)

CMPT 755 Compilers. Anoop Sarkar.

Optimizing Finite Automata

Context-Free Languages and Parse Trees

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

A programming language requires two major definitions A simple one pass compiler

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

Chapter 2 - Programming Language Syntax. September 20, 2017

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Context-Free Grammars

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

3. Context-free grammars & parsing

Syntax. 2.1 Terminology

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CS 314 Principles of Programming Languages

Transcription:

COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1

Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2

Lexical and Syntax Analysis Lexical Analysis regular languages Syntax Analysis context-free languages described by regular expressions context-free grammars recognized by deterministic finite automata push-down automata implement lexer/scanner parser tool flex (lex) bison (yacc) tokens extracted by lexer are the non-terminal symbols of the CFG Lexical Analysis 3

Parsing Problem Given a grammar for a language and a string, a parser determines if the string is syntactically correct, that is, if can be generated using the rules of the grammar, and constructs the parse tree if the string is correct. A syntactically correct string is called sentence of the language. For a compiler, a sentence is correct program. We will focus on a recursive decent parser of a LL(1) grammar for PL/0. Top Down Parsing 4

Nested Structures Languages containing nested structures such as correctly parenthesized programs cannot be described by regular expressions. Even the much simpler language { ab, aabb, aaabbb, aaaabbbb,... } is not regular. We need to add recursion to describe nested structures. Syntax Analysis 5

Recursion The following rules suffice to define regular sets/languages: Concatenation Alternation r s Kleene closure (repetition) r* r s Regular sets are generated by regular expressions and recognized by finite state automata (FSA). We obtain context free languages by adding recursion as a new rule. Syntax Analysis 6

Context Free Grammars Context Free Languages are generated by Context Free Grammars (CFG) and recognized by Pushdown Automata. The strings of a language are called sentences or statements. Parsing is the task of determining whether the syntax (structure) of a given string conforms to the rules a given grammar, or equivalently, whether it is a sentence in the language Syntax Analysis 7

Context Free Grammars Consider the following three rules (grammar): <sentence> -> <subject> <predicate> <subject> -> John Mary <predicate> -> eats talks where -> means is defined as The four possible sentences that can be derived from <sentence> are John eats John talks Mary eats Mary talks Syntax Analysis 8

Context Free Grammars We refer to the rules used in the example grammar as syntax rules, productions, syntactic equations, or rewriting rules. <subject> and <predicate> are syntactic classes or categories. Using a shorthand notation, the syntax rules of the example grammar are S -> A B A -> a b B -> c d The language (the set of sentences) of this grammar is L = { ac, ad, bc, bd } Syntax Analysis 9

Context Free Grammars A context free language is defined by a 4-tuple (T,N,R,S) as: (1) The set of terminal symbols T * They cannot be substituted by any other symbol * This set is also called the vocabulary S -> A B A -> a b B -> c d terminal symbols (tokens) Syntax Analysis 10

Context Free Grammars A context free language is defined by a 4-tuple (T,N,R,S) as: (2) The set of non-terminal symbols (N) * They denote syntactic classes * They can be substituted by other symbols non-terminal symbols S A B -> A B -> a b -> c d Syntax Analysis 11

Context Free Grammars A context free language is defined by a 4-tuple (T,N,R,S) as: (3) The set of syntactic equations or productions (the grammar). * An equation or rewriting rule is specified for each nonterminal symbol (R) S -> A B A -> a b productions B -> c d Syntax Analysis 12

Context Free Grammars A context free language is defined by a 4-tuple (T,N,R,S) as: (4) The start symbol (S) S -> A B A -> a b B -> c d Syntax Analysis 13

Context Free Grammars A very well known meta-language used to specify context free grammars is Backus Naur Form (BNF) It was developed by John Backus and Peter Naur in the late 50s to describe programming languages. Noam Chomsky developed context free grammars in the early 50s. Syntax Analysis 14

Context Free Grammars Example of a grammar for a simple assignment statement: R1 <assgn> ::= <id> := <expr> R2 <id> ::= a b c R3 <expr> ::= <id> + <expr> R4 <id> * <expr> R5 ( <expr> ) R6 <id> Syntax Analysis 15

Context Free Grammars Grammar for a simple assignment statement: R1 <assgn> ::= <id> := <expr> R2 <id> ::= a b c R3 <expr> ::= <id> + <expr> R4 <id> * <expr> R5 ( <expr> ) R6 <id> The statement a := b * ( a + c ) is generated by this left most derivation: <assgn> -> <id> := <expr> R1 -> a := <expr> R2 -> a := <id> * <expr> R4 -> a := b * <expr> R2 -> a := b * ( <expr> ) R5 -> a := b * ( <id> + <expr> ) R3 -> a := b * ( a + <expr> ) R2 -> a := b * ( a + <id> ) R6 -> a := b * ( a + c ) R2 In a left most derivation only the left most nonterminal is replaced Syntax Analysis 16

Parse Trees A parse tree is a graphical representation of a derivation. For instance, the parse tree for the statement a := b * ( a + c ) is: <assign> <id> := <expr> a <id> * <expr> Every internal node of a parse tree is labeled with a non-terminal symbol. Every leaf is labeled with a terminal symbol. b ( <expr> ) <id> + <expr> a <id> Syntax Analysis 17 c

Ambiguity A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous. For instance, our grammar for a simple assignment statement <assgn> ::= <id> := <expr> <id> ::= a b c <expr> ::= <expr> + <expr> <expr> * <expr> ( <expr> ) <id> is ambiguous because there are distinct parse trees for the expression a := b + c * a Syntax Analysis 18

Distinct Parse Trees <assign> <assign> <id> := <expr> <id> := <expr> a <expr> + <expr> a <expr> * <expr> <id> <expr> * <expr> <expr> + <expr> <id> b <id> <id> <id> <id> a c a b c If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely. Syntax Analysis 19

Operator Precedence If an operator is generated lower in the parse tree, it indicates that the operator has precedence over the operator generated higher up in the tree. An unambiguous grammar for expressions: <assign> ::= <id> := <expr> <id> ::= a b c <expr> ::= <expr> + <term> <term> <term> ::= <term> * <factor> <factor> <factor> ::= ( <expr> ) <id> This grammar indicates the usual precedence order of multiplication and addition operators. This grammar generates unique parse trees independently of doing a rightmost or leftmost derivation Syntax Analysis 20

Example of Derivations Leftmost derivation: <assgn> -> <id> := <expr> -> a := <expr> -> a := <expr> + <term> -> a := <term> + <term> -> a := <factor> + <term> -> a := <id> + <term> -> a := b + <term> -> a := b + <term> *<factor> -> a := b + <factor> * <factor> -> a := b + <id> * <factor> -> a := b + c * <factor> -> a := b + c * <id> -> a := b + c * a Rightmost derivation: <assgn> -> <id> := <expr> -> <id> := <expr> + <term> -> <id> := <expr> + <term> * <factor> -> <id> := <expr> + <term> * <id> -> <id> := <expr> + <term> * a -> <id> := <expr> + <factor> * a -> <id> := <expr> + <id> * a -> <id> := <expr> + c * a -> <id> := <term> + c * a -> <id> := <factor> + c * a -> <id> := <id> + c * a -> <id> := b + c * a -> a := b + c * a Syntax Analysis 21

Precedence & Associativity Dealing with ambiguity: Rule 1: * (times) and / (divide) have higher precedence than + (plus) and (minus). Example: a + c * 3 means a + (c * 3) Rule 2: Operators of equal precedence associate to the left. Example: a + c + 3 means (a + c) + 3 Syntax Analysis 22

Avoiding Ambiguity As seen previously, the following grammar for arithmetic expression is ambiguous. <expr> ::= <expr> <op> <expr> id int ( <expr> ) <op> ::= + - * / It is not to difficult to modify the above grammar and obtain an unambiguous grammar that generates the same sentences: <expr> ::= <term> <expr> + <term> <expr> - <term> <term> ::= <factor> <term> * <factor> <term> / <factor> <factor> ::= id int (<expr>) Syntax Analysis 23

Ambiguity Is this grammar ambiguous? E -> T E + T E - T T -> F T * F T / F F -> id num ( E ) Parse id + id id Is this grammar ambiguous? E -> T E + T T -> F T * F F -> id ( E ) Parse id + id + id Syntax Analysis 24

Ambiguity Is this grammar ambiguous? E -> E + E id Parse id + id + id Is this grammar ambiguous? E -> E + id id Parse id + id + id Syntax Analysis 25

Ambiguity Example: Given the ambiguous grammar E -> E + E E * E ( E ) id We could rewrite it as: E -> E + T T T -> T * F F F -> id ( E ) Determine the parse tree for: id + id * id Syntax Analysis 26