Related Course Objectives


Syntax 9/18/17 1

Related Course Objectives
Develop grammars and parsers of programming languages.

Syntax and Semantics
Programming language syntax: how programs look, their form and structure. Syntax is defined using a kind of formal grammar.
Programming language semantics: what programs do, their behavior and meaning. Semantics is harder to define; mechanisms for precisely defining semantics are harder to learn.

Outline
Grammar and parse tree examples
BNF and parse tree definitions
Constructing grammars
Phrase structure and lexical structure
Dealing with ambiguity

An English Grammar
A sentence is a noun phrase, a verb, and a noun phrase. A noun phrase is an article and a noun. A verb is... An article is... A noun is...
<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat
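The English grammar is small enough to enumerate completely. The following is a minimal Python sketch that assumes a dictionary encoding chosen for illustration: each non-terminal maps to its list of alternative right-hand sides. It lists every sentence the grammar generates. This brute-force expansion terminates only because no rule is recursive. The expression grammars later in these slides would need a length bound.

```python
from itertools import product

# Hypothetical encoding of the English grammar: non-terminal -> alternatives,
# each alternative a tuple of symbols. Anything not a key is a token.
GRAMMAR = {
    "<S>": [("<NP>", "<V>", "<NP>")],
    "<NP>": [("<A>", "<N>")],
    "<V>": [("loves",), ("hates",), ("eats",)],
    "<A>": [("a",), ("the",)],
    "<N>": [("dog",), ("cat",), ("rat",)],
}

def expand(symbol):
    """Return every token sequence (as a tuple) derivable from `symbol`."""
    if symbol not in GRAMMAR:             # a token derives only itself
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Pick one expansion for each right-hand-side symbol, in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(tok for part in parts for tok in part))
    return results

sentences = [" ".join(s) for s in expand("<S>")]
print(len(sentences))   # 108  (6 noun phrases x 3 verbs x 6 noun phrases)
print(sentences[0])     # a dog loves a dog
```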

Derivation
A grammar says how a sentence (program) is generated. A sentence can be generated by the grammar as follows. Start with the string <S>. Repeatedly replace a symbol that is the left-hand side of a rule by the right-hand side of the same rule. Stop when no symbol in the string appears as the left-hand side of any rule.

Derivation
A derivation step replaces a symbol in a string that is the left-hand side of a rule by the right-hand side of that rule. A derivation step is a relation ⇒ over strings. A multiple-step derivation is the relation ⇒*, the reflexive and transitive closure of ⇒.
Leftmost (rightmost) derivations always choose the leftmost (rightmost) non-terminal symbol to replace. A parse tree corresponds to a unique leftmost derivation and to a unique rightmost derivation.
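A leftmost derivation can be traced mechanically. The sketch below is in Python and uses a hypothetical dictionary encoding of the English grammar. It always rewrites the leftmost non-terminal, and it picks the alternative whose leading tokens agree with the target sentence. That greedy choice is an assumption that works for this grammar only, because its sole multi-way choices are between single tokens. A general parser needs backtracking or lookahead.

```python
# Hypothetical encoding of the English grammar (alternatives as lists).
GRAMMAR = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["loves"], ["hates"], ["eats"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}

def terminal_prefix(form):
    """Tokens that precede the first non-terminal of a sentential form."""
    prefix = []
    for sym in form:
        if sym in GRAMMAR:
            break
        prefix.append(sym)
    return prefix

def leftmost_derivation(target):
    """Return each sentential form of a leftmost derivation of `target`."""
    form, steps = ["<S>"], ["<S>"]
    while any(sym in GRAMMAR for sym in form):
        # Index of the leftmost non-terminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        for rhs in GRAMMAR[form[i]]:
            candidate = form[:i] + rhs + form[i + 1:]
            prefix = terminal_prefix(candidate)
            if prefix == target[:len(prefix)]:   # consistent with the target
                form = candidate
                break
        else:
            raise ValueError(f"{' '.join(target)!r} is not derivable")
        steps.append(" ".join(form))
    if form != target:
        raise ValueError(f"{' '.join(target)!r} is not derivable")
    return steps

for step in leftmost_derivation("the dog loves the cat".split()):
    print(step)
```

Each printed line is one ⇒ step. Note that `<NP>` on the far right is not touched until everything to its left is already a token.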

Building a Parse Tree
The grammar is a set of rules that say how to build a parse tree. You put <S> at the root of the tree. The grammar's rules say how children can be added at any point in the tree. For instance, the rule
<S> ::= <NP> <V> <NP>
says you can add nodes <NP>, <V>, and <NP>, in that order, as children of <S>.

A Parse Tree
<S>
  <NP>
    <A>
      the
    <N>
      dog
  <V>
    loves
  <NP>
    <A>
      the
    <N>
      cat

A Programming Language Grammar
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id
An expression can be the sum of two expressions, or the product of two expressions, or a parenthesized subexpression. Or it can be a variable identifier.

A Parse Tree for ((a+b)*c)
<exp>
  (
  <exp>
    <exp>
      (
      <exp>
        <exp>
          id
        +
        <exp>
          id
      )
    *
    <exp>
      id
  )

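The parts of the English grammar, labeled:
<S> ::= <NP> <V> <NP>
<NP> ::= <A> <N>
<V> ::= loves | hates | eats
<A> ::= a | the
<N> ::= dog | cat | rat
Start symbol: <S>. A production: <S> ::= <NP> <V> <NP>. Non-terminal symbols: <S>, <NP>, <V>, <A>, <N>. Tokens: loves, hates, eats, a, the, dog, cat, rat.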

BNF Grammar Definition
A BNF grammar consists of four parts:
The set of tokens
The set of non-terminal symbols
The start symbol
The set of productions

Definition, Continued
The tokens are the smallest units of syntax. They are strings of one or more characters of program text. They are atomic: they are not treated as being composed from smaller parts.
The non-terminal symbols stand for larger pieces of syntax. They are strings enclosed in angle brackets, as in <NP>. They are not strings that occur literally in program text. The grammar says how they can be expanded into strings of tokens.
The start symbol is the particular non-terminal that forms the root of any parse tree for the grammar.

Definition, Continued
The productions are the tree-building rules. Each one has a left-hand side, the separator ::=, and a right-hand side. The left-hand side is a single non-terminal. The right-hand side is a sequence of one or more things, each of which can be either a token or a non-terminal. A production gives one possible way of building a parse tree.

Alternatives
When there is more than one production with the same left-hand side, an abbreviated form can be used. The BNF grammar can give the left-hand side, the separator ::=, and then a list of possible right-hand sides separated by the special symbol |.

Example
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id
Note that there are 4 productions in this grammar. It is equivalent to this one:
<exp> ::= <exp> + <exp>
<exp> ::= <exp> * <exp>
<exp> ::= ( <exp> )
<exp> ::= id
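Splitting an abbreviated rule into its separate productions is a purely textual step. The following is a minimal Python sketch; the helper name split_alternatives is hypothetical. It assumes symbols are separated by spaces and that | never occurs as a token. It also recovers the four parts of the BNF definition from the resulting productions. The first rule's left-hand side is taken as the start symbol, which is a common convention assumed here.

```python
def split_alternatives(rule):
    """Expand 'LHS ::= alt1 | alt2 | ...' into one (lhs, rhs) pair per alternative.

    Assumes space-separated symbols and that '|' is never itself a token.
    """
    lhs, rhs = rule.split("::=")
    return [(lhs.strip(), tuple(alt.split())) for alt in rhs.split("|")]

productions = split_alternatives(
    "<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id")
for lhs, rhs in productions:
    print(lhs, "::=", " ".join(rhs))

# The four parts of the BNF definition, recovered from the productions.
nonterminals = {lhs for lhs, _ in productions}
tokens = {s for _, rhs in productions for s in rhs if s not in nonterminals}
start = productions[0][0]   # assumption: first rule's left-hand side is the start
print(len(productions), sorted(tokens), start)
```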

Empty
The special non-terminal <empty> is for places where you want the grammar to generate nothing. For example, this grammar defines a typical if-then construct with an optional else part:
<if-stmt> ::= if <expr> then <stmt> <else-part>
<else-part> ::= else <stmt> | <empty>

Parse Trees
To build a parse tree, put the start symbol at the root. Add children to every non-terminal, following any one of the productions for that non-terminal in the grammar. You are done when all the leaves are tokens. Read off the leaves from left to right: that is the string derived by the tree.
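The procedure can be sketched in Python with a small tree type. The Node class and the helper functions are hypothetical names chosen for illustration. leaves reads the leaves from left to right. is_complete checks the stopping condition that every leaf is a token, assuming that non-terminals are exactly the symbols written in angle brackets.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str                                   # a non-terminal like "<NP>" or a token
    children: list = field(default_factory=list)  # empty for a leaf

def is_nonterminal(symbol):
    return symbol.startswith("<") and symbol.endswith(">")

def leaves(node):
    """Read off the leaves from left to right."""
    if not node.children:
        return [node.symbol]
    return [leaf for child in node.children for leaf in leaves(child)]

def is_complete(node):
    """Done when every leaf is a token, i.e. no non-terminal is left unexpanded."""
    return not any(is_nonterminal(leaf) for leaf in leaves(node))

# A partial tree: <S> expanded once with <S> ::= <NP> <V> <NP>.
partial = Node("<S>", [Node("<NP>"), Node("<V>"), Node("<NP>")])

# The finished tree for "the dog loves the cat".
tree = Node("<S>", [
    Node("<NP>", [Node("<A>", [Node("the")]), Node("<N>", [Node("dog")])]),
    Node("<V>", [Node("loves")]),
    Node("<NP>", [Node("<A>", [Node("the")]), Node("<N>", [Node("cat")])]),
])
print(is_complete(partial), is_complete(tree))  # False True
print(" ".join(leaves(tree)))                   # the dog loves the cat
```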

Practice
<exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id
Show a parse tree for each of these sentences: a+b, a*b+c, (a+b), (a+(b))

Derivation
αAγ ⇒ αβγ if A ::= β is a production rule (α, β, and γ are strings of tokens and non-terminals).
Write derivations for the above sentences (programs).

Languages
The language defined by a grammar is the set of all strings that can be derived by some parse tree for the grammar, i.e. all token strings that can be derived from the start symbol. As in the previous example, that set is often infinite (though grammars are finite).
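One way to see that the expression grammar's language is infinite is to count its strings by length. This Python sketch assumes the four-alternative <exp> grammar and treats id as a single token. It builds the set of strings of each exact length bottom-up. Every string of length k can be wrapped as ( ... ) to give one of length k+2, so the sets never run out.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def strings_of_length(k):
    """All token strings of exactly k tokens derivable from <exp> in
    <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id."""
    result = set()
    if k == 1:
        result.add(("id",))                                   # <exp> ::= id
    if k >= 3:
        # <exp> ::= ( <exp> )
        result |= {("(",) + s + (")",) for s in strings_of_length(k - 2)}
        for i in range(1, k - 1):          # left operand has i tokens
            j = k - 1 - i                  # right operand has j tokens
            for left in strings_of_length(i):
                for right in strings_of_length(j):
                    result.add(left + ("+",) + right)
                    result.add(left + ("*",) + right)
    return frozenset(result)

print([len(strings_of_length(k)) for k in range(1, 6)])   # [1, 0, 3, 0, 11]
```

Every production adds an even number of tokens to a single id, so only odd lengths occur.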

Constructing Grammars
Most important trick: divide and conquer.
Example: the language of Java declarations: a type name, a list of variables separated by commas, and a semicolon. Each variable can be followed by an initializer:
float a;
boolean a,b,c;
int a=1, b, c=1+2;

Example, Continued
Easy if we postpone defining the comma-separated list of variables with initializers:
<var-dec> ::= <type-name> <declarator-list> ;
Primitive type names are easy enough too:
<type-name> ::= boolean | byte | short | int | long | char | float | double

Example, Continued
That leaves the comma-separated list of variables with initializers. Again, postpone defining variables with initializers, and just do the comma-separated list part:
<declarator-list> ::= <declarator> | <declarator> , <declarator-list>

Example, Continued
That leaves the variables with initializers:
<declarator> ::= id | id = <expr>
For full Java, we would need to allow pairs of square brackets after the variable name. There is also a syntax for array initializers.
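The finished grammar translates almost line for line into a recursive-descent recognizer, sketched here in Python. These slides never define <expr>, so it is assumed here to take a simplified form: operands (identifiers or integer literals) separated by +. The input is assumed to be already split into tokens. The class and method names are hypothetical.

```python
TYPE_NAMES = {"boolean", "byte", "short", "int", "long", "char", "float", "double"}

def is_identifier(tok):
    return tok.isidentifier() and tok not in TYPE_NAMES

class DeclParser:
    """Recursive-descent recognizer with one method per non-terminal.

    Assumption (not from the slides): <expr> is simplified to
    <operand> { + <operand> }, an operand being an identifier or integer literal.
    """

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, ok, what):
        tok = self.peek()
        if tok is None or not ok(tok):
            raise SyntaxError(f"expected {what} at token {self.pos}, got {tok!r}")
        self.pos += 1

    def var_dec(self):          # <var-dec> ::= <type-name> <declarator-list> ;
        self.expect(lambda t: t in TYPE_NAMES, "a type name")
        self.declarator_list()
        self.expect(lambda t: t == ";", "';'")

    def declarator_list(self):  # <declarator> | <declarator> , <declarator-list>
        self.declarator()
        if self.peek() == ",":
            self.pos += 1
            self.declarator_list()

    def declarator(self):       # id | id = <expr>
        self.expect(is_identifier, "an identifier")
        if self.peek() == "=":
            self.pos += 1
            self.expr()

    def expr(self):             # simplified <expr> (assumption, see docstring)
        operand = lambda t: is_identifier(t) or t.isdigit()
        self.expect(operand, "an operand")
        while self.peek() == "+":
            self.pos += 1
            self.expect(operand, "an operand")

def accepts(tokens):
    """True if the whole token list is one <var-dec>."""
    parser = DeclParser(tokens)
    try:
        parser.var_dec()
    except SyntaxError:
        return False
    return parser.pos == len(tokens)

print(accepts("int a = 1 , b , c = 1 + 2 ;".split()))   # True
print(accepts("int a , ;".split()))                     # False
```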

Logic Terms
A term is either a variable, or a constant, or a function symbol followed by a comma-separated sequence of terms enclosed in a pair of parentheses.
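In BNF, one reading of this definition is <term> ::= <var> | <const> | <fun> ( <term-list> ), with <term-list> ::= <term> | <term> , <term-list>. The slide does not say how variables, constants, and function symbols are spelled. This Python recognizer therefore assumes Prolog-style conventions: variables start with an uppercase letter or underscore, and constants and function symbols start with a lowercase letter. It returns a nested tuple so the structure is visible.

```python
import re

# Assumed lexical conventions (Prolog-style, not from the slide).
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),])")
PUNCT = {"(", ")", ","}

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_term(tokens, i):
    """<term> ::= <var> | <const> | <fun> ( <term-list> ); returns (tree, next index)."""
    if i >= len(tokens) or tokens[i] in PUNCT:
        raise SyntaxError(f"expected a term at token {i}")
    name = tokens[i]
    if name[0].isupper() or name[0] == "_":
        return ("var", name), i + 1
    if i + 1 < len(tokens) and tokens[i + 1] == "(":
        args, i = [], i + 2
        while True:
            arg, i = parse_term(tokens, i)
            args.append(arg)
            if i < len(tokens) and tokens[i] == ",":
                i += 1                            # another argument follows
            elif i < len(tokens) and tokens[i] == ")":
                return ("fun", name, args), i + 1
            else:
                raise SyntaxError(f"expected ',' or ')' at token {i}")
    return ("const", name), i + 1

def parse(text):
    tokens = tokenize(text)
    tree, i = parse_term(tokens, 0)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return tree

print(parse("f(X, g(a))"))
```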

λ-calculus Terms
A term is either a variable, or λ followed by a variable, a dot, and a term, or the juxtaposition of two terms.
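Read literally, <term> ::= <var> | λ <var> . <term> | <term> <term> is ambiguous: x y z has two parse trees. The minimal Python parser below resolves that with two conventions that the slide does not state and that are assumed here. First, application groups to the left. Second, a λ body extends as far to the right as possible. The sketch also assumes parentheses for grouping and lowercase variable names, and it accepts a backslash as a stand-in for λ.

```python
import re

TOKEN = re.compile(r"\s*(λ|\\|\.|\(|\)|[a-z][a-z0-9]*)")
NAME = re.compile(r"[a-z][a-z0-9]*")

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append("λ" if m.group(1) == "\\" else m.group(1))
        pos = m.end()
    return tokens

class LambdaParser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def var(self):
        tok = self.peek()
        if tok is None or not NAME.fullmatch(tok):
            raise SyntaxError(f"expected a variable, got {tok!r}")
        self.i += 1
        return tok

    def term(self):
        if self.peek() == "λ":                 # λ <var> . <term>
            self.eat("λ")
            v = self.var()
            self.eat(".")
            return ("lam", v, self.term())      # body extends as far right as possible
        return self.application()

    def application(self):                    # juxtaposition, grouped to the left
        t = self.atom()
        while self.peek() not in (None, ")"):
            if self.peek() == "λ":             # a trailing λ takes the rest as its body
                return ("app", t, self.term())
            t = ("app", t, self.atom())
        return t

    def atom(self):
        if self.peek() == "(":
            self.eat("(")
            t = self.term()
            self.eat(")")
            return t
        return ("var", self.var())

def parse(text):
    p = LambdaParser(tokenize(text))
    t = p.term()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected token {p.peek()!r}")
    return t

def show(t):
    """Fully parenthesized rendering of a parsed term."""
    if t[0] == "var":
        return t[1]
    if t[0] == "lam":
        return f"(λ{t[1]}.{show(t[2])})"
    return f"({show(t[1])} {show(t[2])})"

print(show(parse("λx. x y z")))   # (λx.((x y) z))
```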

Where Do Tokens Come From?
Tokens are pieces of program text that we do not choose to think of as being built from smaller pieces: identifiers (count), keywords (if), operators (==), constants (123.4), etc.
Programs stored in files are just sequences of characters. How is such a file divided into a sequence of tokens?

Lexical Structure and Phrase Structure
Grammars so far have defined phrase structure: how a program is built from a sequence of tokens. We also need to define lexical structure: how a text file is divided into tokens.

Separate Grammars
Usually there are two separate grammars. One says how to construct a sequence of tokens from a file of characters. The other says how to construct a parse tree from a sequence of tokens.

Separate Compiler Passes
The scanner reads the input file and divides it into tokens according to the first grammar. The scanner discards white space and comments.
The parser constructs a parse tree (or at least goes through the motions; more about this later) from the token stream according to the second grammar.
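A scanner of this kind can be sketched with Python's re module. The token kinds, keyword set, and comment syntax (// and /* */) below are assumptions for illustration; they are not any particular language's lexical grammar. The alternatives are tried in order rather than by longest match, which is why == is listed before =.

```python
import re

# Hypothetical lexical grammar: each token kind is a regular expression.
TOKEN_SPEC = [
    ("SKIP",   r"[ \t\r\n]+|//[^\n]*|/\*.*?\*/"),   # white space and comments
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",     r"==|<=|>=|!=|[-+*/=<>(){};,]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC),
                    re.DOTALL)                       # let /* ... */ span lines
KEYWORDS = {"if", "then", "else", "while"}

def scan(text):
    """Divide program text into (kind, lexeme) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":                  # white space and comments are discarded
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(scan("if (count == 123.4) x = 1; // done"))
```

The parser then works only on the list of (kind, lexeme) pairs and never sees the discarded white space or comments.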

Formal Context-Free Grammars
In the study of formal languages and automata, grammars are expressed in yet another notation:
S → aSb | X
X → cX | ε
These are called context-free grammars. Other kinds of grammars are also studied: regular grammars (weaker), context-sensitive grammars (stronger), etc.
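Read the grammar on this slide as S → aSb | X and X → cX | ε, where ε is the empty string. It then generates some number of a's, followed by any number of c's, followed by as many b's as a's. A Python recognizer can mirror the two rules with one function each. This is a sketch for illustration, not an efficient algorithm: it copies substrings at every step.

```python
def derives_S(s):
    """True if s is derivable from S, where S -> aSb | X."""
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b" and derives_S(s[1:-1]):
        return True                 # S -> a S b
    return derives_X(s)             # S -> X

def derives_X(s):
    """True if s is derivable from X, where X -> cX | ε."""
    return s == "" or (s[0] == "c" and derives_X(s[1:]))

for s in ["", "ccc", "acb", "aacbb", "aab", "abab"]:
    print(repr(s), derives_S(s))
```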

Working Grammar
G: <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id
This generates a language of arithmetic expressions using parentheses, the operators + and *, and the variables a, b and c.

Precedence
<exp>
  <exp>
    <exp>
      a
    +
    <exp>
      b
  *
  <exp>
    c
Our grammar generates this tree for a+b*c. In this tree, the addition is performed before the multiplication, which is not the usual convention for operator precedence.

Operator Precedence
Applies when the order of evaluation is not completely decided by parentheses. Each operator has a precedence level, and those with higher precedence are performed before those with lower precedence: a+b*c = a+(b*c).
Examples: C (15 levels of precedence), Pascal (5 levels), Smalltalk (1 level for all binary operators).

Precedence in the Grammar
G: <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id
To fix the precedence problem, we modify the grammar so that it is forced to put * below + in the parse tree.
G1: <exp> ::= <exp> + <exp> | <mulexp>
<mulexp> ::= <mulexp> * <mulexp> | ( <exp> ) | id

Correct Precedence
G1 parse tree:
<exp>
  <exp>
    <mulexp>
      a
  +
  <exp>
    <mulexp>
      <mulexp>
        b
      *
      <mulexp>
        c
Our new grammar generates this tree for a+b*c. It generates the same language as before, but no longer generates parse trees with incorrect precedence.

Associativity
First tree:
<exp>
  <exp>
    <mulexp>
      a
  +
  <exp>
    <exp>
      <mulexp>
        b
    +
    <exp>
      <mulexp>
        c
Second tree:
<exp>
  <exp>
    <exp>
      <mulexp>
        a
    +
    <exp>
      <mulexp>
        b
  +
  <exp>
    <mulexp>
      c
Our grammar G1 generates both these trees for a+b+c. The first one is not the usual convention for operator associativity.

Operator Associativity
Applies when the order of evaluation is not decided by parentheses or by precedence.
Left-associative operators group left to right: a+b+c+d = ((a+b)+c)+d
Right-associative operators group right to left: a+b+c+d = a+(b+(c+d))
Most operators in most languages are left-associative, but there are exceptions.
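The grouping matters as soon as an operator is not associative in the mathematical sense, as with subtraction. The short Python sketch below uses functools.reduce to combine the same operand list with both groupings. The helper names are hypothetical.

```python
from functools import reduce

def group_left(op, operands):
    """((a op b) op c) op d ..."""
    return reduce(op, operands)

def group_right(op, operands):
    """a op (b op (c op d)) ..."""
    return reduce(lambda acc, x: op(x, acc), reversed(operands))

sub = lambda x, y: x - y
print(group_left(sub, [8, 4, 2]))    # 2, i.e. (8-4)-2
print(group_right(sub, [8, 4, 2]))   # 6, i.e. 8-(4-2)

# Building strings instead of numbers shows the grouping itself.
plus = lambda x, y: f"({x}+{y})"
print(group_left(plus, ["a", "b", "c", "d"]))    # (((a+b)+c)+d)
print(group_right(plus, ["a", "b", "c", "d"]))   # (a+(b+(c+d)))
```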

Associativity Examples
C: a<<b<<c (most operators are left-associative); a=b=0 (right-associative: assignment)
ML: 3-2-1 (most operators are left-associative); 1::2::nil (right-associative: list builder)
Fortran: a/b*c (most operators are left-associative); a**b**c (right-associative: exponentiation)

Associativity in the Grammar
G1: <exp> ::= <exp> + <exp> | <mulexp>
<mulexp> ::= <mulexp> * <mulexp> | ( <exp> ) | id
To fix the associativity problem, we modify the grammar to make trees of +s grow down to the left (and likewise for *s).
G2: <exp> ::= <exp> + <mulexp> | <mulexp>
<mulexp> ::= <mulexp> * <rootexp> | <rootexp>
<rootexp> ::= ( <exp> ) | id

Correct Associativity
<exp>
  <exp>
    <exp>
      <mulexp>
        <rootexp>
          a
    +
    <mulexp>
      <rootexp>
        b
  +
  <mulexp>
    <rootexp>
      c
Our new grammar generates this tree for a+b+c. It generates the same language as before, but no longer generates trees with incorrect associativity.
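G2 maps onto a recursive-descent parser with one function per non-terminal. Recursive descent cannot use a left-recursive rule such as <exp> ::= <exp> + <mulexp> directly. This Python sketch therefore uses the standard rewrite of each such rule into a loop (<exp> ::= <mulexp> { + <mulexp> }), which is a technique brought in here rather than taken from the slides. Building the result inside the loop keeps + and * left-associative. The parser returns a fully parenthesized string so the grouping is visible. The usage shown splits single-letter identifiers with list().

```python
def parse_g2(tokens):
    """Parse a token list with grammar G2; return a fully parenthesized string."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'} at {pos}, got {tok!r}")
        pos += 1
        return tok

    def exp():          # <exp> ::= <exp> + <mulexp> | <mulexp>, as a loop
        tree = mulexp()
        while peek() == "+":
            advance("+")
            tree = f"({tree}+{mulexp()})"   # grows down to the left
        return tree

    def mulexp():       # <mulexp> ::= <mulexp> * <rootexp> | <rootexp>, as a loop
        tree = rootexp()
        while peek() == "*":
            advance("*")
            tree = f"({tree}*{rootexp()})"
        return tree

    def rootexp():      # <rootexp> ::= ( <exp> ) | id
        if peek() == "(":
            advance("(")
            tree = exp()
            advance(")")
            return tree
        tok = advance()
        if not tok.isidentifier():
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    result = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(parse_g2(list("a+b*c")))   # (a+(b*c))
print(parse_g2(list("a+b+c")))   # ((a+b)+c)
```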

Practice
Starting with this grammar:
G2: <exp> ::= <exp> + <mulexp> | <mulexp>
<mulexp> ::= <mulexp> * <rootexp> | <rootexp>
<rootexp> ::= ( <exp> ) | id
1. Add a left-associative & operator, at lower precedence than any of the others.
2. Then add a right-associative ** operator, at higher precedence than any of the others.

Ambiguity
G was ambiguous: it generated more than one parse tree for the same string. Fixing the associativity and precedence problems eliminated all the ambiguity. This is usually a good thing: the parse tree corresponds to the meaning of the program, and we don't want ambiguity about that.
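The ambiguity of G can be measured by counting parse trees. The Python sketch below does this with a dynamic-programming count over token spans, a technique chosen here for illustration. For each span it tries every way <exp> can cover it under G's four alternatives. For a+b*c it finds two trees, one for each grouping (a+b)*c and a+(b*c). An unambiguous grammar would give at most one tree per string.

```python
from functools import lru_cache

def count_trees(tokens):
    """Number of distinct parse trees for `tokens` under the ambiguous grammar
    G: <exp> ::= <exp> + <exp> | <exp> * <exp> | ( <exp> ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                      # trees for tokens[i:j] rooted at <exp>
        n = 0
        if j - i == 1 and tokens[i].isidentifier():
            n += 1                        # <exp> ::= id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)      # <exp> ::= ( <exp> )
        for k in range(i + 1, j - 1):     # binary operator at position k
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_trees(list("a+b*c")))     # 2
print(count_trees(list("a+b+c+d")))   # 5
print(count_trees(list("(a+b)*c")))   # 1
```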

Dangling Else in Grammars
<stmt> ::= <if-stmt> | s1 | s2
<if-stmt> ::= if <expr> then <stmt> else <stmt> | if <expr> then <stmt>
<expr> ::= e1 | e2
This grammar has a classic dangling-else ambiguity. The statement we want to derive is
if e1 then if e2 then s1 else s2
and the next slide shows two different parse trees for it.

First parse tree (else attached to the outer if):
<if-stmt>
  if
  <expr>
    e1
  then
  <stmt>
    <if-stmt>
      if
      <expr>
        e2
      then
      <stmt>
        s1
  else
  <stmt>
    s2
Second parse tree (else attached to the inner if):
<if-stmt>
  if
  <expr>
    e1
  then
  <stmt>
    <if-stmt>
      if
      <expr>
        e2
      then
      <stmt>
        s1
      else
      <stmt>
        s2
Most languages that have this problem choose the second parse tree: else goes with the nearest unmatched then.

Eliminating the Ambiguity
<stmt> ::= <if-stmt> | s1 | s2
<if-stmt> ::= if <expr> then <stmt> else <stmt> | if <expr> then <stmt>
<expr> ::= e1 | e2
We want to insist that if the <stmt> between then and else expands into an if, that if must already have its own else. First, we make a new non-terminal <matched-stmt> that generates everything <stmt> generates, except that it cannot generate if statements without else:
<matched-stmt> ::= <matched-if> | s1 | s2
<matched-if> ::= if <expr> then <matched-stmt> else <matched-stmt>

Eliminating the Ambiguity
<stmt> ::= <if-stmt> | s1 | s2
<if-stmt> ::= if <expr> then <matched-stmt> else <stmt> | if <expr> then <stmt>
<expr> ::= e1 | e2
Then we use the new non-terminal <matched-stmt> in place of the <stmt> between then and else. The effect is that the new grammar can match an else part with an if part only if all the nearer if parts are already matched.
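Many hand-written parsers do not rewrite the grammar. Instead they resolve the dangling else directly: once an if statement's then-part has been parsed, that same if immediately claims a following else. This Python sketch takes that recursive-descent approach, which is swapped in for illustration and is not the grammar rewrite itself. It produces the same grouping the revised grammar forces, with each else attached to the nearest unmatched then.

```python
def parse_stmt(tokens):
    """Parse <stmt>; return a bracketed string showing which if owns each else."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at {pos}, got {peek()!r}")
        pos += 1

    def take(allowed):
        nonlocal pos
        tok = peek()
        if tok not in allowed:
            raise SyntaxError(f"expected one of {sorted(allowed)} at {pos}, got {tok!r}")
        pos += 1
        return tok

    def stmt():           # <stmt> ::= <if-stmt> | s1 | s2
        if peek() == "if":
            return if_stmt()
        return take({"s1", "s2"})

    def if_stmt():        # if <expr> then <stmt> [ else <stmt> ]
        expect("if")
        cond = take({"e1", "e2"})
        expect("then")
        then_part = stmt()
        if peek() == "else":          # the innermost pending if claims the else
            expect("else")
            return f"[if {cond} then {then_part} else {stmt()}]"
        return f"[if {cond} then {then_part}]"

    result = stmt()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(parse_stmt("if e1 then if e2 then s1 else s2".split()))
# [if e1 then [if e2 then s1 else s2]]
```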

Correct Parse Tree: Exercise
Please draw the parse tree for if e1 then if e2 then s1 else s2 using the revised grammar.

Conclusion
We use grammars to define programming language syntax, both lexical structure and phrase structure.
Connection between theory and practice: two grammars, two compiler passes.
Parser generators can write code for those two passes automatically from grammars.