Homework & Announcements

Similar documents
CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Syntax. In Text: Chapter 3

Chapter 3. Describing Syntax and Semantics

Describing Syntax and Semantics

Parsing II Top-down parsing. Comp 412

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Programming Language Syntax and Analysis

Chapter 3. Describing Syntax and Semantics ISBN

Habanero Extreme Scale Software Research Project

Languages and Compilers

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Context-Free Grammar (CFG)

COP 3402 Systems Software Syntax Analysis (Parser)

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

EECS 6083 Intro to Parsing Context Free Grammars

Introduction to Parsing

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Lecture 10 Parsing 10.1

Principles of Programming Languages COMP251: Syntax and Grammars

CS 314 Principles of Programming Languages

ICOM 4036 Spring 2004

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

CPS 506 Comparative Programming Languages. Syntax Specification

CS 406/534 Compiler Construction Parsing Part I

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Chapter 3. Describing Syntax and Semantics

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CSE 12 Abstract Syntax Trees

CSE 3302 Programming Languages Lecture 2: Syntax

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

CMSC 330: Organization of Programming Languages

Lexical and Syntax Analysis. Top-Down Parsing

CSCE 314 Programming Languages

Chapter 4. Lexical and Syntax Analysis

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Chapter 2 :: Programming Language Syntax

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

15 212: Principles of Programming. Some Notes on Grammars and Parsing

Lecture 4: Syntax Specification

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Chapter 3. Describing Syntax and Semantics ISBN

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Chapter 3. Describing Syntax and Semantics

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Syntax Analysis Part I

3. Parsing. Oscar Nierstrasz

Theory and Compiling COMP360

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax and Semantics

Introduction to Lexing and Parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Chapter 3. Syntax - the form or structure of the expressions, statements, and program units

Building Compilers with Phoenix

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Defining syntax using CFGs

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

Dr. D.M. Akbar Hussain

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CMSC 330: Organization of Programming Languages

CSCI312 Principles of Programming Languages!

Introduction to Parsing. Comp 412

Syntax. 2.1 Terminology

4. Lexical and Syntax Analysis

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

A programming language requires two major definitions A simple one pass compiler

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Lexical and Syntax Analysis

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

4. Lexical and Syntax Analysis

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Structure of a Compiler: Scanner reads a source, character by character, extracting lexemes that are then represented by tokens.

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Compiler Construction Using

Topic 3: Syntax Analysis I

CS 230 Programming Languages

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Grammars and Parsing. Paul Klint. Grammars and Parsing

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

A simple syntax-directed

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Principles of Programming Languages COMP251: Syntax and Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Compiler Design Concepts. Syntax Analysis

Transcription:

Homework & nnouncements New schedule on line. Reading: Chapter 18 Homework: Exercises at end Due: 11/1 Copyright c 2002 2017 UMaine School of Computing and Information S 1 / 25

COS 140: Foundations of Specifying Programming : Backus Naur Form (BNF) Fall 2017 Copyright c 2002 2017 UMaine School of Computing and Information S 2 / 25

Problem Problem Syntax and Semantics Syntax Formalisms Backus Naur Form Problem: how to specify a programming language? Have to have a way to describe its syntax (and, possibly, its semantics) Copyright c 2002 2017 UMaine School of Computing and Information S 3 / 25

Syntax and Semantics Problem Syntax and Semantics Syntax Formalisms Backus Naur Form Syntax: Form of something (e.g., language) Describes relationships between components Semantics: Meaning of something Describes what the statements will do Need to capture formally so it s clear how language is to be used. Copyright c 2002 2017 UMaine School of Computing and Information S 4 / 25

Syntax Formalisms Problem Syntax and Semantics Syntax Formalisms Backus Naur Form What is needed in a syntax formalism? ble to specify language s for users and compiler designers Easy to write and understand Expressive enough to capture all programming languages Easy to translate into algorithms for machine translation of language Copyright c 2002 2017 UMaine School of Computing and Information S 5 / 25

Syntax Formalisms Problem Syntax and Semantics Syntax Formalisms Backus Naur Form Formalisms often expressed as production rules: LHS RHS Match LHS with what you have, produce RHS E.g.: S NP VP s we ll see: can also think of going backwards If you have NP and VP have a valid sentence S. Copyright c 2002 2017 UMaine School of Computing and Information S 6 / 25

Backus Naur Form Problem Syntax and Semantics Syntax Formalisms Backus Naur Form BNF is the most common way of specifying of a programming language lso important for: Computability theory Natural language processing Many other places in CS where you need to specify a Copyright c 2002 2017 UMaine School of Computing and Information S 7 / 25

Language Terminology Language Terminology Chomsky Hierarchy String: any combination of characters in the language may be the whole program Lexemes: lowest-level unit of language analogous to words in English sometimes called tokens Syntactic category: classes of lexemes that play the same role in the structure of a statement analogous to parts of speech also called tokens just to be confusing, maybe? Constituents: tokens or groups of tokens that are put together in ways specified by the analogous to noun phrases, etc. Grammar: specification of the syntax; a set of rules describing the legal strings of the language Terminal and non-terminal symbols Copyright c 2002 2017 UMaine School of Computing and Information S 8 / 25

Types of : The Chomsky Hierarchy Language Terminology Chomsky Hierarchy s go down the hierarchy, languages increase in complexity, more machinery needed to recognize them Copyright c 2002 2017 UMaine School of Computing and Information S 9 / 25

Types of : The Chomsky Hierarchy Language Terminology Chomsky Hierarchy s go down the hierarchy, languages increase in complexity, more machinery needed to recognize them Regular languages (regular expressions) tokens Single symbol on the LHS; RHS has at most one non-terminal: E.g.: S as b generatesa + b Copyright c 2002 2017 UMaine School of Computing and Information S 9 / 25

Types of : The Chomsky Hierarchy Language Terminology Chomsky Hierarchy s go down the hierarchy, languages increase in complexity, more machinery needed to recognize them Regular languages (regular expressions) tokens Single symbol on the LHS; RHS has at most one non-terminal: E.g.: S as b generatesa + b Context-free languages programming languages LHS: single symbol; terminals, non-terminals in RHS E.g.: S asb nil generatesa n b n Copyright c 2002 2017 UMaine School of Computing and Information S 9 / 25

Types of : The Chomsky Hierarchy Language Terminology Chomsky Hierarchy s go down the hierarchy, languages increase in complexity, more machinery needed to recognize them Regular languages (regular expressions) tokens Single symbol on the LHS; RHS has at most one non-terminal: E.g.: S as b generatesa + b Context-free languages programming languages LHS: single symbol; terminals, non-terminals in RHS E.g.: S asb nil generatesa n b n Context-sensitive languages RHS has at least as many symbols as LHS E.g.: asc asbsc Copyright c 2002 2017 UMaine School of Computing and Information S 9 / 25

Types of : The Chomsky Hierarchy Language Terminology Chomsky Hierarchy s go down the hierarchy, languages increase in complexity, more machinery needed to recognize them Regular languages (regular expressions) tokens Single symbol on the LHS; RHS has at most one non-terminal: E.g.: S as b generatesa + b Context-free languages programming languages LHS: single symbol; terminals, non-terminals in RHS E.g.: S asb nil generatesa n b n Context-sensitive languages RHS has at least as many symbols as LHS E.g.: asc asbsc Recursively-enumerable languages Turing-equivalent representation No restrictions on LHS, RHS E.g.: aaas aaatq Copyright c 2002 2017 UMaine School of Computing and Information S 9 / 25

(BNF) BNF Derivation Recursion Grammar formalism for context-free languages (only one symbol on LHS) Terminology: Rewrite (production) rules rules that rewrite current pattern into a new one LHS, RHS parts of rule: LHS RHS Terminal symbols symbols that appear only on RHS i.e., do not get replaced by anything Non-terminal symbols symbols that appear in some LHS lternatives ( ) different RHS s that can be used with the LHS Copyright c 2002 2017 UMaine School of Computing and Information S 10 / 25

(BNF) BNF Derivation Recursion E.g.: <var> <var> + <var> <var> B C Often written with::= instead of : <var>::= B C Copyright c 2002 2017 UMaine School of Computing and Information S 11 / 25

Example: Producing a String from BNF BNF Derivation Recursion <var> <var> + <var> <var> B C Start with the kind of constituent you want to produce Replace each non-terminal in LHS with a RHS that appears in the rule for which that non-terminal is in the LHS <var> +<var> Keep doing this until there are only non-terminals in the string + <var>... +B Copyright c 2002 2017 UMaine School of Computing and Information S 12 / 25

Derivation BNF Derivation Recursion Start symbol: where derivations begin to derive all possible strings (programs) Sentential form: any string derived from the start symbols using the rewrite rules Derivation: rewrite sentential forms until have all terminal symbols Options: Order of rewriting left-most, right-most derivations lternative used for rewriting Copyright c 2002 2017 UMaine School of Computing and Information S 13 / 25

Recursion BNF Derivation Recursion What is recursion? In BNF: LHS appears in the RHS llows infinite language from finite Cannot specify the number of times rule will be applied E.g.: to allow any number of additions in an expression: <var> <var> + Need an alternative that can stop the recursion! Copyright c 2002 2017 UMaine School of Computing and Information S 14 / 25

Grammar Parse What is parsing? Checking the legality of input against a Producing a parse tree that shows the relationships between the tokens Parse trees record derivations Start with a string of only tokens (terminal symbols) Find a RHS pattern in the string, replace with the LHS. Continue until you have the start symbol If you can do this: the input was a valid sentence in the language Create a parse tree by showing how we replace terminal/non-terminal symbols using appropriate rules Copyright c 2002 2017 UMaine School of Computing and Information S 15 / 25

Example Grammar Parse Copyright c 2002 2017 UMaine School of Computing and Information S 16 / 25

Example Input: + B + C Processed so far: Parse tree: Grammar: <var> <var> <var> + B C Grammar Parse Copyright c 2002 2017 UMaine School of Computing and Information S 16 / 25

Example Input: + B + C Processed so far: Parse tree: Grammar: <var> <var> <var> + B C Grammar Parse Copyright c 2002 2017 UMaine School of Computing and Information S 16 / 25

Example Input: + B + C Processed so far: Parse tree: <var> Grammar: <var> <var> <var> + B C Grammar Parse Copyright c 2002 2017 UMaine School of Computing and Information S 16 / 25

Example Grammar Parse Input: + B + C Grammar: Processed so far: + <var> Parse tree: <var> + <var> <var> + B C Copyright c 2002 2017 UMaine School of Computing and Information S 16 / 25

Example Grammar Parse Input: + B + C Grammar: Processed so far: +B + C <var> Parse tree: <var> + <var> + B <var> C <var> <var> + B C Copyright c 2002 2017 UMaine School of Computing and Information S 16 / 25

mbiguity Grammar Parse More than one different, legal, correct parse trees for the same string Problem, since parse tree indicates relationship between the constituents Property of the Copyright c 2002 2017 UMaine School of Computing and Information S 17 / 25

mbiguity Example Grammar Parse Grammar: <var> + Parse trees: <var> <var> B C + <var> B + <var> C <var> + + <var> <var> B C Sub-expressions that are more deeply nested in the tree are evaluated first Copyright c 2002 2017 UMaine School of Computing and Information S 18 / 25

n Unambiguous Grammar for ddition Expressions Grammar Parse Need to ensure that the expression can be evaluated in only one way Want the expression to be evaluated left to right first expression formed (lowest in tree) needs to be to the left Copyright c 2002 2017 UMaine School of Computing and Information S 19 / 25

n Unambiguous Grammar for ddition Expressions Grammar Parse Need to ensure that the expression can be evaluated in only one way Want the expression to be evaluated left to right first expression formed (lowest in tree) needs to be to the left Replace rule in previous with: + <var> Copyright c 2002 2017 UMaine School of Computing and Information S 19 / 25

n Unambiguous Grammar for ddition Expressions Grammar Parse Need to ensure that the expression can be evaluated in only one way Want the expression to be evaluated left to right first expression formed (lowest in tree) needs to be to the left Replace rule in previous with: + <var> + + <var> <var> C <var> B Copyright c 2002 2017 UMaine School of Computing and Information S 19 / 25

Operator Precedence Grammar Parse Need a constituent for each set of operators with the same precedence Operators with the highest precedence need to be built at the lowest level in the tree Get left associativity if we parse left right, keep recursion to the left + <var> Right associativity: recursion to the left. Copyright c 2002 2017 UMaine School of Computing and Information S 20 / 25

Levels of Precendence Grammar Parse Items in parentheses and identifiers, numbers: factor, B, (C + D) Multiplication and division: term *B, /B ddition and subtraction: expression +B, -B Copyright c 2002 2017 UMaine School of Computing and Information S 21 / 25

Example Grammar Grammar Parse + / () B C D Copyright c 2002 2017 UMaine School of Computing and Information S 22 / 25

Grammar Parse Can think of there being a state of the parse Records where we are in the process E.g.: + would mean that we ve turned the tokens we ve read so far into the non-terminal, the terminal symbol (token)+, and the non-terminal Idea of state can be used to parse using a state machine or automata we ll talk about this later in course Can also think of there being some tokens that have not yet been read Start with an empty state, and with all terminals unread Copyright c 2002 2017 UMaine School of Computing and Information S 23 / 25

Parse of + B * C - D Grammar Parse State: (empty) Input left: + B * C - D No rules apply to empty state. Read next token (). + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Input left: + B * C - D Only rule that applies is: ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Input left: + B * C - D Only rule that applies is: ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Input left: + B * C - D Only rule that applies is: ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Input left: + B * C - D Only rule that applies is: ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Input left: + B * C - D lthough is the current state, there are still input tokens left, so we re not done. Read next token (+). + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + Input left: B * C - D No rule applies. Read next token (B). + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + B Input left: * C - D Can t replace whole thing. Use ::= B + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + Input left: * C - D Can t replace whole thing. Use ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + Input left: * C - D Can t replace whole thing. Use ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + Input left: * C - D Can apply ::= + + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Input left: * C - D Can t stop -- still have input tokens left. Read next token (*) + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: * Input left: C - D No rule produces anything beginning with * -- dead end. Backtrack to previous state + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + Input left: * C - D lthough we could apply ::= +, we ve already tried that with no luck -- so read next input token (*). + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + * Input left: C - D Nothing matches directly with * Read next token (C). + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + * C Input left: - D Nothing matches directly with * C Use rule: ::= C + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + * Input left: - D Nothing matches directly with * Use rule: ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + * Input left: - D Use rule: ::= * + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: + Input left: - D Use rule: ::= + + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Input left: - D lthough is state, there is still input, so we re not done. No matching rules in ; so read next input token. + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: - Input left: D Doesn t match anything in. Read D from input. + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D State: - D Input: none Doesn t match anything in directly. So use ::= D Grammar Parse + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: - Input: none Doesn t match anything in. So use ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: - Input: none Doesn t match anything in. So use ::= + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D Grammar Parse State: Matches rule ::= - - Input: none + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parse of + B * C - D State: Input: none Done. Grammar Parse + B * C - D Copyright c 2002 2017 UMaine School of Computing and Information S 24 / 25

Parsers Grammar Parse Kinds of s/kinds of parsing: LL: left-to-right, leftmost derivation LR: left-to-right, rightmost derivation Different amounts of lookahead: LR(1), e.g. LL parsing: e.g., recursive-descent parsers LR parsing: e.g., shift-reduce parsers typically table-driven Copyright c 2002 2017 UMaine School of Computing and Information S 25 / 25