A language is a subset of the set of all strings over some alphabet. (A string is a sequence of symbols; an alphabet is a set of symbols.)


The current topic:
- Introduction
- Object-oriented programming: Python
- Functional programming: Scheme
- Python GUI programming (Tkinter)
- Types and values
- Logic programming: Prolog
  - Introduction
  - Rules, unification, resolution, backtracking, lists.
  - More lists, math, structures.
  - More structures, trees, cut.
  - Negation.
- Exceptions

Announcements
Reminder: The deadline for submitting a re-mark request for Term Test 2 is the end of class on Friday. Make sure you understand the posted solutions before submitting a re-mark request.
Reminder: Lab 3 is due on Monday (December 1st) at 10:30 am.

What's a language?
Goals and definitions
Parsing
Translation
Reference: Sebesta, chapters 3 and 4.

A language is a subset of the set of all strings over some alphabet.
- string: a sequence of symbols
- alphabet: a set of symbols
Example:
  alphabet: { a, b, c }
  language 1: all two-character strings from the alphabet
    { aa, ab, ac, ba, bb, bc, ca, cb, cc }
  language 2: all three-character strings that start and end with c
    { cac, cbc, ccc }
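To make these definitions concrete, here is a small Python sketch (not part of the original slides) that builds both example languages over the alphabet { a, b, c } by enumerating strings:

```python
from itertools import product

ALPHABET = ["a", "b", "c"]

# Language 1: all two-character strings over the alphabet.
language1 = {"".join(pair) for pair in product(ALPHABET, repeat=2)}

# Language 2: all three-character strings that start and end with 'c'.
language2 = {"".join(t) for t in product(ALPHABET, repeat=3)
             if t[0] == "c" and t[2] == "c"}

print(sorted(language1))  # ['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
print(sorted(language2))  # ['cac', 'cbc', 'ccc']
```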

What do you need to know about a language? Two things:
- What can you say?
- What does it mean?
"What can you say?" is syntax. e.g. In Python, a for loop must be written as...
"What does it mean?" is semantics. e.g. In Python, a for loop means that the following will happen...

Programming-language semantics
It's hard to specify the meaning of a statement in a programming language. Choices:
- Operational semantics: defines the effect of a program in terms of program execution on a lower-level machine. i.e. the meaning of a statement is the sequence of assembler statements it translates to, or the value of the expression it calculates. Similar to our usual explanations of meaning; we've been using (something like) this definition.
- Axiomatic semantics: used in program verification (in proofs of correctness). Defines the effect of a program in terms of preconditions and postconditions of individual statements.
- Denotational semantics: gives meaning in terms of mappings (functions) from statements to changes of system state, where changes of system state are represented mathematically (using recursion).
References: Sebesta, sections 3.4 and 3.5.

Programming-language syntax
Recall that there are standard ways to specify the form of a statement in a programming language:
- Backus-Naur Form (BNF)
- Extended BNF (EBNF), which adds alternatives and repetition to BNF
Basic idea: a program consists of parts, each of which consists of subparts, and so on. The parts are of various kinds (functions, expressions, literal values, ...). The structure expands into a tree -- the parse tree. At the leaves of the tree are the actual statements and expressions of a particular program. That is: every program has its own parse tree.

Backus-Naur form (BNF)
- On the left, the form being described.
- Then, the ::= (or →) symbol.
- Then, the components allowed in the form, with pipes used to separate multiple allowed definitions of the same form.
- Components that must be exactly as shown are put in quotation marks: e.g. '='.
- The names of the form and of components that are forms themselves are put in angle brackets < > or distinguished by a special typeface.
In other words, BNF is like this:
  <BNF description> ::= <form> '::=' <components>

Extended BNF
Repetition: use curly brackets, which mean "zero or more" of their contents.
  <comma expr> ::= <expr> { ',' <expr> }
A comma expression consists of one or more expressions, separated by commas.
Optional parts: use square brackets, which mean "zero or one" of their contents.
  <if stmt> ::= 'if' '(' <expr> ')' <stmt> [ 'else' <stmt> ]
An "if" statement has at most one "else" clause.
Alternatives: use brackets and pipes, which mean "choose exactly one".
  <expr> ::= <vbl> ( '+' | '-' | '*' | '/' ) <vbl>
An expression consists of a variable, followed by exactly one of the symbols, followed by another variable.
BNF and Extended BNF are equivalent in what they can describe, but Extended BNF improves readability.

A standard example, and its parse tree
  2 * (3 + 4)
Parse tree: <expr> at the root; below it, a <term> expanding to factor '*' factor; the first factor is the literal 2, and the second factor is '(' <expr> ')' containing 3 + 4.

EBNF for the grammar of the standard example
An expression is the addition and/or subtraction of a sequence of terms:
  <expr> ::= <term> { ( '+' | '-' ) <term> }
A term is the multiplication and/or division of a sequence of factors:
  <term> ::= <factor> { ( '*' | '/' ) <factor> }
A factor is an explicit constant, or a parenthesized expression:
  <factor> ::= <literal> | '(' <expr> ')'
Note the quotation marks around the parentheses: the parentheses are part of the language being described.
A literal is a sequence of digits:
  <literal> ::= '0' .. '9' { '0' .. '9' }

Productions
We now continue describing how to specify syntax.
A rule such as this is a production:
  <expr> ::= <term> { ( '+' | '-' ) <term> }
The left-hand side of a production is a nonterminal. The right-hand side consists of a mixture of terminals and nonterminals.
The production says that, when you're trying to see how a program can be understood in terms of the language's syntax, it's legal to replace the left-hand nonterminal by the string on the right. That is, you can expand the LHS by replacing it with the RHS.

Terminals and nonterminals
Terminals are things that can be actually present in the program text: e.g. '(', '3'.
Nonterminals are categories that have to be detailed further before you reach terminals.
- internal nodes in the syntax tree (a syntax tree is a more detailed version of a parse tree)
- e.g. <expr>, <factor>, <if statement>

Grammars
A grammar consists of:
- a set of terminals -- the alphabet
- a set of nonterminals
- a particular nonterminal called the start symbol (in our example, the start symbol is <expr>)
- a set of productions
A grammar defines what you can say in a language -- that is, it specifies the syntax.

An example grammar
  alphabet: { a }
  nonterminals: { <S> }
  start symbol: <S>
  productions: <S> ::= ε | a<S>   (ε is the empty string)
The language is { aⁿ | n ≥ 0 }, that is: ε, a, aa, aaa, aaaa, ...

Another example grammar
Productions:
  <sentence> ::= <subject> <verb> <object> '.'
  <subject> ::= <article> <noun>
  <verb> ::= 'walks' | 'bites'
  <object> ::= <article> <noun>
  <article> ::= 'a' | 'the'
  <noun> ::= 'man' | 'dog'
Some legal statements in this language:
  the man walks the dog.
  a dog walks the man.
  the dog walks a dog.
  the dog bites a man.
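The first example grammar can be turned directly into a recognizer. The following Python sketch (not from the slides; the function name S is just a mnemonic for the nonterminal) checks membership in { aⁿ | n ≥ 0 } by following the two productions:

```python
# Recognizing the language { a^n | n >= 0 } by mirroring its productions,
#   <S> ::= ε | a<S>
def S(s: str) -> bool:
    """Return True iff the string s can be derived from <S>."""
    if s == "":            # production  <S> ::= ε
        return True
    if s[0] == "a":        # production  <S> ::= a<S>
        return S(s[1:])    # the rest of the string must again derive from <S>
    return False

for candidate in ["", "a", "aaaa", "ab", "baa"]:
    print(repr(candidate), S(candidate))
# '' True, 'a' True, 'aaaa' True, 'ab' False, 'baa' False
```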

Parsing
To parse a statement is to show how it can be derived from the language's grammar.
- The steps in parsing a statement are derivations.
- At each derivation, one production is applied to advance the parsing process.
- The complete set of derivations is a derivation sequence.
This material is covered in much more detail in CSC467 (Compilers and interpreters).

Our example again
The "statement" to be parsed: 2*(3+4)
The grammar we'll use:
  <expr> ::= <term> { ( '+' | '-' ) <term> }
  <term> ::= <factor> { ( '*' | '/' ) <factor> }
  <factor> ::= <literal> | '(' <expr> ')'
  <literal> ::= '0' .. '9' { '0' .. '9' }

A derivation sequence for the example
2 * (3 + 4)

  Production applied        The state so far
  expr → term               term
  term → factor * factor    factor * factor
  factor → literal          literal * factor
  literal → 2               2 * factor
  factor → ( expr )         2 * ( expr )
  expr → term + term        2 * ( term + term )
  term → factor             2 * ( factor + factor )
  factor → literal          2 * ( literal + factor )
  literal → 3               2 * ( 3 + factor )
  factor → literal          2 * ( 3 + literal )
  literal → 4               2 * ( 3 + 4 )

The derivation we carried out was a leftmost derivation.
- At each step, the leftmost nonterminal was replaced by using an appropriate production.
- At each step, we could tell from the input (from the next "unexplained" terminal(s)) which production to choose.
What about the <expr> → <term> and <term> → <factor> choices?
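For reference, here is a hypothetical Python sketch (not from the slides) that replays a strict leftmost derivation of 2*(3+4), replacing exactly one nonterminal per step (so it also prints the intermediate "2 * ( factor + term )" form that the table above compresses). The list of chosen right-hand sides is written out by hand for this particular input:

```python
# Replay a leftmost derivation: at each step, replace the leftmost nonterminal
# in the sentential form by a chosen right-hand side, and print the result.
NONTERMINALS = {"expr", "term", "factor", "literal"}

# The right-hand side chosen for the leftmost nonterminal at each step.
steps = [
    ["term"],                      # expr    -> term
    ["factor", "*", "factor"],     # term    -> factor * factor
    ["literal"],                   # factor  -> literal
    ["2"],                         # literal -> 2
    ["(", "expr", ")"],            # factor  -> ( expr )
    ["term", "+", "term"],         # expr    -> term + term
    ["factor"],                    # term    -> factor
    ["literal"],                   # factor  -> literal
    ["3"],                         # literal -> 3
    ["factor"],                    # term    -> factor
    ["literal"],                   # factor  -> literal
    ["4"],                         # literal -> 4
]

form = ["expr"]                    # the start symbol
for rhs in steps:
    i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
    form[i:i + 1] = rhs            # replace the leftmost nonterminal by the RHS
    print(" ".join(form))
# The final line printed is: 2 * ( 3 + 4 )
```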

The parse tree for the example
With all intermediate steps included, the parse tree may be called the "syntax tree". For 2 * (3 + 4): expr at the root; below it, term expands to factor '*' factor; the first factor expands through literal to 2; the second factor expands to '(' expr ')', whose expr expands to term + term and, through literals, to 3 and 4.

Semantics from the syntax
The operator precedence (multiplication/division performed before addition/subtraction) is implied by the syntax:
  <expr> ::= <term> { ( '+' | '-' ) <term> }
  <term> ::= <factor> { ( '*' | '/' ) <factor> }
This makes it easier to generate code that corresponds to the semantics, without having to provide rules that state the semantics explicitly.

An ambiguous (and an unusable?) grammar
  E ::= E '+' E | E '*' E | '(' E ')' | number
Why is it ambiguous? Because a given expression may have more than one parse tree. See next slide.
  F ::= F { ( '+' | '-' ) F }
Why might it be unusable? Because, for example, if we represent productions by functions, then the first step in the function F( ) will be to call F( ).
We avoided both problems because of the way we wrote the grammar in our previous example. We used different nonterminals for operations of different precedence.

Ambiguity: two parse trees for 2 * 3 + 4
One tree groups the expression as 2 * (3 + 4): the root E expands to E '*' E, with 2 on the left and E '+' E over 3 and 4 on the right. The other groups it as (2 * 3) + 4: the root E expands to E '+' E, with E '*' E over 2 and 3 on the left and 4 on the right.
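To see why ambiguity matters, the following sketch (not from the slides) writes the two parse trees for 2 * 3 + 4 as nested tuples and evaluates them; the two trees give different values, i.e. different meanings:

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(node):
    """Evaluate a tree: a leaf is a number, an internal node is (op, left, right)."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

tree_a = ("*", 2, ("+", 3, 4))   # E -> E '*' E, then the right E -> E '+' E
tree_b = ("+", ("*", 2, 3), 4)   # E -> E '+' E, then the left E -> E '*' E

print(evaluate(tree_a))  # 14
print(evaluate(tree_b))  # 10  -- the conventional reading, with '*' binding tighter
```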

Operational semantics
We're using "operational semantics": the meaning of a statement is what it does when you run it. That is, if we translate the statement to machine or assembly language, then we have given its meaning. Given the parse tree of a statement, we can do a tree traversal to get the corresponding code.

Code for 2*(3+4) in terms of stack operations
A postorder traversal of the parse tree ("postorder": visit a node's children before visiting the node itself):
- At operand leaves, emit code to push the operand.
- At operator leaves, do nothing.
- At internal nodes that have an operator as a child, emit code to call the operator.
- At other internal nodes and at parentheses, do nothing.
Code:
  push 2
  push 3
  push 4
  +
  *

Code for 2*(3+4) in assembly language
A postorder traversal of the parse tree:
- At operand leaves, emit code to load the operand into an available register.
- At operator leaves, do nothing.
- At internal nodes that have an operator as a child, emit code to call the operator on the appropriate registers.
- At other internal nodes and at parentheses, do nothing.
Code:
  ldi #2, r1
  ldi #3, r2
  ldi #4, r3
  add r3, r2
  mul r2, r1

Points glossed over
- Register management: the set of registers is smaller than the number of values and subexpressions in a program, and is different in different computers. How do we decide which values to move to main memory while making best use of the registers in the CPU?
- We assumed we had an explicit parse tree. More likely, if we analyze the expression recursively, with a function for every nonterminal, we can generate code as the parser runs instead of building the entire tree data structure.
We'll ignore register management, but spend some time on recursive-descent parsing.
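The stack-code scheme is easy to express directly. Here is a minimal Python sketch (not from the slides) that does the postorder traversal over the parse tree for 2*(3+4), represented here as nested (operator, left, right) tuples:

```python
# Postorder traversal of a parse tree, emitting "push" for operand leaves and the
# operator name for internal nodes that carry an operator.
def emit_stack_code(node, out):
    if isinstance(node, int):          # operand leaf: push its value
        out.append(f"push {node}")
        return
    op, left, right = node             # internal node with an operator child
    emit_stack_code(left, out)         # postorder: children first...
    emit_stack_code(right, out)
    out.append(op)                     # ...then the operator itself

tree = ("*", 2, ("+", 3, 4))           # parse tree for 2 * (3 + 4)
code = []
emit_stack_code(tree, code)
print("\n".join(code))
# push 2
# push 3
# push 4
# +
# *
```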

Recursive-descent parsing
The productions of the grammar:
  <expr> ::= <term> { ( '+' | '-' ) <term> }
  <term> ::= <factor> { ( '*' | '/' ) <factor> }
  <factor> ::= <literal> | '(' <expr> ')'
  <literal> ::= '0' .. '9' { '0' .. '9' }
Every nonterminal is represented by a function: expr(), term(), and so on.
The start symbol is <expr>, so the parsing process begins with a call of expr().
The productions for <expr> are <term> { ( '+' | '-' ) <term> }, so expr() begins something like this:

  term();
  while (true) {
      token = getnexttoken();
      if (token == '+') { emit("add..."); term(); }
      else if (token == '-') { emit("sub..."); term(); }
      else return;
  }

What term() and factor() do
A <term> starts with a <factor>, so term() looks like expr(): it calls factor() to do most of its work.
A <factor> can begin in several ways:
  <factor> ::= <literal> | '(' <expr> ')'
What should factor() begin by doing? It depends on what's next in the expression being parsed. That is, again you have to read the next input token.
- If the next token is '(', call expr() and then look for ')'.
- If that fails, try calling literal(), which if it succeeds will take the appropriate action: for example, emit "ldi <literal>, rn" to the object file.
Reference: Sebesta section 4.4
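Putting the pieces together, here is a compact Python sketch (not from the slides; names such as Parser, peek, and advance are illustrative) of the recursive-descent scheme: one function per nonterminal, each reading tokens and emitting stack-machine code as it goes, so no explicit parse tree is ever built.

```python
class Parser:
    def __init__(self, text):
        self.tokens = list(text.replace(" ", ""))   # single-character tokens
        self.pos = 0
        self.code = []                              # emitted stack-machine code

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):        # <expr> ::= <term> { ( '+' | '-' ) <term> }
        self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            self.term()
            self.code.append("add" if op == "+" else "sub")

    def term(self):        # <term> ::= <factor> { ( '*' | '/' ) <factor> }
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            self.factor()
            self.code.append("mul" if op == "*" else "div")

    def factor(self):      # <factor> ::= <literal> | '(' <expr> ')'
        if self.peek() == "(":
            self.advance()                          # consume '('
            self.expr()
            assert self.advance() == ")", "expected ')'"
        else:
            self.literal()

    def literal(self):     # <literal> ::= a sequence of digits
        digits = ""
        while self.peek() is not None and self.peek().isdigit():
            digits += self.advance()
        assert digits, "expected a literal"
        self.code.append(f"push {digits}")

p = Parser("2 * (3 + 4)")
p.expr()
print("\n".join(p.code))   # push 2 / push 3 / push 4 / add / mul
```

Note how each while loop implements the EBNF repetition { ... } in the corresponding production, and how factor() chooses between its two alternatives by looking at the next token.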