Chapter 3: Syntax and Semantics. Syntax and Semantics. Syntax Definitions. Matt Evett Dept. Computer Science Eastern Michigan University 1999

Similar documents
Chapter 3. Syntax - the form or structure of the expressions, statements, and program units

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics

Describing Syntax and Semantics

Syntax and Semantics

Syntax. In Text: Chapter 3

Chapter 3. Describing Syntax and Semantics. Introduction. Why and How. Syntax Overview

Chapter 4. Syntax - the form or structure of the expressions, statements, and program units

Programming Languages

3. DESCRIBING SYNTAX AND SEMANTICS

Semantics. There is no single widely acceptable notation or formalism for describing semantics Operational Semantics

Chapter 3 (part 3) Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics ISBN

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Topics. Languages. Formal Definition of Languages. BNF and Context-Free Grammars. Grammar 2/4/2019

Programming Language Syntax and Analysis

Defining Languages GMU

10/30/18. Dynamic Semantics DYNAMIC SEMANTICS. Operational Semantics. An Example. Operational Semantics Definition Process

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Syntax & Semantics. COS 301 Programming Languages

Chapter 3. Describing Syntax and Semantics

ICOM 4036 Spring 2004

P L. rogramming anguages. Fall COS 301 Programming Languages. Syntax & Semantics. UMaine School of Computing and Information Science

10/26/17. Attribute Evaluation Order. Attribute Grammar for CE LL(1) CFG. Attribute Grammar for Constant Expressions based on LL(1) CFG

Syntax. A. Bellaachia Page: 1

Chapter 3. Semantics. Topics. Introduction. Introduction. Introduction. Introduction

COSC252: Programming Languages: Semantic Specification. Jeremy Bolton, PhD Adjunct Professor

10/18/18. Outline. Semantic Analysis. Two types of semantic rules. Syntax vs. Semantics. Static Semantics. Static Semantics.

PRINCIPLES OF PROGRAMMING LANGUAGES

Habanero Extreme Scale Software Research Project

Programming Language Definition. Regular Expressions

Programming Languages Third Edition

Program Assignment 2 Due date: 10/20 12:30pm

EECS 6083 Intro to Parsing Context Free Grammars

Chapter 4. Lexical and Syntax Analysis

Unit-1. Evaluation of programming languages:

COP 3402 Systems Software Syntax Analysis (Parser)

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Crafting a Compiler with C (II) Compiler V. S. Interpreter

This book is licensed under a Creative Commons Attribution 3.0 License

4. Lexical and Syntax Analysis

Chapter 3. Semantics. Topics. Introduction. Introduction. Introduction. Introduction

CPS 506 Comparative Programming Languages. Syntax Specification

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Parsing II Top-down parsing. Comp 412

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

4. Lexical and Syntax Analysis

A programming language requires two major definitions A simple one pass compiler

CS 230 Programming Languages

Introduction to Parsing

Part 5 Program Analysis Principles and Techniques

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CSE 3302 Programming Languages Lecture 2: Syntax

CS323 Lecture - Specifying Syntax and Semantics Last revised 1/16/09

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Principles of Programming Languages COMP251: Syntax and Grammars

CMSC 330: Organization of Programming Languages

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

3. Context-free grammars & parsing

Dr. D.M. Akbar Hussain

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Homework & Announcements

CMSC 330: Organization of Programming Languages

CS 314 Principles of Programming Languages

CMSC 330: Organization of Programming Languages

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Programming Languages, Summary CSC419; Odelia Schwartz

A Simple Syntax-Directed Translator

FreePascal changes: user documentation

CMSC 330: Organization of Programming Languages. Context Free Grammars

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Introduction. Introduction. Introduction. Lexical Analysis. Lexical Analysis 4/2/2019. Chapter 4. Lexical and Syntax Analysis.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSCE 314 Programming Languages

Context-free grammars (CFG s)

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Lexical and Syntax Analysis

CS 406/534 Compiler Construction Parsing Part I

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CMSC 330: Organization of Programming Languages. Context Free Grammars

CMSC 330: Organization of Programming Languages. Context Free Grammars

Defining syntax using CFGs

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CSE302: Compiler Design

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

A simple syntax-directed

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

Contents. Chapter 1 SPECIFYING SYNTAX 1

Informatica 3 Syntax and Semantics

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part2 3.3 Parse Trees and Abstract Syntax Trees

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

Transcription:

Chapter 3: Syntax and Semantics Matt Evett Dept. Computer Science Eastern Michigan University 1999 Syntax and Semantics Syntax - the form or structure of the expressions, statements, and program units Semantics - the meaning of the expressions, statements, and program units Who must use language definitions? Other language designers Implementors Programmers (the users of the language) Matt Ev ett 2 Syntax Definitions A sentence is a string of characters over some alphabet A language is a set of sentences A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin) A token is a category of lexemes (e.g., identifier) Matt Ev ett 3

Using Formal Syntax Two general uses of formally defined languages: Recognizers - used in compilers. Given a syntax and a string, is the string sentence of the language? Generators - what we'll study. Given a syntax, generate legal sentences. Matt Ev ett 4 Formal Language Description Context-Free Grammars Developed by Noam Chomsky in the mid-50s Language generators, meant to describe the syntax of natural languages Defined a class of languages called contextfree languages Backus Normal Form (1959) Invented by John Backus to describe Algol 58 BNF is equivalent to context-free grammars A metalanguage for computer languages A language used Matt to Ev describe ett other languages. 5 BNF Abstractions are used to represent classes of syntactic structures Act like syntactic variables (also called nonterminal symbols) <while_stmt> -> while <logic_expr> do <stmt> This is a rule describing the structure of a while statement Matt Ev ett 6

BNF Grammar A grammar is a finite, nonempty set of rules (R), plus sets of terminal (T) and nonterminal (N) symbols. A rule has a left-hand side (LHS) and a righthandside (RHS) The LH is a single terminal symbol The RHS consisting of terminal and nonterminal symbols The sets of terminals (T) and nonterminals (N) are mutually exclusive. Nonterminals are indicated Matt Ev ett with <... > 7 BNF Rule An abstraction (or nonterminal symbol) can have more than one RHS <stmt> -> <single_stmt> begin <stmt_list> end BNF rules are often recursive. Ex: a list <ident_list> -> ident ident, <ident_list> Matt Ev ett 8 Example Grammar 1. <program> -> <stmts> 2. <stmts> -> <stmt> <stmt> ; <stmts> 3. <stmt> -> <var> = <expr> 4. <var> -> a b c d 5. <expr> -> <term> + <term> <term> - <term> 6. <term> -> <var> const Matt Ev ett 9

An example derivation: A derivation is a repeated application of rules, starting with the start symbol (εn)and yielding a sentence (all terminal symbols). <program> => <stmts> => <stmt> R1 and R2 => <var> = <expr> => a = <expr> R3, R4 => a = <term> + <term> R5a => a = <var> + <term> R6a => a = b + <term> R4 => a = b + const R6b Matt Ev ett 10 Derivations Every string of symbols in a derivation is a sentential form. A sentence is a sentential form that has only terminal symbols. A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded A derivation may be leftmost, rightmost or neither. Matt Ev ett 11 Parse Tree A parse tree is a hierarchical rep. of a derivation. A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees <program> <stmts> <stmt> <var> = <expr> a <term> + <term> <var> const b Matt Ev ett 12

Grammars & Languages A language is a set (possibly empty) of strings. A grammar, G, generates or defines a language, L, iff exactly those strings comprising L can be derived with G. All elements of L must be derivable with G. There must be no derivations for any strings not in L. Matt Ev ett 13 Ex: an ambiguous expression grammar <expr> -> <expr> <op> <expr> const <op> -> / - <expr> <expr> <op> <expr> <expr> <expr> <op> <expr> <expr> <op> <expr> const - const / const <expr> <op> <expr> const - const / const Matt Ev ett 14 Controlling Ambiguity Careful tinkering can convert ambiguous languages into equivalent unambiguous ones. An unambiguous expression grammar: <expr> -> <expr> - <term> <term> <term> -> <term> / const const <expr> => <expr> - <term> => <term> - <term> => const - <term> => const - <term> / const => const - const / const <expr> <expr> - <term> <term> <term> / const const const Matt Ev ett 15

Encoding Precedence Suppose evaluation of subexpressions of an arithmatic expression depended on their location within the parse tree; bottom-up <assgn> --> <id> = <expr> <id> --> A B C <expr> --> <expr> + <term> <term> <term> --> <term> * <factor> <factor> <factor> --> ( <expr> ) <id> Try parsing A = B * C + A Matt Ev ett 16 Encoding Associativity Operator associativity can also be indicated by a grammar <expr> -> <expr> + <expr> const (ambiguous) <expr> -> <expr> + const const (unambiguous) <expr> <expr> + const <expr> + const const Matt Ev ett 17 Extended BNF Extended BNF (just abbreviations): Optional parts are placed in brackets ([]) <proc_call> -> ident [ ( <expr_list>)] Put alternative parts of RHSs in parentheses and separate them with vertical bars <term> -> <term> (+ -) const Put repetitions (0 or more) in braces ({}) <ident> -> letter {letter digit} Matt Ev ett 18

Example of EBNF BNF: <expr> -> <expr> + <term> <expr> - <term> <term> <term> -> <term> * <factor> <term> / <factor> <factor> EBNF: <expr> -> <term> {(+ -) <term>} <term> -> <factor> {(* /) <factor>} Matt Ev ett 19 Syntax Graphs Syntax Graphs - put the terminals in circles or ellipses and put the nonterminals in rectangles; connect with lines with arrowheads EX: Pascal type declarations ( Type_identifier Identifier ), constant.. constant Matt Ev ett 20 Recursive Descent Parsing Parsing is the process of tracing or constructing a parse tree for an input string Parsers usually do not analyze lexemes that is done by a lexical analyzer, which is called by the parser A recursive descent parser traces out a parse tree in top-down order; it is a top-down parser Each nonterminal in the grammar has a subprogram associated with it; the subprogram parses all sentential forms that the nonterminal can generate Matt Ev ett 21

Building Recursive Descent Parser Each grammar rule yields one recursive descent parsing subprogram. Example: For the grammar: <term> -> <factor> {(* /) <factor>} We could use the following recursive descent parsing subprogram. void term() { factor(); /* parse the first factor*/ while (next_token == ast_code next_token == slash_code) { lexical(); /* get next token */ factor(); /* parse the next factor */ } } Matt Ev ett 22 Limitations of RDP Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars Imagine the code that would derive from: <A> --> <A> + <C> void term() { A(); /* parse the lhs argument*/ if (next_token!= plus_code) { error(); return; } lexical(); /* get next token */ C(); /* parse the rhs */ } } Matt Ev ett 23 Static Semantics Static semantics ( have nothing to do with meaning) Categories: 1. Context-free (e.g. type checking), tends to be cumbersome 2. Noncontext-free (e.g. variables must be declared before they are used) Matt Ev ett 24

Attribute Grammars (Knuth, 1968) Cfgs cannot describe all of the syntax of programming languages E.g. type info Additions to cfgs to carry some semantic info along through parse trees Primary value of AGs: Static semantics specification Compiler design(static semantics checking) Matt Ev ett 25 Static Semantics Information that is difficult to encode with CFG. Could be encoded using CSG, but then it is more difficult to generate compilers. Static because the sentence validity can be checked at compile-time Matt Ev ett 26 Define Attribute Grammar Def: An attribute grammar is a cfg G = (S, N, T, P) with the following additions: For each grammar symbol x there is a set A(x) of attribute values Each rule has a set of functions that define certain attributes of the nonterminals in the rule Each rule has a (possibly empty) set of predicates to check for attribute consistency Matt Ev ett 27

AG Components Let X0 -> X1... Xn be a rule. Functions of the form S(X0) = f(a(x1),... A(Xn)) define synthesized attributes Functions of the form I(Xj) = f(a(x0),..., A(Xn)), for i <= j <= n, define inherited attributes Initially, there are intrinsic attributes on the parse tree leaves Matt Ev ett 28 Example AG (1) Example: expressions of the form id + id id's can be either int_type or real_type types of the two id's must be the same type of the expression must match it's expected type BNF: <expr> -> <var> + <var> <var> -> id Attributes: actual_type - synthesized for <var> and <expr> Matt Ev ett 29 expected_type - inherited for <expr> Ex: G and its Attributes The CFG rules may be augmented with [ ] Syntax rule: <expr> -> <var>[1] + <var>[2] Semantic rules: <var>[1].env <expr>.env <var>[2].env <expr>.env <expr>.actual_type <var>[1].actual_type Predicate: <var>[1].actual_type = <var>[2].actual_type <expr>.expected_type = <expr>.actual_type Syntax rule: <var> -> id Semantic rule: <var>.actual_type <- lookup (id, <var>.env) Matt Ev ett 30

Computing Attributes How to compute attributes? If all attributes were inherited, the tree could be decorated in top-down order. If all attributes were synthesized, the tree could be decorated in bottom-up order. In most cases, both kinds of attributes are used, requiring a combination of top-down and bottom-up decoration. Matt Ev ett 31 Computing Attributes (2) 1. <expr>.env inherited from parent <expr>.expected_type inherited from parent 2. <var>[1].env <expr>.env (inherited...) <var>[2].env <expr>.env 3. <var>[1].actual_type lookup (A, <var>[1].env) (synthesized...) <var>[2].actual_type lookup (B, <var>[2].env) <var>[1].actual_type =? <var>[2].actual_type (a predicate) 4. <expr>.actual_type <var>[1].actual_type <expr>.actual_type =? <expr>.expected_type Matt Ev ett 32 Annotate a parse tree See the board. Matt Ev ett 33

Dynamic Semantics No single widely acceptable notation or formalism for describing semantics Operational semantics Axiomatic semantics Denotational semantics Matt Ev ett 34 - Operational Semantics Describe the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement To use operational semantics for a high-level language, a VM in needed A hardware pure interpreter would be too expensive A software pure interpreter also has problems: The detailed characteristics of the particular Matt Ev ett 35 computer would make actions difficult to Idealized VM A better alternative: A complete computer simulation The process: 1. Build a translator (translates source code to the machine code of an idealized computer) 2. Build a simulator for the idealized computer Matt Ev ett 36

Value of Operational Semantics Good if used informally Extremely complex if used formally (e.g., VDL) Matt Ev ett 37 Axiomatic Semantics Based on formal logic (first order predicate calculus) Original purpose: formal program verification Approach: Define axioms or inference rules for each statement type in the language (to allow transformations of expressions to other expressions) The expressions are called assertions Matt Ev ett 38 Conditions An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution An assertion following a statement is a postcondition A weakest precondition is the least restrictive precondition guaranteeing a postcondition Pre-post form: {P} statement {Q} An example: a := b + 1 {a > 1} One possible precondition: {b > 10} Weakest precondition: {b > 0} Matt Ev ett 39

Axioms - An axiom for assignment statements: {Q x->e } x := E {Q} - The Rule of Consequence: {P} S {Q}, P' => P, Q => Q' {P'} S {Q'} Matt Ev ett 40 Sequences - An inference rule for sequences (the Chaining Rule) - For a sequence S1;S2: {P1} S1 {P2} {P2} S2 {P3} the inference rule is: {P1} S1 {P2}, {P2} S2 {P3} {P1} S1; S2 {P3} Matt Ev ett 41 Axiomatic Proof Process Proving program correctness Program proof process: The postcondition for the whole program is the desired result. Work back through the program to the first statement, inferring preconditions. If the precondition on the first statement is the same as the program spec, the program is correct. Matt Ev ett 42

Inference Rules for Loops Very complicated! (Their use, that is.) Loop invariants Interested? See 3.6.2.6 Matt Ev ett 43 Loops (skip!) An inference rule for logical pretest loops For the loop construct: {P} while B do S end {Q} the inference rule is: (I and B) S {I} {I} while B do S {I and (not B)} where I is the loop invariant. Matt Ev ett 44 Invariants Characteristics of the loop invariant, I: 1. P => I (the invariant must be true initially) 2. {I} B {I} (evaluation of the Boolean must not change the validity of I) 3. {I and B} S {I} (I is not changed by executing the body of the loop) 4. (I and (not B)) => Q (if I is true and B is false, Q is implied) 5. The loop terminates (this can be difficult to prove) Matt Ev ett 45

Invariants (cont.) - The loop invariant I is a weakened version of the loop postcondition, and it is also a precondition. - I must be weak enough to be satisfied prior to the beginning of the loop, but when combined with the loop exit condition, it must be strong enough to force the truth of the postcondition Matt Ev ett 46 Developing Axiomatic Semantics Evaluation of axiomatic semantics: Developing axioms or inference rules for all of the statements in a language is difficult It is a good tool for correctness proofs, and an excellent framework for reasoning about programs, but it is not as useful for language users and compiler writers Matt Ev ett 47 Denotational Semantics Based on recursive function theory The most abstract semantics description method Originally developed by Scott and Strachey (1970) Matt Ev ett 48

Denotational Semantics (2) The process of building a denotational spec for a language: 1. Define a mathematical object for each language entity 2. Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects The meaning of language constructs are defined by only the values of the program's variables Meaning is assigned to grammar rules containing only a terminal as the RHS. Matt Ev ett 49 Denotational vs. Operational The difference between denotational and operational semantics: In operational semantics, the state changes are defined by coded algorithms; in denotational semantics, they are defined by rigorous mathematical functions The state of a program is the values of all its current variables s = {<i1, v1>, <i2, v2>,, <in, vn>} Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable VARMAP(ij, s) = vj Matt Ev ett 50 D.S. for Numbers 1. Decimal Numbers <dec_num> 0 1 2 3 4 5 6 7 8 9 <dec_num> (0 1 2 3 4 5 6 7 8 9) M dec ('0') = 0, M dec ('1') = 1,, M dec ('9') = 9 M dec (<dec_num> '0') = 10 * M dec (<dec_num>) M dec (<dec_num> '1 ) = 10 * M dec (<dec_num>) + 1 M dec (<dec_num> '9') = 10 * M dec (<dec_num>) + 9 Matt Ev ett 51

D.S. of Numeric Expressions M e (<expr>, s) = case <expr> of <dec_num> => M dec (<dec_num>, s) <var> => if VARMAP(<var>, s) = undef then error else VARMAP(<var>, s) <binary_expr> => if (M e (<binary_expr>.<left_expr>, s) = undef OR M e (<binary_expr>.<right_expr>, s) = undef) then error else if (<bi nary_expr>.<operator> = ë+í then M e (<binary_expr>.<left_expr>, s) + M e (<binary_expr>.<right_expr>, s) else M e (<binary_expr>.<left_expr>, s) * M e (<binary_expr>.<right_expr>, s) Matt Ev ett 52 D.S. Assignments & Loops Ma(x := E, s) = if Me(E, s) = error then error else s = {<i1,v1 >,<i2,v2 >,...,<in,vn >}, where for j = 1, 2,..., n, vj = VARMAP(ij, s) if ij <> x = Me(E, s) if i j = x 4 Logical Pretest Loops Ml(while B do L, s) = if Mb(B, s) = undef then error else if Mb(B, s) = false then s else if Msl(L, s) = error then error else Ml(while B do L, Msl(L, s)) Matt Ev ett 53 Loops - The meaning of the loop is the value of the program variables after the statements in the loop have been executed the prescribed number of times, assuming there have been no errors - In essence, the loop has been converted from iteration to recursion, where the recursive control is mathematically defined by other recursive state mapping functions - Recursion, when compared to iteration, is easier to describe with mathematical rigor Matt Ev ett 54

Use of D.S. Evaluation of denotational semantics: - Can be used to prove the correctness of programs - Provides a rigorous way to think about programs - Can be an aid to language design - Has been used in compiler generation systems Matt Ev ett 55