Intro to semantics; Small-step semantics Lecture 1 Tuesday, January 29, 2013


Harvard School of Engineering and Applied Sciences
CS 152: Programming Languages
Lecture 1, Tuesday, January 29, 2013

1 Intro to semantics

What is the meaning of a program? When we write a program, we use a sequence of characters to represent it. But this syntax is just how we represent the program; it is not what the program means.

Maybe we could define the meaning of a program to be whatever happens when we execute the program (perhaps using an interpreter, or by compiling it first). But interpreters and compilers can have bugs! That is, an interpreter or compiler may not accurately reflect the meaning of a program. So we must look elsewhere for a definition of what a program means.

One place to look for the meaning of a program is the language specification manual. Such manuals typically give an informal description of the language constructs. Another option is to give a formal, mathematical definition of the language semantics. A formal mathematical definition can have the following advantages over an informal description.

Less ambiguous. The behavior of the language is clearer, which is useful for anyone who needs to write programs in the language, implement a compiler or interpreter for the language, add a new feature to the language, etc.

More concise. Mathematical concepts and notation can clearly and concisely describe a language, and state restrictions on legal programs in the language. For example, the Java Language Specification (2nd edition) devotes a chapter (21 pages) to describing the concept of definite assignment, most of which describes, in English, a dataflow analysis that can be expressed more succinctly using mathematics.

Formal arguments. Most importantly, a formal semantics allows us to state, and prove, program properties that we are interested in.
For example: we can state and prove that all programs in a language are guaranteed to be free of certain run-time errors, or even free of security violations; or we can state the specification of a program and prove that the program meets the specification (i.e., that the program is guaranteed to produce the correct output for all possible inputs).

However, the drawback of formal semantics is that it can lead to fairly complex mathematical models, especially if one attempts to describe all the details of a full-featured modern language. Few real programming languages have a formal semantics, since modeling all the details of a real-world language is hard: real languages are complex, with many features. To describe these features, both the mathematics and the notation for modeling can get very dense, making the formal semantics difficult to understand. Indeed, sometimes novel mathematics and notation need to be developed to accurately model language features. So while there can be many benefits to having a formal semantics for a programming language, they do not yet outweigh the costs of developing and using a formal semantics for real programming languages.

There are three main approaches to formally specifying the semantics of programming languages:

- operational semantics: describes how a program would execute on an abstract machine;
- denotational semantics: models programs as mathematical functions;
- axiomatic semantics: defines program behavior in terms of the logical formulae that are satisfied before and after a program.

Each of these approaches has different advantages and disadvantages in terms of how mathematically sophisticated they are, how easy it is to use them in proofs, and how easy it is to implement an interpreter or compiler based on them.

2 A simple language of arithmetic expressions

To understand some of the key concepts of semantics, we will start with a very simple language of integer arithmetic expressions, with assignment. A program in this language is an expression; executing a program means evaluating the expression to an integer. To describe the structure of this language we will use the following domains:

    x, y, z ∈ Var
    n, m ∈ Int
    e ∈ Exp

Var is the set of program variables (e.g., foo, bar, baz, i, etc.). Int is the set of constant integers (e.g., 42, 40, 7). Exp is the domain of expressions, which we specify using a BNF (Backus-Naur Form) grammar:

    e ::= x | n | e1 + e2 | e1 × e2 | x := e1; e2

Informally, the expression x := e1; e2 means that x is assigned the value of e1 before evaluating e2. The result of the entire expression is that of e2.

This grammar specifies the syntax for the language. An immediate problem here is that the grammar is ambiguous. Consider the expression 1 + 2 × 3. One can build two abstract syntax trees for it: one with the multiplication nested under the addition, corresponding to 1 + (2 × 3), and one with the addition nested under the multiplication, corresponding to (1 + 2) × 3.

There are several ways to deal with this problem. One is to rewrite the grammar for the same language to make it unambiguous. But that makes the grammar more complex, and harder to understand. Another possibility is to extend the syntax to require parentheses around all expressions:

    e ::= x | n | (e1 + e2) | (e1 × e2) | x := e1; e2

However, this also leads to unnecessary clutter and complexity. Instead, we separate the concrete syntax of the language (which specifies how to unambiguously parse a string into program phrases) from the abstract syntax of the language (which describes, possibly ambiguously, the structure of program phrases). In this course we will use the abstract syntax and assume that the abstract syntax tree is known. When writing expressions, we will occasionally use parentheses to indicate the structure of the abstract syntax tree, but the parentheses are not part of the language itself.
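The two parses of 1 + 2 × 3 can be made concrete with an explicit representation of abstract syntax trees. The Python sketch below is a hypothetical illustration (the tuple tags are invented for this example, not part of the course notes):

```python
# A hypothetical tuple encoding of the abstract syntax:
#   ("var", x)           a variable
#   ("int", n)           an integer constant
#   ("add", e1, e2)      e1 + e2
#   ("mul", e1, e2)      e1 * e2
#   ("asg", x, e1, e2)   x := e1; e2

# The two abstract syntax trees for the ambiguous string "1 + 2 * 3":
t1 = ("add", ("int", 1), ("mul", ("int", 2), ("int", 3)))  # 1 + (2 * 3)
t2 = ("mul", ("add", ("int", 1), ("int", 2)), ("int", 3))  # (1 + 2) * 3

def show(e):
    """Render a tree with full parentheses, exposing its structure."""
    tag = e[0]
    if tag == "var":
        return e[1]
    if tag == "int":
        return str(e[1])
    if tag == "add":
        return "(%s + %s)" % (show(e[1]), show(e[2]))
    if tag == "mul":
        return "(%s * %s)" % (show(e[1]), show(e[2]))
    return "%s := %s; %s" % (e[1], show(e[2]), show(e[3]))  # assignment

print(show(t1))  # prints (1 + (2 * 3))
print(show(t2))  # prints ((1 + 2) * 3)
```

The same string thus corresponds to two distinct trees; once we work with trees rather than strings, the ambiguity disappears.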
(For details on parsing, grammars, and ambiguity elimination, see a compilers text or take the compiler course, Computer Science 153.)

3 Small-step operational semantics

At this point we have defined the syntax of our simple arithmetic language. We have some informal, intuitive notion of what programs in this language mean. For example, the program 7 + (4 × 2) should equal 15, and the program i := 6 + 1; 2 × 3 × i should equal 42. We would now like to define a formal semantics for this language.

Operational semantics describe how a program would execute on an abstract machine. A small-step operational semantics describes such an execution in terms of successive reductions of an expression, until we reach a number, which represents the result of the computation. The state of the abstract machine is usually referred to as a configuration, and for our language it must include two pieces of information:

- a store (aka environment or state), which assigns integer values to variables. During program execution, we will refer to the store to determine the values associated with variables, and also update the store to reflect assignment of new values to variables;
- the expression left to evaluate.

Thus, the domain of stores is the set of functions from Var to Int (written Var → Int), and the domain of configurations is the set of pairs of an expression and a store:

    Config = Exp × Store
    Store = Var → Int

We will denote configurations using angle brackets. For instance, ⟨(foo + 2) × (bar + 1), σ⟩ is a configuration where σ is a store and (foo + 2) × (bar + 1) is an expression that uses two variables, foo and bar.

The small-step operational semantics for our language is a relation → ⊆ Config × Config that describes how one configuration transitions to a new configuration. That is, the relation shows us how to evaluate programs, one step at a time. We use infix notation for the relation. That is, given any two configurations ⟨e1, σ1⟩ and ⟨e2, σ2⟩, if (⟨e1, σ1⟩, ⟨e2, σ2⟩) is in the relation, then we write ⟨e1, σ1⟩ → ⟨e2, σ2⟩. For example, we have ⟨(4 + 2) × y, σ⟩ → ⟨6 × y, σ⟩. That is, we can evaluate the configuration ⟨(4 + 2) × y, σ⟩ by one step, to get the configuration ⟨6 × y, σ⟩.

Now defining the semantics of the language boils down to defining the relation → that describes the transitions between machine configurations.

One issue here is that the domain of integers is infinite, and so is the domain of expressions. Therefore, there is an infinite number of possible machine configurations, and an infinite number of possible one-step transitions. We need to use a finite description for the infinite set of transitions.
We can compactly describe the transition relation using inference rules:

    VAR   ⟨x, σ⟩ → ⟨n, σ⟩   where n = σ(x)

              ⟨e1, σ⟩ → ⟨e1', σ'⟩
    LADD  -------------------------------
           ⟨e1 + e2, σ⟩ → ⟨e1' + e2, σ'⟩

              ⟨e2, σ⟩ → ⟨e2', σ'⟩
    RADD  ------------------------------
           ⟨n + e2, σ⟩ → ⟨n + e2', σ'⟩

    ADD   ⟨n + m, σ⟩ → ⟨p, σ⟩   where p is the sum of n and m

              ⟨e1, σ⟩ → ⟨e1', σ'⟩
    LMUL  -------------------------------
           ⟨e1 × e2, σ⟩ → ⟨e1' × e2, σ'⟩

              ⟨e2, σ⟩ → ⟨e2', σ'⟩
    RMUL  ------------------------------
           ⟨n × e2, σ⟩ → ⟨n × e2', σ'⟩

    MUL   ⟨n × m, σ⟩ → ⟨p, σ⟩   where p is the product of n and m

              ⟨e1, σ⟩ → ⟨e1', σ'⟩
    ASG1  ----------------------------------------
           ⟨x := e1; e2, σ⟩ → ⟨x := e1'; e2, σ'⟩

    ASG   ⟨x := n; e2, σ⟩ → ⟨e2, σ[x ↦ n]⟩

The meaning of an inference rule is that if the facts above the line hold and the side conditions also hold, then the fact below the line holds. The facts above the line are called premises; the fact below the line is

called the conclusion. The rules without premises are axioms; the rules with premises are inductive rules.

Also, we use the notation σ[x ↦ n] for the store that maps the variable x to the integer n, and maps every other variable to whatever σ maps it to. More explicitly, if f is the function σ[x ↦ n], then we have

    f(y) = n      if y = x
    f(y) = σ(y)   otherwise

4 Using the Semantic Rules

Let's see how we can use these rules. Suppose we want to evaluate the expression (foo + 2) × (bar + 1) in a store σ where σ(foo) = 4 and σ(bar) = 3. That is, we want to find the transition for the configuration ⟨(foo + 2) × (bar + 1), σ⟩. For this, we look for a rule with this form of configuration in the conclusion. By inspecting the rules, we find that the only matching rule is LMUL, where e1 = foo + 2 and e2 = bar + 1, but e1' is not yet known. We can instantiate the rule, replacing the metavariables e1 and e2 with the appropriate expressions:

             ⟨foo + 2, σ⟩ → ⟨e1', σ⟩
    --------------------------------------------------
    ⟨(foo + 2) × (bar + 1), σ⟩ → ⟨e1' × (bar + 1), σ⟩

Now we need to show that the premise actually holds and find out what e1' is. We look for a rule whose conclusion matches ⟨foo + 2, σ⟩ → ⟨e1', σ⟩. We find that LADD is the only matching rule:

        ⟨foo, σ⟩ → ⟨e1'', σ⟩
    ------------------------------
    ⟨foo + 2, σ⟩ → ⟨e1'' + 2, σ⟩

where e1' = e1'' + 2. We repeat this reasoning for ⟨foo, σ⟩ → ⟨e1'', σ⟩, and we find that the only applicable rule is the axiom VAR:

    VAR   ⟨foo, σ⟩ → ⟨4, σ⟩

because we have σ(foo) = 4. Since this is an axiom and has no premises, there is nothing left to prove. Hence, e1'' = 4 and e1' = 4 + 2. We can put together the above pieces and build the following proof:

    VAR   ⟨foo, σ⟩ → ⟨4, σ⟩
        ------------------------------
          ⟨foo + 2, σ⟩ → ⟨4 + 2, σ⟩
    ------------------------------------------------------
    ⟨(foo + 2) × (bar + 1), σ⟩ → ⟨(4 + 2) × (bar + 1), σ⟩

This proves that, given our inference rules, the one-step transition ⟨(foo + 2) × (bar + 1), σ⟩ → ⟨(4 + 2) × (bar + 1), σ⟩ is possible. The above proof structure is called a proof tree or derivation. It is important to keep in mind that proof trees must be finite for the conclusion to be valid.
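As a sanity check, the inference rules can be transcribed into executable code. The Python sketch below is my own illustration, not part of the course notes: expressions are encoded as tuples such as ("var", "foo"), ("int", 2), ("add", e1, e2), ("mul", e1, e2), and ("asg", "x", e1, e2), and `step` implements one transition of the small-step relation, with σ[x ↦ n] realized as a non-destructive dictionary update.

```python
def update(sigma, x, n):
    """The store sigma[x -> n]: maps x to n, every other variable as sigma does."""
    out = dict(sigma)  # copy, so the original store is left unchanged
    out[x] = n
    return out

def step(e, sigma):
    """One small step: map the configuration (e, sigma) to the next configuration."""
    tag = e[0]
    if tag == "var":                          # VAR: look the variable up in the store
        return ("int", sigma[e[1]]), sigma
    if tag in ("add", "mul"):
        op, e1, e2 = e
        if e1[0] != "int":                    # LADD / LMUL: reduce the left operand
            e1p, sigmap = step(e1, sigma)
            return (op, e1p, e2), sigmap
        if e2[0] != "int":                    # RADD / RMUL: reduce the right operand
            e2p, sigmap = step(e2, sigma)
            return (op, e1, e2p), sigmap
        n, m = e1[1], e2[1]                   # ADD / MUL: combine two integers
        return ("int", n + m if op == "add" else n * m), sigma
    if tag == "asg":
        _, x, e1, e2 = e
        if e1[0] != "int":                    # ASG1: reduce the right-hand side
            e1p, sigmap = step(e1, sigma)
            return ("asg", x, e1p, e2), sigmap
        return e2, update(sigma, x, e1[1])    # ASG: update the store, continue with e2
    raise ValueError("final configuration: no rule applies")

# The first step of <(foo + 2) * (bar + 1), sigma> with sigma(foo) = 4, sigma(bar) = 3:
sigma = {"foo": 4, "bar": 3}
e = ("mul", ("add", ("var", "foo"), ("int", 2)),
            ("add", ("var", "bar"), ("int", 1)))
e_next, _ = step(e, sigma)
# e_next encodes (4 + 2) * (bar + 1), matching the derivation worked out above
```

Note how the case analysis in `step` mirrors the search for a matching rule: for a multiplication whose left operand is not yet an integer, only the left-operand rule applies.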
We can use similar reasoning to find the next evaluation step:

          ADD   ⟨4 + 2, σ⟩ → ⟨6, σ⟩
    ------------------------------------------------
    ⟨(4 + 2) × (bar + 1), σ⟩ → ⟨6 × (bar + 1), σ⟩

And we can continue this process. At the end, we can put together all of these transitions, to get a view of the entire computation:

    ⟨(foo + 2) × (bar + 1), σ⟩ → ⟨(4 + 2) × (bar + 1), σ⟩
                               → ⟨6 × (bar + 1), σ⟩
                               → ⟨6 × (3 + 1), σ⟩
                               → ⟨6 × 4, σ⟩
                               → ⟨24, σ⟩

The result of the computation is a number, 24. The machine configurations at which evaluation stops, containing the final result, are called final configurations. For our language of expressions, the final configurations are of the form ⟨n, σ⟩, where n is a number and σ is a store.

We write →* for the reflexive transitive closure of the relation →. That is, if ⟨e, σ⟩ →* ⟨e', σ'⟩, then we can evaluate the configuration ⟨e, σ⟩ to the configuration ⟨e', σ'⟩ using zero or more steps. Thus, we can write ⟨(foo + 2) × (bar + 1), σ⟩ →* ⟨24, σ⟩.
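The closure →* is just iteration: step until a final configuration is reached. The self-contained Python sketch below (my own hypothetical tuple encoding of expressions, with a compact one-step function included so it runs on its own) reproduces the computation above:

```python
def step(e, sigma):
    """One small step of evaluation on a tuple-encoded expression."""
    tag = e[0]
    if tag == "var":                                 # variable lookup
        return ("int", sigma[e[1]]), sigma
    if tag in ("add", "mul"):
        op, e1, e2 = e
        if e1[0] != "int":                           # reduce the left operand first
            e1p, sigma = step(e1, sigma)
            return (op, e1p, e2), sigma
        if e2[0] != "int":                           # then the right operand
            e2p, sigma = step(e2, sigma)
            return (op, e1, e2p), sigma
        n, m = e1[1], e2[1]                          # combine two integers
        return ("int", n + m if op == "add" else n * m), sigma
    _, x, e1, e2 = e                                 # assignment x := e1; e2
    if e1[0] != "int":
        e1p, sigma = step(e1, sigma)
        return ("asg", x, e1p, e2), sigma
    return e2, {**sigma, x: e1[1]}                   # sigma[x -> n], then continue with e2

def evaluate(e, sigma):
    """The closure ->*: step until a final configuration <n, sigma> is reached."""
    while e[0] != "int":
        e, sigma = step(e, sigma)
    return e[1]

sigma = {"foo": 4, "bar": 3}
e = ("mul", ("add", ("var", "foo"), ("int", 2)),
            ("add", ("var", "bar"), ("int", 1)))
print(evaluate(e, sigma))  # prints 24
```

The loop in `evaluate` terminates here because every step makes the expression smaller or replaces a subexpression with an integer; each intermediate pair (e, sigma) is exactly one of the configurations in the trace above.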