Supplementary Notes on Abstract Syntax


15-312: Foundations of Programming Languages
Frank Pfenning
Lecture 3
September 3, 2002

Grammars, as we have discussed them so far, define a formal language as a set of strings. We refer to this as the concrete syntax of a language. While this is necessary in the complete definition of a programming language, it is only the beginning. We further have to define at least the static semantics (via typing rules) and the dynamic semantics (via evaluation rules). Then we have to reason about their relationship to establish, for example, type soundness. Giving such definitions and proofs on strings is extremely tedious and inappropriate; instead we want to give them a more abstract form of representation. We refer to this layer of representation as the abstract syntax of a language. An appropriate representation vehicle is terms [Ch. 1.2.1].

Given this distinction, we can see that parsing is more than simply recognizing whether a given string lies within the language defined by a grammar. Instead, parsing in our context should translate a string, given in concrete syntax, into an abstract syntax term. The converse problem of printing (or unparsing) is to translate an abstract syntax term into a string representation. While the grammar formalism is somewhat unwieldy when it comes to specifying the translation into abstract syntax, we will see that the mechanism of judgments is quite robust and can specify both parsing and unparsing quite cleanly.

We begin by reviewing the arithmetic expression language in its concrete [Ch. 3] and abstract [Ch. 4.1] forms. First, the grammar in its unambiguous form.[1] We implement here the decision that addition and multiplication should be left-associative (so 1+2+3 is parsed as (1+2)+3) and that multiplication has precedence over addition.

[1] We capitalize the non-terminals to avoid confusion when considering both concrete and abstract syntax in the same judgment. Also, the syntactic category of Terms (denoted by T) should not be confused with the terms we use to construct abstract syntax.

Such choices are somewhat arbitrary and dictated by convention rather than any scientific criteria.[2]

    Digits       D ::= 0 | ... | 9
    Numbers      N ::= D | N D
    Expressions  E ::= T | E+T
    Terms        T ::= F | T*F
    Factors      F ::= N | (E)

Written in the form of five judgments (s D, s N, s E, s T, and s F):

    -----   ...   -----
     0 D           9 D

     s D          s1 N   s2 D
    -----         -----------
     s N            s1 s2 N

     s T          s1 E   s2 T
    -----         -----------
     s E            s1+s2 E

     s F          s1 T   s2 F
    -----         -----------
     s T            s1*s2 T

     s N           s E
    -----         -----
     s F          (s) F

The abstract syntax of the language is much simpler. It can be specified in the form of a grammar, where the universe we are working over consists of terms rather than strings. While natural numbers can also be inductively defined in a variety of ways [Ch. 1.1.1], we take them here as primitive mathematical objects.

    nat  ::= 0 | 1 | ...
    expr ::= num(nat) | plus(expr, expr) | times(expr, expr)

[2] The grammar given in [Ch. 3.2] is slightly different, since there addition and multiplication are assumed to be right-associative.
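Such a grammar over terms maps almost directly onto a datatype declaration in ML. The following is a minimal sketch in Standard ML; the constructor names, and the use of the built-in int in place of nat, are our own choices for illustration, not part of the notes.

    (* abstract syntax of arithmetic expressions as an SML datatype *)
    datatype expr =
        Num of int               (* num(k)         *)
      | Plus of expr * expr      (* plus(t1, t2)   *)
      | Times of expr * expr     (* times(t1, t2)  *)

For example, the term plus(num(1), times(num(2), num(3))) is represented by the value Plus (Num 1, Times (Num 2, Num 3)).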

Presented as two judgments, we have k nat for every natural number k, together with the following rules for expressions:

      k nat           t1 expr   t2 expr       t1 expr   t2 expr
    -----------      ------------------      -------------------
    num(k) expr      plus(t1, t2) expr       times(t1, t2) expr

Now we specify the proper relation between concrete and abstract syntax through several simultaneously inductive judgments. We write s E t expr to mean that the string s in syntactic category E corresponds to the abstract syntax term t in category expr, and similarly for the other syntactic categories. Perhaps the easiest way to generate these judgments is to add the corresponding abstract syntax terms to each of the inference rules defining the concrete syntax.

    ---------   ...   ---------
    0 D 0 nat         9 D 9 nat

    s D k nat         s1 N k1 nat   s2 D k2 nat
    ---------         -------------------------
    s N k nat          s1 s2 N 10·k1 + k2 nat

    s T t expr        s1 E t1 expr   s2 T t2 expr
    ----------        ---------------------------
    s E t expr         s1+s2 E plus(t1, t2) expr

    s F t expr        s1 T t1 expr   s2 F t2 expr
    ----------        ----------------------------
    s T t expr         s1*s2 T times(t1, t2) expr

    s N k nat            s E t expr
    ---------------      ------------
    s F num(k) expr      (s) F t expr

When giving a specification of the form above, we should verify that the basic properties we expect actually hold. In this case we would like to check that related strings and terms belong to the correct (concrete or abstract, respectively) syntactic classes.

Theorem 1
(i) If s E t expr then s E and t expr.
(ii) If s E then there exists a t such that s E t expr.

Proof: For each part, by rule induction on the given derivation. In each case we can immediately appeal to the induction hypothesis on all subderivations and construct a derivation of the desired judgment from the results.

When implementing such a specification, we generally make a commitment as to what is considered our input and what is our output. As motivated above, parsing and unparsing (printing) are specified by this judgment.

Definition 2 (Parsing) Given a string s, find a term t such that s E t expr, or fail if no such t exists.
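Part (ii) of the theorem suggests that this definition is realizable, and its proof is constructive. As a rough sketch of how the translation judgment might be implemented in Standard ML (assuming the datatype sketched above; the function names and error handling are ours), a recursive-descent parser can follow the rules directly, turning the left-recursive productions E ::= E+T and T ::= T*F into accumulating loops so that the resulting terms remain left-associated:

    exception Parse

    (* parseE cs = (t, cs') where the consumed prefix s of cs
       satisfies s E t expr and cs' is the unread remainder;
       parseT and parseF implement s T t expr and s F t expr *)
    fun parseE cs =
          let val (t, cs) = parseT cs in moreE (t, cs) end
    and moreE (t, #"+" :: cs) =
          let val (t', cs) = parseT cs in moreE (Plus (t, t'), cs) end
      | moreE (t, cs) = (t, cs)
    and parseT cs =
          let val (t, cs) = parseF cs in moreT (t, cs) end
    and moreT (t, #"*" :: cs) =
          let val (t', cs) = parseF cs in moreT (Times (t, t'), cs) end
      | moreT (t, cs) = (t, cs)
    and parseF (#"(" :: cs) =
          (case parseE cs of
             (t, #")" :: cs) => (t, cs)
           | _ => raise Parse)
      | parseF (c :: cs) =
          if Char.isDigit c
          then parseN (Char.ord c - Char.ord #"0", cs)
          else raise Parse
      | parseF [] = raise Parse
    (* parseN accumulates digits, as in the rule s1 s2 N 10·k1 + k2 nat *)
    and parseN (k, c :: cs) =
          if Char.isDigit c
          then parseN (10 * k + (Char.ord c - Char.ord #"0"), cs)
          else (Num k, c :: cs)
      | parseN (k, []) = (Num k, [])

    (* top level: succeed only if the entire string is consumed *)
    fun parse s =
      case parseE (String.explode s) of
        (t, []) => t
      | _ => raise Parse

For example, parse "1+2*3" evaluates to Plus (Num 1, Times (Num 2, Num 3)), and parse "1+2+3" to Plus (Plus (Num 1, Num 2), Num 3), matching the chosen associativity and precedence.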

Obvious analogous definitions exist for the other syntactic categories. Now we can refine our notion of ambiguity to take into account the abstract syntax that is constructed. This is slightly more relaxed than requiring the uniqueness of derivations, because different derivations could still lead to the same abstract syntax term.

Definition 3 (Ambiguity of Parsing) A parsing problem is ambiguous if for a given string s there exist two distinct terms t1 and t2 such that s E t1 expr and s E t2 expr.

Unparsing is just the reverse of parsing: we are given a term t and have to find a concrete syntax representation for it. Unparsing is usually total (every term can be unparsed) and inherently ambiguous (the same term can be written as several strings). An example of this ambiguity is the insertion of additional redundant parentheses. Therefore, any unparser must use heuristics to choose among the alternative string representations.

Definition 4 (Unparsing) Given a term t such that t expr, find a string s such that s E t expr.

The ability to use judgments as the basis for implementation of different tasks is evidence for their flexibility. Often it is not difficult to translate a judgment into an implementation in a high-level language such as ML, although in some cases it might require significant ingenuity and some advanced techniques.
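To make the unparsing heuristics concrete, here is one possible unparser in Standard ML, again a sketch under our own naming and one of many reasonable precedence schemes. It parenthesizes a subterm only when its precedence is lower than the context requires, so redundant parentheses are never produced:

    (* unparse t = one concrete string s with s E t expr.
       prec assigns 0 to plus, 1 to times, 2 to atoms. *)
    fun unparse e =
      let
        fun prec (Num _)   = 2
          | prec (Plus _)  = 0
          | prec (Times _) = 1
        fun paren p e' =
              let val s = go e'
              in if prec e' < p then "(" ^ s ^ ")" else s end
        and go (Num k)          = Int.toString k
          | go (Plus (e1, e2))  = paren 0 e1 ^ "+" ^ paren 1 e2
          | go (Times (e1, e2)) = paren 1 e1 ^ "*" ^ paren 2 e2
      in go e end

For example, unparse (Plus (Num 1, Times (Num 2, Num 3))) yields "1+2*3", while unparse (Times (Plus (Num 1, Num 2), Num 3)) yields "(1+2)*3". Parenthesizing every compound subterm would be another valid heuristic; both outputs stand in the relation s E t expr to the input term.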

Our little language of arithmetic expressions serves to illustrate various ideas, such as the distinction between concrete syntax and abstract syntax, but it is too simple to exhibit various other phenomena and concepts. One of the most important of these is the variable, together with the notions of variable binding and scope. In order to discuss variables in isolation, we extend our language by a new form of expression to name preliminary results. For example,

    let x be 2*3 in x+x end

should evaluate to 12, but compute the value of 2*3 only once. First, the concrete syntax, showing only the changed or new cases.

    Variables  X ::= (any identifier)
    Factors    F ::= N | (E) | let X be E in E end | X

We ignore here the question of what constitutes a legal identifier. Presumably it should avoid keywords (such as let, be, in, and end) and special symbols (such as +), and be surrounded by whitespace. In an actual language implementation a lexer breaks the input string into keywords, special symbols, numbers, and identifiers, which are then processed by the parser.

A first approach to the abstract syntax would be to simply introduce a new abstract syntactic category of variables [Ch. 5.1] and a new operator let with three arguments, let(x, e1, e2), where x is a variable and e1 and e2 are terms representing expressions. Furthermore, we allow an occurrence of a variable x as a term. However, this approach does not clarify which occurrences of a variable are binding occurrences, and to which binder a variable occurrence refers. For example, to see that

    let x be 1 in let x be x+1 in x+x end end

evaluates to 4, we need to know which occurrences of x refer to which values. The rules for scope resolution [Ch. 5.1] dictate that it should be interpreted the same as

    let x1 be 1 in let x2 be x1+1 in x2+x2 end end

where there is no longer any potential ambiguity. That is, the scope of the variable x in let x be s1 in s2 end is s2 but not s1.

A uniform technique to encode the information about the scope of variables is called higher-order abstract syntax [Ch. 5]. We add to our language of terms a construct x.t which binds x in the term t. Every occurrence of x in t that is not shadowed by another binding x.t' refers to the shown top-level abstraction. Such variables are a new primitive concept, and, in particular, a variable can be used as a term (in addition to the usual operator-based terms).

We would extend our judgment relating concrete and abstract syntax by

    x X   s1 E t1 expr   s2 E t2 expr
    -------------------------------------------
    let x be s1 in s2 end F let(t1, x.t2) expr

       x X
    ----------
    x F x expr

and allow for expressions

    ------
    x expr

    t1 expr   t2 expr
    ------------------
    let(t1, x.t2) expr

Note that we translate an identifier x to an identically named variable x in higher-order abstract syntax. Moreover, we view variables in higher-order abstract syntax as a new kind of term, so we do not check explicitly that the x's are in fact variables; it is implied that they are. We emphasize again that the laws for scope resolution of let-expressions are directly encoded in the higher-order abstract representation. We investigate the laws underlying such representations in Lecture 4 [Ch. 5.3].

We can formulate the language of abstract syntax for arithmetic expressions in a more compact notation as a grammar.

    nat  ::= 0 | 1 | ...
    expr ::= num(nat) | plus(expr, expr) | times(expr, expr)
           | x | let(expr, x.expr)

As a concrete example, consider the string

    let x1 be 1 in let x2 be x1+1 in x2+x2 end end

which, in abstract syntax, would be represented as

    let(num(1), x1.let(plus(x1, num(1)), x2.plus(x2, x2)))
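One way to mirror this representation in Standard ML, sketched here under our own naming, is to encode the binder x.t as a function of type expr -> expr, so that SML's own variables and scoping stand in for object-language variables. This is only one of several possible encodings, and it differs from the notes in that variables are borrowed from the meta-language rather than added as a new primitive notion:

    (* the earlier constructors, plus let; the binder x.t
       becomes an SML function expr -> expr *)
    datatype expr =
        Num of int
      | Plus of expr * expr
      | Times of expr * expr
      | Let of expr * (expr -> expr)   (* let(t1, x.t2) *)

    (* let x1 be 1 in let x2 be x1+1 in x2+x2 end end *)
    val example =
      Let (Num 1, fn x1 =>
        Let (Plus (x1, Num 1), fn x2 =>
          Plus (x2, x2)))

The value example corresponds exactly to let(num(1), x1.let(plus(x1, num(1)), x2.plus(x2, x2))); shadowing and scope resolution are inherited from SML for free.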