The Design of Core C++ (Notes)


Uday Reddy
May 13, 1994

This note defines a small formal language, called Core C++, which reflects the essential structure of C++. As the name implies, the design captures only the core language, not the bells and whistles. Moreover, Core C++ deviates from real C++ in certain ways, both to streamline the design and to disambiguate the semantics. Some of these deviations are only formal: they are useful for describing the semantics but need not be reflected concretely in the real language.

1 Type Structure of C

The type structure of C appears to be a three-layered system. Data types are the types of values that can be stored in variables (denoted by the schematic variable δ). Storage types are the types of storage objects (denoted by the schematic variable S). Types in general are denoted by the schematic variable T. Data types appear only as parts of other types. Storage types and general types, on the other hand, occur in declarations: typically, the former occur in storage declarations and the latter in parameter declarations.

Data types

The data types are defined by

$$\delta ::= \mathtt{int} \mid T^{*} \mid \mathtt{void}$$

int is representative of the plethora of data types found in C (short, float, etc.). T* stands for pointers to T-typed values. Note that the destination of a pointer can be of any type, not only a data type. Finally, void is a paradoxical type of no value.

Storage types

A good sample of storage types is given by

$$S ::= \delta\ \mathtt{var} \mid S[\,] \mid \mathtt{struct}\ \{S_1\ x_1; \ldots; S_n\ x_n;\}$$

The storage type δ var stands for δ-typed variables, but var is usually not written. That is, δ var as a storage type is simply written δ. (More on this below.)
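The following is a rough real-C++ illustration of the data-type layer. It is our own sketch, not part of the note, and the names point, twice and pointer_destinations are hypothetical. The function stores pointers whose destinations are a data type, a struct storage type, a function type and another pointer type. Each pointer is itself an ordinary data value held in a variable. Core C++ omits the address operator, so the sketch follows suit and obtains pointers only from new and from a top-level function name.

```cpp
#include <cassert>

// Hypothetical storage type, in the note's notation struct {int x; int y;}.
struct point { int x; int y; };

// Hypothetical function of type int(int val): the argument is passed by value.
int twice(int n) { return 2 * n; }

// Each local below is a variable whose value is a pointer (a data value of
// type T* for some T), even when T itself is not a data type.
int pointer_destinations() {
    int* pi = new int(20);          // T = int, a data type
    point* pp = new point{4, 5};    // T = a struct storage type
    int (*pf)(int) = twice;         // T = a function type; a top-level function name
    int** ppi = new int*(pi);       // T = int*, itself a data type

    assert(*ppi == pi);             // both paths reach the same int variable
    int result = *pi + pp->y + pf(1) + **ppi;   // 20 + 5 + 2 + 20

    delete ppi;                     // frees only the cell holding the pointer
    delete pp;
    delete pi;
    return result;
}
```

Each pointer variable here holds a plain data value. This is why the note can treat T* as a data type for every T.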

The storage type S[ ] stands for arrays of S-typed storage. (This type has no relation to pointers.) The type $\mathtt{struct}\ \{S_1\ x_1; \ldots; S_n\ x_n;\}$ stands for structures with components $x_1, \ldots, x_n$ of storage types $S_1, \ldots, S_n$, respectively.

Types

General types are given by the syntax

$$T ::= \delta\ \mathtt{val} \mid S \mid S\ \mathtt{const} \mid \delta(T_1, \ldots, T_n)$$

δ val is the type of values of type δ. The suffix val is for disambiguation; it is never written in concrete C. Every storage type S is a type. S suffixed with const is the type of constant structures of type S. This is a promise by the programmer never to assign to any component of the storage structure. $\delta(T_1, \ldots, T_n)$ stands for functions that take arguments of types $T_1, \ldots, T_n$ (for $n \geq 0$) and return results of type δ val. It is significant that the arguments can be of any type, while the result can only be a data value. This keeps with the C philosophy of being a low-level language [1, pp. 1-2]. In practice, this is not a serious limitation, because T* is a data type for any type T.

One might wonder whether functions could return results of arbitrary types (instead of only data values). In general, this would involve heap storage with garbage collection. In my opinion, that would be a radical extension of C.

Remarks

1. An important idea in the above treatment is that variables are primitive storage objects, and other storage structures are built from them. For example, the storage declaration int a[10]; declares a to be an array consisting of 10 integer variables, not an array variable holding an array value of 10 integers. The same remark applies to structs. Storage structures of this kind cannot be passed by value, simply because there is no such thing as the value of an array or a struct. Warning: real C does not agree with this view for structs, though it agrees for arrays.

2. A function taking a parameter of a storage type takes the entire storage structure. In other words, it corresponds to call by reference. (Strictly speaking, this is C++ rather than C. In C, function types have the form $\delta(\delta_1, \ldots, \delta_n)$, i.e., arguments can only be data values.) Parameters of type δ val, however, are passed by value.

3. This three-layered type system is in contrast to the two-layered system of data and phrase types advocated by Reynolds for Algol-like languages [2]. The additional layer of storage types is needed because values of these types have automatic storage creation mechanisms. This is not the case for functions.

4. Note that general types occur in only two places: (i) as parameter types of functions and (ii) as destination types of pointers.

5. The convention of abbreviating both δ var and δ val as simply δ leads, not surprisingly, to an ambiguity. All real-life versions of C use the following convention: in a function-parameter position, δ means δ val, and in a pointer-destination position, δ means δ var. (Note that these are the only two places where general types are used.) In Standard C, the effect of δ var parameters is obtained by the address-and-pointer mechanism. In C++, however, a notion of references is introduced to circumvent the ambiguity (and then arbitrarily generalized). The effect of δ val is obtained by δ const, which is really δ var const, a rather circumlocutory expression!

Type compatibility

Type compatibility for struct types is by name. For other types, it is by structure. "By name" means that struct types are given names by unique type definitions of the form

$$\mathtt{struct}\ t\ \{S_1\ x_1; \ldots; S_n\ x_n;\}$$

Two struct types are then considered equal if and only if they have the same name.

Type compatibility is not strictly equality; it is governed by a subtype relation. Primitive data types have a standard subtype relation, e.g., int <: float. For pointer types, we have $T^{*} <: T'^{*}$ iff $T <: T'$. Storage types have a trivial subtype relation, i.e., a storage type is a subtype only of itself. For general types, we have a subtype relation given by the following rules:

$$\frac{\delta <: \delta'}{\delta\ \mathtt{val} <: \delta'\ \mathtt{val}} \qquad S <: S\ \mathtt{const} \qquad \frac{\delta <: \delta' \quad T'_1 <: T_1 \quad \cdots \quad T'_n <: T_n}{\delta(T_1, \ldots, T_n) <: \delta'(T'_1, \ldots, T'_n)}$$

2 Enhancements of C++

C++ extends structs as follows:

$$S ::= \ldots \mid \mathtt{struct} : t\ \{T_1\ x_1; \ldots; T_n\ x_n;\}$$

This notation extends C structs in two ways. First, the struct type is declared as a subtype of another struct type t. Therefore, one has the subtype relation

$$\mathtt{struct} : t\ \{T_1\ x_1; \ldots; T_n\ x_n;\} <: t$$

Second, the members of the struct are allowed to be of any type (including function types).
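In real C++, the rule struct : t {...} <: t corresponds roughly to public inheritance, and the pointer rule appears as the implicit derived-to-base pointer conversion. The sketch below uses hypothetical names (shape, square, id_of, subtyping_demo). It checks the direction of that conversion at compile time with the standard std::is_convertible trait. It also shows a pointer to mutable storage converting to a pointer to const storage, loosely mirroring S <: S const. C++ applies these conversions only along inheritance, so treat this as an analogy rather than an implementation of the rules above.

```cpp
#include <cassert>
#include <type_traits>

struct shape { int id; };                 // hypothetical; plays the role of t
struct square : shape { int side; };      // struct : t {...}, so square <: shape

// T* <: T'* when T <: T': derived pointers convert to base pointers ...
static_assert(std::is_convertible<square*, shape*>::value, "square* <: shape*");
// ... but not the other way round.
static_assert(!std::is_convertible<shape*, square*>::value, "shape* is not <: square*");
// Adding const is also an implicit conversion, echoing S <: S const.
static_assert(std::is_convertible<square*, const shape*>::value, "adds const");

// Reads through a base pointer; accepts a pointer to any subtype of shape.
int id_of(const shape* s) { return s->id; }

int subtyping_demo() {
    square* sq = new square();   // value-initialized: id == 0, side == 0
    sq->id = 7;
    sq->side = 3;
    shape* base = sq;            // implicit derived-to-base conversion
    assert(base->id == sq->id);  // the same storage seen through two types
    int result = id_of(sq) + sq->side;   // 7 + 3
    delete sq;
    return result;
}
```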
The functions in a struct are called member functions or methods. They are deemed to have an implicit parameter t* this, where t is the name of the struct type. As usual, structs are always introduced via definitions. Such definitions are expected to give bindings for all the non-storage members (values, constants and function members). For example, a counter class is defined as

    struct counter {
        int x = 0;
        int inc() { return (++ this->x); }
    };

An equivalent definition is

    struct counter {
        int x = 0;
        let int inc() = counter_inc;
    };

    int counter_inc(counter *this) { return (++ this->x); }

3 Declarations

There are two kinds of declarations. Binding declarations (also called definitions) declare an identifier and bind it to a specific value. Type declarations simply declare the type of an identifier. There are no declarations for δ val types; they would be ambiguous, as noted above. For the other three kinds of types, the two kinds of declarations have the following syntax:

    Type              Binding declaration             Type declaration
    S                 S x = P;                        extern S x;
    S const           S const x = P;                  extern S const x;
    δ(T1,...,Tn)      δ f(T1 x1, ..., Tn xn) {C}      δ f(T1, ..., Tn);

The declaration S x = P means: create a storage object of type S with name x and initialize it with the value denoted by P. This kind of definition is a statement as well as a binding. It is a statement in that it must be executed to create storage. It is a binding in that it binds the name x to the newly created storage. The scope of the name is the remainder of the program text, delimited only by braces. For example, in {...; S x = P; ...;} the scope of x is the text following the declaration, up to the closing brace. If there are no braces delimiting the scope, the scope consists of the entire following text. In contrast, a function declaration is not a statement. It can appear only at the top level.

Generic let binding

In addition to these, we introduce a generic binding mechanism of the form let T x = P. This is illustrated by the following examples:

    let int val small = 255;                    /* integer constant */
    let int var a = p.x;                        /* alias to p.x */
    let int sqr(int) = {(int x) return x*x;};   /* function by a block expression */
    let type matrix = float[][];                /* type definition */

Member declarations

In a struct definition, the declarations of member fields have the status of binding declarations. So they can have initializers, function definitions and let bindings. In addition, the definition of a derived struct (in C++) can redefine the members of the parent.
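Here is a minimal real-C++ rendering of the counter example and of the remarks on member declarations, under a few stated assumptions. C++ reserves the name this, so the free-function version names its explicit receiver self, a hypothetical name. double_counter is a hypothetical derived struct that redefines inc. inc is declared virtual so that the redefinition is used through a base pointer; this is our choice for the sketch. The C++ reference int& a = c->x stands in for an alias binding such as let int var a = p.x.

```cpp
#include <cassert>

// The note's counter in C++11 syntax; `this` is the implicit counter* parameter.
struct counter {
    int x = 0;
    virtual int inc() { return ++this->x; }
    virtual ~counter() = default;
};

// The "equivalent definition": a top-level function with an explicit receiver
// (`self` is a hypothetical name, since `this` is reserved in C++).
int counter_inc(counter* self) { return ++self->x; }

// Hypothetical derived struct that redefines a member of its parent.
struct double_counter : counter {
    int inc() override { this->x += 2; return this->x; }
};

int counter_demo() {
    counter* c = new double_counter();  // double_counter <: counter
    c->inc();                           // dispatches to the redefinition: x == 2
    counter_inc(c);                     // explicit-receiver version:      x == 3
    int& a = c->x;                      // like "let int var a = c->x;", an alias
    a = 10;                             // assigning through the alias sets c->x
    int result = c->inc();              // redefinition again:             x == 12
    assert(a == result);                // the alias sees the update
    delete c;
    return result;
}
```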

4 Module system

Even though C++ does not have a module system, we define one, because it solves many problems that are otherwise treated in an arbitrary fashion in C++. Specifically, we want a module system with the following objectives: integrate header files into the language, provide representation hiding for types as well as for classes, and provide parametric polymorphism. Our module system is a direct take-off from that of Standard ML. A module is a named collection of binding declarations. This is similar to a struct definition, except that a module may also contain type definitions. Modules in turn have "large types" called signatures.

Examples

A signature for items in a binary search tree may be defined as follows:

    signature CMP {
        type T;
        bool operator==(T, T);
        bool operator<(T, T);
    }

Then a specific module of this signature might be:

    CMP module StringCmp {
        let type T = char[];
        bool operator==(T s, T t) { return (strcmp(s,t) == 0); }
        bool operator<(T s, T t) { return (strcmp(s,t) < 0); }
    }

A simpler module for searching in a linked list can be defined as follows:

    signature EQ {
        type T;
        bool operator==(T, T);
    }

    let EQ module StringEq = StringCmp;

The signature of a searching module can be defined thus:

    signature TABLE {
        type item;
        struct tableops {
            bool member(item);
            void insert(item);
        };
        struct table <: tableops;
    }

The struct table has been declared to be some unknown subtype of tableops. Here are two sample searching modules:

    TABLE module SlowTable(EQ module Elem) {
        let type item = Elem.T;
        import List(Elem);   /* imports list, member, cons etc. */
        struct table : tableops {
            list elems;
            bool member(item x) { return member(x, this->elems); }
            void insert(item x) { this->elems = cons(x, this->elems); }
        };
    }

    TABLE module SearchTree(CMP module Elem) {
        let type item = Elem.T;
        struct table : tableops {
            item root;
            table *left, *right;
            bool member(item x) { ... }
            void insert(item x) { ... }
        };
    }

Notice that declaring the module to have signature TABLE makes the data members of the table struct private. Moreover, we achieve polymorphism by module parameterization. Thus, parameterized modules are a powerful mechanism that solves several problems at once.

Signatures

A signature named Σ is defined in the following fashion:

    signature Σ {D1; ...; Dn;}

where each Di is a declaration of one of the following forms:

    extern T x               type declaration of a name
    type t                   declaration of an opaque type
    struct t <: t'           declaration of an opaque subtype
    type t = T               definition of a type
    struct t {S1 x1; ...}    definition of a struct type
    Σ module M               declaration of a module
    include Σ                inclusion of another signature

Type compatibility for signatures is by structure, not by name. For example, CMP <: EQ above.

Modules

The definition of a module has the syntax

    Σ module M {B1; ...; Bn;}

where each Bi is a binding declaration. In addition to the binding declarations already mentioned, modules provide a new one:

    import M

Its effect is to include in the current context all the bindings of the module M, as constrained by its signature.
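C++ has no signatures or modules, but class templates give a rough approximation of parameterized modules like SlowTable and SearchTree. This sketch is our own, and all names in it are hypothetical. A "module" is a struct with a nested type T and static functions eq and lt, which stand in for operator== and operator<. A template parameter plays the role of EQ module Elem or CMP module Elem. Private members stand in for the representation hiding that the TABLE signature provides. T is const char* rather than char[], because C++ arrays are not copyable values. Template requirements are checked structurally at instantiation, so the single StringCmp struct can instantiate both tables, loosely echoing CMP <: EQ. Unlike the note's design, nothing here states a signature explicitly.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical module satisfying both CMP and EQ: a nested type T plus comparisons.
struct StringCmp {
    using T = const char*;   // stands in for the note's char[]
    static bool eq(T s, T t) { return std::strcmp(s, t) == 0; }
    static bool lt(T s, T t) { return std::strcmp(s, t) < 0; }
};

// Like SlowTable(EQ module Elem): uses only Elem::T and Elem::eq.
template <class Elem>
class SlowTable {
public:
    using item = typename Elem::T;
    bool member(item x) const {
        for (const item& e : elems_)
            if (Elem::eq(e, x)) return true;
        return false;
    }
    void insert(item x) { elems_.push_back(x); }
private:
    std::vector<item> elems_;   // hidden representation
};

// Like SearchTree(CMP module Elem): an unbalanced binary search tree.
template <class Elem>
class SearchTree {
public:
    using item = typename Elem::T;
    SearchTree() = default;
    SearchTree(const SearchTree&) = delete;              // nodes are uniquely owned
    SearchTree& operator=(const SearchTree&) = delete;
    ~SearchTree() { destroy(root_); }

    bool member(item x) const {
        for (node* n = root_; n != nullptr; n = Elem::lt(x, n->value) ? n->left : n->right)
            if (Elem::eq(x, n->value)) return true;
        return false;
    }
    void insert(item x) {
        node** slot = &root_;
        while (*slot != nullptr) {
            if (Elem::eq(x, (*slot)->value)) return;    // already present
            slot = Elem::lt(x, (*slot)->value) ? &(*slot)->left : &(*slot)->right;
        }
        *slot = new node{x, nullptr, nullptr};
    }

private:
    struct node { item value; node* left; node* right; };
    node* root_ = nullptr;
    static void destroy(node* n) {
        if (n == nullptr) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }
};

// Usage: one module instantiates both tables, which must agree on membership.
bool tables_agree(const char* probe) {
    SlowTable<StringCmp> slow;
    SearchTree<StringCmp> tree;
    const char* words[] = {"pear", "apple", "fig"};
    for (const char* w : words) { slow.insert(w); tree.insert(w); }
    assert(slow.member("fig") && tree.member("fig"));
    return slow.member(probe) == tree.member(probe);
}
```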

5 Phrases

Phrases are the terms that one writes in the language. They are not to be confused with expressions (arithmetic or other), which yield data values. A phrase always has a phrase type. Phrase types include all the types T mentioned in Sec. 1 and, in addition, two special types: S init for initializers of S-typed storage objects, and δ stmt for statements which might return δ-typed values. The phrase type stmt is used for statements which do not return; they may be deemed to be δ stmts for any δ. These types are special in that they are used only for describing terms. They are not included among the regular types, so there are no parameters, identifiers or member fields of these special types.

A phrase often has free identifiers, each of a specific type. Let $x_1 : T_1, \ldots, x_n : T_n$ be a list of such distinct identifiers (with order assumed insignificant). To say that P is a phrase of phrase type τ in such a context, we write

$$x_1 : T_1, \ldots, x_n : T_n \vdash P : \tau$$

Greek letters Γ, Δ, ... are used to stand for lists of free identifiers. We write Γ[x : T] to mean the context Γ with the entry for x (if any) replaced by x : T.

5.1 General phrases

An identifier of type T is always a phrase of type T:

$$\Gamma \vdash x : T \qquad \text{if } x : T \text{ is in } \Gamma$$

The type of a phrase is convertible by subtyping:

$$\frac{\Gamma \vdash P : T}{\Gamma \vdash P : T'} \qquad \text{if } T <: T'$$

5.2 Expressions

A phrase of type δ val is called an expression. Note that expressions can only denote data values.

int. Some examples:

$$\Gamma \vdash 0 : \mathtt{int\ val} \qquad \frac{\Gamma \vdash E_1 : \mathtt{int\ val} \quad \Gamma \vdash E_2 : \mathtt{int\ val}}{\Gamma \vdash E_1 + E_2 : \mathtt{int\ val}}$$

Pointers. A pointer is created by new and used by the dereferencing operator:

$$\Gamma \vdash \mathtt{NULL} : T^{*}\ \mathtt{val} \qquad \frac{\Gamma \vdash P : T\ \mathtt{init}}{\Gamma \vdash \mathtt{new}\ T\ P : T^{*}\ \mathtt{val}} \qquad \frac{\Gamma \vdash E : T^{*}\ \mathtt{val}}{\Gamma \vdash {*}E : T} \qquad \frac{\Gamma \vdash E : T^{*}\ \mathtt{val}}{\Gamma \vdash \mathtt{delete}\ E : \mathtt{stmt}}$$

The initializer for a new function must always be a top-level function name (to avoid dangling references). Note that we do not support the address operator &, since it would lead to dangling references.
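The pointer rules above can be exercised in real C++ without ever taking an address. This sketch uses hypothetical names (cell, sum_list, bump). It builds a singly linked list using only nullptr (C++'s spelling of NULL), new with an initializer, dereferencing and delete. It then checks that *p denotes storage that can be read and assigned. It illustrates the rules and is not part of the note's design; real C++ also has &.

```cpp
#include <cassert>

struct cell { int value; cell* next; };   // hypothetical list cell

// Builds the list n, n-1, ..., 1 with new, sums it, and frees it with delete.
int sum_list(int n) {
    cell* head = nullptr;                  // NULL : T* val
    for (int i = 1; i <= n; ++i)
        head = new cell{i, head};          // new T P, where P initializes the cell
    int sum = 0;
    while (head != nullptr) {
        sum += (*head).value;              // *E denotes the cell's storage
        cell* rest = head->next;
        delete head;                       // delete E : stmt
        head = rest;
    }
    return sum;
}

// *p is storage of type int var, so it can be both read and assigned.
int bump(int start) {
    int* p = new int(start);
    *p = *p + 1;
    int v = *p;
    delete p;
    assert(v == start + 1);
    return v;
}
```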

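5.3 Statements

We use braces { } as parentheses for grouping statements. They can be dropped wherever unnecessary. The three basic operations are the empty statement, sequencing and return:

$$\Gamma \vdash \{\} : \mathtt{stmt} \qquad \Gamma \vdash \mathtt{return} : \mathtt{void\ stmt}$$

$$\frac{\Gamma \vdash C_1 : \delta\ \mathtt{stmt} \quad \Gamma \vdash C_2 : \delta\ \mathtt{stmt}}{\Gamma \vdash \{C_1; C_2\} : \delta\ \mathtt{stmt}} \qquad \frac{\Gamma \vdash E : \delta\ \mathtt{val}}{\Gamma \vdash \mathtt{return}\ E : \delta\ \mathtt{stmt}}$$

A non-returning statement can be deemed to return a result of any type:

$$\frac{\Gamma \vdash C : \mathtt{stmt}}{\Gamma \vdash C : \delta\ \mathtt{stmt}}$$

The infamous expression statement is also part of Core C++:

$$\frac{\Gamma \vdash E : \delta\ \mathtt{val}}{\Gamma \vdash E : \mathtt{stmt}}$$

5.4 Objects

A storage object is created by a binding declaration:

$$\frac{\Gamma \vdash P : S\ \mathtt{init} \quad \Gamma[x : S] \vdash C : \delta\ \mathtt{stmt}}{\Gamma \vdash \{S\ x = P; C\} : \delta\ \mathtt{stmt}} \qquad \frac{\Gamma \vdash P : S\ \mathtt{init} \quad \Gamma[x : S\ \mathtt{const}] \vdash C : \delta\ \mathtt{stmt}}{\Gamma \vdash \{S\ \mathtt{const}\ x = P; C\} : \delta\ \mathtt{stmt}}$$

Variables. Variables can be dereferenced and assigned:

$$\frac{\Gamma \vdash X : \delta\ \mathtt{var}}{\Gamma \vdash X : \delta\ \mathtt{val}} \qquad \frac{\Gamma \vdash X : \delta\ \mathtt{var} \quad \Gamma \vdash E : \delta\ \mathtt{val}}{\Gamma \vdash X = E : \delta\ \mathtt{val}} \qquad \frac{\Gamma \vdash X : \delta\ \mathtt{var\ const}}{\Gamma \vdash X : \delta\ \mathtt{val}}$$

Arrays. Arrays are subscripted:

$$\frac{\Gamma \vdash X : S[\,] \quad \Gamma \vdash E : \mathtt{int\ val}}{\Gamma \vdash X[E] : S} \qquad \frac{\Gamma \vdash X : S[\,]\ \mathtt{const} \quad \Gamma \vdash E : \mathtt{int\ val}}{\Gamma \vdash X[E] : S\ \mathtt{const}}$$

Structures. Structures have field selection:

$$\frac{\Gamma \vdash X : \mathtt{struct}\ \{\ldots; T\ x; \ldots\}}{\Gamma \vdash X.x : T} \qquad \frac{\Gamma \vdash X : \mathtt{struct}\ \{\ldots; T\ x; \ldots\}\ \mathtt{const}}{\Gamma \vdash X.x : T\ \mathtt{const}}$$

Initializers. Variable initializers are just expressions:

$$\frac{\Gamma \vdash E : \delta\ \mathtt{val}}{\Gamma \vdash E : \delta\ \mathtt{var\ init}}$$

Array and structure initializers are suitable collections. They are omitted in this summary.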
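Several of these rules have direct counterparts in real C++ that the compiler can check. This sketch uses hypothetical names (record, kr, mr, is_const_lvalue, chained). It relies on decltype and the standard type traits to confirm one thing: selecting a field of const storage, or subscripting into it, yields const storage, while the same operations on non-const storage do not. It also shows an assignment being used as a value. One difference should be noted: in C++, built-in assignment X = E yields an lvalue referring to X, whereas the rule above gives it type δ val.

```cpp
#include <cassert>
#include <type_traits>

struct record { int a; int b[2]; };   // hypothetical struct {int a; int b[];}

const record kr = {1, {2, 3}};        // storage of type record const
record mr = {4, {5, 6}};              // ordinary (non-const) storage

// decltype((expr)) reports an lvalue of type U as U&; strip the reference
// and test whether the referenced storage is const.
template <class T>
constexpr bool is_const_lvalue() {
    return std::is_const<typename std::remove_reference<T>::type>::value;
}

static_assert(is_const_lvalue<decltype((kr.a))>(), "X.x : T const");
static_assert(is_const_lvalue<decltype((kr.b[0]))>(), "X[E] : S const");
static_assert(!is_const_lvalue<decltype((mr.a))>(), "X.x : T");
static_assert(!is_const_lvalue<decltype((mr.b[1]))>(), "X[E] : S");

// X = E used as an expression: the inner assignment supplies the value 4.
int chained() {
    int x = 0, y = 0;
    x = (y = 4) + 1;
    assert(y == 4);
    return x * 10 + y;                // 5 * 10 + 4
}
```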

5.5 Functions

A function is applied in the usual fashion:

$$\frac{\Gamma \vdash P : \delta(T_1, \ldots, T_n) \quad \Gamma \vdash Q_1 : T_1 \quad \cdots \quad \Gamma \vdash Q_n : T_n}{\Gamma \vdash P(Q_1, \ldots, Q_n) : \delta\ \mathtt{val}}$$

A function is built by a block expression:

$$\frac{\Gamma[x_1 : T_1, \ldots, x_n : T_n] \vdash C : \delta\ \mathtt{stmt}}{\Gamma \vdash \{(T_1\ x_1, \ldots, T_n\ x_n)\ C\} : \delta(T_1, \ldots, T_n)}$$

Note that block expressions can be used to create downward closures.

6 Conclusion

I hope these brief design notes have convinced the reader that the core of C and C++ is quite solid: the essential concepts can be repackaged in a coherent fashion. Let me mention some open issues which I haven't touched upon:

1. A formal semantics of the language must be defined, and a coherence theorem must be proved stating that every type derivation gives the same meaning. For C++, this theorem is alleged not to hold.
2. All member functions are automatically virtual. Is this implementable efficiently? What about multiple inheritance?
3. We have ignored the issue of constructors and destructors. Do we need them? Can they be added cleanly?
4. Should functions return larger values? Other than garbage collection, what problems are there? How do expressions and phrases get reconciled?

References

[1] B. W. Kernighan and D. M. Ritchie. The C Programming Language, Second Edition. Prentice Hall, 1988.
[2] J. C. Reynolds. The essence of Algol. In J. W. de Bakker and J. C. van Vliet, editors, Algorithmic Languages, pages 345-372. North-Holland, 1981.