CS : Programming Languages


Joseph (Yossi) Gil *
April 5, 2015

Abstract

Programming languages are the medium through which programmers precisely describe concepts, formulate algorithms, and reason about solutions. In the course of a career, a computer scientist will work with many different languages, separately or together. Software developers must understand the programming models underlying different languages and make informed design choices in languages supporting multiple complementary approaches. Computer scientists will often need to learn new languages and programming constructs, and must understand the principles underlying how programming language features are defined, composed, and implemented. The effective use of programming languages, and appreciation of their limitations, also requires a basic knowledge of programming language translation and static program analysis, as well as run-time components such as memory management. [1]

* yogi@cs.technion.ac.il
[1] Drawn from: Computer Science Curricula 2013: Curriculum Guidelines for Undergraduate Degree Programs in Computer Science, December 20, 2013, by the Joint Task Force on Computing Curricula: Association for Computing Machinery (ACM) and IEEE Computer Society.

Contents (short) [831 frames]

Contents (short)
Contents (long)
List of figures
List of tables
List of equations
Thirteen weeks schedule

1 Preliminaries [35 frames]
  1.1 Administration [11 frames]
  1.2 Motivation [13 frames]
  1.3 Hello, World! [11 frames]
2 Introduction [84 frames]
  2.1 PL design [9 frames]
  2.2 Programming paradigms [10 frames]
  2.3 History of programming languages [8 frames]
  2.4 Syntax specification [25 frames]
  2.5 Tokens: the atoms of syntax [31 frames]
3 Values and types [181 frames]
  3.1 Value systems [34 frames]
  3.2 Introduction to types [11 frames]
  3.3 The type constructors of Mock [40 frames]
  3.4 Type constructors in actual PLs [57 frames]
  3.5 Atomic types [33 frames]
  3.6 Representation of types in memory [6 frames]
4 Advanced typing [152 frames]
  4.1 Classification of type systems [41 frames]
  4.2 Structural vs. nominal typing [13 frames]
  4.3 Theoretical polymorphism [64 frames]
  4.4 Polymorphism in practice [34 frames]
5 Storage [140 frames]
  5.1 Storage models [21 frames]
  5.2 Arrays [24 frames]
  5.3 Variables life time [35 frames]
  5.4 Value vs. reference semantics [27 frames]
  5.5 Automatic memory management [16 frames]
  5.6 Run time type information [16 frames]
6 Commands [113 frames]
  6.1 Expressions vs. commands [10 frames]
  6.2 Recursive definitions [11 frames]
  6.3 Atomic commands [4 frames]
  6.4 Block commands [6 frames]
  6.5 Conditional commands [13 frames]
  6.6 Iterative commands [11 frames]
  6.7 Structured programming [11 frames]
  6.8 Sequencers [11 frames]
  6.9 Exceptions [35 frames]
7 Advanced abstractions [126 frames]
  7.1 Expressions evaluation order [19 frames]
  7.2 Closures [41 frames]
  7.3 Function objects [13 frames]
  7.4 Generators [11 frames]
  7.5 Iterators in [17 frames]
  7.6 Examples of iterators [8 frames]
  7.7 Coroutines [17 frames]

Contents (long) [831 frames]

Contents (short)
Contents (long)
List of figures
List of tables
List of equations
Thirteen weeks schedule

1 Preliminaries [35 frames]
  1.1 Administration [11 frames]
  1.2 Motivation [13 frames]
  1.3 Hello, World! [11 frames]
2 Introduction [84 frames]
  2.1 PL design [9 frames]
  2.2 Programming paradigms [10 frames]
  2.3 History of programming languages [8 frames]
  2.4 Syntax specification [25 frames]
    Regular expressions [14 frames]
    EBNF [10 frames]
    Exercises
  2.5 Tokens: the atoms of syntax [31 frames]
    Kinds of tokens [19 frames]
    Library identifiers [7 frames]
    Starting point [3 frames]
3 Values and types [181 frames]
  3.1 Value systems [34 frames]
    Symbolic values [6 frames]
    Semantics of S-expressions? [16 frames]
    Expressions [7 frames]
    References
    Exercises
  3.2 Introduction to types [11 frames]
    References
    Exercises
  3.3 The type constructors of Mock [40 frames]
    Power sets [5 frames]
    Cartesian product [5 frames]
    Integral exponentiation [2 frames]
    Unit type [2 frames]
    Branding [3 frames]
    Records [2 frames]
    Disjoint union [4 frames]
    Type None and type Any [3 frames]
    Mapping types [6 frames]
    Recursive type constructor [7 frames]
    References
    Exercises
  3.4 Type constructors in actual PLs [57 frames]
    Product type constructor [1 frame]
    Integral exponentiation [2 frames]
    Branding [4 frames]
    Union and choice/disjoint union type constructors [4 frames]
    Tags in concrete PLs [8 frames]
    Micro-Lisp in C [15 frames]
    Special types: Unit, Top & Bottom [10 frames]
    Mapping as functions and arrays [5 frames]
    Power sets [1 frame]
    Recursive types [7 frames]
  3.5 Atomic types [33 frames]
    Taxonomy of atomic types [9 frames]
    Set of primitive types as PL i.d. [5 frames]
    Integral primitive types [3 frames]
    More on language design [2 frames]
    Real numbers [6 frames]
    The character primitive type [6 frames]
    Strings as atomic types [2 frames]
    References
    Exercises
  3.6 Representation of types in memory [6 frames]
4 Advanced typing [152 frames]
  4.1 Classification of type systems [41 frames]
    Existence & sophistication level [7 frames]
    Orthogonality [8 frames]
    Strong vs. weak typing [4 frames]
    Static vs. dynamic typing [9 frames]
    Other kinds of typing [6 frames]
    Type information responsibility [5 frames]
    References
    Exercises
  4.2 Structural vs. nominal typing [13 frames]
  4.3 Theoretical polymorphism [64 frames]
    Motivation [12 frames]
    Overloading [18 frames]
    Coercion [7 frames]
    Universal polymorphism [5 frames]
    Polytypes [11 frames]
    Inclusion polymorphism [10 frames]
    Summary [1 frame]
    References
    Exercises
  4.4 Polymorphism in practice [34 frames]
    Overloading in Pascal, C/C++, and [7 frames]
    Coercion and the C++ overloading tournament [6 frames]
    Polymorphic functions [7 frames]
    Type inference [3 frames]
    Checking parameters with parametric polymorphism [6 frames]
    Case studies [5 frames]
5 Storage [140 frames]
  5.1 Storage models [21 frames]
    Utopic perspective of memory [14 frames]
    Real-world memory models [2 frames]
    Classical storage model [2 frames]
    References
  5.2 Arrays [24 frames]
    Varieties of arrays [12 frames]
    Arrays with integral index types [7 frames]
    Type of arrays [3 frames]
    References
    Exercises
  5.3 Variables life time [35 frames]
    Simple lifetime [6 frames]
    Storage class [10 frames]
    The heap [9 frames]
    Dangling references [6 frames]
    Heap errors [4 frames]
  5.4 Value vs. reference semantics [27 frames]
    Shared representation & lazy copy [10 frames]
    Value vs. reference semantics in various PLs [7 frames]
    References
    Exercises
  5.5 Automatic memory management [16 frames]
    References
  5.6 Run time type information [16 frames]
    References
6 Commands [113 frames]
  6.1 Expressions vs. commands [10 frames]
  6.2 Recursive definitions [11 frames]
  6.3 Atomic commands [4 frames]
    Exercises
  6.4 Block commands [6 frames]
  6.5 Conditional commands [13 frames]
  6.6 Iterative commands [11 frames]
    Exercises
  6.7 Structured programming [11 frames]
  6.8 Sequencers [11 frames]
    Exercises
  6.9 Exceptions [35 frames]
    Robustness [7 frames]
    Policy I: resumption [3 frames]
    Policy II: error rolling [5 frames]
    Policy III: setjmp/longjmp of C [6 frames]
    Policy IV: exceptions [1 frame]
    Kinds of exceptions [4 frames]
    Resource acquisition is (or isn't) initialization [8 frames]
7 Advanced abstractions [126 frames]
  7.1 Expressions evaluation order [19 frames]
  7.2 Closures [41 frames]
    Objects with closures [2 frames]
    Nested functions [18 frames]
    Dynamic scoping [4 frames]
    Static scoping [13 frames]
  7.3 Function objects [13 frames]
  7.4 Generators [11 frames]
  7.5 Iterators in [17 frames]
  7.6 Examples of iterators [8 frames]
  7.7 Coroutines [17 frames]
    Exercises

List of figures

1 Preliminaries
2 Introduction
  2.0 Hello, World!
    Introduction to PLs: a visual mindmap
  PL design
    Launch of Mariner
  Programming paradigms
    PL paradigms
  History of programming languages
    Language genealogy (till 1990)
  Syntax specification
    A Pascal compound command with two atomic commands in it
    Decomposing a compound expression
  Tokens: the atoms of syntax
    Authoring Hello, World! with the gvim text editor
3 Values and types
  3.1 Value systems
    The CONS record
    The list (a b c d) in binary tree representation
    Understanding c[ad]+r syntactic sugar
    The many ways for presenting the symbolic values of Mathematica
  Introduction to types
    Visualizing the type system
  The type constructors of Mock
    Using Mathematica to solve the equation imposed by the recursive binary tree type
    Distinct topologies of binary trees with n nodes; n = 0,1,2,3; each node stores an integer i ∈ Z
    Taxonomy of the type constructors of Mock
  Type constructors in actual PLs
    Pascal variant record providing two perspectives of the same memory cells
    Layout of a CONS record as a single machine word
    The Cons pool
    Layout of a Cons record using bit fields
    String handles in micro-lisp
  Atomic types
    Primitive types in the Mock PL
    widening casts
    narrowing casts
    Floating point format; IEEE 754 / binary
4 Advanced typing
  4.3 Theoretical polymorphism
    The road to polymorphism
    Monomorphism in context
    Polymorphism in context
    Unraveling the Marx riddle
    Archie Bunker explains to Edith Bunker how context is used to resolve the ambiguity of the three overloaded meanings of the word Shalom in Hebrew
    Use of overloading for type polymorphism
  Polymorphism in practice
    The ancient, obsolete, annoying, yet relatively easy to explain memory model of the
5 Storage
    Storage: a visual mindmap
  Storage models
    Storage models: a visual mindmap
    Store, cells & values
    Legend of Figure
    Life cycle of a cell in the store
    A cell and names of two references to it in ML
    Retrieving cell contents in ML
    Mutating a cell contents in ML
    Reading & mutating a cell contents in ML
    Address computation in the 8086 hardware architecture
    Extended vs. expanded memory in the ancient 8086 architecture
    Segments in the classical memory map
    C programs & the classical memory model
  5.2 Arrays
    Row-major layout of 2D arrays (e.g., Pascal)
    Column-major layout of 2D arrays (e.g., Fortran)
    Multiple dereferencing layout of 2D arrays
    Layout of triangular array with the multiple dereferencing layout
  Variables life time
    Lists with shared representation
  Value vs. reference semantics
    Value semantics
    Reference semantics
    References to objects
    Sharing of representation in Lisp values
    Semantics dilemma of assignment in reference semantics
    Assignment strategies side by side
  Automatic memory management
    Mindmap of memory management
  Run time type information
    Dereferencing a pointer to an unfamiliar memory address
    An 8-bit byte in memory
    A 16-bits word in memory
    Layout in memory of a 32-bits word used as micro-lisp Cons record
    Different legitimate interpretations of a 16-bytes memory block
    A step in a BFS/DFS tour
    Reference representation of the RTTI tag attached to values
6 Commands
    Commands: a visual mindmap
  Structured programming
    An intriguing finite automaton computing (in a non-sensible manner) a function that does make sense
    Flowchart for getting things done (GTD)
    Challenge of understanding spaghetti code
    Compound commands in Nassi-Shneiderman diagrams
    Matrix multiplication with Nassi-Shneiderman diagram
    Factorial with Nassi-Shneiderman diagram
    More Nassi-Shneiderman notations
    Even more Nassi-Shneiderman notations
  Exceptions
    Hero of Alexandria (10-70 AD); also known as Heron
    A triangle with edges a, b, and c
7 Advanced abstractions
  7.1 Expressions evaluation order
    Evaluation tree for an expression with many faulty subexpressions
    Summary: concepts of evaluation order
  Closures
    Function nesting structure for an implementation of the quick sort algorithm
    Reminder: CPU & memory in the classical model
    Activation record (implemented as stack frame) for function gcd
    Function f2 called by f1 called by f
    Function f2 defined within f1 defined within f
    Deep function nesting structure
    Back pointers in the quick sort example
  Examples of iterators
    Cache of primes (initial state)
    Reminder: initial state of the cache of primes
    Cache of primes (after first insertion)
  Coroutines
    Using the Iterator interface

List of tables

1 Preliminaries
2 Introduction
  2.1 PL design
    Concepts in the PL World
  Programming paradigms
    The main programming paradigms
    Multi-paradigm PLs
  Syntax specification
    Examples of regular expressions
    Syntax and semantics of regular expressions
    Atomic types in
  Tokens: the atoms of syntax
    Classification of tokens in a BNF for simple expressions
    What do tokens denote?
    Replaceable- vs. builtin- library
3 Values and types
  3.1 Value systems
    Examples of using the c[ad]+r syntactic sugar
    Syntactical differences between functions & operators
  Atomic types
    Classification of primitive types
    More classification: unordered, ordered & ordinal types
    IEEE standards for representing numbers in floating point format
    Some other, mostly obsolete, standards for floating points
4 Advanced typing
  4.1 Classification of type systems
    Mixed typing in
  Theoretical polymorphism
    Overloaded meanings of terms in the Groucho Marx riddle
    Ad hoc vs. universal polymorphism
  Polymorphism in practice
    Overloading of keyword static in C
    Polymorphic functions: C++ vs. ML
5 Storage
  5.1 Storage models
    Permissions of memory segments in an idealized storage model
  Variables life time
    C's storage specifiers in blocks
    C's storage specifiers at the external (file) level
    Language protection against dangling references
  Value vs. reference semantics
    Behind the scenes of four semantics of assignment
  Run time type information
    Use of RTTI in the implementation of different PLs
6 Commands
  6.2 Recursive definitions
    C program elements
7 Advanced abstractions
  7.1 Expressions evaluation order
    Eager vs. short-circuit logical operators
  Iterators in
    Generators vs. iterators

List of equations

1 Preliminaries
  1.1 Administration
    (1.1.1) Course grade (F) computed from exam's grade (E) and assignments grade (A):
            F = round(E) if E 1.5 A E
                round(E) if E 50
                round(E + A E) if 50 < E A
                round(E E A) otherwise

2 Introduction
  2.4 Syntax specification
    (2.4.1) $\Sigma^*$, the set of all strings over the alphabet $\Sigma = \{a,b,c,d,e,f\}$:
            $\Sigma^* = \{\underbrace{\varepsilon}_{\text{length 0 string}}, \underbrace{a,b,\ldots,f}_{\text{length 1 strings}}, \underbrace{aa,ab,\ldots,ff}_{\text{length 2 strings}}, \underbrace{aaa,\ldots,fff}_{\text{length 3 strings}}, \ldots\}$
    (2.4.2) Atoms of $\Sigma^+$: $\ell \in \Sigma \Rightarrow \ell \in \Sigma^+$
    (2.4.3) Concatenation constructor of $\Sigma^+$: $\alpha,\beta \in \Sigma^+ \Rightarrow \alpha\beta \in \Sigma^+$
    (2.4.4) All members of $\Sigma^+$ are either atomic strings, or constructed by concatenation from other strings in $\Sigma^+$: $\gamma \in \Sigma^+ \Rightarrow \gamma \in \Sigma$ or $\exists \alpha,\beta \in \Sigma^+ \cdot \gamma = \alpha\beta$
    (2.4.5) All strings can be thought of as regular expressions: $\Sigma^* \subseteq RE(\Sigma)$
    (2.4.6) Regular expressions constructor I: alternation: $e_1,e_2 \in RE(\Sigma) \Rightarrow (e_1 \mid e_2) \in RE(\Sigma)$
    (2.4.7) Regular expressions constructor II: concatenation: $e_1,e_2 \in RE(\Sigma) \Rightarrow (e_1 e_2) \in RE(\Sigma)$
    (2.4.8) Regular expressions constructor III: Kleene closure: $e \in RE(\Sigma) \Rightarrow (e^*) \in RE(\Sigma)$
  2.5 Tokens: the atoms of syntax
    (2.5.1) Regular expression defining identifiers syntax in a typical modern PL: [_a-zA-Z][_a-zA-Z0-9]*

3 Values and types
  3.1 Value systems
    (3.1.1) The type system is a set of sets of values: $T_L \subseteq \wp V_L$
    (3.1.2) Policy for representation of values on hardware: $P_L : V_L \to \{M_1, M_2, \ldots\}$
  3.2 Introduction to types
    (3.2.1) The set of types of a PL: $T_L \subseteq \wp V_L$
    (3.2.2) A type of PL is a set of values: $\tau \in T_L \Rightarrow \tau \subseteq V_L$
    (3.2.3) The set of types of a value: $types(v) = \{T \subseteq V_L \mid v \in T\}$
    (3.2.4) Some values have more than one type: $\exists v \cdot \#types(v) > 1$
    (3.2.5) Typically, values with more than one type have infinitely many types: $\#types(v) > 1 \overset{\text{often}}{\Longrightarrow} \#types(v) = \infty$
    (3.2.6) Infinitely many types of 0 in C: $\forall T \in T_C \cdot 0 \in T{*}$
  3.3 The type constructors of Mock
    (3.3.1) Power set type constructor: $\wp T = \{T' \mid T' \subseteq T\}$
    (3.3.2) Alternative notation for power sets: $\wp T = 2^T$
    (3.3.3) Cardinality of type constructed with the power set type constructor: $\#(\wp T) = 2^{\#T}$
    (3.3.4) The empty set is a value of all power sets: $\forall T \cdot \emptyset \in \wp T$
    (3.3.5) A unary value constructor for power sets: $v \in T \Rightarrow \{v\} \in \wp T$
    (3.3.6) An n-ary value constructor for power sets: $v_1,v_2,\ldots,v_n \in T, n \ge 1 \Rightarrow \{v_1,v_2,\ldots,v_n\} \in \wp T$
    (3.3.7) Cartesian product type constructor: $T_1 \times T_2 = \{\langle v_1,v_2 \rangle \mid v_1 \in T_1; v_2 \in T_2\}$
    (3.3.8) Cardinality of type created by Cartesian product type constructor: $\#(T_1 \times T_2) = (\#T_1) \cdot (\#T_2)$
    (3.3.9) Operator for composing a value of a product type: $v_1 \in T_1, v_2 \in T_2 \Rightarrow \langle v_1,v_2 \rangle \in T_1 \times T_2$
    (3.3.10) Operators for decomposing a value of a product type: $v \in T_1 \times T_2 \Rightarrow v\#1 \in T_1,\ v\#2 \in T_2$
    (3.3.11) Integral exponentiation type constructor: $T^n = \underbrace{T \times \cdots \times T}_{n \text{ times}}$
    (3.3.12) Cardinality of type created by integral exponentiation type constructor: $\#(T^n) = (\#T)^n$
    (3.3.13) Cardinality of type Unit: $\#\mathrm{Unit} = 1$
    (3.3.14) Type Unit as a singleton set: $\mathrm{Unit} = \{()\}$
    (3.3.15) Bits required to represent a value of type Unit: $\lg_2 \#\mathrm{Unit} = \lg_2 1 = 0$
    (3.3.16) Branding type constructor: $\ell(T) = \{\langle \ell,v \rangle \mid v \in T\}$
    (3.3.17) The first rule of branding: l I T l(T)
    (3.3.18) The second rule of branding: $\ell_1,\ell_2 \in I,\ \ell_1 \ne \ell_2 \Rightarrow \ell_1(T) \ne \ell_2(T)$
    (3.3.19) Operator to create a value of $\ell(T)$ from $v \in T$: $v \in T \Rightarrow \ell(v) \in \ell(T)$
    (3.3.20) Operator to extract the T value from a value of $\ell(T)$: l(v) T l(v)#l T
    (3.3.21) Record types as product of branded types: $\{\ell_1 : T_1,\ldots,\ell_n : T_n\} = \ell_1(T_1) \times \cdots \times \ell_n(T_n)$
    (3.3.22) Composition operator to create a value of record type: $v_1 \in T_1,\ldots,v_n \in T_n \Rightarrow \{\ell_1 : v_1,\ldots,\ell_n : v_n\} \in \{\ell_1 : T_1,\ldots,\ell_n : T_n\}$
    (3.3.23) Decomposition operator to elicit a field from a record type: $\forall i, 1 \le i \le n \cdot \{\ell_1 : v_1,\ldots,\ell_n : v_n\}\#\ell_i = v_i$
    (3.3.24) Disjoint union type constructor: $T_1 + T_2 = \ell_1(T_1) \cup \ell_2(T_2)$
    (3.3.25) Cardinality of type created by disjoint union: $\#(T_1 + T_2) = \#T_1 + \#T_2$
    (3.3.26) Enumerated type as a disjoint union of branded Units: $\{\ell_1,\ell_2,\ldots,\ell_n\} = \ell_1(\mathrm{Unit}) + \ell_2(\mathrm{Unit}) + \cdots + \ell_n(\mathrm{Unit})$
    (3.3.27) Making values of a choice type: $v_1 \in T_1 \Rightarrow \ell_1 v_1 \in T_1 + T_2$; $v_2 \in T_2 \Rightarrow \ell_2 v_2 \in T_1 + T_2$
    (3.3.28) Testing choice types: $v \in T_1 + T_2 \Rightarrow v?\ell_1 = \begin{cases} \text{true} & v = \ell_1 v_1, v_1 \in T_1 \\ \text{false} & v = \ell_2 v_2, v_2 \in T_2 \end{cases}$
    (3.3.29) The empty type: $\mathrm{None} = \mathrm{Bottom} = \emptyset$
    (3.3.30) The universal type: $\mathrm{Any} = V_L$
    (3.3.31) Cartesian product of arbitrary type and and: T T : T T T : T =
    (3.3.32) Disjoint union of arbitrary type with and: T T : + T = T T : + T = T
    (3.3.33) Full mapping: $\forall s \in S \cdot \#\{t \mid (s,t) \in m\} = 1$
    (3.3.34) Partial mapping: $\forall s \in S \cdot \#\{t \mid (s,t) \in m\} \le 1$
    (3.3.35) Power set type constructor as a kind of mapping: $\wp T = T \to \mathrm{Boolean}$
    (3.3.36) Mapping type constructor: $S \to T = \{m \mid m \text{ is a (partial) mapping from } S \text{ to } T\}$
    (3.3.37) Cardinality of type created by mapping type constructor: $\#(S \to T) = \#T^{\#S}$
    (3.3.38) Mapping as exponentiation: $S \to T = T^S$
    (3.3.39) Currying: $(S_1 \times S_2) \to T = S_1 \to (S_2 \to T)$
    (3.3.40) Currying in algebra: power to product is power of power: $T^{S_1 \cdot S_2} = (T^{S_2})^{S_1}$
    (3.3.41) Integer range type constructor: $\mathbf{n} = \{1,\ldots,n\}$
    (3.3.42) Integral exponentiation type constructor: $\mathbf{n} \to R = R^n$
    (3.3.43) Type Unit as special case of integer range: $\mathrm{Unit} = \mathbf{1}$
    (3.3.44) Type None as special case of integer range: $\mathrm{None} = \mathbf{0}$
    (3.3.45) Euler's identity, tying together the five fundamental mathematical constants: $e^{i\pi} + 1 = 0$
    (3.3.46) Identity tying together,, and Unit: = 1
    (3.3.47) Recursive definitions type constructor: $\tau_1 = E_1(T_1,\ldots,T_m,\tau_1,\ldots,\tau_n),\ \ldots,\ \tau_n = E_n(T_1,\ldots,T_m,\tau_1,\ldots,\tau_n)$
    (3.3.48) A recursive equation with only one type variable: $\sigma = E(T_1,\ldots,T_m,\sigma)$
    (3.3.49) Bottom up solution of recursive equation with only one type variable:
             $\sigma_0 = \emptyset$
             $\sigma_1 = E(T_1,\ldots,T_m,\sigma_0) = E(T_1,\ldots,T_m,\emptyset)$
             $\sigma_2 = E(T_1,\ldots,T_m,\sigma_1) = E(T_1,\ldots,T_m,E(T_1,\ldots,T_m,\emptyset))$
             $\sigma_{n+1} = E(T_1,\ldots,T_m,\sigma_n) = E(T_1,\ldots,T_m,E(T_1,\ldots,T_m,\sigma_{n-1}))$
             $\sigma = \bigcup_{i=0}^{\infty} \sigma_i$
    (3.3.50) Recursive type equation for t, the type of a pointer to a node in a doubly-linked-list: $t = (1 + T) = 1 + \mathbb{Z} \times t \times t$
    (3.3.51) Taylor expansion of the negative solution of the quadratic equation $\tau = 1 + Z\tau^2$ defining $\tau$: $\tau = 1 + Z + 2Z^2 + 5Z^3 + 14Z^4 + 42Z^5 + \cdots$
  3.4 Type constructors in actual PLs
    (3.4.1) Right associativity of the mapping type constructor: $S_1 \to (S_2 \to (S_3 \to T)) = S_1 \to S_2 \to S_3 \to T$
    (3.4.2) Set theoretical equation for the type of binary trees: $\tau = \mathtt{int} \cup (\mathtt{int} \times \tau \times \tau)$
  3.5 Atomic types
    (3.5.1) Breaking a floating point number into sign, normalized mantissa, and exponent: π e =
    (3.5.2) Sign bit in normalized floating point representation
    (3.5.3) Exponent in normalized floating point representation
    (3.5.4) Mantissa of
    (3.5.5) Mantissa in normalized floating point representation
    (3.5.6) Normalized mantissa in decimal base: < m 1
    (3.5.7) Normalized mantissa in binary base: < m 1
    (3.5.8) Floating point formats support signed zeroes
    (3.5.9) Floating point formats support multiple infinities
    (3.5.10) Floating point formats allow representation of non-numbers: NaN
  3.6 Representation of types in memory
    (3.6.1) PL policy for representation of types: P L : V L {M 1,M 2,... :

4 Advanced typing
  4.3 Theoretical polymorphism
    (4.3.1) In a monomorphic type system, functions (and other entities) have precisely one type: $f$ is a function $\Rightarrow \#types(f) = 1$
    (4.3.2) Function gcd in Pascal is monomorphic: $\#types(\mathtt{gcd}) = \#\{\mathtt{Integer} \times \mathtt{Integer} \to \mathtt{Integer}\} = 1$
    (4.3.3) Function gcd in C is monomorphic: $\#types(\mathtt{gcd}) = \#\{\mathtt{int} \times \mathtt{int} \to \mathtt{int}\} = 1$
  4.4 Polymorphism in practice
    (4.4.1) One of the overloaded meanings of operator - follows parametric polymorphism: ( ) σ T Pascal ( σ σ) σ types( - )
    (4.4.2) Identity mapping on booleans: $\{\text{false} \mapsto \text{false}, \text{true} \mapsto \text{true}\}$
    (4.4.3) Identity mapping on integers: $\{\ldots, -2 \mapsto -2, -1 \mapsto -1, 0 \mapsto 0, 1 \mapsto 1, 2 \mapsto 2, \ldots\}$
    (4.4.4) Identity mapping on strings: $\{\varepsilon \mapsto \varepsilon, \text{"a"} \mapsto \text{"a"}, \text{"b"} \mapsto \text{"b"}, \ldots, \text{"aa"} \mapsto \text{"aa"}, \text{"ab"} \mapsto \text{"ab"}, \ldots\}$
    (4.4.5) Type of operator o in ML: $o : (\beta \to \gamma) \times (\alpha \to \beta) \to (\alpha \to \gamma)$

5 Storage
  5.2 Arrays
    (5.2.1) Array access in C is a matter of pointer arithmetics: a[i] ≡ *(a+i) ≡ *(i+a) ≡ i[a]
    (5.2.2) Offset computation in 2-dimensional arrays stored in row-major layout: $\mathrm{offset}(a_{i,j}) = (i-1)m + (j-1)$
    (5.2.3) Offset computation in 2-dimensional arrays stored in column-major layout: $\mathrm{offset}(a_{i,j}) = (j-1)n + (i-1)$
    (5.2.4) Address computation in 2-dimensional arrays with the multiple dereference method: $\mathrm{address}(a_{i,j}) = \mathrm{dereference}(\mathrm{address}(a) + i - 1) + j$

6 Commands
  6.2 Recursive definitions
    (6.2.1) Assignment atomic command of Pascal: v := E
    (6.2.2) Procedure call atomic command of Pascal: $p(E_1,\ldots,E_n)$
  6.3 Atomic commands
    (6.3.1) Vanilla assignment: $v \leftarrow e$
    (6.3.2) Multiple assignment: $v_1,v_2,\ldots,v_n \leftarrow e$
    (6.3.3) Update assignment: v ϕ(,e 1,e 2,...,e n )
    (6.3.4) Collateral assignment: v 1,v 2 e 1,e 2
    (6.3.5) Simultaneous assignment: v 1,v 2 e 1,e 2
  6.4 Block commands
    (6.4.1) Sequential block command constructor: $\{C_1; C_2; \ldots; C_n\}$
    (6.4.2) Collateral block command constructor: $\{C_1 \sim C_2 \sim \cdots \sim C_n\}$
    (6.4.3) Concurrent block command constructor: {C 1 C 2 C n
  6.5 Conditional commands
    (6.5.1) Conditional command constructor: $\{E_1 ? C_1 : E_2 ? C_2 : \cdots : E_n ? C_n\}$
    (6.5.2) Conditional command constructor with else clause: $\{E_1 ? C_1 : E_2 ? C_2 : \cdots : E_n ? C_n : C_{n+1}\}$
    (6.5.3) A boolean expression for the else clause: $E_n = \neg E_1 \wedge \neg E_2 \wedge \cdots \wedge \neg E_{n-1}$
    (6.5.4) Computed goto in Fortran: GO TO (l1, l2, ..., ln) Expression
  6.9 Exceptions
    (6.9.1) Heron's formula for the area of the triangle: $A = \sqrt{s(s-a)(s-b)(s-c)}$, where $s = \frac{a+b+c}{2}$

7 Advanced abstractions
  7.6 Examples of iterators
    (7.6.1) Semantics of function compareTo: $p.\mathtt{compareTo}(n) = \begin{cases} 1 & \text{if } p > n \\ 0 & \text{if } p = n \\ -1 & \text{if } p < n \end{cases}$

Thirteen weeks schedule

Numbers in square brackets are frame counts.

Week 1: Introduction I/II
  Lecture: (i) [11] 1.1 Administration (ii) [13] 1.2 Motivation (iii) [9] 2.1 PL design (iv) [10] 2.2 Programming paradigms
  Recitation: (i) Regular expressions (ii) Pascal's EBNF (iii) Specifying an EBNF with itself
  Tutorial: Pascal introduction
  Self reading: (i) Monograph "First steps" (ii) [11] 1.3 Hello, World!

Week 2: Introduction II/II
  Lecture: (i) [25] 2.4 Syntax specification (ii) [31] 2.5 Tokens: the atoms of syntax
  Recitation: (i) (E)BNF example (ii) Regex examples
  Tutorial: ML introduction
  Self reading: Monograph "Summary of first two lectures"

Week 3: Types I/III
  Lecture: (i) [34] 3.1 Value systems (ii) [11] 3.2 Introduction to types
  Recitation: (i) [9] Taxonomy of atomic types (ii) [5] Set of primitive types as PL i.d. (iii) [2] More on language design (iv) [6] The character primitive type
  Tutorial: ML curried functions
  Self reading: (i) [3] Integral primitive types (ii) [6] Real numbers (iii) [2] Strings as atomic types

Week 4: Types II/III
  Lecture: [40] 3.3 The type constructors of Mock
  Recitation: (i) [4] Union and choice/disjoint union type constructors (ii) [8] Tags in concrete PLs (iii) [15] Micro-Lisp in C
  Tutorial: ML declarations
  Self reading: (i) [10] Special types: Unit, Top & Bottom (ii) [5] Mapping as functions and arrays (iii) [1] Power sets (iv) [7] Recursive types

Week 5: Types III/III
  Lecture: (i) [41] 4.1 Classification of type systems (ii) [64] 4.3 Theoretical polymorphism
  Recitation: (i) [7] Overloading in Pascal, C/C++, and (ii) [6] Coercion and the C++ overloading tournament (iii) [7] Polymorphic functions (iv) [3] Type inference
  Tutorial: ML lists
  Self reading: (i) [6] 3.6 Representation of types in memory (ii) [1] Product type constructor (iii) [2] Integral exponentiation (iv) [4] Branding

Week 6: Storage I/III
  Lecture: [21] 5.1 Storage models
  Recitation: [24] 5.2 Arrays
  Tutorial: ML lists
  Self reading: (i) [7] Polymorphic functions (ii) [6] Checking parameters with parametric polymorphism (iii) [5] Case studies

Week 7: Storage II/III
  Lecture: [35] 5.3 Variables life time
  Recitation: [16] 5.5 Automatic memory management
  Tutorial: ML datatypes
  Self reading: TBD

Week 8: Storage III/III
  Lecture: [27] 5.4 Value vs. reference semantics
  Recitation: [16] 5.6 Run time type information
  Tutorial: ML exceptions
  Self reading: TBD

Week 9: Commands I/II
  Lecture: (i) [10] 6.1 Expressions vs. commands (ii) [4] 6.3 Atomic commands (iii) [6] 6.4 Block commands
  Recitation: [11] 6.2 Recursive definitions
  Tutorial: ML sequences
  Self reading: TBD

Week 10: Commands II/II
  Lecture: (i) [11] 6.7 Structured programming (ii) [11] 6.8 Sequencers
  Recitation: (i) [13] 6.5 Conditional commands (ii) [11] 6.6 Iterative commands
  Tutorial: Prolog introduction
  Self reading: TBD

Week 11: Advanced constructs I/III
  Lecture: [19] 7.1 Expressions evaluation order
  Recitation: [13] 7.3 Function objects
  Tutorial: Prolog lists
  Self reading: TBD

Week 12: Advanced constructs II/III
  Lecture: [41] 7.2 Closures
  Recitation: [17] 7.5 Iterators in
  Tutorial: Prolog controlling backtracking
  Self reading: [8] 7.6 Examples of iterators

Week 13: Advanced constructs III/III
  Lecture: (i) [11] 7.4 Generators (ii) [17] 7.7 Coroutines
  Recitation: Overflow of previous tutorials; if time is left, then examples in modern languages
  Tutorial: Prolog database examples
  Self reading: TBD

1 Preliminaries

Contents [35 frames]
  1.1 Administration [11 frames]
  1.2 Motivation [13 frames]
  1.3 Hello, World! [11 frames]

1.1 Administration

11 Frames: Instructors; Teaching assistants; Course material; Resources managed by students; Text books; Bibliography for tutorials; Regulations; Grade components; Final grade; Syllabus; At the end of the course you'll know

1. Instructors
  Prof. Yossi Gil
    - Office hours: http://yogi.
    - Mail: yogi@cs.technion.ac.il
  Dr. Sara Porat
    - Office hours: Wednesday 14:30-15:30; Taub 717
    - Mail: porat@cs.technion.ac.il

2. Teaching assistants
  Mr. Matan Peled
    - Mail: mip@cs.technion.ac.il
  Mr. Jenya Moroshko
    - Mail: mjenya@cs.technion.ac.il

3. Course material
  Official web site, where you will find:
    - Printouts of slides in a variety of formats
    - A couple of Hebrew monographs
    - Some lecture notes taken by students
    - Past exams
    - And the usual: grades, current assignment information & FAQs, periodical announcements, ...

4. Resources managed by students
    - Facebook group: https://www.facebook.
    - Indexed Q&A site

5. Text books
  Main text book:
    - Programming Language Concepts and Paradigms, David A. Watt. Prentice Hall.
  But also English Wikipedia [2] and Google search.
  For further, in depth, reading:
    - Advanced Programming Language Design, Raphael Finkel. MIT Press.
    - Programming Languages: Concepts and Constructs (2nd Ed), Ravi Sethi. Addison-Wesley.

6. Bibliography for tutorials
  Programming languages taught: Pascal, ML, Prolog
    - Introduction to Pascal, by Jim Welsh and John Elder. Prentice Hall, 1979.
    - ML for the Working Programmer, Lawrence C. Paulson. Cambridge University Press.
    - Prolog Programming for Artificial Intelligence, by Ivan Bratko. Addison-Wesley.
    - Programming in Prolog, by W. F. Clocksin and C. S. Mellish. Springer-Verlag.

[2] Hebrew Wikipedia is essentially rubbish.

7. Regulations
  Full document
  Highlights

    - Midterm exam: none
    - Prerequisites: enforced
    - Co-requisites: enforced
    - Homework grades: crucial
    - Old homework grades: cannot be transferred
    - Appeals: considered favorably; treated seriously; must be in writing; must be signed by student; no grades negotiation; discussing appeals or grades with staff voids all appeal rights

8. Grade components
  Assignments (grade denoted by A, 0 ≤ A ≤ 100)
    - Every 2-3 weeks
    - Mandatory
    - Typically includes both programming and mini-research problems
    - Teams of two students each (strict!)
    - Matching services provided by teaching assistants
  Exam (grade denoted by E, 0 ≤ E ≤ 100)
    - Guaranteed to include at least one homework assignment
    - Guaranteed to include at least one past exam question
    - Typically includes 8-12 questions

9. Final grade
  Denoted by F, 0 ≤ F ≤ 100:

    F = round(E) if E 1.5 A E
        round(E) if E 50
        round(E + A E) if 50 < E A
        round(E E A) otherwise                (1.1.1)

10. Syllabus
  Concepts
    - values, types and expressions
    - typing systems
    - storage
    - commands
    - sequencers
    - function abstractions
  Programming paradigms
    - Imperative: C, C++, Pascal,, AWK, Go, ...
    - Functional: ML, Haskell, Lisp
    - Declarative: Prolog

11. At the end of the course you'll know
    - What distinguishes different PLs from one another
    - A variety of mechanisms in familiar and less familiar PLs
    - Programming in the functional language ML
    - Some basic concepts from Pascal, Prolog and other PLs
  Main skills:
    - Quickly learn a new PL
    - Evaluate PLs
    - Use any PL more cleverly
    - Search in Google

1.2 Motivation

13 Frames: Concrete reasons to study PLs?; Discovering you speak prose; New modes of thought; But also many practical benefits; Or, at least, understand the terminology/jargon of the trade; Not for the faint of heart; Possible approaches; Problem; Who needs PLs?; What is a PL?; Language processors; Relations to other fields in computer science; Closely related topics

12. Concrete reasons to study PLs?
  Fun
    - ML is neat
    - Prolog is elegant
    - There is more to it than C, C++,
  Enhance thinking flexibility
  Professional skills
    - Over 2,000 different languages out there
    - Common concepts and shared paradigms
    - Framework for comparative study of PLs
  Useful for other courses
    - OOP
    - Compilation
    - Software Engineering
    - Semantics of PLs
    - Memory Management

13. Discovering you speak prose
  «Par ma foi ! il y a plus de quarante ans que je dis de la prose sans que j'en susse rien, et je vous suis le plus obligé du monde de m'avoir appris cela.»
  (By my faith! For more than forty years I have been speaking prose without knowing anything about it, and I am much obliged to you for having taught me that.)
  The pleasure of discovering that you have been speaking prose all your life without knowing it; or, more generally, of learning something new about old things.
  So, yes, the course will be telling you new stuff about old stuff, and we will practice some new modes of thought.

14. New modes of thought
    - Programming language mechanisms much beyond the if and while
    - Programming techniques
    - Paradigms of thought
    - Directions for your minds
  And also:
    - Get ready for Object Oriented Programming and other advanced courses. [3]
    - Hone web-search skills. [4]

15. But also many practical benefits
  Main objective: learn, understand, and evaluate any new programming language.

16. Or, at least, understand the terminology/jargon of the trade:
  What kind of a beast is Script?
    - Imperative,
    - With prototypes (object-based, but not object-oriented),
    - Functions are first-class entities,
    - Has lambda functions,
    - With closures,
    - Is weakly typed,
    - Has dynamic typing,
    - Has static scoping,
    - and a must-know for any modern website developer!
  By the end of this course, many of these terms will be covered in depth.

17. Not for the faint of heart
  No mathematical interest. This is not yet another technical course:
    - Many soft definitions
    - Much reliance on common sense
    - No theorems, proofs, lemmas, or integration by parts
    - No easy grades for mathematical geniuses
  No computational interest. The expressive power of all programming mechanisms and computational devices is basically the same:
    - The Church-Turing hypothesis
    - The DOS batch language and are equivalent
    - The Commodore 64 and the latest 8-core CPUs are equivalent
  No algorithmic interest. You don't discover new fascinating algorithms using better programming languages.

[3] The instructor of the Object Oriented Programming course paid me to mention this.
[4] Google paid me to include this in the topic of our course.

18. Possible approaches
    - Define and compare paradigms of PLs
    - Present formal approaches to syntax and semantics
    - Present ways of implementing and analyzing programs in various PLs
    - Show the concepts that must be dealt with by any PL, and the possible variety in treatment

19. Problem
  To teach you PL theory, we need to draw examples from different PLs.
  Right now, most of you know 2.5 languages (C, C++, Unix shell scripts).
  Examples in these slides come from (alphabetically): Ada, Algol, AWK, C, C++, C#, Eiffel, Fortran, Haskell,, ML, Lazy-ML, Lisp, Pascal, Prolog, Python, SQL, and probably a few more I forgot.
  Can you please learn all these for next week?
  Recitations are here to help.

20. Who needs PLs?
  Computers' native tongue is machine language.
  Programmers need higher level languages, because:
    - They can't write machine language correctly
    - They can't read machine language fluently
    - They can't express their ideas in machine language efficiently
    - Life is too short to program in machine language.
  A formal language is not only a man-machine interface, but also a person-to-person language!
  Conclusion: PLs are a compromise between the needs of humans and the needs of machines.

18 21. What is a PL? 25. How many ways to say Hello, World? A linguistic tool with formal syntax and semantics A consciously designed artifact that can be implemented on computers A conceptual universe for thinking about programming (Alan Perlis, 1969) 22. Language processors A system for processing a language: Compiler Interpreter Syntax directed editor Program checker Program verifier Studied in other courses: Compilation Program verification Software engineering To know the semantics of a language (the function a program encodes) one can ignore its implementation 23. Relations to other fields in computer science lan- Databases and Information Retrieval Query guages - languages for manipulating databases. Human-Computer Interaction PLs are designed to be written and read by humans Operating Systems Input-Output support. Storage management. Shells are in fact PLs. Computer Architecture PL design is influenced by architecture and vice versa. Instructions sets are PLs. Hardware design languages. The following slides present examples of some of the most popular computer program Hello, World in various programming languages See how many you can recognize? More exmaples: shtml Assembly 8086 (for MS DOS).model small.stack 100h.data.helloMessage db 'Hello, World',0dh,0ah,'$'.code main proc mov ax,@data 8086 mov ds,ax + MSDOS mov ah,9 mov dx,ofsett hellomessage int 21h mov ax,4c00h int 21h main endp end main 27. Fortran 28. PL/I c c Hello, world. c PROGRAM HELLO Fortran WRITE(*,10) 10 FORMAT('Hello, world') END /* HELLO PL/I WORD PROGRAM TO OUTPUT HELLO WORLD */ HELLO: PROCEDURE OPTIONS (MAIN); PUT SKIP DATA('HELLO, WORLD'); END HELLO; 24. Closely related topics 29. Ada Automata and Formal Languages, Computability Provide the foundation for much of the underlying theory. Compilation The technology of processing PLs. Software engineering The process of building software systems. 1.3 Hello, World! 11 Frames: How many ways to say Hello, World? 
Assembly 8086 (for MS DOS)
Fortran
PL/I
Ada
Prolog
Snobol 4
The Chef PL
Lisp
Smalltalk
PostScript

Ada:
    with i_o; use i_o;
    procedure hello is
    begin
      put ("Hello, World");
    end hello;

30. Prolog
    hello :- printstring("Hello, World").
    printstring([]).
    printstring([H|T]) :- put(H), printstring(T).

19 31. Snobol 4 OUTPUT = 'Hello, World' END Snobol The Chef PL Lots of food for one person Hello World Souffle. Ingredients. 72 g haricot beans 101 eggs 108 g lard 111 cups oil 32 zucchinis 119 ml water 114 g red salmon 100 g dijon mustard 33 potatoes Method. Put potatoes into the mixing bowl. Put dijon mustard into the mixing bowl. Put lard into the mixing bowl. Put red salmon into the mixing bowl. Put oil into the mixing bowl. Put water into the mixing bowl. Put zucchinis into the mixing bowl. Put oil into the mixing bowl. Put lard into the mixing bowl. Put lard into the mixing bowl. Put eggs into the mixing bowl. Put haricot beans into the mixing bowl. Liquefy contents of the mixing bowl. Pour contents of the mixing bowl into the baking dish. Serves Lisp ( DEFUN HELLO-WORLD ( ) (PRINT Lisp (LIST 'HELLO 'WORLD) ) ) 34. Smalltalk Transcript Smalltalk show: 'Hello, World'; cr 35. PostScript %!PS scale /Courier findfont 12 scalefont setfont 0 0 translate /row 769 def 85 {/col PostScript 18 def 6 {col row moveto (Hello, World) show /col col 90 add def repeat /row row 9 sub def repeat showpage save restore 5

2 Introduction

Contents [84 frames]
2.1 PL design [9 frames]
Programming paradigms [10 frames]
History of programming languages [8 frames]
Syntax specification [25 frames]
  Regular expressions [14 frames]
  EBNF [10 frames]
  Exercises
Tokens: the atoms of syntax [31 frames]
  Kinds of tokens [19 frames]
  Library identifiers [7 frames]
  Starting point [3 frames]

36. Introduction to PLs: a visual mindmap
Figure 2.0.1: Introduction to PLs: a visual mindmap (branches include Motivation, Paradigms, Syntax, Semantics, History, Syntactical Elements, and Administration)

2.1 PL design
9 Frames:
  Requirements from a PL
  Desiderata for a PL
  Guiding principles in language design
  Less is more
  A legendary Fortran bug
  The Mariner 1 aborted launch
  The bug leading to the abortion
  Poor design of the Fortran PL
  Concepts in the PL World

37. Requirements from a PL
Universal: every problem must have a solution⁵
  Express recursive functions; it is sufficient to require
    Conditionals
    Loops
Natural: application domain specific
  Try writing a compiler in Cobol or a GUI in Fortran
Implementable:
  Neither mathematical notation
  Nor natural language
Efficient: open to debate
  More programming crimes were committed in the name of performance than for any other reason.

⁵ Exception: domain-specific languages, e.g., pure SQL has no recursion

38. Desiderata for a PL
Expressiveness:
  Turing-completeness
  But also a practical kind of expressiveness: how easy is it to program simple concepts?
Efficiency:
  Recursion in functional languages is expressive but sometimes inefficient
  Is there an efficient way to implement the language (in machine code)?
Simplicity:
  as few basic concepts as possible
  Sometimes a trade-off with convenience (C has for, who needs while and do-while?)
Uniformity and consistency of concepts:
  for in Pascal allows a single statement
  repeat-until allows any number of statements
  Why?
Abstraction: the language should allow factoring out recurring patterns
Clarity to humans: the = vs. == in C is a bit confusing
Information hiding and modularity
Safety: possibility to detect errors at compile time
  AWK, Rexx and Snobol type conversions are error prone

39. Guiding principles in language design
Example: the C design rules:
  Division between preprocessor, compiler and linker.
  No hidden costs
  Programmer's accountability and responsibility

40. Less is more
Two program fragments to find the n-th Fibonacci number in Algol-68:

    x,y := 1;
    to n do (if x<y then x else y) := x+y;
    x := max(x,y);

    x, y := 1;
    to n do begin x,y := y,x; x := x+y end;
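To make the comparison in "Less is more" concrete, here is a minimal Python sketch of the two Algol-68 fragments. It is an illustration only: the function names are made up, Python has no assignable conditional expression (so the first fragment spells the choice of target out as an if/else), and the second fragment assumes that `x,y := y,x` is a simultaneous swap.

```python
def fib_conditional_target(n: int) -> int:
    """First fragment: each step assigns x+y to the smaller of x and y."""
    x = y = 1
    for _ in range(n):
        # Algol-68: (if x<y then x else y) := x+y
        if x < y:
            x = x + y
        else:
            y = x + y
    return max(x, y)  # Algol-68: x := max(x,y)


def fib_swap(n: int) -> int:
    """Second fragment: swap, then accumulate into x."""
    x = y = 1
    for _ in range(n):
        x, y = y, x   # assumed to mirror the simultaneous assignment x,y := y,x
        x = x + y
    return x


if __name__ == "__main__":
    # Both fragments walk the same sequence; which index counts as "the n-th"
    # Fibonacci number depends on the convention chosen for the start.
    print([fib_swap(n) for n in range(8)])
    print([fib_conditional_target(n) for n in range(8)])
```

The second version needs fewer language features (no conditional assignment target, no max), which is the point of the slide title.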

41. A legendary Fortran bug
Computing $\sum_{i=1}^{314} \sin(i)$ with Fortran:

          S = 0
          DO 1000 I=1,314
            S = S + SIN(I)
     1000 CONTINUE

But if you accidentally replace , by . the code is very different:

          S = 0
          DO1000I = 1.314
            S = S + SIN(I)
     1000 CONTINUE

Variable S becomes simply sin(1).

42. The Mariner 1 aborted launch
Figure 2.1.1: Launch of Mariner 1. Smoke and fire launch of the Mariner 1, less than five minutes prior to its abortion by a security officer due to a combination of a hardware problem and software bug; July 22nd 1962, 09:26:16 UTC

43. The bug leading to the abortion
Background:
  The cold war between the US and USSR
  The space race between the US and USSR
  America's first planetary mission
  A very expensive project
  Designed in only 45 days
  Politicians wanted explanation
  The public wanted explanation
Outcome:
  Official explanation: omission of a hyphen in coded computer instructions in the data-editing program
  Urban legend: the famous Fortran bug

44. Poor design of the Fortran PL

          S = 0
          DO 1000 I=1.314
            T = T + SIN(I)
     1000 CONTINUE

Bad lexical definition: spaces are immaterial
No declaration of variables
Implicit typing
Poor control structure specification
Lack of diagnostics

45. Concepts in the PL World
What characterizes a PL?

  Question to ask                      Concepts
  How it handles values?               values, types and expressions
  How it checks types?                 type systems
  Its entities for storing values?     storage
  Its means for storing values?        commands
  How it alters control?               sequencers
  How it attaches names to values?     binding
  How it allows generalization?        functions
  Paradigms
Table 2.1.1: Concepts in the PL World

2.2 Programming paradigms
10 Frames:
  SQL: what's the difference?
  What is a paradigm?
  Main paradigms
  PL paradigms
  The imperative paradigm
  The functional paradigm
  Aren't all languages pretty much the same?
  The logic/declarative programming paradigm
  The object-oriented paradigm
  The main programming paradigms

46. SQL: what's the difference?
A query, and a similar query:

    SELECT firstname, lastname, age
    FROM user
    WHERE firstname = "David"
    ORDER BY age

    SELECT firstname, lastname, age
    FROM user
    WHERE age > 18 AND age < 65
    ORDER BY age

Think about implementing these queries in C++. Would the code look just the same in both cases?
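One way to approach the question is to write both queries imperatively. The sketch below uses Python rather than C++ and is only an illustration: the `User` record, the function names, and the sample rows are hypothetical and not taken from the course material.

```python
from dataclasses import dataclass


@dataclass
class User:
    firstname: str
    lastname: str
    age: int


def users_named(users, firstname):
    """Imperative rendering of: WHERE firstname = ... ORDER BY age."""
    rows = []
    for u in users:                      # explicit scan over the table
        if u.firstname == firstname:     # the WHERE condition
            rows.append((u.firstname, u.lastname, u.age))
    rows.sort(key=lambda row: row[2])    # ORDER BY age
    return rows


def users_in_age_range(users, low, high):
    """Imperative rendering of: WHERE age > low AND age < high ORDER BY age."""
    rows = []
    for u in users:
        if low < u.age < high:
            rows.append((u.firstname, u.lastname, u.age))
    rows.sort(key=lambda row: row[2])
    return rows


if __name__ == "__main__":
    table = [User("David", "Cohen", 41), User("Dana", "Levi", 17),
             User("David", "Katz", 23), User("Ruth", "Peretz", 70)]
    print(users_named(table, "David"))
    print(users_in_age_range(table, 18, 65))
```

In SQL the two queries differ in one clause and say nothing about how rows are found. In the imperative version the scan, the filter, and the sort are all spelled out, and a realistic implementation might pick a different data structure for each query (say, a lookup by name versus a range search over ages).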

22 50. The imperative paradigm 47. What is a paradigm? Fortran, Algol, C, Pascal, Ada, par a digm (Merriam-Webster Collegiate Dictionary) The program has a state reﬂected by storage and location a philosophical and theoretical framework of a scientiﬁc school or discipline within which theories, laws, and generalizations and the experiments performed in support of them are formulated It comprises commands (assignments, sequencers, etc.) that update the state of the program They can be grouped into procedures and functions Model, pattern Thomas Kuhn ( ) There are also expressions and other functional features A set of universally recognized scientiﬁc achievements that for Most familiar, but a large variety of possibilities must be mastered and understood The Sapir-Whorf hypothesis: The language spoken inﬂuences the way reality is perceived. Models real-world processes, hence still dominant Lends itself to eﬃcient processing (optimizing compilers etc.) In PLs: a family of languages with similar basic constructs and m ental modelöf execution Will see Pascal in recitations and home assignment 48. Main paradigms 51. The functional paradigm Imperative programming: Fortran, Cobol, Algol, PL/I, C, Pascal, Ada, C++, Icon, Modula-2, Modula-3, Oberon, Basic. Lisp, Scheme, ML, Haskell, Concurrent programming: Ada, Occam, X10 Everything is a function that takes arguments and returns results Object-oriented programming: Smalltalk, Self, C++, Objective-C, Object-Pascal, Beta, CLOS, Eiffel Moreover, the functions are just another kind of value that can be computed (created), passed as a parameter, etc. Functional programming: Lisp, Scheme, Miranda, ML. Don t really need assignment operation or sequencers - can do everything as returning a result value of computing a function Logic programming: Turbo-Prolog, Icon. Prolog, Prolog-dialects, E.g., use recursive activation of functions instead of iteration 49. 
PL paradigms Elegant, extensible, few basic concepts Used for list manipulation, artiﬁcial intelligence, Requires a truly diﬀerent perception using an imperative programming style in ML is even worse than a word-for-word translation among natural languages Will see ML, mainly in the recitations 52. Aren t all languages pretty much the same? The move from C to C++ isn t insurmountable. Moving from C++ to is trivial. Figure 2.2.1: PL paradigms And if you know, you pretty much know C#, too. (Not comprehensive) Even if the syntax isn t C-style used (e.g., Eiffel), it can t be that diﬃcult, right? (Not to scale) Why make such a fuss about it? 8

53. The logic/declarative programming paradigm
Prolog, constraint languages, database query languages
Predicates as the basis of execution
Facts and rules are listed naturally
A computation is implicit: it shows what follows from the given facts and rules
Emphasizes what is needed, rather than how to compute it
Used for expert systems
Will see the basics of Prolog later in the course

54. The object-oriented paradigm
C++, Smalltalk, Eiffel, Java, C#
The world has objects that contain both fields with values and operations (called methods) that manipulate the values
Objects communicate by sending messages that ask the object to perform some method on its data
Types of objects are declared using classes that can inherit the fields and methods of other classes
Has become the primary paradigm because it seems to treat large systems better than other approaches
Treated mainly in the follow-up course Object-Oriented Programming (236703)
Will do a little bit of Java in the recitations

55. The main programming paradigms
  Imperative
  Functional
  Aspect Oriented
  Parallel
  Object Oriented
  Logical
  Constraints
Table 2.2.1: The main programming paradigms

However, there are many multi-paradigm PLs.
  Mathematica
  Oz
  F#
  Visual-Basic.Net
  C#
  Scala
  Object Pascal
Table 2.2.2: Multi-paradigm PLs

2.3 History of programming languages
8 Frames:
  Language inception & evolution
  Language genealogy (till 1990)
  Historical background
  Early 1960s:
  Mid 1960s
  present
  Why Pascal?

56. Language inception & evolution
Initial definition by
  an individual: Lisp (McCarthy), APL (Iverson), Pascal (Wirth), Rexx (Cowlishaw), C++ (Stroustrup), Java (Gosling)
  a small team: C (Kernighan and Ritchie), ML (Milner et al.), Prolog (Clocksin and Mellish), Icon (Griswold and Griswold)
  a committee: Fortran, Algol, PL/1, Ada
Some survived, many more perished for a variety of reasons:
  usability
  compilation feasibility
  dependence on platform
  politics and sociology
Most successful languages were taken over by a standards committee (ANSI, IEEE, ISO, ...)

57. Language genealogy (till 1990)
Figure 2.3.1: Language genealogy (till 1990)

58. Historical background
Until early 1950s: no real PLs, but rather automatic programming, a mixture of assembly languages and other aids for machine code programming.
  Mnemonic operation codes and symbolic addresses
  Subroutine libraries where addresses of operands were changed manually
  Interpretive systems for floating point and indexing
Early 1950s: the Laning and Zierler System (MIT): a simple algebraic language, a library of useful functions.

1954: Definition of Fortran (FORmula TRANslator). Originally for numerical computing.
  Symbolic expressions, subprograms with parameters, arrays, for loops, if statements, no blocks, weak control structures
1957: first working compiler

59. Early 1960s:
Cobol: Data processing. Means for data description.
Algol 60: Blocks, modern control structures
  One of the most influential imperative languages
  Gave rise to the Algol-like languages for two decades (Pascal, PL/1, C, Algol 68; Simula, Ada)
Lisp (list processing language): symbolic expressions (rather than numerical), computation by list manipulation, garbage collection; the first functional language

60. Mid 1960s
PL/1: an attempt to combine concepts from numerical computation languages (Fortran, Algol 60) and data processing languages (Cobol).
Simula: object oriented, abstract data types

Several OO languages: Smalltalk, C++, Eiffel
Logic programming: Prolog
Functional programming: ML, Miranda, Haskell
Ada: Another attempt, more successful than PL/I, for a general purpose language, including concurrency.
Domain specific:
  Snobol, Icon, AWK, Rexx, Perl: String manipulation and scripting
  SQL: Query language for relational databases
  Mathematica, Matlab: Mathematical applications
  Python: large scale script programming

present
Object oriented + WWW: Java, C#
Scripting + [OO] + WWW: Perl, Python, PHP, Ruby
Client-side scripting: JavaScript
Components and middleware between operating system and application levels
Reuse and design patterns become useful and popular
Multiple-language systems with standard interface: XML
Flexibility in choice of language and moving among languages

63. Why Pascal?
Extremely influential
Easy to study: designed for beginners.
Autarkic PL
Nested functions
Set type constructor
Subrange types
Nested functions and procedures
Functions and procedures are not first class values
Named labels, although naming is by integers

2.4 Syntax specification
Contents [25 frames]
  Regular expressions [14 frames]
  EBNF [10 frames]
  Exercises

Linguistics of PLs: syntax & semantics
Two components of linguistics (as in natural linguistics):
  Syntax: Which text files are correct programs? How are expressions, commands, declarations, etc., put together to form a program?
  Semantics: What is the meaning of correct programs? Behavior when executed on a computer?
Means for specification:
  Syntax: Regular expressions, context-free grammars, BNF form, EBNF form, syntax diagrams (briefly touched here, subject of "Automata & Formal Languages")
  Semantics: Tutorials, user guides, handbooks, Wikipedia entries, language legalese (briefly touched here), and formal semantics (outside our scope).
Common theme: recursively defined sets

Regular expressions
14 Frames:
  Recursively defined sets
  The set of strings over an alphabet
  Recursive definition of the set of non-empty strings
  Set of strings over an alphabet defined recursively
  Examples of regular expressions
  Regular expressions as a recursively defined set
  Semantics can be recursively defined as well
  Syntactic sugaring & other variations in regular expression
  Three components of a recursive definition
  Recursively defined sets in PLs
  Example: types in Java
  Compound vs. atomic members
  Decomposing a compound expression
  Observations

65. Recursively defined sets
Also known as inductively defined sets
Definition 2.1 (Who is Jewish?).
  Mother Sarah was Jewish.
  Father Abraham was Jewish.
  People who converted are Jewish.

  People born to a Jewish mother are Jewish.
Other natural examples: Who is a Muslim? Who can call himself a Dr.? Who can call himself a Rabbi?

66. The set of strings over an alphabet
Regular expressions are a language for defining a subset of the strings over a given alphabet
But, what are strings over a given alphabet?
Let $\Sigma$ be an alphabet, i.e., a set of letters (also called characters), e.g., $\Sigma = \{a,b,c,d,e,f\}$
$\Sigma$ can be used to write strings, e.g., abba, baa, daffacc, cafe, decaf, dafcaa
Let $\varepsilon$ denote the empty string.
Let $\Sigma^*$ be the set of all strings over $\Sigma$,
  $\Sigma^* = \{\underbrace{\varepsilon}_{\text{length 0 string}}, \underbrace{a, b, \ldots, f}_{\text{length 1 strings}}, \underbrace{aa, ab, \ldots, ff}_{\text{length 2 strings}}, \underbrace{aaa, \ldots, fff}_{\text{length 3 strings}}, \ldots\}$

67. Recursive definition of the set of non-empty strings
Given an alphabet $\Sigma$, the set $\Sigma^+$ is defined by
All letters: all letters in $\Sigma$ are present in $\Sigma^+$ (2.4.1):
  $\ell \in \Sigma \Rightarrow \ell \in \Sigma^+$  (2.4.2)
Concatenation: set $\Sigma^+$ is closed under concatenation:
  $\alpha, \beta \in \Sigma^+ \Rightarrow \alpha\beta \in \Sigma^+$  (2.4.3)
  where $\alpha\beta$ is the string obtained by concatenating $\alpha$ and $\beta$
Minimality: there are no other members of $\Sigma^+$:
  $\gamma \in \Sigma^+ \Rightarrow \gamma \in \Sigma$ or $\exists \alpha, \beta \in \Sigma^+ : \gamma = \alpha\beta$  (2.4.4)
There are many equivalent definitions of $\Sigma^+$
All these definitions must be recursive
A string $\alpha \in \Sigma^+$ can be constructed in several ways by the above definition
There is an alternative definition of $\Sigma^+$ by which every $\alpha \in \Sigma^+$ has a single, unique construction.

68. Set of strings over an alphabet defined recursively
Definition 2.2 (Set of strings over an alphabet). Given an alphabet $\Sigma$, the set $\Sigma^*$ is defined by
  $\varepsilon$, the empty string, is in $\Sigma^*$: $\varepsilon \in \Sigma^*$
  if $\ell$ is a letter, $\ell \in \Sigma$, and $\alpha \in \Sigma^*$ is a string, then $\ell\alpha \in \Sigma^*$, where $\ell\alpha$ is the string obtained by prefixing letter $\ell$ to string $\alpha$
  there are no other members of $\Sigma^*$

69. Examples of regular expressions

  RE                    S ⊆ Σ*
  a                     {a}
  b                     {b}
  ε                     {ε}
  ab                    {ab}
  a|b                   {a, b}
  a*                    {ε, a, aa, aaa, ...}
  (da|ba)a*(d|b)        {bab, bad, dab, dad, baab, baad, ...}
  (a|b|c|d|e|f)*        Σ*
Table 2.4.1: Examples of regular expressions

70. Regular expressions as a recursively defined set
Given an alphabet $\Sigma$, the set $RE(\Sigma)$ is defined by:
All strings:
  $\Sigma^* \subseteq RE(\Sigma)$  (2.4.5)
Alternation:
  $e_1, e_2 \in RE(\Sigma) \Rightarrow (e_1 \mid e_2) \in RE(\Sigma)$  (2.4.6)
Concatenation:
  $e_1, e_2 \in RE(\Sigma) \Rightarrow (e_1 e_2) \in RE(\Sigma)$  (2.4.7)
Kleene closure:
  $e \in RE(\Sigma) \Rightarrow (e^*) \in RE(\Sigma)$  (2.4.8)

71. Semantics can be recursively defined as well
The semantics of $e \in RE(\Sigma)$ is a set $S(e)$, $S(e) \subseteq \Sigma^*$
Strings: $e \in \Sigma^* \Rightarrow S(e) = \{e\}$
Alternation: $e = (e_1 \mid e_2) \Rightarrow S(e) = S(e_1) \cup S(e_2)$
Concatenation: $e = (e_1 e_2) \Rightarrow S(e) = \{\alpha\beta \mid \alpha \in S(e_1) \text{ and } \beta \in S(e_2)\}$
Kleene closure: $e = (e_1^*) \Rightarrow S(e) = \bigcup_{i=0}^{\infty} S(\underbrace{e_1 \cdots e_1}_{i \text{ times}})$

72. Syntactic sugaring & other variations in regular expression
REs were adopted in various systems, including text editors, languages for text processing, and shell scripts.
Adoptions use varying syntax for the same underlying concept
Most adoptions offer syntactic sugaring

  Sugar             Semantics
  [a-z]             letters a through z
  [^0-9]            all characters except the digits 0 through 9
  a|b               {a, b}
  a?                {a, ε}
  ' '+              one or more spaces
  UPPER = [A-Z]     name a RE to be used in the definition of other REs
Table 2.4.2: Syntax and semantics of regular expressions

73. Three components of a recursive definition
1. Atoms, e.g., the empty string is in $\Sigma^*$
2. Constructors: how to make compound members out of the atoms and compound members constructed previously.
3. Minimality: usually implicit, but can be phrased as
   The set has no members other than the atoms or the compound members constructed by the construction rules.
   The set is the intersection of all sets which are consistent with the atoms and the construction rules specification.
   The set is the smallest set that is consistent with the atoms and the construction rules specification.

74. Recursively defined sets in PLs
Arithmetical expressions.
  Atoms: literals, references to named entities.
  Constructors: mathematical operators, user-defined functions, ...
Executable statements (commands) in C.
  Atoms: assignment, return, ...
  Constructors: if, for, { }, ...
Types in C.
  Atoms: int, char, ...
  Constructors (aka type constructors): points to, array of, record with fields, and function taking type τ and returning type σ.

75. Example: types in Java
Java's types are recursively defined.
Type constructors are, e.g., class, array, and enum.
Atomic types make the recursion base
Atomic types are denoted in Java by reserved words:

  Kind              Types
  Integral types    byte, short, int, long
  Floating types    float, double
  Other types       boolean, char
Table 2.4.3: Atomic types in Java

recursive use of names is never allowed.

76. Compound vs. atomic members
Atomic member: indivisible, has no components which are members
Compound member: has smaller components which are members

    Begin                                  (compound command)
      a := b * c;                          (atomic command)
      WriteLn(Sin((a + 3) * (a + c)))      (another atomic command)
    end
Figure 2.4.1: A Pascal compound command with two atomic commands in it
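The recursive definitions of RE(Σ) and of S(e) above translate almost line by line into code. Below is a minimal Python sketch, not part of the original slides: the nested-tuple encoding and the name `semantics` are assumptions made for illustration. Since S(e) may be infinite, the function returns only the members of length at most `max_len`.

```python
# Encoding assumed for this sketch:
#   a plain Python string     -> an atom (a member of Sigma*)
#   ("alt", e1, e2)           -> alternation   (e1 | e2)
#   ("cat", e1, e2)           -> concatenation (e1 e2)
#   ("star", e1)              -> Kleene closure (e1*)

def semantics(e, max_len):
    """Return the members of S(e) whose length is at most max_len."""
    if isinstance(e, str):                       # atom: S(e) = {e}
        return {e} if len(e) <= max_len else set()
    tag = e[0]
    if tag == "alt":                             # S(e1) union S(e2)
        return semantics(e[1], max_len) | semantics(e[2], max_len)
    if tag == "cat":                             # all concatenations
        left = semantics(e[1], max_len)
        right = semantics(e[2], max_len)
        return {a + b for a in left for b in right if len(a + b) <= max_len}
    if tag == "star":                            # union over i = 0, 1, 2, ...
        base = semantics(e[1], max_len)
        result = {""}                            # i = 0 gives the empty string
        frontier = {""}
        while frontier:                          # stops once nothing new fits
            frontier = {a + b for a in frontier for b in base
                        if len(a + b) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown constructor: {tag!r}")


if __name__ == "__main__":
    # (da|ba)a*(d|b), as in the table of examples
    e = ("cat", ("cat", ("alt", "da", "ba"), ("star", "a")), ("alt", "d", "b"))
    print(sorted(semantics(e, 4)))
```

The three components of a recursive definition show up directly: the `str` case handles the atoms, the three tagged cases are the constructors, and minimality corresponds to the final `raise`, since nothing else counts as a regular expression.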

77. Decomposing a compound expression
    Sin( ( a + 3 ) * ( a + c ) )
Figure 2.4.2: Decomposing a compound expression
Some (but not all) of the compound expressions in the decomposition tree of the largest compound expression are marked as well.

78. Observations
an atomic command may contain a compound expression
this does not make the command less atomic
an expression never contains commands (at least not in Pascal)
constructors are denoted by keywords as well
these keywords can be thought of as punctuation or as a sort of names of the constructors

EBNF
10 Frames:
  Extended Backus-Naur form (EBNF)
  Example of an EBNF
  More readable way for writing an EBNF
  Interpretation of the expression grammar
  Understanding the expression grammar
  Many variants for writing an EBNF
  Terminals can be regular expression as well
  BNF vs. EBNF
  Ambiguity in context free grammars
  Expressive power of context free grammars

79. Extended Backus-Naur form (EBNF)
    <if-stmt> = if <expression> then <statement> [ else <statement> ]
A meta-notation for describing the grammar of a language
Terminals = actual legal strings, written as is, or inside quotes, e.g., "+"
Nonterminals = concepts of the language, written <program>, or program in a distinct typeface, in different variants
Rules = expanding a non-terminal to a series of NTs and Ts
One nonterminal is designated as the start of any derivation
A sequence of terminals not derivable from the start symbol by rules of the grammar is illegal
| is choice among several possibilities
[ ] enclose optional constructs
a pair of { and } encloses zero or more repetitions

80. Example of an EBNF
Terminals: v n + - * / ( )
Nonterminals: <a> <m> <F> <E> <T>
Start Symbol: <E>
Rules:
    <a> = + | -
    <m> = * | /
    <F> = v | n
    <F> = ( <E> )
    <E> = <T> {<a> <T>}
    <T> = <F> {<m> <F>}

81. More readable way for writing an EBNF
The common way for presenting an EBNF:
  Employ meaningful names
  Rules is the only section; no terminals list, non-terminals list, or definition of a start symbol
Context free grammar for expressions:
    <expression> = <term> {<add-op> <term>}
    <term> = <factor> {<mult-op> <factor>}
    <factor> = <variable-name> | <number> | ( <expression> )
    <add-op> = + | -
    <mult-op> = * | /

82. Interpretation of the expression grammar
Context free grammar for expressions:
    <expression> = <term> {<add-op> <term>}
    <term> = <factor> {<mult-op> <factor>}
    <factor> = <variable-name> | <number> | ( <expression> )
    <add-op> = + | -
    <mult-op> = * | /
Terminals never occur at the left hand side of rules: +, -, * and /.
Non-terminals should always occur at the left hand side of rules: <expression>, <term>, <factor>
Start symbol is <expression>
Forget about <number> and <variable-name> for now.
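The expression grammar above can be turned, rule by rule, into a recursive-descent recognizer: one function per non-terminal, with each `{ ... }` repetition becoming a `while` loop. The following Python sketch is an illustration under stated assumptions. The class and function names are invented, and since the grammar leaves `<variable-name>` and `<number>` unspecified at this point, the tokenizer assumes a letter followed by letters or digits, and a run of digits, respectively.

```python
import re

# Assumed token shapes (not fixed by the grammar above):
#   variable-name: a letter followed by letters/digits; number: a run of digits.
TOKEN = re.compile(r"\s*(?:([A-Za-z][A-Za-z0-9]*)|([0-9]+)|([-+*/()]))")


def tokenize(text):
    """Split text into (kind, lexeme) pairs; raise SyntaxError on junk."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.group(1):
            tokens.append(("variable-name", m.group(1)))
        elif m.group(2):
            tokens.append(("number", m.group(2)))
        else:
            tokens.append((m.group(3), m.group(3)))
        pos = m.end()
    return tokens


class Recognizer:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expression(self):            # <expression> = <term> {<add-op> <term>}
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):                  # <term> = <factor> {<mult-op> <factor>}
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):                # <factor> = <variable-name> | <number> | ( <expression> )
        kind = self.peek()
        if kind in ("variable-name", "number"):
            self.pos += 1
        elif kind == "(":
            self.pos += 1
            self.expression()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected token: {kind!r}")


def is_legal(text):
    """True if text is derivable from <expression>, the start symbol."""
    try:
        r = Recognizer(tokenize(text))
        r.expression()
        return r.peek() is None      # all input must be consumed
    except SyntaxError:
        return False


if __name__ == "__main__":
    for s in ["x + 2 / y", "x * (y + z)", "x (y + z)", "(x + 3"]:
        print(repr(s), is_legal(s))
```

A recognizer only answers yes or no. Recording which rule was applied at each step would yield the syntax tree of the input.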

83. Understanding the expression grammar
Context free grammar for expressions:
    <expression> = <term> {<add-op> <term>}
    <term> = <factor> {<mult-op> <factor>}
    <factor> = <variable-name> | <number> | ( <expression> )
    <add-op> = + | -
    <mult-op> = * | /
Is a + 2/b c 7 a legal expression?
Yes, because there is a sequence of rule applications, starting from <expression>, that yields this string (these can be drawn as a syntax tree, also called a parse tree)
How about a (b + c)?

84. Many variants for writing an EBNF
EBNFs are often written in a form intended to be readable, but only to the educated reader:
Context free grammar for expressions (another EBNF syntactical variant):
    Expression ::= Term (('+' | '-') Term)*
    Term ::= Factor (('*' | '/') Factor)*
    Factor ::= Variable-Name | Number | '(' Expression ')'
First rule defines the start symbol
Terminals never occur in the left
Use more RE-like syntax for the right hand of rules
Terminals show between quotes

85. Terminals can be regular expression as well
Don't forget Variable-Name and Number:
  Can potentially be specified in EBNF
  Usually have no recursion in them
  Are usually written as regular expressions
  Are thought of as tokens or non-terminals
Context free grammar for expressions with regular expressions for tokens:
    Expression ::= Term (('+' | '-') Term)*
    Term ::= Factor (('*' | '/') Factor)*
    Factor ::= Variable-Name | Number | '(' Expression ')'
    Variable-Name := [a-zA-Z][a-zA-Z0-9]*
    Number := [+-]? [0-9]+

86. BNF vs. EBNF
only strings of (terminals/non-terminals) can be used on the right hand side; no regular expressions in the original Backus Naur Form
Context free grammar for expressions (plain BNF):
    Expression: Terms;
    Terms: Term;
    Terms: Term Addition Terms;
    Term: Factors;
    Factors: Factor;
    Factors: Factor Multiplication Factors;
    Factor: Variable-Name;
    Factor: Number;
    Factor: '(' Expression ')';
    Addition: '+';
    Addition: '-';
    Multiplication: '*';
    Multiplication: '/';

87. Ambiguity in context free grammars
A grammar is ambiguous if there is a sequence of terminals with more than one derivation tree.
Syntactical ambiguity often leads to semantical ambiguity, since there are several possible ways to understand the input.
Good PL design avoids ambiguity
It is algorithmically impossible to determine whether a BNF gives rise to ambiguity

88. Expressive power of context free grammars
Some syntactical constraints cannot be expressed even with EBNF. Examples:
  Every variable used is previously declared
  The number of arguments in a procedure call equals the number of arguments in the declaration of the procedure
You will learn much more on grammars and on identifying legal programs in the courses Automata and Formal Languages and Compilation

Exercises
1. Let $\Sigma^*$ be the set of strings over the alphabet Σ.
   (a) give a recursive definition for $\Sigma^*$.
   (b) explain why there is an exponential number of strings in Σ for every string in Σ
   (c) explain why the cardinalities of Σ and Σ are the same.
2. Give a recursive definition for $\Sigma^+$, the set of non-empty strings over the alphabet Σ.
3. Show that RE(Σ) = RE(RE(Σ))
4. Give set theoretical considerations why most subsets of $\Sigma^*$ cannot be described as regular expressions.


5. Employ set theoretical considerations to
   (a) explain why Σ, the set of all infinite strings over Σ, has no finite recursive definition, i.e., a recursive definition in which both the number of atoms and the number of constructors is finite.
   (b) determine whether there is such a recursive definition in which only the number of atoms is infinite. Which cardinality should it be?
   (c) determine whether there is such a recursive definition in which only the number of constructors is infinite. Which cardinality should it be?
   (d) give an example of a formal language which is not a PL.

2.5 Tokens: the atoms of syntax
Contents [31 frames]
  Kinds of tokens [19 frames]
  Library identifiers [7 frames]
  Starting point [3 frames]

Writing a Hello, World! program in C
Using Bash, Gnu/Linux, etc.
Programming involves many technical activities.
Actions: Authoring, Compiling, Linking, Executing
Concretely:
    rm -f hello.c a.out
    cat << EOF > hello.c
    #include <stdio.h>
    int main(int argc, char *argv[], char **envp) {
      return printf("Hello, World!\n") <= 0;
    }
    EOF
    cc hello.c
    ./a.out

90. Authoring Hello, World! with the gvim text editor
Figure 2.5.1: Authoring Hello, World! with the gvim text editor
The gtksourceview library makes these so similar
Still, the similarity of PLs makes gtksourceview possible!

Kinds of tokens
19 Frames:
  Tokens are the terminals of a CFG
  What do tokens denote?
  Names aka identifiers
  Nameables
  Nameable values in C/C++/Java?
  Legal names
  Variations
  What's Unicode?
  Names and kind distinction
  Names and naming conventions
  Names & your new PL
  Keywords
  Keywords & atomic types
  Atomic types in Pascal are predefined
  Redefinition of predefined identifiers
  Reserved identifiers
  Routines whose name is a reserved identifier?
  Summary: kinds of identifiers
  The Go PL Tokens

91. Tokens are the terminals of a CFG
Context free grammar for expressions (plain BNF):
    Expression: Terms;
    Terms: Term;
    Terms: Term Addition Terms;
    Term: Factors;
    Factors: Factor;
    Factors: Factor Multiplication Factors;
    Factor: Variable-Name;
    Factor: Number;
    Factor: '(' Expression ')';
    Addition: '+';
    Addition: '-';
    Multiplication: '*';
    Multiplication: '/';

  Token             Kind
  (, )              punctuation
  *, +              operators
  Variable-Name     identifier
  Number            literal
Table 2.5.1: Classification of tokens in a BNF for simple expressions
Note that comments do not show up in the grammar.

92. What do tokens denote?

  Kind                    Example                    Denotes?
  Identifier              main, i, printf, argv      a nameable
  Literal                 132, "Hello, World\n"      itself
  Operator                *, +, /                    a builtin function
  Punctuation             ; , ( )                    nothing (reading and parsing aide)
  Reserved word           if, class, int
  Reserved identifier     int, class                 primitive type
  Predefined identifier   Integer                    primitive type
  Comments                /* fubar */                Nothing!
Table 2.5.2: What do tokens denote?
Comments are not officially tokens, but they also belong to the atomic elements of the language.

93. Names aka identifiers
Create an entity once, refer to it many times
Essential for modular large-scale programming
Largely a nuisance!

30 good names are scarce difficult to make up, type, read, and understand 94. Nameables Definition 2.3 (Nameable). A nameable is an entity kind, such as functions, modules, types, constants, variables, for which the programmer can provide a name. Nameable values in Pascal CONST Pi Pascal = Nameable types in C C // Type named struct Date: struct Date { int month, day, year; ; 95. Nameable values in C/C++/? Values are not in C, (the preprocessor is not part of the language) not in C++, neither in, but there are work-arounds: C/C++: : 96. Legal names // Only Cfor integer constants enum { BELL = '\b' TAB = '\t', NL = '\n', CR = '\r', const double E = ; const int Merssene7 C = ; final double E = ; final int Merssene7 = ; All PLs include a definition of legal names : Definition 2.4 (C identifiers). A C identifier is a series of alphanumeric characters, the first being a letter of the alphabet or an underscore, and the remaining being any letter of the alphabet, any numeric digit, or the underscore. Regular expression [_a-za-z][_a-za-z0-9]* (2.5.1) Most PLs follow the same pattern. But, there are always annoying exceptions: TEX: digit and underscores are forbidden Early Basic: a single letter, optionally followed by a digit 97. Variations lower/upper case due to historical or idealogical reasons. Length limit typically 6 8 in ancient languages,~32 or unlimited in modern languages Special characters Can a name contain a dollar (yes, in ), space (in Fortran), a quote (in JTL), or what have you? Unicode Ain t α an excellent variable name in certain contexts? 98. What s Unicode? a system for encoding characters more than 110,00 characters covers ~100 scripts, representing most of the world s writing systems Standard in Windows (NT/XP/Vista/2000/7), Linux, Mac OS X. Extends and replaces ASCII (7 bit standard, used primarily for American English) 99. Names and kind distinction Names can be used for very different entities. 
Readability [6] concerns, as well as parsing issues, make PLs impose rules such as:
- Prolog: first character determines grammatical role (lowercase: function; uppercase/underscore: variable)
- Perl: first character determines structure, e.g., % for hashtable.
- Fortran: first character determines type, e.g., i must be an integer

Names and naming conventions

Many PLs employ naming conventions:
- For distinguishing between categories, e.g., types must be capitalized.
- For making a single name out of multiple words:
  - PascalCase: FileOpen, WriteLn (e.g., Pascal)
  - camelCase: fileOpen, getClass (e.g., Java)
  - under_scoring: file_open (e.g., C)
  - juxtaposition: fileopen, \textbackslash (e.g., TeX)
- For denoting type, e.g., using the Hungarian Notation, arru8NumberList denotes a variable whose type is an array of unsigned 8-bit integers

[6] no one knows what readability really means
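As a rough illustration of Definition 2.4 and of the word-joining conventions, the following Python sketch checks names against the regular expression (2.5.1) and splits a name into its words. The helper names `is_c_identifier` and `split_words` are made up for this example, and the splitting rule is an assumption that only handles simple ASCII names.

```python
import re

# Regular expression (2.5.1): a letter or underscore first,
# then any mix of letters, digits, and underscores.
C_IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

def is_c_identifier(name):
    """True if the whole string matches the C identifier pattern."""
    return C_IDENTIFIER.fullmatch(name) is not None

def split_words(name):
    """Split a PascalCase, camelCase, or under_scored name into lowercase words."""
    words = []
    for part in name.split("_"):
        # An uppercase letter starts a new word; a run of capitals
        # not followed by a lowercase letter is kept as one word.
        words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words]

for candidate in ("file_open", "_argv1", "2fast", "file-open"):
    print(candidate, is_c_identifier(candidate))

for name in ("FileOpen", "getClass", "file_open"):
    print(name, split_words(name))
```

Juxtaposed names such as fileopen carry no word boundaries, so no rule of this kind can split them; that information is lost by the convention itself.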

101. Names & your new PL

Try to understand the language peculiarities:
- What special characters are allowed? Why?
- Is there lower/upper case distinction? Why?
- Is there a length limit? Why?
- What is the language naming convention, if any?

Remember, modern languages tend to:
1. impose no length limit
2. use Unicode
3. distinguish between upper and lower case
4. rely on conventions rather than syntax for distinguishing kinds

Keywords

Definition 2.5 (Keywords/reserved words). A keyword (also called a reserved word) is a string of characters which makes a legal name, yet it is reserved for special purposes and cannot be used by the programmer for any other purpose.

Pascal examples: program, begin, end, record, of

103. Keywords & atomic types

- Each of the atomic types in Java is denoted by a keyword.
- Some C atomic types have names made of two, and even three keywords: unsigned short, unsigned long int
- Some C atomic types have more than one name, e.g., long, long int, signed long, signed long int
- Atomic types in Pascal are denoted by predefined identifiers

104. Atomic types in Pascal are predefined

a somewhat confusing fact of life

Definition 2.6 (Predefined identifier). A predefined identifier is an identifier that is bound to an entity (such as type, function, procedure or value);
- this binding is made by the PL, with no programmer intervention
- it can be bound to another entity later on.

Redefinition of predefined identifiers is legal, but might be confusing.

105. Redefinition of predefined identifiers

Confusing program:

    Program misnomer;
    TYPE
      Double = Real;
      Boolean = Integer;
    VAR
      Integer: Boolean;
      Real: Double;
    Begin
      Integer := 3;
      Real := Integer / Integer;
      WriteLn('i = ', Integer, ' r = ', real)
    end.

Output:

    i = 3 r = E+000

Observe that my pretty printer got confused as well.

Reserved identifiers

Definition 2.7 (Reserved identifier). A reserved identifier is a keyword used as an identifier.

- The word int is a reserved identifier; it identifies the integer atomic type.
- The identifiers Integer and WriteLn in Pascal are not reserved. The programmer may redefine these.
- Not all reserved words are reserved identifiers:
  - return in C (an atomic command).
  - begin, end, program in Pascal (punctuation).
  - struct in C (a constructor for creating compound types from other types)

107. Routines whose name is a reserved identifier?

In most PLs, the names of standard routines (procedures and functions) are not reserved: they are either imported or builtin.

A notable exception is AWK:
- print: A builtin function, for printing.
- exit: A builtin function, stopping execution.
- int: A builtin function, for conversion into an integral type.

Unlike Pascal, builtin names in AWK are reserved.
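The distinction between reserved words and predefined identifiers is not specific to Pascal and AWK. The sketch below uses Python as a stand-in, which is an assumption of this example rather than something the slides discuss: `if` is a keyword there, whereas `len` is a predefined identifier, bound by the language, that a program may rebind. The functions `classify` and `rebinding_demo` are illustrative names.

```python
import builtins
import keyword

def classify(name):
    """Classify a name, in the terms used above, for the Python language."""
    if keyword.iskeyword(name):
        return "reserved word"
    if hasattr(builtins, name):
        return "predefined identifier"
    return "other"

def rebinding_demo():
    """Rebind the predefined identifier len locally, in the spirit of program misnomer."""
    len = lambda _: 42      # legal, but confusing
    return len("abc")

for name in ("if", "len", "my_variable"):
    print(name, "->", classify(name))
print(rebinding_demo())
```

Trying the same rebinding with `if` is rejected by the parser, which is exactly what "reserved" means.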

108. Summary: kinds of identifiers

Identifiers:
- Reserved identifier
- Predefined identifier
- Library identifier
- Other

In addition, we have those reserved words which are not identifiers:
- Identifiers reserved for future use
- Denotation of atomic entities
- Punctuation (often used in constructors of entities)
- Other, e.g., marking Boolean attributes (register, auto, static in the C PL)

There are so many PLs, we cannot hope to classify them all.

109. The Go PL

Can you classify the identifiers and reserved words used here?

    // Hello, World! in Go
    package main
    import "fmt"
    func main() { // main function
        fmt.Printf("Hello, World!\n")
    }

    package    reserved word, punctuation
    import     reserved word, punctuation
    func       reserved word, punctuation
    main       identifier, other
    fmt        identifier, library
    Printf     identifier, library

Library identifiers

7 Frames: Why a library?; Library identifiers; Replaceable- vs. builtin- library; Import by preprocessing; Explicit (and implicit) import; Implicit import; Compilation unit

110. Why a library?

- The set of executable commands is always a recursively defined set.
- Derivation rules are language dependent, typically including blocks, iterations, conditionals, and routines
- Atomic executables include
  - Commands denoted by keywords, e.g., return, break and continue
  - Other atomic commands such as assignment.
  - Invocation of routines
- Some routines are so
  - low-level that they cannot be implemented within the language
  - essential that there is little point in having each programmer redo them
  - tiresome that most programmers could not be bothered implementing them

111. Library identifiers

Definition 2.8 (Library). A collection of pre-made routines (or modules) that are available to the programmer.
- standard library: replaceable (as in C)
- builtin library: cannot be replaced (as in Pascal and AWK)

Identification of entities in the library:
- Reserved words: rare (e.g., AWK)
- Pre-defined identifiers: as in Pascal.
- Importing: as in Java and C

Replaceable- vs.
builtin- library

    Replaceable                     Builtin
    Troublesome for programmer      Less work for the programmer
    Small language specification    Bulky language specification
    Flexible                        Rigid
    Modular language design         Tangled language design
    Library can be very large       Library is typically small
    Most modern PLs                 PLs designed for beginners and for one-liner/scripting

Table 2.5.3: Replaceable- vs. builtin- library

Dinosaurs: Languages such as Cobol, which included a huge builtin library, tend to collapse under their own weight.

113. Import by preprocessing

Import at the source, textual level:

    #include <stdio.h>
    int main(int argc, char *argv[], char **envp) {
      return printf("Hello, World!\n") <= 0;
    }

114. Explicit (and implicit) import

Your program declares which library identifiers it uses:
- The keyword import seems to be used in so many PLs.
- Other languages may use other keywords, e.g., uses

- Semantics is greater than textual import.
- Properties of import:
  - usually carried out for a bunch of identifiers (for now, we call such a bunch a module)
  - there is an implicit search path for the library
  - may be used also for user-provided (non-library) modules
  - may cause other modules to compile

Implicit import

Certain principal modules are automatically imported even if the programmer does not explicitly import these, e.g., java.lang.* in Java.

Hello.java:

    public class Hello {
      public static void main(final String[] args) {
        System.out.println("Hello, World!\n");
      }
    }

116. Compilation unit

Definition 2.9 (Compilation unit). A compilation unit is a portion of a computer program which is sufficiently complete to be compiled correctly.
- Usually a file
- Can be a string, or a buffer of the editor

Starting point

3 Frames: Order of execution; Autarkic approach; Summary terminology

117. Order of execution

Yet another thing to observe in any Hello, World! program:
- Normally sequential
- Can be changed by
  - Conditional commands
  - Iteration commands
  - Parallelization commands
  - Invoking routines

But, in the presence of several compilation units, or even several routines in the same compilation unit, where do we start?

118. Autarkic approach

au·tar·ky or au·tar·chy, pl. au·tar·kies or au·tar·chies
1. A policy of national self-sufficiency and nonreliance on imports or economic aid.
2. A self-sufficient region or country.

119. Summary terminology

- identifiers, nameable entities (variables, values, functions, procedures, templates, namespaces, labels, modules), keywords, reserved identifiers, pre-defined identifiers
- literals, escaping, comments.
- separatist, terminist, and variations

3 Values and types

Contents [181 frames]

3.1 Value systems [34 frames]
    Symbolic values [6 frames]
    Semantics of S-expressions? [16 frames]
    Expressions [7 frames]
    References
    Exercises
3.2 Introduction to types [11 frames]
    References
    Exercises
3.3 The type constructors of Mock [40 frames]
    Power sets [5 frames]
    Cartesian product [5 frames]
    Integral exponentiation [2 frames]
    Unit type [2 frames]
    Branding [3 frames]
    Records [2 frames]
    Disjoint union [4 frames]
    Type None and type Any [3 frames]
    Mapping types [6 frames]
    Recursive type constructor [7 frames]
    References
    Exercises
3.4 Type constructors in actual PLs [57 frames]
    Product type constructor [1 frame]
    Integral exponentiation [2 frames]
    Branding [4 frames]
    Union and choice/disjoint union type constructors [4 frames]
    Tags in concrete PLs [8 frames]
    Micro-Lisp in C [15 frames]
    Special types: Unit, Top & Bottom [10 frames]
    Mapping as functions and arrays [5 frames]
    Power sets [1 frame]
    Recursive types [7 frames]
3.5 Atomic types [33 frames]
    Taxonomy of atomic types [9 frames]
    Set of primitive types as PL i.d. [5 frames]
    Integral primitive types [3 frames]
    More on language design [2 frames]
    Real numbers [6 frames]
    The character primitive type [6 frames]
    Strings as atomic types [2 frames]
    References
    Exercises
3.6 Representation of types in memory [6 frames]

What are values?

- Every PL [7] manipulates values.
- Intuitively, a value is an entity that exists during computation
- But, what are values?
  - anything that may be manipulated by a program
  - anything that may be passed as an argument to a Function or Procedure (in Pascal)

Definition 3.1 (The values universe). Every PL has a set of values, e.g., integers, tuples, records, functions, ... Running a program of the PL amounts to manipulation of members of this set.
The values set is also called the universe of values.

3.1 Value systems

Contents [34 frames]
    Symbolic values [6 frames]
    Semantics of S-expressions? [16 frames]
    Expressions [7 frames]
    References
    Exercises

Value manipulation

- Passing them to procedures as arguments
- Returning them through an argument of a procedure
- Returning them as the result of a function
- Assigning them into a variable
- Using them to create a composite value
- Creating/computing them by evaluating an expression

122. Values vs. variables

A value is not a variable:
- A variable may store a value.
- A variable may also be undefined
- Some PLs (e.g., ML) do not offer variables.

Why the confusion?
- In traditional imperative PLs, it is difficult to construct complex, interesting values without using variables.
- People familiar with these PLs tend to think of integers, reals, etc., as values, but they may have a hard time understanding that there are also array values, function values, etc.

Machine representation vs. types vs. values

- A type system is a set of types.
- A type is a set of values [8]
  - Subsets are not necessarily disjoint.
  - Subsets are not necessarily different
- Machine representation: mapping of an element in V to the machine.
- $V_L$: the set of all values of a language L, AKA the values universe.

[7] here, and henceforth, PL = Programming Language
[8] Later, we will see that it is not just any set of values

- $T_L$: the type system of L (with respect to $V_L$ [9])

      $T_L \subseteq \wp V_L$    (3.1.1)

- $P_L$: policy for representation of values of L on different machines $M_1, M_2, \ldots$

      $P_L : V_L \to \{M_1, M_2, \ldots\}$    (3.1.2)

124. Value structure

- Atomic value: is not composed of other values
  - truth values, characters, integers, reals, pointers
- Composite value: is composed of other values
  - records, arrays, sets, files
- The ways to create composite values in a PL [10] are usually independent of its implementation
- The set of legal values in a PL's implementation: a closure of the atomic values in this implementation under the mechanisms the PL specification allows for creating composite values

Symbolic values

6 Frames: Universes including very simple values; V_Lisp = S-expressions; S-expression as binary trees; Examples of S-expression as binary trees; List shorthand; Quiz

125. Universes including very simple values

To emphasize the difference, we start with simple, yet very useful value systems:
- Many PLs revolve around symbolic manipulation rather than numbers:
  - Lisp [11]
  - Mathematica
  - Prolog
- In essence, in these PLs, all values are symbolic expressions.
- There are no types in any of these universes.

126. V_Lisp = S-expressions

An S-expression [12] is
1. An Atom
2. $(S_1.S_2)$, where $S_1$ and $S_2$ are S-expressions

An Atom is
1. A string of any length (many Lisp implementations ignore letter case)
2. The special value NIL.
3. An integer (non-essential)
4. A real number (non-essential)

Examples: hello, (a.nil), (NIL.a), (Hello.(war.lord))

[9] For a set S, the notation $\wp S$ stands for the power set of set S, i.e., $\wp S = \{S' \mid S' \subseteq S\}$
[10] Here, and henceforth, PLs = Programming Languages
[11] "Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot." Eric Raymond, How to Become a Hacker
[12] S-expressions are symbolic expressions, Lisp style.

127. S-expression as binary trees
- S-expressions are binary trees whose leaves are either strings or NIL.
- Lisp assumes [13] that they are indeed represented as trees
- Internal nodes are called CONS nodes.
  - car: Left pointer
  - cdr: Right pointer

Figure 3.1.1: The CONS record

[13] actual representation could be different

128. Examples of S-expression as binary trees

(Figure: the S-expressions a.nil, NIL.a, a.a, and Hello.(war.lord), each drawn as a binary tree of cons nodes with car and cdr pointers.)

129. List shorthand

(a b c d) is shorthand (syntactic sugar) for (a.(b.(c.(d.nil))))

In binary tree representation:

Figure 3.1.2: The list (a b c d) in binary tree representation

Take note that not all S-expressions can be written using the list notation.

Quiz

1. What does () mean in the tree notation?
2. What does ((a b) (c d)) mean in the tree notation?
3. Can you give an example of an S-expression which cannot be represented using the list notation?

Semantics of S-expressions?

16 Frames: Semantics of the list notation; Should you care to run Lisp?; Interpreters: read eval print loop (REPL); Demonstrating lists semantics in Lisp; The four most basic Lisp functions; The car/cdr notation; The c[ad]+r syntactic sugar; Understanding c[ad]+r syntactic sugar; Names and literals in Lisp; List & if; λ-functions; Named functions; Summary: Values in Lisp; V_Prolog intuitively; V_Prolog precisely; V_Mathematica

131. Semantics of the list notation

The evaluation of a list l = (a b c d) means: apply function car(l) to the list of arguments cdr(l)

Example:

    > (+ 2 (* 3 4))
    14

What does it mean to evaluate?
- Atom: Find out the definition of this atom in the symbol table.
- Literal: Itself
- List: Recursively evaluate all list elements, and then apply the first argument as a function to the remaining arguments

Should you care to run Lisp?

Installing and running Gnu-Lisp:

    The program 'gcl' is currently not installed. You can install it by typing:
    sudo apt-get install gcl
    % sudo apt install gcl
    Reading package lists
    Building dependency tree
    The following NEW packages will be installed:
      gcl
    0 upgraded, 1 newly installed, 0 to remove and 38 not upgraded.
    % gcl
    GCL (GNU Common Lisp) CLtL1 Jul :54:39
    Source License: LGPL(gcl,gmp), GPL(unexec,bfd,xgcl)
    >
    > ^D

Observe that
- At first, Gnu-Lisp was not installed.
- User does as instructed and installs it
- User runs Gnu-Lisp
- User hits Ctrl-D at the prompt to exit

133. Interpreters: read eval print loop (REPL)

All interpreters follow the same scheme:
1. Read input (and parse it)
2.
Evaluate (call function eval in Lisp)
3. Print the result
4. Loop

134. Demonstrating lists semantics in Lisp

    % gcl
    > (a b c d)
    Error: The function A is undefined.
    Fast links are on: do (si::use-fast-links nil) for debugging
    Error signaled by EVAL.
    Broken at SYSTEM::GCL-TOP-LEVEL. Type :H for Help.
    >>

135. The four most basic Lisp functions

- (quote γ): Do not evaluate γ
- (car γ): First element of the list γ
- (cdr γ): The rest of the list γ (everything but (car γ))
- (cons γ δ): The list whose car is γ and whose cdr is δ

Using car and cdr:

    > (car (a b))
    Error: The function A undefined
    > (car '(a b))
    A
    > (cdr '(a b))
    (B)
    > (cons '(a b) '(c d))
    ((A B) C D)
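The effect of car, cdr, and cons can be mimicked outside Lisp. The following Python sketch is only a model, not how any Lisp is implemented: it represents a CONS node as a two-element tuple and NIL as None. The helpers `cons`, `car`, `cdr`, `lisp_list`, and `show` are names chosen for this example.

```python
NIL = None  # assumption of this sketch: NIL is modeled as Python's None

def cons(a, d):
    """Build a CONS node whose car is a and whose cdr is d."""
    return (a, d)

def car(s):
    return s[0]

def cdr(s):
    return s[1]

def lisp_list(*items):
    """List shorthand: (a b c) stands for (a.(b.(c.NIL)))."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def show(s):
    """Render an S-expression, using list notation wherever possible."""
    if s is NIL:
        return "NIL"
    if not isinstance(s, tuple):
        return str(s)
    parts = []
    while isinstance(s, tuple):
        parts.append(show(car(s)))
        s = cdr(s)
    if s is not NIL:            # the chain does not end in NIL, e.g. (a . b)
        parts += [".", show(s)]
    return "(" + " ".join(parts) + ")"

ab = lisp_list("a", "b")
cd = lisp_list("c", "d")
print(show(car(ab)))          # a
print(show(cdr(ab)))          # (b)
print(show(cons(ab, cd)))     # ((a b) c d)
print(show(cons("a", "b")))   # (a . b)
```

The last line shows an S-expression that has no pure list notation, because its chain of cdr pointers does not end in NIL.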

136. The car/cdr notation

- CAR: Contents of the Address part of Register number
  - Also called in other PLs: first; head (or hd for short)
- CDR: Contents of the Decrement part of Register number
  - Also called in other PLs: rest; tail (or tl for short)

137. The c[ad]+r syntactic sugar

For brevity, let's bind the name z to the value ((a b) c d), i.e., evaluate (setq z '((a b) c d)); then,

    Long form        Short form    Result
    (car (car z))    (caar z)      A
    (cdr (car z))    (cdar z)      (B)
    (car (cdr z))    (cadr z)      C
    (cdr (cdr z))    (cddr z)      (D)

Table 3.1.1: Examples of using the c[ad]+r syntactic sugar

Notes:
- Unboundedly long
- Can be cumbersome
- Awkward, and therefore does not exist with the first/rest and head/tail terminology

Understanding c[ad]+r syntactic sugar

z = ((a b) c d)

(Figure: the binary tree of z, with every subtree labeled by the accessor that reaches it: (car z), (caar z) = a, (cdar z), (cadar z) = b, (cddar z) = nil, (cdr z), (cadr z) = c, (cddr z), (caddr z) = d, (cdddr z) = nil.)

Figure 3.1.3: Understanding c[ad]+r syntactic sugar

139. Names and literals in Lisp

The set function:

    > (set a b)
    Error: The variable A is unbound.
    > (set 'a b)
    Error: The variable B is unbound.
    > (set 'a 'b)
    B
    > a
    B
    > (set a 3)
    3
    > a
    B
    > (print a)
    B
    B

140. List & if

List:

    (list 'a 'b 'c) = (a b c)
    (list 'a 'b (list 'c 'd)) = (a b (c d))
    (list (list 'a 'b) (list 'c 'd)) = ((a b) (c d))

If:

    (if 'x 'a 'b) = a
    (if (list 'x) 'a 'b) = a
    (if '(x y) 'a 'b) = a
    (if NIL 'a 'b) = b
    (if () 'a 'b) = b

141. λ-functions

Function lambda makes it possible to define anonymous functions.

Defining a λ-function:

    (lambda (x) (cons (cdr x) (car x)))
    (LAMBDA-CLOSURE () () () (X) (CONS (CDR X) (CAR X)))

Defining and applying a λ-function:

    ((lambda (x) (cons (cdr x) (car x))) '((a b) (c d)))
    (((C D)) A B)

142. Named functions

Defining:

    (defun foo (x) (cons (cdr x) (car x)))
    FOO

Applying:

    (foo '((a b) (c d)))
    (((C D)) A B)
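The c[ad]+r names can be read mechanically: the letters between c and r are applied from right to left. A small Python sketch of this reading follows; it uses nested Python lists in place of S-expressions, an assumption made for brevity, and `cxr` is a hypothetical helper, not a Lisp function.

```python
import re

def cxr(name, value):
    """Apply a c[ad]+r accessor, e.g. 'cadr', to a nested Python list."""
    match = re.fullmatch(r"c([ad]+)r", name)
    if match is None:
        raise ValueError(f"not a c[ad]+r name: {name}")
    # The rightmost letter is applied first: (cadr z) = (car (cdr z)).
    for letter in reversed(match.group(1)):
        value = value[0] if letter == "a" else value[1:]
    return value

z = [["a", "b"], "c", "d"]   # models z = ((a b) c d)
for name in ("caar", "cdar", "cadr", "cddr"):
    print(name, cxr(name, z))
```

Run on this z, the four accessors reproduce the Result column of Table 3.1.1, with Python list brackets in place of Lisp parentheses.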


143. Summary: Values in Lisp

- Extremely simple
- Have no types
- Simple basic operations: car, cdr, cons, list, quote, if, defun
- Can be used to compute anything! (conditions and recursion)
- The mother of all functional PLs

144. V_Prolog intuitively

- Simple: Almost the same as in Lisp [14]. Specifically:
  - trees of arbitrary degree
  - Internal nodes carry a label (unlike S-expressions)
  - Leaves are either
    - Labels
    - Symbolic variables
- Only one fundamental operation on values: [15] Unification. Given two trees, replace (if possible) variables in each of them so that they become the same tree.

145. V_Prolog precisely

All values are terms. A term is one of
- Atom: a name with no inherent meaning.
- Number
- Variable: must start with an upper case letter
- Composite: which includes
  1. an atom called functor
  2. a list of any number of terms (arguments).

A list is a kind of term, written like this: [a,b,c,x]

146. V_Mathematica

Same as Prolog/Lisp with lots of syntactic sugar:

Figure 3.1.4: The many ways for presenting the symbolic values of Mathematica

[14] recall the isomorphism between binary trees and forest of general trees
[15] Come back to this slide at the end of the course

Expressions

7 Frames: More traditional values; V_ML: values in ML; Expressions; Expressions are recursively defined; Function call expression constructor; Functions vs. operators; Syntactical differences between functions & operators

147. More traditional values

V_Bash: Values of Bash
1. Numbers
2. Strings
3. List of values.
4. One dimensional arrays

V_ML: values in ML

All the values in ML are first-class values:
- Atomic values: truth values, integers, reals, strings.
- Composite values: records, tuples (records w/o field names), constructions (tagged values), lists, arrays.
- Function values
- References to variables

What we can do in ML but not in Pascal:
- create a record composed of two functions
- write a function that gets a function $f : int \to int$ and returns the composition of f with itself
- write an expression whose value is a reference to a variable
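The first two items can be sketched in any language in which functions are first-class values. Python is used here instead of ML purely as an assumption of this example; `Ops`, `twice`, and `ops` are illustrative names.

```python
from typing import Callable, NamedTuple

class Ops(NamedTuple):
    """A record composed of two functions."""
    inc: Callable[[int], int]
    double: Callable[[int], int]

def twice(f: Callable[[int], int]) -> Callable[[int], int]:
    """Get a function f : int -> int and return the composition of f with itself."""
    return lambda x: f(f(x))

ops = Ops(inc=lambda x: x + 1, double=lambda x: 2 * x)
print(twice(ops.inc)(5))      # 7
print(twice(ops.double)(5))   # 20
```

Here the record `ops` and the function returned by `twice` are ordinary values: they are created by evaluating expressions, passed as arguments, and returned as results.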

149. Expressions

Definition 3.2 (Expression). An expression is a part of a program whose evaluation during computation yields a value.

Several expressions in Pascal:

    chr(ord('%')+1)
    'Hello, world'
    2*a[i]+7
    sqr(4)
    q^.head

An expression in ML:

    if leap(year) then 29 else

150. Expressions are recursively defined

Naturally, each PL is different, but the general scheme is:
- Atomic expressions
  - literals
  - variable inspection
- Expression constructors
  - Operators such as +, -, ...
  - Function calls

The set of atomic expressions and the constructors set are PL dependent, but the variety is not huge.

151. Function call expression constructor

- Dynamic typing version: If f is a function taking $n \ge 0$ arguments, and $E_1,\ldots,E_n$ are expressions, then the call $f(E_1,\ldots,E_n)$ is an expression.
- Static typing version: Let f be a (typed) function of $n \ge 0$ arguments, $f : \tau_1 \times \cdots \times \tau_n \to \tau$. Let $E_1,\ldots,E_n$ be expressions of types $\tau_1,\ldots,\tau_n$. Then, the call $f(E_1,\ldots,E_n)$ is an expression of type $\tau$.

152. Functions vs. operators

Both are:
- constructors of expressions
- apply an operation to values

Differences are largely syntactical:
- Name
- Position, e.g., prefix or infix
- Parenthesis
- Precedence rules
- User overloading

153. Syntactical differences between functions & operators

                                   Functions              Operators
    Position?                      prefix                 prefix, infix, postfix
    Precedence rules?              no                     yes
    Parenthesis required?          yes                    no
    Name?                          identifier             Punctuation characters: $+_1$, $+_2$, ++, <<, <<<=, ...
                                                          Reserved identifier: new, sizeof, instanceof, ...
                                                          Reserved+punctuation: new[], delete[], ...
    Arity?                         0, 1, 2, 3, ...        1: prefix/postfix operators
                                                          2: infix operators
                                                          3: C's ?:
    Overloadable by programmer?    Java, C++: yes         C++: yes
                                   Pascal, C: no          Java, Pascal, C: no

Table 3.1.2: Syntactical differences between functions & operators

References

CAR and CDR; CONS; defun; lambda; Lisp; Mathematica; S-Expressions; Values of Prolog; Wolfram (PL used in Mathematica)

Exercises

1. Why are pointers atomic values, whereas the pointer types are compound?
2. What is the difference between an expression and a value?
3.
What does () mean in the tree notation?
4. What does ((a b) (c d)) mean in the tree notation?
5. Provide an example of an S-expression which cannot be represented using the list notation.
6. What's the difference between (car 'a) and (car a)?

3.2 Introduction to types

Contents [11 frames]
    References
    Exercises

154. Visualizing the type system

- A PL L has a set of values, $V_L$, with many values in it.
- A type is a set of these values, e.g., $|T_1| = 2$.
- Types may be disjoint, e.g., $T_1 \cap T_2 = \emptyset$, or contained, e.g., $T_1 \subseteq T_3$, and even intersecting, e.g., $T_3 \cap T_4 \ne \emptyset$.
- A type may be empty, e.g., $|T_5| = 0$, a singleton, e.g., $|T_6| = 1$, and even contain the entire set of values, e.g., $T_7 = V_L$.

Figure 3.2.1: Visualizing the type system

The type system is the set of all types; in our example, $T_L = \{T_1, T_2, T_3, T_4, T_5, T_6, T_7\}$.

Purpose of type?

We have values; why do we need types?
- Taxonomy of values; describe data effectively
- Legality: determine set of legal operations on values (prevent nonsensical operations, e.g., multiply a pointer by a set)
- Semantics: determine semantics of operations on values

But, also, defines program-machine interface:
- Program to Machine: how to represent values on different machines
- Machine to Program: how to interpret sequences of bits as actual values

156. What's in a type system?

- A set of subsets of the values universe:

      $T_L \subseteq \wp V_L$.    (3.2.1)

- Each type is a subset of the values universe:

      $\tau \in T_L \Rightarrow \tau \subseteq V_L$.    (3.2.2)

- Types do not make a partition of $V_L$
  - One type may be contained in another
  - The intersection of two types is not necessarily empty
- Must belong to the recursively defined type system

157. Type systems are defined recursively

- Atomic types
  - Make the recursion's base.
  - Mostly pre-defined in the PL.
  - Programmer defined atomic types also happen.
  - AKA: basic types or primitive types.
- Type constructors
  - Defined by the PL.
  - Typically modeled after the theoretical type constructors

Types look like sets of values

- Each type T defines $S_T$, the set of values of T.
- Types describe values and expressions. With every type there is a set predicate:
  - Values: A value v is of type T iff $v \in S_T$
  - Expressions: An expression E belongs in type T iff every value v that E may evaluate to satisfies $v \in S_T$
- Shorthand for $S_T$? $T = S_T$?
  - a common abuse of notation by which T sometimes means $S_T$
  - get ready, everyone makes this abuse

159. Not all sets of values make a type

- Every type is a set, but not every set is a type.
- Only sets which are atomic types, or constructed by the type constructors, are types.
- Type T also defines a set of allowed operations
- Conversely, all $v \in T$ must
  - recognize the same operations
  - respond similarly to each operation
- Not every subset of $V_L$ makes a type:
  - {4.20, "Cannabis", true}: not a type (in most PLs)
  - {false, true}: is a type (in most PLs)
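The set view of types can be played with directly. In the Python sketch below the universe and the types are tiny finite sets invented for the example (they do not correspond to Figure 3.2.1), and `has_type` and `expression_has_type` are hypothetical helpers that implement the two membership predicates for values and for expressions.

```python
# A toy universe of values; each type is modeled as a set of values.
V = frozenset({False, True, "a", "b", 4.2})

BOOL = frozenset({False, True})
CHAR = frozenset({"a", "b"})
EVERYTHING = V            # a type containing the entire universe
EMPTY = frozenset()       # a type with no values

def has_type(value, T):
    """A value v is of type T iff v is a member of the set of values of T."""
    return value in T

def expression_has_type(possible_values, T):
    """An expression belongs in T iff every value it may evaluate to is in T."""
    return all(has_type(v, T) for v in possible_values)

print(BOOL.isdisjoint(CHAR))                      # disjoint types
print(BOOL <= EVERYTHING)                         # one type contained in another
print(expression_has_type({True, False}, BOOL))   # always a truth value
print(expression_has_type({True, "a"}, BOOL))     # may evaluate to "a"
```

The sketch deliberately ignores the other half of the story told above: a real type also fixes the set of allowed operations, which plain sets of values do not capture.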

160. Type equality vs. set equality

Every type is a set, but not every set is a type.

- Type containment: When is $T_1 \subseteq T_2$?
  - Important, e.g., for assuring compatibility of actual to formal parameter.
  - If it holds for the corresponding sets that $T_1 \subseteq T_2$?
  - Not always! Actually, the answer is most often negative!
- Type equality: When is $T_1 = T_2$?
  - If it holds for the corresponding sets that $T_1 = T_2$?
  - Not always! Actually, the answer is most often negative!

161. Value type association

For $v \in V$, let

    $\mathrm{types}(v) = \{T \in T_L \mid v \in T\}$    (3.2.3)

Sometimes values have more than one type:

    $\exists v \cdot \#\mathrm{types}(v) > 1$    (3.2.4)

Often,

    $\#\mathrm{types}(v) > 1 \Rightarrow \#\mathrm{types}(v) = \infty$    (3.2.5)

e.g., in C, 0 belongs to all pointer types:

    $\forall T \in T_C \cdot 0 \in T*$    (3.2.6)

162. How does the PL/compiler/runtime use types?

Given $v \in V$ and an operation op:
- Checking: use types(v) to determine whether op is applicable to v.
- Semantics: use types(v) to determine the semantics of op when applied to v.
- Inference: determine the type of the application of op to v

Example: significance of type

Is the following expression legal? What type is the result?

    X[i]

In C:
- Yes, if i is a pointer and X is an integer, e.g., [16]

      int X = 12;
      int *i = &X;
      X[i]; // What does it mean?

- No, if i is a floating point number and X is a char,

      float i = 4.20;
      char X = 'c';
      X[i] //

[16] there is no typo in the following code

164. Types & underlying machine

- Machine language
  - all values are untyped bit patterns
  - tagged architectures add type information to values (very rare)
  - no support for compound types
- Assembly language
  - minimal support for types, e.g., addresses vs. data
- High-level PLs attach types to
  - values
  - expressions
  - memory cells

References

Data type; Lisp machine; Tagged architecture; Type safety; Type system

Exercises

1. What are the three purposes of a type system?
2. What's the difference between a type and a value?
3. For which v does it hold that $\mathrm{types}(v) = \emptyset$? Explain.
4. What's the difference between a type and a variable?
5. "Describe data effectively": explain, using examples if necessary.
6.
Provide an example of a value v for which $1 < \#\mathrm{types}(v) < \infty$.
7. Enumerate all qualities that distinguish between a type and a set of values.

3.3 The type constructors of Mock

Contents [40 frames]
    Power sets [5 frames]
    Cartesian product [5 frames]
    Integral exponentiation [2 frames]
    Unit type [2 frames]
    Branding [3 frames]
    Records [2 frames]
    Disjoint union [4 frames]
    Type None and type Any [3 frames]
    Mapping types [6 frames]
    Recursive type constructor [7 frames]
    References
    Exercises

165. The Mock type constructors

Mock is not a real PL, so we can go wild:
- idealization.
- the abstract notion behind each type constructor
- model for achievement
- type errors

Power sets

5 Frames: Power sets; Definition of type constructors is not enough; Value constructors for power sets; Manipulating values of power sets; Mock does not bother with practical issues

166. Power sets

Definition 3.3 (Power set type constructor). If $T \in \mathbb{T}$ is a type, then so is $\wp T$, its power set comprising all subsets of T, i.e.,

    $\wp T = \{T' \mid T' \subseteq T\}$.    (3.3.1)

An alternative notation:

    $\wp T = 2^T$    (3.3.2)

We have

    $\#(\wp T) = 2^{\#T}$    (3.3.3)

where #T denotes the cardinality of T.

Pascal is probably the only language which natively supports this type constructor.

Definition of type constructors is not enough

Types are worth nothing without:
- means for creating values
- operators for manipulating values

169. Manipulating values of power sets

Operators on values of power sets:
- Testing for membership. For $v \in T$ and $u \in \wp T$, the code phrase $v \in u$ is a membership test
- Testing for equality. For $u_1, u_2 \in \wp T$, the code phrase $u_1 = u_2$ tests for set equality
- Set union operator. For $u_1, u_2 \in \wp T$, we have $u_1 \cup u_2 \in \wp T$
- Set intersection operator. For $u_1, u_2 \in \wp T$, we have $u_1 \cap u_2 \in \wp T$
- Set difference operator. For $u_1, u_2 \in \wp T$, we have $u_1 \setminus u_2 \in \wp T$
- Set complement operator. For $u \in \wp T$, we have $\overline{u} \in \wp T$

170. Mock does not bother with practical issues

The language definition hardship: It is easy to define things mathematically, but it takes great language design effort to put these in a programming language.
- Are power sets really useful?
- Can they be implemented efficiently?
- How should programmers use their plain keyboards to key in code phrases such as $\emptyset$, $v \in u$, and $\overline{u}$?

These issues are dealt with in real PLs.

Value constructors for power sets

Value constructors are operators which take values of T and return values of type $\wp T$.
- Nullary value constructor. The empty set, $\emptyset$, is a value of all power sets:

      $\forall T \cdot \emptyset \in \wp T$    (3.3.4)

  in fact, $\emptyset$ is a literal.
- Unary value constructor.
Let's use the curly brackets:

      $v \in T \Rightarrow \{v\} \in \wp T$    (3.3.5)

- n-ary value constructor. Generalizes the unary value constructor:

      $v_1, v_2, \ldots, v_n \in T, n \ge 1 \Rightarrow \{v_1, v_2, \ldots, v_n\} \in \wp T$    (3.3.6)

Cartesian product

5 Frames: Cartesian product type constructor; Operators for product type; Cartesian products of three or more types; Equality of types?; Type error

171. Cartesian product type constructor

Definition 3.4 (Cartesian product type constructor). If $T_1$ and $T_2$ are types, their Cartesian product is a type denoted by $T_1 \times T_2$; values of $T_1 \times T_2$ are

    $T_1 \times T_2 = \{\langle v_1, v_2 \rangle \mid v_1 \in T_1; v_2 \in T_2\}$.    (3.3.7)

Note that

    $\#(T_1 \times T_2) = (\#T_1) \cdot (\#T_2)$    (3.3.8)
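For finite types, the cardinality claims (3.3.3) and (3.3.8) can be checked by brute force. The following Python sketch does so for two small types invented for the example; `power_set` and `cartesian` are illustrative helpers, and modeling a type as a finite Python set is an assumption that Mock itself does not make.

```python
from itertools import chain, combinations, product

def power_set(T):
    """All subsets of the finite type T, each as a frozenset."""
    items = list(T)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def cartesian(T1, T2):
    """All pairs <v1, v2> with v1 in T1 and v2 in T2."""
    return set(product(T1, T2))

BOOL = {False, True}
SUIT = {"club", "diamond", "heart", "spade"}

print(len(power_set(SUIT)) == 2 ** len(SUIT))                # (3.3.3)
print(frozenset() in power_set(BOOL))                        # (3.3.4)
print(len(cartesian(BOOL, SUIT)) == len(BOOL) * len(SUIT))   # (3.3.8)
```

Python's set operators `|`, `&`, `-` and the `in` test play the role of the union, intersection, difference, and membership operators on power set values.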

172. Operators for product type

- Composing: Given $v_1 \in T_1$, $v_2 \in T_2$, the binary composition operator $\langle \cdot, \cdot \rangle$ creates a value $\langle v_1, v_2 \rangle \in T_1 \times T_2$, i.e.,

      $v_1 \in T_1, v_2 \in T_2 \Rightarrow \langle v_1, v_2 \rangle \in T_1 \times T_2$    (3.3.9)

- Decomposing: two unary decomposition operators, $(\cdot)\#1$ and $(\cdot)\#2$. If $v = \langle v_1, v_2 \rangle \in T_1 \times T_2$, then $v\#1 = v_1$ and $v\#2 = v_2$. Thus, we have

      $v \in T_1 \times T_2 \Rightarrow v\#1 \in T_1 \wedge v\#2 \in T_2$    (3.3.10)

173. Cartesian products of three or more types

The Cartesian product type constructor is easily generalized to more than two types.
- Commutativity? Never!
- Associativity? Depending on the PL semantics:
  - Structural: $R \times S \times T = R \times (S \times T) = (R \times S) \times T$
  - Nominal: $R \times S \times T \ne R \times (S \times T) \ne (R \times S) \times T \ne R \times S \times T$
  - Nominal semantics is more common.

174. Equality of types?

The mathematical equality hardship: Even if x and y are essentially the same, a fully formal definition may force the claim $x \ne y$.
- From a practical point of view, the following two types are equivalent: $T_1 \times T_2 \times T_3$ and $(T_1 \times T_2) \times T_3$
- Can we write $T_1 \times T_2 \times T_3 = (T_1 \times T_2) \times T_3$?
- PLs tend to be idiosyncratic in their definition of equality.
- It takes non-trivial language design effort to make the two types equal. Many languages don't bother.

175. Type error

Type error: $\langle v_1, v_2 \rangle\#3$. Suppose that $v = \langle v_1, v_2 \rangle \in T_1 \times T_2$ and the program evaluates $v\#3$.

Definition 3.5 (Type error). A program commits a type error on a value $v \in T$ if it makes an attempt to
- manipulate v in a way which is inconsistent with T.
- interpret the machine representation of v in a way which is inconsistent with T.

PLs report type errors
- Dynamically. When the program tries to commit them
- Statically. When it cannot be proved that the program will not commit them.

Fortunately, the type error $v\#3$ can be detected statically.

Integral exponentiation

2 Frames: Integral exponentiation; Operators for integral exponentiation

176. Integral exponentiation

Integral exponentiation makes homogeneous tuples: a Cartesian product where all the tuple components are chosen from the same type.

Definition 3.6 (Integral exponentiation type constructor).
For a type T T and n N, the integral exponentiation of T to the power of n, T n, is defined by n times {{ T n = T T (3.3.11) Observe that #(T n ) = (#T ) n. (3.3.12) 177. Operators for integral exponentiation Composition Given values v 1,v 2,...,v n T, the composition operator [,..., ] evaluates to a value [v 1,v 2,...,v n ] T n Decomposition Given a value v = [v 1,...,v n ] T n and an expression e of type integer, the v#e, is v i, where i is the value to which e evaluates. Issues: How should Mock deal with Overflow error i < 1 or i > n Required operations programmers need insertion, appending, merging, and many other operations to generate values of type T n Type error e is not an integer 29
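The composition and decomposition operators of frame 177, together with the overflow issue it raises, can be sketched in C. This is only an illustration under stated assumptions: the names IntTuple, compose3 and project are hypothetical, the exponent is fixed at n = 3, and the 1-based index follows the text's $v\#i$ rather than C's own 0-based arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { N = 3 };                            /* the exponent n in T^n (assumed) */

typedef struct { int item[N]; } IntTuple;  /* a value of type int^N */

/* Composition: the operator [.,.,.] builds a tuple from N component values. */
IntTuple compose3(int v1, int v2, int v3) {
    IntTuple t = { { v1, v2, v3 } };
    return t;
}

/* Decomposition v#i with a 1-based index i.
   Returns false on overflow (i < 1 or i > N) instead of reading
   outside the tuple; on success stores v_i in *out and returns true. */
bool project(const IntTuple *v, int i, int *out) {
    assert(v != NULL && out != NULL);
    if (i < 1 || i > N)
        return false;
    *out = v->item[i - 1];
    return true;
}
```

Wrapping the array in a struct is a deliberate choice here: it lets the tuple be passed and returned as a value, which a bare C array cannot be.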

3.3.4 Unit type
2 Frames: Unit type; Properties of Unit

178. Unit type
What's a natural number?
How should Mock define $T^1$? Is $T^1 = T$? Oops! Not a very interesting case; PLs make arbitrary decisions.
What does $T^0$ mean? Is it the same for all T? Yes! It is a useful and interesting type.
Definition 3.7 (The Unit type). The Unit type is $T^0$, where T is some type; alternatively, Unit is a Cartesian product of zero types.
Unit can be thought of as
  - a composite type created from an arbitrary component
  - an atomic type

179. Properties of Unit
Unit is the neutral element of Cartesian product.
Unit is not the empty set, $\mathrm{Unit} \ne \emptyset$.
    $\#\mathrm{Unit} = 1$    (3.3.13)
Unit has exactly one value, the 0-tuple:
    $\mathrm{Unit} = \{()\}$.    (3.3.14)
A Unit variable is not really a variable, since it can store only one value, which can never be changed.
Number of bits required to store a value of type Unit:
    $\lg_2 \#\mathrm{Unit} = \lg_2 1 = 0$.    (3.3.15)

Branding
3 Frames: Motivation for branding; Branding type constructor; Operators for branding

180. Motivation for branding
It is often the case that we want to make a new type of an existing type, without adding anything new to the type definition. The MKSC system of units:
  - Length: a real number, designating meters
  - Mass: a real number, designating kilograms
  - Time: a real number, designating seconds
  - Coulomb: a real number, designating electrical charge

181. Branding type constructor
Let I be an infinite set of identifiers (often called labels).
Definition 3.8 (Branding). If $T \in \mathbb{T}$ is a type and $l \in I$ is a label, then $l(T)$ is the l brand of T, where
    $l(T) = \{ \langle l, v \rangle \mid v \in T \}$.    (3.3.16)
Characteristics:
    $\forall l \in I: T \ne l(T)$    (3.3.17)
    $\forall l_1, l_2 \in I: l_1 \ne l_2 \Rightarrow l_1(T) \ne l_2(T)$    (3.3.18)

182. Operators for branding
Creation: a value of type $l(T)$ is created from a value $v \in T$:
    $v \in T \Rightarrow l(v) \in l(T)$    (3.3.19)
Extraction: a value $v \in T$ can be extracted from type $l(T)$:
    $l(v) \in l(T) \Rightarrow l(v)\#l \in T$    (3.3.20)

Records
2 Frames: Labeled Cartesian products: records; Operators for record type

183. Labeled Cartesian products: records
Cartesian products: positional access to components: 1st component, 2nd component, 3rd component, 4th component, etc.
Records: access component by (hopefully, meaningful) name.
Definition 3.9 (Record type constructor). Let $l_1, \ldots, l_n \in I$, $n \ge 0$, be a set of unique labels, and let $T_1, \ldots, T_n$ be types. Then $\{l_1 : T_1, \ldots, l_n : T_n\}$ is the record type induced by the labels $l_1, \ldots, l_n$ and the types $T_1, \ldots, T_n$.
In a sense, record types can be thought of as an application of product to branding, and
    $\{l_1 : T_1, \ldots, l_n : T_n\} = l_1(T_1) \times \cdots \times l_n(T_n)$    (3.3.21)

184. Operators for record type
Composition and decomposition operators are easy to define.
Composition: values of the record type $\{l_1 : T_1, \ldots, l_n : T_n\}$ are created by the n-ary operator $\{l_1 : (\cdot), \ldots, l_n : (\cdot)\}$:
    $v_1 \in T_1, \ldots, v_n \in T_n \Rightarrow \{l_1 : v_1, \ldots, l_n : v_n\} \in \{l_1 : T_1, \ldots, l_n : T_n\}$    (3.3.22)
Decomposition: given a value $\{l_1 : v_1, \ldots, l_n : v_n\}$, the unary decomposition operator $(\cdot)\#l_i$ (for all $1 \le i \le n$) evaluates to $v_i$:
    $\forall i,\ 1 \le i \le n: \{l_1 : v_1, \ldots, l_n : v_n\}\#l_i = v_i$    (3.3.23)

Disjoint union
4 Frames: Union type constructor?; Choice types: disjoint union type constructor; Type equality hardship, again; Operators for choice type

185. Union type constructor?
Definition 3.10 (Union type constructor???). If $T_1, T_2 \in \mathbb{T}$ are types, then so is their union, $T_1 \cup T_2$.
Problem: what if $T_1$ and $T_2$ are not disjoint? In particular, they may even be equal. Suppose that a value $v \in T_1$; then v also belongs to $T_1 \cup T_2$, but how do we know whether it belongs to $T_1$ or to $T_2$?
The union must be disjoint! Mock does not have a union type constructor.

186. Choice types: disjoint union type constructor
Definition 3.11 (Choice type). If $T_1, T_2 \in \mathbb{T}$ are types, then so is their disjoint union, $T_1 + T_2$, defined by the set
    $T_1 + T_2 = l_1(T_1) \cup l_2(T_2)$    (3.3.24)
where $l_1, l_2 \in I$, $l_1 \ne l_2$, are some labels.
The notation follows from the fact that
    $\#(T_1 + T_2) = \#T_1 + \#T_2$.    (3.3.25)
Labels
  - are often called tags in the context of disjoint union;
  - are used for telling whether $v \in T_1 + T_2$ came from $T_1$ or $T_2$;
  - are arbitrary; any $l_1, l_2$ will do as long as $l_1 \ne l_2$.

187. Type equality hardship, again
In a sense, branding is similar to a disjoint union of one type. However, the support of branding in PLs is usually distinct from that of disjoint union.
Enumerated types are similar to disjoint unions:
Definition 3.12 (Enumerated type constructor). If $l_1, l_2, \ldots, l_n \in I$, $n \ge 1$, are labels, then $\{l_1, l_2, \ldots, l_n\}$ is an enumerated type, whose values are $l_1, l_2, \ldots, l_n$.
An enumerated type can be thought of as a disjoint union of branded Unit types:
    $\{l_1, l_2, \ldots, l_n\} = l_1(\mathrm{Unit}) + l_2(\mathrm{Unit}) + \cdots + l_n(\mathrm{Unit})$    (3.3.26)

188. Operators for choice type
Let $T_1, T_2$ be types.
Creation: use the $l_1$ and $l_2$ operators:
    $v_1 \in T_1 \Rightarrow l_1(v_1) \in T_1 + T_2$
    $v_2 \in T_2 \Rightarrow l_2(v_2) \in T_1 + T_2$    (3.3.27)
Checking: use the $?l_1$ and $?l_2$ unary operators:
    $v \in T_1 + T_2 \Rightarrow v?l_1 = \begin{cases} \text{true} & v = l_1(v_1),\ v_1 \in T_1 \\ \text{false} & v = l_2(v_2),\ v_2 \in T_2 \end{cases}$    (3.3.28)
Operator $?l_2$ is defined similarly.
Decomposing: given $v = l_1(v_1) \in T_1 + T_2$, the unary decomposition operator $\#l_1$ evaluates to $v_1 \in T_1$. Operator $\#l_2$ is defined similarly.
Type error: when evaluating $v\#l_1$ in the case that $v = l_2(v_2)$, $v_2 \in T_2$.

Type None and type Any
3 Frames: The None type; Properties of type None; The Any type

189. The None type
I shall now use the terms None, Bottom, and $\bot$ interchangeably. Please try not to be confused.
Definition 3.13 (The Bottom type). Type None, also known as Bottom, and often denoted $\bot$, is the empty set, i.e.,
    $\mathrm{None} \equiv \mathrm{Bottom} \equiv \bot \equiv \emptyset$.    (3.3.29)
Obviously, $\#\mathrm{Bottom} = 0$.
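The three operators of the choice type in frame 188 (creation, checking, decomposing) can be sketched in C for the choice int + double. This is a minimal sketch, not a feature of the language: the names Choice, from_int, is_int, get_int and their counterparts are hypothetical, and the type error of frame 188 is caught only at run time, by assert.

```c
#include <assert.h>
#include <stdbool.h>

/* The two labels l1 and l2 of the choice type int + double. */
typedef enum { L1_INT, L2_REAL } Tag;

typedef struct {
    Tag tag;                          /* records which alternative is stored */
    union { int i; double r; } as;    /* storage shared by both alternatives */
} Choice;

/* Creation: the l1 and l2 operators. */
Choice from_int(int v)     { Choice c; c.tag = L1_INT;  c.as.i = v; return c; }
Choice from_real(double v) { Choice c; c.tag = L2_REAL; c.as.r = v; return c; }

/* Checking: the ?l1 and ?l2 operators. */
bool is_int(Choice c)  { return c.tag == L1_INT; }
bool is_real(Choice c) { return c.tag == L2_REAL; }

/* Decomposing: the #l1 and #l2 operators.
   Decomposing with the wrong label is a type error; assert aborts here. */
int    get_int(Choice c)  { assert(is_int(c));  return c.as.i; }
double get_real(Choice c) { assert(is_real(c)); return c.as.r; }
```

Nothing stops a caller from bypassing these functions and touching the union directly, so the safety is by convention only.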

190. Properties of type None
Type $\bot$ is
  - derived from the choice constructor just as Unit is derived from Cartesian product;
  - the neutral element of the choice type constructor.
Cardinality is zero! No legal values.

191. The Any type
I shall now use the terms Any, All, Top, and $\top$ interchangeably. Please try not to be confused.
Definition 3.14 (The Any type). Type Any, also known as All, or Top, and often denoted $\top$, is the universal set, i.e., for a language L with values universe $V_L$,
    $\mathrm{Any} = V_L$.    (3.3.30)
Type $\top$: the type of any arbitrary value that L may generate.
    $\forall T \in \mathbb{T}: T \subseteq \top$
    $\forall T \in \mathbb{T}: \top \cup T = \top$    (3.3.31)
    $\forall T \in \mathbb{T}: \bot + T = T$
    $\forall T \in \mathbb{T}: \top + T = \top$    (3.3.32)

Mapping types
6 Frames: Mappings vs. partial-mapping definition; Mapping & partial mapping; Cardinality of mapping; Mapping is similar to exponentiation; Mapping and integral exponentiation; Unifying equation

192. Mappings vs. partial-mapping definition
Definition 3.15 (Mapping). A mapping (also called function) from a set S to a set T is a set $m \subseteq S \times T$ that associates precisely one value of T with each value of S, i.e.,
    $\forall s \in S: \#\{t \mid (s,t) \in m\} = 1$    (3.3.33)
Definition 3.16 (Partial mapping). We say that $m \subseteq S \times T$ is a partial mapping (partial function) from set S to set T if it never associates more than one value of T with any value of S (but there may be members of S with which m associates no member of T), i.e.,
    $\forall s \in S: \#\{t \mid (s,t) \in m\} \le 1$    (3.3.34)
Ideally,
    $\wp T = T \to \mathrm{Boolean}$    (3.3.35)

193. Mapping & partial mapping
Definition 3.17 (Mapping (partial mapping) type constructor). If T and S are types, the (partial) mapping from S to T, denoted $S \to T$ (sometimes also $T^S$), is
    $S \to T = \{ m \mid m \text{ is a (partial) mapping from } S \text{ to } T \}$.    (3.3.36)

194. Cardinality of mapping
An ancient joke made even more silly: what's the difference between a professor and a diplomat?
A diplomat: when she says yes, she means maybe; when she says maybe, she means no; and when she says no, she is not a diplomat!
A professor: when he says yes, he means no; when he says no, he means yes; and when he says maybe, he is not a professor!
Some Pascal types:

    TYPE
      Saying = (saysyes, saysmaybe, saysno);
      Meaning = (meansyes, meansmaybe, meansno, identitycrisis);
      Interpretation = Array[Saying] of Meaning;

With these types, a diplomat is defined by:

    VAR
      diplomatinterpretation: Interpretation;
    Begin
      diplomatinterpretation[saysyes] := meansmaybe;
      diplomatinterpretation[saysmaybe] := meansno;
      diplomatinterpretation[saysno] := identitycrisis;
    End

Thus, variable diplomatinterpretation stores the value $\langle$meansmaybe, meansno, identitycrisis$\rangle$. (Formally, we did not introduce the notion of storage, but it should be clear what it is.) This value is a tuple, a triple to be specific. Its type is $\mathrm{Meaning}^3$.
Similarly, the representation of the interpretation of a professor's sayings is $\langle$meansno, identitycrisis, meansyes$\rangle$.
Cardinality of mapping?

So, we have a diplomat and a professor. How many possible different characters are there? How many different values are there of type Interpretation?
We have
    $\#\mathrm{Saying} = 3$
    $\#\mathrm{Meaning} = 4$
To specify an array of type Interpretation you must provide a value (one of four) to each tuple entry (three entries in total):
    $\#\mathrm{Interpretation} = 4^3 = \#\mathrm{Meaning}^{\#\mathrm{Saying}} = 64$.

195. Mapping is similar to exponentiation
We have
    $\#(S \to T) = \#T^{\#S}$.    (3.3.37)
Suppose that we write
    $S \to T = T^S$    (3.3.38)
Then, currying
    $(S_1 \times S_2) \to T = S_1 \to (S_2 \to T)$    (3.3.39)
looks very much like the exponential identity
    $T^{S_1 \cdot S_2} = (T^{S_2})^{S_1}$.    (3.3.40)

196. Mapping and integral exponentiation
Let $\mathbf{n}$ denote the type obtained by taking the n sized subrange of the integer type,
    $\mathbf{n} = 1, \ldots, n$.    (3.3.41)
Then, the mapping $\mathbf{n} \to \mathbb{R}$, which is the type of a real array in Fortran [18], is isomorphic to $\mathbb{R}^n$, i.e.,
    $\mathbf{n} \to \mathbb{R} = \mathbb{R}^n$    (3.3.42)
Ain't it fortunate that we allow ourselves to write the mapping $\mathbf{n} \to \mathbb{R}$ also as $\mathbb{R}^{\mathbf{n}}$?
Note that
    $\mathrm{Unit} = \mathbf{1}$,    (3.3.43)
and
    $\mathrm{None} = \mathbf{0}$.    (3.3.44)

197. Unifying equation
Euler's equation, involving the five most important constants of mathematics:
    $e^{i\pi} + 1 = 0$.    (3.3.45)
The type theory equivalent:
    $\top^{\bot} = \mathbf{1}$.    (3.3.46)
There is precisely one function that maps the None type to the All type. (This function is empty, but should we care?)

Recursive type constructor
7 Frames: Recursive type definition; Bottom up solution of recursive type equations; Solving the list recursive equation; Does it converge?; Taylor series solution of recursive type equations; Understanding the Taylor series expansion; Summary: theoretical type constructors

198. Recursive type definition
Definition 3.18 (Recursive type definition). Let $T_1, \ldots, T_m$ be types, let $\tau_1, \ldots, \tau_n$ be type unknowns, which are the types to be defined, and let $E_1, \ldots, E_n$ be type expressions over $T_1, \ldots, T_m, \tau_1, \ldots, \tau_n$, using type constructors such as disjoint union, Cartesian product, etc. Then the system of equations
    $\tau_1 = E_1(T_1, \ldots, T_m, \tau_1, \ldots, \tau_n)$
    $\vdots$
    $\tau_n = E_n(T_1, \ldots, T_m, \tau_1, \ldots, \tau_n)$    (3.3.47)
defines new types $\sigma_1, \ldots, \sigma_n$ as the minimal solution of the above.
Adequate constraints must be placed on the system of equations so that it would be solvable; we will not deal with these here.

199. Bottom up solution of recursive type equations
Consider the case n = 1, i.e., an equation with only one type variable, $\sigma$:
    $\sigma = E(T_1, \ldots, T_m, \sigma)$    (3.3.48)
Then, we build the following approximations:
    $\sigma_0 = \emptyset = \bot$
    $\sigma_1 = E(T_1, \ldots, T_m, \sigma_0) = E(T_1, \ldots, T_m, \bot)$
    $\sigma_2 = E(T_1, \ldots, T_m, \sigma_1) = E(T_1, \ldots, T_m, E(T_1, \ldots, T_m, \bot))$
    $\vdots$
    $\sigma_{n+1} = E(T_1, \ldots, T_m, \sigma_n) = E(T_1, \ldots, T_m, E(T_1, \ldots, T_m, \sigma_{n-1}))$
    $\vdots$
and
    $\sigma = \bigcup_{i=0}^{\infty} \sigma_i$.    (3.3.49)

[18] Unlike C, the first index of a Fortran array is 1.
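The count #Interpretation = 4^3 = 64 and the general rule #(S → T) = #T^#S of equation (3.3.37) can be checked by brute force: a mapping from a finite S is just an array with #S entries, so it is enough to enumerate all such arrays. The sketch below is an illustration only; count_mappings is a hypothetical helper, and the limit of 16 source elements is an arbitrary assumption.

```c
#include <assert.h>

enum { MAX_FROM = 16 };   /* arbitrary bound on #S for this sketch */

/* Counts the total mappings from a set of `from` elements to a set of
   `to` elements by enumerating them. Each mapping is an array m with
   `from` entries, and the arrays are stepped through like the digits of
   an odometer in base `to`. */
unsigned long count_mappings(int from, int to) {
    int m[MAX_FROM] = {0};
    unsigned long count = 0;
    assert(from >= 0 && from <= MAX_FROM && to >= 1);
    for (;;) {
        int k = 0;
        count++;                              /* m is one more mapping */
        while (k < from && ++m[k] == to) {    /* advance, with carry */
            m[k] = 0;
            k++;
        }
        if (k == from)                        /* wrapped around: all seen */
            break;
    }
    return count;
}
```

With from = 0 the function returns 1, matching the remark of frame 197 that there is exactly one (empty) function out of the None type.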


200. Solving the list recursive equation
In ML:

    datatype intlist = nil | cons of int * intlist

For brevity,
  1. let $\tau$ denote the unknown,
  2. write $Z$ instead of int,
  3. write $\mu$ instead of nil,
  4. write $\varsigma$ instead of cons.

    $\tau = \mu + \varsigma(Z \times \tau)$
    $\quad = \mu + \varsigma(Z \times (\mu + \varsigma(Z \times \tau)))$
    $\quad = \mu + \varsigma(Z \times \mu) + \varsigma(Z \times \varsigma(Z \times \tau))$
    $\quad = \mu + \varsigma(Z \times \mu) + \varsigma(Z \times \varsigma(Z \times \mu)) + \cdots$

201. Does it converge?
Many (boring) theorems and (even more boring) proofs regarding
  - convergence
  - uniqueness of solution
  - independence in the order of application
  - proper structure of $E_1, \ldots, E_n$
Not discussed here.

202. Taylor series solution of recursive type equations
In C:

    // Type alias: typedef for an incomplete type:
    typedef struct T *t;
    // completing the type definition
    struct T {
      int data;
      t left;
      t right;
    };

    $t = (1 + T) = 1 + Z \cdot t \cdot t$    (3.3.50)

Figure 3.3.1: Using Mathematica to solve the equation imposed by the recursive binary tree type

203. Understanding the Taylor series expansion
    $\tau = 1 + Z + 2Z^2 + 5Z^3 + 14Z^4 + 42Z^5 + \cdots$    (3.3.51)

Figure 3.3.2: Distinct topologies of binary trees with n nodes; n = 0, 1, 2, 3; each node stores an integer $i \in Z$. (1 tree with 0 nodes, 1 tree with 1 node, 2 trees with 2 nodes, 5 trees with 3 nodes.)

204. Summary: theoretical type constructors

Type constructors:
  - Nullary: Unit, Null, Any, Enum
  - Unary: Power set, Branding, Exponentiation, Simple recursive
  - Binary: Mapping
  - n-ary: Cartesian product, Disjoint union, General recursive
Figure 3.3.3: Taxonomy of the type constructors of Mock

Theory vs. practice: type constructors in real PLs are not so elegant; their shape is invariably a compromise between many conflicting practical demands.

References
Algebraic types; Bottom type; Currying; Disjoint union; Empty type; First class citizens; First class functions; Function type; Nested function; Recursive data type; Set type constructor; Tagged union; Top type; Tuple types; Type punning; Type system; Unit type; Void safety; Void type

Exercises
1. What's the type of the composition operator for tuple type? And for record type?
2. Define a set of operators for mappings in Mock.
3. Enumerate the differences between operators and functions. (What's common to them?)
4. Define a set of operators for power sets in Mock.
5. Why is it that the disjoint union type constructor does not have a positional equivalent, in the same manner that the record type constructor has a positional equivalent in the form of the tuple type constructor?
6. Explain why it makes sense that PLs for matrix, vector and other mathematical computation would not distinguish between a value of an array of length 1 and a scalar value.
7. Which type errors can mappings produce?
8. Let L be a PL. Then, for L = C++, Pascal, ML, Go, MetaPost, D, ...:
   - Enumerate all type constructors of L.
   - Which of these are modeled after the constructors mentioned in this subsection?
   - Which type constructors of L cannot be modeled after the constructors mentioned in this subsection?
   - Which of the constructors mentioned in this subsection do not exist in L? Why?
   - Does L have a Unit type? Is this type first class or not?
   - Does L have a None type? Is this type first class or not?
   - Does L have an All type? Is this type first class or not?
9. What is the difference between value and type?
10. Why doesn't Eiffel have a Choice type?
11. Does it follow from the fact that the cardinality of the set of all Pascal programs is $\aleph_0$ that $\#V_{\mathrm{Pascal}} = \aleph_0$?
12. Can you explain why arrays with a variety of index types (as in Pascal and Ada) seem to have vanished?
13. Provide an example [19] of a composite type whose values are atomic.
14. Provide an example [20] of an atomic type whose values are composite.
15. Provide an example [21] of a composite type whose values are composite. Can you tell whether the values are composed in the same way that the type is composed?
16. Is it true that $\forall v \in V_L\ \exists T \in \mathbb{T}_L: v \in T$? Explain.
17. Why has Java no Choice type?
18. Present a feature of void which incomplete types don't have.
19. What's the difference between a value and a variable?
20. Explain why languages designed for string processing would not distinguish between an array of length 1 of characters and a scalar value.

[19] i.e., a specific type in a specific PL and perhaps even values
[20] ditto
[21] ditto

21. Why is it that very few languages have a power-set type constructor?
22. Explain how it is possible to view functions with more than one parameter in, e.g., Pascal, as taking a tuple type argument. Refer both to the definition of the function and its invocation.
23. What does nullptr denote? Type, variable, value, or something else? Should it be considered an identifier?
24. What is the difference between type aliasing and type branding?
25. Explain why many examples in this subsection mention variables, despite the fact that this course did not discuss variables yet.

3.4 Type constructors in actual PLs
Contents [57 frames]: Product type constructor [1 frame]; Integral exponentiation [2 frames]; Branding [4 frames]; Union and choice/disjoint union type constructors [4 frames]; Tags in concrete PLs [8 frames]; Micro-Lisp in C [15 frames]; Special types: Unit, Top & Bottom [10 frames]; Mapping as functions and arrays [5 frames]; Power sets [1 frame]; Recursive types [7 frames]

Product type constructor
1 Frame: Tuples vs. records in ML

205. Tuples vs. records in ML
ML tuples:

    type person = string * string * int * real
    if (#3 someone) >= 18 then ... else ...
    val (surname, forename, age, height) = someone
    if age >= 18 then ... else ...

ML records:

    type person = {
      surname: string,
      forename: string,
      age: int,
      height: real
    }

Integral exponentiation
2 Frames: Integral exponentiation gives rise to array values; Array values?

206. Integral exponentiation gives rise to array values
Integral exponentiation is useful for understanding arrays. However,
  - Integral exponentiation is meaningful only for languages like C and Fortran, in which the indices of an array are integers in the range 0, 1, ... (in C) or 1, 2, ... (in Fortran).
  - In Pascal any discrete type may serve as array index, so integral exponentiation does not apply.
  - Pascal, Fortran and C offer array variables.
  - We are still in the context of values, not variables.

207. Array values?
Many PLs offer array variables; array values are usually not first class.
Pascal: many limitations on array values:
  - cannot create an array value without the help of a variable;
  - cannot name an array value in the const section;
  - can pass array values as parameters to functions and procedures, but not return these.
C: also limits array values:
  - array values initializer (can be thought of as an array literal):

        int primes[5] = { 2, 3, 5, 7, 11 };

  - no other array literals;
  - cannot pass array values as parameters to functions;
  - cannot return arrays.

Branding
4 Frames: Does C's typedef brand types?; Does Pascal's TYPE definition brand types?; Pascal offers automatic coercion from and to branded type; Using C's structures for branding

208. Does C's typedef brand types?
Some typedefs:

    typedef double meters;
    typedef double kgs;

More typedefs:

    typedef double seconds;
    typedef double coulombs;

Trying the above typedefs:

    #include <stdio.h>

    void print_seconds(seconds s) {
      printf("%gsecs", s);
    }

    meters m;
    kgs k;
    seconds s;
    coulombs c;

    int main(void) {
      print_seconds(s);  // compiles
      print_seconds(m);  // compiles
      print_seconds(k);  // compiles
      print_seconds(c);  // compiles
      return 0;
    }

Conclusion: C's typedefs do not brand!
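Since typedef only introduces an alias, one way to obtain distinct unit types in C is to wrap the number in a single-field struct, because every struct definition introduces a new type. The sketch below is an illustration under stated assumptions: the types Meters and Seconds and the functions meters, seconds, elapsed_minutes and brand_demo are all hypothetical names.

```c
#include <assert.h>

/* Two brands of double: identical representation, distinct types. */
typedef struct { double value; } Meters;
typedef struct { double value; } Seconds;

/* Creation: wrap a plain double in the brand. */
Meters  meters(double v)  { Meters m  = { v }; return m; }
Seconds seconds(double v) { Seconds s = { v }; return s; }

/* Accepts only Seconds. Extraction is the explicit field access .value */
double elapsed_minutes(Seconds s) { return s.value / 60.0; }

/* Usage: values must be branded explicitly before they can be passed. */
void brand_demo(void) {
    Seconds s = seconds(90.0);
    assert(elapsed_minutes(s) == 1.5);
    /* elapsed_minutes(meters(90.0));  rejected at compile time:
       Meters and Seconds are incompatible struct types */
}
```

The price of the brand is that arithmetic no longer works directly on the wrapped values; every operation has to unwrap and rewrap.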

209. Does Pascal's TYPE definition brand types?
Definitions:

    TYPE
      Meters = Real;
      Seconds = Real;
      Kgs = Real;
      Coulombs = Real;

Trying this out:

    VAR
      m: Meters;
      s: Seconds;
      k: Kgs;
      c: Coulombs;

    Procedure PRINT_SECONDS(s: Seconds);
    Begin
      Write(s, 'sec');
    end;

    Begin
      PRINT_SECONDS(s);
      PRINT_SECONDS(m);
    end.

Conclusion: Pascal's TYPE definitions do brand!

210. Pascal offers automatic coercion from and to branded type
Definitions:

    TYPE
      Seconds = Real;
    VAR
      s: Seconds;
      r: Real;

    Procedure PRINT_SECONDS(s: Seconds);
    Begin
      Write(s, 'sec');
    end;

    Begin
      PRINT_SECONDS(s);
      PRINT_SECONDS(r);
    end.

Conclusion: Pascal's TYPE definitions do brand!

211. Using C's structures for branding
C's typedef
  - does not create a new type;
  - provides an alias for an existing type.
Use struct to create new types:

    struct Address { char *s; };
    struct Name { char *s; };
    struct Address a;
    struct Name n;
    a = n;  // compile-time error: incompatible types

Union and choice/disjoint union type constructors
4 Frames: The hotel thermometer; Temperature tagging: why unions must be disjoint?; The 3 operations on disjoint union in ML; Disjoint union in Pascal

212. The hotel thermometer: yet another tagging example
Guests to Hotel California come from:
  - USA: use Fahrenheit
  - Israel: use Celsius
The room data structure, in C:

    struct Room {
      // ... many room facilities
      struct Displays {
        // ... many indicators
        union Temperature {
          float fahrenheit;
          float celsius;
        } temperature;
      } controls;
    };

213. Temperature tagging: why unions must be disjoint?
Temperature is a real number, either way. Tagging is needed for safe decomposition.
Simple set union: $\mathbb{R} \cup \mathbb{R} = \mathbb{R}$
With tagging we obtain
    $\mathrm{Temperature} = (\{\mathrm{Celsius}\} \times \mathbb{R}) \cup (\{\mathrm{Fahrenheit}\} \times \mathbb{R})$
Tagged temperature, in C:

    struct TemperatureS {
      enum { Fahrenheit, Celsius } units;
      union TemperatureU {
        float fahrenheit, celsius;
      } value;
    } temperature;

214. The 3 operations on disjoint union in ML
Definition:

    datatype number = exact of int | approx of real;

Value construction:

    exact(i + 1)
    approx(r/3.0)

Decomposing a construction:

    case n of
        exact i => i
      | approx r => round(r);

215. Disjoint union in Pascal
In the Pascal lingo: "variant records".
A number type in Pascal:

    type
      Accuracy = (exact, approx);
      Number = record
        case tag: Accuracy of
          exact: (ival: Integer);
          approx: (rval: Real)
      end

Values of type Number:
    ..., $\langle$exact, -2$\rangle$, $\langle$exact, -1$\rangle$, $\langle$exact, 0$\rangle$, $\langle$exact, 1$\rangle$, $\langle$exact, 2$\rangle$, ...,
    ..., $\langle$approx, -1.0$\rangle$, ..., $\langle$approx, 0.0$\rangle$, ..., $\langle$approx, 1.0$\rangle$, ...
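Returning to the tagged temperature of frame 213, the point of the tag is that a reading can be decomposed safely: the tag says how to interpret the number. The sketch below assumes a simplified layout in which both alternatives share a single float field (they have the same representation, so no union is needed); the names Units, Temperature and to_celsius are hypothetical.

```c
#include <assert.h>

typedef enum { FAHRENHEIT, CELSIUS } Units;

typedef struct {
    Units units;     /* the tag: which alternative is stored */
    float degrees;   /* the value, to be read according to the tag */
} Temperature;

/* Decomposition driven by the tag: every alternative is handled, and the
   number is never interpreted in the wrong scale. */
float to_celsius(Temperature t) {
    switch (t.units) {
    case CELSIUS:
        return t.degrees;
    case FAHRENHEIT:
        return (t.degrees - 32.0f) * 5.0f / 9.0f;
    }
    assert(0 && "corrupt tag");   /* unreachable for a valid tag */
    return 0.0f;
}
```

This mirrors the ML case expression of frame 214, except that C does not check that the switch covers every tag; that discipline is left to the programmer.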

3.4.5 Tags in concrete PLs
8 Frames: Tagging in choice types; Missing tag in Pascal variant record; Tagging in ML vs. C (& Pascal); Unsafety of C's union; And the output is...; Tags in C; Choice & enumerated types; Quiz: why wouldn't this work in C++

216. Tagging in choice types
Definition 3.19 (Tag). Tag is the mechanism for storing the selection made in a choice type along with the value associated with the choice.
Tagging of pointers (T + Unit) is implicit in all PLs; however, compiler enforced safety is rare.

217. Missing tag in Pascal variant record
Point type:

    TYPE
      Point = Record
        X, Y: Integer;
      end;

Rectangle type:

    TYPE
      Rectangle = Record
        case Integer of
          0: (Left, Top, Right, Bottom: Integer);
          1: (TopLeft, BottomRight: Point);
      end;

Figure 3.4.1: Pascal variant record providing two perspectives of the same memory cells (Left, Top, Right, Bottom: Integer overlaid on TopLeft, BottomRight: Point, each a pair X, Y: Integer)

Reckless programmer assumptions:
  - All cases in a variant record use the same memory address
  - There is no alignment
  - There is no padding
  - Machine representation is in the order of definition

218. Tagging in ML vs. C (& Pascal)
ML: tagging is built into the language.
  - Tag is implicit: cannot access a tag field directly.
  - Correct tagging enforced: no way to store a value into one choice selection and read it from another.
C: responsibility lies with the programmer:
  - Definition: define (or not) a tag field
  - Usage: use (or don't use) the tag field
  - Safety: safe (or unsafe) use of the tag field
  i.e., the programmer is free to decide whether a tag field is defined, and if it is defined, whether it is used or not, and if it is used, whether it is used in a safe manner or not.
Tagging in Pascal:
  - Original version: compiler forcing a definition of a tag field.
  - Standard version: tag field is optional (syntax reminds you of its necessity).
  - All versions: compiler does not enforce safety.

219. Unsafety of C's union

    #include <stdio.h>

    union {
      long int i;
      double d;
      unsigned char chars[sizeof(double)];
    } u;  // All fields occupy the same storage

    int main(void) {
      int i;
      u.d = ...;
      printf("u.i=%ld\n", u.i);
      printf("u.d=%f\n", u.d);
      for (i = 0; i < sizeof(double); i++)
        printf("u.chars[%d] = %3d = %c\n", i, u.chars[i], u.chars[i]);
      return 0;
    }

220. And the output is...
Who knows? On my machine, I got

    u.i=...
    u.d=...
    u.chars[0] = 245 =
    u.chars[1] = 244 =
    u.chars[2] =  59 = ;
    u.chars[3] =  83 = S
    u.chars[4] = 251 =
    u.chars[5] =  33 = !
    u.chars[6] =   9 =
    u.chars[7] =  64 = @

This is what's called digging deeply into the machine representation!

221. Tags in C
Adding a tag to our example:

    struct {
      enum { LONGINT, DOUBLE, UNSIGNEDCHAR } tag;
      union {
        long int i;
        double d;
        unsigned char chars[sizeof(double)];
      } u;
    } tagged_u;

There is no way of defining the tag in the union itself; we must wrap the tag in a struct.
We can see that union in C is the nasty union type constructor, not the disjoint union we are after.

222. Choice & enumerated types
In ML, an enumerated type can be simulated as a choice between Units.
Datatype in ML (full version):

    datatype suit = diamond of unit | heart of unit | spade of unit | club of unit;

Datatype in ML (short version):

    datatype suit = diamond | heart | spade | club;

223. Quiz: why wouldn't this work in C++

    typedef union {
      struct {} diamond;
      struct {} heart;
      struct {} spade;
      struct {} clover;
    } Suit;

Micro-Lisp in C
15 Frames: Choice type in the implementation of Lisp; Lisp's original implementation; Types for micro-lisp in C; Testing these type definitions; The Cons pool; Types for micro-lisp in C: Cons records; Types for micro-lisp in C: string handles; Representing pointers with choice type; Operations on pointers; Void safety; Syntax of variant records in Pascal; Unsafety of Pascal's variant record; Inherent unsafety with choice types; Safe decomposition of a variant record; Choice types in some languages

224. Choice type in the implementation of Lisp
Why do we need set union types?
Example: C's representation of Lisp's CONS records:

    struct PointerToAtomOrCons;  // Forward declaration
    typedef ... Atom;            // Some definition of the ATOM type
    struct Cons {
      PointerToAtomOrCons car;
      PointerToAtomOrCons cdr;
    };
    struct PointerToAtomOrCons {
      Huh????
      // This should be either:
      //   1. Pointer to a struct Cons, or
      //   2. Pointer to an Atom,
      // but not both!
    };

225. Lisp's original implementation
Figure 3.4.2: Layout of a CONS record as a single machine word (machine word: 36 bits; CAR part: 18 bits; CDR part: 18 bits; each part carries a 3-bit tag)
  - A CONS was a 36 bits machine word.
  - The word was divided into equal size parts: CAR and CDR.
  - CAR/CDR was further divided into two parts:
      15 bits of pointer to something, which could be either a CONS word, or another ATOM;
      3 bits, telling which pointer it was.

226. Types for micro-lisp in C
Please, please, do not try to understand this in full now.

    // A more civilized way to name integer values:
    enum {
      // How many bits for index into pool:
      LG2_POOLSIZE = 14,
      // How many bits for storing car/cdr kind:
      KIND_SIZE = 2
    };
    // Will be used for atoms:
    char atoms[1 << LG2_POOLSIZE];
    enum kind { NIL, STRING, INTEGER, CONS };
    struct Cons {
      enum kind carkind: KIND_SIZE;
      unsigned int car: LG2_POOLSIZE;
      enum kind cdrkind: KIND_SIZE;
      unsigned int cdr: LG2_POOLSIZE;
    } pool[1 << LG2_POOLSIZE];  // Pool of struct Cons nodes

227. Testing these type definitions

    #include <assert.h>
    #include <stdio.h>

    int main(void) {
      printf("a Cons record requires %zu bytes.\n", sizeof (struct Cons));
      printf("our store requires %zu bytes.\n", sizeof pool);
      printf("it may contain up to %zu Cons entries.\n", sizeof pool / sizeof pool[0]);
      assert(1 << LG2_POOLSIZE == sizeof pool / sizeof pool[0]);
      assert(4 == sizeof (struct Cons));
      return 0;
    }

228. The Cons pool

    enum { LG2_POOLSIZE = 14 };
    struct Cons { ... } pool[1 << LG2_POOLSIZE];

Figure 3.4.3: The Cons pool (pool of struct Cons records, each holding a 14b car and a 14b cdr; $2^{\mathrm{LG2\_POOLSIZE}} = 2^{14} = 16{,}384$ records)

229. Types for micro-lisp in C: Cons records

    // ...
    enum kind { NIL, STRING, INTEGER, CONS };
    struct Cons {
      enum kind carkind: KIND_SIZE;
      unsigned int car: LG2_POOLSIZE;
      enum kind cdrkind: KIND_SIZE;
      unsigned int cdr: LG2_POOLSIZE;
    } pool[1 << LG2_POOLSIZE];  // Pool of "struct Cons" nodes

Figure 3.4.4: Layout of a Cons record using bit fields (CONS: machine word, 32b; CAR: half word, 16b; CDR: half word, 16b; each half word holds a 2b tag and a 14b variant part)

230. Types for micro-lisp in C: string handles

    enum { LG2_POOLSIZE = 14 };
    const char *handles[1 << LG2_POOLSIZE];

Figure 3.4.5: String handles in micro-lisp (pool of handles: char * pointers, indexed by 14b values, to strings such as "Hello World", "War Lord" and "Hello, World!"; $2^{\mathrm{LG2\_POOLSIZE}} = 2^{14} = 16{,}384$ pointers)

231. Representing pointers with choice type
A pointer to type T either points to a value of type T, or has a special value denoting that the pointer points nowhere.
Special value of pointers?
    C: 0
    Pascal: nil
    Eiffel: void
    Java: null
    C++: nullptr
Pointers can be thought of as a choice between the Unit type and T.

232. Operations on pointers
Definition 3.20 (Operations on pointers).
Construction:
  - Create a null pointer
  - Create a pointer to a stored value
Tag testing: determine whether a pointer is null or not.
Projection: if the pointer is not null, extract the value.

233. Void safety
The property of a PL that a null/nil/void is never dereferenced.
Java example:

    Grade g = null;   // Variable may or may not be initialized to
                      // a reference to a real object
    g.factor(1.10);   // May generate a runtime error

  - Java: runtime check generating an error
  - C++ pointers: no runtime check; an O/S error may be generated
  - C++ references: compile time guarantee
  - C# nullable types: just like Java
  - C# non-nullable types: just like C++ references
  - Eiffel: huge effort to ensure void safety

234. Syntax of variant records in Pascal
Syntax:

    record
      case I: T of
        v1: (l1: T1);
        ...
        vn: (ln: Tn);
    end

  - I is an identifier used for tagging
  - T is its type (which must be discrete [22])
  - $v_1, \ldots, v_n$ are values of type T
  - $l_1, \ldots, l_n$ are field names
  - $T_1, \ldots, T_n$ are their types

235. Unsafety of Pascal's variant record

    VAR n: Number;
    Begin
      n.tag := exact;
      (* n.ival is still undefined *)
      (* n is now in an inconsistent state *)
      (* must not read n here *)
      n.ival := 7;
      (* n is now in a consistent state *)
      n.tag := approx;
      (* n's value is changed in one step from <exact, 7> to <approx, undefined> *)
      (* n is now in an inconsistent state *)
    End

[22] e.g., Char, Integer, Boolean, ...
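Frames 231 and 232 view a pointer as a choice between Unit (the null pointer) and T, with tag testing and projection as its operations. C offers no void safety, so the discipline has to be coded by hand; the sketch below shows one way to do so. The type Grade and the functions is_null, score_or and score_of are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int score; } Grade;

/* Tag testing: is this the Unit alternative (null) or a real Grade? */
bool is_null(const Grade *g) { return g == NULL; }

/* Safe projection: the caller supplies the result for the null case,
   so the pointer is never dereferenced when it points nowhere. */
int score_or(const Grade *g, int fallback) {
    return is_null(g) ? fallback : g->score;
}

/* Checked projection: dereferencing null is turned into an immediate,
   diagnosable failure instead of undefined behaviour. */
int score_of(const Grade *g) {
    assert(!is_null(g));
    return g->score;
}
```

score_or corresponds to handling both alternatives of the choice, while score_of resembles a run-time null check that raises an error.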

236. Inherent unsafety with choice types
With the absence of PL help, we may:
  - Change the tag, without changing the value's type.
  - Change the value's type, without changing the tag.
  - Read a value as if it belongs to a certain type, while the tag indicates otherwise.
These are a kind of type errors:
Definition 3.21 (Type error). A program commits a type error if it makes an attempt to:
  - manipulate v in a way which is inconsistent with T, or
  - interpret the machine representation of v in a way which is inconsistent with T,
for some type T and a value $v \in T$.
A type error is just like crime: some crimes go undetected, others are caught; while some are misdemeanors (or even technical), others are serious.

237. Safe decomposition of a variant record
Given a Number, return its rounded value:

    function roundnum(n: Number): Integer;
    begin
      case n.tag of
        exact: roundnum := n.ival;
        approx: roundnum := round(n.rval)
      end;
    end;

238. Choice types in some languages
  - C: union type constructor
  - Pascal: variant record
  - ML: datatype type constructor, e.g.,

        datatype color = Red | Blue | Green;
        datatype number = i of int | r of real;

    datatype can be used for defining enumerated types as well.
  - Java: missing!
  - Eiffel: missing!
Java and Eiffel lack union; as is typical of OO languages, the need for choice type constructors is diminished with inheritance.

Special types: Unit, Top & Bottom
10 Frames: Emulating Unit type in C; Type Unit in contemporary languages; Incomplete types; Using void in C & other subtle points; Why void is not a C (or Java) type?; Bottom: examples of use?; Emulating None in C; Types None & Any in Eiffel; Eiffel inheritance clause; Type Any in C++

239. Emulating Unit type in C
Option I: singleton enum

    typedef enum { unit } Unit;

Option II: empty struct

    typedef struct {} Unit;

This type's (only) value is {}; it can only be used for initialization.
Option III: zero sized array (only in C++)

    typedef int Unit[0];

240. Type Unit in contemporary languages
Modern languages tend more and more to let type Unit be a first class type; this can make the language definition simpler, e.g., no distinction between functions and procedures.
  - ML: Unit corresponds to the type unit.
  - C: Unit corresponds to the (incomplete) void type, denoted by its own keyword.

241. Incomplete types
Incomplete types are types which do not provide information on their values. Type struct Person in the following variable declaration is incomplete:

    struct Person *person;

An incomplete type can usually be completed, by specifying the missing information, but type void can never be completed!
For historical reasons, sizeof(void) = 1 when used in the arithmetic of a pointer of type void *.

242. Using void in C & other subtle points
void foo(int)
  - Is not a function that does not return anything.
  - It is rather a function that can return in only one way, i.e., returns type Unit.
int bar(void)
  - Takes no arguments.
  - Can be thought of as taking an argument of type unit.
int baz(int, ...);
  - The function is declared to have a variable number of arguments.
extern boo()
  - Return type is implicitly int.
  - In K&R C, no declaration of arguments.
  - In C++, the function takes no argument (an argument of type unit).

243. Why void is not a C (or Java) type?
Yes, we have:
- void return
- void argument (but not in Java nor C++)
But:
- No void variables
- No void arrays
- No void fields in structs
- void * is not a pointer to a cell of type void.
The language's specification makes every possible attempt to avoid calling void a type. In many ways, void is just a reserved word of C, which may look like a type.

244. Bottom: examples of use?
Actual PLs: meaningless as a variable's type. The return type of C's function exit() should be None rather than Unit or void. More examples? That's it! Only functions which never return are of type None.
Program analysis: e.g., to prove that a variable may be uninitialized in a certain program location.

245. Emulating None in C
Challenge: define a type with no legal values.
Option I: empty enum
    typedef enum { /* empty! */ } none;
Option II: empty union
    typedef union { /* empty! */ } none;
The author of these slides is not so sure your C compiler will like these!

246. Types None & Any in Eiffel

247. Eiffel inheritance clause
    class ARRAYED_LIST [G]
    inherit
      ARRAY [G]
        export
          {NONE} all       -- all features are inaccessible
          {ANY} capacity   -- except for this feature
        end
ARRAYED_LIST[G] inherits from ARRAY[G] while changing the export level of inherited features: all features are made private (exported to NONE), except capacity, which is public (exported to ANY).

248. Type Any in C++
C++ offers minimal support for type Any:
Variable arguments
    int printf(const char *format, ...);
  Function printf may take any number of arguments of any type, as long as the first argument is of type const char *. The notation ... here refers to a list of any length (including zero) of arguments of type Any.
Catch all exceptions
    try {
      // do something
    } catch (...) {
      printf("unfamiliar exception\n");
    }
  The notation ... here refers to an exception of type Any.
Function printf will be invoked if the try block throws an exception of any type.

Types None & Any in Eiffel (slide 246):
- All types are classes; even INTEGER is a class.
- All classes inherit from class ANY.
- Class NONE inherits from all classes.
- No class inherits from NONE.

Mapping as functions and arrays
5 Frames: Mappings: single argument function; Mappings: many arguments functions; Programmatic functions vs. mathematical functions; Mappings: arrays; Mappings: multi-dimensional arrays & currying

249. Mappings: single argument function
A Pascal function definition:
    Function even(n: Integer): Boolean;
    Begin
      even := (n mod 2) = 0
    end;
Type is Integer → Boolean.

250. Mappings: many arguments functions
A recursive Pascal function computing GCD:
    TYPE Natural = 1..MAXINT;
    Function GCD(p, q: Natural): Natural;
    Begin
      If p mod q = 0 then
        GCD := q
      else
        GCD := GCD(q, p MOD q);
    end;
Type is Natural × Natural → Natural.

251. Programmatic functions vs. mathematical functions
Functions in most PLs are algorithmic implementations of mathematical mappings (also called functions). However:
Time: programmatic functions may take time to compute. The runtime of function GCD is O(log(p + q)).
Space: programmatic functions are usually more memory efficient than the use of arrays for mapping. An array implementation of GCD would require huge memory.
Side effects: programmatic functions may have side effects. We can add Writeln commands to function GCD.
Partial: programmatic functions may not terminate. Are you absolutely sure that function GCD will terminate and not produce a runtime error when presented with negative numbers?

252. Mappings: arrays
An array type definition in Pascal:
    TYPE A = array[S] of T;
Type is S → T.
In Pascal, the index type S must be a discrete type: Boolean, Char, Integer, or a subrange of these.

253. Mappings: multi-dimensional arrays & currying
A multi-dimensional array type in Pascal:
    TYPE M = array[S1, S2, S3] of T;
Type is S1 × S2 × S3 → T, or with currying, S1 → (S2 → (S3 → T)).
By convention, the mapping operator is right associative:
    S1 → (S2 → (S3 → T)) = S1 → S2 → S3 → T.    (3.4.1)

Power sets
1 Frames: Representation of power set values

254. Representation of power set values
If #T = n, then #(℘T) = 2^#T = 2^n.
A value v ∈ ℘T requires n bits (with the simple bit-mask representation).
Set of Boolean: 2 bits.
Set of character:
  Ancient CDCs: 60 bits, which make one machine word.
  Modern architectures: 256 bits (assuming 8-bit ASCII), which make eight 32-bit words; with Unicode, 100,000 bits, which make an array of about 3,000 integers.
Set of integer:
  Ancient CDCs: 2^60 bits.
  Modern architectures: 2^32 bits, which make 2^29 bytes, i.e., half a gigabyte.

Recursive types
7 Frames: Type of Prolog values in ML; ML types for V_Prolog: system of equations; Simplified version; Type of Lisp values in ML; Lists: the classical recursive data type; Recursive definitions of lists; Recursive definition of trees

255. Type of Prolog values in ML
Recall that in Prolog, all values are terms, where a term can be:
- composite, with an atom called functor, and children terms
- an atom (a string)
- a number, which can be real or integer
- a variable (a string)
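The simple bit-mask representation of slide 254 can be sketched in C for a set of 8-bit characters: 256 bits, stored as eight 32-bit words, with union and intersection done word by word. This is only an illustration under the assumption that characters fit in an unsigned char of 8 bits; the names CharSet, set_add, set_has, set_union and set_intersection are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SET_BITS = 256, WORD_BITS = 32, SET_WORDS = SET_BITS / WORD_BITS };

/* One bit per possible character: eight 32-bit words in total. */
typedef struct { uint32_t w[SET_WORDS]; } CharSet;

static CharSet set_empty(void) {
    CharSet s = {{0}};
    return s;
}

/* Turn on the bit that stands for character c. */
static void set_add(CharSet *s, unsigned char c) {
    s->w[c / WORD_BITS] |= (uint32_t)1 << (c % WORD_BITS);
}

/* Membership test: is the bit for c turned on? */
static int set_has(const CharSet *s, unsigned char c) {
    return (int)((s->w[c / WORD_BITS] >> (c % WORD_BITS)) & 1u);
}

/* Set union is a bitwise OR of the masks. */
static CharSet set_union(CharSet a, CharSet b) {
    for (int i = 0; i < SET_WORDS; i++) a.w[i] |= b.w[i];
    return a;
}

/* Set intersection is a bitwise AND of the masks. */
static CharSet set_intersection(CharSet a, CharSet b) {
    for (int i = 0; i < SET_WORDS; i++) a.w[i] &= b.w[i];
    return a;
}
```

The same layout explains why a set of 32-bit integers is impractical in this representation: the mask alone would need 2^32 bits.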

    datatype Term = COMPOUND of (Atom * Terms)
                  | ATOM of Atom
                  | NUMBER of Number
                  | VARIABLE of string
    and Terms = none
              | many of {first: Term, rest: Terms}
    and Number = INT of int
               | REAL of real
    and Atom = string;

256. ML types for V_Prolog: system of equations
    datatype Term = COMPOUND of (Atom * Terms)
                  | ATOM of Atom
                  | NUMBER of Number
                  | VARIABLE of string
    and Terms = none
              | many of {first: Term, rest: Terms}
    and Number = INT of int
               | REAL of real
    and Atom = string;
Characteristics:
  Unknown types (n = 4): Term, Terms, Number, and Atom
  Known types (m = 3): int, real, and string

257. Simplified version
    type Atom = string;
    datatype Number = INT of int | REAL of real;
    datatype Term = COMPOUND of (Atom * Terms)
                  | ATOM of Atom
                  | NUMBER of Number
                  | VARIABLE of string
    and Terms = none
              | many of {first: Term, rest: Terms};
Characteristics:
  Unknown types (n = 2): Term and Terms
  Known types (m = 2): Atom, Number

258. Type of Lisp values in ML
In Lisp, all values are S-expressions, where an S-expression may be:
- composite, with CAR and CDR which are S-expressions;
- an atom (a string).
    datatype SExpression = CONS of {CAR: SExpression, CDR: SExpression}
                         | ATOM of string
                         | NIL;
Characteristics:
  Unknown types (n = 1): SExpression
  Known types (m = 1): string

259. Lists: the classical recursive data type
    datatype intlist = nil
                     | cons of int * intlist
Characteristics:
  Unknown types (n = 1): intlist
  Known types (m = 1): int
Values of intlist:
    nil
    cons(11,nil)
    cons(2,cons(3,cons(5,cons(7,cons(11,nil)))))
These can be obtained by substituting values obtained so far in the right-hand side of the definition, to get new values.

260. Recursive definitions of lists
Lists are very useful. Each list is finite, but there is no global limit on the number of elements (so there are unboundedly many lists defined here).
ML has a pre-defined list type constructor.
These are all valid ML types:
    int list
    bool list
    int list list
Operations include: test for emptiness, select head, select tail, concatenation, and length.

261. Recursive definition of trees
    datatype T = leaf of int
               | branch of int*T*T
Possible values:
    leaf(11)
    branch(7,leaf(5),leaf(11))
    branch(7,leaf(5),branch(9,leaf(8),leaf(11)))
Set theoretical representation: type T is the minimal solution of the set equation for unknown τ:
    τ = int + (int × τ × τ)    (3.4.2)
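The recursive tree type above has no direct counterpart in C, but it can be approximated with a tagged, heap-allocated struct that refers to itself through pointers. The sketch below is only one possible encoding; the names Tree, leaf, branch, tree_sum and tree_free are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* A tree is either a leaf holding an int, or a branch holding an int and two trees. */
typedef struct Tree Tree;
struct Tree {
    enum { LEAF, BRANCH } kind;
    int value;
    Tree *left, *right;   /* both NULL when kind == LEAF */
};

static Tree *leaf(int v) {
    Tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = LEAF;
    t->value = v;
    t->left = t->right = NULL;
    return t;
}

static Tree *branch(int v, Tree *l, Tree *r) {
    Tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = BRANCH;
    t->value = v;
    t->left = l;
    t->right = r;
    return t;
}

/* The function is recursive in the same way the type is. */
static int tree_sum(const Tree *t) {
    if (t->kind == LEAF) return t->value;
    return t->value + tree_sum(t->left) + tree_sum(t->right);
}

static void tree_free(Tree *t) {
    if (t->kind == BRANCH) {
        tree_free(t->left);
        tree_free(t->right);
    }
    free(t);
}
```

With these helpers, the third value listed above is written branch(7, leaf(5), branch(9, leaf(8), leaf(11))), mirroring the ML constructor syntax.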

3.5 Atomic types
Contents [33 frames]
  Taxonomy of atomic types [9 frames]
  Set of primitive types as PL i.d. [5 frames]
  Integral primitive types [3 frames]
  More on language design [2 frames]
  Real numbers [6 frames]
  The character primitive type [6 frames]
  Strings as atomic types [2 frames]
  References
  Exercises

Taxonomy of atomic types
9 Frames: Reminder: the structure of a type system; Atomic types: builtin vs. programmer-defined; Primitive types; Common primitive types; Classification of primitive types; More classification: unordered, ordered & ordinal types; Ordinal primitive types; Testing for properties of a type; Primitive types in the Mock PL

262. Reminder: the structure of a type system
The type system is a set of subsets of the values universe, T_L ⊆ ℘(V_L), which is recursively defined:
Atomic types: the building blocks; one cannot find any other type within an atomic type.
Type constructors: create compound types from existing types, both atomic and compound.
Compound types: types constructed from other types by employing type constructors. A compound type must include within it:
- a construction rule
- another type, atomic or compound (except for an empty list of arguments to the construction rule)

263. Atomic types: builtin vs. programmer-defined
Builtin:
- AKA (Also Known As) primitive types, basic types, builtin types, rudimentary types
- e.g., int in C++, integer in Pascal
Programmer defined:
- mainly enumerated types
- found in Ada, Pascal, C
- Pascal example:
    TYPE Month = (January, February, March, April, May, June, July, August, September, October, November, December)

264. Primitive types
The term "primitive types":
- is very commonly used
- may carry (slightly) different meanings to different people
- may be considered derogatory
Henceforth, we shall use the common term "primitive types" for builtin atomic types:
Definition 3.22 (Primitive type). A type which is both atomic and builtin, i.e., is not programmer defined, is called a primitive type.
Note: A PL could have builtin types which are not atomic, e.g., string in some PLs.

265. Common primitive types
Some standard (more or less) primitive types:
  Character: a character of some alphabet
  String: a sequence of characters
  Boolean: truth value
  Integer: an approximation of Z
  Natural: an approximation of N
  Real: an approximation of R
  Complex: an approximation of C
  Fixed point: a non-integral number with some fixed (decimal) accuracy, e.g., 2.75
Less standard atomic types:
  Unit (also a compound type: product of zero elements)
  None (also a compound type: disjoint union of zero elements)
  Any (can be thought of as the disjoint union of all possible types)

266. Classification of primitive types
In general, classification:
- is typically PL dependent
- helps the PL designer and PL user communicate, remember PL rules, and rationalize PL rules

  Class          Operators                         Examples
  Numeric        Arithmetical operators: *, +, etc.  int, real, complex
  Non-numeric    No arithmetical operators         string, unit, none
  Semi-numeric   Some arithmetical operators       date, time, pointer (23)
Table 3.5.1: Classification of primitive types

(23) Pointer, date, and time types are considered semi-numerical since they support (i) adding an integer, (ii) subtracting an integer, and (iii) subtraction to find the difference between two dates, times, or pointers.

267. More classification: unordered, ordered & ordinal types
  Ordered (comparison operators: <, <=, >, >=): boolean, integer, real, date, time, character, string, ...
    Ordinal (can be mapped to a subrange of N): boolean, int, character
    Nonordinal (cannot be mapped to a subrange of N): pointer, string, date, time
  Unordered (no comparison operators): complex, point
Table 3.5.2: More classification: unordered, ordered & ordinal types
Ordinal types: (i) can serve as array indices; (ii) support succ and pred functions; (iii) support ++ and -- operators.

268. Ordinal primitive types
- aka discrete types
- successor operator, predecessor operator
- Examples: ASCII-Character, Integer, Natural, ...
- Non-examples: String, Unicode-Character, Real, ...
Some languages offer more for ordinal types, e.g.,
- for loops in Pascal
- switch in C and case ... of in Pascal
- array indices in Pascal are exclusive to ordinal types

269. Testing for properties of a type
Trick: write a short program; does it compile?
File BooleanIsOrdered.java
    class BooleanIsOrdered {
      boolean foo() {
        return true <= false;
        // The operator <= is undefined for the argument type(s) boolean, boolean
      }
    }

270. Primitive types in the Mock PL
Figure 3.5.1: Primitive types in the Mock PL (taxonomy tree; groups: Numeric, Atomic, Other; Ordered, Ordinal, Nonordinal; Integral, NonIntegral. Leaves: Complex: complex64, complex128, complex256; Float: float16, float64, float80; Fixed: fixed10d2, fixed20d2; Signed: int8, int16, int32, int64, big; Natural: uint8, uint16, uint32, uint64; String: message, text; Boolean: bool; Character: ASCII, Unicode; Misc: date, time)

Set of primitive types as PL i.d.
5 Frames: The Mock PL; Primitive types design: the FEM metaphor; Design of primitive types; Force I Programmer: match intended use of L; Case study: boolean & the intended programmer

271. The Mock PL
- has a rich and expressive set of primitive types
- organizes these in a meaningful taxonomy
- is ridiculous indeed! But why?
How should a language L select its set of primitive types?
Programmer: match intended use of L.
Hardware: two conflicting requirements:
  Efficiency: allow L's programs to use the hardware efficiently.
  Portability: make L's programs portable.
Language: design L to be coherent and easy to understand.

Testing for properties of a type (continued)
File CharIsNumerical.java
    class CharIsNumerical {
      int foo() {
        return ':' - ')'; // compiles just fine!
      }
    }
Traps:
- Misunderstanding compiler error messages
- An idiosyncratic compiler that does not obey the standard
- Special cases in the language definition

272. Primitive types design: the FEM metaphor
Ingredients: F: Flour; E: Eggs; M: Milk.
Some combinations work great: Crêpe, Blintzes, Pancake. Yummy!!!
Other combinations are Yuck!

273. Design of primitive types
Tough questions:
- How will programmers actually use L?
- Which architectures will die? Which architectures will be born?
- Too many types? The language could be cumbersome.
- Too few types? Programming may be awkward.
The combination used in L is telling of L.
Signature of PL design: the set of primitive types of L reflects the design principles and objectives of L, and even L's spirit. Study these carefully.

274. Force I Programmer: match intended use of L
L's set of primitive types is telling of its intended use:
  C: operating systems/systems programming: try to match hardware word size.
  Fortran: scientific computation: choice of precision of real and complex numbers.
  Cobol: data processing: fixed length strings; fixed point numbers.
  Snobol: string processing: variable length strings.
  Excel: spreadsheet: Text, Number, Date, Time, DATE-TIME, Currency, Logical (and no type constructors).
  Mock: ???? Approach: Smörgåsbord: ASCII, big, bool, complex128, complex256, complex64, date, fixed10d2, fixed20d2, float16, float64, float80, int16, int32, int64, int8, message, text, time, uint16, uint32, uint64, uint8, Unicode.

275. Case study: boolean & the intended programmer
Background:
- Hardware does not (25) have a Boolean type.
- Programs rarely use data of type Boolean.
- Type Boolean is used mainly for conditional commands.
Approach:
  Pascal: educational purposes; good typed programming style; type Boolean is a must.
  C: no Boolean type; any atomic type is a Boolean in disguise.
  C++: type bool introduced by community demand; attempt to deprecate automatic coercion of other types to bool.
  Java: as in Pascal.
  JVM (the Java Virtual Machine): as in C.
(25) Booleans are realized by hardware as CPU flags, which change after many operations, and by instructions such as jz and jge.

Integral primitive types
3 Frames: Force II Hardware: efficiency vs. portability; Case study: integer in various PLs; Summary: Integer in various languages

276. Force II Hardware: efficiency vs.
portability
Efficiency: make efficient use of the underlying hardware.
- Match primitive types with those of the hardware.
- Grant access to all hardware primitive types.
- Intel is going to hate you unless you make efficient use of their hardware.
Portability: different architectures employ different types. Issues:
- Varying word size: 36, 60, ... in ancient architectures; 16, 32, 64, 128, ... in more modern ones.
- Varying representation methods: two's complement, one's complement.

277. Case study: integer in various PLs
Pascal:
- Pre-defined Integer
- Very portable for toy programs
- In serious work, the machine dependent value maxint gets in the way
- In one word, unportable
BCPL:
- Mother of all { PLs
- Acronym: Basic Combined Programming Language
- Bacronym: Before C Programming Language
- Only one primitive type, word
- A word is an integer, isn't it?
- Matches whatever a machine word is
C/C++:
- Many integral types: char, short, int, long, long long (on some dialects)
- Two varieties: signed, unsigned
- Hardware mapping must obey:
    #(char) ≤ #(short) ≤ #(int) ≤ #(long)
    #(char) = #(signed char) = #(unsigned char)

    #(short) = #(unsigned short) (26)
- Type int is the most natural to hardware
- Supposedly, C makes the right portability/efficiency tradeoff
- Many many types
- Complex language rules: expressions involving mixed types; converting each of the 8 types to another
- Bugs & confusion
Java:
- Similar to C/C++ but (tries to be) better
- Types: byte (8-bit), short (16-bit), int (32-bit), long (64-bit)
- No unsigned variant
- All types are two's complement
- Fixed mapping to hardware
- Fixed "hardware": the Java Virtual Machine (JVM)
- Many conversions, e.g., long to int
JVM:
- Simplified conversion rules
- Types: int, long, float, double
- No bool; minimal support for byte and short
- Mapped to actual hardware by the JVM program
Go:
- Many non-mixable types, implemented as a library: int8, int16, int32, int64, uint8, uint16, uint32, uint64
- Includes also type aliases: byte is uint8; rune is int32
- int: either int32 or int64
- uint: again, uint32 or uint64, but the same size as int
Newspeak/Mock:
- Unbounded integers
- Integer/big maps to a machine word
- In case of overflow, switches to a long word
- In case of further overflow, switches to an array of long words
- In case of even further overflow, doubles the array size
(26) int is a synonym for signed int, short is a synonym for signed short, but char is not a synonym for signed char, nor for unsigned char.

278. Summary: Integer in various languages
Each language takes its own special perspective of the underlying hardware:
  Pascal: We know nothing of the hardware; we do not care about the hardware; so let's assume it has a good enough integer.
  BCPL: Worship the unknown, mysterious and almighty; the programmer must cope with machine words.
  C/C++: Squeeze the maximum out of hardware, whatever it is.
  Java: Let's invent our own hardware!
  Go: Yes, both C and C++ have failed; but we will do it better with pre-defined types.
  Newspeak: Hardware is just an implementation detail; we want Z, and Z we shall have!

More on language design
2 Frames: Force III Language: complexity and stability; Simple conversion rules of Java

279.
Force III Language: complexity and stability
Design is generally easy; good design is difficult:
- PL specification complexity
- PL implementation complexity
- PL stability
Example: primitive types of C
  1978: K&R language release.
  1989: ANSI C, the first standard; new type long double that few people had heard of at the time.
  1999: C99, newly born, yet with a severely brain damaged bool type:
    - cannot add bool as a reserved identifier (it is defined in an include file as _Bool)
    - cannot add complex as a reserved identifier (_Complex instead)
    - cannot add I as a reserved identifier (defined in an include file)
    - new type long long int

280. Simple conversion rules of Java
- 19 widening operations
- 22 narrowing operations
Figure 3.5.2: widening casts (nodes: byte, short, char, int, long, float, double)
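The slide counts Java's conversions; the flavor of a widening cast can also be observed in C. The sketch below assumes that float is the IEEE 754 binary32 format (typical, but not guaranteed by the C standard) and uses the fixed-width types of <stdint.h>; the function names are hypothetical. It shows that widening an integer to a longer integer is exact, while widening to float keeps the magnitude but may drop low-order digits.

```c
#include <assert.h>
#include <stdint.h>

/* int32 -> int64: every value is preserved exactly. */
static int64_t widen_to_long(int32_t i) {
    return i;
}

/* int32 -> float: the magnitude is preserved, but a binary32 value
 * (assumed here) carries only 24 significant bits, so accuracy may be lost. */
static float widen_to_float(int32_t i) {
    return (float)i;
}

/* Does the value come back unchanged after a trip through float?
 * The comparison is done in int64_t to avoid overflow on the way back. */
static int survives_float(int32_t i) {
    return (int64_t)widen_to_float(i) == (int64_t)i;
}
```

Under the stated assumption, 16777216 (2^24) survives the trip through float while 16777217 does not, since the latter needs 25 significant bits.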

Widening casts preserve magnitude, but may lose accuracy.
Figure 3.5.3: narrowing casts (nodes: double, float, long, int, short, char, byte)
In narrowing casts, magnitude and accuracy may be lost; short and char can be narrowed to each other.
Java: 7 pseudo-numerical primitive types; 7 × 6 = 42 possible conversions; Java defines 41 = 19 + 22 ≈ 98% out of these.
In C: 17 numerical primitive types; 17 × 16 = 272 possible conversions.

Real numbers
6 Frames: Who needs real numbers; Fixed point & infinite precision; Standards for floating point numbers; Mantissa, exponent, sign; A taste of floating point representation; Issues in floating point representation

281. Who needs real numbers
Real Programmers don't Use Reals?!
  Sometimes: no, or very few reals, e.g., in O/S programming.
  Other times: you must have real numbers.
Reals come in three varieties:
1. Infinite precision: "real" real numbers, such as π, e, √2, etc., used for symbolic computation.
2. Fixed point: for representing currency, weights, distances, and the like.
3. Floating point: for most scientific applications.

283. Standards for floating point numbers
When you do need real numbers, you cannot live without them. Standards (of IEEE (27)) for representing reals as floating points come to the rescue.

282.
Fixed point & infinite precision
Fixed point:
- Minimal support by modern hardware
- No support by most modern PLs
- Usually implemented, if necessary, in a library
Infinite precision:
- No support by modern hardware
- No support by most modern PLs
- Newspeak supports infinite precision for integers and reals, including support for transcendental numbers
- Can be implemented, if necessary, in a library

Standards for floating point numbers (continued)
IEEE 754 interchange formats:
  Width   Binary base     Decimal base
  16      binary16 (a)
  32      binary32 (b)    decimal32
  64      binary64 (c)    decimal64
  128     binary128 (d)   decimal128
  (a) 3.3 significant decimal digits; not too useful
  (b) typical implementation of C's float
  (c) typical implementation of C's double
  (d) typical implementation of C's long double
Table 3.5.3: IEEE standards for representing numbers in floating point format

Other standards (all use binary format; most are extinct):
  Architecture   Width   Name
  IBM                    n/a
  IBM                    n/a
  IBM/                   HFP32 (a)
  IBM/                   HFP64
  IBM/                   HFP128
  X86 / X (b)            x86 EPF (c)
  (a) Hexadecimal Floating Point
  (b) non-extinct (yet)
  (c) x86 Extended Precision Format; an instance of one of the non-interchange IEEE 754 formats
Table 3.5.4: Some other, mostly obsolete, standards for floating points

284. Mantissa, exponent, sign
Components of a real number: (3.5.1)
Sign: (3.5.2)

(27) Institute of Electrical and Electronics Engineers

Exponent: (3.5.3)
Mantissa: (3.5.4)
Normalized mantissa m: (3.5.5)
First digit is never zero: 0.1 ≤ m < 1 (3.5.6)
In binary base, the first digit is always 1: 1/2 ≤ m < 1 (3.5.7)

285. A taste of floating point representation
The IEEE 754 / binary32 format for floating point representation is a four-byte word (32 bits), from MSB to LSB:
  Sign (1 bit): 0 = positive, 1 = negative
  Exponent (8 bits)
  Mantissa (23 bits)
Figure 3.5.4: Floating point format; IEEE 754 / binary32
Questions answered by the standard:
- How many bits are allocated for each component?
- What is the representation of the signed exponent? (Two's complement or +N, e.g., +128)
- Is the first bit of the mantissa eliminated?

286. Issues in floating point representation
Standards are complex; essential details include:
- Is the mantissa always normalized?
- Are subnormal numbers allowed (i.e., numbers where the mantissa is not normalized)?
- Signed zeroes: −0 and +0 (3.5.8)
- Multiple infinities: −∞ and +∞ (3.5.9)
- The special value NaN, "not a number" (3.5.10)

The character primitive type
6 Frames: Case study: how many characters are there?; Evolution of character sets; Is Unicode the answer?; Challenges in PL design; Impact of character encoding on PLs I; Impact of character encoding on PLs II

287. Case study: how many characters are there?
The ever changing answer:
  2^5 = 32: punched tapes, telegraph
  60: early Pascal
  2^6 = 64: BCD
  2^7 = 128: ASCII
  2^8 = 256: EBCDIC, Extended ASCII, ISO Western Europe, IBM's Code Page 437 (all codings are distinct)
  2^16 = 65,536: Unicode 1.0; 1992; extension of ASCII; Java and other PLs of that time
  1,114,112 < 2^21: Unicode 2.0; 1996; of which about 110,000 characters are used; 100 scripts; a zillion symbols

288. Evolution of character sets
Baudot Code: by Émile Baudot, 1874; Telegraph Alphabet No. 2 (ITA2), 1930; 2^5 = 32 characters; not even sufficient for letters and digits.
CDC Display Code: 1960; used in CDC computers, on which Pascal was developed; 6 bits per character; 2^6 = 64 or 2^6 − 1 = 63 distinct characters; no upper/lower case distinction.
BCD (IBM's Binary Coded Decimal): 1960; 6 bits per character; 2^6 = 64 distinct characters.
ASCII (American Standard Code for Information Interchange): 1960; 2^7 = 128 characters: 95 printable + 33 controls (still in use today!).
EBCDIC (IBM's Extended Binary Coded Decimal Interchange Code): 1965; 8 bits per character; but not all 2^8 = 256 code points were used.
Unicode
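Returning for a moment to the binary32 layout of slide 285: the three fields can be pulled out of a C float with shifts and masks. This is a hedged sketch that assumes float is stored as IEEE 754 binary32 (true on most current platforms, but not required by C); the struct Binary32Fields and the function decompose are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;       /* 1 bit: 0 positive, 1 negative            */
    unsigned exponent;   /* 8 bits, stored with a bias of 127        */
    uint32_t mantissa;   /* 23 bits; the leading 1 bit is not stored */
} Binary32Fields;

static Binary32Fields decompose(float f) {
    uint32_t bits;
    Binary32Fields r;
    assert(sizeof bits == sizeof f);   /* assumption: float occupies 32 bits */
    memcpy(&bits, &f, sizeof bits);    /* reinterpret the bytes without aliasing tricks */
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

For example, under the stated assumption 1.0f decomposes into sign 0, stored exponent 127 and an all-zero mantissa field, and −0.0f differs from +0.0f only in the sign bit, which is the signed zeroes issue of slide 286.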

289. Is Unicode the answer?
Mostly, yes! But there are still pockets of resistance.
Windows/IBM variations; still in common use; all use 8 bits:
  Windows-1250: European languages; similar to ISO
  Windows-1252: similar to ISO, but not identical
  Windows-1255: Hebrew; almost compatible superset of ISO
  Windows-1256: Arabic; is not compatible with ISO
Mac OS character encodings (each with its own code page): KanjiTalk, Mac Icelandic encoding, Mac OS Roman, MacArabic encoding, MacGreek encoding, Macintosh Central European encoding, Macintosh Cyrillic encoding, Macintosh Ukrainian encoding.

290. Challenges in PL design
Code page 10079? Unicode 1.0? EBCDIC? Windows-1252? ASCII? Unicode 2.0? ISO? BCD? CDC Display Code?
Why should I care? Are all these historical details, acronyms & standards really interesting?
Some standards are much more important than others.

291. Impact of character encoding on PLs I
- No upper/lower case distinction in Pascal (CDC).
- Pascal (CDC) supports sets of characters (words of 60 bits are bit masks for all 60 characters).
- Some versions of Pascal (CDC) have a string type which is ten characters long (60-bit word = 10 × 6-bit characters).
- Why do C/C++ treat primitive type char as a small integer?
- About half of Unicode is not readily supported by Java. Huh?
- Writing non-ASCII characters is not trivial in C/C++.

292. Impact of character encoding on PLs II
- π and α are legal identifier names in Java.
- C/C++ do not say whether char is signed char or unsigned char.
- Since some codes (e.g., European ISO-646) use certain character positions (e.g., [, {) for letters, C++ allows trigraph sequences, e.g., ??( for [, ??= for #.
- Programming in Java for Windows or MacOS can be challenging.

Some standards that matter more than others:
ASCII, Windows-1250, Unicode 1.0, Unicode 2.0.
Tough decisions:
  Portability: a character literal may not be supported by all architectures.
  Elegance: support Unicode 2.0, with its weird number of bytes.
  Efficiency: most characters in common use can be represented using a single byte.

Strings as atomic types
2 Frames: Strings; Strings in various PLs

293. Strings
Strings are:
- A sequence of characters.
- A useful data type.
- Supported in one way or another by all modern PLs.
Issues:
- Atomic (as in Icon) or composite (as in Pascal) type?
- Fixed length (as in Cobol) or variable length (as in C)?
- What string operations are supported?
- String literals? Delimiters or quotes?
- Are characters strings of length one?
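Two of the issues above have concrete answers in C, which a short sketch can make visible: a character is a small integer (so character arithmetic is plain integer arithmetic), and a string is a variable length array of characters closed by a NUL cell, so a character is not a string of length one. The helper names below are hypothetical, and the numeric result of the subtraction assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters are small integers: subtracting two of them is integer arithmetic. */
static int char_distance(char a, char b) {
    return a - b;
}

/* A C string occupies one cell per character plus one cell for the
 * terminating NUL; its length is discovered by scanning, not stored. */
static size_t string_cells(const char *s) {
    return strlen(s) + 1;
}
```

So the one-character string "a" occupies two cells, while the character 'a' is a single integer value; with ASCII codes, char_distance(':', ')') evaluates to 58 − 41 = 17.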

294. Strings in various PLs
  ML: atomic type of any length. Operations: equality test, concatenation, and decomposition are builtin.
  Pascal: an array of characters. Most trivial string operations require non-trivial programming.
  Ada: as in Pascal.
  C: as in Pascal, but with string literals.
  Bash and other scripting languages: full blown type.
  C++: standard library implementation.
  Java: standard library implementation.

References
1's complement, 2's complement, Ada, arbitrary precision, ASCII, Bash, Baudot code, BCD, BCPL, blintzes, byte, CDC Display Code, character encoding, Cobol, codepage 437, CPU architectures, crêpe, deprecation, EBCDIC, enumerated type, Excel, extended ASCII, extended precision, fixed point, Fortran, Icon, IEEE floating point, IEEE, ISO, JVM, pancake, portability, primitive data type, Smörgåsbord, Snobol, string in C, string literals, taxonomy, Newspeak, string in C++, trigraphs in C++, type string, Unicode, Windows-1250, Windows-1252, word

Exercises
1. Why should we expect languages that support Unicode letters in identifier names to distinguish between lower and upper case?
2. How does C++ support Unicode? Which version? How?
3. Modern languages allow Unicode letters in identifier names. Why?
4. Why is the support of floating point numbers in Mock brain-damaged?
5. Which is the missing conversion in Java? Why is it missing?
6. What are far pointers?
7. Determine whether type char in Java is numerical.
8. Writing non-ASCII characters is not trivial in C/C++. Why?
9. What are the literals of enumerated types?
10. What are the atomic types of AWK? Pascal?
11. How come there are no conversions from or to atomic type boolean in Java?
12. Why does Pascal fail to distinguish between lower and upper case letters?

13. Are all numerical types ordinal? If yes, explain why. If no, provide a counter example.
14. How come C++ does not say whether char is signed char or unsigned char?
15. Is the type string in C++ a primitive type?
16. Are any digraphs used in Java? Explain the process by which you have reached your conclusion.
17. Determine whether type bool in C++ is ordered or not.
18. Does C include trigraphs? Which?
19. Are all ordinal types numerical? If yes, explain why. If no, provide a counter example.
20. Ancient versions of Pascal had a string type (called alpha) which was ten characters long. Why this particular magic number?
21. Languages that rely on character encodings other than ASCII and Unicode are rare. Why?
22. About half of Unicode is not readily supported by Java. Huh? How come? And why is this not such a big deal?
23. Enumeration can be viewed as a combination of Unit, branding and disjoint union. Explain, and demonstrate with ML. Why wouldn't this work in C/C++?
24. Can you explain why ancient Pascal used a sequence of two characters to delimit comments?
25. Why should programming in Java for Windows/MacOS be challenging?
26. Why does Pascal fail to distinguish between lower and upper case letters?
27. Why is it that PLs rarely offer literals for compound types?
28. Given is a type T. If no other types can be found within T, does this mean that T is atomic?
29. BCPL included an equivalent of trigraphs. Why?
30. Why is the support of fixed point numbers in Mock brain-damaged?
31. Give an example of a programming language and a compound type in it, such that the programming language offers literals for that compound type. Hint: you do not have to search far; what you look for is very familiar.
32. Are all ordered types numerical? If yes, explain why. If no, provide a counter example.
33. What is the large programming model in ancient C?
34. Determine whether type bool in C++ is ordered.
35. Given is a primitive type T.
Does this necessarily mean that other types can/cannot be found within T? How would your answer change if it is known that type T is atomic?
36. Why was it so easy to support character sets in ancient Pascal? How were set union and intersection implemented?
37. Write a program to test if Go uses type aliases or type branding in its management of machine types.
38. What's the difference between a trigraph and a digraph?
39. Why do C/C++ treat primitive type char as a small integer?
40. Enumeration can be viewed as a combination of Unit, branding and disjoint union. Explain why this wouldn't work in C/C++.
41. Suppose that a certain hardware represents both an integer and a real number using precisely 42 bits. Explain why precision must be lost in a conversion of an integer to a real.

3.6 Representation of types in memory
6 Frames: Purpose of type?; PLs policy for type representation; Implicit contract of type representation policy; Exception I: non-primitive atomics; Exception II: pointer values; Exception III: weird atomicity

295. Purpose of type?
We have values; why do we need types?
  Taxonomy: classify values; describe data effectively.
  Legality: determine the set of legal operations on values (prevent nonsensical operations, e.g., multiplying a pointer by a set).
  Semantics: determine the semantics of operations on values.
But a type also defines the program-machine interface:
  Program → Machine: how to represent values on different machines.
  Machine → Program: how to represent values on different machines.

296. PLs policy for type representation
The selection of atomic types also determines a policy of mapping these types into memory.
Policy for representation of types and values: a PL L sets a policy P_L for the representation of V_L on machines M_1, M_2, ...:
    P_L : V_L → {M_1, M_2, ...}    (3.6.1)
Most often, the policy employs the recursive nature of the type system:
- How to represent atomic types?
- How to represent type constructors?

297. Implicit contract of type representation policy
Policy P_L : V_L → {M_1, M_2, ...} sets a contract between two parties:
Implementor of L: given a machine M_i, select a mapping V_L → M_i which is:
- most efficient
- compliant with P_L
Programmer using L: write a program which:
- is most efficient
- does not depend on the specifics of V_L → M_i

298. Exception I: non-primitive atomics
- The main non-primitive atomic type is enumeration.
- Usually mapped to integer values.
- PLs often stay silent regarding representation.
- In Java, enums are ordinal values with their own unique operations; Java even allows different operations for different enumerands in the same enumeration.

299. Exception II: pointer values
Representation of pointer values: most PLs map pointers to hardware address types.
Issues:
- Sometimes the hardware supports several address types, e.g., on X86 architectures, with near and far pointers.
- Many modern PLs (including Java and ML) have smart pointers, which cannot be mapped directly to hardware.

300. Exception III: weird atomicity
In most cases:
- values of atomic types are atomic values
- atomic values belong to atomic types
Anomaly I: compound types whose values are atomic: pointers.
Anomaly II: atomic types whose values are compound: values of atomic type string, where it exists, are non-atomic.
The hardware mapping policy needs to deal with these two.

4 Advanced typing

Contents [152 frames]
4.1 Classification of type systems [41 frames]
    Existence & sophistication level [7 frames]
    Orthogonality [8 frames]
    Strong vs. weak typing [4 frames]
    Static vs. dynamic typing [9 frames]
    Other kinds of typing [6 frames]
    Type information responsibility [5 frames]
    References
    Exercises
4.2 Structural vs. nominal typing [13 frames]
4.3 Theoretical polymorphism [64 frames]
    Motivation [12 frames]
    Overloading [18 frames]
    Coercion [7 frames]
    Universal polymorphism [5 frames]
    Polytypes [11 frames]
    Inclusion polymorphism [10 frames]
    Summary [1 frame]
    References
    Exercises
Polymorphism in practice [34 frames]
    Overloading in Pascal, C/C++ [7 frames]
    Coercion and the C++ overloading tournament [6 frames]
    Polymorphic functions [7 frames]
    Type inference [3 frames]
    Checking parameters with parametric polymorphism [6 frames]
    Case studies [5 frames]

4.1 Classification of type systems

301. Criteria for the classification of type systems
- Existence: does the language include a type system at all?
- Sophistication level: assuming a PL has a type system, how rich is it?
- Orthogonality: discriminatory vs. non-discriminatory
- Strength: how strictly are the typing rules enforced?
- Time of enforcement: at what stage is type checking performed? Static vs. dynamic typing
- Responsibility: is the programmer responsible for type declarations, or is it the compiler? Explicit vs. implicit typing
- Equivalence: when can one type replace another?
- Flexibility: to what extent does the type system restrict the user's expressiveness? Polymorphic typing?

302. In this section
We will discuss criteria 1. Existence, 2. Sophistication level, 3. Orthogonality, 4. Strength, 5. Time of enforcement, 6. Responsibility. However, 7. Equivalence and 8.
Flexibility deserve their own (fat) sections.

Existence & sophistication level
7 Frames: Existence of a type system?; Example I: Lisp is an untyped language; Example II: Mathematica is untyped as well; Degenerate type systems; Degenerate type systems in scripting languages; Sophistication level of the type system; Beyond types & non-standard types

303. Existence of a type system?
A PL can be typed or untyped.
- Typed: the set of values can be broken into sets, such that the values in each set behave more or less uniformly under the same operation. Examples: C, Pascal, ML, Ada, and most other PLs
- Untyped: each value has its own unique set of permissible operations, and their semantics are particular to the value. Examples: Lisp, Prolog, Mathematica

304. Example I: Lisp is an untyped language
In Lisp, all values are S-expressions. An S-expression is an unlabeled binary tree whose leaves are atoms.
Basic operations:
- Car: extract the left subtree
- Cdr: extract the right subtree
- Cons: construct a tree from two subtrees
- Null: determine whether a tree is the atom NIL
Shorthand notation: cons(1, cons(2, cons(3, nil))) is written as (1 2 3).
Summary: legality of operations is determined by the tree structure and by the values at the leaves.

305. Example II: Mathematica is untyped as well
Mathematica is a language for symbolic mathematics:
- Values are symbolic mathematical expressions
- Expressions are trees
- All expressions have a head and a body: the head is a symbol, the body is a list of expressions
- Most details are not so interesting
The crucial point: manipulating an expression still requires a check of legality and a determination of semantics. These are determined by the tree structure of the expression and by the values residing in internal nodes and leaves.

306. Degenerate type systems
Characteristics:
- Very few (typically one or two) primitive types
- Very few type constructors
Examples:
- BCPL (C's ancestor): the only data type is a machine word
- DOS batch language: the only data type is a string, which sometimes looks like a file name
- Ancient Fortran: several full-blown primitive types for scientific computation; a single type constructor: array

307. Degenerate type systems in scripting languages

308. Sophistication level of the type system
1. No typing
2. Degenerate typing
3. Non-recursive type systems, as in Fortran
4. Recursive type systems, as in Pascal
5. Functions as first-class values, as in ML
6. Highly advanced type constructors, e.g., the monads of Haskell

309. Beyond types & non-standard types
Modern languages use the typing system to indicate more than just the type returned by a function.
- Java: the type of a method includes a list of the types of the exceptions that it can throw. If method M1 declares that it might throw exceptions of type E, and method M2 includes a call to M1, then either the call is inside a try ... catch(E) block, or M2 must also declare that it may throw E. This is the catch-or-declare principle.
- C++: a const method cannot invoke a non-const method.
- Haskell: the type system indicates which functions perform I/O operations.
Shared theme: statically detect and prevent potential bugs.

Orthogonality
8 Frames: Orthogonality, discrimination & being second-class; Discrimination vs. second-class types; More on discrimination vs.
second-class types; Understanding orthogonality of a PL; Orthogonality in Pascal; First & second class values in Pascal; Apparent contradiction?; Value manipulation

310. Orthogonality, discrimination & being second-class
AWK:
- Two data types: strings, and numbers (a possible interpretation of some strings)
- Only one type constructor: associative array
- No easy way to create arrays of arrays
Most scripting PLs, including Bash and C-Shell, are similar to AWK.

Definition 4.1 (Discriminatory type system). A type system is discriminatory if one of its type constructors is discriminatory, i.e., it is applicable to some types, but not to others.

C++:
    int a = 3;
    int& reference = a;        // OK
    int&& illegal = reference; // error

In C++, the "reference to" type constructor is (self) discriminatory: there are types of references to almost everything, but there are no references to references.
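C offers a comparable case of a discriminatory constructor, sketched below as an illustration (the functions `twice`, `square`, and `apply` are hypothetical names). The array constructor applies to almost every type, but not to function types; it does apply to pointers to functions.

```c
#include <stdio.h>

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* int table[2](int);      <- rejected by the compiler: an array of functions */

/* An array of pointers to functions is accepted. */
static int (*const table[2])(int) = { twice, square };

/* Select an operation by index and apply it to x. */
int apply(int which, int x) { return table[which](x); }
```

Calling `apply(0, 7)` goes through `twice` and `apply(1, 7)` goes through `square`; the commented-out declaration shows the combination that the array constructor refuses.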

311. Discrimination vs. second-class types
Once upon a time, in the far far away kingdom called calligraphic capital T [28], lived a type named little τ, and as we say in math, $\tau \in \mathcal{T}$.
Little τ was looking for employment by one of $C_1, \ldots, C_n$, the type constructors of kingdom calligraphic capital T.
So, little τ went to type constructor capital $C_2$, but capital $C_2$ said No.
So, little τ went to type constructor capital $C_7$, but capital $C_7$ also said No,
and capital $C_3$ also said No,
and capital $C_5$ said No,
but, then, capital $C_1$ said Yes.
So, what did little τ do?

[28] When we say kingdom we really mean type system.

312. More on discrimination vs. second-class types
Little τ was close to despair. She thought of capital $C_1$, and then of capital $C_2$, and all the other $C_i$ who would not employ her, and then she realized that she was not a first class type, which means, errghh, that little τ must be a second class type.

Definition 4.2 (Second class types). If a type τ is being discriminated against by most (or even just very many) type constructors, then we tend to say that τ is a second class type.

313. Understanding orthogonality of a PL
Saying that τ is a second class type is cruel and unjust. It lets us refrain from accusing so many type constructors of being evil and discriminatory. But placing the blame on second class types allows a more compact expression and understanding of the language's orthogonality.

314. Orthogonality in Pascal
- Non-discriminatory type constructor: you can create arrays of anything
- Discriminatory type constructor: you can create sets of Boolean and of Char, but not of Integer or of set of Boolean
- Second class type: there are no arrays of functions, no records of functions, no sets of functions, no pointers to functions, etc., so we must conclude that functions are second class types

315. First & second class values in Pascal
First-class values: only simple, atomic values: truth values, characters, enumerands, integers, reals, and also pointers.
Lower-class values can be passed as arguments, but cannot be stored, or returned, or used as components in other values:
- Composite values (records, arrays, sets, and files): cannot be returned!
- Procedure and function abstractions
- References to variables (unless disguised as pointers)

316. Apparent contradiction?
Above we said that in Pascal:
- You can create arrays of anything!
- You cannot create arrays of functions!
Resolution:
- We like to think of the array constructor as being non-discriminatory
- We like to think of functions as second class
- We allow non-discriminatory constructors to discriminate against second class types
As we shall see, the fault lies with functions, not with arrays.

317. Value manipulation
Operations on values:
- Passing them to procedures as arguments
- Returning them through an argument of a procedure
- Returning them as the result of a function
- Assigning them into a variable
- Using them to create a composite value
- Creating/computing them by evaluating an expression
A value for which all these operations are allowed is called a first-class value. We are used to integer or character values being first class, but function values can be first class too!

Strong vs. weak typing
4 Frames: Strong vs. weak typing; Type punning; Spectrum of strength; Type punning in C#
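The list of operations in the "Value manipulation" frame can be checked against a concrete language. The following C sketch is an illustration under stated assumptions (the type `vec3` and both functions are hypothetical): C arrays fail two of the tests, since they can be neither assigned nor returned, while a struct wrapping the same array passes them.

```c
/* Hypothetical wrapper type: a struct whose only member is an array. */
struct vec3 { int c[3]; };

/* int[3] make(void);        <- illegal: a C function cannot return an array */

/* A struct value can be computed and returned as a function result. */
struct vec3 make_vec3(int x, int y, int z) {
    struct vec3 v = { { x, y, z } };
    return v;
}

int sum_after_copy(void) {
    struct vec3 a = make_vec3(1, 2, 3);
    struct vec3 b;
    b = a;                    /* assignment copies all three elements */
    /* int p[3], q[3]; p = q;   <- illegal: arrays cannot be assigned */
    return b.c[0] + b.c[1] + b.c[2];
}
```

In the terminology of the frames, bare arrays are lower-class values in C, whereas struct values come much closer to being first class.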

318. Strong vs. weak typing
Two kinds of PLs:
- Strongly typed, e.g., ML, Eiffel, Modula: it is impossible to break the association of a value with a type from within the framework of the language; it is impossible to subject a value to an operation which is not acceptable for its type.
- Weakly typed, e.g., assembly, C, C++, some variants of Pascal: values have associated types, but it is possible for the programmer to break or ignore this association (type punning).
In truth, there is a spectrum of strength.

319. Type punning
A value is represented as a sequence of bits, yet types hide this fact.

Definition 4.3 (Type punning). Type punning: revealing the masquerade of bit sequences as values.

Type punning has the power to:
- Peep into the bit sequence implementation of a type
- Mutilate a value, by subjecting it to operations not allowed for its type

Type casting (C):
    long i, j;
    long *p = &i, *q = &j;
    long L1 = ((long)p);
    long L2 = ((long)q);
    long L = L1 ^ L2;

Union type (C):
    union {
        double foo;
        long bar;
    } baz;
    baz.foo = 3.7;
    printf("%ld\n", baz.bar);

320. Spectrum of strength
Some languages are safer than others.
- Pascal is more strongly typed than C; one can still break the type rules with variant records and through files [30]
- Java is more strongly typed than C#: Java's JVM guarantees (dynamically) strong typing; in C#, there are several ways of type punning

[30] See below, discussion of structural typing.

321. Type punning in C#
In C#, type punning must be annotated with the unsafe keyword (C#):
    using System;

    class Foo {
        public static void Main() {
            unsafe { // could also annotate class Foo or function Main
                int i = 14;
                int *p = &i;
                Console.WriteLine("I is " + i);
                Console.WriteLine("I is also " + p->ToString());
                Console.WriteLine("Its address in memory is " + (int)p);
            } // unsafe block
        } // function Main
    } // class Foo

Static vs. dynamic typing
9 Frames: Type checking; Time of enforcement; Dynamic typing & type tags; Cons of dynamic typing; Characteristics of static typing; Why static typing?; Escaping the evil Dr.
Rice; Benefits of static typing; Limits of static typing

322. Type checking
Definition 4.4 (Type checking). A language implementation applies type checking to ensure that no nonsensical operations occur.
Examples:
- Multiplication: check that both operands are numeric
- Logical and: check that both operands are of Boolean type
- Field access: check that the operand is a record containing the given field name
- Tuple access: check that the operand is a tuple (array value) and that the index is valid

323. Time of enforcement
Type checking must precede the operation; it could be done either at compile time or at run time.
- Statically typed PLs: type rules are enforced at compile time; every variable and every formal parameter has an associated type. Examples: C, Pascal, Eiffel, ML
- Dynamically typed PLs: type rules are enforced at run time; variables, and more generally expressions, have no associated type; only values have fixed types. Examples: Smalltalk, Prolog, Snobol, APL, AWK, Rexx
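Returning to Definition 4.3, both powers of type punning (peeping and mutilating) can be sketched in C with `memcpy`, which sidesteps the aliasing pitfalls of pointer casts. This is a minimal sketch, assuming a 32-bit IEEE 754 `float` (true on most current machines, but not promised by the language); the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Peep: return the bit sequence that represents f.
   Assumes float is a 32-bit IEEE 754 value. */
uint32_t float_bits(float f) {
    uint32_t bits;
    _Static_assert(sizeof(uint32_t) == sizeof(float), "float is not 32 bits wide");
    memcpy(&bits, &f, sizeof bits);   /* copy the representation, not the value */
    return bits;
}

/* Mutilate: flip the top bit, an operation that float itself does not offer.
   Under IEEE 754 the top bit is the sign bit. */
float flip_sign_bit(float f) {
    uint32_t bits = float_bits(f) ^ UINT32_C(0x80000000);
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

Under the stated assumption, `float_bits(1.0f)` yields `0x3F800000`, and `flip_sign_bit` negates its argument without ever using floating-point arithmetic.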

324. Dynamic typing & type tags
- Identifiers have no type associated with them
- Types are associated with the values generated at runtime
- Each value carries a type tag, identifying its type
Pros:
- Flexibility: arrays don't have to be of a homogeneous type
- Run partial programs: an identifier needs to be type-correct only if accessed
- Quick turnaround: faster development time

325. Cons of dynamic typing
Conversely, these are the pros of static typing:
- Space overhead: each value is tagged with type information
- Time overhead: the tag must be examined at runtime
- Unsafety: many type errors could have been detected by static compile-time checks
- Obfuscation: entities annotated with type information are easier to understand
Yet, it seems as if the world is moving toward dynamically typed languages.

326. Characteristics of static typing
- Type annotation: for each variable, parameter, function, and procedure
- Pre-declaration: usually means that all identifiers should be declared before they are used
- Invariant of values: no value will ever be subject to operations it does not recognize
- Invariant of variables: a variable may contain only values of its associated type
- Invariant of operations: no operation, including user defined functions and procedures, will ever be applied to values of a type it does not expect

327. Why static typing?
Theorem (H. G. Rice). Let f be any feature or property of the execution of computer programs. Suppose that f is non-trivial, i.e., that f holds for at least one program $p_t$, and does not hold for at least one other program $p_f$. Then, there is no general algorithm that, given a program p, decides whether p exhibits feature f or not.
Examples:
- Cannot (systematically) decide if the program stops
- Cannot (systematically) decide if the program is correct
- Cannot (systematically) decide almost any other interesting run time property of a program

328. Escaping the evil Dr.
Rice
We still need every little help in fighting the horrors of software development. Types manage to escape this evil theorem. Several other automatic aids are:
- Garbage collection: automatic memory management (run time)
- Const correctness: no modification of const parameters (compile time)
- Design by contract: assertions, invariants, preconditions, and postconditions; a partial specification of a function (run time)
- Void safety: to prevent null-pointer access
- Other: making every effort to ensure that an uninitialized variable is never used; compiler warnings; bug-finding heuristics

329. Benefits of static typing
- Prevent run time crashes: mismatch in number of parameters; mismatch in types of parameters; sending an object an inappropriate message
- Early error detection (supposedly) reduces development time, cost, and effort
- Enforce design decisions
- More efficient and more compact object code: values do not carry along the type tag; no need to conduct checks before each operation

330. Limits of static typing
Some senseless operations cannot be statically checked. Generic examples:
- Division by zero: no typing system for the integers can prevent this without hitting the halting problem; checks are realized at run time by using machine-level exceptions. These checks carry no overhead.
- Void safety: access to null pointers
- Other math issues: square root of a negative value
- Array references: it is impossible to statically detect underflow or overflow of array indices

Other kinds of typing
6 Frames: Postmortem typing; Mixed typing; Mixed typing in; Gradual typing; Duck typing; Notion of type with duck typing
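The type tags of frame 324, and the two overheads of frame 325, can be made concrete by simulating a dynamically typed value in C. This is only a sketch of the idea, not how any particular language runtime is implemented; the names `tag`, `value`, `make_int`, `make_bool`, and `multiply` are hypothetical.

```c
#include <stdbool.h>

enum tag { TAG_INT, TAG_BOOL };          /* hypothetical type tags */

struct value {
    enum tag tag;                        /* space overhead: one tag per value */
    union { long i; bool b; } as;
};

struct value make_int(long i)  { struct value v = { TAG_INT,  { .i = i } }; return v; }
struct value make_bool(bool b) { struct value v = { TAG_BOOL, { .b = b } }; return v; }

/* Multiplication checks, at run time, that both operands are numeric.
   Returns false (a run-time type error) if either operand is not. */
bool multiply(struct value x, struct value y, struct value *out) {
    if (x.tag != TAG_INT || y.tag != TAG_INT)   /* time overhead: tag test */
        return false;
    *out = make_int(x.as.i * y.as.i);
    return true;
}
```

A statically typed implementation would drop the `tag` field and the test in `multiply`, which is the efficiency benefit listed in frame 329.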

331. Postmortem typing
Postmortem typing is just a different name for weak typing.

Definition 4.5 (Postmortem typing). Suppose that a type error occurring during execution is not detected by the PL's runtime. Then, the program is allowed to merrily carry on with this error. In (hopefully) most cases, the program meets its bitter end shortly after. In other cases, the program will die much later, or just behave strangely. The posthumous analysis of such errors is called postmortem typing. The programmer is then responsible for removing the type errors from the program.

Example: array overflow in C/C++.

332. Mixed typing
Definition 4.6 (Mixed typing). We say that a PL has mixed typing if the PL exercises both static typing and dynamic typing (usually for different purposes).
The Pascal case:
- Most checks are done at compile time
- Array bounds are checked at runtime [31]
- Some type errors are not checked

[31] Some Pascal compilers have a flag for turning off array bounds checks.

333. Mixed typing in
Table 4.1.1: Mixed typing (rows, by error: V = E; f(E); null pointer access; array overflow. Columns, by detection time: compile time; load time; runtime)

334. Gradual typing
Some modern additions allow gradual introduction of types into your program:
- Write your program as usual in a dynamically typed language
- As the program matures, gradually add type annotations to variables, arguments to functions, and function return types
The PL will cooperate behind the scenes:
- Mark obvious type violations
- Mark contradicting annotations
- Reduce runtime overhead

335. Duck typing
Duck typing is a variant of dynamic typing. Given an operation op and a value v:
Dynamic typing, at run time:
1. Determine the type T for which op is defined
2. Determine the type T′ of v
3. If T′ conforms to T, execute op on v
Duck typing, at run time:
1. Determine O(v), the set of operations recognized by v, by either determining T, the type of v, or reading the list O(v) as attached to v
2. If $op \in O(v)$, execute op on v

336. Notion of type with duck typing
- Type of values: duck typing allows each value to have its own type
- Runtime type: each invocation of each function defines, for each parameter, the set of operations applied by this function to this parameter in this particular invocation

Definition 4.7 (Duck typing error). A duck typing error occurs if a value's type does not match the runtime type during the program's run.

Type information responsibility
5 Frames: The 3 alternatives for declaration of types; Manifest typing or explicit typing; Responsibility for tagging; The risk of implicit typing; More risks of implicit typing
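The second variant of the duck typing procedure, in which the list O(v) is attached to the value itself, can be imitated in C with a table of named operations. This is a hedged sketch only: C has no duck typing, and every name below (`op`, `value`, `duck`, `robot`, `invoke`) is a hypothetical stand-in.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*op_fn)(void);

struct op { const char *name; op_fn fn; };

/* Each value carries O(v), its own list of recognized operations. */
struct value { const struct op *ops; size_t n_ops; };

static const char *duck_quack(void) { return "Quack"; }
static const char *duck_walk(void)  { return "waddle"; }
static const char *robot_walk(void) { return "clank"; }

static const struct op duck_ops[]  = { { "quack", duck_quack }, { "walk", duck_walk } };
static const struct op robot_ops[] = { { "walk", robot_walk } };

const struct value duck  = { duck_ops, 2 };
const struct value robot = { robot_ops, 1 };

/* If op is in O(v), execute it; otherwise report a duck typing error,
   signalled here by a NULL result. */
const char *invoke(struct value v, const char *op) {
    for (size_t i = 0; i < v.n_ops; i++)
        if (strcmp(v.ops[i].name, op) == 0)
            return v.ops[i].fn();
    return NULL;
}
```

Both values can be asked to "walk", although they share no declared type; asking `robot` to "quack" is the duck typing error of Definition 4.7.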

337. The 3 alternatives for declaration of types
I. Manifest typing, as in C, Pascal.
II. Inferred typing, as in ML; also known as implicit typing. Most PLs can infer types, at least partially. In ML it is particularly astonishing, since it involves recursive functions and type parameters.
III. Semi-implicit typing:
- Fortran: variables which begin with one of the six letters i, j, k, l, m, and n are integers; all others are real [32]
- Basic (older versions): suffixes such as % and $ determine the variable's type:
    10 A$ = "HELLO"
    20 PRINT A$
- Perl: essentially the same as Basic

[32] The risk of inadvertent creation of variables can be precluded by the declaration implicit none.

338. Manifest typing or explicit typing
The programmer is in charge of annotating identifiers of values, variables, functions, procedures, and parameters with type information. Found, e.g., in Pascal, Ada, C.
Type annotation is also a documentation aid:
    X: speed; (* Good *)
    Y: real;  (* Bad *)
    Z = 3;    (* Worse *)

339. Responsibility for tagging
Definition 4.8 (Type inference). Compilers have the ability to apply type inference rules, e.g., to determine the type of expressions.
Why not apply this also to variables and other entities?
Definition 4.9 (Implicit typing). A programming language feature by which the programmer does not have to provide type information; the compiler infers the type of an entity from the way it is defined.

340. The risk of implicit typing
The compiler infers the type of any defined value, including that of functions.
Risk: inadvertent creation of variables due to typos and spelling errors.
ML answer: no variables; a value declaration is just like a CONST declaration in Pascal.
    Pascal:  CONST a = 100
    ML:      val a = 100

341. More risks of implicit typing
Formal parameters are declared (without type) in the header of a function.
Risk: confusing error messages, and confusing type errors.
ML answer: the programmer is allowed to add type constraints.
But, there is more.
Risk: some complex (recursive and generic) type inference problems are undecidable.
ML answer I: careful analysis of the type system to detect when this problem may occur.
ML answer II: type constraints.

References
Duck typing; Gradual typing; Manifest typing; Marshalling; Name mangling; Nominal typing; Nominative & structural typing; Serialization; Structural type system; Structural vs. nominal typing; Type inference; Type safety; Type system

Exercises
1. Compare the terms type error and type punning.
2. Write a program which never makes type errors, yet would be rejected by a compiler implementing static typing. Did you know that your program explains how static typing escapes Rice's theorem? How is that?
3. In PLs which exercise only static typing, values do not need to carry a type tag. True or false? Explain.
4. Explain why the parallels between tuple and record values make us consider index overflow and index underflow as type errors.
5. Explain why division by zero is not usually regarded as a type error.
6. What is the difference between mixed typing and gradual typing?
7. Explain why violating void safety is regarded as a type error.
8. Why shouldn't arcsin(2) be marked as a type error?
9. Explain the statement "reference types are closer to being first class citizens in C++ than in Pascal" (Hint: where are the reference types in Pascal?).
10. Above we saw an example which computes the XOR of two pointers. Explain how this trick can be used for a compact representation of a doubly linked list, storing only one pointer in each node. What limitations does this compact representation place on its users?
11. Explain why the combination of static typing and weak typing, as in the C PL, makes it possible to never attach type tags to values.

4.2 Structural vs. nominal typing
13 Frames: Type equivalence; Structural equivalence; An example for (hypothetical) structural equivalence; Name equivalence; Name equivalence across programs; Name equivalence across programs; Declaration equivalence; Derived types in Ada; Type equivalence; Structural equivalence; Name equivalence; Name equivalence across programs; Declaration equivalence

342. Type equivalence
Suppose that an operation expects an operand of type T, but receives an operand of type T′. Is this an error?
No, if T′ is a subtype of T. Two types that are subtypes of each other are called equivalent.
Caveat: because the notion of subtype is more refined than the notion of type equivalence, we will find languages where type equivalence is defined but subtyping is not.
Kinds of type equivalence: structural equivalence; name equivalence; declaration equivalence.

343. Structural equivalence
$T \equiv T'$ if T and T′ have the same values. In Algol-68:
- If T and T′ are both atomic (primitive), then $T \equiv T'$ if, and only if, they are identical.
- Else, if $T = A \times B$, $T' = A' \times B'$, or $T = A + B$, $T' = A' + B'$, or $T = A + B$, $T' = B' + A'$, or $T = A \to B$, $T' = A' \to B'$, and $A \equiv A'$, $B \equiv B'$, then $T \equiv T'$.
- Otherwise, $T \not\equiv T'$.
Recursive types:
    $T = \mathrm{Unit} + (A \times T)$
    $S = \mathrm{Unit} + (A \times S)$
    $R = \mathrm{Unit} + (A \times S)$
Intuitively T, S, and even R are structurally equivalent, but structural equivalence is not easy to define and test for in recursive types.
Points of disagreement between languages:
- Do records require field name identity to be structurally equivalent?
- Do arrays require index range identity to be structurally equivalent?

344. An example for (hypothetical) structural equivalence

345. Name equivalence
$T \equiv T'$ if, and only if, T and T′ were defined in the same place (original Pascal and Ada). Pascal:
    TYPE T1 = File of Integer;
         T2 = File of Integer;
    VAR f1: T1;
        f2: T2;
    Procedure p(var f: T1);
    ...
    p(f1); (* OK *)
    p(f2); (* type error *)

346. Name equivalence across programs
Pascal:
    Program p1(f);
    TYPE T = File of Integer;
    VAR f: T;
    Begin
      ...
      Write(f, ...);
      ...
    end.

    Program p2(f);
    TYPE T = File of Integer;
    VAR f: T;
    Begin
      ...
      Read(f, ...); (* type error *)
      ...
    end.

347. Name equivalence across programs
By the definition of Pascal, it follows that two Pascal programs cannot communicate legally through files using any user-defined type, hence the type error above. However, multiple instances of the same program can.
The type Text, which is the only predefined type for files in Pascal, allows communication between programs.
In practice, most implementations of Pascal do not type check files. Thus, these implementations subvert Pascal's type safety, but allow reasonable interfacing.

348. Declaration equivalence
A later Pascal standard (ANSI 1983): $T \equiv T'$ if, and only if, T and T′ have the same declaration. Pascal:
    TYPE T1 = File of Integer;
         T2 = File of Integer;
    VAR f1: T1;
        f2: T2;
    Procedure p(var f: T1);
    ...
    p(f1); (* OK *)
    p(f2); (* type error *)
Problem across programs: gone.

Sub-typing in Pascal
A weaker notion than type equivalence: an operation that expects an operand of type T but receives an operand of type T′ is legal also if T and T′ are not equivalent, but T′ is a subtype of T.
Sub-typing in Pascal occurs in only one case: if T = [a..b] and T′ = [c..d], then T′ is a subtype of T if, and only if, T′ is a subrange of T, i.e., $a \le c \le d \le b$.
Limitation: there is no way to override this definition.

349. Derived types in Ada
Subtypes in Ada. More about sub-typing in Ada: sub-typing of ordered primitive types, using subranges.

4.3 Theoretical polymorphism
Contents [64 frames]
    Motivation [12 frames]
    Overloading [18 frames]
    Coercion [7 frames]
    Universal polymorphism [5 frames]
    Polytypes [11 frames]
    Inclusion polymorphism [10 frames]
    Summary [1 frame]
    References
    Exercises

Motivation
12 Frames: The road to polymorphism; Benefits of strong static typing; An annoying Pascal example; Flexibility of type system; Life without handcuffs can be wonderful!; Responses to inflexibility; Monomorphic vs. polymorphic type systems; What's monomorphism?; Monomorphism of user defined functions in Pascal; Monomorphism of user defined functions in C; More than one type: polymorphism; Ad hoc polymorphism

355. The road to polymorphism
Figure 4.3.1: The road to polymorphism (a decision diagram leading from "Typed?", "Sophistication?", "Enforced?", and "Type equivalence?" through "Polymorphic/Monomorphic?" to the theory of polymorphism: ad hoc, with overloading and coercion; universal, with parametric and inclusion, the latter covering subrange and inheritance)

356.
Benefits of strong static typing
Large software systems tend to use statically, strongly typed languages, because of:
- Safety: fewer bugs
- Efficiency: fewer runtime checks, and more efficient use of memory
- Clarity: typing makes the code clearer
However, typing can be a nuisance: the utility of a given piece of code may be very restricted by typing.

357. An annoying Pascal example
Pascal:
    Procedure sort(var a: Array[1..300] of T);
could not be applied to:
- Arrays of Real (body and declaration have to be repeated with T = Real)
- Array[1..299] of T: array is too small
- Array[1..500] of T: array is too large
- Array[0..299] of T: mismatch of indices
- Array[1..300] of T: no name equivalence!
Pascal is so fussy and inflexible in its type system that even two identical type declarations are considered distinct. A type declaration made at a certain point in a program is equivalent only to itself.

358. Flexibility of type system
A flexible [33] type system makes typing an aid, not a hurdle:
- It avoids issuing type error messages on programs which will not make run time type errors
- It promotes code reuse for many different types

[33] Flexibility is yet another criterion for the classification of type systems.
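The name-equivalence rule behind the last item of the Pascal example also governs struct types in C, which makes the fussiness easy to try out. The sketch below is illustrative and assumes a C11 compiler for `_Generic`; the types `meters`, `feet`, `distance`, and the macro `IS_METERS` are hypothetical. Two structs with identical structure are distinct types, whereas a `typedef` only introduces another name for the same type.

```c
struct meters { double v; };
struct feet   { double v; };     /* same structure, different name */
typedef struct meters distance;  /* typedef: a new name, not a new type */

/* Evaluates to 1 when x has type struct meters, and to 0 otherwise. */
#define IS_METERS(x) _Generic((x), struct meters: 1, default: 0)

double to_feet(struct meters m) { return m.v * 3.28084; }

int checks(void) {
    struct meters m = { 100.0 };
    struct feet   f = { 100.0 };
    distance      d = { 100.0 };
    /* to_feet(f);   <- rejected: struct feet is not struct meters */
    /* Encode the three answers as bits: m -> 4, f -> 2, d -> 1. */
    return IS_METERS(m) * 4 + IS_METERS(f) * 2 + IS_METERS(d);
}
```

Here `checks()` returns 5: `m` and `d` are of type `struct meters`, while the structurally identical `f` is not.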

Clearly, Pascal offers a very inflexible type system.
The holy grail of language design is to simultaneously maintain:
1. Flexibility
2. Safety
3. Simplicity

359. Life without handcuffs can be wonderful!
In dynamically typed languages, polymorphic code may be invoked with variables of different types (writing almost at a pseudo-code level). Pseudo C:
    search(k) {
        // k is the key to search for
        // p is the current position in the search for k
        for (p = first(); not exhausted(p,k); p = next(p,k))
            if (found(p,k))
                return true;
        return false;
    }
Alas: very flexible, but not so safe.

360. Responses to inflexibility
1. The C camp: weak typing.
    int qsort(
        char *base,   // Start of array
        int n,        // Number of elements
        int width,    // Element's size
        int (*cmp)()  // How elements are compared
    );
2. The dynamically typed languages camp (Smalltalk, Python, etc.): dynamic typing overcomes complex inflexibility problems. In a sense, all code is polymorphic.
3. The Ada/C++ camp: polymorphic type systems. Ada:
    generic
        type T is private;
        with function comp(x: T; y: T) return Boolean;
    procedure sort(a: array(1..max) of T);
    ...
    procedure int_sort is new sort(int, "<");
    ...
But, what is a polymorphic type system?

361. Monomorphic vs. polymorphic type systems
Monomorphic type systems:
- Used in classical PLs, e.g., Pascal
- Every entity has a single simple type
- Type checking is straightforward
- Unsatisfactory for reusable software: many standard algorithms are inherently generic (e.g., sort); many standard data structures are also generic (e.g., trees)
Polymorphic type systems:
- Appear in modern languages, e.g., Ada, C++, and ML
- Entities may have multiple types
- Code reuse thanks to universal polymorphism
- Support generic functions, e.g., sort, and generic types, e.g., binary tree

362. What's monomorphism?
Figure 4.3.2: Monomorphism in context (typing splits into monomorphic and polymorphic typing; polymorphic typing splits into ad hoc, with overloading and coercion, and universal, with parametric polytypes and inclusion, the latter covering subrange and inheritance)
In a monomorphic type system, functions (and other entities) have one, and only one, type. Monomorphic = single-shaped:
    f is a function $\Rightarrow |\mathrm{types}(f)| = 1$    (4.3.1)

363. Monomorphism of user defined functions in Pascal
Programmer defined functions (and procedures) in Pascal are monomorphic:
    Function gcd(n, m: Integer): Integer;
    Begin
      if n mod m <> 0 then
        gcd := gcd(m, n mod m)
      else
        gcd := m;
    end;
Function gcd is monomorphic:
    $|\mathrm{types}(\mathit{gcd})| = |\{\mathrm{Integer} \times \mathrm{Integer} \to \mathrm{Integer}\}| = 1$    (4.3.2)
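The C camp's response can be seen at work with the standard library's `qsort`, whose modern prototype takes `void *` and `size_t` instead of the older signature quoted in frame 360. A minimal sketch follows; the comparator and wrapper names are hypothetical. One sorting routine serves arrays of two unrelated element types, at the price of comparators that cast untyped pointers.

```c
#include <stdlib.h>

/* Comparators receive untyped pointers; each one trusts the caller
   to pass elements of the right type. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* The same library routine sorts both kinds of array. */
void sort_ints(int *a, size_t n)       { qsort(a, n, sizeof a[0], cmp_int); }
void sort_doubles(double *a, size_t n) { qsort(a, n, sizeof a[0], cmp_double); }
```

Nothing stops a caller from passing `cmp_int` together with an array of `double`; the compiler accepts it, which is exactly the loss of safety that weak typing trades for flexibility.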


364. Monomorphism of user defined functions in C

    int gcd(int n, int m) {
      return n % m ? gcd(m, n % m) : m;
    }

Function gcd is monomorphic:

    $|\mathit{types}(\mathit{gcd})| = |\{\text{int} \times \text{int} \to \text{int}\}| = 1$    (4.3.3)

365. More than one type ⇒ polymorphism
[Figure 4.3.3: Polymorphism in context. Same taxonomy of typing: monomorphic typing vs. polymorphic typing; ad hoc (overloading, coercion) vs. universal (inclusion: subrange, inheritance; parametric: polytypes).]
Poly-Morphism = poly + morphos [Greek] = many + form; literally, the capacity of an entity to have several shapes.

366. Ad hoc polymorphism
Overloading: minimal utility.
- A (small) number of distinct procedures that just happen to have the same identifier
- Not a truly polymorphic object
- Does not increase the language's expressive power
- All connections between shapes are coincidental
Coercion: a little greater utility.
- Same routine can be used for several purposes
- Number of purposes is limited
- Return type is always the same
- Connection between shapes is determined by the coercions, which are usually external to the routine

Overloading (18 frames): Riddle; Unraveling the Marx riddle; Overloading; Overloading in English; Context dependent resolution of overloading; Keyword overloading in Pascal; Plain overloading vs. identifier/operator overloading; Use of overloading for type polymorphism; Builtin procedure overloading in Pascal; Builtin function overloading in Pascal?; Overloaded builtin operators of Pascal; Programmer defined overloading; Function overloading in C++; Overloading the division operator in Ada; Resolving ambiguity of overloading; Ambiguity resolution; Overloading vs. hiding; Overloading & hiding together?

367. Riddle
Can you figure out this?
Groucho Marx: Time flies like an arrow. Fruit flies like a banana!

368. Unraveling the Marx riddle

    Term    Meaning I    Meaning II
    flies   flows        winged insects
    like    similar to   favor, prefer

Table 4.3.1: Overloaded meanings of terms in the Groucho Marx riddle

Time flies like an arrow. Fruit flies (the disgusting insects) like (favor) a banana!
Figure 4.3.4: Unraveling the Marx riddle

369. Overloading
Definition 4.10 (Overloading). An overloaded term is a term that has multiple meanings, which may, but also may not, be related.
How did the Marx trick work?
- Overloading
- Unrelated meanings
- Misleading context (on its own, the phrase "Fruit flies like a banana" is not so confusing)

370. Overloading in English
Unrelated meanings:
- lie: to present false information with the intention of deceiving ("I did not lie in the deposition")
- lie: to place oneself at rest in a horizontal position ("I did not lie in this position")
Close (more or less) meanings:
- fly: to move through the air


- fly: to travel by an airplane
- fly: a two-winged insect
Lemma 4.11 (The fundamental rule of overloading). The intended meaning is figured out by context.

371. Context dependent resolution of overloading
Source: "Archie Is Branded", episode 57 (episode 20, season III) of All In The Family, 1973.
Figure 4.3.5: Archie Bunker explains to Edith Bunker how context is used to resolve the ambiguity of the three overloaded meanings of the word "Shalom" in Hebrew.
Script:
  Paul: Shalom.
  Edith Bunker: Shalom? What does that mean?
  Mike Stivic: Believe it or not, Ma, it means peace.
  Gloria Stivic: Jewish people also use it to say hello and good-bye.
  Edith Bunker: How do you tell if they mean hello or good-bye?
  Archie Bunker: Simple, Edith. If a Jew is walking towards you, it means hello. If he's walkin' away, it means good-bye.
  Edith Bunker: When does it mean peace?
  Archie Bunker: In between hello and good-bye.

372. Keyword overloading in Pascal
Keyword "of" serves several similar meanings in different contexts.
Arrays:

    CONST N = 100;
    TYPE Range = 1..N;
         Matrix = Array[Range] of Array[Range] of Real;

Sets:

    VAR vowels: Set of Char;

Multiway conditional:

    Case month of
      January:
      February:

Variant records:

    TYPE String = ...;
         Kind = (NIL, CONS, ATOM);
         Link = ^Child;
         Child = Record
           Case tag: Kind of
             NIL: (* Nothing *)
             CONS: car, cdr: Link;
             ATOM: data: String
           end
         end;

But all these cases of overloading are unrelated to type.

Plain overloading vs. identifier/operator overloading
- Overloading of "of" in Pascal is not overloading of an identifier. Keyword "of" does not identify an entity such as a variable, a function, a procedure, etc.
Definition 4.12 (Identifier overloading). An identifier or operator is said to be overloaded if it simultaneously denotes two or more distinct functions or procedures.
- Operator + in Pascal and C denotes two distinct functions: integer addition and floating point addition
- Identifiers may be similarly overloaded

374. Use of overloading for type polymorphism
[Figure 4.3.6: Use of overloading for type polymorphism. Taxonomy of typing with the overloading branch of ad hoc polymorphism in focus.]
We say that a function (or an operator) f is overloaded if:

- f has more than one type
- there is no automatic mechanism that generates the set types(f)
Overloading provides a mechanism for better utilization of scarce good names.

375. Builtin procedure overloading in Pascal

    Program Write;
    TYPE Days = (Sunday, Monday);
    Begin
      Writeln(0);
      Writeln(0.0);
      Writeln(false);
      Writeln(Sunday)
    end.

Output is:

    0
    ...E+00
    FALSE
    Sunday

The identifier Writeln in Pascal denotes many distinct functions.

Builtin function overloading in Pascal?
Many Pascal functions apply to more than one type: eof, succ, ord, sin. But their polymorphism is not overloading; it is either coercion or parametric polymorphism.
Similarity of overloaded meanings is a matter of coincidence.

377. Overloaded builtin operators of Pascal
Consider, e.g., operator +.
Number of types: more than one.

    + : Integer → Integer
    + : Real → Real
    + : Integer × Integer → Integer
    + : Real × Real → Real

Thus, |types(+)| = 4 > 1.
Regularity: is the set of types automatically generated?
- The above four types are similar and related.
- They were designed by the individual who designed Pascal. This individual gave them semantics.
- Incidentally, these semantics are related.
- But they were not automatically generated.

378. Programmer defined overloading
- C does not allow operator overloading by the programmer, but its younger, fatter, and uglier daughter, C++, does.
- Pascal does not allow operator overloading by the programmer, but its younger, fatter, and uglier sister, C++, does.

Function overloading in C++
C forbids function overloading, but its young, fat, and ugly sister, C++, welcomes it:

    double max(double d1, double d2) {
      return d1 > d2 ? d1 : d2;
    }
    char max(char c1, char c2) {
      return c1 > c2 ? c1 : c2;
    }
    char* max(char* s1, char* s2) {
      return strcmp(s1,s2) > 0 ? s1 : s2;
    }
    const char* max(const char* s1, const char* s2) {
      return strcmp(s1,s2) > 0 ? s1 : s2;
    }

Neither C nor C++ has builtin functions; hence, they have no builtin function overloading.

Overloading the division operator in Ada
Pascal forbids operator overloading by the programmer, but its younger, fatter, and uglier daughter, Ada, allows it.
Builtin semantics of "/":
  (i)  Integer division: Integer × Integer → Integer
  (ii) Real division: Real × Real → Real
Programmer defined overloading:

    function "/" (m, n : Integer) return Float is
    begin
      return Float(m) / Float(n);
    end;

This adds another meaning to division of integers: it can now also return a real number.

381. Resolving ambiguity of overloading
The actual meaning is determined by context:
- which parameters are passed to operator "/" upon invocation
- how its result is used

Ambiguity resolution
Consider the call Id(E), where Id denotes both:
- a function f1 of type S1 → T1
- a function f2 of type S2 → T2
Context independent (C++):
- Either f1 or f2 is selected depending solely on the type of E

- We must have S1 ≠ S2
- May lead to ambiguities in the presence of coercion
Context dependent (Ada):
- Either f1 or f2 is selected depending on the type of E, on how Id(E) is used, or on both
- Either S1 ≠ S2 or T1 ≠ T2 (or both)
- Ambiguity is not always resolved:

    x : Float := (7/2)/(5/2);

  has at least two interpretations: 3/2 = 1.5 and 3.5/2.5 = 1.4.

Overloading vs. hiding
Hiding (by lexical scope): an identifier defined in an inner scope hides an identifier defined in an outer scope.
Hiding in C:

    static long tail;
    int main(int ac, char **av) {
      // hides outer tail
      char **tail = av + ac - 1;
    }

Comparison: neither makes polymorphic types.
- Overloading: multiple meanings co-exist
- Hiding: new meaning masks the old meaning

Overloading & hiding together?
May be challenging for language designers:
- Can an inner definition overload an external definition?
- What happens if an inner definition hides one overloaded outer definition, but not the other?
Exercise: provide examples in concrete languages, and see how they deal with these dilemmas.

Coercion (7 frames): What's coercion?; Why coercion?; Implicit use of coercion; Type coercion in Algol-68; Builtin coercion in C++; Ambiguity due to coercion; Coercions + overloading

385. What's coercion?
[Figure: taxonomy of typing with the coercion branch of ad hoc polymorphism in focus.]
Definition 4.13 (Coercion). Coercion is a conversion from values of one type to values of another type which occurs implicitly.
Coercion is sometimes called, especially in C++, type casting and type conversion, without particular distinction between implicit and explicit applications.

Why coercion?
Pascal provides coercion from Integer to Real, so we can write:
Primality testing in Pascal:

    Function isprime(n: integer): Boolean;
    VAR
      d: Integer; (* Potential divisor *)
      primesofar: Boolean;
    Begin
      If n < 0 then
        n := -n;
      primesofar := n >= 2;
      d := 2;
      While primesofar and (d <= sqrt(n)) do Begin
        primesofar := n mod d <> 0;
        d := d + 1;
      end;
      isprime := primesofar;
    end;

- Function sqrt expects a Real
- We need to compute √n, but n is an Integer
- Coercion: n is implicitly converted to Real
- Net effect: sqrt applies also to Integer

387. Implicit use of coercion
Coercion enhances the utility of existing functions:

    While primesofar and (d <= sqrt(n)) do

1. Function sqrt expects a Real, but thanks to coercion we can pass it an Integer
2. Function sqrt returns a Real, but thanks to coercion we can compare its result with an Integer

388. Type coercion in Algol-68
Algol 68 allows the following coercions:
  Widening: from integer to real; from real to complex number
  Dereferencing: from a reference to a variable to its value
  Rowing: from any value to a single-value array
  and more
Now you can understand why modern languages tend to minimize or even eliminate coercions altogether.

Builtin coercion in C++

    int pi = ...;          // Builtin coercion from double to int
    float x = '\0';        // Builtin coercion from char to float
    extern double sqrt(float);
    x = sqrt(pi);          // Builtin coercion from int to double and then
                           // builtin coercion from double to float
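The Pascal primality test relies on two implicit Integer-to-Real conversions. The following C++ sketch mirrors it under stated assumptions: real_sqrt and truncate_to_int are hypothetical helper names introduced here, and real_sqrt is declared for double only so that the int argument has to be coerced, as with Pascal's sqrt.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper that, like Pascal's sqrt, is declared for reals only.
double real_sqrt(double x) { return std::sqrt(x); }

bool is_prime(int n) {
    if (n < 0) n = -n;
    bool prime_so_far = n >= 2;
    // Two coercions per test: n (int) -> double for the call, and
    // d (int) -> double for the comparison with the double result.
    for (int d = 2; prime_so_far && d <= real_sqrt(n); ++d)
        prime_so_far = n % d != 0;
    return prime_so_far;
}

// The opposite direction is also implicit in C++, and it loses information.
int truncate_to_int(double x) {
    int i = x;  // coercion from double to int discards the fraction
    return i;
}
```

The silent loss in truncate_to_int illustrates why coercions that narrow are the ones language designers are most eager to remove.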

390. Ambiguity due to coercion
Types int, double and float in C can all be coerced into each other. Therefore, the language definition must specify exactly the semantics of, e.g., 'a'*35+5.3f.
The coercions graph is not always a tree. What is the path of coercion from unsigned char to long double?

    unsigned char → char → int → long → double → long double

or maybe,

    unsigned char → unsigned → unsigned long → long double

- Selecting a different path may lead to slightly different semantics
- K&R C, ANSI-C and C++ are all different in this respect
The coercions graph is not always a DAG.

391. Coercions + overloading
Strategies for support of mixed type arithmetic, e.g., A + B:
- Overloading and no coercion: integer + integer, real + integer, integer + real, real + real
- Coercion and no overloading: real + real; integer → real
- Coercion and overloading: integer + integer, real + real; integer → real

Universal polymorphism (5 frames): What is ad hoc polymorphism?; Ad hoc vs. universal polymorphism; The benefits of universal polymorphism; Annoying example: a monomorphic Pascal function; Using the disjoint monomorphic function

392. What is ad hoc polymorphism?
[Figure: taxonomy of typing with the ad hoc branch in focus.]
All we have seen so far is ad hoc polymorphism, in which the variety of different shapes is created by a human:
- Each overloaded version
- Each distinct coercion
The human can be the language designer and/or the programmer (depending on the PL).

393. Ad hoc vs. universal polymorphism
Definition 4.14 ("ad hoc").
ad hoc adv. 1. For the specific purpose, case, or situation at hand and for no other: a committee formed ad hoc to address the issue of salaries.
ad hoc adj. 1. Formed for or concerned with one specific purpose: an ad hoc compensation committee. 2. Improvised and often impromptu: "On an ad hoc basis, Congress has placed ceilings on military aid to specific countries" (New York Times). [Latin ad, to + hoc, this.]

                      Universal     Ad hoc
    No. of shapes     Unbounded     Finite and few (often very few)
    Shape generation  Automatic     Manual
    Shape uniformity  Systematic    Coincidental

Table 4.3.2: Ad hoc vs. universal polymorphism

394. The benefits of universal polymorphism
- A single function (or type) has a (large) family of related types
- The function operates uniformly on its arguments, whatever their type
- Provides a genuine gain in expressive power, since a polymorphic function may take arguments of an unlimited variety of types

395. Annoying example: a monomorphic Pascal function
Determine whether two sets of characters are disjoint:

    Type CharSet = set of Char;
    Function disjoint(s1, s2: CharSet): Boolean;
    Begin
      disjoint := (s1 * s2 = [])
    end

- Type is ℘(Char) × ℘(Char) → Boolean
- Applicable only to sets of Chars

Using the disjoint monomorphic function
Applicable to a pair of arguments, each of type ℘(Char):

    VAR chars: CharSet;
    Begin
      If disjoint(chars, ['a','e','i','o','u']) then
        ...;
    end

But it cannot be applied to arguments of other types, such as ℘(Integer), ℘(Color), ...

Counter example: a Pascal polymorphic operator
- The * operator in Pascal is polymorphic.
- It can be applied to any two sets of the same kind of elements.
- The polymorphism is universal, since the operator works in the same fashion for all types for which it is applicable.

Polytypes (11 frames): What are polytypes?; Polytypes; A polytype derives many types; No programmer-defined polytypes in Pascal!; Defining polytypes in ML; Values of a polytype; Example: polytype list(σ); Example: the polytype σ → σ; Values of polytypes (more examples); Polytypes & software engineering; Algebra of polytypes

397. What are polytypes?
[Figure: taxonomy of typing with the polytypes branch of parametric polymorphism in focus.]
A polytype is the type common to all instances of a specific parametric type.

398. Polytypes
Definitions:
- Polytype (also called parametric type): a type whose definition contains one or more type variables
- Monotype: a type whose definition includes no type variables
- Monomorphic PL: offers solely monotypes
- Polymorphic PL: offers also polytypes
Notations for some common polytypes:

    Pair(σ)       = σ × σ
    list(σ)       = Unit + (σ × list(σ))
    Array(σ₁,σ₂)  = σ₁ → σ₂
    Set(σ)        = ℘(σ)

Examples: a plain polytype, and plenty of types of polymorphic functions:

    list(σ)
    list(σ) → σ
    list(σ) → Integer
    σ → σ
    σ × σ → σ
    (β → γ) × (α → β) → (α → γ)

399. A polytype derives many types
A polytype derives a whole family of types, e.g., the type σ → σ derives: Integer → Integer, String → String, list(Real) → list(Real), ...

400. No programmer-defined polytypes in Pascal!
The type of the pre-defined function eof is File of σ → Boolean.
If Pascal had user defined polytypes, we could have written (pseudo Pascal):

    TYPE
      Pair(σ) = Record
        first, second: σ;
      end;
      list(σ) = ...;
      IntPair = Pair(Integer);
      RealPair = Pair(Real);
    VAR
      line: list(char);

Unfortunately, this would not work in Pascal. All we can write is something of the sort of:

    TYPE
      IntPair = Record
        first, second: Integer;
      end;
    VAR
      line: CharList;

401. Defining polytypes in ML

    type σ pair = σ * σ;
    datatype σ list = nil | cons of (σ * σ list);
    fun hd(l: σ list) =
      case l of
          nil => (* error *)
        | cons(h,t) => h
    and tl(l: σ list) =
      case l of
          nil => (* error *)
        | cons(h,t) => t
    and length(l: σ list) =
      case l of
          nil => 0
        | cons(h,t) => 1 + length(t)

402. Values of a polytype
What is the set of values of a polytype? A weird question.
- In C++: a class template has no values; only if you substitute an actual type for its type variable will you get a real type.
- In ML: one can easily define values of polytypes representing polymorphic functions. For example, the type of the function second is the polytype σ × σ → σ.
- A tough problem: what are the values of the polytype list(σ)?
Definition: the set of values of any polytype is the intersection of all types that can be derived from it.
Rationale: suppose v is a value of a polytype for which no monotype substitution was performed. Then the only legitimate operations on v would be those available for any monotype derived from the polytype.
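The hypothetical Pascal Pair(σ) above has a direct counterpart in a language with programmer-defined polytypes. A minimal C++ sketch follows; the names Pair, IntPair, RealPair and second_of are illustrative assumptions rather than part of the slides.

```cpp
#include <cassert>
#include <string>

// Pair(sigma) = sigma x sigma: the class template plays the role of the polytype.
// On its own it has no values; it only becomes a type once T is substituted.
template <typename T>
struct Pair {
    T first;
    T second;
};

// Monotypes derived from the polytype by substituting the type variable.
using IntPair = Pair<int>;
using RealPair = Pair<double>;

// A polymorphic function of type Pair(sigma) -> sigma.
template <typename T>
T second_of(const Pair<T>& p) {
    return p.second;
}
```

For example, second_of(IntPair{1, 2}) instantiates the function at T = int, while second_of(Pair<std::string>{"a", "b"}) instantiates it at T = std::string; the body is written once.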

403. Example: polytype list(σ)
Monotypes derived from list(σ):
- list(Integer): all finite lists of integers, including the empty list
- list(Boolean): all finite lists of truth values, including the empty list
- list(String): all finite lists of strings, including the empty list
The empty list is the only common element:
- Nonempty lists are values of a specific monotype, determined by the components' type
- The empty list is a value of every monotype derived from list(σ)
- The empty list has type list(σ)
- There are no other values of type list(σ)

404. Example: the polytype σ → σ
Monotypes derived from σ → σ:
- Integer → Integer: includes the integer identity function, the successor function, the absolute value function, the squaring function, etc.
- String → String: includes the string identity function, the string reverse function, the space trimming function, etc.
- Boolean → Boolean: includes the truth value identity function, the logical negation function, etc.
The identity function is common to all σ → σ types. In fact, this is the only such common value.

Values of polytypes (more examples)

    Polytype                       Values
    ℘(σ)                           The empty set, []
    Pointer(σ)                     The value nil
    σ × σ → σ                      Function second
    (β → γ) × (α → β) → (α → γ)    Function ∘ (composition)
    (σ → σ) → (σ → σ)              id, twice, thrice, fourth, etc., and even function fixedpoint
                                   (the function mapping any σ → σ function to id : σ → σ)
    Pair(σ) = σ × σ                empty
    Array(σ₁,σ₂) = σ₁ → σ₂         empty

406. Polytypes & software engineering
- The polytype of a function is very telling of what it does.
- It is often easy to guess what a function does, just by considering its polytype.
- Many polytypes have only one value, which eliminates the guessing altogether.
Easy examples:

    list(σ) → σ
    list(σ) → list(σ)
    list(σ) → Integer
    σ → σ
    σ × σ → σ
    (β → γ) × (α → β) → (α → γ)

Slightly more difficult:

    list(σ) × list(σ) → list(σ × σ)
    (σ → σ) × list(σ) → list(σ)
    (σ × σ → σ) × σ × list(σ) → σ

407. Algebra of polytypes
- There are software systems that promote reuse by supporting a search for functions based on their signatures.
- Clearly, the search must be insensitive to application of the commutative laws to product and choice.
- Further, the search should be made insensitive to the choice of labels.

Inclusion polymorphism (10 frames): Inclusion polymorphism; Subtyping; Inclusion polymorphism; Subranges in Pascal; Subtypes in Pascal; Subtypes in Ada: builtin types; Subtypes in Ada: array types; Subtypes in Ada: user defined types; Hypothetical ML with structural subtyping; Non-type parametric polymorphism

408. Inclusion polymorphism
[Figure: taxonomy of typing with the inclusion branch of universal polymorphism in focus.]
Inclusion polymorphism: the other kind of universal polymorphism, arising from an inclusion relation between types or sets of values.

409. Subtyping
[Figure: the same taxonomy of typing.]
Most inclusion polymorphism is due to subtyping, but not always.
Definition (Subtyping: Version I). Type A is a subtype of the type B if A ⊆ B.
Definition (Subtyping: Version II). Type A is a subtype of the type B if every value of A can be coerced into a value of B.

410. Inclusion polymorphism
Builtin:
- Pascal: the Nil value belongs to all pointer types.
- C: the value 0 is polymorphic. It belongs to all pointer types.

- C++: the type void * is a super-type of all pointer types.
User defined, two varieties:
- Not OO (footnote 34): subranges in Pascal

    TYPE
      Index = ...;
      Digit = '0'..'9';

  Anything applicable to Integer will be applicable to type Index. Anything applicable to Char will be applicable to type Digit.
- OO: a subclass is also a subtype. Inheritance in C++:

    // a Manager is a kind of Employee
    class Manager: public Employee {
      // ...
    };

411. Subranges in Pascal
Pascal subrange definition:

    TYPE MonthLength = 28..31;

- Type MonthLength has four values: 28, 29, 30, 31.
- Values of MonthLength make up a subset of type Integer.
- Any operation that expects an Integer value will happily accept a value of type MonthLength.
- Type MonthLength inherits all operations of type Integer.
Inheritance in Pascal: a Pascal subrange type inherits all the operations of its parent type; otherwise, no Pascal type inherits any operations from another distinct type.

Subtypes in Pascal
Pascal recognizes only one restricted kind of subtype: subranges of discrete atomic types.

    TYPE
      Natural = 0..maxint;
      Small = ...;
    VAR
      i: Integer;
      n: Natural;
      s: Small;

- Safe: i:=n and i:=s
- Unsafe: n:=i, s:=i, n:=s and s:=n (require a run-time range check)
A value may belong to several (possibly many) subtypes. A run-time check is required to verify that a value belongs to a certain subtype.

Subtypes in Ada: builtin types
In contrast, Ada allows subtypes of all atomic types, as well as of user-defined, composite types.
Discrete types in Ada:

    subtype Natural is Integer range 0..Integer'last;
    subtype Small is Integer range ...;

Indiscrete types in Ada:

    subtype Probability is Float range ...;

414. Subtypes in Ada: array types
Strings in Ada:

    type String is array (Integer range <>) of Character;
    subtype String5 is String (1..5);
    subtype String7 is String (1..7);

415. Subtypes in Ada: user defined types

    type Sex is (f, m);
    type Person (gender : Sex) is
      record
        name : String (1..8);
        age : Integer range ...;
      end record;
    subtype Female is Person (gender => f);
    subtype Male is Person (gender => m);

416. Hypothetical ML with structural subtyping
Some geometric types:

    type point = {x: real, y: real};
    type circle = {x: real, y: real, r: real};
    type box = {x: real, y: real, w: real, d: real};

- Assuming the inheritance relationship is derived from structure (footnote 35), both box and circle are subtypes of point.
- Operations associated with point should be applicable to box, e.g., move : ∀σ ≤ Point. σ × Real × Real → σ

Non-type parametric polymorphism
What we have seen so far is:

    Entity                     Parameter   Output
    Type (parameterized)       Type        Type (concrete)
    Function (parameterized)   Type        Function (concrete)

How about other entities?

Footnotes:
35. In most mainstream PLs, including […] and C++, structure is derived from the inheritance relationship.
36. An example was shown above.
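The safe and unsafe assignment directions listed for Pascal subranges can be imitated in C++, which has no builtin subrange types. The sketch below is an illustration under assumptions: the class MonthLength and the helpers days_in_two_months and rejects are hypothetical names, and the range 28..31 is taken from the MonthLength frame.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical subrange type modelled on the Pascal subrange 28..31.
class MonthLength {
    int value_;
public:
    // "Unsafe" direction (Integer -> subrange): needs a run-time range check.
    explicit MonthLength(int v) : value_(v) {
        if (v < 28 || v > 31)
            throw std::out_of_range("MonthLength must be in 28..31");
    }
    // "Safe" direction (subrange -> Integer): always succeeds, so it is implicit.
    operator int() const { return value_; }
};

// Any operation that expects an int accepts a MonthLength.
int days_in_two_months(int a, int b) { return a + b; }

// Returns true if constructing MonthLength(v) fails the range check.
bool rejects(int v) {
    try {
        MonthLength m(v);
        (void)m;
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Making the checked constructor explicit and the widening conversion implicit is one way to keep the cheap direction silent while forcing the programmer to spell out the direction that can fail.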

4.3.7 Summary (1 frame): Varieties of polymorphism

418. Varieties of polymorphism
Ad hoc: created by hand; caters for a limited number of types.
- Overloading: a single identifier denotes several functions simultaneously. Reuse is limited to names; there is no reusable code.
- Coercion: a single function can serve several types thanks to implicit coercions between types. Extends the utility of a single function, using implicit conversions.
Universal: systematic; applies to many types.
- Parametric: functions that operate uniformly on values of different types.
- Inclusion: subtypes inherit functions from their supertypes.

Exercises
1. What (if any) kind of polymorphism?
   (a) operator >= in Pascal?
   (b) and in C?
   (c) and in C++?
   (d) function succ in Pascal?
   (e) function chr in Pascal?
   (f) the array type constructor in Pascal?
   (g) the for..to iterative command in Pascal?
   (h) procedure new in Pascal?
   (i) and procedure dispose?
   (j) operator + in Pascal?
   (k) and in C?
   (l) and in […]?
2. What are all the types of
   (a) true in Pascal?
   (b) June in Pascal?
   (c) 0 in Pascal?
   (d) and in C?
   (e) and in C++?
   (f) {} in C++?
   (g) [] in Pascal?
3. Write a Pascal program that, while using ranges, makes a type error.
4. Is the type error flagged at compile time? At runtime? How?
5. Write a C++ program in which an enum variable is overflowed.
6. Is the type error flagged at compile time? At runtime? How?
7. Repeat the above two questions for Pascal.
8. Pascal has this problem of programs not being able to read files. Explain this problem.
9. With respect to this problem, why does it still make sense to allow functions such as eof, which can take many different types of files?
10. Classify the polymorphism kind (if any) of all of Pascal's pre-defined functions, constants, and procedures.

4.4 Polymorphism in practice
Contents [34 frames]:
  Overloading in Pascal, C/C++, and […] [7 frames]
  Coercion and the C++ overloading tournament [6 frames]
  Polymorphic functions [7 frames]
  Type inference [3 frames]
  Checking parameters with parametric polymorphism [6 frames]
  Case studies [5 frames]

Overloading in Pascal, C/C++, and […] (7 frames): Keyword overloading in C++; Builtin operator overloading in C; Builtin overloading of operator * in C; Builtin operator overloading in Pascal; User defined operator overloading in C++; More operator overloading opportunities in C++; Overloading in […]

419. Keyword overloading in C++
Meanings of C++'s static keyword are only vaguely related:

Meaning: Scope
  Example:

    static char buff[1000];

  Comments: when applied to definitions made at the outermost level of a file (footnote 37). Antonym of extern; global in file, but inaccessible from other files.

Meaning: Storage class
  Example:

    int counter(void) {
      static int val = 0;
      return val++;
    }

  Comments: do not place on the stack. Shared by all invocations of the function. Antonym of auto; value persists between different invocations.

Meaning: Not an instance member
  Example:

    class Book {
      static int n;
    public:
      Book() { ++n; }
      ~Book() { --n; }
    };

  Comments: shared by all instances of a struct or a class.

Table 4.4.1: Overloading of keyword static in C

420. Builtin operator overloading in C
- Keyword overloading does not make entities with more than one type.
- Keyword overloading is not type polymorphism.
- Many builtin operators offer overloaded semantics:

    foo(int a, int b, double x, double y) {
      a + b; /* Integer addition */
      x + y; /* Floating point addition */
    }

421. Builtin overloading of operator * in C
- Integer multiplication: int × int → int
- Long integer multiplication: long × long → long
- Floating point multiplication: double × double → double
- Pointer dereferencing: Pointer(σ) → σ, for any type σ
- * has another overloading in type definitions, but this overloading is not considered polymorphism.

Builtin operator overloading in Pascal
Operator - in Pascal serves for:
- Integer negation: Integer → Integer
- Real negation: Real → Real
- Integer subtraction: Integer × Integer → Integer
- Real subtraction: Real × Real → Real
- Set difference: ℘(σ) × ℘(σ) → ℘(σ), where σ is any of the types for which Pascal's sets can be created
Parametric polymorphism vs. overloading: one of the overloaded meanings of - follows parametric polymorphism:

    $\forall \sigma \in T_{\text{Pascal}}\colon\ \bigl(\wp(\sigma) \times \wp(\sigma) \to \wp(\sigma)\bigr) \in \mathit{types}(\texttt{-})$    (4.4.1)

423. User defined operator overloading in C++
Overloading operator +=:

    class Rational {
    public:
      Rational(double);
      const Rational& operator += (const Rational& other);
    };

424. More operator overloading opportunities in C++
In C++ you can overload even stuff you did not know was an operator:
- Including (), the function call operator
- Including the type casting operator
- Including ",", the comma operator
- Including [], the array access operator
- Including *, the dereferencing operator
- Including ->*, the field access operator
Not so easy to learn and use.

425. Overloading in […]
Even if you do not know […], you should be able to understand and apply the following:
- Builtin operator overloading: similar to C++; + serves also for string concatenation.
- Programmer defined operator overloading: none. The language designer did not wish to replicate the C++ nightmare.
- Builtin function overloading: none. […], just like many other languages, has no builtin functions.
- Programmer defined function overloading: similar to C++.

4.4.2 Coercion and the C++ overloading tournament (6 frames): Coercion in ML; Programmer defined coercion in C++; The overloading tournament in C++; A tournament example; More tournament examples; Overloading + coercion + parametric + inclusion = C++ style headache!

426. Coercion in ML
No mixed type arithmetic in ML:

    - 1+1;
    val it = 2 : int
    - ...;
    val it = 2.0 : real
    - ...;
    stdin: Error: operator and operand don't agree [literal]
      operator domain: int * int
      operand:         int * real
      in expression:
        ...

No implicit coercion from int to real; one must use function real:

    - real;
    val it = fn : int -> real
    - (real 1) + 1.0;
    val it = 2.0 : real

Programmer defined coercion in C++
Can be done by:
- Defining a (non-explicit, footnote 38) constructor with a single argument
- Overloading the type cast operator

    class Rational {
    public:
      Rational(double);
      explicit Rational(const char *s);
      operator double(void);
    };

    Rational r = 2;         // Builtin coercion from int to double and
                            // then programmer defined coercion
                            // from double to Rational
    double d = sqrt(r);     // Programmer defined coercion
                            // from Rational to double
    Rational h = "half";            // Error
    Rational h = Rational("half");  // OK

Footnote 38: In C++, an explicit constructor, i.e., a constructor whose definition is adorned with the explicit keyword, is a constructor which will not be employed for implicit coercion; it can only be used if invoked explicitly.

428. The overloading tournament in C++
- In every function call site foo(a1, a2, ..., an), there could be many applicable overloaded versions of foo.
- C++ applies a context independent, compile-time tournament to select the most appropriate overload.
Ranking of coercions (short version):

    Rank                  Examples
    None or unavoidable   array → pointer, T → const T, ...
    Size promotion        short → int, int → long, float → double, ...
    Standard conversion   int → double, double → int, Derived* → Base*, ...
    Programmer defined    by constructor or operator overloading
    Ellipsis              e.g., int printf(const char *fmt, ...)

The winner must be:
- A better match in at least one argument
- At least as good for every other argument
An error message is issued if no single winner is found.

429. A tournament example
Resolve the function call max(a,b), where a is of type float and b is of type Rational, with two candidates:
  I.  double max(double, double)
  II. Rational max(long double, Rational)

    Candidate  Signature              1st argument          2nd argument
    I          double, double         float → double        Rational → double
    II         long double, Rational  float → long double   none

- First argument: equally good (size promotion)
- Second argument: second contestant wins ("none" is better than "programmer defined")
- Hence, the second contestant wins.
Note that ISO C++ uses a finer ranking than the short version above: float → double is a promotion, whereas float → long double is a conversion. Under that ranking candidate I is better on the first argument and candidate II on the second, so a conforming compiler reports this particular call as ambiguous.

More tournament examples
With the declarations made previously, which version of max would the following invoke?

    max(Rational(3), ...)

Given

    void foo(int) { cout << "int"; }
    void foo(char) { cout << "char"; }
    void foo(char *) { cout << "char *"; }
    void foo(const char *) { cout << "const char *"; }

what will be printed by the following?

    void bar() {
      foo(0);
    }

Answer: int

431. Overloading + coercion + parametric + inclusion = C++ style headache!
- Parametric polymorphism may contribute to ambiguity:

    template <typename T>
    const T & max(const T &a, const T &b) {
      return a > b ? a : b;
    }

- Inheritance may contribute to ambiguity
- The overloading tournament is not limited to overloading
Certain PLs forbid overloading and coercion and restrict parametric polymorphism for precisely this reason.
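The foo example from the tournament frames can be checked mechanically. In this sketch the four overloads return their label instead of printing it, which is an editorial change made only so the outcome can be asserted; the signatures are those of the frame.

```cpp
#include <cassert>
#include <string>

// The four overloads from the frame, returning their label instead of printing it.
std::string foo(int)         { return "int"; }
std::string foo(char)        { return "char"; }
std::string foo(char*)       { return "char *"; }
std::string foo(const char*) { return "const char *"; }
```

For foo(0) the literal 0 has type int, so foo(int) needs no coercion at all, while the other three candidates need a standard conversion (int → char, or null pointer conversion); the exact match wins the tournament. A string literal such as "abc" selects foo(const char*) for the same reason.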

4.4.3 Polymorphic functions

7 Frames: What are polymorphic functions?; write vs. eof in Pascal; Polymorphic functions with C++'s templates; If Pascal allowed polymorphic functions; Polymorphic functions in ML; Polymorphic functions taking function parameters; Polymorphic identity function in ML

432. What are polymorphic functions?

[Figure: classification of typing. Typing is either monomorphic or polymorphic; polymorphic typing is either ad hoc (overloading, coercion) or universal (inclusion: subrange, inheritance; parametric: polytypes).]

Definition 4.15 (Polymorphic functions). Functions that can work on a variety of types; a kind of parametric polymorphism, i.e., polymorphism occurring for unboundedly many related types. The type variety may or may not show up as an explicit parameter.

write vs. eof in Pascal

write(E)
- Effect depends on the type of E: type Char, type String, type Integer, ...
- The identifier write simultaneously denotes several distinct procedures, each having its own type
- This is overloading
- (We ignore in this course the magic of write taking multiple parameters, where each can be of a different type.)

eof(F)
- Type is File(σ) → Boolean, where σ is any type
- The function is polymorphic ("many-shaped")
- Argument types: File of Char, File of Integer, etc.
- Operates uniformly on all of its argument types

434. Polymorphic functions with C++'s templates

Definition of a function template:

    template<typename Type>
    Type max(Type a, Type b) { return a > b ? a : b; }

Using template functions:

    int x, y, z;
    double r, s, t;
    z = max(x, y);
    t = max(r, s);

Type parameters:
- Explicitly declared
- Inferred upon use

C++ can even make this inference (implicit instantiation of a C++ function template):

    unsigned long                   // return type
    (*pf)                           // variable name
    (unsigned long, unsigned long)  // argument types
    = max;                          // assignment

435. If Pascal allowed polymorphic functions

    function disjoint(s1, s2: set of σ): Boolean;
    begin
      disjoint := (s1 * s2 = [])
    end;

    VAR chars: set of Char;
        ints1, ints2: set of 0..99;
    ...
    if disjoint(chars, ['a','e','i','o','u']) then ...
    if disjoint(ints1, ints2) then ...

Definition 4.16 (Type variables/type parameters). Type expressions like σ in the definition of disjoint are called type variables or type parameters.

Polymorphic functions in ML

Type variables are used in ML to define parametric polymorphism.

Definition:

    fun second(x: τ, y: τ) = y

or

    fun second(x, y) = y

Type is τ × τ → τ, where τ is arbitrary.

Use:
- second(13, true)
- second(name), where name is the pair (1984, "Orwell")

Illegal use:
- second(13)
- second(1983, 2, 23)

    Standard ML of New Jersey v [built: Thu May 9 05:41: ]
    - fun second(x, y) = y;
    val second = fn : 'a * 'b -> 'b
    - fun second(x:'t, y:'t) = y;
    val second = fn : 'a * 'a -> 'a

Polymorphic functions taking function parameters

Function twice takes as a parameter a function f and returns a function g such that g(x) = f(f(x)):

    fun twice(f: σ → σ) = fn (x: σ) => f(f(x))

e.g.,

    val fourth = twice(sqr)
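For comparison with the ML definition of twice, here is a minimal C++ sketch of the same idea. It is an illustration only: the use of std::function and the helper names sqr and fourth (the latter mirroring val fourth = twice(sqr)) are assumptions made for this example, not part of the course material.

```cpp
#include <functional>

// twice(f) returns a function g such that g(x) = f(f(x)).
// T plays the role of the type variable sigma in the ML definition.
template <typename T>
std::function<T(T)> twice(std::function<T(T)> f) {
  return [f](T x) { return f(f(x)); };
}

// Hypothetical helper: squares its argument.
int sqr(int x) { return x * x; }

// Mirrors "val fourth = twice(sqr)". The type argument is passed explicitly,
// because a plain function name is not deduced as std::function<T(T)>.
int fourth(int x) { return twice<int>(sqr)(x); }
```

Unlike ML, where the type of twice is inferred at its definition, the C++ version is only checked once it is instantiated, here with T = int.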

Function o takes two arguments, functions f and g, and returns a function which is their composition:

    fun op o (f: β → γ, g: α → β) = fn (x: α) => f(g(x))

e.g.,

    val even = not o odd

or,

    fun twice(f: σ → σ) = f o f

438. Polymorphic identity function in ML

Identity function: σ → σ.

    fun id(x: σ) = x

represents:
- Identity mapping on booleans: $\{\mathit{false} \mapsto \mathit{false},\ \mathit{true} \mapsto \mathit{true}\}$   (4.4.2)
- Identity mapping on integers: $\{\ldots, -2 \mapsto -2,\ -1 \mapsto -1,\ 0 \mapsto 0,\ 1 \mapsto 1,\ 2 \mapsto 2, \ldots\}$   (4.4.3)
- Identity mapping on strings: $\{\varepsilon \mapsto \varepsilon,\ \texttt{"a"} \mapsto \texttt{"a"},\ \texttt{"b"} \mapsto \texttt{"b"}, \ldots,\ \texttt{"aa"} \mapsto \texttt{"aa"},\ \texttt{"ab"} \mapsto \texttt{"ab"}, \ldots\}$   (4.4.4)
- ...

Type inference

3 Frames: Type inference; Type inference does not always produce the desired result; Polymorphic type inference

439. Type inference

The type of an entity is inferred, rather than explicitly stated.

Pascal. Constant definition:

    CONST pi = ...;

1. The literal on the right-hand side is of type Real.
2. Therefore, pi is of type Real.

ML [39]. Function definition:

    fun even(n) = (n mod 2 = 0)

1. mod is of type int × int → int;
2. Since n occurs in n mod 2, n is of type int.
3. The type of operator = is σ × σ → bool for all σ;
4. Both n mod 2 and 0 are of type int, so = is used here with σ = int.
5. Therefore, the type of n mod 2 = 0 is bool.
6. It follows that the type of even is int → bool.

440. Type inference does not always produce the desired result

Define a max function in ML. Since ML does not allow programmer-defined overloading, we can only have one version of function max:

    - fun max(x,y) = if x > y then x else y;
    val max = fn : int * int -> int

But we want max to operate on reals. Argument type declaration:

    - fun max(x:real,y:real) = if x > y then x else y;
    val max = fn : real * real -> real

Figure 4.4.1: The ancient, obsolete, annoying, yet relatively easy to explain memory model of the 8086

441. Polymorphic type inference

- Type inference might yield a monotype, as for the function even.
- Type inference might yield a polytype:

    fun id(x) = x

  The type of id is σ → σ.

    fun op o (f, g) = fn (x) => f (g (x))

  We can see from the way they are used that f and g are functions. The result type of g must be the same as the argument type of f. Thus, the type of o can be inferred:

  $o : (\beta \to \gamma) \times (\alpha \to \beta) \to (\alpha \to \gamma)$   (4.4.5)

Checking parameters with parametric polymorphism

6 Frames: Checking type parameters: in C++; Checking type parameters in ML; Polymorphic functions: C++ vs. ML; Parametric polymorphism: ML vs. Ada vs. C++; Const exercises; Polytypes in Ada: generics

442. Checking type parameters: in C++

Templates are checked when they are instantiated, not when they are defined.

Instantiation of a C++ function template:

    template <typename T> // a "function template"
    const T& max(const T &a, const T &b) { return a > b ? a : b; }

    int a = max(2/3, 3/2);     // a "template function"
    double d = max(2.3, 3.2);  // another template function
    // And, a third template function
    struct S { } s1, s2, s3 = max(s1, s2);

    gcc max.c
    max.c: In instantiation of 'const T& max(const T&, const T&) [with T = S]':
    max.c:7:25: required from here
    max.c:3:14: error: no match for 'operator>' (operand types are 'const S' and 'const S')
       return a > b ? a : b;
                ^
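The compiler error above disappears once the instantiating type supplies a greater-than operator. The sketch below illustrates this under stated assumptions: the template is renamed max_of (to keep it apart from std::max), and the field key, the operator> definition, and the helper larger_key are hypothetical additions, not taken from the slides.

```cpp
// Same body as the function template in the frame, under a different name.
template <typename T>
const T &max_of(const T &a, const T &b) { return a > b ? a : b; }

// Hypothetical struct with one field. The template places an implicit
// restriction on T: it must support operator>.
struct S {
  int key;
};

// Supplying operator> satisfies that restriction, so max_of<S> can be
// instantiated; without this definition the call below would not compile.
bool operator>(const S &a, const S &b) { return a.key > b.key; }

// Instantiates max_of with T = S and returns the larger key.
int larger_key(int x, int y) {
  S s1{x}, s2{y};
  return max_of(s1, s2).key;
}
```

The restriction on T is never written down; it is discovered only when max_of is instantiated with a particular type, which is the point of the frame.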

443. Checking type parameters in ML

Polymorphic functions are checked when they are defined, not when they are used.

    Standard ML of New Jersey v [built: Thu May 9 05:41: ]
    - fun max(a:'T, b:'T): 'T = if a > b then a else b;
    stdin: Error: operator and operand don't agree [UBOUND match]
      operator domain: 'Z * 'Z
      operand:         'T * 'T
      in expression:
        a > b

It is not possible to define a polymorphic max function, since most types do not have a greater-than operator, and the language does not offer overloading.

Polymorphic functions: C++ vs. ML

                                      ML               C++
    Declaration of type parameters    Optional         Obligatory
    Passing type arguments            Optional         No
    Checking                          On declaration   On instantiation

Table 4.4.2: Polymorphic functions: C++ vs. ML

Notes:
- ML can make a more sophisticated type inference than C++. In fact, ML can make deductions based on a function's return type.
- Overloading complicates type inference.
- For that reason, ML does not allow programmer-defined overloading.
- And, for that reason, ML ignores its own built-in overloading when conducting type inference.

445. Parametric polymorphism: ML vs. Ada vs. C++

ML
- Elegant syntax
- Type inference
- Checking at definition time
- Implicit instantiation
- Limited power, since no restrictions on type parameter

Ada
- Verbose and readable, but heavy syntax
- No type inference
- Checking at definition time
- Explicit instantiation
- Explicit restrictions on type parameter

C++
- Ugly, kludgy, and unreadable syntax
- Type inference on invocation
- Checking upon instantiation
- Implicit instantiation [40]
- Implicit restrictions on type parameter [41]

[40] Of function templates; explicit function template instantiation is possible.
[41] Recent versions of C++ allow an explicit list of constraints on type parameters.

446. Const exercises

Given are the following definitions:

    typedef char* t1;
    typedef char* const t2;
    typedef const char* t3;
    typedef const char* const t4;
    t1 c1; t2 c2; t3 c3; t4 c4;

Determine, for all i, j, k, which of the following commands will legally compile:

    c_i = c_j;
    c_i = const_cast<t_j>(c_k);
    *c_i = *c_j;
    *const_cast<t_i>(c_j) = *c_k;

447. Polytypes in Ada: generics

    generic(type ElementType) module Stack;
      export Push, Pop, Empty, StackType, MaxStackSize;
      constant MaxStackSize = 10;
      type private StackType = record
        Size: 0..MaxStackSize := 0;
        Data: array 1..MaxStackSize of ElementType;
      end;
      procedure Push(reference ThisStack: StackType;
                     readonly What: ElementType);
      procedure Pop(reference ThisStack): ElementType;
      procedure Empty(readonly ThisStack): Boolean;
    end; -- Stack

    module IntegerStack = Stack(integer);

Case studies

5 Frames: Case study: universal pointer in C; Case study: casting in C++; Const exercises; Parametric polymorphism on enumerated types in Pascal; Responses to inflexibility in Java

448. Case study: universal pointer in C

Universal pointer type: in C, a void* pointer can be assigned to any pointer, and any pointer can be assigned to void*.

    extern void* malloc(size_t);
    extern void free(void*);

    void foo(size_t n) {
      long *buff = malloc(n * sizeof(long));
      ...
      free(buff);
    }

Parametric polymorphism: in C, the coercion from long* to void* and vice versa is not ad hoc:
- It universally exists for all pointer types
- The actions performed are the same for all pointer types
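The universal pointer also exists in C++, with one difference worth seeing in code: the coercion to void* remains implicit, but the way back requires an explicit cast. The function below is a minimal sketch under that observation; its name sum_of_squares and the choice of filling the buffer with squares are assumptions made purely for illustration.

```cpp
#include <cstdlib>

// Allocates room for n longs through the universal pointer type, fills the
// buffer with squares, sums them, and releases the buffer again.
// Returns -1 if the allocation fails.
long sum_of_squares(std::size_t n) {
  void *raw = std::malloc(n * sizeof(long));  // malloc hands back a void*
  if (raw == nullptr) return -1;
  long *buff = static_cast<long *>(raw);      // void* -> long*: explicit in C++
  long sum = 0;
  for (std::size_t i = 0; i < n; i++) {
    buff[i] = static_cast<long>(i) * static_cast<long>(i);
    sum += buff[i];
  }
  std::free(buff);                            // long* -> void*: still implicit
  return sum;
}
```

For n = 4 the buffer holds 0, 1, 4, 9. The same two conversions would be performed, unchanged, for any other pointer type, which is what makes the coercion universal rather than ad hoc.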

449. Case study: casting in C++

C++ deprecates C-style casts; instead there are four cast operations:
- const_cast<σ> takes a type σ and returns a cast operator from any type σ′ to σ, provided only that σ can be obtained from σ′ just by adding const
- reinterpret_cast<σ> takes a type σ and returns a cast operator from any type σ′ to σ (useful for peeping into bit representations)
- static_cast<σ> takes a type σ and returns a cast operator from any type σ′, provided this is a standard casting (e.g., double to int)
- dynamic_cast<σ> takes a type σ of a derived class and returns a cast operator from any type σ′ of its base classes into σ

Const exercises

Given are the following definitions:

    typedef char* t1;
    typedef char* const t2;
    typedef const char* t3;
    typedef const char* const t4;
    t1 c1; t2 c2; t3 c3; t4 c4;

Determine, for all i, j, k, which of the following commands will legally compile:

    c_i = c_j;
    c_i = const_cast<t_j>(c_k);
    *c_i = *c_j;
    *const_cast<t_i>(c_j) = *c_k;

451. Parametric polymorphism on enumerated types in Pascal

Nonsense code to demonstrate Pascal's built-in parametric polymorphism:

    for m := January to December do
      for d := Saturday downto Sunday do
        case suit of
          Club, Heart: suit := succ(suit);
          Diamond, Spade:
            if suit < Heart then
              if ord(m) < ord(d) then
                suit := pred(suit);
        end;

Polymorphic over all enumerated types:
- control structures (up and down for loops, and case)
- relational operators
- ord, succ and pred functions

452. Responses to inflexibility in Java

Java = C++ minus all complexities
- Originally, dynamic typing: Comparator.compare(Object, Object)
- Now, polymorphic types: Comparator<T>.compare(T, T)
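The four cast operators from the casting frame can each be exercised in a few lines. This is a minimal sketch: the classes Base and Derived and all four function names are hypothetical, introduced only to give each cast something to act on.

```cpp
// Hypothetical class hierarchy; the virtual destructor makes Base
// polymorphic, which dynamic_cast requires.
struct Base { virtual ~Base() {} };
struct Derived : Base { int tag = 42; };

// static_cast: a standard conversion, here double to int (truncation).
int to_int(double d) { return static_cast<int>(d); }

// const_cast: changes only constness. Writing through the result is valid
// only when the object pointed to was not itself declared const.
void set_through_const(const int *p, int v) { *const_cast<int *>(p) = v; }

// dynamic_cast: checked conversion from a base class pointer to a derived
// class pointer; yields a null pointer if the object is not a Derived.
int tag_of(Base *b) {
  Derived *d = dynamic_cast<Derived *>(b);
  return d ? d->tag : -1;
}

// reinterpret_cast: peeps into the representation, here at the first byte
// of an int.
unsigned char first_byte(const int &x) {
  return *reinterpret_cast<const unsigned char *>(&x);
}
```

Only dynamic_cast performs a run-time check; the other three are resolved entirely at compile time.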

5 Storage

Contents [140 frames]
5.1 Storage models [21 frames]
    Utopic perspective of memory [14 frames]
    Real-world memory models [2 frames]
    Classical storage model [2 frames]
    References
Arrays [24 frames]
    Varieties of arrays [12 frames]
    Arrays with integral index types [7 frames]
    Type of arrays [3 frames]
    References
    Exercises
Variables life time [35 frames]
    Simple lifetime [6 frames]
    Storage class [10 frames]
    The heap [9 frames]
    Dangling references [6 frames]
    Heap errors [4 frames]
Value vs. reference semantics [27 frames]
    Shared representation & lazy copy [10 frames]
    Value vs. reference semantics in various PLs [7 frames]
    References
    Exercises
Automatic memory management [16 frames]
    References
Run time type information [16 frames]
    References

Storage: a visual mindmap

[Figure 5.0.2: Storage: a visual mindmap. Recoverable nodes: Variables, Lifetime, Arrays, Storage Models, Value vs. Reference Semantics, Memory Management, Functional Abstractions, Run-Time Type Information]

5.1 Storage models

Contents [21 frames]
    Utopic perspective of memory [14 frames]
    Real-world memory models [2 frames]
    Classical storage model [2 frames]
    References

454. Storage models: a visual mindmap

[Figure 5.1.1: Storage models: a visual mindmap]

455. Storage & the need for variables

Functional paradigm
- No variables
- Values which might be named

Logic programming paradigm
- Values which might be named
- Math-like variables, denoting something which is yet undetermined
- Once determined, the value does not change

Imperative paradigm
- Variables denote values that might change
- Closer to the machine
- Useful for modeling real-life quantities, e.g., a person's weight

456. Variables

Definition 5.1 (Variable). An entity that may contain a value and provides inspect and update operations on its content.

Realized by a storage medium, e.g.,
- memory
- disk

Very different from mathematical variables:
- Mathematical variables: fixed but possibly unknown values
- Logic programming variables: just like mathematical variables
- Imperative variables:
  - may change over time: n := n+1;
  - may not have a value at all

5.1.1 Utopic perspective of memory

14 Frames: Store, cells & values; Confusing terminology: variables vs. cells?; Life cycle of a cell in the store; Cells & references to cells in ML; Retrieving cell contents in ML; Mutable cells in ML; More on mutable cells in ML; Left vs. right occurrence of variable; L-Value vs. R-value of an expression; L-values vs. R-values in ML; More on L-values in C; Composite variables; Selective/total inspect/update; Total or selective update

457. Store, cells & values

- Store is another name for memory, which has unboundedly many cells.
- Some cells are persistent, while others are ephemeral.
- Some cells are "free"; others are "allocated".
- Allocated cells could be "uninitialized"; initialized cells may contain integers, or strings, or any arbitrary value.
- Storage = collection of cells which may be in a variety of states.

[Figure 5.1.2: Store, cells & values]

[Figure 5.1.3: Legend of the preceding figure. Entries: unspecified thingy; persistent cell; ephemeral cell; free cell; allocated cell; uninitialized cell; cell with integer value; cell with string value; cell with composite value]

Confusing terminology: variables vs. cells?

- Strictly speaking, an allocated cell is the implementation of a variable
- But, the terms are often used interchangeably
- Usually, when one says variable, one means a named variable
- But, there are anonymous variables
- Or perhaps, we should reserve the term cell for anonymous variables
- Do whatever you please; in this course, the terms are synonymous

459. Life cycle of a cell in the store

    foo() {
      int n;
      n = 6;
      cout << n;
      n = 28;
      cout << n;
    }

[Figure 5.1.4: Life cycle of a cell in the store. States and transitions: free, allocate, uninitialized, initialize, defined, update, inspect, deallocate; inspecting an uninitialized cell is undefined behavior]

- Our cell, just as all other cells, is born free
- Let's define a C++ function, and, at the same time, execute it
- The function allocates a variable
- And then initializes it
- It then inspects the variable
- The inspection of an uninitialized variable is undefined
- Our function then updates the variable and inspects it
- The variable is deallocated when the function returns

460. Cells & references to cells in ML

Allocate a cell of type int; store 3 in it; and let r be a reference to this cell, i.e., r is a name of the value of the reference to this cell:

    - val r = ref 3;
    val r = ref 3 : int ref

Let s be another name for the value representing a reference to the same cell:

    - val s = r;
    val s = ref 3 : int ref

[Figure 5.1.5: A cell and names of two references to it in ML. The names r and s are immutable; the cell holding 3 is mutable.]
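A rough C++ analogue of the ML session above may help to separate the cell from the names of references to it. This is a sketch only: std::shared_ptr stands in for an ML ref, and the function names same_cell and write_through_s are assumptions introduced for this illustration.

```cpp
#include <memory>

// val r = ref 3; val s = r;
// r and s are two names for references to one and the same cell.
bool same_cell() {
  std::shared_ptr<int> r = std::make_shared<int>(3);  // allocate a cell holding 3
  std::shared_ptr<int> s = r;                         // copies the reference, not the cell
  return r.get() == s.get();                          // both denote the same cell
}

// Updating the cell through s is observable through r,
// because there is only one cell.
int write_through_s() {
  std::shared_ptr<int> r = std::make_shared<int>(3);
  std::shared_ptr<int> s = r;
  *s = *r + 4;   // inspect through r, update through s
  return *r;     // inspect through r again
}
```

Here *r plays the role of ML's !r; the analogy is loose, since C++ would also let the names r and s themselves be reassigned, whereas ML names are immutable.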

Values vs. cells: r and s are not cells; they are merely names of values.

Retrieving cell contents in ML

Get 3, the contents of this cell:

    - !r;
    val it = 3 : int

!s is the same as !r:

    - !s;
    val it = 3 : int

Use the cell's contents to create a new named value:

    - val x = !r + 2;
    val x = 5 : int

[Figure 5.1.6: Retrieving cell contents in ML]

New name of a value: the newly created name, x, just as r and s, is not a cell; it is a name of a value.

Mutable cells in ML

The ML memory model is close to the utopic.

Change the value stored in this cell:

    - r := 2;
    val it = () : unit

Get this value:

    - !r;
    val it = 2 : int

Get the same value:

    - !s;
    val it = 2 : int

[Figure 5.1.7: Mutating a cell contents in ML]

463. More on mutable cells in ML

Change the contents of this cell:

    - s := !r + 4;
    val it = () : unit

Now get the content of the cell referenced by r:

    - !r;
    val it = 6 : int

[Figure 5.1.8: Reading & mutating a cell contents in ML]

464. Left vs. right occurrence of variable

What's the difference between the two occurrences of v?

    v ← φ(v, e_1, e_2, ..., e_n)

Left-hand side:

    a[3*a[i*2] - 2*a[i*3]] := 0

1. Evaluate v (even in a very basic PL, v may be the result of an expression)
2. Treat the result as a reference to a cell
3. Use this reference as the value of v
4. Get ready to assign something to that cell

Right-hand side:

    t := a[3*a[i*2] - 2*a[i*3]]

1. Evaluate v
2. Treat the result as a reference to a cell
3. Retrieve the contents of that cell
4. Use this contents as the value of v
5. You can forget about the cell now
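The two lists of steps above can be followed on the very expression used in the frame. The sketch below is an illustration under assumptions: the function name read_then_clear and the use of std::vector are choices made here, not part of the slides.

```cpp
#include <vector>

// The same index expression occurs once on the right-hand side and once
// on the left-hand side of an assignment.
int read_then_clear(std::vector<int> &a, int i) {
  // Right occurrence: evaluate the expression, treat the result as a
  // reference to a cell, retrieve the contents, then forget the cell.
  int t = a[3 * a[i * 2] - 2 * a[i * 3]];

  // Left occurrence: evaluate the same expression, treat the result as a
  // reference to a cell, and assign to that cell.
  a[3 * a[i * 2] - 2 * a[i * 3]] = 0;

  return t;
}
```

With a = {5, 6, 1, 1} and i = 1, the index expression evaluates to 3*1 - 2*1 = 1, so the call returns the old contents of a[1], namely 6, and leaves a[1] equal to 0.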

465. L-Value vs. R-value of an expression

C, C++, Pascal, and most other PLs make a distinction between the L-value and the R-value of an expression:
- All expressions have an R-value
- Only particular expressions have an L-value
- The distinction between the two is determined by context

C, and more so C++, has fairly sophisticated L-values:

    while (*s++ = *t++) ;

The expression *s++ has an L-value!

    int& min(int &x, int &y) { return x < y ? x : y; }
    min(a, b) = min(c, d);

Function min returns an expression which has an L-value (as well as an R-value).

L-values vs. R-values in ML

The distinction between L-values and R-values in ML is simple.

If a is a name of a value, created by

    val a = 19;

then:
- a does not designate a memory cell
- a is just a name of a value
- a cannot be changed
- a can only
  - go out of scope (in an enclosing context)
  - be hidden (in an enclosed context)

If r is a reference to a cell, created by

    val r = ref 17;

then:
- !r is the contents of this cell (R-value); it can be used, e.g., by

    val x = !r;

- r is the reference to this cell (L-value); it can be used, e.g., by

    r := x + 3;

467. More on L-values in C

Not every value is an L-value:

    0 = 1; // not an L-value

Not every L-value is modifiable:

    const int i = 0;
    i = 1; // unmodifiable L-value

There is an implicit conversion from L-value to R-value:

    int max(int &x, int &y) {
      return x + y - min(x, y); // implicit conversion
    }
    max(a, b) = max(c, d); // not an L-value

C's address-taking operator, &, is applicable to, and only to, L-values:

    &1;         // not an L-value
    &i;         // an (unmodifiable) L-value
    &max(a,b);  // not an L-value
    &min(a,b);  // is an L-value!

468. Composite variables

Normally, a variable of type T is structured like a value of type T. Oddballs exist, e.g., packed arrays in Pascal, which cannot be accessed before the array is unpacked.

A record variable is a tuple of variables:

    TYPE Date = Record
      m: Month;
      d: 1..31;
      y: Integer
    end;
    VAR today: Date;

- today: access the entire value stored in this variable
- today.d: access a component of the value stored in this variable

469. Selective/total inspect/update

Composite value: has subcomponent values, which may be inspected selectively.

Composite variable: has subcomponent variables. These may be inspected and (sometimes) updated separately.
- It is always possible to make a selective inspection, since once the value in a variable is inspected, you can selectively inspect each component.
- Normally, selective update is also possible (update a single field of a record).
- In some cases, in some languages, only total updates are possible (update all fields, or none).

Total or selective update

Composite variables can be inspected and updated in total or selectively:

    struct Complex { double x, y; } a, b;
    a = b;                 // Total update (and total inspect)
    double z = b.y * a.x;  // Selective inspections
    a.x = z;               // Selective update

- Atomic variable: single cell
- Composite variable: nested cells

Real-world memory models

2 Frames: Memory structure: the 8086 architecture; Extended vs. expanded memory in the ancient 8086 architecture

471. Memory structure: the 8086 architecture

[Figure 5.1.9: Address computation in the 8086 hardware architecture. A 16-bit segment register (CS, DS, SS, ES) and a 16-bit offset register (AX, BX, CX, DX, SP, BP, SI, DI) are combined into a 20-bit address.]

Segment registers:
- DS: Data Segment
- CS: Code Segment
- SS: Stack Segment
- ES: Extra Segment

Offset registers:
- AX, BX, CX, DX: General Purpose
- SP: Stack Pointer
- BP: Base Pointer
- SI, DI: Offset Registers

472. Extended vs. expanded memory in the ancient 8086 architecture

[Figure: Extended vs. expanded memory in the ancient 8086 architecture. Conventional Memory: 0KB to 640KB; UMA: 640KB to 1024KB = 1MB; HMA: 1MB to 1MB + 64KB (addressable through hardware bug); Extended Memory: up to 16MB/4GB (addressable in protected mode). The range up to 1MB is addressable in real mode.]

- Conventional Memory. Accessible to software.
- Upper Memory Area. Accessible to software, but reserved for screen and other I/O memory map.
- High Memory Area. Accessible to software, but only on certain architecture variants.
- Extended Memory. To access, need to switch to protected mode, copy to UMA, and revert to real mode.
- Expanded Memory. Early, less elegant, but more popular version of extended memory (not shown on diagram).

Classical storage model

2 Frames: Classical model of memory; C programs & the classical memory model

473. Classical model of memory

[Figure: Segments in the classical memory map. The virtual address space consists of the segments Zero, Code, Constants, Data, Heap, and Stack, with unmapped regions; the break marks the end of the heap and the stack pointer marks the end of the stack.]

Table 5.1.1: Permissions (Read, Write, Execute) of the memory segments Zero, Code, Constants, Data, Heap, Unallocated, and Stack in an idealized storage model

474. C programs & the classical memory model

[Figure: C programs & the classical memory model. Virtual address space: Zero, Code, Constants, Data, Heap, Stack]

Where does each identifier in the following program reside in the classical memory map? Are there any identifiers which are not mapped to memory? The program has two nameless entities which are still found in the above map. Which are they? Where do they reside?

    #include <stdio.h>
    #include <stdlib.h>

    long fib(int n) {
      static int N;
      auto long r = (n <= 1 ? 1 : fib(n-1) + fib(n-2));
      printf("call #%d\n", ++N);
      return r;
    }

    enum { N = 20 };
    long *r;

    int main() {
      int i;
      r = malloc(N * sizeof(long));
      for (i = 0; i < N; i++)
        r[i] = fib(i);
      return r[N-1] + r[N-3];
    }

References
Call Stack
Local Variables

Stack-based Memory Allocation
Static Local Variables
Variable

5.2 Arrays

Contents [24 frames]
    Varieties of arrays [12 frames]
    Arrays with integral index types [7 frames]
    Type of arrays [3 frames]
    References
    Exercises

Array variables

Pascal:

    VAR holidays: Array[1..30] of Date;

Definition 5.2 (Array values). An array value is a mapping from a set of indices to a set of values.

Definition 5.3 (Array variables). An array variable is a realization of an array value using variables, so that the image of each index may be changed at runtime.

    Fortran   Integral   Exponentiation
    C/Java    Integral   Exponentiation [42]
    Pascal               Map

[42] With slight variation due to the fact that the first index of arrays is 0 rather than 1.
[43] Recall that subrange is a type constructor in Pascal.

Why only now?

- Array values are not very useful [44]
- But array variables become very useful:
  - Efficient mapping into memory with the classical storage models
  - Foundation for many algorithms
  - Foundation for many data structures

[44] Did you see any arrays in ML?

Varieties of arrays

12 Frames: Variety I/V: static arrays; Variety II/V: Stack based arrays; Variety III/V: Dynamic arrays; Variety IV/V: flexible arrays; Variety V/V: associative arrays; The unbelievable power of associative arrays; The unbelievable power of associative arrays; Summary: determining the index set; Arrays efficiency; Sophisticated data structures as part of PLs?; The sad story of Pascal's sets; Dilemmas in language design

477. Variety I/V: static arrays

    const char* days[] = {
      "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    }; // An array literal
    int main() { ... }

- size determined at compile time
- allocated on the data segment

Variety II/V: Stack based arrays

    void filecopy(FILE *from, FILE *to) {
      char buffer[1 << 12];
      ...
    }

    void printprimes(int n) {
      unsigned char sieve[n];
      ...
    }

- Size determined at runtime
- Size cannot change after creation
- Allocated on the stack
- The only kind of arrays in Pascal
- Early versions of C required the size to be a compile-time constant
- Runtime sizes were added after noticing that they do not violate the "no hidden costs" principle:
  - Creation is by mere subtraction of a value from the stack pointer
  - Time to create is O(1)
  - Size can be negative, but C programmers are accountable and responsible

479. Variety III/V: Dynamic arrays

    int *printprimes(int n) {
      unsigned char sieve[n];
      ...
      int *r = malloc(sum(sieve) * sizeof(int));
      ...
      return r;
    }

- size determined at runtime
- size cannot change after creation
- allocated on the heap segment

480. Variety IV/V: flexible arrays

Perl:

    = 1..6;      # uninitialized; size
    = (1,2,3);   # initialized; size
    = 17;        # size is now
    = 13;        # size is now 17
                 # size is now 13
                 # size is now 3

- size may change at runtime
- size may change after creation
- array may expand or shrink
- found, e.g., in Perl
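The defining property of a flexible array, a size that changes after creation, can be sketched in C++ with std::vector. This is an illustration only, not a translation of the Perl listing: the function name flexible_sizes and the particular sizes used are assumptions chosen for this example.

```cpp
#include <cstddef>
#include <vector>

// Records the size of a flexible array after each change of its bounds.
std::vector<std::size_t> flexible_sizes() {
  std::vector<std::size_t> sizes;

  std::vector<int> a = {1, 2, 3};  // initialized with three elements
  sizes.push_back(a.size());

  a.resize(17);                    // expand; the new cells are zero-initialized
  sizes.push_back(a.size());

  a.resize(13);                    // shrink; the last four cells are discarded
  sizes.push_back(a.size());

  a.push_back(42);                 // grow by a single cell
  sizes.push_back(a.size());

  return sizes;
}
```

The recorded sizes are 3, 17, 13 and 14. Behind the scenes, std::vector lives on the heap and may relocate its cells when it grows, which is why flexible arrays need more machinery than the static, stack-based, and dynamic varieties.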

481. Variety V/V: associative arrays

PHP:

    $wives["adam"] = "Eve";
    $wives["lamech"] = "Adah and Zillah";
    $wives["abraham"] = "Sarah";
    $wives["isaac"] = "Rebecca";
    $wives["jacob"] = "Leah and Rachel";
    ...
    echo $patriarch;
    echo $wives[$patriarch];

- index can be anything, typically strings
- common in scripting PLs, e.g., AWK, JavaScript, PHP
- typically implemented as a hash table

482. The unbelievable power of associative arrays

Using AWK to compute the frequency of words in the input stream:

    #!/usr/bin/awk -f
    { for (i = 1; i <= NF; i++) a[$i]++; }
    END {
      for (w in a)
        if (a[w] in b)
          b[a[w]] = b[a[w]] ", " w;
        else {
          b[a[w]] = w;
          if (max < a[w]) max = a[w]
        }
      for (; max > 0; max--)
        if (b[max] != "") print max, b[max];
    }

Explanation follows.

483. The unbelievable power of associative arrays

Computing word frequencies in AWK. AWK's implicit loop reads lines in turn, breaking each line into space-separated fields.

    #!/usr/bin/awk -f
    # implicitly executed
    # for each input line
    {
      for (i = 1; i <= NF; i++)
        a[$i]++;  # optional semicolon (;)
      # The $ character is special:
      # variable $i is the i-th
      # word in the current line
    }
    END { # after last line read
      # Accumulate in b[i] all words
      # that occur i times
      max = 0; # not really necessary
      for (w in a) {
        if (!(a[w] in b)) {
          b[a[w]] = w
          if (max < a[w]) max = a[w]
        } else
          b[a[w]] = b[a[w]] ", " w;
      }
      # Print array b in descending order
      for (; max > 0; max--)
        if (b[max] != "") print max, b[max];
    }

484. Summary: determining the index set

When is the index set determined?
- Static arrays: fixed at compile time.
- Dynamic arrays: on creation of the array variable.
- Stack-based arrays: on creation of the array variable.
- Flexible arrays: not fixed; bounds may change whenever the index is changed.
- Associative arrays: no bounds for the set of indices; the set changes dynamically as entries are added to or removed from the array.

Arrays efficiency

Static, stack-based, and dynamic arrays: efficient implementation in the classical memory model,
- including range-based arrays, as in Pascal
- including true multi-dimensional arrays, as in Fortran
- including arrays of arrays, as in C

Flexible and associative arrays: require a more sophisticated data structure to map to the classical memory model.

Sophisticated data structures as part of PLs?

- Associative arrays are great!
- We want more: sets! multi-sets!! stacks and queues and trees!!!

487. The sad story of Pascal's sets

- simple implementation
- efficient implementation
- does not scale
- with scale, you need to carefully balance:
  - operations repertoire
  - time
  - memory
  - parallelization
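The AWK word-frequency program from the associative-array frames has a direct counterpart in a language where the associative array is a library type rather than a built-in. The sketch below uses std::map, which is a balanced tree rather than a hash table (std::unordered_map would be the hash-based choice); the function names count_words and group_by_count are assumptions made for this illustration.

```cpp
#include <map>
#include <sstream>
#include <string>

// a[w] is the number of occurrences of word w, as in the AWK array a.
std::map<std::string, int> count_words(const std::string &text) {
  std::map<std::string, int> a;
  std::istringstream in(text);
  std::string w;
  while (in >> w)  // split on whitespace, like AWK's fields
    a[w]++;        // operator[] creates a zero entry on first use
  return a;
}

// b[i] lists, comma separated, all words that occur i times,
// as in the AWK array b.
std::map<int, std::string> group_by_count(const std::map<std::string, int> &a) {
  std::map<int, std::string> b;
  for (const auto &entry : a) {
    std::string &words = b[entry.second];
    if (!words.empty()) words += ", ";
    words += entry.first;
  }
  return b;
}
```

For the input "to be or not to be", count_words maps both "to" and "be" to 2, and group_by_count collects them under the key 2 as "be, to", since std::map visits its keys in sorted order.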

488. Dilemmas in language design

- Which, if any, sophisticated data structures should be part of the PL?
- Which, if any, sophisticated data structures should be part of the library?
- Would it be possible to implement sophisticated data structures as part of the library?
- What PL structures can support the making of a better standard library of good data structures?

Footnote: yes, the logic here is a bit confusing. Think about it this way: if you give the library designer better PL tools, he will be able to design a better data-structures library. Perfection of, and extensions to, the protocol of the standard library would not require any changes to the PL.

Arrays with integral index types

7 Frames: Efficient but inflexible; Piddles; Layout of multidimensional arrays; Row-major layout of 2D arrays (e.g., Pascal); Column-major layout of 2D arrays (e.g., Fortran); Multiple dereferencing layout of 2D arrays; Example: triangular array in Java

489. Efficient but inflexible

Ordinary arrays are formed as mappings from integral types.

Pros:
- Only values are stored, not indices.
- Simple description of legal indices (defined completely by the higher bound, and in some PLs by the lower bound as well).
- Efficient access using simple addition:
  - Explicit in C and C++, where pointer arithmetic is explicit:
    a[i] ≡ *(a+i) ≡ *(i+a) ≡ i[a]   (5.2.1)
  - Implicit in, e.g., Java: array access is translated to simple machine instructions
  - Range mapping in, e.g., Pascal: array access may require subtraction of the first index to compute the actual offset

Cons:
- When data are sparse, packing techniques are needed.
- Inflexible programming.

Piddles

What are piddles? (Quotes from the Perl manual)
- Having no good term to describe their object, PDL developers coined the term piddle to give a name to their data type
- A piddle consists of a series of numbers organized as an N-dimensional data set
- Perl has a general-purpose array object that can hold any type of element
- Perl arrays allow you to create powerful data structures, but they are not designed for numerical work. For that, use piddles

491. Layout of multi-dimensional arrays

Two main strategies:
- Multi-layered memory mapping:
  1. row-major
  2. column-major
- Multiple dereferencing

492. Row-major layout of 2D arrays (e.g., Pascal)

The offset of $A_{i,j}$, where $A$ is an $n \times m$ matrix, is given by:

$\mathrm{offset}(A_{i,j}) = (i-1)\,m + (j-1)$   (5.2.2)

[Figure 5.2.1: Row-major layout of 2D arrays (e.g., Pascal). A 4 × 4 matrix A is stored in the order A1,1 A1,2 A1,3 A1,4 A2,1 A2,2 A2,3 A2,4 A3,1 A3,2 A3,3 A3,4 A4,1 A4,2 A4,3 A4,4.]

493. Column-major layout of 2D arrays (e.g., Fortran)

The offset of $A_{i,j}$, where $A$ is an $n \times m$ matrix, is given by:

$\mathrm{offset}(A_{i,j}) = (j-1)\,n + (i-1)$   (5.2.3)

[Figure 5.2.2: Column-major layout of 2D arrays (e.g., Fortran). A 4 × 4 matrix A is stored in the order A1,1 A2,1 A3,1 A4,1 A1,2 A2,2 A3,2 A4,2 A1,3 A2,3 A3,3 A4,3 A1,4 A2,4 A3,4 A4,4, at offsets 0 through 15.]
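Equations (5.2.2) and (5.2.3) translate directly into code. The two functions below are a minimal sketch with hypothetical names; as in the equations, indices are 1-based, n is the number of rows, and m is the number of columns.

```cpp
#include <cstddef>

// Equation (5.2.2): offset of A[i][j] in row-major order, for a matrix
// with m columns. Rows are stored one after the other.
std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t m) {
  return (i - 1) * m + (j - 1);
}

// Equation (5.2.3): offset of A[i][j] in column-major order, for a matrix
// with n rows. Columns are stored one after the other.
std::size_t column_major_offset(std::size_t i, std::size_t j, std::size_t n) {
  return (j - 1) * n + (i - 1);
}
```

For the 4 × 4 matrix of the figures, the element in row 2, column 3 sits at offset 6 in row-major order and at offset 9 in column-major order; the first and last elements sit at offsets 0 and 15 in both layouts.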

494. Multiple dereferencing layout of 2D arrays

The address of $A_{i,j}$, where $A$ is a matrix, is given by:

$\mathrm{address}(A_{i,j}) = \mathrm{dereference}(\mathrm{address}(A) + i - 1) + (j - 1)$   (5.2.4)

[Figure 5.2.3: Multiple dereferencing layout of 2D arrays. A 4 × 4 matrix A is represented by an array a of references a[0], a[1], a[2], a[3], each leading to a separately stored row such as a[0][0] a[0][1] a[0][2] a[0][3]; not all rows need to be fully present.]

In C and Java, a 2D array is an array of arrays. Each row:
- may be null
- may be of any length
- even length 0 is OK

495. Example: triangular array in Java

    int k = 0;
    int[][] iis = new int[][] {
      new int[k++],
      new int[k++],
      new int[k++],
      new int[k++],
    }; // An array initializer
    for (int i = 0; i < k; i++)
      for (int j = 0; j < i; j++)
        iis[i][j] = i*j;

[Figure 5.2.4: Layout of triangular array with the multiple dereferencing layout]

Type of arrays

3 Frames: Arrays type?; Array types in Java; Array types in Ada

496. Arrays type?

The type of an array of values of type τ (first approximation):
- Integer indexed: Integer → τ
- String indexed: String → τ

But the mapping is only partial; not all possible values of Integer/String indices are mapped into values of type τ.

Fact 5.4 (The array type predicament). To properly define the type of arrays, one needs heavier type theory artillery, which is not really interesting in our course.

Array types in Java

Particularly simple situation:
- The type "array of τ" includes all arrays of τ, regardless of size.
- All these arrays are assignment compatible.

    double[] x, y, z;
    x = new double[100];
    y = new double[0];
    z = x; x = y; y = z;

498. Array types in Ada

    type Vector is array (Integer range <>) of Float;
    ...
    procedure ReadVector(v: out Vector) is ...; -- Uses v'first and v'last
    ...
    m: Integer := ...;
    ...
    a: Vector(1..10);
    b: Vector(0..m);
    ReadVector(a);
    ReadVector(b);
    ...
    a := b; -- Succeeds only if array b has exactly 10 elements.

References
Arrays
Associative Arrays
Automatic Variable
Call Stack
Column-Major Order
Dynamic Arrays
Row Major Order
Stack-based Memory Allocation

Exercises

1. Generalize the offset calculation formula of the layered-memory layout to Pascal arrays.
2. What's the output of the following AWK program?

       for (i in A) print i;

   Why is the only sensible option? Repeat the above for.

3. It is known that a language L uses the layered memory layout and that L does not check array bounds. Write a program in L that prints "row" if the implementation of L on the machine on which the program is run uses the row-major layout, and "column" if it uses the column-major layout.
4. Explain why the multiple-dereferencing layout for representing multi-dimensional arrays is likely to be slower than the multi-layered memory mapping layout.
5. Generalize the offset calculation formula of the layered memory layout to three (four, five, six) dimensions.
6. Propose a method for implementing flexible arrays on the stack.
7. Explain how the multiple-dereferencing layout is applied in 3D arrays.
8. Lady D'Arbanville used to say that stack based arrays make the best of all worlds. Discuss her claim.
9. Explain how knowledge of whether the PL is row-based or column-based plays a role in the design of algorithms for large numerical problems.
10. Explain why the multiple-dereferencing layout for representing multi-dimensional arrays tends to be more memory efficient than multi-layered memory mapping.
11. Explain how it follows from the classical storage model that Pascal functions are forbidden from returning arrays.
12. How does JavaScript support multi-dimensional arrays?
13. Name a data structure with a very efficient array implementation.
14. Demonstrate a case in which the multiple-dereferencing layout for representing multi-dimensional arrays is less efficient in its use of memory than the multi-layered memory mapping layout.
15. How does AWK support multi-dimensional arrays?
16. Would you say that array variables are more useful than array values? Discuss your position.
17. Where are shebangs used in this unit?
18. What are the pros and cons of the modification to the multiple-dereferencing layout in which a reference to a zero length array is used instead of a null reference?
19. Which features does C++ offer in support of a data structures library? and Java? and Pascal? and AWK? and Go?
20. Eitan Ha-Ezrahi claimed that he can extend Java to include associative arrays. Explain why this is not such a good idea.
21. Pascal does not allow overflowing array bounds. Write a program that uses timing to demonstrate that the language uses row-major layout. (Hint: locality of reference)
22. Are the following Pascal types equivalent?
        Array [1..10, 1..10] of Integer;
        Array [1..10] of Array [1..10] of Integer;
23. Daniel Webster claimed that all integer arrays of C++ (Java, Pascal, AWK) are of the same type. Discuss his claim.

5.3 Variables life time

Contents [35 frames]
  Simple lifetime [6 frames]
  Storage class [10 frames]
  The heap [9 frames]
  Dangling references [6 frames]
  Heap errors [4 frames]

Simple lifetime
6 Frames: Variable lifetime · Persistent variable lifetime · Global vs. local lifetime: simplistic approach · More on the simplistic approach · Local & global scopes · Variables in the simplistic approach

499. Variable lifetime

Definition 5.5 (Variable lifetime). The period between allocation of a certain variable and its deallocation is called the lifetime of that variable.

Main varieties [46]:
  Persistent/Permanent: continues after program terminates
  Global/Program activation: while program is running
  Local/Block activation: while declaring block is active
  Heap: from allocation to explicit deallocation
  Garbage collected: from allocation to automatic garbage collection
Lifetime management is important for economic usage of memory.

[46] we will see more in the slides below

500. Persistent variable lifetime

Definition 5.6 (Persistent variables). Variables whose lifetime continues after the program terminates are called persistent variables.

  Rationale: useful for modeling entities such as secondary storage, files, databases, objects found on web services.
  Existence: only a few experimental languages offer transparent persistence.

  Substitute: achieved via I/O operations, e.g., C files: fopen(), fseek(), fread(), fwrite()
  Serialization: as in Java: language/library support for the conversion of an object into a binary image that can be written on disk or sent over a serial communication line; makes it possible to take snapshots of objects, save these, and then restore them.

501. Global vs. local lifetime: simplistic approach

Global lifetime: life of global variables starts at program startup and terminates with the program.
  An external variable in C is a variable defined outside of all functions. All external variables have global lifetime.
  In Pascal, all variables defined with the main program are global.
Local lifetime: a local variable is a variable defined in a function or in a block. It starts its life when the containing block is activated; its life ends when the block is terminated.

The above terminology is inappropriate since the terms suggest scope as well. However,
  there are global variables which are not universally accessible;
  there are local variables whose lifetime is the same (or almost the same) as the entire program.

502. More on the simplistic approach

What's a block?
  Pascal: functions and procedures
  ML: let expressions
  C and C++: functions (but also the { } command constructor)
What's block activation?
  The time interval during which the block is executed.
  The same block may be activated more than once.
  If $d_1$ and $d_2$ are two durations of activation of two blocks (which may or may not be equal), then precisely one of the following holds:
    $d_1 = d_2$,   $d_1 \subset d_2$,   $d_2 \subset d_1$,   $d_1 \cap d_2 = \emptyset$

503. Local & global scopes

Local entity:
  declared in a block
  can be used within the block
  can be used within all nested blocks
Global entity:
  declared in the outermost block
  can be used within all blocks of the program

504. Variables in the simplistic approach

Global variable:
  declared in the outermost block
  lifespan is the same as that of the program
Local variable:
  declared in any other block
  lifespan is the same as block activation
  incarnated each time the block is activated
  may incarnate more than once; the name may in fact stand for different variables
Location of declaration?
  Usually, to make the compiler's job easier, declarations are made at the beginning of the block.
  However, in C++ and Java, declarations can be made anywhere in a block.

Storage class
10 Frames: Terms local & global are confusing · Better terminology · Approximate meaning of C's storage specifiers · static & auto in blocks · extern & register in blocks · Summary: C's storage specifiers in blocks · Examples: C's storage specifiers at the external level · C: access to entities defined in another file · Lifetime of static variables in C++ and Java · C's storage specifiers at the external (file) level

505. Terms local & global are confusing

Better terms are automatic vs. static variables.

Definition 5.7 (Storage class in C/C++). A storage-class specifier in C or C++ is one of the keywords auto, register, static, extern, typedef, or thread_local [47]; it is used mainly for specifying the lifetime of a variable and its scope.

[47] the next few slides will discuss these in greater detail

506. Better terminology

Can be understood in terms of two of these keywords:
  Block activation variable: designated by auto; allocated on the stack
  Program activation variable: designated by static; allocated in the data segment

507. Approximate meaning of C's storage specifiers

  auto: block activation
    block variables with no storage-class specifier default to auto
  register: same as auto, but with a recommendation to place in a register
  static: program activation
  extern: program activation

    but declaration must be done somewhere else
  typedef: empty lifetime
    the variable exists during compilation, as a template for defining other variables
  thread_local: thread lifetime
    not in the scope of this course

508. static & auto in blocks

static in block (C):

    /* In the demo version of the software:
       function undo() can be called only ten times */
    void undo() {
        static counter = 10;
        if (counter == 0) return;   /* all ten calls were used up */
        --counter;
        ...
    }

auto in block (C):

    gcd(int a, int b) {
        while (a != 0) {
            auto int c = a;
            a = b % a;
            b = c;
        }
        return b;
    }

Type of functions with missing type specifier defaults to int.

509. extern & register in blocks

extern in block (C):

    isprime(unsigned n) {
        extern isprimearray[];
        extern isprimearraysize;
        extern isprime(unsigned n);
        return n < isprimearraysize ? isprimearray[n] : isprime(n);
    }

Variables need no type specifier if defined with a storage class; missing type defaults to int.

register in block (C):

    isprime(register unsigned n) {
        register unsigned d;
        for (d = 2; d * d <= n; d++)
            if (n % d == 0) return 0;
        return 1;
    }

510. Summary: C's storage specifiers in blocks

  Specifier   Allowed?   Lifetime
  missing     yes        same as auto
  auto        yes        block activation (a)
  register    yes        same as auto (b)
  static      yes        program activation
  extern      yes        same as static (c)

  (a) rarely used
  (b) but adds a recommendation to the compiler to place in a register
  (c) but must be declared somewhere else

Table 5.3.1: C's storage specifiers in blocks

511. Examples: C's storage specifiers at the external level

File a.c (C):

    auto x;              // not allowed at the external level
    register double y;   // not allowed at the external level
    /* static storage class: */
    static N = 100;              /* Accessible only from this file */
    static void f(void) { ... }  /* Accessible only from this file */
    /* extern storage class: */
    extern M;                    /* Defined in some other file */
    extern void h(void);         /* Defined in some other file */
    extern void r(void) { ... }  //
    /* missing storage class: */
    void g(void) { ... }          /* Accessible from other files */
    int isprimesarray[] = { ... }; /* Accessible from other files */

512. C: access to entities defined in another file

File b.c (C):

    extern N = 100;  [48]
    extern void f(void);  //
    /* referred to from file a.c: */
    int M = 1000;
    /* referred to from file a.c: */
    void h(void) { ... }
    /* Reference to function defined in file a.c: */
    extern void g(void);
    /* Reference to array defined in file a.c: */
    extern isprimesarray[];

513. Lifetime of static variables in C++ and Java

Static variables in C (and PL/I) are used for maintaining state across different activations of a block, regardless of nesting. However, this end is better served with OOP.

C++ (tries to maintain C compatibility)
  block: from first block activation to the program's end
  class: same as file level
  file: from construction, which occurs sometime before main() is called, until program end; all such global variables are constructed in some order, which is only partially specified by the language's standard.

Java (dynamic loading; truly OO)
  block: no static variables in Java's functions or blocks.
  class: from when the class is first used, until program end.
  file: no file level variables in Java.

514. C's storage specifiers at the external (file) level

  Specifier   Allowed?   Lifetime
  missing     yes        same as static [49]
  auto        no
  register    no
  static      yes        program activation
  extern      yes        same as static [50]

Table 5.3.2: C's storage specifiers at the external (file) level

The heap
9 Frames: The heap · Intuition · Motivation · Allocation & deallocation · Linked list with heap variables · Access to heap variables · Many realizations of references · The C++ references vs. pointers confusion · The null pointer

515. The heap

Definition 5.8 (Heap variables). Heap variables are anonymous variables whose lifetime spans
  from the time they are allocated:
    most commonly, directly by the programmer;
    at times, as per the runtime environment of the PL (e.g., closures);
  until they are deallocated:
    directly by the programmer, or,
    by the garbage collecting system.

516. Intuition

Think of the heap as a
  large, but not infinite, bank of memory;
  place from which you can loan storage for variables;
  if loans are not returned, the bank may become bankrupt.
The heap can be
  Built-in: managed by the PL's runtime system (as in Pascal and Java)
  Library based: the library is
    standard (more or less);
    user replaceable (at least by some sophisticated users);
    examples include C, and to some extent, C++.

517. Motivation

    Program garbage; (* Truly useless program *)
    VAR p: ^Integer;
    Begin
      new(p);      (* Allocate a cell *)
      p^ := 5;     (* Set its contents *)
      dispose(p);  (* Deallocate this cell *)
    End.

Why heap variables?
  When the program duration lifetime is inappropriate
  When the contained/disjoint dichotomy of block activation variables is inappropriate
  When memory size is not known in advance
  For realizing data structures such as linked lists, trees, graphs, etc.

518. Allocation & deallocation

Allocation
  C: function malloc() (library function)
  Pascal: procedure new() (pre-defined)
  C++: operator new (builtin; can be overloaded)
  Java: new (keyword)
Deallocation
  C: function free() (library function)
  Pascal: procedure dispose() (pre-defined)
  C++: operator delete (builtin; can be overloaded)
  Java: automatically, by the GC [51]

[51] GC = Garbage Collector

519. Linked list with heap variables

    TYPE
      IntList = ^IntNode;
      IntNode = Record
        head: Integer;
        tail: IntList;
      end;
    VAR odds, primes: IntList;

    Function cons(h: Integer; t: IntList): IntList;
    VAR l: IntList;
    Begin
      new(l);
      l^.head := h;
      l^.tail := t;
      cons := l;
    end;
    ...
    odds := cons(3, cons(5, cons(7, nil)));
    primes := cons(2, odds);
    odds := cons(1, odds);

[Figure: odds and primes each refer to their own first cell (1 and 2, respectively); both cells refer to the same shared tail]
Figure 5.3.1: Lists with shared representation
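The Pascal list of frame 519 translates almost word for word into C, with malloc() taking the place of new(). The following is a minimal sketch under that assumption; `length` is a hypothetical helper added only to make the sharing observable, and allocation failure is handled by a bare assertion.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct IntNode {
    int head;
    struct IntNode *tail;
} IntNode;

/* Allocate an anonymous heap variable holding h, linked in front of list t. */
IntNode *cons(int h, IntNode *t) {
    IntNode *l = malloc(sizeof *l);
    assert(l != NULL);   /* sketch only: real code should handle failure */
    l->head = h;
    l->tail = t;
    return l;
}

/* Number of cells reachable from l. */
int length(const IntNode *l) {
    int n = 0;
    for (; l != NULL; l = l->tail)
        n++;
    return n;
}
```

Building `odds` and `primes` as in the Pascal code gives two lists of four cells each, while only five cells are ever allocated: the cells 3, 5, 7 are shared, so `primes->tail == odds->tail`.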

520. Access to heap variables

Heap variables are anonymous. So, how can they be accessed?

Definition 5.9 (Reference [52], referring, and dereferencing).
  A reference [53] is a value which a program may use to indirectly access a variable (typically a heap variable);
  we say that a reference refers to the variable;
  dereferencing is the action of employing a reference to access the variable it refers to.

References allow modifications that are more radical than selective updating, and cyclic values which are impossible otherwise.

[52] actually, pure reference
[53] not to be confused now with C++ references

521. Many realizations of references

  Address: most commonly, references are nothing but memory addresses, in which case they are called pointers.
  Offset: references may be implemented as offsets from a fixed address.
  Array index: in a language that forbids manipulation of memory addresses, references may be realized as array indices.
  Handle: index into an array which contains the actual pointer.
  Smart pointer: (C++) an abstract data type that extends the notion of pointers, while providing services such as
    computing frequency of use
    reference counting
    lazy copying
    caching
    legality of access checking

522. The C++ references vs. pointers confusion

C++
  Pointer, i.e., pointer variable
    May be 0, or point to a variable
    Can change
    Must be explicitly dereferenced
  Reference, i.e., reference variable
    May not be 0
    Must point to a variable
    Cannot be changed
    No dereferencing prior to use
    Is pure reference
Java (and many other PLs)
  Pointer? no such beast; sometimes used as a synonym for reference
  Reference, i.e., reference variable
    Is disjoint sum of pure-reference and Unit
    May be null, or point to a variable
    Can change
    No dereferencing prior to use
    Is not pure reference

523. The null pointer

Strictly speaking, the pure definition requires that all references provide access to some variable.
Still, it is useful to have references which "refer to nothing"; e.g., for designating the end of a linked list.
It is possible to realize "refer to nothing" as a reference to a special variable.
It is more convenient to allow a special, illegal value of references instead.
This value is known in different languages as null, nil, void, nullptr [54], 0, etc.
In Java and many other languages, references are a disjoint sum of pure references and Unit.
C++'s references are nothing but immutable, pure references.

[54] A new C++ keyword

Dangling references
6 Frames: Dangling references · How are dangling references created? · Language protection against dangling references · Quiz: What's dangling here? · What's the penalty of accessing a dangling reference · Stack corruption via dangling reference into an automatic variable

524. Dangling references

Definition 5.10 (Dangling reference). A dangling reference is a reference to a variable whose lifetime has ended, e.g., a variable which has been deallocated.

Lifetime may end as a result of
  termination of the containing block;
  deallocation.
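Of the realizations of references listed in frame 521, the handle is perhaps the least familiar. A minimal sketch, assuming a small fixed-size table; all names here (`Handle`, `handle_new`, `handle_deref`, `handle_relocate`) are hypothetical and chosen only for illustration. The value -1 plays the role of the null reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_HANDLES = 16 };

/* The table maps each handle to the current address of its variable. */
static void *table[MAX_HANDLES];

typedef int Handle;   /* index into table; -1 means "refers to nothing" */

/* Allocate a zero-initialized heap variable and return a handle to it. */
Handle handle_new(size_t size) {
    for (Handle h = 0; h < MAX_HANDLES; h++)
        if (table[h] == NULL) {
            table[h] = calloc(1, size);
            return table[h] != NULL ? h : -1;
        }
    return -1;        /* table is full */
}

/* Dereferencing takes two steps: table lookup, then memory access. */
void *handle_deref(Handle h) {
    assert(0 <= h && h < MAX_HANDLES && table[h] != NULL);
    return table[h];
}

/* Move the variable to another address; only the table entry changes,
 * so every copy of the handle remains valid. */
void handle_relocate(Handle h, size_t size) {
    void *moved = malloc(size);
    assert(moved != NULL);
    memcpy(moved, handle_deref(h), size);
    free(table[h]);
    table[h] = moved;
}
```

The extra indirection is the price paid for being able to move variables around, e.g., by a compacting memory manager, without hunting down every reference to them.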

525. How are dangling references created?

I. Freed memory: a deallocated heap variable (C):

    char *p = malloc(100);
    strcpy(p, "hello, World!\n");
    free(p);            // p is dangling
    strcpy(p, p + 5);   //

II. Reference to stack: a reference to a dead automatic variable (C):

    char *f() {
        char a[100];
        return a;
    }
    char *s = f();      // s is dangling
    strcpy(s, "hello, World!\n");  //

III. Inner functions: activating a function outside the enclosing block in which it was defined, in the case that the function uses variables local to the block (Gnu-C):

    // Provide name for function type:
    typedef void (*F)(void);
    // Forward declaration of function returning a function
    F f();
    F h = f();
    h();  // May access a dangling reference

    F f() {
        char a[100];
        // only Gnu C allows inner functions
        void g(void) {
            // a is dangling if g is called from outside f
            strcpy(a, "hello, World!\n");
        }
        return g;
    }

526. Language protection against dangling references

  C
    Freed memory: programmer's responsibility
    Stack reference: programmer's responsibility
    Inner functions: no inner functions
  Gnu-C
    Freed memory: programmer's responsibility
    Stack reference: programmer's responsibility
    Inner functions: programmer's responsibility
  Pascal
    Freed memory: programmer's responsibility
    Stack reference: cannot take the address of stack variables
    Inner functions: inner functions cannot leak
  Java
    Freed memory: with garbage collection, programmer never deallocates memory
    Stack reference: objects are always drawn from the heap; stack variables are scalars or references to objects; one cannot take the address of these
    Inner functions: functions are not 1st class; their address cannot be taken, and they cannot leak

Table 5.3.3: Language protection against dangling references

527. Quiz: What's dangling here?

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    struct N {
        struct N *n;
        char *q;
        int a;
    } *q;

    int foo(void) {
        struct N *p = malloc(sizeof *p);
        p->a = 42;
        p->q = "life, universe, everything";
        p->n = (struct N *)p;
        q = p;
        free(p);
        return 1;
    }

    int main() {
        free(
            strcpy(
                (char *)malloc(20),
                foo() + "Hello, World\n"
            )
        );
        (void) printf("q=%p\n", q);
        (void) printf("q->a=%d\n", q->a);
        (void) printf("q->q=%s\n", q->q);
        return 0;
    }

528. What's the penalty of accessing a dangling reference

Well, it depends on whether you are lucky or not:
  Lucky: immediate program crash
  Unlucky: program crashes, but not immediately
  Extremely unlucky: program does not crash while testing; it just has a bug which stays dormant until field trial!

In our case, the output of our dangling reference program is

    q=0x
    q->a=42
    Segmentation fault (core dumped)

We were quite lucky; however, had we not accessed the q field, the bug would not have been detected!

529. Stack corruption via dangling reference into an automatic variable

    #include <string.h>
    #include <stdio.h>

    char *f(void);  // Forward declaration

    int main() {
        char *hell = f();
        printf("%s\n", hell);
        return 0;
    }

    char *f(void) {
        char s[1<<9];
        strcpy(s, "Hello, World\n");
        return s;
    }

Output:

    Hello, W

The output looks almost right, but the stack is clearly corrupted.

Heap errors
4 Frames: Dangling references · Memory leak · Output of the above program · Simple heap management with linked list of free blocks

530. Dangling references

Definition 5.11 (Dangling reference). A dangling reference is a reference to a variable whose lifetime has ended, e.g., a variable which has been deallocated.

Lifetime may end as a result of
  termination of the containing block;
  deallocation.

531. Memory leak

Definition 5.12 (Memory leak). A memory leak occurs when a variable is not deallocated prior to the termination of the lifetime of all of its references.

    #include <stdlib.h>
    #include <stdio.h>

    typedef struct N { int a[ ]; } N;

    N *t(N *n) {
        N *p = malloc(sizeof *p);
        if (p != 0) return p;
        perror("oops");
        exit(1);
    }

    int main() {
        int i;
        N *s = 0;
        for (i = 0; i < 1 << 30; i++)
            printf("%5d) %p\n", i, s = t(s));
        return 0;
    }

Can you pinpoint the leak?

532. Output of the above program

    0) 0x7f7b8adf0010
    1) 0x7f7a9c73d010
    ...
    OOPS: Cannot allocate memory

533. Simple heap management with linked list of free blocks

(we maintain the invariant that free regions are kept in ascending address order)

Allocation request: traverse the list, searching for a region large enough to accommodate the request, using a first fit, best fit or worst fit strategy.
  Exact match: the region is returned to the client and removed from the list.
  Non-exact match: the region is split into two:
    1. a sub-region of the appropriate size is returned to the client;
    2. the other sub-region is kept in the list.
Deallocation request: add the region to the list; if there is no gap between it and the previous/next region, the two are merged.
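The scheme of frame 533 can be sketched in a few dozen lines. To stay short, the sketch below makes several simplifying assumptions that are not in the slides: regions are tracked as (start, size) pairs in a small sorted array rather than a linked list, the "heap" is a single arena of 1024 abstract bytes, the strategy is first fit, and the client passes the size back on deallocation. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_REGIONS = 32 };

typedef struct { size_t start, size; } Region;

/* Free regions in ascending address order; initially one 1024-byte arena. */
static Region free_list[MAX_REGIONS] = { { 0, 1024 } };
static size_t n_free = 1;

static void remove_at(size_t k) {
    for (; k + 1 < n_free; k++)
        free_list[k] = free_list[k + 1];
    n_free--;
}

/* First fit: returns the start of the allocated region, or (size_t)-1. */
size_t region_alloc(size_t size) {
    assert(size > 0);
    for (size_t k = 0; k < n_free; k++) {
        if (free_list[k].size < size)
            continue;
        size_t start = free_list[k].start;
        if (free_list[k].size == size) {
            remove_at(k);                 /* exact match: drop the region */
        } else {
            free_list[k].start += size;   /* split: keep the remainder */
            free_list[k].size -= size;
        }
        return start;
    }
    return (size_t)-1;
}

/* Return [start, start + size) to the list, merging adjacent regions. */
void region_free(size_t start, size_t size) {
    size_t k = 0;
    while (k < n_free && free_list[k].start < start)
        k++;                              /* insertion point keeps the order */
    assert(n_free < MAX_REGIONS);
    for (size_t t = n_free; t > k; t--)
        free_list[t] = free_list[t - 1];
    free_list[k] = (Region){ start, size };
    n_free++;
    if (k + 1 < n_free &&
        free_list[k].start + free_list[k].size == free_list[k + 1].start) {
        free_list[k].size += free_list[k + 1].size;   /* merge with next */
        remove_at(k + 1);
    }
    if (k > 0 &&
        free_list[k - 1].start + free_list[k - 1].size == free_list[k].start) {
        free_list[k - 1].size += free_list[k].size;   /* merge with previous */
        remove_at(k);
    }
}

size_t free_region_count(void) { return n_free; }
```

Freeing a region in the middle of the arena leaves two free regions (fragmentation); once its neighbours are freed as well, merging restores a single region.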

Most real life implementations use much more sophisticated data structures.

5.4 Value vs. reference semantics

Contents [27 frames]
  Shared representation & lazy copy [10 frames]
  Value vs. reference semantics in various PLs [7 frames]
  References
  Exercises

534. Variables: reference vs. value semantics

Value semantics
  [Figure: variable → value]
  Figure 5.4.1: Value semantics
  Variable contains the actual value.
  C, C++ for builtin, atomic types
Reference semantics
  [Figure: variable → reference → value]
  Figure 5.4.2: Reference semantics
  Variable contains a reference to a value which is stored elsewhere.
  C, C++, if pointers or references are used
  Java, for all other types, including arrays
  Most modern languages

535. Values vs. reference semantics in Java

The basic type system of Java is defined by:
  8 atomic types: byte, short, int, long, float, double, boolean, char
  1 pseudo type: void
  4 type constructors: array, class, interface, enum
Precisely 8 types in Java follow value semantics.
All the rest follow reference semantics.

536. Wrapper classes

When generics were introduced to Java, it was discovered that the implementation was much simpler for reference types.
The 8 value types did not justify extra machinery. Instead, the library introduced reference type equivalents:
  Integral types: Byte, Short, Integer, Long
  Floating point types: Float, Double
  Other types: Boolean, Character
  Unit types: Void

    List<double> ds1;  // compilation error; type double is not a reference type
    List<Double> ds2;  // works fine; type Double is a reference type

537. Integer vs. int in Java

Each wrapper class (except for Void) wraps a value of the corresponding primitive type.
Wrapper types are almost fully interchangeable with their primitive equivalents:

    int v = 3;                   // Primitive type
    Integer r = new Integer(v);  // Wrapper type
    v = r.intValue();            // Explicit conversion
    v = r;                       // auto unboxing
    r = v;                       // auto boxing

538. Boxing

  Auto boxing: coercion from, e.g., int to Integer
  Auto unboxing: coercion from, e.g., Integer to int
  Type Integer also includes value null
  Type int does not include value null
The following will generate a RuntimeException (specifically, a NullPointerException):

    Double dd = null;
    double d = dd;

Objects?

539. The OO terminology

OO languages often use the term "object";
  the term propagates also to non-OO PLs;
  means (usually) a variable whose contents has an i.d.;
  total inspection of this object yields a value with an i.d.;

  the i.d. cannot be changed by the user;
  two objects with the same contents still have distinct i.d.'s.

[Figure: variable a holds an object's i.d.; the variable/object holds the values foo, bar, baz]
Figure 5.4.3: References to objects

540. C: value semantic of assignment

A date record in C:

    typedef struct Date {
        int year, month, day;
    } Date;

Initializing two variables of this type:

    Date today = {2015, 04, 05};
    Date tomorrow = {2015, 04, 06};

Assignment:

    today = tomorrow;

[Figure: before the assignment, today and tomorrow each hold their own year/month/day record; after the assignment they still hold separate records, now with equal contents]

541. Reference semantic of assignment in Java

A date record in Java:

    class Date {
        Date(int year, int month, int day) {
            this.year = year;
            this.month = month;
            this.day = day;
        }
        int year, month, day;
    }

    Date today = new Date(2015, 04, 05);
    Date tomorrow = new Date(2015, 04, 06);
    today = tomorrow;

[Figure: before the assignment, today and tomorrow refer to two different year/month/day records; after the assignment both refer to the same record]

542. C++ vs. Java (a)

(a) Can you detect and explain all the syntactical differences between the two languages?

C++: Value semantics
  this follows reference semantics; C++ notation: ->
  C legacy
  Variable contains value
  Uses operator overloading

    class Date { public:
        int year, month, day;
        Date(int year, int month, int day) {
            this->year = year;
            this->month = month;
            this->day = day;
        }
    };
    Date today(2015, 04, 05);
    Date tomorrow(2015, 04, 06);
    today = tomorrow;
    tomorrow.year = 3025;
    cout << today.year;

Java: Reference semantics
  this follows reference semantics; Java notation: .
  No C legacy
  Variable refers to value
  No operator overloading

    class Date {
        int year, month, day;
        Date(int year, int month, int day) {
            this.year = year;
            this.month = month;
            this.day = day;
        }
    }
    Date today = new Date(2015, 04, 05);
    Date tomorrow = new Date(2015, 04, 06);
    today = tomorrow;
    tomorrow.year = 3025;
    System.out.print(today.year);

543. Comparing the two semantics

C++: Value semantic

    // Creating values for today and tomorrow
    Date today = Date(2015, 04, 05);
    Date tomorrow = Date(2015, 04, 06);
    // Assigning variable tomorrow to today
    today = tomorrow;
    tomorrow.year = 3025;
    cout << today.year;

  [Figure: today and tomorrow each hold their own year/month/day record]
  Output is 2015

Java: Reference semantic

    // Creating values for today and tomorrow
    Date today = new Date(2015, 04, 05);
    Date tomorrow = new Date(2015, 04, 06);
    // Assigning variable tomorrow to today
    today = tomorrow;
    tomorrow.year = 3025;
    System.out.print(today.year);

  [Figure: after the assignment, today and tomorrow refer to the same year/month/day record]
  Output is 3025
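The same comparison can be replayed in plain C, where struct variables follow value semantics and reference semantics must be spelled out with pointers. A small sketch; the two function names are hypothetical and the records live on the stack only to keep the example free of allocation.

```c
#include <assert.h>

typedef struct Date { int year, month, day; } Date;

/* Value semantics: the assignment copies the whole record. */
int year_after_value_assignment(void) {
    Date today = {2015, 4, 5};
    Date tomorrow = {2015, 4, 6};
    today = tomorrow;
    tomorrow.year = 3025;      /* does not affect today */
    return today.year;
}

/* Reference semantics, made explicit with pointers:
 * the assignment copies only the address. */
int year_after_reference_assignment(void) {
    Date d1 = {2015, 4, 5}, d2 = {2015, 4, 6};
    Date *today = &d1, *tomorrow = &d2;
    today = tomorrow;          /* both now refer to d2 */
    tomorrow->year = 3025;     /* visible through today as well */
    return today->year;
}
```

The first function returns 2015 and the second 3025, matching the outputs of the C++ and Java versions above.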

5.4.1 Shared representation & lazy copy
10 Frames: Which semantic does ML use? · Efficient Lisp implementation with references · Generalization · Reference semantic in C++? · Value semantic in Java? · Assignment in reference semantic languages? · Assignment in reference semantic languages? · Shallow clone: yet another semantics · Assignment strategies side by side · Properties of assignment strategies

544. Which semantic does ML use?

Answer: the programmer shouldn't care and cannot know!
  It looks like value semantics. In reality, ML, Lisp and many other languages use value semantics, in the sense that the programmer cannot observe any references in the program.
  Implementation is with references. Behind the scenes, memory and time are saved by using references.

545. Efficient Lisp implementation with references

Let α, β be two large S-expressions.

    (setq a α)
    (setq b β)
    (setq c (cons a b))

[Figure: a refers to α (some large S-expression), b refers to β (some other large S-expression), and c refers to a cons cell, the result of (cons a b), whose two fields refer to the same α and β]
Figure 5.4.4: Sharing of representation in Lisp values

546. Generalization

Definition 5.13 (Lazy copying). Generalizing the Lisp approach, lazy copying is an implementation technique of value semantics, where a copy of a large object is made by creating a new reference to it. The actual copy operation is made when (and if) the source or the destination variables are modified.

  Generalizes the Lisp approach.
  Support for languages which permit mutation of values.
    Many extensions of pure-Lisp allow such mutation.
    Mutation is the bread and butter of imperative programming.
  Conceptually similar to copy on write in memory management.

547. Reference semantic in C++?

Of course! You just have to be explicit about it!

    class Date { ... };
    // Storing references to newly allocated values
    // of today and tomorrow
    Date *today = new Date(2015, 04, 05);
    Date *tomorrow = new Date(2015, 04, 06);
    today = tomorrow;       // Leak!
    tomorrow->year = 3025;
    cout << today->year;
    delete tomorrow;
    delete today;           // Heap corruption?

Output is 3025

548. Value semantic in Java?

Of course! You just have to be explicit about it!

    class Date implements Cloneable { ... }
    Date today = new Date(2015, 04, 05);
    Date tomorrow = new Date(2015, 04, 06);
    today = (Date) tomorrow.clone();
    tomorrow.year = 3025;
    System.out.println(today.year);

Output is 2015

549. Assignment in reference semantic languages?

But, what does (Date) tomorrow.clone(); actually mean?
More generally, in any reference semantic programming language: given two variables, a and b, each containing a reference to a value, which may include references to a network of variables, and an assignment command a := b;

what's going to happen?

[Figure: before the assignment, a refers to a variable holding 12 with inner references α, and b refers to a variable holding 13 with inner references β; the state after the assignment is left open]
Figure 5.4.5: Semantics dilemma of assignment in reference semantics

550. Assignment in reference semantic languages?

  Reference assignment: only the reference is copied.
  Shallow copy: only the referenced value is copied.
  Deep clone: the whole network of variables accessible from b is duplicated, and assigned to a.

551. Shallow clone: yet another semantics

The variable itself is cloned, but all the references inside it are copied, rather than being cloned.

552. Assignment strategies side by side

[Figure: four panels showing the state of a and b after reference assignment, shallow copy, shallow clone, and deep clone]
Figure 5.4.6: Assignment strategies side by side

553. Properties of assignment strategies

  Semantic               Null pointer assignment?   Memory allocation?
  Reference assignment   Never                      no
  Shallow copy           Maybe                      no
  Shallow clone          Never                      yes (bounded)
  Deep clone             Never                      yes (unbounded)

Table 5.4.1: Behind the scenes of four semantics of assignment

Value vs. reference semantics in various PLs
7 Frames: What does clone() do? · Working knowledge of semantics? · Overloading the assignment operator in C++ · Case study: assignment & copy in Eiffel · More general working knowledge [57] · Semantics in some contemporary PLs · Overview: semantics of assignment

[57] Typical exam question, if you like it phrased this way

554. What does clone() do?

  Runtime exception (CloneNotSupportedException): if the class does not implement interface Cloneable.
  Shallow clone: if the class implements interface Cloneable, and the programmer does not override the default clone() method.

  Whatever: if the class implements interface Cloneable, and the programmer overrides the default clone method in whatever way he likes.
  Deep clone: if the class implements interface Cloneable, the programmer overrides the default clone method, and correctly implements a deep clone semantic.

555. Working knowledge of semantics?

Typical exam question: read the documentation of a particular language feature, and determine which semantic it uses.
The feature could be assignment (of a particular kind of variable), a library function, and even equality testing: comparison of components one by one vs. meager comparison of the references.

556. Overloading the assignment operator in C++

    class Date { ... };
    Date today(2015, 04, 05);
    Date tomorrow(2015, 04, 06);
    today = tomorrow;  // Call the assignment operator

And, what does the assignment operator do? [58]
  Whatever: Can anyone really understand programmers' minds? The author of these slides (at least) cannot.
  Default behavior: not so clear
    recursively apply the assignment operator on each of the fields;
    non-user defined types: shallow copy.
  User defined: the default assignment operator will typically be shallow copy.

[58] If you think you are smart, please repeat for copy constructor

557. Case study: assignment & copy in Eiffel

  Value semantic: atomic types, such as Char, Integer, Real, and Boolean, and object attributes marked as expanded
  Reference semantic: everything else

Reference (identity) operations:
  a := b        Reference assignment
  a = b         Reference equality testing
  a /= b        Reference inequality testing
Shallow (state) operations:
  a.copy(b)         Attribute by attribute shallow copy
  a := clone(b)     Create new cloned object
  equal(a,b)        Attribute by attribute comparison
Deep (state) operations:
  a.deep_copy(b)        Attribute by attribute copy and cloning of inner objects
  a := deep_clone(b)    Create a full clone of a complex structure
  deep_equal(a,b)       Attribute by attribute recursive comparison

558. More general working knowledge (a)

(a) Typical exam question, if you like it phrased this way

Language design question: suppose that Pascal had a list type constructor

    VAR primes, odds: list of Integer;   (* pseudo Pascal *)

What does primes := odds mean?
  Reference copying:
    Inconsistent with arrays, records and primitive types
    Pointers in disguise
    Selective updates to one will affect the other
  Value copying:
    Natural, but inefficient
  Possible solutions: prohibit selective update (as in Lisp), or lazy copying

559. Semantics in some contemporary PLs

  Value semantic: Pascal, Lisp, ML, Prolog
  Reference semantic: Java, Smalltalk
  Mixed semantic: Eiffel, C, C++
Most languages have some kind of a mix:
  In Java, primitive types have value semantic
  There are hacks in Lisp that allow reference semantic
  References in ML allow reference semantic
  Eiffel has expanded types
  C# has non-nullable types
In most cases, a conclusive judgment of value/reference semantic for an entire language is plain wrong [60].

[60] as are some of the sweeping judgments made in this slide

560. Overview: semantics of assignment

Assignment semantics is defined by the language design:
  C structures follow value semantics.
    C used to place restrictions on passing structures by value.
    Arrays cannot be assigned.
    Pointers are used to implement reference semantics.
  Java follows value semantics for primitive types.
Value semantics may be slower.
Reference semantics may lead to sharing problems.
Reference semantics is more expressive.
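The assignment strategies compared in this section (reference assignment, shallow copy, shallow clone, deep clone) can be told apart in a few lines of C. This is a sketch under simplifying assumptions: the "network of variables" is a singly linked chain without cycles, and all names are hypothetical. Reference assignment needs no function at all; it is the plain pointer assignment `a = b`.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;   /* the reference stored inside the variable */
} Node;

Node *node_new(int value, Node *next) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);   /* sketch only: real code should handle failure */
    n->value = value;
    n->next = next;
    return n;
}

/* Shallow copy: overwrite the variable a refers to with the contents of
 * the variable b refers to; no allocation, fails if either is null. */
void shallow_copy(Node *a, const Node *b) {
    assert(a != NULL && b != NULL);
    *a = *b;
}

/* Shallow clone: one new variable; inner references are copied as they are. */
Node *shallow_clone(const Node *b) {
    return b == NULL ? NULL : node_new(b->value, b->next);
}

/* Deep clone: duplicate everything reachable from b (assumes no cycles). */
Node *deep_clone(const Node *b) {
    return b == NULL ? NULL : node_new(b->value, deep_clone(b->next));
}
```

After a shallow clone the copy and the original share their tails; after a deep clone they share nothing, at the cost of an amount of allocation bounded only by the size of the network.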

References: Dereferencing, Object Copy, Reference, Reference Type, Value Semantics

Exercises
1. Which semantics is used in Pascal? (Explain why your answer could have been guessed.)
2. Do function values use reference or value semantics?
3. Which semantics is used in Go?
4. Which assignment semantics is used in Pascal?
5. Which assignment semantics is used in ML?
6. C++ offers an automatically generated assignment operator; which semantics does this operator use?
7. C++ offers an automatically generated constructor; which semantics does this constructor use?

5.5 Automatic memory management [16 frames]

[Figure 5.5.1: Mindmap of memory management. Nodes: Manual; Automatic; Stack Based; Escape Analysis; Reference Counting; Garbage Collection; Algorithms (Mark and Sweep, Copying Collector); Memory Compaction; Handles; Cautions (Memory Leak, Dangling Reference, Null Pointer Reference).]

562. Reference counting
Idea: a reference count (RC) field in every variable.
Invariant: RC is the number of references to the variable. The RC of every live variable is positive.
Initially: in an allocation such as (61)
    Thingy t = new Thingy(); // Java syntax
set the RC of the newly created Thingy to 1.
(61) No other allocation command makes sense.
Maintenance:
    Object o1 = new Object(); // Denote the newly allocated object by O1; set RC(O1) ← 1
    Object o2 = new Object(); // Denote the newly allocated object by O2; set RC(O2) ← 1
    o2 = o1;                  // RC(O1)++; RC(O2)--
De-allocation: after each decrement, if RC(O) = 0: (i) deallocate O; (ii) decrement RC for all children of O; and (iii) recursively de-allocate objects whose RC = 0.

563. Pros & cons of reference counting
Pros:
  predictable performance
  smooth execution without interruptions
  cost is proportional to actual computation, not to memory size
  implementable in a manual memory management system (via smart pointers, or even as part of the language semantics) and in an automatic memory management system (as part of the garbage collection system)
Cons:
  cannot deal with circular structures
  is generally slow, incurring a huge write barrier (62)
(62) The write barrier is the amount of work that needs to be done in each memory write.
Fact 5.14 (Write barrier). The formidable write barrier excludes the universal application of RC for memory management.

564. What is garbage collection?
Invented by John McCarthy around 1959 as an enabling technology for Lisp implementation, Garbage Collection (GC) is a part of the program semantics and runtime which automatically claims back all unused memory. In simple words, de-allocation becomes the responsibility of the PL's runtime system, rather than the programmer's.
  The programmer never de-allocates memory.
  When memory becomes scarce, a GC procedure is applied to collect all unused variables.
  Found in Java, Smalltalk, Python, Lisp, ML, Haskell, and most functional or modern OO languages.
Definition 5.15 (Mark & sweep GC algorithm). Mark & sweep: the simplest GC algorithm.

Why garbage collection?
GC prevents:
  dangling references
  memory leaks
  heap corruption
  heap fragmentation (with a compacting collector)
Also, GC makes first-class function values possible.
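The reference-counting rules of frame 562 (initialise RC to 1, adjust on every reference assignment, de-allocate when RC drops to 0) can be sketched in C as follows. This is an illustrative sketch, not the slides' implementation: `struct cell`, `cell_new`, `cell_retain`, `cell_release`, `cell_assign` and the counter `live_cells` are all hypothetical names, and each cell is assumed to hold at most one child reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted cell with at most one child reference. */
struct cell {
    int rc;              /* number of references to this cell */
    struct cell *child;  /* counted reference to another cell, or NULL */
};

int live_cells = 0;      /* bookkeeping for the illustration only */

/* Allocation: the new cell starts with RC = 1 (the reference returned). */
struct cell *cell_new(void) {
    struct cell *c = malloc(sizeof *c);
    assert(c != NULL);
    c->rc = 1;
    c->child = NULL;
    live_cells++;
    return c;
}

/* A new reference to c is being created: RC(c)++. */
struct cell *cell_retain(struct cell *c) {
    if (c != NULL) c->rc++;
    return c;
}

/* A reference to c is being dropped: RC(c)--; at zero, de-allocate c
 * and continue with its child (the recursive step, written as a loop). */
void cell_release(struct cell *c) {
    while (c != NULL && --c->rc == 0) {
        struct cell *child = c->child;
        free(c);
        live_cells--;
        c = child;
    }
}

/* Reference assignment *slot = value. This is the write barrier:
 * every such write pays for one increment and one decrement. */
void cell_assign(struct cell **slot, struct cell *value) {
    cell_retain(value);   /* retain first, so self-assignment is safe */
    cell_release(*slot);
    *slot = value;
}
```

Two cells that reference each other keep RC = 1 after all outside references are released, so neither is ever freed; this is the "cannot deal with circular structures" item in the list of cons.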

566. Mark & sweep garbage collection
[Figure: the storage bank, showing the Root Set (global variables and the runtime stack), the Heap Data Structure (List of Allocated Cells and List of Free Blocks), and the notation for references.]
This is our storage bank, which contains many cells. Our interest, though, lies only with allocated cells. An allocated cell is called a variable. Some variables follow value semantics; others contain references. Some belong in the runtime stack, others are global; the rest are heap allocated.
The heap is primarily a list of free blocks! But the heap also maintains another list, which keeps references to all heap-allocated cells! So, we have a list of free blocks and a list of allocated cells. Together, the two lists make the Heap Data Structure.
Now, our variables reference each other, and have many null references, which we will not always show. For garbage collection, we define the Root Set, which contains the global variables and the runtime stack.
Starting from the root set, we conduct a mark phase. We first mark all variables in the root set, and then follow their references, to mark variables referenced from the root set. Again, we follow references, to mark variables two references away from the root. We keep following references, until we have marked all variables reachable from the root set.
Consider now the entire set of variables. Which variables are garbage?
In the sweep phase, we collect all variable blocks which are not marked. Recall the List of Allocated Cells? Let's iterate over it! Start at the first node and iterate; in the example, four unmarked cells are found along the way and set aside for recycling, until we are done! All that remains is to remove the cells destined for recycling from the List of Allocated Cells, to claim back the memory they occupy, and to add these memory blocks back to the List of Free Blocks. This diagram depicts the main points.

567. Summary: mark & sweep garbage collection
  Mark: mark all cells as unused.
  Sweep: unmark all cells in use (stack, global variables), and cells which can be accessed, directly or indirectly, from these.
  Release: all cells which remain marked.
(In this summary a mark means "presumed unused", the opposite of the convention in the walk-through above, where reachable cells are the ones marked. The cells released are the same.)

568. Delicate issues of the marking process
  Do not visit an object more than once.
  Do not get stuck in a loop.
Typical implementations: breadth-first search, depth-first search.
Marking: can be done by raising a bit in each object. More efficient procedure:
  * Initially, all objects are 0
  * In the first collection, marking is by changing the bit to 1
  * In the second collection, marking is by changing the bit to 0
  * In the third collection, marking is by changing the bit to 1
  * ...
Since the meaning of the bit alternates between collections, the marks of surviving objects never need to be cleared.

Stop & copy garbage collection
  Divide the heap into two regions: Region I takes all allocations; Region II is put on hold.
  When Region I is exhausted, copy live (reachable) variables to Region II.
  Switch the roles of the two regions.

570. Defragmentation
  Can be done whenever the GC detects fragmentation.
  Can be done in each collection cycle: presumably slower, but often performs better due to caching and locality of reference.

Predicaments of garbage collection
  Memory/time resources: could be saved using the programmer's knowledge.
  Decreased performance: of the "real", core program.
  Uneven performance: with embarrassing pauses for GC cycles.
  Unpredictable performance: the program can never know when a GC cycle may start.
  Not for real time: which requires predictable performance.
  Not for transactions: a transaction may time out for no good reason.
  Hinders interactiveness: pauses can lead to user abandonment.
  Incompatible with Resource Acquisition Is Initialization: one cannot rely on the destructor of a file object to close the file.
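The mark and sweep phases described in frames 566 to 568 can be sketched in C over a small fixed pool of cells. This is a sketch under stated assumptions, not a production collector: the pool size, the two-reference cell layout, and the names `gc_alloc`, `mark`, `sweep` and `gc_collect` are all hypothetical. It uses the walk-through's convention (reachable cells are marked) and a depth-first traversal with an explicit stack.

```c
#include <assert.h>
#include <stddef.h>

#define POOL  8   /* size of the hypothetical storage bank */
#define NREFS 2   /* references held by each cell */

struct cell {
    int allocated;            /* is the cell on the list of allocated cells? */
    int marked;               /* set during the mark phase */
    struct cell *ref[NREFS];  /* outgoing references, NULL if unused */
};

static struct cell pool[POOL];

/* Allocate the first free cell, or return NULL when the pool is exhausted. */
struct cell *gc_alloc(void) {
    for (int i = 0; i < POOL; i++)
        if (!pool[i].allocated) {
            pool[i] = (struct cell){ .allocated = 1 };
            return &pool[i];
        }
    return NULL;
}

/* Mark phase: depth-first traversal from the root set. A cell is marked
 * when pushed, so it is never visited twice and cycles cannot trap us. */
static void mark(struct cell **roots, int nroots) {
    struct cell *stack[POOL];
    int top = 0;
    for (int i = 0; i < nroots; i++)
        if (roots[i] != NULL && !roots[i]->marked) {
            roots[i]->marked = 1;
            assert(top < POOL);
            stack[top++] = roots[i];
        }
    while (top > 0) {
        struct cell *c = stack[--top];
        for (int k = 0; k < NREFS; k++) {
            struct cell *r = c->ref[k];
            if (r != NULL && !r->marked) {
                r->marked = 1;
                assert(top < POOL);
                stack[top++] = r;
            }
        }
    }
}

/* Sweep phase: iterate over all allocated cells, release the unmarked
 * ones, and clear the marks of the survivors for the next cycle. */
static int sweep(void) {
    int released = 0;
    for (int i = 0; i < POOL; i++) {
        if (!pool[i].allocated) continue;
        if (pool[i].marked) pool[i].marked = 0;
        else { pool[i].allocated = 0; released++; }
    }
    return released;
}

/* One full collection cycle; returns the number of cells released. */
int gc_collect(struct cell **roots, int nroots) {
    mark(roots, nroots);
    return sweep();
}
```

Unlike reference counting, an unreachable cycle is collected here, because reachability from the root set, not the number of incoming references, decides what survives.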

572. Some responses to these predicaments
  Generational GC: collects variables in the nursery first, where mortality is high.
  Incremental GC: can perform some of the collection work and resume it later.
  Concurrent GC: can run concurrently with the program.
  Realtime GC: obeys time constraints.
Concurrency, predictability, etc., always incur a performance toll.

Memory leak in garbage collection?
GC can only claim unreachable variables. If a programmer forgets to nullify references, then a pseudo memory leak may occur. Define a class Leak whose contents is:
    public class Leak {
      private Leak next;
      private int[] data;
      private Leak(Leak next) {
        this.next = next;
        this.data = new int[1<<25];
      }
      private static Leak cons(Leak l) {
        return new Leak(l);
      }
      public static void main(String[] args) {
        Leak l = new Leak(null);
        final Runtime r = Runtime.getRuntime();
        for (int i = 0; i < 100; ++i) {
          System.out.println(i + ": " + r.freeMemory());
          l = cons(l);
        }
      }
    }

574. Output of the above program
    0: ...
    ...
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at Leak.<init>(Leak.java:16)
        at Leak.cons(Leak.java:12)
        at Leak.main(Leak.java:8)

575. Semantical memory leak (a)
(a) Note that the previous example exhausted memory for the sake of demonstration; it did not really create semantic garbage.
Definition 5.16 (Semantical garbage). A variable which the program will never use again, but to which it still keeps a reference, is called semantic garbage.
    class Huge {
      Huge() { // Constructor:
        // Allocates lots of data and stores
        // it in the newly created object
      }
    }
    void f() {
      Huge semanticGarbage = new Huge();
      heavy.computation(new Indeed(100));
      System.exit(1);
    }
The semantic garbage predicament: all sophisticated GC algorithms contend in vain against semantic garbage.

576. GC & the stack: escape analysis
GC is always slower than stack-based memory management. In a pure GC, there are no automatic variables.
In Java, local variables are:
  Stack allocated: builtin, atomic types: int, double, boolean, etc.
  Stack allocated: references to classes and arrays.
  Heap allocated: classes and arrays (accessed only by references).
Seemingly innocent program:
    // does a get assigned to global variables?
    void foo() {
      int a[] = new int[1 << 20];
      List<Integer> b = new ArrayList<Integer>();
      for (int i = 0; i < 1<<20; i++)
        f(); // Lots of GC activity
    }
With escape analysis, a smart compiler can determine that variables a and b never escape function foo(), and so can be safely claimed when this function terminates.

References: Copying Collector, Dangling Reference, Garbage Collection, Handle, Heap Corruption, Heap,

Mark and Sweep, Memory Compaction, Memory Leak, Memory Management, Memory Safety, Reference Counting, Stack-based Memory Allocation, Static Local Variables, Unreachable Variables

5.6 Run time type information [16 frames]

The challenge of deep clone
Algorithm for deep clone:
  Start from the current value.
  Traverse the network of values accessible from it.
  Duplicate this network.
How should we traverse the network?
Definition 5.17 (Network traversal: breadth- (or depth-) first search). In each value we visit: mark the value as visited; proceed to all values it references.

578. But, there is a catch
Definition 5.18 (Network traversal: breadth- (or depth-) first search). In each value we visit: mark the value as visited; proceed to all values it references.
The challenge: when we reach a value, we do not know what's in it!
[Figure 5.6.1: Dereferencing a pointer to an unfamiliar memory address. Following a reference, we reach bytes that could be a character, a reference, or the end of the value.]

579. Memory = bits & bytes!
To understand the difficulty better, we need to take a second look at: bits; bytes; values; types; the memory representation of values; the interpretation of the memory representation.

580. Example: different interpretations of a single byte
[Figure 5.6.2: An 8-bit byte in memory]
    #include <stdlib.h>
    #include <stdio.h>
    int main(void) {
      const void *p = malloc(1);
      // Bit pattern of 75; 0b literals are a GNU extension (standard since C23)
      *(unsigned char *)p = 0b01001011;
      printf("As integer: '%d'; ", *(char *)p);
      printf("as character: '%c'\n", *(char *)p);
      return 0;
    }
Output:
    As integer: '75'; as character: 'K'

581. Example: different interpretations of a 16 bits word
[Figure 5.6.3: A 16-bits word in memory, viewed either as two 8-bit bytes or as one 16-bit value]
    #include <stdlib.h>
    #include <stdio.h>
    int main(void) {
      const void *p = malloc(2);
      // Bit pattern of 55049 (0xD709)
      *(unsigned short *)p = 0b1101011100001001;
      printf("As signed integer: '%d'\n", *(signed short *)p);
      printf("As unsigned integer: '%d'\n", *(unsigned short *)p);
      printf("As an array: '(%d,%d)'\n", 0[(char *)p], 1[(char *)p]);
      return 0;
    }
Output (on a little-endian machine where char is signed):
    As signed integer: '-10487'
    As unsigned integer: '55049'
    As an array: '(9,-41)'

582. Example: different interpretations of 32 bits word
    // A more civilized way to name integer values:
    enum {
      // How many bits for index into pool:
      LG2_POOLSIZE = 14,
      // How many bits for storing car/cdr kind:
      KIND_SIZE = 2
    };
    enum kind { NIL, ATOM, STRING, INTEGER };
    struct Cons {
      enum kind carkind: KIND_SIZE;
      unsigned int car: LG2_POOLSIZE;
      enum kind cdrkind: KIND_SIZE;
      unsigned int cdr: LG2_POOLSIZE;
    };
[Figure 5.6.4: Layout in memory of a 32-bits word used as micro-lisp Cons record. CONS: machine word (32b), made of CAR: half word (16b) and CDR: half word (16b); each half word is a tag (2b) followed by a variant part (14b).]

583. The layout of a C structure
The same memory block could be interpreted in many different ways. Here is a 16 bytes block, which can be interpreted as struct T0, as struct T1, or as struct T2.
    struct T0 { float x; struct T1 *p; char s[4]; struct T2 *q; };
    struct T1 { char s[4]; float x; struct T2 *q; struct T0 *p; };
    struct T2 { struct T0 *p; char s[4]; struct T1 *q; float x; };
[Figure 5.6.5: Different legitimate interpretations of a 16-bytes memory block]
    word   as T0          as T1          as T2
    1      float          char[4]        struct T0 *
    2      struct T1 *    float          char[4]
    3      char[4]        struct T2 *    struct T1 *
    4      struct T2 *    struct T0 *    float

584. Summary: the meaning of bits and bytes
A value is represented in memory as a sequence of bits and bytes.
Components: integers; floating point values; characters; references; arrays; sets in bit mask representation; etc.
Deciphering a value: the value's type is the key. It gives meaning to the bit representation.
Information provided by the type:
  the value's length
  its partitioning into sections
  the appropriate way of interpreting each section

585. A step in a BFS/DFS tour
[Figure 5.6.6: A step in a BFS/DFS tour: a reference leads to a 16 bytes memory block of unknown contents]
Suppose we are in the midst of a DFS (or BFS) traversal of the values graph, and we follow a reference, reaching a memory block. Unfortunately, a priori, we do not know how long the block is. Further, although we can examine the bits and bytes, we cannot know what their values mean!
Supposing that we know that the value is of type T0, then we know how long the memory block is, that it has four words of four bytes each, and the exact type of each of these words. With this information, we can continue the traversal along the first reference found in this memory block, and then along the second such reference.
But the visited block could be of any type! It could just as well be a struct T1 or a struct T2, each of which places its references in different words.

587. Interpreting bit and bytes as values of a type
Definition 5.19 (Static typing). The compiler knows the deciphering key, and it generates code based on this information.
Definition 5.20 (Dynamic typing). A deciphering key is attached to each value; the run-time system decodes the key.
The deciphering key is nothing but the type!

588. Designing an algorithm for traversing values
  The network of objects typically contains values of very many distinct types.
  The traversal algorithm should know the type of each visited value, and the types of each of the values it references.
  It is impractical to generate a different traversal algorithm for each input program, according to the different types that occur in it.
RTTI is the answer!
Definition 5.21 (Run-time type information). Run-time type information (or RTTI for short) is a tag attached to each value, which specifies its type.
Application of RTTI in different kinds of PLs:
  Statically typed: deep cloning, garbage collection, and serialization.
  Dynamically typed: deep cloning, garbage collection, serialization, and run time type checks.

The RTTI field may contain all the type information; more commonly, RTTI is a reference to a type descriptor shared by all values of the type.
[Figure 5.6.7: Reference representation of the RTTI tag attached to values. Values v1, ..., v4 each carry an RTTI reference; the RTTI for type T0 lists: floating point number; reference (to type T1); 4 characters; reference (to type T2).]

590. C, C++, & RTTI
As a result of the "no hidden cost" language principle, C does not and cannot have RTTI. As a result, C cannot have general purpose GC, serialization, cloning or any deep operations.
Due to the "C-compatibility at almost all costs" language principle, C++ does not and cannot have general RTTI. As a result, C++ cannot have general purpose GC, serialization, cloning or any deep operations.
C++ has a limited form of RTTI for the implementation of virtual functions. More on these mysterious vptr and vtbl in our OOP course.

Use of RTTI in the implementation of different PLs
Consider a variable today which references an object with (say) three fields: year, month, day. How is today.day=35 being implemented? Can we use static type information?

Table 5.6.1: Use of RTTI in the implementation of different PLs
    prog. lang.     C                        Java                     JavaScript
    syntax          today->day=35            today.day=35             today.day=35
    typing          static                   static                   dynamic
    RTTI            No!!!                    yes                      yes
    type punning    yes                      no                       no
    implementation  1. dereference today     1. dereference today     1. dereference today
                    2. advance by off(day)   2. ignore RTTI           2. examine RTTI
                    3. update field          3. advance by off(day)   3. determine off(day)
                                             4. update field          4. advance by off(day)
                                                                      5. update field

592. Comments on use of RTTI in PLs
When and how is off(day), the function determining the field offset, determined?
  In statically typed languages: at compile time, from the static type of today.
  In dynamically typed languages: at runtime, from the RTTI of *today.
In C, the actual type of *today could be anything (due to type punning).
In Java, the actual type of the object that today refers to can be any class that extends Date.

References: Garbage Collection, Variable, Object Copy
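The idea of RTTI as a reference to a shared type descriptor (Definition 5.21 and Figure 5.6.7) can be sketched in C, which has no built-in RTTI, by attaching the tag by hand. The sketch assumes a hypothetical `struct header` placed at the start of every value and a hypothetical `struct type_desc` listing where the references live; with these, one generic traversal serves all three struct types of frame 583. It also assumes, as holds on mainstream platforms, that all pointers to structures share one representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS 4

/* Hypothetical type descriptor, shared by all values of one type. */
struct type_desc {
    const char *name;
    size_t size;                  /* length of a value of this type */
    size_t nrefs;                 /* how many reference fields it has */
    size_t ref_offset[MAX_REFS];  /* where those references are stored */
};

/* Every traversable value starts with its RTTI tag and a visited mark. */
struct header { const struct type_desc *rtti; int visited; };

struct T1; struct T2;
struct T0 { struct header h; float x; struct T1 *p; char s[4]; struct T2 *q; };
struct T1 { struct header h; char s[4]; float x; struct T2 *q; struct T0 *p; };
struct T2 { struct header h; struct T0 *p; char s[4]; struct T1 *q; float x; };

const struct type_desc T0_desc = { "T0", sizeof(struct T0), 2,
    { offsetof(struct T0, p), offsetof(struct T0, q) } };
const struct type_desc T1_desc = { "T1", sizeof(struct T1), 2,
    { offsetof(struct T1, q), offsetof(struct T1, p) } };
const struct type_desc T2_desc = { "T2", sizeof(struct T2), 2,
    { offsetof(struct T2, p), offsetof(struct T2, q) } };

/* Generic depth-first traversal: counts the values reachable from v.
 * It never names T0, T1 or T2; the RTTI alone says where to go next. */
size_t count_reachable(void *v) {
    if (v == NULL) return 0;
    struct header *h = v;
    assert(h->rtti != NULL);
    if (h->visited) return 0;     /* do not visit a value twice */
    h->visited = 1;
    size_t n = 1;
    for (size_t i = 0; i < h->rtti->nrefs; i++) {
        struct header *ref;
        /* Read the reference stored at the offset given by the descriptor. */
        memcpy(&ref, (char *)v + h->rtti->ref_offset[i], sizeof ref);
        n += count_reachable(ref);
    }
    return n;
}
```

The same descriptor would let a deep clone allocate `size` bytes for each copy, and let a collector mark through the references, which is why the slides tie GC, serialization and deep cloning to the presence of RTTI.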

6 Commands [113 frames]
Contents:
  6.1 Expressions vs. commands [10 frames]
  6.2 Recursive definitions [11 frames]
  6.3 Atomic commands [4 frames]
      Exercises
  6.4 Block commands [6 frames]
  6.5 Conditional commands [13 frames]
  Iterative commands [11 frames]
      Exercises
  Structured programming [11 frames]
  Sequencers [11 frames]
      Exercises
  Exceptions [35 frames]
      Robustness [7 frames]
      Policy I: resumption [3 frames]
      Policy II: error rolling [5 frames]
      Policy III: setjmp/longjmp of C [6 frames]
      Policy IV: exceptions [1 frame]
      Kinds of exceptions [4 frames]
      Resource acquisition is (or isn't) initialization [8 frames]

[Figure 6.0.8: Commands: a visual mindmap. Main branches: Basics (commands vs. expressions; expression oriented languages; examples in Pascal and C); Atomic commands (skip, nop, \relax, ;; assignment: vanilla, update, multiple, simultaneous, collateral; I/O: print, read); Command constructors (block, conditional, iterative; varieties: sequential, simultaneous, collateral, definite/indefinite); Sequencers (jump, escape); Structured vs. non-structured programs (Nassi-Shneiderman diagrams).]

6.1 Expressions vs. commands
10 frames: Commands: what are they?; Commands vs. statements; Commands vs. expressions; Expressions changing the program's state?; Expressions without side-effects?; Statement-expressions in Gnu-C; ...and in Mock; Command-expression; Reasonable realizations of command-expression I; Reasonable realizations of command-expression II

594. Commands: what are they?
Commands are characteristic of imperative languages (64).
(64) There are no commands in purely functional languages.
Definition 6.1 (Command). A command is a part of a computer program, which:
  does not produce a value;
  has as its main purpose altering the program's state, even vacuously.

595. Commands vs. statements
The misnomer "statement" is (much) more frequently used in the literature. But there is nothing declarative in commands!
"Statement" also means: definitions; declarations; anything ending with a ;

596. Commands vs. expressions
Ideally, they should be distinct:
  Commands: change state; no value.
  Expressions: no state change; produce a value.
In practice, the borderline is not so clear.

597. Expressions changing the program's state?
Nasty CS101 exam question: you are given a seemingly innocent Pascal code, and asked: could "The Answer!" ever be written?
    Procedure Hamlet;
    VAR happy: Boolean;
      Function toBe: Boolean;
      Begin
        happy := not happy;
        toBe := happy
      End;
    Begin
      happy := false;
      If toBe and not toBe then
        WriteLn('The Answer!');
    End;

Suppose that toBe is a function nested in procedure Hamlet, which may have access to a variable global to it, whose initial value is false. In fact, function toBe returns the value of this variable, just after flipping it! So, the answer is yes, provided both calls are evaluated, left to right: the first call returns true, the second returns false, and so "toBe and not toBe" holds.

598. Expressions without side-effects?
What happened here?
  Expressions do not make sense without function calls.
  Functions may invoke commands.
  Commands, by definition, alter the program state!
  Worse, in some PLs, certain operators have side-effects.
Would it be possible to prevent side-effects at the PL design level?
  Representation of state?
  How would you do I/O?
  In general: tough, and awkward.
  Obvious example: pure ML.

599. Statement-expressions in Gnu-C
An excerpt from Section 6.1, "Statements and Declarations in Expressions", of Chapter 6, "Extensions to the C Language Family", of the Gnu-C manual:
    ({ int y = foo (); int z;
       if (y > 0) z = y;
       else z = - y;
       z; })
is a valid (though slightly more complex than necessary) expression for the absolute value of foo().
Note: Gnu-C uses the misnomer "statement" instead of "command".

600. ...and in Mock
    return if (
        (while (*s++ = *t++) ;)
        >
        (while (*t++ == *s++) ;)
      ) 3;
      else while (*s++ != *t++) return 7;
Huh? What does this mean? Is this useful to anyone?

601. Command-expression
Command-expression is an idealistic notion:
  Any expression may be substituted by a command.
  Every command is an expression, so every command returns a value:
    Atomic: atomic commands are expressions.
    Sequence: the last expression.
    Conditional: the selected branch.
    Iteration: the last iteration? What if there were no iterations?
    Return: what value should "return 3" return?

602. Reasonable realizations of command-expression I
  Statement-expressions of Gnu-C.
  ML, with the semicolon operator ";", which: takes 2 operands; computes the 1st operand and then discards it; computes the 2nd operand and then returns it.
  The ancient BCPL.
  PostScript.
Standard ML of New Jersey:
    - (1;2)*(3;4);
    val it = 8 : int

603. Reasonable realizations of command-expression II
Icon, in which every expression is a generator:
  atomic expressions are things such as values, which can only yield one value;
  iterations return a sequence of values;
  sequencing means concatenating the output of generators.

6.2 Recursive definitions
11 frames: Expressions are recursively defined; Function call expression constructor; Commands are also recursively defined!; Three atomic commands in Pascal; More on Pascal's atomic commands; The advent of expression oriented languages; Two kinds of atomic commands in C++; Command expressions in C; More on atomic expressions in C; Two kinds of atomic commands in Java (65); Not all expressions make commands
(65) Ignoring sequencers.
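The ML semicolon operator of frame 602 has a close relative in standard C: the comma operator, which also evaluates its left operand, discards the result, and yields the right operand. The following sketch mirrors the ML session above; the function names `comma_demo`, `tick` and `sequence_demo` and the counter `calls` are hypothetical, introduced only for this illustration.

```c
#include <assert.h>

/* Mirrors the ML expression (1;2)*(3;4): each parenthesised pair yields
 * its right operand. The (void) casts only silence "unused value" warnings. */
int comma_demo(void) {
    return ((void)1, 2) * ((void)3, 4);
}

/* Hypothetical counter, showing that the discarded operands are still
 * evaluated for their side-effects. */
int calls = 0;
static int tick(void) { return ++calls; }

/* Sequence as an expression: runs tick() twice, then yields 10. */
int sequence_demo(void) {
    int before = calls;
    int result = (tick(), tick(), 10);
    assert(calls == before + 2);
    return result;
}
```

Here `comma_demo()` evaluates to 8, the same value the ML session reports, while `sequence_demo()` shows the "value of a sequence is its last expression" rule from frame 601.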

604. Expressions are recursively defined
Naturally, each PL is different, but the general scheme is:
  Atomic expressions: variable inspection; literals.
  Expression constructors: operators such as +, -, ...; function call.
The set of atomic expressions and the set of constructors are PL dependent, but the variety is not huge.

Function call expression constructor
Definition 6.2 (Function call expression constructor [dynamic typing version]). If $f$ is a function taking $n \ge 0$ arguments, and $E_1,\ldots,E_n$ are expressions, then $f(E_1,\ldots,E_n)$ is an expression.
Definition 6.3 (Function call expression constructor [static typing version]). Let $f$ be a (typed) function of $n \ge 0$ arguments, e.g., $f \in \tau_1 \times \cdots \times \tau_n \to \tau$. Let $E_1,\ldots,E_n$ be expressions of types $\tau_1,\ldots,\tau_n$. Then $f(E_1,\ldots,E_n)$ is an expression of type $\tau$.

Commands are also recursively defined!
Each PL is different. The scheme is the same, but the variety is huge:
  Atomic commands (66): the empty command; assignment; sequencers (67).
  Command constructors (68): block command constructor; conditional command constructor; iterative command constructor; try-catch-finally command constructor; with command constructor.
(66) Each PL is different.
(67) WTF? Sequencers will be discussed later.
(68) Huge variety.

Three atomic commands in Pascal
(Ignoring goto, the only sequencer of the language.)
  Empty: can you figure out where it hides?
        Procedure swap(var a, b: Integer);
        Begin
          a := a + b;
          b := a - b;
          a := a - b;
        end;
    Pascal is a separatist language; the semicolon is not part of the command; hence an empty command is hiding between the last semicolon and "end".
  Assignment: as in the above, or in
        happy := not happy
  Procedure call: as in
        WriteLn('The Answer!')

608. More on Pascal's atomic commands
Empty: no change to state; no computation; no textual representation; existence determined solely by context.
Definition 6.4 (Assignment atomic command). Let $v$ be a variable of type $\tau$, and let $E$ be an expression of type $\tau$, or of a compatible type $\tau'$, $\tau' \le \tau$. Then
    $v := E$    (6.2.1)
is an atomic command.
There are only a few cases of type compatibility in Pascal, e.g., INTEGER $\le$ REAL, in the sense that an integer value could be assigned to a real variable.
Definition 6.5 (Procedure call atomic command). If $p$ is a procedure taking $n \ge 0$ arguments of types $\tau_1,\ldots,\tau_n$, and $E_1 \in \tau_1,\ldots,E_n \in \tau_n$ are expressions, then the procedure call
    $p(E_1,\ldots,E_n)$    (6.2.2)
is an atomic command.

The advent of expression oriented languages
Pascal: sharp distinction between expressions and commands:
  distinction between Function and Procedure;
  distinction between expression and command.
C, Java, Go, ...: blurred distinction:
  a procedure is a function returning Unit;
  an expression is a command, more or less, and subject to PLs' variety.

Two kinds of atomic commands in C++
The empty command:
  does not change the program state;
  does not perform any computation;
  its textual representation is the semicolon, i.e., ;
    while (*s++ = *t++)
      ;
  Here the loop body is an empty command; all the work is carried out by the side-effects of the expression used in the condition. Assignment is an operator taking two arguments, L (left) and R (right). The operator returns R, and as a side-effect assigns R into L.
Expression marked as command: an atomic command is also an expression followed by a semicolon, e.g.,
    0;
    1*1;
    i;
    (i=i)==-i;

611. Command expressions in C
Definition 6.6 (Command expressions in C). If E is an expression, then E; is a command.

Table 6.2.1: C program elements
    Element                    Purpose                                                     Example
    Expression                 evaluates to a value                                        f()? a + b : a - b
    Command                    changes program state (even vacuously)                      f()? a + b : a - b;   and   i = 0;
    Variable definition        creates a variable and binds a name to it                   int i;
    Variable declaration       makes a binding; variable must be created elsewhere         extern int i;
    Definition + initializer   creates a variable, binds a name to it, and initializes it  int i = 3;
    Declaration + initializer  -                                                           -

612. More on atomic expressions in C
  All of C's atomic commands (including sequencers) are semicolon terminated.
  Not every command includes a semicolon.
  Not every semicolon is part of a command.
Can you locate the atomic command(s) in this code?
    struct Complex {
      double x, y;
    };
    ;
    main() {
      int i, a[100];
      for (i = 0; i < 100; i++)
        a[i] = 100;
      return 0;
    }
  "double x, y;" and the "};" closing the struct: no command here, nor there.
  The lonely ";" after the struct: just a lonely semicolon, not an empty command.
  "int i, a[100];": a declaration, not a command either.
  "i = 0; i < 100; i++" in the for header: three expressions, separated by semicolons, but none of them is a command.
  "a[i] = 100;": a command, and atomic; in fact, the only atomic command around here.
  The for loop is also a command, but not an atomic one, and "return 0;" is a sequencer; both must wait a few more slides for their turn.

613. Two kinds of atomic commands in Java (a)
(a) Ignoring sequencers.
  ;            the empty command is a lonely semicolon;
  Expression;  provided that the first step in the recursive decomposition of the expression is something that has (might have) side-effects:
      function call;
      operator with side effects: assignment, e.g., =, +=, <<=; increment/decrement, ++ and --, either prefix or postfix;
      object creation, e.g., new Object().
  Nothing else!
Just as in C:
  Not every semicolon makes a command.
  Not every lonely semicolon makes an empty command.
  Not every expression followed by a semicolon makes a command.

614. Not all expressions make commands
Applying the rule of frame 613 (there is no comma operator in Java):
    ;               // a command (empty)
    i++;            // a command
    ++i;            // a command
    ++i             // not a command: no semicolon
    i++, j++        // not a command: no comma operator, no semicolon
    ;;              // (two commands)
    i = f();        // a command
    f();            // a command
    new String();   // a command
    new String()    // not a command: no semicolon
    j <<= g();      // a command
    0;              // not a command
    f;              // not a command
    ;f();           // (two commands)
    i++ + j++;      // not a command: the top operator is +
    i++ && j++;     // not a command: the top operator is &&
    f() + 0;        // not a command: the top operator is +
    a ? f() : g()   // not a command
    1 << f();       // not a command: the top operator is <<

6.3 Atomic commands [4 frames]

Vanilla assignment command
    $v \leftarrow e$    (6.3.1)
  Expression $e$ is evaluated.
  Its value is assigned to variable $v$.

616. Two variations of vanilla assignment
Multiple:
    $v_1, v_2, \ldots, v_n \leftarrow e$    (6.3.2)
  Expression $e$ is evaluated.
  Its value is assigned to variables $v_1,\ldots,v_n$.
Update:
    $v \leftarrow \varphi(\cdot, e_1, e_2, \ldots, e_n)$    (6.3.3)
  Syntactic sugar for $v \leftarrow \varphi(v, e_1, e_2, \ldots, e_n)$.
  As in Cobol's "Add 1 to a"; as in i++ of C/Java; as in the compound operators of C/Java, such as *=.

617. Two more varieties of the assignment command
Collateral:
    $v_1, v_2 \leftarrow e_1, e_2$    (6.3.4)
  $e_1$ is evaluated and assigned to $v_1$;
  $e_2$ is evaluated and assigned to $v_2$;
  the two actions take place collaterally;
  cannot be used for swapping the contents of variables.
Simultaneous:
    $v_1, v_2 \leftarrow e_1, e_2$    (6.3.5)
  $e_1$ is evaluated and then assigned to $v_1$ (as in collateral assignment);
  $e_2$ is evaluated and then assigned to $v_2$ (as in collateral assignment);
  the two actions take place simultaneously;
  can be used for swapping;
  we had tuples of values; $v_1, v_2$ can be thought of as a tuple of variables, and simultaneous assignment can be thought of as tuple assignment.

618. And, what about the forgotten atomic commands?
The SKIP command (aka NOP, aka \relax, aka ;, aka the empty command):
  is not really interesting;
  is syntactically necessary on occasions.
The procedure call command:
  is not really interesting;
  occurs only when procedures are distinct from functions; in most PLs, a procedure is just a function that returns void, aka the Unit type.

Exercises
1. What does this program do? What kinds of assignments do we find in it? (Algol)
    x,y := 1;
    to n do
      (if x<y then x else y) := x+y;
    x := max(x,y);
2. What does this program do? What kinds of assignments do we find in it? (Algol)
    x, y := 1;
    to n do begin
      x,y := y,x;
      x := x+y
    end;

6.4 Block commands
6 frames: Sequential block constructor; Collateral block constructor; Programmatically identical vs. semantically equivalent; Concurrent block constructor; Collateral vs. concurrent collateral; Concurrent execution in Occam

619. Sequential block constructor
Definition 6.7 (Sequential block constructor). If $C_1,\ldots,C_n$ are commands, $n \ge 0$, then
    $\{C_1; C_2; \ldots; C_n\}$    (6.4.1)
is a composite command, whose semantics is sequential: $C_{i+1}$ is executed after $C_i$ terminates.
Footnote (its anchor on this page was lost): theoretically possible, but not very useful.

128 Most common constructor Makes it possible to group several commands, and use them as one, e.g., inside a conditional If your language has no skip command, you can use the empty sequence, {. Separatist Approach: semicolon separates commands; used in Pascal; mathematically clean; error-prone. Terminist Approach: semicolon terminates commands (at least atomic commands); used in C/C++//C # and many other PLs; does not match the above definition Collateral block constructor Definition 6.8 (Collateral block constructor). If C 1,...,C n are commands, n 0, then {C 1 ~C 2 ~ ~C n (6.4.2) is a composite command, whose semantics is that C 1,...,C n are executed collaterally. Very rare, yet (as we shall see) important Order of execution is non-deterministic An optimizing compiler (or even the runtime system) can choose best order Good use of this constructor, requires the programmer to design C 1,...,C n such that, no matter what, the result is programmatically identical, or at least, semantically equivalent 622. Concurrent block constructor Definition 6.9 (Concurrent block constructor). If C 1,...,C n are commands, n 0, then {C 1 C 2 C n (6.4.3) is a composite command, whose semantics is that C 1,...,C n are executed concurrently. Common in concurrent PLs, e.g., Occam Just like collateral Commands can be executed in any order; Order of execution is non-deterministic; An optimizing compiler (or even the runtime system) can choose best order; Good use of this constructor, requires the programmer to design C 1,...,C n ; such that, no matter what, the result is, programmaticall identical, or semantically equivalent 623. Collateral vs. concurrent collateral Collateral really means not guaranteed to be sequential, or undefined ; PL chooses the extent of defining this undefined, e.g., the order of evaluation of a and b in a + b is unspecified. Also, the runtime behavior is undefined in the case a and b access the same memory. 
Concurrent: may be executed in parallel, which is an extent of definition of a collateral execution, e.g., the evaluation of a + b by executing a and b concurrently; as usual, this concurrent execution is fair and synchronous.

Programmatically identical vs. semantically equivalent

Programmatically identical:
  Now these are the generations of the sons of Noah, Shem, Ham, and Japheth: and unto them were sons born after the flood.
  1. The sons of Japheth; Gomer, and Magog, and Madai, and Javan, and Tubal, and Meshech, and Tiras
  2. And the sons of Ham; Cush, and Mizraim, and Phut, and Canaan
  3. The children of Shem; Elam, and Asshur, and Arphaxad, and Lud, and Aram

    {
      grandsons += 7; ~
      grandsons += 4; ~
      grandsons += 5;
    }

Semantically equivalent:

    {
      humanity.add("adam"); ~
      humanity.add("eve");
    }

  At the end, both "Adam" and "Eve" will belong to humanity, but the internals of the humanity data structure might be different.

Concurrent execution in Occam

The cow:

    PROC cow(CHAN INT udder!)
      INT milk: -- definitions are ':' terminated
      SEQ
        milk := 0
        WHILE TRUE
          SEQ
            udder ! milk
            milk := milk + 1
    : -- end of PROC cow

The calf:

    PROC calf(CHAN INT nipple?)
      WHILE TRUE
        INT milk:
        SEQ
          nipple ? milk
    : -- end of PROC calf

The cowshed:

    PROC cowshed()
      CHAN INT mammarygland:
      PAR
        calf(mammarygland?)
        calf(mammarygland?)
        calf(mammarygland?)
        calf(mammarygland?)
        cow(mammarygland!)
    : -- end of PROC cowshed
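The distinction between programmatically identical and semantically equivalent results can be checked mechanically for small examples. The following is a minimal sketch in Python, not part of the original slides: it models collateral execution only as "any order of whole commands" (no interleaving inside a command), and the helper names `run_collaterally`, `add` and `join` are hypothetical.

```python
from itertools import permutations

def run_collaterally(commands, make_state):
    """Run the commands in every possible order; return all final states."""
    results = []
    for order in permutations(commands):
        state = make_state()          # fresh state for each ordering
        for command in order:
            command(state)
        results.append(state)
    return results

def add(k):
    """Command that adds k to the counter (integer addition commutes)."""
    def command(state):
        state["grandsons"] += k
    return command

def join(name):
    """Command that appends a name; the list layout depends on the order."""
    def command(state):
        state["humanity"].append(name)
    return command

# Programmatically identical: every ordering yields the same state.
counts = run_collaterally([add(7), add(4), add(5)], lambda: {"grandsons": 0})
print(all(c == counts[0] for c in counts))            # → True

# Semantically equivalent: same members, different internal layout.
people = run_collaterally([join("adam"), join("eve")], lambda: {"humanity": []})
print(len({tuple(p["humanity"]) for p in people}))    # → 2 distinct layouts
print(len({frozenset(p["humanity"]) for p in people}))  # → 1 set of members
```

Enumerating all orders is feasible only for a handful of commands; it is meant as a way to test the design obligation stated above, not as an implementation of collateral execution.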

6.5 Conditional commands
13 Frames: Conditional commands; Semantics of conditional commands; CSP: Communicating sequential processes; The else variants; Variant #1 / many: the else clause; Variant #2 + #3 / many: if-then-else & cases; Cases with range in Pascal; Why special switch/case statement?; Efficient implementation + usability considerations = wrong conclusion?; Another weird (& obsolete) conditional statement; Cases variants?; Vanilla multi-way conditional?; else if? elseif? what's the big difference?

625. Conditional commands

Definition 6.10 (Conditional command constructor). If $C_1,\ldots,C_n$ are commands, $n \ge 1$, and $E_1,\ldots,E_n$ are boolean expressions, then
    $\{E_1 ? C_1 : E_2 ? C_2 : \cdots : E_n ? C_n\}$    (6.5.1)
is a conditional command.

626. Semantics of conditional commands

The semantics of $\{E_1 ? C_1 : E_2 ? C_2 : \cdots : E_n ? C_n\}$ can be:
Sequential: Evaluate $E_1$; if true, then execute $C_1$; otherwise, recursively execute the rest, i.e., $\{E_2 ? C_2 : \cdots : E_n ? C_n\}$.
Collateral: Evaluate $E_1, E_2, \ldots, E_n$ collaterally. If there exists $i$ for which $E_i$ evaluates to true, then execute $C_i$. If there exists more than one such $i$, arbitrarily choose one of them.
Concurrent: Same as collateral, except that if certain $E_i$ are slow to execute, or blocked, the particular concurrency regime prescribes running the others.
Example of a concurrency regime: Strong fairness: in any infinite run, there is no process which does not execute infinitely many times.

627. CSP: Communicating sequential processes

Occam features a concurrent conditional command.

Jacob and his four wives:

    INT kisses:
    ALT -- a list of guarded commands
      rachel ? kisses
        out ! kisses
      leah ? kisses
        out ! kisses
      bilhah ? kisses
        out ! kisses
      zilpah ? kisses
        out ! kisses

If none of the guards is ready, then the ALT command waits, and waits, and waits.
There is a deep theory of communicating sequential processes; ALT is only a small part of it, but we must proceed in our course.

628. The else variants

Definition 6.11 (Conditional command constructor with else clause).
If $C_1,\ldots,C_n,C_{n+1}$ are commands, $n \ge 1$, and $E_1,\ldots,E_n$ are boolean expressions, then
    $\{E_1 ? C_1 : E_2 ? C_2 : \cdots : E_n ? C_n : C_{n+1}\}$    (6.5.2)
is a conditional command, whose semantics is precisely the same as that of the familiar $\{E_1 ? C_1 : E_2 ? C_2 : \cdots : E_n ? C_n : E_{n+1} ? C_{n+1}\}$, where we define
    $E_{n+1} = \lnot E_1 \land \lnot E_2 \land \cdots \land \lnot E_n$    (6.5.3)
The else clause is sometimes denoted by: default, otherwise.

629. Variant #1 / many: the else clause

Almost all languages use else:

    If thouWiltTakeTheLeftHand then
      iWillGoToTheRight
    else
      iWillGoToTheLeft

Pascal uses otherwise (the GNU Pascal EBNF):

    case expression of
      Selector: Statement;
      ...
      Selector: Statement
    otherwise { else instead of otherwise is allowed }
      Statement;
      ...
      Statement
    end

C uses default:

    int CisPrime(unsigned c) {
      switch (c) {
        case 0: case 1:
          return 0;
        case 2: case 3:
          return 1;
        default:
          return isprime(c);
      }
    }

630. Variant #2 + #3 / many: if-then-else & cases

Special construct for the case n = 1, in the form of

    if Condition then Statement [ else Statement ]

(your syntax may vary).

Special construct for the case that each of the $E_i$ is in the form $e = c_i$, where
  $e$ is an expression (usually integral), common to all $i = 1, 2, \ldots$
  $c_i$ is a distinct constant expression for all $i = 1, 2, \ldots$

    case Expression of
      { ConstantExpression Statement }+
      [ otherwise Statement ]

(your syntax may vary).
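The sequential and collateral readings of the conditional constructor, together with the else clause of Definition 6.11, can be sketched as two small interpreters. This is an illustrative Python sketch, not the course's notation: guards and commands are modelled as zero-argument functions, and the names `sequential_conditional` and `collateral_conditional` are hypothetical. The collateral version uses a random choice to stand in for "arbitrarily choose one".

```python
import random

def sequential_conditional(guarded, otherwise=None):
    """Sequential semantics: the first true guard, in textual order, wins."""
    for guard, command in guarded:
        if guard():
            return command()
    # Else clause: reached exactly when all of E_1..E_n are false.
    if otherwise is not None:
        return otherwise()
    return None

def collateral_conditional(guarded, otherwise=None, rng=random):
    """Collateral semantics: guards are evaluated in no guaranteed order;
    if several hold, one of the corresponding commands is chosen arbitrarily."""
    shuffled = list(guarded)
    rng.shuffle(shuffled)                       # evaluation order is not fixed
    ready = [command for guard, command in shuffled if guard()]
    if ready:
        return rng.choice(ready)()
    if otherwise is not None:
        return otherwise()
    return None

x = 5
guards = [
    (lambda: x > 0, lambda: "positive"),
    (lambda: x > 3, lambda: "big"),
]
print(sequential_conditional(guards, lambda: "other"))   # → positive
print(collateral_conditional(guards, lambda: "other"))   # "positive" or "big"
```

With overlapping guards the two semantics differ observably, which is why a programmer using the collateral form has to make the guards mutually exclusive or the commands interchangeable.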

631. Cases with range in Pascal

ROT13 filter in Pascal:

    Program Rot13(Input, Output);
    VAR
      c: char;
    Begin
      While not eof do begin
        Read(c);
        Case c of
          'a'..'m', 'A'..'M':
            Write(chr(ord(c)+13));
          'n'..'z', 'N'..'Z':
            Write(chr(ord(c)-13));
          otherwise
            Write(c);
        end
      end
    end.

A selector of Pascal's case statement may contain:
  Multiple entries
  Range entries

632. Why special switch/case statement?

Because the PL designer thought that
  it would be used often;
  it has an efficient implementation on wide-spread machines:
    Dedicated hardware instruction in some architectures
    Jump-table implementation
    Binary search implementation
The above two reasons, with different weights, explain many features of PLs; these are precisely the reasons for the particular specification of the conditional in the form of if-then-else for the case n = 1.

633. Efficient implementation + usability considerations = wrong conclusion?

Early versions of Fortran relied on a very peculiar conditional statement, namely the arithmetic if:

    IF ( Expression ) l1, l2, l3

where
  $l_1$ is the label to go to in case Expression is negative
  $l_2$ is the label to go to in case Expression is zero
  $l_3$ is the label to go to in case Expression is positive
Could be efficient, but not very usable by modern standards.

634. Another weird (& obsolete) conditional statement

Early versions of Fortran had a computed goto instruction:
    GO TO ($l_1, l_2, \ldots, l_n$) Expression    (6.5.4)
where
  $l_1$ is the label to go to in case Expression evaluates to 1
  $l_2$ is the label to go to in case Expression evaluates to 2
  ...
  $l_n$ is the label to go to in case Expression evaluates to n
Likely to have an efficient implementation, but not very usable by modern standards.

635. Cases variants?
Range of consecutive integer values (in Pascal).
Cases of string expression:
  No straightforward efficient implementation
  Added in later versions of Java after overwhelming programmers' demand
Regular expressions in selectors:
  Exists in Bash
  Seems natural for the problem domain
General patterns in selectors:
  Exists in ML and other functional PLs
  In the spirit of the PL type system
No cases statement:
  In Eiffel, a pure OO language
  The language designer thought it encourages a non-OO mindset

636. Vanilla multi-way conditional?

Exists in many languages, in the form of a special keyword elseif, or elsif, or ELIF; e.g., in PHP you can write:

    if ($a > $b) {
        echo "a is bigger than b";
    } elseif ($a == $b) {
        echo "a is equal to b";
    } else {
        echo "a is smaller than b";
    }

637. else if? elseif? what's the big difference?

There is no big difference!
  else if: many levels of nesting
  elseif: one nesting level
This might have an effect on automatic indentation, but modern code formatters are typically smarter than that!
Another small difference occurs if the PL requires the else part to be wrapped within { and }.
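The jump-table and binary-search implementations mentioned earlier as reasons for a dedicated switch/case statement can be imitated at source level. The sketch below is in Python and is only an analogy for what a compiler might emit: the function names are hypothetical, the `-1` default in the first example is an arbitrary placeholder, and the second example reuses the ROT13 ranges of the Pascal program.

```python
from bisect import bisect_right

def make_jump_table(actions, default):
    """Dense integer selectors: the selector indexes the table directly."""
    def dispatch(selector):
        if 0 <= selector < len(actions):
            return actions[selector]()      # one bounds check, one indexed jump
        return default()
    return dispatch

def make_range_dispatch(lower_bounds, actions, default):
    """Sparse or ranged selectors: binary search over sorted lower bounds.
    lower_bounds[i] is the first selector value handled by actions[i]."""
    def dispatch(selector):
        i = bisect_right(lower_bounds, selector) - 1
        if i >= 0:
            return actions[i]()
        return default()
    return dispatch

# Jump table for the selectors 0..3 of the CisPrime example.
small_prime = make_jump_table(
    [lambda: 0, lambda: 0, lambda: 1, lambda: 1],
    lambda: -1,                              # placeholder for "default:"
)

# Range dispatch for ROT13: each bound opens a segment with its own shift.
bounds = [ord("A"), ord("N"), ord("Z") + 1, ord("a"), ord("n"), ord("z") + 1]
shifts = [13, -13, 0, 13, -13, 0]
shift_for = make_range_dispatch(bounds, [lambda s=s: s for s in shifts], lambda: 0)

def rot13(text):
    return "".join(chr(ord(ch) + shift_for(ord(ch))) for ch in text)

print(small_prime(2))            # → 1
print(rot13("Hello, World!"))    # → Uryyb, Jbeyq!
```

String selectors have neither a dense index nor a cheap total order on machine words, which is one way to read the remark that they have no straightforward efficient implementation.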

6.6 Iterative commands
Contents [11 frames]; Exercises

638. Iterative command constructor

A very general pattern of iterative command constructor:

Definition 6.12 (Iterative command constructor). If S is a program state generator and C is a command, then
    forall S do C
is an iterative composite command whose semantics is the (sequential / collateral / concurrent) execution of C in all program states that S generates.

Note that with sequencers such as break and continue, iterative commands can be even richer!

639. State generator? answer #1/5

The state generator S may be a range of integer (ordinal) values, e.g., in Pascal:

    For i := gcd(a,b) to lcm(a,b) do
      If isprime(i) then
        Writeln(i);

640. State generator? answer #2/5

The state generator S may be any arithmetical progression, e.g., in Fortran:

    Comment WHAT IS BEING COMPUTED???
          INTEGER SQUARE11
          SQUARE11 = 0
          DO 1000 I = 1, 22, 2
            SQUARE11 = SQUARE11 + I
     1000 CONTINUE

641. State generator? answer #3/5

The state generator S may be an expression, typically boolean:
  the expression is re-evaluated in face of the state changes made by the command C;
  iteration continues until the expression becomes true, or until the expression becomes false.

642. State generator? answer #4/5

The state generator S may be a generator, e.g., in Java:

    List<Thing> things = new ArrayList<Thing>();
    for (Thing t : things)
        System.out.println(t);

643. State generator? answer #5/5

The state generator S may be the cells in an array, e.g., in Java:

    public static void main(String[] args) {
        int i = 0;
        for (String arg : args)
            System.out.println("Argument " + ++i + ": " + arg);
    }

644. Minor varieties of iterative commands

Minimal number of iterations?
  Minimal # iterations = 0:

    while (s < 100)
        s++;

  Minimal # iterations = 1:

    do {
        s++;
    } while (s < 100);

Truth value for maintaining the iteration:
  Iteration continues with true (Pascal):

    While not eof do
    Begin
      ...
    end

  Iteration continues with false (Pascal):

    Repeat
      ...
    until eof

None of these is too interesting.

645. The iteration variable

Several iteration constructs, e.g., ranges and arithmetical progressions, introduce an iteration variable to the iteration body, e.g., in AWK:

    #!/usr/bin/gawk -f
    BEGIN {
      antonym["big"] = "small"
      antonym["far"] = "near"
      for (w in antonym)
        print w, antonym[w]
    }

and in Java:

    int[] primes = new int[100];
    for (int p = 1, i = 0; i < primes.length; i++)
        primes[i] = p = nextprime(p);
    for (int p : primes)
        System.out.println(p);

646. Subtleties of the iteration variable

Can you make an educated guess as to what should happen in the following cases?
1. The value of the expression(s) defining the range/arithmetical progression changes during iteration.
2. The loop's body tries to change this variable.
3. The value of the iteration variable is examined after the loop.

647. Definite vs. indefinite iteration

To make an educated guess, let's educate ourselves:
Definite loop: the number of iterations is known before the loop starts.
Indefinite loop: a loop which is not definite.
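The three questions about the iteration variable can also be probed experimentally in any one language. The sketch below does so in Python, which is not among the languages the slides discuss; its answers are one language's design choices, not general rules, and the function names are hypothetical.

```python
def range_is_computed_once():
    """Question 1: the bounds change while the loop runs."""
    n = 3
    visited = []
    for i in range(n):      # range(3) is built once, before the first iteration
        visited.append(i)
        n = 10              # has no effect on the number of iterations
    return visited

def body_assignment_is_overwritten():
    """Question 2: the body assigns to the iteration variable."""
    visited = []
    for i in range(3):
        visited.append(i)
        i = 100             # replaced by the next value taken from the range
    return visited, i

def value_after_loop():
    """Question 3: the iteration variable is examined after the loop."""
    for i in range(3):
        pass
    return i                # Python leaves the last assigned value in scope

print(range_is_computed_once())           # → [0, 1, 2]
print(body_assignment_is_overwritten())   # → ([0, 1, 2], 100)
print(value_after_loop())                 # → 2
```

In this sense a Python for loop over a range behaves as a definite loop: the number of iterations is fixed when the range object is created.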

It is easier to optimize definite loops.
Many PLs try to provide specialized syntax for definite loops, because they are perceived as more efficient and of high usability.
Only definite loops may have collateral or concurrent semantics.
Even if a PL does not specify that loops are definite, a clever optimizing compiler may deduce that certain loops are definite, e.g.,

    for (int i = 0; i < 100; i++) {
        ... // If the loop body does not change i,
            // the loop is effectively definite
    }

648. So, let's make our guesses

1. The value of the expression(s) defining the range/arithmetical progression changes during iteration:
   The iteration range, as well as the step value, are computed only at the beginning of the loop. (Check the Fortran/Pascal manual if you are not convinced.)
2. The loop's body tries to change this variable:
   The loop body should not change the iteration variable; the PL could either issue a compile-time error message (Pascal), a runtime error message (Java), or just state that program behavior is undefined.
3. What's the value of the iteration variable after the loop?
   The iteration variable may not even exist after the loop (Java); or, its value may be undefined (Pascal), because:
     the PL designer thought that programmers should not use the iteration variable after the loop ends;
     if the value is defined, then collateral implementation is more difficult;
     many architectures provide specialized CPU instructions for iterations; the final value of the iteration variable with these instructions is not always the same.

Exercises
1. Revisit the example of using references in ML. Is r an L-value? An R-value? Both? None of these? Explain.
2. AWK designers chose the iteration variable to range over the indices of an array, instead of the values. Why was this the only sensible decision?
3. What's the iteration variable of a while loop?
4. What does the acronym CSP stand for?
5. Write the most feature-rich class in Java, without using the semicolon character, ;, even once.
6. How come C does not offer any rules regarding the iteration variable?
7. How come Java, despite being similar to C, recognizes the notion of an iteration variable?
8. Explain why it is impossible to use the Perl die() programming idiom in Java.
9. Java designers chose the iteration variable to range over the array cells, instead of the indices. Why was this the only sensible decision?
10. Revisit the example of using references in ML. Is !r an L-value? An R-value? Both? None of these? Explain.
11. Could there be an iteration variable in non-definite loops? Explain.
12. Write a C function with at least three commands in it, without using the semicolon character, ;, even once.
13. What are the circumstances in which the Pascal command

        If false then Writeln(true < false);

    prints true? Concretely, add some Pascal code around it to make this happen.

6.7 Structured programming
11 Frames: Flowcharts: another method for command composition; Flowchart for getting things done (GTD); Pros & cons of flowcharts; Challenge of understanding spaghetti code; Structured programming; Nassi-Shneiderman diagram; Compound commands in Nassi-Shneiderman diagrams; Matrix multiplication with Nassi-Shneiderman diagram; Factorial with Nassi-Shneiderman diagram; More Nassi-Shneiderman notations; Even more Nassi-Shneiderman notations

649. Flowcharts: another method for command composition

Nodes (nodes are in fact atomic commands):
  I/O: read, print, display, ...
  Controls: start, stop
  Empty: skip
  Assignment
  Decision point
Edges: goto

[Figure 6.7.1: An intriguing finite automaton computing (in a non-sensible manner) a function that does make sense. Flowchart with nodes: start, "is EOF?", "Read Bit", "is it 1?", "Print Da", "Print Ha", skip, STOP.]

650. Flowchart for getting things done (GTD)

How to process items in your incoming mailbox?

[Figure 6.7.2: Flowchart for getting things done (GTD). Starts at Inbox; decision nodes: "Important item?", "Something you want?", "Best use of my time if I do it myself?", "Urgent item?", "Can be done in two minutes?", "Can be done now?", "Actionable item?", "Reference material?", "Has a deadline?"; outcomes include Trash, List, Do it, Tickler File.]

651. Pros & cons of flowcharts

Pros:
  Very visual
  Very colorful
  Can be aesthetically pleasing
  Can be understood by almost anyone
Cons:
  Do not scale
  Many competing standards
  Not necessarily planar
  Spaghetti code
  No one can really understand them

652. Challenge of understanding spaghetti code

The program on the right does something useful!
  Many intersecting spaghetti edges
  No obvious meaningful partitioning of the chart
  Only a few nodes with one entry and one exit: all decision nodes have two (or more) outgoing edges
  Some nodes with two incoming edges, even three!

[Figure 6.7.3: Challenge of understanding spaghetti code. The same flowchart nodes as in Figure 6.7.1: start, "is EOF?", "Read Bit", "is it 1?", "Print Da", "Print Ha", skip, STOP.]

653. Structured programming

Structured programming is a programming paradigm, characterized by:
Three controls: precisely three ways for marshaling control:
  1. Sequence, e.g., begin $C_1; C_2; \ldots; C_n$ end, for $n \ge 0$
  2. Selection, e.g., if ... then ... elseif ... else ... endif
  3. Iteration, e.g., while ... do ... done
Structured control: all control commands are, in fact, command constructors.
Control is marshalled through the program structure.

Theorem 6.13 (The structured programming theorem (Böhm-Jacopini, 1966)). Every flowchart graph G can be converted into an equivalent structured program, P(G).

654. Nassi-Shneiderman diagram

Main idea: programming is like tiling the plane.
Also called: NSD, and structograms.
Thought of as: the visual definition of structured programming.
Principles:
  1. Every command is drawn as a rectangle.
  2. Every command has exactly:
       one entry point
       one exit point
  3. A command may contain other commands.
  4. A command may be contained in other commands.

655. Compound commands in Nassi-Shneiderman diagrams

Based on an example provided by the original October 1973 SIGPLAN Notices article by Isaac Nassi & Ben Shneiderman.

[Figure 6.7.4: Compound commands in Nassi-Shneiderman diagrams. Panels: Sequential Command (Command 1, Command 2, ..., Command n); Conditional Command (Condition; false: Command f; true: Command t); Iterative Command (Condition; Command).]

Compound commands are rectangles which have smaller rectangles in them.
Each rectangle may contain in it one, two, or more rectangles.
They correspond to our familiar command constructors.
Color is not part of the diagram, but we can add it anyway.

656. Matrix multiplication with Nassi-Shneiderman diagram

[Figure 6.7.5: Matrix multiplication with Nassi-Shneiderman diagram. Nested rectangles: do i := 1 to n; do j := 1 to n; sum := 0; do k := 1 to n; sum += A[i,k] * B[k,j]; C[i,j] := sum.]

657. Factorial with Nassi-Shneiderman diagram

[Figure 6.7.6: Factorial with Nassi-Shneiderman diagram. Rectangles include: nfact := 1; test "is (n>1)?" with yes/no branches; a loop "do i := 3 to n"; return nfact.]

658. More Nassi-Shneiderman notations

[Figure 6.7.7: More Nassi-Shneiderman notations. Panels: Switch Command ("case i do", with branches such as 4, 7, 17 and default); Repeat Until Command (Command; Condition).]

659. Even more Nassi-Shneiderman notations

Nassi and Shneiderman did not fully work out the semantics of NSD: not in any formal notation; not in legalese; not in mock of legalese.
Some notation may be intriguing.

[Figure 6.7.8: Even more Nassi-Shneiderman notations. Panels: Concurrent Command Constructor (Begin; $C_1$, $C_2$, ..., $C_n$; End); "Begin End Iteration?" (Begin; Command; End).]

did not really catch

6.8 Sequencers
Contents [11 frames]; Exercises

660. What are sequencers?

Definition 6.14 (Sequencers). Sequencers are atomic commands whose execution alters the normal (structural) flow of control.

Examples:
  goto: from any program point to another
  return: to the end of an enclosing function
  break: out of an enclosing iteration
  continue: to the head of an enclosing iteration
  throw: an exception, which transfers control to a handler in an invoking function

661. Labels

To denote where goto will go to, one needs a label.

Definition 6.15 (Label). A label is an entity which denotes an empty command in the program text; typically there are non-empty commands before and after the empty command that the label denotes.

Labels are deliberately disprivileged entities.

662. Label literal & 1st class labels

Literal labels: the label itself is the empty command which it denotes.
  Identifiers: as in C and assembly PLs
  Integers: as in Pascal, Basic and Fortran
  In Basic, all commands must be labeled in a strictly ascending order.
1st class labels: Basic, PL/I, and some other obscure languages treat labels as first class values, which can be stored in variables, passed as arguments, returned by functions, etc.

    l1: ...
        label v = a > b ? l1 : l2;
        ...
    l2: ...
        goto v;

663. Declared vs. ad-hoc labels

Declared labels: as in Pascal; labels must be declared before they are used.
Ad-hoc labels: as in C/C++:

    int main() {
      printf("page loaded successfully\n");
    }

664. Ad-hoc labels may generate subtle bugs

    int CisPerfect(unsigned n) {
      switch (n) {
        defualt:
          return 0;
        case 6: case 28: case 496: case 8128: case 0x1FFF000u:
          return 1;
      }
    }

Note the spelling error in defualt: it is accepted as an ad-hoc label, rather than as the default clause.

665. The persecuted goto

Restrictions on goto:
  Only within a block structure: Fortran.
  Only goto within a function: C does not allow inter-functional goto, but gotos are allowed in and out of a block.
  No goto from a bracketed command into itself: Pascal.
  No goto into a loop or into a conditional: C.
  No goto into a compound command: Pascal.
  No goto into a nested function: Pascal and Algol.
  No goto at all: Java!

666. Goto to a nesting function

Labels obey scope rules: if a variable of a nesting function is recognized in a nested function, the nested function can also goto a label defined in the nesting function.
In case of recursion, labels denote a program point in the current activation.

667. Problems of structured programming

A large portion of all software is dedicated to dealing with exceptional cases, erroneous inputs and funny situations; e.g., Pascal code tends to be heavily nested and difficult to read:

    If some error was discovered then Begin
      deal with it
    End else Begin
      do a little bit more processing
      If another error was discovered then Begin
        deal with this error
      end else Begin
        continue processing
        If another problem has occurred then Begin
          deal with it
        end else Begin
          work a little bit more
          If oops, a problem of a different kind was found then Begin
            do something about it
          end else Begin
            continue to work
          end
        end
      end
    end
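The nesting pattern of the Pascal skeleton is not specific to Pascal: it appears whenever every command must have a single exit. The Python sketch below contrasts that style with one that leaves the function early through return, a sequencer in the sense of Definition 6.14. The record format and the error messages are invented for the illustration.

```python
def process_nested(record):
    """Single-exit style: each check pushes the rest one level deeper."""
    if "name" not in record:
        result = "error: missing name"
    else:
        name = record["name"].strip()
        if not name:
            result = "error: empty name"
        else:
            if "age" not in record:
                result = "error: missing age"
            else:
                if record["age"] < 0:
                    result = "error: negative age"
                else:
                    result = f"{name} ({record['age']})"
    return result

def process_flat(record):
    """Multiple-exit style: each error returns at once; the normal path stays flat."""
    if "name" not in record:
        return "error: missing name"
    name = record["name"].strip()
    if not name:
        return "error: empty name"
    if "age" not in record:
        return "error: missing age"
    if record["age"] < 0:
        return "error: negative age"
    return f"{name} ({record['age']})"

print(process_nested({"name": " Ada ", "age": 36}))   # → Ada (36)
print(process_flat({"name": " Ada ", "age": 36}))     # → Ada (36)
```

Both functions compute the same result for every input; only the shape of the control flow differs, and the nesting depth of the first grows with the number of checks.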

668. Escapes

Definition 6.16 (Escape). An escape is a special kind of goto which terminates the execution of a compound command in which it is nested.

Makes single entry, multiple exit commands:
  exit in Ada
  break in C/C++/Java
Useful for simplifying nesting structure.

669. Varieties of escape

Escape any enclosing loop: exit l in Ada and break l in Perl/Java, where l is a label of an enclosing loop.
Escaping out of a function: return in C and Fortran.
Terminal escape: terminate the execution of the whole program; halt in Fortran.
Specialized escape: in C, break out of a switch command.

670. Continue

Definition 6.17 (Continue). A continue is a special kind of an escape which can only occur within an iteration command; it terminates the execution of the current iteration and, if there is a next iteration, it proceeds to it.

Just like break, useful for simplifying nesting structure.
Continue any enclosing loop: continue l in Java, where l is a label of an enclosing iteration command, proceeds to the next iteration of the iteration command marked by l.
Cannot be emulated by Pascal goto, due to restrictions that Pascal places on goto.

Exercises
1. How can you emulate break using goto in Pascal?
2. Give an example in which continue can be used to simplify nesting structure.
3. Why did Pascal's designer decide to identify labels with integers?
4. Give an example in which return can be used to simplify nesting structure.
5. Why did Pascal's designer decide to require that labels are pre-declared?
6. Give an example in which break out of a loop can be used to simplify nesting structure.
7. C does not make it possible to break out of more than one loop. How can you do something similar with return and auxiliary functions?
8. Compare stored labels with continuations. What's common and how are they different?
6.9 Exceptions
Contents [35 frames]: Robustness [7 frames]; Policy I: resumption [3 frames]; Policy II: error rolling [5 frames]; Policy III: setjmp/longjmp of C [6 frames]; Policy IV: exceptions [1 frame]; Kinds of exceptions [4 frames]; Resource acquisition is (or isn't) initialization [8 frames]

671. Sometimes it does not make sense to proceed as usual!

Two kinds of abnormal and unusual situations:
Bugs: an error in the program code makes it try to execute an invalid instruction:
  division by zero
  array bounds overflow
  stack overflow
  dereferencing null pointer
  runtime type error
  (memory exhaustion)
Unusual environment: the program encounters an environment for which normal execution does not make sense:
  wrong password
  file not found
  low battery
  no connection to server
  cannot locate GPS
  volume is off

Robustness
7 Frames: When you cannot proceed as usual; Robust PL vs. robust programs; The fundamental theorem of exception handling; Handling exceptions; Corollaries of the robust programs theorem; Example: Heron's formula for the area of the triangle; Non-robust program for the area of the triangle

672. When you cannot proceed as usual

It is tough to write robust programs; sometimes 80% of the code is dedicated to abnormal situations.
There is an exception!
Bugs: the language runtime environment must take action, printing error messages, and enforcing graceful termination.
Unusual environment: the programmer must deal with the error; if the programmer fails to detect an unusual environment, then it is a bug.

673. Robust PL vs. robust programs

Definition 6.18 (Robust PL). The PL's runtime system recovers gracefully from all program bugs: the programmer cannot make the runtime system crash.

Definition 6.19 (Robust program). The program recovers gracefully even in the face of weird end-user and execution environment errors: the user cannot make the runtime system crash.

674. The fundamental theorem of exception handling

Theorem 6.20 (The robust programs theorem). No one really knows how to write robust programs.

Proof. Exceptions are detected at a lower level of abstraction:
  Wrong keyboard input
  Missing file
  Internet problem
  Low battery
  Out of memory
but they must be handled in higher levels of abstraction.

675. Handling exceptions

Handling must be done at a high abstraction level. Challenges include:
Consistency: deal with similar errors similarly (tough, because many details of the errors are lost at the higher abstraction level).
Coverage: make sure that all errors are covered appropriately, and that no two dissimilar errors are grouped together for the purpose of handling (tough, because the programmer does not always know what errors may happen at lower levels of abstraction).
Smart recovery: sometimes, by the time the exception is caught, there is nothing useful that can be done.
Systematic testing: it is tough to systematically generate all exceptions.

676. Corollaries of the robust programs theorem

Since no one knows how to do it, no one can design for it:
Corollary 6.21 (Adequate support for exception handling). No PL provides adequate support for exception handling.
Humanity concentrates on what it does well:
Corollary 6.22 (Graceful termination). Many PLs offer graceful termination in case of bugs.
Humanity cannot do what it does not know how to do:
Corollary 6.23 (Rareness of robust programs). Very few programs are truly robust.
And since programmers are lazy:
Corollary 6.24 (Input errors considered bugs). Most programs convert input errors into bugs.

677. Example: Heron's formula for the area of the triangle

[Figure 6.9.1: Hero of Alexandria (10-70 AD); also known as Heron.]
[Figure 6.9.2: A triangle with edges a, b, and c.]

    $A = \sqrt{s(s-a)(s-b)(s-c)}, \qquad s = \frac{a+b+c}{2}$    (6.9.1)

Exceptions:
  1. Cannot read a
  2. Cannot read b
  3. Cannot read c
  4. $a < 0$
  5. $b < 0$
  6. $c < 0$
  7. $s < 0$
  8. $s - a < 0$
  9. $s - b < 0$
  10. $s - c < 0$

678. Non-robust program for the area of the triangle

Let's start implementing it. First, essential include files:

    #include <stdio.h>
    #include <math.h>

Then, a function to prompt for and read real numbers:
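Separately from the C program, the ten exceptional conditions listed for Heron's formula (6.9.1) can be made explicit in a few lines. The following Python sketch is an illustration only and does not reproduce the course's C code: the exception class `TriangleError` and the function names are hypothetical, and the input is taken as three strings rather than read interactively.

```python
import math

class TriangleError(ValueError):
    """Raised when the input cannot describe the sides of a triangle."""

def parse_side(text, name):
    """Conditions 1-6: the side cannot be read, or it is negative."""
    try:
        value = float(text)
    except ValueError:
        raise TriangleError(f"cannot read {name}: {text!r}") from None
    if value < 0:
        raise TriangleError(f"{name} < 0")
    return value

def heron_area(a, b, c):
    """Conditions 7-10: a negative factor under the square root."""
    s = (a + b + c) / 2
    for label, term in (("s", s), ("s - a", s - a), ("s - b", s - b), ("s - c", s - c)):
        if term < 0:
            raise TriangleError(f"{label} < 0")
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def area_from_text(a_text, b_text, c_text):
    a = parse_side(a_text, "a")
    b = parse_side(b_text, "b")
    c = parse_side(c_text, "c")
    return heron_area(a, b, c)

print(area_from_text("3", "4", "5"))   # → 6.0
```

Here the low-level failure (a string that is not a number) is detected in `parse_side`, but what to do about it can only be decided by the caller of `area_from_text`, which is the gap between detection and handling that the proof of Theorem 6.20 points to.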
