CSE 230: Winter 2008 Principles of Programming Languages Lecture 15: Recursive Types & Subtyping Ranjit Jhala UC San Diego News? Formalize first-order type systems Simple types (integers and booleans) Function types (simply typed λ-calculus) Structuredtypes (products and sums) Recursive types (lists, trees) Imperative types (pointers and exceptions) Subtyping Recursive Types (e.g. Lists) What is a list? How to describe using known types? A list of elements of type α (i.e. a α list) is: either empty or it is a pair of a α and a α list α list = unit + (α α list) What does this remind you of? Write t for α list H h i i Hmm another recursive equation: t = unit + (α t)
Recursive Types (e.g. Lists) Hmm another recursive equation: t = unit + (α t) Write as t = τ (t) The type variable t occurs in, is bound in τ Introduce recursive type constructor: μt. τ = least fixpoint solution of the equation α list defined as: μt. (unit + α t) Allows unnamed recursive types Manipulating μ Introduce syntactic operations to convert between μt.τ and [μt.τ/t]τ e.g. between αα list and unit + α α list τ ::= t μt.τ e ::= fold μt.τ e unfold μt.τ e Intuition: ii fold μt.τ : takes a τ value, turns it into a μt.τ unfold μt.τ : takes a μt.τ value, turns it into a τ Example with Recursive Types Lists α list = μt. t (unit + α t) nil α = fold α list (injl ()) cons α = λx:α.λl:αλl:α list. fold α list (injr (x, L)) List length function length α = λl:α list. match (unfold α list L) with injl x 0 injr y 1 + length α (snd y) Check that nil α : α list cons α : α α list α list length α : α list int Static Semantics of Recursive Types Syntax directed Γ ` e : μt.τ Γ ` unfold μt.τ e : [μ t.τ/t]τ Γ ` e : [μ t.τ/t]τ Γ ` fold μt.τ e : μt.τ Often, for simplicity, fold/unfold omitted
Dynamics of Recursive Types Add a new form of values v ::= fold μt.τ v fold ensures value has recursive type not its unfolding The evaluation rules: e v fold μt.τ e fold μt.τ v e fold μt.τ v unfold μt.τ e v The folding annotations for type checking only can be dropped after type checking Recursive Types in ML Syntactic trick avoids explicit un/fold: combine recursive and union types! datatype t = C 1 of τ 1 C 2 of τ 2... C n of τ n recursive: t can appear in τ i datatype intlist = Nil of unit Cons of int * intlist Programmer writes: Cons (5, l) Compiler reads: fold intlist (injr (5, l)) Programmer writes: match e with Nil... Cons (h, t)... Compiler reads: match unfold intlist e with injl_... injr(h,t)... Encoding CBV λ-calculus in F μ 1 F 1 can t encode non-terminating computations Cannot encode recursion Cannot write the λx.x x (self-application) Recursive types level playing field: Calculus called: F 1 μ typed λ-calculus as expressive as untyped λ-calculus! Convert C B V λ-calculus terms to C B V F 1 μ 1 Untyped programming in F μ 1 e : conversion of the term e to F μ 1 The trick? The type of e is V = μt. t t Conversion rules: Verify that 1. ` e : V x = x λx. e = fold V (λx:v. e) e 1 e 2 = (unfold V e 1 ) e 2 2. e v if and only if e v Non-terminating computation D = (λx:v. (unfold V x) x) (fold V (λx:v. (unfold V x) x)))
Formalize first-order type systems Simple types (integers and booleans) Function types (simply typed λ-calculus) Structuredtypes (products and sums) Recursive types (lists, trees) Imperative types (pointers and exceptions) Subtyping Types for Imperative Features So far, pure functional languages Now we look at types for imperative features Types used to characterize non-local effects references, pointers, assignment exceptions Small step semantics : update heap Reference Types For mutable memory cells Syntax (as in ML) e ::=... ref e: τ e 1 := e 2! e τ ::=... τ ref ref e evaluates e allocates new cell stores the value of e returns reference to new cell like malloc + initialization in C, or new in C++ and Java e 1 := e 2 eval e 1 to a memory cell (reference) updates cell s value with value of e 2 Global Effects with Reference Cells Cell can escape static scope where created (λf: int int ref.!(f 5)) (λx: int. ref x) Cell s value must be visible in whole program Result of expr must include changes made to heap side effects must extend the evaluation model!e evaluates e to a memory cell (reference) returns its contents
Modeling References heap = mapping from addresses to values h ::= h, a v : τ a Addresses Tag heap cells with their types Types only for static semantics, not evaluation not a part of the implementation program = heap + expression p ::= heap h in e Initial program: heap in e Heap addrs act as bound variables in the expression Allows reuse of local l var. properties for heap addresses e.g., we can rename the address and its occurrences Static Semantics of References Typing rules for expressions: Γ ` e : τ Γ ` ref e : τ ref For programs: Γ ` e : τ ref Γ ` e 1 : τ ref Γ ` e 2 : τ Γ `!e : τ Γ ` e : τ Γ ` heap h in e : τ Where: Γ = a 1 :τ 1 ref,...,a n :τ n ref h = a 1 v 1 : τ 1,, a n : v n :τ n Γ ` e 1 :=e 2 : unit Contextual Semantics for References Addresses are values: v ::=... a New contexts H ::= ref H H 1 := e 2 a 1 := H 2!H No new local reduction rules New global reduction rules propagate effects heap h in H[ref v : τ] heap h, a v : τ in H[a] where a is fresh heap h in H[!a] heap h in H[v] where a v : τ h heap h in H[a := v] heap h[a v] in H[()] where h[a v] = h except cell a v 1 :τ replaced by a v:τ Example with References Consider the evaluation (redex is underlined) heap in (λf:int int ref.!(f 5)) (λx:int. ref x : int) heap in!((λx:int. ref x : int) 5) heap in!(ref 5 : int) heap a = 5 : int in!a heap a = 5 : int in 5 The resulting program has a useless memory cell No references to it in program Equivalent to: heap in 5 Simple way to model garbage collection
Formalize first-order type systems Simple types (integers and booleans) Function types (simply typed λ-calculus) Structuredtypes (products and sums) Recursive types (lists, trees) Imperative types (pointers and exceptions) Subtyping Types for Imperative Features So far, pure functional languages Now we look at types for imperative features Types used to characterize non-local effects references, pointers, assignment exceptions Small step semantics : update heap Reference Types For mutable memory cells Syntax (as in ML) e ::=... ref e: τ e 1 := e 2! e τ ::=... τ ref ref e evaluates e allocates new cell stores the value of e returns reference to new cell like malloc + initialization in C, or new in C++ and Java e 1 := e 2 eval e 1 to a memory cell (reference) updates cell s value with value of e 2 Global Effects with Reference Cells Cell can escape static scope where created (λf: int int ref.!(f 5)) (λx: int. ref x) Cell s value must be visible in whole program Result of expr must include changes made to heap side effects must extend the evaluation model (see pdf)!e evaluates e to a memory cell (reference) returns its contents
Static Semantics of References Γ ` e : τ Γ ` ref e : τ ref Γ ` e : τ ref Γ `!e : τ Γ ` e 1 : τ ref Γ ` e 2 : τ Γ ` e 1 :=e 2 : unit Modeling References heap = mapping from addresses to values h ::= h, a v : τ a Addresses Tag heap cells with their types Types only for static semantics, not evaluation not a part of the implementation program = heap + expression p ::= heap h in e Initial program: heap in e Heap addrs act as bound variables in the expression Allows reuse of local l var. properties for heap addresses e.g., we can rename the address and its occurrences Static Semantics of References Typing rules for expressions: Γ ` e : τ Γ ` ref e : τ ref For programs: Γ ` e : τ ref Γ ` e 1 : τ ref Γ ` e 2 : τ Γ `!e : τ Γ ` e : τ Γ ` heap h in e : τ Where: Γ = a 1 :τ 1 ref,...,a n :τ n ref h = a 1 v 1 : τ 1,, a n : v n :τ n Γ ` e 1 :=e 2 : unit
Contextual Semantics for References Addresses are values: v ::=... a New contexts H ::= ref H H 1 := e 2 a 1 := H 2!H No new local reduction rules New global reduction rules propagate effects heap h in H[ref v : τ] heap h, a v : τ in H[a] where a is fresh heap h in H[!a] heap h in H[v] where a v : τ h heap h in H[a := v] heap h[a v] in H[()] where h[a v] = h except cell a v 1 :τ replaced by a v:τ Example with References Consider the evaluation (redex is underlined) heap in (λf:int int ref.!(f 5)) (λx:int. ref x : int) heap in!((λx:int. ref x : int) 5) heap in!(ref 5 : int) heap a = 5 : int in!a heap a = 5 : int in 5 The resulting program has a useless memory cell No references to it in program Equivalent to: heap in 5 Simple way to model garbage collection Formalize first-order type systems Simple types (integers and booleans) Function types (simply typed λ-calculus) Structuredtypes (products and sums) Recursive types (lists, trees) Imperative types (pointers and exceptions) Subtyping Subtyping Types = Sets of Values Subtyping = Subsets of Values τ < σ means τ is a subtype of σ If τ < σ then, any expression of type τ 1. also has type σ 2. can be used in a context that expects a σ Subtyping is reflexive and transitive
Subtyping examples FORTRAN : int < real 5 + 1.5 is well-typed 5 : int, hence 5 : real PASCAL : [1..10] < [0..15] < int Subclasses in object-oriented languages If S is a subclass of C, then inst of S can be used where an inst of C expected Subclassing Subtyping philosophy Can be used = Subsumption Rule of subsumption Formalize the informal subtyping requirement Γ ` e:τ Γ ` e:σ τ < σ If τ < σ then expression of type τ also has type σ Oh look out! What about type safety? If we say that int < (int int) Then we can prove that 5 5 is well typed! Q: How to define subtyping while preserving safety? Defining Subtyping By derivation rules for the judgment τ < σ 1. Start with subtyping on the base types E.g. int < real or nat < int Language-dependent, using types-as-sets 2. Close under reflexivity, transitivity τ 1 <τ 2 τ 2 <τ 3 τ < τ τ 1 < τ 3 3. Build-up subtyping for compound types Subtyping for Pairs τ <σ τ <σ τ τ < σ σ Wherever a σ σ is usable, a τ τ is usable: Consider context H = H [fst ] expecting a σ σ ie i.e. H safely accepts a σ As τ < σ we have H safely accepts a τ If e : τ τ then fst e : τ, thus H safely accepts e : τ τ Case of snd is symmetric
Subtyping for Functions Unsafe! τ<σ τ <σ Suppose Γ = f : int bool τ τ < σ σ Rule gives Γ ` f 5.0 : bool 5.0is an invalid argument of f Γ ` f : int bool int < real Γ ` f : real bool bool < bool int bool < real bool Γ ` f 5.0 : bool Γ ` 5.0 : real Eh? int bool not a special case of real bool Works for fewer inputs, can t use instead of Subtyping Functions is: covariant in the result type, Subtype returns fewer outputs contravariant in the argument type Subtype accepts more inputs Informal correctness argument: f : τ τ f expects an argument of type τ can handle special case arg σ < τ No problems executing callee f with arg f returns a value of type τ < σ Special case of σ, caller has no problem with result σ τ<σ< τ τ <σ τ τ < σ σ caller σ σ τ f τ More on Contravariance real real int real real int int int int < real Consider the subtype relationships Subtyping References Unsafe τ < σ τ ref < σ ref Suppose τ < σ Above rule implies x : σ, y : τ ref, f : τ int ` y := x; f (!y) Unsound: f called with σ but defined only on τ Java has covariant arrays!
Subtyping References Unsafe: τ < σ σ ref < τ ref Example: assume τ < σ Above rule implies x : σ, y : σ ref, f : τ int ` y := x; f (!y) Unsound: f called on σ but defined d only on τ References are invariant no subtyping for references hence, arrays should be invariant hence, mutable records should be invariant The Limitations of F 1 One function for one type id = λx:τ. x : τ τ sort : (τ τ bool) τ list τ list Need different versions for different τ Only different w.r.t. static typing Same operations performed at runtime Alternatives 1. Circumvent the type system (C, Java,...), 2. Juice up the type system! Polymorphism Polymorphic funs: can be applied to many argument types Varieties of polymorphism : depending di on def. of many 1. subtype (or bounded) polymorphism many = all subtypes of a given type 2. ad-hoc polymorphism many = depends on the function choose behavior at runtime (depending on types, e.g. sizeof) 3. parametric predicative polymorphism many = all monomorphic types 4. parametric impredicative polymorphism many = all types