The Type Checker Compilation 2007 The type checker has several tasks: determine the types of all expressions check that values and variables are used correctly resolve certain ambiguities by transformations Some languages have no type checker no compile-time guarantees runtime tagging and checking become necessary Michael I. Schwartzbach BRICS, University of Aarhus 2 Types Describe Possible Values Type Annotations The Joos types are: void(the empty set of values) byte short int char boolean C (objects of class C or any subclass) null(the polymorphic null constant) σ[] (for any type σ) Type annotations: int x; Foo y; specify invariants about the runtime behavior: xwill always contain an integer value ywill always contain null or an object of class Foo or any subclass Type annotations are not very expressive viewed as invariants 3 4 1
Type Correctness Static Type Correctness A program is type correct if the type annotations are valid invariants Type correctness is trivially undecidable: int x = 0; TM(j); x = false; The program is type correct iff the j'th Turing machine does not halt on empty input A program is statically type correct if it satisfies some type rules The rules are chosen to be: simple to understand efficient to decide conservative with respect to type correctness Type systems are not canonical They are designed like the rest of the language 5 6 Slack Examples of Slack Static type systems are necessarily flawed: Trivial example: type correct statically type correct x = 87; if (false) x = true; Useful example: slack Map m = new Hashmap(); m.put("bar","foo"); String x = m.get("bar"); Java 1.5 is designed to pick up some slack There is always slack: programs that are unfairly rejected by the type checker 7 8 2
Specifying Type Rules The Shortcomings of Prose Prose: "The argument of the sqrt function must be of type int, the result is of type double" Logical rules: C - x: int C - sqrt(x): double The first expression must be of type boolean, or a compile-time error occurs. The conditional operator may be used to choose between second and third operands of numeric type, or second and third operands of type boolean, or second and third operands that are each of either reference type or the null type. All other cases result in a compile-time error. Note that it is not permitted for either the second or the third operand expression to be an invocation of a void method. In fact, it is not permitted for a conditional expression to appear in any context where an invocation of a void method could appear. The type of a conditional expression is determined as follows: If the second and third operands have the same type (which may be the null type), then that is the type of the conditional expression. Otherwise, if the second and third operands have numeric type, then there are several cases:if one of the operands is of type byte and the other is of type short, then the type of the conditional expression is short. If one of the operands is of type T where T is byte, short, or char, and the other operand is a constant expression of type int whose value is representable in type T, then the type of the conditional expression is T. Otherwise, binary numeric promotion is applied to the operand types, and the type of the conditional expression is the promoted type of the second and third operands. Note that binary numeric promotion performs value set conversion. If one of the second and third operands is of the null type and the type of the other is a reference type, then the type of the conditional expression is that reference type. If the second and third operands are of different reference types, then it must be possible to convert one of the types to the other type (call this latter type T) by assignment conversion; the type of the conditional expression is T. It is a compile-time error if neither type is assignment compatible with the other type. 9 10 Three Kinds of Type Rules Judging Expressions Declarations: this variable is declared to have this type Propagations: if the argument is of this type, then the result is of that type Restrictions: the argument must be of this type or that type The logical notation handles all three uniformly The judgment: C, L, X, σ - E: τ means that the expression E is statically type correct and has type τ in a context where: C is the current class L is the current local environment X is the current set of exceptions σ is the return type of the current method 11 12 3
Judging Statements Class Environments The judgment: C, L, X, σ - S means that the statement S is statically type correct in a context where: C is the current class L is the current local environment X is the current set of exceptions σ is the return type of the current method 13 The class environment is implicitly available The notation D C means that D is a subclass of C We can query a class C: C - extends D (super class) C - σ f (fields) C - static σ f (static fields) C - (σ 1,..., σ k ) X (constructors) C - σ m (σ 1,..., σ k ) X (methods) C - static σ m (σ 1,..., σ k ) X (static methods) return type name argument types thrown exceptions 14 Assignment Compatibility The relation σ := τ means that values of type τ may be assigned to variables of type σ: σ := σ short := byte σ :- σ int := byte int := short σ[] :- τ[], if σ :- τ int := char C := null java.lang.cloneable :- σ[] σ[] := null Object := σ[] java.io.serializable :- σ[] java.lang.cloneable := σ[] Object :- σ[] java.io.serializable := σ[] C :- D, if D C σ[] := τ[], if σ :- τ C := D, if D C Assignment Compatibility The relation σ := τ means that values of type τ may be assigned to variables of type σ: σ := σ short := byte σ :- σ int := byte int := short σ[] :- τ[], if σ :- τ int := char C := null java.lang.cloneable Turbo-Pascal: :- σ[] σ[] := null Object := σ[] java.io.serializable :- σ[] java.lang.cloneable := σ[] C :- D, if D reflexive C transitive java.io.serializable := σ[] anti-symmetric σ[] := τ[], if σ :- τ symmetric C := D, if D C It is reflexive, transitive, and anti-symmetric It is reflexive, transitive, and anti-symmetric 15 16 4
Statements Control Statements C,L,X,σ - S 1 C,L,X,σ - S 2 C, L, X, σ - S 1 S 2 C,L,X,σ - E: boolean C, L, X, σ - if (E) S C,L,X,σ - S C, L[n τ], X, σ - S C, L, X, σ - { τ n; S } C,L,X,σ - E: boolean C,L,X,σ - S i C, L, X, σ - if (E) S 1 else S 2 C, L, X, σ - E: τ C, L, X, σ - E; C,L,X,σ - E: boolean C, L, X, σ - while (E) S C,L,X,σ - S 17 18 Return Statements Throw Statement σ = void C, L, X, σ - return C,L,X, σ - E: τ throwok({τ},x) C, L, X, σ - throw E C, L, X, σ - E: τ σ := τ C, L, X, σ - return E throwok(a,b) α A: Error:= α RuntimeException:= α β B: β:=α 19 20 5
Locals Fields L(n) = τ C, L, X, σ - n: τ C,L,X,σ - this: C D - static τ f C,L,X,σ - D.f: τ C,L,X,σ - E: D D - τ f C, L, X, σ - E.f: τ 21 22 Assignments Arrays C,L,X,σ - E: τ 2 L(n)=τ 1 τ 1 :=τ 2 C, L, X, σ - n = E: τ 1 C,L,X,σ - E 1 : τ 1 [] C,L,X,σ - E 2 : τ 2 num(τ 2 ) C, L, X, σ - E 1 [E 2 ]: τ 1 C,L,X,σ - E: τ 2 D - static τ 1 f τ 1 :=τ 2 C, L, X, σ - D.f = E: τ 1 C,L,X,σ - E 2 : τ 2 C,L,X,σ - E 1 : D D - τ 1 f τ 1 :=τ 2 C, L, X, σ - E 1.f = E 2 : τ 1 C, L, X, σ - E 1 : τ 1 [] C, L, X, σ - E 2 : τ 2 C, L, X, σ - E 3 : τ 3 num(τ 2 ) τ 1 := τ 3 C, L, X, σ - E 1 [E 2 ]= E 3 : τ 1 num(σ) σ { byte, short, int, char } 23 24 6
Array Operations Operators C,L,X,σ - E: τ[] C,L,X,σ - E.clone(): Object C,L,X,σ - E: τ[] C,L,X,σ - E.length: int C,L,X,σ - E: boolean C,L,X,σ -!E: boolean C,L,X,σ - E 1 : τ 1 C,L,X,σ - E 2 : τ 2 num(τ 1 ) num(τ 2 ) C, L, X, σ - E 1 *E 2 : int C,L,X,σ - E: τ 2 num(τ 2 ) C,L,X,σ - new τ 1 [E]: τ 1 [] 25 26 Literals Plus C,L,X,σ - true: boolean C,L,X,σ - E 1 : τ 1 C,L,X,σ - E 2 : τ 2 num(τ 1 ) num(τ 2 ) C, L, X, σ - E 1 +E 2 : int C,L,X,σ - 42: int C,L,X,σ - "abc": String C,L,X,σ - E 1 : String C,L,X,σ - E 2 : τ 2 τ 2 void C, L, X, σ - E 1 +E 2 : String C,L,X,σ - null: null C,L,X,σ - '@': char C,L,X,σ - E 1 : τ 1 C,L,X,σ - E 2 : String τ 1 void C, L, X, σ - E 1 +E 2 : String 27 28 7
Equality Casts and Instanceof C,L,X,σ - E 1 : τ 1 C,L,X,σ - E 2 : τ 2 (num(τ 1 ) num(τ 2 )) τ 1 :=τ 2 τ 2 :=τ 1 C, L, X, σ - E 1 ==E 2 : boolean C,L,X,σ - E: τ 1 (num(τ 1 ) num(τ 2 )) τ 1 :=τ 2 τ 2 :=τ 1 C, L, X, σ - (τ 2 )E: τ 2 C,L,X,σ - E: τ τ:=d D:=τ C, L, X, σ - E instanceof D: boolean 29 30 Why Restrict Casts? Method Invocation τ 2 τ 1 τ 1 τ 2 Alway succeeds, but downcasting is useful for selecting different methods τ 2 := τ 1 Really useful, the object may or may not by a subclass of τ 2 C,L,X,σ - E: D C,L,X,σ - E i : τ i D - τ m(τ 1,...,τ k ) Y C, L, X, σ - E.m(E 1,...,E k ): τ τ 1 := τ 2 C,L,X,σ - E i : τ i τ 2 τ 1 Always fails, don't bother to execute D - static τ m(τ 1,...,τ k ) Y C, L, X, σ - D.m(E 1,...,E k ): τ τ 1 : τ 2 τ 1 : τ 2 31 32 8
Constructor Invocation Closest Match Method Invocation C,L,X,σ - E i : τ i D - (τ 1,...,τ k ) Y throwok(y,x) C, L, X, σ - new D(E 1,...,E k ): D C,L,X,σ - E i : τ i C - extends D D - (τ 1,...,τ k ) Y throwok(y,x) C, L, X, σ - super(e 1,...,E k ) C,L,X,σ - E: D C,L,X,σ - E i : σ i D - τ m(τ 1,...,τ k ) Y τ i := σ i D - γ m(γ 1,..., γ k ) Z : ( i : γ i := σ i ) ( i : γ i := τ i ) C, L, X, σ - E.m(E 1,...,E k ): τ This rule does not describe access modifiers 33 34 Checking Exceptions Judging Methods and Constructors C, L, X, σ - E: D C, L, X, σ - E.m(E 1,...,E k ): τ C, [a i σ i ], { X i }, σ - S Throwable := X i C - σ m(σ 1 a 1,...,σ k a k )throws X 1,...,X n { S } throwok( {Y i D - τ m(τ 1,...,τ k ) Y i }, X) C, [a i σ i ], { X i }, void - S Throwable := X i Y Z = Y Z Z Y C - C(σ 1 a 1,...,σ k a k )throws X 1,...,X n { S } Y Z = { y Y throwok({y},z) } 35 36 9
Different Kinds of Type Rules Type Proofs Axioms: C,L,X,σ - this: C A type proof is a tree in which: nodes are inferences leaves are axioms or true predicates Predicates: τ 1 :=τ 2 τ 1 :=τ 2 Inferences: C,L,X,σ - E: D D - τ f A judgment is valid it is the root of some type proof C, L, X, σ - E.f: τ 37 38 A Type Proof Static Type Correctness [x A,y B](x)=A C,[x A,y B],X,σ - x:a A B B A [x A,y B](y)=B C,[x A,y B],X,σ - (B)x: B B:=B C,[x A,y B],X, σ - y=(b)x: B C,[x A,y B],X, σ - y=(b)x; C,[x A],X, σ - {B y; y=(b)x;} C,[],X,σ - {A x; {B y; y=(b)x;}} (Assuming that B is a subclass of A) A program is statically type correct all method and constructor judgments are valid A type system is sound static type correctness implies type correctness Equal to a trace of a successful run of the type checker 39 40 10
Java's Type System is Unsound Greed The following is a valid judgement: Java's type system is too greedy: C, [b B,x A[],y B[],c C], X, σ - x=y; x[0]=c; b=y[0]; where B A, C A, and B and C are incomparable Afterward, b contains an object of class C But its declared type only allows B objects type correct statically type correct unsound Runtime type checks catch the unsound cases 41 42 Fixed Assignment Compatibility Complexity of Inference Systems By design, the rule for assignment of arrays has been chosen to be convenient but unsound It should have looked like: σ := σ short := byte σ :- σ int := byte int := short σ[] :- τ[], if σ :- τ int := char C := null java.lang.cloneable :- σ[] σ[] := null Object := σ[] java.io.serializable :- σ[] java.lang.cloneable := σ[] C :- D, if D C java.io.serializable := σ[] σ[] := τ[], if σ :- τ C := D, if D C Deciding valid judgments of an inference system depends on its rules and axioms and may be: undecidable (hyper)exponential time (backtracking) linear time The Java type system is designed to be simple A type checker performs a single traversal of the parse tree in linear time But that would then introduce annoying slack... 43 44 11
Type Checking in Java Implicit Coercions Type checking is performed bottom-up The rule to apply is uniquely determined by the types of subexpressions and the syntax The type proof also specifies how to transform the program with respect to: address comparisons implicit string conversions concat(...) code Suppose we add a coercion type rule: C,L,X,σ - E: int C,L,X,σ - E: float corresponding to the JVM bytecode i2f Type proofs are now no longer unique! 45 46 Ambiguous Type Proofs Semantic Coherence L(i)=int L(j)=int C,L,X,σ -i: int C,L,X,σ - j: int C,L,X,σ -i: float C,L,X,σ - j: float L(f)=float C,L,X,σ - i+j: float C,L,X,σ - f: float C,L,X,σ - i+j==f: boolean L(i)=int L(j)=int C,L,X,σ -i: int C,L,X,σ - j: int C,L,X,σ -i+j: int L(f)=float C,L,X,σ - i+j: float C,L,X,σ - f: float C,L,X,σ - i+j==f: boolean Different type proofs generate different transformed programs The type system is semantically coherent if they always have the same semantics The previous hypothetical type system is not: i = 2147400000 j = 83647 f = 2.14748365E9 Java chooses the proof with coercions as near the root as possible 47 48 12