Polymorphic Bytecode: Compositional Compilation for Java-like Languages


Davide Ancona, DISI - Università di Genova, davide@disi.unige.it
Sophia Drossopoulou, Dep. of Computing - Imperial College, sd@doc.ic.ac.uk
Ferruccio Damiani, Dip. di Informatica - Università di Torino, damiani@di.unito.it
Elena Zucca, DISI - Università di Genova, zucca@disi.unige.it

Abstract

We define compositional compilation to be the capability to typecheck source code fragments in isolation, generate corresponding binaries, and link together fragments whose mutual assumptions are satisfied, without reinspecting the code. Even though compositional compilation is a highly desirable feature, in Java-like languages it can hardly be achieved. This is because the bytecode generated for a fragment (say, a class) is not uniquely determined by its source code, but depends on the compilation context. In this paper, we propose a way to obtain compositional compilation for Java, by introducing a polymorphic form of bytecode containing type variables (ranging over class names) and equipped with a set of constraints involving type variables. Thus, polymorphic bytecode provides a representation for all the (standard Java) bytecode that can be obtained by replacing type variables with class names satisfying the associated constraints. We illustrate our proposal by developing a type inference and a linking algorithm for a small subset of Java. The type inference algorithm typechecks a class in isolation, generating the corresponding polymorphic bytecode fragment with the needed constraints on the classes it depends on. The linking algorithm takes a collection of polymorphic bytecode fragments, checks their mutual consistency, and possibly simplifies and specializes them.
In particular, when a self-contained collection of fragments is taken, linking either fails, or produces standard Java bytecode (which is the same as the one which would have been produced by standard Java compilation of all fragments together).

Categories and Subject Descriptors: D.3.3 [Programming languages]: Language constructs and features - classes and objects; D.3.1 [Programming languages]: Formal definitions and theory - syntax, semantics; D.3.4 [Programming languages]: Processors - incremental compilers

General Terms: languages, theory, design

Keywords: type systems, selective recompilation, Java-like languages

1 Introduction

Compilers have two main tasks: to check that the source code adheres to the language rules (which usually means that it type checks), and to produce target code. Originally, compilers would process the complete source of an application; thus they would apply global compilation. In strongly typed languages execution of a globally compiled application is guaranteed to be type safe.

In the 70s, Modula-2 and Ada introduced separate compilation: an application would consist of fragments (e.g., modules, packages, or classes). A fragment would be compiled separately in the context of other, used fragments, and the produced target code fragment would reflect these used fragments, that is, the compilation environment in which it was created. An application would be put together through safe linking of target fragments; linking of a target fragment was legal only with target fragments which corresponded to the compilation environment in which the former was created. Thus, linking preserved a correspondence between the compilation and the execution environment, and the ensuing application would correspond to a globally compiled one, and have the same type safety guarantees.

In recent years, Java and C# have adopted the separate compilation approach, combined, however, with dynamic linking, whereby fragments (in this case, classes in binary form) are loaded lazily at run-time. Thus, dynamic linking does not attempt to preserve a correspondence between the compilation and the execution environment, nor does the ensuing application correspond to a globally compiled one; type safety can only be achieved through runtime verification checks.

Thus, we argue that there is a clash of philosophy in Java (and in C# too) between compilation and execution. Namely, the adoption of separate compilation means that the target fragments reflect the compilation environment in which they were created, while the adoption of dynamic linking means that there is no correspondence between compilation and execution environment.

For example, compilation of the source method declaration $md^s$:

    E m(B x){ return x.f1.f2; }

in an environment $\Delta_1$ containing class B with field f1 of type C, and class C with field f2 of type E, generates bytecode $md^b_1$ with annotations which reflect the classes where fields were found and their types, that is:

    E m(B x){ return x[B.f1 C][C.f2 E]; }

On the other hand, compilation of $md^s$ in an environment $\Delta_2$ containing a class B with a field f1 of type D, and a class D with a field f2 of type F, for some F subclass of E, generates a different bytecode $md^b_2$:

    E m(B x){ return x[B.f1 D][D.f2 F]; }

More importantly, execution of $md^b_1$ in the environment $\Delta_2$ will throw a FieldNotFoundException, even though compilation and subsequent execution of $md^s$ in the environment $\Delta_2$ would be successful. Thus, given the lazy nature of Java dynamic linking, separate compilation is, in some sense, too eager, and too context dependent.
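The context dependence described above can be made concrete with a small sketch. It is not the compiler discussed in the paper: the class name `SeparateCompilationDemo`, the helpers `fields`, `delta1`, `delta2` and `annotate`, and the representation of an environment as a map from class names to field tables are all assumptions made for illustration, and inherited fields are ignored.

```java
import java.util.HashMap;
import java.util.Map;

final class SeparateCompilationDemo {

    /** Builds a field table from alternating (field name, field type) pairs. */
    static Map<String, String> fields(String... nameTypePairs) {
        Map<String, String> table = new HashMap<>();
        for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
            table.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        return table;
    }

    /** Environment 1 of the example: class B { C f1; } and class C { E f2; }. */
    static Map<String, Map<String, String>> delta1() {
        Map<String, Map<String, String>> delta = new HashMap<>();
        delta.put("B", fields("f1", "C"));
        delta.put("C", fields("f2", "E"));
        return delta;
    }

    /** Environment 2 of the example: class B { D f1; } and class D { F f2; }. */
    static Map<String, Map<String, String>> delta2() {
        Map<String, Map<String, String>> delta = new HashMap<>();
        delta.put("B", fields("f1", "D"));
        delta.put("D", fields("f2", "F"));
        return delta;
    }

    /**
     * Annotates the access chain var.f1.f2... against one fixed environment,
     * mimicking the annotations a separate compiler bakes into the bytecode.
     * Hypothetical simplification: only declared (not inherited) fields are found.
     */
    static String annotate(Map<String, Map<String, String>> delta,
                           String receiverType, String var, String... chain) {
        StringBuilder out = new StringBuilder(var);
        String current = receiverType;
        for (String field : chain) {
            Map<String, String> declared = delta.get(current);
            if (declared == null || !declared.containsKey(field)) {
                throw new IllegalStateException("no field " + field + " in class " + current);
            }
            String fieldType = declared.get(field);
            out.append('[').append(current).append('.').append(field)
               .append(' ').append(fieldType).append(']');
            current = fieldType; // the next access is resolved in the field's type
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(annotate(delta1(), "B", "x", "f1", "f2")); // x[B.f1 C][C.f2 E]
        System.out.println(annotate(delta2(), "B", "x", "f1", "f2")); // x[B.f1 D][D.f2 F]
    }
}
```

The same source expression yields two different annotated results, which is exactly why bytecode compiled against the first environment cannot be reused unchanged in the second.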
In this paper, we consider, instead, compositional compilation, whereby target fragments do not reflect the compilation environment in which they were created, while linking produces an application which corresponds to a globally compiled one, and which, therefore, has all its type safety guarantees. We define compositional compilation to be the capability to typecheck source code fragments in isolation, to generate corresponding binaries, and to link together fragments whose mutual assumptions are satisfied, without reinspecting the code.

We illustrate a new approach to compilation and linking for Java-like languages, which supports compositional compilation. We propose a polymorphic form of bytecode containing type variables (ranging over class names) and equipped with a set of constraints involving type variables. Thus, polymorphic bytecode provides a representation for all the (standard Java) bytecode that can be obtained by replacing type variables with class names satisfying the associated constraints.

In terms of our example, $md^s$ can be compiled in isolation; the set of polymorphic constraints associated with $md^s$ is $\{\, \phi(\mathtt{B}, \mathtt{f1}, \alpha),\ \phi(\alpha, \mathtt{f2}, \beta),\ \beta \leq \mathtt{E} \,\}$, where $\alpha$, $\beta$ are type variables, and $\phi(t, \mathtt{f2}, t')$ expresses that type $t$ is expected to declare or inherit a field f2 of type $t'$. Furthermore, the following polymorphic bytecode $md^b$ would be generated:

    E m(B x){ return x[B.f1 α][α.f2 β]; }

In this approach, we can assign to each typable fragment a typing containing a fragment of polymorphic bytecode and a set of type constraints. For instance, in the typing for the fragment containing $md^s$ the set of type constraints contains $\phi(\mathtt{B}, \mathtt{f1}, \alpha)$, $\phi(\alpha, \mathtt{f2}, \beta)$, $\beta \leq \mathtt{E}$, and the polymorphic bytecode contains $md^b$.

Our approach also supports linking, which checks whether the polymorphic bytecode fragments satisfy each other's requirements, without inspecting the code itself. The process involves the replacement of some type variables by concrete classes.
In our example, linking $md^b$ in the environment $\Delta_1$ leads to $md^b_1$, while linking $md^b$ in the environment $\Delta_2$ leads to $md^b_2$.

The rest of the paper is organized as follows. In Section 2 we define a schema formalizing the notions of both global and compositional compilation, the notions of soundness and completeness of compositional compilation w.r.t. global compilation, and give sufficient conditions for guaranteeing them. These conditions place most requirements on the linking process, and on the relation between global and compositional compilation of one class. In Sections 3 and 4 we instantiate the schema to model global compilation and compositional compilation for a small Java-like language. In Section 5 we give an algorithm for linking and outline why this algorithm satisfies the properties required by the theorems in Section 2. We conclude by discussing some further work. A preliminary version of the material presented in this paper appeared in [1].

2 Formalizing global and compositional compilation

In this section we define a schema formalizing both global and compositional compilation, by listing the basic syntactic categories and judgments such a type system should define. In the presentation, to be more

concrete, we use a Java-oriented terminology, since a significant class of languages on which the schema could be instantiated are Java-like languages (in particular, in the next sections we present an instance which defines global and compositional compilation for Featherweight Java [7]). However, the schema is much more general, and appropriate for any language where, roughly speaking, executable code which is generated by compilation is context-dependent. Hence, "class declaration" below can be thought of, in general terms, as "declaration of a language entity", and so on.

- Source class declarations ($s$), binary class declarations ($b$).
- Source fragments ($S$), which are sequences of source class declarations; and binary fragments ($B$), which are sequences of binary class declarations.
- Class type environments ($\Delta$), which are sequences of class type assignments ($\delta$). A class type assignment can be thought of as the type information which can be extracted from a class declaration (hence the metavariables $\Delta$ and $\delta$); thus a class type environment corresponds to a sequence of source class declarations deprived of bodies.
- Global compilation (of a class), $\Delta \vdash_g s : \delta \leadsto b$, to be read: the source class declaration $s$ has type $\delta$ and compiles to $b$ in the class type environment $\Delta$.
- Type constraint environments $\Gamma$, which are sequences of type constraints ($\gamma$).
- Compositional compilation (of a class), $\vdash_c s : \delta \mid \Gamma \leadsto b$, to be read: the source class declaration $s$ has type $\delta$ and compiles to $b$ under the type constraints in $\Gamma$.
- Linking, $\Delta \vdash \Gamma \mid B \leadsto \Gamma' \mid B'$, to be read: in the class type environment $\Delta$ the type constraints $\Gamma$ are simplified into $\Gamma'$, and the binary fragment $B$ becomes $B'$.

Empty class type environments and empty type constraint environments will be denoted by $\Lambda$. The ingredients above model two different approaches to compilation.
In the first approach, code fragments are compiled in the context of full type information on other fragments (formalized by the class type environment $\Delta$), as shown by the rules in Fig. 1. The global compilation (of a fragment) judgment $\Delta \vdash_G S : \Delta' \leadsto B$ means that in $\Delta$ the fragment $S$ has type $\Delta'$ and compiles to $B$. Here $\Delta$ represents (full) type information related to classes which are not being compiled, such as libraries. In particular, $\Lambda \vdash_G S : \Delta \leadsto B$ models the compilation of a self-contained fragment.

\[
\begin{array}{l}
\text{(g-frag1)}\quad \dfrac{\Delta\,\delta \vdash_g s : \delta \leadsto b}{\Delta \vdash_G s : \delta \leadsto b}
\\[3ex]
\text{(g-frag2)}\quad \dfrac{\Delta\,\Delta_1 \ldots \Delta_n \vdash_G S_i : \Delta_i \leadsto B_i \quad \forall i \in 1..n}{\Delta \vdash_G S_1 \ldots S_n : \Delta_1 \ldots \Delta_n \leadsto B_1 \ldots B_n}
\end{array}
\]

Figure 1: Global compilation of a fragment

In the second approach, code fragments can be compiled in isolation, producing binary code equipped with type constraints (both the binary code and the type constraints might contain type variables denoting the names of yet unknown classes). Then, it is possible to link together a successfully compiled collection of fragments, obtaining, if their mutual requirements are compatible, a new binary fragment with simplified type constraints. This is shown by the rules in Fig. 2. The compositional compilation (of a fragment) judgment $\vdash_C S : \Delta \mid \Gamma \leadsto B$ means that the source fragment $S$ has type $\Delta$ and compiles to $B$ under the type constraints in $\Gamma$. Notice that the linking check does not depend on the source code.

\[
\begin{array}{l}
\text{(c-frag1)}\quad \dfrac{\vdash_c s : \delta \mid \Gamma \leadsto b \qquad \delta \vdash \Gamma \mid b \leadsto \Gamma' \mid b'}{\vdash_C s : \delta \mid \Gamma' \leadsto b'}
\\[3ex]
\text{(c-frag2)}\quad \dfrac{\vdash_C S_i : \Delta_i \mid \Gamma_i \leadsto B_i \quad \forall i \in 1..n \qquad \Delta_1 \ldots \Delta_n \vdash \Gamma' \mid B' \leadsto \Gamma \mid B}{\vdash_C S_1 \ldots S_n : \Delta_1 \ldots \Delta_n \mid \Gamma \leadsto B}
\quad (\Gamma', B') = \bigoplus_{i \in 1..n} (\Gamma_i, B_i)
\end{array}
\]

Figure 2: Compositional compilation of a fragment

In rule (c-frag2), we assume an operator $\oplus$ which, given a (non-empty) sequence of pairs consisting of a type constraint environment and a binary fragment, gives a new pair, intuitively obtained by combining them while avoiding interferences (typically, in case type constraints and binary fragments contain type variables, this operator will avoid clashes by α-renaming). We assume that $\oplus(\Gamma, B) = (\Gamma, B)$.
Iterating this process, we will eventually obtain a fragment for which no type constraints are left, that is, a judgment $\vdash_C S : \Delta \mid \Lambda \leadsto B$. This means that we have obtained a self-contained fragment. In this case, we expect to have obtained the same result as global compilation, that is, compositional compilation is sound w.r.t. global compilation, and conversely, that is, compositional compilation is complete w.r.t. global

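compilation. As a first approximation, soundness and completeness could be expressed as follows: $\vdash_C S : \Delta \mid \Lambda \leadsto B$ if and only if $\Lambda \vdash_G S : \Delta \leadsto B$. However, since compositional compilation is obviously expected to be incremental, the claim above should be generalized in order to deal with open source fragments, that is, fragments where some needed class is missing.

Definition 1 We say that compositional compilation is sound w.r.t. global compilation iff $\vdash_C S : \Delta \mid \Gamma \leadsto B$ and $\Delta'\,\Delta \vdash \Gamma \mid B \leadsto \Lambda \mid B'$ imply $\Delta' \vdash_G S : \Delta \leadsto B'$.

Theorem 2 (Sufficient conditions for soundness) Compositional compilation is sound w.r.t. global compilation if the following conditions hold:

1. $\vdash_c s : \delta \mid \Gamma \leadsto b$ and $\Delta\,\delta \vdash \Gamma \mid b \leadsto \Lambda \mid b'$ imply $\Delta\,\delta \vdash_g s : \delta \leadsto b'$.
2. $\Delta_2 \vdash \Gamma \mid B \leadsto \Gamma' \mid B'$ and $\Delta_1\,\Delta_2 \vdash \Gamma' \mid B' \leadsto \Lambda \mid B''$ imply $\Delta_1\,\Delta_2 \vdash \Gamma \mid B \leadsto \Lambda \mid B''$.
3. $\Delta \vdash \Gamma \mid B \leadsto \Lambda \mid B'$, with $(\Gamma, B) = \bigoplus_{i \in 1..n} (\Gamma_i, B_i)$, implies $\Delta \vdash \Gamma_i \mid B_i \leadsto \Lambda \mid B'_i$, for some $B'_i$, for all $i \in 1..n$, and $B' = B'_1 \ldots B'_n$.
4. $\Lambda \vdash \Lambda \mid B \leadsto \Lambda \mid B$ for all $B$.

Proof: We prove the claim by induction on the derivation of the compositional compilation judgment. Note that if we take $\Delta' = \Lambda = \Gamma$, applying (4), we get soundness for closed fragments: $\vdash_C S : \Delta \mid \Lambda \leadsto B$ implies $\Lambda \vdash_G S : \Delta \leadsto B$. There are two cases.

- Assume $\vdash_C s : \delta \mid \Gamma' \leadsto b'$ holds by rule (c-frag1). Then, premises (P1) $\vdash_c s : \delta \mid \Gamma \leadsto b$ and (P2) $\delta \vdash \Gamma \mid b \leadsto \Gamma' \mid b'$ hold. Moreover, by hypothesis, (HP) $\Delta\,\delta \vdash \Gamma' \mid b' \leadsto \Lambda \mid b''$ holds. Then, from (HP) and (P2), applying (2), we get $\Delta\,\delta \vdash \Gamma \mid b \leadsto \Lambda \mid b''$. From that, and from (P1), applying (1), we get $\Delta\,\delta \vdash_g s : \delta \leadsto b''$; hence we can conclude by applying rule (g-frag1).
- Assume $\vdash_C S_1 \ldots S_n : \Delta_1 \ldots \Delta_n \mid \Gamma \leadsto B$ holds by rule (c-frag2). Then, premises (P1) $\vdash_C S_i : \Delta_i \mid \Gamma_i \leadsto B_i$, for all $i \in 1..n$, and (P2) $\Delta_1 \ldots \Delta_n \vdash \Gamma' \mid B' \leadsto \Gamma \mid B$, with $(\Gamma', B') = \bigoplus_{i \in 1..n} (\Gamma_i, B_i)$, hold. Moreover, by hypothesis, (HP) $\Delta\,\Delta_1 \ldots \Delta_n \vdash \Gamma \mid B \leadsto \Lambda \mid B''$ holds. From (HP) and (P2), applying (2), we get $\Delta\,\Delta_1 \ldots \Delta_n \vdash \Gamma' \mid B' \leadsto \Lambda \mid B''$. Then, by (3), we get $\Delta\,\Delta_1 \ldots \Delta_n \vdash \Gamma_i \mid B_i \leadsto \Lambda \mid B''_i$, for some $B''_i$, for all $i \in 1..n$, with $B'' = B''_1 \ldots B''_n$, and from (P1), by inductive hypothesis, $\Delta\,\Delta_1 \ldots \Delta_n \vdash_G S_i : \Delta_i \leadsto B''_i$; hence we can conclude by applying rule (g-frag2).

Definition 3 We say that compositional compilation is complete w.r.t. global compilation iff $\Delta' \vdash_G S : \Delta \leadsto B'$ implies $\exists B, \Gamma$ s.t.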
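$\vdash_C S : \Delta \mid \Gamma \leadsto B$ and $\Delta'\,\Delta \vdash \Gamma \mid B \leadsto \Lambda \mid B'$.

Theorem 4 (Sufficient conditions for completeness) Compositional compilation is complete w.r.t. global compilation if the following conditions hold:

1. $\Delta\,\delta \vdash_g s : \delta \leadsto b$ implies $\exists b', \Gamma$ s.t. $\vdash_c s : \delta \mid \Gamma \leadsto b'$ and $\Delta\,\delta \vdash \Gamma \mid b' \leadsto \Lambda \mid b$.
2. $\Delta_1\,\Delta_2 \vdash \Gamma \mid B \leadsto \Lambda \mid B'$ implies $\Delta_2 \vdash \Gamma \mid B \leadsto \Gamma' \mid B''$, for some $\Gamma'$, $B''$, and $\Delta_1\,\Delta_2 \vdash \Gamma' \mid B'' \leadsto \Lambda \mid B'$.
3. $\Delta \vdash \Gamma_i \mid B_i \leadsto \Lambda \mid B'_i$, for all $i \in 1..n$, and $(\Gamma, B) = \bigoplus_{i \in 1..n} (\Gamma_i, B_i)$ imply $\Delta \vdash \Gamma \mid B \leadsto \Lambda \mid B'_1 \ldots B'_n$.

Proof: We prove the claim by induction on the derivation of the global compilation judgment. Note that for $\Delta' = \Lambda$, if there exist $B$, $\Gamma$ s.t. $\vdash_C S : \Delta \mid \Gamma \leadsto B$ and $\Delta \vdash \Gamma \mid B \leadsto \Lambda \mid B'$, we get $\vdash_C S : \Delta \mid \Lambda \leadsto B'$ by applying rule (c-frag2); hence we obtain completeness for closed fragments: $\Lambda \vdash_G S : \Delta \leadsto B$ implies $\vdash_C S : \Delta \mid \Lambda \leadsto B$. There are two cases.

- Assume $\Delta \vdash_G s : \delta \leadsto b$ holds by rule (g-frag1). Then, premise (P) $\Delta\,\delta \vdash_g s : \delta \leadsto b$ holds. Hence by (1) there exist $b'$, $\Gamma$ s.t. $\vdash_c s : \delta \mid \Gamma \leadsto b'$ and $\Delta\,\delta \vdash \Gamma \mid b' \leadsto \Lambda \mid b$. By (2) we have that $\delta \vdash \Gamma \mid b' \leadsto \Gamma' \mid b''$, for some $\Gamma'$, $b''$, and $\Delta\,\delta \vdash \Gamma' \mid b'' \leadsto \Lambda \mid b$. Hence we can apply rule (c-frag1) and get $\vdash_C s : \delta \mid \Gamma' \leadsto b''$, and the thesis follows.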

- Assume $\Delta \vdash_G S_1 \ldots S_n : \Delta_1 \ldots \Delta_n \leadsto B_1 \ldots B_n$ holds by rule (g-frag2). Then, premises (P) $\Delta\,\Delta_1 \ldots \Delta_n \vdash_G S_i : \Delta_i \leadsto B_i$ hold, for all $i \in 1..n$. By inductive hypothesis, for all $i \in 1..n$ there exist $\Gamma_i$, $B'_i$ s.t. (IndHP1) $\vdash_C S_i : \Delta_i \mid \Gamma_i \leadsto B'_i$ and (IndHP2) $\Delta\,\Delta_1 \ldots \Delta_n \vdash \Gamma_i \mid B'_i \leadsto \Lambda \mid B_i$ hold. From (IndHP2), we get, by (3), $\Delta\,\Delta_1 \ldots \Delta_n \vdash \Gamma' \mid B' \leadsto \Lambda \mid B_1 \ldots B_n$, for $(\Gamma', B') = \bigoplus_{i \in 1..n} (\Gamma_i, B'_i)$. Then, by (2), there exist $\Gamma$, $B$ s.t. (*) $\Delta_1 \ldots \Delta_n \vdash \Gamma' \mid B' \leadsto \Gamma \mid B$ and (**) $\Delta\,\Delta_1 \ldots \Delta_n \vdash \Gamma \mid B \leadsto \Lambda \mid B_1 \ldots B_n$. Hence, we can apply rule (c-frag2) with premises (IndHP1), for all $i \in 1..n$, and (*), and get $\vdash_C S_1 \ldots S_n : \Delta_1 \ldots \Delta_n \mid \Gamma \leadsto B$. Finally, from (**) we get the thesis.

3 Global compilation of Featherweight Java

In this section, we formalize global compilation of a small Java-like language. This models both standard type checking for Java-like languages (see, e.g., [7, 6]), as well as bytecode generation (as already in, e.g., [3]). The syntax of the (source) language is defined in Fig. 3.

\[
\begin{array}{rcl}
S & ::= & s_1 \ldots s_n \\
s & ::= & \mathtt{class}\ c\ \mathtt{extends}\ c'\ \{\ fds\ mds^s\ \} \\
fds & ::= & fd_1 \ldots fd_n \\
fd & ::= & c\ f; \\
mds^s & ::= & md^s_1 \ldots md^s_n \\
md^s & ::= & mh\ \{ \mathtt{return}\ e^s; \} \\
mh & ::= & c_0\ m(c_1\ x_1 \ldots c_n\ x_n) \\
e^s & ::= & x \mid e^s.f \mid e^s_0.m(e^s_1 \ldots e^s_n) \mid \mathtt{new}\ c(e^s_1 \ldots e^s_n) \mid (c)e^s
\end{array}
\]

Figure 3: Source language

It is basically Featherweight Java [7] (FJ in the sequel), hence a functional subset of Java with no primitive types, except for the minor difference that here class constructors are implicitly declared. Every class can contain instance field and method declarations and has only one constructor whose parameters correspond to all class fields (both inherited and declared) in the order of declaration. Method overloading and field hiding are not supported. Expressions are variables, field access, method invocation, instance creation and casting; the keyword this is considered a special variable. Finally, in order to simplify the presentation, we assume field names in $fds$, method names in $mds^s$, and parameter names in $mh$ to be distinct. In Fig.
4 we give the syntax of bytecode generated by global compilation, that is, (an abstraction of) standard Java bytecode.

\[
\begin{array}{rcl}
B & ::= & b_1 \ldots b_n \\
b & ::= & \mathtt{class}\ c\ \mathtt{extends}\ c'\ \{\ fds\ mds^b\ \} \\
mds^b & ::= & md^b_1 \ldots md^b_n \\
md^b & ::= & mh\ \{ \mathtt{return}\ e^b; \} \\
e^b & ::= & x \mid e^b[c.f\ c'] \mid e^b_0[c.m(\bar{c})c'](e^b_1 \ldots e^b_n) \mid \mathtt{new}\ [c\ \bar{c}](e^b_1 \ldots e^b_n) \mid (c)e^b \\
\bar{c} & ::= & c_1 \ldots c_n
\end{array}
\]
where $fds$ and $mh$ are defined in Figure 3

Figure 4: (Abstract) standard bytecode

Our notion of bytecode is abstract, since the only differences between source code and bytecode of interest here are the annotations needed by the JVM verifier. Recall that in Java bytecode a field access is annotated with the static type of the receiver and the type of the field, a method invocation is annotated with the static type of the receiver, the type of the parameters and the return type, and an instance creation with the type of the parameters. In Fig. 5 we define class type assignments.

\[
\begin{array}{rcl}
\delta & ::= & (c, c', fss, mss) \\
fss & ::= & \{ fs_1, \ldots, fs_n \} \\
fs & ::= & c\ f \\
mss & ::= & \{ ms_1, \ldots, ms_n \} \\
ms & ::= & c\ m(\bar{c})
\end{array}
\]

Figure 5: Class type assignments

A class type assignment collects the type information needed for compiling other classes which can be extracted from a class declaration; it is a 4-tuple consisting of the name of the class, the name of the parent class, the set of field signatures (types and names of declared fields) and the set of method signatures (return type, name and parameter types of declared methods). We assume the existence of a function $type$ extracting type information from a source class declaration. So we will write $type(s)$ to denote the class type assignment extracted from the source class declaration $s$.

Similarly, we will write $type(fds)$ and $type(mds^s)$ to denote the set of field signatures and the set of method signatures extracted from the field declarations $fds$ and from the method declarations $mds^s$, respectively. The straightforward definition of $type$ has been omitted.

The typing rules defining global compilation of a class are given in Fig. 6. They are standard rules analogous to those given in other type systems for Java-like languages [7, 6, 3]. We use the following auxiliary judgments:

- $\Delta; c \vdash_g mds^s \leadsto mds^b$, meaning that method declaration(s) $mds^s$ in class type environment $\Delta$ and current class $c$ (needed for assigning the right type to this) compile to $mds^b$.
- $\Delta; \Pi \vdash_g e^s : c \leadsto e^b$, meaning that expression $e^s$ in class type environment $\Delta$ and local type environment $\Pi$ (which maps this and method parameters to class names) has type $c$ and compiles to $e^b$.

In rule (g-class), for compiling a class $c$ in a class type environment $\Delta$ we check that $\Delta$ is well-formed (judgment $\vdash_g \Delta$), and compile each method body in $\Delta$ and current class $c$. A class type environment is well-formed if there are no multiple type assignments for the same class name, the inheritance relation is acyclic, each extended class is available, there is no field hiding, the Java rule on overriding is respected and there is no overloading (the last two conditions correspond to the requirement that a class may not declare a method with the same name as an inherited method but different return or parameter types). Note, however, that a class which is only used as a type need not have an assignment in $\Delta$. The judgment $\vdash_g \Delta$ is formally defined in Figure 16 in the Appendix.

In the rules for compiling expressions, we use an auxiliary judgment of the form $\Delta \vdash \gamma$, meaning that in the class type environment $\Delta$ the type constraint $\gamma$ holds. Type constraints are listed in Fig. 7. They have the following informal meaning:

- $c \leq c'$ means $c$ must be a subtype of $c'$.
- $\phi(c, f, c')$ means $c$ provides field $f$ with type $c'$.
- $\mu(c, m, \bar{c}, (c', \bar{c}'))$ means $c$ provides method $m$ applicable to arguments of type $\bar{c}$, with parameters of type $\bar{c}'$ and return type $c'$.
- $\kappa(c, \bar{c}, \bar{c}')$ means $c$ provides a constructor applicable to arguments of type $\bar{c}$, with parameters of type $\bar{c}'$.

The rules defining the judgment $\Delta \vdash \gamma$ (in Fig. 8) are intuitive and almost self-explanatory. In rule (φ-2), the side condition $f \notin fss$ means that $f$ is not declared in $fss$; analogously, in rule (µ-2), $m \notin mss$ means that $m$ is not declared in $mss$. The type constraints in Fig. 7 and the rules in Fig. 8 are essentially a subset of those defined in [4] (where type constraints are called local type assumptions).

The rules for the compilation of a class in Fig. 6, together with the general rules for the global compilation of a fragment, given in Fig. 1, provide an instantiation to Featherweight Java of the global compilation schema introduced in Section 2.

4 Compositional compilation of Featherweight Java

In this section we formalize compositional compilation for the small Java-like language introduced in Section 3. Classes are compiled in isolation into polymorphic bytecode, that is, bytecode where the annotations may contain type variables denoting names of yet unknown classes. The syntax of polymorphic bytecode is described by the first four productions in Fig. 4 (defining binary fragments, class declarations, method sequences, and methods, respectively) and by the productions in Fig. 9 (defining binary expressions).

\[
\begin{array}{rcl}
e^b & ::= & x \mid e^b[t.f\ t'] \mid e^b_0[t.m(\bar{t})t'](e^b_1 \ldots e^b_n) \mid \mathtt{new}\ [c\ \bar{t}](e^b_1 \ldots e^b_n) \mid (c)e^b \mid \langle\!\langle c, \alpha \rangle\!\rangle e^b \\
t & ::= & c \mid \alpha \\
\bar{t} & ::= & t_1 \ldots t_n
\end{array}
\]

Figure 9: Polymorphic bytecode (expressions)

Besides the presence of type variables, the only difference between polymorphic and standard bytecode is the presence of the polymorphic casting annotated expression, $\langle\!\langle c, \alpha \rangle\!\rangle e^b$, which can be specialized either into $e^b$, if the type variable $\alpha$ is substituted with $c'$ s.t. $c' \leq c$ (casting-up), or into $(c)e^b$, if $\alpha$ is substituted with $c'$ s.t. $c \leq c'$ (casting-down).
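The two ways of specializing a polymorphic cast can be sketched as follows. This is only an illustration under stated assumptions: the class `PolymorphicCastDemo`, the methods `isSubtype` and `specialize`, the textual rendering of the cast annotation as `<<c, alpha>>`, and the child-to-parent map standing in for the class type environment are all hypothetical. The sketch also assumes an acyclic hierarchy and, when the two classes coincide, gives the casting-up case priority.

```java
import java.util.HashMap;
import java.util.Map;

final class PolymorphicCastDemo {

    /** Subtyping by walking a child -> parent map; assumes an acyclic hierarchy rooted at Object. */
    static boolean isSubtype(Map<String, String> parentOf, String sub, String sup) {
        if (sup.equals("Object")) return true;
        for (String cur = sub; cur != null; cur = parentOf.get(cur)) {
            if (cur.equals(sup)) return true;
        }
        return false;
    }

    /**
     * Specializes the polymorphic cast <<c, alpha>> body once the class chosen for alpha is known.
     * chosen == null models "alpha has not been substituted yet".
     */
    static String specialize(Map<String, String> parentOf, String c, String chosen, String body) {
        if (chosen == null) return "<<" + c + ", alpha>>" + body;          // stays polymorphic
        if (isSubtype(parentOf, chosen, c)) return body;                    // casting-up: no cast needed
        if (isSubtype(parentOf, c, chosen)) return "(" + c + ")" + body;    // casting-down: keep the cast
        throw new IllegalArgumentException(c + " and " + chosen + " are not comparable");
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("F", "E");       // class F extends E
        parentOf.put("E", "Object");  // class E extends Object
        System.out.println(specialize(parentOf, "E", "F", "x"));  // x
        System.out.println(specialize(parentOf, "F", "E", "x"));  // (F)x
        System.out.println(specialize(parentOf, "E", null, "x")); // <<E, alpha>>x
    }
}
```

Unrelated classes make `specialize` throw, which mirrors the case where instantiation of the bytecode is undefined.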
For the polymorphic casting annotation we use a different notation (double angle brackets rather than square brackets) since this annotation is only allowed in polymorphic bytecode.

Polymorphic bytecode comes with a sequence of polymorphic type constraints, which involve type variables and class names. These are listed in Fig. 10. As for polymorphic bytecode, the meta-variable $t$ denotes either a type variable or a class name. Besides the presence of type variables, the only difference between the polymorphic type constraints and the type constraints listed in Fig. 7 is the presence of the last two constraints in Fig. 10, whose informal meaning is the following:

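- $\exists c$ means $c$ must be defined (this constraint will be generated when compiling a reference to an external class).
- $c \sim t$ means $c$ and $t$ must be comparable (this constraint will be generated when compiling a cast to an external class).

Polymorphic type constraints not containing type variables will be called monomorphic type constraints.

The rules defining the judgement for compositional compilation of classes are given in Fig. 11. We use the following auxiliary judgments:

- $c \vdash_c mds^s \mid \Gamma \leadsto mds^b$, meaning that method declaration(s) $mds^s$ in current class $c$ (needed for assigning the right type to this) compile to $mds^b$ under the polymorphic type constraints in $\Gamma$.
- $\Pi \vdash_c e^s : t \mid \Gamma \leadsto e^b$, meaning that expression $e^s$ in local type environment $\Pi$ (which maps this and method parameters to class names) has type $t$ and compiles to $e^b$ under the polymorphic type constraints in $\Gamma$.

\[
\begin{array}{l}
\text{(g-class)}\quad
\dfrac{\vdash_g \Delta \qquad \Delta; c \vdash_g mds^s \leadsto mds^b}
      {\Delta \vdash_g \mathtt{class}\ c\ \mathtt{extends}\ c'\ \{ fds\ mds^s \} : (c, c', fss, mss) \leadsto \mathtt{class}\ c\ \mathtt{extends}\ c'\ \{ fds\ mds^b \}}
\quad
\begin{array}{l} type(mds^s) = mss \\ type(fds) = fss \end{array}
\\[3ex]
\text{(g-methods)}\quad
\dfrac{\Delta; c \vdash_g md^s_i \leadsto md^b_i \quad \forall i \in 1..n}
      {\Delta; c \vdash_g md^s_1 \ldots md^s_n \leadsto md^b_1 \ldots md^b_n}
\\[3ex]
\text{(g-method)}\quad
\dfrac{\Delta; x_1{:}c_1 \ldots x_n{:}c_n, \mathtt{this}{:}c \vdash_g e^s : c' \leadsto e^b}
      {\Delta; c \vdash_g c_0\ m(c_1\ x_1 \ldots c_n\ x_n)\ \{\mathtt{return}\ e^s;\} \leadsto c_0\ m(c_1\ x_1 \ldots c_n\ x_n)\ \{\mathtt{return}\ e^b;\}}
\quad \Delta \vdash c' \leq c_0
\\[3ex]
\text{(g-parameter)}\quad
\dfrac{}{\Delta; \Pi \vdash_g x : c \leadsto x}
\quad \Pi \ni x{:}c
\\[3ex]
\text{(g-field access)}\quad
\dfrac{\Delta; \Pi \vdash_g e^s : c \leadsto e^b}
      {\Delta; \Pi \vdash_g e^s.f : c' \leadsto e^b[c.f\ c']}
\quad \Delta \vdash \phi(c, f, c')
\\[3ex]
\text{(g-meth call)}\quad
\dfrac{\Delta; \Pi \vdash_g e^s_0 : c_0 \leadsto e^b_0 \qquad \Delta; \Pi \vdash_g e^s_i : c_i \leadsto e^b_i \quad \forall i \in 1..n}
      {\Delta; \Pi \vdash_g e^s_0.m(e^s_1, \ldots, e^s_n) : c \leadsto e^b_0[c_0.m(\bar{c})c](e^b_1, \ldots, e^b_n)}
\quad \Delta \vdash \mu(c_0, m, (c_1, \ldots, c_n), (c, \bar{c}))
\\[3ex]
\text{(g-new)}\quad
\dfrac{\Delta; \Pi \vdash_g e^s_i : c_i \leadsto e^b_i \quad \forall i \in 1..n}
      {\Delta; \Pi \vdash_g \mathtt{new}\ c(e^s_1 \ldots e^s_n) : c \leadsto \mathtt{new}\ [c\ \bar{c}](e^b_1 \ldots e^b_n)}
\quad \Delta \vdash \kappa(c, c_1 \ldots c_n, \bar{c})
\\[3ex]
\text{(g-downcast)}\quad
\dfrac{\Delta; \Pi \vdash_g e^s : c' \leadsto e^b}
      {\Delta; \Pi \vdash_g (c)e^s : c \leadsto (c)e^b}
\quad \Delta \vdash c \leq c'
\\[3ex]
\text{(g-upcast)}\quad
\dfrac{\Delta; \Pi \vdash_g e^s : c' \leadsto e^b}
      {\Delta; \Pi \vdash_g (c)e^s : c \leadsto e^b}
\quad \Delta \vdash c' \leq c
\end{array}
\]

Figure 6: Global compilation

\[
\gamma \ ::= \ c \leq c' \mid \phi(c, f, c') \mid \mu(c, m, \bar{c}, (c', \bar{c}')) \mid \kappa(c, \bar{c}, \bar{c}')
\]

Figure 7: Type constraints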
The intuition behind the compositional compilation rules is that they extract the polymorphic type constraints $\Gamma$ necessary to compile a given source fragment into a certain polymorphic binary fragment. However, note that the rules do not check whether the inferred collection of constraints $\Gamma$ is actually satisfiable; indeed, a judgment can be derived for any fragment, even one that is not statically correct. Consistency checks are performed by the rule for the linking judgment (see below). This approach has the advantage that the typing rules for separate compilation are very simple and can be implemented in a

straightforward way. Note also that in the type system a unique judgment can be derived for any class declaration (the proof is immediate); therefore, we can easily define a type inference algorithm, that is, an effective way for deducing just from the single declaration of a class $c$ the type and the (polymorphic) bytecode of $c$, and the required type constraints. This is not possible for the system in [4, 2], where one needs to know the environment where $c$ is compiled.

In order to define compositional compilation for FJ, we now have to define the linking judgment $\Delta \vdash \Gamma \mid B \leadsto \Gamma' \mid B'$. Linking a fragment of polymorphic bytecode $B$, equipped with polymorphic type constraints $\Gamma$, according to a given class type environment $\Delta$, amounts to finding a suitable substitution $\sigma$, mapping the type variables into class names. The substitution $\sigma$ should instantiate some polymorphic type constraints in $\Gamma$ into monomorphic type constraints that hold in $\Delta$, and instantiate variables in $B$ correspondingly. In this way, these constraints can be eliminated, leaving only the constraints in $\Gamma'$. In particular, when all constraints are eliminated, we will obtain a fragment of standard bytecode (like the one produced by global compilation).

\[
\begin{array}{l}
\text{(refl)}\quad \Delta \vdash c \leq c
\qquad
\text{(top)}\quad \Delta \vdash c \leq \mathtt{Object}
\\[3ex]
\text{(trans)}\quad
\dfrac{\Delta, (c_1, c_2, fss, mss) \vdash c_2 \leq c_3}
      {\Delta, (c_1, c_2, fss, mss) \vdash c_1 \leq c_3}
\\[3ex]
\text{(φ-1)}\quad \Delta, (c, c', fss, mss) \vdash \phi(c, f, c'')
\quad c''\ f \in fss
\\[3ex]
\text{(φ-2)}\quad
\dfrac{\Delta, (c, c', fss, mss) \vdash \phi(c', f, c'')}
      {\Delta, (c, c', fss, mss) \vdash \phi(c, f, c'')}
\quad f \notin fss
\\[3ex]
\text{(µ-1)}\quad
\dfrac{\Delta, (c, c', fss, mss) \vdash c_i \leq c'_i \quad \forall i \in 1..n}
      {\Delta, (c, c', fss, mss) \vdash \mu(c, m, (c_1, \ldots, c_n), (c_0, (c'_1, \ldots, c'_n)))}
\quad c_0\ m(c'_1, \ldots, c'_n) \in mss
\\[3ex]
\text{(µ-2)}\quad
\dfrac{\Delta, (c, c', fss, mss) \vdash \mu(c', m, \bar{c}, (c_0, \bar{c}'))}
      {\Delta, (c, c', fss, mss) \vdash \mu(c, m, \bar{c}, (c_0, \bar{c}'))}
\quad m \notin mss
\\[3ex]
\text{(κ-1)}\quad \Delta \vdash \kappa(\mathtt{Object}, \epsilon, \epsilon)
\\[3ex]
\text{(κ-2)}\quad
\dfrac{\Delta, (c, c', fss, mss) \vdash \kappa(c', (c_1, \ldots, c_k), (c'_1, \ldots, c'_k)) \qquad \Delta, (c, c', fss, mss) \vdash c_i \leq c'_i \quad \forall i \in k+1..n}
      {\Delta, (c, c', fss, mss) \vdash \kappa(c, (c_1, \ldots, c_n), (c'_1, \ldots, c'_n))}
\quad fss = c'_{k+1}\ f_{k+1}, \ldots, c'_n\ f_n
\end{array}
\]

Figure 8: Entailment judgement $\Delta \vdash \gamma$ (rules for the type constraints in Fig. 7)

\[
\gamma \ ::= \ t \leq t' \mid \phi(t, f, t') \mid \mu(t, m, \bar{t}, (t', \bar{t}')) \mid \kappa(c, \bar{t}, \bar{t}') \mid \exists c \mid c \sim t
\]

Figure 10: Polymorphic type constraints
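The idea of linking as "find a substitution, then check the instantiated constraints in the environment" can be sketched for the introductory example. Everything named here is an assumption made for illustration: the class `LinkingSketch`, the nested `ClassInfo`, the convention that type variables are strings starting with `?`, and the encoding of a constraint as a string array. Only the field constraint and the subtyping constraint are covered, and the substitution is given rather than inferred.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LinkingSketch {

    /** A class type assignment restricted to parent and declared fields (methods omitted). */
    static final class ClassInfo {
        final String parent;
        final Map<String, String> fields = new HashMap<>();
        ClassInfo(String parent, String... nameTypePairs) {
            this.parent = parent;
            for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
                fields.put(nameTypePairs[i], nameTypePairs[i + 1]);
            }
        }
    }

    /** Applies the substitution to one type; unmapped names are returned unchanged. */
    static String apply(Map<String, String> sigma, String t) {
        return sigma.getOrDefault(t, t);
    }

    /** Subtyping check; assumes an acyclic hierarchy. */
    static boolean subtype(Map<String, ClassInfo> delta, String c1, String c2) {
        if (c1.equals(c2) || c2.equals("Object")) return true;      // reflexivity, top
        ClassInfo info = delta.get(c1);
        return info != null && subtype(delta, info.parent, c2);     // climb to the parent
    }

    /** Field constraint: c declares or inherits field f with the given type. */
    static boolean hasField(Map<String, ClassInfo> delta, String c, String f, String type) {
        ClassInfo info = delta.get(c);
        if (info == null) return false;
        String declared = info.fields.get(f);
        if (declared != null) return declared.equals(type);         // field declared in c
        return hasField(delta, info.parent, f, type);                // otherwise look in the parent
    }

    /** A constraint is {"phi", t, f, t'} or {"sub", t, t'}; it must be monomorphic after substitution. */
    static boolean holds(Map<String, ClassInfo> delta, Map<String, String> sigma, String[] g) {
        String[] inst = g.clone();
        for (int i = 1; i < inst.length; i++) {
            inst[i] = apply(sigma, inst[i]);
            if (inst[i].startsWith("?")) return false;               // variable left uninstantiated
        }
        if (inst[0].equals("phi")) return hasField(delta, inst[1], inst[2], inst[3]);
        if (inst[0].equals("sub")) return subtype(delta, inst[1], inst[2]);
        throw new IllegalArgumentException("unknown constraint kind " + inst[0]);
    }

    static boolean allHold(Map<String, ClassInfo> delta, Map<String, String> sigma, List<String[]> gamma) {
        for (String[] g : gamma) {
            if (!holds(delta, sigma, g)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Second environment of the introduction: B { D f1; }, D { F f2; }, F extends E.
        Map<String, ClassInfo> delta2 = new HashMap<>();
        delta2.put("B", new ClassInfo("Object", "f1", "D"));
        delta2.put("D", new ClassInfo("Object", "f2", "F"));
        delta2.put("F", new ClassInfo("E"));
        delta2.put("E", new ClassInfo("Object"));

        List<String[]> gamma = Arrays.asList(
                new String[] {"phi", "B", "f1", "?a"},
                new String[] {"phi", "?a", "f2", "?b"},
                new String[] {"sub", "?b", "E"});

        Map<String, String> sigma = new HashMap<>();
        sigma.put("?a", "D");
        sigma.put("?b", "F");
        System.out.println(allHold(delta2, sigma, gamma)); // true
    }
}
```

A real linker would have to compute the substitution itself; here it is supplied by hand so that only the entailment check is illustrated.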
Instantiation of $\Gamma$ w.r.t. substitution $\sigma$ is denoted by $\sigma(\Gamma)$; we have omitted the trivial inductive definition, which coincides with conventional variable substitution. Instantiation of $B$ w.r.t. $\Delta$ and $\sigma$ is denoted by $I^{\Delta}_{\sigma}(B)$; $\Delta$ is needed for dealing with the case $\langle\!\langle c, \alpha \rangle\!\rangle e^b$:

\[
I^{\Delta}_{\sigma}(\langle\!\langle c, \alpha \rangle\!\rangle e^b) =
\begin{cases}
I^{\Delta}_{\sigma}(e^b) & \text{if } \sigma(\alpha) = c' \text{ and } \Delta \vdash c' \leq c \\
(c)\, I^{\Delta}_{\sigma}(e^b) & \text{if } \sigma(\alpha) = c' \text{ and } \Delta \vdash c \leq c' \\
\langle\!\langle c, \alpha \rangle\!\rangle\, I^{\Delta}_{\sigma}(e^b) & \text{if } \alpha \text{ is not substituted by } \sigma \\
\text{undefined} & \text{otherwise}
\end{cases}
\]

In all other cases, instantiation of polymorphic bytecode corresponds to variable substitution.

The fact that the monomorphic type constraint $\gamma$ holds in the class type environment $\Delta$ is expressed by the judgement $\Delta \vdash \gamma$, which is defined by the rules in Fig. 8 and in Fig. 12. We will write $\Delta \vdash \Gamma$ to mean that $\Delta \vdash \gamma$ holds for all $\gamma \in \Gamma$.

The linking judgement is formally defined by rule (c-linking) in Fig. 13. Rule (c-linking) is parameterized w.r.t. a linking-simplification relation $\leadsto_{ls}$. A linking-simplification relation models a particular way of finding suitable substitutions for simplifying type constraints w.r.t. class type environments. The judgment $\vdash_c \Delta$ (well-formed type environments for compositional compilation) is formally defined in Fig. 17 in the Appendix. The judgment $\vdash_c \Delta$ is

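more liberal than $\vdash_g \Delta$, since it allows extended classes to be undefined in $\Delta$.

The rules for the compilation of a class in Fig. 11 and the rule for linking in Fig. 13, together with the general rules for the compositional compilation of a fragment, given in Fig. 2, provide an instantiation to FJ of the compositional compilation schema introduced in Section 2. In this instantiation the operator $\oplus$, assumed by rule (c-frag2) in Fig. 2 in Section 2, just makes a fresh renaming of the type variables occurring in each pair <constraints, fragment> and returns the pair <concatenation of the constraints, concatenation of the fragments>.

\[
\begin{array}{l}
\text{(c-class)}\quad
\dfrac{c \vdash_c mds^s \mid \Gamma \leadsto mds^b}
      {\vdash_c \mathtt{class}\ c\ \mathtt{extends}\ c'\ \{ fds\ mds^s \} : (c, c', fss, mss) \mid \Gamma \leadsto \mathtt{class}\ c\ \mathtt{extends}\ c'\ \{ fds\ mds^b \}}
\quad
\begin{array}{l} type(mds^s) = mss \\ type(fds) = fss \end{array}
\\[3ex]
\text{(c-methods)}\quad
\dfrac{c \vdash_c md^s_i \mid \Gamma_i \leadsto md^b_i \quad \forall i \in 1..n}
      {c \vdash_c md^s_1 \ldots md^s_n \mid \Gamma_1 \ldots \Gamma_n \leadsto md^b_1 \ldots md^b_n}
\\[3ex]
\text{(c-method)}\quad
\dfrac{x_1{:}c_1 \ldots x_n{:}c_n, \mathtt{this}{:}c \vdash_c e^s : t \mid \Gamma \leadsto e^b}
      {c \vdash_c c_0\ m(c_1\ x_1 \ldots c_n\ x_n)\ \{\mathtt{return}\ e^s;\} \mid \Gamma,\ t \leq c_0,\ \exists c_i^{\,i \in 1..n} \leadsto c_0\ m(c_1\ x_1 \ldots c_n\ x_n)\ \{\mathtt{return}\ e^b;\}}
\\[3ex]
\text{(c-parameter)}\quad
\dfrac{}{\Pi \vdash_c x : c \mid \epsilon \leadsto x}
\quad \Pi \ni x{:}c
\\[3ex]
\text{(c-field access)}\quad
\dfrac{\Pi \vdash_c e^s : t \mid \Gamma \leadsto e^b}
      {\Pi \vdash_c e^s.f : \alpha \mid \Gamma, \phi(t, f, \alpha) \leadsto e^b[t.f\ \alpha]}
\quad \alpha \text{ fresh}
\\[3ex]
\text{(c-meth call)}\quad
\dfrac{\Pi \vdash_c e^s_0 : t_0 \mid \Gamma_0 \leadsto e^b_0 \qquad \Pi \vdash_c e^s_i : t_i \mid \Gamma_i \leadsto e^b_i \quad \forall i \in 1..n}
      {\Pi \vdash_c e^s_0.m(e^s_1 \ldots e^s_n) : \beta \mid \Gamma_0\,\Gamma_1 \ldots \Gamma_n,\ \mu(t_0, m, t_1 \ldots t_n, (\beta, \bar{\alpha})) \leadsto e^b_0[t_0.m(\bar{\alpha})\beta](e^b_1, \ldots, e^b_n)}
\quad \beta, \bar{\alpha} \text{ fresh}
\\[3ex]
\text{(c-new)}\quad
\dfrac{\Pi \vdash_c e^s_i : t_i \mid \Gamma_i \leadsto e^b_i \quad \forall i \in 1..n}
      {\Pi \vdash_c \mathtt{new}\ c(e^s_1 \ldots e^s_n) : c \mid \Gamma_1 \ldots \Gamma_n,\ \kappa(c, t_1 \ldots t_n, \bar{\alpha}) \leadsto \mathtt{new}\ [c\ \bar{\alpha}](e^b_1 \ldots e^b_n)}
\quad \bar{\alpha} \text{ fresh}
\\[3ex]
\text{(c-cast)}\quad
\dfrac{\Pi \vdash_c e^s : t \mid \Gamma \leadsto e^b}
      {\Pi \vdash_c (c)e^s : c \mid \Gamma,\ c \sim t \leadsto \langle\!\langle c, t \rangle\!\rangle e^b}
\end{array}
\]

Figure 11: Compositional compilation

\[
\begin{array}{l}
\text{(∃)}\quad \Delta, (c, c', fss, mss) \vdash \exists c
\qquad
\text{(Obj)}\quad \Delta \vdash \exists\, \mathtt{Object}
\\[3ex]
\text{(∼-1)}\quad \dfrac{\Delta \vdash c \leq c'}{\Delta \vdash c \sim c'}
\qquad
\text{(∼-2)}\quad \dfrac{\Delta \vdash c' \leq c}{\Delta \vdash c \sim c'}
\end{array}
\]

Figure 12: Entailment judgement $\Delta \vdash \gamma$ (rules for the type constraints $\exists c$ and $c_1 \sim c_2$)

\[
\text{(c-linking)}\quad
\dfrac{\vdash_c \Delta \qquad \Delta; \Gamma \leadsto_{ls} \sigma \mid \Gamma'}
      {\Delta \vdash \Gamma \mid B \leadsto \Gamma' \mid I^{\Delta}_{\sigma}(B)}
\]

Figure 13: Linking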
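The combination operator used by rule (c-frag2) in this instantiation, fresh renaming followed by concatenation, can be sketched as below. The names `CombineSketch`, `Pair`, `rename` and `combine` are hypothetical, constraints and bytecode are kept as plain strings, and type variables are assumed to be written `?name`; freshness is obtained here simply by appending the position of the pair to every variable, which is one possible renaming scheme and not necessarily the one the authors have in mind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CombineSketch {

    /** Type variables are written ?name in this sketch. */
    private static final Pattern VAR = Pattern.compile("\\?\\w+");

    /** One <constraints, fragment> pair, both kept as plain strings for brevity. */
    static final class Pair {
        final List<String> constraints;
        final List<String> fragment;
        Pair(List<String> constraints, List<String> fragment) {
            this.constraints = constraints;
            this.fragment = fragment;
        }
    }

    /** Appends "_index" to every type variable occurring in the text. */
    static String rename(String text, int index) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(m.group() + "_" + index));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Renames the variables of each pair apart, then concatenates constraints and fragments. */
    static Pair combine(List<Pair> pairs) {
        if (pairs.size() == 1) return pairs.get(0);   // a single pair is returned unchanged
        List<String> constraints = new ArrayList<>();
        List<String> fragment = new ArrayList<>();
        int index = 0;
        for (Pair p : pairs) {
            index++;
            for (String g : p.constraints) constraints.add(rename(g, index));
            for (String b : p.fragment) fragment.add(rename(b, index));
        }
        return new Pair(constraints, fragment);
    }

    public static void main(String[] args) {
        Pair p1 = new Pair(Arrays.asList("phi(B, f1, ?a)"), Arrays.asList("x[B.f1 ?a]"));
        Pair p2 = new Pair(Arrays.asList("phi(C, f2, ?a)"), Arrays.asList("y[C.f2 ?a]"));
        Pair both = combine(Arrays.asList(p1, p2));
        System.out.println(both.constraints); // [phi(B, f1, ?a_1), phi(C, f2, ?a_2)]
        System.out.println(both.fragment);    // [x[B.f1 ?a_1], y[C.f2 ?a_2]]
    }
}
```

Both pairs use the variable `?a` independently; after combination the two occurrences are distinct, so a substitution chosen for one fragment cannot accidentally constrain the other.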
We now prove that, under suitable hypotheses on the linking-simplification relation, the compositional compilation we have defined for FJ can be safely used in place of global compilation, that is, it is sound and complete w.r.t. global compilation in the sense of Def. 1 and Def. 3 in Sect. 2. More precisely, the assumptions (1) in Theorem 2 and (1) in Theorem 4, which only relate compositional and global compilation of a single class, hold independently of the linking-simplification relation we choose, since they amount to requiring the following properties of the entailment judgment $\Delta\,\delta \vdash \sigma(\Gamma)$.

Proposition 5 (Soundness of entailment) If $\Delta \vdash \sigma(\Gamma)$ holds, then, for all $s$ and $b$, $\vdash_c s : \delta \mid \Gamma \leadsto b$ implies $\Delta \vdash_g s : \delta \leadsto I_\sigma(b)$.

Proposition 6 (Completeness of entailment) If $\Delta \vdash_g s : \delta \leadsto b$ holds, then there exist $\sigma$, $\Gamma$ and $b'$ s.t. $\vdash_c s : \delta \mid \Gamma \leadsto b'$, $\Delta \vdash \sigma(\Gamma)$, and $b = I_\sigma(b')$.

For proving the other assumptions in Theorem 2 and Theorem 4, we need the simplification relation to satisfy the requirements listed in Theorem 7 below, which have the following informal meaning.

- ($\leadsto_{ls}$-sound) guarantees that the simplification step is sound, in the sense that it only eliminates type constraints which (under the given substitution) hold in the current class type environment, and that the type constraints which are left do not contain variables which have been substituted.
- ($\leadsto_{ls}$-complete-1) guarantees that, if $\Delta$ contains enough type information to satisfy all type constraints in $\Gamma$, then this simplification step must be possible in $\leadsto_{ls}$.
- ($\leadsto_{ls}$-complete-2) handles the case in which there is not enough type information in a class type environment, say $\Delta_2$, to guarantee that all constraints in $\Gamma$ hold. However, if it is possible to eliminate these type constraints in a larger class type environment $\Delta_1 \Delta_2$, then it must be possible to partially simplify $\Gamma$ in $\Delta_2$, obtaining $\Gamma'$.

Note that the last requirement leaves many choices open in defining $\leadsto_{ls}$, including the one in which partial simplification does nothing ($\Gamma' = \Gamma$ in the above) and all simplifications are done only when the current class type environment allows all type constraints to be simplified. Of course, an algorithm implementing $\leadsto_{ls}$ should attempt to perform as many simplifications as possible at each step. This is the problem faced in the next section.

In the following we denote by $\circ$ functional composition of substitutions, by $\emptyset$ the empty substitution, by $\Gamma \setminus \Gamma'$ the sequence of constraints obtained by removing from $\Gamma$ all constraints in $\Gamma'$, and by $Var(\Gamma)$ the set of type variables in $\Gamma$.
Theorem 7 If the linking-simplification relation $\leadsto_{ls}$ satisfies the following assumptions:

- ($\leadsto_{ls}$-sound) $\Delta;\ \Gamma \leadsto_{ls}^{\sigma} \Gamma'$ implies $\Delta \vdash \sigma(\Gamma) \setminus \Gamma'$ and $dom(\sigma) \cap Var(\Gamma') = \emptyset$
- ($\leadsto_{ls}$-complete-1) $\Delta \vdash \sigma(\Gamma)$ implies $\Delta;\ \Gamma \leadsto_{ls}^{\sigma} \Lambda$
- ($\leadsto_{ls}$-complete-2) $\Delta_1 \Delta_2;\ \Gamma \leadsto_{ls}^{\sigma} \Lambda$ implies that there exist $\Gamma'$, $\sigma'$, $\sigma''$ s.t. $\Delta_2;\ \Gamma \leadsto_{ls}^{\sigma'} \Gamma'$, $\Delta_1 \Delta_2;\ \Gamma' \leadsto_{ls}^{\sigma''} \Lambda$, and $\sigma = \sigma'' \circ \sigma'$

then compositional compilation of FJ is sound and complete w.r.t. global compilation.

Proof sketch
Soundness: the conditions in Theorem 2 are satisfied. Indeed, condition (1) holds by Prop. 5; conditions (2), (3), (4) hold by assumption ($\leadsto_{ls}$-sound).
Completeness: the conditions in Theorem 4 are satisfied. Indeed, condition (1) holds by Prop. 6; conditions (2) and (3) hold by assumptions ($\leadsto_{ls}$-complete-1) and ($\leadsto_{ls}$-complete-2).

5 A linking-simplification algorithm

In this section we describe a particular linking algorithm, thus making rule (c-linking) effective, and we sketch a proof that this algorithm is a correct implementation of the $\leadsto_{ls}$ relation. We start with some basic definitions which specify the problem we are aiming to solve.

5.1 Basic definitions

Unless otherwise specified, in this section we only consider well-formed type environments, that is, possibly open environments with no inheritance cycles, no field hiding and no bad method overriding (see Fig. 17 in the Appendix). We write $\Delta \subseteq \Delta'$ ($\Gamma \subseteq \Gamma'$) to mean that $\Delta$ ($\Gamma$) is a subsequence of $\Delta'$ ($\Gamma'$), that is, all $\delta$ ($\gamma$) occurring in the sequence $\Delta$ ($\Gamma$) occur in $\Delta'$ ($\Gamma'$) as well. Analogously, $\Delta \subset \Delta'$ ($\Gamma \subset \Gamma'$) means that $\Delta$ ($\Gamma$) is a subsequence of $\Delta'$ ($\Gamma'$) and there exists a $\delta$ ($\gamma$) occurring in $\Delta'$ ($\Gamma'$) which does not occur in $\Delta$ ($\Gamma$).

As a first, rather imprecise attempt, the problem could be informally stated as follows: given a type environment $\Delta$ and a sequence of constraints $\Gamma$, find the possibly maximal $\Gamma' \subseteq \Gamma$ s.t. $\Delta$ satisfies $\Gamma'$; note that, since we are interested in incremental linking, $\Delta$ might not satisfy the whole of $\Gamma$.
A first problem with the above statement is that satisfaction of type constraints is under-specified; e.g., for $\Delta = (c, Object, \epsilon, \epsilon)$ and $\Gamma = \phi(c_1, f, \alpha),\ \alpha \le c$ (with $c \ne c_1$), one might be tempted to assert that $\alpha \le c$ is satisfied by $\Delta$ with $\alpha = c$. Nevertheless, even though $\alpha = c$ initially seems like the only possible solution, it cannot be considered valid because it is sensitive to extensions of $\Delta$. For instance, if we take $\Delta' = \Delta, (c_1, c, c_1\ f, \epsilon)$ then we discover that $\alpha = c$ is no longer a valid solution, and that the whole $\Gamma$ is satisfied by $\Delta'$ with $\alpha = c_1$. In terms of the linking process, this means that we would need to backtrack from $\alpha = c$ when adding a class $c_1$ as specified by $\Delta'$.

We now formalize the above reasoning. First, a solution is a substitution $\sigma$ for type variables s.t. $\Delta$ satisfies $\sigma(\Gamma)$, that is, $\Delta \vdash \sigma(\Gamma)$ is valid. Second, in order to avoid backtracking, $\sigma$ must be the unique possible choice (up to inclusion of maps) for all extensions of $\Delta$.

Definition 8 Given two substitutions $\sigma$, $\sigma'$, we write $\sigma \subseteq \sigma'$ iff $dom(\sigma) \subseteq dom(\sigma')$ and, for all $\alpha \in dom(\sigma)$, $\sigma(\alpha) = \sigma'(\alpha)$.

Definition 9 A sequence of constraints $\Gamma$ has solution $\sigma$ w.r.t. a given $\Delta$ iff
1. $\Delta \vdash \sigma(\Gamma)$;
2. for all $\sigma'$, $\Delta'$: if $\Delta \subseteq \Delta'$ and $\Delta' \vdash \sigma'(\Gamma)$, then $\sigma \subseteq \sigma'$.

Fact 10 A sequence $\Gamma$ has at most one solution w.r.t. a given $\Delta$.

Now that we have formalized the notion of solution, we can consider in more detail the situation where $\Gamma$ has no solution w.r.t. a given $\Delta$; as already said, this situation occurs quite naturally when considering incremental linking, because $\Delta$ is likely to be incomplete, and thus may contain insufficient information to compute a solution for $\Gamma$. Nevertheless, the algorithm should be able to identify a subsequence $\Gamma' \subseteq \Gamma$ s.t. $\Gamma'$ has a solution $\sigma$ w.r.t. $\Delta$, and perform a simplification step: the constraints $\Gamma'$ are removed in order to avoid unnecessary checks in further linking steps, whereas $\sigma$ is applied to the remaining constraints $\Gamma \setminus \Gamma'$, thus returning a sequence $\Gamma''$. Note that if the algorithm were smart enough, then $\Gamma'$ would be maximal, that is, there would not exist a $\Gamma'''$ s.t. $\Gamma' \subset \Gamma''' \subseteq \Gamma$ and $\Gamma'''$ has a solution w.r.t. $\Delta$. We distinguish two possible situations w.r.t. the remaining constraints $\Gamma''$:

- If there exists no extension of $\Delta$ which gives a solution for $\Gamma''$, then the algorithm should detect a linking error. In this case, we say that $\Gamma''$ is inconsistent w.r.t. $\Delta$.
- Otherwise, linking succeeds, but the constraints $\Gamma''$ still need to be satisfied, therefore the obtained fragment needs to be linked further before execution. In this case, we say that $\Gamma''$ is undetermined w.r.t. $\Delta$.

Definition 11 A sequence of constraints $\Gamma$ is inconsistent w.r.t. a given $\Delta$ iff for all $\Delta'$, if $\Delta \subseteq \Delta'$, then there is no $\sigma$ s.t. $\Delta' \vdash \sigma(\Gamma)$; it is consistent (w.r.t. $\Delta$) otherwise. A sequence of constraints $\Gamma$ which is inconsistent w.r.t. $\epsilon$ (that is, for all $\Delta$) is called inconsistent. Conversely, a sequence of constraints $\Gamma$ which is consistent w.r.t. some $\Delta$ (hence, w.r.t. $\epsilon$ as well) is called consistent.

Definition 12 A sequence of constraints $\Gamma$ is determined w.r.t. $\Delta$ iff $\Gamma$ either has a solution or is inconsistent w.r.t. $\Delta$; it is undetermined (w.r.t. $\Delta$) otherwise.

Definition 13 A sequence of constraints $\Gamma$ is determined iff for all $\Delta$, there exists $\Delta'$ s.t. $\Delta \subseteq \Delta'$ and $\Gamma$ is determined w.r.t. $\Delta'$; it is undetermined otherwise. A sequence of constraints $\Gamma$ is fully determined (undetermined) iff $\Gamma$ is determined (undetermined) w.r.t. all $\Delta$.

Proposition 14 The constraint $\gamma$ is determined iff it matches one of the following patterns: $\exists c$, $c \le c'$, $\phi(c, f, t)$, $\mu(c, m, \bar{c}, (t, \bar{t}))$, $\kappa(c, \bar{c}, \bar{t})$, $c \sim c'$. The constraint $\gamma$ is fully undetermined iff it matches one of the following patterns: $\alpha \le t$, $t \le \alpha$, $\phi(\alpha, f, t)$, $\mu(\alpha, m, \bar{t}, (t', \bar{t}'))$, $c \sim \alpha$.

Definition 15 A type variable in a constraint may appear either in an input or an output position. Input and output positions are specified as follows: $in \le in$, $\phi(in, f, out)$, $\mu(in, m, in, (out, out))$, $\kappa(in, in, out)$, $c \sim in$.

Definition 16 The following topological relation is defined on type constraints: $\gamma \rightarrow \gamma'$ iff there exists $\alpha$ s.t. $\alpha$ appears in output position in $\gamma$ and in input position in $\gamma'$.

5.2 Description of the algorithm

The Java pseudo-code of the main algorithm is given in Figure 14. The method solve takes a class type environment $\Delta$ and returns a substitution; it is declared in the class implementing sequences of constraints, therefore this denotes a certain $\Gamma$.
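The ordering of Definition 16 can be made concrete with a small Java sketch. It is only an illustration under stated assumptions: a constraint is reduced to the sets of type variables in its input and output positions (Definition 15), and the names `ConstraintOrder`, `Constraint`, `vars`, `precedes` and `topSort` are hypothetical, not taken from the paper. The paper does not say which sorting procedure is used; the sketch picks a simple quadratic one and assumes the environment of constraints has no cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ConstraintOrder {
    /** A constraint reduced to its type variables in input and output positions (Definition 15). */
    static final class Constraint {
        final String text;
        final Set<String> in;
        final Set<String> out;

        Constraint(String text, Set<String> in, Set<String> out) {
            this.text = text;
            this.in = in;
            this.out = out;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    /** Hypothetical helper: builds a set of type-variable names. */
    static Set<String> vars(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    /** Definition 16: g1 precedes g2 iff some variable is an output of g1 and an input of g2. */
    static boolean precedes(Constraint g1, Constraint g2) {
        for (String alpha : g1.out) {
            if (g2.in.contains(alpha)) {
                return true;
            }
        }
        return false;
    }

    /** Repeatedly extracts a constraint with no pending predecessor; throws if there is a cycle. */
    static List<Constraint> topSort(List<Constraint> gamma) {
        List<Constraint> pending = new ArrayList<>(gamma);
        List<Constraint> sorted = new ArrayList<>();
        while (!pending.isEmpty()) {
            Constraint next = null;
            for (Constraint candidate : pending) {
                boolean hasPredecessor = false;
                for (Constraint other : pending) {
                    if (other != candidate && precedes(other, candidate)) {
                        hasPredecessor = true;
                        break;
                    }
                }
                if (!hasPredecessor) {
                    next = candidate;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalStateException("constraints contain a cycle");
            }
            pending.remove(next);
            sorted.add(next);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // alpha <= c has alpha in input position; phi(c, f, alpha) has alpha in output position.
        Constraint sub = new Constraint("alpha <= c", vars("alpha"), vars());
        Constraint phi = new Constraint("phi(c, f, alpha)", vars(), vars("alpha"));
        System.out.println(topSort(Arrays.asList(sub, phi))); // prints [phi(c, f, alpha), alpha <= c]
    }
}
```

Sorting places the field-access constraint first, so that the variable it determines is already substituted when the subtyping constraint is examined.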
When invoked, method solve either throws fail or returns a substitution $\sigma$ and simplifies $\Gamma$ into $\Gamma''$. The exception fail can be thrown either by the invocation $\Delta$.check(), if $\Delta$ is ill-formed, or by the invocation $\gamma$.entailedby($\Delta$), if some $\gamma \in \Gamma$ is found to be inconsistent w.r.t. $\Delta$ (hence the whole $\Gamma$ is inconsistent as well). If a substitution $\sigma$ is returned, then solve has found a subsequence $\Gamma' \subseteq \Gamma$ which has solution $\sigma$ w.r.t. $\Delta$, and $\Gamma$ has been modified to $\Gamma''$ by removing $\Gamma'$ and applying $\sigma$. The fact that $\Gamma'$ is maximal can be deduced from the following provable property: for all $\gamma \in \Gamma''$, $\gamma$ has no solution w.r.t. $\Delta$.

    Input:  this: a sequence of constraints Γ
            argument: a type environment Δ
    Output: if it fails then Γ is inconsistent w.r.t. Δ
            else returns σ and transforms Γ into Γ'' s.t.
              Γ'' = σ(Γ \ Γ')
              Γ' ⊆ Γ
              σ is the solution of Γ' w.r.t. Δ
              for all γ ∈ Γ'', γ has no solution w.r.t. Δ
    Pseudo-code:
    Subs solve(Env Δ) throws fail {
      Δ.check()              // throws fail if Δ is ill-formed
      σ = ε
      this.topsort()         // Γ must be topologically sorted
      for each γ ∈ this {    // according to the order →
        try {
          σ' = γ.entailedby(Δ)
          this.remove(γ)
          this.apply(σ')
          σ.update(σ')
        } catch (undetermined) { }
      }
      return σ
    }

Figure 14: Constraint solving algorithm solve

However, while all possible simplifications are always performed, some inconsistencies may be discovered only later on, when some new fragment is linked, mainly because each constraint $\gamma \in \Gamma$ is checked separately. For instance, given $\Delta = (c, Object, \epsilon, \epsilon)$ and $\Gamma = c_1 \le c_2,\ c_2 \le c_1$ (with all class names distinct), $\Gamma$.solve($\Delta$) returns the empty substitution and does not modify $\Gamma$, even though $\Gamma$ is clearly inconsistent; however, such an inconsistency can be captured when performing further linking steps. For instance, if we take $\Delta' = \Delta, (c_1, Object, \epsilon, \epsilon)$ then $\Gamma$.solve($\Delta'$) throws fail, since $c_1 \le c_2$ is clearly inconsistent w.r.t. $\Delta'$.

The constraints are processed respecting the topological order given in Definition 16, so that it is possible to scan $\Gamma$ only once without failing to simplify some constraints. As an example, if $\Gamma = \alpha \le c,\ \phi(c, f, \alpha)$ is not topologically sorted, and $\Delta = (c, Object, c\ f, \epsilon)$, then $\alpha \le c$ is processed first and kept, since it is undetermined w.r.t. $\Delta$. Then $\phi(c, f, \alpha)$ is removed, since it has solution $\alpha = c$, and $\alpha \le c$ is specialized into $c \le c$; therefore solve fails to perform a simplification step.

The method entailedby checks whether a single $\gamma$ has a solution w.r.t. a $\Delta$; if so, it returns the corresponding substitution $\sigma$, otherwise it throws either fail, if $\gamma$ is inconsistent w.r.t. $\Delta$, or undetermined, if $\gamma$ is undetermined w.r.t. $\Delta$. As already explained, fail is propagated by solve, whereas undetermined is caught; in this way, the constraint is not removed, no substitution is applied and solve can skip to the next constraint to be processed.

    Subs entailedby(Env Δ) throws undetermined, fail {   // Δ assumed to be well-formed
      if this.lhs().var() ∨ this.rhs().var() throw undetermined
      c1 = this.lhs()
      c2 = this.rhs()
      if c1.equals(c2) ∨ c2.equals(Object) return ε
      c = c1
      S = {c1}               // will contain all c s.t. c1 ≤ c
      try {
        while c ≠ Object {
          c = Δ.superclass(c)
          S.add(c)
          if c.equals(c2) return ε
        }
        throw fail           // chain complete up to Object
      } catch (undetermined) {
        c = c2               // still could fail
        while c ≠ Object {
          c = Δ.superclass(c)
          if S.contains(c) throw fail
        }
        throw undetermined
      }
    }

Figure 15: Definition of entailedby for $t \le t'$

Figure 15 contains the pseudo-code for the method entailedby in the class representing constraints of the form $t \le t'$ (all other cases can be found in the Appendix). If either the left or the right hand side of the constraint is a variable, then the constraint is fully undetermined, therefore the corresponding exception is thrown. Otherwise the constraint is ground and we can check whether it is satisfied by $\Delta$. If the trivial cases do not apply (reflexivity and top type), then we perform an inheritance graph traversal from $c_1$ up to Object, which can terminate in three different ways: if $c_2$ is found, then the constraint is satisfied and we return the empty substitution; if Object is reached without finding $c_2$, then the constraint is inconsistent and fail is thrown; otherwise the traversal stops because method superclass throws undetermined, since some superclass of $c_1$ ($c_1$ included) could not be found in $\Delta$. However, in this last case the exception must be caught, since the constraint could still be inconsistent if there exists $c$ s.t. $c_2 \le c$ and $c_1 \le c$.
For this reason a new traversal is started from $c_2$, looking for a superclass of $c_2$ ($c_2$ included) contained in the set $S$ of all superclasses of $c_1$ ($c_1$ included) collected during the first traversal. If such a class is found, then fail is thrown; otherwise (if either superclass throws undetermined, or Object is reached) undetermined is thrown.

5.3 Correctness of the algorithm

The following proposition ensures that solve is in fact a correct implementation of the $\leadsto_{ls}$ relation, so that soundness and completeness of compositional compilation w.r.t. global compilation are guaranteed.

Proposition 17 Let $\leadsto_{ls}$ be defined as follows: $\Delta;\ \Gamma \leadsto_{ls}^{\sigma} \Gamma'$ iff $\Gamma$.solve($\Delta$) returns $\sigma$ and transforms $\Gamma$ into $\Gamma'$. Then $\leadsto_{ls}$ satisfies the properties of Theorem 7.

The following propositions ensure that solve is only invoked for sequences $\Gamma$ of constraints which can be sorted w.r.t. the topological order defined in Definition 16. In other words, the transitive and reflexive closure of $\rightarrow$ in $\Gamma$ is a partial order. For brevity, in this case we say that $\Gamma$ has no cycles.

Proposition 18 If $\vdash_C s : \delta \mid \Gamma \leadsto b$, then $\Gamma$ has no cycles.

Proposition 19 If $\Gamma$ has no cycles and $\Gamma$.solve($\Delta$) transforms $\Gamma$ into $\Gamma'$, then $\Gamma'$ has no cycles as well.

6 Conclusion

In this paper we have addressed the problem of supporting compositional compilation for languages (like Java and C#) where the executable code generated by compilation is context dependent. To this aim, we have first defined a schema formalizing both global and compositional compilation for such languages. Then, we have instantiated the schema by providing the algorithms supporting compositional compilation for Featherweight Java. To the best of our knowledge, our proposal provides the first compositional compilation procedure for a Java-like language.
We believe that the results in this paper can be exploited in at least two different ways. Firstly, they can be directly applied to the development of a new generation of compilers/interpreters/linkers (supporting compositional compilation) for real languages like Java and C#. In this approach, polymorphic bytecode would be instantiated eagerly, in a step corresponding to static linking. Such compilers would naturally support selective recompilation mechanisms, in the same spirit as [8, 9]. Secondly, they could lead to the development of a more flexible run-time support for Java-like languages, allowing execution of bytecode containing type variables. In this approach, polymorphic bytecode would be instantiated lazily, during dynamic linking and loading; some initial exploration appears in [5].

Further work includes extensions of our polymorphic model, e.g., to encompass field hiding and overloading. We claim that such extensions do not pose major technical problems. We also plan to investigate the extension of the source language so that it may contain type variables as well.

Acknowledgement

We are grateful to Alex Buckley for feedback. This work has been partially supported by Dynamic Assembly, Reconfiguration and Type-checking (EC project IST) and APPSEM II (Thematic network IST).

References

[1] D. Ancona, F. Damiani, S. Drossopoulou, and E. Zucca. Even More Principal Typings for Java-like Languages. In 6th Intl. Workshop on Formal Techniques for Java Programs 2004, June.
[2] D. Ancona and G. Lagorio. Stronger Typings for Smarter Recompilation of Java-like Languages. Journal of Object Technology. To appear.
[3] D. Ancona, G. Lagorio, and E. Zucca. True separate compilation of Java classes. In ACM SIGPLAN Conference on Principles and Practice of Declarative Programming (PPDP 02). ACM Press.
[4] D. Ancona and E. Zucca. Principal typings for Java-like languages. In ACM Symp. on Principles of Programming Languages 2004. ACM Press, January.
[5] A. Buckley and S. Drossopoulou. Flexible Dynamic Linking. In 6th Intl. Workshop on Formal Techniques for Java Programs 2004, June.
[6] S. Drossopoulou and S. Eisenbach. Describing the semantics of Java and proving type soundness. In J. Alves-Foss, editor, Formal Syntax and Semantics of Java, number 1523 in Lecture Notes in Computer Science. Springer.
[7] A. Igarashi, B. Pierce, and P. Wadler. Featherweight Java: A minimal core calculus for Java and GJ. In ACM Symp. on Object-Oriented Programming: Systems, Languages and Applications 1999, November.

[8] G. Lagorio. Towards a smart compilation manager for Java. In Blundo and Laneve, editors, Italian Conf. on Theoretical Computer Science 2003, number 2841 in Lecture Notes in Computer Science. Springer, October.
[9] G. Lagorio. Another step towards a smart compilation manager for Java. In Hisham Haddad, Andrea Omicini, Roger L. Wainwright, and Lorie M. Liebrock, editors, ACM Symp. on Applied Computing (SAC 2004), Special Track on Object-Oriented Programming Languages and Systems. ACM Press, March.

A Appendix

The judgment $\vdash_g \Delta$ (well-formed type environments for global compilation) is formally defined in Fig. 16, by means of an auxiliary judgment $\Delta \vdash_g c : (fss, mss)$ which holds if we can safely calculate in $\Delta$ the set of all field and method signatures of class $c$ (both empty for Object). Note that, for global compilation, if a class $c'$ is extended then $c'$ must be defined in $\Delta$; however, if a class $c$ is only used as a type, then $c$ is not required to be defined in $\Delta$.

The judgment $\vdash_c \Delta$ (well-formed type environments for compositional compilation) is formally defined in Fig. 17. The rules allow a more liberal definition, since in compositional compilation extended classes are not required to be defined in $\Delta$.

    Subs entailedby(Env Δ) throws undetermined {
      if Δ.contains(this.class()) return ε
      throw undetermined
    }

Figure 18: Definition of entailedby for $\exists c$

    Subs entailedby(Env Δ) throws undetermined, fail {   // Δ assumed to be well-formed
      if this.rhs().var() throw undetermined
      c1 = this.lhs()
      c2 = this.rhs()
      try { return (c1 ≤ c2).entailedby(Δ) }
      catch (undetermined) {
        try { return (c2 ≤ c1).entailedby(Δ) }
        catch (fail) { throw undetermined }
      }
      catch (fail) { return (c2 ≤ c1).entailedby(Δ) }
    }

Figure 19: Definition of entailedby for $c \sim t$

    Subs entailedby(Env Δ) throws undetermined, fail {   // Δ assumed to be well-formed
      if this.target().var() throw undetermined
      c = this.target()
      f = this.field()
      t = this.type()
      while c ≠ Object {       // start field look-up
        fs = Δ.fields(c)       // fields declared in c
        if fs.contains(c' f) return t.match(c')
        c = Δ.superclass(c)
      }
      throw fail
    }

Figure 20: Definition of entailedby for $\phi(t, f, t')$

    Subs entailedby(Env Δ) throws undetermined, fail {   // Δ assumed to be well-formed
      if this.target().var() throw undetermined
      c = this.target()
      m = this.meth()
      t̄ = this.actual()
      (t', t̄') = this.type()
      while c ≠ Object {       // start method look-up
        ms = Δ.meths(c)        // methods in c
        if ms.contains(c' m(c̄)) {
          (t̄ ≤ c̄).entailedby(Δ)
          return (t', t̄').match(c', c̄)
        }
        c = Δ.superclass(c)
      }
      throw fail
    }

Figure 21: Definition of entailedby for $\mu(t, m, \bar{t}, (t', \bar{t}'))$

    Subs entailedby(Env Δ) throws undetermined, fail {   // Δ assumed to be well-formed
      c = this.class()
      t̄ = this.actual()
      t̄' = this.type()
      c̄ f̄ = Δ.allfields(c)    // including fields inherited by c
      (t̄ ≤ c̄).entailedby(Δ)
      return t̄'.match(c̄)
    }

Figure 22: Definition of entailedby for $\kappa(c, \bar{t}, \bar{t}')$
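As a complement to the pseudo-code figures, the three-way subtype check of Figure 15 can be written as a small runnable Java sketch. It is an illustration under stated assumptions rather than the authors' implementation: the environment is reduced to a map from each declared class to its direct superclass, exceptions are replaced by an `Outcome` value, only ground constraints are handled, and the names `SubtypeCheck`, `Outcome`, `entailedBy` and `secondTraversal` are hypothetical. As in the paper, the environment is assumed to be well-formed (in particular, free of inheritance cycles).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SubtypeCheck {
    enum Outcome { SATISFIED, FAIL, UNDETERMINED }

    /**
     * Decides the ground constraint c1 <= c2 in a possibly open environment.
     * superclassOf maps every class declared in the environment to its direct superclass;
     * a missing entry plays the role of the undetermined exception thrown by superclass.
     */
    static Outcome entailedBy(Map<String, String> superclassOf, String c1, String c2) {
        if (c1.equals(c2) || c2.equals("Object")) {
            return Outcome.SATISFIED;              // reflexivity and top type
        }
        Set<String> seen = new HashSet<>();        // c1 together with its known superclasses
        seen.add(c1);
        String c = c1;
        while (!c.equals("Object")) {
            c = superclassOf.get(c);
            if (c == null) {
                return secondTraversal(superclassOf, seen, c2);
            }
            seen.add(c);
            if (c.equals(c2)) {
                return Outcome.SATISFIED;
            }
        }
        return Outcome.FAIL;                       // chain complete up to Object, c2 not met
    }

    /** Walks up from c2: meeting a known superclass of c1 proves the constraint can never hold. */
    static Outcome secondTraversal(Map<String, String> superclassOf, Set<String> seen, String c2) {
        String c = c2;
        while (!c.equals("Object")) {
            c = superclassOf.get(c);
            if (c == null) {
                return Outcome.UNDETERMINED;
            }
            if (seen.contains(c)) {
                return Outcome.FAIL;
            }
        }
        return Outcome.UNDETERMINED;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("c", "Object");                            // environment (c, Object, ε, ε)
        System.out.println(entailedBy(env, "c1", "c2"));   // prints UNDETERMINED
        env.put("c1", "Object");                           // extended with (c1, Object, ε, ε)
        System.out.println(entailedBy(env, "c1", "c2"));   // prints FAIL
    }
}
```

The two calls in `main` replay the example from Section 5.2: the constraint between two unknown classes is first kept as undetermined, and is rejected once the declaration of the first class becomes available.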
