Subtyping and Overloading in a Functional Programming Language

Martin Plümicke and Herbert Klaeren
Wilhelm-Schickard-Institut, Universität Tübingen, Sand 13, D-72076 Tübingen
pluemick@informatik.uni-tuebingen.de

Abstract

Following ideas of [GM89], we present a first-order functional programming language with user-defined overloading and subtyping. A regularity restriction on overloading guarantees unique term evaluation. Furthermore, we define a type inference system for this language. The type inference problem is decidable and the type system has the principal type property.

Topics (in decreasing order of significance): 53, 36, 31.

Introduction

Well-known functional programming languages like HOPE [BMS80], ML [MTH90] or Miranda™ [Tur86] have the restriction that data type carrier sets must be pairwise disjoint and that overloading is prohibited (i.e. the same function symbol may not name more than one function). These restrictions lead to awkward constructions if we want to represent sets as subsets of other sets: essentially, we need several distinct representations for the elements of the subsets in the same program. Specifically in computer algebra, where we deal with mathematical objects, there is a natural demand for overloaded function symbols; e.g. `+' may denote an addition of integers, reals, polynomials etc.

This paper presents a functional programming language named SODA¹. Following ideas of [GM89], SODA allows the definition of carrier sets as unions of other carrier sets. This defines an inclusion ordering on the defined carrier sets; functions may be overloaded. SODA only allows first-order functions. Our main emphasis is on a type inference system for SODA.

¹ SODA is an abbreviation for Sort Order Data Algebra.

1 The Functional Programming Language SODA

SODA is a pure functional programming language. In a SODA module, the carrier sets of the sorts (types) are term sets. Functions are defined using recursion and pattern matching.
SODA allows the definition of an ordering on the types, leading to a subtype notion which semantically corresponds to set inclusion on the respective carrier sets. The type semantics we give will make sure that every function on a certain type is also an identical function on all subtypes. There is therefore an obvious correlation between overloading and subtyping. Actually, two functions can only be identical if their ranges are identical; therefore, every function on a carrier set with subsets automatically represents a set of different functions with the same function symbol.

1.1 Syntax of SODA

A SODA module starts with the definitions of the sorts. The carrier sets of the sorts are defined by a signature $\langle S, \Sigma \rangle$ where $S$ is a set and $\Sigma$ is a (not necessarily disjoint) $S^* \times S$-indexed family of constructors. The indices represent arity and target sort of the constructors. The union of two sets is denoted by a + symbol and serves as an indirect method for specifying the sort ordering.

Functions in SODA are defined by recursion in combination with pattern matching; their definition may or may not declare their type (rank). In this section we assume that functions are explicitly typed; programming without explicit typing is the topic of the next section. The following example shows a SODA module with subtypes and overloaded function symbols.

Example 1.1 The simple SODA module add (Fig. 1) defines the sorts ZERO, nz_CARDINAL and BOOLEAN by explicit specification of their constructors, and CARDINAL as the union of ZERO and nz_CARDINAL. The sorts are therefore ordered by:

$$\mathrm{ZERO} \sqsubset \mathrm{CARDINAL}, \qquad \mathrm{nz\_CARDINAL} \sqsubset \mathrm{CARDINAL}.$$

The function symbol + is overloaded: it is the name of the addition function on CARDINAL numbers and of the Boolean and function.

1.2 Semantics of a SODA module

The semantics of a SODA module is an order sorted algebra (OSA) in the sense of [GM89], whose main concepts we shall briefly review here:

Definition 1.2 A regular order sorted signature $\mathit{posig} := \langle (S, \sqsubseteq), \Sigma \rangle$ is a signature $\langle S, \Sigma \rangle$ such that:

1. $(S, \sqsubseteq)$ is a partially ordered set of sorts.
2. $f \in \Sigma_{(w', s)}$ implies $f \in \Sigma_{(w, s')}$ if $w \sqsubseteq w'$ and $s \sqsubseteq s'$.
3. For $f \in \Sigma_{(w', s')}$ and $w \sqsubseteq w'$ there is a least rank $(w_0, s_0)$ such that $f \in \Sigma_{(w_0, s_0)}$ and $w \sqsubseteq w_0$.

The third condition is called the regularity condition. In the following we use the abbreviation $f^{(w,s)}$ for $f \in \Sigma_{(w,s)}$.

    MODULE add =
      ZERO        = (zero)
      nz_CARDINAL = (succ(CARDINAL))
      CARDINAL    = ZERO + nz_CARDINAL
      BOOLEAN     = (TRUE, FALSE)

      FUN + (x: CARDINAL, y: CARDINAL): CARDINAL =
        CASE y OF
          succ(z) => +(succ(x), z)

      FUN + (x: BOOLEAN, y: BOOLEAN): BOOLEAN =
        CASE (x, y) OF
          (TRUE, TRUE)  => TRUE
          (TRUE, FALSE) => FALSE
          (FALSE, _)    => FALSE

    Figure 1: SODA-module add

Definition 1.3 (OS algebra, OSA) Let $\langle (S, \sqsubseteq), \Sigma \rangle$ be a regular order sorted signature. Then an $\langle (S, \sqsubseteq), \Sigma \rangle$-OS algebra is an $\langle S, \Sigma \rangle$-algebra $(A, \alpha)$ such that:

1. $s \sqsubseteq s'$ implies $A_s \subseteq A_{s'}$ ($A_s$ is called the carrier set of $s$), and
2. $f \in \Sigma_{(w,s)} \cap \Sigma_{(w',s')}$ and $w \sqsubseteq w'$ implies that $\alpha(f^{(w,s)}) : A_w \to A_s$ is identical to $\alpha(f^{(w',s')}) : A_{w'} \to A_{s'}$ on $A_w$ ($\alpha(f^{(w,s)})$ is called the interpretation of $f^{(w,s)}$).

We can build terms over an order sorted signature $\mathit{posig} = \langle (S, \sqsubseteq), \Sigma \rangle$ in the same way as over a many-sorted signature. The $S$-indexed family of term sets $(T_s)_{s \in S}$ forms the carrier sets of an OSA $(T_{\mathit{posig}}, \alpha_T)$ where the interpretations of the function symbols are specified as term construction. $(T_{\mathit{posig}}, \alpha_T)$ is called the term algebra over $\langle (S, \sqsubseteq), \Sigma \rangle$.

Lemma 1.4 [GM89] For every regular order sorted signature and every term $t \in T_{\mathit{posig}}$ there is a least sort $s$ such that $t \in T_s$.

Theorem 1.5 [GM89] Let $\langle (S, \sqsubseteq), \Sigma \rangle$ be a regular order sorted signature and $(A, \alpha)$ an $\langle (S, \sqsubseteq), \Sigma \rangle$-OSA. Then there is a unique homomorphism $h : (T_{\mathit{posig}}, \alpha_T) \to (A, \alpha)$.

Definition 1.6 Term evaluation $\mathrm{eval}_{(A,\alpha)}$ is a map from the term algebra to an OSA $(A, \alpha)$ defined through the unique homomorphism $h$.

Corollary 1.7 For every regular order sorted signature $\langle (S, \sqsubseteq), \Sigma \rangle$ and every term $t \in T_{\mathit{posig},s}$, $\mathrm{eval}_{(A,\alpha)}(t)$ is uniquely determined.
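The regularity condition of Definition 1.2 can be checked mechanically on a finite signature. The following Python sketch is only an illustration under assumptions made here: the encoding of sorts as strings, of ranks as `(argument sorts, target sort)` pairs, and the helper names `order_closure`, `below` and `least_rank` are hypothetical and not part of SODA. It looks up the least rank of the overloaded `+` from the module add for a given tuple of argument sorts, using only the two declared ranks.

```python
from itertools import product

def order_closure(sorts, declared):
    """Reflexive-transitive closure of the declared subsort pairs."""
    leq = {(s, s) for s in sorts} | set(declared)
    changed = True
    while changed:
        changed = False
        # sorted() copies the set, so adding to leq inside the loop is safe
        for (a, b), (c, d) in product(sorted(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq

def below(leq, xs, ys):
    """Pointwise comparison of two sort tuples of equal length."""
    return len(xs) == len(ys) and all((x, y) in leq for x, y in zip(xs, ys))

def least_rank(leq, ranks, args):
    """Least rank (w0, s0) with args <= w0, or None if there is none."""
    candidates = [(w, s) for (w, s) in ranks if below(leq, args, w)]
    for (w, s) in candidates:
        if all(below(leq, w, w2) and (s, s2) in leq for (w2, s2) in candidates):
            return (w, s)
    return None

# Sorts and sort order of the module add (Fig. 1)
SORTS = {"ZERO", "nz_CARDINAL", "CARDINAL", "BOOLEAN"}
LEQ = order_closure(SORTS, {("ZERO", "CARDINAL"), ("nz_CARDINAL", "CARDINAL")})

# Declared ranks of the overloaded symbol +
PLUS = {
    (("CARDINAL", "CARDINAL"), "CARDINAL"),
    (("BOOLEAN", "BOOLEAN"), "BOOLEAN"),
}

print(least_rank(LEQ, PLUS, ("ZERO", "nz_CARDINAL")))
print(least_rank(LEQ, PLUS, ("BOOLEAN", "BOOLEAN")))
print(least_rank(LEQ, PLUS, ("ZERO", "BOOLEAN")))
```

For argument sorts below a declared arity the search returns the unique applicable rank; for a mixed pair such as ZERO and BOOLEAN no rank applies at all, which is reported as `None`.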
The following example shows why the regularity condition is important.

    MODULE not_regular =
      ZERO        = (zero)
      nz_CARDINAL = (succ(CARDINAL))
      CARDINAL    = ZERO + nz_CARDINAL
      ONE         = (succ(ZERO))
      Z2          = ZERO + ONE

      FUN + (x: CARDINAL, y: CARDINAL): CARDINAL =
        CASE y OF
          succ(z) => +(succ(x), z)

      FUN + (x: Z2, y: Z2): Z2 =
        CASE (x, y) OF
          (zero, zero)             => zero
          (zero, succ(zero))       => succ(zero)
          (succ(zero), zero)       => succ(zero)
          (succ(zero), succ(zero)) => zero

    Figure 2: An example of a non-regular order sorted signature

Example 1.8 The SODA module not_regular in fig. 2 shows the difficulties with ambiguous term evaluation. The evaluation of the term t = +(succ(zero), succ(zero)) has an ambiguous result: if succ is considered as a constructor with target sort Z2, then t is evaluated to zero; if it is considered as a constructor with target sort CARDINAL, t is evaluated to succ(succ(zero)).

Semantics of a SODA module: The semantics of a SODA module is defined as an OSA $(A, \alpha)$ given by the constructor signature $\langle (S, \sqsubseteq), \Gamma \rangle$ contained in the sort declaration. Semantically, the constructor signature is completed to an order sorted signature $\langle (S, \sqsubseteq), \Gamma \rangle$: each function symbol obtains all arities that are smaller than the arities in the signature and all target sorts that are greater than the original target sort. This adding of arities and target sorts is called the fill-out construction². The fill-out construction overloads the function symbols automatically based on the type ordering and turns every signature into an order sorted signature. The carrier sets of a SODA module are defined by the term sets $(A_s)_{s \in S}$ over the order sorted signature $\langle (S, \sqsubseteq), \Gamma \rangle$.

² The fill-out construction is necessary because functional domains are contravariant.
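The fill-out construction and the failure of regularity in Example 1.8 can be reproduced on small finite signatures. The Python sketch below is an illustration under assumptions made here, not the authors' implementation: sorts are strings, a rank is a pair `(argument sorts, target sort)`, and the helper names `order_closure`, `fill_out` and `least_rank` are hypothetical. It fills out the constructor succ for the modules add and not_regular and then looks for a least rank of succ applied to an argument of sort ZERO.

```python
from itertools import product

def order_closure(sorts, declared):
    """Reflexive-transitive closure of the declared subsort pairs."""
    leq = {(s, s) for s in sorts} | set(declared)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(sorted(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq

def fill_out(sorts, leq, declared_ranks):
    """Add all smaller arities and all greater target sorts to each rank."""
    filled = set()
    for (w, s) in declared_ranks:
        smaller = [sorted(a for a in sorts if (a, b) in leq) for b in w]
        greater = [t for t in sorts if (s, t) in leq]
        for args in product(*smaller):
            for target in greater:
                filled.add((args, target))
    return filled

def least_rank(leq, ranks, args):
    """Least rank applicable to the argument sorts, or None."""
    def below(xs, ys):
        return len(xs) == len(ys) and all((x, y) in leq for x, y in zip(xs, ys))
    candidates = [(w, s) for (w, s) in ranks if below(args, w)]
    for (w, s) in candidates:
        if all(below(w, w2) and (s, s2) in leq for (w2, s2) in candidates):
            return (w, s)
    return None

# Module add (Fig. 1): succ is declared with rank CARDINAL -> nz_CARDINAL
SORTS_ADD = {"ZERO", "nz_CARDINAL", "CARDINAL", "BOOLEAN"}
LEQ_ADD = order_closure(SORTS_ADD, {("ZERO", "CARDINAL"), ("nz_CARDINAL", "CARDINAL")})
SUCC_ADD = fill_out(SORTS_ADD, LEQ_ADD, {(("CARDINAL",), "nz_CARDINAL")})

# Module not_regular (Fig. 2): succ additionally has rank ZERO -> ONE
SORTS_NR = {"ZERO", "nz_CARDINAL", "CARDINAL", "ONE", "Z2"}
LEQ_NR = order_closure(SORTS_NR, {("ZERO", "CARDINAL"), ("nz_CARDINAL", "CARDINAL"),
                                  ("ZERO", "Z2"), ("ONE", "Z2")})
SUCC_NR = fill_out(SORTS_NR, LEQ_NR, {(("CARDINAL",), "nz_CARDINAL"),
                                      (("ZERO",), "ONE")})

print(sorted(SUCC_ADD))
print(least_rank(LEQ_ADD, SUCC_ADD, ("ZERO",)))
print(least_rank(LEQ_NR, SUCC_NR, ("ZERO",)))
```

In the module add the fill-out of succ consists of six ranks and succ(zero) has the least rank ZERO → nz_CARDINAL. In not_regular the ranks ZERO → nz_CARDINAL and ZERO → ONE are incomparable, so no least rank exists; this is the ambiguity described in Example 1.8.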

The functions of the OSA are defined using recursion. The set of function declarations defines a family of function symbols $\Sigma'$. The signature $\langle (S, \sqsubseteq), \Sigma \rangle$ is the fill-out of $\langle (S, \sqsubseteq), \Sigma' \rangle$. Constructors are interpreted as term construction operations; the interpretation of the function symbols is defined by fixpoint semantics of recursive equations.

Definition 1.9 The semantic domains of a SODA module are:

1. $A_\bot$: the set of the lifted $\langle (S, \sqsubseteq), \Gamma \rangle$-terms,
2. $V = [\mathit{Var} \to A_\bot]$: variable instantiations, and
3. $F = [\Sigma \to ((A^*)_\bot \to A_\bot)]$: function environments.

Definition 1.10 The semantics of a SODA module Mod with explicitly typed function symbols is given as:

$$\mathrm{SODA}[\![\mathit{Mod}]\!] := (A_\bot, \alpha)$$

The interpretation $\alpha|_\Gamma$ of the constructors is defined through term construction and the interpretation $\alpha|_\Sigma$ of the function symbols through $\alpha|_\Sigma = \mathcal{F}[\![\mathit{Mod}]\!]$ defined in fig. 3.

2 SODA Type System

In this section we describe the types of SODA and their special properties. Then we give a type inference system for SODA with overloading and subtyping.

2.1 Types in SODA

Usually a term is typed by a sort and a function symbol by a function type $\tau_1 \times \dots \times \tau_n \to \tau$. The following example shows that types like that are not general enough in the presence of overloaded function symbols and subtyping.

Example 2.1 Consider the SODA module add in Figure 1 and add to this module the function definition

    FUN sum(x,y) = +(x,y)

It is not possible to give a complete type description for sum with types like those mentioned above, because sum has both type $\mathrm{CARDINAL} \times \mathrm{CARDINAL} \to \mathrm{CARDINAL}$ and type $\mathrm{BOOLEAN} \times \mathrm{BOOLEAN} \to \mathrm{BOOLEAN}$.

To express complete type descriptions for function symbols like sum in example 2.1, intersection types are introduced (denoted by the symbol $\wedge$). The intersection type of sum is $\mathrm{CARDINAL} \times \mathrm{CARDINAL} \to \mathrm{CARDINAL} \;\wedge\; \mathrm{BOOLEAN} \times \mathrm{BOOLEAN} \to \mathrm{BOOLEAN}$.

An additional kind of types are type variables. It is possible to instantiate a type variable by any sort or any type variable. Type variables are also introduced in the ML type system (Milner type system [DM82]).
Milner allows type variables to be instantiated by any type. The limitations in SODA are a consequence of the restriction to first-order functions.

Another extension is the concept of type schemes. A type scheme is a type universally quantified over a set of type variables. Type schemes are necessary for the decision whether a variable can be instantiated: only universally quantified variables can be instantiated.

Definition 2.2 (Types) Let $TV$ be a finite set of type variables.

1. The set of term types $\mathrm{TYPE}_T(S, TV)$ is the union of the set of sorts and the set of type variables.
2. The set of types $\mathrm{TYPE}(S, TV)$ is the smallest subset of $(\{\times, \to, \forall, ., \wedge\} \cup S \cup TV)^*$ with the properties:
   (a) $\mathrm{TYPE}_T(S, TV) \subseteq \mathrm{TYPE}(S, TV)$.
   (b) If $\tau_i, \tau \in \mathrm{TYPE}(S, TV)$ for $1 \le i \le n$, then $\tau_1 \times \dots \times \tau_n \to \tau \in \mathrm{TYPE}(S, TV)$ (function type).
   (c) If $\tau_1, \tau_2 \in \mathrm{TYPE}(S, TV)$, then $\tau_1 \wedge \tau_2 \in \mathrm{TYPE}(S, TV)$ (intersection type).
   (d) If $\tau \in \mathrm{TYPE}(S, TV)$ and $t \subseteq TV$, then $\forall t . \tau \in \mathrm{TYPE}(S, TV)$ (type scheme).

2.2 Type Instantiation

If the type scheme of a function symbol contains type variables, we substitute variables or sorts for the universally quantified variables; this will lead to type-correct terms.

Definition 2.3 (Generic Instance) If $\sigma = \forall t . \sigma'$ and $\tau = \forall t' . \tau'$ for $\sigma, \tau \in \mathrm{TYPE}(S, TV)$ and if

1. $\tau' \in \mathrm{subst}(t, \sigma')$ and
2. for all $\beta \in t'$: $\beta \notin \mathrm{free}(\sigma)$,

the type scheme $\tau$ is called a generic instance of $\sigma$ (notation: $\sigma \preceq \tau$).

The subst function generates the set of fill-outs of all possible substitutions of the variables in $t$ occurring in the type $\sigma'$. free determines the free variables in $\sigma$. The second condition is necessary to guarantee the transitivity of the instantiation relation.

2.3 SODA Type Inference System

A type inference system derives the types for terms and function symbols which are not explicitly typed. The assumptions for the type derivation are the type assignments of the constructors, the primaries, and the explicitly typed function symbols.
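The type language of Definition 2.2 and the instantiation of type schemes from Definition 2.3 can be made concrete with a small data encoding. The Python sketch below is only an illustration under assumptions made here: a function type is a pair `(argument term types, result term type)`, an intersection type is a set of function types, and the helper names `substitute` and `instances` are hypothetical. As a simplification, quantified variables are replaced by sorts only; substitution by other type variables and the fill-out performed by subst are left out.

```python
from itertools import product

def substitute(function_type, subst):
    """Apply a substitution {variable: sort} to a first-order function type."""
    args, result = function_type
    return (tuple(subst.get(a, a) for a in args), subst.get(result, result))

def instances(quantified, body, sorts):
    """All function types obtained by replacing each quantified variable by a sort.

    Simplification: variable-for-variable substitution and fill-out are omitted.
    """
    variables = sorted(quantified)
    result = set()
    for choice in product(sorted(sorts), repeat=len(variables)):
        result.add(substitute(body, dict(zip(variables, choice))))
    return result

SORTS = {"CARDINAL", "BOOLEAN"}

# Intersection type of sum from Example 2.1, encoded as a set of function types
SUM_TYPE = frozenset({
    (("CARDINAL", "CARDINAL"), "CARDINAL"),
    (("BOOLEAN", "BOOLEAN"), "BOOLEAN"),
})

# A hypothetical type scheme: forall a, b . a x b -> a
FST_INSTANCES = instances({"a", "b"}, (("a", "b"), "a"), SORTS)

# The variable c is not quantified here, so it is left untouched
PARTIAL = instances({"a"}, (("a", "c"), "a"), SORTS)

print(sorted(FST_INSTANCES))
print(sorted(PARTIAL))
```

Only quantified variables are replaced, which mirrors the remark that only universally quantified variables can be instantiated.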
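We write $(X, K) \vdash t : \tau$ if the type $\tau$ is derivable for the term $t$ from the two sets of type assumptions $X$ and $K$ with the rules specified in fig. 4. The IDENT axiom determines the instantiation. The APP and the LET rule are obvious. For the CASE rule we need two sets of assumptions: $X$ is a set of assumptions to type the expressions on the right side of the case expression, and $K$ types the patterns of the case expression. The elements of $K$ are constructors. $K$ contains only a part of the fill-out. If $K$ contained the whole fill-out of the defined constructors, every constructor would occur more than once as a pattern in the case branches. Therefore $K$ contains, for all constructors, only the defined arities and the target sorts which are greater than or equal to the defined one.

$$\mathcal{F}\left[\!\!\left[\begin{array}{l} F_1(x_{1,1} : s'_{1,1}, \dots, x_{1,n_1} : s'_{1,n_1}) = exp_1 : s_1 \\ \qquad\vdots \\ F_m(x_{m,1} : s'_{m,1}, \dots, x_{m,n_m} : s'_{m,n_m}) = exp_m : s_m \end{array}\right]\!\!\right] = \mathrm{fix}\ \mathrm{int}(\varphi)$$

$$\begin{array}{rcl} \mathrm{int}(\varphi) & = & \{\, F_i^{(s'_{i,1} \dots s'_{i,n_i},\, s_i)} \mapsto \mathrm{strict}(\lambda (y_1, \dots, y_{n_i}) . \mathcal{E}[\![exp_i]\!]\,[x_{i,j} \mapsto y_j]) \mid 1 \le i \le m \,\} \\ & \cup & \{\, F_i^{(s_{i,1} \dots s_{i,n_i},\, s'_i)} \mapsto \mathrm{strict}(\lambda (y_1, \dots, y_{n_i}) . \mathcal{E}[\![exp_i]\!]\,[x_{i,j} \mapsto y_j]) \mid s_{i,j} \sqsubseteq s'_{i,j},\ s_i \sqsubseteq s'_i \,\} \end{array}$$

$$\mathcal{E}[\![v]\!]\,\rho = \rho(v)$$

$$\mathcal{E}[\![F(t_1, \dots, t_n)]\!]\,\rho = \begin{cases} \mathrm{strict}(\alpha(F))(\mathcal{E}[\![t_1]\!]\,\rho, \dots, \mathcal{E}[\![t_n]\!]\,\rho), & \text{if } F \in \Gamma \\ \varphi(F)(\mathcal{E}[\![t_1]\!]\,\rho, \dots, \mathcal{E}[\![t_n]\!]\,\rho), & \text{otherwise} \end{cases}$$

$$\mathcal{E}[\![\mathrm{LET}\ v = exp'\ \mathrm{IN}\ exp]\!]\,\rho = \mathcal{E}[\![exp]\!]\,\rho[v \mapsto \mathcal{E}[\![exp']\!]\,\rho]$$

$$\mathcal{E}[\![\mathrm{CASE}\ e\ \mathrm{OF} \dots K_i(v_1, \dots, v_{k_i}) \Rightarrow t_i \dots]\!]\,\rho = \begin{cases} \mathcal{E}[\![t_1]\!]\,\rho[v_j \mapsto a_{1,j}], & \text{if } \mathcal{E}[\![e]\!]\,\rho = K_1(a_{1,1}, \dots, a_{1,k_1}) \\ \qquad\vdots \\ \mathcal{E}[\![t_n]\!]\,\rho[v_j \mapsto a_{n,j}], & \text{if } \mathcal{E}[\![e]\!]\,\rho = K_n(a_{n,1}, \dots, a_{n,k_n}) \end{cases}$$

Here $\varphi \in F$ is the function environment over which the fixpoint is taken and $\rho \in V$ is a variable instantiation (Definition 1.9); $\mathcal{E}$ is evaluated relative to $\varphi$. The function strict evaluates all arguments of a function call.

Figure 3: Denotational semantics of SODA.

$$[\mathrm{IDENT}] \quad (X \cup \{ f : \textstyle\bigwedge_{i \in I} \tau_i \}, K) \vdash f : \tau \qquad \exists j \in I \text{ with } \tau_j \preceq \tau$$

$$[\mathrm{APP}] \quad \frac{(X, K) \vdash f : \tau_1 \times \dots \times \tau_n \to \tau \qquad \forall\, 1 \le i \le n\ \big( (X, K) \vdash t_i : \tau_i \big)}{(X, K) \vdash f(t_1, \dots, t_n) : \tau}$$

$$[\mathrm{LET}] \quad \frac{(X, K) \vdash t : \tau \qquad (X \cup \{ x : \tau \}, K) \vdash t' : \tau'}{(X, K) \vdash \mathrm{LET}\ x = t\ \mathrm{IN}\ t' : \tau'}$$

[CASE] Let $n_s = |\{\, k : \tau' \to s \mid (K, \emptyset) \vdash k : \tau' \to s \,\}|$. For all $1 \le i \le n_s$: $(K, \emptyset) \vdash k_i : \tau_{i,1} \times \dots \times \tau_{i,m_i} \to s$, and for all $h \ne l$: $\tau_{h,1} \times \dots \times \tau_{h,m_h} \to s$ and $\tau_{l,1} \times \dots \times \tau_{l,m_l} \to s$ are pairwise disjoint types.

$$\frac{(X, K) \vdash t : s \qquad \forall\, 1 \le i \le n_s\ \big( (X \cup \{ v_{i,j} : \tau_{i,j} \mid 1 \le j \le m_i \}, K) \vdash t_i : \tau \big)}{(X, K) \vdash \mathrm{CASE}\ t\ \mathrm{OF} \dots k_i(v_{i,1}, \dots, v_{i,m_i}) \Rightarrow t_i \dots : \tau}$$

Figure 4: Type inference rules for terms

The two following type inference rules type function declarations. We write $(X, K, D) \vdash_I f : \tau$ if the type $\tau$ is derivable for a function symbol $f$ from $X$, $K$, and the set of function declarations $D$ with the following type rules:

Definition 2.4 (Type Inference Rules for function declarations) Let

$$D = \{\, \mathrm{FUN}\ F_1(x_{1,1}, \dots, x_{1,n_1}) = exp_1,\ \dots,\ \mathrm{FUN}\ F_m(x_{m,1}, \dots, x_{m,n_m}) = exp_m \,\}$$

be a set of function declarations. Let gen be the function which generalises the type variables:

$$\mathrm{gen}(X, \tau) = \forall\, \mathrm{free}(\tau) \setminus \mathrm{free}(X) \,.\, \tau$$

Under these assumptions, the type inference rules for functions are given by figure 5.

The FSYM rule derives types for all function symbols of a SODA module at once. For each function symbol $F_i$ of the declaration there is a set $L_i$ indexing the different function types of $F_i$. These types form the intersection type of $F_i$. The function gen generalises all type variables of the types except those of the $x_{i,j}$ whose actual type is a type variable. This is because, if $x_{i,j}$ were typed by a generalised type variable, we could derive every sort for the type of position $j$ of $F_i$ as soon as we could derive one sort for this position. There is no problem with mutual recursion because all function symbols are typed at once. FSYM does not derive the complete type description of the function symbols. To solve this problem, the INTERSEC rule is needed (compare example 2.1).

    MODULE ti_add =
      ZERO    = (zero)
      nz_CARD = (succ(CARD))
      CARD    = ZERO + nz_CARD

      FUN add(x, y) =
        CASE y OF
          succ(z) => add(succ(x), z)

    Figure 6: SODA-module ti_add

Example 2.5 The function declaration for add in the SODA module ti_add (fig. 6) is not explicitly typed. We now show the type inference proof for $(X, K, D) \vdash_I \mathit{add} : \mathrm{CARD} \times \mathrm{CARD} \to \mathrm{CARD}$, where the sets of assumptions are given as

$$\begin{array}{rcl} X & := & \{\, \mathit{zero} : \mathrm{ZERO} \wedge \mathrm{CARD}, \\ & & \ \ \mathit{succ} : \mathrm{CARD} \to \mathrm{nz\_CARD} \wedge \mathrm{ZERO} \to \mathrm{nz\_CARD} \wedge \mathrm{nz\_CARD} \to \mathrm{nz\_CARD} \\ & & \qquad\ \wedge\ \mathrm{CARD} \to \mathrm{CARD} \wedge \mathrm{ZERO} \to \mathrm{CARD} \wedge \mathrm{nz\_CARD} \to \mathrm{CARD} \,\} \\ K & := & \{\, \mathit{zero} : \mathrm{ZERO} \wedge \mathrm{CARD},\ \mathit{succ} : \mathrm{CARD} \to \mathrm{nz\_CARD} \wedge \mathrm{CARD} \to \mathrm{CARD} \,\}. \end{array}$$

For $B = \{ x : \mathrm{CARD},\ y : \mathrm{CARD} \}$ and $C = \{ \mathit{add} : \mathrm{CARD} \times \mathrm{CARD} \to \mathrm{CARD} \}$ we can derive

$$(X \cup B \cup C, K) \vdash \mathrm{CASE}\ y\ \mathrm{OF}\ \mathit{succ}(z) \Rightarrow \mathit{add}(\mathit{succ}(x), z) : \mathrm{CARD}$$

with the type inference rules for terms (fig. 4). By the FSYM rule we can derive:

$$[\mathrm{FSYM}] \quad \frac{(X \cup B \cup C, K) \vdash \mathrm{CASE}\ y\ \mathrm{OF} \dots : \mathrm{CARD}}{(X, K, D) \vdash_I \mathit{add} : \mathrm{CARD} \times \mathrm{CARD} \to \mathrm{CARD}}$$

2.4 The Semantics of a SODA Module without explicitly typed function symbols

To describe the semantics of a SODA module without explicitly typed function symbols we form an order sorted signature from the inferred types of the function symbols. Then we define an order sorted algebra of this signature. This algebra is then the semantics of the SODA module without explicit types. The signature of the constructors and the carrier sets of the sorts are defined as in section 1.2.

Definition 2.6 (Inferred function symbol signature) The function symbol signature $\langle (S, \sqsubseteq), \Sigma \rangle$ of a SODA module Mod is defined as:

$$\Sigma_{(s_1 \dots s_n,\, s)} := \{\, f \mid (X, K, D) \vdash_I f : \tau,\ \tau \preceq s_1 \times \dots \times s_n \to s \,\}$$

where the sets of assumptions $X$ and $K$ are defined as in section 2.3 and the set $D$ of function declarations is given through the SODA module Mod.

For the interpretation of the function symbols, the function $\mathcal{F}$ has to be replaced in figure 3.

Definition 2.7 (Semantics of a SODA module without explicit types) Let $\mathcal{F}$ be a function which maps a set of function declarations to an interpretation function of the corresponding function symbols, given through:

$$\mathcal{F}\left[\!\!\left[\begin{array}{l} F_1(x_{1,1}, \dots, x_{1,n_1}) = exp_1 \\ \qquad\vdots \\ F_m(x_{m,1}, \dots, x_{m,n_m}) = exp_m \end{array}\right]\!\!\right] := \mathrm{fix}\ \mathrm{int_{ti}}(\varphi)$$

with

$$\mathrm{int_{ti}}(\varphi) = \{\, F_i^{(s_{i,1} \dots s_{i,n_i},\, s_i)} \mapsto \mathrm{strict}(\lambda (y_1, \dots, y_{n_i}) . \mathcal{E}[\![exp_i]\!]\,[x_{i,j} \mapsto y_j]) \mid F_i \in \Sigma_{(s_{i,1} \dots s_{i,n_i},\, s_i)} \,\}.$$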

Then the semantics of the SODA module Mod is defined as:

$$\mathrm{SODA}[\![\mathit{Mod}]\!] := (A_\bot, \alpha).$$

As in the semantics definition of SODA modules with explicitly typed function symbols, the interpretation $\alpha|_\Gamma$ of the constructors is defined as term construction. The interpretation $\alpha|_\Sigma$ of the function symbols is defined through $\alpha|_\Sigma = \mathcal{F}[\![\mathit{Mod}]\!]$.

$$[\mathrm{FSYM}] \quad \frac{\forall\, 1 \le i \le m,\ \forall\, l \in L_i\ \Big( \big(X \cup \{ x_{i,j} : \tau_{i,j}^{l} \mid 1 \le j \le n_i \} \cup \{ F_k : \mathrm{gen}(\emptyset, \bigwedge_{r \in L_k} \tau_{k,1}^{r} \times \dots \times \tau_{k,n_k}^{r} \to \tau_k^{r}) \mid 1 \le k \le m \},\ K\big) \vdash exp_i : \tau_i^{l} \Big)}{(X, K, D) \vdash_I F_p : \mathrm{gen}(\emptyset, \tau_{p,1}^{s} \times \dots \times \tau_{p,n_p}^{s} \to \tau_p^{s})} \qquad 1 \le p \le m,\ s \in L_p$$

$$[\mathrm{INTERSEC}] \quad \frac{(X, K, D) \vdash_I F : \tau_1 \qquad (X, K, D) \vdash_I F : \tau_2}{(X, K, D) \vdash_I F : \tau_1 \wedge \tau_2}$$

Figure 5: Type inference rules for function declarations

2.5 Important Properties of the Type Inference System

In this section we show semantic correctness and the principal type property of the SODA type inference system. Furthermore, we will show that the principal type for a function symbol is recursive, i.e. computable.

Theorem 2.8 (Semantic Correctness of the SODA Type Inference System) Let $\mathrm{SODA}[\![\mathit{Mod}]\!] = (A_\bot, \alpha)$ be the semantics of a SODA module Mod. If $(X, K, D) \vdash_I F : \tau$ is derivable and $\tau \preceq s_1 \times \dots \times s_n \to s$, the interpretation $\alpha(F^{(s_1 \dots s_n,\, s)})$ is a function which maps $(A_{s_1} \times \dots \times A_{s_n})_\bot$ to $A_{s\bot}$.

Proof: If the variables $x_{i,1}, \dots, x_{i,n_i}$ in the declaration $F_i(x_{i,1}, \dots, x_{i,n_i}) = exp_i$ have types $s_{i,1}, \dots, s_{i,n_i}$, the term $\lambda (y_1, \dots, y_{n_i}) . \mathcal{E}[\![exp_i]\!]\,[x_{i,j} \mapsto y_j]$ from definition 2.7 is a function which maps $(A_{s_1} \times \dots \times A_{s_n})_\bot$ to $A_{s\bot}$. As the combinator fix does not change the domains of the functions, $F_i$ is a function from $(A_{s_1} \times \dots \times A_{s_n})_\bot$ to $A_{s\bot}$.

Now we explain the special meaning of principal type in the SODA type system. First we define a principal type ordering ($\preceq$) on $\mathrm{TYPE}(S, TV)$. The type $\tau_1$ is smaller than $\tau_2$ (more principal) w.r.t. a function symbol $F$ if $F$ is typed by $\tau_1$ and $(\{ F : \tau_1 \}, \emptyset) \vdash F : \tau_2$ is derivable with the IDENT rule (fig. 4). Let TFun be the following function:

$$\mathrm{TFun}(F : \tau) := \{\, F^{(s_1 \dots s_n,\, s)} \mid \tau \preceq s_1 \times \dots \times s_n \to s \,\}.$$

Definition 2.9 $\tau$ is a principal type of a function symbol $F$ if $\tau$ is a type of $F$ and for all smaller types $\tau'$ of $F$ (i.e. $\tau' \preceq \tau$),

$$\mathrm{TFun}(F : \tau) = \mathrm{TFun}(F : \tau')$$

holds.

Theorem 2.10 The SODA type system has the principal type property. This means that for each function symbol $F$ of a SODA module there is a smallest type $\tau$ w.r.t. $\preceq$ with

$$\mathrm{TFun}(F : \tau) = \mathrm{TFun}(F : \tau')$$

for all smaller types $\tau'$ of $F$.

Proof: This follows from the fact that the subset of types which contains only the non-intersection types of a SODA module is finite. Hence, a type of a function symbol which is a pairwise disjoint intersection is a finite intersection. If $F : \bigwedge_{i \in I} \tau_i$ and $F : \bigwedge_{i \in I} \tau_i \wedge \tau_j$ for $j \in I$ are derivable, then $\mathrm{TFun}(F : \bigwedge_{i \in I} \tau_i) = \mathrm{TFun}(F : \bigwedge_{i \in I} \tau_i \wedge \tau_j)$ holds. Hence, the SODA type system has the principal type property.

Theorem 2.11 There exists an algorithm which determines a principal type for the function symbols of a SODA module and exits with fail if it cannot derive a type for each function symbol.

Proof: As the subset of types which contains only the non-intersection types of each SODA module is finite, the simplest algorithm checks all possible combinations of non-intersection types for each function symbol. For all correct combinations, each of the types is added to the intersection types of the function symbols.

2.6 Extended example

The following example specifies binary numbers as lists of binary digits without leading zeroes, and complex numbers with non-negative integer components. This is not really meant as a useful specification but only serves to show overloading of constructors and function symbols: + and succ are both constructors and function symbols.
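As an aside before the module is listed, the exhaustive search used in the proof of Theorem 2.11 can be sketched for the non-recursive function sum of Example 2.1. The Python code below is only an illustration under assumptions made here: terms are encoded as strings (variables) or pairs `(function symbol, argument list)`, only the IDENT and APP rules are modelled, and the names `derivable_sorts` and `brute_force_types` are hypothetical. Recursive declarations would additionally need the simultaneous assumptions of the FSYM rule, which this sketch does not cover.

```python
from itertools import product

def derivable_sorts(term, var_sorts, signature):
    """Sorts derivable for a term using only the IDENT and APP rules."""
    if isinstance(term, str):
        # a variable has exactly the sort assumed for it
        return {var_sorts[term]}
    fname, args = term
    arg_sorts = [derivable_sorts(a, var_sorts, signature) for a in args]
    return {
        target
        for (w, target) in signature[fname]
        if len(w) == len(args) and all(wi in si for wi, si in zip(w, arg_sorts))
    }

def brute_force_types(params, body, sorts, signature):
    """Try every combination of parameter sorts and collect the ranks that type-check."""
    found = set()
    for choice in product(sorted(sorts), repeat=len(params)):
        var_sorts = dict(zip(params, choice))
        for target in derivable_sorts(body, var_sorts, signature):
            found.add((choice, target))
    return found

SORTS = {"ZERO", "nz_CARDINAL", "CARDINAL", "BOOLEAN"}
CARD_SORTS = ["ZERO", "nz_CARDINAL", "CARDINAL"]

# Fill-out of + in the module add, written out by hand:
# all smaller arities of CARDINAL x CARDINAL -> CARDINAL, plus the Boolean rank
SIGNATURE = {
    "+": {((a, b), "CARDINAL") for a in CARD_SORTS for b in CARD_SORTS}
         | {(("BOOLEAN", "BOOLEAN"), "BOOLEAN")}
}

# FUN sum(x, y) = +(x, y)
SUM_TYPES = brute_force_types(("x", "y"), ("+", ["x", "y"]), SORTS, SIGNATURE)

for rank in sorted(SUM_TYPES):
    print(rank)
```

Every rank found becomes one member of the intersection type of sum; mixed argument sorts such as CARDINAL and BOOLEAN are rejected because no rank of + applies. The module for the extended example is listed next.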

    MODULE overloaded_add =
      ZERO        = (zero)
      nz_CARDINAL = (succ(CARDINAL))
      CARDINAL    = ZERO + nz_CARDINAL
      nz_CARDIM   = (i(nz_CARDINAL))
      CARDIM      = ZERO + nz_CARDIM
      nz_CARDCOMP = (+(nz_CARDINAL, nz_CARDIM))
      CARDCOMP    = ZERO + CARDCOMP
      BZERO       = (bzero)
      BONE        = (bone)
      BZERO_BONE  = BZERO + BONE
      BONE_BIN    = BONE + nzo_BINARY
      nzo_BINARY  = (bin(BONE_BIN, BZERO_BONE))
      BINARY      = BZERO_BONE + nzo_BINARY

      FUN + (x, y) =
        CASE x OF
          zero    => y
          succ(z) => +(z, succ(y))
          i(z)    => CASE y OF
                       zero    => i(z)
                       succ(a) => +(x, y)
                       i(a)    => i(+(z, a))
                       +(a, b) => +(a, +(x, b))
          +(u, v) => CASE y OF
                       zero    => +(u, v)
                       succ(a) => +(+(u, y), v)
                       i(a)    => +(u, +(v, y))
                       +(a, b) => +(+(u, a), +(v, b))

      FUN succ (x) =
        CASE x OF
          bzero     => bone
          bone      => bin(bone, bzero)
          bin(a, b) => CASE b OF
                         bzero => bin(a, bone)
                         bone  => bin(succ(a), bzero)

      FUN pred bone = bzero
      FUN pred bin(a, b) =
        CASE b OF
          bone  => bin(a, bzero)
          bzero => CASE a OF
                     bone      => bone
                     bin(u, v) => bin(pred(a), bone)

      FUN + (x, y) =
        CASE x OF
          bzero     => y
          bone      => succ(y)
          bin(a, b) => +(pred(x), succ(y))

The type inference system determines the following types for the function symbols:

$$\begin{array}{rcl} + & : & \mathrm{nz\_CARDINAL} \times \mathrm{nz\_CARDIM} \to \mathrm{nz\_CARDCOMP} \\ & \wedge & \mathrm{CARDCOMP} \times \mathrm{CARDCOMP} \to \mathrm{CARDCOMP} \\ & \wedge & \mathrm{BINARY} \times \mathrm{BINARY} \to \mathrm{BINARY} \\ \mathit{succ} & : & \mathrm{CARDINAL} \to \mathrm{nz\_CARDINAL} \ \wedge\ \mathrm{BINARY} \to \mathrm{BONE\_BIN} \\ \mathit{pred} & : & \mathrm{BONE} \to \mathrm{BZERO} \ \wedge\ \mathrm{nzo\_BINARY} \to \mathrm{BONE\_BIN} \end{array}$$

3 Further Work

SODA allows only first-order functions. This is a severe restriction. Leivant proved in [Lei83] that the type inference problem for the λ-calculus with intersection types leads to the halting problem. Therefore the type inference problem of a programming language with higher-order functions is in general undecidable. Hence, a language with overloading and type inference must have some restrictions. But the class of functions for which the type inference problem is solvable can possibly be enlarged.

The fill-out of a signature can be very large; on the other hand, we may not be able to test the regularity condition if we compute only a part of the fill-out.
It is still an open problem to find an algorithm which computes only the necessary part of the fill-out. However, there are functions for which we can determine a subset of the carrier set where the function is not defined. In further investigation we want to find an algorithm which determines this subset for certain functions.

We thank Peter Thiemann for many helpful discussions and Beatrice Amrhein for reading an early version of this paper.

References

[BMS80] Rod M. Burstall, Dave B. MacQueen, and Don T. Sannella. Hope: an experimental applicative language. Technical Report CSR-62-80, University of Edinburgh, Dept. of Computer Science, 1980.

[DM82] Luis Damas and Robin Milner. Principal type-schemes for functional programs. Proc. 9th Symposium on Principles of Programming Languages, 1982.

[GM89] J. A. Goguen and J. Meseguer. Order-sorted algebras I: Equational deduction for multiple inheritance, overloading, exceptions and partial operations, July 1989.

[Lei83] Daniel Leivant. Polymorphic type inference. Proc. 10th Symposium on Principles of Programming Languages, 1983.

[MTH90] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, Cambridge, Mass., 1990.

[Tur86] D. A. Turner. Miranda: A non-strict functional language with polymorphic types. In Proceedings Functional Programming Languages and Computer Architecture, Nancy, volume 201 of Lecture Notes in Computer Science, pages 1-16. Springer-Verlag, 1986.