IA010: Principles of Programming Languages

IA010 3. Types 1 IA010: Principles of Programming Languages 3. Types Jan Obdržálek obdrzalek@fi.muni.cz Faculty of Informatics, Masaryk University, Brno

Data types
A data type is a collection of data values and a set of predefined operations on these values.
Why use types
- error detection: improves reliability, e.g. "IQ" + 160
- implicit context for many operations: improves writability, e.g. a + b, new p
- code documentation: improves readability
A type system consists of
1. a mechanism to define types and associate them with certain language constructs
2. a set of rules for type equivalence, type compatibility and type inference

Outline
- Primitive data types
- Type checking
- Composite data types
  - Array types
  - Record types
  - Union types
  - List types
- Pointer and reference types
- Type inference

Basic type taxonomy
Primitive types
- boolean type
- numeric types
- character type
- character string types
- enumeration types, subrange types
Composite types
- record types
- union types
- array types
- list types
- set types
Pointer and reference types
What about primitive and composite data types?

Primitive data types

Primitive and composite types
primitive data types: two meanings, which may coincide
1. types with support built into the programming language (also called built-in types)
2. building blocks for composite types (also called basic types)
composite data types: created by applying a type constructor (record, array, set, ...) to one or more simpler types (either primitive or composite)
The distinction is not always clear and may depend on the language.

Numeric types
- historically the oldest, typically reflect hardware
- integers and floats, complex numbers
- the range can be implementation-dependent (problem: portability)
Integer types
- different lengths, e.g. C99: signed char, short, int, long, long long (using at least 1, 2, 2, 4 and 8 bytes, respectively)
- may be signed or unsigned (typical implementation: two's complement)
- arbitrary precision: string representation, e.g. "12354654231654L" (Python long integer)
  - typical for scripting languages
  - performance penalty

Numeric types II
floating-point types
- model real numbers
- single (float/real) or double (double) precision
- standard: IEEE 754
- cannot precisely express all real numbers:
  1. irrational numbers: e, π, ...
  2. even numbers that are short in decimal: 0.1 (base 10) = 0.0001100110011... (base 2)
decimal types
- fixed number of decimal digits
- each digit takes the same number of bits (usually 4 or 8)
- use: business applications, precise representation of decimal numbers (0.1)
- especially useful if available in hardware (BCD)
- examples: COBOL, C#, F#
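The difference between binary floating-point and decimal types can be tried out directly. The following is a minimal sketch in Python, using the standard `decimal` and `fractions` modules as a stand-in for a language-level decimal type; the variable names are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point cannot store 0.1 exactly, so rounding errors surface.
float_sum = 0.1 + 0.2
float_equal = (float_sum == 0.3)            # False with IEEE 754 doubles

# A decimal type stores decimal digits, so 0.1 is represented exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_equal = (decimal_sum == Decimal("0.3"))

# The exact rational value that is actually stored for the float literal 0.1.
stored = Fraction(0.1)
stored_is_one_tenth = (stored == Fraction(1, 10))

print(float_equal, decimal_equal, stored_is_one_tenth)
```

The price of the exact decimal representation is speed, unless the hardware supports decimal arithmetic.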

Non-numeric primitive types
boolean type
- just two values (true, false)
- missing in C89, where an arbitrary numeric type can be used (zero/non-zero)
- usually implemented using more than a single bit
character type
- stores a single character
- size of the representation depends on the encoding used (ASCII; in modern languages Unicode, UTF-xx) and may vary between characters (e.g. in UTF-8)
- sometimes missing from the language (Python: strings of length 1)
- may even be handled like a numeric type (C)

Character string types
string: a sequence of characters
strings according to length
- static length: Python, Java (String), C#
- limited dynamic length: C (upper bound on the length)
- dynamic length: JavaScript, Perl, standard C++ library
strings according to implementation
- special kind of an array of characters: C, C++ (terminated by the null character \0)
- primitive data type: Python
- class: Java, F#
supported operations
- concatenation, comparison, substring selection
- pattern matching: Perl, JavaScript, Ruby, PHP

Ordinal (discrete) types
An ordinal type is a type whose values can be mapped to a range of integers.
1. primitive ordinal types: provided by the language, e.g. Java: integer, char, boolean
2. user-defined ordinal types
   - enumeration types
   - subranges

User-defined ordinal types
Enumeration types
- the values, called enumeration constants, are enumerated in the definition
    enum days {Mon, Tue, Wed, Thu, Fri, Sat, Sun};
- typical implementation: implicit numerical value
- the value can often be given explicitly (Fri=2)
- advantage over named constants: type checking!
Important aspects
- can one enumeration constant name be used in multiple types?
- are enumeration constants coerced to integers? (C: yes; Java 5.0, C#, F#: no)

User-defined ordinal types
Subrange types
- contiguous subsequence of an ordinal type
- Pascal, Ada
Example (Ada)
    type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
    subtype Weekdays is Days range Mon..Fri;
    subtype Index is Integer range 1..100;
- operations of the parent type are preserved (as long as the result stays in range)
- require run-time type checking
- advantages: readability, range checks
- can be simulated by asserts
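In a language without subrange types, the run-time range check has to be written by hand. A minimal sketch in Python follows; the class `Subrange` is a hypothetical helper introduced only for this illustration, not a library type.

```python
class Subrange:
    """Hypothetical helper: an integer restricted to low..high, checked at run time."""

    def __init__(self, low, high, value):
        self.low, self.high = low, high
        self.value = self._check(value)

    def _check(self, value):
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} not in {self.low}..{self.high}")
        return value

    def __add__(self, other):
        # The parent type's operation is reused; the result is range-checked again.
        return Subrange(self.low, self.high, self.value + int(other))

    def __int__(self):
        return self.value


index = Subrange(1, 100, 99)     # roughly: Index range 1..100
index = index + 1                # still in range: 100
try:
    index + 1                    # 101 is outside 1..100
    overflowed = False
except ValueError:
    overflowed = True
```

Every operation pays for the check, which is the cost that built-in subrange types also incur at run time.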

Type checking

Type checking
topics
- type equivalence
- type compatibility
- type conversion (cast)
- type coercion
- nonconverting type cast
- type inference (later)

Type checking
Ensuring that the operands of an operation are of compatible types.
A language can be
- strongly typed: an operation cannot be applied to any object which does not support the operation
- statically typed: checking can be performed at compile time
- dynamically typed: checking is performed at run time (a form of late binding; languages with dynamic scoping)
Examples
- Ada, Java, C#: strongly typed (except for the explicit cast)
- Pascal: almost strongly statically typed (except for the untagged variant records)
- C89: weak typing (unions, pointers, arrays, ...)
- Scheme, Lisp, ML, F#: strongly typed
- Python, Ruby: strongly dynamically typed
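What "strongly dynamically typed" means in practice can be seen in a few lines of Python. This is a small sketch; the function name `add_iq` is made up for the illustration.

```python
def add_iq(label, points):
    # Nothing is checked when the function is defined;
    # the operand types are checked only when + is executed.
    return label + points


try:
    add_iq("IQ", 160)            # str + int is not a supported operation
    outcome = "no error"
except TypeError:
    outcome = "TypeError at run time"

# An explicit conversion makes the operand types compatible.
concatenated = add_iq("IQ", str(160))
```

The error is detected (strong typing), but only once the offending call is actually executed (dynamic typing).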

Type equivalence
When are two types equivalent? Nontrivial in a language which allows defining new types (records, arrays, ...).
Two variables have the same type if
- they were defined in the same declaration, or in a declaration using the same type name: name equivalence (Pascal, Ada, Java, C#)
- their types are identical as structures: structural equivalence (Algol, Modula, C, ML)
- combination of both approaches, e.g. C: name equivalence for struct, union, enum; structural equivalence otherwise

Structural equivalence issues
Are the following types the same?
    type T1 = record
        a,b : integer
    end;
    type T2 = record
        a : integer;
        b : integer;
    end;
    type T3 = record
        b : integer;
        a : integer;
    end;
    type s = array [1..10] of char;
    type t = array [0..9] of char;
- T1 and T2: yes
- T2 and T3: no (most languages), yes (ML)
- s and t: no (most languages), yes (Fortran, Ada)

Deciding type equivalence
Structural equivalence
- type names are (recursively) replaced by their definitions
- the resulting strings are simply compared
- obstacles (surmountable): recursive types, pointers
Name equivalence
- straightforward name comparison
- assumption: if the programmer gave two definitions of the same type, they probably had a particular use in mind
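The two decision procedures can be sketched in Python, applied to the record and array declarations T1, T2, T3, s and t from the structural equivalence slide. The tuple encoding of types and the table `TYPES` are assumptions made for this sketch; it treats field order and array bounds as significant (the "most languages" answer) and does not handle recursive types.

```python
# Hypothetical type table: each type name maps to its definition.
TYPES = {
    "T1": ("record", (("a", "integer"), ("b", "integer"))),
    "T2": ("record", (("a", "integer"), ("b", "integer"))),
    "T3": ("record", (("b", "integer"), ("a", "integer"))),
    "s": ("array", 1, 10, "char"),
    "t": ("array", 0, 9, "char"),
}


def expand(t):
    """Recursively replace type names by their definitions."""
    if isinstance(t, str):
        # Names without a definition (integer, char) are primitive and stay as they are.
        return expand(TYPES[t]) if t in TYPES else t
    if t[0] == "record":
        return ("record", tuple((name, expand(ft)) for name, ft in t[1]))
    if t[0] == "array":
        return ("array", t[1], t[2], expand(t[3]))
    raise ValueError(f"unknown type constructor: {t[0]}")


def structurally_equivalent(a, b):
    # Compare the fully expanded structures.
    return expand(a) == expand(b)


def name_equivalent(a, b):
    # Straightforward name comparison.
    return a == b


print(structurally_equivalent("T1", "T2"), name_equivalent("T1", "T2"))
```

A language that answers "yes" for T2 and T3, or for s and t, would normalise the field order or the index bounds inside `expand` before comparing.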

Name equivalence
    TYPE new_type = old_type (* Modula-2 *)
old_type and new_type are aliases: two types, or two names for the same type?
- two types: strict name equivalence
- same type: loose name equivalence (Pascal)
Example: strict name equivalence
    TYPE imperial_distance = REAL;
         metric_distance = REAL;
    VAR i : imperial_distance;
        m : metric_distance;
    ...
    m := i; (* this should probably be an error *)

Name equivalence in Ada
a restrictive version of name equivalence
subtype: a type equivalent to the parent type
    subtype new_int is integer;
    subtype small_int is integer range 1..100;
derived type: a new type
    type imperial_distance is new float;
    type metric_distance is new float;
note the difference:
    type derived_small_int is new integer range 1..100;
    subtype subrange_small_int is integer range 1..100;

Type conversion (cast)
Change of types, explicitly stated in the program code.
Implementation: three principal cases
1. structurally equivalent types (same internal representation): no code is executed, the conversion is free
2. different types, same representation (e.g. subtypes): run-time check, the value can be used if the check succeeds
3. different types with related values (e.g. int vs float): the specified conversion is performed
Nonconverting type cast
- no conversion is performed, the stored value is only interpreted as a value of the new type
- uses: systems programming, significand/exponent extraction, ...

Type conversion examples
    type test_score is new integer range 0..100;
    type celsius_temp is new integer;
    ...
    n : integer;       -- assume 32 bits
    r : real;          -- assume IEEE double-precision
    t : test_score;
    c : celsius_temp;
    ...
    t := test_score(n);    -- run-time semantic check required
    n := integer(t);       -- no check req.; every test_score is an int
    r := real(n);          -- requires run-time conversion
    n := integer(r);       -- requires run-time conversion and check
    n := integer(c);       -- no run-time code required
    c := celsius_temp(n);  -- no run-time code required

Type compatibility
- of prime importance to the programmer
- full type equivalence is not always needed; we often need only compatible types:
  - addition: two operands of numeric types
  - assignment: target type compatible with the source type
  - subroutine call: formal parameters compatible with arguments
- the definition of compatibility differs significantly among languages
Coercion
- implicit type conversion, for compatible types
- implementation similar to a type cast (explicit type conversion)

Coercion
    short int s;
    unsigned long int l;
    char c;      /* may be signed or unsigned */
    float f;     /* usually IEEE single-precision */
    double d;    /* usually IEEE double-precision */
    ...
    s = l;       /* low bits are interpreted as a signed number */
    l = s;       /* sign-extended, then interpreted as unsigned */
    s = c;       /* either sign-extended or zero-extended */
    f = l;       /* precision may be lost */
    d = f;       /* no precision lost */
    f = d;       /* precision may be lost, undefined possible */
Coercion causes significant weakening of the type system!
- trends: less (or no) coercion, but...
- improves writability, supports abstraction
- today: scripting languages, C++
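The effect of the first two C assignments can be simulated with plain integer arithmetic. The sketch below is in Python and assumes a 16-bit short and a 32-bit unsigned long, which is only one possible implementation; the helper names `to_int16` and `to_uint32` are invented for the illustration.

```python
def to_int16(value):
    """Keep the low 16 bits and reinterpret them as a two's complement number."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


def to_uint32(value):
    """Sign-extend conceptually, then reinterpret the bits as unsigned 32-bit."""
    return value & 0xFFFFFFFF


s_from_l = to_int16(40000)    # s = l;  40000 does not fit into 16 signed bits
l_from_s = to_uint32(-1)      # l = s;  -1 becomes the largest unsigned value

print(s_from_l, l_from_s)
```

Neither assignment produces a diagnostic in C, which is the sense in which coercion weakens the type system.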

Composite data types

Array types
- the most used and most important composite data type
- homogeneous aggregate of data elements
- elements are identified by their relative position
- semantically finite mappings: array_name(index) → element
design decisions:
- which types can be used for indexing?
- are the bounds checked on access?
- when are the bounds fixed?
- when does array allocation take place?
- rectangular or ragged multidimensional arrays?
- initialization when allocated?
- what kind of slices are allowed, if any?

Array indexing
Which types can be used for indexing?
- integer types (Fortran, C, ...)
- any ordinal type (Ada)
- user-defined keys (associative arrays, e.g. Python)
Bounds checking
- expensive operation, historically usually omitted (C)
- however common in modern languages (Java, C#, ML)
Lower bounds
- fixed: C (0) and its successors
- user-defined: Ada, Fortran 95+ (1 by default)

Multidimensional arrays
Language interface
- pure multidimensional array (access: [2,3])
    mat: array (1..10,1..10) of real; -- Ada
- array of arrays (access: [2][3])
    VAR mat : ARRAY [1..10] OF ARRAY [1..10] OF REAL;
Array shape
- rectangular array: all rows of the same length
- jagged array: length may differ between the rows (C, Java); typical for the array of arrays approach

Array slices [Scott]
- a slice is some substructure of an array
- trivially a single row/column
- many other options (Fortran 90)
[Figure: array slices in Fortran 90, from Scott]

Array bounds and storage bindings
    type                 bounds       allocation   storage
    static               static       static
    fixed stack-dynamic  static       elaboration  stack
    stack-dynamic        elaboration  elaboration  stack
    fixed heap-dynamic   execution    execution    heap
    heap-dynamic         dynamic      dynamic      heap
- elaboration: when the declaration is elaborated
- execution: when the program actually requests the array
- dynamic: execution + can change during run time
C-family languages:
- C89: fixed stack-dynamic, static (static), fixed heap-dynamic (malloc/free)
- Java: fixed heap-dynamic
- C#: fixed heap-dynamic, heap-dynamic (List class)

Array type examples
fixed stack-dynamic array (C89)
    void foo() {
        int fixed_stack_dynamic_array[7];
        /* ... */
    }
stack-dynamic array (C99)
    void foo(int n) {
        int stack_dynamic_array[n];
        /* ... */
    }
fixed heap-dynamic array (C89)
    int * fixed_heap_dynamic_array = malloc(7 * sizeof(int));

Array implementation: storage
Two basic choices:
1. a block of adjacent memory cells
   - advantage: simple addressing
   - mapping to one dimension:
     - by rows (row major): almost all languages
     - by columns (column major): Fortran
2. array of arrays
   - cons: may need more space
   - pros: jagged arrays, rows can be shared
Some languages support both approaches, e.g. C.
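The mapping of a two-dimensional index to a position in the contiguous block is a one-line formula in either order. A small sketch in Python, assuming 0-based indices and offsets counted in elements rather than bytes; the function names are illustrative.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Rows are stored one after another: skip i full rows, then j elements."""
    return i * n_cols + j


def column_major_offset(i, j, n_rows, n_cols):
    """Columns are stored one after another: skip j full columns, then i elements."""
    return j * n_rows + i


# Element [1][2] of a 3 x 4 array.
rm = row_major_offset(1, 2, 3, 4)
cm = column_major_offset(1, 2, 3, 4)
print(rm, cm)
```

The byte address is then the base address plus the offset multiplied by the element size. The order matters for performance: traversing the array against its storage order touches memory with a large stride.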

Arrays in C
[Figure: arrays in C, from Scott]

Record types
- heterogeneous (unlike arrays)
- model collections of related data
- correspond to Cartesian products
    struct rpg_character {
        char name[20];
        int strength, stamina, dexterity, inteligence;
        _Bool male;
    };
    ...
- individual elements are often called fields
- access usually using the dot notation: frodo.strength
- nesting usually allowed
- C: struct; C++: special version of class; Java: ordinary classes used instead

Record type implementation
- usually consecutive memory cells
- may contain holes (to align with word length)
    struct element {
        char name[2];
        int atomic_number;
        double atomic_weight;
        _Bool metallic;
    };
Likely layout on a 32-bit machine: [figure from Scott]
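Where the holes in such a record come from can be computed by hand. The sketch below, in Python, lays out the fields of struct element in declaration order; the sizes and the 4-byte alignment of int and double are assumptions for a 32-bit machine, and the function `layout` is a hypothetical helper, not how any particular compiler works.

```python
def layout(fields):
    """fields: list of (name, size, alignment); returns (offsets, total_size)."""
    offsets, offset, max_align = {}, 0, 1
    for name, size, align in fields:
        # Round the offset up to the field's alignment; this may leave a hole.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # Trailing padding so that consecutive records in an array stay aligned.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Assumed (size, alignment) pairs for a 32-bit machine.
element = [
    ("name", 2, 1),
    ("atomic_number", 4, 4),
    ("atomic_weight", 8, 4),
    ("metallic", 1, 1),
]
offsets, size = layout(element)
packed_size = sum(field_size for _, field_size, _ in element)
print(offsets, size, packed_size)
```

Under these assumptions the record occupies 20 bytes although its fields need only 15: two bytes are lost after name and three after metallic.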

Record packing and reordering
record packing
- usually explicitly requested by the programmer (Pascal)
- space/speed trade-off (breaks alignment)
record reordering
- to make records both space- and speed-efficient
- may be problematic (systems programming, FFI); solution: nonstandard alignment can be specified (Ada, C++)

Tuple types
- similar to record types, elements are not named
- use: functions returning multiple values
Python
- immutable type, can be converted to an array and back
- elements accessed using array syntax
- of any number of elements (even 0)
    mytuple = (42, 2.7, 'mtb')
    mytuple[1]
F#
- pairs can be addressed using fst and snd
- otherwise using tuple patterns
    let tup = (42, 50, 1729);
    let a, b, c = tup;

Union (variant) types
Allow a variable to store values of different types at different times during program execution. (Correspond to set unions.)
    union flextype {
        int intel;
        float floatel;
    };
- storage is allocated for the largest variant
- uses:
  - systems programming (non-converting type cast)
  - representing alternatives in a record
- problem: free unions are not type checked:
    union flextype el1;
    float x;
    ...
    el1.intel = 27;
    x = el1.floatel;
Unions are often missing in modern languages (e.g. Java, C#).
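What the unchecked read x = el1.floatel actually yields can be reproduced with Python's standard `struct` module. This is a sketch assuming a 32-bit int and an IEEE single-precision float in little-endian byte order; the function name is illustrative.

```python
import struct


def int_bits_as_float(value):
    """Store a 32-bit int, then read the same four bytes back as a float."""
    raw = struct.pack("<i", value)        # like el1.intel = value
    return struct.unpack("<f", raw)[0]    # like x = el1.floatel


x = int_bits_as_float(27)
print(x)   # a tiny denormalized number, not 27.0
```

No conversion takes place: the bit pattern of the integer 27 is simply interpreted as a float, which is exactly the non-converting type cast used in systems programming.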

Discriminated (tagged) unions
- each union variable keeps a tag (discriminant) recording which variant is currently in use
- support type checking
- common in functional languages (ML, Haskell, F#)
    type intreal = // F#
        | IntValue of int
        | RealValue of float;
    let printtype value =
        match value with
        | IntValue value -> printfn "It is an integer"
        | RealValue value -> printfn "It is a float";
    let ir2 = RealValue 3.4;
    printtype ir2;
Output: It is a float

Variant records
    type shapekind = (square, rectangle, circle); (* Pascal *)
         shape = record
             centerx : integer;
             centery : integer;
             case kind : shapekind of
                 square : (side : integer);
                 rectangle : (length, height : integer);
                 circle : (radius : integer);
         end;

Variant records in C/C++
    #include <assert.h>

    enum ShapeKind { Square, Rectangle, Circle };
    struct Shape {
        int centerx;
        int centery;
        enum ShapeKind kind;
        union {
            struct { int side; };           /* Square */
            struct { int length, height; }; /* Rectangle */
            struct { int radius; };         /* Circle */
        };
    };
    int getsquareside(struct Shape* s) {
        assert(s->kind == Square);
        return s->side;
    }
    void setsquareside(struct Shape* s, int side) {
        s->kind = Square;
        s->side = side;
    }

Lists
- defined recursively: a list is either an empty list, or a pair of an object and another (shorter) list
- particularly useful in functional languages (which use recursion and higher-order functions)
- common in imperative scripting languages (Python)
- can be modelled using records and pointers
- two main kinds:
  - homogeneous (every element of the same type: ML)
  - heterogeneous (any object can be placed in the list: Lisp)
- terminology: head is the first element, tail is the remainder of the list

Lists: Lisp and ML
Lisp
- a program is a list (can be modified during execution!)
- quote prevents evaluation: '(a b c d), (quote (a b c d))
- implementation: chain of cons cells (a pair of pointers), printed as dotted pairs: (cons 1 2) => (1 . 2)
- pointer names: car (head) and cdr (tail)
    (a b c d)                       ;; list syntax
    (a . (b . (c . (d . null))))    ;; proper list
    (a . (b . (c . d)))             ;; improper list
    (a (b c) d)                     ;; list nesting
ML
- implementation: chain of blocks [(object, value) pairs]
- operations hd (head) and tl (tail)
    [a, b, c, d]

List examples
Lisp
    (cons 'a '(b))             => (a b)
    (car '(a b))               => a
    (car nil)                  ; either nil or error
    (cdr '(a b c))             => (b c)
    (cdr '(a))                 => nil
    (cdr nil)                  ; either nil or error
    (append '(a b) '(c d))     => (a b c d)
ML
    a :: [b]                   => [a, b]
    hd [a, b]                  => a
    hd []                      (* run-time exception *)
    tl [a, b, c]               => [b, c]
    tl [a]                     => nil
    tl []                      (* run-time exception *)
    [a, b] @ [c, d]            => [a, b, c, d]
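The cons-cell representation behind these operations can be modelled with pairs. The following Python sketch uses a two-element tuple as a cons cell and `None` as the empty list; all names (`cons`, `car`, `cdr`, `from_items`, `append`) are defined here for illustration and mirror the Lisp operations only loosely.

```python
NIL = None   # the empty list


def cons(head, tail):
    """A cons cell: a pair of the first element and the rest of the list."""
    return (head, tail)


def car(cell):
    return cell[0]


def cdr(cell):
    return cell[1]


def from_items(*items):
    """Build a proper list by consing the items from right to left."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result


def append(xs, ys):
    # Copies the cells of xs and shares the cells of ys.
    return ys if xs is NIL else cons(car(xs), append(cdr(xs), ys))


ab = cons("a", from_items("b"))                            # like (cons 'a '(b))
abcd = append(from_items("a", "b"), from_items("c", "d"))  # like (append '(a b) '(c d))
print(ab, abcd)
```

In this sketch, `car(NIL)` raises an error, which corresponds to the "error" alternative in the Lisp examples.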

List comprehensions
- so-called generator notation: create lists from lists
- based on traditional mathematical notation (set comprehensions)
- Miranda, Haskell, Python, F#
    {i × i | i ∈ {1, ..., 100} ∧ i mod 2 = 1}
    Haskell: [i*i | i <- [1..100], i `mod` 2 == 1]
    Python:  [i*i for i in range(1, 101) if i % 2 == 1]
    F#:      [for i in 1..100 do if i % 2 = 1 then yield i*i]

Pointer and reference types

Pointer types
Values are memory addresses and a special value nil.
Uses
1. indirect addressing
2. a way to manage dynamic storage
heap-dynamic variables
- often do not have an associated name (anonymous variables)
- accessible only through a pointer or a reference
Pointers are not
- structured types (even though usually defined using a type operator)
- scalar types (values are not data, but references to variables)

Pointer operations
Basic operations
1. assignment (sets the value to an address); an operator is needed to obtain the address of objects outside the heap
2. dereferencing (value of the variable pointed to)
   - implicit (Fortran 95)
   - explicit (C, the * operator)
Accessing record fields
- (*p).age / p->age (C, C++)
- p.age (Ada, implicit dereference)
Heap management
- explicit allocation required: malloc (C), new (C++) (in languages using pointers for heap management)

Problem 1: Dangling pointers
The pointer contains the address of a deallocated variable.
Why is it a problem?
- a new variable can be allocated at the same address
- heap management can use the freed memory
Creating a dangling pointer:
1. a new variable is allocated on the heap, pointed to by p1
2. p2 := p1
3. the variable is deallocated through p1 (p2 is now dangling)
    int * arrayptr1; // C++
    int * arrayptr2 = new int[100];
    arrayptr1 = arrayptr2;
    delete [] arrayptr2;
In C++ both arrayptr1 and arrayptr2 are now dangling!
Solution: prohibit deallocation

Problem 2: Lost variables (garbage)
There is a variable on the heap which is no longer accessible.
Creating a lost variable:
1. a new variable, pointed to by p1, is allocated on the heap
2. some other address is assigned to p1
- consequence: memory leak
- solution: garbage collection

Pointers in C/C++
- typed
- can point anywhere (as in assembly languages)
- extremely flexible, extra caution necessary
- operations: * dereference, & address of a variable
Pointer arithmetic
- ptr + index = ptr plus index * sizeof(*ptr)
    int list[10];
    int *ptr;
    ptr = list;
The following holds:
- *(ptr + 1) is the same as list[1]
- *(ptr + index) is the same as list[index]
- ptr[index] is the same as list[index]

Pointers in C/C++ (2)
pointers pointing to functions
- used to pass functions as parameters
pointers of type void *
- can point to values of any type (generic pointers)
- cannot be dereferenced (so the type checker would not complain)
- use: parameters/results of functions which operate on memory (e.g. malloc)

Reference types
A variable of a reference type refers to an object or a value in memory, not to a memory address.
- no point in doing arithmetic
C++
- constant pointer, always implicitly dereferenced
- uses: parameter passing (two-way communication); advantage over pointers: no need to dereference
Java
- non-constant, can point to any instance of the same class
- used for referencing class instances
- no explicit deallocation (no dangling references)
    String str1; // value: null
    ...
    str1 = "This is a Java literal string";

Reference types (2)
C#
- both (C-style) pointers and (Java-style) references
- use of pointers is strongly discouraged (unsafe modifier for subprograms using pointers)
Python, Smalltalk, Ruby
- all variables are references
- always implicitly dereferenced
pointers vs references
- "Their (pointers) introduction into high-level languages has been a step backward from which we may never recover." (C. A. R. Hoare)
- References provide some of the flexibility and capabilities of pointers, without their hazards.

Type inference

ML type inference
Is it necessary to always specify a type?
Example 1
    val s : string = "Arthur Dent"
    val n : int = 42
The type is obvious from the syntax.
Example 2
    fun twice x = 2 * x
We know that 2 is of type int, and * is of type int -> (int -> int). Therefore twice is of type int -> int.

ML type inference
Is it necessary to always specify a type?
Example 3
    fun add [] = 0
      | add (a :: L) = a + add L
We know that 0 is of type int, [] and a :: L are of type list, and + is of type int -> (int -> int). Therefore add is of type int list -> int.
In the ML language, it is always possible to infer the type (for any correct program).

Type inference
- makes programs more readable
- supports abstraction (guarantees the most general type)
- present in many modern languages (ML, Haskell, F#, C# 3.0, C++11, Visual Basic 9.0, ...)
- based on the Hindley-Milner (Damas-Hindley-Milner) algorithm
History
- 1958 Curry, Feys: simply typed λ-calculus
- 1969 Hindley: extended, the most general type (proof)
- 1978 Milner: an equivalent algorithm (the algorithm W)
- 1982 Damas: proof of completeness

Three stages of type inference
    fun add [] = 0
      | add (a :: L) = a + add L
1. each (sub)expression is assigned a new type variable: add : α, a : β, L : γ, ...
2. the inference rules are applied
   - built-in expressions: 0 : int, [] : δ list
   - add is a function applied to a list, therefore α = ι → κ and ι = δ list
   - ...
3. the system of equations (constraints) created in step 2 is then solved (using type unification)

Type inference: possible results
1. there is exactly one solution
2. the constraint system cannot be solved (e.g. x is required to be of type int and string): type error
3. there are multiple solutions
   a) polymorphism: the result contains parametric types (i.e. includes type variables)
   b) ambiguity: is fun f(x, y) = x + y of type (int * int) -> int, or (string * string) -> string?

ML type expressions (simplified)
type expression syntax
- primitive types: int, bool
- type variables: 'a, 'b, 'c, ...
- type constructor list: T list (where T is a type expression)
- type constructor n-tuple: T1 * T2 * T3 (where T1, T2, T3 are type expressions)
- function type: T1 -> T2 (where T1 and T2 are type expressions)
our problem
- will be formulated as a system of type equalities between pairs of type expressions
- to get a solution we need to find substitutions for type variables

Finding a substitution
Simple cases:
    'a list = int list                        'a = int
    'a list = 'b list list, 'a = int list     'b list = int list, 'b = int
    'a list = 'b -> 'b                        does not have a solution
    'a list = 'b, 'b list = 'a                does not have a finite solution
What about the following?
    'a list = 'b list list
    'a = int list; 'b = int
    'a = (int -> int) list; 'b = int -> int
    'a = (int list) list; 'b = int list
    ...

The most general solution
    'a list = 'b list list
- problem: infinitely many solutions
- we aim to find the most general solution
- observation:
  - for 'a we need to substitute some suitable list
  - 'b must be the type of the elements of a list of type 'a
  - no other constraints
- solution: 'a = 'b list, where 'b is a free type variable
The solutions of TI will be sets of equalities 'a_i = T_i, where
- T_i are type expressions
- no 'a_i appears in any T_i

Finding the most general solution
Unification of (two) type expressions
Finding substitutions for type variables so that the expressions are identical after performing the substitutions.
- the resulting set of substitutions is called the unifier
- substitutions are represented by a binding between the type variable 'a and a type expression τ('a)
- at the beginning each variable is free (not bound)
- we define
  $$\tau(T) = \begin{cases} T' & \text{if } T = \texttt{'a} \text{ and } \tau(\texttt{'a}) = T' \\ T & \text{otherwise} \end{cases}$$

The unification algorithm
    Unify(T1, T2):
        T1 := τ(T1); T2 := τ(T2)
        if T1 = T2 then return true
        else if T1 = 'a ∧ ('a does not appear in T2) then τ('a) := T2; return true
        else if T2 = 'b ∧ ('b does not appear in T1) then τ('b) := T1; return true
        else if T1 = T1' list ∧ T2 = T2' list then return Unify(T1', T2')
        else if T1 = D1 -> C1 ∧ T2 = D2 -> C2 then return Unify(D1, D2) && Unify(C1, C2)
        else return false
    end
As a side effect, the algorithm produces the substitution τ.
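The pseudocode above translates almost line by line into a runnable sketch. The Python version below makes several assumptions of its own: type variables are strings starting with an apostrophe, primitive types are plain strings, `("list", T)` and `("->", D, C)` are tuples encoding the two type constructors, τ is a dictionary, and bindings are followed transitively when a variable is looked up. It illustrates the idea and is not the lecture's reference implementation.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("'")


def resolve(t, tau):
    """Follow bindings until a non-variable or a free variable is reached."""
    while is_var(t) and t in tau:
        t = tau[t]
    return t


def occurs(var, t, tau):
    """Does the type variable var appear in t (under the current bindings)?"""
    t = resolve(t, tau)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, part, tau) for part in t[1:])
    return False


def unify(t1, t2, tau):
    t1, t2 = resolve(t1, tau), resolve(t2, tau)
    if t1 == t2:
        return True
    if is_var(t1):
        if occurs(t1, t2, tau):
            return False
        tau[t1] = t2
        return True
    if is_var(t2):
        if occurs(t2, t1, tau):
            return False
        tau[t2] = t1
        return True
    if isinstance(t1, tuple) and isinstance(t2, tuple) and t1[0] == t2[0]:
        # "list" has one argument, "->" has a domain and a codomain.
        return all(unify(a, b, tau) for a, b in zip(t1[1:], t2[1:]))
    return False


def substitute(t, tau):
    """Apply the bindings everywhere inside a type expression."""
    t = resolve(t, tau)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(p, tau) for p in t[1:])
    return t


# 'b list = 'a list;  'a -> 'b = 'c;  'c -> bool = (bool -> bool) -> bool
tau = {}
results = [
    unify(("list", "'b"), ("list", "'a"), tau),
    unify(("->", "'a", "'b"), "'c", tau),
    unify(("->", "'c", "bool"), ("->", ("->", "bool", "bool"), "bool"), tau),
]
print(results, {v: substitute(v, tau) for v in ("'a", "'b", "'c")})
```

The occurs check is what rejects equations without a finite solution, such as 'a list = 'a.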

Unification example
Constraints:
    'b list = 'a list;  'a -> 'b = 'c;  'c -> bool = (bool -> bool) -> bool
Calls:
    Unify('b list, 'a list)
        Unify('b, 'a)
    Unify('a -> 'b, 'c)
    Unify('c -> bool, (bool -> bool) -> bool)
        Unify('c, bool -> bool)
            Unify('a -> 'b, bool -> bool)
                Unify('a, bool)
                Unify('b, bool)
                    Unify(bool, bool)
        Unify(bool, bool)
Bindings:
    'a: bool
    'b: 'a (bool)
    'c: 'a -> 'b (bool -> bool)

Constraint generation
Selected type rules
    expression               type                           constraints
    1, 2, 3, ...             int
    []                       'a list
    hd(L)                    'a                             L : 'a list
    tl(L)                    'a list                        L : 'a list
    E1 :: E2                 'a list                        E1 : 'a, E2 : 'a list
    E1 + E2                  int                            E1 : int, E2 : int
    E1 * E2                  int                            E1 : int, E2 : int
    E1 = E2                  bool                           E1 : 'a, E2 : 'a
    if E1 then E2 else E3    'a                             E1 : bool, E2 : 'a, E3 : 'a
    E1 E2                    'b                             E1 : 'a -> 'b, E2 : 'a
    fun f x1 .. xn = E       f : 'a1 -> ... -> 'an -> 'b    x1 : 'a1, ..., xn : 'an, E : 'b

Example
    fun f x L = if L = [] then []
                else if x <> (hd L) then f x (tl L)
                else x :: f x (tl L)
- introduce new type variables 'f, 'x, 'L, ...
- generate constraints using rules
    'f = 'a0 -> 'a1 -> 'a2    (* fun *)
    'L = 'a3 list             (* "=" and "[]" *)
    'L = 'a4 list             (* hd *)
    'x = 'a4                  (* "<>" *)
    'x = 'a0                  (* application *)
    'L = 'a5 list             (* tl *)
    'a1 = 'a5 list            (* tl, application *)
    ...

Type checking
Direct:
    fun f g = g 2
    fun not x = if x then false else true
    f not
    Error: operator and operand don't agree [tycon mismatch]
      operator domain: int -> 'Z
      operand:         bool -> bool
      in expression:
        f not
Indirect:
    fun reverse [] = []
      | reverse (x::xs) = reverse xs
    val reverse = fn : 'a list -> 'b list
The function changes the type of the list, so something is wrong.

Conclusion
type inference
- computes the types of expressions
- type declarations are not needed
- we look for the most general type
  - solving a system of constraints, using unification
  - leads to polymorphism
type checking
- possible errors are discovered statically
- sometimes the error can be deduced from the expression type
disadvantages
- makes it harder to find the origin of a type error