Week 7&8: Types and Type Systems

CS320 Principles of Programming Languages Week 7&8: Types and Type Systems Jingke Li Portland State University Fall 2017 PSU CS320 Fall 17 Week 7&8: Types and Type Systems 1/ 69

Types
Types are used in almost every programming language:
BASIC
    FOR I = 1 To 10     ' no need to declare types unless array
    PRINT A(I)          ' has more than 10 elements
FORTRAN
    1, 2.1, 3D1, CMPLX(4,2)   ! has a rich set of numeric types
Pascal
    var Letters: set of char;   (* supports set type *)
C
    int *(*foo)(int *);   // type decls not always easy to follow
Java
    class A extends B;    // supports sub-typing through classes
Haskell
    fold :: (a -> b -> b) -> [a] -> b -> b   {- supports algebraic types -}
ML
    datatype 'a btree = LEAF of 'a | NODE of 'a * 'a btree * 'a btree

Why Types?
Types are a way to classify data and regulate operations in a program. They help to:
- simplify programming, e.g. with types, operator overloading is possible: for x + y there is no need to have separate operators for addition and string concatenation
- reduce errors, e.g. a language can restrict an array index to be an integer and the test in an if statement to be a boolean
- enhance a program's readability, e.g. the user can give a complex data type a name, and use the name at all places the type is needed

A Language's Type System
A programming language's type system consists of:
- a mechanism for defining types and associating them with data objects and program constructs: built-in types, type constructors, abstract datatypes (ADTs)
- a set of type-related semantic rules: type equivalence, type conversion, type inference

Other Type-Related Concepts
Static vs. dynamic typing: whether type information is resolved at compile time or at run time.
- Statically-typed languages: types are associated with variables.
- Dynamically-typed languages: types are associated with values.
Strong vs. weak typing: whether the language uses strong typing rules and enforces rigorous type checking to prevent and catch all type errors.
- Strongly-typed languages: type errors are always detected; this requires that the types of all program objects can be determined, either at compile time or at run time.
Type checking: the process of ensuring that a program obeys the language's type-related semantic rules (covered later).

What Is a Type?
First, look at some examples:
- The two values {true, false} form the boolean type.
- The set of integer values in the range [-2,147,483,648, 2,147,483,647] forms the (32-bit) integer type.
- The set of ASCII characters forms the char type.
So it looks like a type is a set of values. But what about:
- A set of selected integers: {128, 192, 256}
- The set of all state names: {"Alabama", "Alaska", "Arizona", ...}
- The set {true, "hello", 1.1}
Question: Do they also form types?

What Is a Type?
A type is a set of values that share some semantic properties:
    Type = Set of Values with Common Properties
Note: The distinction between a type and an ordinary set of values is subjective:
- Both the set of selected integers {128, 192, 256} and the set of all state names {"Alabama", "Alaska", "Arizona", ...} can be (user-defined) types, if a program has a reason to treat them as such.
- However, the set {true, "hello", 1.1} is unlikely to be considered a valid type, since its values do not share any common properties.

What Is a Type? A Closer Look
What can we do with boolean values?
- Combine them with logical operations: and, or, not, etc.
- Compare them for equality: =, !=
What can we do with integer values?
- Combine them with arithmetic operations: +, -, *, /, etc.
- Compare them with relational operations: <, <=, >, >=, =, !=, etc.
What can we do with char values?
- Display them on a screen, or write them to a file
- Compare a pair for equality
- Map them to integer codes
- Concatenate them to form a string
Observation: Each type has a set of operations that are available for all values of that type.

What Is a Type?
A type can also be defined as a set of values together with a set of available operations:
    Type = Set of Values + Operations
With this view, it's easy to catch type errors involving mismatched operations:
- 123 and 456: "and" is not a valid operation for the integer type
- "Hello" * "World": "*" is not a valid operation for the string type
Note: There are still other definitions of type, such as one based on the structural view.

Predefined Types
Most programming languages have a small set of predefined types, e.g. integer, float (and/or double), character, ...
These types are typically supported directly by hardware (through different sets of machine operations, rather than through explicit type declarations).
The remaining types are user-defined.

Primitive Types
Types that cannot be further decomposed into simpler types.
Predefined types are all primitive types: integer, float, character, ...
In addition, we have enumeration and subrange types:
C
    enum students {freshman, sophomore, junior, senior};
Pascal
    type students = (freshman, sophomore, junior, senior);
    type upper_class = junior..senior;
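C enumerations are plain named integer constants and C has no subrange types, so a Pascal declaration such as upper_class = junior..senior has no direct C counterpart. The sketch below, a minimal illustration only, emulates the subrange with an explicit range check; the helpers is_upper_class and next_year are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>

/* C enumeration corresponding to the Pascal type `students`. */
enum students { freshman, sophomore, junior, senior };

/* Hypothetical helper: emulates the Pascal subrange
   upper_class = junior..senior with an explicit range check. */
static bool is_upper_class(enum students s) {
    return s >= junior && s <= senior;
}

/* Enumerators are just named int constants, so arithmetic on them
   compiles without complaint; any range check must be written by hand. */
static enum students next_year(enum students s) {
    assert(s != senior);   /* there is no class after senior */
    return (enum students)(s + 1);
}
```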

Primitive Types
A common property of primitive types is that their values are "first-class", i.e. they can be
- passed as arguments to functions
- returned as results from functions
- assigned to variables in any scope
- used in their literal forms
It is desirable for all other types to have this property as well. But as we'll see, it is not always the case.

Constructed Types (a.k.a. Composite Types, Structured Types)
Types that are constructed from simpler ones with the use of type constructors. Some common constructors include:
array, record/struct, tuple, union/variant, list, set, pointer/reference, function
Most languages provide a set of built-in type constructors for users to define new types. Some languages allow users to create new type constructors.

Arrays
Arrays are used to represent a collection of elements of the same type.
An array is typically stored as a table laid out in adjacent memory locations, permitting indexed access to any element in constant time.
The index set is usually a range of integers 0..n, or a range isomorphic to it:
Pascal
    A: array [2..10] of real;

Associative Arrays
Arrays with arbitrary index sets are called "associative arrays":
Perl
    %a = (5, 'x', 3, 'y', 6, 'z');
    print $a{5}, $a{3}, $a{6};    # prints xyz
Although useful, they are seldom supported directly by languages, because there is no single, uniform implementation that is good for all uses.

Multi-Dimensional Arrays
Multi-dim arrays have multiple sets of indices. Generally, elements are still allocated in a contiguous memory block:
C
    int a[][3] = { {1, 2, 3}, {3, 4, 5} };   /* only the first dimension may be omitted */
Some languages do not support multi-dim arrays directly. Instead, they support arrays of arrays: a 2D array is a 1D array of 1D arrays. This arrangement is more general, since the element arrays do not have to be of the same length:
Java
    int[][] a = { {1, 2}, {3, 4, 5} };
However, to dynamically allocate such an array, a separate allocation step has to be taken for each member array:
Java
    int[][] b = new int[2][];   // allocate rows
    b[0] = new int[2];          // allocate columns for row 0
    b[1] = new int[3];          // allocate columns for row 1
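In C, the Java-style "array of arrays" with rows of different lengths is usually built from an array of row pointers and, as in the Java example, every row needs its own allocation. A minimal sketch, assuming heap allocation with malloc/calloc; make_ragged and free_ragged are hypothetical helper names:

```c
#include <assert.h>
#include <stdlib.h>

/* Build a ragged 2D array: an array of row pointers, where row i
   has lens[i] zero-initialized elements. Returns NULL on failure. */
static int **make_ragged(size_t nrows, const size_t *lens) {
    int **rows = malloc(nrows * sizeof *rows);      /* allocate rows */
    if (rows == NULL) return NULL;
    for (size_t i = 0; i < nrows; i++) {
        rows[i] = calloc(lens[i], sizeof **rows);   /* allocate columns for row i */
        if (rows[i] == NULL) {                      /* undo earlier rows on failure */
            while (i > 0) free(rows[--i]);
            free(rows);
            return NULL;
        }
    }
    return rows;
}

/* Each row was allocated separately, so each must be freed separately. */
static void free_ragged(int **rows, size_t nrows) {
    for (size_t i = 0; i < nrows; i++) free(rows[i]);
    free(rows);
}
```

With lens = {2, 3}, make_ragged(2, lens) yields rows of lengths 2 and 3, mirroring b[0] = new int[2] and b[1] = new int[3]. Unlike a true C 2D array such as int a[][3], these rows are not guaranteed to be adjacent in memory.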

Array Operations
What are the operations for an array type? Answer:
- Indexed access to any element: a[3]
- Query array size (e.g. Java): a.length
- Element-wise operations on whole arrays:
Fortran90
    integer, dimension(8) :: a, b, c, d
    data a /1,2,3,4,5,6,7,8/   ! initializing a
    b = 2                      ! every element is 2
    c = a**2 + b**2            ! element-wise op
    d = c                      ! copy c to d
- Array section and slicing operations (e.g. Ada, Fortran 90):
Fortran90
    b(1:5) = a(3:7)
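C has no whole-array operations, so each Fortran 90 statement above becomes an explicit loop. The sketch below is one possible translation; the function name fortran_demo is hypothetical, and note the index shift, because Fortran arrays start at 1 by default while C arrays start at 0.

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Loop-by-loop C translation of the Fortran 90 array statements. */
static void fortran_demo(int a[N], int b[N], int c[N], int d[N]) {
    for (size_t i = 0; i < N; i++) a[i] = (int)i + 1;              /* data a /1,...,8/ */
    for (size_t i = 0; i < N; i++) b[i] = 2;                       /* b = 2 (scalar to every element) */
    for (size_t i = 0; i < N; i++) c[i] = a[i]*a[i] + b[i]*b[i];   /* c = a**2 + b**2 */
    for (size_t i = 0; i < N; i++) d[i] = c[i];                    /* d = c */
    /* b(1:5) = a(3:7): Fortran index k corresponds to C index k-1 */
    for (size_t i = 0; i < 5; i++) b[i] = a[i + 2];
}
```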

Arrays
Question: Are array values first-class?
No in C: arrays can be passed to functions only as pointers to their first elements, cannot be returned as function results, and cannot be assigned to array variables; besides, array literals (initializer lists) are limited to declarations:
C
    int a[5] = {1,2,3,4,5}, b[5];   /* literal form allowed */
    b = a;                          /* illegal assignment */
    b = {1,2,3,4,5};                /* illegal assignment */
Yes in Java:
Java
    int[] a = {1,2,3,4,5}, b;       // literal form is allowed
    b = a;                          // assignment is OK (b now refers to the same array as a)
    b = new int[] {1,2,3,4,5};      // a different literal form
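A common C workaround, shown here as a sketch, is to wrap the array in a struct: structs can be assigned, passed, and returned by value, and the embedded array is copied along with them. The type name int5 and the function make_int5 are hypothetical.

```c
#include <assert.h>

/* An array wrapped in a struct acquires value semantics. */
typedef struct { int elems[5]; } int5;

static int5 make_int5(void) {
    int5 v = { {1, 2, 3, 4, 5} };   /* literal form in a declaration */
    return v;                       /* returning a struct (and its array) is legal */
}
```

Unlike Java's b = a, which makes b share a's array, assigning one int5 to another copies all five elements.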

Records (a.k.a. Structs)
Related data of heterogeneous types are stored and manipulated together. Record fields are typically accessed via their names:
C
    struct emp {char *name; int age;} p;
    p.name = "John"; p.age = 35;
They could have been accessed via their positions, since in many languages the order of fields in a record is fixed.
ML's records are accessed through names, but the order of fields does not matter:
ML
    type emp = {name: string, age: int};
    val p:emp = {name="john", age=35};
    val q:emp = {age=28, name="mark"};
In many languages, records are treated as first-class values.

Records
In statically typed languages, it is generally necessary to declare new record types before creating record objects:
C
    struct emp {char *name; int age;};
    struct emp m, n;
Literal record values are often allowed in initialization expressions:
C
    struct emp n = {"John", 48};
ML permits record values to be created without first declaring an explicit named type (true first-class status):
ML
    val p = {fname="dave", lname="johnson", age=35};
    #fname(p); #lname(p); #age(p);
    #name {name="steve Reed", age=30};

Tuples
Tuples are lightweight records: their fields do not have user-assigned names, and are accessed via their positions in the tuple. (Hence the order of fields is significant.) Here is an ML example:
ML
    type emp2 = string * int;
    val r: emp2 = ("Dave", 35);
    #1(r); #2(r);   (* accessing 1st & 2nd comp, respectively *)
The positions behave as the default names of the fields. In fact, the tuple ("Dave", 35) and the record {1="Dave", 2=35} are equivalent.
A tuple can be decomposed elegantly through pattern matching:
ML
    val (name, age) = r;   (* name & age get 1st & 2nd comp of r, respectively *)

Unions
Data of heterogeneous types share the same storage, which holds only one of them at a time.
Unions generally behave like records, with a tag as an additional field. Their size typically equals the size of the largest variant plus the tag size.
Example: C's unions don't have tags:
C
    typedef union { int value; char* error; } result;
    result search(...) {
        result res;
        if (...) res.value = some_int_val;
        else res.error = "not found";
        return res;
    }
Security hole: There is no type security in C's union: nothing records which member was stored last, so a caller can read res.value after res.error was set.

Unions
Example: Pascal's variant records:
Pascal
    type Result = record
        case found : Boolean of
            true:  (value: integer);
            false: (error: string)
        end;
    function search(...) : Result;
    ...
    if (...) then begin
        search.found := true;
        search.value := ...;
    end else ...
Security hole: It is possible to manipulate the tag independently from the variant contents.

Unions
Example: ML's secure unions:
ML
    datatype result = Found of int | NotFound of string
    fun search (...) : result =
        if ... then Found 10 else NotFound "problem"
    val r = search (...)
    case r of
        Found x    => print ("Found it : " ^ (Int.toString x))
      | NotFound s => print ("Couldn't find it : " ^ s)
Here the Found and NotFound tags are not ordinary fields. The case expression combines inspection of the tag and extraction of the values into one operation.
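C can approximate ML's secure union by pairing a union with an explicit tag field and dispatching on it with a switch. The sketch below assumes a hypothetical linear search over an int array (search and report are illustrative names). Unlike ML's case, nothing in C forces the tag check or keeps the tag consistent with the member actually stored.

```c
#include <assert.h>
#include <stdio.h>

/* Hand-built discriminated union: the tag says which member is valid. */
typedef enum { FOUND, NOT_FOUND } result_tag;

typedef struct {
    result_tag tag;
    union {
        int value;          /* valid only when tag == FOUND     */
        const char *error;  /* valid only when tag == NOT_FOUND */
    } u;
} result;

/* Hypothetical search: returns the index of key in arr[0..n-1]. */
static result search(const int *arr, int n, int key) {
    result res;
    for (int i = 0; i < n; i++) {
        if (arr[i] == key) {
            res.tag = FOUND;
            res.u.value = i;
            return res;
        }
    }
    res.tag = NOT_FOUND;
    res.u.error = "not found";
    return res;
}

/* Like ML's case: inspect the tag, then extract the matching member. */
static void report(result r) {
    switch (r.tag) {
    case FOUND:     printf("Found it : %d\n", r.u.value); break;
    case NOT_FOUND: printf("Couldn't find it : %s\n", r.u.error); break;
    }
}
```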

Lists
All functional languages support lists as a built-in type; some other languages do as well (e.g. Python). In most cases, elements in a list have the same type, but Lisp and Python are exceptions.
Operations: The most common ones are car (head), cdr (tail), and dynamic creation of a list.
Implementation: Since lists are dynamic structures, they are typically implemented by blocks linked by pointers.
Python's lists are really variable-length arrays (of pointers to objects), not Lisp-style linked lists. Therefore, indexing into a list is possible at a cost that is independent of the list's size:
Python
    emplist = ['john', 35, 'mark', 28, 'steve', 30]
    print(emplist[4])    # steve
    del emplist[2:4]     # removes elements 2 and 3 ('mark', 28)

Sets
Sets allow an unordered collection of distinct values to be stored and manipulated together.
Because sets are expensive to implement in general, few languages have set as a built-in constructor.
Pascal supports set types. A set is defined over an ordinal base type, such as an enumeration type or a subrange of one:
Pascal
    type Engineers = (Ann, David, Fred, Harry, Mike, Paula);
    type Group = set of Engineers;
    var group1, group2, group3: Group;
    group1 := [David, Mike, Paula];
    group2 := [Ann, Fred, Mike];
    group3 := group1 * group2;   (* intersection: [Mike] *)
The restricted domains of Pascal's sets allow an efficient implementation using a bit-vector representation.
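The bit-vector representation mentioned above can be sketched in C: each enumerator selects one bit, and Pascal's set operators map to bitwise operators (+ to |, * to &, - to & ~). The helper names are hypothetical, and the sketch assumes the base type has no more elements than an unsigned int has bits.

```c
#include <assert.h>

/* Bit k of a group is 1 exactly when the k-th engineer is a member. */
enum engineers { Ann, David, Fred, Harry, Mike, Paula };
typedef unsigned int group;   /* 6 members fit easily in one word */

#define ELEM(e) (1u << (e))

static group set_union(group a, group b)        { return a | b; }    /* Pascal a + b */
static group set_intersection(group a, group b) { return a & b; }    /* Pascal a * b */
static group set_difference(group a, group b)   { return a & ~b; }   /* Pascal a - b */
static int   set_member(enum engineers e, group s) { return (s >> e) & 1u; }  /* e in s */
```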

Pointers/References
Many languages have pointer types to enable programmers to construct recursive data structures. Example:
C
    typedef struct intcell *intlist;
    struct intcell { int head; intlist tail; };
    intlist mylist = (intlist) malloc(sizeof(struct intcell));
    ...
    intlist list = mylist;            /* search the list for the value i */
    while (list != NULL && list->head != i)
        list = list->tail;
In most such languages, pointers are restricted to addresses returned by allocation operations.
C allows the address of anything to be taken and later dereferenced, and supports pointer arithmetic. While this feature can support very efficient code, it also destroys the safety of the type system.
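A self-contained version of the list example, as a sketch: cons, find, and free_list are hypothetical helper names, and allocation failure is handled with a simple assert for brevity.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct intcell *intlist;
struct intcell { int head; intlist tail; };

/* Prepend a value by allocating a new cell on the heap. */
static intlist cons(int head, intlist tail) {
    intlist cell = malloc(sizeof(struct intcell));
    assert(cell != NULL);           /* sketch only: no real error handling */
    cell->head = head;
    cell->tail = tail;
    return cell;
}

/* Follow tail pointers until the value i is found or the list ends. */
static intlist find(intlist list, int i) {
    while (list != NULL && list->head != i)
        list = list->tail;
    return list;                    /* NULL if i is not in the list */
}

/* Release every cell of the list. */
static void free_list(intlist list) {
    while (list != NULL) {
        intlist next = list->tail;
        free(list);
        list = next;
    }
}
```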

Functions
A function is a mapping from a domain to a range. All functions sharing the same type signature, such as int -> int, form a type.
Most (imperative) languages treat functions as second-class: they can only be invoked, not manipulated in other ways.
C treats function names as pointers, which enables functions to be passed as parameters, returned as results, and stored in variables; however, it does not resolve the nesting issue, since C has no nested functions and a function cannot refer to the local variables of another function:
C
    void qsort(void *base, size_t nel, size_t width,
               int (*compar)(const void *, const void *));
Functional languages treat functions as first-class through closure representations (here the closure captures the free variable y):
ML
    val f = (fn x => x + y);
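Since a C function cannot capture local variables, a closure like ML's fn x => x + y is often emulated by bundling a function pointer with an explicit environment record. A minimal sketch; closure, make_adder, and apply are hypothetical names, and the environment holds just the single captured variable y:

```c
#include <assert.h>

/* A "closure": a code pointer plus the environment it needs. */
typedef struct closure {
    int (*code)(const struct closure *self, int x);
    int y;                          /* captured free variable */
} closure;

static int add_y(const closure *self, int x) { return x + self->y; }

/* Returns a function value that adds y to its argument. */
static closure make_adder(int y) {
    closure c = { add_y, y };
    return c;
}

/* Calling a closure means passing its environment along explicitly. */
static int apply(const closure *f, int x) { return f->code(f, x); }
```

The caller must pass the environment along explicitly (apply does this), which is the bookkeeping that a functional language's closure representation performs automatically.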

Mathematical View of Type Constructors
Since types are sets of values, it's convenient to think of type constructors as operations on sets. Some can be cleanly expressed:
- Product ($S_1 \times S_2$): for representing record and tuple types
- Sum ($S_1 + S_2$): for representing union and enumeration types
- Mapping ($S_1 \to S_2$): for representing array and function types
One advantage of this view is that these operations can be easily and cleanly composed to express more complex type structures, e.g. sums of products, products of sums, etc.

Algebraic Datatypes
A unified approach for defining and representing types, based on the mathematical view.
Many functional languages use algebraic datatypes.
We'll use ML to illustrate; its type system is considered one of the cleanest and most expressive.

ML Basic Types
- unit: used like void in C, to indicate the absence of a meaningful value (its only value is ())
- bool: operators not, andalso, and orelse
- int and real: can't mix them in operations
  operators: +, -, *, div (int), / (real), ~ (negation)
  explicit conversion functions: trunc, round, real, etc.
- string and char: operator ^ (concatenation); char literals are written with # (e.g. #"a")

ML Constructed Types
- Lists: sequences of values of a single type, e.g.
    nil, [1,2,3], 0::[1,2,3]
    hd [1,2,3], tl [1,2,3], [1]@[2,3]   (* basic ops *)
- Records: similar notation as in other languages, e.g.
    {ID=123, name="john"} : {ID:int, name:string}
    #ID {ID=123, name="john"}   (* fetch the ID component *)
- Tuples: a special case of records, where a component's position serves as its name, e.g.
    ("abc", 33) : string * int
    #1 ("abc", 33)   (* fetch the first component *)

ML Functions
Functions in ML take just one argument and return just one result. A multi-arg function is just a function with a tuple as its argument:
    fun f(x,y) = x+y;
    val f = fn : int * int -> int
Functions can be curried:
    fun f x y = x+y;
    val f = fn : int -> int -> int
    f 10;
    val it = fn : int -> int
    it 20;
    val it = 30 : int
Functions can be anonymous:
    (fn (x,y) => x+y) (10,20);
    val it = 30 : int

ML Type and Data Constructors
Type and data constructors can be used to create arbitrary new datatypes:
    datatype bool = true | false;
    datatype day = Mon | Tue | Wed | Thu | Fri | Sat | Sun;
Here bool and day are called type constructors; each defines a sum datatype.
The names of the sum type's members (e.g. true, false, Mon, Tue, etc.) are called data constructors; they are like tags in a union type.

ML Type and Data Constructors
Both data and type constructors can be parameterized:
Data constructors with parameters:
    datatype temperature = F of real | C of real;
    fun temp_convert (F x) = C ((x - 32.0) * 5.0 / 9.0)
      | temp_convert (C y) = F (y * 9.0 / 5.0 + 32.0);
    temp_convert (F 100.0)  =>  C 37.777777778
    temp_convert (C 37.0)   =>  F 98.6
Type constructors with parameters:
    datatype 'a option = NONE | SOME of 'a;
    val x = SOME 4;     (* of type int option *)
    val y = SOME 1.2;   (* of type real option *)
    fun optdiv a b = if b = 0 then NONE else SOME (a div b);
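C has neither parameterized types nor data constructors, so an approximation of int option needs a separate struct for every element type. A sketch of optdiv under that assumption (int_option, NONE_INT, and some_int are hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>

/* Monomorphic stand-in for ML's int option: `some` acts as the tag,
   and `value` is meaningful only when some is true. */
typedef struct { bool some; int value; } int_option;

static const int_option NONE_INT = { false, 0 };

static int_option some_int(int v) { int_option o = { true, v }; return o; }

/* optdiv a b: NONE on division by zero, otherwise SOME (a / b). */
static int_option optdiv(int a, int b) {
    if (b == 0) return NONE_INT;
    return some_int(a / b);   /* C's / truncates toward zero */
}
```

One behavioral difference: ML's div rounds toward negative infinity, while C's / truncates toward zero, so the two versions of optdiv disagree on negative operands.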

ML Type Aliases
The keyword type is used for giving new names to existing types:
    type int_signal = int list;
    val v = [1, 2, 3] : int_signal;
    type ('a) signal = ('a) list;
    val v = [1, 2, 3] : (int) signal;
    val w = [1.1, 2.2, 3.3] : (real) signal;

Algebraic Datatypes
Example: Define a type for integer binary trees with two kinds of nodes, value-holding interior nodes and empty leaf nodes, and create an object for the tree below:
          3
         / \
        1   2
       / \ / \
      -  - -  -
Solution in C:
C
    // type declarations
    struct leaf { char unused; };   // ISO C does not allow empty structs
    struct node { int i; union tree *t1; union tree *t2; };
    union tree { struct leaf *l; struct node *n; };
    // creating the tree with required values
    union tree t0 = {.l = &((struct leaf) {0})};
    union tree t1 = {.n = &((struct node) {1, &t0, &t0})};
    union tree t2 = {.n = &((struct node) {2, &t0, &t0})};
    union tree tr = {.n = &((struct node) {3, &t1, &t2})};
Note that nothing in union tree records whether it currently holds a leaf or a node.

Algebraic Datatypes
Example (continued): the same integer binary tree, with value-holding interior nodes and empty leaf nodes:
          3
         / \
        1   2
       / \ / \
      -  - -  -
Solution in ML:
ML
    datatype Tree = Leaf | Node of int * Tree * Tree;
    val tr = Node (3, Node (1, Leaf, Leaf), Node (2, Leaf, Leaf));
Type Tree is a sum of two sub-types:
- a singleton type (denoted by Leaf)
- a product of int, Tree, and Tree (denoted by Node)
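For comparison, a C version can partly recover the safety of the ML definition by adding an explicit tag, as in this sketch (tree_kind, node, and sum are hypothetical names; nodes are heap-allocated and never freed, for brevity). The sum function mirrors an ML definition fun sum Leaf = 0 | sum (Node (v, l, r)) = v + sum l + sum r.

```c
#include <assert.h>
#include <stdlib.h>

/* Tagged C counterpart of: datatype Tree = Leaf | Node of int * Tree * Tree */
typedef enum { LEAF, NODE } tree_kind;

typedef struct tree {
    tree_kind kind;
    int value;                  /* used only when kind == NODE */
    struct tree *left, *right;  /* used only when kind == NODE */
} tree;

static tree leaf = { LEAF, 0, NULL, NULL };   /* one shared Leaf value */

/* Plays the role of the data constructor Node. */
static tree *node(int v, tree *l, tree *r) {
    tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = NODE; t->value = v; t->left = l; t->right = r;
    return t;
}

/* Case analysis on the tag, like pattern matching on Leaf / Node. */
static int sum(const tree *t) {
    switch (t->kind) {
    case LEAF: return 0;
    case NODE: return t->value + sum(t->left) + sum(t->right);
    }
    return 0;   /* unreachable for well-formed trees */
}
```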

Type Equivalence
If two types, typea and typeb, are equivalent, then objects of these types are interchangeable anywhere one is expected, e.g.
    typea x; typeb y;
    x = y;   // valid
    y = x;   // valid
Consider the following C struct definitions:
    struct S1 { char x; int y; char z[10]; };
    struct S2 { char x; int y; char z[10]; };
    struct S3 { char y; int x; char z[10]; };
    struct S4 { int y; char x; char z[10]; };
Question: Which of these types should be considered equivalent?

Type Equivalence
Which of these types should be considered equivalent?
    struct S1 { char x; int y; char z[10]; };
    struct S2 { char x; int y; char z[10]; };
    struct S3 { char y; int x; char z[10]; };
    struct S4 { int y; char x; char z[10]; };
Possible answers:
1. All of them: they all define records with a char, an int, and a char array.
2. The first three: same as above, plus their components are in the same order.
3. The first two: same as above, plus their component names are the same.
4. None: they each have a distinct name.

Type Equivalence
There are two distinct type equivalence models:
Structural Equivalence: Two types are structurally equivalent if their internal structures are the same, e.g.
- they are the same primitive type, or
- they are constructed with the same constructor and their corresponding components are equivalent.
Name Equivalence: Two types are name equivalent if they have the same name.

Type Equivalence
Back to the example:
    struct S1 { char x; int y; char z[10]; };
    struct S2 { char x; int y; char z[10]; };
    struct S3 { char y; int x; char z[10]; };
    struct S4 { int y; char x; char z[10]; };
With the structural equivalence model:
- S1 and S2 are equivalent.
- S1 and S4 could also be equivalent if the language does not insist on the order of components (e.g. ML).
- S1 and S3 are typically not equivalent, since component names are generally considered part of a record type.
With the name equivalence model:
- None of the struct types in the example is equivalent to any other.

Structural Equivalence
Structural equivalence is relatively easy to implement, except for recursive types. Consider:
    type t1 = int * t1
    type t2 = int * t2
Are these two types structurally equivalent? The answer is yes. However, a type-checking algorithm that simply expands the definitions and compares them would never terminate; it needs some programming trick to determine the answer, e.g. remembering the pairs of types it is already comparing and assuming those pairs to be equivalent.
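One way to make the comparison terminate, sketched below in C, is to keep a list of the type pairs currently being compared and to assume such a pair is equivalent when it is met again. The representation (a type is either int or a pair of types, so recursive types become cyclic graphs), the bound MAX_ASSUMED, and all names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A type is int or a pair (product); t1 = int * t1 is a cyclic graph. */
typedef enum { T_INT, T_PAIR } kind;
typedef struct type {
    kind k;
    const struct type *fst, *snd;   /* used only when k == T_PAIR */
} type;

/* Pairs of types currently assumed to be equivalent. */
#define MAX_ASSUMED 64
typedef struct { const type *a, *b; } assumption;

static bool equiv_rec(const type *a, const type *b,
                      assumption *assumed, size_t n) {
    if (a == b) return true;
    if (a->k != b->k) return false;
    if (a->k == T_INT) return true;
    /* The trick: a pair already under comparison further up the
       recursion is assumed equivalent instead of expanded again. */
    for (size_t i = 0; i < n; i++)
        if (assumed[i].a == a && assumed[i].b == b) return true;
    assert(n < MAX_ASSUMED);
    assumed[n].a = a;
    assumed[n].b = b;
    return equiv_rec(a->fst, b->fst, assumed, n + 1) &&
           equiv_rec(a->snd, b->snd, assumed, n + 1);
}

static bool struct_equiv(const type *a, const type *b) {
    assumption assumed[MAX_ASSUMED];
    return equiv_rec(a, b, assumed, 0);
}
```

Comparing t1 with t2 first records the pair (t1, t2); when the second components lead back to the same pair, the assumption is used and the recursion stops.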

Name Equivalence
Name equivalence is more restrictive, but it gives the programmer finer control over type equivalence:
- If the programmer wants variables to be of the same type, he/she can declare them with the same type name:
    typedef struct _S1 {char x; int y; char z[10];} S1;
    S1 x, y, z;
- If the programmer wants to distinguish variables whose types are structurally equivalent, he/she can declare them with different type names (Pascal-style pseudo-code for a language with strict name equivalence):
    type celsius = real;
    type fahrenheit = real;
    var x, y: celsius; z: fahrenheit;

PL's Type Equivalence Models: Pascal
Pascal uses a variant of name equivalence.
Each type declaration defines a new type, unless the right-hand side is a simple type name:
    type t = record a: integer; b: real end;
    type u = record a: integer; b: real end;   (* != t *)
    type v = t;                                (* just an abbreviation for t *)
Each anonymous type expression defines a new type:
    type t = record a: integer; b: real end;
    var x: t;
        y: record a: integer; b: real end;
        z, w: record a: integer; b: real end;
The types of x, y, and z are all different, but z and w have the same type.

PL's Type Equivalence Models: C
C uses structural equivalence for array and function types, but name equivalence for struct, union, and enum types. A typedef declaration defines an alias for an existing type:
    struct S1 {float x; float y;} a;
    struct S2 {float x; float y;} b;
    typedef struct S1 defaults;
    defaults c;
    a = b;   /* type error */
    a = c;   /* ok */
C's policy makes it easy to check equivalence of recursive types, which can only be built using structs:
    struct S1 {int x; struct S1 *y;} a;
    struct S2 {int x; struct S2 *y;} b;
    a = b;   /* type error */

PL's Type Equivalence Models: Java
Java uses structural equivalence for scalar and array types, but name equivalence for class and interface types.
A class or interface declaration creates a new type name, hence a new type, with the exception that a subclass type object may be assigned to a superclass variable.
Array size is not part of an array type.

PL's Type Equivalence Models: ML
ML uses structural equivalence, except that each datatype declaration creates a new type unlike all others:
    datatype fahrenheit = F of real
    datatype centigrade = C of real
    val a = F 150.0       (* F and C are data constructors *)
    val b = C 150.0
    if (a = b) ...        (* type error *)
Note that the use of a data constructor is mandatory:
    val c: fahrenheit = 150.0   (* type error *)
This makes it possible to uniquely identify the types of literals.

Type Conversion
Sometimes type equivalence is too strong a requirement for ensuring type correctness. For example,
- An expression e1 + e2 can still be valid even if e1 and e2 are not of exactly the same type (e.g. double + int is acceptable in many languages).
- An assignment x = e is generally valid if e's type can be converted into x's type.
In these cases, the language's type conversion rules are used.

Type Conversion
When a type mismatch occurs in an operation, some languages allow an (explicit or implicit) type conversion to be applied.
Explicit conversion: the programmer indicates how the conversion should be done, through type casting or conversion functions:
C
    int x = (int)3.14 + 5;
ML
    val x = 3.14 + real(5);
    val y = floor(3.14) + ceil(2.5) + trunc(3.9);
Implicit/automatic conversion (coercion): the compiler decides how the conversion should be done, based on the language's type coercion rules:
C
    int x = 3.14 + 5;   // 3.14 + 5.0 = 8.14, then truncated to 8
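The explicit and implicit C conversions above can be checked directly. In this sketch the function names are hypothetical; the last two functions show the int operand being coerced to double in mixed arithmetic.

```c
#include <assert.h>

/* Explicit conversion: the cast applies only to 3.14, so this is 3 + 5. */
static int explicit_conv(void) { return (int)3.14 + 5; }

/* Coercion: 5 becomes 5.0, and the double result 8.14 is then
   converted to int (by truncation) when it is returned. */
static int implicit_conv(void) { return 3.14 + 5; }

/* Mixed int/double arithmetic: the int operand is coerced to double. */
static double mixed_div(void) { return 7 / 2.0; }   /* 3.5 */
static int    int_div(void)   { return 7 / 2; }     /* both int: 3 */
```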

Type Coercion
More type coercion examples:
- An integer can be coerced into a real:
C/Java
    double d = 2;
- A scalar can be coerced into an array:
Fortran 90
    b = a * 2.4 - 1.2
- The result of a coercion may be of a new type:
Pascal
    type typea = 0..20;
         typeb = 10..20;
    var a: typea;
        b: typeb;
    ... a + b ...   (* a + b is of type integer *)
- The null pointer in many languages can be coerced into any pointer (to record) or object reference type.

Type Coercion
Coercion rules must be taken into consideration in type checking: before issuing a type-mismatch error, the type checker has to see whether the same expression can be interpreted as correct if a coercion is applied.
Question: Is coercion good or bad? There are two opposing philosophies:
- Maximizing flexibility: in C, coercion is the rule; only if no conversion is possible is a type mismatch flagged as an error.
- Maximizing type security: in Pascal, Ada, and ML, almost no coercion is provided.

Type Inference
Languages such as ML allow the user to provide incomplete type declaration information; their compiler infers any missing type information from the given declarations. For example,
- This is a complete type declaration:
    fun area(length:int, width:int):int = length * width;
- This is an incomplete type declaration:
    fun area(length, width):int = length * width;
The compiler infers as follows: the result type of length * width is int; the only way for the operation * to produce an int result is for both operands to be of type int; hence, length and width are both int.

Type Definition Revisited
Recall that
    Type = Set of Values + Operations
While built-in types all satisfy this definition, user-defined datatypes do not: they specify only the structure of types:
C
    struct stack {
        int top;
        int storage[100];
    };
The above code defines a new struct type, yet there is no definition for any operations.

Type Definition Revisited
Operations can be defined on objects of this type, but they are not part of the datatype:
C
    struct stack { int top; int storage[100]; };
    void push(int i, struct stack *s) { s->storage[(s->top)++] = i; }
    int pop(struct stack *s) { return s->storage[--(s->top)]; }
Consequently, there is no guarantee that objects of this type are manipulated via those operations only (e.g. any code can modify s->top directly).

Abstract Datatypes
Abstract datatypes (ADTs) are introduced to resolve this issue. An ADT groups a datatype and its operations into a cluster, and sets limits on accessing data objects; in other words,
    ADT = Datatype + Operations + Encapsulation
- The operations provide the only interface to access and manipulate the type.
- The structure and the implementation of the type are hidden from the user.

Encapsulation (Information Hiding)
Information hiding is one of the great themes of modern programming language design. It means that the implementation details of a data object are hidden from the user:
- The user does not need to know the hidden information in order to use the object; this allows the implementation to be handled separately.
- The user is not permitted to directly manipulate the hidden information, even if desiring to do so; this protects the integrity of the data object.
With ADTs, the programmer has total control over what portion of data objects should be visible, and what operations should be allowed on data objects.

ADT Illustration
Pseudo C
    abstype Stack {
      // private
      int top;
      int storage[100];
      void push(int i, Stack *s) { s->storage[(s->top)++] = i; }
      int pop(Stack *s) { return s->storage[--(s->top)]; }
      // public
      void push(int i, Stack *s);
      int pop(Stack *s);
    }
... this code looks awfully similar to an OOP class definition!

ADT in C++
ADTs can be implemented by OOP classes with private state:
C++
    class Stack {
    private:
        int top;
        int storage[100];
    public:
        Stack() { top = 0; }
        void push(int i) { storage[top++] = i; }
        int pop() { return storage[--top]; }
    };
    int main() {
        Stack s;     // create a stack
        s.push(1);   // push two elements
        s.push(2);
        s.pop();     // pop two elements
        s.pop();
    }

ADT in ML
ML has a built-in construct for ADTs: abstype
ML
    exception Empty;
    abstype stack = Stack of int list     (* hidden *)
    with
        val newstack = Stack nil;         (* public *)
        fun push (i, Stack l) = Stack (i::l);
        fun pop (Stack nil) = raise Empty
          | pop (Stack l) = (hd l, Stack (tl l))
    end;
    (* create a stack and push two elements *)
    val s = newstack;
    val s = push (1, s);
    val s = push (2, s);
    (* pop two elements *)
    val (i, s) = pop s;   (* i = 2 *)
    val (i, s) = pop s;   (* i = 1 *)

ADTs vs. OOP Classes
Question: How do ADTs differ from OOP classes?
- There is a superficial syntactic difference: in most OO languages, each function defined for a class object takes the object itself as an implicit argument:
    s.push(x);    // OO style, s is an implicit arg
    push(s, x);   // ADT style, s is an explicit arg
- There is a corresponding change in metaphor: instead of applying functions to values, we talk of sending messages to objects.
- OO languages have some form of inheritance.

Modules
Generalized from the ADT concept, the primary purpose of modules is to provide information hiding at a larger granularity. For example, a large program can be divided into several modules, each with a separate namespace.
Generally, a module consists of two separate parts:
- An interface, consisting of a set of names and their types
- An implementation, providing (hidden) detailed implementation for every entry in the interface
One advantage of this separation is that clients of module X can be compiled on the basis of the information in the interface of X, without needing access to the implementation of X (which might not even exist yet!)

ADTs vs. Modules
An ADT is one particular kind of module, containing:
- a single abstract type, with its representation
- a collection of operators, with their implementations
Modules, more generally, might contain:
- multiple type definitions
- arbitrary collections of functions (not necessarily abstract operators on the type)
- variables, constants, exceptions, etc.
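As an illustration of the more general kind of module, a C++ namespace can group several types, a constant, an exception, and ordinary functions that are not abstract operators on any single type. The geometry namespace below is a hypothetical sketch; none of its types hides its representation, which is exactly what distinguishes it from an ADT.

```cpp
#include <cassert>
#include <stdexcept>

namespace geometry {
    constexpr double pi = 3.141592653589793;      // a constant

    struct Point { double x, y; };                // plain, non-abstract types
    struct Circle { Point center; double r; };

    struct BadRadius : std::invalid_argument {    // an exception
        BadRadius() : std::invalid_argument("radius must be positive") {}
    };

    // Ordinary functions over the module's types.
    Circle make_circle(Point c, double r) {
        if (r <= 0) throw BadRadius();
        return Circle{c, r};
    }
    double area(const Circle& c) { return pi * c.r * c.r; }
    Point midpoint(Point a, Point b) {
        return Point{(a.x + b.x) / 2, (a.y + b.y) / 2};
    }
}
```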

Modules Example: Ada
Ada's modules are called packages. Specifications give the names and representations of types, and the signatures of the procedures in the package:
    package Stack is
       type Stack(size: positive) is private;
       procedure push(i: in integer; s: in out Stack);
       procedure pop(i: out integer; s: in out Stack);
    private
       type Int_Array is array (positive range <>) of integer;  -- record components need a named array type
       type Stack(size: positive) is record
          top: natural := 0;              -- number of elements, 0..size
          storage: Int_Array(1..size);
       end record;
    end Stack;
Bodies give the definitions of the procedures, and possibly additional definitions:
    package body Stack is
       procedure push(...) is begin ... statements ... end;
       procedure pop(...) is begin ... statements ... end;
       procedure other(...) is begin ... statements ... end;
    end Stack;
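A rough C++ counterpart of the Ada specification is a class whose private section holds the representation. As with Ada's private part, the representation is visible to the compiler (so clients can declare Stack objects directly) but client code cannot use it. In this sketch the constructor argument stands in for the size discriminant, and pop delivers its result through a reference parameter to mirror Ada's out mode; both are illustrative choices, not a general translation rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Stack {
public:
    // The capacity plays the role of Ada's `size` discriminant.
    explicit Stack(std::size_t size) : storage(size), top(0) {}

    // Ada: procedure push(i: in integer; s: in out Stack)
    void push(int i) {
        assert(top < storage.size());  // precondition: not full
        storage[top++] = i;
    }

    // Ada: procedure pop(i: out integer; s: in out Stack)
    void pop(int& i) {
        assert(top > 0);               // precondition: not empty
        i = storage[--top];
    }

private:
    std::vector<int> storage;  // the elements, capacity fixed at construction
    std::size_t top;           // number of elements, 0..size
};
```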

Modules Example: Modula-2
Modula-2 uses a pointer-based interface for its modules: an exported type can be left opaque, provided it is implemented as a pointer type, so there is no need to include type representations in the specification.
Specification:
    DEFINITION MODULE stack;
      TYPE stacktype;   (* opaque: implemented as a pointer type *)
      PROCEDURE push (VAR stk: stacktype; elm: INTEGER);
      PROCEDURE pop (VAR stk: stacktype) : INTEGER;
    END stack.
Implementation:
    IMPLEMENTATION MODULE stack;
      CONST max = 100;
      TYPE stacktype = POINTER TO RECORD
                         top: [0..max];
                         storage: ARRAY [1..max] OF INTEGER;
                       END;
      PROCEDURE push (...); BEGIN ... statements ... END push;
      PROCEDURE pop (...) : INTEGER; BEGIN ... statements ... END pop;
    END stack.
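C++ can express a similar opaque interface with an incomplete (forward-declared) struct: code that sees only the declarations can pass a StackRep* around but cannot touch its fields or allocate one, which is why this sketch adds hypothetical create and destroy functions. All names are illustrative; in a real program the declarations would sit in a header and the rest in a separately compiled source file. A closely related C++ pattern is known as the pimpl idiom.

```cpp
#include <cassert>

// ---- interface (what a client would see) ----
struct StackRep;               // incomplete type: its fields are not visible here
using stacktype = StackRep*;   // clients only ever hold a pointer

stacktype create();            // hypothetical: clients cannot allocate StackRep themselves
void push(stacktype stk, int elm);
int pop(stacktype stk);
void destroy(stacktype& stk);  // hypothetical: frees the stack and nulls the pointer

// ---- implementation (hidden from clients in a real program) ----
constexpr int stack_max = 100;

struct StackRep {
    int top = 0;               // number of elements, 0..stack_max
    int storage[stack_max];
};

stacktype create() { return new StackRep(); }

void push(stacktype stk, int elm) {
    assert(stk->top < stack_max);
    stk->storage[stk->top++] = elm;
}

int pop(stacktype stk) {
    assert(stk->top > 0);
    return stk->storage[--stk->top];
}

void destroy(stacktype& stk) {
    delete stk;
    stk = nullptr;
}
```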

Modules Example: C?
C provides a primitive form of (unnamed) modules, i.e., files:
    /* stack.h: interface */
    typedef struct stack {
        int top;
        int storage[100];
    } Stack;
    void push(int i, Stack *s);
    int pop(Stack *s);

    /* stack.c: implementation */
    #include "stack.h"
    void push(int i, Stack *s) { s->storage[(s->top)++] = i; }
    int pop(Stack *s) { return s->storage[--(s->top)]; }
- The top-level declarations in a .c file are its components.
- By default, all components are exported, but they can be hidden using the static specifier.
- The .h file serves as a rough kind of interface specification.
- Manual methods must be used to ensure that such files are accurate and complete, and that they are used where needed.
The major defect of C's approach is that all the names exported from all the files linked into a program occupy one global namespace, and hence must be unique. There is no dot notation.
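As a point of comparison, C++ keeps C's file-based scheme but adds namespaces, which give each group of names its own qualified prefix (int_stack::pop rather than pop), a rough counterpart of dot notation. An unnamed namespace gives its contents internal linkage, as static does in C. The two stack modules below are hypothetical and share one file only to keep the sketch self-contained.

```cpp
#include <cassert>

// Two "modules" can both export push and pop;
// qualification keeps the names apart.
namespace int_stack {
    namespace {                 // hidden, like a C `static` file-scope variable
        int storage[100];
        int top = 0;
    }
    void push(int i) { assert(top < 100); storage[top++] = i; }
    int pop() { assert(top > 0); return storage[--top]; }
    int size() { return top; }
}

namespace char_stack {
    namespace {
        char storage[100];
        int top = 0;
    }
    void push(char c) { assert(top < 100); storage[top++] = c; }
    char pop() { assert(top > 0); return storage[--top]; }
}

// In C, `static int top = 0;` at file scope hides a variable the same way,
// but an exported name such as `pop` may be defined only once per program.
```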

Parameterized Modules
Languages that support modules typically also support parameterized modules, which allow the same set of data and operations to be applied to different types of objects.
Parameterized modules are a static mechanism in most languages: the parameters are type-checked by the compiler, local names are resolved by static scope rules, etc.
Sometimes the behavior of the code differs significantly depending on the types being manipulated.
The following is an example of parameterized modules.

C++ Template Example
Template definition:
    template <class Type>
    class stack {
    public:
        stack() { storage = new Type[100]; size = 0; }
        ~stack() { delete[] storage; }
        stack(const stack&) = delete;             // a shallow copy would free storage twice
        stack& operator=(const stack&) = delete;
        void push(Type elm) { storage[size++] = elm; }
        Type pop() { return storage[--size]; }
    private:
        int size;
        Type *storage;
    };
Instantiation of the template:
    int main() {
        stack<int> s1;
        stack<double> s2;
        s1.push(5);
        s2.push(4.3);
    }
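The remark that a parameterized module's behavior can depend on the types being manipulated shows up in C++ as explicit specialization: a separate definition of a template for one particular argument. In the hypothetical Buffer template below, the generic bytes() multiplies the element count by sizeof(T), while the std::string specialization sums the string lengths, because sizeof(std::string) is a fixed size that does not depend on a string's length. This is one illustration, not the only way such differences arise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Generic version: every element is assumed to occupy sizeof(T) bytes.
template <class T>
struct Buffer {
    std::vector<T> items;
    void add(const T& x) { items.push_back(x); }
    std::size_t bytes() const { return items.size() * sizeof(T); }
};

// Explicit specialization for std::string: count characters instead.
template <>
struct Buffer<std::string> {
    std::vector<std::string> items;
    void add(const std::string& x) { items.push_back(x); }
    std::size_t bytes() const {
        std::size_t n = 0;
        for (const auto& s : items) n += s.size();
        return n;
    }
};
```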

Summary
Types are a way to classify data and regulate operations in a program.
There are multiple views of a type:
- Denotational: a type is just a set of values that share some properties
- Contextual: a type is specified by an interface, i.e., a set of operations that apply to its values
- Structural: a type is uniquely specified by its construction structure with the type constructors
A language's type system defines a set of semantic rules regulating the use of its types, including equivalence, conversion, and inference.
Most modern languages strive to be strongly typed by designing their type systems to catch all type errors.
ADTs and modules are important mechanisms for building reusable, modular code.