COS 140: Foundations of Computer Science

COS 140: Foundations of Computer Science Variables and Primitive Data Types Fall 2017 Introduction 3 What is a variable?......................................................... 3 Variable attributes.......................................................... 4 Binding................................................................. 5 Variable names 6 Variable names............................................................ 6 Name length.............................................................. 9 Special names............................................................. 10 Assignment 11 Data types 13 Primitive data types 14 Numeric types............................................................ 15 Booleans................................................................ 17 Characters............................................................... 18 Arrays.................................................................. 20 Records................................................................. 28 Pointers................................................................. 29 1

Homework Reading: Chapter 16 Homework: Exercises at end of chapter Homework due 10/23 Copyright c 2002 2017 UMaine Computer Science Department 2 / 29 Introduction 3 / 29 What is a variable? In this lecture: imperative/oo languages A variable is an area of memory that: can have a value can have its value changed by the program Different sizes: byte, word, multiple words Copyright c 2002 2017 UMaine Computer Science Department 3 / 29 2

Variable attributes Name what do we call it? Address where does it live in memory? Type what kind of thing can it hold? Value contents of the variable Scope who can see this variable? Lifetime duration of program, or shorter period? Copyright c 2002 2017 UMaine Computer Science Department 4 / 29 Binding Associates an attriburte value with an attribute Not just value attribute Two types: Static binding: occurs before run-time, not changed afterward Allows compiler to check for errors, but......need information about the binding at compile time Dynamic binding: takes place/is changed at run-time Copyright c 2002 2017 UMaine Computer Science Department 5 / 29 3

Variable names 6 / 29 Variable names Name: identifier associated variable Same name different variables in different contexts Different names same variable Unnamed variables: access via pointer Copyright c 2002 2017 UMaine Computer Science Department 6 / 29 Variable names Strings (symbols) in a symbol table Symbol tables created at compile/interpretation time Associate names with entities (e.g., variables) Copyright c 2002 2017 UMaine Computer Science Department 7 / 29 4

Variable names Which characters to allow? Have to recognize names during lexical analysis no spaces Sometimes special initial character e.g., a letter For enhanced readability, may allow connectors (e.g., -, ), case-sensitivity Copyright c 2002 2017 UMaine Computer Science Department 8 / 29 Length of variable names Names should be meaningful longer names But symbol table should be reasonable size, easy to maintain shorter names BASIC: one letter + optional number Can t complain about the size! Readable for mathematical applications, but not much else Limited number of characters e.g., FORTRAN (6 or 31), C (63 significant) Trades-off readability and wasted space Table maintenance easy, but whatever the size, someone always wants more Unlimited number of characters e.g., Ada, Lisp Complete flexibility, but......potentially great deal of space, hard to maintain Copyright c 2002 2017 UMaine Computer Science Department 9 / 29 5

Special names Some names not allowed or limited Reserved words can only be used in language-defined context(s) if in C, e.g. Predefined words defined in the language, but can be changed (e.g., Ada, C [libraries]) Keywords special meaning in some contexts, but can be used as a name (e.g., for in Lisp, if in FORTRAN) Copyright c 2002 2017 UMaine Computer Science Department 10 / 29 Assignment 11 / 29 Giving variables values Assignment construct binds a value to a variable Often, a construct such as: x = a+b x s value is bound to result of a+b Not algebra: can have: x = x+1 Copyright c 2002 2017 UMaine Computer Science Department 11 / 29 6

Assignment construct examples C, Java, Python... x = x + 1; Pascal x := x + 1 Lisp (setf x (1+ x)) R x <<- x + 1 or x + 1 -> x Smalltalk x x + 1 Forth x 1 + x! COBOL ADD X 1 GIVING X TCL set x x + 1 Copyright c 2002 2017 UMaine Computer Science Department 12 / 29 Data types 13 / 29 Data types All data has a type: integer, floating point, etc. Type species size, interpretation, operations for data All variables have a type attribute Depending on language, variables type static or dynamic Determining variable type: Declared somewhere in program (static, explicit) Inferred by compiler/interpreter (static, implicit; dynamic) Copyright c 2002 2017 UMaine Computer Science Department 13 / 29 7

Primitive data types 14 / 29 What are primitive data types? Primitive data types: defined by language Not (usually) defined in terms of other types Building blocks of new types Unstructured/atomic data types (scalars): Integers Floating point numbers Booleans Characters Pointers Structured data types: Strings Arrays Records Copyright c 2002 2017 UMaine Computer Science Department 14 / 29 Integers Integer type represents subset of the integers Z (duh) Most common data type You ve seen: sign-magnitude, two s complement Most languages: several sizes (byte, 2 byte, 4 byte...) Can t represent all possible integers in computer instead i...j, where i 0 j For two s complement, with n bits, can represent 2 n 1...2 n 1 1 E.g., 8 bits can represent 10000000 2 (= 128 10 ) to 01111111 2 (= 127 10 ). Some languages: unsigned integers (C) Copyright c 2002 2017 UMaine Computer Science Department 15 / 29 8

Floating point numbers Represent subset of real numbers R Usually two sizes Two parts: fraction and exponent E.g.: 0.1101 2, exponent 101 2 Fractional part = 1 2 + 1 4 + 0 8 + 1 16 = 0.8125 10 Exponent = 5, so multiply fraction by 2 5 So number is 0.8125 32 = 26.0 10 Can t represent all numbers accurately e.g., π, 1 3 Copyright c 2002 2017 UMaine Computer Science Department 16 / 29 Booleans True/false values Good for flags (switches), conditions to be checked Sometimes languages provide as primitive data type If not: 0 is false, everything else true (C) 0,, [], etc., is false, everything else is true (Python) Representation: smallest efficiently-addressable unit (often byte) Copyright c 2002 2017 UMaine Computer Science Department 17 / 29 9

Characters Character represented as a pattern of 1s and 0s Representations: ASCII 7 bits code; 8-bit variants of this (e.g., ISO 8859 1) are the most common codes used E.g.: a = 33, b = 34,...; space = 32 Unicode 16 bits (or longer); accommodates other alphabets and symbols E.g.: - codepoints 47196, 51060; ; EBCDIC 8 bits; old IBM code Copyright c 2002 2017 UMaine Computer Science Department 18 / 29 Strings Sequence of characters: textual data Special operators (e.g., substring, concatenation), special relational operators, sometimes special assignment Lengths: Fixed length: always same size Limited-dynamic can change, but there s a maximum (e.g., C) Length + sequence (often array) of characters Null-terminated sequence of characters Dynamic can change without limit (e.g., Lisp) more flexible, but overhead for allocation/deallocation Copyright c 2002 2017 UMaine Computer Science Department 19 / 29 10

Arrays Sometimes need to represent groups of related data items (e.g., integers, characters, etc.) An array is a data type that stores homogeneous data in contiguous memory Can be single-dimensional (vectors) or multi-dimensional Use index to find element Value of index doesn t affect time taken to find element random access storage Copyright c 2002 2017 UMaine Computer Science Department 20 / 29 Design issues for Arrays Syntax for array indices e.g., Temperatures(20) or Temperatures[20] Subscripts: what is the lower bound? bounds checking or not? must subscripts be integers? Number of dimensions allowed Usually no real limits C: limits to one dimension, but allows array elements to be themselves arrays Copyright c 2002 2017 UMaine Computer Science Department 21 / 29 11

Array descriptors What information is needed by the programming language/program about an array? Base address where is the first element Element type Index type doesn t have to be integer for some languages Index lower, upper bounds Number of locations in array Copyright c 2002 2017 UMaine Computer Science Department 22 / 29 Finding address of array element Element location (or address, A) depends on base address (B), index (I), index lower bound (L), and element size (S) A = B +(I L) S Given an array Taxes whose start location is 1024, element type is a long integer (8 bytes), and whose indices start at 0, find location of element Taxes[15] 1024 1028 1032... Array "Taxes" Taxes[0] Taxes[1] Taxes[15] A = 1024+(15 0) 8 = 1144 Copyright c 2002 2017 UMaine Computer Science Department 23 / 29 12

Another example Suppose array Foo has elements of a type that requires 6 bytes to store, that it begins at location 4096, and that its indices begin at 10. What is the address of Foo[201]? Answer: A = B +(I L) S = 5242 = 4096+(201 10) 6 Copyright c 2002 2017 UMaine Computer Science Department 24 / 29 Optimizing address computation Can write expression so static parts can be calculated at compile time and stored in a constant: A = B +(I L) S = B +(I S) (L S) = (I S)+(B L S) constant part Copyright c 2002 2017 UMaine Computer Science Department 25 / 29 13

Multidimensional arrays Can have more than 1 dimension E.g., 10 10 array is 2D, 100 entries, 10 rows, 10 columns Indices: [row,column] or [row][column] How to store dimensions? Row-major order: Rows are kept together, elements stored one row after another T: array 1..10, 1..10 of integers row 1 row 2 row 10... Column-major order: keeps columns together Row-major probably more common Copyright c 2002 2017 UMaine Computer Science Department 26 / 29 Address calculation Array T is an 2D array of temperatures taken over a 1-meter grid, 100 m on a side Assume row-major order, float data type (assume 4 bytes), base address for T = 2048, indices each start at 0 (so 0..99) What is the address of T[20,15]? First: what is start of row 20? A [20,0] = B +(I L) S = 2048+(20 0) (4 100) = 10,048 Next, find offset of element in row 20 same kind of calculation: A [20,15] = 10,048+(I L) S = 10,048+(15 0) 4 = 10,108 Copyright c 2002 2017 UMaine Computer Science Department 27 / 29 14

Heterogenous types: Records, structs Records contain heterogeneous related data E.g., data about employee: name (string), address (string), salary (fixed-point integer), height (float),... Most languages support this type: struct in C, defstruct in Lisp, record in Pascal, etc. Design issues: How are fields selected? How is field checked? Can we assign one structure to another? How to implement e.g., how to find element from selector? (field must have type, offset) Copyright c 2002 2017 UMaine Computer Science Department 28 / 29 Pointers These point to an object: contain its address Used for user-created dynamic variables (from heap) Used for indirect addressing (e.g., C) abstraction of assembly language s indirect addressing mode Can assign to and through them: int a = 3; // integer int* p; // pointer to integer p = &a; // p = a s address *p = 4; // a now = 4 Often used (in C, e.g.) to access array elements int a[100]; // 100-element array of ints int* p = a; // p = addr of a s start (no &) for (int i=0;i<100;i++) { *p = 0; p++; } Copyright c 2002 2017 UMaine Computer Science Department 29 / 29 15