Outline: Introduction, Primitive, Composite, Structured, Abstract

Introduction

A Data Type is a
- Collection of data objects (the possible r-values for a memory cell)
- Set of operations on those objects

Descriptor
- Collection of attributes for a variable

Binding a Data Type binds:
- Range of possible values
- Set of operations
- How the data will be stored
- Structure of the descriptor (dope vector)
- Signature implications of operations on the data

Primitive (Scalar)
- Not defined in terms of other types
- Usually tied to the hardware implementation

Composite
- Data type made up of similar primitives
- Complex structures created by the compiler
Structured
- An aggregate of other data types
- Heterogeneous composite

Abstract Data Type
- Combination of data and the methods that operate on that data
- Objects and classes

Primitive Types
- Common primitive types: Boolean, Character, Integer, Decimal (historical)

Boolean
- True or False (1 or 0)
- Improves program readability
- Implementation
  - Single bit: saves storage
  - Byte: usually faster access
  - C: implemented as an integer (False is 0, True is any other value)

Character
- Stored as either
  - ASCII codes: 7 bits (128 core characters)
  - Unicode: 16 bits (international characters)
- Possible values
  - Character data (A...Z, a...z)
  - Numeric digits (0...9)
  - Special symbols (! @ # $ % ^ & ...)
  - Escape codes (nul, cr, ack, bs, sp, esc, ...)

Integer
- Two's complement representation
- Commonly a four-byte representation
- Range: -(2^31) to 2^31 - 1
- Options: signed, unsigned, long, short, byte
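The two's complement rule (invert every bit, then add one) can be sketched in C on a single byte. This is a minimal illustration, not a library routine: the function names `twos_complement_negate` and `decode_signed8` are hypothetical, and the 8-bit width is chosen only to keep the bit patterns short.

```c
#include <stdint.h>

/* Negate an 8-bit pattern the two's complement way:
   invert every bit, then add one (the result wraps to 8 bits). */
uint8_t twos_complement_negate(uint8_t bits)
{
    return (uint8_t)(~bits + 1u);
}

/* Interpret an 8-bit pattern as a signed two's complement value.
   When the sign bit is set, the pattern stands for (unsigned value - 256). */
int decode_signed8(uint8_t bits)
{
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}
```

Calling `decode_signed8(twos_complement_negate(0x55))` takes the pattern for 85 and returns -85, which shows that negation and decoding agree with each other.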
Integer (8-bit two's complement examples)
- 0 1 0 1 0 1 0 1 = 1 + 4 + 16 + 64 = 85
- 1 0 1 0 1 0 1 0 has the sign bit set, so it is negative
  - Invert the bits: 0 1 0 1 0 1 0 1
  - Add 1: 0 1 0 1 0 1 1 0 = 2 + 4 + 16 + 64 = 86
  - The value is therefore -86

Floating Point
- IEEE Standard 754 for storage
- 32- and 64-bit precisions
- Numbers consist of three fields: Sign, Exponent, Mantissa

Sign Field (S)
- One bit
- Zero is positive

Exponent (E)
- Excess-127 notation (the exponent is biased)
- Stored values range from 0 to 255 (for an 8-bit exponent)
- Represents exponents ranging from -127 to 128 (stored values 0 and 255 are reserved for the special cases listed below)

Mantissa (M)
- First bit of a normalized mantissa is always one
- It is not explicitly stored; it is inserted by the hardware
- Effectively yields an extra bit of precision

Parameters               Value
E = 255 and M ≠ 0        An invalid number (NaN)
E = 255 and M = 0        ∞
0 < E < 255              2^(E-127) * (1.M)
E = 0 and M ≠ 0          2^(-126) * (0.M)
E = 0 and M = 0          0

32-bit precision
- 8-bit exponent, 23-bit mantissa
- Range: 10^-38 to 10^38

64-bit precision (Double Precision)
- 11-bit exponent, 52-bit mantissa
- Range: 10^-308 to 10^308

IEEE 32-bit Representation
- Fields: sign bit, 8-bit exponent, 24-bit mantissa (23 stored bits plus the hidden leading one)
- 0 00000000 00000000000000000000000 (all fields zero, the value 0)
- Value to encode: +1.0 = 2^0 * 1 = 2^(127-127) * 1.0
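The three fields can be pulled out of a C `float` with shifts and masks. This is a sketch under the assumption that `float` is a 32-bit IEEE 754 value on the target machine, which is common but not guaranteed by the C standard; the type `float_fields` and the function `split_float` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* The three IEEE 754 single-precision fields. */
typedef struct {
    unsigned sign;      /* S: 1 bit, zero is positive */
    unsigned exponent;  /* E: 8 bits, stored in excess-127 form */
    uint32_t mantissa;  /* M: the 23 stored bits, without the hidden one */
} float_fields;

/* Split a float into its fields.
   Assumes float is a 32-bit IEEE 754 value. */
float_fields split_float(float value)
{
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &value, sizeof bits);  /* copy the raw bit pattern */
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For `split_float(1.0f)` the fields are S = 0, E = 127, M = 0, matching 2^(127-127) * 1.0.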
IEEE 32-bit Representation
Fields: sign bit | 8-bit exponent | mantissa (23 stored bits, 24 with the hidden leading one)

0 01111111 00000000000000000000000
    +1.0 = 2^0 * 1 = 2^(127-127) * 1.0

0 01111111 10000000000000000000000
    +1.5 = 2^0 * 1.5 = 2^(127-127) * 1.1 (binary)

1 10000001 01000000000000000000000
    -5 = -(2^2 * 1.25) = -(2^(129-127) * 1.01 (binary))

Decimal
- Binary Coded Decimal (BCD)
- Stores a fixed number of digits
- One or two digits stored per byte
- Nine (9) is 1001 binary (four bits)
- For business applications (COBOL)
- Very accurate: decimal fractions are stored exactly, with no binary rounding error
- Limited range
- Wastes memory
- COBOL examples:
    02 UNIT-PRICE PICTURE IS 999V99.
    02 BAL-ON-HAND PICTURE IS 9(5).

Composite Types
- Increase readability
- Common implementations
  - Ordinal (enumerated) types
  - Sub-range
  - String data type
  - Arrays
Enumerated Types
- List (enumerate) the possible data values
- Values are associated with non-negative integers
- Values become symbolic constants
- Greatly increase program readability (colors, months, days of the week)
- Increase reliability: the compiler can check operations and ranges

Enumerated Types: Examples
- C, C++
    typedef enum {RED, BLUE, GREEN} colortype;
    colortype color = RED;
- Pascal
    type colortype = (RED, BLUE, GREEN);
    var color : colortype;
    color := BLUE;
- Java: enumerated interface
- Pascal, C, C++ do not allow reuse of names across type definitions.

Sub-Range Types
- Contiguous subsequence of an ordinal type
- Behaves as the parent type
- Increased reliability and readability
- Compiler can insert code to restrict the range
- Pascal: type posint = 0..MAXINT;
- C++: Range<0, MAXINT> i = x;   (Range here is a user-defined template, not a built-in type)

String Data Type
- Composed of a character sequence
  - ASCII characters (7/8-bit)
  - Unicode characters (16-bit)
- String-specific operations increase writability

String Operations
- Instantiating strings: 'test' vs "test"
- Concatenation: &, +, strcat()
- Relational operations: < >
  - Lexicographical ordering (by character code)
  - Java: .compareTo() method
- Input/output formatting
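Concatenation and lexicographical comparison can be sketched in C with the standard `strlen`, `memcpy`, and `strcmp` calls. The wrappers `concat_bounded` and `compare_lex` are hypothetical helpers written for this illustration; the bounds check is an added assumption, since plain `strcat()` does not check the destination size.

```c
#include <string.h>

/* Append src to the string already in dst.
   dst_size is the total size of the dst buffer in bytes.
   Returns 0 on success, -1 if the result would not fit. */
int concat_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    if (used + need + 1 > dst_size)
        return -1;                      /* no room for text plus '\0' */
    memcpy(dst + used, src, need + 1);  /* copies the terminator too */
    return 0;
}

/* Lexicographical comparison by character code.
   Returns -1, 0, or 1, normalizing the sign of strcmp's result. */
int compare_lex(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

Because ordering is by character code, `compare_lex("Zebra", "apple")` reports that "Zebra" sorts first in ASCII: uppercase letters have smaller codes than lowercase ones.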
Substring Operations
- Selection based on position
- Selection based on pattern

Substring Assignment: Overlay Issue
    str1 = "stringtest"
    str1[2:5] = str1[1:4]
    print str1
- What's printed?
  - sssssgtest if the copy is character by character
  - sstrigtest if the copy is a block copy

Memory Allocation for Strings
- Static length strings
- Limited dynamic length strings
- Dynamic length strings

Static Length String
- Fixed declared length
- FORTRAN, COBOL, Pascal
- Two parts of a string
  - Descriptor record (compile time): Static String, length (14), address
  - Data storage: R E L A T I V I T Y, padded with blanks
- Most implementations output the entire declared length.

Limited Dynamic Length String
- Variable length, up to the declared bound
- Descriptor record (compile time): Limited Dynamic String, maximum length (14), current length (10), address
- Data storage: R E L A T I V I T Y
- The length of the current string requires dynamic maintenance

Limited Dynamic Length String: C, C++
- Variable length, up to the declared bound
- Descriptor: Limited Dynamic String, maximum length (14), address
- Data storage: R E L A T I V I T Y
- C and C++ do not track the current length in the descriptor.
- Instead, the string is null terminated: \0 (0x00 hex)
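A limited dynamic string in the C style can be sketched as a fixed buffer whose current length is found by scanning for the terminator. The names `MAX_LEN`, `current_length`, and `assign_limited` are hypothetical; `current_length` does by hand what the standard `strlen` does, and truncating on overflow is one possible policy chosen for this sketch.

```c
#include <stddef.h>

#define MAX_LEN 14  /* declared bound; the buffer holds MAX_LEN + 1 bytes */

/* The current length is not stored anywhere:
   count characters until the '\0' terminator. */
size_t current_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Assign src to dst, truncating at MAX_LEN characters.
   dst is always left null terminated.
   Returns the number of characters stored. */
size_t assign_limited(char dst[MAX_LEN + 1], const char *src)
{
    size_t i = 0;
    while (i < MAX_LEN && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With a buffer declared as `char s[MAX_LEN + 1]`, assigning "RELATIVITY" stores 10 characters plus the terminator, and `current_length(s)` recovers the 10 by scanning.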
Dynamic Length String
- Unbounded length
- Perl, JavaScript, PHP
- Descriptor record (run time): Dynamic String, address
- Data storage: R E L A T I V I T Y
- String is always null terminated
- Only the characters in the current string are output.
- Provides potential space savings
- High cost in storage management

Arrays
Outline: Array Concepts, Array Storage, Array Slices, Associative Arrays

Arrays
- An aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate relative to the first element.
- Ordered sequence of identical objects
- Ordering determined by a scalar object
  - Usually integer or enumerated data
  - Referred to as the subscript or index

Arrays: Design Issues
- What types are legal for subscripts?
- Are subscripting expressions in element references range checked?
- When are subscript ranges bound?
- When does allocation take place?
- What is the maximum number of subscripts?
- Can array objects be initialized?
- Are any kinds of slices allowed?

Arrays: Initialization and Indexing
- Array initialization: a list of values placed in the array in the order in which the array elements are stored in memory
- Indexing: specifying an element's position
  - Mapping function from indices to elements
  - map(array_name, index_value) -> an element
Arrays: Array Operations
- APL: all about arrays
- Assignment: RHS can be an aggregate constant or an array name
- Concatenation: for all single-dimensioned arrays
- Relational operators (what is the exact meaning?)
- Intrinsics (functions or operators): matrix multiplication, vector dot product

Array Storage: Storage Allocation
- Static
- Fixed stack dynamic
- Stack dynamic
- Heap dynamic

Array Storage: Static
- Loaded into memory at program load
- Provides execution efficiency: no allocation/deallocation penalty
- FORTRAN 77

Array Storage: Fixed Stack Dynamic
- Subscript range is statically bound
- Storage is bound at elaboration (creation) of the activation record instance
- Space efficiency
- C/C++ locals not declared static

Array Storage: Stack Dynamic
- Subscript range and storage are dynamically bound
- Both become fixed once the variable is instantiated and stay fixed for the lifetime of the variable
- Flexible: the array size need not be known until the array is to be used

Array Storage: Heap Dynamic
- Subscript range and storage are dynamic
- Bindings are never fixed
- All Java arrays (objects) are heap dynamic
- PHP, Perl and JavaScript
- Arrays can change size as needed
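The four storage categories can be sketched side by side in C. This is an illustration under stated assumptions: the function names are hypothetical, the stack-dynamic case relies on a C99 variable-length array (optional in later C standards), and the heap-dynamic case models a size change with `realloc`.

```c
#include <stdlib.h>

/* Static: storage is bound before execution and zero-initialized. */
static int static_table[10];

int read_static(int i)
{
    return static_table[i];
}

/* Fixed stack dynamic: the size is a compile-time constant,
   storage is allocated each time the function is entered. */
int sum_fixed_stack(void)
{
    int local[5] = {1, 2, 3, 4, 5};
    int total = 0;
    for (int i = 0; i < 5; i++)
        total += local[i];
    return total;
}

/* Stack dynamic: the size n is known only at run time,
   then stays fixed for the lifetime of local. Requires n > 0. */
int sum_stack_dynamic(int n)
{
    int local[n];  /* C99 variable-length array */
    int total = 0;
    for (int i = 0; i < n; i++) {
        local[i] = i + 1;
        total += local[i];
    }
    return total;
}

/* Heap dynamic: the array is resized while it is alive.
   Fills 1..grown and returns the sum, or -1 if allocation fails. */
int sum_heap_dynamic(int initial, int grown)
{
    int *data = malloc((size_t)initial * sizeof *data);
    if (data == NULL)
        return -1;
    for (int i = 0; i < initial; i++)
        data[i] = i + 1;

    int *bigger = realloc(data, (size_t)grown * sizeof *data);
    if (bigger == NULL) {
        free(data);
        return -1;
    }
    data = bigger;
    for (int i = initial; i < grown; i++)
        data[i] = i + 1;

    int total = 0;
    for (int i = 0; i < grown; i++)
        total += data[i];
    free(data);
    return total;
}
```

The heap-dynamic version is the only one that pays an explicit allocation and deallocation cost, which is the trade-off the categories above describe.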
To store and retrieve data values, determine the element's L-value (address).

Array Subscript Range
- Upper and lower bounds: array[L1:U1, L2:U2]
- Lower bound is often 0 (zero)

Array Descriptor (Dope Vector)
- Single-dimension array
- Multi-dimension array

Determining an Element's Address
    var arr: array[-2..2, -3..3] of integer;
    arr[1, 2] := 6;
- Allocate storage beginning at α (the base address of the array)
    total_bytes = (U1 - L1 + 1) * (U2 - L2 + 1) * element_size
- es (element size) is based on the element type:
  - Integer: 4 bytes
  - Float: 4 bytes (single) or 8 bytes (double)
  - Char: 1 byte
  - Structures: based on the size of a pointer (4 bytes)

L-Value Access Function
    row_size = number_of_elements_in_row * element_size
    row_size = (U2 - L2 + 1) * es
    row = i - L1
    col = j - L2
    L-value(arr[i, j]) = α + row * row_size + col * es
- For the statement arr[1, 2] := 6; where is the 6 stored?
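The access function above can be sketched in C as an offset calculation relative to the base address α. The struct `array_bounds` and the functions `total_bytes` and `byte_offset` are hypothetical names for this illustration; the sketch assumes row-major storage and returns the offset from α, not an absolute address.

```c
#include <stddef.h>

/* Bounds and element size for a two-dimensional array. */
typedef struct {
    int lower1, upper1;  /* L1, U1: bounds of the first subscript */
    int lower2, upper2;  /* L2, U2: bounds of the second subscript */
    size_t elem_size;    /* es: bytes per element */
} array_bounds;

/* total_bytes = (U1 - L1 + 1) * (U2 - L2 + 1) * es */
size_t total_bytes(const array_bounds *b)
{
    size_t rows = (size_t)(b->upper1 - b->lower1 + 1);
    size_t cols = (size_t)(b->upper2 - b->lower2 + 1);
    return rows * cols * b->elem_size;
}

/* Byte offset of arr[i, j] from the base address:
   row * row_size + col * es. Assumes i and j are within bounds. */
size_t byte_offset(const array_bounds *b, int i, int j)
{
    size_t row_size = (size_t)(b->upper2 - b->lower2 + 1) * b->elem_size;
    size_t row = (size_t)(i - b->lower1);
    size_t col = (size_t)(j - b->lower2);
    return row * row_size + col * b->elem_size;
}
```

For bounds [-2..2, -3..3] and 4-byte elements, `byte_offset` for arr[1, 2] is 3 * 28 + 5 * 4 = 104 bytes, an offset of 26 elements from α.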
Logical Storage (rows indexed by i, columns by j)
- arr[-2..2, -3..3], in general arr[L1:U1, L2:U2]
- First row: arr[L1, L2], arr[L1, L2+1], arr[L1, L2+2], ..., arr[L1, U2]; that is, arr[-2, -3] through arr[-2, 3]
- The next row begins with arr[L1+1, L2]
- Where is arr[i, j] = arr[1, 2]?

Actual Storage: L-value of arr[1, 2]
(α is the base address of the array, es the element size)
    L-value(arr[i, j]) = α + rows * row_size + cols * es
                       = α + (i - L1) * row_size + (j - L2) * es
                       = α + (i - L1) * (U2 - L2 + 1) * es + (j - L2) * es
                       = α + es * ((i - L1) * (U2 - L2 + 1) + (j - L2))
                       = α + 4 * ((1 - (-2)) * (3 - (-3) + 1) + (2 - (-3)))
                       = α + 4 * (3 * 7 + 5)
                       = α + 4 * 26    (26-element offset)
                       = α + 104       (byte offset)
[Figure: storage cells at element offsets 0 to 26 from α, with the 6 stored at offset 26]

Virtual Origin (VO)
- The address of the element at i = 0, j = 0: arr[0, 0]
    L-value(arr[0, 0]) = α + es * ((i - L1) * (U2 - L2 + 1) + (j - L2))
                       = α + 4 * ((0 - (-2)) * (3 - (-3) + 1) + (0 - (-3)))
                       = α + 4 * (2 * 7 + 3)
                       = α + 4 * 17    (17-element offset)
                       = α + 68        (byte offset)

Dope Vector
- Use a dope vector to access an array element
  - VO: virtual origin (address)
  - Row size
  - Element size
[Figure: array storage, element offsets 0 to 17 from α, with VO at offset 17]

Array Slices
- A slice is a substructure of an array: row, column, plane
- A referencing mechanism
- Very useful in languages with array operations
- Slice examples (FORTRAN 90):
    INTEGER MAT(1:4, 1:4)
    MAT(1:4, 1)    the first column
    MAT(2, 1:4)    the second row
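C has no slice syntax, but a row or column slice can be sketched as a start position plus a stride over the underlying storage. The helper `copy_slice` is a hypothetical name, and the sketch assumes a 0-based, row-major layout, which differs from FORTRAN's 1-based, column-major arrays.

```c
#include <stddef.h>

/* Copy count elements out of base, beginning at base[start]
   and stepping stride elements each time. */
void copy_slice(const int *base, size_t start, size_t stride,
                size_t count, int *out)
{
    for (size_t k = 0; k < count; k++)
        out[k] = base[start + k * stride];
}
```

For a 4 x 4 matrix stored row by row in `int mat[16]`, the first column is `copy_slice(mat, 0, 4, 4, out)` (stride of one row) and the second row is `copy_slice(mat, 4, 1, 4, out)` (stride of one element).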
Associative Arrays
- An unordered collection of data elements indexed by an equal number of values (keys)
- Associative arrays in Perl (PHP is similar)
  - Declare and initialize
      %hi_temps = ("Monday" => 77, "Tuesday" => 79, );
  - Index and assign a value
      $hi_temps{"Wednesday"} = 83;
  - Remove elements (keys are case sensitive)
      delete $hi_temps{"Tuesday"};

Structured: Records and Unions

Records
- A heterogeneous aggregate of data elements where individual elements are identified by names
- Individual elements are called fields
    struct date { char *month; int day; int year; };

Records: C / C++ Declarations
- Structure type
    struct date {
        char *month;
        int day;
        int year;
    };
    struct date mydate;
- User type definition
    typedef struct {
        char *month;
        int day;
        int year;
    } datetype;
    datetype mydate;

Records: C / C++ Use
- The record descriptor exists at compile time
- Field access
    datetype mydate;
    mydate.day = 13;
    mydate.year = 2004;
- Dereferencing (pointers); the pointer must refer to a record before it is used
    datetype* pdate = &mydate;
    pdate->day = 13;
    pdate->year = 2004;
Records: Comparing Records and Arrays
- Array element access is slower than record field access
  - Subscripts are dynamic (data[i])
  - Field names are static (mydate.day)

Unions
- Variables allowed to store values of different types at different times during execution
- Pascal:
    type intreal = record
      case tagg : Boolean of
        true  : (blint : integer);
        false : (blreal : real);
    end;
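A rough C counterpart of the Pascal variant record is a struct that pairs a tag with a union. This is a sketch, not a translation the source provides: the names `is_int`, `as_int`, `as_real`, `make_int`, `make_real`, and `to_double` are hypothetical, and Pascal's `real` is assumed to map to C's `double`. Unlike Pascal, C does not tie the tag to the union, so the code must check it by hand.

```c
#include <stdbool.h>

/* Tagged union: is_int says which member of value is valid. */
typedef struct {
    bool is_int;         /* the tag, like tagg in the Pascal record */
    union {
        int as_int;      /* valid when is_int is true */
        double as_real;  /* valid when is_int is false */
    } value;
} intreal;

intreal make_int(int v)
{
    intreal x;
    x.is_int = true;
    x.value.as_int = v;
    return x;
}

intreal make_real(double v)
{
    intreal x;
    x.is_int = false;
    x.value.as_real = v;
    return x;
}

/* Read the stored value as a double, consulting the tag first. */
double to_double(intreal x)
{
    return x.is_int ? (double)x.value.as_int : x.value.as_real;
}
```

Reading a member that the tag does not select is the classic union error; routing every read through a function such as `to_double` keeps the check in one place.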