Data Representation
CSC 2400: Computer Systems

What kinds of data do we need to represent?
- Numbers: signed, unsigned, integers, floating point, complex, rational, irrational, ...
- Text: characters, strings, ...
- Images: pixels, colors, shapes, ...
- Sound
- Logical: true, false
- Instructions
- ...

Data type:
- representation and operations within the computer

Word-Oriented Memory Organization
- Addresses specify byte locations
  - The address of a word is the address of its first byte
  - Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)
[Figure: bytes at addresses 0000 through 0015, grouped into 32-bit words at addresses 0000, 0004, 0008, 0012 and into 64-bit words at addresses 0000, 0008]

Data Representations
- Sizes of C objects (in bytes)

  C Data Type            Sparc/Unix   Typical 32-bit   Intel IA32
  int                         4              4              4
  long int                    8              4              4
  char                        1              1              1
  short                       2              2              2
  float                       4              4              4
  double                      8              8              8
  long double                 8              8            10/12
  char * (any pointer)        8              4              4

Byte Ordering
- How should the bytes within a multi-byte word be ordered in memory?
- Conventions
  - Suns and older (pre-Intel) Macs are Big Endian machines
    - Least significant byte has the highest address
  - Alphas and PCs are Little Endian machines
    - Least significant byte has the lowest address

Byte Ordering Example
- Big Endian
  - Least significant byte has the highest address
- Little Endian
  - Least significant byte has the lowest address
- Example
  - Variable x has the 4-byte representation 0x01234567
  - Address given by &x is 0x100

  Big Endian       0x100  0x101  0x102  0x103
                     01     23     45     67

  Little Endian    0x100  0x101  0x102  0x103
                     67     45     23     01

Integers

Unsigned Integers
- An n-bit unsigned integer represents $2^n$ values: from 0 to $2^n - 1$.

  2^2  2^1  2^0   Value
   0    0    0      0
   0    0    1      1
   0    1    0      2
   0    1    1      3
   1    0    0      4
   1    0    1      5
   1    1    0      6
   1    1    1      7

Signed Integers
- How do computers differentiate between positive and negative integers?
  - Positive integers have most significant bit 0
  - Negative integers have most significant bit 1
- Negative integer representations:
  1. Sign Magnitude
  2. One's Complement
  3. Two's Complement

1. Sign Magnitude
- Use the leftmost bit to store the sign
  - Zero for a positive number
  - One for a negative number
- Examples
  0 0 1 0 1 1 0 0 → 44
  1 0 1 0 1 1 0 0 → -44

Sign Magnitude
- Hard to do arithmetic this way, so it is rarely used
  - What is the result of 44 + (-44)?

1. Sign Magnitude (contd.)
  0 0 1 0 1 1 0 0 → 44
  1 0 1 0 1 1 0 0 → -44
- For numbers represented on n bits:
  - Range of nonnegative integers: from 0 to $2^{n-1} - 1$
  - Range of negative integers: from $-(2^{n-1} - 1)$ to $-1$
  - The pattern 1 0 0 ... 0 is a second representation of zero ("negative zero")

2. One's Complement
- Leftmost bit is 0 for positive numbers
  0 0 1 0 1 1 0 0 → 44
- To obtain the corresponding negative number (-44), flip every bit:
  1 1 0 1 0 0 1 1 → -44
- So -44 is the one's complement of 44.

2. One's Complement (contd.)
- What is the result of 44 + (-44)?
    0 0 1 0 1 1 0 0   (44)
  + 1 1 0 1 0 0 1 1   (-44)
  = 1 1 1 1 1 1 1 1   (-0)
- Issue: two different representations for zero (0 0 0 0 0 0 0 0 and 1 1 1 1 1 1 1 1)

3. Two's Complement
- Leftmost bit is 0 for positive numbers
  0 0 1 0 1 1 0 0 → 44
- To obtain the corresponding negative number (-44), add 1 to the one's complement of 44:
    1 1 0 1 0 0 1 1   one's complement
  + 0 0 0 0 0 0 0 1
  = 1 1 0 1 0 1 0 0   two's complement

3. Two's Complement (contd.)
- What is the result of 44 + (-44)?
    0 0 1 0 1 1 0 0   (44)
  + 1 1 0 1 0 1 0 0   (-44)
  = 0 0 0 0 0 0 0 0   (0; the carry out of the leftmost bit is discarded)
- Used by most computer systems
- For numbers represented on n bits:
  - Range of nonnegative integers: from 0 to $2^{n-1} - 1$
  - Range of negative integers: from $-2^{n-1}$ to $-1$

Converting Two's Complement to Decimal
1. If the leading bit is one, take the two's complement to get a positive number.
2. Add the powers of 2 that have 1 in the corresponding bit positions.
3. If the original number was negative, add a minus sign.

Example, assuming 8-bit two's complement numbers:
$X = 01101000_{\text{two}} = 2^6 + 2^5 + 2^3 = 64 + 32 + 8 = 104_{\text{ten}}$

  n     0  1  2  3   4   5   6    7    8    9    10
  2^n   1  2  4  8  16  32  64  128  256  512  1024

More Examples (assuming 8-bit two's complement numbers)
$X = 00100111_{\text{two}} = 2^5 + 2^2 + 2^1 + 2^0 = 32 + 4 + 2 + 1 = 39_{\text{ten}}$
$X = 11100110_{\text{two}}$, so $-X = 00011010_{\text{two}} = 2^4 + 2^3 + 2^1 = 16 + 8 + 2 = 26_{\text{ten}}$ and $X = -26_{\text{ten}}$

Two's Complement Signed Integers
- The most significant bit is the sign bit and has weight $-2^{n-1}$.
- Range of an n-bit number: $-2^{n-1}$ through $2^{n-1} - 1$.
  - The most negative number ($-2^{n-1}$) has no positive counterpart.

  -2^3  2^2  2^1  2^0  Value      -2^3  2^2  2^1  2^0  Value
    0    0    0    0     0          1    0    0    0    -8
    0    0    0    1     1          1    0    0    1    -7
    0    0    1    0     2          1    0    1    0    -6
    0    0    1    1     3          1    0    1    1    -5
    0    1    0    0     4          1    1    0    0    -4
    0    1    0    1     5          1    1    0    1    -3
    0    1    1    0     6          1    1    1    0    -2
    0    1    1    1     7          1    1    1    1    -1

Number Representations Review
- Leftmost bit zero indicates a nonnegative number
- Sign Magnitude:
  - negative values: most significant bit 1 has weight 0 (it only marks the sign)
- One's Complement:
  - negative values are the 1's complement of positive values (flip every bit)
- Two's Complement:
  - negative values are the 2's complement of positive values
  - most significant bit 1 has weight $-2^{n-1}$

Fill in the Table

  Bit Pattern   Value (Sign Magnitude)   Value (One's Complement)   Value (Two's Complement)
  000
  001
  010
  011
  100
  101
  110
  111

Question
- What value does 10011001 represent?

ASCII

The ASCII Code: American Standard Code for Information Interchange
(The code of a character is its row number plus its column number.)

        0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
  0     NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
  16    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
  32    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
  48    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
  64    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
  80    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
  96    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
  112   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL

- Lower case: 97-122 and upper case: 65-90
- E.g., 'a' is 97 and 'A' is 65 (i.e., they are 32 apart)

char Constants
- C has char constants (sort of)*
- Examples

  Constant   Binary Representation (assuming ASCII)   Note
  'a'        01100001                                 letter
  '0'        00110000                                 digit
  '\x61'     01100001                                 hexadecimal form

- Use single quotes for a char constant
- Use double quotes for a string constant

* Technically 'a' is of type int; it is automatically converted (truncated) to type char when appropriate.

More char Constants
- Escape characters

  Constant   Binary Representation (assuming ASCII)   Note
  '\a'       00000111                                 alert (bell)
  '\b'       00001000                                 backspace
  '\f'       00001100                                 form feed
  '\n'       00001010                                 newline
  '\r'       00001101                                 carriage return
  '\t'       00001001                                 horizontal tab
  '\v'       00001011                                 vertical tab
  '\\'       01011100                                 backslash
  '\?'       00111111                                 question mark
  '\''       00100111                                 single quote
  '\"'       00100010                                 double quote
  '\0'       00000000                                 null

  Some of these are used often (e.g., '\n' and '\0').

Interesting Properties of ASCII Code
- What is the relationship between a decimal digit ('0', '1', ...) and its ASCII code?
- What is the difference between an upper-case letter ('A', 'B', ...) and its lower-case equivalent ('a', 'b', ...)?
- Given two ASCII characters, how do we tell which comes first in alphabetical order?
- Are 128 characters enough? (http://www.unicode.org/)

Other Data Types
- Floating point
  - IEEE representation, to be covered later
- Text strings
  - sequence of characters, terminated with the null character ('\0', value 0)
- Image
  - array of pixels
    - monochrome: one bit (1/0 = black/white)
    - color: red, green, blue (RGB) components (e.g., 8 bits each)
    - other properties: transparency
- Sound
  - sequence of fixed-point numbers

Limits of the Machine: How much memory space for data?

Storage Units
- 1 bit: smallest unit of memory
- 1 byte = 8 bits
- 1 word = 4 bytes (system dependent)

Words
- On most machines, bytes are assembled into larger structures called words, where a word is usually defined as the number of bits the processor can operate on at one time.
- Some machines use four-byte words (32 bits), others use eight-byte words (64 bits), and some use less conventional sizes.

The sizeof Operator

  Category   Operators
  sizeof     sizeof(type)
             sizeof(expr)

- Unique among operators: evaluated at compile time (except for variable-length arrays, introduced in C99)
- Evaluates to type size_t; on tanner, same as unsigned int
- Examples
    int i = 10;
    double d = 100.0;

    sizeof(int)          /* On tanner, evaluates to 4 */
    sizeof(i)            /* On tanner, evaluates to 4 */
    sizeof(double)       /* On tanner, evaluates to 8 */
    sizeof(d)            /* On tanner, evaluates to 8 */
    sizeof(d + 200.0)    /* On tanner, evaluates to 8 */

Determining Data Sizes
- To determine data sizes on your computer:
    #include <stdio.h>

    int main(void)
    {
        printf("char: %d\n", (int)sizeof(char));
        printf("short: %d\n", (int)sizeof(short));
        printf("int: %d\n", (int)sizeof(int));
        printf("long: %d\n", (int)sizeof(long));
        printf("float: %d\n", (int)sizeof(float));
        printf("double: %d\n", (int)sizeof(double));
        printf("long double: %d\n", (int)sizeof(long double));
        return 0;
    }
- Output on tanner:
    char: 1
    short: 2
    int: 4
    long: 4
    float: 4
    double: 8
    long double: 16

Overflow: Running Out of Room
- Adding two large integers together
  - Sum might be too large to store in the available bits
  - What happens?

      01000  (8)         11000  (-8)
    + 01001  (9)       + 10111  (-9)
    = 10001  (-15)     = 01111  (+15)

    Assuming 5-bit two's complement numbers.
- We have overflow if:
  - the signs of both operands are the same, and
  - the sign of the sum is different.

Overflow
- Unsigned integers
  - All arithmetic is modulo arithmetic (modulo $2^n$ for n-bit values)
  - Sum would just wrap around
- Signed integers
  - Can get nonsense values (in C, overflow in signed int arithmetic is undefined behavior)
  - Example with 16-bit integers
    - Sum: 10000 + 20000 + 30000
    - Result: -5536

Try It Out
- Write a program that computes the sum 10000 + 20000 + 30000.
  Use only short int variables in your code:
    short int a = 10000;
    short int b = 20000;
    short int c = 30000;
    short int sum = a + b + c;   /* computed as int (60000), then converted to short:
                                    implementation-defined, typically wraps */
    printf("sum = %d\n", sum);

Exercise
- Assume only four bits are available for representing integers, and signed integers are represented in two's complement.
- Compute the value of the expression 7 + 7.

Casting Signed to Unsigned
- C allows conversions from signed to unsigned:
    short int x = 15213;
    unsigned short int ux = (unsigned short) x;
    short int y = -15213;
    unsigned short int uy = (unsigned short) y;
- Resulting value
  - No change in bit representation (on two's complement machines)
  - Nonnegative values unchanged
    - ux = 15213
  - Negative values change into (large) positive values: $y + 2^{16}$
    - uy = 50323

Try It Out
- C code:
    char a = 0xFF;            /* plain char may be signed or unsigned (implementation-defined) */
    unsigned char b = 0xFF;
    printf("a = %d\n", a);
    printf("b = %d\n", b);

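Int to Char? Try It Out
    #include <stdio.h>

    int main(void)
    {
        char c = 0x81;   /* if char is signed, the stored value is implementation-defined (typically -127) */
        int i;

        i = c;           /* promotion: sign extension if char is signed */
        printf(" integer %x\n character %x\n", i, c);

        i = 0x87654321;  /* too large for a 32-bit int: implementation-defined result */
        c = i;           /* demotion: typically keeps only the low-order byte (0x21) */
        printf(" integer %d\n character code %d\n", i, c);
        return 0;
    }

C vs. Java: Cast Conversions
- Java: demotions are not automatic
- C: demotions are automatic
    int i;
    char c;

    i = c;          /* Implicit promotion */
                    /* C: sign extension if char is signed */
                    /* Java: zero extension (char is an unsigned 16-bit type) */
    c = i;          /* Implicit demotion */
                    /* Java: compile-time error */
                    /* C: OK; truncation */
    c = (char)i;    /* Explicit demotion */
                    /* Truncation in Java and C */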

What did we learn?
- The computer represents everything in binary
  - Integers, floating-point numbers, characters, ...
  - Pixels, sounds, colors, etc.
- Topics covered: memory as bytes and words, endianness, two's complement, ASCII, converting between hex, binary, and decimal, limits of the machine, overflow, and cast conversions