Data Representation in Computer Memory

A digital computer stores data as sequences of binary digits. The binary number system has two symbols, 0 and 1, called bits. The number of bits in a storage location is fixed and limited. These bit sequences are coded to represent a number, a character, or another type of data. An n-bit storage location can represent up to 2^n distinct values. For example, a 4-bit memory location can hold one of these sixteen binary patterns:

    0000    1000
    0001    1001
    0010    1010
    0011    1011
    0100    1100
    0101    1101
    0110    1110
    0111    1111

Hence, it can represent at most 16 distinct values. You could use them to represent 16 different numbers, such as 0 to 15.
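
The count of 2^n patterns can be checked directly in MATLAB. The following is a minimal sketch, not part of the original notes; the variable names and the choice n = 4 are assumptions made for illustration.

```matlab
% List every bit pattern that an n-bit location can hold.
n = 4;                            % assumed example width
values = 0:(2^n - 1);             % the 2^n distinct values, 0 to 15 for n = 4
patterns = dec2bin(values, n);    % one row per value, padded to n characters
disp(patterns)
fprintf('%d-bit location: %d distinct patterns\n', n, size(patterns, 1));
```

Changing n to 8 lists the 256 patterns of one byte.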

Representation of Integer Numbers in Memory

Computers use a fixed number of bits to represent an integer. The commonly used bit-lengths for integers are 8, 16, 32 and 64 bits. Besides bit-lengths, there are two representation schemes for integers:

  Unsigned integers: can represent zero and positive integers.
  Signed integers: can represent zero, positive and negative integers.

Three representation schemes have been proposed for signed integers:

  Sign-magnitude representation
  1's complement representation
  2's complement representation

Numeric Types in MATLAB

    double    double-precision numbers
    single    single-precision numbers
    int8      8-bit signed integers
    uint8     8-bit unsigned integers
    int16     16-bit signed integers
    uint16    16-bit unsigned integers
    int32     32-bit signed integers
    uint32    32-bit unsigned integers
    int64     64-bit signed integers
    uint64    64-bit unsigned integers

By default, MATLAB stores all numeric values as double-precision floating point.
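
A short sketch of how the default type and the explicit conversions behave. The variable names are illustrative assumptions; class and whos are standard MATLAB functions.

```matlab
% Compare the default numeric type with two explicit conversions.
a = 5;              % no type given, so MATLAB stores a double
b = int8(5);        % explicit 8-bit signed integer
c = single(5);      % explicit single-precision number

fprintf('a is %s, b is %s, c is %s\n', class(a), class(b), class(c));

% whos reports the storage size of each variable in bytes.
sa = whos('a'); sb = whos('b'); sc = whos('c');
fprintf('bytes: a = %d, b = %d, c = %d\n', sa.bytes, sb.bytes, sc.bytes);
```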

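n-bit Unsigned Integers

An n-bit pattern can represent 2^n distinct integers. An n-bit unsigned integer can represent the integers from 0 to 2^n - 1, as tabulated below:

    n     Minimum    Maximum
    8     0          2^8 - 1  = 255
    16    0          2^16 - 1 = 65,535
    32    0          2^32 - 1 = 4,294,967,295
    64    0          2^64 - 1 = 18,446,744,073,709,551,615

Unsigned integers can represent zero and positive integers, but not negative integers.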

Example 1: Suppose that n=8 and the binary pattern is (0100 0001)_2. The value of this unsigned integer is 1×2^0 + 1×2^6 = 65.

Example 2: Suppose that n=16 and the binary pattern is (0001 0000 0000 1000)_2. The value of this unsigned integer is 1×2^3 + 1×2^12 = 4104.

Example 3: Suppose that n=16 and the binary pattern is (0000 0000 0000 0000)_2. The value of this unsigned integer is 0.
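
The three examples can be reproduced by summing the power of two for every bit that is set. This is a minimal sketch; the variable names are assumptions, and MATLAB's built-in bin2dec performs the same conversion.

```matlab
% Evaluate unsigned bit patterns as a weighted sum of powers of two.
patterns = {'0100 0001', '0001 0000 0000 1000', '0000 0000 0000 0000'};
values = zeros(1, numel(patterns));
for k = 1:numel(patterns)
    bits = patterns{k};
    bits = bits(bits ~= ' ');          % drop the spacing between groups
    n = numel(bits);
    weights = 2 .^ (n-1:-1:0);         % 2^(n-1) ... 2^0, leftmost bit first
    values(k) = sum((bits == '1') .* weights);
    assert(values(k) == bin2dec(bits));   % cross-check with the built-in
    fprintf('(%s)_2 = %d\n', patterns{k}, values(k));
end
```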

Signed Integers

Signed integers can represent zero, positive integers and negative integers. Three representation schemes are available for signed integers:

  Sign-magnitude representation
  1's complement representation
  2's complement representation

The sign bit (the most significant, leftmost bit) represents the sign of the integer: 0 means a positive integer and 1 means a negative integer.

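Range of n-bit 2's Complement Signed Integers

An n-bit 2's complement signed integer can represent the integers from -2^(n-1) to +2^(n-1)-1, as tabulated below:

    n     Minimum                                 Maximum
    8     -2^7  = -128                            2^7 - 1  = 127
    16    -2^15 = -32,768                         2^15 - 1 = 32,767
    32    -2^31 = -2,147,483,648                  2^31 - 1 = 2,147,483,647
    64    -2^63 = -9,223,372,036,854,775,808      2^63 - 1 = 9,223,372,036,854,775,807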
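
The range formula can be compared against the limits MATLAB reports through intmin and intmax. This sketch assumes the check is done in double arithmetic, so int64 is left out because 2^63 - 1 cannot be held exactly in a double.

```matlab
% Compare -2^(n-1) and 2^(n-1)-1 with MATLAB's reported integer limits.
types = {'int8', 'int16', 'int32'};
bitsPerType = [8 16 32];
for k = 1:numel(types)
    n = bitsPerType(k);
    lo = -2^(n-1);          % smallest n-bit 2's complement value
    hi = 2^(n-1) - 1;       % largest n-bit 2's complement value
    assert(lo == double(intmin(types{k})));
    assert(hi == double(intmax(types{k})));
    fprintf('%-5s  %d to %d\n', types{k}, lo, hi);
end
```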

Computers use the 2's complement representation. The most significant bit (the leftmost bit) is the sign bit, with a value of 0 representing positive integers and 1 representing negative integers. The remaining n-1 bits represent the value of the integer, as follows:

  For positive integers, the absolute value of the integer is equal to "the magnitude of the (n-1)-bit binary pattern".
  For negative integers, the absolute value of the integer is equal to "the magnitude of the complement of the (n-1)-bit binary pattern plus one" (hence called 2's complement).

Figure: 8-bit signed integer representation.

Using an 8-bit signed integer to represent integers in computer memory:

    d7 = S    d6    d5    d4    d3    d2    d1    d0

The first bit contains the sign information (S = 0 for +, S = 1 for -); the next 7 bits contain the digits of the number.

Using a 16-bit signed integer to represent integers in computer memory:

    d15 = S    d14    ...    d2    d1    d0

The first bit contains the sign information (S = 0 for +, S = 1 for -); the next 15 bits contain the digits of the number.

Using a 32-bit signed integer to represent integers in computer memory: the first bit contains the sign information (S = 0 for +, S = 1 for -); the next 31 bits contain the digits of the number.

Example 1: Suppose that n=8 and the binary representation is (0 100 0001)_2.
  Sign bit is 0, so the integer is positive.
  Absolute value is (100 0001)_2 = 65.
  Hence, the integer is +65.

Example 2: Suppose that n=8 and the binary representation is (1 000 0001)_2.
  Sign bit is 1, so the integer is negative.
  Absolute value is the complement of (000 0001)_2 plus 1, i.e., (111 1110)_2 + (1)_2 = 127.
  Hence, the integer is -127.

Example 3: Suppose that n=8 and the binary representation is (0 000 0000)_2.
  Sign bit is 0, so the integer is positive.
  Absolute value is (000 0000)_2 = 0.
  Hence, the integer is 0.

Example 4: Suppose that n=8 and the binary representation is (1 111 1111)_2.
  Sign bit is 1, so the integer is negative.
  Absolute value is the complement of (111 1111)_2 plus 1, i.e., (000 0000)_2 + (1)_2 = 1.
  Hence, the integer is -1.
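
The decoding rule used in these four examples can be sketched in MATLAB as follows. The variable names are illustrative assumptions; the cross-check uses typecast, which reinterprets the same byte as an int8 without changing its bits.

```matlab
% Decode 8-bit 2's complement patterns with the "complement plus one" rule.
patterns = {'0 100 0001', '1 000 0001', '0 000 0000', '1 111 1111'};
decoded = zeros(1, numel(patterns));
for k = 1:numel(patterns)
    bits = patterns{k};
    bits = bits(bits ~= ' ');               % drop the spacing
    lowBits = bits(2:end);                  % the n-1 bits after the sign bit
    if bits(1) == '0'
        decoded(k) = bin2dec(lowBits);      % positive: read the magnitude directly
    else
        flipped = char('0' + (lowBits == '0'));   % complement every bit
        decoded(k) = -(bin2dec(flipped) + 1);     % then add one and negate
    end
    % Cross-check: reinterpret the same byte as a signed 8-bit integer.
    raw = uint8(bin2dec(bits));
    assert(decoded(k) == double(typecast(raw, 'int8')));
    fprintf('(%s)_2 = %d\n', patterns{k}, decoded(k));
end
```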

Computers use 2's complement to represent signed integers because:

  There is only one representation for the number zero in 2's complement, instead of the two representations in sign-magnitude and 1's complement.
  Positive and negative integers can be treated together in addition and subtraction. Subtraction can be carried out using the "addition logic".

Example 1: Addition of two positive integers. Suppose that n=8, 65 + 5 = 70.

     65D    (0100 0001)_2
      5D    (0000 0101)_2
    sum     (0100 0110)_2 = 70D (OK)

Example 2: Subtraction is treated as the addition of a positive and a negative integer. Suppose that n=8, 65 - 5 = 65 + (-5) = 60.

     65D    (0100 0001)_2
     -5D    (1111 1011)_2    since 5 is (0000 0101)_2, -5 is (1111 1010)_2 + 1
    sum     (0011 1100)_2 = 60D (discard carry - OK)

Example 3: Addition of two negative integers. Suppose that n=8, -65 - 5 = (-65) + (-5) = -70.

    -65D    (1011 1111)_2
     -5D    (1111 1011)_2
    sum     (1011 1010)_2 = -70D (discard carry - OK)
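
The "add, then discard the carry" step can be imitated with modular arithmetic on the 8-bit patterns. This is a sketch under the assumption that patterns are held as ordinary doubles in the range 0 to 255; toPattern and toSigned are hypothetical helper names, not MATLAB built-ins.

```matlab
% Add 8-bit 2's complement patterns and discard the carry out of bit 7.
n = 8;
toPattern = @(v) mod(v, 2^n);                  % signed value -> pattern as 0..2^n-1
toSigned  = @(p) p - 2^n * (p >= 2^(n-1));     % pattern -> signed value
pairs = [65 5; 65 -5; -65 -5];                 % the three examples above
results = zeros(size(pairs, 1), 1);
for k = 1:size(pairs, 1)
    a = toPattern(pairs(k, 1));
    b = toPattern(pairs(k, 2));
    s = mod(a + b, 2^n);                       % keeping n bits drops the carry
    results(k) = toSigned(s);
    fprintf('%s + %s = %s  ->  %d\n', ...
        dec2bin(a, n), dec2bin(b, n), dec2bin(s, n), results(k));
end
```

The same adder handles all three cases, which is the point of the 2's complement scheme: no separate subtraction logic is needed.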

For n=8, the range of 2's complement signed integers is -128 to +127. During addition (and subtraction), it is important to check whether the result exceeds this range, in other words, whether overflow or underflow has occurred.

Example 4: Overflow. Suppose that n=8, 127D + 2D = 129D (overflow - beyond the range).

    127D    0111 1111B
      2D    0000 0010B
    sum     1000 0001B

129D should be obtained, but it lies above the upper limit. The 8-bit pattern 1000 0001B reads as -127D in 2's complement, while MATLAB's integer arithmetic truncates the result to the upper limit, 127 (wrong in both cases).

Example 5: Underflow. Suppose that n=8, -125D - 5D = -130D (underflow - below the range). -130D should be obtained, but MATLAB truncates the result to the lower limit, -128 (wrong).

    x = int8(127); y = int8(2); z = int8(0);
    z = x + y        % z = 127   Not correct

    x = int8(-125); y = int8(-5); z = int8(0);
    z = x + y        % z = -128  Not correct
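
One way to notice such out-of-range results is to repeat the sum in double precision and compare it with the int8 limits. The sketch below is an illustration with assumed variable names; it also shows what the raw 8-bit pattern would read as if the carry were simply discarded instead of the result being limited.

```matlab
% Detect out-of-range int8 sums and compare limiting with 8-bit wrap-around.
cases = [127 2; -125 -5];
saturated  = zeros(1, 2, 'int8');
wrapped    = zeros(1, 2, 'int8');
outOfRange = false(1, 2);
for k = 1:size(cases, 1)
    x = int8(cases(k, 1));
    y = int8(cases(k, 2));
    saturated(k) = x + y;                       % MATLAB limits the result to the int8 range
    trueSum = double(x) + double(y);            % exact sum in double precision
    outOfRange(k) = trueSum > double(intmax('int8')) || ...
                    trueSum < double(intmin('int8'));
    % Keep only the low 8 bits and reinterpret them as a signed byte.
    wrapped(k) = typecast(uint8(mod(trueSum, 256)), 'int8');
    fprintf('true sum %d: int8 result %d, wrapped 8-bit result %d\n', ...
        trueSum, saturated(k), wrapped(k));
end
```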

Floating-Point Numbers

    Type      Storage size    Value range               Precision
    float     4 bytes         1.2E-38 to 3.4E+38        6 significant decimal digits
    double    8 bytes         2.2E-308 to 1.7E+308      15 significant decimal digits

Single-Precision Floating Point

    Bits        Usage
    31          Sign (0 = positive, 1 = negative)
    30 to 23    Exponent, biased by 127
    22 to 0     Fraction f of the number 1.f

Double-Precision Floating Point

    Bits        Usage
    63          Sign (0 = positive, 1 = negative)
    62 to 52    Exponent, biased by 1023
    51 to 0     Fraction f of the number 1.f
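
The value ranges in the table can be compared with the limits MATLAB itself reports. A minimal sketch using the built-in realmin and realmax functions, which return the smallest positive normalized value and the largest finite value of each type; the variable names are assumptions.

```matlab
% Print the smallest normalized and largest finite value of each type.
sMin = realmin('single');  sMax = realmax('single');
dMin = realmin('double');  dMax = realmax('double');
fprintf('single: %g to %g\n', sMin, sMax);
fprintf('double: %g to %g\n', dMin, dMax);
```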

Single-Precision Floating-Point Format

Single precision is termed REAL in Fortran, float in C, C++, C# and Java, Float in Haskell, Single in Object Pascal (Delphi) and Visual Basic, and single in MATLAB.

The real value assumed by a given 32-bit binary32 datum with sign bit b31, biased exponent e (the 8-bit unsigned integer in bits 30 to 23) and a 23-bit fraction is

    value = (-1)^b31 × 2^(e-127) × (1.b22 b21 ... b0)_2

or

    value = (-1)^b31 × 2^(e-127) × (1 + b22×2^(-1) + b21×2^(-2) + ... + b0×2^(-23))

The true significand includes 23 fraction bits to the right of the binary point and an implicit leading bit (to the left of the binary point) with value 1, unless the exponent is stored with all zeros. Thus only 23 fraction bits of the significand appear in the memory format, but the total precision is 24 bits (equivalent to log10(2^24) ≈ 7.225 decimal digits). In MATLAB: log10(2^24)

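Example: an exponent field of (0111 1100)_2 = 124 gives a scale factor of 2^(e-127) = 2^(124-127) = 2^(-3).

Example: sign bit 1, exponent field (1000 0111)_2, fraction bits b22 = b21 = 1 and all other fraction bits 0.

    Sign        (-1)^sign = (-1)^1 = -1
    Exponent    (1000 0111)_2 = 135, so 2^(e-127) = 2^(135-127) = 2^8
    Mantissa    1 + b22×2^(-1) + b21×2^(-2) = 1 + 1×2^(-1) + 1×2^(-2) = 1 + 1/2 + 1/4 = 1.75
    Value       -1 × 1.75 × 2^8 = -448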
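
The fields of the last example (sign 1, exponent 135, mantissa 1.75) can be extracted from a stored single in MATLAB. This is a sketch rather than a general decoder: it assumes a normalized number and does not handle zeros, denormals, Inf or NaN. The variable names are illustrative.

```matlab
% Split a single-precision number into sign, exponent and fraction fields.
x = single(-448);
u = typecast(x, 'uint32');                     % same 32 bits, viewed as an unsigned integer
bits = dec2bin(double(u), 32);                 % bit 31 first, bit 0 last

signBit = bitshift(u, -31);                    % bit 31
e = bitand(bitshift(u, -23), uint32(255));     % bits 30 to 23, biased exponent
f = bitand(u, uint32(2^23 - 1));               % bits 22 to 0, fraction

mantissa = 1 + double(f) / 2^23;               % implicit leading 1 plus the fraction
value = (-1)^double(signBit) * mantissa * 2^(double(e) - 127);

fprintf('bits     = %s %s %s\n', bits(1), bits(2:9), bits(10:32));
fprintf('sign     = %d\n', signBit);
fprintf('exponent = %d\n', e);
fprintf('mantissa = %g\n', mantissa);
fprintf('value    = %g\n', value);
```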

    format long
    x = single(3); y = single(0.01); v = single(0); z = single(0);
    v = x + y        % v = 3.0100000
    z = x - y        % z = 2.9900000

    format long
    x = single(3); y = single(0.000001); v = single(0); z = single(0);
    v = x + y        % v = 3.0000010
    z = x - y        % z = 2.9999990

    format long
    x = single(3); y = single(0.0000001); v = single(0); z = single(0);
    v = x + y        % v = 3   Not correct
    z = x - y        % z = 3   Not correct

The single-precision type can represent a number to about log10(2^24) ≈ 7.225 significant decimal digits.
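
Why the last case returns 3 can be seen from the spacing between neighbouring single-precision numbers near 3, which MATLAB reports through eps. The following is a minimal sketch with assumed variable names; an increment smaller than half the spacing is rounded away.

```matlab
% Compare each increment with the spacing of single-precision numbers near 3.
x = single(3);
spacing = eps(x);                    % gap between 3 and the next larger single
fprintf('eps(single(3)) = %g\n', spacing);

increments = single([1e-2 1e-6 1e-7]);
changed = false(size(increments));
for k = 1:numel(increments)
    y = increments(k);
    changed(k) = (x + y) ~= x;       % true if adding y produces a different number
    fprintf('y = %g: x + y differs from x? %d\n', y, changed(k));
end
```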

Double-Precision Floating-Point Format

The real value assumed by a given 64-bit double-precision datum with sign bit b63, biased exponent e and a 52-bit fraction is

    value = (-1)^b63 × 2^(e-1023) × (1.b51 b50 ... b0)_2

or

    value = (-1)^b63 × 2^(e-1023) × (1 + b51×2^(-1) + b50×2^(-2) + ... + b0×2^(-52))

This gives 15 to 17 significant decimal digits of precision. The format is written with the significand having an implicit integer bit of value 1. With the 52 bits of the fraction significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 × log10(2) ≈ 15.955).
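
The same field extraction works for a double, with an 11-bit exponent biased by 1023 and a 52-bit fraction. This sketch assumes a normalized number; the value -448 and the variable names are chosen only for illustration.

```matlab
% Split a double-precision number into sign, exponent and fraction fields.
x = -448;                                      % stored as a double by default
u = typecast(x, 'uint64');                     % same 64 bits, viewed as an unsigned integer

signBit = bitshift(u, -63);                    % bit 63
e = bitand(bitshift(u, -52), uint64(2047));    % bits 62 to 52, biased exponent
f = bitand(u, uint64(2^52 - 1));               % bits 51 to 0, fraction

mantissa = 1 + double(f) / 2^52;               % implicit leading 1 plus the fraction
value = (-1)^double(signBit) * mantissa * 2^(double(e) - 1023);

digits = 53 * log10(2);                        % decimal digits carried by 53 bits
fprintf('sign = %d, exponent = %d, mantissa = %g, value = %g\n', ...
    signBit, e, mantissa, value);
fprintf('53 bits correspond to about %.3f decimal digits\n', digits);
```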