Integers and Floating Point

CMPE12: More about Numbers, Integers and Floating Point (rest of textbook Chapter 2, plus more)

Review: Unsigned Integers
A string of 0s and 1s that represents a non-negative integer.
The string is $X_{n-1} X_{n-2} \ldots X_1 X_0$, where each $X_k$ is either 0 or 1 and has a weight of $2^k$.
The represented number is the sum of the weights of all the 1s in the string: $N = \sum_{k=0}^{n-1} X_k \, 2^k$.
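A minimal Python sketch of the weighted sum above. The function name `unsigned_value` is hypothetical; Python's built-in `int(bits, 2)` computes the same thing.

```python
def unsigned_value(bits: str) -> int:
    """Sum the weight 2**k of every 1 in the string X_{n-1} ... X_1 X_0."""
    n = len(bits)
    total = 0
    for i, ch in enumerate(bits):
        k = n - 1 - i          # the leftmost character is X_{n-1}
        if ch == "1":
            total += 2 ** k
    return total

print(unsigned_value("0010110"))  # 22 (16 + 4 + 2)
```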

Signed Integers
Signed representations allow us to represent both positive and negative integers. Four important types:
- Sign and magnitude: the leftmost bit is the sign and the remaining bits are the unsigned magnitude.
- 1's complement: the additive inverse of a number is its bitwise complement.
- 2's complement: the additive inverse of a number is its bitwise complement plus one.
- Bias (excess) notation: a fixed bias is subtracted from the unsigned value of the bit pattern to get the represented value.

Quick Review of Signed Integers (7 bits)
Decimal   1's complement   2's complement   Sign and magnitude
 22       0010110          0010110          0010110
-22       1101001          1101010          1010110
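The table can be reproduced with a short Python sketch, assuming a 7-bit width. The helper names are hypothetical; `twos_complement` relies on the fact that, for negative v, v mod 2^n equals the bitwise complement of |v| plus one.

```python
def sign_magnitude(v: int, n: int) -> str:
    """Leftmost bit is the sign; the remaining n-1 bits are the unsigned magnitude."""
    assert abs(v) < 2 ** (n - 1), "magnitude does not fit in n-1 bits"
    return ("1" if v < 0 else "0") + format(abs(v), f"0{n - 1}b")

def ones_complement(v: int, n: int) -> str:
    """Negative numbers are the bitwise complement of the positive pattern."""
    assert abs(v) < 2 ** (n - 1), "value does not fit"
    pattern = format(abs(v), f"0{n}b")
    if v < 0:
        pattern = "".join("1" if b == "0" else "0" for b in pattern)
    return pattern

def twos_complement(v: int, n: int) -> str:
    """v mod 2**n equals (complement of |v|) + 1 when v is negative."""
    assert -(2 ** (n - 1)) <= v < 2 ** (n - 1), "value does not fit"
    return format(v % (2 ** n), f"0{n}b")

for v in (22, -22):
    print(v, ones_complement(v, 7), twos_complement(v, 7), sign_magnitude(v, 7))
```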

Biased Notation
How does it work? Each value v is stored as the unsigned number v + bias, so the most negative value, -bias, is encoded as 000...000.
Advantages:
- Preserves lexical order: comparing encodings as unsigned numbers gives the same order as comparing the values.
- Single zero.
- Most versatile.
Disadvantages:
- Addition and subtraction require one additional operation to adjust the bias.
Example (bias 3 = 011):
Rep   Value
000   -3
001   -2
010   -1
011    0
100    1
101    2
110    3
111    4

Conversion: Decimal <-> BiasX
- Decimal -> BiasX: add X, then convert to binary.
- BiasX -> Decimal: convert binary to decimal, then subtract X.
Rep   Unsigned   Value (bias 3)
000   0          -3
001   1          -2
010   2          -1
011   3           0
100   4           1
101   5           2
110   6           3
111   7           4
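The two conversion recipes translate directly into a minimal Python sketch. The names `to_bias` and `from_bias` are hypothetical; the final check also illustrates the "preserves lexical order" property, since equal-length binary strings compare like unsigned numbers.

```python
def to_bias(value: int, bias: int, n: int) -> str:
    """Decimal -> BiasX: add the bias X, then convert to n-bit binary."""
    unsigned = value + bias
    if not 0 <= unsigned < 2 ** n:
        raise ValueError("value out of range for this bias and width")
    return format(unsigned, f"0{n}b")

def from_bias(bits: str, bias: int) -> int:
    """BiasX -> Decimal: convert binary to decimal, then subtract the bias X."""
    return int(bits, 2) - bias

# Rebuild the 3-bit, bias-3 table: Rep, unsigned decimal, value.
for u in range(8):
    rep = format(u, "03b")
    print(rep, u, from_bias(rep, 3))

# Lexical order is preserved: larger values get larger encodings.
encodings = [to_bias(v, 3, 3) for v in range(-3, 5)]
assert encodings == sorted(encodings)
```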

Addition of Bias Representations
Unsigned: $x + y = z$.
What you want: $(x+b) + (y+b) \to (z+b)$.
What you get: $(x+b) + (y+b) = z + 2b$.
How you convert: $z + 2b - b = z + b$.
So you must subtract out the additional bias when you are finished!

Subtraction of Bias Representations
Unsigned: $x - y = z$.
What you want: $(x+b) - (y+b) \to (z+b)$.
What you get: $(x+b) - (y+b) = z + 0$.
How you convert: $z + 0 + b = z + b$.
So you must add back the bias when you are finished!
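Both corrections can be checked with a small Python sketch that works on the encodings as plain unsigned integers. The function names are hypothetical, and the sketch does no range or overflow checking.

```python
def biased_add(xb: int, yb: int, bias: int) -> int:
    """(x+b) + (y+b) = z + 2b, so subtract one bias to get z + b."""
    return xb + yb - bias

def biased_sub(xb: int, yb: int, bias: int) -> int:
    """(x+b) - (y+b) = z + 0, so add the bias back to get z + b."""
    return xb - yb + bias

BIAS = 3
x, y = 1, 2
xb, yb = x + BIAS, y + BIAS                # encodings 100 and 101
print(biased_add(xb, yb, BIAS) - BIAS)     # 3  (= 1 + 2)
print(biased_sub(xb, yb, BIAS) - BIAS)     # -1 (= 1 - 2)
```

The same correction shows up later when two bias-127 float exponents are added during multiplication.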

Biased Notation Mapping (bias 3)
Number represented:  -3   -2   -1    0    1    2    3    4
Number encoded:      000  001  010  011  100  101  110  111
Range on n bits, if the bias is $011\ldots11_2 = 2^{n-1} - 1$: $-(2^{n-1} - 1)$ to $2^{n-1}$.

32-bit Word
A 32-bit word can represent about 4.3 billion ($2^{32}$) distinct values.
- Unsigned integers: 0 to about 4.3 billion.
- Signed integers: about -2.15 billion to 2.15 billion.
But how do we represent:
- Fractional numbers?
- Very large numbers?
- Numbers with very small magnitude?

Scientific Notation
Example: $6.023 \times 10^{23}$
General form: $A.xxx \times \text{BASE}^{\text{exponent}}$
In binary: $1.xxx \times 2^{\text{exponent}}$
Or maybe in hexadecimal: $Y.xx_{16} \times 16^{\text{exponent}}$
Standard: the IEEE standard for floating-point arithmetic (IEEE 754).

IEEE Standard for Floating Point
Store $1.xxx \times 2^{\text{exponent}}$ in a 32-bit word.
The leading "1." and the base 2 can be assumed.
Only the fraction bits xxx...xx, the exponent, and the sign must be specified.

Floating-Point Numbers (single precision)
Layout: S (1 bit) | Exponent (8 bits, in bias 127) | Fraction xx...xx (23 bits)
S = 1 means negative.
How do we convert to decimal? If $00000000 < \text{Exponent} < 11111111$:
$N = (-1)^S \times 1.\text{Fraction} \times 2^{\text{Exponent} - 127}$
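A minimal Python sketch of the formula, assuming the 32-bit pattern is given as an int. The helper names `decode_single` and `bits_of` are hypothetical. The standard-library `struct` module is used only to cross-check the result against the machine's own single-precision encoding.

```python
import struct

def decode_single(bits: int) -> float:
    """N = (-1)^S * 1.Fraction * 2^(Exponent-127); normalized numbers only."""
    s = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if not 0 < exp < 0xFF:
        raise ValueError("exponent 00000000 or 11111111: not a normalized number")
    significand = 1 + frac / 2 ** 23          # the implied "1." plus the fraction
    return (-1) ** s * significand * 2.0 ** (exp - 127)

def bits_of(x: float) -> int:
    """32-bit pattern of x after rounding it to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(decode_single(0x42620000))   # 56.5  (0 10000100 11000100...0)
```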

Converting from Decimal to Float
1. Convert to binary (e.g., -10010.01101).
2. Normalize to the form $1.xxxxxx \times 2^{\text{EXP}}$.
3. Convert EXP to bias 127 (add 127 to it).
4. The MSB, bit 31, gets the sign.
5. Bits 30..23 get EXP (in bias 127).
6. Bits 22..0 get the fraction xxx...xx (the bits after the leading 1, padded with 0s to 23 bits).
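The six steps can be sketched in Python for nonzero values in the normalized range. Instead of writing out the binary string in step 1, this sketch uses `math.frexp` to find the exponent directly. It assumes round-to-nearest-even for fraction bits beyond the 23rd, and it does not handle zero, denormals, infinity, or NaN. The name `encode_single` is hypothetical.

```python
import math
import struct

def encode_single(x: float) -> int:
    """Steps 1-6 for a nonzero, finite x in the normalized single-precision range."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("sketch handles nonzero finite values only")
    sign = 1 if x < 0 else 0                 # step 4: MSB gets the sign
    m, e = math.frexp(abs(x))                # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = e - 1                              # step 2: 1.xxx * 2**exp
    frac = round((2 * m - 1) * 2 ** 23)      # 23 bits after the leading 1 (ties to even)
    if frac == 2 ** 23:                      # rounding carried into the leading 1
        frac = 0
        exp += 1
    biased = exp + 127                       # step 3: bias 127
    if not 0 < biased < 255:
        raise ValueError("outside the normalized single-precision range")
    return (sign << 31) | (biased << 23) | frac   # steps 5 and 6

print(hex(encode_single(-18.40625)))   # -10010.01101 in binary -> 0xc1934000
```

Comparing the result with `struct.unpack(">I", struct.pack(">f", x))[0]` is one way to check hand conversions such as the exercises below.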

Convert to IEEE FP
- 56.5
- -5.625
- -0.0004 (do to 5 binary places)

Your hard work has not gone unnoticed!
From The Chronicle of Higher Education: "The average full-time undergraduate student studies about 15 hours a week, but the duration varies by major, according to this year's National Survey of Student Engagement. Engineering majors spend the most time studying, 19 hours a week, but even among those who exceed 20 hours, nearly a quarter still often show up for class without assignments completed."

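Misconceptions about Floats
- Floats are not reals. Example: 2/3 cannot be represented exactly.
- Floats are not decimals. $0.1_{10} = 0.000110011001100110011\ldots_2$ (repeating forever), so 0.1 is stored only approximately.
- Not all integers below $2^{31}$ can be represented. $2^{24} + 1 = 1000000000000000000000001_2$ needs 25 significant bits, but single precision holds only 24 (23 fraction bits plus the hidden 1).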
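These misconceptions can be observed directly. Python's own `float` is double precision, so this sketch assumes that rounding through `struct`'s `"f"` format is a faithful stand-in for storing a value in a single-precision variable. The helper name `to_single` is hypothetical.

```python
import struct
from fractions import Fraction

def to_single(x: float) -> float:
    """Round x to the nearest single-precision value and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 0.1 has an infinite binary expansion, so the stored value is only close to 0.1.
print(Fraction(to_single(0.1)))     # 13421773/134217728, the exact stored value
print(to_single(0.1) == 0.1)        # False

# 2**24 + 1 needs 25 significant bits; the nearest single is 2**24.
print(to_single(2.0 ** 24 + 1))     # 16777216.0
print(to_single(2.0 ** 24 + 2))     # 16777218.0
```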

More on the IEEE 754 FP Standard
- Distribution of floats on the number line
- Denormalized floats
- Double-precision floats
- Arithmetic on floats

How FP Numbers Are Distributed
A 32-bit number can represent at most $2^{32}$ values.
IEEE 754 single precision can represent numbers larger than $2^{127}$ with only $2^{32}$ bit patterns, so many integers between 0 and $2^{127}$ cannot be represented.
- High density close to 0.
- Low density far from 0.

Specifically: there are $2^{23}$ values for each value of the exponent (23 fraction bits).
- Between 1/2048 and 1/1024 there are $2^{23}$ floats.
- Between 1 and 2 there are $2^{23}$ floats.
- Between $2^{30}$ and $2^{31}$ there are $2^{23}$ floats.
- Between $2^x$ and $2^{x+1}$, for $-127 < x < 128$, there are $2^{23}$ floats.
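Because the exponent is stored in biased notation ahead of the fraction, positive floats are ordered the same way as their bit patterns read as unsigned integers. Subtracting bit patterns therefore counts the floats in an interval. This is a minimal sketch under that observation; `floats_in` is a hypothetical helper that assumes both endpoints are positive and exactly representable.

```python
import struct

def bits_of(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def floats_in(lo: float, hi: float) -> int:
    """Count single-precision values in [lo, hi) for positive, representable lo < hi."""
    return bits_of(hi) - bits_of(lo)

print(floats_in(1.0, 2.0))               # 8388608 (2**23)
print(floats_in(1 / 2048, 1 / 1024))     # 8388608
print(floats_in(2.0 ** 30, 2.0 ** 31))   # 8388608

# Same count per interval, so the gap between neighbors doubles with each power of two.
one_up = struct.unpack(">f", struct.pack(">I", bits_of(1.0) + 1))[0]
print(one_up - 1.0)                      # 1.1920928955078125e-07 (2**-23)
```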

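Number Line
[Figure: number line from 0 to 16, with $2^{23}$ floats in each interval between consecutive powers of two ([1, 2), [2, 4), [4, 8), [8, 16)), and likewise $2^{23}$ floats between $2^{40}$ and $2^{41}$.]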

Denormalized Floating-Point Numbers
Layout: S (1 bit) | 00000000 (8 bits) | Fraction xx...xx (23 bits)
S = 1 means negative; an exponent field of 00000000 marks the number as denormalized.
How do we convert to decimal?
$N = (-1)^S \times 0.\text{Fraction} \times 2^{-126}$
$1.00000000000000000000000 \times 2^{-126}$ is the smallest normalized number; $0.11111111111111111111111 \times 2^{-126}$ is the largest denormalized number.
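A minimal Python sketch of the denormalized formula, checked against the bit patterns at the normalized/denormalized boundary. The helper names are hypothetical, and `struct` is assumed to decode single-precision denormals (CPython's does).

```python
import struct

def from_bits(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def decode_denormal(bits: int) -> float:
    """N = (-1)^S * 0.Fraction * 2^-126, valid when the exponent field is 00000000."""
    assert (bits >> 23) & 0xFF == 0, "exponent field must be 00000000"
    s = (bits >> 31) & 1
    frac = bits & 0x7FFFFF
    return (-1) ** s * (frac / 2 ** 23) * 2.0 ** -126   # no implied leading 1

smallest_normal = from_bits(0x00800000)     # 1.000...0 * 2^-126
largest_denormal = from_bits(0x007FFFFF)    # 0.111...1 * 2^-126
smallest_denormal = from_bits(0x00000001)   # 0.000...1 * 2^-126 = 2^-149
print(largest_denormal < smallest_normal)   # True
```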

What If the Exponent Is 11111111?
- If FRAC is 0, the 32 bits represent $+\infty$ or $-\infty$ (depending on S).
- If FRAC is nonzero, the 32 bits represent NaN (Not a Number). Example: 0/0.

Infinity: EXP = 11111111, FRAC = 0
Infinity avoids an exception on overflow (overflow: the result exceeds the largest value that can be represented).
Examples of operations that return infinity: $1/0$, $-1/0$ (which gives $-\infty$), $3 \times \infty$, $\sqrt{+\infty}$.
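Python floats are IEEE 754 doubles, but Python raises `ZeroDivisionError` for `1/0` and `0/0` instead of returning infinity or NaN. So this sketch builds the special values from single-precision bit patterns with `struct` and then shows the arithmetic that does follow IEEE 754 behavior. The pattern 0x7FC00000 is just one of many NaN encodings (any nonzero FRAC works).

```python
import math
import struct

def from_bits(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_inf = from_bits(0x7F800000)   # S=0, EXP=11111111, FRAC=0
neg_inf = from_bits(0xFF800000)   # S=1, EXP=11111111, FRAC=0
a_nan = from_bits(0x7FC00000)     # EXP=11111111, FRAC nonzero

print(pos_inf, neg_inf, math.isnan(a_nan))   # inf -inf True
print(3 * pos_inf, math.sqrt(pos_inf))       # inf inf
print(math.isnan(pos_inf - pos_inf))         # True: inf - inf is undefined
print(1e308 * 10)                            # inf: double-precision overflow
```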

Double-Precision IEEE 754 Floating-Point Numbers
Layout: S (1 bit) | Exponent (11 bits, in bias 1023) | Fraction xx...xx (52 bits)
S = 1 means negative.
To convert to decimal: if $00000000000 < \text{Exponent} < 11111111111$,
$N = (-1)^S \times 1.\text{Fraction} \times 2^{\text{Exponent} - 1023}$
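The double-precision formula differs from single precision only in field widths and bias. This is a minimal sketch with hypothetical helper names; since Python's `float` is itself a double, the decoded value can be compared exactly with the original.

```python
import struct

def double_bits(x: float) -> int:
    """64-bit pattern of a Python float (IEEE 754 double)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def decode_double(bits: int) -> float:
    """N = (-1)^S * 1.Fraction * 2^(Exponent-1023); normalized numbers only."""
    s = (bits >> 63) & 1
    exp = (bits >> 52) & 0x7FF           # 11-bit exponent field
    frac = bits & ((1 << 52) - 1)        # 52-bit fraction field
    if not 0 < exp < 0x7FF:
        raise ValueError("not a normalized double")
    return (-1) ** s * (1 + frac / 2 ** 52) * 2.0 ** (exp - 1023)

print(hex(double_bits(1.0)))             # 0x3ff0000000000000
print(decode_double(double_bits(-5.625)))  # -5.625
```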

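Double-Precision Floating Point
Layout: S (1 bit) | Exponent (11 bits, in bias 1023) | Fraction (52 bits)
Exponent range with bias 1023 on 11 bits: $-(2^{10} - 1) \le \text{exp} \le 2^{10}$, i.e., -1023 to 1024. The two extremes are reserved (zero/denormals and infinity/NaN), so normalized numbers use -1022 to 1023.
The largest magnitudes approach $2^{1024}$, which is about $1.8 \times 10^{308}$.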

Double-Precision Float
52 significant bits in base 2 is approximately 16 significant figures in base 10 (counting the hidden 1, 53 bits: $53 \log_{10} 2 \approx 15.95$).

Single vs. Double FP
Range:
- SP: about $2^{-126}$ to $2^{128}$, approximately $10^{-38}$ to $10^{38}$.
- DP: approximately $2 \times 10^{-308}$ to $2 \times 10^{308}$.
Significant figures:
- SP: 23 significant bits, $2^{23} = 8{,}388{,}608$, almost 7 significant decimal digits.
- DP: 52 significant bits, $2^{52} = 4 \times 2^{20} \times 2^{30}$ (about $4.5 \times 10^{15}$), more than 15 significant decimal digits.
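The ranges and digit counts can be checked in Python. This sketch assumes a platform where `float` is an IEEE 754 double (true for CPython on common hardware) and reads the single-precision limits from their bit patterns. Note that the decimal-digit estimate uses 24 and 53 bits, counting the hidden 1.

```python
import math
import struct
import sys

def from_bits32(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]

sp_max = from_bits32(0x7F7FFFFF)          # largest finite single: (2 - 2^-23) * 2^127
sp_min_normal = from_bits32(0x00800000)   # smallest normalized single: 2^-126
print(sp_max, sp_min_normal)              # about 3.4e38 and 1.18e-38
print(sys.float_info.max, sys.float_info.min)   # about 1.8e308 and 2.2e-308

# Significant decimal digits ~ (bits of precision) * log10(2)
print(round(24 * math.log10(2), 2), round(53 * math.log10(2), 2))   # 7.22 15.95
```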

What is this single-precision floating-point number?
0 01111010 000000...000
A. $2^{-5}$
B. 0
C. 0.0000000
D. $1 \times 2^{01111010_2}$
E. None of the above

What is this floating-point number?
0 00000000 01000000...000
A. 1.01
B. $1.01 \times 2^{-127}$
C. $2^{-129}$
D. $2^{-128}$
E. None of the above

Adding Two Scientific-Notation Numbers
$5.345 \times 10^{23} + 1.236 \times 10^{25}$
1. Make their exponents the same: $0.05345 \times 10^{25} + 1.236 \times 10^{25}$
2. Add the significands (the non-exponent parts): $1.28945 \times 10^{25}$
3. Normalize (already done).

Adding Two Floats
0 11111100 01100000...000
0 11111000 110100000...000
1. Write in normalized form: $1.011 \times 2^{11111100} + 1.1101 \times 2^{11111000}$ (exponents left in bias 127; only their difference matters)
2. Make their exponents the same: $1.011 \times 2^{11111100} + 0.00011101 \times 2^{11111100}$
3. Add the significands: $1.01111101 \times 2^{11111100}$
4. Normalize (already done).
Result: 0 11111100 0111110100...000
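The four steps map onto integer operations on the bit fields. This is a minimal sketch, not a full IEEE 754 adder: it assumes both operands are positive and normalized, simply drops bits shifted out during alignment (truncation instead of round-to-nearest), and does not check for overflow. `add_positive_floats` is a hypothetical name.

```python
import struct

def add_positive_floats(a_bits: int, b_bits: int) -> int:
    """Add two positive, normalized single-precision bit patterns (truncating sketch)."""
    ea, fa = (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    eb, fb = (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF
    ma, mb = fa | (1 << 23), fb | (1 << 23)   # restore the hidden leading 1
    if ea < eb:                               # let a be the larger-exponent operand
        ea, eb, ma, mb = eb, ea, mb, ma
    mb >>= ea - eb                            # make the exponents the same
    m, e = ma + mb, ea                        # add the significands
    if m >> 24:                               # carry past the leading 1: normalize
        m >>= 1
        e += 1
    return (e << 23) | (m & 0x7FFFFF)         # drop the hidden 1 again

a = (0b11111100 << 23) | (0b011 << 20)        # 0 11111100 01100000...000
b = (0b11111000 << 23) | (0b1101 << 19)       # 0 11111000 110100000...000
result = add_positive_floats(a, b)
print(f"{result:032b}")                       # 0 11111100 0111110100...000, without spaces
```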

Multiplying Two Scientific-Notation Numbers
1. $5.3 \times 10^{23} \times 8.1 \times 10^{25}$
2. Multiply the significands and add the exponents: $42.93 \times 10^{48}$
3. Normalize: $4.293 \times 10^{49}$

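Multiplying Two Floats
0 10000011 0100000...000
0 10000001 110000000...000
1. $1.01 \times 2^{4} \times 1.11 \times 2^{2}$ (exponents $10000011_2 - 127 = 4$ and $10000001_2 - 127 = 2$)
2. Multiply the significands and add the exponents: $10.0011 \times 2^{6}$
3. Normalize: $1.00011 \times 2^{7}$
Result: 0 10000110 0001100000...000 ($7 + 127 = 134 = 10000110_2$)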
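A minimal Python sketch of the same three steps on bit fields, assuming positive normalized operands, truncation of the low product bits instead of IEEE rounding, and no overflow or underflow checks. Adding two bias-127 exponents counts the bias twice, so one bias is subtracted, exactly as in biased addition. `mul_positive_floats` is a hypothetical name.

```python
import struct

def mul_positive_floats(a_bits: int, b_bits: int) -> int:
    """Multiply two positive, normalized single-precision bit patterns (truncating sketch)."""
    ea, fa = (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    eb, fb = (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF
    ma, mb = fa | (1 << 23), fb | (1 << 23)   # 1.Fraction as 24-bit integers
    m = ma * mb                               # product has 46 fraction bits, value in [1, 4)
    e = ea + eb - 127                         # add exponents, remove the extra bias
    if m >> 47:                               # product >= 2: normalize
        m >>= 1
        e += 1
    m >>= 23                                  # back to 23 fraction bits (truncate)
    return (e << 23) | (m & 0x7FFFFF)         # drop the hidden 1

a = (0b10000011 << 23) | (0b01 << 21)         # 0 10000011 0100000...000 (20.0)
b = (0b10000001 << 23) | (0b11 << 21)         # 0 10000001 110000000...000 (7.0)
product = mul_positive_floats(a, b)
print(struct.unpack(">f", struct.pack(">I", product))[0])   # 140.0
```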

Add These Two Floats
0 11111100 01110000...000
0 11111110 100100000...000
1. Write each in normalized form.
2. Make their exponents the same.
3. Add the significands.
4. Normalize.

Multiply These Two Floats
0 10000100 0100000...000
0 01111000 100000000...000
1. Write the normalized form of each number.
2. Multiply the significands and add the exponents.
3. Normalize.

How Is FP Arithmetic Done?
- Software: very, very slow.
- Hardware floating point: expensive, but usually worth it.
Two measures of performance:
1. MIPS: millions of instructions executed per second.
2. MFLOPS: millions of floating-point operations per second.