Floating-Point Arithmetic


1 ENEE446 Lectures, 4/10-15/08. A. Yavuz Oruç, Professor, UMD, College Park. Copyright 2007 A. Yavuz Oruç. All rights reserved. Floating-Point Arithmetic. Integer or fixed-point arithmetic provides a complete representation over a domain of integers or fixed-point numbers, but it is inadequate for representing extreme ranges of real numbers. Example: With 4 bits we can represent the following sets of numbers, among many others: {0, 1/16, 2/16, ..., 15/16} (all 4-bit fractions); {0, 1/8, 2/8, ..., 7/8, 1, 1+1/8, 1+2/8, ..., 1+7/8} (one integer bit, three fraction bits); {0, 1/4, 2/4, 3/4, 1, 1+1/4, 1+2/4, 1+3/4, 2, 2+1/4, 2+2/4, 2+3/4, 3, 3+1/4, 3+2/4, 3+3/4} (two integer bits, two fraction bits); {0, 1/2, 1, 1+1/2, 2, 2+1/2, 3, 3+1/2, 4, 4+1/2, 5, 5+1/2, 6, 6+1/2, 7, 7+1/2} (three integer bits, one fraction bit); {0, 1, 2, ..., 15} (all 4-bit integers). So we can place the representable range where we like, but with n bits we are always limited to 2^n numbers.

2 With a floating-point number system, we can represent very large and very small numbers together. We use scientific notation: u = ±m_u × b^(x_u), where m_u is a p-digit number called the mantissa (or significand), x_u is a k-digit number called the exponent, and b ≥ 2 is called the base.

3 The mantissa provides the precision (resolution) of a floating-point number system, whereas the exponent gives its range. Example: With p = 10, k = 20, and b = 10, and assuming that mantissas are sign-magnitude decimal fractions and exponents are decimal integers, we can represent numbers in the interval [-(1 - 10^-10) × 10^(10^20 - 1), (1 - 10^-10) × 10^(10^20 - 1)]. In this representation, the least and most positive numbers are 10^-10 × 10^-(10^20 - 1) and (1 - 10^-10) × 10^(10^20 - 1), and the least and most negative numbers are -10^-10 × 10^-(10^20 - 1) and -(1 - 10^-10) × 10^(10^20 - 1).

4 In nearly all modern processors, m_u is a binary fraction, x_u is a binary exponent, and the base is b = 2. Very often m_u is normalized so that it lies between 1 and 2 (excluding 2). If mantissas are expressed in sign-magnitude notation, this means that they always begin with a 1 followed by the binary point, as in 1.0110... or 1.1101..., etc. In some representations, the 1 on the left of the binary point is removed from the notation and is called a hidden bit. (The hidden bit is always 1 for sign-magnitude mantissas.)

5 Machine Representation of Floating-Point Numbers. The representation consists of a sign bit S, a k-bit biased exponent X, and a p-bit mantissa M with a hidden bit of 1. The true exponent, x, is found by subtracting a fixed number from the biased exponent X. This fixed number is called the bias. For a k-bit exponent, the bias is 2^(k-1) - 1, and the true exponent x and X are related by x = X - (2^(k-1) - 1).
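The field layout and bias formula can be checked directly. Below is a small Python sketch, not part of the original notes (the helper name decode_single is mine), that unpacks an IEEE-754 single precision value, for which k = 8 and p = 23:

```python
import struct

def decode_single(x):
    """Split a value stored in IEEE-754 single precision into its
    sign bit, biased exponent X, true exponent x, and mantissa field M."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    X = (bits >> 23) & 0xFF      # k = 8 biased-exponent bits
    M = bits & 0x7FFFFF          # p = 23 stored mantissa bits (hidden bit omitted)
    bias = (1 << (8 - 1)) - 1    # 2^(k-1) - 1 = 127
    return sign, X, X - bias, M  # x = X - (2^(k-1) - 1)

print(decode_single(1.0))    # (0, 127, 0, 0): 1.0 = +1.0 x 2^0
print(decode_single(-6.5))   # (1, 129, 2, 5242880): -6.5 = -1.625 x 2^2
```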

6 Example: k = 3, so x = X - (2^2 - 1) = X - 3.

X:          0   1   2   3   4   5   6   7
x = X - 3: -3  -2  -1   0   1   2   3   4

7 Example: With p = 2, k = 2, and a 1-bit sign, we have 32 floating-point numbers with biased exponents (bias = 1) as shown in the table below.

S = 0:
  X = 0, x = -1 (denormalized): 0, 1/8, 1/4, 3/8
  X = 1, x = 0:  1, 5/4, 3/2, 7/4
  X = 2, x = 1:  2, 5/2, 3, 7/2
  X = 3, x = 2:  4, 5, 6, 7
S = 1:
  X = 0, x = -1 (denormalized): -0, -1/8, -1/4, -3/8
  X = 1, x = 0:  -1, -5/4, -3/2, -7/4
  X = 2, x = 1:  -2, -5/2, -3, -7/2
  X = 3, x = 2:  -4, -5, -6, -7

8 With a k-bit biased exponent and p-bit mantissa, the most positive and most negative representable numbers are ±2^(2^(k-1)) × (1 - 1/2^p) without a hidden bit, and ±2^(2^(k-1)) × (2 - 1/2^p) with a hidden bit. Typical allocations of bits between the mantissa and exponent parts are shown below (the last two rows are the IEEE-754 standard formats for single and double precision floating-point arithmetic):

Representation size   Sign   Exponent   Mantissa
32 bits               1      8          23
64 bits               1      11         52

9 Precision of a Floating-Point Representation. In the IEEE-754 single precision floating-point representation, the mantissa is 23 bits long. This means that two distinct mantissas cannot be closer than 1/2^23 ≈ 1.19 × 10^-7. In double precision, this difference reduces to 1/2^52 ≈ 2.22 × 10^-16. Given that 2 is a factor of 10, both binary fractions have an exact (finite) representation in decimal.

10 This can be seen if we write 1/2^p = 5^p/10^p. Hence, we can compute 5^p as a ⌈p log₁₀ 5⌉-digit number in decimal, and then divide it by 10^p by shifting the radix point p places to the left, where p is 23 or 52. Indeed, the number of digits in each of the representations is ⌈23 log₁₀ 5⌉ = 17 and ⌈52 log₁₀ 5⌉ = 37.
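This observation is easy to verify; here is an illustrative Python sketch (the name exact_binary_fraction is mine, not from the notes):

```python
# 1/2^p = 5^p / 10^p, so the exact decimal expansion of 2^-p consists of
# the digits of 5^p shifted p places past the decimal point.
def exact_binary_fraction(p):
    digits = str(5 ** p)             # ceil(p * log10(5)) digits
    return "0." + digits.zfill(p)    # dividing by 10^p shifts p places

print(len(str(5 ** 23)))          # 17
print(len(str(5 ** 52)))          # 37
print(exact_binary_fraction(23))  # 0.00000011920928955078125
```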

11 The least and most positive and negative representations in the IEEE-754 single precision floating-point format (with the hidden bit of 1) are:

Most positive:  +2^127 × (2 - 1/2^23) ≈ +3.4 × 10^38
Most negative:  -2^127 × (2 - 1/2^23) ≈ -3.4 × 10^38
Least positive (denormalized): +2^-126 × 1/2^23 = +2^-149
Least negative (denormalized): -2^-126 × 1/2^23 = -2^-149

The largest exponent, X = 255, is reserved to represent extreme values such as ∞, 0/0, ∞/∞, etc.

12 IEEE-754 Normalized and De-Normalized Numbers. [Figure: number lines showing the denormalized numbers clustered around 0 with spacing 1/2^p, the normalized numbers extending outward from the smallest normalized magnitude, and their union covering both ranges without a gap.]

13 Extreme Numbers in Floating-Point Number Systems. In floating-point computations, besides the problem of precision, two other kinds of errors come from results being either too large (overflow) or too small (underflow). Any result that is greater than the largest representable number is converted to +∞. Any positive result that is less than the least positive representable number is truncated to +0. Likewise, any result that is more negative than the largest representable negative number is converted to -∞, and any negative result that is greater than the least negative representable number is converted to -0.

14 In mathematics, ∞ is used to represent a quantity greater than all real numbers. It is the limit point of the real numbers as they get arbitrarily large, and it represents an arbitrarily large value rather than a specific value as a finite real number would. For example, u^2 and u^3 both tend to ∞ as u becomes arbitrarily large, even though u^3 > u^2 for all u > 1. In real arithmetic, we also encounter numbers and/or computations such as ∞ - ∞, 0/0, 0 × ∞, and ∞/∞. Ratios such as 0/0 or ∞/∞ arise in the limit of computations such as (u-1)/(u^3-1) as u tends to 1 or ∞. We can also have 0 × ∞ when we try to compute u × (1/u) as u tends to 0.

15 NaNs, QNaNs and SNaNs. Floating-point number systems set aside certain binary patterns to represent ∞ and other undefined expressions and values that involve ∞. In the IEEE-754 floating-point number system, the largest exponent is reserved to represent ∞ as well as undefined values such as ∞ - ∞, 0/0, 0 × ∞, and ∞/∞. The latter four cases are referred to as Not-a-Number (NaN) and represent the outcomes of undefined real number operations. These special values are represented by setting X to 2^k - 1, or equivalently x to 2^(k-1).

16 The mantissa of the representation is used to distinguish between ∞ and NaNs. If M = 0 and X = 2^k - 1, then the representation denotes ∞. If M ≠ 0 and X = 2^k - 1, then the representation is a NaN. In all of these special representations, the sign bit is used to distinguish between the positive and negative versions of these numbers, i.e., +0, -0, +∞, -∞, +NaN, -NaN. The NaNs are further refined into quiet NaNs (QNaNs) and signaling NaNs (SNaNs). The QNaNs are designated by setting the most significant bit of the mantissa, and the SNaNs are specified by clearing the same bit. A QNaN can be viewed as a NaN that can be tolerated during the course of a floating-point computation, whereas an SNaN forces the processor to signal an invalid operation, as in the case of dividing 0 by 0.
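These encodings can be assembled bit by bit. The following Python sketch (the helper single_from_fields is my own, for illustration) builds the single precision patterns:

```python
import math
import struct

def single_from_fields(sign, X, M):
    """Assemble a 32-bit single precision value from a sign bit,
    biased exponent X, and 23-bit mantissa field M."""
    bits = (sign << 31) | (X << 23) | M
    (x,) = struct.unpack(">f", struct.pack(">I", bits))
    return x

X_MAX = 0xFF  # X = 2^k - 1 = 255: reserved for infinities and NaNs

print(single_from_fields(0, X_MAX, 0))        # inf  (M = 0)
print(single_from_fields(1, X_MAX, 0))        # -inf
qnan = single_from_fields(0, X_MAX, 1 << 22)  # M != 0, MSB of M set: QNaN
print(math.isnan(qnan))                       # True
```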

17 Example: The numbers in the first row below represent +∞ and -∞, respectively, and those in the second row represent NaNs in a 16-bit floating-point number system (1-bit sign, 4-bit exponent, 11-bit mantissa):

0 1111 00000000000 = +∞        1 1111 00000000000 = -∞
0 1111 00000000001 = +SNaN     1 1111 10000000000 = -QNaN

18 Approximation of Real Numbers by Floating-Point Numbers. As p gets large, the distance between consecutive mantissas gets smaller, and tends to 0 as p tends to ∞. However, regardless of how large p becomes, not all decimal fractions can be represented in a binary mantissa format. For example, any decimal fraction that includes 2^-s in its binary expansion, where s > p, cannot be represented in p bits; but this is not the end of the story: a whole family of other numbers cannot be represented either, even though they are greater than 2^-p.

19 In fact, a decimal fraction d has an exact binary mantissa representation in p bits if and only if 2^p × d is an integer, since d = m_1 2^-1 + m_2 2^-2 + ... + m_p 2^-p if and only if 2^p d = m_1 2^(p-1) + m_2 2^(p-2) + ... + m_p, which implies that the left-hand side of the equation must be an integer for the equation to hold, since the right-hand side is an integer.

20 Now, suppose that d is an r-digit decimal fraction and it has an exact representation in p bits. It is easy to show that r ≤ p, and by the argument above, 2^p d = 2^p (d × 10^r)/10^r = 2^(p-r) (d × 10^r)/5^r must be an integer. This implies that 5^r must evenly divide d × 10^r, or equivalently that d × 2^r must be an integer, since 5 is relatively prime to 2 and 5^r cannot divide 2^(p-r). Conversely, it can be shown that if r ≤ p, d < 1, and 5^r evenly divides d × 10^r (equivalently, d × 2^r is an integer), then d must have an exact representation in p bits.

21 For example, 0.125 can be represented exactly in p = 3 bits since 0.125 × 8 = 1 is an integer and r = 3 ≤ p. By the same token, all multiples of 0.125 that can be written in three or fewer digits can be represented exactly by a 3-bit mantissa. These are 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, and 0.875. No other decimal fraction can be represented by a 3-bit mantissa. Likewise, when p = 4, only the integral multiples of the decimal fraction 0.0625 can be represented by 4-bit mantissas, since 5^4 = 625 evenly divides only 0.0625 × 10^4 = 625 and its integral multiples. Clearly, there are exactly 15 such proper fractions, i.e., excluding 0, when p = 4.
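The divisibility criterion of the last two slides can be tested mechanically. A hedged Python sketch (exact_in_p_bits is an illustrative name of my own):

```python
from fractions import Fraction

def exact_in_p_bits(d, p):
    """A fraction d in [0,1) has an exact p-bit mantissa representation
    iff 2^p * d is an integer, i.e., d is a multiple of 1/2^p."""
    return (Fraction(d) * 2**p).denominator == 1

# p = 3: exactly the multiples of 1/8 = 0.125
print(exact_in_p_bits("0.125", 3), exact_in_p_bits("0.375", 3))  # True True
print(exact_in_p_bits("0.1", 3))                                 # False

# p = 4: the 15 nonzero 4-digit decimals that are multiples of 0.0625
reps = [k for k in range(1, 10**4) if exact_in_p_bits(Fraction(k, 10**4), 4)]
print(len(reps))  # 15
```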

22 In general, it is easy to verify that, excluding 0, only the 2^p - 1 multiples of the fraction 1/2^p can be represented in p bits, as shown below. These are all the fractions that can be represented in p bits.

1/2^p, 2/2^p, 3/2^p, ..., (2^p - 2)/2^p, (2^p - 1)/2^p

23 For each fraction m with an exact representation in p bits, there is an infinite set of numbers in the open interval (m - 1/2^p, m). None of these has an exact representation in p bits. Each such number must therefore be approximated one way or another, and the most natural choices are the boundary points of the interval.

24 This is because one of these boundary points is closer to the number being approximated than any other representable number. If, for any number m_u in this interval, m_u - (m - 1/2^p) < m - m_u, i.e., m - 1/2^p < m_u < m - 1/2^(p+1), then m_u is closer to m - 1/2^p than it is to m. Therefore, it should be approximated by m - 1/2^p. On the other hand, if m - 1/2^(p+1) < m_u < m, then m_u is closer to m, and it should be approximated by m. Finally, if m_u = m - 1/2^(p+1), then it can be approximated by either of the endpoints.

25 Example 2.1. Let p = 8 and m = 12/16 = 0.75. In binary, m has the exact representation (0.11000000)_2. Now consider the numbers in the interval (12/16 - 1/256, 12/16). None of these numbers has an exact representation if we use an 8-bit mantissa. One such number is 12/16 - 1/1024, which is clearly greater than the midpoint 12/16 - 1/512. Therefore, it should be approximated by 12/16.

26 The process of approximating a floating-point number is often carried out by rounding or truncating it. In both cases, digits outside the available number of digits are removed from the representation. However, when a (p+r)-bit mantissa is rounded to a p-bit mantissa, we add 1/2^p to the truncated mantissa if the (p+1)st bit is 1, and simply drop the last r bits if that bit is 0. When the mantissa is truncated, we simply drop the rightmost r bits.
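Both operations can be sketched on mantissa bit strings; below is a small Python illustration using the running example's 10-bit mantissa rounded to p = 8 bits (the helper names are mine):

```python
def truncate(mantissa_bits, p):
    """Truncate a (p+r)-bit fractional mantissa string to p bits."""
    return mantissa_bits[:p]

def round_to_p(mantissa_bits, p):
    """Round a (p+r)-bit fractional mantissa to p bits: keep p bits and,
    if bit p+1 is 1, add one unit in the last place (1/2^p)."""
    kept = int(mantissa_bits[:p], 2)
    if mantissa_bits[p] == "1":
        kept += 1   # a carry out of the top bit is ignored in this small sketch
    return format(kept, "0{}b".format(p))

# m_u = 12/16 - 1/1024 = (0.1011111111)_2 rounded to 8 bits gives 12/16
print(round_to_p("1011111111", 8))  # 11000000 (= 12/16)
print(truncate("1011111111", 8))    # 10111111 (= 12/16 - 1/256)
```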

27 In the above example, the 10-bit fraction (0.1011111111)_2, which represents m_u = 12/16 - 1/1024, is approximated by the 8-bit fraction (0.11000000)_2. The latter number represents 12/16. This amounts to rounding rather than truncation, as the latter fraction is obtained by adding (0.0000000001)_2 to (0.1011111111)_2 and keeping the first 8 bits in order to represent m_u in 8 bits.

28 Approximating m_u by truncation would instead result in (0.10111111)_2, with the last two bits removed and the rest of the digits unaltered. This gives 12/16 - 1/256, which is clearly not the closest 8-bit fraction to m_u in this case. On the other hand, if m_u = 12/16 - 1/512, i.e., m_u = (0.101111111)_2, then it is exactly in the middle of the interval (12/16 - 1/256, 12/16). Rounding will approximate it to 12/16, and truncating will carry it to 12/16 - 1/256. In this case, both approximations are equally far from m_u.

29 In general, rounding a real number always leads to the closest representable floating-point number, except when the number is at an equal distance from the two endpoints of the interval into which it falls. In the latter case, truncation is as precise as rounding. In the rounding of decimal numbers, this happens when the digit to be dropped is 5; by convention it is rounded up, so, for example, 49.5 is rounded to 50 rather than 49. Truncating it would give 49, which is as far from 49.5 as 50 is. Rounding or truncating a number introduces computational errors into an operation. These errors are usually unavoidable and can have significant undesirable effects on the result of a computation.

30 Example: Consider the machine numbers 96 = (1.1)_2 × 2^6 and 96.03125 = (1.10000000001)_2 × 2^6. These representations are ``adjacent'', i.e., we cannot represent any other number between 96 and 96.03125 if we use an 11-bit mantissa. Now suppose we want to add 1000 fractions to 96, all of which are less than 0.03125, say around 0.02. If we perform the addition so that each fraction is added to 96 one after another, the result of the first addition will be about 96.02, but it will be truncated back to 96, assuming that we are using an 11-bit mantissa. Similarly, adding the second, the third, and all subsequent fractions will have no effect, so the result of the computation will be 96, whereas the correct result should have been about 96 + 1000 × 0.02 = 116. Therefore, care should be taken when adding fractions or small numbers to large numbers. In this example, a result much closer to 116 can be obtained by first summing the thousand fractions and then adding this sum to 96.
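The effect of summation order is easy to reproduce with real single precision arithmetic; the sketch below uses a 23-bit mantissa instead of the slide's 11-bit one, and IEEE rounding rather than truncation, but the loss is the same in kind (the f32 helper is mine):

```python
import struct

def f32(x):
    """Round a Python float to IEEE-754 single precision."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Adding a tiny term to a large number one at a time loses it every time:
# in single precision, 1.0 + 1e-8 rounds back to 1.0 (the ulp of 1.0 is 2^-23).
acc = f32(1.0)
for _ in range(1000):
    acc = f32(acc + 1e-8)
print(acc)            # 1.0: the thousand additions had no effect

# Summing the small terms first preserves their contribution:
small = 0.0
for _ in range(1000):
    small = f32(small + 1e-8)
total = f32(1.0 + small)
print(total > 1.0)    # True
```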

31 2's Complement Floating-Point Number Systems. Most processors use a sign-magnitude representation for the mantissas of floating-point numbers. Instead, one can also use 1's or 2's complement notation, as in fixed-point numbers, to represent signed mantissas. This makes the subtraction of mantissas easier to handle. Determining the value of a floating-point number with a 2's complement mantissa is only slightly more complex. In fact, if the sign bit of the mantissa is 0, then the value of the number is the same as if its mantissa were expressed in sign-magnitude notation. When the leading bit is 1, the number is negative, and its value is determined by complementing its bits and adding 1/2^p to the result, where p is the number of bits in the mantissa part of the number.

32 Example: Consider a floating-point number whose 2's complement mantissa has its leading bit set. Its value is determined by complementing the mantissa bits, adding 1/2^p, and negating the result.


34 Floating-Point Addition and Subtraction. When adding or subtracting two floating-point numbers, we must first align their exponents. This is done by shifting the mantissa of the number with the smaller exponent to the right, increasing its exponent with each shift, until its exponent equals that of the other number. After the exponents are aligned, the operation (addition or subtraction) is performed on the two mantissas, and the larger exponent becomes the exponent of the result. The final step is to shift the mantissa and increase or decrease the exponent so that the mantissa is in normalized form.

35 Example: Let u = 5.0 and v = 1.25 be represented as 16-bit floating-point numbers with a 4-bit biased exponent and an 11-bit sign-magnitude mantissa with a hidden bit. Let M_u, M_v, M_r denote the mantissas of u, v, and u - v, and let E_u, E_v, E_r denote their biased exponents. The difference u - v is computed as follows: u = 1.25 × 2^2, so E_u = 2 + 7 = 9, and v = 1.25 × 2^0, so E_v = 7. Aligning, M_v is shifted right by 2 places and E_r = 9; subtracting gives M_r = 1.25 - 0.3125 = 0.9375, and normalizing (one left shift) gives M_r = 1.875 with E_r = 8, i.e., u - v = 1.875 × 2^1 = 3.75.
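The example's steps can be simulated with integer mantissas. A sketch assuming the stated format, 4-bit exponent with bias 7 and 11 stored mantissa bits (the encode helper is my own):

```python
BIAS = 7    # 4-bit biased exponent: bias = 2^(4-1) - 1
P = 11      # stored mantissa bits (hidden bit not stored)

def encode(x):
    """Encode a positive value as (biased exponent, integer mantissa with hidden bit)."""
    e = 0
    while x >= 2:
        x /= 2; e += 1
    while x < 1:
        x *= 2; e -= 1
    return e + BIAS, round(x * 2**P)  # mantissa scaled to a (P+1)-bit integer

Eu, Mu = encode(5.0)    # 5.0  = 1.25 x 2^2 -> E_u = 9
Ev, Mv = encode(1.25)   # 1.25 = 1.25 x 2^0 -> E_v = 7
Mv >>= (Eu - Ev)        # align: shift the smaller-exponent mantissa right
Mr, Er = Mu - Mv, Eu    # subtract; the larger exponent carries over
while Mr < 2**P:        # normalize: shift left until the hidden-bit position is 1
    Mr <<= 1; Er -= 1
result = Mr / 2**P * 2**(Er - BIAS)
print(result)           # 3.75
```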

36 Design of a Floating-Point Adder/Subtractor. [Block diagram: bus and function select logic routes M_u, M_v, S_u, S_v, and the add/sub control onto the a-bus and b-bus; a (p+1)-bit complementer and a (p+1)-bit adder operate on the mantissas; a k-bit adder with exponent correction logic handles E_u and E_v; alignment/shift logic and normalization logic, sequenced by the control logic and clock, produce S_r and M_r.]

37 Algorithm 2.1 (sign-magnitude floating-point addition)
{
  // Add the hidden bits to M_u and M_v if they are not denormalized
  if (e_u != 0) M_u = 1 + M_u;
  if (e_v != 0) M_v = 1 + M_v;

  // Align
  if (E_u > E_v)      { M_v = M_v * 2^(E_v - E_u); E_r = E_u; }
  else if (E_u < E_v) { M_u = M_u * 2^(E_u - E_v); E_r = E_v; }
  else E_r = E_u;

  // Add if operation = 0, subtract if operation = 1
  switch (s_u) {
  case 0:
    switch (s_v) {
    case 0: switch (operation) {
              case 0: M_r = M_u + M_v; break;
              case 1: M_r = M_u + ~M_v + 1; break; } break;
    case 1: switch (operation) {
              case 0: M_r = M_u + ~M_v + 1; break;
              case 1: M_r = M_u + M_v; break; } break;
    } break;
  case 1:
    switch (s_v) {
    case 0: switch (operation) {
              case 0: M_r = ~M_u + 1 + M_v; break;
              case 1: M_r = ~M_u + 1 + ~M_v + 1; break; } break;
    case 1: switch (operation) {
              case 0: M_r = ~M_u + 1 + ~M_v + 1; break;
              case 1: M_r = ~M_u + 1 + M_v; break; } break;
    } break;
  }

  // Normalize
  if (M_r >= 4) { M_r = M_r / 2; E_r = E_r + 1; e_r = e_r + 1; }
  else while (M_r < 1) { M_r = M_r * 2; E_r = E_r - 1; e_r = e_r - 1; }
  if (E_r < 256) F = 0;
  else { F = 1; E_r = 255; e_r = 128; M_r = 1.11...1; }  // overflow: saturate

  // Set the sign bit and magnitude
  if (M_r > 0) S_r = 0; else { S_r = 1; M_r = ~M_r + 1; }
}

38 Floating-Point Multiplication and Division. When multiplying or dividing two floating-point numbers, the exponents and mantissas are again treated separately. Unlike floating-point addition/subtraction, it is not necessary to align the exponents to multiply or divide two floating-point numbers. The mantissas are simply multiplied (or divided), and the exponents are added (or subtracted). If the sign bits of the two numbers are the same, the resulting sign bit is 0; otherwise it is set to 1. Finally, the resulting number is normalized by shifting the mantissa and adjusting the exponent of the result, if needed.

39 In the case of multiplication, the biased exponent of the product must be corrected, since adding two biased exponents introduces an extra bias. That is, when two floating-point numbers u and v are multiplied, adding their exponents E_u = e_u + 2^(k-1) - 1 and E_v = e_v + 2^(k-1) - 1 results in

E_u + E_v = e_u + e_v + 2(2^(k-1) - 1).

This extra bias must be corrected by subtracting 2^(k-1) - 1. In contrast, when two floating-point numbers u and v are divided, subtracting their exponents results in

E_u - E_v = (e_u + 2^(k-1) - 1) - (e_v + 2^(k-1) - 1) = e_u - e_v.

40 Therefore, in this case, we need to add 2^(k-1) - 1 to restore the bias. These extra steps can be carried out concurrently while the mantissas are being multiplied or divided, since the exponent of the result is not needed in the computation of the mantissas. All these ideas are formalized in the algorithm below.
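A minimal sketch of the two bias corrections, assuming k = 8 (variable names are mine):

```python
K = 8
BIAS = 2**(K - 1) - 1   # 127 when k = 8

def biased(e):
    """Biased exponent E = e + 2^(k-1) - 1."""
    return e + BIAS

Eu, Ev = biased(3), biased(-2)

# Multiplication: adding biased exponents double-counts the bias,
# so one copy of the bias must be subtracted back out.
Er_mul = Eu + Ev - BIAS
print(Er_mul - BIAS)    # 1 = 3 + (-2)

# Division: subtracting biased exponents cancels the bias entirely,
# so one copy must be added back in.
Er_div = Eu - Ev + BIAS
print(Er_div - BIAS)    # 5 = 3 - (-2)
```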

41 Algorithm (floating-point multiplication/division)
{
  // u and v are sign-magnitude, biased-exponent floating-point numbers.
  // operation is a binary variable which specifies whether a multiplication
  // or a division is to be performed.

  // Add the hidden bit to M_u and M_v
  M_u = 1 + M_u; M_v = 1 + M_v;

  // Multiply if operation = 0, divide if operation = 1
  switch (operation) {
  case 0: M_r = M_u * M_v; E_r = E_u + E_v - 2^(k-1) + 1; break;
  case 1: M_r = M_u / M_v; E_r = E_u - E_v + 2^(k-1) - 1; break;
  }

  // Set the sign bit
  if (S_u == S_v) S_r = 0; else S_r = 1;

  // Normalize
  if (M_r >= 2) { M_r = M_r / 2; E_r = E_r + 1; e_r = e_r + 1; }
  else while (M_r < 1) { M_r = M_r * 2; E_r = E_r - 1; e_r = e_r - 1; }
  if (E_r < 256) F = 0;
  else { F = 1; E_r = 255; e_r = 128; M_r = 1.11...1; }  // overflow: saturate
}

42 The multiplication and division steps in this algorithm are left unspecified and can be carried out using any of the multiplication and division algorithms we described for integer operands. In the case of multiplication, the product of two p-bit mantissas with hidden bits is given by the expression

(1 + M_1/2^p)(1 + M_2/2^p) = (2^p + M_1)(2^p + M_2)/2^(2p).

So, effectively, we multiply two (p+1)-bit integers to obtain a 2(p+1)-bit product, and then divide the product by 2^(2p). The division by 2^(2p) amounts to moving the binary point from the right of the rightmost bit of the product to the right of its second leftmost bit.

43 Furthermore, only the highest p+1 bits of the 2(p+1)-bit product are retained, since the precision of the representation is limited to p+1 bits. This makes it redundant to compute the lower bits of the product that come entirely from the multiplication of the lower halves of the two mantissas.

44 Example: Let u = -6.5 and v = 3.5 be represented as 16-bit floating-point numbers with a 4-bit biased exponent and an 11-bit sign-magnitude mantissa with a hidden bit. The product u × v is computed as follows:

Step 1: Express u and v as floating-point numbers: u = 1 1001 10100000000, v = 0 1000 11000000000.
Step 2: Compute the exponent E_r = E_u + E_v - (2^3 - 1) = 1001 + 1000 - 0111 = 1010.
Step 3: Compute the mantissa M_r = (1 + M_u) × (1 + M_v) = 1.625 × 1.75 = 2.84375.
Step 4: Normalize M_r by shifting it right once: M_r = 1.421875.
Step 5: Adjust the exponent by incrementing it by 1: E_r = 1011.
Step 6: Combine the sign, E_r, and M_r: u × v = 1 1011 01101100000 = -1.421875 × 2^4 = -22.75.
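Steps 3 to 5 can be checked by multiplying the (p+1)-bit integer mantissas, as the preceding slides describe (a sketch; the variable names are mine):

```python
P = 11
# u = -6.5 = -1.625 x 2^2 and v = 3.5 = 1.75 x 2^1 (mantissas include the hidden bit)
Mu = round(1.625 * 2**P)    # (p+1)-bit integer mantissa of u
Mv = round(1.75 * 2**P)     # (p+1)-bit integer mantissa of v
prod = Mu * Mv              # 2(p+1)-bit integer product
mr = prod / 2**(2 * P)      # move the binary point back: 2.84375
er = 2 + 1                  # true exponents add
if mr >= 2:                 # normalize: one right shift, bump the exponent
    mr /= 2; er += 1
sign = -1                   # signs differ, so the product is negative
result = sign * mr * 2**er
print(result)               # -22.75
```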

45 The mantissa product of Step 3, carried out on the six leading bits of each mantissa:

(1.10100)_2 × (1.11000)_2 = (110100)_2 × (111000)_2 / 2^10 = (52 × 56)/1024 = 2912/1024 = 2.84375

(the lower-order bits of the full product are ignored, not just because they are 0 but also because they fall outside the 12-bit representation).

46 Division works similarly, with the following formula:

(1 + M_1/2^p) / (1 + M_2/2^p) = (2^p + M_1)/(2^p + M_2).

Again, it is seen that the division of mantissas reduces to the division of two (p+1)-bit integers. Any of the division algorithms can be used to carry out this division. The (p+1)-bit quotient obtained in the division becomes the mantissa of the result, and since the division involves two numbers that lie between 2^p and 2^(p+1), any mantissa that results from the division of two normalized numbers is always between 1/2 and 2.

47 Let u = -6.5 and v = 3.5 be represented as 16-bit floating-point numbers with a 4-bit biased exponent and an 11-bit sign-magnitude mantissa with a hidden bit. The quotient u / v is computed as follows:

Step 1: Express u and v as floating-point numbers: u = 1 1001 10100000000, v = 0 1000 11000000000.
Step 2: Compute the exponent E_r = E_u - E_v + (2^3 - 1) = 1001 - 1000 + 0111 = 1000.
Step 3: Compute the mantissa M_r = (1 + M_u)/(1 + M_v) = 1.625/1.75 = 0.928571...
Step 4: Normalize M_r by shifting it left once: M_r = 1.857142...
Step 5: Adjust the exponent by decrementing it by 1: E_r = 0111.
Step 6: Combine the sign, E_r, and M_r: u / v = 1 0111 11011011011 ≈ -1.857 × 2^0.

48 The mantissa division of Step 3, carried out on the leading bits: (1.10100)_2 / (1.11000)_2 = (11010)_2 / (11100)_2. The shift-and-subtract steps proceed as in integer division, producing the quotient bits 111011011011...; after the first few steps the remainders begin to recur (the remainder 0110 repeats), so the quotient bits repeat in the pattern 110.

49 We terminate the division at the end of 12 bits, since the number of bits in the mantissa (including the hidden bit) is limited to 12. Moreover, since the remainder is not equal to 0 after the last shift-and-subtract step, the ratio u/v does not have an exact representation in 12 bits. In fact, a closer examination of the process shows that the shift-and-subtract steps enter a repetitious pattern once the remainder 0110 is obtained. Therefore, u/v cannot have an exact representation regardless of how many bits we use.
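The repeating remainder can be exhibited by carrying out the binary long division in Python (mantissa_bits is an illustrative helper of my own):

```python
def mantissa_bits(num, den, nbits):
    """Binary long division of num/den (< 1), returning the first nbits
    quotient bits and the remainder before each step; a repeated
    remainder means the expansion never terminates."""
    bits, rems = [], []
    r = num
    for _ in range(nbits):
        rems.append(r)
        r *= 2
        bits.append(r // den)
        r %= den
    return bits, rems

# 1.625 / 1.75 = 13/14: the remainders cycle, so u/v has no exact
# representation in any finite number of mantissa bits.
bits, rems = mantissa_bits(13, 14, 12)
print(bits)                        # [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(len(set(rems)) < len(rems))  # True: a remainder repeated
```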

50 The algorithm can be implemented in hardware using a k-bit 2's complement adder/subtractor, a p-bit multiplier, and a p-bit divider. For multiplication, we can use the compact multiplier hardware that was described earlier, or design an algorithm that generates only the most significant p bits of the product, since the remaining p bits are discarded.

51 For division, we can use either restoring or non-restoring division and discard the remainder. Unlike the implementation of the floating-point addition and subtraction operations, floating-point multiplication and division operations are generally implemented separately in hardware. This stems from the fact that division takes more clock cycles to execute than multiplication. As we will see in subsequent chapters, in processors in which several operations can be scheduled for execution in parallel, it is desirable to execute these operations on different hardware units to speed up computations.


53 Machine Arithmetic in Real Processors. [Table: Motorola integer arithmetic instructions.]

54 PowerPC processors have four multiply and two divide instructions. Multiply instructions provide either the upper or lower half of a 64-bit product when two 32-bit numbers are multiplied. More specifically, in 32-bit mode, the mulhw and mulhwu instructions multiply two 32-bit signed or unsigned operands and store the upper 32 bits of the product in a register. Similarly, the mullw and mulli instructions retain the lower 32 bits of a 64-bit (or 48-bit) product that results from the multiplication of two 32-bit register operands, or of a 32-bit operand and a 16-bit signed immediate. A full 64-bit product can be obtained by a pair of multiply instructions; for example, mulhw and mullw can be used together to multiply two 32-bit signed numbers into a 64-bit signed product. It is also possible to obtain a 64-bit product using the 64-bit multiplication instructions with 32-bit operands, and these same instructions can be used together to obtain a signed 128-bit product of two 64-bit operands.

55 PowerPC's divide instructions divw, divd (signed division) and divwu, divdu (unsigned division) divide a 32- or 64-bit dividend by a 32- or 64-bit divisor to produce a 32- or 64-bit quotient without a remainder. Even though the remainder is not computed by these instructions, it can be obtained by subtracting the product of the quotient and the divisor from the dividend. Division by 0 is not allowed and sets the OV (overflow) flag when it is attempted. The OV flag is also set when divw or divd is used to divide the most negative number (-2^31 or -2^63) by -1. (Can you guess why?)

56 Motorola (PowerPC) floating-point arithmetic instructions:

Instruction          Operation                    Comments
faddx, faddsx        f_d = f_a + f_b              Operands in f_a and f_b are added and stored in f_d.
fsubx, fsubsx        f_d = f_a - f_b              Operands in f_a and f_b are subtracted and stored in f_d.
fmulx, fmulsx        f_d = f_a × f_b              Operands in f_a and f_b are multiplied and stored in f_d.
fdivx, fdivsx        f_d = f_a / f_b              Operand in f_a is divided by that in f_b and stored in f_d.
fmaddx, fmaddsx      f_d = f_a × f_b + f_c        f_a × f_b + f_c is stored in f_d.
fnmaddx, fnmaddsx    f_d = -(f_a × f_b + f_c)     -(f_a × f_b + f_c) is stored in f_d.
fmsubx, fmsubsx      f_d = f_a × f_b - f_c        f_a × f_b - f_c is stored in f_d.
fnmsubx, fnmsubsx    f_d = -(f_a × f_b - f_c)     -(f_a × f_b - f_c) is stored in f_d.
fabs                 f_d = |f_a|                  Sign bit of f_a is cleared and the result is stored in f_d.
fnabs                f_d = -|f_a|                 Sign bit of f_a is set and the result is stored in f_d.
fneg                 f_d = -f_a                   Sign bit of f_a is inverted and the result is stored in f_d.
fres                 f_d ≈ 1/f_a                  Estimate of the reciprocal of f_a is stored in f_d.
fsqrtx, fsqrtsx      f_d = sqrt(f_a)              Square root of f_a is stored in f_d.
frsqrtex             f_d ≈ 1/sqrt(f_a)            Estimate of the reciprocal square root of f_a is stored in f_d.

57 A subset of Intel 64 architecture integer arithmetic instructions:

Instruction   Operation                                         Comments
add           r/m_d = r/m_d + r/m_s or immediate                Operands are added and stored in r/m_d.
adc           r/m_d = r/m_d + r/m_s or immediate + CF           Same as add except that CF (carry) is included in the addition.
sub           r/m_d = r/m_d - r/m_s or immediate                Operands are subtracted and stored in r/m_d.
sbb           r/m_d = r/m_d - r/m_s or immediate - CF           Same as sub except that CF is included in the subtraction.
inc           r/m_d = r/m_d + 1                                 Increment the operand in r/m_d.
dec           r/m_d = r/m_d - 1                                 Decrement the operand in r/m_d.
neg           r/m_d = -r/m_d                                    Negate the operand in r/m_d.
mul           rdx:rax = rax × r/m_s                             Unsigned 128-bit product of the 64-bit operands in rax and r/m_s.
imul          rdx:rax = rax × r/m_s; or r_d = r_d × r/m_s or immediate; or r_d = r_a × r/m_b × immediate   Signed multiplication; the one-operand form stores a 128-bit product in rdx:rax, while the two- and three-operand forms store the lower 64 bits of the signed product in r_d.
div           rax = quotient[rdx:rax / r/m_s]; rdx = remainder  Unsigned 128-bit operand in rdx:rax is divided by the 64-bit operand in r/m_s.
idiv          rax = quotient[rdx:rax / r/m_s]; rdx = remainder  Signed 128-bit operand in rdx:rax is divided by the 64-bit operand in r/m_s.

58 Like PowerPC processors, Intel 64 processors support both signed and unsigned multiplication and division. The multiplication instructions imul and mul handle signed and unsigned multiplication with a variety of operand combinations, producing 16-, 32-, 64-, and 128-bit products from 8-, 16-, 32-, and 64-bit operands. Likewise, the idiv and div instructions provide signed and unsigned division of 16-, 32-, 64-, and 128-bit dividends by corresponding 8-, 16-, 32-, and 64-bit divisors. Results of the multiplication and division instructions are stored in the specialized pair of 64-bit registers rax and rdx, except for some of the signed multiplication instructions.

59 Intel 64 architecture processors also perform decimal arithmetic using packed and unpacked decimal operands. A packed decimal operand contains 8 decimal digits in 32-bit mode and 16 decimal digits in 64-bit mode. An unpacked decimal operand uses only the lower four bits of each byte, so in 32-bit mode an unpacked decimal operand contains only four decimal digits, and in 64-bit mode it contains eight.

60 Intel 64 architecture processors do not have decimal add or subtract instructions. Instead, they have instructions to convert binary values to packed and unpacked decimals. When two BCD (binary-coded-decimal) digits u and v are added as 4-bit binary numbers, a correction is performed by adding 6 to the sum u + v when it exceeds 9. This is because x - 10 (mod 16) = x + 6 (mod 16), since -10 and 6 are congruent mod 16; i.e., adding 6 is the same as subtracting 10 in modulo 16. For example,

(7 + 8)_10 = (0111 + 1000)_BCD + 0110 = (1 0101)_BCD = (15)_10
(9 + 8)_10 = (1001 + 1000)_BCD + 0110 = (1 0111)_BCD = (17)_10
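The correction rule can be sketched for a single digit position (bcd_add_digit is an illustrative name of my own, not an actual processor instruction):

```python
def bcd_add_digit(u, v):
    """Add two BCD digits as 4-bit binary values; when the raw sum
    exceeds 9, add 6 to skip the six unused codes 1010-1111
    (adding 6 equals subtracting 10 modulo 16)."""
    s = u + v
    if s > 9:
        s += 6
    return s >> 4, s & 0xF   # (decimal carry, corrected BCD digit)

print(bcd_add_digit(7, 8))   # (1, 5): 7 + 8 = 15
print(bcd_add_digit(9, 8))   # (1, 7): 9 + 8 = 17
print(bcd_add_digit(4, 3))   # (0, 7): no correction needed
```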

61 Similarly, when subtracting two decimal digits in binary, if the difference u - v is negative (i.e., a borrow occurs), the 4-bit result must be decreased by 6. For example,

(7 - 5)_10 = (0111 - 0101)_BCD = (0 0010)_BCD = (2)_10
(7 - 8)_10 = (0111 - 1000)_BCD - 0110 = (0 1001)_BCD, representing (-1)_10
(5 - 7)_10 = (0101 - 0111)_BCD - 0110 = (0 1000)_BCD, representing (-2)_10

62 A subset of Intel 64 architecture floating-point arithmetic instructions:

Instruction   Operation                                                     Comments
fadd          fpu stack register = fpu stack register + r/m_s
fsub          fpu stack register = fpu stack register - r/m_s
fsubr         fpu stack register = r/m_s - fpu stack register
fmul          fpu stack register = fpu stack register × r/m_s
fdiv          fpu stack register = fpu stack register / r/m_s
fdivr         fpu stack register = r/m_s / fpu stack register
fsin          fpu stack register 0 = sine(fpu stack register 0)             argument in radians
fcos          fpu stack register 0 = cosine(fpu stack register 0)          argument in radians
fsincos       fpu stack register 0 = sine(fpu stack register 0),
              fpu stack register 1 = cosine(fpu stack register 0)          argument in radians
fptan         fpu stack register 0 = tangent(fpu stack register 0)         argument in radians
fpatan        fpu stack register 0 = arctangent(fpu stack register 0)
fsqrt         fpu stack register 0 = square root(fpu stack register 0)


CO212 Lecture 10: Arithmetic & Logical Unit CO212 Lecture 10: Arithmetic & Logical Unit Shobhanjana Kalita, Dept. of CSE, Tezpur University Slides courtesy: Computer Architecture and Organization, 9 th Ed, W. Stallings Integer Representation For

More information

Data Representations & Arithmetic Operations

Data Representations & Arithmetic Operations Data Representations & Arithmetic Operations Hiroaki Kobayashi 7/13/2011 7/13/2011 Computer Science 1 Agenda Translation between binary numbers and decimal numbers Data Representations for Integers Negative

More information

Adding Binary Integers. Part 5. Adding Base 10 Numbers. Adding 2's Complement. Adding Binary Example = 10. Arithmetic Logic Unit

Adding Binary Integers. Part 5. Adding Base 10 Numbers. Adding 2's Complement. Adding Binary Example = 10. Arithmetic Logic Unit Part 5 Adding Binary Integers Arithmetic Logic Unit = Adding Binary Integers Adding Base Numbers Computer's add binary numbers the same way that we do with decimal Columns are aligned, added, and "'s"

More information

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction 1 Floating Point The World is Not Just Integers Programming languages support numbers with fraction Called floating-point numbers Examples: 3.14159265 (π) 2.71828 (e) 0.000000001 or 1.0 10 9 (seconds in

More information

MIPS Integer ALU Requirements

MIPS Integer ALU Requirements MIPS Integer ALU Requirements Add, AddU, Sub, SubU, AddI, AddIU: 2 s complement adder/sub with overflow detection. And, Or, Andi, Ori, Xor, Xori, Nor: Logical AND, logical OR, XOR, nor. SLTI, SLTIU (set

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 4-A Floating-Point Arithmetic Israel Koren ECE666/Koren Part.4a.1 Preliminaries - Representation

More information

Number Systems. Both numbers are positive

Number Systems. Both numbers are positive Number Systems Range of Numbers and Overflow When arithmetic operation such as Addition, Subtraction, Multiplication and Division are performed on numbers the results generated may exceed the range of

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Arithmetic Unit 10122011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Fixed Point Arithmetic Addition/Subtraction

More information

unused unused unused unused unused unused

unused unused unused unused unused unused BCD numbers. In some applications, such as in the financial industry, the errors that can creep in due to converting numbers back and forth between decimal and binary is unacceptable. For these applications

More information

Chapter 10 - Computer Arithmetic

Chapter 10 - Computer Arithmetic Chapter 10 - Computer Arithmetic Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 10 - Computer Arithmetic 1 / 126 1 Motivation 2 Arithmetic and Logic Unit 3 Integer representation

More information

±M R ±E, S M CHARACTERISTIC MANTISSA 1 k j

±M R ±E, S M CHARACTERISTIC MANTISSA 1 k j ENEE 350 c C. B. Silio, Jan., 2010 FLOATING POINT REPRESENTATIONS It is assumed that the student is familiar with the discussion in Appendix B of the text by A. Tanenbaum, Structured Computer Organization,

More information

Finite arithmetic and error analysis

Finite arithmetic and error analysis Finite arithmetic and error analysis Escuela de Ingeniería Informática de Oviedo (Dpto de Matemáticas-UniOvi) Numerical Computation Finite arithmetic and error analysis 1 / 45 Outline 1 Number representation:

More information

Chapter 3: Arithmetic for Computers

Chapter 3: Arithmetic for Computers Chapter 3: Arithmetic for Computers Objectives Signed and Unsigned Numbers Addition and Subtraction Multiplication and Division Floating Point Computer Architecture CS 35101-002 2 The Binary Numbering

More information

CHAPTER 2 Data Representation in Computer Systems

CHAPTER 2 Data Representation in Computer Systems CHAPTER 2 Data Representation in Computer Systems 2.1 Introduction 37 2.2 Positional Numbering Systems 38 2.3 Decimal to Binary Conversions 38 2.3.1 Converting Unsigned Whole Numbers 39 2.3.2 Converting

More information

CHAPTER 2 Data Representation in Computer Systems

CHAPTER 2 Data Representation in Computer Systems CHAPTER 2 Data Representation in Computer Systems 2.1 Introduction 37 2.2 Positional Numbering Systems 38 2.3 Decimal to Binary Conversions 38 2.3.1 Converting Unsigned Whole Numbers 39 2.3.2 Converting

More information

Numeric Encodings Prof. James L. Frankel Harvard University

Numeric Encodings Prof. James L. Frankel Harvard University Numeric Encodings Prof. James L. Frankel Harvard University Version of 10:19 PM 12-Sep-2017 Copyright 2017, 2016 James L. Frankel. All rights reserved. Representation of Positive & Negative Integral and

More information

Floating-point representations

Floating-point representations Lecture 10 Floating-point representations Methods of representing real numbers (1) 1. Fixed-point number system limited range and/or limited precision results must be scaled 100101010 1111010 100101010.1111010

More information

Floating-point representations

Floating-point representations Lecture 10 Floating-point representations Methods of representing real numbers (1) 1. Fixed-point number system limited range and/or limited precision results must be scaled 100101010 1111010 100101010.1111010

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 4-C Floating-Point Arithmetic - III Israel Koren ECE666/Koren Part.4c.1 Floating-Point Adders

More information

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning 4 Operations On Data 4.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three categories of operations performed on data.

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 11: Floating Point & Floating Point Addition Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Last time: Single Precision Format

More information

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8 Computer Arithmetic Ch 8 ALU Integer Representation Integer Arithmetic Floating-Point Representation Floating-Point Arithmetic 1 Arithmetic Logical Unit (ALU) (2) (aritmeettis-looginen yksikkö) Does all

More information

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8 Computer Arithmetic Ch 8 ALU Integer Representation Integer Arithmetic Floating-Point Representation Floating-Point Arithmetic 1 Arithmetic Logical Unit (ALU) (2) Does all work in CPU (aritmeettis-looginen

More information

Chapter 5 : Computer Arithmetic

Chapter 5 : Computer Arithmetic Chapter 5 Computer Arithmetic Integer Representation: (Fixedpoint representation): An eight bit word can be represented the numbers from zero to 255 including = 1 = 1 11111111 = 255 In general if an nbit

More information

CHAPTER 5: Representing Numerical Data

CHAPTER 5: Representing Numerical Data CHAPTER 5: Representing Numerical Data The Architecture of Computer Hardware and Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint

More information

COMP2611: Computer Organization. Data Representation

COMP2611: Computer Organization. Data Representation COMP2611: Computer Organization Comp2611 Fall 2015 2 1. Binary numbers and 2 s Complement Numbers 3 Bits: are the basis for binary number representation in digital computers What you will learn here: How

More information

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. ! Floating point Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties Next time! The machine model Chris Riesbeck, Fall 2011 Checkpoint IEEE Floating point Floating

More information

Thomas Polzer Institut für Technische Informatik

Thomas Polzer Institut für Technische Informatik Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Operations on integers Addition and subtraction Multiplication and division Dealing with overflow Floating-point real numbers VO

More information

Organisasi Sistem Komputer

Organisasi Sistem Komputer LOGO Organisasi Sistem Komputer OSK 8 Aritmatika Komputer 1 1 PT. Elektronika FT UNY Does the calculations Arithmetic & Logic Unit Everything else in the computer is there to service this unit Handles

More information

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive.

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive. IEEE 754 Standard - Overview Frozen Content Modified by on 13-Sep-2017 Before discussing the actual WB_FPU - Wishbone Floating Point Unit peripheral in detail, it is worth spending some time to look at

More information

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y Arithmetic A basic operation in all digital computers is the addition and subtraction of two numbers They are implemented, along with the basic logic functions such as AND,OR, NOT,EX- OR in the ALU subsystem

More information

Number Systems Standard positional representation of numbers: An unsigned number with whole and fraction portions is represented as:

Number Systems Standard positional representation of numbers: An unsigned number with whole and fraction portions is represented as: N Number Systems Standard positional representation of numbers: An unsigned number with whole and fraction portions is represented as: a n a a a The value of this number is given by: = a n Ka a a a a a

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 4-B Floating-Point Arithmetic - II Spring 2017 Koren Part.4b.1 The IEEE Floating-Point Standard Four formats for floating-point

More information

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy.

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy. Math 340 Fall 2014, Victor Matveev Binary system, round-off errors, loss of significance, and double precision accuracy. 1. Bits and the binary number system A bit is one digit in a binary representation

More information

Internal Data Representation

Internal Data Representation Appendices This part consists of seven appendices, which provide a wealth of reference material. Appendix A primarily discusses the number systems and their internal representation. Appendix B gives information

More information

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B)

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B) Computer Arithmetic Data is manipulated by using the arithmetic instructions in digital computers. Data is manipulated to produce results necessary to give solution for the computation problems. The Addition,

More information

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1.1 Introduction Given that digital logic and memory devices are based on two electrical states (on and off), it is natural to use a number

More information

Roundoff Errors and Computer Arithmetic

Roundoff Errors and Computer Arithmetic Jim Lambers Math 105A Summer Session I 2003-04 Lecture 2 Notes These notes correspond to Section 1.2 in the text. Roundoff Errors and Computer Arithmetic In computing the solution to any mathematical problem,

More information

Instruction Set extensions to X86. Floating Point SIMD instructions

Instruction Set extensions to X86. Floating Point SIMD instructions Instruction Set extensions to X86 Some extensions to x86 instruction set intended to accelerate 3D graphics AMD 3D-Now! Instructions simply accelerate floating point arithmetic. Accelerate object transformations

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 4-B Floating-Point Arithmetic - II Israel Koren ECE666/Koren Part.4b.1 The IEEE Floating-Point

More information

Floating Point Arithmetic

Floating Point Arithmetic Floating Point Arithmetic CS 365 Floating-Point What can be represented in N bits? Unsigned 0 to 2 N 2s Complement -2 N-1 to 2 N-1-1 But, what about? very large numbers? 9,349,398,989,787,762,244,859,087,678

More information

Kinds Of Data CHAPTER 3 DATA REPRESENTATION. Numbers Are Different! Positional Number Systems. Text. Numbers. Other

Kinds Of Data CHAPTER 3 DATA REPRESENTATION. Numbers Are Different! Positional Number Systems. Text. Numbers. Other Kinds Of Data CHAPTER 3 DATA REPRESENTATION Numbers Integers Unsigned Signed Reals Fixed-Point Floating-Point Binary-Coded Decimal Text ASCII Characters Strings Other Graphics Images Video Audio Numbers

More information

Number Systems CHAPTER Positional Number Systems

Number Systems CHAPTER Positional Number Systems CHAPTER 2 Number Systems Inside computers, information is encoded as patterns of bits because it is easy to construct electronic circuits that exhibit the two alternative states, 0 and 1. The meaning of

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-09-07 1. We are still at 50. If you are still waiting and are not interested in knowing if a slot frees up, let me know. 2. There is a correction to HW 1, problem 4; the condition

More information

COMPUTER ORGANIZATION AND ARCHITECTURE

COMPUTER ORGANIZATION AND ARCHITECTURE COMPUTER ORGANIZATION AND ARCHITECTURE For COMPUTER SCIENCE COMPUTER ORGANIZATION. SYLLABUS AND ARCHITECTURE Machine instructions and addressing modes, ALU and data-path, CPU control design, Memory interface,

More information

UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS

UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS (09 periods) Computer Arithmetic: Data Representation, Fixed Point Representation, Floating Point Representation, Addition and

More information

IEEE Standard 754 Floating Point Numbers

IEEE Standard 754 Floating Point Numbers IEEE Standard 754 Floating Point Numbers Steve Hollasch / Last update 2005-Feb-24 IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based

More information

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science) Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science) Floating Point Background: Fractional binary numbers IEEE floating point standard: Definition Example and properties

More information

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University Computer Architecture Chapter 3 Fall 2005 Department of Computer Science Kent State University Objectives Signed and Unsigned Numbers Addition and Subtraction Multiplication and Division Floating Point

More information

Introduction to Computers and Programming. Numeric Values

Introduction to Computers and Programming. Numeric Values Introduction to Computers and Programming Prof. I. K. Lundqvist Lecture 5 Reading: B pp. 47-71 Sept 1 003 Numeric Values Storing the value of 5 10 using ASCII: 00110010 00110101 Binary notation: 00000000

More information

2 Computation with Floating-Point Numbers

2 Computation with Floating-Point Numbers 2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers

More information

Chapter 2. Data Representation in Computer Systems

Chapter 2. Data Representation in Computer Systems Chapter 2 Data Representation in Computer Systems Chapter 2 Objectives Understand the fundamentals of numerical data representation and manipulation in digital computers. Master the skill of converting

More information

EEM336 Microprocessors I. Arithmetic and Logic Instructions

EEM336 Microprocessors I. Arithmetic and Logic Instructions EEM336 Microprocessors I Arithmetic and Logic Instructions Introduction We examine the arithmetic and logic instructions. The arithmetic instructions include addition, subtraction, multiplication, division,

More information

By, Ajinkya Karande Adarsh Yoga

By, Ajinkya Karande Adarsh Yoga By, Ajinkya Karande Adarsh Yoga Introduction Early computer designers believed saving computer time and memory were more important than programmer time. Bug in the divide algorithm used in Intel chips.

More information

Number Systems and Binary Arithmetic. Quantitative Analysis II Professor Bob Orr

Number Systems and Binary Arithmetic. Quantitative Analysis II Professor Bob Orr Number Systems and Binary Arithmetic Quantitative Analysis II Professor Bob Orr Introduction to Numbering Systems We are all familiar with the decimal number system (Base 10). Some other number systems

More information

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754 Floating Point Puzzles Topics Lecture 3B Floating Point IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties For each of the following C expressions, either: Argue that

More information

Chapter 2. Positional number systems. 2.1 Signed number representations Signed magnitude

Chapter 2. Positional number systems. 2.1 Signed number representations Signed magnitude Chapter 2 Positional number systems A positional number system represents numeric values as sequences of one or more digits. Each digit in the representation is weighted according to its position in the

More information

Floating-Point Numbers in Digital Computers

Floating-Point Numbers in Digital Computers POLYTECHNIC UNIVERSITY Department of Computer and Information Science Floating-Point Numbers in Digital Computers K. Ming Leung Abstract: We explain how floating-point numbers are represented and stored

More information

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor. CS 261 Fall 2018 Mike Lam, Professor https://xkcd.com/217/ Floating-Point Numbers Floating-point Topics Binary fractions Floating-point representation Conversions and rounding error Binary fractions Now

More information

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning 4 Operations On Data 4.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three categories of operations performed on data.

More information

ECE260: Fundamentals of Computer Engineering

ECE260: Fundamentals of Computer Engineering Arithmetic for Computers James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy Arithmetic for

More information

Number Systems and Computer Arithmetic

Number Systems and Computer Arithmetic Number Systems and Computer Arithmetic Counting to four billion two fingers at a time What do all those bits mean now? bits (011011011100010...01) instruction R-format I-format... integer data number text

More information

Floating-Point Numbers in Digital Computers

Floating-Point Numbers in Digital Computers POLYTECHNIC UNIVERSITY Department of Computer and Information Science Floating-Point Numbers in Digital Computers K. Ming Leung Abstract: We explain how floating-point numbers are represented and stored

More information

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation Chapter 2 Float Point Arithmetic Topics IEEE Floating Point Standard Fractional Binary Numbers Rounding Floating Point Operations Mathematical properties Real Numbers in Decimal Notation Representation

More information

Foundations of Computer Systems

Foundations of Computer Systems 18-600 Foundations of Computer Systems Lecture 4: Floating Point Required Reading Assignment: Chapter 2 of CS:APP (3 rd edition) by Randy Bryant & Dave O Hallaron Assignments for This Week: Lab 1 18-600

More information

Computer Architecture and IC Design Lab. Chapter 3 Part 2 Arithmetic for Computers Floating Point

Computer Architecture and IC Design Lab. Chapter 3 Part 2 Arithmetic for Computers Floating Point Chapter 3 Part 2 Arithmetic for Computers Floating Point Floating Point Representation for non integral numbers Including very small and very large numbers 4,600,000,000 or 4.6 x 10 9 0.0000000000000000000000000166

More information

CHW 261: Logic Design

CHW 261: Logic Design CHW 261: Logic Design Instructors: Prof. Hala Zayed Dr. Ahmed Shalaby http://www.bu.edu.eg/staff/halazayed14 http://bu.edu.eg/staff/ahmedshalaby14# Slide 1 Slide 2 Slide 3 Digital Fundamentals CHAPTER

More information

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR UNIT I Digital Systems: Binary Numbers, Octal, Hexa Decimal and other base numbers, Number base conversions, complements, signed binary numbers, Floating point number representation, binary codes, error

More information

Classes of Real Numbers 1/2. The Real Line

Classes of Real Numbers 1/2. The Real Line Classes of Real Numbers All real numbers can be represented by a line: 1/2 π 1 0 1 2 3 4 real numbers The Real Line { integers rational numbers non-integral fractions irrational numbers Rational numbers

More information

(+A) + ( B) + (A B) (B A) + (A B) ( A) + (+ B) (A B) + (B A) + (A B) (+ A) (+ B) + (A - B) (B A) + (A B) ( A) ( B) (A B) + (B A) + (A B)

(+A) + ( B) + (A B) (B A) + (A B) ( A) + (+ B) (A B) + (B A) + (A B) (+ A) (+ B) + (A - B) (B A) + (A B) ( A) ( B) (A B) + (B A) + (A B) COMPUTER ARITHMETIC 1. Addition and Subtraction of Unsigned Numbers The direct method of subtraction taught in elementary schools uses the borrowconcept. In this method we borrow a 1 from a higher significant

More information

COMPUTER ORGANIZATION AND. Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

COMPUTER ORGANIZATION AND. Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers ARM D COMPUTER ORGANIZATION AND Edition The Hardware/Software Interface Chapter 3 Arithmetic for Computers Modified and extended by R.J. Leduc - 2016 In this chapter, we will investigate: How integer arithmetic

More information

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state Chapter 4 xi yi Carry in ci Sum s i Carry out c i+ At the ith stage: Input: ci is the carry-in Output: si is the sum ci+ carry-out to (i+)st state si = xi yi ci + xi yi ci + xi yi ci + xi yi ci = x i yi

More information

Floating Point Arithmetic. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Floating Point Arithmetic. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Floating Point Arithmetic Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Floating Point (1) Representation for non-integral numbers Including very

More information

Bits, Words, and Integers

Bits, Words, and Integers Computer Science 52 Bits, Words, and Integers Spring Semester, 2017 In this document, we look at how bits are organized into meaningful data. In particular, we will see the details of how integers are

More information

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0. C NUMERIC FORMATS Figure C-. Table C-. Listing C-. Overview The DSP supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the DSP supports an

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers Arithmetic for Computers Operations on integers Addition and subtraction Multiplication

More information

The ALU consists of combinational logic. Processes all data in the CPU. ALL von Neuman machines have an ALU loop.

The ALU consists of combinational logic. Processes all data in the CPU. ALL von Neuman machines have an ALU loop. CS 320 Ch 10 Computer Arithmetic The ALU consists of combinational logic. Processes all data in the CPU. ALL von Neuman machines have an ALU loop. Signed integers are typically represented in sign-magnitude

More information

Arithmetic for Computers. Hwansoo Han

Arithmetic for Computers. Hwansoo Han Arithmetic for Computers Hwansoo Han Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing with overflow Floating-point real numbers Representation

More information

Computer Organisation CS303
