IEEE Standard for Floating Point Numbers

Size: px
Start display at page:

Download "IEEE Standard for Floating Point Numbers"

Transcription

1 GENERAL ARTICLE IEEE Standard for Floating Point Numbers VRajaraman Floating point numbers are an important data type in computation which is used extensively. Yet, many users do not know the standard which is used in almost all computer hardware to store and process these. In this article, we explain the standards evolved by The Institute of Electrical and Electronic Engineers in 1985 and augmented in 2008 to represent floating point numbers and process them. This standard is now used by all computer manufacturers while designing floating point arithmetic units so that programs are portable among computers. Introduction There are two types of arithmetic which are performed in computers: integer arithmetic and real arithmetic. Integer arithmetic is simple. A decimal number is converted to its binary equivalent andarithmeticisperformedusingbinaryarithmeticoperations. Thelargestpositiveintegerthatmaybestoredinan8-bitbyteis +127,if1 bitisusedforsign. If16 bitsareused,thelargest positiveintegeris+32767andwith32bits,itis , quite large! Integers are used mostly for counting. Most scientific computations are however performed using real numbers, that is, numbers with a fractional part. In order to represent real numbers incomputers,wehavetoasktwoquestions. Thefirstistodecide howmanybitsareneededtoencoderealnumbersandthesecond is to decide how to represent real numbers using these bits. Normally, in numerical computation in science and engineering, onewouldneedatleast7to8significantdigitsprecision.thus, thenumberofbitsneededtoencode8decimaldigitsisapproximately26,as log 2 10 = 3.32bits are needed onthe averageto encodeadigit.incomputers,numbersarestoredasasequenceof 8-bitbytes. Thus32bits(4bytes)whichisbiggerthan26bitsis alogicalsizetouseforrealnumbers. Given32bitstoencodereal VRajaramanisatthe IndianInstituteofScience, Bengaluru. Several generations of scientists and engineers in India havelearntcomputer science using his lucidly writtentextbookson programming and computer fundamentals. Hiscurrentresearch interests are parallel computingandhistoryof computing. Keywords Floating point numbers, rounding, decimal floating point numbers, IEEE Standard. RESONANCE January

2 GENERAL ARTICLE Figure1. Fixedpointrepresentation of realnumbers in binaryusing 32 bits. numbers,thenextquestionishowtobreakitupintoaninteger partandafractionalpart. Onemethodistodividethe32bitsinto 2parts,oneparttorepresentanintegerpartofthenumberandthe otherthefractionalpartasshowninfigure1. Inthisfigure,wehavearbitrarilyfixedthe(virtual)binarypoint betweenbits7and8. Withthisrepresentation,calledfixedpoint representation,thelargestandthesmallestpositivebinarynumbersthatmayberepresentedusing32bitsare Binary Floating Point Numbers Astherangeofreal numbers representable with fixedpointisnot sufficient, normalized floating point is used to represent real numbers. This rangeofrealnumbers, whenfixedpoint representationis used, is not sufficient in many practical problems. Therefore, another representation called normalizedfloatingpointrepresentation is used for real numbers in computers. In this representation,32bitsaredividedintotwoparts:apartcalledthemantissa withitssignandtheothercalledtheexponentwithitssign.the mantissa represents fractions with a non-zero leading bit and the exponentthepowerof2bywhichthemantissaismultiplied. This method increases the range of numbers that may be represented using32bits.inthismethod,abinaryfloatingpointnumberis represented by (sign) mantissa 2 ±exponent wherethesignisonebit,themantissaisabinaryfractionwitha non-zeroleadingbit,andtheexponentisabinaryinteger. If32 12 RESONANCE January 2016

3 GENERAL ARTICLE bits are available to store floating point numbers, we have to decide the following: 1.Howmanybitsaretobeusedtorepresentthemantissa(1bit isreservedforthesignofthenumber). 2.Howmanybitsaretobeusedfortheexponent. 3.Howtorepresentthesignoftheexponent. Thenumberofbitstobeusedforthemantissaisdeterminedby the number of significant decimal digits required in computation, which is at least seven. Based on experience in numerical computation, it is found that at least seven significant decimal digits are needed if results are expected without too much error. The number of bits required to represent seven decimal digits is approximately 23. The remaining 8 bits are allocated for the exponent. The exponent may be represented in sign magnitude form. In this case, 1 bit is used for sign and 7 bits for the magnitude. The exponent will range from 127 to The only disadvantage to this method is that there are two representationsfor0exponent:+0and 0. Toavoidthis,onemayusean excess representation or biased format to represent the exponent. Theexponenthasnosigninthisformat. Therange0to255ofan 8-bitnumberisdividedintotwoparts0to127and128to255as shown below: Thenumberofbits tobeusedforthe mantissa is determined by the number of significant decimal digits required in computation, whichisatleast seven. All bit strings from 0 to 126 are considered negative, exponent 127 represents 0, and values greater than 127 are positive. Thus therangeofexponentsis 127to+128. Givenanexponentstring exp,thevalueoftheexponentwillbetakenas(exp 127).The main advantage of this format is a unique representation for exponent zero. With the representation of binary floating point numbers explained above, the largest floating point number which can be represented is The main advantage of biased exponent format is unique representation for exponent 0. RESONANCE January

4 GENERAL ARTICLE bits =( ) The smallest floating point number is bits Example. Represent in 32-bit binary floating point format = = Normalized23bitmantissa= Asexcessrepresentationisbeingusedforexponent,itisequalto 127+6=133. Thustherepresentationis = = The32-bitstringusedtostore inacomputerwillthusbe Here, the most significant bit is the sign, the next 8 bits the exponent,andthelast23bitsthemantissa. Institute of Electrical and Electronics Engineers standardized how floating point binary numbers would be represented in computers, how these would be rounded, how0and wouldbe represented and how totreatexception condition such as attempttodivideby0. IEEEFloatingPointStandard Floatingpointbinarynumberswerebeginningtobeusedinthe mid50s.therewasnouniformityintheformatsusedtorepresent floating point numbers and programs were not portable from one manufacturer s computer to another. By the mid 1980s, with the advent of personal computers, the number of bits used to store floatingpointnumberswasstandardizedas32bits. AStandards CommitteewasformedbytheInstituteof ElectricalandElectronics Engineers to standardize how floating point binary numbers would be represented in computers. In addition, the standard specified uniformity in rounding numbers, treating exception conditionssuchasattempttodivideby0,andrepresentationof0 and infinity( This standard, calledieee Standard 754 for 14 RESONANCE January 2016

5 GENERAL ARTICLE floating point numbers, was adopted in 1985 by all computer manufacturers. It allowed porting of programs from one computer to another without the answers being different. This standard defined floating point formats for 32-bit and 64-bit numbers. With improvement in computer technology it became feasible to usealargernumberof bitsforfloatingpointnumbers.afterthe standard was used for a number of years, many improvements were suggested. The standard was updated in The current standard is IEEE version. This version retained all the features of the 1985 standard and introduced new standards for 16 and128-bit numbers. It also introduced standards for representing decimal floating point numbers. We will describe these standards laterinthisarticle. Inthissection,wewilldescribeIEEE Standard. The current standard is IEEE version. This version retained all the features of the 1985 standard and introduced new standards for 16-bit and128-bit numbers. It also introduced standards for representing decimal floating point numbers. IEEE floating point representation for binary real numbers consists of three parts. For a 32-bit(called single precision) number, they are: 1.Sign,forwhich1bitisallocated. 2. Mantissa(called significand in the standard) is allocated 23 bits. 3. Exponent is allocated 8 bits. As both positive and negative numbers are required for the exponent, instead of using a separate sign bit for the exponent, the standard uses a biased representation. The value of the bias is 127. Thus an exponent 0meansthat 127isstoredintheexponentfield. Astored value198meansthattheexponentvalueis( )=71. Theexponents 127(all0s)and+128(all1s)arereservedfor representing special numbers which we discuss later. To increase the precision of the significand, the IEEE 754 Standard uses a normalized significand which implies that its most significantbitisalways1. Asthisisimplied,itisassumedtobe on the left of the(virtual) decimal point of the significand. Thus intheieeestandard,thesignificandis24bitslong 23bitsof thesignificandwhichisstoredinthememoryandanimplied1as the most significant 24th bit. The extra bit increases the number RESONANCE January

6 GENERAL ARTICLE Figure 2. IEEE 754 representation of 32-bit floating point number. of significant digits in the significand. Thus, a floating point numberintheieeestandardis ( 1) s (1.f) 2 2 exponent 127 wheres isthesignbit; s=0isusedforpositivenumbersands =1forrepresentingnegativenumbers. frepresentsthebitsinthe significand. Observe that the implied 1 as the most significant bit of the significand is explicitly shown for clarity. For a32-bitwordmachine,theallocationofbitsfor afloating pointnumberaregiveninfigure2.observethattheexponentis placed before the significand(see Box 1). Example. Represent inIEEE bitfloatingpoint format = = Normalized significand = Exponent: (e 127)=5or,e=132. ThebitrepresentationinIEEEformatis Box1. WhydoesIEEE754standardusebiasedexponentrepresentationandplaceitbeforethesignificand? Computers normally have separate arithmetic units to compute with integer operands and floating point operands. TheyarecalledFPU(FloatingPointUnit)andIU(IntegerUnit). AnIUworksfasterandischeaper tobuild.inmostrecentcomputers,thereareseveraliusandasmallernumberoffpus.itispreferabletouse IUswheneverpossible.ByplacingthesignbitandbiasedexponentbitsasshowninFigure2,operationssuch as x<r.o.>y,wherer.o.isarelationaloperator,namely,>,<,,andx,yarereals,canbecarriedoutbyinteger arithmetic units as the exponents are integers. Sorting real numbers may also be carried out by IUs as biased exponentarethemostsignificantbits.thisensureslexicographicalorderingofrealnumbers;thesignificand neednotbeconsideredasafraction. 16 RESONANCE January 2016

7 GENERAL ARTICLE SpecialValuesinIEEE Standard(32Bits) RepresentationofZero: Asthesignificandisassumedtohavea hidden1asthemostsignificantbit,all0sinthesignificandpart ofthenumberwillbetakenas Thuszeroisrepresented intheieeestandardbyall0sfortheexponentandall0sforthe significand. All0sfortheexponentisnotallowedtobeusedfor anyothernumber. Ifthesignbitis0andalltheotherbits0,the numberis+0. Ifthesignbitis1andalltheotherbits0,itis 0. Even though +0 and 0 have distinct representations they are assumed equal. Thus the representations of +0 and 0 are: As the significand is assumedtohavea hidden1asthemost significantbit,all0s in the significand part ofthenumberwillbe takenas Thuszerois representedinthe IEEE Standard by all 0s for the exponent andall0sforthe significand. Representation of Infinity: All 1s in the exponent field is assumedtorepresentinfinity( Asignbit0represents+ and asignbit1represents. Thustherepresentationsof+ and are: Representation of Non Numbers: When an operation is performedbyacomputeronapairofoperands,theresultmaynotbe mathematically defined. For example, if zero is divided by zero, All1sinthe exponent field is assumed to representinfinity( ). RESONANCE January

8 GENERAL ARTICLE When an arithmetic operation is performed on two numbers which results in an indeterminate answer,itiscalled NaN(NotaNumber) in IEEE Standard. theresultisindeterminate. SucharesultiscalledNotaNumber (NaN) in the IEEE Standard. In fact the IEEE Standard defines twotypesofnan.whentheresultofanoperationisnotdefined (i.e., indeterminate) it is called a Quiet NaN(QNaN). Examples are:0/0,( ),.QuietNaNsarenormallycarriedoverinthe computation. The other type of NaN is called a Signalling Nan (SNaN).Thisisusedtogiveanerrormessage.Whenanoperation leadstoafloatingpointunderflow,i.e.,theresultofacomputation issmallerthanthesmallestnumberthatcanbestoredasafloating pointnumber,ortheresultisanoverflow,i.e.,itislargerthanthe largestnumberthatcanbestored, SNaNisused. Whennovalid valueisstoredina variablename(i.e., itisundefined)andan attemptismadetouseitinanarithmeticoperation,snanwould result.qnanisrepresentedby0or1asthesignbit,all1sas exponent,anda0astheleft-mostbitofthesignificandandatleast one1intherestofthesignificand. SNaNisrepresentedby0or 1asthesignbit,all1sasexponent,anda1astheleft-mostbitof thesignificandandanystringofbitsfortheremaining22bits.we give below the representations of QNaN and SNaN. QNaN 0or Themostsignificantbitofthesignificandis0.Thereisatleast one1intherestofthesignificand. SNaN 0 or The most significant bit of the significand is 1. Any combination ofbitsisallowedfortheotherbitsofthesignificand. 18 RESONANCE January 2016

9 GENERAL ARTICLE LargestandSmallestPositiveFloatingPointNumbers: Largest Positive Number Significand: =1+( )= Exponent: ( )=127. LargestNumber = ( ) Iftheresultofacomputationexceedsthelargestnumberthatcan bestoredinthecomputer,thenitiscalledanoverflow. SmallestPositiveNumber Significand = 1.0. Exponent=1 127= 126. Thesmallestnormalizednumberis= SubnormalNumbers:Whenalltheexponentbitsare0andthe leading hidden bit of the siginificand is 0, then the floating point number is called a subnormalnumber.thus, one logical representationofasubnormalnumberis ( 1) s 0.f (all0sfortheexponent), where f hasatleastone1(otherwisethenumberwillbetakenas 0).However,thestandarduses 126,i.e.,bias+1fortheexponent ratherthan 127whichisthebiasforsomenotsoobviousreason, possiblybecausebyusing 126insteadof 127,thegapbetween the largest subnormal number and the smallest normalized number is smaller. Thelargestsubnormalnumberis Itisclose tothesmallestnormalizednumber Whenallthe exponent bits are 0 and the leading hiddenbitofthe siginificand is 0, then thefloatingpoint numberiscalleda subnormal number. RESONANCE January

10 GENERAL ARTICLE By using subnormal numbers, underflow whichmayoccurin some calculations are gradual. Thesmallestpositivesubnormalnumberis thevalueofwhichis = Aresultthatissmallerthanthesmallestnumberthatcanbestored inacomputeriscalledanunderflow. You may wonder why subnormal numbers are allowed in the IEEE Standard. By using subnormal numbers, underflow which may occur in some calculations are gradual. Also, the smallest numberthatcouldberepresentedinamachineisclosertozero. (SeeBox2.) Box2. MachineEpsilonandDwarf In any of the formats for representing floating point numbers, machine epsilon is defined as the difference between1andthenextlargernumberthatcanbestoredinthatformat. Forexample,in32-bitIEEEformatwith a23-bitsignificand,themachineepsilonis2 23 = Thisessentiallytellsusthattheprecisionof decimalnumbersstoredinthisformatis7digits.thetermprecisionandaccuracyarenotthesame.accuracy impliescorrectnesswhereasprecisiondoesnot.for64-bitrepresentationofieeefloatingpointnumbersthe significandlengthis52bits.thus,machineepsilonis2 52 = Therefore,decimalcalculationswith 64 bits give 16 digit precision. This is a conservative definition used by industry. In MATLAB, machine epsilon is as defined above. However, academics define machine epsilon as the upper bound of the relative error when numbersarerounded. Thus,for32-bitfloatingpointnumbers,themachineepsilonis2 23 / The machine epsilon is useful in iterative computation. When two successive iterates differ by less than epsilon onemayassumethattheiterationhasconverged.theieeestandarddoesnotdefinemachineepsilon. Tinyordwarfistheminimumsubnormalnumberthatcanberepresentedusingthespecifiedfloatingpoint format. Thus, for IEEE 32-bit format, it is whosevalueis: = Anynumberlessthanthiswillsignalunderflow. Theadvantage ofusingsubnormalnumbersisthatunderflowisgradualaswasstatedearlier. 20 RESONANCE January 2016

11 GENERAL ARTICLE Whenamathematicaloperationsuchasadd,subtract,multiply, or divide is performed with two floating point numbers, the significand of the result may exceed 23 bits after the adjustment oftheexponent.insuchacase,therearetwoalternatives. Oneis totruncatetheresult,i.e.,ignoreallbitsbeyondthe32ndbit. The other is to round the result to the nearest significand. For example,ifasignificand is andtheoverflowbitis1, aroundedvaluewouldbe Inotherwords,ifthefirst bitwhichistruncatedis1,add1totheleastsignificantbit,else ignorethebits. Thisiscalledroundingupwards.Iftheextrabits areignored,itiscalledroundingdownwards. TheIEEEStandard suggests to hardware designers to get the best possible result whileperformingarithmeticthatarereproducibleacrossdifferent manufacturer s computers using the standard. The Standard suggeststhatintheequation c=a <op>b, where a andb are operands and<op> an arithmetic operation, the resultcshouldbeasifitwascomputedexactlyandthenrounded. Thisiscalledcorrectrounding. InTables1and2,wesummarisethediscussionsinthissection. Table 1. IEEE floatingpointstandard.weusef to represent the significand, eto representtheexponent, b the bias of the exponent, and±forthesign. Value Sign Exponent(8bits) Significand(23bits) (all23bits0) (all23bits0) +1.f 2 (e b) to aa.aa (a=0or1) eexponent,bbias f 2 (e b) to aa.aa (a=0or1) (all23bits0) (all23bits0) SNaN 0or to leadingbit (atleastone1in therest) QNaN 0or leadingbit1 Positive subnormal to (atleastone1) 0.f 2 x+1-b (x is the number of leading 0s in significand) RESONANCE January

12 GENERAL ARTICLE Table2. Operations on special numbers. All NaNs are Quiet NaNs. Operation Result Operation Result n/± 0 ±0/±0 NaN ± /± ± NaN ±n/0 ± ± /± NaN + ± 0 NaN IEEE 754 Floating Point 64-Bit Standard 1985 With64bits available to store floating point numbers, there is more freedom to increase the significant digits in the significand and increase the range by allocating more bits to the exponent. The standard allocates 1 bit for sign,11bitsforthe exponent, and 52 bitsforthe significand. TheIEEE754floatingpoint standardfor64-bit(calleddouble precision) numbers is very similar to the 32-bit standard. The main differenceis theallocation bits for theexponent and the significand. With64bitsavailabletostorefloatingpointnumbers, there is more freedom to increase the significant digits in the significandandincreasetherangebyallocatingmorebitstothe exponent. Thestandardallocates1bitforsign,11bitsforthe exponent,and52bitsforthesignificand. Theexponentusesa biasedrepresentationwithabiasof1023.therepresentationis shown below: 1 bit 11 bits 52 bits Anumber inthisstandardisthus ( 1) s (1.f) 2 (exponent 1023), wheres=±1,and f isthevalueofthesignificand. The largest positive number which can be represented in this standard is bit 11bits 52bits whichis=( ) 2 ( ) =( ) Thesmallest positivenumberthatcanberepresentedusing64 22 RESONANCE January 2016

13 GENERAL ARTICLE bits using this standard is bit 11bits 52bits whosevalueis = Thedefinitionsof±0,±,QNaN,SNaNremainthesameexcept thatthenumberofbitsintheexponentandsignificandarenow11 and52respectively.subnormalnumbershaveasimilardefinition asin32-bitstandard. IEEE 2008 Standard has introduced standards for 16-bit and 128-bit floating point numbers. It has also introduced standards for floating point decimal numbers. IEEE754FloatingPointStandard 2008 This standard was introduced to enhance the scope of IEEE 754 floatingpoint standardof1985. Theenhancements wereintroduced to meet the demands of three sets of professionals: 1. Those working in the graphics area. 2. Those who use high performance computers for numeric intensive computations. 3. Professionals carrying out financial transactions who require exact decimal computation. This standard has not changed the format of 32- and 64-bit numbers. It has introduced floating point numbers which are 16 and 128 bits long. These are respectively called half and quadruple precision. Besides these, it has also introduced standards for floating point decimal numbers. The16-BitStandard The 16-bit format for real numbers was introduced for pixel storage in graphics and not for computation. (Some graphics processingunitsusethe16-bitformatforcomputation.) The16- bitformatisshowninfigure3. The 16-bit format for real numbers was introduced for pixel storage in graphics and not forcomputation. Figure3. Representation of 16-bitfloatingpointnumbers. RESONANCE January

14 GENERAL ARTICLE With16-bitrepresentation,5bitsareusedfortheexponentand10 bitsforthesignificand. Biasedexponentisusedwithabiasof15. Thustheexponentrangeis: 14to+15.Rememberthatall0sand all1sfortheexponentarereservedtorepresent0and respectively. Themaximumpositiveintegerthatcanbestoredis =1+( ) 2 15 =( ) Theminimumpositivenumberis =2 14 = The minimum subnormal 16-bit floating point number is With improvements in computer technology, using128bitsto represent floating point numbers has become feasible. This is sometimesusedin numeric-intensive computation, where large rounding errors may occur. Thedefinitionsof±0,±,NaN,andSNaNfollowthesameideas asin32-bitformat. The 128-Bit Standard With improvements in computer technology, using 128 bits to represent floating point numbers has become feasible. This is sometimes used in numeric-intensive computation, where large roundingerrorsmayoccur.inthisstandard,besidesasignbit,14 bits are used for the exponent, and 113 bits are used for the significand. The representation is shown in Figure 4. Anumberinthisstandardis ( 1) s (1.f) 2 (exponent 16384). Figure 4.Representation of 128-bitfloatingpointnumber. 24 RESONANCE January 2016

15 GENERAL ARTICLE The largest positive number which can be represented in this format is: bit 14bits 113bits whichequals ( ) = Thesmallest normalizedpositive number whichcanberepresentedis The main motivation for introducing a decimal standard is thefactthata terminating decimal fraction need not give aterminatingbinary fraction = A subnormal number is represented by ( 1) s 0.f Thesmallestpositivesubnormalnumberis = Thedefinitionsof±0,±,andQNaN,andSNaNarethesameas inthe1985standardexceptfortheincreaseofbitsintheexponent and significand. There are other minor changes which we will not discuss. The Decimal Standard The main motivation for introducing a decimal standard is the factthataterminatingdecimalfractionneednotgiveaterminatingbinaryfraction. Forexample,(0.1) 10 = ( (0011) recurring) 2.Ifonecomputes usingbinaryarithmetic and rounding the non-terminating binary fraction up to the nextlargernumber,theansweris: insteadof This is unacceptable in many situations, particularly in financial transactions. There was a demand from professionals carrying out financial transactions to introduce a standard for representing decimal floating point numbers. Decimal floating point numbers are not new. They were available in COBOL. There was, however, no standard and this resulted in non-portable programs. IEEE hasthus introduceda standardfor Therewasademand from professionals carrying out financial transactionsto introduce a standard for representing decimal floating point numbers. RESONANCE January

16 GENERAL ARTICLE decimal floating point numbers to facilitate portability of programs. Decimal Representation The idea of encoding decimal numbers using bits is simple. For example, to represent 0.1 encoded(not converted) to binary, it is writtenas Theexponentinsteadofbeinginterpretedas a power of 2 is now interpreted as a power of 10. Both the significand and the exponent are now integers and their binary representations have a finite number of bits. One method of representing usingthisideaina32-bitwordisshownin Figure 5. Wehaveassumedan8-bitbinaryexponentinbiasedform.Inthis representation, there is no hidden 1 in the significand. The significand is an integer.(fractions are represented by using an appropriate power of 10 in the exponent field.) For example, willbewrittenas and753658,an integer, will be converted to binary and stored as the significand. Thepowerof10,namely 3,willbeanintegerintheexponent part. The IEEE Standard does not use this simple representation. This isduetothefactthatthelargestdecimalsignificandthatmaybe representedusingthismethodis It is preferable to obtain 7 significant digits, i.e., significand up to TheIEEEStandardachievesthisbyborrowingsome bits from the exponent field and reducing the range of the exponent. In other words, instead of maximum exponent of 128, it isreducedto96andthebitssavedareusedtoincreasethesignificant digitsinthesignificand.thecodingofbitsisasshowninfigure6. Figure5. Representation of decimal floating point numbers. Figure 6. Representation of 32-bitfloatingpointnumbers inieee2008standard. 26 RESONANCE January 2016

17 GENERAL ARTICLE Observe the difference in terminology for exponent and significand fields. In this coding, one part of the five combination bits is interpreted as exponent and the other part as an extension of significand bits using an ingenious method. This will be explained later in this section. IEEE2008Standarddefinestwoformatsforrepresentingdecimal floating point numbers. One of them converts the binary significandfieldtoadecimalintegerbetween0and10 p 1 where pisthenumberofsignificantbitsinthesignificand.theother, called densely packed decimal form, encodes the bits in the significand field directly to decimal. Both methods represent decimal floating point numbers as shown below. Decimalfloatingpointnumber=( 1) s f 10 exp bias,wheres is the sign bit, f is a decimal fraction(significand), exp and biasare decimal numbers. The fraction need not be normalized but is usually normalized with a non-zero most significant digit. There isnohiddenbitasinthebinarystandard. Thestandarddefines decimal floating point formats for 32-bit, 64-bit and 128-bit numbers. IEEE 2008 Standard defines two formats for representing decimal floating point numbers. One of themconverts the binary significand fieldtoadecimal integer between 0 and10 p 1 wherepis the number of significantbitsinthe significand. The other, called densely packed decimal form, encodes thebitsin thesignificandfield directly to decimal. Densely Packed Decimal Significand Representation Normally,whendecimalnumbersarecodedtobinary,4bitsare required to represent each digit, as 3 bits give only 8 combinations that arenotsufficienttorepresent 10 digits. Four bits give16 combinations out of which we need only 10 combinations to representdecimalnumbersfrom0to9.thus,wewaste6combinationsoutof16. Fora32-bitnumber,if1bitisusedforsign,8 bitsfortheexponentand23bitsforthesignificand,themaximum positivevalueof thesignificandwillbe Theexponent rangewillbe00to99. Withabias of50,themaximum positivedecimalnumberwillbe The number of significand digits in the above coding is not sufficient. A clever coding scheme for decimal encoding of binary numbers was discovered by Cowlinshaw. This method encodes10bitsas3decimaldigits(rememberthat2 10 =1024and RESONANCE January

18 GENERAL ARTICLE onecanencode000to999usingtheavailable1024combinations of10bits). IEEE754Standardusesthiscodingscheme. Wewill now describe the standard using this coding method for 32-bit numbers. The standard defines the following: 1. Asignbit0for+and1for. 2. Acombinationfieldof5bits.Itiscalledacombinationfieldas onepartofitisusedfortheexponentandanotherpartforthe significand. 3. Anexponentcontinuationfieldthatisusedfortheexponent. 4. Acoefficientfieldthatencodesstringsof10bitsas3digits. Thus, 20 bits are encoded as 6 digits. This is part of the significand field. For32-bitIEEEword,thedistributionofbitswasgiveninFigure 6 which is reproduced below for ready reference. Sign Combinationfield Exponent continuation field Coefficient field 1bit 5bits 6bits 20bits The 5 bits of the combination field are used to increase the coefficient field by 1 digit. It uses the first ten 4-bit combinations of16possible4-bitcombinationstorepresentthis. Asthereare 32combinationsof5bits,outofwhichonly10areneededtoextend the coefficient digits, the remaining 22-bit combinations are availabletoincreasetheexponentrangeandalsoencode andnan.this isdonebyaningeniousmethodexplainedintable3. Combination bits Type Most significant bits of exponent Most significant digit of coefficient Table 3. Useofcombination bits in IEEE Decimal Standard. b 1 b 2 b 3 b 4 b 5 xyabc 11abc Finite Finite NaN xy ab 0abc 100c 28 RESONANCE January 2016

19 GENERAL ARTICLE Sign bit 1 1 Combination field(bits) 5 5 Exponentcontinuationfield(bits) 8 12 Coefficient field(bits) Numberofbitsinnumber Significant digits Exponentrange(decimal) Exponent bias(decimal) Table 4. Decimalrepresentation for 64-bit and 128-bit numbers. Thus, the most significant 2 bits of the exponent are constrained to take on values 00, 01, 10. Using the 6 bits of exponent combinationbits,wegetanexponentrangeof3 2 6 =192.The exponentbiasis96.(rememberthattheexponentisaninteger thatcanbeexactlyrepresentedinbinary.) Withtheadditionof1 significant digit to the 6 digits of the coefficient bits, the decimal representation of 32-bit numbers has 7 significant digits. The largestnumberthatcanbestoredis Asimilar methodisusedtorepresent64-and128-bitfloatingpointnumbers. ThisisshowninTable4. In the IEEE 2008 Decimal Standard, the combination bits along with exponent bits are used to represent NaN, signaling NaN and ± asshownintable5. Quantity Sign bit Combination bits b 0 b 1 b 2 b 3 b 4 b 5 b 6 SNaN 0or QNaN 0or x x x x (xis0or1) Table 5. Representation of NaNs,±,and±0. RESONANCE January

20 GENERAL ARTICLE Binary Integer Representation of Significand In this method, as we stated earlier, if there are p bits in the extendedcoefficient fieldconstituting thesignificand, they are converted as a decimal integer. This representation also uses the bitsofthecombinationfieldtoincreasethenumberofsignificant digits.theideaissimilarandwewillnotrepeatit.therepresentation of NaNs and ± is the same as in the densely packed format. The major advantage of this representation is the simplicity of performing arithmetic using hardware arithmetic units designed to perform arithmetic using binary numbers. The disadvantage is the time taken for conversion from integer to decimal. The densely packed decimal format allows simple encoding of binary to decimal but needs a specially designed decimal arithmeticunit. Acknowledgements IthankProf.PCPBhatt,Prof.CRMuthukrishnan,andProf.S KNandyforreadingthefirstdraftofthisarticleandsuggesting improvements. Suggested Reading Address for Correspondence V Rajaraman Supercomputer Education and Research Centre Indian Institute of Science Bengaluru , India. rajaram@serc.iisc.in [1] David Goldberg,Whatevery computer scientist should know about floatingpoint arithmetic, ACM Computing Surveys,Vol.28,No.1, pp5 48,March1991. [2] V Rajaraman, ComputerOrientedNumericalMethods, 3rd Edition, PHI Learning,NewDelhi,pp.15 32,2013. [3] Marc Cowlinshaw, Denselypacked decimal encoding, IEEE Proceedings Computer and Digital Techniques, Vol.149,No.3,pp , May [4] Steve Hollasch, IEEE Standard 754 for floatingpoint numbers, steve.hollasch.net/cgindex/coding/ieeefloat.html [5] Decimal Floating Point, Wikipedia, en.wikipedia.org/wiki/ Decimal_floating_point 30 RESONANCE January 2016

CSCI 402: Computer Architectures. Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI 3.5 Today s Contents Floating point numbers: 2.5, 10.1, 100.2, etc.. How

More information

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive.

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive. IEEE 754 Standard - Overview Frozen Content Modified by on 13-Sep-2017 Before discussing the actual WB_FPU - Wishbone Floating Point Unit peripheral in detail, it is worth spending some time to look at

More information

Finite arithmetic and error analysis

Finite arithmetic and error analysis Finite arithmetic and error analysis Escuela de Ingeniería Informática de Oviedo (Dpto de Matemáticas-UniOvi) Numerical Computation Finite arithmetic and error analysis 1 / 45 Outline 1 Number representation:

More information

IEEE Standard 754 Floating Point Numbers

IEEE Standard 754 Floating Point Numbers IEEE Standard 754 Floating Point Numbers Steve Hollasch / Last update 2005-Feb-24 IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based

More information

Numeric Encodings Prof. James L. Frankel Harvard University

Numeric Encodings Prof. James L. Frankel Harvard University Numeric Encodings Prof. James L. Frankel Harvard University Version of 10:19 PM 12-Sep-2017 Copyright 2017, 2016 James L. Frankel. All rights reserved. Representation of Positive & Negative Integral and

More information

Floating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right.

Floating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right. Floating-point Arithmetic Reading: pp. 312-328 Floating-Point Representation Non-scientific floating point numbers: A non-integer can be represented as: 2 4 2 3 2 2 2 1 2 0.2-1 2-2 2-3 2-4 where you sum

More information

Floating Point Arithmetic

Floating Point Arithmetic Floating Point Arithmetic CS 365 Floating-Point What can be represented in N bits? Unsigned 0 to 2 N 2s Complement -2 N-1 to 2 N-1-1 But, what about? very large numbers? 9,349,398,989,787,762,244,859,087,678

More information

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science) Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science) Floating Point Background: Fractional binary numbers IEEE floating point standard: Definition Example and properties

More information

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction 1 Floating Point The World is Not Just Integers Programming languages support numbers with fraction Called floating-point numbers Examples: 3.14159265 (π) 2.71828 (e) 0.000000001 or 1.0 10 9 (seconds in

More information

COMP2611: Computer Organization. Data Representation

COMP2611: Computer Organization. Data Representation COMP2611: Computer Organization Comp2611 Fall 2015 2 1. Binary numbers and 2 s Complement Numbers 3 Bits: are the basis for binary number representation in digital computers What you will learn here: How

More information

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation Chapter 2 Float Point Arithmetic Topics IEEE Floating Point Standard Fractional Binary Numbers Rounding Floating Point Operations Mathematical properties Real Numbers in Decimal Notation Representation

More information

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor. https://xkcd.com/217/

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor. https://xkcd.com/217/ CS 261 Fall 2017 Mike Lam, Professor https://xkcd.com/217/ Floating-Point Numbers Floating-point Topics Binary fractions Floating-point representation Conversions and rounding error Binary fractions Now

More information

Floating Point Representation in Computers

Floating Point Representation in Computers Floating Point Representation in Computers Floating Point Numbers - What are they? Floating Point Representation Floating Point Operations Where Things can go wrong What are Floating Point Numbers? Any

More information

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor. CS 261 Fall 2018 Mike Lam, Professor https://xkcd.com/217/ Floating-Point Numbers Floating-point Topics Binary fractions Floating-point representation Conversions and rounding error Binary fractions Now

More information

Floating Point Numbers

Floating Point Numbers Floating Point Numbers Summer 8 Fractional numbers Fractional numbers fixed point Floating point numbers the IEEE 7 floating point standard Floating point operations Rounding modes CMPE Summer 8 Slides

More information

Floating Point Numbers

Floating Point Numbers Floating Point Numbers Computer Systems Organization (Spring 2016) CSCI-UA 201, Section 2 Instructor: Joanna Klukowska Slides adapted from Randal E. Bryant and David R. O Hallaron (CMU) Mohamed Zahran

More information

Floating Point Numbers

Floating Point Numbers Floating Point Numbers Computer Systems Organization (Spring 2016) CSCI-UA 201, Section 2 Fractions in Binary Instructor: Joanna Klukowska Slides adapted from Randal E. Bryant and David R. O Hallaron (CMU)

More information

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. ! Floating point Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties Next time! The machine model Chris Riesbeck, Fall 2011 Checkpoint IEEE Floating point Floating

More information

Data Representation Floating Point

Data Representation Floating Point Data Representation Floating Point CSCI 2400 / ECE 3217: Computer Architecture Instructor: David Ferry Slides adapted from Bryant & O Hallaron s slides via Jason Fritts Today: Floating Point Background:

More information

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754 Floating Point Puzzles Topics Lecture 3B Floating Point IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties For each of the following C expressions, either: Argue that

More information

Floating-Point Numbers in Digital Computers

Floating-Point Numbers in Digital Computers POLYTECHNIC UNIVERSITY Department of Computer and Information Science Floating-Point Numbers in Digital Computers K. Ming Leung Abstract: We explain how floating-point numbers are represented and stored

More information

Floating Point January 24, 2008

Floating Point January 24, 2008 15-213 The course that gives CMU its Zip! Floating Point January 24, 2008 Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties class04.ppt 15-213, S 08 Floating

More information

Data Representation Floating Point

Data Representation Floating Point Data Representation Floating Point CSCI 2400 / ECE 3217: Computer Architecture Instructor: David Ferry Slides adapted from Bryant & O Hallaron s slides via Jason Fritts Today: Floating Point Background:

More information

Chapter 2. Data Representation in Computer Systems

Chapter 2. Data Representation in Computer Systems Chapter 2 Data Representation in Computer Systems Chapter 2 Objectives Understand the fundamentals of numerical data representation and manipulation in digital computers. Master the skill of converting

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 11: Floating Point & Floating Point Addition Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Last time: Single Precision Format

More information

IEEE-754 floating-point

IEEE-754 floating-point IEEE-754 floating-point Real and floating-point numbers Real numbers R form a continuum - Rational numbers are a subset of the reals - Some numbers are irrational, e.g. π Floating-point numbers are an

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-09-07 1. We are still at 50. If you are still waiting and are not interested in knowing if a slot frees up, let me know. 2. There is a correction to HW 1, problem 4; the condition

More information

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers.

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers. class04.ppt 15-213 The course that gives CMU its Zip! Topics Floating Point Jan 22, 2004 IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Floating Point Puzzles For

More information

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers Chapter 03: Computer Arithmetic Lesson 09: Arithmetic using floating point numbers Objective To understand arithmetic operations in case of floating point numbers 2 Multiplication of Floating Point Numbers

More information

Real Numbers finite subset real numbers floating point numbers Scientific Notation fixed point numbers

Real Numbers finite subset real numbers floating point numbers Scientific Notation fixed point numbers Real Numbers We have been studying integer arithmetic up to this point. We have discovered that a standard computer can represent a finite subset of the infinite set of integers. The range is determined

More information

Floating-Point Numbers in Digital Computers

Floating-Point Numbers in Digital Computers POLYTECHNIC UNIVERSITY Department of Computer and Information Science Floating-Point Numbers in Digital Computers K. Ming Leung Abstract: We explain how floating-point numbers are represented and stored

More information

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS Aleksandar Milenković The LaCASA Laboratory, ECE Department, The University of Alabama in Huntsville Email: milenka@uah.edu Web:

More information

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Systems I Floating Point Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties IEEE Floating Point IEEE Standard 754 Established in 1985 as uniform standard for

More information

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS Aleksandar Milenković The LaCASA Laboratory, ECE Department, The University of Alabama in Huntsville Email: milenka@uah.edu Web:

More information

Table : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = (

Table : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = ( Floating Point Numbers in Java by Michael L. Overton Virtually all modern computers follow the IEEE 2 floating point standard in their representation of floating point numbers. The Java programming language

More information

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754 Floating Point Puzzles Topics Lecture 3B Floating Point IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties For each of the following C expressions, either: Argue that

More information

Floating Point Numbers. Lecture 9 CAP

Floating Point Numbers. Lecture 9 CAP Floating Point Numbers Lecture 9 CAP 3103 06-16-2014 Review of Numbers Computers are made to deal with numbers What can we represent in N bits? 2 N things, and no more! They could be Unsigned integers:

More information

Foundations of Computer Systems

Foundations of Computer Systems 18-600 Foundations of Computer Systems Lecture 4: Floating Point Required Reading Assignment: Chapter 2 of CS:APP (3 rd edition) by Randy Bryant & Dave O Hallaron Assignments for This Week: Lab 1 18-600

More information

Chapter Three. Arithmetic

Chapter Three. Arithmetic Chapter Three 1 Arithmetic Where we've been: Performance (seconds, cycles, instructions) Abstractions: Instruction Set Architecture Assembly Language and Machine Language What's up ahead: Implementing

More information

Giving credit where credit is due

Giving credit where credit is due CSCE 230J Computer Organization Floating Point Dr. Steve Goddard goddard@cse.unl.edu http://cse.unl.edu/~goddard/courses/csce230j Giving credit where credit is due Most of slides for this lecture are based

More information

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time. Floating point Today IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time The machine model Fabián E. Bustamante, Spring 2010 IEEE Floating point Floating point

More information

Chapter 2 Data Representations

Chapter 2 Data Representations Computer Engineering Chapter 2 Data Representations Hiroaki Kobayashi 4/21/2008 4/21/2008 1 Agenda in Chapter 2 Translation between binary numbers and decimal numbers Data Representations for Integers

More information

Module 2: Computer Arithmetic

Module 2: Computer Arithmetic Module 2: Computer Arithmetic 1 B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N, 3 E D, D A V I D L. P A T T E R S O N A N D J O H N L. H A N N E S S Y, M O R G A N K A U F M A N N

More information

What we need to know about error: Class Outline. Computational Methods CMSC/AMSC/MAPL 460. Errors in data and computation

What we need to know about error: Class Outline. Computational Methods CMSC/AMSC/MAPL 460. Errors in data and computation Class Outline Computational Methods CMSC/AMSC/MAPL 460 Errors in data and computation Representing numbers in floating point Ramani Duraiswami, Dept. of Computer Science Computations should be as accurate

More information

CO212 Lecture 10: Arithmetic & Logical Unit

CO212 Lecture 10: Arithmetic & Logical Unit CO212 Lecture 10: Arithmetic & Logical Unit Shobhanjana Kalita, Dept. of CSE, Tezpur University Slides courtesy: Computer Architecture and Organization, 9 th Ed, W. Stallings Integer Representation For

More information

Giving credit where credit is due

Giving credit where credit is due JDEP 284H Foundations of Computer Systems Floating Point Dr. Steve Goddard goddard@cse.unl.edu Giving credit where credit is due Most of slides for this lecture are based on slides created by Drs. Bryant

More information

unused unused unused unused unused unused

unused unused unused unused unused unused BCD numbers. In some applications, such as in the financial industry, the errors that can creep in due to converting numbers back and forth between decimal and binary is unacceptable. For these applications

More information

INTEGER REPRESENTATIONS

INTEGER REPRESENTATIONS INTEGER REPRESENTATIONS Unsigned Representation (B2U). Two s Complement (B2T). Signed Magnitude (B2S). Ones Complement (B2O). scheme size in bits (w) minimum value maximum value B2U 5 0 31 B2U 8 0 255

More information

System Programming CISC 360. Floating Point September 16, 2008

System Programming CISC 360. Floating Point September 16, 2008 System Programming CISC 360 Floating Point September 16, 2008 Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Powerpoint Lecture Notes for Computer Systems:

More information

Computational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point and associated issues. Ramani Duraiswami, Dept. of Computer Science

Computational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point and associated issues. Ramani Duraiswami, Dept. of Computer Science Computational Methods CMSC/AMSC/MAPL 460 Representing numbers in floating point and associated issues Ramani Duraiswami, Dept. of Computer Science Class Outline Computations should be as accurate and as

More information

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng. CS 265 Computer Architecture Wei Lu, Ph.D., P.Eng. 1 Part 1: Data Representation Our goal: revisit and re-establish fundamental of mathematics for the computer architecture course Overview: what are bits

More information

Computational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point and associated issues. Ramani Duraiswami, Dept. of Computer Science

Computational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point and associated issues. Ramani Duraiswami, Dept. of Computer Science Computational Methods CMSC/AMSC/MAPL 460 Representing numbers in floating point and associated issues Ramani Duraiswami, Dept. of Computer Science Class Outline Computations should be as accurate and as

More information

IEEE Standard for Floating-Point Arithmetic: 754

IEEE Standard for Floating-Point Arithmetic: 754 IEEE Standard for Floating-Point Arithmetic: 754 G.E. Antoniou G.E. Antoniou () IEEE Standard for Floating-Point Arithmetic: 754 1 / 34 Floating Point Standard: IEEE 754 1985/2008 Established in 1985 (2008)

More information

IEEE Floating Point Numbers Overview

IEEE Floating Point Numbers Overview COMP 40: Machine Structure and Assembly Language Programming (Fall 2015) IEEE Floating Point Numbers Overview Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

More information

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Floating Point. CSE 351 Autumn Instructor: Justin Hsia Floating Point CSE 351 Autumn 2017 Instructor: Justin Hsia Teaching Assistants: Lucas Wotton Michael Zhang Parker DeWilde Ryan Wong Sam Gehman Sam Wolfson Savanna Yee Vinny Palaniappan Administrivia Lab

More information

Floating-Point Arithmetic

Floating-Point Arithmetic Floating-Point Arithmetic Raymond J. Spiteri Lecture Notes for CMPT 898: Numerical Software University of Saskatchewan January 9, 2013 Objectives Floating-point numbers Floating-point arithmetic Analysis

More information

Data Representations & Arithmetic Operations

Data Representations & Arithmetic Operations Data Representations & Arithmetic Operations Hiroaki Kobayashi 7/13/2011 7/13/2011 Computer Science 1 Agenda Translation between binary numbers and decimal numbers Data Representations for Integers Negative

More information

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0. C NUMERIC FORMATS Figure C-. Table C-. Listing C-. Overview The DSP supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the DSP supports an

More information

Computer Architecture and IC Design Lab. Chapter 3 Part 2 Arithmetic for Computers Floating Point

Computer Architecture and IC Design Lab. Chapter 3 Part 2 Arithmetic for Computers Floating Point Chapter 3 Part 2 Arithmetic for Computers Floating Point Floating Point Representation for non integral numbers Including very small and very large numbers 4,600,000,000 or 4.6 x 10 9 0.0000000000000000000000000166

More information

MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic

MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic September 28, 2018 Lecture 1 September 28, 2018 1 / 25 Floating point arithmetic Computers use finite strings of binary digits to represent

More information

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University Computer Architecture Chapter 3 Fall 2005 Department of Computer Science Kent State University Objectives Signed and Unsigned Numbers Addition and Subtraction Multiplication and Division Floating Point

More information

Representing and Manipulating Floating Points

Representing and Manipulating Floating Points Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with

More information

CS429: Computer Organization and Architecture

CS429: Computer Organization and Architecture CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: September 18, 2017 at 12:48 CS429 Slideset 4: 1 Topics of this Slideset

More information

Representing and Manipulating Floating Points. Jo, Heeseung

Representing and Manipulating Floating Points. Jo, Heeseung Representing and Manipulating Floating Points Jo, Heeseung The Problem How to represent fractional values with finite number of bits? 0.1 0.612 3.14159265358979323846264338327950288... 2 Fractional Binary

More information

Representing and Manipulating Floating Points

Representing and Manipulating Floating Points Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with

More information

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3 Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Instructor: Nicole Hynes nicole.hynes@rutgers.edu 1 Fixed Point Numbers Fixed point number: integer part

More information

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Floating Point. CSE 351 Autumn Instructor: Justin Hsia Floating Point CSE 351 Autumn 2017 Instructor: Justin Hsia Teaching Assistants: Lucas Wotton Michael Zhang Parker DeWilde Ryan Wong Sam Gehman Sam Wolfson Savanna Yee Vinny Palaniappan http://xkcd.com/571/

More information

EE 109 Unit 19. IEEE 754 Floating Point Representation Floating Point Arithmetic

EE 109 Unit 19. IEEE 754 Floating Point Representation Floating Point Arithmetic 1 EE 109 Unit 19 IEEE 754 Floating Point Representation Floating Point Arithmetic 2 Floating Point Used to represent very small numbers (fractions) and very large numbers Avogadro s Number: +6.0247 * 10

More information

Floating-point representations

Floating-point representations Lecture 10 Floating-point representations Methods of representing real numbers (1) 1. Fixed-point number system limited range and/or limited precision results must be scaled 100101010 1111010 100101010.1111010

More information

Floating-point representations

Floating-point representations Lecture 10 Floating-point representations Methods of representing real numbers (1) 1. Fixed-point number system limited range and/or limited precision results must be scaled 100101010 1111010 100101010.1111010

More information

fractional quantities are typically represented in computers using floating point format this approach is very much similar to scientific notation

fractional quantities are typically represented in computers using floating point format this approach is very much similar to scientific notation Floating Point Arithmetic fractional quantities are typically represented in computers using floating point format this approach is very much similar to scientific notation for example, fixed point number

More information

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Floating Point. CSE 351 Autumn Instructor: Justin Hsia Floating Point CSE 351 Autumn 2016 Instructor: Justin Hsia Teaching Assistants: Chris Ma Hunter Zahn John Kaltenbach Kevin Bi Sachin Mehta Suraj Bhat Thomas Neuman Waylon Huang Xi Liu Yufang Sun http://xkcd.com/899/

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 4-B Floating-Point Arithmetic - II Spring 2017 Koren Part.4b.1 The IEEE Floating-Point Standard Four formats for floating-point

More information

Floating-point representation

Floating-point representation Lecture 3-4: Floating-point representation and arithmetic Floating-point representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However,

More information

CS101 Lecture 04: Binary Arithmetic

CS101 Lecture 04: Binary Arithmetic CS101 Lecture 04: Binary Arithmetic Binary Number Addition Two s complement encoding Briefly: real number representation Aaron Stevens (azs@bu.edu) 25 January 2013 What You ll Learn Today Counting in binary

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 4-B Floating-Point Arithmetic - II Israel Koren ECE666/Koren Part.4b.1 The IEEE Floating-Point

More information

CS 33. Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

CS 33. Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. CS 33 Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Byte-Oriented Memory Organization 00 0 FF F Programs refer to data by address

More information

Scientific Computing. Error Analysis

Scientific Computing. Error Analysis ECE257 Numerical Methods and Scientific Computing Error Analysis Today s s class: Introduction to error analysis Approximations Round-Off Errors Introduction Error is the difference between the exact solution

More information

FLOATING POINT NUMBERS

FLOATING POINT NUMBERS Exponential Notation FLOATING POINT NUMBERS Englander Ch. 5 The following are equivalent representations of 1,234 123,400.0 x 10-2 12,340.0 x 10-1 1,234.0 x 10 0 123.4 x 10 1 12.34 x 10 2 1.234 x 10 3

More information

Floating Point Considerations

Floating Point Considerations Chapter 6 Floating Point Considerations In the early days of computing, floating point arithmetic capability was found only in mainframes and supercomputers. Although many microprocessors designed in the

More information

Most nonzero floating-point numbers are normalized. This means they can be expressed as. x = ±(1 + f) 2 e. 0 f < 1

Most nonzero floating-point numbers are normalized. This means they can be expressed as. x = ±(1 + f) 2 e. 0 f < 1 Floating-Point Arithmetic Numerical Analysis uses floating-point arithmetic, but it is just one tool in numerical computation. There is an impression that floating point arithmetic is unpredictable and

More information

Representing and Manipulating Floating Points

Representing and Manipulating Floating Points Representing and Manipulating Floating Points Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE23: Introduction to Computer Systems, Spring 218,

More information

Computer Arithmetic Floating Point

Computer Arithmetic Floating Point Computer Arithmetic Floating Point Chapter 3.6 EEC7 FQ 25 About Floating Point Arithmetic Arithmetic basic operations on floating point numbers are: Add, Subtract, Multiply, Divide Transcendental operations

More information

CS 6210 Fall 2016 Bei Wang. Lecture 4 Floating Point Systems Continued

CS 6210 Fall 2016 Bei Wang. Lecture 4 Floating Point Systems Continued CS 6210 Fall 2016 Bei Wang Lecture 4 Floating Point Systems Continued Take home message 1. Floating point rounding 2. Rounding unit 3. 64 bit word: double precision (IEEE standard word) 4. Exact rounding

More information

Real Numbers finite subset real numbers floating point numbers Scientific Notation fixed point numbers

Real Numbers finite subset real numbers floating point numbers Scientific Notation fixed point numbers Real Numbers We have been studying integer arithmetic up to this point. We have discovered that a standard computer can represent a finite subset of the infinite set of integers. The range is determined

More information

Floating Point Arithmetic

Floating Point Arithmetic Floating Point Arithmetic Computer Systems, Section 2.4 Abstraction Anything that is not an integer can be thought of as . e.g. 391.1356 Or can be thought of as + /

More information

2 Computation with Floating-Point Numbers

2 Computation with Floating-Point Numbers 2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers

More information

In this lesson you will learn: how to add and multiply positive binary integers how to work with signed binary numbers using two s complement how fixed and floating point numbers are used to represent

More information

Computer Organization: A Programmer's Perspective

Computer Organization: A Programmer's Perspective A Programmer's Perspective Representing Numbers Gal A. Kaminka galk@cs.biu.ac.il Fractional Binary Numbers 2 i 2 i 1 4 2 1 b i b i 1 b 2 b 1 b 0. b 1 b 2 b 3 b j 1/2 1/4 1/8 Representation Bits to right

More information

The Perils of Floating Point

The Perils of Floating Point The Perils of Floating Point by Bruce M. Bush Copyright (c) 1996 Lahey Computer Systems, Inc. Permission to copy is granted with acknowledgement of the source. Many great engineering and scientific advances

More information

Floating Point Numbers

Floating Point Numbers Floating Point Floating Point Numbers Mathematical background: tional binary numbers Representation on computers: IEEE floating point standard Rounding, addition, multiplication Kai Shen 1 2 Fractional

More information

±M R ±E, S M CHARACTERISTIC MANTISSA 1 k j

±M R ±E, S M CHARACTERISTIC MANTISSA 1 k j ENEE 350 c C. B. Silio, Jan., 2010 FLOATING POINT REPRESENTATIONS It is assumed that the student is familiar with the discussion in Appendix B of the text by A. Tanenbaum, Structured Computer Organization,

More information

Representing and Manipulating Floating Points. Computer Systems Laboratory Sungkyunkwan University

Representing and Manipulating Floating Points. Computer Systems Laboratory Sungkyunkwan University Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with

More information

Signed Multiplication Multiply the positives Negate result if signs of operand are different

Signed Multiplication Multiply the positives Negate result if signs of operand are different Another Improvement Save on space: Put multiplier in product saves on speed: only single shift needed Figure: Improved hardware for multiplication Signed Multiplication Multiply the positives Negate result

More information

15213 Recitation 2: Floating Point

15213 Recitation 2: Floating Point 15213 Recitation 2: Floating Point 1 Introduction This handout will introduce and test your knowledge of the floating point representation of real numbers, as defined by the IEEE standard. This information

More information

2 Computation with Floating-Point Numbers

2 Computation with Floating-Point Numbers 2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers

More information

Floating Point Arithmetic. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Floating Point Arithmetic. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Floating Point Arithmetic Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Floating Point (1) Representation for non-integral numbers Including very

More information

Number Systems. Both numbers are positive

Number Systems. Both numbers are positive Number Systems Range of Numbers and Overflow When arithmetic operation such as Addition, Subtraction, Multiplication and Division are performed on numbers the results generated may exceed the range of

More information

Number Systems. Decimal numbers. Binary numbers. Chapter 1 <1> 8's column. 1000's column. 2's column. 4's column

Number Systems. Decimal numbers. Binary numbers. Chapter 1 <1> 8's column. 1000's column. 2's column. 4's column 1's column 10's column 100's column 1000's column 1's column 2's column 4's column 8's column Number Systems Decimal numbers 5374 10 = Binary numbers 1101 2 = Chapter 1 1's column 10's column 100's

More information

Module 12 Floating-Point Considerations

Module 12 Floating-Point Considerations GPU Teaching Kit Accelerated Computing Module 12 Floating-Point Considerations Lecture 12.1 - Floating-Point Precision and Accuracy Objective To understand the fundamentals of floating-point representation

More information