IEEE Standard for Floating Point Numbers
|
|
- Ariel Fleming
- 6 years ago
- Views:
Transcription
1 GENERAL ARTICLE IEEE Standard for Floating Point Numbers VRajaraman Floating point numbers are an important data type in computation which is used extensively. Yet, many users do not know the standard which is used in almost all computer hardware to store and process these. In this article, we explain the standards evolved by The Institute of Electrical and Electronic Engineers in 1985 and augmented in 2008 to represent floating point numbers and process them. This standard is now used by all computer manufacturers while designing floating point arithmetic units so that programs are portable among computers. Introduction There are two types of arithmetic which are performed in computers: integer arithmetic and real arithmetic. Integer arithmetic is simple. A decimal number is converted to its binary equivalent andarithmeticisperformedusingbinaryarithmeticoperations. Thelargestpositiveintegerthatmaybestoredinan8-bitbyteis +127,if1 bitisusedforsign. If16 bitsareused,thelargest positiveintegeris+32767andwith32bits,itis , quite large! Integers are used mostly for counting. Most scientific computations are however performed using real numbers, that is, numbers with a fractional part. In order to represent real numbers incomputers,wehavetoasktwoquestions. Thefirstistodecide howmanybitsareneededtoencoderealnumbersandthesecond is to decide how to represent real numbers using these bits. Normally, in numerical computation in science and engineering, onewouldneedatleast7to8significantdigitsprecision.thus, thenumberofbitsneededtoencode8decimaldigitsisapproximately26,as log 2 10 = 3.32bits are needed onthe averageto encodeadigit.incomputers,numbersarestoredasasequenceof 8-bitbytes. Thus32bits(4bytes)whichisbiggerthan26bitsis alogicalsizetouseforrealnumbers. Given32bitstoencodereal VRajaramanisatthe IndianInstituteofScience, Bengaluru. Several generations of scientists and engineers in India havelearntcomputer science using his lucidly writtentextbookson programming and computer fundamentals. Hiscurrentresearch interests are parallel computingandhistoryof computing. Keywords Floating point numbers, rounding, decimal floating point numbers, IEEE Standard. RESONANCE January
2 GENERAL ARTICLE Figure1. Fixedpointrepresentation of realnumbers in binaryusing 32 bits. numbers,thenextquestionishowtobreakitupintoaninteger partandafractionalpart. Onemethodistodividethe32bitsinto 2parts,oneparttorepresentanintegerpartofthenumberandthe otherthefractionalpartasshowninfigure1. Inthisfigure,wehavearbitrarilyfixedthe(virtual)binarypoint betweenbits7and8. Withthisrepresentation,calledfixedpoint representation,thelargestandthesmallestpositivebinarynumbersthatmayberepresentedusing32bitsare Binary Floating Point Numbers Astherangeofreal numbers representable with fixedpointisnot sufficient, normalized floating point is used to represent real numbers. This rangeofrealnumbers, whenfixedpoint representationis used, is not sufficient in many practical problems. Therefore, another representation called normalizedfloatingpointrepresentation is used for real numbers in computers. In this representation,32bitsaredividedintotwoparts:apartcalledthemantissa withitssignandtheothercalledtheexponentwithitssign.the mantissa represents fractions with a non-zero leading bit and the exponentthepowerof2bywhichthemantissaismultiplied. This method increases the range of numbers that may be represented using32bits.inthismethod,abinaryfloatingpointnumberis represented by (sign) mantissa 2 ±exponent wherethesignisonebit,themantissaisabinaryfractionwitha non-zeroleadingbit,andtheexponentisabinaryinteger. If32 12 RESONANCE January 2016
3 GENERAL ARTICLE bits are available to store floating point numbers, we have to decide the following: 1.Howmanybitsaretobeusedtorepresentthemantissa(1bit isreservedforthesignofthenumber). 2.Howmanybitsaretobeusedfortheexponent. 3.Howtorepresentthesignoftheexponent. Thenumberofbitstobeusedforthemantissaisdeterminedby the number of significant decimal digits required in computation, which is at least seven. Based on experience in numerical computation, it is found that at least seven significant decimal digits are needed if results are expected without too much error. The number of bits required to represent seven decimal digits is approximately 23. The remaining 8 bits are allocated for the exponent. The exponent may be represented in sign magnitude form. In this case, 1 bit is used for sign and 7 bits for the magnitude. The exponent will range from 127 to The only disadvantage to this method is that there are two representationsfor0exponent:+0and 0. Toavoidthis,onemayusean excess representation or biased format to represent the exponent. Theexponenthasnosigninthisformat. Therange0to255ofan 8-bitnumberisdividedintotwoparts0to127and128to255as shown below: Thenumberofbits tobeusedforthe mantissa is determined by the number of significant decimal digits required in computation, whichisatleast seven. All bit strings from 0 to 126 are considered negative, exponent 127 represents 0, and values greater than 127 are positive. Thus therangeofexponentsis 127to+128. Givenanexponentstring exp,thevalueoftheexponentwillbetakenas(exp 127).The main advantage of this format is a unique representation for exponent zero. With the representation of binary floating point numbers explained above, the largest floating point number which can be represented is The main advantage of biased exponent format is unique representation for exponent 0. RESONANCE January
4 GENERAL ARTICLE bits =( ) The smallest floating point number is bits Example. Represent in 32-bit binary floating point format = = Normalized23bitmantissa= Asexcessrepresentationisbeingusedforexponent,itisequalto 127+6=133. Thustherepresentationis = = The32-bitstringusedtostore inacomputerwillthusbe Here, the most significant bit is the sign, the next 8 bits the exponent,andthelast23bitsthemantissa. Institute of Electrical and Electronics Engineers standardized how floating point binary numbers would be represented in computers, how these would be rounded, how0and wouldbe represented and how totreatexception condition such as attempttodivideby0. IEEEFloatingPointStandard Floatingpointbinarynumberswerebeginningtobeusedinthe mid50s.therewasnouniformityintheformatsusedtorepresent floating point numbers and programs were not portable from one manufacturer s computer to another. By the mid 1980s, with the advent of personal computers, the number of bits used to store floatingpointnumberswasstandardizedas32bits. AStandards CommitteewasformedbytheInstituteof ElectricalandElectronics Engineers to standardize how floating point binary numbers would be represented in computers. In addition, the standard specified uniformity in rounding numbers, treating exception conditionssuchasattempttodivideby0,andrepresentationof0 and infinity( This standard, calledieee Standard 754 for 14 RESONANCE January 2016
5 GENERAL ARTICLE floating point numbers, was adopted in 1985 by all computer manufacturers. It allowed porting of programs from one computer to another without the answers being different. This standard defined floating point formats for 32-bit and 64-bit numbers. With improvement in computer technology it became feasible to usealargernumberof bitsforfloatingpointnumbers.afterthe standard was used for a number of years, many improvements were suggested. The standard was updated in The current standard is IEEE version. This version retained all the features of the 1985 standard and introduced new standards for 16 and128-bit numbers. It also introduced standards for representing decimal floating point numbers. We will describe these standards laterinthisarticle. Inthissection,wewilldescribeIEEE Standard. The current standard is IEEE version. This version retained all the features of the 1985 standard and introduced new standards for 16-bit and128-bit numbers. It also introduced standards for representing decimal floating point numbers. IEEE floating point representation for binary real numbers consists of three parts. For a 32-bit(called single precision) number, they are: 1.Sign,forwhich1bitisallocated. 2. Mantissa(called significand in the standard) is allocated 23 bits. 3. Exponent is allocated 8 bits. As both positive and negative numbers are required for the exponent, instead of using a separate sign bit for the exponent, the standard uses a biased representation. The value of the bias is 127. Thus an exponent 0meansthat 127isstoredintheexponentfield. Astored value198meansthattheexponentvalueis( )=71. Theexponents 127(all0s)and+128(all1s)arereservedfor representing special numbers which we discuss later. To increase the precision of the significand, the IEEE 754 Standard uses a normalized significand which implies that its most significantbitisalways1. Asthisisimplied,itisassumedtobe on the left of the(virtual) decimal point of the significand. Thus intheieeestandard,thesignificandis24bitslong 23bitsof thesignificandwhichisstoredinthememoryandanimplied1as the most significant 24th bit. The extra bit increases the number RESONANCE January
6 GENERAL ARTICLE Figure 2. IEEE 754 representation of 32-bit floating point number. of significant digits in the significand. Thus, a floating point numberintheieeestandardis ( 1) s (1.f) 2 2 exponent 127 wheres isthesignbit; s=0isusedforpositivenumbersands =1forrepresentingnegativenumbers. frepresentsthebitsinthe significand. Observe that the implied 1 as the most significant bit of the significand is explicitly shown for clarity. For a32-bitwordmachine,theallocationofbitsfor afloating pointnumberaregiveninfigure2.observethattheexponentis placed before the significand(see Box 1). Example. Represent inIEEE bitfloatingpoint format = = Normalized significand = Exponent: (e 127)=5or,e=132. ThebitrepresentationinIEEEformatis Box1. WhydoesIEEE754standardusebiasedexponentrepresentationandplaceitbeforethesignificand? Computers normally have separate arithmetic units to compute with integer operands and floating point operands. TheyarecalledFPU(FloatingPointUnit)andIU(IntegerUnit). AnIUworksfasterandischeaper tobuild.inmostrecentcomputers,thereareseveraliusandasmallernumberoffpus.itispreferabletouse IUswheneverpossible.ByplacingthesignbitandbiasedexponentbitsasshowninFigure2,operationssuch as x<r.o.>y,wherer.o.isarelationaloperator,namely,>,<,,andx,yarereals,canbecarriedoutbyinteger arithmetic units as the exponents are integers. Sorting real numbers may also be carried out by IUs as biased exponentarethemostsignificantbits.thisensureslexicographicalorderingofrealnumbers;thesignificand neednotbeconsideredasafraction. 16 RESONANCE January 2016
7 GENERAL ARTICLE SpecialValuesinIEEE Standard(32Bits) RepresentationofZero: Asthesignificandisassumedtohavea hidden1asthemostsignificantbit,all0sinthesignificandpart ofthenumberwillbetakenas Thuszeroisrepresented intheieeestandardbyall0sfortheexponentandall0sforthe significand. All0sfortheexponentisnotallowedtobeusedfor anyothernumber. Ifthesignbitis0andalltheotherbits0,the numberis+0. Ifthesignbitis1andalltheotherbits0,itis 0. Even though +0 and 0 have distinct representations they are assumed equal. Thus the representations of +0 and 0 are: As the significand is assumedtohavea hidden1asthemost significantbit,all0s in the significand part ofthenumberwillbe takenas Thuszerois representedinthe IEEE Standard by all 0s for the exponent andall0sforthe significand. Representation of Infinity: All 1s in the exponent field is assumedtorepresentinfinity( Asignbit0represents+ and asignbit1represents. Thustherepresentationsof+ and are: Representation of Non Numbers: When an operation is performedbyacomputeronapairofoperands,theresultmaynotbe mathematically defined. For example, if zero is divided by zero, All1sinthe exponent field is assumed to representinfinity( ). RESONANCE January
8 GENERAL ARTICLE When an arithmetic operation is performed on two numbers which results in an indeterminate answer,itiscalled NaN(NotaNumber) in IEEE Standard. theresultisindeterminate. SucharesultiscalledNotaNumber (NaN) in the IEEE Standard. In fact the IEEE Standard defines twotypesofnan.whentheresultofanoperationisnotdefined (i.e., indeterminate) it is called a Quiet NaN(QNaN). Examples are:0/0,( ),.QuietNaNsarenormallycarriedoverinthe computation. The other type of NaN is called a Signalling Nan (SNaN).Thisisusedtogiveanerrormessage.Whenanoperation leadstoafloatingpointunderflow,i.e.,theresultofacomputation issmallerthanthesmallestnumberthatcanbestoredasafloating pointnumber,ortheresultisanoverflow,i.e.,itislargerthanthe largestnumberthatcanbestored, SNaNisused. Whennovalid valueisstoredina variablename(i.e., itisundefined)andan attemptismadetouseitinanarithmeticoperation,snanwould result.qnanisrepresentedby0or1asthesignbit,all1sas exponent,anda0astheleft-mostbitofthesignificandandatleast one1intherestofthesignificand. SNaNisrepresentedby0or 1asthesignbit,all1sasexponent,anda1astheleft-mostbitof thesignificandandanystringofbitsfortheremaining22bits.we give below the representations of QNaN and SNaN. QNaN 0or Themostsignificantbitofthesignificandis0.Thereisatleast one1intherestofthesignificand. SNaN 0 or The most significant bit of the significand is 1. Any combination ofbitsisallowedfortheotherbitsofthesignificand. 18 RESONANCE January 2016
9 GENERAL ARTICLE LargestandSmallestPositiveFloatingPointNumbers: Largest Positive Number Significand: =1+( )= Exponent: ( )=127. LargestNumber = ( ) Iftheresultofacomputationexceedsthelargestnumberthatcan bestoredinthecomputer,thenitiscalledanoverflow. SmallestPositiveNumber Significand = 1.0. Exponent=1 127= 126. Thesmallestnormalizednumberis= SubnormalNumbers:Whenalltheexponentbitsare0andthe leading hidden bit of the siginificand is 0, then the floating point number is called a subnormalnumber.thus, one logical representationofasubnormalnumberis ( 1) s 0.f (all0sfortheexponent), where f hasatleastone1(otherwisethenumberwillbetakenas 0).However,thestandarduses 126,i.e.,bias+1fortheexponent ratherthan 127whichisthebiasforsomenotsoobviousreason, possiblybecausebyusing 126insteadof 127,thegapbetween the largest subnormal number and the smallest normalized number is smaller. Thelargestsubnormalnumberis Itisclose tothesmallestnormalizednumber Whenallthe exponent bits are 0 and the leading hiddenbitofthe siginificand is 0, then thefloatingpoint numberiscalleda subnormal number. RESONANCE January
10 GENERAL ARTICLE By using subnormal numbers, underflow whichmayoccurin some calculations are gradual. Thesmallestpositivesubnormalnumberis thevalueofwhichis = Aresultthatissmallerthanthesmallestnumberthatcanbestored inacomputeriscalledanunderflow. You may wonder why subnormal numbers are allowed in the IEEE Standard. By using subnormal numbers, underflow which may occur in some calculations are gradual. Also, the smallest numberthatcouldberepresentedinamachineisclosertozero. (SeeBox2.) Box2. MachineEpsilonandDwarf In any of the formats for representing floating point numbers, machine epsilon is defined as the difference between1andthenextlargernumberthatcanbestoredinthatformat. Forexample,in32-bitIEEEformatwith a23-bitsignificand,themachineepsilonis2 23 = Thisessentiallytellsusthattheprecisionof decimalnumbersstoredinthisformatis7digits.thetermprecisionandaccuracyarenotthesame.accuracy impliescorrectnesswhereasprecisiondoesnot.for64-bitrepresentationofieeefloatingpointnumbersthe significandlengthis52bits.thus,machineepsilonis2 52 = Therefore,decimalcalculationswith 64 bits give 16 digit precision. This is a conservative definition used by industry. In MATLAB, machine epsilon is as defined above. However, academics define machine epsilon as the upper bound of the relative error when numbersarerounded. Thus,for32-bitfloatingpointnumbers,themachineepsilonis2 23 / The machine epsilon is useful in iterative computation. When two successive iterates differ by less than epsilon onemayassumethattheiterationhasconverged.theieeestandarddoesnotdefinemachineepsilon. Tinyordwarfistheminimumsubnormalnumberthatcanberepresentedusingthespecifiedfloatingpoint format. Thus, for IEEE 32-bit format, it is whosevalueis: = Anynumberlessthanthiswillsignalunderflow. Theadvantage ofusingsubnormalnumbersisthatunderflowisgradualaswasstatedearlier. 20 RESONANCE January 2016
11 GENERAL ARTICLE Whenamathematicaloperationsuchasadd,subtract,multiply, or divide is performed with two floating point numbers, the significand of the result may exceed 23 bits after the adjustment oftheexponent.insuchacase,therearetwoalternatives. Oneis totruncatetheresult,i.e.,ignoreallbitsbeyondthe32ndbit. The other is to round the result to the nearest significand. For example,ifasignificand is andtheoverflowbitis1, aroundedvaluewouldbe Inotherwords,ifthefirst bitwhichistruncatedis1,add1totheleastsignificantbit,else ignorethebits. Thisiscalledroundingupwards.Iftheextrabits areignored,itiscalledroundingdownwards. TheIEEEStandard suggests to hardware designers to get the best possible result whileperformingarithmeticthatarereproducibleacrossdifferent manufacturer s computers using the standard. The Standard suggeststhatintheequation c=a <op>b, where a andb are operands and<op> an arithmetic operation, the resultcshouldbeasifitwascomputedexactlyandthenrounded. Thisiscalledcorrectrounding. InTables1and2,wesummarisethediscussionsinthissection. Table 1. IEEE floatingpointstandard.weusef to represent the significand, eto representtheexponent, b the bias of the exponent, and±forthesign. Value Sign Exponent(8bits) Significand(23bits) (all23bits0) (all23bits0) +1.f 2 (e b) to aa.aa (a=0or1) eexponent,bbias f 2 (e b) to aa.aa (a=0or1) (all23bits0) (all23bits0) SNaN 0or to leadingbit (atleastone1in therest) QNaN 0or leadingbit1 Positive subnormal to (atleastone1) 0.f 2 x+1-b (x is the number of leading 0s in significand) RESONANCE January
12 GENERAL ARTICLE Table2. Operations on special numbers. All NaNs are Quiet NaNs. Operation Result Operation Result n/± 0 ±0/±0 NaN ± /± ± NaN ±n/0 ± ± /± NaN + ± 0 NaN IEEE 754 Floating Point 64-Bit Standard 1985 With64bits available to store floating point numbers, there is more freedom to increase the significant digits in the significand and increase the range by allocating more bits to the exponent. The standard allocates 1 bit for sign,11bitsforthe exponent, and 52 bitsforthe significand. TheIEEE754floatingpoint standardfor64-bit(calleddouble precision) numbers is very similar to the 32-bit standard. The main differenceis theallocation bits for theexponent and the significand. With64bitsavailabletostorefloatingpointnumbers, there is more freedom to increase the significant digits in the significandandincreasetherangebyallocatingmorebitstothe exponent. Thestandardallocates1bitforsign,11bitsforthe exponent,and52bitsforthesignificand. Theexponentusesa biasedrepresentationwithabiasof1023.therepresentationis shown below: 1 bit 11 bits 52 bits Anumber inthisstandardisthus ( 1) s (1.f) 2 (exponent 1023), wheres=±1,and f isthevalueofthesignificand. The largest positive number which can be represented in this standard is bit 11bits 52bits whichis=( ) 2 ( ) =( ) Thesmallest positivenumberthatcanberepresentedusing64 22 RESONANCE January 2016
13 GENERAL ARTICLE bits using this standard is bit 11bits 52bits whosevalueis = Thedefinitionsof±0,±,QNaN,SNaNremainthesameexcept thatthenumberofbitsintheexponentandsignificandarenow11 and52respectively.subnormalnumbershaveasimilardefinition asin32-bitstandard. IEEE 2008 Standard has introduced standards for 16-bit and 128-bit floating point numbers. It has also introduced standards for floating point decimal numbers. IEEE754FloatingPointStandard 2008 This standard was introduced to enhance the scope of IEEE 754 floatingpoint standardof1985. Theenhancements wereintroduced to meet the demands of three sets of professionals: 1. Those working in the graphics area. 2. Those who use high performance computers for numeric intensive computations. 3. Professionals carrying out financial transactions who require exact decimal computation. This standard has not changed the format of 32- and 64-bit numbers. It has introduced floating point numbers which are 16 and 128 bits long. These are respectively called half and quadruple precision. Besides these, it has also introduced standards for floating point decimal numbers. The16-BitStandard The 16-bit format for real numbers was introduced for pixel storage in graphics and not for computation. (Some graphics processingunitsusethe16-bitformatforcomputation.) The16- bitformatisshowninfigure3. The 16-bit format for real numbers was introduced for pixel storage in graphics and not forcomputation. Figure3. Representation of 16-bitfloatingpointnumbers. RESONANCE January
14 GENERAL ARTICLE With16-bitrepresentation,5bitsareusedfortheexponentand10 bitsforthesignificand. Biasedexponentisusedwithabiasof15. Thustheexponentrangeis: 14to+15.Rememberthatall0sand all1sfortheexponentarereservedtorepresent0and respectively. Themaximumpositiveintegerthatcanbestoredis =1+( ) 2 15 =( ) Theminimumpositivenumberis =2 14 = The minimum subnormal 16-bit floating point number is With improvements in computer technology, using128bitsto represent floating point numbers has become feasible. This is sometimesusedin numeric-intensive computation, where large rounding errors may occur. Thedefinitionsof±0,±,NaN,andSNaNfollowthesameideas asin32-bitformat. The 128-Bit Standard With improvements in computer technology, using 128 bits to represent floating point numbers has become feasible. This is sometimes used in numeric-intensive computation, where large roundingerrorsmayoccur.inthisstandard,besidesasignbit,14 bits are used for the exponent, and 113 bits are used for the significand. The representation is shown in Figure 4. Anumberinthisstandardis ( 1) s (1.f) 2 (exponent 16384). Figure 4.Representation of 128-bitfloatingpointnumber. 24 RESONANCE January 2016
15 GENERAL ARTICLE The largest positive number which can be represented in this format is: bit 14bits 113bits whichequals ( ) = Thesmallest normalizedpositive number whichcanberepresentedis The main motivation for introducing a decimal standard is thefactthata terminating decimal fraction need not give aterminatingbinary fraction = A subnormal number is represented by ( 1) s 0.f Thesmallestpositivesubnormalnumberis = Thedefinitionsof±0,±,andQNaN,andSNaNarethesameas inthe1985standardexceptfortheincreaseofbitsintheexponent and significand. There are other minor changes which we will not discuss. The Decimal Standard The main motivation for introducing a decimal standard is the factthataterminatingdecimalfractionneednotgiveaterminatingbinaryfraction. Forexample,(0.1) 10 = ( (0011) recurring) 2.Ifonecomputes usingbinaryarithmetic and rounding the non-terminating binary fraction up to the nextlargernumber,theansweris: insteadof This is unacceptable in many situations, particularly in financial transactions. There was a demand from professionals carrying out financial transactions to introduce a standard for representing decimal floating point numbers. Decimal floating point numbers are not new. They were available in COBOL. There was, however, no standard and this resulted in non-portable programs. IEEE hasthus introduceda standardfor Therewasademand from professionals carrying out financial transactionsto introduce a standard for representing decimal floating point numbers. RESONANCE January
16 GENERAL ARTICLE decimal floating point numbers to facilitate portability of programs. Decimal Representation The idea of encoding decimal numbers using bits is simple. For example, to represent 0.1 encoded(not converted) to binary, it is writtenas Theexponentinsteadofbeinginterpretedas a power of 2 is now interpreted as a power of 10. Both the significand and the exponent are now integers and their binary representations have a finite number of bits. One method of representing usingthisideaina32-bitwordisshownin Figure 5. Wehaveassumedan8-bitbinaryexponentinbiasedform.Inthis representation, there is no hidden 1 in the significand. The significand is an integer.(fractions are represented by using an appropriate power of 10 in the exponent field.) For example, willbewrittenas and753658,an integer, will be converted to binary and stored as the significand. Thepowerof10,namely 3,willbeanintegerintheexponent part. The IEEE Standard does not use this simple representation. This isduetothefactthatthelargestdecimalsignificandthatmaybe representedusingthismethodis It is preferable to obtain 7 significant digits, i.e., significand up to TheIEEEStandardachievesthisbyborrowingsome bits from the exponent field and reducing the range of the exponent. In other words, instead of maximum exponent of 128, it isreducedto96andthebitssavedareusedtoincreasethesignificant digitsinthesignificand.thecodingofbitsisasshowninfigure6. Figure5. Representation of decimal floating point numbers. Figure 6. Representation of 32-bitfloatingpointnumbers inieee2008standard. 26 RESONANCE January 2016
17 GENERAL ARTICLE Observe the difference in terminology for exponent and significand fields. In this coding, one part of the five combination bits is interpreted as exponent and the other part as an extension of significand bits using an ingenious method. This will be explained later in this section. IEEE2008Standarddefinestwoformatsforrepresentingdecimal floating point numbers. One of them converts the binary significandfieldtoadecimalintegerbetween0and10 p 1 where pisthenumberofsignificantbitsinthesignificand.theother, called densely packed decimal form, encodes the bits in the significand field directly to decimal. Both methods represent decimal floating point numbers as shown below. Decimalfloatingpointnumber=( 1) s f 10 exp bias,wheres is the sign bit, f is a decimal fraction(significand), exp and biasare decimal numbers. The fraction need not be normalized but is usually normalized with a non-zero most significant digit. There isnohiddenbitasinthebinarystandard. Thestandarddefines decimal floating point formats for 32-bit, 64-bit and 128-bit numbers. IEEE 2008 Standard defines two formats for representing decimal floating point numbers. One of themconverts the binary significand fieldtoadecimal integer between 0 and10 p 1 wherepis the number of significantbitsinthe significand. The other, called densely packed decimal form, encodes thebitsin thesignificandfield directly to decimal. Densely Packed Decimal Significand Representation Normally,whendecimalnumbersarecodedtobinary,4bitsare required to represent each digit, as 3 bits give only 8 combinations that arenotsufficienttorepresent 10 digits. Four bits give16 combinations out of which we need only 10 combinations to representdecimalnumbersfrom0to9.thus,wewaste6combinationsoutof16. Fora32-bitnumber,if1bitisusedforsign,8 bitsfortheexponentand23bitsforthesignificand,themaximum positivevalueof thesignificandwillbe Theexponent rangewillbe00to99. Withabias of50,themaximum positivedecimalnumberwillbe The number of significand digits in the above coding is not sufficient. A clever coding scheme for decimal encoding of binary numbers was discovered by Cowlinshaw. This method encodes10bitsas3decimaldigits(rememberthat2 10 =1024and RESONANCE January
18 GENERAL ARTICLE onecanencode000to999usingtheavailable1024combinations of10bits). IEEE754Standardusesthiscodingscheme. Wewill now describe the standard using this coding method for 32-bit numbers. The standard defines the following: 1. Asignbit0for+and1for. 2. Acombinationfieldof5bits.Itiscalledacombinationfieldas onepartofitisusedfortheexponentandanotherpartforthe significand. 3. Anexponentcontinuationfieldthatisusedfortheexponent. 4. Acoefficientfieldthatencodesstringsof10bitsas3digits. Thus, 20 bits are encoded as 6 digits. This is part of the significand field. For32-bitIEEEword,thedistributionofbitswasgiveninFigure 6 which is reproduced below for ready reference. Sign Combinationfield Exponent continuation field Coefficient field 1bit 5bits 6bits 20bits The 5 bits of the combination field are used to increase the coefficient field by 1 digit. It uses the first ten 4-bit combinations of16possible4-bitcombinationstorepresentthis. Asthereare 32combinationsof5bits,outofwhichonly10areneededtoextend the coefficient digits, the remaining 22-bit combinations are availabletoincreasetheexponentrangeandalsoencode andnan.this isdonebyaningeniousmethodexplainedintable3. Combination bits Type Most significant bits of exponent Most significant digit of coefficient Table 3. Useofcombination bits in IEEE Decimal Standard. b 1 b 2 b 3 b 4 b 5 xyabc 11abc Finite Finite NaN xy ab 0abc 100c 28 RESONANCE January 2016
19 GENERAL ARTICLE Sign bit 1 1 Combination field(bits) 5 5 Exponentcontinuationfield(bits) 8 12 Coefficient field(bits) Numberofbitsinnumber Significant digits Exponentrange(decimal) Exponent bias(decimal) Table 4. Decimalrepresentation for 64-bit and 128-bit numbers. Thus, the most significant 2 bits of the exponent are constrained to take on values 00, 01, 10. Using the 6 bits of exponent combinationbits,wegetanexponentrangeof3 2 6 =192.The exponentbiasis96.(rememberthattheexponentisaninteger thatcanbeexactlyrepresentedinbinary.) Withtheadditionof1 significant digit to the 6 digits of the coefficient bits, the decimal representation of 32-bit numbers has 7 significant digits. The largestnumberthatcanbestoredis Asimilar methodisusedtorepresent64-and128-bitfloatingpointnumbers. ThisisshowninTable4. In the IEEE 2008 Decimal Standard, the combination bits along with exponent bits are used to represent NaN, signaling NaN and ± asshownintable5. Quantity Sign bit Combination bits b 0 b 1 b 2 b 3 b 4 b 5 b 6 SNaN 0or QNaN 0or x x x x (xis0or1) Table 5. Representation of NaNs,±,and±0. RESONANCE January
20 GENERAL ARTICLE Binary Integer Representation of Significand In this method, as we stated earlier, if there are p bits in the extendedcoefficient fieldconstituting thesignificand, they are converted as a decimal integer. This representation also uses the bitsofthecombinationfieldtoincreasethenumberofsignificant digits.theideaissimilarandwewillnotrepeatit.therepresentation of NaNs and ± is the same as in the densely packed format. The major advantage of this representation is the simplicity of performing arithmetic using hardware arithmetic units designed to perform arithmetic using binary numbers. The disadvantage is the time taken for conversion from integer to decimal. The densely packed decimal format allows simple encoding of binary to decimal but needs a specially designed decimal arithmeticunit. Acknowledgements IthankProf.PCPBhatt,Prof.CRMuthukrishnan,andProf.S KNandyforreadingthefirstdraftofthisarticleandsuggesting improvements. Suggested Reading Address for Correspondence V Rajaraman Supercomputer Education and Research Centre Indian Institute of Science Bengaluru , India. rajaram@serc.iisc.in [1] David Goldberg,Whatevery computer scientist should know about floatingpoint arithmetic, ACM Computing Surveys,Vol.28,No.1, pp5 48,March1991. [2] V Rajaraman, ComputerOrientedNumericalMethods, 3rd Edition, PHI Learning,NewDelhi,pp.15 32,2013. [3] Marc Cowlinshaw, Denselypacked decimal encoding, IEEE Proceedings Computer and Digital Techniques, Vol.149,No.3,pp , May [4] Steve Hollasch, IEEE Standard 754 for floatingpoint numbers, steve.hollasch.net/cgindex/coding/ieeefloat.html [5] Decimal Floating Point, Wikipedia, en.wikipedia.org/wiki/ Decimal_floating_point 30 RESONANCE January 2016
CSCI 402: Computer Architectures. Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI 3.5 Today s Contents Floating point numbers: 2.5, 10.1, 100.2, etc.. How
More informationThe Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive.
IEEE 754 Standard - Overview Frozen Content Modified by on 13-Sep-2017 Before discussing the actual WB_FPU - Wishbone Floating Point Unit peripheral in detail, it is worth spending some time to look at
More informationFinite arithmetic and error analysis
Finite arithmetic and error analysis Escuela de Ingeniería Informática de Oviedo (Dpto de Matemáticas-UniOvi) Numerical Computation Finite arithmetic and error analysis 1 / 45 Outline 1 Number representation:
More informationIEEE Standard 754 Floating Point Numbers
IEEE Standard 754 Floating Point Numbers Steve Hollasch / Last update 2005-Feb-24 IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based
More informationNumeric Encodings Prof. James L. Frankel Harvard University
Numeric Encodings Prof. James L. Frankel Harvard University Version of 10:19 PM 12-Sep-2017 Copyright 2017, 2016 James L. Frankel. All rights reserved. Representation of Positive & Negative Integral and
More informationFloating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right.
Floating-point Arithmetic Reading: pp. 312-328 Floating-Point Representation Non-scientific floating point numbers: A non-integer can be represented as: 2 4 2 3 2 2 2 1 2 0.2-1 2-2 2-3 2-4 where you sum
More informationFloating Point Arithmetic
Floating Point Arithmetic CS 365 Floating-Point What can be represented in N bits? Unsigned 0 to 2 N 2s Complement -2 N-1 to 2 N-1-1 But, what about? very large numbers? 9,349,398,989,787,762,244,859,087,678
More informationFloating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)
Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science) Floating Point Background: Fractional binary numbers IEEE floating point standard: Definition Example and properties
More informationFloating Point. The World is Not Just Integers. Programming languages support numbers with fraction
1 Floating Point The World is Not Just Integers Programming languages support numbers with fraction Called floating-point numbers Examples: 3.14159265 (π) 2.71828 (e) 0.000000001 or 1.0 10 9 (seconds in
More informationCOMP2611: Computer Organization. Data Representation
COMP2611: Computer Organization Comp2611 Fall 2015 2 1. Binary numbers and 2 s Complement Numbers 3 Bits: are the basis for binary number representation in digital computers What you will learn here: How
More informationChapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation
Chapter 2 Float Point Arithmetic Topics IEEE Floating Point Standard Fractional Binary Numbers Rounding Floating Point Operations Mathematical properties Real Numbers in Decimal Notation Representation
More informationCS 261 Fall Floating-Point Numbers. Mike Lam, Professor. https://xkcd.com/217/
CS 261 Fall 2017 Mike Lam, Professor https://xkcd.com/217/ Floating-Point Numbers Floating-point Topics Binary fractions Floating-point representation Conversions and rounding error Binary fractions Now
More informationFloating Point Representation in Computers
Floating Point Representation in Computers Floating Point Numbers - What are they? Floating Point Representation Floating Point Operations Where Things can go wrong What are Floating Point Numbers? Any
More informationCS 261 Fall Floating-Point Numbers. Mike Lam, Professor.
CS 261 Fall 2018 Mike Lam, Professor https://xkcd.com/217/ Floating-Point Numbers Floating-point Topics Binary fractions Floating-point representation Conversions and rounding error Binary fractions Now
More informationFloating Point Numbers
Floating Point Numbers Summer 8 Fractional numbers Fractional numbers fixed point Floating point numbers the IEEE 7 floating point standard Floating point operations Rounding modes CMPE Summer 8 Slides
More informationFloating Point Numbers
Floating Point Numbers Computer Systems Organization (Spring 2016) CSCI-UA 201, Section 2 Instructor: Joanna Klukowska Slides adapted from Randal E. Bryant and David R. O Hallaron (CMU) Mohamed Zahran
More informationFloating Point Numbers
Floating Point Numbers Computer Systems Organization (Spring 2016) CSCI-UA 201, Section 2 Fractions in Binary Instructor: Joanna Klukowska Slides adapted from Randal E. Bryant and David R. O Hallaron (CMU)
More informationFloating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !
Floating point Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties Next time! The machine model Chris Riesbeck, Fall 2011 Checkpoint IEEE Floating point Floating
More informationData Representation Floating Point
Data Representation Floating Point CSCI 2400 / ECE 3217: Computer Architecture Instructor: David Ferry Slides adapted from Bryant & O Hallaron s slides via Jason Fritts Today: Floating Point Background:
More informationFloating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754
Floating Point Puzzles Topics Lecture 3B Floating Point IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties For each of the following C expressions, either: Argue that
More informationFloating-Point Numbers in Digital Computers
POLYTECHNIC UNIVERSITY Department of Computer and Information Science Floating-Point Numbers in Digital Computers K. Ming Leung Abstract: We explain how floating-point numbers are represented and stored
More informationFloating Point January 24, 2008
15-213 The course that gives CMU its Zip! Floating Point January 24, 2008 Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties class04.ppt 15-213, S 08 Floating
More informationData Representation Floating Point
Data Representation Floating Point CSCI 2400 / ECE 3217: Computer Architecture Instructor: David Ferry Slides adapted from Bryant & O Hallaron s slides via Jason Fritts Today: Floating Point Background:
More informationChapter 2. Data Representation in Computer Systems
Chapter 2 Data Representation in Computer Systems Chapter 2 Objectives Understand the fundamentals of numerical data representation and manipulation in digital computers. Master the skill of converting
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 11: Floating Point & Floating Point Addition Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Last time: Single Precision Format
More informationIEEE-754 floating-point
IEEE-754 floating-point Real and floating-point numbers Real numbers R form a continuum - Rational numbers are a subset of the reals - Some numbers are irrational, e.g. π Floating-point numbers are an
More informationBindel, Fall 2016 Matrix Computations (CS 6210) Notes for
1 Logistics Notes for 2016-09-07 1. We are still at 50. If you are still waiting and are not interested in knowing if a slot frees up, let me know. 2. There is a correction to HW 1, problem 4; the condition
More informationFloating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers.
class04.ppt 15-213 The course that gives CMU its Zip! Topics Floating Point Jan 22, 2004 IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Floating Point Puzzles For
More informationChapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers
Chapter 03: Computer Arithmetic Lesson 09: Arithmetic using floating point numbers Objective To understand arithmetic operations in case of floating point numbers 2 Multiplication of Floating Point Numbers
More informationReal Numbers finite subset real numbers floating point numbers Scientific Notation fixed point numbers
Real Numbers We have been studying integer arithmetic up to this point. We have discovered that a standard computer can represent a finite subset of the infinite set of integers. The range is determined
More informationFloating-Point Numbers in Digital Computers
POLYTECHNIC UNIVERSITY Department of Computer and Information Science Floating-Point Numbers in Digital Computers K. Ming Leung Abstract: We explain how floating-point numbers are represented and stored
More informationCPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS
CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS Aleksandar Milenković The LaCASA Laboratory, ECE Department, The University of Alabama in Huntsville Email: milenka@uah.edu Web:
More informationSystems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties
Systems I Floating Point Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties IEEE Floating Point IEEE Standard 754 Established in 1985 as uniform standard for
More informationCPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS
CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS Aleksandar Milenković The LaCASA Laboratory, ECE Department, The University of Alabama in Huntsville Email: milenka@uah.edu Web:
More informationTable : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = (
Floating Point Numbers in Java by Michael L. Overton Virtually all modern computers follow the IEEE 2 floating point standard in their representation of floating point numbers. The Java programming language
More informationFloating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754
Floating Point Puzzles Topics Lecture 3B Floating Point IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties For each of the following C expressions, either: Argue that
More informationFloating Point Numbers. Lecture 9 CAP
Floating Point Numbers Lecture 9 CAP 3103 06-16-2014 Review of Numbers Computers are made to deal with numbers What can we represent in N bits? 2 N things, and no more! They could be Unsigned integers:
More informationFoundations of Computer Systems
18-600 Foundations of Computer Systems Lecture 4: Floating Point Required Reading Assignment: Chapter 2 of CS:APP (3 rd edition) by Randy Bryant & Dave O Hallaron Assignments for This Week: Lab 1 18-600
More informationChapter Three. Arithmetic
Chapter Three 1 Arithmetic Where we've been: Performance (seconds, cycles, instructions) Abstractions: Instruction Set Architecture Assembly Language and Machine Language What's up ahead: Implementing
More informationGiving credit where credit is due
CSCE 230J Computer Organization Floating Point Dr. Steve Goddard goddard@cse.unl.edu http://cse.unl.edu/~goddard/courses/csce230j Giving credit where credit is due Most of slides for this lecture are based
More informationFloating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.
Floating point Today IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time The machine model Fabián E. Bustamante, Spring 2010 IEEE Floating point Floating point
More informationChapter 2 Data Representations
Computer Engineering Chapter 2 Data Representations Hiroaki Kobayashi 4/21/2008 4/21/2008 1 Agenda in Chapter 2 Translation between binary numbers and decimal numbers Data Representations for Integers
More informationModule 2: Computer Arithmetic
Module 2: Computer Arithmetic 1 B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N, 3 E D, D A V I D L. P A T T E R S O N A N D J O H N L. H A N N E S S Y, M O R G A N K A U F M A N N
More informationWhat we need to know about error: Class Outline. Computational Methods CMSC/AMSC/MAPL 460. Errors in data and computation
Class Outline Computational Methods CMSC/AMSC/MAPL 460 Errors in data and computation Representing numbers in floating point Ramani Duraiswami, Dept. of Computer Science Computations should be as accurate
More informationCO212 Lecture 10: Arithmetic & Logical Unit
CO212 Lecture 10: Arithmetic & Logical Unit Shobhanjana Kalita, Dept. of CSE, Tezpur University Slides courtesy: Computer Architecture and Organization, 9 th Ed, W. Stallings Integer Representation For
More informationGiving credit where credit is due
JDEP 284H Foundations of Computer Systems Floating Point Dr. Steve Goddard goddard@cse.unl.edu Giving credit where credit is due Most of slides for this lecture are based on slides created by Drs. Bryant
More informationunused unused unused unused unused unused
BCD numbers. In some applications, such as in the financial industry, the errors that can creep in due to converting numbers back and forth between decimal and binary is unacceptable. For these applications
More informationINTEGER REPRESENTATIONS
INTEGER REPRESENTATIONS Unsigned Representation (B2U). Two s Complement (B2T). Signed Magnitude (B2S). Ones Complement (B2O). scheme size in bits (w) minimum value maximum value B2U 5 0 31 B2U 8 0 255
More informationSystem Programming CISC 360. Floating Point September 16, 2008
System Programming CISC 360 Floating Point September 16, 2008 Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Powerpoint Lecture Notes for Computer Systems:
More informationComputational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point and associated issues. Ramani Duraiswami, Dept. of Computer Science
Computational Methods CMSC/AMSC/MAPL 460 Representing numbers in floating point and associated issues Ramani Duraiswami, Dept. of Computer Science Class Outline Computations should be as accurate and as
More informationCS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.
CS 265 Computer Architecture Wei Lu, Ph.D., P.Eng. 1 Part 1: Data Representation Our goal: revisit and re-establish fundamental of mathematics for the computer architecture course Overview: what are bits
More informationComputational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point and associated issues. Ramani Duraiswami, Dept. of Computer Science
Computational Methods CMSC/AMSC/MAPL 460 Representing numbers in floating point and associated issues Ramani Duraiswami, Dept. of Computer Science Class Outline Computations should be as accurate and as
More informationIEEE Standard for Floating-Point Arithmetic: 754
IEEE Standard for Floating-Point Arithmetic: 754 G.E. Antoniou G.E. Antoniou () IEEE Standard for Floating-Point Arithmetic: 754 1 / 34 Floating Point Standard: IEEE 754 1985/2008 Established in 1985 (2008)
More informationIEEE Floating Point Numbers Overview
COMP 40: Machine Structure and Assembly Language Programming (Fall 2015) IEEE Floating Point Numbers Overview Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah
More informationFloating Point. CSE 351 Autumn Instructor: Justin Hsia
Floating Point CSE 351 Autumn 2017 Instructor: Justin Hsia Teaching Assistants: Lucas Wotton Michael Zhang Parker DeWilde Ryan Wong Sam Gehman Sam Wolfson Savanna Yee Vinny Palaniappan Administrivia Lab
More informationFloating-Point Arithmetic
Floating-Point Arithmetic Raymond J. Spiteri Lecture Notes for CMPT 898: Numerical Software University of Saskatchewan January 9, 2013 Objectives Floating-point numbers Floating-point arithmetic Analysis
More informationData Representations & Arithmetic Operations
Data Representations & Arithmetic Operations Hiroaki Kobayashi 7/13/2011 7/13/2011 Computer Science 1 Agenda Translation between binary numbers and decimal numbers Data Representations for Integers Negative
More informationC NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.
C NUMERIC FORMATS Figure C-. Table C-. Listing C-. Overview The DSP supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the DSP supports an
More informationComputer Architecture and IC Design Lab. Chapter 3 Part 2 Arithmetic for Computers Floating Point
Chapter 3 Part 2 Arithmetic for Computers Floating Point Floating Point Representation for non integral numbers Including very small and very large numbers 4,600,000,000 or 4.6 x 10 9 0.0000000000000000000000000166
More informationMAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic
MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic September 28, 2018 Lecture 1 September 28, 2018 1 / 25 Floating point arithmetic Computers use finite strings of binary digits to represent
More informationComputer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University
Computer Architecture Chapter 3 Fall 2005 Department of Computer Science Kent State University Objectives Signed and Unsigned Numbers Addition and Subtraction Multiplication and Division Floating Point
More informationRepresenting and Manipulating Floating Points
Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: September 18, 2017 at 12:48 CS429 Slideset 4: 1 Topics of this Slideset
More informationRepresenting and Manipulating Floating Points. Jo, Heeseung
Representing and Manipulating Floating Points Jo, Heeseung The Problem How to represent fractional values with finite number of bits? 0.1 0.612 3.14159265358979323846264338327950288... 2 Fractional Binary
More informationRepresenting and Manipulating Floating Points
Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with
More informationFloating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3
Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Instructor: Nicole Hynes nicole.hynes@rutgers.edu 1 Fixed Point Numbers Fixed point number: integer part
More informationFloating Point. CSE 351 Autumn Instructor: Justin Hsia
Floating Point CSE 351 Autumn 2017 Instructor: Justin Hsia Teaching Assistants: Lucas Wotton Michael Zhang Parker DeWilde Ryan Wong Sam Gehman Sam Wolfson Savanna Yee Vinny Palaniappan http://xkcd.com/571/
More informationEE 109 Unit 19. IEEE 754 Floating Point Representation Floating Point Arithmetic
1 EE 109 Unit 19 IEEE 754 Floating Point Representation Floating Point Arithmetic 2 Floating Point Used to represent very small numbers (fractions) and very large numbers Avogadro s Number: +6.0247 * 10
More informationFloating-point representations
Lecture 10 Floating-point representations Methods of representing real numbers (1) 1. Fixed-point number system limited range and/or limited precision results must be scaled 100101010 1111010 100101010.1111010
More informationFloating-point representations
Lecture 10 Floating-point representations Methods of representing real numbers (1) 1. Fixed-point number system limited range and/or limited precision results must be scaled 100101010 1111010 100101010.1111010
More informationfractional quantities are typically represented in computers using floating point format this approach is very much similar to scientific notation
Floating Point Arithmetic fractional quantities are typically represented in computers using floating point format this approach is very much similar to scientific notation for example, fixed point number
More informationFloating Point. CSE 351 Autumn Instructor: Justin Hsia
Floating Point CSE 351 Autumn 2016 Instructor: Justin Hsia Teaching Assistants: Chris Ma Hunter Zahn John Kaltenbach Kevin Bi Sachin Mehta Suraj Bhat Thomas Neuman Waylon Huang Xi Liu Yufang Sun http://xkcd.com/899/
More informationEE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing
EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 4-B Floating-Point Arithmetic - II Spring 2017 Koren Part.4b.1 The IEEE Floating-Point Standard Four formats for floating-point
More informationFloating-point representation
Lecture 3-4: Floating-point representation and arithmetic Floating-point representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However,
More informationCS101 Lecture 04: Binary Arithmetic
CS101 Lecture 04: Binary Arithmetic Binary Number Addition Two s complement encoding Briefly: real number representation Aaron Stevens (azs@bu.edu) 25 January 2013 What You ll Learn Today Counting in binary
More informationUNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666
UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer Arithmetic ECE 666 Part 4-B Floating-Point Arithmetic - II Israel Koren ECE666/Koren Part.4b.1 The IEEE Floating-Point
More informationCS 33. Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.
CS 33 Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Byte-Oriented Memory Organization 00 0 FF F Programs refer to data by address
More informationScientific Computing. Error Analysis
ECE257 Numerical Methods and Scientific Computing Error Analysis Today s s class: Introduction to error analysis Approximations Round-Off Errors Introduction Error is the difference between the exact solution
More informationFLOATING POINT NUMBERS
Exponential Notation FLOATING POINT NUMBERS Englander Ch. 5 The following are equivalent representations of 1,234 123,400.0 x 10-2 12,340.0 x 10-1 1,234.0 x 10 0 123.4 x 10 1 12.34 x 10 2 1.234 x 10 3
More informationFloating Point Considerations
Chapter 6 Floating Point Considerations In the early days of computing, floating point arithmetic capability was found only in mainframes and supercomputers. Although many microprocessors designed in the
More informationMost nonzero floating-point numbers are normalized. This means they can be expressed as. x = ±(1 + f) 2 e. 0 f < 1
Floating-Point Arithmetic Numerical Analysis uses floating-point arithmetic, but it is just one tool in numerical computation. There is an impression that floating point arithmetic is unpredictable and
More informationRepresenting and Manipulating Floating Points
Representing and Manipulating Floating Points Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE23: Introduction to Computer Systems, Spring 218,
More informationComputer Arithmetic Floating Point
Computer Arithmetic Floating Point Chapter 3.6 EEC7 FQ 25 About Floating Point Arithmetic Arithmetic basic operations on floating point numbers are: Add, Subtract, Multiply, Divide Transcendental operations
More informationCS 6210 Fall 2016 Bei Wang. Lecture 4 Floating Point Systems Continued
CS 6210 Fall 2016 Bei Wang Lecture 4 Floating Point Systems Continued Take home message 1. Floating point rounding 2. Rounding unit 3. 64 bit word: double precision (IEEE standard word) 4. Exact rounding
More informationReal Numbers finite subset real numbers floating point numbers Scientific Notation fixed point numbers
Real Numbers We have been studying integer arithmetic up to this point. We have discovered that a standard computer can represent a finite subset of the infinite set of integers. The range is determined
More informationFloating Point Arithmetic
Floating Point Arithmetic Computer Systems, Section 2.4 Abstraction Anything that is not an integer can be thought of as . e.g. 391.1356 Or can be thought of as + /
More information2 Computation with Floating-Point Numbers
2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers
More informationIn this lesson you will learn: how to add and multiply positive binary integers how to work with signed binary numbers using two s complement how fixed and floating point numbers are used to represent
More informationComputer Organization: A Programmer's Perspective
A Programmer's Perspective Representing Numbers Gal A. Kaminka galk@cs.biu.ac.il Fractional Binary Numbers 2 i 2 i 1 4 2 1 b i b i 1 b 2 b 1 b 0. b 1 b 2 b 3 b j 1/2 1/4 1/8 Representation Bits to right
More informationThe Perils of Floating Point
The Perils of Floating Point by Bruce M. Bush Copyright (c) 1996 Lahey Computer Systems, Inc. Permission to copy is granted with acknowledgement of the source. Many great engineering and scientific advances
More informationFloating Point Numbers
Floating Point Floating Point Numbers Mathematical background: tional binary numbers Representation on computers: IEEE floating point standard Rounding, addition, multiplication Kai Shen 1 2 Fractional
More information±M R ±E, S M CHARACTERISTIC MANTISSA 1 k j
ENEE 350 c C. B. Silio, Jan., 2010 FLOATING POINT REPRESENTATIONS It is assumed that the student is familiar with the discussion in Appendix B of the text by A. Tanenbaum, Structured Computer Organization,
More informationRepresenting and Manipulating Floating Points. Computer Systems Laboratory Sungkyunkwan University
Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with
More informationSigned Multiplication Multiply the positives Negate result if signs of operand are different
Another Improvement Save on space: Put multiplier in product saves on speed: only single shift needed Figure: Improved hardware for multiplication Signed Multiplication Multiply the positives Negate result
More information15213 Recitation 2: Floating Point
15213 Recitation 2: Floating Point 1 Introduction This handout will introduce and test your knowledge of the floating point representation of real numbers, as defined by the IEEE standard. This information
More information2 Computation with Floating-Point Numbers
2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers
More informationFloating Point Arithmetic. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Floating Point Arithmetic Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Floating Point (1) Representation for non-integral numbers Including very
More informationNumber Systems. Both numbers are positive
Number Systems Range of Numbers and Overflow When arithmetic operation such as Addition, Subtraction, Multiplication and Division are performed on numbers the results generated may exceed the range of
More informationNumber Systems. Decimal numbers. Binary numbers. Chapter 1 <1> 8's column. 1000's column. 2's column. 4's column
1's column 10's column 100's column 1000's column 1's column 2's column 4's column 8's column Number Systems Decimal numbers 5374 10 = Binary numbers 1101 2 = Chapter 1 1's column 10's column 100's
More informationModule 12 Floating-Point Considerations
GPU Teaching Kit Accelerated Computing Module 12 Floating-Point Considerations Lecture 12.1 - Floating-Point Precision and Accuracy Objective To understand the fundamentals of floating-point representation
More information