High Speed Associative Accumulation of Floating-point Numbers and Floating-point Intervals

Ulrich Kulisch and Gerd Bohlender
Institut für Angewandte und Numerische Mathematik, Karlsruher Institut für Technologie, D-76128 Karlsruhe, Germany

Abstract

Floating-point arithmetic is the tool that is most commonly used for scientific computation. It is well known that conventional floating-point addition is not associative, i.e., for floating-point numbers a, b, c in general a + (b + c) ≠ (a + b) + c. This, however, is a relict of old and poor technologies. We show in this paper that n floating-point numbers a_i, i = 1(1)n,^1 very well can be added in a way that produces a result that is independent of the order in which the summands are added. This method does not round after each single addition. It accumulates the summands exactly into a modest fixed-point register on the arithmetic unit and rounds the sum to a floating-point number only once, at the very end of the accumulation. By pipelining, the sum can be computed in the time the processor needs to read the summands, i.e., the sum is computed at extreme speed. The method presented here has the potential of replacing conventional methods for the implementation of floating-point addition and subtraction. It can very naturally be applied to the accumulation of n floating-point intervals. Here again the result is independent of the order in which the intervals are added.

1 Introduction

Conventionally, computing speed is measured in flops (floating-point operations per second). This assumes that the entire computation is reduced to the four elementary floating-point operations. Beyond the four operations +, −, ·, and /, several old mechanical calculators provided a number of compound operations like accumulate or multiply and accumulate. This allowed a continuous accumulation of numbers and of products of numbers into different positions of a wide fixed-point register. This fixed-point addition was the fastest way to use the computer.

^1 1(1)n means: from 1 in steps of 1 to n.
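The non-associativity mentioned in the abstract is easy to reproduce. The following small example (IEEE 754 double precision, here in ordinary Python) adds the same three summands in two different orders and obtains two different results:

```python
# Non-associativity of conventional floating-point addition:
# the same three summands, two orders, two results.
a, b, c = 1e16, -1e16, 1.0

left  = (a + b) + c   # (a + b) is exactly 0, so the result is 1.0
right = a + (b + c)   # (b + c) rounds back to -1e16, so the result is 0.0

print(left, right)    # prints: 1.0 0.0
```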

It was applied as often as possible. No intermediate results needed to be written down and typed in again for the next operation. No intermediate roundings or normalizations had to be performed. No error analysis was necessary. As long as no underflow or overflow occurred, which was obvious and visible, the result was always correct. It was independent of the order in which the summands were added. If required, a rounding was performed only once, at the very end of the accumulation.

The method discussed here can be traced back to the old computer by G. W. Leibniz (1685). It shows an excellent feeling of the old mathematicians for efficiency in computing. It is high time that such fast compound arithmetic operations are also offered on electronic computers. Old technologies (around 1970) did not allow this, and this old state of the art still determines the present computer architecture. The fact that neither the IEEE floating-point arithmetic standard 754 nor the standard IEEE 1788 for interval arithmetic provides and requires the accumulation and the dot product of two floating-point vectors as elementary arithmetic operations displays this tragic situation.

Modern technology is very powerful. A modest fixed-point register for the accumulation of floating-point numbers and of simple products of floating-point numbers on the arithmetic unit can very easily be provided. It would be rewarded by extreme speed and increased accuracy. Hardware implementations of the exact dot product (EDP) at Karlsruhe in 1993 [2] and at Berkeley in 2013 [3] show that the EDP can be computed in about 1/5th to 1/6th of the time needed for computing a possibly wrong dot product in conventional floating-point arithmetic.

Influenced by the need for fast and exact computation of the dot product, modern processors by Intel and IBM provide register memory of 16 K bits on the arithmetic unit [8]. So they differ from the traditional von Neumann architecture. This difference may be small, but it is essential for high speed computation of accumulations and of the EDP. Access to register memory is much faster than access to main memory. About 1 K bits suffice for a real dot product, 2 K bits for an interval dot product, and 4 K bits for a complex interval dot product. One half of these bits suffices for high speed accumulation of floating-point numbers, of floating-point intervals, and of complex floating-point intervals, respectively. So the 16 K bits still give room for a number of interrupts.

Using a software routine for a correctly rounded dot product as an alternative to a hardware implemented EDP leads to a comparatively slow process. A correctly rounded dot product is built upon a computation of the dot product in conventional floating-point arithmetic. This is already 5 to 6 times slower than an EDP. High accuracy is obtained by clever and sophisticated mathematical considerations which all together make it slower than the EDP by more than one order of magnitude. High speed and accuracy, however, are essential for acceptance and success of interval arithmetic. Iterative refinement or defect correction methods are basic ingredients for the success of numerical and of interval analysis. It is the EDP which makes these techniques fast and successful. For details see Chapter 9 in [19]. A hardware supported EDP brings speed and accuracy to floating-point interval arithmetic.

Figure 1: The floating-point number format (sign s, exponent e, fraction f, ubit u).

With it, interval arithmetic becomes a fast and exception-free computing tool! High speed floating-point interval arithmetic carries the potential of replacing floating-point arithmetic in many applications.

The method for high speed accumulation of floating-point numbers and floating-point intervals presented here is a specialization of techniques for fast and exact computation of dot products of two floating-point vectors,^2 used in the XSC-languages since 1980 [9, 10, 11, 23, 24], to the accumulation of simple floating-point numbers. This paper shows how simply and fast a set of floating-point numbers and floating-point intervals can be exactly accumulated. Of course, exact accumulation of floating-point numbers can be obtained as a particular case of an EDP. A special routine for it, however, makes it even simpler and faster. It may even replace conventional implementations of the elementary floating-point operations addition and subtraction.

We now develop circuitry for fast and exact accumulation of floating-point numbers. We do this for a data format which is close to double precision. Having in mind that the target of our study is floating-point interval arithmetic, we avoid the huge exponent ranges of the IEEE 754 arithmetic standard. We choose a format that is more appropriate for floating-point interval arithmetic.

2 Basic Assumptions

Motivated by the book The End of Error by John Gustafson [4], which extends arithmetic for closed real intervals to just connected sets^3 of real numbers, we develop the method for a particular data format. The study, however, is not restricted to this format. The basic ideas can be applied to other formats as well.

We assume that the floating-point number is represented by a 64 bit word as shown in Figure 1. Here one bit is used for the sign s, 9 bits are used for the exponent, 53 bits for the fraction, and one bit for the ubit u. As usual, the leading bit of the fraction of a normalized binary floating-point number is not stored. So the fraction actually consists of 54 bits. For the least exponent e1, subnormal numbers with a denormalized mantissa are permitted. The ubit u is used to indicate whether the number is exact (u = 0) or inexact (u = 1). The ubit is very useful for advanced interval arithmetic. If it is 0 the corresponding bracket is closed, and it is open if the ubit is 1. So if F = F(b, l, e1, e2) denotes a floating-point system with base b, l digits in the mantissa, least exponent e1 and greatest exponent e2, we have b = 2, l = 54, e1 = −255, and e2 = 256. For numerical examples we occasionally use base b = 10.

^2 For details see Chapter 1 in [17], or Chapter 8 in [19].
^3 These can be closed, open, half-open, bounded or unbounded.
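For illustration, the field split of this 64-bit word can be modeled in a few lines. The Python sketch below assumes a bit layout of sign | exponent | fraction | ubit and an exponent bias of 255; the paper fixes only the field widths and the exponent range e1 = −255 to e2 = 256, so these encoding details are assumptions:

```python
# A sketch of unpacking the 64-bit word of Figure 1. The field order
# (sign | 9-bit exponent | 53-bit fraction | ubit) and the exponent
# bias of 255 are assumptions; the paper fixes only the field widths.

def unpack(word: int):
    """Split a 64-bit word into (sign, exponent, fraction, ubit)."""
    assert 0 <= word < 1 << 64
    ubit = word & 1                            # 1 = inexact, open bracket
    fraction = (word >> 1) & ((1 << 53) - 1)   # stored fraction bits
    exponent = ((word >> 54) & 0x1FF) - 255    # assumed bias of 255
    sign = -1 if word >> 63 else 1
    return sign, exponent, fraction, ubit
```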

We are now going to compute the sum

$$s := \sum_{\nu=1}^{n} a_\nu = a_1 + a_2 + \cdots + a_n, \qquad a_i \in F(2, l, e1, e2), \quad i = 1(1)n. \qquad (1)$$

All floating-point numbers of this system can be taken into a fixed-point register of length L = e2 + l + |e1| = 256 + 54 + 255 = 565 bits without loss of information, see Figure 2. The size of this register is independent of the number of summands that are to be added: If one of the summands has an exponent 0, its mantissa can be taken into a register of length l. If another summand has exponent 1, it can be treated as having an exponent 0 if the register provides further digits on the left and the mantissa is shifted one place to the left. An exponent −1 in one of the summands requires a corresponding shift to the right. The largest exponents in magnitude that may occur in the summands are e2 and e1. So any summand can be treated as having an exponent 0 and be taken into a fixed-point register of length e2 + l + |e1| without loss of information. The contents of this register can be zero, positive or negative. This information is held in the status register. We stress the fact that subnormals (gradual underflow) are included in the representation. No particular restrictions on the digits of the mantissa are made in case of an exponent e1.

Figure 2: Complete register (fields k, e2, l, |e1|) with long shift for exact accumulation of floating-point numbers.

If the register shown in Figure 2 is built as an accumulator with an adder, all summands could be added in without loss of information. To accommodate possible overflows, it is convenient to provide a few, say k, more digits of base b on the left. These are used to count the number of overflows that occur during the accumulation. With such an accumulator, every sum (1) can be added without loss of information. As many as b^k overflows may occur and be accommodated without loss of information. In the worst case, presuming every sum causes an overflow, we can accommodate sums with n ≤ b^k summands. Since every sum of floating-point numbers can be exactly accumulated in this register, we call it a Complete Register, CR for short. Assuming here that k is sufficiently large, k = 50 for instance, we get a register size of k + e2 + l + |e1| = 615 bits. This is a little more than 9 words of 64 bits.
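The complete register is easy to model in software. The sketch below, an illustration rather than the hardware design, uses Python's unbounded integers in place of the 615-bit register; for the demonstration it accumulates IEEE double summands, whose smallest ulp is 2^−1074, so a scale factor of 2^1074 plays the role of l + |e1| and turns every finite summand into an exact integer:

```python
# A software model of the complete register (CR): a fixed-point accumulator
# wide enough that every summand is added without rounding. Python integers
# stand in for the hardware register; IEEE doubles stand in for the paper's
# 64-bit format, so the scale 2**1074 covers the full exponent range.

SCALE = 1074   # plays the role of l + |e1| in the register-length formula

def exact_sum(summands):
    cr = 0                                  # the complete register
    for x in summands:
        m, q = x.as_integer_ratio()         # x = m / q exactly, q = 2**j
        cr += (m << SCALE) // q             # exact: q divides 2**SCALE
    return cr / (1 << SCALE)                # a single rounding at the end

data = [1e16, 1.0, -1e16] * 1000
print(sum(data))        # 0.0 -- every 1.0 is rounded away
print(exact_sum(data))  # 1000.0, independent of the summand order
```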

The summands are shifted to the proper position and added in. The final sum has to be in the single exponent range e1 ≤ e ≤ e2; otherwise it is not representable as a floating-point number and the problem has to be scaled. Once more we stress the fact that the size of this register only depends on the floating-point format. It is indeed independent of the number n of summands that are to be accumulated.

The size of the complete register would grow with the exponent range of the data format in use. If this range should be extremely large, as for instance in the case of an extended precision floating-point format, only an inner part of the register would be supported by hardware. The outer parts, which are used very rarely, could be simulated in software. The long data format of the IBM /370 architecture covered a range of about 10^−75 to 10^75, which is very modest. This architecture dominated the market for more than 25 years, and most problems could conveniently be solved with machines of this architecture within this range of numbers.

3 Fast Carry Resolution

The addition of a floating-point number into the complete register CR seems to be slow. It appears to require a long shift, and it may produce a carry propagation over several hundred bits. We now discuss a very fast solution for both problems for the data format outlined in Figure 1. As soon as the principles are clear, the technique can easily be applied to other data formats as well.

The mantissa here consists of l = 54 bits. We assume additionally that the complete register CR is subdivided into words of 64 bits. The mantissa of the summand touches at most two consecutive 64-bit words of the complete register, which are determined by the exponent of the summand. A shifter then aligns the 54 bit summand into the correct position for the subsequent addition into the two consecutive words of the CR. This addition may produce a carry (or a borrow in case of subtraction). The carry is absorbed by the next more significant 64-bit word of the complete register in which not all digits are 1 (or 0 for subtraction), Figure 3(a). For fast identification of this word, two information bits or flags are appended to each complete register word, see Figure 3(b). One of these bits, the all bits 1 flag, is set to 1 if all 64 bits of the register word are 1. This means that a carry will propagate through the entire word. The other bit, the all bits 0 flag, is set if all 64 bits of the register word are 0. This means that in case of subtraction a borrow will propagate through the entire word.

During the addition of the summand into two consecutive words of the CR, a search is started for the next more significant word where the all bits 1 flag is not set. This is the word which will absorb a possible carry. If the addition generates a carry, this word must be incremented by one, and all intermediate words must be changed from all bits 1 to all bits 0. The easiest way to do this is simply to switch the flag bit from all bits 1 to all bits 0, with the additional semantics that if a flag bit is set, the appropriate constant (all bits 0 or all bits 1) must be generated instead of reading the CR word contents when fetching a word from the CR, Figure 3(b). Borrows are handled in an analogous way.
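A behavioral sketch of this flag-based scheme, with the CR modeled as a list of 64-bit words (index 0 least significant), may clarify the mechanism. In hardware the all-ones words are cleared by flipping their flags; the model below simply stores the constants, and subtraction with borrows would use the all bits 0 flags in the same way:

```python
MASK = (1 << 64) - 1   # one 64-bit CR word

def add_with_carry_skip(cr, word_index, addend):
    """Add a nonnegative addend (up to two words) into cr[word_index...]."""
    lo = cr[word_index] + (addend & MASK)
    hi = cr[word_index + 1] + (addend >> 64) + (lo >> 64)
    cr[word_index] = lo & MASK
    cr[word_index + 1] = hi & MASK
    if hi >> 64:                  # a carry is left after the local addition
        i = word_index + 2
        while cr[i] == MASK:      # "all bits 1" flag set: carry propagates,
            cr[i] = 0             # the word is switched to all bits 0
            i += 1
        cr[i] += 1                # the first word not all ones absorbs it
```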

Figure 3: Fast carry resolution. (a) Carry generation: carry skip area, local fixed-point addition, carry resolution address, carry start address. (b) Extracted carry flag word, carry flag logic, carry dissolution indicator.

This carry handling scheme allows a very fast carry resolution. The exponent of the summand delivers the address for its addition. By using the flags, the carry word can be incremented/decremented simultaneously with the addition of the summand. If the addition of the summand produces a carry, the incremented/decremented carry word is written into the CR. Otherwise nothing is changed.

Other, perhaps simpler methods for fast carry resolution are possible. We sketch here a second solution where adder equipment is provided for the entire width of the complete register CR. For more details in the case of the dot product computation see Chapter 1 in [17] or Chapter 8 in [19]. An adder for the entire width of the CR certainly is too slow to perform an addition in a single cycle. So we subdivide the CR into shorter pieces (of 16, 32, or 64 bits). The width of these segments is chosen such that a parallel addition can be performed in a single cycle. Now each one of these adder segments can produce a carry. These carries are written into carry registers between adjacent adders, see Figure 4. If a single addition has to be performed, these carries have to be propagated straight away. If more than one summand has to be added, the carries are added to the next more significant adder in the next addition cycle together with the next summand. Only at the very end of the accumulation, when no more summands are coming, carries may have to be eliminated. However, the summands consist of 54 bits only. So during an addition of a summand, carries are only produced in a small part of the complete register CR.
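The segmented scheme can likewise be modeled in a few lines. In the sketch below each summand is presented as a mapping from segment index to a 64-bit slice (an assumption made for brevity); carries are latched in a separate array and consumed one cycle later, and a few flush cycles at the end absorb whatever is still pending:

```python
MASK = (1 << 64) - 1   # one 64-bit adder segment

def segmented_accumulate(summand_slices, n_segments):
    """summand_slices: iterable of {segment_index: 64-bit slice} dicts."""
    seg = [0] * n_segments            # the segmented complete register
    carry = [0] * n_segments          # carry registers between the adders

    def cycle(slices):
        new_carry = [0] * n_segments
        for i in range(n_segments):   # all segments work in parallel
            t = seg[i] + slices.get(i, 0) + carry[i]
            seg[i] = t & MASK
            if i + 1 < n_segments:
                new_carry[i + 1] = t >> 64   # latched, absorbed next cycle
        return new_carry

    for slices in summand_slices:     # one addition cycle per summand
        carry = cycle(slices)
    while any(carry):                 # a few flush cycles at the very end
        carry = cycle({})
    return seg
```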

Figure 4: Segmented addition: segmented summand, segmented adder, segmented complete register, with carry registers between adjacent segments.

The carry elimination, on the other hand, takes place during each step of the accumulation whenever a carry is left. So in an average case there will only be very few carries left at the end of the accumulation, and a few additional cycles will suffice to absorb the remaining carries. Thus segmenting the adder enables it to read and process a summand in each cycle. In each step of the accumulation, an addition only has to be activated for a small number (two) of the adder segments and for those adders where a nonzero carry is waiting to be absorbed. This adder selection can reduce the power consumption for the accumulation step significantly.

4 Accumulation of Floating-point Numbers

We assume in this section that the data are stored in the double precision 64-bit format as outlined in Section 2. There the mantissa has 54 bits. A central building block is the complete register CR. It is a fixed-point register wherein any sum of floating-point numbers can be represented without error. It allows accumulation of any finite number of floating-point numbers exactly, or with a single rounding at the very end of the accumulation. As shown in Section 2, it consists of about 9 words of 64 bits.

The 54-bit summand is added to two consecutive words of the CR. These are determined by the exponent of the summand. After the addition, these two words are written back into the same two CR words that the portion has been read from. A 54-out-of-118-bit shifter is used to align the summand onto the relevant word boundaries. Figure 5 shows a block diagram for the accumulation. The addition can be executed by a 54-bit adder. It may cause a carry. The carry is absorbed by incrementing (or decrementing in the case of a borrow) a more significant word of the CR, as determined by the carry handling scheme. The entire process can be pipelined. The pipeline is sketched in Figure 6.

The exponent of the summand consists of 9 bits. The six low order (less significant) bits are used to perform the shift. The 3 more significant bits deliver the CR address to which the summand has to be added. So the originally very long shift is split into a short shift and an addressing operation. The shifter performs a relatively short shift operation.
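This split of the exponent may be illustrated as follows. The sketch assumes the biased 9-bit exponent and a 54-bit integer mantissa, and returns the CR word address together with the aligned summand (up to 117 bits, matching the 54-out-of-118-bit shifter):

```python
def place_summand(mantissa: int, exponent: int):
    """Map a summand to (CR word address, aligned value). A sketch only:
    `exponent` is the biased 9-bit exponent (0..511) and `mantissa` the
    54-bit integer mantissa."""
    assert 0 <= exponent < 1 << 9 and 0 <= mantissa < 1 << 54
    shift = exponent & 0x3F       # six low-order bits: the short shift
    address = exponent >> 6       # three high-order bits: the CR address
    return address, mantissa << shift   # at most 54 + 63 = 117 bits wide
```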

Figure 5: Block diagram for fast and exact accumulation of floating-point numbers: register file (or memory access unit), exponent and mantissa paths, shifter for the 54 bit summand, inc/dec and add/subtract logic, the CR, carry resolution address and addition address.

The addressing selects the two words of the CR for the addition. The carry logic determines the word that absorbs the carry. All these address decodings can be hard wired. The result of each addition is written back into the same CR words to which the addition has been executed. The two carry flags appended to each accumulator word are indicated in the figure. In practice the flags are kept in separate registers.

We stress the fact that in the circuit just discussed virtually no unoverlapped computing time is needed for the arithmetic. In the pipeline the arithmetic is performed in the time that is needed to read the data into the accumulation unit.
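The overlap can be pictured with a toy schedule. The sketch below simplifies the timing to one new summand per step (the text assumes two cycles per 64-bit read) and prints which stage works on which summand in each step; stage names follow the pipeline of Figure 6:

```python
STAGES = ["address decoding/read", "shift", "add/subtract", "store result"]

def schedule(n_summands):
    """Print a toy pipeline schedule: summand i enters at step i."""
    depth = len(STAGES)
    for step in range(n_summands + depth - 1):
        active = [f"{STAGES[step - i]} a_{i}"
                  for i in range(n_summands) if 0 <= step - i < depth]
        print(f"step {step}:", "; ".join(active))

schedule(4)   # once the pipe is full, one summand completes per step
```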

We consider a computer which is able to read data into the arithmetic logical unit in portions of 64 bits. If the 54-bit summand spans two consecutive 64-bit words of the complete register, a closer look shows that the 10 least significant bits of those two words are never changed by the addition of the summand. Thus the adder needs to be only 54 bits wide.

Figure 5 shows a sketch for the accumulation of a summand. In the circuit, a 54-to-118-bit shifter is used. Sophisticated logic is used for the generation of the carry resolution address, since this address must be generated very quickly. Only one address decoder is needed to find the starting address for an addition. The more significant part of the summand is added to the contents of the CR word with the next address. A tree structured carry logic now determines the CR word which absorbs the carry.

Figure 6 sketches a pipeline for this kind of addition. In the figure we assume that two machine cycles are needed to decode and read one 64-bit word into the accumulation unit. In the block diagram for the accumulation in Figure 5, data busses of 64 bits are frequently used.

Figure 6: Pipeline for the accumulation of floating-point numbers, with the stages address decoding, read, shift, add/subtract, and store result and flags.

In a pipeline the arithmetic is performed in the time which is needed to read the data into the accumulation unit. Here we assume that, with the necessary address decoding, this requires two cycles for the 54-bit summand. An adder width of a power of 2 may simplify shifting as well as address decoding. The lower bits of the exponent of the summand control the shift operation, while the higher bits are directly used as the starting address for the accumulation of the summand into the CR. The two flag registers appended to each complete register word are indicated in Figure 5. In practice the flags are kept in separate registers.

If not processed any further, the exact result of the accumulation usually has to be rounded into a floating-point number. The flag bits that are used for the fast carry resolution can also be used for the rounding. By looking at the flag bits, the leading result word in the complete register can easily be identified. This and the next CR word are needed to compose the mantissa of the result. This 128 bit quantity must then be shifted to form a normalized mantissa of a 64-bit number.
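The final rounding can also be modeled on the integer CR of the earlier sketch. The model below rounds the exact value cr · 2^−scale toward −∞ or +∞, using the bits below the mantissa as the sticky information that the hardware reads off the all bits 0 flags; it targets Python's 53-bit doubles rather than the paper's 54-bit format, and ignores overflow and subnormal corner cases:

```python
from math import ldexp

PREC = 53   # Python doubles; the paper's format has l = 54

def round_directed(cr: int, scale: int, upward: bool) -> float:
    """Round the exact value cr * 2**-scale toward +inf or -inf (a model)."""
    if cr == 0:
        return 0.0
    sign, mag = (1, cr) if cr > 0 else (-1, -cr)
    drop = max(mag.bit_length() - PREC, 0)   # bits below the mantissa
    mantissa = mag >> drop
    sticky = mag & ((1 << drop) - 1)         # is any discarded bit a one?
    if sticky and upward == (sign > 0):      # enlarge the last place
        mantissa += 1                        # of the mantissa in magnitude
    return sign * ldexp(mantissa, drop - scale)
```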

In floating-point interval arithmetic two complete registers are used for the representation of the lower and the upper bound. For the correct rounding downwards or upwards it is necessary to check whether any of the discarded bits is a one. This is done by testing the remaining bits of the complete register, by looking at the all bits 0 flags of the following words. If these are not all zero, the rounding upwards or downwards is executed by enlarging the 54th bit of the mantissa by 1 in magnitude.

To develop its full power, the method discussed here requires a small amount of register memory (the 615 bits of the complete register) on the arithmetic unit. Access to this memory is considerably faster than access to the main memory of the computer. Modern processors by leading manufacturers provide this much memory on the arithmetic unit. See [8], for instance.

Finally, in the final sum the ubit still has to be set. It is set to 0 only if the ubits of all summands have been 0. In all other cases the ubit of the final sum is set to 1. A ubit 0 stands for a closed interval bracket and a ubit 1 stands for an open interval bracket. We illustrate the influence of the ubit on interval addition by a simple example:

[3, 3] + [1, 2] + [4, 5) + (−1, 0) = [4, 5] + [4, 5) + (−1, 0) = [8, 10) + (−1, 0) = (7, 10).

The method for fast and exact accumulation of floating-point numbers discussed in this paper is the basic ingredient for high speed accumulation of n floating-point intervals A_i = [a_{i1}, a_{i2}] ∈ IF, i = 1(1)n. Again the result is obtained with extreme speed and increased accuracy. The following formula speaks for itself:

$$\Diamond \sum_{\nu=1}^{n} A_\nu \;=\; \Diamond \left\langle \sum_{\nu=1}^{n} a_{\nu 1},\ \sum_{\nu=1}^{n} a_{\nu 2} \right\rangle \;=\; \left\langle \nabla \sum_{\nu=1}^{n} a_{\nu 1},\ \Delta \sum_{\nu=1}^{n} a_{\nu 2} \right\rangle, \qquad (2)$$

where \Diamond, \nabla, and \Delta denote the roundings \Diamond : IR → IF, \nabla : R → F, and \Delta : R → F. The latter two are the monotone downwardly resp. upwardly directed roundings. Here, of course, two complete registers are needed for the exact accumulation of the lower bounds a_{ν1} and of the upper bounds a_{ν2}, ν = 1(1)n. Each one of the angle brackets shown here can be open or closed. This depends on the brackets of the intervals A_i, i = 1(1)n. Again, the bracket on a particular side of the interval can be closed only if the bracket of every summand is closed on that side of the interval. In other words: the ubit of the sum is obtained by the logical or of the ubits of all summands, in combination with the rounding information.

Basically, the method discussed in this paper is not new. It has been used for exact evaluation of the dot product of two vectors whose components are floating-point numbers in the XSC-languages (PASCAL-XSC, ACRITH-XSC, C-XSC) [9, 10, 11, 23, 24] since 1980. Hardware solutions are discussed in detail in the first chapter of the book [17] as well as in chapter eight of the book [19]. Its reduction to the accumulation of floating-point numbers discussed here even has the potential of replacing conventional methods for the implementation of floating-point addition and subtraction. Formula (2) for the accumulation of floating-point intervals shows that the computation of the lower and the upper bound are independent of each other.
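A last sketch ties the pieces together for formula (2): the lower bounds go exactly into one accumulator, the upper bounds into a second independent one, and the ubit of each side of the result is the logical or of the ubits of the summands on that side. Exact rational arithmetic stands in for the two complete registers; the directed roundings of the previous sketch would then map the two exact bounds to floating-point numbers:

```python
from fractions import Fraction

def interval_sum(intervals):
    """intervals: iterable of (lo, lo_ubit, hi, hi_ubit); ubit 1 = open."""
    lo_cr = hi_cr = Fraction(0)   # stand-ins for the two complete registers
    lo_u = hi_u = 0
    for lo, lu, hi, hu in intervals:
        lo_cr += Fraction(lo)     # exact accumulation of the lower bounds
        hi_cr += Fraction(hi)     # exact accumulation of the upper bounds
        lo_u |= lu                # a bracket stays closed only if it is
        hi_u |= hu                # closed in every summand
    return lo_cr, lo_u, hi_cr, hi_u

# The example from the text: [3,3] + [1,2] + [4,5) + (-1,0) = (7,10).
print(interval_sum([(3, 0, 3, 0), (1, 0, 2, 0), (4, 0, 5, 1), (-1, 1, 0, 1)]))
# -> (Fraction(7, 1), 1, Fraction(10, 1), 1), i.e. the interval (7, 10)
```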

So by doubling the hardware, both bounds can be computed simultaneously. This would reduce the speed for accumulating n floating-point intervals to the speed of accumulating n floating-point numbers.

Résumé: The paper shows that fixed-point accumulation of floating-point numbers and floating-point intervals is superior to accumulation in floating-point arithmetic with respect to speed and accuracy. The summands are just added into a fixed-point register on the arithmetic unit which covers the full floating-point range. No intermediate roundings and normalizations are performed. No intermediate results need to be stored and read in again for the next operation. The result is independent of the order in which the summands are added. By pipelining, the accumulation can be done in the time the processor needs to read the data, i.e., no other method can be faster. For the given data, fixed-point accumulation of the floating-point summands is exact. If desired, only one rounding is performed, at the very end of the accumulation. It delivers a correctly rounded result. The method described in this paper is not new at all. It can be traced back to the very early computer by G. W. Leibniz.

In case of an unstable (not so well conditioned) problem, it may happen that the best possible floating-point arithmetic does not compute a correct solution. In such a case, higher precision (multiple precision) arithmetic has to be used. A multiple precision number is represented as an array of floating-point numbers. The value of this number is the sum of its components. It can be represented in the complete register as a long fixed-point variable. In Section 9 of [19] the operations +, −, ·, /, and the square root for multiple precision numbers and intervals are defined. On the basis of these operations, algorithms for the elementary functions also can be and are provided in the XSC-languages [9, 10, 11, 24]. The concept of function and operator overloading in these languages makes applications of multiple precision real and interval arithmetic very simple.

Acknowledgement: The authors owe thanks to John Gustafson and Goetz Alefeld for useful comments on the paper. They are also grateful to two anonymous referees. Their comments led to major improvements of the paper.

References

[1] E. Adams, U. Kulisch (eds.), Scientific Computing with Automatic Result Verification, Academic Press, 1993.

[2] Ch. Baumhof, A new VLSI vector arithmetic coprocessor for the PC, in: S. Knowles and W. H. McAllister (eds.), Proceedings of the 12th Symposium on Computer Arithmetic ARITH, Bath, England, July 19-21, 1995, IEEE Computer Society Press, Piscataway, NJ, 1995.

[3] D. Biancolin and J. Koenig, Hardware Accelerator for Exact Dot Product, ASPIRE Laboratory, University of California, Berkeley, 2015.

[4] J. L. Gustafson, The End of Error, CRC Press, A Chapman and Hall Book, 2015.

[5] R. Hammer, M. Hocks, U. Kulisch and D. Ratz, Numerical Toolbox for Verified Computing I: Basic Numerical Problems, Springer, 1993.

[6] R. Hammer, M. Hocks, U. Kulisch and D. Ratz, C++ Toolbox for Verified Computing: Basic Numerical Problems, Springer, 1995.

[7] R. Hammer, M. Hocks, U. Kulisch and D. Ratz, Numerical Toolbox for Verified Computing I: Basic Numerical Problems, MIR, Moscow, 2005 (in Russian).

[8] INTEL, Intel Architecture Instruction Set Extensions Programming Reference, December 2013.

[9] R. Klatte, U. Kulisch, M. Neaga, D. Ratz and Ch. Ullrich, PASCAL-XSC: Sprachbeschreibung mit Beispielen, Springer, 1991.

[10] R. Klatte, U. Kulisch, M. Neaga, D. Ratz and Ch. Ullrich, PASCAL-XSC: Language Reference with Examples, Springer, 1992. Russian translation MIR, Moscow, 1995; third edition 2006.

[11] R. Klatte, U. Kulisch, C. Lawo, M. Rauch and A. Wiethoff, C-XSC: A C++ Class Library for Extended Scientific Computing, Springer, 1993.

[12] U. Kulisch, Grundlagen des Numerischen Rechnens: Mathematische Begründung der Rechnerarithmetik, Reihe Informatik 19, Bibliographisches Institut, Mannheim Wien Zürich, 1976.

[13] U. Kulisch and W. L. Miranker, Computer Arithmetic in Theory and Practice, Academic Press, 1981.

[14] U. Kulisch and W. L. Miranker (eds.), A New Approach to Scientific Computation, Academic Press, 1983.

[15] U. Kulisch, T. Teufel and B. Hoefflinger, Genauer und trotzdem schneller: Ein neuer Coprozessor für hochgenaue Matrix- und Vektoroperationen. Titelgeschichte, Elektronik 26 (1994).

[16] U. Kulisch, R. Lohner, A. Facius (eds.), Perspectives on Enclosure Methods, Springer, 2001.

[17] U. Kulisch, Advanced Arithmetic for the Digital Computer: Design of Arithmetic Units, Springer, 2002.

[18] U. Kulisch, An Axiomatic Approach to Computer Arithmetic with an Appendix on Interval Hardware, LNCS 7204, Springer, 2012.

[19] U. Kulisch, Computer Arithmetic and Validity: Theory, Implementation, and Applications, de Gruyter, Berlin, 2008; second edition 2013.

[20] U. Kulisch, Mathematics and Speed for Interval Arithmetic: A Complement to IEEE 1788. Prepared for and sent to IEEE P1788 in January 2014. To be published.

[21] U. Kulisch, Up-to-date Interval Arithmetic: From Closed Intervals to Connected Sets of Real Numbers, LNCS, Springer, 2016.

[22] IBM, IBM System/370 RPQ. High Accuracy Arithmetic, SA22-7093-0, IBM Deutschland GmbH (Department 3282, Schönaicher Strasse 220, D-71032 Böblingen), 1984.

[23] IBM, IBM High-Accuracy Arithmetic Subroutine Library (ACRITH), IBM Deutschland GmbH (Department 3282, Schönaicher Strasse 220, D-71032 Böblingen), 1983, third edition. General Information Manual, GC. Program Description and User's Guide, SC. Reference Summary, GX.

[24] IBM, ACRITH-XSC: IBM High Accuracy Arithmetic, Extended Scientific Computation. Version 1, Release 1, IBM Deutschland GmbH (Department 3282, Schönaicher Strasse 220, D-71032 Böblingen), 1990. General Information, GC. Reference, SC. Sample Programs, SC. How To Use, SC. Syntax Diagrams, SC.

[25] J. D. Pryce (ed.), P1788, IEEE Standard for Interval Arithmetic.


More information

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating

More information

MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic

MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic MAT128A: Numerical Analysis Lecture Two: Finite Precision Arithmetic September 28, 2018 Lecture 1 September 28, 2018 1 / 25 Floating point arithmetic Computers use finite strings of binary digits to represent

More information

3.1 DATA REPRESENTATION (PART C)

3.1 DATA REPRESENTATION (PART C) 3.1 DATA REPRESENTATION (PART C) 3.1.3 REAL NUMBERS AND NORMALISED FLOATING-POINT REPRESENTATION In decimal notation, the number 23.456 can be written as 0.23456 x 10 2. This means that in decimal notation,

More information

Finite arithmetic and error analysis

Finite arithmetic and error analysis Finite arithmetic and error analysis Escuela de Ingeniería Informática de Oviedo (Dpto de Matemáticas-UniOvi) Numerical Computation Finite arithmetic and error analysis 1 / 45 Outline 1 Number representation:

More information

Implementation of Floating Point Multiplier Using Dadda Algorithm

Implementation of Floating Point Multiplier Using Dadda Algorithm Implementation of Floating Point Multiplier Using Dadda Algorithm Abstract: Floating point multiplication is the most usefull in all the computation application like in Arithematic operation, DSP application.

More information

EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION

EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION QUESTION BANK UNIT-II 1. What are the disadvantages in using a ripple carry adder? (NOV/DEC 2006) The main disadvantage using ripple carry adder is time delay.

More information

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Floating Point. CSE 351 Autumn Instructor: Justin Hsia Floating Point CSE 351 Autumn 2016 Instructor: Justin Hsia Teaching Assistants: Chris Ma Hunter Zahn John Kaltenbach Kevin Bi Sachin Mehta Suraj Bhat Thomas Neuman Waylon Huang Xi Liu Yufang Sun http://xkcd.com/899/

More information

INTERVAL INPUT AND OUTPUT Eero Hyvönen University of Helsinki Department of Computer Science Keywords: interval, input, ou

INTERVAL INPUT AND OUTPUT Eero Hyvönen University of Helsinki Department of Computer Science Keywords: interval, input, ou INTERVAL INPUT AND OUTPUT Eero Hyvönen University of Helsinki Department of Computer Science eero.hyvonen@cs.helsinki.fi Keywords: interval, input, output Abstract More and more novice users are starting

More information

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754 Floating Point Puzzles Topics Lecture 3B Floating Point IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties For each of the following C expressions, either: Argue that

More information

Number Systems CHAPTER Positional Number Systems

Number Systems CHAPTER Positional Number Systems CHAPTER 2 Number Systems Inside computers, information is encoded as patterns of bits because it is easy to construct electronic circuits that exhibit the two alternative states, 0 and 1. The meaning of

More information

Chapter 4 Arithmetic Functions

Chapter 4 Arithmetic Functions Logic and Computer Design Fundamentals Chapter 4 Arithmetic Functions Charles Kime & Thomas Kaminski 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Overview Iterative combinational

More information

Computer Organization and Levels of Abstraction

Computer Organization and Levels of Abstraction Computer Organization and Levels of Abstraction Announcements Today: PS 7 Lab 8: Sound Lab tonight bring machines and headphones! PA 7 Tomorrow: Lab 9 Friday: PS8 Today (Short) Floating point review Boolean

More information

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time. Floating point Today IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time The machine model Fabián E. Bustamante, Spring 2010 IEEE Floating point Floating point

More information

A Multiple-Precision Division Algorithm

A Multiple-Precision Division Algorithm Digital Commons@ Loyola Marymount University and Loyola Law School Mathematics Faculty Works Mathematics 1-1-1996 A Multiple-Precision Division Algorithm David M. Smith Loyola Marymount University, dsmith@lmu.edu

More information

Number Representations

Number Representations Number Representations times XVII LIX CLXX -XVII D(CCL)LL DCCC LLLL X-X X-VII = DCCC CC III = MIII X-VII = VIIIII-VII = III 1/25/02 Memory Organization Viewed as a large, single-dimension array, with an

More information