Floating Point Arithmetic

Floating point numbers are used in many applications. Arithmetic units such as adders and multipliers are more complex to implement for floating point numbers than for fixed point numbers. The sections below cover the representation of floating point numbers, the algorithm for floating point multiplication, the VHDL implementation of a floating point multiplier, and the procedures for floating point addition, subtraction and division.

REPRESENTATION OF FLOATING POINT NUMBERS

Data type representations in VHDL: Floating Point Types

A floating point type has a set of values in a given range of real numbers. Examples of floating point type declarations are

    type TTL_VOLTAGE is range -5.5 to -1.4;
    type REAL_DATA is range 0.0 to 31.9;

An example of an object declaration is

    variable LENGTH: REAL_DATA range 0.0 to 15.9;
    ...
    variable L1, L2, L3: REAL_DATA range 0.0 to 15.9;

LENGTH is a variable object of type REAL_DATA that has been constrained to take real values in the range 0.0 through 15.9 only. Notice that in this case the range constraint was specified in the variable declaration itself. Alternatively, it is possible to declare a subtype and then use this subtype in the variable declarations, as shown.

    subtype RD16 is REAL_DATA range 0.0 to 15.9;
    ...
    variable LENGTH: RD16;
    ...
    variable L1, L2, L3: RD16;

The range bounds specified in a floating point type declaration must be constants or locally static expressions.

Floating point literals are values of a floating point type. Examples of floating point literals are

    16.26    0.0    0.002    3_1.4_2

Floating point literals differ from integer literals by the presence of the dot (.) character. Thus 0 is an integer literal while 0.0 is a floating point literal.

Floating point literals can also be expressed in an exponential form. The exponent represents a power of ten and the exponent value must be an integer. Examples are

    62.3E-2    5.0E+2

Integer and floating point literals can also be written in a base other than 10 (decimal). The base can be any value between 2 and 16. Such literals are called based literals. In this case, the exponent represents a power of the specified base. The syntax for a based literal is

    base # based-value #               -- form 1
    base # based-value # E exponent    -- form 2

Examples are

    2#101_101_000#   represents (101101000)2 = (360) in decimal,
    16#FA#           represents (FA)16 = (11111010)2 = (250) in decimal,
    16#E#E1          represents (E)16 * (16^1) = 14 * 16 = (224) in decimal,
    2#110.01#        represents (110.01)2 = (6.25) in decimal.

The base and the exponent values in a based literal must be in decimal notation.

The only predefined floating point type is REAL. The range of REAL is implementation dependent, but it must at least cover the range -1.0E38 to +1.0E38 and it must allow for at least six decimal digits of precision.
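The based-literal examples above can be checked with a short script. This is a minimal Python sketch rather than VHDL; the function name based_literal_value is hypothetical, and the parser assumes a well-formed literal of form 1 or form 2 with no sign.

```python
def based_literal_value(literal: str) -> float:
    """Evaluate a VHDL-style based literal such as 16#E#E1 or 2#110.01#."""
    # Underscores are only digit separators; stray spaces are ignored here.
    text = literal.replace("_", "").replace(" ", "")
    base_text, digits, tail = text.split("#")
    base = int(base_text)                    # the base is written in decimal
    exponent = int(tail[1:]) if tail else 0  # tail is "" or "E<decimal integer>"
    whole, _, frac = digits.partition(".")
    value = float(int(whole, base)) if whole else 0.0
    # Digits after the point carry weights base^-1, base^-2, ...
    for position, digit in enumerate(frac, start=1):
        value += int(digit, base) * base ** -position
    # In a based literal the exponent is a power of the base, not of ten.
    return value * base ** exponent


for text in ["2#101_101_000#", "16#FA#", "16#E#E1", "2#110.01#"]:
    print(text, "=", based_literal_value(text))
```

The four calls reproduce the decimal values quoted in the examples (360, 250, 224 and 6.25).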

Computation of floating point values

Consider a floating point number N = F x 2^E. Assume that the fraction F and the exponent E are allocated 4 bits each.

Example 1. Compute the value of the floating point number N when F = 0.101 and E = 0101.

Solution: In F the MSB is the sign bit, so 0 indicates a positive number. The magnitude of F is calculated as shown in the table.

    Bit position   Sign (MSB)   Point   2^-1 = 0.5   2^-2 = 0.25   2^-3 = 0.125
    Digits of F    0            .       1            0             1

Inference: positive fraction, F = 0.5 + 0.125 = 0.625, i.e., F = 1/2 + 1/8 = 5/8.

Similarly, the magnitude of E is calculated as shown in the table.

    Bit position   Sign (MSB)   2^2 = 4   2^1 = 2   2^0 = 1
    Digits of E    0            1         0         1

Inference: positive exponent, E = 4 + 1 = 5.

Hence the value of the floating point number is N = F x 2^E = +5/8 x 2^5.

Example 2. Repeat the above example for F = 1.011 and E = 1011.

Note: for negative numbers represented in 2's complement form, the value of the number is computed as (negative weight of the sign bit) + (value of the remaining bits).

The value of F is calculated as shown in the table.

    Magnitude of bit position   Sign (MSB): 2^0 = 1   Point   2^-1 = 0.5   2^-2 = 0.25   2^-3 = 0.125
    Digits of F                 1                     .       0            1             1

Inference: negative fraction, F = -1 + (0.25 + 0.125) = -1 + 0.375 = -0.625, i.e., F = -1 + (1/4 + 1/8) = -1 + 3/8 = -5/8.

Similarly, the value of E is calculated as shown in the table below.

    Magnitude of bit position   Sign (MSB): 2^3 = 8   2^2 = 4   2^1 = 2   2^0 = 1
    Digits of E                 1                     0         1         1

Inference: negative exponent, E = -8 + (2 + 1) = -5.

Hence the value of the floating point number is N = F x 2^E = -5/8 x 2^-5.
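The two worked examples can be reproduced with a small script. This is a Python sketch under the same assumptions as the text (2's complement fraction and exponent, N = F x 2^E); the helper names twos_complement and decode are hypothetical.

```python
from fractions import Fraction


def twos_complement(bits: str) -> int:
    """Integer value of a 2's complement bit string (the MSB is the sign bit)."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)  # the sign bit carries a negative weight
    return value


def decode(f_bits: str, e_bits: str) -> Fraction:
    """Value of N = F x 2^E for F written like '1.011' and E like '1011'."""
    raw = f_bits.replace(".", "")
    # The binary point sits just after the sign bit, so scale by 2^(n-1).
    fraction = Fraction(twos_complement(raw), 1 << (len(raw) - 1))
    exponent = twos_complement(e_bits)
    return fraction * Fraction(2) ** exponent


print(decode("0.101", "0101"))  # Example 1: 5/8 x 2^5
print(decode("1.011", "1011"))  # Example 2: -5/8 x 2^-5
```

Using Fraction keeps the results exact, so Example 1 evaluates to 20 and Example 2 to -5/256.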

Normalization of floating point numbers

In order to utilize all the bits in F and have the maximum number of significant figures, F should be normalized so that its magnitude is as large as possible. If F is not normalized, we can normalize it by shifting it left until the sign bit and the next bit are different. Shifting F left is equivalent to multiplying by 2, so every time we shift we must decrement E by 1 to keep N the same. After normalization the magnitude of F will be as large as possible, since any further shifting would change the sign bit. For example, F = 0.010 with E = 0011 is unnormalized to start with; shifting F left once and decrementing E gives the normalized form F = 0.100 with E = 0010, and N = 2 in both cases.
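The shift-left rule can be sketched in a few lines. This is a Python model, not the VHDL implementation; it assumes the fraction is held as a signed integer f with F = f / 2^(bits-1), and the name normalize is hypothetical.

```python
def normalize(f: int, e: int, bits: int = 4):
    """Shift F left until the sign bit and the next bit differ.

    f is the fraction as a signed integer, F = f / 2**(bits - 1);
    e is the exponent. Returns the normalized pair (f, e).
    """
    if f == 0:
        return 0, e  # zero cannot be normalized
    half = 1 << (bits - 2)  # weight of the bit just after the sign bit
    # Sign bit and next bit are equal exactly when -half <= f < half.
    while -half <= f < half:
        f *= 2  # shift left: multiply F by 2 ...
        e -= 1  # ... and decrement E so that N is unchanged
    return f, e


print(normalize(2, 3))   # 0.010 x 2^3  ->  0.100 x 2^2
print(normalize(-2, 0))  # 1.110 x 2^0  ->  1.000 x 2^-2
```

The exponent range is not checked here; in the hardware described later, exponent overflow is tested as a separate step.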

Representations for number 0 (zero)

Zero can be represented with a 4-bit exponent and a 4-bit fraction as 0.000 x 2^-8, as shown in the tables below. Zero cannot be normalized, so F = 0.000 when N = 0. Any exponent could then be used; however, it is best to have a uniform representation of 0. We will associate the negative exponent of largest magnitude with the fraction 0. In a 4-bit 2's complement integer number system, the most negative number is 1000, which represents -8. Thus when F and E are 4 bits, 0 is represented by 0.000 x 2^-8. (A negative exponent implies a smaller number. For example, 2^-2 = 1/4 = 0.25 is smaller than 2^-1 = 1/2 = 0.5.)

The smallest magnitude of F is calculated as shown in the table.

    Magnitude of bit position   Sign (MSB): 2^0 = 1   Point   2^-1 = 0.5   2^-2 = 0.25   2^-3 = 0.125
    Digits of F                 0                     .       0            0             0

Inference: positive fraction, F = 0.

Similarly, the most negative value of E is calculated as shown in the table below.

    Magnitude of bit position   Sign (MSB): 2^3 = 8   2^2 = 4   2^1 = 2   2^0 = 1
    Digits of E                 1                     0         0         0

Inference: negative exponent, E = -8 + (0) = -8.

The smallest nonzero positive number that can be represented with a 4-bit exponent and a 4-bit fraction is 0.001 x 2^-8 = 0.125 x 2^-8.

Floating point operations

Floating point Addition

Consider the design of an adder for floating point numbers. Two floating point numbers will be added to form a floating point sum:

    (F1 x 2^E1) + (F2 x 2^E2) = F x 2^E

Assume that the numbers to be added are properly normalized and that the answer should be put in normalized form. In order to add two fractions, the associated exponents must be equal. Thus, if the exponents E1 and E2 are different, we must unnormalize one of the fractions and adjust its exponent accordingly. To illustrate the process, we add

    F1 x 2^E1 = 0.111 x 2^5   and   F2 x 2^E2 = 0.101 x 2^3

Since E1 and E2 are different, we unnormalize F2 by shifting it right two times and adding 2 to the exponent:

    F2 x 2^E2 = 0.101 x 2^3 = 0.0101 x 2^4 = 0.00101 x 2^5

Note that shifting right one place is equivalent to dividing by 2, so each time we shift we must add 1 to the exponent to compensate. When the exponents are equal, we add the fractions:

    (0.111 x 2^5) + (0.00101 x 2^5) = 01.00001 x 2^5

This addition caused an overflow into the sign bit position, so we shift right and add 1 to the exponent to correct the fraction overflow. The final result is

    F x 2^E = 0.100001 x 2^6
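The align, add and correct procedure illustrated above can be sketched as follows. This is a simplified Python model, not the adder hardware. It keeps every shifted bit by using exact fractions (real hardware holds a fixed number of bits), it assumes a 4-bit exponent range of -8 to +7, and the name fp_add is hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def fp_add(f1: Fraction, e1: int, f2: Fraction, e2: int):
    """Add F1 x 2^E1 and F2 x 2^E2, returning a normalized (F, E)."""
    # Align: shift the fraction with the smaller exponent right.
    while e1 < e2:
        f1, e1 = f1 / 2, e1 + 1
    while e2 < e1:
        f2, e2 = f2 / 2, e2 + 1
    # Add the fractions.
    f, e = f1 + f2, e1
    if f >= 1 or f < -1:
        # Fraction overflow: shift right and add 1 to the exponent.
        f, e = f / 2, e + 1
    while f != 0 and -HALF <= f < HALF:
        # Unnormalized: shift left and subtract 1 from the exponent.
        f, e = f * 2, e - 1
    if f == 0:
        e = -8  # uniform representation of zero
    if not -8 <= e <= 7:
        raise OverflowError("exponent overflow")
    return f, e


print(fp_add(Fraction(7, 8), 5, Fraction(5, 8), 3))  # 0.111 x 2^5 + 0.101 x 2^3
```

For the worked example this returns the fraction 33/64 (binary 0.100001) with exponent 6.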

When one of the fractions is negative, the result of adding the fractions may be unnormalized, as illustrated in the following example:

    (1.100 x 2^-2) + (0.100 x 2^-1)
      = (1.110 x 2^-1) + (0.100 x 2^-1)   (after shifting F1)
      = 0.010 x 2^-1                      (result of adding fractions is unnormalized)
      = 0.100 x 2^-2                      (normalized by shifting left and subtracting 1 from the exponent)

In summary, the steps required to carry out floating-point addition are as follows:

1. If the exponents are not equal, shift the fraction with the smaller exponent right and add 1 to its exponent; repeat until the exponents are equal.
2. Add the fractions.
3. (a) If fraction overflow occurs, shift right and add 1 to the exponent to correct the overflow.
   (b) If the fraction is unnormalized, shift left and subtract 1 from the exponent until the fraction is normalized.
   (c) If the fraction is 0, set the exponent to the appropriate value.
4. Check for exponent overflow.

Step 4 is necessary, since step 3a or 3b may produce an exponent overflow.

If E1 >> E2 and F2 is positive, F2 will become all 0s as we right-shift F2 to equalize the exponents. In this case, the result is F = F1 and E = E1, so it is a waste of time to do the shifting. If E1 >> E2 and F2 is negative, F2 will become all 1s (instead of all 0s) as we right-shift F2 to equalize the exponents. When we add the fractions, we will get the wrong answer. To avoid this problem, we can skip the shifting when E1 >> E2 and set F = F1 and E = E1.

Similarly, if E2 >> E1, we can skip the shifting and set F = F2 and E = E2. For the 4-bit fractions in our example, if |E1 - E2| > 3, we can skip the shifting.

Floating-point subtraction

Floating-point subtraction is the same as floating-point addition, except that in step 2 we must subtract the fractions instead of adding them.

Floating-point Division

The quotient of two floating-point numbers is

    (F1 x 2^E1) / (F2 x 2^E2) = (F1 / F2) x 2^(E1 - E2) = F x 2^E

Thus, the basic rule for floating-point division is: divide the fractions and subtract the exponents. In addition to considering the same special cases as for multiplication (explained below), we must test for divide by 0 before dividing. If F1 and F2 are normalized, then the largest positive quotient (F) will be

    0.1111... / 0.1000... = 01.111...

which is less than 10 in binary (2 in decimal), so the fraction overflow is easily corrected. For example (with the quotient truncated to three fraction bits),

    (0.110101 x 2^2) / (0.101 x 2^-3) = 01.010 x 2^5 = 0.101 x 2^6

Alternatively, if F1 > F2, we can shift F1 right before dividing and avoid fraction overflow in the first place.
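The division rule can be sketched in the same style. This is a hedged Python model, not a hardware divider. The name fp_divide and the frac_bits parameter are hypothetical, and the truncation to three fraction bits is an assumption chosen so that the result matches the worked example.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def fp_divide(f1: Fraction, e1: int, f2: Fraction, e2: int, frac_bits: int = 3):
    """Divide F1 x 2^E1 by F2 x 2^E2; the inputs are assumed normalized."""
    if f2 == 0:
        raise ZeroDivisionError("divide by 0")  # test before dividing
    # Divide the fractions and subtract the exponents.
    f, e = f1 / f2, e1 - e2
    if f == 0:
        return Fraction(0), -8  # uniform representation of zero
    if f >= 1 or f < -1:
        # Fraction overflow: the quotient magnitude is below 2, so one shift suffices.
        f, e = f / 2, e + 1
    while -HALF <= f < HALF:
        f, e = f * 2, e - 1  # renormalize if needed
    # Assumption: keep frac_bits fraction bits by truncating (2's complement floor).
    scale = 1 << frac_bits
    return Fraction(math.floor(f * scale), scale), e


print(fp_divide(Fraction(53, 64), 2, Fraction(5, 8), -3))  # 0.110101 x 2^2 / 0.101 x 2^-3
```

For the worked example the exact quotient fraction is 53/80; truncating to three fraction bits gives 5/8 (binary 0.101) with exponent 6. Exponent overflow is not checked in this sketch.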

FLOATING-POINT MULTIPLICATION

Given two floating point numbers N1 = F1 x 2^E1 and N2 = F2 x 2^E2, the product N1 x N2 is

    (F1 x 2^E1) x (F2 x 2^E2) = (F1 x F2) x 2^(E1 + E2) = F x 2^E

The fraction part of the product is the product of the fractions, F = F1 x F2, and the exponent part of the product is the sum of the exponents, E = E1 + E2. In the algorithm below we assume that F1 and F2 are properly normalized to start with, and also that the final result is to be normalized.

Special cases

Although the basic rule only requires multiplying the fractions and adding the exponents, there are several special cases that must be considered.

1. Zero fraction. If F (the fraction part of the product) is 0, the exponent E must be set to the largest negative value (1000). (Refer to the section on the representation of zero.)

2. Fraction overflow. If -1 is multiplied by -1 (1.000 x 1.000), the result should be +1. Since we cannot represent +1 as a 2's complement fraction, we treat this special case as a fraction overflow. To correct this situation, we set F = 1/2 (0.100) and add 1 to the exponent E. This is justified, since 1 x 2^E = 1/2 x 2^(E+1).

3. Normalization of the product. Consider the multiplication example given below.

    (0.1 x 2^E1) x (0.1 x 2^E2) = 0.01 x 2^(E1+E2) = 0.1 x 2^(E1+E2-1)

In this example, we normalize the result (0.01 x 2^(E1+E2)) by shifting the fraction (0.01) left one place (i.e., F becomes 0.1) and subtracting 1 from the exponent to compensate.

4. Exponent overflow. If the resulting exponent is too large in magnitude to represent in our number system (i.e., with 4 bits in our case), we have an exponent overflow. (Sometimes, an overflow in the negative direction is referred to as an underflow.) Since we are using 4-bit exponents, if the exponent is not in the range 1000 to 0111 (-8 to +7), an overflow has occurred. Since an exponent overflow cannot be corrected, an overflow indicator should be turned on.
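The four special cases can be put together in a short behavioural sketch. This is a Python model, not the VHDL design. It assumes 4-bit 2's complement fractions held as integers f with F = f / 8, so the 7-bit product is an integer scaled by 64; the name fp_multiply is hypothetical.

```python
def fp_multiply(f1: int, e1: int, f2: int, e2: int):
    """Multiply (f1/8) x 2^e1 by (f2/8) x 2^e2, with normalized inputs.

    Returns (f, e, overflow), where the product fraction is F = f / 64.
    """
    f = f1 * f2  # 3 bits plus sign times 3 bits plus sign
    e = e1 + e2  # add the exponents
    if f == 0:
        return 0, -8, False  # case 1: zero gets the most negative exponent
    if f == 64:
        # case 2: (-1) x (-1) = +1 cannot be represented, so use 1/2 x 2^(E+1)
        f, e = 32, e + 1
    elif -32 <= f < 32:
        # case 3: unnormalized product; one left shift is enough
        f, e = f * 2, e - 1
    overflow = not -8 <= e <= 7  # case 4: exponent outside 1000..0111
    return f, e, overflow


print(fp_multiply(4, 2, 4, 3))    # (0.1 x 2^2) x (0.1 x 2^3)
print(fp_multiply(-8, 0, -8, 0))  # (-1) x (-1)
```

The first call returns fraction 32/64 with exponent 4 (the 0.01 product shifted left once), and the second returns 32/64 with exponent 1. The exponent test comes last because cases 2 and 3 both change E.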

A flowchart for the floating point multiplier is shown in Figure 1. After the fraction multiply is completed, all the special cases must be tested for. Since we have assumed that F1 and F2 are normalized, the smallest possible magnitude for the product is 0.01 (as indicated in the preceding example). Therefore, only one left shift is required to normalize F.

Figure 1: Flowchart for the floating point multiplier

Hardware for the exponent adder and fraction multiplier

The hardware required to implement the floating point multiplier consists of an exponent adder and a fraction multiplier.

Exponent adder

Since each floating point number has a 4-bit exponent, addition of two 4-bit exponents (E1 and E2) requires a 5-bit adder, as shown below.

Special cases: Examples of exponent overflow:

    7 + 6       = 00111 + 00110 = 01101 = 13     (maximum allowable value is 7)
    -7 + (-6)   = 11001 + 11010 = 10011 = -13    (most negative allowable value is -8)

When the exponents are added, an overflow can occur. If E1 and E2 are positive and the sum (E) is negative, or if E1 and E2 are negative and the sum is positive, the result is a 2's complement overflow. However, this overflow might be corrected when 1 is added to or subtracted from E during normalization or correction of fraction overflow. To allow for this case, we have made the X register 5 bits long. When E1 is loaded into X, the sign bit must be extended so that we have a correct 2's complement representation. Since there are two sign bits, if the addition of E1 and E2 produces an overflow, the lower sign bit will get changed, but the high order sign bit will be unchanged. Each of the above examples has an overflow, since the lower sign bit has the wrong value.

Correction of exponent overflow

The input and output signals required for the exponent adder are as follows:

1. Load: Load E1, E2 into the appropriate registers.
2. Adx: Add exponents; this signal also starts the fraction multiplier.
3. SM8: Set exponent to minus 8.
4. RSF: Shift fraction right; also increment E.
5. LSF: Shift fraction left; also decrement E.
6. V: Overflow indicator.
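The two-sign-bit check described above can be illustrated with bit operations. This is a minimal Python sketch of the 5-bit exponent addition, not the register-level design; the function names sign_extend and add_exponents are hypothetical.

```python
def sign_extend(e4: int) -> int:
    """Extend a 4-bit 2's complement pattern to 5 bits by copying the sign bit."""
    return e4 | 0b10000 if e4 & 0b1000 else e4


def add_exponents(e1_bits: int, e2_bits: int):
    """Add two 4-bit exponent patterns in a 5-bit adder.

    Returns (x, overflow), where x is the 5-bit sum pattern and overflow
    is True when the two high-order (sign) bits of x differ.
    """
    x = (sign_extend(e1_bits) + sign_extend(e2_bits)) & 0b11111
    high_sign = (x >> 4) & 1  # stays correct even when the sum overflows 4 bits
    low_sign = (x >> 3) & 1   # the bit that gets corrupted on overflow
    return x, high_sign != low_sign


print(add_exponents(0b0111, 0b0110))  # 7 + 6
print(add_exponents(0b1001, 0b1010))  # -7 + (-6)
```

The two calls give the sum patterns 01101 and 10011 from the examples, each with the overflow flag set. Keeping the sum in 5 bits lets a later increment or decrement of E bring it back into range before the overflow test is applied.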

FRACTION MULTIPLIER

(Refer to the previous chapter for the block diagram and working of the faster multiplier, i.e., the 2's complement multiplier.) Since we are multiplying 3 bits plus sign by 3 bits plus sign, the result will be 6 bits plus sign. After the fraction multiply, the 7-bit result (F) will be the lower 3 bits of A concatenated with B. The control signals required for the fraction multiplier are:

1. St: Start the floating point multiplication.
2. Mdone: Fraction multiply is done.
3. Load: Load F1, F2 into the appropriate registers (also clear A in preparation for multiplication).
4. Adx: This signal starts the fraction multiplier.
5. RSF: Shift fraction (F) right.
6. LSF: Shift fraction left.
7. Done: Floating point multiplication is complete.

The state graph for the multiplier control (Figure 4) is similar to the state graph of the 2's complement multiplier covered in the chapter on the design of networks for arithmetic operations, except that the load state is not needed, because the registers are loaded by the main controller. When Adx = 1, the multiplier is started, and Mdone is turned on when the multiplication is completed.

Figure 4: State graph for multiplier control

MAIN CONTROLLER FOR FLOATING POINT MULTIPLICATION

Fig. 2: Block diagram showing the main controller and its I/O signals

The SM chart (Fig. 3) for the main controller of the floating point multiplier shown in Fig. 2 is based on the flowchart shown in Fig. 1. In the SM chart the controller for the multiplier is a separate state machine, which is linked into the main controller. The SM chart uses the following inputs and control signals:

1. St: Start the floating point multiplication.
2. Mdone: Fraction multiply is done.
3. FZ: Fraction is zero.
4. FV: Fraction overflow.
5. Fnorm: F is normalized.
6. EV: Exponent overflow.
7. Load: Load F1, E1, F2, E2 into the appropriate registers (also clear A in preparation for multiplication).
8. Adx: Add exponents; this signal also starts the fraction multiplier.
9. SM8: Set exponent to minus 8.
10. RSF: Shift fraction right; also increment E.
11. LSF: Shift fraction left; also decrement E.
12. V: Overflow indicator.
13. Done: Floating point multiplication is complete.

The SM chart for the main controller has four states. In S0, the registers are loaded when the start signal is 1. In S1, the exponents are added and the fraction multiply is started. In S2, the controller waits until the fraction multiply is done, then tests for the special cases and takes the appropriate action. It may seem surprising that the tests on FZ, FV, and Fnorm can all be done in the same state, since they are done in sequence on the flowchart. However, FZ, FV, and Fnorm are generated by combinational circuits that operate in parallel and hence can be tested in the same state. We must, however, wait until the exponent has been incremented or decremented at the next clock before we can check for exponent overflow in S3. In S3, the Done signal is turned on and the controller waits for St = 0 before returning to S0.

Figure 3: SM chart for floating-point multiplication

The VHDL behavioral description in the program uses three processes. The main process generates control signals based on the SM chart. A second process generates the control signals for the fraction multiplier. The third process tests the control signals and updates the appropriate registers on the rising edge of the clock. In state S2 of the main process, A = 0000 implies that F = 0 (FZ = 1 on the SM chart). If we multiply 1.000 x 1.000, the result is A & B = 01000000, and a fraction overflow has occurred (FV = 1). If A(2) = A(1), i.e., the sign bit of F and the following bit are the same, then F is unnormalized (Fnorm = 0). In state S3, if the two high order bits of X are different, an exponent overflow has occurred (EV = 1).
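The four tests described above can be written as simple bit checks. This is a Python sketch of the conditions only, not the VHDL processes; it assumes A and B are 4-bit register values and X is the 5-bit exponent register, and the name status_flags is hypothetical.

```python
def status_flags(a: int, b: int, x: int) -> dict:
    """Evaluate the FZ, FV, Fnorm and EV conditions from register contents.

    a, b: 4-bit values of registers A and B (the product is A & B)
    x:    5-bit value of the exponent register X
    """
    def bit(value: int, index: int) -> int:
        return (value >> index) & 1

    return {
        "FZ": a == 0b0000,                      # A = 0000 implies F = 0
        "FV": a == 0b0100 and b == 0b0000,      # A & B = 01000000
        "Fnorm": bit(a, 2) != bit(a, 1),        # sign bit of F differs from the next bit
        "EV": bit(x, 4) != bit(x, 3),           # two high-order bits of X differ
    }


print(status_flags(0b0100, 0b0000, 0b00001))  # product of 1.000 x 1.000
print(status_flags(0b0001, 0b0000, 0b01101))  # product 0.010000 with exponent sum 7 + 6
```

In hardware these four conditions are produced in parallel by combinational logic, which is why FZ, FV and Fnorm can be tested in the same state.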