Hardware Modules for Safe Integer and Floating-Point Arithmetic

A Thesis submitted to the Graduate School of The University of Cincinnati in partial fulfillment of the requirements for the degree of Master of Science in the Department of Electrical Engineering and Computing Systems of the College of Engineering and Applied Science

2013

by Amrita Ratan
BS, North Dakota State University, 2010

Committee Chair: Dr. Carla Purdy

ABSTRACT

Integer and floating-point data types are widely used to represent numerical data in computer arithmetic. Since the range of values representable in a computer is limited, arithmetic operations on these data types can lead to overflows and underflows. More often than not, silent overflows and underflows are permitted by software, and even though hardware records these errors, it may not be designed to handle them. These errors can be a threat to security and can reduce the reliability of software. In this work we present a hardware design for an Arithmetic and Logic Unit (ALU) and for a Floating-Point Unit (FPU). The ALU and FPU record and gracefully handle overflows and underflows. They have three modes of operation and handle overflows and underflows differently in each mode. The mode may be selected based on the requirements of the application. The designs have been implemented in Verilog and have been verified on the Altera DE2-115 board. This method can be modified for use in any hardware design.


Table of Contents

1. OVERVIEW
2. BACKGROUND
   2.1 Software Reliability
   2.2 Integer Exceptions
   2.3 Floating-Point Exceptions
   2.4 Aim of the Thesis
3. APPROACH
   3.1. Structure of This Chapter
   3.2 ALU and FPU Design Outline
   3.3 ALU for Handling Integer Overflows and Underflows
       3.3.1 Addition and Subtraction
       3.3.2 Multiplication
       3.3.3 Division
   3.4 FPU Design for Handling Floating-Point Overflows, Underflows, Invalid and Divide-by-Zero Exceptions
       3.4.1 Addition and Subtraction
       3.4.2 Multiplication
       3.4.3 Division
   3.5 ALU and FPU Implementation on Altera's DE2-115 Board
4. RESULTS
   4.1 Results of Arithmetic Operations Performed by the ALU
       4.1.1 Addition
       4.1.2 Subtraction
       4.1.3 Multiplication
       4.1.4 Division
   4.2 Results of Arithmetic Operations Performed by the FPU
       4.2.1 Addition
       4.2.2 Subtraction
       4.2.3 Multiplication
       4.2.4 Division
5. CONCLUSIONS AND FUTURE WORK
REFERENCES

List of Figures

Figure 3.1: Flowchart describing the three modes of ALU/FPU operation
Figure 3.2: Modular design of the ALU
Figure 3.3: Modular design of the FPU
Figure 3.4: Verilog code implementing partial product generation from Booth algorithm
Figure 3.5: Verilog implementation of first shift-add/subtract cycle in nonrestoring division algorithm
Figure 3.6: The sign, exponent and significand fields in single precision and double precision formats
Figure 3.7: Monitoring significand overflow and corresponding exponent overflow during addition
Figure 3.8: Flowchart describing basic process for recording overflow, underflow and invalid operation in addition and subtraction in FPU
Figure 3.9: Flowchart describing process for recording overflow, underflow and invalid operation in multiplication in FPU
Figure 3.10: Logic for determining overflow during floating-point division
Figure 3.11: Flowchart describing process for recording overflow, underflow and invalid operation in division in FPU

List of Tables

Table 3.1: Booth multiplier partial product generation and multiplier recoding
Table 3.2: Base 2 representation of floating-point data in single precision format
Table 4.1: Maximum and minimum values representable by operands A and B and result Sum
Table 4.2: Results of integer number addition with ALU set in each one of its three modes
Table 4.3: Maximum and minimum values representable by operands A and B and result Difference
Table 4.4: Results of integer number subtraction with ALU set in each one of its three modes
Table 4.5: Maximum and minimum values representable by operands A and B and result Product
Table 4.6: Results of integer number multiplication with ALU set in its first mode
Table 4.7: Maximum and minimum values representable by operands A and B and result Remainder and Quotient
Table 4.8: Results of integer number division with ALU set in each one of its three modes
Table 4.9: Maximum and minimum values represented by variables A, B and Sum
Table 4.10: Results of floating-point addition with FPU set in each one of its three modes
Table 4.11: Maximum and minimum values representable by variables A, B and Difference
Table 4.12: Results of floating-point subtraction with FPU set in each one of its three modes
Table 4.13: Maximum and minimum values representable by variables A, B and Product
Table 4.14: Results of floating-point multiplication with FPU set in each one of its three modes
Table 4.15: Maximum and minimum values representable by variables A, B, Quotient and Remainder
Table 4.16: Results of floating-point division with FPU set in each one of its three modes

1. OVERVIEW

Computer arithmetic is a well-established area of research, and arithmetic units are of utmost importance to digital computers [32][33]. With advances in processor design and the emergence of new technologies, research in the area of computer arithmetic has also been evolving. Core areas such as computer graphics, cryptography, digital signal processing, telecommunications and multimedia processing make heavy use of arithmetic operations and fuel research in the computer arithmetic domain [32]. Some of the important topics in this area are: the design of fast algorithms for operations such as multiplication and division, high-performance and low-power designs for operations such as multiplication, fast and correct rounding in compliance with the IEEE 754-2008 standard for floating-point data, and the use of logarithmic and residue number systems for high-speed algorithms. Among several other important aspects is the aspect of reliability and higher numeric precision in computations [34].

Overflows and underflows reduce the accuracy and reliability of results from arithmetic computations. In this work we examine overflows and underflows that result from computations on integer and floating-point data. We present our hardware logic design of an Arithmetic and Logic Unit (ALU) that performs basic arithmetic operations on integer data and records and gracefully handles overflows and underflows. We also present our hardware logic design for a Floating-Point Unit (FPU) that performs basic arithmetic operations on floating-point data and records and gracefully handles overflows, underflows, invalid operations and the divide-by-zero exception. Both of our designs are implemented in Verilog and have been verified in an embedded system environment on the Altera DE2-115 board.

This thesis is organized in the following manner: Chapter 2 describes the impact of overflows and underflows and other arithmetic exceptions on the accuracy of computer arithmetic and the previous work done in this area. Chapter 3 describes the detailed design of our ALU and FPU. Chapter 4 presents a summary of the results obtained from thoroughly simulating our designs and from downloading them onto hardware. Chapter 5 presents our conclusions and future work.

2. BACKGROUND

This chapter presents a brief description and discussion of the impact of overflows and underflows incurred in integer arithmetic and of overflows, underflows and other exceptions incurred in floating-point arithmetic. The goals of the thesis are also presented in this chapter.

2.1 Software Reliability

It would be ideal if software always worked as intended by its authors; such software would also be much more reliable and easier to safeguard from attacks [2]. One barrier to reliability is errors in numeric computation caused by the limited size of numeric variables. Here we wish to study errors of this type in both integer and floating-point data and describe ways to avoid such errors. We will study the errors that occur as a result of the four basic and ubiquitous arithmetic operations: addition, subtraction, multiplication and division.

In computer arithmetic, the range of values that can be represented by integer or floating-point data types is limited. For this reason, arithmetic operations on numeric data can cause overflows, underflows, truncation and sign errors [1]. For example, an overflow occurs when a numeric variable is assigned a value that is greater than the maximum value that it can store. Similarly, underflows occur when a number is too small to be represented by its numeric variable. We will examine integer and floating-point overflows separately.

The integer data type is used heavily in computer programming and in computer arithmetic. It is widely used to store integer data, memory addresses, loop variables and array indices and to perform integer arithmetic. Because of their frequent use, integers are also often a part of security-sensitive code. An application's reliability can therefore be compromised by the manipulation of integers, especially when they are used in security-sensitive applications [4].

Unexpected and inaccurate results can be obtained during the manipulation of integers if integer overflow and underflow conditions are not detected and handled.

Floating-point numbers represent real numbers, which are used extensively in scientific and medical computing, engineering applications and even finance [16][17]. Designing a computer platform also requires an in-depth knowledge of how floating-point numbers work, as this knowledge is vital to designing instruction sets, operating systems, programming languages and compilers [17]. Real numbers are of course an infinite set of numbers. Their limited-size representation in computer arithmetic leads to inaccuracy in floating-point number representation and arithmetic. Errors are also introduced when data represented in base 10 must be converted to or from the internal machine representation, which is typically base 2 or base 16 [23]. For this study, we will be focusing on errors generated by arithmetic operations on floating-point data, and we will not be dealing with errors introduced as a result of converting floating-point data to and from different formats.

2.2 Integer Exceptions

An integer overflow occurs when an integer variable is assigned a value that is greater than the maximum value that can be represented by an integer. For example, the result produced by an arithmetic operation may be larger than the maximum value that can be represented by an integer variable. Similarly, a simple assignment statement or a type conversion operation to change a variable's data type can also cause an overflow. Therefore, integer overflow conditions hidden in code can at times be quite difficult to identify.

Integer overflow causes an incorrect value to be placed in integer variables, but this value is predictable because of the wrap-around effect of integer overflow. For example, an 8-bit signed integer that is assigned its maximum positive value, 127, when incremented by 1, changes value to -128, which is the minimum value that it can hold.

This behavior is referred to as the wrap-around effect of integer overflow. This effect of overflow is also applicable to unsigned integer variables. The direct consequence of overflow is incorrect results from arithmetic computations, but this flaw has also been exploited to crash systems or to create the environment for a buffer overflow vulnerability in systems [7]. Buffer overflow is one of the most dangerous software vulnerabilities and can be exploited even to take total control of a target system [3]. It can be easily induced when the integer used to denote a buffer size is made to overflow. Integer overflows are therefore hazardous, exploitable and expensive [13]. They have been positioned by MITRE in its 2011 list of Top 25 Most Dangerous Software Errors [13]. They are also placed among one of the two top sources of software vulnerabilities in operating systems [14]. Some of the well-known vulnerabilities caused by integer overflow are mentioned below:

- An integer overflow vulnerability was found in the Apple QuickTime/Darwin Streaming Server. By passing an extremely large value to one of its modules, it was possible to crash the server. It was also suspected that the issue could be exploited to corrupt memory [4][8].
- An integer overflow vulnerability was discovered in Snort, a network intrusion prevention system. As a result, it was possible to trigger an integer overflow, which in turn could be used to trigger a buffer overflow, by sending specially modified packets over a network monitored by Snort [4][9].
- In some old versions of FishCart, an e-commerce system, an integer overflow vulnerability was found in its arithmetic rounding function. This could be exploited to cause negative totals by placing an order for a very large quantity of items [10][7].
- An integer overflow vulnerability found in Mozilla Firefox 3.5.x before 3.5.11 and 3.6.x before 3.6.7 could have been exploited to execute arbitrary code on target systems [12][13].

Most of the popular high-level programming languages such as C, C++ and Java do not signal integer overflows and continue program execution with incorrect results [11]. Various software tools have been developed to manage this flaw in programming languages. A template class in C++ has been developed to prevent integer overflows [6]. Several operators such as addition, subtraction and assignment are overridden and throw an exception if an integer overflow condition is detected. The overridden operators detect overflow conditions before executing operations, thereby preventing the generation of incorrect results. An As-if Infinitely Ranged Integer Model for C and C++ has been presented to indicate vulnerabilities arising from integer overflow, truncation and unexpected wrapping. A runtime constraint handler is invoked if the result of an operation is not equal to the value that would have been generated had the integer been capable of representing values over an infinite range [15]. The GCC compiler has an -ftrapv flag for creating traps for signed integer overflows [1]. This compiler is able to detect overflows that arise from addition, subtraction and multiplication. A compiler extension, IntPatch, has been designed specifically to mend integer overflows in C and C++ that can lead to buffer overflows [14]. A tool named COJAC has been developed for identifying integer overflows in Java bytecode and indicating overflows at runtime [11]. This tool identifies overflows arising from arithmetic operations and truncation errors resulting from narrowing type-cast operations.

Integer overflow conditions are often recorded at the hardware level. For example, most of the processors based on the 32-bit Intel architecture have overflow and carry flags that signal overflows [1]. Overflow is recorded by common hardware implementations of integer addition and subtraction such as the ripple-carry adder and the carry-lookahead adder. Multiplication and division circuits with overflow detection are also available [18][19].

2.3 Floating-Point Exceptions

A floating-point number consists of a sign bit, an exponent field and a significand field to represent real numbers. The exponent and significand fields are implemented using fixed-length integers [20]. Truncation and rounding techniques have to be employed for representing real numbers with limited-size exponent and significand fields, thereby leading to loss of precision [16][17]. Conversion of floating-point numbers from decimal to binary format and vice versa introduces inaccuracies as well [28]. Arithmetic operations on floating-point numbers can also generate inaccurate and unreliable results, as they can cause an overflow if the result is too large or an underflow if the result is too small to be represented by the floating-point variable [20].

Floating-point computations can cause an overflow of the exponent or the significand. For example, during multiplication, the exponents of the multiplier and multiplicand are added, and this may cause an exponent overflow if the sum is too big. Similarly, exponent underflow can occur if the sum is smaller than the smallest representable number. Also, while adding significands during an addition operation, a significand overflow can occur when a carry-out is produced from the most significant bit position. Significand overflow, in this case, is handled by right-shifting the significand and incrementing the exponent [20]. As per the IEEE 754-2008 floating-point standard, overflow and underflow exceptions are raised to handle exponent overflow and exponent underflow respectively [23].

The IEEE 754-2008 floating-point standard also provides the invalid, inexact and divide-by-zero exceptions. If the result of an arithmetic operation has no acceptable representation, the invalid exception is raised [23]. The inexact exception is raised when the exact result of a computation is not representable after rounding [22][31]. The divide-by-zero exception is signaled during a division operation, when an attempt to divide a finite, non-zero number by zero is made [23].

For this work we will study overflows, underflows, invalid and divide-by-zero exceptions generated by arithmetic operations on floating-point numbers.

Some of the famous bugs involving floating-point numbers are mentioned below [16]:

- The Intel Pentium processor's first version performed flawed floating-point division. For some division operations the processor would return extra, incorrect digits. This was due to a bad implementation of the division algorithm.
- Version 7.0 of Maple, a tool for mathematics and modeling, had a bug wherein the operation (1001!)/(1000!) would incorrectly generate 1 as the result.
- Some versions of Microsoft Excel had bugs related to binary-to-decimal conversion inaccuracies and errors. One of the prominent errors was produced during the computation of 65536 - 2^-37, generating a result of 100001.

To handle overflows, floating-point arithmetic provides a representation for the positive and negative infinities, which are used in order to continue with the computation when an overflow occurs. In a similar fashion, NaNs (Not a Number) are used to handle invalid operations [29]. Subnormal numbers are provided to represent very small numbers close to zero; their representation is not normalized, that is, the leftmost bit of the significand of a subnormal number is not a one. They handle underflow conditions by allowing underflow to be gradual [17]. The default response to these exceptions is to employ the special numbers to generate a result and proceed with the computation. Nevertheless, the IEEE 754-2008 standard also supports the use of trap handlers to specify the value returned when an exception is raised [29].

A novel floating-point representation is presented in [30] by S. Matsui et al. for preventing overflows and underflows in floating-point arithmetic. This work defines a representation with a variable border between the exponent and significand fields.

A floating-point number system with an exponent field larger than the significand field was also proposed to reduce the chances of overflows and underflows [30]. The work in [31] presents procedures to handle exceptions in several floating-point computations from a circuit design point of view. In this work we will present our hardware design for an FPU that gracefully handles the overflow, underflow, invalid and divide-by-zero exceptions.

2.4 Aim of the Thesis

Accuracy in computer arithmetic is essential for developing robust and reliable computer systems. Integer and floating-point numbers are used extensively for representing numerical data in computer arithmetic, and they have a serious impact on the reliability of the system. The aim of the thesis is to propose a hardware-level design for an Arithmetic and Logic Unit (ALU) and a Floating-Point Unit (FPU). Our design records overflows, underflows, invalid and divide-by-zero exceptions and gracefully handles the overflows and underflows incurred from arithmetic operations performed on integer and floating-point data. We present the design of an ALU and an FPU which can operate in three modes for gracefully handling overflow and underflow conditions. Each mode of operation of the ALU and FPU has a specific way of handling overflows and underflows. At present, our FPU design does not perform rounding and therefore does not record the inexact exception, which is determined after rounding [31]. We have developed the ALU and FPU designs in the Verilog hardware description language. We also describe an implementation in an embedded system environment on the Altera DE2-115 board.

3. APPROACH

3.1. Structure of This Chapter

This chapter presents the detailed design of the ALU and the FPU and the algorithms used for recording overflow, underflow and the other exceptions. Section 3.2 describes in detail the three modes of operation of the ALU and the FPU. It describes how overflows, underflows and other exceptions are handled in each of the three modes. It also describes the modular nature of our ALU and FPU design. Section 3.3 describes how overflows and underflows are recorded by the ALU while performing arithmetic operations on integer numbers. Separate subsections are given to discuss the addition and subtraction, multiplication and division operations. Each subsection describes the algorithm used for generating the result of the arithmetic operation and describes the overflow and underflow conditions that need to be recorded. The division-by-zero exception is also recorded by the ALU. Section 3.4 describes how overflow, underflow, inexact, invalid and division-by-zero exceptions are recorded by the FPU. Separate subsections are given to discuss the addition and subtraction, multiplication and division operations. Each subsection describes the algorithm used for generating the result of the arithmetic operation. The algorithm for recording overflow, underflow and other exceptions is also described in each subsection. Section 3.5 describes the implementation of the ALU and FPU designs on Altera's DE2-115 board.

3.2 ALU and FPU Design Outline

This chapter describes the design of the ALU for handling integer number overflows and underflows as well as the design of the FPU for handling floating-point number overflows, underflows and the inexact, invalid and divide-by-zero exceptions. The ALU and FPU have been implemented in the Verilog hardware description language and their design has been simulated in the ModelSim environment. Their design has also been verified on the Altera DE2-115 board.

For this study we examine the overflows and underflows that occur as a result of arithmetic operations. We have focused on the operations of addition, subtraction, multiplication and division, as they are simple to understand and can be easily implemented to study overflows with clarity. Therefore, the ALU and FPU developed for the purpose of this study are designed to perform these four operations. To handle overflows, underflows and the other exceptions, the ALU and FPU can operate in one of the following three modes:

I. In the first mode, an overflow or underflow is recorded, but computation proceeds with incorrect results without signaling the overflow or underflow.

II. In the second mode, an overflow or underflow is recorded and signaled, and execution is terminated upon overflow or underflow. An error message is also displayed to the user.

III. In the third mode, an overflow or underflow is recorded and signaled, and the system displays the value of the result after the overflow or underflow occurs. After displaying the result, an error message is displayed to the user, and the user has the option to continue the program execution or to enable a trap. The trap is to be defined by the user as per their requirements for handling overflow and underflow conditions.

The ALU and FPU can be set in one of these three modes depending on how the overflows and underflows are to be handled. A flowchart describing the functionality of the ALU and FPU in their three modes of operation is shown in Figure 3.1.

The ALU and FPU designs are modular and can be modified to expand their functionality and implement other arithmetic operations as well.

Also, the algorithms that we have used for addition, subtraction, multiplication and division can be replaced by other algorithms as appropriate. We have used the ripple-carry addition and subtraction algorithm, the Booth multiplier algorithm, the sequential add-shift multiplication algorithm and the nonrestoring division algorithm for implementing arithmetic operations in the ALU and FPU. These algorithms have been implemented in Verilog and are based on their Verilog implementations provided in [20] by J. Cavanagh. Only the basic structure of the algorithms has been taken from the work in [20]. The circuitry for detecting and recording overflow, underflow and the invalid and divide-by-zero exceptions has been developed by us and is described in this work. Our ALU design processes integer numbers that are 8 bits or 16 bits wide. Our FPU processes numbers that are in the single precision or double precision format. This feature of our ALU and FPU design can be modified to process numbers of any required bit size. A high-level overview of the modular design of the ALU is shown in Figure 3.2. Figure 3.3 presents the same information for the FPU design.

[Figure 3.1 is a flowchart: the ALU/FPU mode is obtained from user input; in the first mode an overflow/underflow is recorded and execution continues, in the second mode the condition is signaled and execution is terminated, and in the third mode the condition is signaled, the result is displayed, and execution either continues or a trap is enabled based on the user's choice.]

Figure 3.1: Flowchart describing the three modes of ALU/FPU operation
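To make the mode handling concrete, the following is a minimal Verilog sketch of the decision logic summarized in Figure 3.1. It is not the thesis's code; the module name, the signal names (exception_response, mode, exception, halt, trap_enable, user_trap_request) and the mode encoding are illustrative assumptions.

// Illustrative sketch of the mode-dependent response shown in Figure 3.1.
// The module name, signal names and mode encoding are assumptions, not
// taken from the thesis.
module exception_response (
    input  wire [1:0] mode,               // 00: mode I, 01: mode II, 10: mode III
    input  wire       exception,          // recorded overflow or underflow
    input  wire       user_trap_request,  // user's choice in mode III
    output wire       halt,               // mode II: signal the condition and terminate
    output wire       trap_enable         // mode III: enable the user-defined trap
);
    // Mode I only records the condition, so no control action is taken here.
    assign halt        = (mode == 2'b01) & exception;
    assign trap_enable = (mode == 2'b10) & exception & user_trap_request;
endmodule

A top-level design would drive the exception input from the per-operation overflow/underflow bits and use halt and trap_enable to gate further computation.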

ALU modules:
- Addition and Subtraction Module. Algorithm: Ripple-Carry Addition and Subtraction. Size of operands: 8-bit operands and 8-bit result.
- Multiplication Module. Algorithm: Booth Multiplier. Size of operands: 8-bit multiplier and multiplicand and 16-bit result.
- Division Module. Algorithm: Nonrestoring Division. Size of operands: 16-bit dividend, 8-bit divisor, 8-bit quotient, 8-bit result.

Figure 3.2: Modular design of the ALU

FPU modules:
- Addition and Subtraction Module. Algorithm: Ripple-Carry Addition and Subtraction. Size of operands: single precision operands and single precision result.
- Multiplication Module. Algorithm: Sequential Add-Shift Multiplication. Size of operands: single precision operands and double precision result.
- Division Module. Algorithm: Nonrestoring Division. Size of operands: double precision dividend and single precision divisor, quotient and remainder.

Figure 3.3: Modular design of the FPU

3.3 ALU for Handling Integer Overflows and Underflows

The ALU is composed of separate modules implementing the four arithmetic operations. The ALU design implements the following three steps:

1. The user selects the mode in which the ALU will perform computations. The ALU can function in one of the three modes described in Section 3.2.
2. In the next step, the ALU accepts the operands and the operator (addition, subtraction, multiplication or division) from the user.
3. The ALU performs the arithmetic operation that is specified and handles overflow and underflow conditions based on the ALU mode that is selected.

3.3.1 Addition and Subtraction

Adding or subtracting signed integers can cause an overflow if the sum is greater than the maximum value that can be represented by an integer variable [6][20]. Addition and subtraction can cause an underflow when the result is a negative value that is too small to be represented by the integer variable [6][20]. Both of these conditions are detected and handled by our ALU. The ripple-carry adder is used for performing both addition and subtraction. We decided on this algorithm because it is simple to implement [20]. Also, the ripple-carry adder utilizes straightforward logic to detect overflow and underflow: an overflow is recorded if the carry-outs from the two most significant bits of the result are different. Signed integer numbers are represented in 2's complement notation for the addition and subtraction operations. An overflow bit is used to record overflow and underflow.
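The overflow rule just described (the carry-outs from the two most significant bit positions differ) can be sketched as follows for 8-bit 2's complement operands. This is an illustrative ripple-carry implementation, not the thesis's code; the module and signal names are assumptions.

// Illustrative 8-bit 2's-complement ripple-carry adder/subtractor.
// Overflow/underflow is recorded when the carries out of bit 6 and
// bit 7 differ, as described in the text.  Names are assumptions.
module addsub8 (
    input  wire       sub,       // 0: a + b, 1: a - b
    input  wire [7:0] a, b,
    output wire [7:0] result,
    output wire       overflow   // records both overflow and underflow
);
    wire [7:0] b_eff = b ^ {8{sub}};   // invert b for subtraction
    wire [8:0] carry;
    assign carry[0] = sub;             // the +1 that completes the 2's complement
    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : ripple
            assign result[i]  = a[i] ^ b_eff[i] ^ carry[i];
            assign carry[i+1] = (a[i] & b_eff[i]) | (carry[i] & (a[i] ^ b_eff[i]));
        end
    endgenerate
    assign overflow = carry[8] ^ carry[7];   // carry-outs of bits 7 and 6 differ
endmodule

For example, with a = 8'd127, b = 8'd1 and sub = 0, the result wraps to -128 and overflow is asserted.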

3.3.2 Multiplication

Multiplication of two integers does not cause an overflow or underflow if the size of the product variable is twice the size of the multiplier or the multiplicand [21]. For example, for an 8-bit multiplier and multiplicand, if the size of the product is 16 bits, an overflow or underflow will never occur. In our implementation, the size of the product is twice the size of the multiplier and multiplicand, and an overflow is not generated when multiplying integers. Nevertheless, all three modes of handling overflows have been implemented for the multiplication operation. This makes it possible to handle overflows or underflows using the ALU if the multiplication operation is implemented with an algorithm that generates a product which has the same size as the operands.

For our ALU design, we have implemented the Booth algorithm to perform multiplication on signed integers, as it is able to process both positive and negative operands represented in the 2's complement format [20]. The approach of the Booth algorithm is to decrease the number of partial products required for performing a multiplication operation. The algorithm recodes the multiplier to change strings of consecutive ones into strings of consecutive zeros. The strings of zeros thus obtained will not generate partial products [36]. The algorithm begins by examining the least significant bit of the multiplier, say Bit i, and the bit to the right of the least significant bit, Bit i + 1. It recodes the multiplier to generate the corresponding partial products as shown in Table 3.1 [36]. This step is repeated for all the bits of the multiplier. A few lines of Verilog code from our implementation of the Booth algorithm are shown in Figure 3.4. This figure shows the recoding of a pair of bits of the multiplier to generate the corresponding partial product.

Bit i + 1   Bit i   Partial Product Generated
0           0       0 * multiplicand
0           1       +1 * multiplicand
1           0       -1 * multiplicand
1           1       0 * multiplicand

Table 3.1: Booth multiplier partial product generation and multiplier recoding

/* a: multiplicand, b: multiplier
   Examine bits b[2] and b[3] to determine corresponding partial product pp4 */
always @* begin
    case (b[3:2])
        2'b00: pp4 = 16'h00;
        2'b01: pp4 = {a_ext_pos[12:0], 3'b000};
        2'b10: pp4 = {a_ext_neg[12:0], 3'b000};
        2'b11: pp4 = 16'h00;
    endcase
end

Figure 3.4: Verilog code implementing partial product generation from Booth algorithm

3.3.3 Division

Division of two integers causes an overflow when the quotient is too big, or an underflow when the quotient is too small, to be represented by its integer variable [6]. For example, dividing a very large number by a very small number can generate an overflow. For this study we have implemented the sequential shift-add/subtract nonrestoring division algorithm, as it is simple to implement. The nonrestoring division algorithm does not handle numbers in 2's complement representation.

Therefore, our ALU design performs division on only positive integers. Since the ALU design is modular, our division algorithm can be replaced by other division algorithms as required. In our implementation the quotient and remainder are half the bit size of the dividend. An overflow bit is used to record quotient overflow and underflow. The sequential shift-add/subtract nonrestoring division algorithm will generate only overflow exceptions, since the dividend and divisor will be positive integers. However, if a division algorithm handling both positive and negative integers is used, the overflow bit will record both overflow and underflow. The ALU is able to handle quotient overflows and underflows in each one of its three modes. The ALU also records and signals the invalid operation of divide-by-zero.

The sequential shift-add/subtract nonrestoring division algorithm is a variation of the sequential shift-add/subtract restoring division algorithm [24]. The restoring algorithm performs division by subtracting the divisor from the dividend (by adding the divisor's 2's complement), and if the resulting partial remainder is a negative integer, it is restored by adding the divisor back to the partial remainder. The nonrestoring division algorithm eliminates the restoring step and utilizes both negative and positive partial remainders [24]. Figure 3.5 shows a part of our Verilog implementation of the nonrestoring division algorithm. The nonrestoring division algorithm can be described by the following steps:

1. Left-shift the dividend and subtract the divisor from the dividend.
2. If the partial remainder is a negative integer, the corresponding quotient bit obtained is a 0. For the next shift-add/subtract cycle, left-shift the partial remainder and add the divisor to it.

3. If the partial remainder generated is a positive integer, the corresponding quotient bit obtained is 1. For the next shift-add/subtract cycle, left-shift the partial remainder and subtract the divisor from it.
4. The shift-add/subtract cycle is repeated n times, where n is the number of bits in the divisor. An additional restore cycle may be required if the final remainder has a sign that differs from that of the divisor.

/* a: dividend, b: divisor
   rslt_temp: holds partial remainder, quotient and also holds remainder
   after the final shift-add/subtract cycle is completed
   This is the first shift-add/subtract cycle */
rslt_temp = a;
if ((a != 0) && (b != 0))
begin
    rslt_temp = rslt_temp << 1;
    rslt_temp = {(rslt_temp[7:4] + b_neg), rslt_temp[3:0]};
    if (rslt_temp[7] == 1)
    begin
        rslt_temp = {rslt_temp[7:1], 1'b0};
        part_rem_7 = 1'b1;
    end
    else
    begin
        rslt_temp = {rslt_temp[7:1], 1'b1};
        part_rem_7 = 1'b0;
    end
end
else
    rslt_temp = rslt_temp;

Figure 3.5: Verilog implementation of first shift-add/subtract cycle in nonrestoring division algorithm

Overflow or underflow is detected by subtracting the high-order half of the divisor from the high-order half of the dividend. If the high-order half of the dividend is greater than the high-order half of the divisor, an overflow will occur. This condition for recording overflow is employed in our implementation of the sequential shift-add/subtract algorithm to determine the value of the overflow bit.
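Because this pre-check decides the overflow bit before the shift-add/subtract cycles run, it can be expressed as a small combinational block. The sketch below uses the textbook condition for a 16-bit dividend and 8-bit divisor (the 8-bit quotient cannot be represented when the upper half of the dividend is greater than or equal to the divisor) and also flags a zero divisor; the module and signal names are illustrative and not taken from the thesis.

// Sketch of a quotient-overflow pre-check for unsigned division of a
// 16-bit dividend by an 8-bit divisor (8-bit quotient).  This uses the
// textbook condition (quotient cannot fit in 8 bits when the upper half
// of the dividend is >= the divisor) and is not the thesis's exact code.
module div_precheck (
    input  wire [15:0] dividend,
    input  wire [7:0]  divisor,
    output wire        div_by_zero,
    output wire        overflow
);
    assign div_by_zero = (divisor == 8'd0);
    assign overflow    = !div_by_zero && (dividend[15:8] >= divisor);
endmodule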

3.4 FPU Design for Handling Floating-Point Overflows, Underflows, Invalid and Divide-by-Zero Exceptions

The FPU comprises separate modules implementing the four arithmetic operations. The FPU design implements the following three steps:

1. The user selects the mode in which the FPU will perform computations. The FPU can function in one of the three modes described in Section 3.2.
2. In the next step, the FPU accepts the operands and the operator (addition, subtraction, multiplication or division) from the user.
3. The FPU performs the arithmetic operation that is specified and handles overflow conditions based on the FPU mode that is selected. The FPU also records the special numbers available in floating-point data.

Our FPU design is compliant with the IEEE 754-2008 Standard for Binary Floating-Point Arithmetic. Figure 3.6 shows the width of the sign, exponent and significand fields as specified in the 32-bit single precision and 64-bit double precision formats. The IEEE standard for floating-point data recommends that floating-point data stored in base 2 should be normalized [23]. The normalized representation requires that the most significant bit of the significand of any normalized non-zero, finite quantity always be a 1. In the normalized representation, the first bit of the significand can be determined from the exponent encoding. Therefore the first bit of the significand of normalized floating-point data does not need to be stored; this bit is referred to as the hidden bit, and the rest of the significand is called the trailing significand.

[Figure 3.6 shows the field layout of the two formats: in the 32-bit single precision format the sign occupies bit 31, the exponent bits 30-23 and the significand bits 22-0; in the 64-bit double precision format the sign occupies bit 63, the exponent bits 62-52 and the significand bits 51-0.]

Figure 3.6: The sign, exponent and significand fields in single precision and double precision formats

The standard also has specifications for subnormal numbers, which are numbers that are not normalized; the most significant digit of their significand is always a 0 [28]. The special quantities represented in floating-point data are NaNs (not a number), positive and negative infinities, and positive and negative zeroes. The special quantities and subnormal numbers are not normalized [23]. Our FPU is designed to handle single precision floating-point data, double precision floating-point data, normalized numbers, subnormal numbers and special quantities. One or more guard bits can be used for increasing accuracy when rounding off floating-point data. The guard bits are placed to the right of the least significant bit of the significand [35]. For our FPU design, we have not included guard bits for manipulating floating-point data.
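Before any arithmetic, the FPU has to recover the hidden bit and recognize the special quantities just described. The following is a minimal Verilog sketch of how a single precision operand might be unpacked and classified; the module and signal names are illustrative assumptions, and the thesis's actual unpacking logic may differ.

// Illustrative unpacking/classification of a single precision operand.
// Names are assumptions, not the thesis's code.
module sp_classify (
    input  wire [31:0] x,
    output wire        sign,
    output wire [7:0]  exponent,        // biased exponent field
    output wire [23:0] significand,     // trailing significand with the hidden bit restored
    output wire        is_zero, is_subnormal, is_inf, is_nan
);
    wire [22:0] frac = x[22:0];
    assign sign         = x[31];
    assign exponent     = x[30:23];
    assign is_zero      = (exponent == 8'h00) && (frac == 23'd0);
    assign is_subnormal = (exponent == 8'h00) && (frac != 23'd0);
    assign is_inf       = (exponent == 8'hFF) && (frac == 23'd0);
    assign is_nan       = (exponent == 8'hFF) && (frac != 23'd0);
    // The hidden bit is 1 for normalized values and 0 for zeros and subnormals.
    assign significand  = {(exponent != 8'h00), frac};
endmodule

Flags of this kind can then feed the invalid-operation checks described in the following subsections.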

Table 3.2 shows the binary encoding in single precision format for the number 2, the special quantities and the subnormal number 5.877x10^-39. It must be noted that the NaN encoding used by our FPU design is the one shown in Table 3.2.

Floating-Point Data   Sign Bit (1 bit wide)   Biased Exponent (8 bits wide)   Trailing Significand (23 bits wide)
2                     0                       1000_0000                       000_0000_0000_0000_0000_0000
5.877x10^-39          0                       0000_0000                       100_0000_0000_0000_0000_0000
+0                    0                       0000_0000                       000_0000_0000_0000_0000_0000
-0                    1                       0000_0000                       000_0000_0000_0000_0000_0000
+∞                    0                       1111_1111                       000_0000_0000_0000_0000_0000
-∞                    1                       1111_1111                       000_0000_0000_0000_0000_0000
NaN                   0                       1111_1111                       100_0000_0000_0000_0000_0000

Table 3.2: Base 2 representation of floating-point data in single precision format

3.4.1 Addition and Subtraction

Addition or subtraction of floating-point numbers can cause an overflow if the result cannot be expressed by the maximum value of the exponent, and can cause an underflow if the result cannot be expressed by the minimum value of the exponent [22][27]. In case of underflow, the result is recorded as a subnormal number. An invalid result is also recorded by the FPU. An invalid operation occurs when adding together, or subtracting from each other, the positive and negative infinities. Therefore the cases (+∞) + (-∞), (-∞) + (+∞), (+∞) - (+∞) and (-∞) - (-∞) are invalid operations, as specified in the IEEE 754-2008 standard [23]. An invalid result is also produced by floating-point addition or subtraction if one or both of the operands are NaN [23].

In our FPU design, the result and the operands are represented in the 32-bit single precision format. Separate bits are used to record exponent overflow, underflow and invalid exceptions. The ripple-carry adder/subtractor has been implemented for performing addition and subtraction, as it is simple to implement. This implementation is also employed to record significand overflow.

Significand overflow or underflow can be easily handled by shifting the significand and incrementing or decrementing the exponent accordingly [20]. However, it must be remembered that this can cause an exponent overflow or underflow. This condition is monitored in our Verilog implementation of floating-point addition and subtraction. As an example, the lines of our Verilog code that monitor this condition during the addition operation are shown in Figure 3.7.

/* cout: carry-out obtained from ripple-carry adder
   sum_24precision: sum obtained from ripple-carry adder */
cout = cout_temp;
sum_24precision = sum_temp;
if (cout == 1)  // significand overflow
begin
    if (exp_b_bias == 8'b1111_1110)  // exponent monitored for overflow
    begin
        overflow_bit = 1'b1;
    end
    else  // increment exponent if no exponent overflow
    begin
        {cout, sum_24precision} = {cout, sum_24precision} >> 1;
        exp_b_bias = exp_b_bias + 1;
    end
end

Figure 3.7: Monitoring significand overflow and corresponding exponent overflow during addition

The basic steps employed in the FPU for performing addition/subtraction, based on the work in [20], and for recording overflow, underflow and invalid operations are given below:

1. Accept normalized operands.
2. Determine the hidden bit from the biased exponent encoding.
3. Check for invalid and NaN operands and determine the result accordingly.
4. Check for zero operands.

5. If one of the operands is zero, then the sum is equal to the value of the non-zero operand. If both operands are zero, then the result is computed as per the IEEE 754-2008 standard.
6. Align the significands by right-shifting the significand of the operand with the smaller exponent and incrementing the smaller exponent (see the sketch after this list).
7. Add/subtract the significands.
8. The exponent of the result is equal to the exponent of the larger operand.
9. Check for significand overflow or underflow, and shift the significand as well as increment or decrement the exponent accordingly.
10. Check for exponent overflow or underflow.
11. Normalize the result.
12. Check for exponent overflow or underflow.
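Step 6 (aligning the significands) can be sketched as the following combinational block, in which the significand belonging to the smaller exponent is shifted right by the exponent difference. The names are illustrative assumptions; in particular, since this design does not use guard bits, bits shifted out are simply discarded here.

// Illustrative alignment of two single precision significands prior to
// addition/subtraction.  Names are assumptions, not the thesis's code.
module align_significands (
    input  wire [7:0]  exp_a, exp_b,        // biased exponents
    input  wire [23:0] sig_a, sig_b,        // significands with the hidden bit restored
    output wire [7:0]  exp_common,          // exponent shared by both aligned operands
    output wire [23:0] sig_a_al, sig_b_al   // aligned significands
);
    wire       a_smaller = (exp_a < exp_b);
    wire [7:0] diff      = a_smaller ? (exp_b - exp_a) : (exp_a - exp_b);
    assign exp_common = a_smaller ? exp_b : exp_a;
    // The operand with the smaller exponent is shifted right; bits shifted
    // out are discarded because no guard bits are kept in this design.
    assign sig_a_al = a_smaller ? (sig_a >> diff) : sig_a;
    assign sig_b_al = a_smaller ? sig_b : (sig_b >> diff);
endmodule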

The above steps are also presented in flowchart form in Figure 3.8.

[Figure 3.8 is a flowchart: normalized operands are read in; invalid or NaN operands cause the invalid exception to be recorded; if an operand is zero the sum is the non-zero operand; otherwise the significands are aligned and added/subtracted, significand overflow or underflow is corrected by shifting the significand and adjusting the exponent, the result is normalized, and any exponent overflow or underflow is recorded.]

Figure 3.8: Flowchart describing basic process for recording overflow, underflow and invalid operation in addition and subtraction in FPU

3.4.2 Multiplication

Multiplication of real numbers is performed by multiplying the significands and adding the exponents. The result's exponent can overflow or underflow during this addition, thereby signifying that the product is too big or too small to be represented by its floating-point variable [25]. In case of underflow, the result is recorded as a subnormal number. The FPU also records an invalid operation when one or both operands are NaN. The FPU also records the invalid multiplication operations of the form (±0) * (±∞), as specified by the IEEE 754-2008 standard [23].

We have implemented the sequential add-shift multiplication algorithm in Verilog for multiplying the significands, as it is simple to implement for multiplying the 24 bits (including the hidden bit) of the multiplier's and multiplicand's significands. The sequential add-shift multiplication algorithm is very similar to the typical multiplication algorithm performed with paper and pencil [36]. In our implementation, the multiplier and multiplicand are in the single precision format. The product is in the double precision format, which ensures that multiplication will not cause an exponent overflow unless one or both of the operands is negative or positive infinity. The FPU is designed to record exponent overflow and underflow using an overflow bit and an underflow bit respectively. Therefore, the FPU is also able to handle exponent overflow if an implementation different from ours is used. Invalid operations are recorded by using a separate invalid bit. The basic steps for recording overflow, underflow and invalid operations are presented in the flowchart in Figure 3.9. Also, the basic steps for performing multiplication, based on the work in [25], and for recording overflow, underflow and invalid operations are given below:

1. Accept normalized operands.
2. Determine the hidden bit from the biased exponent encoding.
3. Check for invalid combinations or NaN operands and determine the result accordingly.
4. Check for zero operands. If either one or both of the operands are zero, then the product is zero.
5. Determine the sign of the product.
6. Add the exponents (see the sketch after this list).
7. Multiply the significands.
8. Normalize the product.
9. Check for exponent overflow or underflow.
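For step 6, the biased single precision exponents must be added and the bias of 127 removed once, after which the true exponent is checked against the representable range. The sketch below shows one illustrative way to do this with a widened signed intermediate; it is not the thesis's code, and the names are assumptions.

// Illustrative exponent handling for single precision multiplication.
// Names are assumptions, not the thesis's code.
module mul_exponent (
    input  wire [7:0] exp_a, exp_b,   // biased exponents of the operands
    output wire [7:0] exp_p,          // biased exponent of the product
    output wire       overflow,       // true exponent too large
    output wire       underflow       // true exponent too small (zero/subnormal range)
);
    // exp_a + exp_b - 127 evaluated in a 10-bit signed intermediate so the
    // range check itself cannot wrap: the value lies between -127 and +383.
    wire signed [9:0] exp_sum = $signed({2'b00, exp_a}) + $signed({2'b00, exp_b}) - 10'sd127;
    assign overflow  = (exp_sum > 10'sd254);   // biased 255 is reserved for Inf/NaN
    assign underflow = (exp_sum < 10'sd1);     // biased 0 is reserved for zero/subnormals
    assign exp_p     = exp_sum[7:0];           // meaningful only when neither flag is set
endmodule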

[Figure 3.9 is a flowchart: normalized operands are read in; invalid or NaN operands cause the invalid operation to be recorded; a zero operand sets the product to zero; otherwise the product's significand and exponent are computed, the result is normalized, and any exponent overflow or underflow is recorded.]

Figure 3.9: Flowchart describing process for recording overflow, underflow and invalid operation in multiplication in FPU

3.4.3 Division

Floating-point division is implemented by subtracting the exponents and performing division on the significands. The division operation can cause an exponent overflow or underflow [26], thereby signifying that the quotient is too big or too small to be represented by its floating-point variable. In case of underflow, the result can be recorded as a subnormal number.

The FPU records an invalid operation when one or both operands are NaN. The FPU also records the invalid division operations specified by the IEEE 754-2008 standard; these are operations of the forms (±0)/(±0) and (±∞)/(±∞) [23]. We have implemented the sequential shift-add/subtract nonrestoring division algorithm in Verilog to perform the division operation because it is simple to implement. The algorithm has been explained in Section 3.3.3. This algorithm is able to perform division on both positive and negative divisors and dividends, as floating-point signed numbers are represented in the sign-magnitude format. In our implementation the dividend conforms to the double precision format and the divisor conforms to the single precision format. The quotient is represented in the single precision format.
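The invalid cases and the divide-by-zero condition listed above reduce to a few combinational checks once the operands have been classified, for instance with flags like those in the earlier sp_classify sketch. The module below is an illustrative sketch; the names are assumptions, not the thesis's.

// Illustrative invalid-operation and divide-by-zero checks for division.
// Classification flags are assumed to come from an unpacking stage.
module div_exceptions (
    input  wire a_zero, a_inf, a_nan,   // dividend classification
    input  wire b_zero, b_inf, b_nan,   // divisor classification
    output wire invalid,                // (+/-0)/(+/-0), (+/-inf)/(+/-inf), or NaN operand
    output wire div_by_zero             // finite, non-zero dividend divided by zero
);
    assign invalid     = (a_zero & b_zero) | (a_inf & b_inf) | a_nan | b_nan;
    assign div_by_zero = b_zero & ~a_zero & ~a_inf & ~a_nan;
endmodule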

Significand overflow occurs if the higher-order half of the dividend is greater than the higher-order half of the divisor. This overflow can be prevented by right-shifting the dividend and incrementing its exponent by one [26]. However, after this step the exponent has to be checked for overflow. This logic is described in flowchart form in Figure 3.10.

[Figure 3.10 is a flowchart: if the higher-order half of the dividend is greater than the higher-order half of the divisor, the dividend's significand is right-shifted and its exponent is incremented by one; if this increment causes an exponent overflow, the overflow bit is set.]

Figure 3.10: Logic for determining overflow during floating-point division

The FPU records exponent overflow and underflow using overflow and underflow bits. Invalid operations are recorded by using a separate invalid bit. The basic steps for recording overflow, underflow and invalid operations are presented in the flowchart in Figure 3.11. Also, the basic steps for performing division, based on the work in [26], and for recording overflow, underflow and invalid operations are given below:

1. Accept normalized operands.
2. Determine the hidden bit from the biased exponent encoding.

3. Check for invalid combinations of operands.
4. If the divisor is zero, set the quotient and remainder to zero.
5. Determine the sign of the quotient.
6. Check for division overflow by subtracting the higher-order half of the divisor from the higher-order half of the dividend.
7. In case of overflow, right-shift the dividend. Check for exponent overflow.
8. Subtract the exponents to determine the exponent of the quotient.
9. Check for overflow or underflow.
10. Divide the significands.
11. Normalize the result.
12. Check for overflow or underflow.

[Figure 3.11 is a flowchart: normalized operands are read in; invalid or NaN operands cause the invalid operation to be recorded; a zero dividend sets the quotient to zero; otherwise significand overflow is determined by comparing the higher-order halves of the dividend and divisor and, if necessary, the dividend is right-shifted and the exponent checked for overflow; the significands are then divided, the result is normalized, and any exponent overflow or underflow is recorded.]

Figure 3.11: Flowchart describing process for recording overflow, underflow and invalid operation in division in FPU

3.5 ALU and FPU Implementation on Altera's DE2-115 Board

The ALU and FPU designs have been implemented in the Verilog hardware description language, as Verilog is portable. The ALU and FPU have been designed as combinational circuits. A combination of dataflow modeling and behavioral modeling code has been used for their Verilog implementation. The Verilog code is synthesizable and has been downloaded onto an FPGA board. For this study the designs have been downloaded and verified on Altera's DE2-115 board. This board has a Cyclone IV E FPGA chip, which is used in a wide variety of general logic applications. This implementation also demonstrates the application of the ALU and FPU designs in embedded systems. As an example, our ALU and FPU designs can be used in a general-purpose processor design implemented on Altera's FPGAs.

Altera's DE2-115 board was primarily chosen for this implementation as it has sufficient input and output features to verify the design. The board's LCD display module, LEDs, seven-segment display, push buttons and toggle switches were employed for input and output to the board. The clock speed of this board was not an issue, as the ALU and FPU have combinational designs. However, if there is a requirement to use the design at an increased clock speed, faster arithmetic algorithms can be used instead of the more simplistic algorithms employed in the current design.

4. RESULTS

This chapter presents summarized results from testing the addition, subtraction, multiplication and division operations performed by our ALU and FPU. Results are presented with the ALU and FPU set in each one of their three modes of operation. The chapter is divided into two sections. Section 4.1 presents results obtained from thoroughly testing the functionality of the ALU. It is divided into separate subsections presenting the results obtained from addition, subtraction, multiplication and division on integer numbers. Each subsection also describes which test cases were selected and why. Section 4.2 presents results obtained from thoroughly testing the functionality of the FPU. It is divided into separate subsections presenting the results obtained from addition, subtraction, multiplication and division on floating-point numbers. Each subsection also describes which test cases were selected and why.

4.1 Results of Arithmetic Operations Performed by the ALU

4.1.1 Addition

An addition operation on signed integer variables can cause an overflow as well as an underflow [6]. The testing of the addition operation has been performed on signed integer variables. The signed variables are represented in the 2's complement format. The operands are named A and B and are 8 bits wide. In this chapter, a '-' sign is used for denoting an operand holding a negative value. For example, if operand B has a negative value then it is denoted by -B. The sum generated from addition is represented by a variable named Sum, and it is also 8 bits wide. The overflow bit is named Overflow, and it is used to record both overflows and underflows. Also, when the ALU operates in the third mode, a trap bit is employed to record the user's choice to enable a trap upon