
Part III The Arithmetic/Logic Unit
Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 1

About This Presentation
This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. Behrooz Parhami
Edition: First
Released: July 2003
Revised: July 2004, July 2005, Mar. 2006, Jan. 2007, Jan. 2008, Jan. 2009, Jan. 2011, Oct. 2014

III The Arithmetic/Logic Unit
Overview of computer arithmetic and ALU design:
  Review representation methods for signed integers
  Discuss algorithms & hardware for arithmetic ops
  Consider floating-point representation & arithmetic
Topics in This Part
  Chapter 9 Number Representation
  Chapter 10 Adders and Simple ALUs
  Chapter 11 Multipliers and Dividers
  Chapter 12 Floating-Point Arithmetic

Preview of Arithmetic Unit in the Data Path
[Fig. 13.3 Key elements of the single-cycle MicroMIPS data path. Stages shown: instruction fetch, reg access / decode, ALU operation, data access, register writeback. Control signals shown: Br&Jump, RegDst, RegWrite, ALUSrc, ALUFunc, DataRead, DataWrite, RegInSrc.]

9 Number Representation
Arguably the most important topic in computer arithmetic:
  Affects system compatibility and ease of arithmetic
  Two's complement, flp, and unconventional methods
Topics in This Chapter
  9.1 Positional Number Systems
  9.2 Digit Sets and Encodings
  9.3 Number-Radix Conversion
  9.4 Signed Integers
  9.5 Fixed-Point Numbers
  9.6 Floating-Point Numbers

Unsigned Binary Integers
[Figure 9.1 Schematic representation of 4-bit code for integers in [0, 15]. Inside the dial: natural number; outside: 4-bit encoding. Turn x notches counterclockwise to add x; turn y notches clockwise to subtract y.]

9.2 Digit Sets and Encodings
Conventional and unconventional digit sets
  Decimal digits in [0, 9]; 4-bit BCD, 8-bit ASCII
  Hexadecimal, or hex for short: digits 0-9 & a-f
  Conventional ternary digit set in [0, 2]
  Conventional digit set for radix r is [0, r − 1]
  Symmetric ternary digit set in [−1, 1]
  Conventional binary digit set in [0, 1]
  Redundant digit set [0, 2], encoded in 2 bits
  (0 2 1 1 0)two and (1 0 1 0 2)two both represent 22
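A positional digit vector can be evaluated the same way whatever its digit set is. The following is a minimal sketch (the helper name `digit_value` is hypothetical, not from the textbook) showing that two different redundant-binary vectors with digits in [0, 2] can denote the same number, and that a symmetric ternary vector evaluates with the same rule.

```python
def digit_value(digits, radix=2):
    """Evaluate a digit vector, most significant digit first.

    Digits may lie outside [0, radix - 1], which is what makes
    redundant and symmetric digit sets work with the same formula.
    """
    total = 0
    for d in digits:
        total = total * radix + d
    return total


# Two redundant binary encodings (digit set [0, 2]) of the same value.
print(digit_value([0, 2, 1, 1, 0]))
print(digit_value([1, 0, 1, 0, 2]))
# Symmetric ternary digits in [-1, 1].
print(digit_value([1, -1, 0], radix=3))
```

Redundancy means a value no longer has a unique encoding, which is the price paid for the carry-free addition schemes such digit sets permit.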

9.4 Signed Integers
So far, we have dealt with representing the natural numbers
Signed or directed whole numbers = integers {..., −3, −2, −1, 0, 1, 2, 3, ...}
Signed-magnitude representation
  +27 in 8-bit signed-magnitude binary code: 0 0011011
  −27 in 8-bit signed-magnitude binary code: 1 0011011
  −27 in 2-digit decimal code with BCD digits: 1 0010 0111
Biased representation
  Represent the interval of numbers [−N, P] by the unsigned interval [0, P + N]; i.e., by adding N to every number

Two's-complement Representation
With k bits, numbers in the range $[-2^{k-1}, 2^{k-1} - 1]$ are represented. Negation is performed by inverting all bits and adding 1.
[Figure 9.5 Schematic representation of 4-bit 2's-complement code for integers in [−8, +7]. Turn x notches counterclockwise to add x; turn 16 − y notches counterclockwise to add −y (subtract y).]
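The range and negation rule above can be checked with a short sketch. The function names are hypothetical, and bit patterns are modeled as non-negative Python integers masked to k bits.

```python
def to_twos(value, k):
    """Encode a signed integer as a k-bit 2's-complement bit pattern."""
    lo, hi = -(1 << (k - 1)), (1 << (k - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    return value & ((1 << k) - 1)


def from_twos(bits, k):
    """Decode a k-bit 2's-complement bit pattern to a signed integer."""
    return bits - (1 << k) if bits & (1 << (k - 1)) else bits


def negate(bits, k):
    """Negate by inverting all bits and adding 1 (modulo 2**k)."""
    mask = (1 << k) - 1
    return ((bits ^ mask) + 1) & mask


print(format(to_twos(-3, 4), "04b"))
print(from_twos(negate(to_twos(5, 4), 4), 4))
```

Note that negating the most negative value (−8 with k = 4) returns the same bit pattern, because +8 lies outside the representable range.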

Two's-complement Addition and Subtraction
[Figure 9.6 Binary adder used as 2's-complement adder/subtractor. The k-bit inputs are x and either y or its bitwise complement y′, selected by the Add′Sub control signal, which also drives c_in; the k-bit output is x ± y, with carry-out c_out.]
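A behavioral sketch of the adder/subtractor idea, assuming the usual arrangement in which the subtract control both complements y and supplies a carry-in of 1 (so that x − y is computed as x + y′ + 1). The function name and the default width are illustrative choices.

```python
def add_sub(x, y, sub, k=4):
    """Model a k-bit adder/subtractor on 2's-complement bit patterns.

    x, y: k-bit patterns (0 .. 2**k - 1); sub: False to add, True to subtract.
    Returns (k-bit result pattern, carry-out).
    """
    mask = (1 << k) - 1
    y_in = (y ^ mask) if sub else y      # select y or its bitwise complement
    c_in = 1 if sub else 0               # the control signal doubles as c_in
    total = x + y_in + c_in
    return total & mask, total >> k


print(add_sub(0b0101, 0b0011, sub=False))
print(add_sub(0b0101, 0b0011, sub=True))
print(add_sub(0b0011, 0b0101, sub=True))
```

The last call yields the pattern 1110, which is −2 in 4-bit 2's complement.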

9.5 Fixed-Point Numbers
Positional representation: k whole and l fractional digits
Value of a number: $x = (x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-l})_r = \sum_i x_i r^i$
For example: $2.375 = (10.011)_{two} = (1 \times 2^1) + (0 \times 2^0) + (0 \times 2^{-1}) + (1 \times 2^{-2}) + (1 \times 2^{-3})$
Numbers in the range $[0, r^k - ulp]$ are representable, where $ulp = r^{-l}$
Fixed-point arithmetic is the same as integer arithmetic (radix point implied, not explicit)
Two's complement properties (including sign change) hold here as well:
$(01.011)_{2's-compl} = (-0 \times 2^1) + (1 \times 2^0) + (0 \times 2^{-1}) + (1 \times 2^{-2}) + (1 \times 2^{-3}) = +1.375$
$(11.011)_{2's-compl} = (-1 \times 2^1) + (1 \times 2^0) + (0 \times 2^{-1}) + (1 \times 2^{-2}) + (1 \times 2^{-3}) = -0.625$
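Because the radix point is only implied, a fixed-point value is just the integer value of the bit string scaled by $2^{-l}$. A minimal sketch (the helper name and string-based interface are illustrative assumptions):

```python
def fixed_value(bits, l, signed=False):
    """Value of a binary fixed-point string with l fractional bits.

    bits: string of '0'/'1' without the radix point, e.g. '10011' for 10.011.
    signed: interpret the string as 2's complement when True.
    """
    n = int(bits, 2)
    if signed and bits[0] == "1":
        n -= 1 << len(bits)          # the leading bit carries negative weight
    return n / (1 << l)              # scale by ulp = 2**-l


print(fixed_value("10011", 3))
print(fixed_value("01011", 3, signed=True))
print(fixed_value("11011", 3, signed=True))
```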

Fixed-Point 2's-complement Numbers
[Figure 9.7 Schematic representation of 4-bit 2's-complement encoding for (1 + 3)-bit fixed-point numbers in the range [−1, +7/8].]

9.6 Floating-Point Numbers
Useful for applications where very large and very small numbers are needed simultaneously
Fixed-point representation must sacrifice precision for small values to represent large values
  x = (0000 0000 . 0000 1001)two  Small number
  y = (1001 0000 . 0000 0000)two  Large number
Neither $y^2$ nor $y / x$ is representable in the format above
Floating-point representation is like scientific notation:
  −20 000 000 = $-2 \times 10^7$
  +0.000 000 007 = $+7 \times 10^{-9}$ (also written 7E−9)
Parts of the notation: sign, significand, exponent base, exponent

ANSI/IEEE Standard Floating-Point Format (IEEE 754)
Revision (IEEE 754R) was completed in 2008: The revised version includes 16-bit and 128-bit binary formats, as well as 64- and 128-bit decimal formats
Short (32-bit) format
  Sign: 1 bit
  Exponent: 8 bits, bias = 127, −126 to 127
  Significand: 23 bits for fractional part (plus hidden 1 in integer part)
Long (64-bit) format
  Sign: 1 bit
  Exponent: 11 bits, bias = 1023, −1022 to 1023
  Significand: 52 bits for fractional part (plus hidden 1 in integer part)
Short exponent range is −127 to 128, but the two extreme values are reserved for special operands (similarly for the long format)
Figure 9.8 The two ANSI/IEEE standard floating-point formats.
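The short-format fields can be inspected directly. The sketch below uses Python's standard `struct` module to reinterpret a 32-bit float as an integer and split it into sign, biased exponent, and fraction; the function name is an illustrative choice.

```python
import struct


def decode_single(x):
    """Split a number, rounded to IEEE 754 single precision, into its fields.

    Returns (sign bit, 8-bit biased exponent, 23-bit fraction field).
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction


sign, e, f = decode_single(-6.5)      # -6.5 = -1.625 * 2**2
print(sign, e - 127, 1 + f / 2**23)   # unbiased exponent, significand with hidden 1
```

For normalized numbers the value is recovered as (−1)^sign × 2^(e − 127) × (1 + f / 2^23); the biased exponents 0 and 255 are the reserved extremes.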

10 Adders and Simple ALUs
Addition is the most important arithmetic operation in computers:
  Even the simplest computers must have an adder
  An adder, plus a little extra logic, forms a simple ALU
Topics in This Chapter
  10.1 Simple Adders
  10.2 Carry Propagation Networks
  10.3 Counting and Incrementation
  10.4 Design of Fast Adders
  10.5 Logic and Shift Operations
  10.6 Multifunction ALUs

10.1 Simple Adders
Half-adder (HA): inputs x, y; outputs c, s
  x y | c s
  0 0 | 0 0
  0 1 | 0 1
  1 0 | 0 1
  1 1 | 1 0
Digit-set interpretation: {0, 1} + {0, 1} = {0, 2} + {0, 1}
Full-adder (FA): inputs x, y, c_in; outputs c_out, s
  x y c_in | c_out s
  0 0 0    | 0     0
  0 0 1    | 0     1
  0 1 0    | 0     1
  0 1 1    | 1     0
  1 0 0    | 0     1
  1 0 1    | 1     0
  1 1 0    | 1     0
  1 1 1    | 1     1
Digit-set interpretation: {0, 1} + {0, 1} + {0, 1} = {0, 2} + {0, 1}
Figures 10.1/10.2 Binary half-adder (HA) and full-adder (FA).

Ripple-Carry Adder: Slow But Simple
[Figure 10.4 Ripple-carry binary adder with 32-bit inputs and output. A chain of 32 full adders; the critical path runs along the carry chain from c_in = c_0 to c_out = c_32.]
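A bit-level sketch of the ripple-carry structure: one full-adder function, applied from the least significant position upward, with each stage's carry-out feeding the next stage's carry-in. Function names and the integer-based interface are illustrative assumptions.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum bit, carry-out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)
    return s, c_out


def ripple_add(x, y, c_in=0, k=32):
    """Add two k-bit patterns by chaining k full adders.

    Returns (k-bit sum pattern, final carry-out).
    """
    total, carry = 0, c_in
    for i in range(k):                       # the carry ripples from bit 0 upward
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total, carry


print(ripple_add(0xFFFFFFFF, 1))
```

The loop is sequential for the same reason the hardware is slow: stage i cannot finish until the carry from stage i − 1 is known.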

10.2 Carry Propagation Networks
  g_i p_i  Carry is:
  0   0    annihilated or killed
  0   1    propagated
  1   0    generated
  1   1    (impossible)
$g_i = x_i y_i$
$p_i = x_i \oplus y_i$
[Figure 10.5 The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits. The carry network takes $g_{k-1}, p_{k-1}, \ldots, g_0, p_0$ and $c_0$ as inputs and produces the carries $c_k, c_{k-1}, \ldots, c_1$; each sum bit $s_i$ is formed from $p_i$ and $c_i$.]

10.4 Design of Fast Adders
Carries can be computed directly without propagation
For example, by unrolling the equation for $c_3$, we get:
$c_3 = g_2 \lor p_2 c_2 = g_2 \lor p_2 g_1 \lor p_2 p_1 g_0 \lor p_2 p_1 p_0 c_0$
We define generate and propagate signals for a block extending from bit position a to bit position b as follows:
$g_{[a,b]} = g_b \lor p_b g_{b-1} \lor p_b p_{b-1} g_{b-2} \lor \ldots \lor p_b p_{b-1} \ldots p_{a+1} g_a$
$p_{[a,b]} = p_b p_{b-1} \ldots p_{a+1} p_a$
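The block signals can be checked against ordinary addition. In this sketch the block generate and propagate signals are built up one position at a time, using $g_{[a,i]} = g_i \lor p_i g_{[a,i-1]}$, which expands to the unrolled sum of products; hardware would evaluate the unrolled form in parallel. All function names are illustrative assumptions.

```python
def gp_signals(x, y, k):
    """Per-bit generate (AND) and propagate (XOR) signals, index 0 = LSB."""
    g = [(x >> i) & (y >> i) & 1 for i in range(k)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(k)]
    return g, p


def block_gp(g, p, a, b):
    """Generate and propagate signals for the block of positions a..b."""
    big_g, big_p = g[a], p[a]
    for i in range(a + 1, b + 1):
        big_g = g[i] | (p[i] & big_g)
        big_p = p[i] & big_p
    return big_g, big_p


def carry_into(i, g, p, c0):
    """Carry c_i computed directly from block [0, i-1] and c_0."""
    if i == 0:
        return c0
    big_g, big_p = block_gp(g, p, 0, i - 1)
    return big_g | (big_p & c0)


def cla_add(x, y, c0=0, k=4):
    """Add using directly computed carries; returns (sum pattern, c_k)."""
    g, p = gp_signals(x, y, k)
    carries = [carry_into(i, g, p, c0) for i in range(k + 1)]
    total = sum((p[i] ^ carries[i]) << i for i in range(k))
    return total, carries[k]


print(cla_add(0b1011, 0b0110))
```

Each carry here depends only on the g and p signals and $c_0$, never on another carry, which is the property that removes the ripple delay.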

Carry-Lookahead Logic with 4-Bit Block
[Figure 10.13 Blocks needed in the design of carry-lookahead adders with four-way grouping of bits. Inputs: $p_{i+3}, g_{i+3}, \ldots, p_i, g_i$ and $c_i$. Outputs: block signals $p_{[i,i+3]}$ and $g_{[i,i+3]}$, and intermediate carries $c_{i+3}, c_{i+2}, c_{i+1}$.]

Carry Look-Ahead Adder

16-bit CLA Adder

10.5 Logic and Shift Operations
Conceptually, shifts can be implemented by multiplexing
[Figure 10.15 Multiplexer-based logical shifting unit. A 6-bit code specifying shift direction and amount (1 bit for right/left, 5 bits for the shift amount) drives a multiplexer with inputs 0 through 63; the inputs are the 32-bit right-shifted and left-shifted versions of x[31, 0].]

Arithmetic Shifts
Purpose: Multiplication and division by powers of 2
  sra $t0,$s1,2 # $t0 ← ($s1) right-shifted by 2
  srav $t0,$s1,$s0 # $t0 ← ($s1) right-shifted by ($s0)
[Figure 10.16 The two arithmetic shift instructions of MiniMIPS. Both use the R format, with field boundaries at bits 31, 25, 20, 15, 10, 5, 0 (op, rs, rt, rd, sh, fn). For sra: op = ALU instruction, rs unused, rt = source register, rd = destination register, sh = shift amount, fn: sra = 3. For srav: op = ALU instruction, rs = amount register, rt = source register, rd = destination register, sh unused, fn: srav = 7.]

Practical Shifting in Multiple Stages
[Figure 10.17 Multistage shifting in a barrel shifter. (a) Single-bit shifter: a 4-input multiplexer with a 2-bit control selecting 00 No shift, 01 Logical left, 10 Logical right, 11 Arith right. (b) Shifting by up to 7 bits: three cascaded stages performing a (0 or 4)-bit shift on x[31, 0] to give y[31, 0], a (0 or 2)-bit shift to give z[31, 0], and a (0 or 1)-bit shift.]
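The multistage idea can be modeled in software: each stage either passes its input through or shifts it by a fixed power of two, and the bits of the shift amount decide which stages are active. This is a behavioral sketch for 32-bit words and shift amounts up to 7; the function names are assumptions, and the mode numbers follow the 2-bit codes listed in the figure.

```python
MASK32 = 0xFFFFFFFF


def shift_stage(x, amount, mode):
    """One stage shifting a 32-bit pattern by a fixed amount.

    mode: 0 = no shift, 1 = logical left, 2 = logical right, 3 = arithmetic right.
    """
    if mode == 0:
        return x
    if mode == 1:
        return (x << amount) & MASK32
    if mode == 2:
        return x >> amount
    # Arithmetic right: vacated positions are filled with copies of the sign bit.
    fill = (MASK32 << (32 - amount)) & MASK32 if x >> 31 else 0
    return (x >> amount) | fill


def barrel_shift(x, amount, mode):
    """Shift by 0..7 bits using (0 or 4)-, (0 or 2)-, and (0 or 1)-bit stages."""
    for stage in (4, 2, 1):
        if amount & stage:
            x = shift_stage(x, stage, mode)
    return x


print(hex(barrel_shift(0x80000000, 5, 3)))
```

Adding (0 or 8)- and (0 or 16)-bit stages in the same way would extend the sketch to all 32-bit shift amounts with only five stages.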

10.6 Multifunction ALUs
[Figure: General structure of a simple arithmetic/logic unit. Operand 1 and Operand 2 feed an arithmetic unit (arith fn: add, sub, ...) and a logic unit (logic fn: AND, OR, ...); a multiplexer controlled by "select fn type (logic or arith)" chooses the result.]

An ALU for MiniMIPS
[Figure 10.19 A multifunction ALU with 8 control signals (2 for function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation. Outputs: 32-bit result s, Zero (from a 32-input NOR), and Ovfl.
  Function class: 00 Shift, 01 Set less, 10 Arithmetic, 11 Logic
  Shift function: 00 No shift, 01 Logical left, 10 Logical right, 11 Arith right
  Shift amount: constant or variable (Const′Var), 5 bits
  Arithmetic: Add′Sub
  Logic function: 00 AND, 01 OR, 10 XOR, 11 NOR]

11 Multipliers and Dividers
Modern processors perform many multiplications & divisions:
  Encryption, image compression, graphic rendering
  Hardware vs programmed shift-add/sub algorithms
Topics in This Chapter
  11.1 Shift-Add Multiplication
  11.2 Hardware Multipliers
  11.3 Programmed Multiplication
  11.4 Shift-Subtract Division
  11.5 Hardware Dividers

11.1 Shift-Add Multiplication
[Figure 11.1 Multiplication of 4-bit numbers in dot notation. The multiplicand x and multiplier y give a partial-products bit-matrix with rows $y_0 x 2^0$, $y_1 x 2^1$, $y_2 x 2^2$, $y_3 x 2^3$, whose sum is the product z.]
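The partial-products view translates directly into a loop: for each multiplier bit $y_j$ that is 1, add the multiplicand shifted left by j positions. A minimal sketch for unsigned operands (the function name is an illustrative choice):

```python
def shift_add_multiply(x, y, k=4):
    """Multiply unsigned k-bit x by unsigned k-bit y by summing partial products."""
    z = 0
    for j in range(k):
        if (y >> j) & 1:          # partial product y_j * x * 2**j
            z += x << j
    return z                      # fits in 2k bits


print(shift_add_multiply(0b1010, 0b0011))
```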

11.2 Hardware Multipliers
[Figure 11.4 Hardware multiplier based on the shift-add algorithm. Components: shifting multiplier register y, double-width partial product register $z^{(j)}$ with shift, multiplicand register x, a multiplexer (0 or x) enabled and selected by multiplier bit $y_j$, and an adder with c_in, c_out, and Add′Sub control.]

Array Multipliers
[Figure 11.7 Array multiplier for 4-bit unsigned operands. Rows of MA cells, one row per multiplier bit $y_0$ to $y_3$, followed by a final row of FA and HA cells, produce product bits $z_0$ to $z_7$. The array is obtained from the original dot-notation representation of multiplication by straightening the dots.]

11.3 Programmed Multiplication
MiniMIPS instructions related to multiplication
  mult $s0,$s1 # set Hi,Lo to ($s0) × ($s1); signed
  multu $s2,$s3 # set Hi,Lo to ($s2) × ($s3); unsigned
  mfhi $t0 # set $t0 to (Hi)
  mflo $t1 # set $t1 to (Lo)
Example 11.3 Finding the 32-bit product of 32-bit integers in MiniMIPS
  Multiply; result will be obtained in Hi,Lo
  For unsigned multiplication: Hi should be all-0s and Lo holds the 32-bit result
  For signed multiplication: Hi should be all-0s or all-1s, depending on the sign bit of Lo
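The check described in Example 11.3 can be modeled outside the processor. This sketch mimics the Hi,Lo register pair with Python integers; the function names are hypothetical and the code is a behavioral model, not MiniMIPS assembly.

```python
def mult_hi_lo(a, b):
    """Return (Hi, Lo): the upper and lower 32 bits of the 64-bit product.

    Negative products are represented in 64-bit 2's complement.
    """
    product = (a * b) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & 0xFFFFFFFF


def product_fits_in_32_bits(hi, lo, signed):
    """Apply the test from the example to decide whether Lo alone is the product."""
    if signed:
        expected_hi = 0xFFFFFFFF if lo >> 31 else 0   # Hi must replicate Lo's sign bit
        return hi == expected_hi
    return hi == 0


hi, lo = mult_hi_lo(-3, 5)
print(hex(hi), hex(lo), product_fits_in_32_bits(hi, lo, signed=True))
```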

11.4 Shift-Subtract Division
[Figure 11.9 Division of an 8-bit number by a 4-bit number in dot notation. Divisor x, quotient y, dividend z; the subtracted bit-matrix has rows $y_3 x 2^3$, $y_2 x 2^2$, $y_1 x 2^1$, $y_0 x 2^0$, leaving the remainder s.]
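The dot-notation picture corresponds to trying, from the most significant quotient bit down, to subtract the divisor shifted by j positions and keeping the subtraction only when the result is non-negative. A sketch for unsigned operands, using the figure's names (z dividend, x divisor, y quotient, s remainder); the overflow check for quotients wider than k bits is an added assumption.

```python
def shift_subtract_divide(z, x, k=4):
    """Divide an unsigned 2k-bit dividend z by an unsigned k-bit divisor x.

    Returns (quotient y, remainder s) with z == x * y + s and 0 <= s < x.
    """
    if x == 0:
        raise ZeroDivisionError("divisor is zero")
    if (z >> k) >= x:
        raise OverflowError("quotient does not fit in k bits")
    s, y = z, 0
    for j in range(k - 1, -1, -1):
        trial = s - (x << j)      # trial difference
        if trial >= 0:            # quotient bit y_j = 1: keep the difference
            s = trial
            y |= 1 << j
    return y, s


print(shift_subtract_divide(117, 10))
```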

11.5 Hardware Dividers
[Figure 11.12 Hardware divider based on the shift-subtract algorithm. Components: shifting quotient register y, partial remainder register $z^{(j)}$ (initially z) with shift and load, divisor register x, quotient digit selector producing $y_{k-j}$, a multiplexer, and an adder that forms the trial difference with c_in = 1 (always subtract).]

Array Dividers
[Figure 11.14 Array divider for 8/4-bit unsigned integers. Rows of MS cells, one row per quotient bit $y_3$ to $y_0$, take dividend bits $z_7$ to $z_0$ and divisor bits $x_3$ to $x_0$ and produce remainder bits $s_3$ to $s_0$. The array is obtained from the original dot notation for division by straightening the dots.]

12 Floating-Point Arithmetic
Floating-point is no longer reserved for high-end machines:
  Multimedia and signal processing require flp arithmetic
  Details of standard flp format and arithmetic operations
Topics in This Chapter
  12.2 Special Values and Exceptions
  12.3 Floating-Point Addition
  12.4 Other Floating-Point Operations
  12.6 Result Precision and Errors

12.2 Special Values and Exceptions
Zeros, infinities, and NaNs (not a number)
  ±0: Biased exponent = 0, significand = 0 (no hidden 1)
  ±∞: Biased exponent = 255 (short) or 2047 (long), significand = 0
  NaN: Biased exponent = 255 (short) or 2047 (long), significand ≠ 0
Arithmetic operations with special operands
  (+0) + (+0) = (+0) − (−0) = +0
  (+0) × (+5) = +0
  (+0) / (−5) = −0
  (+∞) + (+∞) = +∞
  x − (+∞) = −∞
  (+∞) × x = ±∞, depending on the sign of x
  x / (+∞) = ±0, depending on the sign of x
  √(+∞) = +∞
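Most of these rules can be observed with ordinary Python floats, which are IEEE 754 long-format (64-bit) numbers on mainstream platforms. The sketch below is only an illustration of the listed identities; `sign_bit` is a hypothetical helper used to tell −0 from +0, and the last entry (∞ − ∞) is an extra case not listed on the slide.

```python
import math


def sign_bit(v):
    """Return -1.0 or +1.0, so that -0.0 and +0.0 can be told apart."""
    return math.copysign(1.0, v)


inf = math.inf
results = {
    "(+0) - (-0)": 0.0 - (-0.0),
    "(+0) / (-5)": 0.0 / -5.0,
    "(+inf) + (+inf)": inf + inf,
    "x - (+inf)": 5.0 - inf,
    "(+inf) * x, x < 0": inf * -2.0,
    "x / (+inf), x < 0": -3.0 / inf,
    "sqrt(+inf)": math.sqrt(inf),
    "(+inf) - (+inf)": inf - inf,   # extra case: the result is a NaN
}
for name, v in results.items():
    print(f"{name:20s} -> {v!r}")
```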

12.3 Floating-Point Addition
Numbers to be added:
  $x = 2^5 \times 1.00101101$
  $y = 2^1 \times 1.11101101$  (operand with smaller exponent, to be preshifted)
Operands after alignment shift:
  $x = 2^5 \times 1.00101101$
  $y = 2^5 \times 0.000111101101$  (the last four bits are extra bits to be rounded off)
Result of addition:
  $s = 2^5 \times 1.010010111101$
  $s = 2^5 \times 1.01001100$  (rounded sum)
Figure 12.4 Alignment shift and rounding in floating-point addition.
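The alignment-and-rounding steps can be traced with integer arithmetic. This sketch handles only positive, normalized operands with 8 fractional bits, using the operands $2^5 \times 1.00101101$ and $2^1 \times 1.11101101$; round-to-nearest with ties to even is an assumed rounding mode, and the function name and interface are illustrative.

```python
def fp_add(e1, s1, e2, s2, frac_bits=8):
    """Add two positive normalized operands of the form 2**e * (s / 2**frac_bits).

    s1 and s2 are integers holding the significand bits, hidden 1 included.
    Returns (exponent, significand integer) of the rounded, normalized sum.
    """
    if e1 < e2:                                   # make operand 1 the larger one
        e1, s1, e2, s2 = e2, s2, e1, s1
    shift = e1 - e2                               # alignment shift amount
    exact = (s1 << shift) + s2                    # exact sum, in units of 2**(e2 - frac_bits)
    drop = exact.bit_length() - (frac_bits + 1)   # extra bits to be rounded off
    q, r = divmod(exact, 1 << drop)
    half = (1 << drop) >> 1
    if drop and (r > half or (r == half and q & 1)):
        q += 1                                    # round to nearest, ties to even
        if q == 2 << frac_bits:                   # rounding carried out to 10.000...
            q >>= 1
            drop += 1
    return e2 + drop, q


e, s = fp_add(5, 0b100101101, 1, 0b111101101)
print(e, format(s, "09b"))
```

Keeping the shifted-out bits in `exact` until the single final rounding plays the role of the extra bits that the hardware retains during alignment.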

Hardware for Floating-Point Addition
[Figure 12.5 Simplified schematic of a floating-point adder. Input 1 and Input 2 are unpacked into signs, exponents, and significands; control & sign logic (with the Add′Sub input) directs a possible swap & complement; the significands are aligned, added, and then normalized & rounded; sign, exponent, and significand are packed to form the output.]

12.4 Other Floating-Point Operations
Floating-point multiplication
  $(\pm 2^{e1} s1) \times (\pm 2^{e2} s2) = \pm 2^{e1+e2} (s1 \times s2)$
  Product of significands in [1, 4)
  If product is in [2, 4), halve to normalize (increment exponent)
  Overflow (underflow) possible
Floating-point division
  $(\pm 2^{e1} s1) / (\pm 2^{e2} s2) = \pm 2^{e1-e2} (s1 / s2)$
  Ratio of significands in (1/2, 2)
  If ratio is in (1/2, 1), double to normalize (decrement exponent)
  Overflow (underflow) possible
Floating-point square-rooting
  $(2^e s)^{1/2} = 2^{e/2} (s)^{1/2}$ when e is even
  $(2^e s)^{1/2} = 2^{(e-1)/2} (2s)^{1/2}$ when e is odd
  Normalization not needed

Hardware for Floating-Point Multiplication and Division
[Figure 12.6 Simplified schematic of a floating-point multiply/divide unit. Input 1 and Input 2 are unpacked into signs, exponents, and significands; control & sign logic (with the Mul′Div input) produces the sign; the significands are multiplied or divided and then normalized & rounded; sign, exponent, and significand are packed to form the output.]

The Floating-Point Unit in MiniMIPS
[Figure 5.1 Memory and processing subsystems for MiniMIPS. Memory: up to $2^{30}$ words, 4 B / location, at locations Loc 0, Loc 4, Loc 8, ..., Loc m − 8, Loc m − 4, with $m \le 2^{32}$. EIU (main proc.), the execution & integer unit: ALU (Chapter 10), integer mul/div with Hi and Lo (Chapter 11), registers $0 to $31. FPU (Coproc. 1), the floating-point unit: FP arith (Chapter 12), registers $0 to $31; pairs of registers, beginning with an even-numbered one, are used for double operands. TMU (Coproc. 0), the trap & memory unit: BadVaddr, Status, Cause, EPC.]

12.6 Result Precision and Errors
Example 12.4 Laws of algebra may not hold in floating-point arithmetic. For example, the following computations show that the associative law of addition, (a + b) + c = a + (b + c), is violated for the three numbers shown.
Computing (a + b) + c:
  Numbers to be added first: $a = -2^5 \times 1.10101011$, $b = 2^5 \times 1.10101110$
  Compute a + b: $2^5 \times 0.00000011$, so $a + b = 2^{-2} \times 1.10000000$
  $c = -2^{-2} \times 1.01100101$
  Compute (a + b) + c: $2^{-2} \times 0.00011011$
  Sum $= 2^{-6} \times 1.10110000$
Computing a + (b + c):
  Numbers to be added first: $b = 2^5 \times 1.10101110$, $c = -2^{-2} \times 1.01100101$
  Compute b + c (after preshifting c): $2^5 \times 1.101010110011011$
  $b + c = 2^5 \times 1.10101011$ (Round)
  $a = -2^5 \times 1.10101011$
  Compute a + (b + c): $2^5 \times 0.00000000$
  Sum = 0 (Normalize to special code for 0)
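The same effect is easy to reproduce in any IEEE 754 environment. The sketch below uses Python's 64-bit floats rather than the 8-fractional-bit format of the example, with operands chosen (as an assumption for illustration) so that b + c is not exactly representable.

```python
a = -(2.0 ** 53)
b = 2.0 ** 53
c = 1.0

left = (a + b) + c     # a + b is exactly 0, so the result is c
right = a + (b + c)    # b + c = 2**53 + 1 rounds back to 2**53, so the result is 0

print(left, right, left == right)
```

In both orders every individual addition is correctly rounded; the discrepancy comes solely from where the rounding happens.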

Evaluation of Elementary Functions
Approximating polynomials
  $\ln x = 2(z + z^3/3 + z^5/5 + z^7/7 + \ldots)$ where $z = (x - 1)/(x + 1)$
  $e^x = 1 + x/1! + x^2/2! + x^3/3! + x^4/4! + \ldots$
  $\cos x = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! - \ldots$
  $\tan^{-1} x = x - x^3/3 + x^5/5 - x^7/7 + x^9/9 - \ldots$
Iterative (convergence) schemes
  For example, beginning with an estimate for $x^{1/2}$, the following iterative formula provides a more accurate estimate in each step
  $q^{(i+1)} = 0.5 (q^{(i)} + x / q^{(i)})$
Table lookup (with interpolation)
  A pure table lookup scheme results in huge tables (impractical); hence, often a hybrid approach, involving interpolation, is used.
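Two of the schemes above, sketched in software: the square-root iteration and the truncated series for ln x. The starting estimate, the number of steps, and the number of series terms are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def sqrt_iterative(x, q=1.0, steps=6):
    """Refine an estimate q of sqrt(x) using q <- 0.5 * (q + x / q)."""
    for _ in range(steps):
        q = 0.5 * (q + x / q)
    return q


def ln_series(x, terms=20):
    """Approximate ln x as 2 * (z + z**3/3 + z**5/5 + ...), z = (x - 1) / (x + 1)."""
    z = (x - 1) / (x + 1)
    return 2 * sum(z ** (2 * i + 1) / (2 * i + 1) for i in range(terms))


print(sqrt_iterative(2.0), math.sqrt(2.0))
print(ln_series(2.0), math.log(2.0))
```

The iteration roughly doubles the number of correct digits per step, whereas the series gains a fixed number of bits per term, which is why convergence schemes are attractive when a divider is available.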

Function Evaluation by Table Lookup
The k-bit input x is split into an upper part $x_H$ (h bits) and a lower part $x_L$ (k − h bits). $x_H$ addresses a table for a and a table for b; a multiplier forms $b x_L$, and an adder produces the output f(x).
The best linear approximation in the subinterval selected by $x_H$ is characterized by the line equation $a + b x_L$, where a and b are read out from tables based on $x_H$.
Figure 12.12 Function evaluation by table lookup and linear interpolation.
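A software sketch of the scheme for a function on [0, 1). For simplicity the tables here hold the chord through each subinterval's endpoints rather than the best linear approximation mentioned above; that substitution, the function names, and the choice of f, k, and h are all assumptions made for illustration.

```python
import math


def build_tables(f, h):
    """Build the a (intercept) and b (slope) tables for 2**h subintervals of [0, 1)."""
    n = 1 << h
    a = [f(i / n) for i in range(n)]
    b = [(f((i + 1) / n) - f(i / n)) * n for i in range(n)]
    return a, b


def lookup(x_int, k, h, a, b):
    """Evaluate f(x_int / 2**k) as a + b * x_L using the two tables.

    The upper h bits of x_int select the table entries; the lower k - h bits
    give the offset x_L within the subinterval.
    """
    x_h = x_int >> (k - h)
    x_l = (x_int & ((1 << (k - h)) - 1)) / (1 << k)
    return a[x_h] + b[x_h] * x_l


def f(x):
    return math.sqrt(1 + x)


K, H = 16, 4
table_a, table_b = build_tables(f, H)
worst = max(abs(lookup(x, K, H, table_a, table_b) - f(x / (1 << K)))
            for x in range(1 << K))
print(worst)
```

With only 16 entries per table the approximation error stays small, whereas a pure lookup table for a 16-bit input would need 65,536 entries.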