2-2. 2-2.2 Learning Outcomes piral 2-2 Arithmetic Components and Their Efficient Implementations I understand the control inputs to counters I can design logic to control the inputs of counters to create a desired count sequence I understand how smaller adder blocks can be combined to form larger ones I can build larger arithmetic circuits from smaller building blocks I understand the timing and control input differences between asynchronous and synchronous memories 2-2.3 2-2.4 Digital ystem Design Control (CU) and Datapath Unit (DPU) paradigm eparate logic into datapath elements that operate on data and control elements that generate control signals for datapath elements Datapath: s, muxes, comparators, counters, registers (shift, with enables, etc.), memories, FIFO s Control Unit: tate machines/sequencers DATAPATH COMPONENT clk reset Control ignals Control Condition ignals Data Inputs Datapath Data Outputs
2-2.5 2-2.6 Overflow Detecting Overflow Helps Us Perform Comparison OVERFLOW & COMPARION Overflow occurs when the result of an arithmetic operation is to be represented with the given number of bits Unsigned overflow occurs when adding or subtracting unsigned numbers igned (2 s complement overflow) overflow occurs when adding or subtracting 2 s complement numbers 2-2.7 2-2.8 Unsigned Overflow 2 s Complement Overflow + 7 = 7 With 4-bit unsigned numbers we can only represent 5. Thus, we say overflow has occurred. Overflow occurs when you cross this discontinuity +2 +4 +3 + + +5 +9 Plus 7 +8 + +2 +3 +4 +5 +6 +7 5 + 7 = +2-6 + -4 = - With 4-bit 2 s complement numbers we can only represent -8 to +7. Thus, we say overflow has occurred. -4-3 -5-2 -6 - -7-8 + +7 +2 +6 +3 +5 +4 Overflow occurs when you cross this discontinuity
Testing for Overflow Most fundamental test Check if answer is (i.e. Positive + Positive yields a negative) Unsigned overflow test [Different for add or sub] Addition: If carry-out of final position equals ubtraction: If carry-out of final addition equals igned (2 s complement) overflow test [ame for add or sub] Only occurs if Alternate test: if of final column are different 2-2.9 Testing for Unsigned Overflow Unsigned Overflow has occurred if Unsigned Addition: If final carry-out = Unsigned ubtraction: If final carry-out = + - + - 2-2. 2-2. 2-2.2 Testing for 2 s Comp. Overflow Checking for Overflow 2 s Complement Overflow Occurs If Test : If pos. + pos. = neg. or neg. + neg. = pos. Test 2: If carry-in to MB position and carry-out of MB position are different Produce additional outputs to indicate if unsigned (UOV) or signed (OV) overflow has occurred + (5) (6) (-4) + (-7) X Y X Y X Y X Y + (3) (2) (-2) + (-6)
2-2.3 2-2.4 Comparison Via ubtraction COMPARION uppose we want to compare two numbers: A & B uppose we let DIFF = A-B what could the result tell us If DIFF <, then If DIFF =, then IF DIFF >, then How would we know DIFF ==? If all bits of our answer. How would we know DIFF < (i.e. negative)? igned:! (but what about overflow) Unsigned: Huh? In unsigned there are no negative results 2-2.5 2-2.6 Computing A<B from "Negative" Result Unsigned Comparator Unsigned Perform A-B If A-B would yield a negative result, this will appear as in an unsigned subtraction And we know unsigned subtraction overflow occurs if o just check if igned Perform A-B If there is no overflow (V=), simply check if But if there is overflow?? Recall overflow has the effect of flipping the sign of the result to the opposite of what it should be. o if there is overflow (V=) check is (i.e. positive) ummary: A-B is "truly" negative if: A comparator can be built by using a subtractor A=B A A[3:] A>B DIFF[3:] Res[3:] ubtractor B B[3:] C4 A<B
2-2.7 2-2.8 igned Comparator ummary A comparator can be built by using a subtractor A=B You should now be able to build: Fast s Comparators A A[3:] Res[3:] DIFF[3:] A>B B ubtractor B[3:] C4 A<B 2-2.9 Addition s 2-2.2 Be sure to connect first to + = X = Y ADDER TIMING
Timing A chain of full adders presents an interesting timing analysis problem To correctly compute its own um and Carry-out, each full adder requires the carry-out bit from the full adder Because hardware works in parallel, the full adders further down the chain may produce the outputs because the carry has not had time to to them 2-2.2 Timing Example Assume that we were adding one set of inputs and then change to a new set of inputs: Old inputs: Old inputs: + = X = Y New inputs: + = X = Y 2-2.22 C out 2-2.23 2-2.24 Timing At the time just before we enter the new input values, all carries are s New inputs: + = X = Y Time - Timing Now we enter the new inputs and all the FA s starting adding their respective inputs + = X = Y Time Old inputs: New inputs: C out Due to propagation delay, the carries are still from the old inputs
2-2.25 2-2.26 Timing Timing Each adder computes from the current inputs (notice the sum of is incorrect at this point) + = X = Y Time The carry is rippling through each adder + = X = Y Time 2 Now the carries are all based off the new inputs 2-2.27 2-2.28 Timing Timing The carry is rippling through each adder + = X = Y Time 3 Only after the carry propagates through all the adders is the sum valid and correct = X Time + = Y 4
2-2.29 2-2.3 Ripple-Carry Ripple Carry Delay The longest path through a chain of full adders is the carry path We say that the carry through the adder time An n-bit ripple carry adder has a worst case delay proportional to C C C C C C out out in out in out C 4 C 3 C 2 C C 2-2.3 2-2.32 Glitches Output Glitches, output values due to arrival times of gate inputs Delay of the carry causes glitches on the sum bits Glitch = momentarily, incorrect output value early late C C out out 3
Critical Path 2-2.33 2-2.34 Critical Path = possible delay path Assume t sum = 5 ns, t carry = 4 ns 6 ns 2 ns 8 ns 4 ns MULTIPLIER 7 ns 3 ns 9 ns 5 ns Critical Path 2-2.35 2-2.36 Unsigned Multiplication Review igned Multiplication Techniques ame rules as decimal multiplication Multiply each bit of Q by M shifting as you go An m-bit * n-bit mult. produces an bit result (i.e. n-bit * n-bit produces bit result) Notice each partial product is a shifted copy of M or (zero) When adding signed (2 s comp.) numbers, some new issues arise Must * M (Multiplicand) Q (Multiplier) * = = -7 +6 * = = -7 +6
2-2.37 2-2.38 igned Multiplication Techniques Combinational Multiplier Also, must worry about negative multiplier MB of multiplier has negative weight If MB=, Partial Product (PP i ) Generation Multiply Q[i] * M if Q[i]= => PP i = if Q[i]= => PP i = * = -4 = -6 * = -4 = -6 2-2.39 2-2.4 Combinational Multiplier Multiplication Overview Partial Product (PP i ) Generation Multiply Q[i] * M if Q[i]= => PP i = if Q[i]= => PP i = gates can be used to generate each partial product M[3] M[2] M[] M[] if Q[ i]= M[3] M[2] M[] M[] if Q[ i]= Multiplication approaches: equential: hift-and-add produces one product bit per clock cycle time (usually slow) Combinational: Array multiplier uses an array of adders Can be as simple as N- ripple-carry adders for an NxN multiplication m3 m2 m m x q3 q2 q q m3q m2q mq mq m3q m2q mq mq - m3q2 m2q2 mq2 mq2 - - + m3q3 m2q3 mq3 mq3 - - - p7 p6 p5 p4 p3 p2 p p AND Gate Array produces partial product terms
2-2.4 2-2.42 Combinational Multiplier Array Multiplier Partial Products must be added together Combinational multipliers require long propagation delay through the adders propagation delay is proportional to the number of partial products (i.e. number of bits of input) and the width of each adder Can this be a HA? Maximum delay = Do you look for the longest path or the shortest path between any input and output? Compare with the delay of a shift-and-add method 2-2.43 2-2.44 Propagation Delay + Critical Path Critical Path = Longest possible delay path Assume t sum = 5 ns, t carry = 4 ns CoFA Ci CoFA Ci CoFA Ci CoFA Ci 6 ns CoFA Ci 2 ns CoFA Ci 8 ns CoFA Ci 4 ns CoFA Ci 7 ns 3 ns 9 ns 5 ns Critical Path
2-2.45 2-2.46 Combinational Multiplier Critical Paths Critical Path Critical Path 2 2-2.47 2-2.48 Combinational Multiplier Analysis Large Area due to -bit adders n- because the first adder adds the first two partial products and then each adder afterwards adds one more partial product Propagation delay is in two dimensions proportional to Pipelined Multiplier Now try to pipeline the previous design Determine the maximum stage delay to decide the pipeline clock rate. Assume zero-delay for stage latches. How does the latency of the pipeline compare with the simple combinational array of the previous stage?
2-2.49 2-2.5 Carry-ave Multiplier Carry ave s Instead of propagating the carries to the left in the same row, carries are now sent down to the next stage to reduce stage delay and facilitate pipelining CA s m3q m3q m2q mq mq m2q mq mq Consider the decimal addition of 47 + 96 + 58 = 2 One way is to add to get and Here the column cannot be added is produced In the carry-save style, we add the column and column simultaneous The upper three stages are 3-bit Carry ave s (CA s) each with 2-gate delays. The last stage is a Ripple Carry (RCA) which requires longer delay. It can be replaced by a CLA for larger multipliers. RCA m3q3 m3q2 m2q2 mq2 mq2 m2q3 mq3 mq3 4 7 + 9 6 2 4 3 + 5 8 2 3 6 5 4 4 7 9 6 + 5 8 2 2 + 8 _ 2 4 3 P[7] P[6] P[5] P[4] P[3] P[2] P[] P[] Carry-ave (3,2) s 2-2.5 2-2.52 A carry save adder is also called a (3,2) adder or a (3,2) counter (refer to Computer Arithmetic Algorithms by Israel Koren) as it takes three vectors, adds them up, and reduces them to two vectors, namely a sum vector and a carry vector CA s are based on the principle that carries do not have to be added, but can be combined An n-bit CA consist of n disjoint full adders + _ Carry vector um vector Carry-Lookahead s FAT ADDER
2-2.53 2-2.54 Ripple Carry s Ripple-carry adders (RCA) are slow due to carry propagation 6 At least 2 levels of logic per full adder 5 4 3 2 Fast s Rather than calculating one carry at a time and passing it down the chain, can we compute a group of carries at the same time To do this, let us define some new signals for each column of addition: p i = : This column will propagate a carry-in (if there is one) to the carry-out. p i is true when A i or B i is => p i = g i = : This column will generate a carry-out whether or not the carry-in is g i is true when A i and B i is => g i = Using these signals, we can define the carry-out (c i+ ) as: c i+ = 2-2.55 2-2.56 Carry Lookahead Logic Define each carry in terms of p i, g i and the initial carry-in (c ) and not in terms of carry chain (intermediate carries: c,c2,c3, ) c = c2 = c3 = c4 = Carry Lookahead Analogy Consider the carry-chain like a long tube broken into segments. Each segment is controlled by a valve (propagate signal) and can insert a fluid into that segment (generate signal) The carry-out of the diagram below will be true if g is true or p is true and g is true, or p, p and c is true
2-2.57 2-2.58 Carry Lookahead Use carry-lookahead logic to generate all the carries in one shot and then create the sum Example 4-bit CLA shown below How many levels of logic is the adder? 4-bit s 74L283 chip implements a 4-bit adder using CLA methodology A 3 A 2 A A = A + B 3 B 2 B B = B 4 3 2 = A 3 B 3 A 2 B 2 A B A B 74L283 3 2 2-2.59 C6 6-Bit CLA But how would we make a 6-bit adder? hould we really just chain these fast 4-bit adders together? Or can we do better? A[5:2] B[5:2] A[:8] B[:8] A[7:4] B[7:4] A[3:] B[3:] C2 PG PG PG PG _ [5:2] [:8] [7:4] [3:] 6-bit RCA Delay = = gate delays Delay of the above adder design = = gates Let us improve by looking ahead at a higher level to produce C6, C2, C8, C4 in parallel What s the difference between the equation for G here and C4 on the previous slides _ C8 _ Define P and G as the overall Propagate and Generate signals for a set of 4 bits P = p3 p2 p p G = g3 + p3 g2 + p3 p2 g + p3 p2 p g C4 _ 2-2.6 C
2-2.6 6-bit CLA Closer Look 2-2.62 REVIEW ON YOUR OWN FOR CLA LAB Each 4-bit CLA only propagates its overall carry-in if each of the 4 columns propagates: P = p3 p2 p p P = p7 p6 p5 p4 P2 = p p p9 p8 P3 = p5 p4 p3 p2 Each 4-bit CLA generates a carry if any column generates and the more significant columns propagate G = g3 + (p3 g2) + (p3 p2 g)+(p3 p2 p g) G3 = g5 + (p5 g4) + (p5 p4 g3)+(p5 p4 p3 g2) The higher order CLL logic (producing C4,C8,C2,C6) then is realized as: (C4) =>C = G + (P c) (C6) => C4 = G3 + (P3 G2) + (P3 P2 G) +(P3 P2 P G)+ (P3 P2 P P c) These equations are exactly the same CLL logic we derived earlier 2-2.63 2-2.64 6-Bit CLA 64-Bit CLA Understanding 6-bit CLA hierarchy C We can reuse the same CLL logic to build a 64-bit CLA C c63 s35 c5 P G G CLL P G CLL P G CLL P G CLL C2 C8 C4 C6 P G G C56 C52 CLL P G CLL P G CLL P G CLL C48 C44 C4 C36 C32 C28 C24 C2 C6 C2 C8 C4 C6 p3 g3 c4 P* G* c3 p2 g2 CLL p g Delay = = = Delay in producing pi,gi = = Delay in producing Pi*,Gi* = = Delay in producing C4,C8,C2,C6 = = Delay in producing c5 = = Delay in producing 5 c2 c p g c p3 g3 c4 P* G* c3 p2 g2 = = Delay in producing 63 Is the delay in producing s63 the same as in s35? = = Delay in producing 2 = = Delay in producing c2 CLL p g c p g = = Delay in producing pi*,gi* = = Delay in producing Pj**,Gj** = = Delay in producing C48 = = Delay in producing C6 = = Delay in producing C63 = = Delay in producing 63 = Total Delay c