EEC 483 Computer Organization
Chapter 3. Arithmetic for Computers
Chansu Yu

Table of Contents
- Ch. 1 Introduction
- Ch. 2 Instruction: Machine Language
- Ch. 3-4 CPU Implementation
- Ch. 5 Cache and VM
- Ch. 6-7 I/O & Multiprocessors

[Figure: interfaces of a computer]
- User interface (Computer): keyboard/mouse, screen/speaker
- Programmer interface (CPU): instruction set. Software interface (Ch. 2)
- CPU designer interface (ALU, Mux, Memory, Sequential circuit, ...): component spec., connection spec. Hardware interface (Ch. 3-5)
Table of Contents
- Ch. 1 Introduction
- Ch. 2 Instructions: Language of the Computer (software interface)
- Ch. 3 CPU Implementation: Arithmetic (hardware interface)
  - 3.1 Introduction
  - 3.2 Addition and subtraction
  - 3.3 Multiplication
  - 3.4 Division
  - 3.5 Floating point
  - Appendix C.5 Constructing an Arithmetic Logic Unit (ALU)
- Ch. 4 CPU Implementation: Pipeline
- Ch. 5 Cache and Virtual Memory
- Ch. 6-7 I/O and Multiprocessors

Arithmetic
- Where we've been:
  - Performance (seconds, cycles, instructions)
  - Abstractions: Instruction Set Architecture, Assembly Language and Machine Language
- What's up ahead:
  - Implementing the Architecture

[Figure: ALU with 32-bit inputs a and b, an operation select input, and a 32-bit result]
3.2 Addition & Subtraction
- Just like in grade school (carry/borrow 1s)

      0101        0111        0110
    + 0001      - 0110      - 0101

- Two's complement operations are easy
  - subtraction using addition of negative numbers

      0111
    + 1010

- Overflow (result too large for finite computer word):
  - e.g., adding two n-bit numbers does not yield an n-bit number

      0111
    + 0001
      ----
      1000

Detecting Overflow
- No overflow when adding a positive and a negative number
- No overflow when signs are the same for subtraction
- Overflow occurs when the value affects the sign:
  - overflow when adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive and get a negative
  - or, subtract a positive from a negative and get a positive
- An exception (interrupt) occurs
  - Control jumps to predefined address for exception
  - Interrupted address is saved for possible resumption
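The sign rules on the Detecting Overflow slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (to_signed, add_overflows, sub_overflows) and the bits parameter are not from the slides, and Python integers stand in for the n-bit registers.

```python
def to_signed(value, bits=32):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_overflows(a, b, bits=32):
    """True when the signed sum a + b does not fit in `bits` bits."""
    result = to_signed(a + b, bits)
    # Overflow needs operands of the same sign and a result of the other sign.
    return (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)


def sub_overflows(a, b, bits=32):
    """True when the signed difference a - b does not fit in `bits` bits."""
    result = to_signed(a - b, bits)
    # Overflow needs operands of different signs and a result whose sign
    # differs from the sign of a.
    return (a >= 0) != (b >= 0) and (result >= 0) != (a >= 0)
```

With bits=4, add_overflows(7, 1, 4) reports the slide's 0111 + 0001 case as an overflow, while to_signed(0b0111 + 0b1010, 4) shows that 7 + (-6) wraps cleanly to 1.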
Cf. Exceptions and Interrupts
- MIPS exception facility
  - Exceptions caused by errors
  - External interrupts caused by I/O devices
- Exception handling
  - CPU records information about what went wrong
  - Software handler processes the exception
    - Error: report and halt, or cure and continue (page fault)
    - I/O interrupt: data processing from/to the interrupting I/O device

Cf. CPU Records Information
- Registers
  - Cause: exception type and pending interrupt bits (which problem?)
  - EPC: register containing address of instruction that caused exception (which instruction?)
- mfc0 (move from system control) instruction is used to copy EPC to a general purpose register

    mfc0 $10, $epc
Cf. Exception Type (cause)

    Number  Name  Description
    0       Int   External interrupt (hardware)
    4       AdEL  Address error (load or instruction fetch)
    5       AdES  Address error (store)
    6       IBE   Bus error on instruction fetch
    7       DBE   Bus error on data load/store
    8       Sys   Syscall exception
    9       Bp    Breakpoint exception
    10      RI    Reserved instruction exception
    11      CpU   Coprocessor unimplemented
    12      Ov    Arithmetic overflow exception
    13      Tr    Trap
    15      FPE   Floating point exception

Cf. Software Handler
- Procedure
  - Stop executing and jump to fixed address 0x8000 0180
  - SPIM simulator uses 0x8000 0080
- E.g.) CPU executes instructions whenever it is ON. When the CPU is powered on, something has to tell it which address to start from (= reset address)
  - 0x0000 0000 is the natural choice (MIPS)
  - 0x000f fff0 for Intel CPU (ROM BIOS)
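A handler typically starts by turning the Cause value into one of the exception types listed above. The sketch below assumes the MIPS32 layout in which the exception code sits in bits 2 to 6 of Cause; that bit position is not stated on the slides, and the names EXC_NAMES and exception_name are hypothetical.

```python
# Exception codes exactly as listed in the Exception Type table.
EXC_NAMES = {
    0: "Int", 4: "AdEL", 5: "AdES", 6: "IBE", 7: "DBE", 8: "Sys",
    9: "Bp", 10: "RI", 11: "CpU", 12: "Ov", 13: "Tr", 15: "FPE",
}


def exception_name(cause):
    """Return the mnemonic for the exception code held in a Cause value.

    Assumption: the code occupies bits 2..6 of the Cause register.
    """
    code = (cause >> 2) & 0x1F
    return EXC_NAMES.get(code, "unknown")
```

For example, a Cause value of 0x30 carries code 12, so exception_name(0x30) names the arithmetic overflow exception.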
3.3 Multiplication
- More complicated than addition
  - accomplished via shifting and addition
- More time and more area
- Let's look at 3 versions based on grade school algorithm

      00100   (multiplicand)
    x 01011   (multiplier)

- Negative numbers: convert and multiply
  - there are better techniques, we won't look at them

Multiplication
- Paper and pencil example (unsigned):

    Multiplicand        00101
    Multiplier        x 01011
    Product         000110111

- m bits x n bits = m+n bit product
- Binary makes it easy:
  - 0 => place 0 (0 x multiplicand)
  - 1 => place a copy (1 x multiplicand)
Multiplication (paper and pencil example, continued)
- Product (T) = 0
- Multiplier bit 0 = 1, thus T += Multiplicand (00101)
- Multiplier bit 1 = 1, thus T += Multiplicand << 1 (001010)
- Multiplier bit 3 = 1, thus T += Multiplicand << 3 (00101000)
- Each step, shift Multiplicand one bit to the left
- Each step, shift Multiplier one bit to the right and check Multiplier bit 0

Multiplication (flowchart)
1. T = 0
2. Multiplier bit 0 = 1? Yes: T += Multiplicand. No: skip the addition
3. Multiplicand << 1, Multiplier >> 1
4. Done? No: go back to step 2. Yes: finished
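Multiplication: 5 (00101) x 11 (01011)

    Iteration  Step                              Multiplier (R)  Multiplicand (D)  Product (T)
    0          Initial values                    01011           00000 00101       00000 00000
    1          1: T = T + D (since R0=1)         01011           00000 00101       00000 00101
               2: Shift left D, shift right R    00101           00000 01010       00000 00101
    2          1: T = T + D (since R0=1)         00101           00000 01010       00000 01111
               2: Shift left D, shift right R    00010           00000 10100       00000 01111
    3          1: no operation (since R0=0)      00010           00000 10100       00000 01111
               2: Shift left D, shift right R    00001           00001 01000       00000 01111
    4          1: T = T + D (since R0=1)         00001           00001 01000       00001 10111
               2: Shift left D, shift right R    00000           00010 10000       00001 10111
    5          1: no operation (since R0=0)      00000           00010 10000       00001 10111
               2: Shift left D, shift right R    00000           00101 00000       00001 10111  => 55

- Size of register that holds Multiplier?
- Size of register that holds T?
- Size of register that holds Multiplicand?

Multiplication: Implementation
[Figure: Multiplicand register (64 bits, shift left) feeding a 64-bit ALU; Product register (64 bits, write); Multiplier register (32 bits, shift right); Control test]
- Do addition (if 1), write (if 1)
- Number of bits???
  - 32-bit architecture
  - Multiplier: 32-bit
  - Multiplicand: 64-bit!!!
  - Product: 64-bit!!!
  - ALU: 64-bit ALU!!!
- It is actually a series of 32-bit add operations.
  - Replace 64-bit ALU with 32-bit ALU
  - Shift the product (result) right instead of shifting the multiplicand left
  - Next slide!!!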
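The first version of the multiplier (double-width multiplicand shifted left, multiplier shifted right) can be sketched as follows. This is a software model under assumptions, not the hardware itself: the function name multiply_v1 and the bits parameter are illustrative, and Python integers are masked to imitate the register widths.

```python
def multiply_v1(multiplicand, multiplier, bits=32):
    """Unsigned shift-and-add multiply, first version.

    D is a 2*bits-wide register that shifts left, R is a bits-wide
    register that shifts right, and T is the 2*bits-wide product.
    """
    wide_mask = (1 << (2 * bits)) - 1
    d = multiplicand & ((1 << bits) - 1)   # multiplicand in the low half of D
    r = multiplier & ((1 << bits) - 1)
    t = 0
    for _ in range(bits):
        if r & 1:                  # test multiplier bit 0
            t = (t + d) & wide_mask
        d = (d << 1) & wide_mask   # shift multiplicand left
        r >>= 1                    # shift multiplier right
    return t
```

Calling multiply_v1(5, 11, bits=5) walks through the same five iterations as the 5 x 11 example.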
Multiplication (paper and pencil example, continued)
- Product (T) = 0
- Multiplier bit 0 = 1, thus T += Multiplicand (00101)
  - Or, T += Multiplicand << 5 (00101 00000) & T >> 5
- Multiplier bit 1 = 1, thus T += Multiplicand << 1 (001010)
  - Or, T += Multiplicand << 5 (00101 00000) & T >> 4
- Multiplier bit 3 = 1, thus T += Multiplicand << 3 (00101000)
  - Or, T += Multiplicand << 5 (00101 00000) & T >> 2

How can we improve the design?

    Iteration  Step                              Multiplier (R)  Multiplicand (D)  Product (T)
    0          Initial values                    01011           00101             00000 00000
    1          1: T = T + D<<5 (since R0=1)      01011           00101             00101 00000
               2: Shift right T, shift right R   00101           00101             00010 10000
    2          1: T = T + D<<5 (since R0=1)      00101           00101             00111 10000
               2: Shift right T, shift right R   00010           00101             00011 11000
    3          1: no operation (since R0=0)      00010           00101             00011 11000
               2: Shift right T, shift right R   00001           00101             00001 11100
    4          1: T = T + D<<5 (since R0=1)      00001           00101             00110 11100
               2: Shift right T, shift right R   00000           00101             00011 01110
    5          1: no operation (since R0=0)      00000           00101             00011 01110
               2: Shift right T, shift right R   00000           00101             00001 10111  => 55

* Each addition involves only the 5-bit D and the upper part of T (32-bit additions in the real design)
Implementation
[Figure: Multiplicand register (32 bits) feeding a 32-bit ALU; Product register (64 bits, shift right, write); Multiplier register (32 bits, shift right); Control test]
- Do addition (if 1), write (if 1)
- Number of bits???
  - 32-bit architecture
  - Multiplier: 32-bit
  - Multiplicand: 64-bit => 32-bit
  - Product: 64-bit!!!
  - ALU: 64-bit ALU => 32-bit ALU
- Product register wastes space that exactly matches the size of the multiplier
  - Multiplier space can be saved.
  - Combine Multiplier register and Product register (the multiplier stored in the lower part of the Product register is thrown away one bit at a time)
  - Next slide!!!

How can we improve the design?
- Same trace as in the previous table, with the low-order part of Product (T) that the shifted result has not reached yet marked "Not used"
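The second version (half-width multiplicand and ALU, product shifted right) can be modelled in the same way. The sketch below is an assumption-laden illustration: multiply_v2 and bits are made-up names, and the carry out of the upper-half addition is kept simply because Python integers do not truncate.

```python
def multiply_v2(multiplicand, multiplier, bits=32):
    """Unsigned multiply, second version: add into the upper half of the
    product register, then shift the product right."""
    half_mask = (1 << bits) - 1
    d = multiplicand & half_mask   # multiplicand stays bits wide
    r = multiplier & half_mask
    t = 0                          # 2*bits-wide product register
    for _ in range(bits):
        if r & 1:
            # bits-wide add of D and the upper half of T; the carry-out
            # lands in bit 2*bits and is shifted back in below.
            t += d << bits
        t >>= 1                    # shift product right
        r >>= 1                    # shift multiplier right
    return t
```

For 5 x 11 with bits=5 the product register passes through 00010 10000, 00011 11000, 00001 11100, 00011 01110 and ends at 00001 10111, matching the trace table.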
Final Version
[Figure: Multiplicand register (32 bits) feeding a 32-bit ALU; Product register (64 bits, shift right, write); Control test]

Start
1. Test Product0
   - Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   - Product0 = 0: go directly to step 2
2. Shift the Product register right 1 bit
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done
- Just 1 shift step instead of 2 => total of 2 steps per bit

Multiply in MIPS

    Instruction        Example       Meaning            Comments
    multiply           mult $2,$3    Hi, Lo = $2 x $3   64-bit signed product
    multiply unsigned  multu $2,$3   Hi, Lo = $2 x $3   64-bit unsigned product
    Move from Hi       mfhi $1       $1 = Hi            Used to get copy of Hi
    Move from Lo       mflo $1       $1 = Lo            Used to get copy of Lo
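The final version, where the multiplier starts out in the right half of the Product register, can be sketched as below. It models an unsigned multiply (as in multu) and returns the two halves the way mfhi and mflo would read them; the name multiply_final and the bits parameter are assumptions made for this illustration.

```python
def multiply_final(multiplicand, multiplier, bits=32):
    """Unsigned multiply, final version; returns (hi, lo).

    The multiplier is loaded into the right half of the product
    register and is shifted out one bit per iteration.
    """
    half_mask = (1 << bits) - 1
    d = multiplicand & half_mask
    product = multiplier & half_mask     # left half starts at 0
    for _ in range(bits):
        if product & 1:                  # 1. test Product0
            product += d << bits         # 1a. add to the left half
        product >>= 1                    # 2. shift Product right 1 bit
    return product >> bits, product & half_mask
```

For 32-bit operands, multiply_final(0x10000, 0x10000) returns (1, 0): the product 2^32 needs the Hi half.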
3.4 Divide: Division in MIPS

    Instruction      Example       Meaning                        Comments
    divide           div $2,$3     Lo = $2 / $3, Hi = $2 mod $3   Lo = quotient, Hi = remainder
    divide unsigned  divu $2,$3    Lo = $2 / $3, Hi = $2 mod $3   Unsigned quotient & remainder
    Move from Hi     mfhi $1       $1 = Hi                        Used to get copy of Hi
    Move from Lo     mflo $1       $1 = Lo                        Used to get copy of Lo

Divide
- $2 / $3 = quotient ... remainder
  - Is quotient 32-bit or 16-bit?
  - Is remainder 32-bit or 16-bit?
- Example
  - $2 = 0111 1111 ... 1111, $3 = 0000 0000 ... 0001: Quotient = 0111 1111 ... 1111. Quotient must be 32-bit!
  - $2 = 0111 1111 ... 1111, $3 = 0100 0000 ... 0000: Quotient = 1, Remainder = 0011 1111 ... 1111. Remainder must be 32-bit!
- n-bit / n-bit => n-bit quotient, n-bit remainder
  - More hardware (wasting)
- n-bit / (n/2)-bit => (n/2)-bit quotient, (n/2)-bit remainder
  - Overflow
- Two basic approaches
  - Restoring: conventional
  - Non-restoring
Implementation: Paper & Pencil

By hand:

                     8       Quotient
    Divisor   23 ) 2057      Dividend
                   184
                    217

With fixed-width operands:

    Divisor   00023
    Dividend  02057

        000002057      Dividend
      - 000230000      Divisor shifted left 4 digits (trial quotient digit 1)
      -----------
      - 000227943      negative, so restore 02057

- By hand, we know where to start; the hardware just starts from the first possible digit
- If the result is negative, move on to the next digit while recovering the dividend to its original value
- If the subtraction gives a non-negative result, Qi=1. Otherwise, Qi=0 and restore the dividend
* Each step, shift the divisor right
* One nice thing with binary computation is that the quotient bit can only be 1 or 0

Implementation: Paper & Pencil

                      1001       Quotient
    Divisor  1000 ) 1001010      Dividend
                   -1000
                       10
                       101
                       1010
                      -1000
                         10      Remainder (or Modulo result)

- We know 10 is less than 1000. But the ALU does not know until it subtracts 10 - 1000 and gets a negative result. If the result is negative, it needs to restore the dividend to its original value by adding 1000, i.e., (10 - 1000) + 1000 = 10
- See how big a number can be subtracted, creating a quotient bit on each step
  - Binary => 1 * divisor or 0 * divisor
- Dividend = Quotient x Divisor + Remainder
  => |Dividend| = |Quotient| + |Divisor| (sizes in bits)
- 3 versions of divide, successive refinement
Dividend: 01011 (11), Divisor: 00101 (5)

    Iteration  Step                                        Quotient (Q)  Divisor (D)   Remainder (R)
    0          Initial values                              00000         00101 00000   00000 01011  (dividend)
    1          1: R = R - D                                00000         00101 00000   11011 01011
               2: R<0 => Restore R, Shift left Q, Q0=0     00000         00101 00000   00000 01011  (restored)
               3: Shift right D                            00000         00010 10000   00000 01011
    2          1: R = R - D                                00000         00010 10000   11101 11011
               2: R<0 => Restore R, Shift left Q, Q0=0     00000         00010 10000   00000 01011  (restored)
               3: Shift right D                            00000         00001 01000   00000 01011
    3          1: R = R - D                                00000         00001 01000   11111 00011
               2: R<0 => Restore R, Shift left Q, Q0=0     00000         00001 01000   00000 01011  (restored)
               3: Shift right D                            00000         00000 10100   00000 01011
    4          1: R = R - D                                00000         00000 10100   11111 10111
               2: R<0 => Restore R, Shift left Q, Q0=0     00000         00000 10100   00000 01011  (restored)
               3: Shift right D                            00000         00000 01010   00000 01011
    5          1: R = R - D                                00000         00000 01010   00000 00001
               2: R>=0 => Shift left Q, Q0=1               00001         00000 01010   00000 00001  (not restored)
               3: Shift right D                            00001         00000 00101   00000 00001
    6          1: R = R - D                                00001         00000 00101   11111 11100
               2: R<0 => Restore R, Shift left Q, Q0=0     00010         00000 00101   00000 00001  (restored)
               3: Shift right D                            00010         00000 00010   00000 00001

Quotient = 2, remainder = 1

Divide Algorithm Version 1
- Takes n+1 steps for n-bit Quotient & Rem.
- Example registers: Remainder 0000 0111, Quotient 0000, Divisor 0010 0000

Start: Place Dividend in Remainder
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
   Test Remainder
2a. Remainder >= 0: Shift the Quotient register to the left, setting the new rightmost bit to 1.
2b. Remainder < 0: Restore the original value by adding the Divisor register to the Remainder register, and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
n+1 repetitions?
   - No (< n+1 repetitions): go back to step 1
   - Yes (n+1 repetitions; n = 4 here): Done
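Version 1 of the restoring divide can be sketched directly from the flowchart. This is a software model under assumptions: divide_v1 and bits are illustrative names, the operands are taken to be unsigned with a nonzero divisor, and a negative Python integer plays the role of a negative two's complement Remainder register.

```python
def divide_v1(dividend, divisor, bits=32):
    """Restoring division, version 1; returns (quotient, remainder).

    Assumes unsigned operands that fit in `bits` bits and divisor != 0.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero in this sketch")
    remainder = dividend           # Start: dividend placed in Remainder
    d = divisor << bits            # divisor starts in the left half of D
    quotient = 0
    for _ in range(bits + 1):      # n+1 repetitions
        remainder -= d             # 1. subtract
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # 2a. shift in a 1
        else:
            remainder += d                   # 2b. restore ...
            quotient <<= 1                   # ... and shift in a 0
        d >>= 1                    # 3. shift Divisor right
    return quotient, remainder
```

With bits=5, divide_v1(11, 5, 5) takes six iterations and restores on all but the fifth, as in the table.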
DIVIDE HARDWARE Version 1
- 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
[Figure: Divisor register (64 bits, shift right) feeding a 64-bit ALU; Remainder register (64 bits, write); Quotient register (32 bits, shift left); Control]

Observations on Divide Version 1
- 1/2 of the bits in the divisor are always 0
  => 1/2 of the 64-bit adder is wasted
  => 1/2 of the divisor is wasted
- Instead of shifting the divisor to the right, shift the remainder to the left?
- 1st step cannot produce a 1 in the quotient bit (otherwise too big)
  => switch order to shift first and then subtract, which can save one iteration
DIVIDE HARDWARE Version 2
- 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
[Figure: Divisor register (32 bits) feeding a 32-bit ALU; Remainder register (64 bits, shift left, write); Quotient register (32 bits, shift left); Control]

Divide Algorithm Version 2
- Example registers: Remainder 0000 0111, Quotient 0000, Divisor 0010

Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, and place the result in the left half of the Remainder register.
   Test Remainder
3a. Remainder >= 0: Shift the Quotient register to the left, setting the new rightmost bit to 1.
3b. Remainder < 0: Restore the original value by adding the Divisor register to the left half of the Remainder register, and place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
nth repetition?
   - No (< n repetitions): go back to step 1
   - Yes (n repetitions; n = 4 here): Done
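Version 2 (shift the remainder left, subtract from its left half) can be sketched the same way. The names divide_v2 and bits are assumptions, operands are taken to be unsigned with a nonzero divisor, and the left-half subtraction is written as subtracting divisor << bits from the whole register.

```python
def divide_v2(dividend, divisor, bits=32):
    """Restoring division, version 2; returns (quotient, remainder).

    Assumes unsigned operands that fit in `bits` bits and divisor != 0.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero in this sketch")
    remainder = dividend                 # 2*bits-wide Remainder register
    quotient = 0
    for _ in range(bits):                # n repetitions
        remainder <<= 1                  # 1. shift Remainder left
        remainder -= divisor << bits     # 2. subtract from the left half
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # 3a. shift in a 1
        else:
            remainder += divisor << bits     # 3b. restore ...
            quotient <<= 1                   # ... and shift in a 0
    return quotient, remainder >> bits   # remainder is in the left half
```

For the flowchart's example, divide_v2(7, 2, bits=4) finishes in four iterations with quotient 3 and remainder 1.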
Observations on Divide Version 2
- Eliminate the Quotient register by combining it with the Remainder register as it is shifted left
  - Start by shifting the Remainder left as before.
  - Thereafter the loop contains only two steps, because shifting the Remainder register shifts both the remainder in the left half and the quotient in the right half
  - The consequence of combining the two registers and of the new order of operations in the loop is that the remainder will be shifted left one time too many.
  - Thus the final correction step must shift back only the remainder in the left half of the register

Divide Hardware: Final Version
- 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)
[Figure: Divisor register (32 bits) feeding a 32-bit ALU; Remainder (Quotient) register (64 bits, HI and LO halves, shift left, write); Control]
Divide Algorithm Version 3
- Example registers: Remainder 0000 0111, Divisor 0010

Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, and place the result in the left half of the Remainder register.
   Test Remainder
3a. Remainder >= 0: Shift the Remainder register to the left, setting the new rightmost bit to 1.
3b. Remainder < 0: Restore the original value by adding the Divisor register to the left half of the Remainder register, and place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0.
nth repetition?
   - No (< n repetitions): go back to step 2
   - Yes (n repetitions; n = 4 here): Done. Shift left half of Remainder right 1 bit.

Observations on Divide Version 3
- Same hardware as multiply: just need an ALU to add or subtract, and a 64-bit register to shift left or shift right
- Hi and Lo registers in MIPS combine to act as a 64-bit register for multiply and divide
- Signed divides: simplest is to remember the signs, make the operands positive, and complement the quotient and remainder if necessary
  - Note: Dividend and Remainder must have the same sign
  - Note: Quotient is negated if the Divisor sign and the Dividend sign disagree
    e.g., -7 / 2 = -3, remainder = -1
- Possible for the quotient to be too large: if a 64-bit integer is divided by 1, the quotient is 64 bits ("called saturation")
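Version 3 and the "remember signs" rule for signed divides can be sketched together. As before this is only a model under assumptions: divide_v3, signed_divide and bits are illustrative names, the divisor is taken to be nonzero, and a single Python integer stands in for the combined Remainder (Quotient) register.

```python
def divide_v3(dividend, divisor, bits=32):
    """Restoring division, version 3 (unsigned); returns (quotient, remainder).

    The quotient is shifted into the right half of the Remainder register.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero in this sketch")
    rem = dividend << 1                  # 1. initial shift left
    for _ in range(bits):                # n repetitions
        rem -= divisor << bits           # 2. subtract from the left half
        if rem >= 0:
            rem = (rem << 1) | 1         # 3a. shift left, new bit = 1
        else:
            rem += divisor << bits       # 3b. restore ...
            rem <<= 1                    # ... shift left, new bit = 0
    quotient = rem & ((1 << bits) - 1)   # right half
    remainder = rem >> (bits + 1)        # left half shifted right 1 bit
    return quotient, remainder


def signed_divide(dividend, divisor, bits=32):
    """Signed divide: divide magnitudes, then fix up the signs."""
    q, r = divide_v3(abs(dividend), abs(divisor), bits)
    if (dividend < 0) != (divisor < 0):
        q = -q                           # signs disagree: negate quotient
    if dividend < 0:
        r = -r                           # remainder takes the dividend's sign
    return q, r
```

signed_divide(-7, 2) gives quotient -3 and remainder -1, so Dividend = Quotient x Divisor + Remainder still holds: -3 x 2 + (-1) = -7.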