Chapter 3 Arithmetic for Computers. ELEC 5200/ From P-H slides

Similar documents
Chapter 3 Arithmetic for Computers

Computer Architecture Set Four. Arithmetic

Integer Arithmetic. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Tailoring the 32-Bit ALU to MIPS

CPE 335 Computer Organization. MIPS Arithmetic Part I. Content from Chapter 3 and Appendix B

Homework 3. Assigned on 02/15 Due time: midnight on 02/21 (1 WEEK only!) B.2 B.11 B.14 (hint: use multiplexors) CSCI 402: Computer Architectures

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

COMPUTER ORGANIZATION AND DESIGN

Chapter 3. Arithmetic Text: P&H rev

Boolean Algebra. Chapter 3. Boolean Algebra. Chapter 3 Arithmetic for Computers 1. Fundamental Boolean Operations. Arithmetic for Computers

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

MIPS Integer ALU Requirements

Number Systems and Their Representations

Lecture 8: Addition, Multiplication & Division

Arithmetic for Computers

Arithmetic for Computers. Hwansoo Han

Thomas Polzer Institut für Technische Informatik

Outline. EEL-4713 Computer Architecture Multipliers and shifters. Deriving requirements of ALU. MIPS arithmetic instructions

NUMBER OPERATIONS. Mahdi Nazm Bojnordi. CS/ECE 3810: Computer Organization. Assistant Professor School of Computing University of Utah

Review: MIPS Organization

Number Systems and Computer Arithmetic

Math in MIPS. Subtracting a binary number from another binary number also bears an uncanny resemblance to the way it s done in decimal.

ECE260: Fundamentals of Computer Engineering

Integer Multiplication and Division

ECE331: Hardware Organization and Design

COMP 303 Computer Architecture Lecture 6

Computer Architecture. Chapter 3: Arithmetic for Computers

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University

Part I: Translating & Starting a Program: Compiler, Linker, Assembler, Loader. Lecture 4

EEC 483 Computer Organization

CS/COE 0447 Example Problems for Exam 2 Spring 2011

Chapter 3 Arithmetic for Computers (Part 2)

ECE331: Hardware Organization and Design

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner. Topic 3: Arithmetic

Fast Arithmetic. Philipp Koehn. 19 October 2016

Review of Last lecture. Review ALU Design. Designing a Multiplier Shifter Design Review. Booth s algorithm. Today s Outline

CENG3420 L05: Arithmetic and Logic Unit

Lecture Topics. Announcements. Today: Integer Arithmetic (P&H ) Next: The MIPS ISA (P&H ) Consulting hours. Milestone #1 (due 1/26)

Week 7: Assignment Solutions

TDT4255 Computer Design. Lecture 4. Magnus Jahre

CPS 104 Computer Organization and Programming

CENG 3420 Lecture 05: Arithmetic and Logic Unit

Chapter 3. Arithmetic for Computers

MIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support

361 div.1. Computer Architecture EECS 361 Lecture 7: ALU Design : Division

Chapter Three. Arithmetic

EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3

Module 2: Computer Arithmetic

Chapter 3 Arithmetic for Computers

Lets Build a Processor

Chapter 3. Arithmetic for Computers

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

ECE 30 Introduction to Computer Engineering

COMPUTER ORGANIZATION AND. Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Review. Steps to writing (stateless) circuits: Create a logic function (one per output)

By, Ajinkya Karande Adarsh Yoga

More complicated than addition. Let's look at 3 versions based on grade school algorithm (multiplicand) More time and more area

Instruction Set Architecture of. MIPS Processor. MIPS Processor. MIPS Registers (continued) MIPS Registers

Review 1/2 MIPS assembly language instructions mapped to numbers in 3 formats. CS61C Negative Numbers and Logical Operations R I J.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers Implementation

ECE331: Hardware Organization and Design

Chapter 3. Arithmetic for Computers

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

Integer Multiplication and Division

CPE300: Digital System Architecture and Design

EEM 486: Computer Architecture. Lecture 2. MIPS Instruction Set Architecture

Mark Redekopp, All rights reserved. EE 357 Unit 11 MIPS ISA

CSE 141 Computer Architecture Summer Session Lecture 3 ALU Part 2 Single Cycle CPU Part 1. Pramod V. Argade

HW2 solutions You did this for Lab sbn temp, temp,.+1 # temp = 0; sbn temp, b,.+1 # temp = -b; sbn a, temp,.+1 # a = a (-b) = a + b;

COMPUTER ARITHMETIC (Part 1)

CS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit. A.R. Hurson 323 CS Building, Missouri S&T

Computer Science 61C Spring Friedland and Weaver. Instruction Encoding

Lecture Topics. Announcements. Today: Integer Arithmetic (P&H ) Next: continued. Consulting hours. Introduction to Sim. Milestone #1 (due 1/26)

Divide: Paper & Pencil

Overview. Introduction to the MIPS ISA. MIPS ISA Overview. Overview (2)

We are quite familiar with adding two numbers in decimal

ECE232: Hardware Organization and Design. Computer Organization - Previously covered

Timing for Ripple Carry Adder

Chapter 3: Arithmetic for Computers

Reduced Instruction Set Computer (RISC)

CS 61C: Great Ideas in Computer Architecture MIPS Instruction Formats

Slide Set 5. for ENCM 369 Winter 2014 Lecture Section 01. Steve Norman, PhD, PEng

Review from last time. CS152 Computer Architecture and Engineering Lecture 6. Verilog (finish) Multiply, Divide, Shift

Chapter Two MIPS Arithmetic

Reduced Instruction Set Computer (RISC)

Signed Multiplication Multiply the positives Negate result if signs of operand are different

9/6/2011. Multiplication. Binary Multipliers The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

carry in carry 1101 carry carry

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit

An instruction set processor consist of two important units: Data Processing Unit (DataPath) Program Control Unit

Chapter 3: part 3 Binary Subtraction

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

Arithmetic Logic Unit

Assembly Programming

Chapter 5 : Computer Arithmetic

Slide Set 5. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Organisasi Sistem Komputer

Part III The Arithmetic/Logic Unit. Oct Computer Architecture, The Arithmetic/Logic Unit Slide 1

Transcription:

Chapter 3 Arithmetic for Computers 1

Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing with overflow Floating-point real numbers Representation and operations 3.1 Introduction 2

Taxonomy of Computer Information Information Instructions Data Addresses Numeric Non-numeric Fixed-point Floating-point Character (ASCII) Boolean Other Unsigned (ordinal) Signed Singleprecision Doubleprecision Sign-magnitude 2 s complement 3

Number Format Considerations Type of numbers (integer, fraction, real, complex) Range of values between smallest and largest values wider in floating-point formats Precision of values (max. accuracy) usually related to number of bits allocated n bits => represent 2 n values/levels Value/weight of least-significant bit Cost of hardware to store & process numbers (some formats more difficult to add, multiply/divide, etc.) 4

Unsigned Integers Positional number system: a n-1 a n-2. a 2 a 1 a 0 = a n-1 x2 n-1 + a n-2 x2 n-2 + + a 2 x2 2 + a 1 x2 1 + a 0 x2 0 Range = [(2 n -1) 0] Carry out of MSB has weight 2 n Fixed-point fraction: 5 0.a -1 a -2. a -n = a -1 x2-1 + a -2 x2-2 + + a -n x2 -n

Signed Integers Sign-magnitude format (n-bit values): A = Sa n-2. a 2 a 1 a 0 (S = sign bit) Range = [-(2 n-1-1) +(2 n-1-1)] Addition/subtraction difficult (multiply easy) Redundant representations of 0 2 s complement format (n-bit values): -A represented by 2 n A Range = [-(2 n-1 ) +(2 n-1-1)] Addition/subtraction easier (multiply harder) Single representation of 0 6

MIPS MIPS architecture uses 32-bit numbers. What is the range of integers (positive and negative) that can be represented? Positive integers: 0 to 2,147,483,647 Negative integers: - 1 to - 2,147,483,648 What are the binary representations of the extreme positive and negative integers? 0111 1111 1111 1111 1111 1111 1111 1111 = 2 31-1= 2,147,483,647 1000 0000 0000 0000 0000 0000 0000 0000 = - 2 31 = - 2,147,483,648 What is the binary representation of zero? 0000 0000 0000 0000 0000 0000 0000 0000 7

Computing the 2 s Complement To compute the 2 s complement of A: Let A = a n-1 a n-2. a 2 a 1 a 0 2 n - A = 2 n -1 + 1 A = (2 n -1) - A + 1 1 1 1 1 1 - a n-1 a n-2. a 2 a 1 a 0 + 1 ------------------------------------- a n-1 a n-2. a 2 a 1 a 0 + 1 (one s complement + 1) 8

2 s Complement Arithmetic Let (2 n-1-1) A 0 and (2 n-1-1) B 0 Case 1: A + B (2 n -2) (A + B) 0 Since result < 2 n, there is no carry out of the MSB Valid result if (A + B) < 2 n-1 MSB (sign bit) = 0 Overflow if (A + B) 2 n-1 MSB (sign bit) = 1 if result 2 n-1 Carry into MSB 9

2 s Complement Arithmetic Case 2: A - B Compute by adding: A + (-B) 2 s complement: A + (2 n - B) -2 n-1 < result < 2 n-1 (no overflow possible) If A B: 2 n + (A - B) 2 n Weight of adder carry output = 2 n Discard carry (2 n), keeping (A-B), which is 0 If A < B: 2 n + (A - B) < 2 n Adder carry output = 0 Result is 2 n - (B - A) = 2 s complement representation of -(B-A) 10

2 s Complement Arithmetic Case 3: -A - B Compute by adding: (-A) + (-B) 2 s complement: (2 n - A) + (2 n - B) = 2 n + 2 n - (A + B) Discard carry (2 n ), making result 2 n - (A + B) = 2 s complement representation of -(A + B) 0 result -2 n Overflow if -(A + B) < -2 n-1 MSB (sign bit) = 0 if 2 n - (A + B) < 2 n-1 no carry into MSB 11

Integer Subtraction Add negation of second operand Example: 7 6 = 7 + ( 6) +7: 0000 0000 0000 0111 6: 1111 1111 1111 1010 +1: 0000 0000 0000 0001 Overflow if result out of range Subtracting two +ve or two ve operands, no overflow Subtracting +ve from ve operand Overflow if result sign is 0 Subtracting ve from +ve operand Overflow if result sign is 1 12 ELEC 5200/6200 - From P-H slides

Relational Operators Compute A-B & test ALU flags to compare A vs. B ZF = result zero OF = 2 s complement overflow SF = sign bit of result CF = adder carry output Signed Unsigned A = B ZF = 1 ZF = 1 A <> B ZF = 0 ZF = 0 A B (SF OF) = 0 CF = 1 (no borrow) A > B (SF OF) + ZF = 0 CF ZF = 1 A B (SF OF) + ZF = 1 CF ZF = 0 A < B (SF OF) = 1 CF = 0 (borrow) 13

MIPS Overflow Detection An exception (interrupt) occurs when overflow detected for add,addi,sub Control jumps to predefined address for exception Interrupted address is saved for possible resumption Details based on software system / language example: flight control vs. homework assignment Don't always want to detect overflow new MIPS instructions: addu, addiu, subu note: addiu still sign-extends! note: sltu, sltiu for unsigned comparisons 14

Dealing with Overflow Some languages (e.g., C) ignore overflow Use MIPS addu, addui, subu instructions Other languages (e.g., Ada, Fortran) require raising an exception Use MIPS add, addi, sub instructions On overflow, invoke exception handler Save PC in exception program counter (EPC) register Jump to predefined handler address mfc0 (move from coprocessor reg) instruction can retrieve EPC value, to return after corrective action 15 ELEC 5200/6200 - From P-H slides

Designing the Arithmetic & Logic Unit (ALU) Provide arithmetic and logical functions as needed by the instruction set Consider tradeoffs of area vs. performance (Material from Appendix B) operation a 32 ALU result 32 b 32 16

Different Implementations Not easy to decide the best way to build something Don't want too many inputs to a single gate (fan in) Don t want to have to go through too many gates (delay) For our purposes, ease of comprehension is important Let's look at a 1-bit ALU for addition: CarryIn a b Sum How could CarryOut we build a 1-bit ALU for add, and, and or? How could we build a 32-bit ALU? c out = a b + a c in + b c in sum = a xor b xor c in 17

Building a 32 bit ALU CarryIn Operation Operation CarryIn a0 b0 CarryIn ALU0 CarryOut Result0 a 0 1 Result a1 b1 CarryIn ALU1 CarryOut Result1 b 2 a2 b2 CarryIn ALU2 CarryOut Result2 CarryOut a31 b31 CarryIn ALU31 Result31 18

What about subtraction (a b)? Two's complement approach: just negate b and add. Binvert CarryIn Operation a 0 1 Result b 0 1 2 CarryOut 19

Adding a NOR function Can also choose to invert a. How do we get a NOR b? Ainvert Binvert CarryIn Operation a 0 1 0 1 Result b 0 + 2 1 CarryOut 20

Tailoring the ALU to the MIPS Need to support the set-on-less-than instruction (slt) remember: slt is an arithmetic instruction produces a 1 if rs < rt and 0 otherwise use subtraction: (a-b) < 0 implies a < b Need to support test for equality (beq $t5, $t6, $t7) use subtraction: (a-b) = 0 implies a = b 21

Supporting slt Ainvert Binvert CarryIn Operation Ainvert Binvert CarryIn Operation a 0 0 a 0 0 1 1 1 1 b 0 + 2 Result b 0 + 2 Result 1 1 Less 3 Less 3 Set Overflow detection Overflow CarryOut Use this ALU for most significant bit all other bits 22

Supporting slt Binvert Ainvert Operation CarryIn a0 b0 CarryIn ALU0 Less CarryOut Result0 a1 b1 0 CarryIn ALU1 Less CarryOut Result1 a2 b2 0 CarryIn ALU2 Less CarryOut Result2.. CarryIn. a31 CarryIn Result31 b31 ALU31 Set 0 Less Overflow 23

Test for equality Notice control lines: Bnegate Ainvert Operation 0000 = and 0001 = or 0010 = add 0110 = subtract 0111 = slt 1100 = NOR Note: zero is a 1 when the result is zero! a0 b0 a1 b1 0 a2 b2 0 CarryIn ALU0 Less CarryOut CarryIn ALU1 Less CarryOut CarryIn ALU2 Less CarryOut Result0 Result1 Result2. Zero.. CarryIn.. Result31 a31 CarryIn b31 ALU31 Set 0 Less Overflow 24

Conclusion 25 We can build an ALU to support the MIPS instruction set key idea: use multiplexor to select the output we want we can efficiently perform subtraction using two s complement we can replicate a 1-bit ALU to produce a 32-bit ALU Important points about hardware all of the gates are always working the speed of a gate is affected by the number of inputs to the gate the speed of a circuit is affected by the number of gates in series (on the critical path or the deepest level of logic ) Our primary focus: comprehension, however, Clever changes to organization can improve performance (similar to using better algorithms in software) We saw this in multiplication, let s look at addition now

Problem: ripple carry adder is slow Is a 32-bit ALU as fast as a 1-bit ALU? Is there more than one way to do addition? two extremes: ripple carry and sum-of-products Can you see the ripple? How could you get rid of it? c 1 = b 0 c 0 + a 0 c 0 + a 0 b 0 c 2 = b 1 c 1 + a 1 c 1 + a 1 b 1 c 2 = c 3 = b 2 c 2 + a 2 c 2 + a 2 b 2 c 3 = c 4 = b 3 c 3 + a 3 c 3 + a 3 b 3 c 4 = Not feasible! Why? 26

One-bit Full-Adder Circuit ci ai XOR XOR FAi sumi bi AND AND OR Ci+1 27

32-bit Ripple-Carry Adder c0 a0 b0 FA0 a1 b1 sum0 FA1 a2 b2 sum1 FA2 sum2 a31 b31 FA31 sum31 28

How Fast is Ripple-Carry Adder? Longest delay path (critical path) runs from cin to sum31. Suppose delay of full-adder is 100ps. Critical path delay = 3,200ps Clock rate cannot be higher than 10 12 /3,200 = 312MHz. Must use more efficient ways to handle carry. 29

Fast Adders In general, any output of a 32-bit adder can be evaluated as a logic expression in terms of all 65 inputs. Levels of logic in the circuit can be reduced to log 2 N for N-bit adder. Ripple-carry has N levels. More gates are needed, about log 2 N times that of ripple-carry design. Fastest design is known as carry lookahead adder. 30

N-bit Adder Design Options Type of adder Time complexity (delay) Space complexity (size) Ripple-carry O(N) O(N) Carry-lookahead O(log 2 N) O(N log 2 N) Carry-skip O( N) O(N) Carry-select O( N) O(N) Reference: J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, San Francisco, California, 1990. 31

Carry-lookahead adder An approach in-between our two extremes Motivation: If we didn't know the value of carry-in, what could we do? When would we always generate a carry? g i = a i b i When would we propagate the carry? p i = a i + b i Did we get rid of the ripple? c 1 = g 0 + p 0 c 0 c 2 = g 1 + p 1 c 1 c 2 = g 1 + p 1 g 0 + p 1 p 0 c 0 c 3 = g 2 + p 2 c 2 c 3 = c 4 = g 3 + p 3 c 3 c 4 = Feasible! Why? 32

Use principle to build bigger adders CarryIn a0 b0 a1 b1 a2 b2 a3 b3 CarryIn ALU0 P0 G0 C1 pi gi ci + 1 Result0 3 Carry-lookahead unit a4 b4 a5 b5 a6 b6 a7 b7 CarryIn ALU1 P1 G1 C2 pi + 1 gi + 1 ci + 2 Result4 7 Can t build a 16 bit adder this way... (too big) Could use ripple carry of 4-bit CLA adders Better: use the CLA principle again! a8 b8 a9 b9 a10 b10 a11 b11 CarryIn ALU2 P2 G2 C3 pi + 2 gi + 2 ci + 3 Result8 11 a12 b12 a13 b13 a14 b14 a15 b15 CarryIn ALU3 P3 G3 C4 pi + 3 gi + 3 ci + 4 Result12 15 CarryOut 33

Carry-Select Adder a0-a15 b0-b15 cin a16-a31 b16-b31 0 a16-a31 b16-b31 1 16-bit ripple carry adder 16-bit ripple carry adder 16-bit ripple carry adder sum0-sum15 0 Multiplexer 1 sum16-sum31 This is known as carry-select adder 34

ALU Summary We can build an ALU to support MIPS addition Our focus is on comprehension, not performance Real processors use more sophisticated techniques for arithmetic Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware! 35

Multiplication More complicated than addition accomplished via shifting and addition More time and more area Let's look at 3 versions based on a gradeschool algorithm 0010 (multiplicand) x_1011 (multiplier) Negative numbers: convert and multiply there are better techniques, we won t look at them 36

Multiplication Start with long-multiplication approach 3.3 Multiplication multiplicand multiplier product 1000 1001 1000 0000 0000 1000 1001000 Length of product is the sum of operand lengths 37 ELEC 5200/6200 - From P-H slides

Multiplication Hardware 38 Control Datapath Initially 0 ELEC 5200/6200 - From P-H slides

Optimized Multiplier Perform steps in parallel: add/shift One cycle per partial-product addition That s ok, if frequency of multiplications is low 39 ELEC 5200/6200 - From P-H slides

Example: 0010 two 0011 two 0010 two 0011 two = 0110 two, i.e., 2 ten 3 ten = 6 ten Iteration Step Multiplicand Product 0 Initial values 0010 0000 0011 1 LSB=1 => Prod=Prod+Mcand 0010 0010 0011 Right shift product 0010 0001 0001 2 LSB=1 => Prod=Prod+Mcand 0010 0011 0001 Right shift product 0010 0001 1000 3 LSB=0 => no operation 0010 0001 1000 Right shift product 0010 0000 1100 4 LSB=0 => no operation 0010 0000 1100 Right shift product 0010 0000 0110 40

Faster Multiplier Uses multiple adders Cost/performance tradeoff 41 Can be pipelined Several multiplication performed in parallel ELEC 5200/6200 - From P-H slides

MIPS Multiplication Two 32-bit registers for product HI: most-significant 32 bits LO: least-significant 32-bits Instructions mult rs, rt / multu rs, rt 64-bit product in HI/LO mfhi rd / mflo rd Move from HI/LO to rd Can test HI value to see if product overflows 32 bits mul rd, rs, rt Least-significant 32 bits of product > rd 42 ELEC 5200/6200 - From P-H slides

Multiplying Signed Numbers with Boothe s Algorithm Consider A x B where A and B are signed integers (2 s complemented format) Decompose B into the sum B1 + B2 + + Bn A x B = A x (B1 + B2 + + Bn) = (A x B1) + (A x B2) + (A x Bn) Let each Bi be a single string of 1 s embedded in 0 s: 0011 1100 Example: 0110010011100 = 0110000000000 + 0000010000000 + 0000000011100 43

Boothe s Algorithm Scanning from right to left, bit number u is the first 1 bit of the string and bit v is the first 0 left of the string: v u Bi = 0 0 1 1 0 0 = 0 0 1 1 1 1 (2 v 1) - 0 0 0 0 1 1 (2 u 1) = (2 v 1) - (2 u 1) = 2 v 2 u 44

Boothe s Algorithm Decomposing B: A x B = A x (B1 + B2 + ) = A x [(2 v1 2 u1 ) + (2 v2 2 u2 ) + ] = (A x 2 v1 ) (A x 2 u1 ) + (A x 2 v2 ) (A x 2 u2 ) A x B can be computed by adding and subtracted shifted values of A: Scan bits right to left, shifting A once per bit When the bit string changes from 0 to 1, subtract shifted A from the current product P (A x 2 u ) When the bit string changes from 1 to 0, add shifted A to the current product P + (A x 2 v ) 45

Booth Algorithm: Example 1 7 3 = 21 0111 multiplicand = 7 0011(0) multiplier = 3 11111001 bit-pair 10, add -7 in two s com. bit-pair 11, do nothing 000111 bit-pair 01, add 7 bit-pair 00, do nothing 00010101 = 21 46

Booth Algorithm: Example 2 7 (-3) = -21 0111 multiplicand = 7 1101(0) multiplier = -3 11111001 bit-pair 10, add -7 in two s com. 0000111 bit-pair 01, add 7 111001 bit-pair 10, add -7 in two s com. bit-pair 11, do nothing 11101011-21 47

Booth Algorithm: Example 3-7 3 = -21 1001 multiplicand = -7 in two s com. 0011(0) multiplier = 3 00000111 bit-pair 10, add 7 bit-pair 11, do nothing 111001 bit-pair 01, add -7 bit-pair 00, do nothing 11101011-21 48

Booth Algorithm: Example 4-7 (-3) = 21 1001 multiplicand = -7 in two s com. 1101(0) multiplier = -3 in two s com. 00000111 bit-pair 10, add 7 1111001 bit-pair 01, add -7 in two s com. 000111 bit-pair 10, add 7 bit-pair 11, do nothing 00010101 21 49

Booth Advantage Serial multiplication 00010100 20 00011110 30 00000000 00010100 00010100 00010100 00010100 00000000 00000000 00000000 000001001011000 600 Four partial product additions Booth algorithm 00010100 20 000111100 30 111111111101100 00000010100 0000001001011000 600 Two partial product additions 50

Adding Partial Products y3 y2 y1 y0 Multiplicand x3 x2 x1 x0 Multiplier x0y3 x0y2 x0y1 x0y0 carry x1y3 x1y2 x1y1 x1y0 Partial carry x2y3 x2y2 x2y1 x2y0 Products carry x3y3 x3y2 x3y1 x3y0 p7 p6 p5 p4 p3 p2 p1 p0 Requires three 4-bit adders. Slow. 51

Array Multiplier: Carry Forward y3 y2 y1 y0 Multiplicand x3 x2 x1 x0 Multiplier x0y3 x0y2 x0y1 x0y0 x1y3 x1y2 x1y1 x1y0 Partial x2y3 x2y2 x2y1 x2y0 Products x3y3 x3y2 x3y1 x3y0 p7 p6 p5 p4 p3 p2 p1 p0 Note: Carry is added to the next partial product. Adding the carry from the final stage needs an extra stage. These additions are faster but we need four stages. 52

Basic Building Blocks Two-input AND Full-adder yi xk yi x0 kth partial product carry bits from (k-1)th product Full adder p0i = x0yi 0 th partial product carry bits to (k+1)th product (k+1)th partial product 53

Array Multiplier ppk yj xi ci x1 0 x0 y3 y2 y1 y0 0 0 0 0 co FA ppk+1 Critical path 0 x3 0 x2 0 54 FA FA FA FA 0 p7 p6 p5 p4 p3 p2 p1 p0

Types of Array Multipliers Baugh-Wooley Algorithm: Signed product by two s complement addition or subtraction according to the MSB s. Booth multiplier algorithm Tree multipliers Reference: N. H. E. Weste and D. Harris, CMOS VLSI Design, A Circuits and Systems Perspective, Third Edition, Boston: Addison- Wesley, 2005. 55

56 divisor Division quotient dividend remainder 1001 1000 1001010-1000 10 101 1010-1000 10 n-bit operands yield n-bit quotient and remainder Check for 0 divisor Long division approach If divisor dividend bits 1 bit in quotient, subtract Otherwise 0 bit in quotient, bring down next dividend bit Restoring division Do the subtract, and if remainder goes < 0, add divisor back Signed division Divide using absolute values Adjust sign of quotient and remainder as required ELEC 5200/6200 - From P-H slides 3.4 Division

Division Hardware Initially divisor in left half Initially dividend 57 ELEC 5200/6200 - From P-H slides

Optimized Divider One cycle per partial-remainder subtraction Looks a lot like a multiplier! Same hardware can be used for both 58 ELEC 5200/6200 - From P-H slides

Deriving a Better Algorithm (1) Start $R=0, $M=Divisor, $Q=Dividend, count=n Cycle contains 2 additions $Q0=1 Shift 1-bit left $R, $Q $R $R - $M No $R < 0? Yes count = count - 1 $R (33 b) $Q (32 b) $R and $M have one extra sign bit beyond 32 bits. $Q0=0 $R $R+$M Restore $R (remainder) No count = 0? Yes Done $Q=Quotient $R= Remainder 59

Start Deriving a Better Algorithm (2) $R=0, $M=Divisor, $Q=Dividend, count=n Shift 1-bit left $R, $Q $R (33 b) $Q (32 b) $R $R - $M $Q0=1 No $R < 0? Yes $Q0=0 No $R < 0? Yes $R $R+$M Restore $R (remainder) count = count - 1 60 No count = 0? Yes Done $Q=Quotient $R= Remainder

Deriving a Better Algorithm (3) Start $R=0, $M=Divisor, $Q=Dividend, count=n Shift 1-bit left $R, $Q $R (33 b) $Q (32 b) $R $R - $M $Q0=1 No $R < 0? Yes $Q0=0 count = count - 1 No $R < 0? No count = 0? Yes $R < 0? No Yes $R $R+$M Restore $R (remainder) Restore $R (remainder) Yes $R $R+$M Done $Q=Quotient $R= Remainder 61

Deriving a Better Algorithm (4) Start $R=0, $M=Divisor, $Q=Dividend, count=n Yes $R < 0? $R $R+$M No Shift 1-bit left $R, $Q $R,$Q = a $M 2 32 = b Restore $R (remainder) $R (33 b) $Q (32 b) 2a b No $Q0=1 $R $R - $M $R < 0? Yes (a + b)2 b = 2a + b $Q0=0 62 No count = count - 1 count = 0? Yes Restore $R (remainder) Restore $R (remainder) No $R < 0? Yes $R $R+$M Done $Q=Quotient $R= Remainder

Deriving a Better Algorithm (4) Start $R=0, $M=Divisor, $Q=Dividend, count=n No Shift 1-bit left $R, $Q $R < 0? Yes Shift 1-bit left $R, $Q $R (33 b) $Q (32 b) $R,#Q = a $M 2 32 = b $R $R - $M $R $R+$M $Q0=1 2a b 2a + b No $R < 0? Yes $Q0=0 count = count - 1 No count = 0? Yes $R < 0? No 63 Restore $R (remainder) Yes $R $R+$M Done $Q=Quotient $R= Remainder

Non-Restoring Division Algorithm Start $R=0, $M=Divisor, $Q=Dividend, count=n $R (33 b) $Q (32 b) $R $R - $M No Shift 1-bit left $R, $Q $R < 0? Yes Shift 1-bit left $R, $Q $R $R+$M 64 Cycle contains 1 addition $Q0=1 No No $R < 0? count = count - 1 count = 0? Yes Yes Restore $R (remainder) $Q0=0 $R < 0? Yes $R $R+$M No Done $Q=Quotient $R= Remainder

Non-Restoring Division Avoids the addition in the restore operation does exactly one add or subtract per cycle. Non-restoring division algorithm: Step 1: Repeat 32 times if sign bit of $R is 0 Left shift $R,Q one-bit and subtract, $R $R - $M else (sign bit of $R is 1) Left shift $R,Q one-bit and add, $R $R + $M if sign bit of resulting $R is 0 Set Q0 = 1 else (sign bit of resulting $R is 1) Set Q0 = 0 Step 2: (after 32 Step 1 iterations) if sign bit of $R is 1, add $R $R + $M 65

Non-Restoring Division: 8/3 = 2 (Rem=2) Iteration 4 Iteration 3 Iteration 2 Iteration 1 66 Initialize $R = 0 0 0 0 0 $Q = 1 0 0 0 $M = 0 0 0 1 1 Step 1, L-shift $R,Q = 0 0 0 0 1 $Q = 0 0 0? $M = 0 0 0 1 1 Subtract - $M = 1 1 1 0 1 $R = 1 1 1 1 0 $Q = 0 0 0 0 Step 1, L-shift $R,Q = 1 1 1 0 0 $Q = 0 0 0? $M = 0 0 0 1 1 Add +$M = 0 0 0 1 1 $R = 1 1 1 1 1 $Q = 0 0 0 0 Step 1, L-shift $R,Q = 1 1 1 1 0 $Q = 0 0 0? $M = 0 0 0 1 1 Add +$M = 0 0 0 1 1 $R = 0 0 0 0 1 $Q = 0 0 0 1 Step 1, L-shift $R,Q = 0 0 0 1 0 $Q = 0 0 1? $M = 0 0 0 1 1 Subtract -$M = 1 1 1 0 1 $R = 1 1 1 1 1 $Q = 0 0 1 0 Final quotient Step 2, Add $R $R + $M = 11111+00011 = 00010 (Final remainder) ELEC 5200-001/6200-001 Lecture 11 Spring 2010

Faster Division Can t use parallel hardware as in multiplier Subtraction is conditional on sign of remainder Faster dividers (e.g. SRT devision) generate multiple quotient bits per step Still require multiple steps 67

MIPS Division Use HI/LO registers for result HI: 32-bit remainder LO: 32-bit quotient Instructions div rs, rt / divu rs, rt No overflow or divide-by-0 checking Software must perform checks if required Use mfhi, mflo to access result 68