CIS 371 Computer Organization and Design
Unit 3: Arithmetic
CIS371 (Roth/Martin): Arithmetic

This Unit: Arithmetic
[Figure: system layers: applications, system software, hardware (Mem, CPU, I/O)]
• A little review
  • Binary + 2s complement
  • Ripple-carry addition (RCA)
• Fast integer addition
  • Carry-select (CSeA)
• Shifters
• Integer multiplication and division
• Floating point arithmetic

Readings
• P&H
  • Chapter 3

The Importance of Fast Addition
[Figure: datapath with PC (+4), Insn Mem, Register File (s1, s2, d), Data Mem; stage latencies T insn-mem, T regfile, T ALU, T data-mem, T regfile]
• Addition of two numbers is most common operation
  • Programs use addition frequently
  • Loads and stores use addition for address calculation
  • Branches use addition to test conditions and calculate targets
  • All insns use addition to calculate default next PC
• Fast addition critical to high performance

Review: Binary Integers
• Computers represent integers in binary (base 2)
    3 = 11, 4 = 100, 5 = 101, 30 = 11110
  + Natural since only two values are represented
  • Addition, etc. take place as usual (carry the 1, etc.)
      17 = 10001
      +5 =   101
      22 = 10110
• Some old machines use decimal (base 10) with only 0/1, encoding each decimal digit separately
    30 = 0011 0000
  - Unnatural for digital logic, implementation complicated & slow

Fixed Width
• On pencil and paper, integers have infinite width
• In hardware, integers have fixed width
  • N bits: 16, 32 or 64
  • LSB is 2^0, MSB is 2^(N-1)
• Range: 0 to 2^N - 1
• Numbers >= 2^N represented using multiple fixed-width integers
  • In software

What About Negative Integers?
• Sign/magnitude
  • Unsigned plus one bit for sign
      10 = 000001010, -10 = 100001010
  + Matches our intuition from "by hand" decimal arithmetic
  - Both +0 and -0
  - Addition is difficult
  • Range: -(2^(N-1) - 1) to 2^(N-1) - 1
• Option II: two's complement (2C)
  • Leading 0s mean positive number, leading 1s negative
      10 = 00001010, -10 = 11110110
  + One representation for 0
  + Easy addition
  • Range: -(2^(N-1)) to 2^(N-1) - 1

The Tao of 2C
• How did 2C come about?
  • "Let's design a representation that makes addition easy"
  • Think of subtracting 10 from 0 by hand
  • Have to "borrow" 1s from some imaginary leading 1
        0 = 100000000
      -10 =  00001010
      -10 = 011110110
  • Now, add the conventional way
      -10 =  11110110
      +10 =  00001010
        0 = 100000000

Still More On 2C
• What is the interpretation of 2C?
  • Same as binary, except MSB represents -2^(N-1), not 2^(N-1)
  • -10 = 11110110 = -2^7 + 2^6 + 2^5 + 2^4 + 2^2 + 2^1
  + Extends to any width
  • -10 = 110110 = -2^5 + 2^4 + 2^2 + 2^1
  • Why? 2^N = 2*2^(N-1)
  • -2^5 + 2^4 + 2^2 + 2^1 = (-2^6 + 2*2^5) - 2^5 + 2^4 + 2^2 + 2^1 = -2^6 + 2^5 + 2^4 + 2^2 + 2^1
• Trick to negating a number quickly: -B = B' + 1 (B' is the bitwise complement of B)
  • -(1) = (0001)' + 1 = 1110 + 1 = 1111 = -1
  • -(-1) = (1111)' + 1 = 0000 + 1 = 0001 = 1
  • -(0) = (0000)' + 1 = 1111 + 1 = 0000 = 0
  • Think about why this works

1st Grade: Decimal Addition
     1
    43
   +29
    72
• Repeat N times
  • Add least significant digits and any overflow from previous add
  • Carry "overflow" to next addition
    • Overflow: any digit other than least significant of sum
  • Shift two addends and sum one digit to the right
• Sum of two N-digit numbers can yield an N+1 digit number

Binary Addition: Works the Same Way
    carry   0111111
     43 =   00101011
    +29 =   00011101
     72 =   01001000
• Repeat N times
  • Add least significant bits and any overflow from previous add
  • Carry the overflow to next addition
  • Shift two addends and sum one bit to the right
• Sum of two N-bit numbers can yield an N+1 bit number
  - More steps (smaller base)
  + Each one is simpler (adding just 1 and 0)
    • So simple we can do it in hardware

The Half Adder
• How to add two binary integers in hardware?
• Start with adding two bits
  • When all else fails ... look at truth table
      A B = CO S
      0 0 =  0 0
      0 1 =  0 1
      1 0 =  0 1
      1 1 =  1 0
  • S = A^B
  • CO (carry out) = AB
  • This is called a half adder
[Figure: half adder (HA) with inputs A, B and outputs S, CO]
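The two's complement interpretation and the negation trick above can be checked in a few lines of C. This is only a sketch: the function names `twos_complement_value` and `negate_bits` are made up for illustration and do not come from the slides, and the width `n` is assumed to be between 1 and 32.

```c
#include <stdint.h>

/* Interpret the low n bits of `bits` as an n-bit two's complement number
   (assumes 1 <= n <= 32). The MSB carries weight -2^(n-1); every other
   bit keeps its usual positive weight. */
int64_t twos_complement_value(uint32_t bits, int n) {
    int64_t value = 0;
    for (int i = 0; i < n - 1; i++) {
        if ((bits >> i) & 1u) {
            value += (int64_t)1 << i;
        }
    }
    if ((bits >> (n - 1)) & 1u) {
        value -= (int64_t)1 << (n - 1);
    }
    return value;
}

/* Negate an n-bit two's complement pattern: flip every bit, add 1,
   and keep only the low n bits (-B = B' + 1). */
uint32_t negate_bits(uint32_t bits, int n) {
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return (~bits + 1u) & mask;
}
```

For example, `twos_complement_value(0xF6, 8)` and `twos_complement_value(0x36, 6)` both evaluate to -10, matching the 8-bit and 6-bit patterns 11110110 and 110110 from the slide, and `negate_bits(0x0A, 8)` yields 0xF6.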

The Other Half
• We could chain half adders together, but to do that...
  • Need to incorporate a carry out from previous adder
      C A B = CO S
      0 0 0 =  0 0
      0 0 1 =  0 1
      0 1 0 =  0 1
      0 1 1 =  1 0
      1 0 0 =  0 1
      1 0 1 =  1 0
      1 1 0 =  1 0
      1 1 1 =  1 1
  • S = C'A'B + C'AB' + CA'B' + CAB = C ^ A ^ B
  • CO = C'AB + CA'B + CAB' + CAB = CA + CB + AB
  • This is called a full adder
[Figure: full adder (FA) with inputs A, B, CI and outputs S, CO]

Ripple-Carry Adder
• N-bit ripple-carry adder
  • N 1-bit full adders "chained" together
  • CO[0] = CI[1], CO[1] = CI[2], etc.
  • CI[0] = 0
  • CO[N-1] is carry-out of entire adder
  • CO[N-1] = 1 -> "overflow"
• Example: 16-bit ripple carry adder
  • How fast is this?
• How fast is an N-bit ripple-carry adder?
[Figure: 16 full adders chained from bit 0 to bit 15, each taking A[i], B[i] and producing S[i]]

Quantifying Adder Delay
• Combinational logic dominated by gate (transistor) delays
  • Array storage dominated by wire delays
  • Longest delay or "critical path" is what matters
• Can implement any combinational function in "2" logic levels
  • 1 level of AND + 1 level of OR (PLA)
  • NOTs are "free": push to input (DeMorgan's) or read from latch
• Example: delay(FullAdder) = 2
  • d(CarryOut) = delay(AB + AC + BC)
  • d(Sum) = d(A ^ B ^ C) = d(AB'C' + A'BC' + A'B'C + ABC) = 2
  • Note: ^ means Xor (just like in C & Java)
• Caveat: "2" assumes gates have few (<8?) inputs

Ripple-Carry Adder Delay
• Longest path is to CO[15] (or S[15])
  • d(CO[15]) = 2 + MAX(d(A[15]), d(B[15]), d(CI[15]))
  • d(A[15]) = d(B[15]) = 0, d(CI[15]) = d(CO[14])
  • d(CO[15]) = 2 + d(CO[14]) = 2 + 2 + d(CO[13]) ...
  • d(CO[15]) = 32
• D(CO[N-1]) = 2N
  - Too slow!
  - Linear in number of bits
• Number of gates is also linear
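The half adder, full adder, and ripple-carry chain described above translate almost directly into C. The sketch below is a bit-level software model, not hardware; the names `AdderBit`, `half_adder`, `full_adder`, `ripple_carry_add16`, and `ripple_carry_delay` are illustrative assumptions, and the delay function simply encodes the slides' "2 gate delays per full adder" model.

```c
#include <stdint.h>

typedef struct {
    unsigned sum;        /* S  */
    unsigned carry_out;  /* CO */
} AdderBit;

/* Half adder: S = A ^ B, CO = AB. */
AdderBit half_adder(unsigned a, unsigned b) {
    AdderBit r;
    r.sum = a ^ b;
    r.carry_out = a & b;
    return r;
}

/* Full adder: S = C ^ A ^ B, CO = CA + CB + AB. */
AdderBit full_adder(unsigned a, unsigned b, unsigned ci) {
    AdderBit r;
    r.sum = a ^ b ^ ci;
    r.carry_out = (ci & a) | (ci & b) | (a & b);
    return r;
}

/* 16-bit ripple-carry adder: 16 full adders chained so that
   CO[i] feeds CI[i+1], with CI[0] = 0. */
uint16_t ripple_carry_add16(uint16_t a, uint16_t b, unsigned *carry_out) {
    unsigned carry = 0;  /* CI[0] = 0 */
    uint16_t sum = 0;
    for (int i = 0; i < 16; i++) {
        AdderBit bit = full_adder((a >> i) & 1u, (b >> i) & 1u, carry);
        sum |= (uint16_t)(bit.sum << i);
        carry = bit.carry_out;
    }
    *carry_out = carry;  /* CO[15]: 1 signals unsigned overflow */
    return sum;
}

/* Gate-delay model from the slides: 2 gate delays per full adder. */
int ripple_carry_delay(int n_bits) {
    return 2 * n_bits;
}
```

Calling `ripple_carry_add16(43, 29, &co)` reproduces the 43 + 29 = 72 example with `co` equal to 0, and `ripple_carry_delay(16)` gives the 32 gate delays quoted for the 16-bit adder.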

Bad idea: a PLA-based Adder?
• If any function can be expressed as two-level logic...
  • ...why not use a PLA for an entire 8-bit adder?
• Not small
  • Approx. 2^15 AND gates, each with 2^16 inputs
  • Then, 2^16 OR gates, each with 2^16 inputs
  • Number of gates exponential in bit width!
• Not that fast, either
  • An AND gate with 65 thousand inputs != 2-input AND gate
  • Many-input gates made a tree of, say, 4-input gates
  • 2^16-input gates would have at least 8 logic levels
  • So, at least 16 levels of logic for a 16-bit PLA
  • Even so, delay is still logarithmic in number of bits
• There are better (faster, smaller) ways

Theme: Hardware != Software
• Hardware can do things that software fundamentally can't
  • And vice versa (of course)
• In hardware, it's easier to trade resources for latency
• One example of this: speculation
  • Slow computation is waiting for some slow input?
  • Input one of two things?
  • Compute with both (slow), choose right one later (fast)
• Does this make sense in software? Not on a uni-processor
  • Difference? Hardware is parallel, software is sequential

Carry-Select Adder
• Carry-select adder
  • Do A[15:8] + B[15:8] twice, once assuming C[8] (CO[7]) = 0, once = 1
  • Choose the correct one when CO[7] finally becomes available
  + Effectively cuts carry chain in half (break critical path)
  - But adds mux
• Delay?
[Figure: 16-bit adder computing S[15:0] from A[15:0] and B[15:0], next to a carry-select version: one 8-bit adder for A[7:0] + B[7:0] (delay 16), two 8-bit adders for A[15:8] + B[15:8] with carry-in 0 and 1 (delay 16 each), and a mux selecting S[15:8] (total delay 18)]

Multi-Segment Carry-Select Adder
• Multiple segments
  • Example: 5, 5, 6 bit = 16 bit
• Hardware cost
  • Still mostly linear (~2x)
  • Compute each segment with 0 and 1 carry-in
  • Serial mux chain
• Delay
  • 5-bit adder (10) + two muxes (4) = 14
[Figure: three-segment carry-select adder: 5-bit adder for bits 4:0 (delay 10), duplicated 5-bit adders for bits 9:5 with a mux (delay 12), duplicated 6-bit adders for bits 15:10 with a mux (delay 14)]
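The two-segment carry-select idea above can be modeled in software as follows. This is a sketch under stated assumptions: `add8` and `carry_select_add16` are hypothetical names, and the 8-bit adders are modeled with ordinary C addition rather than gate by gate, since the point here is the "compute both, select later" structure.

```c
#include <stdint.h>

/* 8-bit adder with an explicit carry-in. The result is 9 bits wide:
   bits 7..0 are the sum, bit 8 is the carry-out. */
static unsigned add8(unsigned a, unsigned b, unsigned carry_in) {
    return (a & 0xFFu) + (b & 0xFFu) + (carry_in & 1u);
}

/* Two-segment carry-select adder: the upper half is computed twice,
   once per possible carry-in, and a mux picks the right result once
   CO[7] from the lower half is known. */
uint16_t carry_select_add16(uint16_t a, uint16_t b, unsigned *carry_out) {
    unsigned low       = add8(a, b, 0);            /* A[7:0] + B[7:0]           */
    unsigned high_if_0 = add8(a >> 8, b >> 8, 0);  /* A[15:8] + B[15:8], CI = 0 */
    unsigned high_if_1 = add8(a >> 8, b >> 8, 1);  /* A[15:8] + B[15:8], CI = 1 */

    unsigned low_carry = (low >> 8) & 1u;          /* CO[7] */
    unsigned high = low_carry ? high_if_1 : high_if_0;  /* the mux */

    *carry_out = (high >> 8) & 1u;                 /* CO[15] */
    return (uint16_t)(((high & 0xFFu) << 8) | (low & 0xFFu));
}
```

In hardware the three `add8` calls run in parallel, which is where the latency saving comes from; in sequential C they simply cost three additions, which illustrates the slide's point that this trade does not pay off in software on a uni-processor.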

Carry-Select Adder Delay
• What is carry-select adder delay (two segment)?
  • d(CO[15]) = MAX(d(CO[15:8]), d(CO[7:0])) + 2
  • d(CO[15]) = MAX(2*8, 2*8) + 2 = 18
  • In general: 2*(N/2) + 2 = N+2 (vs 2N for RCA)
• What if we cut adder into 4 equal pieces?
  • Would it be 2*(N/4) + 2 = 10? Not quite
  • d(CO[15]) = MAX(d(CO[15:12]), d(CO[11:0])) + 2
  • d(CO[15]) = MAX(2*4, MAX(d(CO[11:8]), d(CO[7:0])) + 2) + 2
  • d(CO[15]) = MAX(2*4, MAX(2*4, MAX(d(CO[7:4]), d(CO[3:0])) + 2) + 2) + 2
  • d(CO[15]) = MAX(2*4, MAX(2*4, MAX(2*4, 2*4) + 2) + 2) + 2
  • d(CO[15]) = 2*4 + 3*2 = 14
• N-bit adder in M equal pieces: 2*(N/M) + (M-1)*2
  • 16-bit adder in 8 parts: 2*(16/8) + 7*2 = 18

Another Option: Carry Lookahead
• Is carry-select adder as fast as we can go?
  • Nope!
• Another approach to using additional resources
  • Instead of redundantly computing sums assuming different carries
  • Use redundancy to compute carries more quickly
  • This approach is called carry lookahead (CLA)

Carry Lookahead Adder (CLA)
• Calculate "propagate" and "generate" based on A, B
  • Not based on carry in
• Combine with tree structure
• Prior years: CLA covered in great detail
  • Dozen slides or so
  • Not this year
• Take aways
  • Tree gives logarithmic delay
  • Reasonable area
[Figure: 4-bit CLA tree: inputs A[0..3], B[0..3]; per-bit G0/P0 ... G3/P3; group signals G[1:0], P[1:0], G[3:2], P[3:2], G[3:0], P[3:0]; carries C0 ... C4]

Adders In Real Processors
• Real processors super-optimize their adders
  • Ten or so different versions of CLA
  • Highly optimized versions of carry-select
  • Other gate techniques: carry-skip, conditional-sum
  • Sub-gate (transistor) techniques: Manchester carry chain
  • Combinations of different techniques
    • Alpha 21264 uses CLA+CSeA+RippleCA
    • Used at different levels
• Even more optimizations for incrementers
  • Why?
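The delay formulas and the carry-lookahead tree above can be sketched in C. Everything here is illustrative: the function names are assumptions, the delay functions encode only the slides' simplified model (2 gate delays per full adder, 2 per mux), and `cla4_add` follows the two-level generate/propagate tree suggested by the figure labels (G[1:0], P[1:0], G[3:2], P[3:2], G[3:0], P[3:0]) with propagate taken as A OR B, as the later ALU slide describes.

```c
/* Slides' delay model: full adder = 2 gate delays, mux = 2 gate delays. */
int rca_delay(int n) {
    return 2 * n;
}

/* n-bit carry-select adder cut into m equal pieces. */
int carry_select_delay(int n, int m) {
    return 2 * (n / m) + (m - 1) * 2;
}

/* 4-bit carry lookahead adder. Returns a 5-bit value: bits 3..0 are the
   sum, bit 4 is the carry-out C4. Generate and propagate depend only on
   A and B, never on the carry-in. */
unsigned cla4_add(unsigned a, unsigned b, unsigned c0) {
    unsigned g[4], p[4], c[5];
    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        g[i] = ai & bi;  /* generate: this bit creates a carry        */
        p[i] = ai | bi;  /* propagate: this bit passes a carry along  */
    }

    /* Group signals for bit pairs, then for all four bits. */
    unsigned g10 = g[1] | (p[1] & g[0]), p10 = p[1] & p[0];
    unsigned g32 = g[3] | (p[3] & g[2]), p32 = p[3] & p[2];
    unsigned g30 = g32 | (p32 & g10),    p30 = p32 & p10;

    c[0] = c0 & 1u;
    c[1] = g[0] | (p[0] & c[0]);
    c[2] = g10  | (p10  & c[0]);
    c[3] = g[2] | (p[2] & c[2]);
    c[4] = g30  | (p30  & c[0]);

    unsigned result = c[4] << 4;
    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        result |= (ai ^ bi ^ c[i]) << i;  /* S[i] = A[i] ^ B[i] ^ C[i] */
    }
    return result;
}
```

With this model, `carry_select_delay(16, 2)`, `carry_select_delay(16, 4)`, and `carry_select_delay(16, 8)` evaluate to 18, 14, and 18, matching the worked numbers on the slide, against `rca_delay(16)` of 32.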

Subtraction: Addition's Tricky Pal
• Sign/magnitude subtraction is mental reverse addition
  • 2C subtraction is addition
• How to subtract using an adder?
  • sub A B = add A -B
  • Negate B before adding (fast negation trick: -B = B' + 1)
• Isn't a subtraction then a negation and two additions?
  + No, an adder can implement A+B+1 by setting the carry-in to 1
[Figure: adder with input A, input B optionally inverted (~) through a 0/1 mux, and carry-in set to 1 for subtraction]

A 16-bit ALU
• Build an ALU with functions: add/sub, and, or, not, xor
• All of these already in CLA adder/subtracter
  • add A,B and sub A,B ... check
  • not B is needed for subtraction
  • and A,B are first level Gs
  • or A,B are first level Ps
  • xor A,B?
    • S[i] = A[i] ^ B[i] ^ C[i]
• What is still missing?
[Figure: ALU built around the adder: inputs A and B, inverter (~) on B, first-level G (&) and P (|) signals, xor (^), add/sub sum, and an output mux]

Shift and Rotation Instructions
• Left/right shifts are useful
  • Fast multiplication/division by small constants (next)
  • Bit manipulation: extracting and setting individual bits in words
• Right shifts
  • Can be logical (shift in 0s) or arithmetic (shift in copies of MSB)
      srl 110011, 2 = 001100
      sra 110011, 2 = 111100
  • Caveat: sra is not equal to division by 2 of negative numbers (sra rounds toward negative infinity, division truncates toward zero)
• Rotations are less useful
  • But almost "free" if shifter is there
• MIPS and LC4 have only shifts, x86 has shifts and rotations
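A short C sketch can make the subtraction trick and the srl/sra distinction above concrete. The names `sub16`, `srl16`, and `sra16` are hypothetical, values are modeled as 16-bit patterns in `uint16_t`, and the arithmetic shift is built by hand because right-shifting a negative signed value in C is implementation-defined. Shift amounts are assumed to be 0 to 15.

```c
#include <stdint.h>

/* A - B using only an adder: add the bitwise complement of B and set
   the carry-in to 1 (A + B' + 1). */
uint16_t sub16(uint16_t a, uint16_t b) {
    uint16_t not_b = (uint16_t)~b;
    unsigned carry_in = 1u;
    return (uint16_t)(a + not_b + carry_in);
}

/* Logical right shift: shift in 0s. */
uint16_t srl16(uint16_t x, unsigned amount) {
    return (uint16_t)(x >> amount);
}

/* Arithmetic right shift: shift in copies of the MSB. */
uint16_t sra16(uint16_t x, unsigned amount) {
    uint16_t shifted = (uint16_t)(x >> amount);
    if (x & 0x8000u) {
        /* Fill the vacated high bits with 1s. */
        shifted |= (uint16_t)~(0xFFFFu >> amount);
    }
    return shifted;
}
```

For instance, `sra16(0xFFF9, 1)` (the pattern for -7) gives 0xFFFC, which is -4, whereas the C expression `-7 / 2` is -3. That difference is the caveat noted on the slide.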

Simple Shifter
• The simplest 16-bit shifter: can only shift left by 1
  • Implement using wires (no logic!)
• Slightly more complicated: can shift left by 1 or 0
  • Implement using wires and a multiplexor (mux16_2to1)
[Figure: input A wired to output O shifted left by 1; second version adds a mux controlled by the shift signal]

Barrel Shifter
• What about shifting left by any amount 0 to 15?
• 16 consecutive "left-shift-by-1-or-0" blocks?
  - Would take too long (how long?)
• Barrel shifter: 4 "shift-left-by-X-or-0" blocks (X = 1,2,4,8)
  • What is the delay?
[Figure: A -> <<8 -> <<4 -> <<2 -> <<1 -> O, with stages controlled by shift[3], shift[2], shift[1], shift[0]]
• Similar barrel designs for right shifts and rotations

3rd Grade: Decimal Multiplication
       19   // multiplicand
     * 12   // multiplier
       38
    + 19
      228   // product
• Start with product 0, repeat steps until no multiplier digits
  • Multiply multiplicand by least significant multiplier digit
  • Add to product
  • Shift multiplicand one digit to the left (multiply by 10)
  • Shift multiplier one digit to the right (divide by 10)
• Product of N-digit, M-digit numbers may have N+M digits

Binary Multiplication: Same Refrain
       19 =       010011   // multiplicand
     * 12 =       001100   // multiplier
        0 = 000000000000
        0 = 000000000000
       76 = 000001001100
      152 = 000010011000
        0 = 000000000000
    +   0 = 000000000000
      228 = 000011100100   // product
± Smaller base -> more steps, each is simpler
  • Multiply multiplicand by least significant multiplier digit
    + 0 or 1 -> no actual multiplication, add multiplicand or not
  • Add to total: we know how to do that
  • Shift multiplicand left, multiplier right by one digit
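The barrel shifter described above, four "shift-left-by-X-or-0" blocks controlled by the bits of the shift amount, can be modeled in C as below. The function names are illustrative assumptions; each helper call stands in for one mux16_2to1 stage.

```c
#include <stdint.h>

/* One "shift-left-by-X-or-0" block: a 2-to-1 mux choosing between the
   input and the input shifted left by the fixed amount x. */
static uint16_t shift_left_by_x_or_0(uint16_t in, unsigned x, unsigned select) {
    return select ? (uint16_t)(in << x) : in;
}

/* 16-bit barrel shifter: four stages (X = 8, 4, 2, 1), each enabled by
   one bit of the 4-bit shift amount. */
uint16_t barrel_shift_left16(uint16_t a, unsigned shift) {
    uint16_t stage = a;
    stage = shift_left_by_x_or_0(stage, 8, (shift >> 3) & 1u);  /* shift[3] */
    stage = shift_left_by_x_or_0(stage, 4, (shift >> 2) & 1u);  /* shift[2] */
    stage = shift_left_by_x_or_0(stage, 2, (shift >> 1) & 1u);  /* shift[1] */
    stage = shift_left_by_x_or_0(stage, 1, shift & 1u);         /* shift[0] */
    return stage;
}
```

Any shift from 0 to 15 passes through exactly four mux stages, so the critical path is four muxes rather than the sixteen of the chained shift-by-1 design. Under the 2-gate-delays-per-mux assumption used earlier for carry-select, that would be roughly 8 versus 32 gate delays.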

Software Multiplication
• Can implement this algorithm in software
• Inputs: md (multiplicand) and mr (multiplier)
    int pd = 0; // product
    int i = 0;
    for (i = 0; i < 16 && mr != 0; i++) {
      if (mr & 1) {
        pd = pd + md;
      }
      md = md << 1; // shift left
      mr = mr >> 1; // shift right
    }

Hardware Multiply: Iterative
[Figure: Multiplicand register (32 bit) with << 1, Multiplier register (16 bit) with >> 1, 32-bit adder, Product register (32 bit) whose write enable (we) is driven by "lsb==1?"]
• Control: repeat 16 times
  • If least significant bit of multiplier is 1
    • Then add multiplicand to product
  • Shift multiplicand left by 1
  • Shift multiplier right by 1

Hardware Multiply: Multiple Adders
[Figure: chain of adders summing 0, C, C<<1, C<<2, C<<3, C<<4 into P, next to a tree of adders summing the same terms]
• Multiply by N bits at a time using N adders
  • Example: N=5, terms (P=product, C=multiplicand, M=multiplier)
  • P = (M[0] ? (C) : 0) + (M[1] ? (C<<1) : 0) + (M[2] ? (C<<2) : 0) + (M[3] ? (C<<3) : 0) + ...
  • Arrange like a tree to reduce gate delay critical path
• Delay? N^2 vs N*log N? Not that simple, depends on adder
  • Approx "2N" versus "N + log N", with optimization: O(log N)

Consecutive Addition
[Figure: two rows of full adders: the first adds A[3:0] and B[3:0] to form T[3:0], the second adds D[3:0] to T to form S[3:0]]
• 2 N-bit RC adders
  + 2 + d(add) gate delays
• M N-bit RC adders delay
  • Naïve: O(M*N)
  • Actual: O(M+N)
• M N-bit Carry Select?
  • Delay calculation tricky
• Carry Save Adder (CSA)
  • 3-to-2 CSA tree + adder
  • Delay: O(log M + log N)
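The carry-save idea mentioned above can be illustrated in C: a 3-to-2 carry-save adder reduces three addends to a sum word and a carry word with no carry propagation, so only one ordinary carry-propagating add is needed at the very end. This is a sketch with hypothetical names (`carry_save_add`, `multiply16_csa`), and it accumulates partial products through a simple chain of CSAs rather than the tree a real multiplier would likely use.

```c
#include <stdint.h>

/* 3-to-2 carry-save adder: each bit position is an independent full
   adder. The sum bits stay in place; the carry bits move one position
   left. x + y + z == *sum + *carry (mod 2^32). */
static void carry_save_add(uint32_t x, uint32_t y, uint32_t z,
                           uint32_t *sum, uint32_t *carry) {
    *sum = x ^ y ^ z;
    *carry = ((x & y) | (x & z) | (y & z)) << 1;
}

/* 16x16 -> 32-bit unsigned multiply: fold the 16 partial products into
   a running (sum, carry) pair, then do a single carry-propagating add. */
uint32_t multiply16_csa(uint16_t md, uint16_t mr) {
    uint32_t sum = 0, carry = 0;
    for (int i = 0; i < 16; i++) {
        /* Partial product i: multiplicand shifted left by i, or 0. */
        uint32_t partial = ((mr >> i) & 1u) ? ((uint32_t)md << i) : 0u;
        carry_save_add(sum, carry, partial, &sum, &carry);
    }
    return sum + carry;  /* the only carry-propagating addition */
}
```

As a check against the worked example, `multiply16_csa(19, 12)` returns 228.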

Hardware != Software: Part Deux
• Recall: hardware is parallel, software is sequential
• Exploit: evaluate independent sub-expressions in parallel
• Example I: S = A + B + C + D
  • Software? 3 steps: (1) S1 = A+B, (2) S2 = S1+C, (3) S = S2+D
  + Hardware? 2 steps: (1) S1 = A+B, S2 = C+D, (2) S = S1+S2
• Example II: S = A + B + C
  • Software? 2 steps: (1) S1 = A+B, (2) S = S1+C
  • Hardware? 2 steps: (1) S1 = A+B, (2) S = S1+C
  + Actually hardware can do this in 1.2 steps!
  • Sub-expression parallelism exists below 16-bit addition level

Aside: Strength Reduction
• Strength reduction: compilers will do this (sort of)
    A * 4 = A << 2
    A / 8 = A >> 3
    A * 5 = (A << 2) + A
• Useful for address calculation: all basic data types are 2^M in size
    int A[100];
    &A[N] = A+(N*sizeof(int)) = A+(N*4) = A+(N<<2)

4th Grade: Decimal Division
         9    // quotient
    3 | 29    // divisor | dividend
       -27
         2    // remainder
• Shift divisor left (multiply by 10) until MSB lines up with dividend's
• Repeat until remaining dividend (remainder) < divisor
  • Find largest single digit q such that (q*divisor) <= dividend
  • Set LSB of quotient to q
  • Subtract (q*divisor) from dividend
  • Shift quotient left by one digit (multiply by 10)
  • Shift divisor right by one digit (divide by 10)

Binary Division
                      1001 = 9
    3 | 29 = 0011 | 011101
       -24 =      - 011000
         5 =        000101
        -3 =      - 000011
         2 =        000010
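The strength-reduction identities from the aside above are easy to spot-check in C. The sketch below uses hypothetical helper names and sticks to unsigned values, where a right shift and a division by a power of two agree exactly; the element size of 4 bytes is modeled with `int32_t` rather than assuming the size of `int`.

```c
#include <stddef.h>
#include <stdint.h>

/* A * 4 as a left shift by 2. */
uint32_t times4(uint32_t a) { return a << 2; }

/* A / 8 as a right shift by 3 (exact for unsigned values). */
uint32_t div8(uint32_t a) { return a >> 3; }

/* A * 5 as one shift plus one add. */
uint32_t times5(uint32_t a) { return (a << 2) + a; }

/* Byte offset of element n in an array of 4-byte elements:
   n * sizeof(int32_t) == n << 2. */
size_t element_offset(size_t n) { return n << 2; }
```

Note the parentheses in `(a << 2) + a`: in C, `+` binds more tightly than `<<`, so writing `a + n << 2` would shift the whole sum.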

Binary Division Hardware
• Same as decimal division, except (again)
  - More individual steps (base is smaller)
  + Each step is simpler
    • Find largest bit q such that (q*divisor) <= dividend
      • q = 0 or 1
    • Subtract (q*divisor) from dividend
      • q = 0 or 1 -> no actual multiplication, subtract divisor or not
• Complication: largest q such that (q*divisor) <= dividend
  • How do you know if (1*divisor) <= dividend?
  • Human can "eyeball" this
  • Computer does not have eyeballs
    • Subtract and see if result is negative

Software Divide Algorithm
• Can implement this algorithm in software
• Inputs: dividend and divisor
    // dividend, divisor, quotient, remainder are unsigned 32-bit values;
    // quotient and remainder start at 0
    for (int i = 0; i < 32; i++) {
      remainder = (remainder << 1) | (dividend >> 31);
      if (remainder >= divisor) {
        quotient = (quotient << 1) | 1;
        remainder = remainder - divisor;
      } else {
        quotient = quotient << 1;
      }
      dividend = dividend << 1;
    }

Divide Example
• Input: Divisor = 00011, Dividend = 11101

    Step | Remainder (after shift) | Quotient | Remainder (after subtract) | Dividend
    0    | 00000                   | 00000    | 00000                      | 11101
    1    | 00001                   | 00000    | 00001                      | 11010
    2    | 00011                   | 00001    | 00000                      | 10100
    3    | 00001                   | 00010    | 00001                      | 01000
    4    | 00010                   | 00100    | 00010                      | 10000
    5    | 00101                   | 01001    | 00010                      | 00000

• Result: Quotient: 01001, Remainder: 00010

Divider Circuit
[Figure: Divisor feeds a subtractor (Sub) with a ">= 0" test; Remainder register shifts in the msb of the Dividend; Quotient register shifts in 0 or 1; Dividend register shifts in 0]
• N cycles for n-bit divide
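To reproduce the 5-bit divide example above, the shift-and-subtract loop can be wrapped in a function that takes the operand width as a parameter. This is a sketch: `DivResult` and `divide_bits` are hypothetical names, the width is assumed to be 1 to 32, the divisor is assumed to be nonzero, and the remainder is held in 64 bits so the shift cannot overflow.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} DivResult;

/* Shift-and-subtract division on `width`-bit unsigned operands.
   One quotient bit is produced per iteration, MSB first.
   Assumes 1 <= width <= 32 and divisor != 0. */
DivResult divide_bits(uint32_t dividend, uint32_t divisor, int width) {
    uint64_t remainder = 0;
    uint32_t quotient = 0;
    for (int i = 0; i < width; i++) {
        /* Shift the next dividend bit (MSB first) into the remainder. */
        uint32_t next_bit = (dividend >> (width - 1 - i)) & 1u;
        remainder = (remainder << 1) | next_bit;
        quotient <<= 1;
        if (remainder >= divisor) {
            /* q = 1: subtract the divisor and record a 1 bit. */
            remainder -= divisor;
            quotient |= 1u;
        }
    }
    DivResult r = { quotient, (uint32_t)remainder };
    return r;
}
```

Calling `divide_bits(29, 3, 5)` walks through the same five steps as the table and returns quotient 9 (01001) with remainder 2 (00010).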

Arithmetic Latencies
• Latency in cycles of common arithmetic operations
• Source: Software Optimization Guide for AMD Family 10h Processors, Dec 2007
  • Intel Core 2 chips similar

                    Int 32      Int 64
    Add/Subtract    1           1
    Multiply        3           5
    Divide          14 to 40    23 to 87

• Divide is variable latency based on the size of the dividend
  • Detect number of leading zeros, then divide

Summary
[Figure: system layers: applications, system software, hardware (Mem, CPU, I/O)]
• Integer addition
  • Most timing-critical operation in datapath
  • Hardware != software
    • Exploit sub-addition parallelism
• Fast addition
  • Carry-select: parallelism in sum
• Multiplication
  • Chains and trees of additions
• Division
• Next up: Floating point