Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 2005

Similar documents
COMP 303 Computer Architecture Lecture 6

Tailoring the 32-Bit ALU to MIPS

EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3

Outline. EEL-4713 Computer Architecture Multipliers and shifters. Deriving requirements of ALU. MIPS arithmetic instructions

Homework 3. Assigned on 02/15 Due time: midnight on 02/21 (1 WEEK only!) B.2 B.11 B.14 (hint: use multiplexors) CSCI 402: Computer Architectures

ECE 30 Introduction to Computer Engineering

Two-Level CLA for 4-bit Adder. Two-Level CLA for 4-bit Adder. Two-Level CLA for 16-bit Adder. A Closer Look at CLA Delay

EE260: Logic Design, Spring n Integer multiplication. n Booth s algorithm. n Integer division. n Restoring, non-restoring

Review of Last lecture. Review ALU Design. Designing a Multiplier Shifter Design Review. Booth s algorithm. Today s Outline

Lecture Topics. Announcements. Today: Integer Arithmetic (P&H ) Next: continued. Consulting hours. Introduction to Sim. Milestone #1 (due 1/26)

ECE 341. Lecture # 7

Binary Adders. Ripple-Carry Adder

Integer Multiplication and Division

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Chapter 3 Arithmetic for Computers

CPE300: Digital System Architecture and Design

Integer Multiplication and Division

Chapter 3: Arithmetic for Computers

Number Systems and Computer Arithmetic

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Lecture 8: Addition, Multiplication & Division

Let s put together a Manual Processor

CPS 104 Computer Organization and Programming

EEC 483 Computer Organization

361 div.1. Computer Architecture EECS 361 Lecture 7: ALU Design : Division

MIPS Integer ALU Requirements

ECE 341. Lecture # 6

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

Week 7: Assignment Solutions

COMPUTER ORGANIZATION AND DESIGN

By, Ajinkya Karande Adarsh Yoga

Chapter 4. The Processor

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

Computer Architecture Set Four. Arithmetic

More complicated than addition. Let's look at 3 versions based on grade school algorithm (multiplicand) More time and more area

ECE260: Fundamentals of Computer Engineering

Outline. Introduction to Structured VLSI Design. Signed and Unsigned Integers. 8 bit Signed/Unsigned Integers

Boolean Unit (The obvious way)

Review from last time. CS152 Computer Architecture and Engineering Lecture 6. Verilog (finish) Multiply, Divide, Shift

Unsigned Binary Integers

Unsigned Binary Integers

Today s Outline. CS152 Computer Architecture and Engineering Lecture 5. VHDL, Multiply, Shift

ECE260: Fundamentals of Computer Engineering

MULTIPLE OPERAND ADDITION. Multioperand Addition

Computer Organization EE 3755 Midterm Examination

McGill University Faculty of Engineering FINAL EXAMINATION Fall 2007 (DEC 2007)

Processor (I) - datapath & control. Hwansoo Han

CPE 335 Computer Organization. MIPS Arithmetic Part I. Content from Chapter 3 and Appendix B

Arithmetic Logic Unit

ECE 2300 Digital Logic & Computer Organization. More Single Cycle Microprocessor

Binary Multiplication

Arithmetic Logic Unit. Digital Computer Design

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

Embedded Systems Ch 15 ARM Organization and Implementation

Jan Rabaey Homework # 7 Solutions EECS141

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Overview. EECS Components and Design Techniques for Digital Systems. Lec 16 Arithmetic II (Multiplication) Computer Number Systems.

Chapter 3 Arithmetic for Computers. ELEC 5200/ From P-H slides

CSE 141 Computer Architecture Summer Session Lecture 3 ALU Part 2 Single Cycle CPU Part 1. Pramod V. Argade

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

Arithmetic Circuits. Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak.

Chapter 4. The Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

T insn-mem T regfile T ALU T data-mem T regfile

Review: MIPS Organization

NUMBER OPERATIONS. Mahdi Nazm Bojnordi. CS/ECE 3810: Computer Organization. Assistant Professor School of Computing University of Utah

Reducing Computational Time using Radix-4 in 2 s Complement Rectangular Multipliers

Arithmetic and Logical Operations

*Instruction Matters: Purdue Academic Course Transformation. Introduction to Digital System Design. Module 4 Arithmetic and Computer Logic Circuits

Data Representation Type of Data Representation Integers Bits Unsigned 2 s Comp Excess 7 Excess 8

Week 6: Processor Components

Improving Performance: Pipelining

Chapter 3: part 3 Binary Subtraction

CENG3420 L05: Arithmetic and Logic Unit

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Microcomputers. Outline. Number Systems and Digital Logic Review

ECE 341 Midterm Exam

Instructions: MIPS ISA. Chapter 2 Instructions: Language of the Computer 1

Integer Multiplication. Back to Arithmetic. Integer Multiplication. Example (Fig 4.25)

Introduction to Digital Logic Missouri S&T University CPE 2210 Multipliers/Dividers

Programmable Machines

Basic Arithmetic and the ALU. Basic Arithmetic and the ALU. Background. Background. Integer multiplication, division

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010

Implementing FIR Filters

Digital Computer Arithmetic

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai

ENGR 303 Introduction to Logic Design Lecture 7. Dr. Chuck Brown Engineering and Computer Information Science Folsom Lake College

Programmable Machines

MULTIPLICATION TECHNIQUES

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

CS/COE 0447 Example Problems for Exam 2 Spring 2011

ECE331: Hardware Organization and Design

Partial product generation. Multiplication. TSTE18 Digital Arithmetic. Seminar 4. Multiplication. yj2 j = xi2 i M

Chapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont )

UC Berkeley College of Engineering, EECS Department CS61C: Combinational Logic Blocks

UC Berkeley College of Engineering, EECS Department CS61C: Combinational Logic Blocks

ECE 341 Midterm Exam

Transcription:

Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 200 Multiply We will start with unsigned multiply and contrast how humans and computers multiply Layout 8-bit 8 Pipelined Multiplier 1 2 We first generate all partial product terms We first generate all partial product terms Partial Product Partial Product 3 4 We first generate all partial product terms We first generate all partial product terms 1 Partial Product 101 Partial Product 6

Then add column by column, right to left Then add column by column, right to left == 0 Product == 10 Product 7 8 Then add column by column, right to left Sometimes with one or more carry digits == 010 Product 1 == 0010 Product 9 10 Sometimes with one or more carry digits Sometimes with one or more carry digits 11 == 00010 Product 111 == 10 Product 11 12

Sometimes with one or more carry digits Sometimes with one or more carry digits 1111 == 010 Product 1111 == Product 13 14 Sometimes with one or more carry digits 1111 == Product Human Method Not Best for Computers Each partial product must be stored extra hardware Columns vary in size complexity Multiple-digit carries complexity Need a simpler method for low-cost multipliers 1 16 Simplest computer multiply is to shift left one bit per iteration to generate partial product Each iteration if corresponding Multiplier bit is 1: Product = Product + NxN bit multiply takes N iterations (N clock cycles) Old Product New Product 17 18

Old Product New Product 1 Old Product 00110010 New Product 19 20 101 00110010 Old Product New Product Computer multiply also shifts multiplier right so current multiplier bit is at a fixed position, the least significant bit (LSB) 21 22 Old Product New Product x 110 Old Product New Product 23 24

x 111 Old Product 00110010 New Product x 1 00110010 Old Product New Product 2 26 Software Multiple Very simple processors don t t have hardware, implement shift & add in software Multiply in C: int product = 0; for(int i = 0; i<32; i++) if( ( multiplier>> >>i i % 2 ) == 1) product = product + multiplicand<< <<i; MIPS Assembly Multiply # multiplicand is in $a0, need not save # multiplier is in $a1, need not save # move $v0,0 #$v0 is the product Loop: andi $t1,$a1,1 #get multiplier bit beq $t1,$0,next #test bit add $v0,$v0,$a0 #add partial product Next: sll $a0,$a0,1 #get next partial product slr $a1,$a1,1 #position multiplier bit bne $a1,$0,loop #got any bits left? 27 28 Simple Hardware Multiply Like shift & add we have already seen Multiplier LSB is write enable for Product latch Simple Hardware Multiply Multiplier Multiplier 000 1101 00 0 0110 8-bit ALU 8-bit ALU Product Product 29 30

Simple Hardware Multiply Simple Hardware Multiply Multiplier Multiplier 0 0011 000 000 0001 8-bit ALU 8-bit ALU Product Product 00110010 010 31 32 Simple Hardware Multiply 8-bit ALU 000 000 Product 1010 Multiplier Refined Multiply Hardware Notice the following about the simple shift & add hardware: Only N significant bits are being summed each cycle, but we are using a 2N-bit adder, a waste Each cycle one new bit of the product is resolved, while one old bit of the multiplier is discarded Simple multiply shifts left and keeps Product stationary. It is equivalent to keep stationary and shift Product right (same relative motion). 33 34 Refined Hardware Multiply ALU input is accept/not accept based on Start of 1 st Iteration Refined Hardware Multiply When =0, shift but no ALU input Start of 2 nd Iteration 1101 0101 0110 3 36

Refined Hardware Multiply Refined Hardware Multiply Start of 3 rd Iteration Start of 4 th Iteration 0010 1011 11 0110 0101 37 38 Refined Hardware Multiply Final : 10 x 13 = 130 1000 0010 Product 39 Signed Multiplication Shift and Add only works for positive numbers To include negative numbers must: Save XOR of sign bits to get product sign bit Convert multiplier/multiplicand to positive Do shift and add algorithm Negate result if product sign bit is 1 handles positive /negative numbers uniformly Based on observation that 011 1110 = 100 = - 000 0010 E.g., 01110 2 (14 10 ) = 1 2 (16 10 ) - 00010 2 (2 10 ) = 1 2-00010 2 I.e. convert string of 1s into leading +1 and a trialing -1 40 1 ten = 00010 1 1 ten = 0001 11 41 42

1 ten = 0001 011 1 ten = 000101 0011 = 2-43 44-1 ten = 11110 1-1 ten = 1111 01 4 46-1 ten = 1111 001-1 ten = 1111 0001 = -1 47 48

-1 ten = 1111 0001 = -1-6 ten = 0 0 49-1 ten = 1111 0001 = -1-6 ten = 10 0-1 ten = 1111 0001 = -1-6 ten = 110 1-1 ten = 1111 0001 = -1-6 ten = 11100 = -8+4-2= -6 2 Booth Hardware Implementation Use ALU to ADD or SUB based on the trailing 1s, leading 1s from the Multiplier Booth Hardware Implementation Product shift is arithmetic shift, sign bit does not change Start of 1 st iteration Start of 2 nd iteration 1101 0 0011 0110 1 3 4

Booth Hardware Implementation Booth Hardware Implementation Start of 3 rd iteration Start of 4 th iteration 1110 1011 11 0 0010 0101 1 6 Booth Hardware Implementation Fast Multiply Final : -66 x 33 = 18 Shift & Add is slow: 32x slower than addition Fast processors have fast multiply hardware using more hardware than shift & add Basic example is a Wallace tree adder, which uses an array of full adder cells 0001 0010 1 7 8 Wallace Tree Stage 0 Wallace Tree Reducers Partial products produced sequentially (slow) Multiplier Partial Products Partial products produced using array of AND gates (fast) Multiplier 1 1 0 1 Partial Products 1 0 1 0 Full adder cells used to reduce column segment of height 3 to row of width 2 sum and carry out Half adders used selectively for height-3 column that does not need full reduction Full Adder Half Adder 9 60

Wallace Tree Strategy In stages reduce column height from <=N to height 2, then use fast adder Wallace Tree Adder 61 62 63 64 6 66

67 68 69 70 71 72

6-Bit Adder 6-Bit Adder 73 74 Wallace Tree Multiplier: 8x8 Example x x x x x x x x y y y y y y y y Wallace Tree Mult (cont.) Column height is reduced by about 2/3 during each level Stage 0 For NxN multiply, an estimate for number of levels: N x (2/3) x = 2 (2/3) x = 2/N Stage 3 Stage 4 x log 2 (2/3) = log 2 (2/N) x = log 2 (2/N) / log 2 (2/3) x = [log(2) -log(n)] / -0.6 x = -1.7 x [1 -log 2 (N)] 14-Bit Adder x = 1.7*log log 2 (N) - 1.7 2 Levels for N=4, 4 Levels for N = 8, 7 levels for N = 32 7 76 Barrel Shifter Barrel Shifter Want shifter that is fast, not shift one bit position at a time Can be done in N-1=31 N shifts in log 2 N= stages using an array of N log 2 N= 160 multiplexers Shift Amount S2 S1 S0 1 0 1 0 Input 1 1 0 0 1 1 0 1 Shifted Input Unshifted Input Shift Control 1 0 77 78

Barrel Shifter Barrel Shifter Shift Amount S2 S1 S0 1 0 1 0 Input 1 1 0 0 1 1 0 1 Shift Amount S2 S1 S0 1 0 1 0 Input 1 1 0 0 1 1 0 1 79 80 Barrel Shifter Barrel Shifter Shift Amount S2 S1 S0 1 0 1 0 Input 1 1 0 0 1 1 0 1 Shift Amount S2 S1 S0 1 0 1 0 Input 1 1 0 0 1 1 0 1 81 82 Barrel Shifter Barrel Shifter Shift Amount S2 S1 S0 1 0 1 0 Input 1 1 0 0 1 1 0 1 Shift Amount S2 S1 S0 1 0 1 0 Input 1 1 0 0 1 1 0 1 0 0 0 0 0 1 1 0 83 84