Overview. EECS Components and Design Techniques for Digital Systems. Lec 16 Arithmetic II (Multiplication) Computer Number Systems.

EECS 150 - Components and Design Techniques for Digital Systems
Lec 16: Arithmetic II (Multiplication)
David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://inst.eecs.berkeley.edu/~cs15

Overview
Review of addition
Overflow
Multiplication
Further adder optimizations for multiplication
CLA in the large: parallel prefix

Review
Circuit design for unsigned addition
  Full adder per bit slice
  Delay limited by carry propagation
    Ripple is algorithmically slow, but wires are short
  Carry select
    Simple, resource-intensive
    Excellent layout
  Carry look-ahead
    Excellent asymptotic behavior
    Great at the board level, but wire length effects are significant on chip
Digital number systems
  How to represent negative numbers
  Simple operations
  Clean algorithmic properties
  2s complement is most widely used
Circuit for unsigned arithmetic
  Subtract by complement and carry in
  Overflow when cin xor cout of sign bit is 1

Computer Number Systems
Positional notation: $D_{n-1} D_{n-2} \ldots D_0$ represents $D_{n-1} B^{n-1} + D_{n-2} B^{n-2} + \cdots + D_0 B^0$, where $D_i \in \{0, \ldots, B-1\}$
2s complement: $D_{n-1} D_{n-2} \ldots D_0$ represents $-D_{n-1} 2^{n-1} + D_{n-2} 2^{n-2} + \cdots + D_0 2^0$
MSB has negative weight
[Figure: 4-bit 2s complement number wheel, each bit pattern labeled with its signed value]
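The 2s complement weighting above can be checked with a few lines of Python. This is a minimal sketch, not part of the lecture; the function name `twos_complement_value` and the MSB-first string input are assumptions made for illustration.

```python
def twos_complement_value(bits: str) -> int:
    """Value of a bit string (MSB first) under 2s complement weighting.

    Hypothetical helper for illustration only.
    """
    n = len(bits)
    digits = [int(ch) for ch in bits]
    # The MSB has weight -2^(n-1); every other bit has its usual positive weight.
    value = -digits[0] * 2 ** (n - 1)
    for position, digit in enumerate(digits[1:], start=1):
        value += digit * 2 ** (n - 1 - position)
    return value


for pattern in ("0111", "0001", "1111", "1010", "1000"):
    print(pattern, twos_complement_value(pattern))
```

Walking the patterns in order around the number wheel shows the value jumping from the largest positive to the most negative number when the MSB first becomes 1.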

2s Complement Overflow
How can you tell an overflow occurred?
Add two positive numbers to get a negative number, or two negative numbers to get a positive number.
[Figure: two 4-bit 2s complement number wheels illustrating 5 + 3 and -7 - 2 wrapping past the sign boundary]

2s Comp. Overflow Detection
  5 + 3:      0101 + 0011 = 1000 (-8)   Overflow
  5 + 2:      0101 + 0010 = 0111 (7)    No overflow
  -7 + -2:    1001 + 1110 = 0111 (7)    Overflow
  -3 + -5:    1101 + 1011 = 1000 (-8)   No overflow
Overflow occurs when carry in to sign does not equal carry out.

2s Complement Adder/Subtractor
[Figure: 4-bit adder/subtractor. Each bit slice selects B_i or its complement (Sel) and feeds a full adder with A_i; the Add/Subtract control drives the selects and the carry in; an Overflow output is derived at the sign bit]
$A - B = A + (-B) = A + \overline{B} + 1$

Adders on the Xilinx Virtex
Dedicated carry logic provides fast arithmetic carry capability for high-speed arithmetic functions.
The Virtex-E CLB supports two separate carry chains, one per Slice.
The height of the carry chains is two bits per CLB.
The arithmetic logic includes an XOR gate and AND gate that allows a 2-bit full adder to be implemented within a slice.
Cin to Cout delay = 0.1 ns, versus 0.4 ns for F to X delay.
How do we map a 2-bit adder to one slice?
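The overflow rule and the adder/subtractor can be tried out in software. The sketch below is an illustration under assumptions (the name `add_sub`, a 4-bit default width, integers as inputs); it mirrors the ripple structure bit by bit and does not model the Virtex carry chain.

```python
def add_sub(a: int, b: int, subtract: bool = False, n: int = 4):
    """n-bit 2s complement add/subtract; returns (result_bits, overflow).

    Hypothetical helper: inputs are masked to n bits, so negative Python
    ints are accepted and converted to their n-bit patterns.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if subtract:
        b = ~b & mask              # complement B ...
    carry = 1 if subtract else 0   # ... and supply the +1 through carry in
    result = 0
    carry_into_sign = 0
    for i in range(n):
        if i == n - 1:
            carry_into_sign = carry
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    # Overflow: carry into the sign bit differs from carry out of it.
    overflow = carry_into_sign != carry
    return result, overflow


for x, y in ((5, 3), (5, 2), (-7, -2), (-3, -5)):
    bits, ovf = add_sub(x, y)
    print(f"{x} + {y}: {bits:04b} overflow={ovf}")
```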

Time / Space (resource) Trade-offs
Carry select and CLA utilize more silicon to reduce time.
Can we use more time to reduce silicon?
How few FAs does it take to do addition?

Bit-serial Adder
[Figure: n-bit shift registers A and B feed one FA, LSB first; a FF with reset holds the carry c; the sum bit s shifts into n-bit shift register R]
A, B, and R held in shift registers. Shift right once per clock cycle.
Reset is asserted by controller.
Addition of two n-bit numbers:
  takes n clock cycles,
  uses 1 FF, 1 FA cell, plus registers
  the bit streams may come from or go to other circuits, therefore the registers may be optional.
Requires a controller
  What does the FSM look like? Implemented?
  Final carry out?

Discussion
What is sign extension and why does it work?
Where is addition used in the project?
Where might you want more powerful arithmetic operations?

Announcements
Reading: 5.8 (4 pages!)
Digital Design in the news from UCB:
UC Berkeley is among six universities to be part of the program started by IBM Corp. and Google Inc. on college campuses to promote computer-programming techniques for clusters of processors known as "clouds". Cloud computing allows computers in remote data centers to run in parallel, increasing their processing power. Each company will spend between $2 million and $25 million for hardware, software and services that can be used by computer-science professors and students.
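Returning to the bit-serial adder: its cycle-by-cycle behavior can be simulated as below. This is a rough sketch with assumed names (`bit_serial_add`, `carry_ff`, `r`); the loop stands in for the controller, and each iteration is one clock cycle.

```python
def bit_serial_add(a: int, b: int, n: int):
    """Simulate an n-cycle bit-serial adder; returns (sum, final_carry).

    Hypothetical model: a and b are unsigned n-bit values held in
    shift registers, r is the result shift register.
    """
    carry_ff = 0                  # reset clears the carry flip-flop
    r = 0                         # result shift register R
    for _ in range(n):            # one bit per clock cycle, LSB first
        a_bit, b_bit = a & 1, b & 1
        s = a_bit ^ b_bit ^ carry_ff
        carry_ff = (a_bit & b_bit) | (carry_ff & (a_bit ^ b_bit))
        a >>= 1                   # A and B shift right once per cycle
        b >>= 1
        r = (r >> 1) | (s << (n - 1))   # sum bit enters R at the MSB end
    return r, carry_ff


print(bit_serial_add(5, 6, 4))
```

The single full adder and flip-flop are reused n times, which is the time-for-silicon trade the slide asks about; the final carry is whatever is left in the flip-flop after the last cycle.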

Basic concept of multiplication
        1101   (13)   multiplicand
      * 1011   (11)   multiplier
      ------
        1101
       1101           partial products
      0000
     1101
    --------
    10001111   (143)
product of two n-bit numbers is a 2n-bit number
sum of n n-bit partial products
unsigned

Combinational Multiplier: accumulation of partial products
                          A3    A2    A1    A0
                          B3    B2    B1    B0
                        A3B0  A2B0  A1B0  A0B0
                  A3B1  A2B1  A1B1  A0B1
            A3B2  A2B2  A1B2  A0B2
      A3B3  A2B3  A1B3  A0B3
  S7    S6    S5    S4    S3    S2    S1    S0

Array Multiplier
Generates all n partial products simultaneously.
[Figure: 4 x 4 array multiplier with inputs b3..b0 and a0..a3 and outputs P7..P0; each cell ANDs a_i with b_j and feeds an FA with sum in, sum out, carry in, carry out]
Each row: n-bit adder with AND gates
What is the critical path?

Shift and Add Multiplier
[Figure: n-bit adder; P and B are n-bit shift registers; A is an n-bit register; a mux selects 0 or A as the adder input]
Sums each partial product, one at a time.
In binary, each partial product is a shifted version of A or 0.
Cost ∝ n, T = n clock cycles.
What is the critical path for determining the min clock period?
Control Algorithm:
1. P ← 0, A ← multiplicand, B ← multiplier
2. If LSB of B == 1 then add A to P, else add 0
3. Shift [P][B] right 1
4. Repeat steps 2 and 3 n-1 times.
5. [P][B] has product.
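The five control steps translate almost line for line into code. The following is a sketch under assumptions: the function name `shift_add_multiply` is made up, operands are unsigned, and P is allowed one extra bit to hold the adder's carry out before the shift.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Unsigned n-bit shift-and-add multiply following the control steps.

    Hypothetical model of the P, A, B registers; returns the 2n-bit product.
    """
    mask = (1 << n) - 1
    a = multiplicand & mask       # step 1: A <- multiplicand
    b = multiplier & mask         #         B <- multiplier
    p = 0                         #         P <- 0
    for _ in range(n):            # steps 2 and 3, performed n times in total
        if b & 1:                 # step 2: LSB of B selects A or 0
            p += a                # P briefly holds n+1 bits (adder carry out)
        # step 3: shift [P][B] right by one; LSB of P moves into MSB of B
        b = (b >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | b           # step 5: [P][B] holds the product


print(shift_add_multiply(13, 11, 4))
```

Because the multiplier bits are consumed from the LSB end of B as product bits shift in at the MSB end, no separate 2n-bit product register is needed.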

Carry-save Addition
Speeding up multiplication is a matter of speeding up the summing of the partial products.
Carry-save addition can help.
Carry-save addition passes (saves) the carries to the output, rather than propagating them.
Example: sum three numbers, $3_{10} = 0011$, $2_{10} = 0010$, $3_{10} = 0011$
      3_10   0011
    + 2_10   0010
      c      0100 = 4_10     carry-save add
      s      0001 = 1_10
      3_10   0011
      c      0010 = 2_10     carry-save add
      s      0110 = 6_10
             1000 = 8_10     carry-propagate add
In general, carry-save addition takes in 3 numbers and produces 2, whereas carry-propagate takes 2 and produces 1.
With this technique, we can avoid carry propagation until the final addition.

Carry-save Circuits
[Figure: a CSA built from a row of independent FAs with carry and sum outputs; inputs x2, x1, x0 pass through a chain of CSAs followed by a CPA]
When adding sets of numbers, carry-save can be used on all but the final sum.
Standard adder (carry propagate) is used for final sum.

Array Mult. using Carry-save Addition
[Figure: 4 x 4 array multiplier with inputs b3..b0 and a0..a3 and outputs P7..P0; carries are passed down to the next row, and a fast carry-propagate adder produces the upper product bits; each cell ANDs a_i with b_j and feeds an FA with sum in, sum out, carry in, carry out]

Another Representation
[Figure: building block with inputs Sum In, X, Y, Cin and outputs Cout, Sum Out, built around one FA; the final row is a CPA]
Building block: full adder + and
[Figure: 4 x 4 array of building blocks; columns A3..A0, rows B0..B3, the block in row j and column i forms A_i B_j; outputs P7..P0]
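The 3-to-2 reduction is easy to express with bitwise operations, which makes the worked example above reproducible. This is a sketch for non-negative integers; `carry_save_add` and `sum_with_csa` are hypothetical names, and the final `sum` call stands in for the carry-propagate adder.

```python
def carry_save_add(x: int, y: int, z: int):
    """One carry-save step: three operands in, (carry, sum) out.

    Each bit position acts as an independent full adder, so no carry
    ripples between positions.
    """
    s = x ^ y ^ z                                  # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit carry, weighted x2
    return c, s


def sum_with_csa(values):
    """Add many non-negative ints with CSA steps and one final CPA."""
    operands = list(values)
    while len(operands) > 2:
        c, s = carry_save_add(operands[0], operands[1], operands[2])
        operands = [c, s] + operands[3:]
    return sum(operands)          # single carry-propagate addition at the end


print(carry_save_add(3, 2, 0))    # first step of the slide example
print(carry_save_add(4, 1, 3))    # second step
print(sum_with_csa([3, 2, 3]))
```

The invariant is that `c + s` always equals `x + y + z`, so the true sum is preserved while carry propagation is deferred to the last step.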

Carry-save Addition
CSA is associative and commutative. For example:
$(((X_0 + X_1) + X_2) + X_3) = ((X_0 + X_1) + (X_2 + X_3))$
A balanced tree can be used to reduce the logic delay.
[Figure: inputs x7..x0 feed a balanced tree of CSAs followed by a CPA; the tree depth is labeled $\log_{3/2} N$ and the CPA depth $\log_2 N$]
This structure is the basis of the Wallace Tree Multiplier.
Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
Multiplier delay ∝ $\log_{3/2} N + \log_2 N$

Signed Multiplier
Signed multiplication: remember that for 2s complement numbers the MSB has negative weight:
$X = \sum_{i=0}^{n-2} x_i 2^i - x_{n-1} 2^{n-1}$
ex: $-6 = 11010_2 = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 - 1 \cdot 2^4 = 0 + 2 + 0 + 8 - 16 = -6$
Therefore for multiplication:
a) subtract final partial product
b) sign-extend partial products
Modifications to shift & add circuit:
a) adder/subtractor
b) sign-extender on P shifter register

Signed multiplication
          1101   (-3)   multiplicand
        * 1011   (-5)   multiplier
      --------
      11111101   (-3)
      1111101    (-6)
      000000     (0)
    - 11101      -(-24)
      --------
      00001111   (15)
Note: 2s complement sign extension
product of two n-bit numbers is a 2n-bit number
sum of n n-bit partial products

Signed Array Multiplier
[Figure: 4 x 4 array multiplier with inputs b3..b0 and a0..a3 and outputs P7..P0; the final row subtracts; implicit sign extension]
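The two rules for signed multiplication (sign-extend every partial product, subtract the last one) can be sketched directly. The names `to_signed` and `signed_multiply` are assumptions; Python's unbounded integers provide the sign extension implicitly, which a hardware P register would have to do explicitly.

```python
def to_signed(value: int, n: int) -> int:
    """Interpret the low n bits of value as a 2s complement number."""
    value &= (1 << n) - 1
    return value - (1 << n) if value >> (n - 1) else value


def signed_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit 2s complement numbers by summing partial products.

    Hypothetical model: partial products for multiplier bits 0..n-2 are
    added, and the one for the sign bit (weight -2^(n-1)) is subtracted.
    """
    a_signed = to_signed(a, n)            # sign-extended multiplicand
    b_bits = b & ((1 << n) - 1)
    product = 0
    for i in range(n):
        if (b_bits >> i) & 1:
            partial = a_signed << i       # shifted, sign-extended partial product
            if i == n - 1:
                product -= partial        # a) subtract final partial product
            else:
                product += partial        # b) sign-extended partial products
    return product


print(to_signed(0b11010, 5))
print(signed_multiply(-3, -5, 4))
```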

Shift and Add Signed Multiplier
[Figure: n-bit adder; P and B are n-bit shift registers; A is an n-bit register; a mux selects 0 or A as the adder input]
Sign-extend partial product at each stage
Final step is a subtract

Carry Look-ahead Adders
In general, for n-bit addition the best we can achieve is delay ∝ log(n)
How do we arrange this? (think trees)
First, reformulate basic adder stage:
  a  b  c_i | c_{i+1}  s
  0  0  0   |    0     0    carry kill
  0  0  1   |    0     1
  0  1  0   |    0     1    carry propagate
  0  1  1   |    1     0
  1  0  0   |    0     1
  1  0  1   |    1     0
  1  1  0   |    1     0    carry generate
  1  1  1   |    1     1
carry kill: $k_i = \overline{a_i}\,\overline{b_i}$
carry propagate: $p_i = a_i \oplus b_i$
carry generate: $g_i = a_i b_i$
$c_{i+1} = g_i + p_i c_i$
$s_i = p_i \oplus c_i$

Carry Look-ahead Adders
Group propagate and generate signals:
[Figure: a group of bit stages i through i+k, each with signals p and g, between c_in and c_out]
$P = p_i p_{i+1} \cdots p_{i+k}$
$G = g_{i+k} + p_{i+k} g_{i+k-1} + \cdots + (p_{i+1} p_{i+2} \cdots p_{i+k}) g_i$
P true if the group as a whole propagates a carry to c_out
G true if the group as a whole generates a carry
$c_{out} = G + P c_{in}$
Group P and G can be generated hierarchically.

Carry Look-ahead Adders in blocks
9-bit example of hierarchically generated P and G signals:
  block a: bits 0-2 ($a_0 b_0$, $a_1 b_1$, $a_2 b_2$) give $P_a$, $G_a$; $c_3 = G_a + P_a c_0$
  block b: bits 3-5 ($a_3 b_3$, $a_4 b_4$, $a_5 b_5$) give $P_b$, $G_b$; $c_6 = G_b + P_b c_3$
  block c: bits 6-8 ($a_6 b_6$, $a_7 b_7$, $a_8 b_8$) give $P_c$, $G_c$; $c_9 = G + P c_0$
$P = P_a P_b P_c$
$G = G_c + P_c G_b + P_b P_c G_a$
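The block structure of the 9-bit example can be modeled as follows. This is a sketch with assumed names (`group_pg`, `cla_add`) and an assumed block size of 3; it computes block carries from group P and G and then ripples only inside each block, which is one possible arrangement and not necessarily the exact circuit on the slide.

```python
def group_pg(p, g):
    """Group propagate/generate for one block; p and g are LSB-first lists."""
    P, G = 1, 0
    for p_i, g_i in zip(p, g):
        G = g_i | (p_i & G)       # builds g_k + p_k g_(k-1) + ... step by step
        P = P & p_i               # P = product of all p_i in the block
    return P, G


def cla_add(a: int, b: int, n: int = 9, block: int = 3, c0: int = 0):
    """Add two n-bit numbers using per-block P, G signals.

    Hypothetical model; returns (sum_bits, carry_out).
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # p_i = a_i xor b_i
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # g_i = a_i and b_i
    starts = list(range(0, n, block))
    block_carries = [c0]                                # carry into each block
    for start in starts:
        P, G = group_pg(p[start:start + block], g[start:start + block])
        block_carries.append(G | (P & block_carries[-1]))   # c_out = G + P c_in
    total = 0
    for start, c_in in zip(starts, block_carries):
        c = c_in
        for i in range(start, min(start + block, n)):
            total |= (p[i] ^ c) << i                    # s_i = p_i xor c_i
            c = g[i] | (p[i] & c)                       # c_(i+1) = g_i + p_i c_i
    return total, block_carries[-1]


print(cla_add(300, 211))
```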

Parallel Prefix (generalizing CLA)
[Figure: parallel prefix tree over 8 positions; node labels give the range of positions combined at each level]
Compute all the prefixes: $F_i = F_{i-1} \; op \; F_{i-2} \; op \cdots op \; F_0$
Assume associative and commutative

8-bit Carry Lookahead Adder
[Figure: eight bit cells with inputs a_i, b_i and output s_i at the leaves, each producing p, g and receiving c_i; tree nodes combine (P_a, G_a) and (P_b, G_b) into (P, G) and pass carries c_0 through c_8 back down]
Bit cell: $p = a \oplus b$, $g = ab$, $s = p \oplus c_i$, $c_{i+1} = g + c_i p$
Tree node: $P = P_a P_b$, $G = G_b + G_a P_b$, $C_{out} = G + c_{in} P$

Summary
2s complement number systems
  Algebraic and corresponding bit manipulations
  Overflow detection
  Significance of sign bit: $-2^{n-1}$
Carry look-ahead is a form of parallel prefix
Time / Space tradeoffs
  Bit serial adder
Binary Multiplication algorithm
  Array multiplier
  Serial multiply (with bit parallel adder)
Signed multiplication
  Sign extend multiplicand
  Sign bit of multiplier treated as subtract
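As a closing illustration of the parallel prefix idea, the carries of an adder can be computed with a logarithmic-depth scan over (G, P) pairs. The sketch below uses a Kogge-Stone-style arrangement, which is one of several possible prefix networks and is an assumption here, not necessarily the tree drawn in the lecture. The names `combine`, `prefix_carries`, and `prefix_add` are hypothetical. The scan relies only on `combine` being associative; operand order matters, so the higher-order block is always passed first.

```python
def combine(hi, lo):
    """Merge (G, P) of a higher block with (G, P) of the block below it.

    Matches the tree-node equations: G = G_hi + G_lo P_hi, P = P_hi P_lo.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_carries(a: int, b: int, n: int, c_in: int = 0):
    """Return carries c_0..c_n using a log2(n)-level prefix scan."""
    prefix = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
              for i in range(n)]                  # (g_i, p_i) per bit
    dist = 1
    while dist < n:                               # one pass per tree level
        nxt = list(prefix)
        for i in range(dist, n):
            nxt[i] = combine(prefix[i], prefix[i - dist])
        prefix = nxt
        dist *= 2
    # prefix[i] now covers bits i..0, so c_(i+1) = G + P c_in
    return [c_in] + [g | (p & c_in) for g, p in prefix]


def prefix_add(a: int, b: int, n: int, c_in: int = 0):
    """n-bit addition with prefix-computed carries; returns (sum, carry_out)."""
    carries = prefix_carries(a, b, n, c_in)
    total = 0
    for i in range(n):
        p_i = ((a >> i) ^ (b >> i)) & 1
        total |= (p_i ^ carries[i]) << i          # s_i = p_i xor c_i
    return total, carries[n]


print(prefix_add(200, 100, 8))
```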