EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

Similar documents
UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

Digital Computer Arithmetic

Array Multipliers. Figure 6.9 The partial products generated in a 5 x 5 multiplication. Sec. 6.5

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

Partial product generation. Multiplication. TSTE18 Digital Arithmetic. Seminar 4. Multiplication. yj2 j = xi2 i M

Multi-Operand Addition Ivor Page 1

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

CS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit. A.R. Hurson 323 CS Building, Missouri S&T

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

VARUN AGGARWAL

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

MULTIPLE OPERAND ADDITION. Multioperand Addition

ECE 341. Lecture # 6

Timing for Ripple Carry Adder

*Instruction Matters: Purdue Academic Course Transformation. Introduction to Digital System Design. Module 4 Arithmetic and Computer Logic Circuits

Analysis of Different Multiplication Algorithms & FPGA Implementation

CPE300: Digital System Architecture and Design

Week 7: Assignment Solutions

Arithmetic Logic Unit

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE 109 Unit 6 Binary Arithmetic

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

Chapter 3: part 3 Binary Subtraction

Integer Multipliers 1

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

University of Illinois at Chicago. Lecture Notes # 10

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator

Hybrid Signed Digit Representation for Low Power Arithmetic Circuits

Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 2005

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

Jan Rabaey Homework # 7 Solutions EECS141

ECE331: Hardware Organization and Design

MULTIPLICATION TECHNIQUES

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 3 DLD P VIDYA SAGAR

Design of Arithmetic Units ECE152B AU 1

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B)

JOURNAL OF INTERNATIONAL ACADEMIC RESEARCH FOR MULTIDISCIPLINARY Impact Factor 1.393, ISSN: , Volume 2, Issue 7, August 2014

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers Implementation

By, Ajinkya Karande Adarsh Yoga

Outline. Introduction to Structured VLSI Design. Signed and Unsigned Integers. 8 bit Signed/Unsigned Integers

Effective Improvement of Carry save Adder

Chapter 4. Combinational Logic

Computer Architecture and Organization

A novel technique for fast multiplication

RADIX-4 AND RADIX-8 MULTIPLIER USING VERILOG HDL

IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION

Overview. EECS Components and Design Techniques for Digital Systems. Lec 16 Arithmetic II (Multiplication) Computer Number Systems.

Adders, Subtracters and Accumulators in XC3000

Lecture Topics. Announcements. Today: Integer Arithmetic (P&H ) Next: continued. Consulting hours. Introduction to Sim. Milestone #1 (due 1/26)

EXPERIMENT #8: BINARY ARITHMETIC OPERATIONS

Learning Outcomes. Spiral 2-2. Digital System Design DATAPATH COMPONENTS

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

ECE 30 Introduction to Computer Engineering

Binary Adders. Ripple-Carry Adder

Topics. 6.1 Number Systems and Radix Conversion 6.2 Fixed-Point Arithmetic 6.3 Seminumeric Aspects of ALU Design 6.4 Floating-Point Arithmetic

Reducing Computational Time using Radix-4 in 2 s Complement Rectangular Multipliers

9/6/2011. Multiplication. Binary Multipliers The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

Chapter 4 Arithmetic Functions

Integer Encoding and Manipulation

Binary Multiplication

ECE 341. Lecture # 7

II. MOTIVATION AND IMPLEMENTATION

Semester Transition Point. EE 109 Unit 11 Binary Arithmetic. Binary Arithmetic ARITHMETIC

COMPUTER ORGANIZATION AND ARCHITECTURE

More complicated than addition. Let's look at 3 versions based on grade school algorithm (multiplicand) More time and more area

Arithmetic Circuits. Design of Digital Circuits 2014 Srdjan Capkun Frank K. Gürkaynak.

EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3

Principles of Computer Architecture. Chapter 3: Arithmetic

32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding Algorithm

Chapter 10 - Computer Arithmetic

Chapter 3 Part 2 Combinational Logic Design

COMPUTER ARITHMETIC (Part 1)

Learning Outcomes. Spiral 2 2. Digital System Design DATAPATH COMPONENTS

AT Arithmetic. Integer addition

Paper ID # IC In the last decade many research have been carried

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017


Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION

Embedded Systems Ch 15 ARM Organization and Implementation

Learning Outcomes. Spiral 2 2. Digital System Design DATAPATH COMPONENTS

Experiment 7 Arithmetic Circuits Design and Implementation

Chapter 3 Arithmetic for Computers. ELEC 5200/ From P-H slides

UNIT-III REGISTER TRANSFER LANGUAGE AND DESIGN OF CONTROL UNIT

DESIGN OF RADIX-8 BOOTH MULTIPLIER USING KOGGESTONE ADDER FOR HIGH SPEED ARITHMETIC APPLICATIONS

CS/COE0447: Computer Organization

Outline. Combinational Circuit Design: Practice. Sharing. 2. Operator sharing. An example 0.55 um standard-cell CMOS implementation

Combinational Circuit Design: Practice

CS/COE0447: Computer Organization

ECE 341 Midterm Exam

Tailoring the 32-Bit ALU to MIPS

A New Architecture for 2 s Complement Gray Encoded Array Multiplier

Data Representation Type of Data Representation Integers Bits Unsigned 2 s Comp Excess 7 Excess 8

Transcription:

EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 6c High-Speed Multiplication - III Spring 2017 Koren Part.6c.1

Array Multipliers The two basic operations - generation and summation of partial products - can be merged, avoiding overhead and speeding up multiplication Iterative array multipliers (or array multipliers) consist of identical cells, each forming a new partial product and adding it to previously accumulated partial product Gain in speed obtained at expense of extra hardware Can be implemented so as to support a high rate of pipelining Koren Part.6c.2

Illustration - 5 x 5 Multiplication Straightforward implementation - Add first 2 partial products (a4x0, a3x0,,a0 x0 and a4x1, a3x1,,a0x1) in row 1 after proper alignment The results of row 1 are then added to a4x2, a3x2,,a0x2 in row 2, and so on Koren Part.6c.3

5 x 5 Array Multiplier for Unsigned Numbers Basic cell - FA accepting one bit of new partial product aixj + one bit of previously accumulated partial product + carry-in bit No horizontal carry propagation in first 4 rows - carry-save type addition - accumulated partial product consists of intermediate sum and carry bits Last row is a ripple-carry adder - can be replaced by a fast 2-operand adder (e.g., carry-look-ahead adder) Koren Part.6c.4

Array Multiplier for Two s Complement Numbers Product bits like a4x0 and a0x4 have negative weight Should be subtracted instead of added Koren Part.6c.5

Type I and II Cells Type I cells: 3 positive inputs - ordinary FAs Type II cells: 1 negative and 2 positive inputs Sum of 3 inputs of type II cell can vary from -1 to 2 c output has weight +2 s output has weight -1 Arithmetic operation of type II cell - II s and c outputs given by Koren Part.6c.6

Type I and II Cells Type II' cells: 2 negative inputs and 1 positive Sum of inputs varies from -2 to 1 c output has weight -2 s output has weight +1 Type I' cell: all negative inputs - has negatively weighted c and s outputs II Counts number of -1's at its inputs - represents this number through c and s outputs Same logic operation as type I cell - same gate implementation Similarly - types II and II' have the same gate implementation Koren Part.6c.7

Booth s Algorithm Array Multiplier For two's complement operands n rows of basic cells - each row capable of adding or subtracting a properly aligned multiplicand to previously accumulated partial product Cells in row i perform an add, subtract or transfer-only operation, depending on xi and reference bit 4-bit operands Controlled add/ subtract/shift (CASS) Koren Part.6c.8

Controlled add/subtract/shift - CASS H and D: control signals indicating type of operation H=0: no arithmetic operation done H=1: arithmetic operation performed - new Pout Type of arithmetic operation indicated by D signal D=0: multiplicand bit, a, added to Pin with cin as incoming carry - generating Pout and cout as outgoing carry D=1: multiplicand bit, a, subtracted from Pin with incoming borrow and outgoing borrow Pout=Pin (a H) (cin H) cout=(pin D)(a+cin) + a cin Alternative: combination of multiplexer (0, +a and -a) and FA H and D generated by CTRL - based on xi and reference bit x{i-1} Koren Part.6c.9

Booth s Algorithm Array Multiplier - details First row - most significant bit of multiplier Resulting partial product need be shifted left before adding/subtracting next multiple of multiplicand A new cell with input Pin=0 is added Number of bits in partial product increases by one each row - expand multiplicand before adding/subtracting it Accomplished by replicating sign bit of multiplicand Koren Part.6c.10

Properties and Delay Cannot take advantage of strings of 0's or 1's - cannot eliminate or skip rows Only advantage: ability to multiply negative numbers in two's complement with no need for correction Operation in row i need not be delayed until all upper (i-1) rows have completed their operation P0, generated after one CASS delay (plus delay of CTRL), P1 generated after two CASS delays, and P{2n-2}, generated after (2n-1) CASS delays Similarly can implement higher-radix multiplication requiring less rows Building block: multiplexer-adder circuit that selects correct multiple of multiplicand A and adds it to previously accumulated partial product Koren Part.6c.11

Pipelining Important characteristic of array multipliers - allow pipelining Execution of separate multiplications overlaps The long delay of carry-propagating addition must be minimized Achieved by replacing CPA with several additional rows - allow carry propagation of only one position between consecutive rows To support pipelining, all cells must include latches - each row handles a separate multiplier-multiplicand pair Registers needed to propagate multiplier bits to their destination, and propagate completed product bits Koren Part.6c.12

Pipelined Array Multiplier Koren Part.6c.13

Optimality of Multiplier Implementations Bounds on performance of algorithms for multiplication Theoretical bounds for multiplication similar to those for addition Adopting the idealized model using (f,r) gates: Execution time of a multiply circuit for two operands with n bits satisfies Tmult log f 2n If residue number system is employed: Tmult log f 2m m - number of digits needed to represent largest modulus in residue number system Koren Part.6c.14

Optimal Implementations Need to compare performance (execution time) and implementation costs (e.g., regularity of design, total area, etc.) Objective function like A T can be used A - area and T - execution time α A more general objective function: A T α can be either smaller or larger than 1 Koren Part.6c.15

Basic Array Multiplier Very regular structure - can be implemented as a rectangular-shaped array - no waste of chip area n least significant bits of final product are produced on right side of rectangle; n most significant bits are outputs of bottom row of rectangle Highly regular and simple layout but has two drawbacks: Requires a very large area, proportional to n², since it contains about n² FAs and AND gates Long execution time T of about 2 n FA ( FA - delay of FA) More precisely, T consists of (n-1) FA for first (n-1) rows and (n-1) FA for CPA (ripple-carry adder) AT is proportional to n³ Koren Part.6c.16

Pipelined & Booth Array Multipliers Required area increases even further (CPA replaced) Latency of a single multiply operation increases However, pipeline period ( pipeline rate) shorter Booth based array multiplier offers no advantage A - order of n² and T - linear in n Radix-4 Booth can potentially be better - only n/2 rows - could reduce T and A by factor of two However, actual delay & area higher - recoding logic and, more importantly, partial product selectors, add complexity & interconnections - longer delay per row Also, since relative shift between adjacent rows is two bits, must allow carry to propagate horizontally Can be achieved locally or in last row - then carry propagation through 2n-1 bits (instead of n-1) Exact reduction depends on design and technology Koren Part.6c.17

Radix-8 Booth & CSA Tree Similar problems with radix-8 Booth's array multiplier In addition, 3A should be precalculated Reduction in delay and area may be less than expected 1/3 Still, may be cost-effective in certain technologies and design styles Partial products can be accumulated using a cascade or a tree structure with shorter execution time But CSA tree structures have irregular interconnects - no area-efficient layout with a rectangular shape Moreover - overall width 2n usually required - multiplier area of order 2n log k AT may increase as 2n log² k Koren Part.6c.18

Delay of Balanced Delay Tree Balanced delay tree - more regular structure Increments in number of operands - 3,3,5,7,9... 2 Sum of series - order of k=j (j - number of elements in series, k - number of operands) Number of levels - determines overall delay - linear in j= k Compare to log k - number of levels in complete binary tree Proof: exercise Above expressions - theoretical, limited practical significance Detailed analysis of alternative designs is necessary for specific technology Koren Part.6c.19