Partial product generation. Multiplication. TSTE18 Digital Arithmetic. Seminar 4. Multiplication. yj2 j = xi2 i M

Similar documents
EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

Digital Computer Arithmetic

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

International Journal of Engineering and Techniques - Volume 4 Issue 2, April-2018

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

CPE300: Digital System Architecture and Design

JOURNAL OF INTERNATIONAL ACADEMIC RESEARCH FOR MULTIDISCIPLINARY Impact Factor 1.393, ISSN: , Volume 2, Issue 7, August 2014

Chapter 4 Arithmetic Functions

An Efficient Fused Add Multiplier With MWT Multiplier And Spanning Tree Adder

Chapter 3: part 3 Binary Subtraction

Number System. Introduction. Decimal Numbers

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Binary Arithmetic. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

Integer Multipliers 1

EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION

COMBINATIONAL LOGIC CIRCUITS

Real Digital Problem Set #6

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

MULTIPLICATION TECHNIQUES

carry in carry 1101 carry carry

Data Representation Type of Data Representation Integers Bits Unsigned 2 s Comp Excess 7 Excess 8

Design and Implementation of Advanced Modified Booth Encoding Multiplier

A New Architecture for 2 s Complement Gray Encoded Array Multiplier

Sum to Modified Booth Recoding Techniques For Efficient Design of the Fused Add-Multiply Operator

Slide Set 1. for ENEL 339 Fall 2014 Lecture Section 02. Steve Norman, PhD, PEng

Binary Adders: Half Adders and Full Adders

IA Digital Electronics - Supervision I

VARUN AGGARWAL

A Simple Method to Improve the throughput of A Multiplier

PESIT Bangalore South Campus

Paper ID # IC In the last decade many research have been carried

Array Multipliers. Figure 6.9 The partial products generated in a 5 x 5 multiplication. Sec. 6.5

Jan Rabaey Homework # 7 Solutions EECS141


Microcomputers. Outline. Number Systems and Digital Logic Review

Chapter 4. Combinational Logic

EE 486 Winter The role of arithmetic. EE 486 : lecture 1, the integers. SIA Roadmap - 2. SIA Roadmap - 1

University of Illinois at Chicago. Lecture Notes # 10

Arithmetic Operations

DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS

II. MOTIVATION AND IMPLEMENTATION

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes

Efficient Radix-10 Multiplication Using BCD Codes

World Inside a Computer is Binary

A Review on Optimizing Efficiency of Fixed Point Multiplication using Modified Booth s Algorithm

Computer Arithmetic Multiplication & Shift Chapter 3.4 EEC170 FQ 2005

FPGA IMPLEMENTATION OF EFFCIENT MODIFIED BOOTH ENCODER MULTIPLIER FOR SIGNED AND UNSIGNED NUMBERS

Effective Improvement of Carry save Adder

Topics. 6.1 Number Systems and Radix Conversion 6.2 Fixed-Point Arithmetic 6.3 Seminumeric Aspects of ALU Design 6.4 Floating-Point Arithmetic

SE311: Design of Digital Systems

CS 5803 Introduction to High Performance Computer Architecture: Arithmetic Logic Unit. A.R. Hurson 323 CS Building, Missouri S&T

32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding Algorithm

COMPUTER ORGANIZATION AND ARCHITECTURE

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY ECE-2700: Digital Logic Design Winter Notes - Unit 4. hundreds.

CMPE223/CMSE222 Digital Logic Design. Positional representation

Arithmetic Processing


Chapter 10 Binary Arithmetics


INF2270 Spring Philipp Häfliger. Lecture 4: Signed Binaries and Arithmetic

High Throughput Radix-D Multiplication Using BCD

Chapter 6 Combinational-Circuit Building Blocks

IMPLEMENTATION OF TWIN PRECISION TECHNIQUE FOR MULTIPLICATION

MULTIPLE OPERAND ADDITION. Multioperand Addition

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

Week 7: Assignment Solutions

EE 8351 Digital Logic Circuits Ms.J.Jayaudhaya, ASP/EEE

Lecture 6: Signed Numbers & Arithmetic Circuits. BCD (Binary Coded Decimal) Points Addressed in this Lecture

Analysis of Different Multiplication Algorithms & FPGA Implementation

Signed integers: 2 s complement. Adder: a circuit that does addition. Sign extension 42 = = Arithmetic Circuits & Multipliers

Chapter 3 Arithmetic for Computers. ELEC 5200/ From P-H slides

Representation of Numbers

Number Systems (2.1.1)

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY ECE-278: Digital Logic Design Fall Notes - Unit 4. hundreds.

Sungmin Bae, Hyung-Ock Kim, Jungyun Choi, and Jaehong Park. Design Technology Infrastructure Design Center System-LSI Business Division

CS/COE0447: Computer Organization

Optimized Modified Booth Recorder for Efficient Design of the Operator

CS/COE0447: Computer Organization

Combinational Circuits

Addition and multiplication

VTU NOTES QUESTION PAPERS NEWS RESULTS FORUMS Arithmetic (a) The four possible cases Carry (b) Truth table x y

ECE 341. Lecture # 7

A New Pipelined Array Architecture for Signed Multiplication

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

*Instruction Matters: Purdue Academic Course Transformation. Introduction to Digital System Design. Module 4 Arithmetic and Computer Logic Circuits

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 3 DLD P VIDYA SAGAR

Arithmetic Circuits & Multipliers

CHW 261: Logic Design

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

Arithmetic and Logical Operations


High Speed Special Function Unit for Graphics Processing Unit

Lecture 19: Arithmetic Modules 14-1

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

Reducing Computational Time using Radix-4 in 2 s Complement Rectangular Multipliers

Chapter 3 Arithmetic for Computers

CS/COE 0447 Example Problems for Exam 2 Spring 2011

Transcription:

TSTE8 igital Arithmetic Seminar 4 Oscar Gustafsson Multiplication Multiplication can typically be separated into three sub-problems Generating partial products Adding the partial products using a redundant number representation Converting the redundant number representation to non-redundant form commonly using a vector merging adder, VMA) Multiplication Partial product accumulation inal adder Squarers Constant multiplication Multiplying two unsigned binary numbers X and Y can be written as Z = XY = M xi i M j= yj j = M M xiyj ij The resulting partial product array is

For two s complement we can, by using that the sign-bit should be subtracted, write Z = XY = xm M xi i ) ym M = xmym M) xm M ym M xi i M M j= j= xiyj ij yj j j= yj j ) The resulting partial product array is To add the partial products we must make all rows have the same length or add them sequentially) Zero extend at LSB side Sign extend at MSB side Adding zeros can be ignored, but additional circuits are required to deal with the sign-bits Instead, use the identity p = p modified Baugh-Wooley) Replace all negative partial products by their inverse Z = Y = xmym M) j= xmyj ) jm ymxi ) im j= xi yj ij = xmym M) j= xi yj ij j= xmyj jm M) M ymxi im

Possible to add partial products with positive weights and then add the compensation M) M = 000...0000... The resulting partial product array is A standard multiplier would typically have M rows to generate and add If we can change to a signed-digit representation we know that there can be quite a bit of zeros However, the CS representation require a conversion with carry-propagation Another encoding, Modified Booth encoding, have a fully parallel conversion The modified Booth encoding has at most M non-zeros, but on average it has more digits compared to minimum signed-digit representations like CS Signed digit representations We only need to add non-zero partial products Representation with more zeros? The binary signed-digit representation, i.e., r =, xi 0 } is a redundant number representation 7 = 0 = 00, where = 5 = 000 = 00 00 = 000 0 =... An S representation with minimum number of non-zero digits is called a minimum S MS) representation In particular, if xixi = 0, i.e., no two adjacent digits are both non-zero, the representation is minimum This particular MS is called canonic =unique, i.e., no longer redundant) S CS) representation For CS the average number of non-zeros is M instead of M for binary More importantly, the maximum number of non-zeros is M instead of M for binary The modified Booth encoding can be seen as a radix-4 signed-digit representation with the recoded digits xi 0 } The encoding is based on three bits, but it only replaces two xk xk xk rk dkdk escription 0 0 0 0 00 String of zeros 0 0 0 End of ones 0 0 0 Single one 0 0 End of ones 0 0 0 Start of ones 0 0 Start and end of ones 0 0 Start of ones 0 00 String of ones Example: recode 7

Only half the number of partial product rows are now required Must be one bit wider to allow shifts Also, to possibly subtract a row, a partial product for adding the LSB must be included Resulting partial products array Possible to define Booth encoding for higher radices For radix-8 the digits are xi 4 0 4} Multiplication by require one adder in the pre-computation part The number of rows are now M instead of M for radix-4 Example: derive a truth table for Radix-8 Booth encoding One possible realization of the modified radix-4 Booth encoding partial product generation Partial product accumulation Three general ways of adding the partial products Sequential some of the partial products are added in each cycle Array regular structure Tree smallest logic depth but irregular structure

Partial product accumulation Sequential Add one row or column in each cycle, shift the result and add one more row or column Five-bits row-wise sequential accumulator ----x 0.x x x x 4 x 5 a 0 a a a a 4 Sign-Ext. & & & & & W c x 0 x 0 x 0 x 0 x 0.x x x x 4 x 5 W f W c y Commonly used structure for bit-serial/parallel multipliers The partial product accumulation tree should add a number of bits Convenient to represent these using dot diagrams Example of partial products from a 6 6 multiplication Partial product accumulation Array Use an array of almost identical cells for generation and accumulation of the partial products Array multiplier with multiplication time proportional to W x 0* y 0 x * y 0 x * y 0 x * y x * y x * y x * y x 0* y x * y x * y x * y x 0* y x * y 0 x * y 0 x * y 0 x 0* y 0 don t care p p 0 p p p p 4 p 5 p 6 Array accumulation gives high regularity, but at the same time the delay increase linearly with the wordlength Instead, we can just add all the partial products using simple building blocks such as half and full adders A full adder will take three partial products and output two partial products A full adder will take two partial products and output two partial products

The approach to add all partial products as fast as possible by first adding as many full adders as possible and then as many half adders as possible is known as the Wallace multiplier The drawback of the Wallace multiplier is the excessive use of half adders Half adders does not decrease the number of partial products, it only moves them to different columns Stage Stage Stage Stage 4 adda tree Stage Stage 4 An alternative approach is to try to balance the height of the columns first to later reduce as much as possible This approach is called the adda multiplier The sequence of wanted number of partial products is the same as the maximum number of partial products that can be reduced using a given number of levels Levels Partial products 4 6 4 9 5 6 9 7 8 8 4 The drawback is that it hardly use any half adders and, hence, the carry-propagation may become longer The reduced area tree tries to use the best of both algorithms, fastest possible reduction, but considering half adders Works as follows. Use as many full adders as possible for each level. Use half adders if a) will result in a number of bits equal to the adda sequence or b) to reduce the right-most column containing exactly two bits Stage Stage Stage Stage 4

Example reduction steps of the discussed algorithms: a) Wallace, b) adda, and c) Reduced area Tree structure Full adders Half adders VMA length Wallace 6 8 adda 5 5 0 Reduced area 8 5 7 Counters and compressors A compressor does not produce a valid binary count of the number of input bits Instead, it reduces the number of bits while having several input and output carries However, the output carries does not depend on the input carries A commonly used compressor is the 4: compressor as illustrated below Counters and compressors A full adder counts three bits and produce a result with two bits Hence, this can be called a :-counter Similarly, a half adder can be called a :-counter This is useful when looking at alternative cells to accumulate partial products A n : k counter adds n bits and produce a result with k bits Clearly k log n ) Counters and compressors The 4: compressor implemented using full adders may not have any significant advantages However, there are other realizations possible These should satisfy x x x x4 cin = s c cout ) with cout independent of cin Example: determine Boolean equations for a 4: compressor

Vector merging adder Most partial products accumulation schemes results in a redundant representation This must be converted to non-redundant form, typically by addition All carry-propagation schemes from earlier seminars can be used Vector merging adder Often, the arrival time of the bits differ Shown below are the arrival times for the bits of the earlier accumulation schemes assuming unity delay for full adders Wallace tree elay 0 0 4 6 8 0 Bit position adda tree elay 0 0 4 6 8 0 Bit position Reduced area tree elay Fixed width multiplication Multiplying two N-bit numbers results in a N-bit number Often, only N bits are of interest This can be utilized to simplify the multiplication Remove less significant partial products and possibly compensate for the error Commonly referred to as a fixed-width multiplication Example: esign a 4 4-bit unsigned fixed-width multiplier with 5-bit output and static compensation 0 0 4 6 8 0 Bit position Can use this information to optimize the vector merging adder