Outline. Introduction to Structured VLSI Design. Signed and Unsigned Integers. 8 bit Signed/Unsigned Integers


Transcription:

Introduction to Structured VLSI Design
Integer Arithmetic and Pipelining
Joachim Rodrigues

Outline
- Multiplication in the digital domain
- HW mapping
- Pipelining optimization

Signed and Unsigned Integers
Bit positions: n-1 ... 5 4 3 2 1 0
Unsigned integer: $\sum_{i=0}^{n-1} \mathrm{bit}_i \cdot 2^i$
Two's complement signed integer: $\mathrm{bit}_{n-1} \cdot (-2^{n-1}) + \sum_{i=0}^{n-2} \mathrm{bit}_i \cdot 2^i$

8 bit Signed/Unsigned Integers
The MSB defines the sign of a signed integer.

  Signed    Bit pattern    Unsigned
            (signed overflow)
  -128      1000 0000
  -127      1000 0001
  ...       ...
  -4        1111 1100
  -3        1111 1101
  -2        1111 1110
  -1        1111 1111
   0        0000 0000      0
   1        0000 0001      1
   2        0000 0010      2
   3        0000 0011      3
  ...       ...            ...
   126      0111 1110      126
   127      0111 1111      127
            (signed overflow)
            1000 0000      128
            1000 0001      129
            ...            ...
            1111 1110      254
            1111 1111      255
            (unsigned overflow)
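As a quick check of the two definitions above, the 8-bit pattern 1111 1110 from the table can be evaluated under both interpretations. This is only a worked instance of the formulas, not part of the original slides.

```latex
\begin{align*}
\text{unsigned:}\quad & 2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 = 254 \\
\text{signed:}\quad   & 1 \cdot (-2^7) + \left(2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1\right) = -128 + 126 = -2
\end{align*}
```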

Add/Subtract
[Figure: n-bit adder/subtractor with inputs A_{n-1}..A_0 and B_{n-1}..B_0, carry-in C_0 (0 for add, 1 for subtract), internal carries C_1, C_2, ..., and sum outputs S_{n-1}..S_0.]
The HW for sum/difference (S) doesn't care about signed/unsigned.
- Unsigned overflow = (carry-out & add) OR (no carry-out & subtract)
- Signed overflow = $C_n \oplus C_{n-1}$
- True sign = $S_{n-1} \oplus$ signed overflow $= (A_{n-1} \oplus B_{n-1} \oplus C_{n-1}) \oplus (C_n \oplus C_{n-1}) = A_{n-1} \oplus B_{n-1} \oplus C_n$

Unsigned Overflow Examples
10 + 6 = 16, outside [0..15]:
    1010 + 0110 = 0000 with C_4 = 1
    C_4 = 1 & add => unsigned overflow (carry-out & add)
7 - 10 = -3, outside [0..15]:
    0111 - 1010 is the same as 0111 + 0101 + 1 = 1101 with C_4 = 0
    C_4 = 0 & subtract => unsigned overflow (no carry-out & subtract)

Signed Overflow Example
6 + 7 = 13, outside [-8..7]:
    0110 + 0111 = 1101 with C_4 = 0 and C_3 = 1
    Signed overflow = $C_4 \oplus C_3 = 0 \oplus 1 = 1$: the carry-outs differ => signed overflow
    True sign = $S_{n-1} \oplus$ signed overflow $= A_{n-1} \oplus B_{n-1} \oplus C_n = A_3 \oplus B_3 \oplus C_4 = 0 \oplus 0 \oplus 0 = 0$ => positive/zero

Multiplication
- Product = Multiplicand * Multiplier
- log(product) = log(multiplicand) + log(multiplier)
  - Width of product is (worst case) the sum of the widths of the factors
  - May overflow if a single-length product register is used
- Paper and pencil method
  - Conditional add (controlled by bits of multiplier) and shift
  - Partial product progressively develops into product
  - 1 product bit/cycle
- Unsigned and signed multiplication
  - Signs require extra attention
- Sequential, combinational or pipelined implementation
  - Tradeoff between hardware resources, throughput, latency, power
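The overflow rules from the Add/Subtract slide can be sketched in VHDL as a bit-level ripple-carry model. The entity name `addsub_flags`, the port names, and the generic `N` are assumptions made for illustration. The sketch also assumes that subtraction is done by inverting B and setting the carry-in to 1, as in the 7 - 10 example, so the B bit in the true-sign formula is the inverted one during a subtract.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: N-bit adder/subtractor with overflow flags.
entity addsub_flags is
  generic (N : positive := 4);
  port (
    a, b        : in  std_logic_vector(N-1 downto 0);
    sub         : in  std_logic;  -- '0' = add, '1' = subtract
    s           : out std_logic_vector(N-1 downto 0);
    unsigned_ov : out std_logic;
    signed_ov   : out std_logic;
    true_sign   : out std_logic
  );
end entity addsub_flags;

architecture rtl of addsub_flags is
begin
  process (a, b, sub)
    variable bx : std_logic_vector(N-1 downto 0);  -- B as seen by the adder
    variable c  : std_logic_vector(N downto 0);    -- carries C_0 .. C_n
  begin
    c(0) := sub;  -- C_0 = 0 for add, 1 for subtract
    for i in 0 to N-1 loop
      bx(i)  := b(i) xor sub;  -- invert B when subtracting
      s(i)   <= a(i) xor bx(i) xor c(i);
      c(i+1) := (a(i) and bx(i)) or (a(i) and c(i)) or (bx(i) and c(i));
    end loop;
    -- (carry-out & add) OR (no carry-out & subtract)
    unsigned_ov <= (c(N) and not sub) or (not c(N) and sub);
    -- The two most significant carries differ
    signed_ov   <= c(N) xor c(N-1);
    -- A_{n-1} xor B_{n-1} xor C_n, using the adder's B input
    true_sign   <= a(N-1) xor bx(N-1) xor c(N);
  end process;
end architecture rtl;
```

With `N = 4`, applying `a = "1010"`, `b = "0110"`, `sub = '0'` corresponds to the 10 + 6 example, and `a = "0111"`, `b = "1010"`, `sub = '1'` to the 7 - 10 example.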

Multiplying Using Paper and Pencil
We will concentrate on unsigned integers for the next few slides!
Example:
        1011 * 1110
        0000        (*0 = zero)
       1011.        (*1 = copy)
      1011..        (*1 = copy)
     1011...        (*1 = copy)
    10011010
In decimal: 11 * 14 = 154

... more Paper and Pencil
Multiplicand * Multiplier = 1011 * 1110

  Paper and pencil      Add to partial product, then shift right (>)
                        Partial product   Partial multiplier
                        0000              1110
  0000     (0)          + 0000  >  00000     111
  1011.    (1)          + 1011  >  010110    11
  1011..   (1)          + 1011  >  1000010   1
  1011...  (1)          + 1011  >  10011010
  10011010

- The LSB of the partial multiplier controls whether to add 0 or the multiplicand to the partial product (0: add zero, 1: add multiplicand).
- Paper and pencil: disadvantage, 2n bit ALU.
- Add and shift right: advantage, n bit ALU.
- Shifting in the carry-out prevents overflow.

Seq. Multiplication, Initialize
[Figure: an n-bit register is loaded with the multiplicand; a 2n-bit register is loaded with 0 in its upper half and the multiplier in its lower half.]

Seq. Multiplication, Step
Repeat step n times.
[Figure: the n-bit multiplicand register feeds an adder; bit 0 of the 2n-bit register is the control signal for the conditional add; the 2n-bit register holds the partial product and partial multiplier and is shifted right.]
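The initialize and step slides above can be sketched as a small unsigned shift-and-add multiplier. The entity name `seq_mult`, the ports, and the `count`/`done` bookkeeping are assumptions added for illustration; the slides only show the two registers and the conditional adder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: unsigned sequential multiplier, one step per clock.
entity seq_mult is
  generic (N : positive := 4);
  port (
    clk, load    : in  std_logic;
    multiplicand : in  unsigned(N-1 downto 0);
    multiplier   : in  unsigned(N-1 downto 0);
    product      : out unsigned(2*N-1 downto 0);
    done         : out std_logic
  );
end entity seq_mult;

architecture rtl of seq_mult is
  signal mc_reg : unsigned(N-1 downto 0);    -- n-bit multiplicand register
  signal pm_reg : unsigned(2*N-1 downto 0);  -- partial product & partial multiplier
  signal count  : natural range 0 to N;      -- assumed step counter
begin
  process (clk)
    variable sum : unsigned(N downto 0);     -- n-bit sum plus carry-out
  begin
    if rising_edge(clk) then
      if load = '1' then
        -- Initialize: upper half 0, lower half = multiplier
        mc_reg <= multiplicand;
        pm_reg <= resize(multiplier, 2*N);
        count  <= 0;
      elsif count < N then
        -- Conditional add: bit 0 selects 0 or the multiplicand
        sum := '0' & pm_reg(2*N-1 downto N);
        if pm_reg(0) = '1' then
          sum := sum + mc_reg;
        end if;
        -- Shift right, shifting in the carry-out
        pm_reg <= sum & pm_reg(N-1 downto 1);
        count  <= count + 1;
      end if;
    end if;
  end process;

  product <= pm_reg;
  done    <= '1' when count = N else '0';
end architecture rtl;
```

For the 1011 * 1110 example, `pm_reg` would step through 0000 1110, 0000 0111, 0101 1011, 1000 0101 and 1001 1010, matching the partial product and partial multiplier columns above. The sketch assumes `load` is asserted before the first use.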

Seq. Multiplication, Result
[Figure: n-bit multiplicand register and adder feeding the 2n-bit register; after n steps the 2n-bit register holds the product.]

Don't forget...
One partial product per clock cycle => very slow.

Signed Multiplication
Either transform to a multiplication of non-negative integers:
1. Record signs and negate any negative factors.
2. Perform unsigned multiplication.
3. Negate the product if the signs recorded above differ.
Or directly perform signed multiplication:
1. Take into account the sign bit of the multiplicand by shifting in true sign bits rather than carry-outs, i.e. $A_{n-1} \oplus B_{n-1} \oplus C_n$ rather than $C_n$.
2. Take into account the sign bit of the multiplier by doing a conditional subtract rather than a conditional add during the last iteration.

Seq. signed multiplication, step
Repeat step n times.
[Figure: the n-bit multiplicand register feeds an add/sub unit controlled by bit 0 of the 2n-bit register; the true sign is shifted into the 2n-bit register, which holds the partial product and partial multiplier and is shifted right.]
Conditional add for iterations 1..n-1, conditional subtract for iteration n.
True sign = $A_{n-1} \oplus B_{n-1} \oplus C_n$

Multiplication by a Constant
As a designer you need to ensure that multiplication by a small constant is accomplished by a number of shifts and adds.
Some numerical examples:
  *2   (*10_2):        multiplicand << 1
  *3   (*11_2):        (multiplicand << 1) + multiplicand
  *4   (*100_2):       multiplicand << 2
  *5   (*101_2):       (multiplicand << 2) + multiplicand
  *255 (*11111111_2):  (multiplicand << 8) - multiplicand
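The shift-and-add examples from the Multiplication by a Constant slide can be written directly with `numeric_std`. This is a minimal sketch: the entity name `const_mult`, the port names, and the output widths (chosen so that no result can overflow) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: multiply an unsigned input by 3, 5 and 255
-- using only shifts, adds and one subtract.
entity const_mult is
  generic (N : positive := 8);
  port (
    x    : in  unsigned(N-1 downto 0);
    x3   : out unsigned(N+1 downto 0);  -- 3*x needs N+2 bits
    x5   : out unsigned(N+2 downto 0);  -- 5*x needs N+3 bits
    x255 : out unsigned(N+7 downto 0)   -- 255*x needs N+8 bits
  );
end entity const_mult;

architecture rtl of const_mult is
begin
  -- *3 = (x << 1) + x
  x3   <= shift_left(resize(x, N+2), 1) + x;
  -- *5 = (x << 2) + x
  x5   <= shift_left(resize(x, N+3), 2) + x;
  -- *255 = (x << 8) - x
  x255 <= shift_left(resize(x, N+8), 8) - x;
end architecture rtl;
```

The input is widened before shifting so that the shifted-out bits are kept; the shorter operand `x` is zero-extended by the `numeric_std` arithmetic operators.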

String of n bit Adders
- Unrolling the loop lowers latency compared to sequential add and shift, at the expense of much more hardware.
- n x n multiplication requires n-1 n-bit adders.
- $t_{\text{saved latency}} = n \cdot (t_{clk \to out} + t_{setup})$
[Figure: chain of n-bit adders summing the partial products Mp_0*Mc, Mp_1*Mc, Mp_2*Mc, ..., Mp_{n-1}*Mc, starting from 0, and producing the product bits P_{2n-1}, P_{2n-2}..P_n, P_{n-1}, ..., P_2, P_1, P_0.]

Carry save Adders in Multipliers
- Significantly reduced delays for multi-input adders.
- Full adders with clever interconnect.
- Sums and carries are fed separately to the adder at the next level.
- Carries drawn diagonally, sums drawn vertically.
- Typically, a final (carry propagate) adder assimilates the carries.
[Figure: carry-save adder rows CSA 0 and CSA 1 with inputs A_{i,j}, B_{i,j}, C_{i,j} and outputs S_{i,j}, C_{i,j}.]

6 x 6 Parallel Array Multiplier

... Pipelined Version
MP_{i,j} = Multiplier_i AND Multiplicand_j
[Figure: array of partial-product bits MP_{0,0}..MP_{3,3} with pipeline registers between the rows and a final carry propagate adder producing P_7..P_0.]
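One row of the carry-save structure described above can be sketched as a bank of independent full adders. The entity name `csa_row` and its ports are assumptions; the sketch only shows the 3-to-2 reduction, not the full multiplier array or the final carry propagate adder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: N independent full adders (3-to-2 reduction).
-- The outputs satisfy a + b + c = sum + 2*carry.
entity csa_row is
  generic (N : positive := 4);
  port (
    a, b, c : in  std_logic_vector(N-1 downto 0);
    sum     : out std_logic_vector(N-1 downto 0);  -- bit i has weight 2^i
    carry   : out std_logic_vector(N-1 downto 0)   -- bit i has weight 2^(i+1)
  );
end entity csa_row;

architecture rtl of csa_row is
begin
  -- No carry ripples between bit positions, so the delay of the row
  -- is one full adder regardless of N.
  sum   <= a xor b xor c;
  carry <= (a and b) or (a and c) or (b and c);
end architecture rtl;
```

Because each carry bit has twice the weight of the sum bit in the same position, the next level has to take `carry` shifted left by one, which is why the carries are drawn diagonally.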

Sequential, Combinational, and Pipelined
- The sequential shift and add algorithm corresponds to a for loop that may be implemented by:
  - a state machine, or
  - instructions (low end microcontrollers)
- The sequential algorithm may be unrolled and implemented as a deep combinational circuit:
  - String of n bit adders and AND gates, or
  - Carry save adders, AND gates, and a final (n-1) bit adder
  - Advantage: low latency
  - Disadvantage: more hardware
- The deep combinational circuit may be pipelined
  - Advantage: very high throughput
  - Disadvantages: pipeline latency, more hardware, and higher power

Pipelining
Laundry process

Comparison
Non pipelined:
- Delay: 60 min
- Throughput: 1/60 load per min
Pipelined:
- Delay: 60 min
- Throughput: k/(40 + 20k) load per min, about 1/20 when k is large
- Throughput 3 times better than non pipelined

Joachim Rodrigues, Informatik og Matematisk Modellering, jnr@imm.dtu.dk
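The pipelined throughput figure in the comparison can be derived as follows, assuming the laundry process is split into three stages of 20 min each (an assumption consistent with the 60 min delay and the 1/20 limit, but not stated explicitly on the slide). The first load finishes after 60 min and every later load finishes 20 min after the previous one.

```latex
\begin{align*}
T(k) &= 60 + 20\,(k-1) = 40 + 20k \ \text{min for } k \text{ loads} \\
\text{throughput}(k) &= \frac{k}{40 + 20k} \;\longrightarrow\; \frac{1}{20} \ \text{load per min as } k \to \infty \\
\frac{1/20}{1/60} &= 3
\end{align*}
```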

Pipelined combinational circuit
[Figure: combinational circuit divided into stages separated by registers.]

Adding pipeline to a comb circuit
Candidate circuit for pipelining:
- there is enough input data to feed the pipelined circuit
- throughput is a main performance criterion
- the comb circuit can be divided into stages with similar propagation delays
- the propagation delay of a stage is much larger than the setup time and the clock-to-q delay of the register

Exercise (15 min)
Pipeline two 4 bit adders which are connected in series. The FFs are ideal ($t_{setup} = t_{clk \to Q} = 0$), $t_{pa}$ = 400 ps. The carry out of the 2nd adder can be ignored.
- How many pipeline stages?
- Where do you put the FFs?
- What's the gain in throughput?
- How many FFs are required?
[Figure: two 4 bit adders in series with inputs a_0..a_3 and b_0..b_3, intermediate sums s_0p..s_3p, signals c_0..c_3, and outputs s_0..s_3.]

Recipe
- Derive the block diagram of the original combinational circuit and arrange the circuit as a cascading chain.
- Identify the major components and estimate the relative propagation delays of these components.
- Divide the chain into stages of similar propagation delays.
- Identify the signals that cross the boundary of the chain.
- Insert registers for these signals in the boundary.

Datapath
An RTL description is characterized by the registers in a design and the combinational logic in between. This can be illustrated by a "register and cloud" diagram.

Datapath: Sequential part
Registers and the combinational logic are described separately in two different processes.

    architecture SPLIT of DATAPATH is
      signal X1, Y1, X2, Y2 : ...
    begin
      -- Registers: Y0 and X3 are not declared here
      -- (presumably ports of DATAPATH).
      seq : process (CLK)
      begin
        if (CLK'event and CLK = '1') then
          X1 <= Y0;
          X2 <= Y1;
          X3 <= Y2;
        end if;
      end process;

Datapath: Combinatorial part

      LOGIC : process (X1, X2)
      begin
        -- F(X1) and G(X2) can be replaced with the code
        -- implementing the desired combinational logic
        -- or appropriate functions must be defined.
        Y1 <= F(X1);
        Y2 <= G(X2);
      end process;
    end SPLIT;

Pipelining
The instructions on the preceding slides introduced pipelining of the DP. The critical path is reduced from F(X1) + G(X2) to either F(X1) or G(X2). Do not constrain the synthesis tool by splitting operations, e.g., y1=x1x1 2.
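The effect on the clock period can be written out as a rough timing estimate. The symbols $t_F$ and $t_G$ (propagation delays of the logic behind F and G) are assumptions introduced here; $t_{clk \to Q}$ and $t_{setup}$ are the register parameters used in the exercise above.

```latex
\begin{align*}
\text{without the register between } F \text{ and } G:\quad
  & T_{clk} \ge t_{clk \to Q} + t_F + t_G + t_{setup} \\
\text{with the register (pipelined):}\quad
  & T_{clk} \ge t_{clk \to Q} + \max(t_F,\, t_G) + t_{setup}
\end{align*}
```

The gain is largest when $t_F$ and $t_G$ are similar and both are much larger than the register overhead, which matches the candidate criteria listed for pipelining.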