Fast Arithmetic. Philipp Koehn. 19 October 2016

Similar documents
Lecture 8: Addition, Multiplication & Division

NUMBER OPERATIONS. Mahdi Nazm Bojnordi. CS/ECE 3810: Computer Organization. Assistant Professor School of Computing University of Utah

Integer Multiplication and Division

Chapter Two MIPS Arithmetic

Integer Arithmetic. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Math in MIPS. Subtracting a binary number from another binary number also bears an uncanny resemblance to the way it s done in decimal.

Lecture Topics. Announcements. Today: Integer Arithmetic (P&H ) Next: The MIPS ISA (P&H ) Consulting hours. Milestone #1 (due 1/26)

Homework 3. Assigned on 02/15 Due time: midnight on 02/21 (1 WEEK only!) B.2 B.11 B.14 (hint: use multiplexors) CSCI 402: Computer Architectures

Tailoring the 32-Bit ALU to MIPS

Arithmetic for Computers

COMP 303 Computer Architecture Lecture 6

ECE260: Fundamentals of Computer Engineering

Thomas Polzer Institut für Technische Informatik

Chapter 3 Arithmetic for Computers. ELEC 5200/ From P-H slides

EECS150 - Digital Design Lecture 13 - Combinational Logic & Arithmetic Circuits Part 3

Integer Multiplication and Division

EEC 483 Computer Organization

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Number Systems and Their Representations

CPS 104 Computer Organization and Programming

Outline. EEL-4713 Computer Architecture Multipliers and shifters. Deriving requirements of ALU. MIPS arithmetic instructions

Timing for Ripple Carry Adder

Computer Science 61C Spring Friedland and Weaver. Instruction Encoding

Number Systems and Computer Arithmetic

We are quite familiar with adding two numbers in decimal

Arithmetic. Chapter 3 Computer Organization and Design

Review: MIPS Organization

361 div.1. Computer Architecture EECS 361 Lecture 7: ALU Design : Division

Week 7: Assignment Solutions

Arithmetic for Computers. Hwansoo Han

COMPUTER ORGANIZATION AND DESIGN

MIPS Integer ALU Requirements

Divide: Paper & Pencil

Computer Architecture. Chapter 3: Arithmetic for Computers

CS 61C: Great Ideas in Computer Architecture MIPS Instruction Formats

Chapter 3. Arithmetic Text: P&H rev

ECE331: Hardware Organization and Design

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University

Review of Last lecture. Review ALU Design. Designing a Multiplier Shifter Design Review. Booth s algorithm. Today s Outline

Chapter 3 Arithmetic for Computers

Signed Multiplication Multiply the positives Negate result if signs of operand are different

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Architecture Set Four. Arithmetic

MIPS Instruction Set

ECE 30 Introduction to Computer Engineering

ECE331: Hardware Organization and Design

More complicated than addition. Let's look at 3 versions based on grade school algorithm (multiplicand) More time and more area

EEM336 Microprocessors I. Arithmetic and Logic Instructions

Boolean Algebra. Chapter 3. Boolean Algebra. Chapter 3 Arithmetic for Computers 1. Fundamental Boolean Operations. Arithmetic for Computers

Sparse Notes on an MIPS Processor s Architecture and its Assembly Language

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers

CPE 335 Computer Organization. MIPS Arithmetic Part I. Content from Chapter 3 and Appendix B

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

F. Appendix 6 MIPS Instruction Reference

By, Ajinkya Karande Adarsh Yoga

Chapter 3 Arithmetic for Computers

Chapter 3. Arithmetic for Computers

Chapter 3. Arithmetic for Computers

Midterm I March 3, 1999 CS152 Computer Architecture and Engineering

ECE331: Hardware Organization and Design

COMPUTER ARITHMETIC (Part 1)

Chapter 5: Computer Arithmetic. In this chapter you will learn about:

Chapter 3. Arithmetic for Computers

5DV118 Computer Organization and Architecture Umeå University Department of Computing Science Stephen J. Hegner. Topic 3: Arithmetic

Binary Adders. Ripple-Carry Adder

Arithmetic and Bitwise Operations on Binary Data

Lecture 13: (Integer Multiplication and Division) FLOATING POINT NUMBERS

Lecture Topics. Announcements. Today: Integer Arithmetic (P&H ) Next: continued. Consulting hours. Introduction to Sim. Milestone #1 (due 1/26)

Learning Objectives. Binary over Decimal. In this chapter you will learn about:

CPE300: Digital System Architecture and Design

MIPS Instruction Reference

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B)

Assembly Programming

CENG3420 L05: Arithmetic and Logic Unit

MIPS Instruction Format

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8

HW2 solutions You did this for Lab sbn temp, temp,.+1 # temp = 0; sbn temp, b,.+1 # temp = -b; sbn a, temp,.+1 # a = a (-b) = a + b;

Chapter 4 Section 2 Operations on Decimals

Instruction Set Architecture of. MIPS Processor. MIPS Processor. MIPS Registers (continued) MIPS Registers

TDT4255 Computer Design. Lecture 4. Magnus Jahre

CS 110 Computer Architecture MIPS Instruction Formats

CENG 3420 Lecture 05: Arithmetic and Logic Unit

ECE260: Fundamentals of Computer Engineering

CS 61C: Great Ideas in Computer Architecture Running a Program - CALL (Compiling, Assembling, Linking, and Loading)

Lecture Topics. Announcements. Today: The MIPS ISA (P&H ) Next: continued. Milestone #1 (due 1/26) Milestone #2 (due 2/2)

Computer Architecture. The Language of the Machine

xx.yyyy Lecture #11 Floating Point II Summary (single precision): Precision and Accuracy Fractional Powers of 2 Representation of Fractions

CHW 261: Logic Design

At the ith stage: Input: ci is the carry-in Output: si is the sum ci+1 carry-out to (i+1)st state

Computer Architecture and Organization

ECE232: Hardware Organization and Design. Computer Organization - Previously covered

Islamic University Gaza Engineering Faculty Department of Computer Engineering ECOM 2125: Assembly Language LAB

Principles of Computer Architecture. Chapter 3: Arithmetic

Adding Binary Integers. Part 5. Adding Base 10 Numbers. Adding 2's Complement. Adding Binary Example = 10. Arithmetic Logic Unit

Kinds Of Data CHAPTER 3 DATA REPRESENTATION. Numbers Are Different! Positional Number Systems. Text. Numbers. Other

MIPS Assembly Programming

LAB B Translating Code to Assembly

Chapter 5: Computer Arithmetic

Transcription:

Fast Arithmetic Philipp Koehn 19 October 2016

1 arithmetic

Addition (Immediate) 2 Load immediately one number (s0 = 2) li $s0, 2 Add 4 ($s1 = $s0 + 4 = 6) addi $s1, $s0, 4 Subtract 3 ($s2 = $s1-3 = 3) addi $s2, $s1, -3

Addition (Register) 3 Load immediately one number (s0 = 2) li $s0, 2 Add value from $s5 ($s1 = $s0 + $s5) add $s1, $s0, $s5 Subtract value from $s6 ($s2 = $s1 - $s6) sub $s2, $s1, $s6

Overflow 4 Signed integers operations: add, addi, and sub overflow triggers exceptions similar to interrupt register $mfc0 contains address of exception program Unsigned integers operations: addu, addiu, and subu no overflow handling (as in C programming language)

Code for Detecting Overflow 5 Overflow for unsigned integers operations can be detected from result Actual detection code is a bit intricate If you are interested consult Section 3.2 in Patterson/Hennessy textbook

6 fast addition

Recall: N-Bit Addition 7 011 +11 --- 110 --- 110

Recall: N-Bit Addition 8 011 +11 --- 110 --- 110 1+1 = 0, carry the 1

Recall: N-Bit Addition 9 011 +11 --- 110 --- 110 1+1+1 = 1, carry the 1

Recall: N-Bit Addition 10 011 +11 --- 110 --- 110 copy carry bit

Fast Addition 11 We defined n-bit adding as a sequential process More bits addition takes longer 32 bit addition gets very slow Faster addition: Carry Lookahead

Problem: Carry Propagation 12 1+1 addition always causes a carry 0+0 addition never causes a carry 1+1 + carry1 = 1, carry 1 1+1 + carry0 = 0, carry 1 0+0 + carry1 = 1, carry 0 0+0 + carry0 = 0, carry 0 0+1 and 1+0 addition may cause a carry 0+1 + carry1 = 0, carry 1 0+1 + carry0 = 1, carry 0

Generate and Propagate 13 Compute for each bit, if it generates or propagates carry Example Operand A 0100 1111 Operand B 0110 0001 Generate 0100 0001 Propagate 0110 1111 Carry 1001 111- Generate: Propagate: a i and b i a i or b i Carry:?

4-Bit Adder 14 First compute generate and propagate for all bits generate: g i = a i and b i propagate: p i = a i or b i Compute carries for each bit c 1 = g 0 or (p 0 and c 0 ) c 2 = g 1 or (p 1 and g 0 ) or (p 1 and p 0 and c 0 ) c 3 = g 2 or (p 2 and g 1 ) or (p 2 and p 1 and g 1 ) or (p 2 and p 1 and p 0 and c 0 ) c 4 = g 3 or (p 3 and g 2 ) or (p 3 and p 2 and g 2 ) or (p 3 and p 2 and p 1 and g 1 ) or (p 3 and p 2 and p 1 and p 0 and c 0 ) The carry computations require no recursion --- but use a lot of gates We may want to stop at 4 bits with this idea

16-Bit Adder 15 Combine 4 4-bit adders For each 4-bit adder, compute "super" propagate = P = p 0 and p 1 and p 2 and p 3 "super" generate = g 3 or (p 3 and g 2 ) or (p 3 and p 2 and g 1 ) or (p 3 and p 2 and p 1 and g 0 ) Compute super carry C j from super propagate P j and super generate G j Use C j as input carry to the 4-bit adders

Cycles 16 1. compute propagate p i and generate g i 2. compute carry c i compute super propagate P j and super generate G j 3. compute super carry C j 4. carry out all bitwise additions

Trade-Off 17 Higher n in n-bit adders more gates in circuit faster computation Modern CPUs can pack more gates on a chip speed-up at same clock speed

18 multiplication

Recall Method 19 Elementary school multiplication: Idea xxxx10101 x 1101 ---------------- 10101 0 10101 10101 ---------------- 100010001 (in decimal: 23x13 = 299) shift second operand to right (get last bit) if carry: add second operand to sum rotate first operand to left (multiply with binary 10)

Multiplication in Hardware 20 64 Multiplicant SHIFT LEFT Adder WRITE 32 Multiplyer 64 Product WRITE Control Unit SHIFT RIGHT Control unit runs microprogram loop 32 times: if lowest bit of multiplyer=1 add multiplicant to product shift multiplicant left shift multiplyer right Speed 32 iterations 3 operations each (add + shift + shift) almost 100 operations Note: multiplying 32 bit numbers may result in 64 bit product

Parallelize the 3 Operations 21 The 3 operations in each loop affect different registers add: product shift left: multiplicant shift right: multiplyer These can be executed in parallel (note: read is executed before write)

Parallelize the Iterations 22 Sum of 32 independently computed values More adders some summing can be done in parallel Binary tree log 2 32 = 5 cycles MULTI- PLICANT SHIFT RIGHT 31 MULTI- PLICANT SHIFT RIGHT 30 MULTI- PLICANT SHIFT RIGHT 29 MULTI- PLICANT SHIFT RIGHT 28 MULTI- PLICANT SHIFT RIGHT 3 MULTI- PLICANT SHIFT RIGHT 2 MULTI- PLICANT SHIFT RIGHT 1 MULTI- PLICANT AND AND AND AND AND AND AND AND Adder Adder Adder Adder Adder Adder Adder PRODUCT

MIPS Instructions 23 32 bit multiplication results in 64 bit product Special 64 bit register holds result hi: high word lo: low word Low word has to be retrieved by another instruction mult $s1, $s2 mflo $s0 Since this is the typical usage, pseudo-instruction More on that later mul $s0, $s1, $s2

24 division

Elementary School Division 25 Algorithm xxxx1011 / 10 = 10 0 101 01 011 10 1 Remainder 1. shift divisor sufficiently to the left 2. check if subtraction is possible yes add result bit 1, carry out subtraction no add result bit 0 3. pull down bit from dividend 4. shift divisor to the right not possible done, note remainder otherwise go to step 2

Algorithm Refinement 26 1. Shift divisor sufficiently to the left hard for machine to determine shift to maximum left 32 bit division: use 64 register, push 32 positions 2. Check if subtraction is possible yes add result bit 1, carry out subtraction no add result bit 0 we always carry out subtraction if overflow, do not use result 3. Pull down bit from dividend 4. Shift divisor to the right not possible done, note remainder otherwise go to step 2

Division in Hardware 27 Operations similar to multiplication shift divisor subtraction indication if subtraction should be accepted These operations can be parallelized But: iterations cannot be parallelized the same way (sophisticated prediction methods guess outcome of subtractions)

MIPS Instructions 28 32 bit division results in 32 bit quotient and 32 bit remainder hi: remainder lo: quotient Quotient has to be retrieved by another instruction div $s1, $s2 mflo $s0