ECE 154A Introduction to Computer Architecture. Fall 2012

ECE 154A Introduction to Computer Architecture, Fall 2012
Dmitri Strukov
Lecture 10: Floating point review; pipelined design

IEEE Floating Point Format
Fields: S (sign, 1 bit) | Exponent (single: 8 bits; double: 11 bits) | Fraction (single: 23 bits; double: 52 bits)
$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
S: sign bit (0 = non-negative, 1 = negative)
Normalized significand: $1.0 \le |\text{significand}| < 2.0$
  Always has a leading 1 bit before the binary point, so there is no need to represent it explicitly (hidden bit)
  Significand is the Fraction with the "1." restored
Exponent: excess representation, i.e. actual exponent + Bias
  Ensures the stored exponent is unsigned
  Single: Bias = 127; Double: Bias = 1023
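The field layout above can be checked on a concrete value. The following is a minimal sketch, not part of the original slides: the helper names `decode_single` and `rebuild_single` are hypothetical, and the rebuild step handles normalized numbers only (no zeros, denormals, infinities, or NaNs).

```python
import struct

def decode_single(value):
    """Split a number, stored as IEEE 754 single precision, into (S, Exponent, Fraction)."""
    # Pack as a 32-bit float, then reinterpret the same 4 bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits, hidden leading 1 not stored
    return sign, exponent, fraction

def rebuild_single(sign, exponent, fraction, bias=127):
    """Apply x = (-1)^S * (1 + Fraction) * 2^(Exponent - Bias); normalized numbers only."""
    significand = 1 + fraction / 2**23   # restore the hidden "1."
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)

sign, exponent, fraction = decode_single(-0.75)
print(sign, exponent, fraction)                   # fields of -0.75 = -1.1 (binary) x 2^-1
print(rebuild_single(sign, exponent, fraction))   # value recomputed from the fields
```

Here -0.75 is stored with S = 1, a biased exponent of 126 (actual exponent -1), and a fraction whose top bit is set.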

Floating Point Addition
Consider a 4-digit decimal example: $9.999 \times 10^{1} + 1.610 \times 10^{-1}$
1. Align decimal points
   Shift the number with the smaller exponent
   $9.999 \times 10^{1} + 0.016 \times 10^{1}$
2. Add significands
   $9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}$
3. Normalize result and check for over/underflow
   $1.0015 \times 10^{2}$
4. Round and renormalize if necessary
   $1.002 \times 10^{2}$
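The four steps can be traced in code. This is a sketch under stated assumptions rather than a model of real hardware: the function name `fp_add_decimal` is hypothetical, digits shifted out during alignment are simply dropped (as in the slide's 0.016), rounding is round-half-up, and the over/underflow check of step 3 is omitted.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

STEP = Decimal("0.001")  # 4 significant digits in the form d.ddd

def fp_add_decimal(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b; returns (significand, exponent)."""
    sig_a, sig_b = Decimal(sig_a), Decimal(sig_b)
    # Step 1: align by shifting the number with the smaller exponent to the right.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a).quantize(STEP, rounding=ROUND_DOWN)
    # Step 2: add the significands.
    total, exp = sig_a + sig_b, exp_a
    # Step 3: normalize so that 1 <= |total| < 10 (over/underflow check omitted).
    while abs(total) >= 10:
        total, exp = total.scaleb(-1), exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total.scaleb(1), exp - 1
    # Step 4: round to 4 digits, and renormalize if rounding carried into a new digit.
    total = total.quantize(STEP, rounding=ROUND_HALF_UP)
    if abs(total) >= 10:
        total, exp = total.scaleb(-1).quantize(STEP, rounding=ROUND_HALF_UP), exp + 1
    return total, exp

print(fp_add_decimal("9.999", 1, "1.610", -1))
```

For the slide's operands this follows the same intermediate values: 0.016 after alignment, 10.015 after the add, 1.0015 after normalization, and 1.002 with exponent 2 after rounding.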

FP Adder Hardware
[Figure: FP adder block diagram, with regions labeled Step 1, Step 2, Step 3, and Step 4]

Floating Point Multiplication
Consider a 4-digit decimal example: $1.110 \times 10^{10} \times 9.200 \times 10^{-5}$
1. Add exponents
   For biased exponents, subtract the bias from the sum
   New exponent = 10 + (-5) = 5
2. Multiply significands
   $1.110 \times 9.200 = 10.212$, giving $10.212 \times 10^{5}$
3. Normalize result and check for over/underflow
   $1.0212 \times 10^{6}$
4. Round and renormalize if necessary
   $1.021 \times 10^{6}$
5. Determine sign of result from signs of operands
   $+1.021 \times 10^{6}$
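The five multiplication steps can be sketched the same way. The names `fp_mul_decimal` and `biased_exponent_sum` are hypothetical, inputs are assumed to be normalized and nonzero, rounding is round-half-up, and the over/underflow check is again left out.

```python
from decimal import Decimal, ROUND_HALF_UP

STEP = Decimal("0.001")  # 4 significant digits in the form d.ddd

def fp_mul_decimal(sig_a, exp_a, sig_b, exp_b):
    """Multiply sig_a x 10^exp_a by sig_b x 10^exp_b; returns (significand, exponent)."""
    sig_a, sig_b = Decimal(sig_a), Decimal(sig_b)
    # Step 1: add the (unbiased) exponents.
    exp = exp_a + exp_b
    # Step 2: multiply the significand magnitudes.
    product = abs(sig_a) * abs(sig_b)
    # Step 3: normalize; two values in [1, 10) give a product in [1, 100).
    if product >= 10:
        product, exp = product.scaleb(-1), exp + 1
    # Step 4: round to 4 digits, renormalizing if rounding carried into a new digit.
    product = product.quantize(STEP, rounding=ROUND_HALF_UP)
    if product >= 10:
        product, exp = product.scaleb(-1).quantize(STEP, rounding=ROUND_HALF_UP), exp + 1
    # Step 5: the result is negative only when the operand signs differ.
    negative = (sig_a < 0) != (sig_b < 0)
    return (-product if negative else product), exp

def biased_exponent_sum(biased_a, biased_b, bias=127):
    """With biased exponents, the bias is counted twice in the sum, so subtract it once."""
    return biased_a + biased_b - bias

print(fp_mul_decimal("1.110", 10, "9.200", -5))
print(biased_exponent_sum(10 + 127, -5 + 127))   # biased form of the new exponent 5
```

With a bias of 127, the biased operands 137 and 122 sum to 259; subtracting the bias once gives 132, which is 5 + 127.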

Accurate Arithmetic
IEEE Std 754 specifies additional rounding control
  Extra bits of precision (guard, round, sticky)
  Choice of rounding modes
  Allows the programmer to fine-tune the numerical behavior of a computation
Not all FP units implement all options
  Most programming languages and FP libraries just use the defaults
Trade-off between hardware complexity, performance, and market requirements

Interpretation of Data (The BIG Picture)
Bits have no inherent meaning
  Interpretation depends on the instructions applied
Computer representations of numbers
  Finite range and precision
  Need to account for this in programs

Associativity
Parallel programs may interleave operations in unexpected orders
  Assumptions of associativity may fail
Example with x = -1.50E+38, y = 1.50E+38, z = 1.0:
  (x+y)+z: x+y = 0.00E+00, then adding z gives 1.00E+00
  x+(y+z): y+z = 1.50E+38, then adding x gives 0.00E+00
Need to validate parallel programs under varying degrees of parallelism
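The example is easy to reproduce. The sketch below uses Python floats, which are IEEE 754 double precision; that is an assumption about the environment rather than something the slide specifies, but the same loss of associativity shows up because 1.0 is far smaller than the spacing between representable values near 1.5E+38.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x + y cancels exactly to 0.0, so z survives
right = x + (y + z)   # z is absorbed when added to y, so the sum cancels to 0.0

print(left, right)
print(left == right)
```

The two groupings differ by the full value of z, which is why a parallel reduction that changes the order of additions can change the result.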

Pipelined datapath

Review: MIPS (RISC) Design Principles
Simplicity favors regularity
  fixed-size instructions
  small number of instruction formats
  opcode always the first 6 bits
Smaller is faster
  limited instruction set
  limited number of registers in register file
  limited number of addressing modes
Make the common case fast
  arithmetic operands from the register file (load-store machine)
  allow instructions to contain immediate operands
Good design demands good compromises
  three instruction formats

Pipelining Analogy
Pipelined laundry: overlapping execution
  Parallelism improves performance
Four loads: Speedup = 8/3.5 = 2.3
Non-stop: $\text{Speedup} = 2n / (0.5n + 1.5) \approx 4$ = number of stages
(Section 4.5, An Overview of Pipelining. Chapter 4, The Processor, 11)
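Both speedup figures follow from one formula. The sketch below assumes four laundry stages of half an hour each, which is what the terms 2n and 0.5n + 1.5 imply; the function name `laundry_speedup` is hypothetical.

```python
def laundry_speedup(loads, stages=4, stage_hours=0.5):
    """Speedup of pipelined over sequential laundry for a given number of loads."""
    sequential = loads * stages * stage_hours        # 2n hours with the defaults
    pipelined = (stages + loads - 1) * stage_hours   # 0.5n + 1.5 hours with the defaults
    return sequential / pipelined

print(round(laundry_speedup(4), 1))        # four loads: 8 / 3.5
print(round(laundry_speedup(10**6), 3))    # non-stop: approaches the number of stages
```

The 1.5-hour term is the time to fill the pipeline. It is paid once, so its share of the total shrinks as the number of loads grows.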

The Five Stages of Load Instruction
      Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw    IFetch   Dec      Exec     Mem      WB
IFetch: Instruction fetch and update PC
Dec: Register fetch and instruction decode
Exec: Execute R-type; calculate memory address
Mem: Read/write the data from/to the data memory
WB: Write the result data into the register file

A Pipelined MIPS Processor
Start the next instruction before the current one has completed
  improves throughput: the total amount of work done in a given time
  instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is not reduced
        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
lw      IFetch   Dec      Exec     Mem      WB
sw               IFetch   Dec      Exec     Mem      WB
R-type                    IFetch   Dec      Exec     Mem      WB
The clock cycle (pipeline stage time) is limited by the slowest stage
  some stages don't need the whole clock cycle (e.g., WB)
  for some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)
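The staggered diagram can be generated mechanically. This is a sketch of an ideal pipeline only: it assumes one instruction is issued per cycle with no stalls, and the helper names `pipeline_schedule` and `render` are hypothetical.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

def pipeline_schedule(instructions):
    """Map each instruction to {cycle: stage}; instruction k (1-based) starts in cycle k."""
    return [
        (name, {start + offset: stage for offset, stage in enumerate(STAGES)})
        for start, name in enumerate(instructions, start=1)
    ]

def render(schedule):
    """Format the schedule as a text table with one column per clock cycle."""
    last_cycle = max(max(stages) for _, stages in schedule)
    cycles = range(1, last_cycle + 1)
    lines = [f"{'':8}" + "".join(f"{'Cycle ' + str(c):9}" for c in cycles)]
    for name, stages in schedule:
        lines.append(f"{name:8}" + "".join(f"{stages.get(c, ''):9}" for c in cycles))
    return "\n".join(lines)

schedule = pipeline_schedule(["lw", "sw", "R-type"])
print(render(schedule))
```

Each instruction still occupies five cycles, so latency is unchanged, but the three instructions together finish in 7 cycles instead of 15.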

Pipeline Performance
[Figure: timing diagrams comparing single-cycle execution ($T_c$ = 800 ps) with pipelined execution ($T_c$ = 200 ps)]

Pipeline Speedup
If all stages are balanced, i.e., all take the same time:
$\text{Time between instructions}_{\text{pipelined}} = \dfrac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
If not balanced, speedup is less
Speedup is due to increased throughput
  Latency (time for each instruction) does not decrease
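A small calculation shows why unbalanced stages reduce the speedup. The per-stage latencies below are assumed values, chosen only so that they sum to an 800 ps single-cycle clock and have a 200 ps slowest stage; they are not stated in this transcript.

```python
# Assumed stage latencies in picoseconds (hypothetical, consistent with 800 ps / 200 ps).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

single_cycle_clock = sum(STAGE_PS.values())   # one long cycle covers every stage
pipelined_clock = max(STAGE_PS.values())      # the cycle is set by the slowest stage

def single_cycle_time(n):
    """Total time in ps for n instructions, one per long clock cycle."""
    return n * single_cycle_clock

def pipelined_time(n):
    """Total time in ps for n instructions: fill the pipeline, then one per cycle."""
    return (n + len(STAGE_PS) - 1) * pipelined_clock

def speedup(n):
    return single_cycle_time(n) / pipelined_time(n)

print(single_cycle_clock, pipelined_clock)
print(round(speedup(3), 2), round(speedup(10**6), 2))
```

With these numbers the long-run speedup approaches 800/200 = 4 rather than the stage count of 5, because two stages leave half of each 200 ps cycle unused.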

Single Cycle vs. Multicycle vs. Pipelined
[Figure: clock timelines for Instr 1 through Instr 4 showing time needed versus time allotted. In the single-cycle design every instruction is allotted the same long cycle; in the multicycle design the instructions take 3, 5, 3, and 4 cycles, with the difference marked as time saved.]
[Figure: pipelined execution of seven instructions over cycles 1 to 11, shown as (a) a task-time diagram and (b) a space-time diagram, with the start-up region and drainage region marked. Legend: f = Fetch, r = Reg read, a = ALU op, d = Data access, w = Writeback.]

MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode and register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
      Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw    IFetch   Dec      Exec     Mem      WB

Pipelining and ISA Design
MIPS ISA designed for pipelining
All instructions are 32 bits
  Easier to fetch and decode in one cycle
  cf. x86: 1- to 17-byte instructions
Few and regular instruction formats
  Can decode and read registers in one step
Load/store addressing
  Can calculate address in 3rd stage, access memory in 4th stage
Alignment of memory operands
  Memory access takes only one cycle

Graphically Representing MIPS Pipeline
Can help with answering questions like:
  How many cycles does it take to execute this code?
  What is the ALU doing during cycle 4?
  Is there a hazard, why does it occur, and how can it be fixed?

Why Pipeline? For Performance!
[Figure: pipeline diagram of Inst 0 through Inst 4, with time (clock cycles) on the horizontal axis and instruction order on the vertical axis; the initial cycles are labeled "Time to fill the pipeline".]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1

Hazards
Situations that prevent starting the next instruction in the next cycle
Structure hazards
  A required resource is busy
Data hazard
  Need to wait for previous instruction to complete its data read/write
Control hazard
  Deciding on control action depends on previous instruction

Structure Hazards
Conflict for use of a resource
In a MIPS pipeline with a single memory
  Load/store requires data access
  Instruction fetch would have to stall for that cycle
  Would cause a pipeline "bubble"
Hence, pipelined datapaths require separate instruction/data memories
  Or separate instruction/data caches

A Single Memory Would Be a Structural Hazard
[Figure: pipeline diagram of lw, Inst 1, Inst 2, Inst 3, Inst 4 over time (clock cycles), in instruction order. In the same cycle, lw is reading data from memory while Inst 3 is reading its instruction from memory.]
Fix with separate instruction and data memories (I$ and D$)
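The collision can be located with simple cycle arithmetic. This sketch assumes the five-stage pipeline with one instruction fetched per cycle and no stalls, so instruction i (counting from 0) fetches in cycle i + 1 and reaches its memory stage in cycle i + 4; the function name `single_memory_conflicts` is hypothetical.

```python
def single_memory_conflicts(ops):
    """List (cycle, data_access_index, fetch_index) clashes on one shared memory.

    ops is a list of mnemonics in program order. Only lw and sw use memory in
    their MEM stage; every instruction uses memory in its fetch stage.
    """
    conflicts = []
    for i, op in enumerate(ops):
        fetched_same_cycle = i + 3   # instruction whose fetch overlaps i's MEM stage
        if op in ("lw", "sw") and fetched_same_cycle < len(ops):
            conflicts.append((i + 4, i, fetched_same_cycle))
    return conflicts

program = ["lw", "add", "sub", "and", "or"]   # stand-ins for lw, Inst 1 .. Inst 4
print(single_memory_conflicts(program))
```

For this program the only clash is in cycle 4, between the data access of the lw and the fetch of the fourth instruction, which is the pair highlighted in the figure.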

Data Hazards
An instruction depends on completion of data access by a previous instruction
  add $s0, $t0, $t1
  sub $t2, $s0, $t3
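The dependence in this pair can be found by tracking which instruction last wrote each register. The sketch below is hypothetical helper code that handles only the three-register format `op rd, rs, rt` used in the example; it reports dependences, not whether a stall is actually required.

```python
def parse(instr):
    """Split 'op rd, rs, rt' into (op, destination, [sources])."""
    op, rest = instr.split(None, 1)
    regs = [r.strip() for r in rest.split(",")]
    return op, regs[0], regs[1:]

def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for each read-after-write pair."""
    found = []
    last_writer = {}   # register name -> index of the most recent instruction writing it
    for i, instr in enumerate(program):
        _, dest, sources = parse(instr)
        for src in sources:
            if src in last_writer:
                found.append((last_writer[src], i, src))
        last_writer[dest] = i
    return found

program = ["add $s0, $t0, $t1", "sub $t2, $s0, $t3"]
print(raw_dependences(program))
```

Here the sub reads $s0 one instruction after the add writes it, so the value is not yet in the register file when the sub decodes.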

Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
Instruction order:
  add $1,
  sub $4,$1,$5
  and $6,$1,$7
  or $8,$1,$9
  xor $4,$1,$5
Read-before-write data hazard

Loads Can Cause Data Hazards
Dependencies backward in time cause hazards
Instruction order:
  lw $1,4($2)
  sub $4,$1,$5
  and $6,$1,$7
  or $8,$1,$9
  xor $4,$1,$5
Load-use data hazard
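Which of the following instructions are affected depends on how far behind the load they are. The sketch below counts the bubbles each consumer would need if stalling were the only remedy. It assumes the five-stage pipeline with registers read in the decode stage and written in write-back, no forwarding hardware, and by default a register file that can be written and then read within one cycle; the function name `stalls_needed` is hypothetical.

```python
def stalls_needed(distance, write_then_read_same_cycle=True):
    """Bubbles needed by a consumer issued `distance` instructions after its producer.

    A producer fetched in cycle c writes the register file in cycle c + 4.
    A consumer fetched in cycle c + distance reads registers in cycle c + distance + 1.
    Each consumer is considered in isolation, ignoring delays caused by earlier stalls.
    """
    required_gap = 3 if write_then_read_same_cycle else 4
    return max(0, required_gap - distance)

program = ["lw $1,4($2)", "sub $4,$1,$5", "and $6,$1,$7", "or $8,$1,$9", "xor $4,$1,$5"]
for distance, instr in enumerate(program[1:], start=1):
    print(f"{instr:14} distance {distance}: {stalls_needed(distance)} bubble(s)")
```

Under these assumptions the sub and the and read $1 too early, the or reads it in the same cycle the lw writes it, and the xor is unaffected.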

How About Register File Access?
[Figure: pipeline diagram over time (clock cycles) for add $1, Inst 1, Inst 2, add $2,$1, in instruction order, marking the clock edge that controls register writing and the clock edge that controls loading of pipeline state registers.]
Fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half
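The effect of splitting the cycle can be modeled in a few lines. This is a behavioral sketch only, with a hypothetical `RegisterFile` class: each call to `cycle` stands for one clock cycle in which the write happens before the reads.

```python
class RegisterFile:
    """Toy register file: write in the first half of a cycle, read in the second half."""

    def __init__(self, count=32):
        self.regs = [0] * count

    def cycle(self, write=None, reads=()):
        """Run one clock cycle and return the values seen by the read ports."""
        # First half: the instruction in write-back updates its destination register.
        if write is not None:
            number, value = write
            if number != 0:            # register $0 is hardwired to zero in MIPS
                self.regs[number] = value
        # Second half: the instruction in decode reads its source registers.
        return [self.regs[r] for r in reads]

rf = RegisterFile()
values = rf.cycle(write=(1, 42), reads=(1, 5))
print(values)
```

Because the write completes first, an instruction decoding in the same cycle as the write-back of add $1 already sees the new value of $1.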