ECE 154A Introduction to Computer Architecture. Fall 2012


1 ECE 154A Introduction to Computer Architecture, Fall 2012. Dmitri Strukov. Lecture 10: Floating point review; Pipelined design

2 IEEE Floating Point Format
Fields: S (1 bit) | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)
x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
- S: sign bit (0 = non-negative, 1 = negative)
- Normalized significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so there is no need to represent it explicitly (hidden bit)
  - Significand is the Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures the stored exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023
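As a minimal sketch of the format above, the following Python splits a single-precision value into its S, Exponent, and Fraction fields and rebuilds it with the slide's formula. The function names are illustrative and not part of the lecture, and the rebuild handles normalized numbers only (no zeros, denormals, infinities, or NaNs).

```python
import struct

def decode_float32(x):
    """Round x to IEEE 754 single precision and return (S, Exponent, Fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with the bias added
    fraction = bits & 0x7FFFFF        # 23 bits, hidden leading 1 not stored
    return sign, exponent, fraction

def rebuild_normalized(sign, exponent, fraction, bias=127):
    """Apply x = (-1)^S * (1 + Fraction) * 2^(Exponent - Bias)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - bias)

s, e, f = decode_float32(-0.75)       # -0.75 = -1.1 (binary) * 2^-1
print(s, e, hex(f))
print(rebuild_normalized(s, e, f))
```

Here -0.75 is stored with a biased exponent of 126, which is the actual exponent -1 plus the single-precision bias of 127.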

3 Floating Point Addition
Consider a 4-digit decimal example:
1. Align decimal points: shift the number with the smaller exponent
2. Add significands
3. Normalize result & check for over/underflow
4. Round and renormalize if necessary
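The four steps can be sketched in Python with the standard `decimal` module. The operands used here (9.999 × 10^1 and 1.610 × 10^-1) are an assumed illustration, since the slide's own numbers did not survive in this transcript. The helper name and round-half-even rounding are likewise assumptions, and unlike real hardware the sketch keeps every shifted digit until the final rounding step.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def fp_add_decimal(sig_a, exp_a, sig_b, exp_b, digits=4):
    """Add sig_a * 10^exp_a + sig_b * 10^exp_b for normalized decimal inputs."""
    sig_a, sig_b = Decimal(sig_a), Decimal(sig_b)
    # Step 1: align by shifting the operand with the smaller exponent
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a)
    # Step 2: add significands
    total, exp = sig_a + sig_b, exp_a
    # Step 3: normalize so that 1 <= |total| < 10
    while abs(total) >= 10:
        total, exp = total.scaleb(-1), exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total.scaleb(1), exp - 1
    # Step 4: round to `digits` significant digits, renormalize if needed
    quantum = Decimal(1).scaleb(-(digits - 1))
    total = total.quantize(quantum, rounding=ROUND_HALF_EVEN)
    if abs(total) >= 10:
        total = total.scaleb(-1).quantize(quantum, rounding=ROUND_HALF_EVEN)
        exp += 1
    return total, exp

print(fp_add_decimal("9.999", 1, "1.610", -1))
```

With these operands the aligned sum is 10.0151 × 10^1, which normalizes to 1.00151 × 10^2 and rounds to 1.002 × 10^2.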

4 FP Adder Hardware
[Figure: FP adder datapath, annotated with Step 1 through Step 4]

5 Floating Point Multiplication
Consider a 4-digit decimal example:
1. Add exponents. For biased exponents, subtract the bias from the sum. In the example, new exponent = 5
2. Multiply significands
3. Normalize result & check for over/underflow
4. Round and renormalize if necessary
5. Determine sign of result from signs of operands
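A short Python sketch of these steps follows. The operands (1.110 × 10^10 and 9.200 × 10^-5) are an assumption chosen so that the exponents sum to 5 as on the slide; the function names, the single-precision bias of 127 in the first helper, and round-half-even rounding are also illustrative choices.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def add_biased_exponents(biased_a, biased_b, bias=127):
    """Each stored exponent carries one bias, so the sum carries two: drop one."""
    return biased_a + biased_b - bias

def fp_mul_decimal(sig_a, exp_a, sig_b, exp_b, digits=4):
    """Multiply sig_a * 10^exp_a by sig_b * 10^exp_b for normalized inputs."""
    exp = exp_a + exp_b                      # Step 1: add exponents
    prod = Decimal(sig_a) * Decimal(sig_b)   # Step 2: multiply significands
    if abs(prod) >= 10:                      # Step 3: normalize (product < 100)
        prod, exp = prod.scaleb(-1), exp + 1
    quantum = Decimal(1).scaleb(-(digits - 1))
    prod = prod.quantize(quantum, rounding=ROUND_HALF_EVEN)  # Step 4: round
    if abs(prod) >= 10:                      # renormalize if rounding overflowed
        prod = prod.scaleb(-1).quantize(quantum, rounding=ROUND_HALF_EVEN)
        exp += 1
    # Step 5: the sign falls out of the signed Decimal product here;
    # hardware would XOR the operands' sign bits instead.
    return prod, exp

print(add_biased_exponents(10 + 127, -5 + 127) - 127)   # actual exponent
print(fp_mul_decimal("1.110", 10, "9.200", -5))
```

The raw product 10.212 × 10^5 normalizes to 1.0212 × 10^6 and rounds to 1.021 × 10^6.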

6 Accurate Arithmetic
- IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows programmer to fine-tune numerical behavior of a computation
- Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
- Trade-off between hardware complexity, performance, and market requirements

7 Interpretation of Data (The BIG Picture)
- Bits have no inherent meaning
  - Interpretation depends on the instructions applied
- Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs

8 Associativity
- Parallel programs may interleave operations in unexpected orders
- Assumptions of associativity may fail

x = -1.50E+38, y = 1.50E+38, z = 1.0E+00
(x+y)+z = 0.00E+00 + 1.0E+00 = 1.00E+00
x+(y+z) = -1.50E+38 + 1.50E+38 = 0.00E+00

- Need to validate parallel programs under varying degrees of parallelism
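The failure of associativity is easy to reproduce. The sketch below uses x = -1.5E+38, y = 1.5E+38, and an assumed z = 1.0. Python floats are double precision rather than single, but 1.0 is still far below the spacing between representable values near 1.5E+38, so the same effect appears.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x + y cancels exactly, then z is added
right = x + (y + z)   # z is absorbed when added to the much larger y

print(left, right)
print(left == right)
```

The two orderings disagree, which is why a parallel reduction that regroups additions can change a floating-point result.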

9 Pipelined datapath

10 Review: MIPS (RISC) Design Principles
- Simplicity favors regularity
  - fixed-size instructions
  - small number of instruction formats
  - opcode always the first 6 bits
- Smaller is faster
  - limited instruction set
  - limited number of registers in register file
  - limited number of addressing modes
- Make the common case fast
  - arithmetic operands from the register file (load-store machine)
  - allow instructions to contain immediate operands
- Good design demands good compromises
  - three instruction formats

11 Pipelining Analogy
Pipelined laundry: overlapping execution
- Parallelism improves performance
- Four loads: Speedup = 8/3.5 = 2.3
- Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
4.5 An Overview of Pipelining (Chapter 4, The Processor)
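The laundry numbers can be checked with a few lines of Python. This sketch assumes four stages of 0.5 hours each, so one load takes 2 hours sequentially, which is consistent with the 8/3.5 figure for four loads. The function name and parameters are illustrative.

```python
def laundry_speedup(n_loads, stages=4, stage_hours=0.5):
    """Speedup of pipelined over sequential laundry for n_loads loads."""
    sequential = n_loads * stages * stage_hours
    # The first load takes all stages; each later load finishes one stage later.
    pipelined = (stages + n_loads - 1) * stage_hours
    return sequential / pipelined

for n in (4, 100, 1_000_000):
    print(n, round(laundry_speedup(n), 3))
```

As the number of loads grows, the fill time is amortized and the speedup approaches the number of stages.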

12 The Five Stages of Load Instruction
        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw      IFetch   Dec      Exec     Mem      WB
- IFetch: Instruction fetch and update PC
- Dec: Register fetch and instruction decode
- Exec: Execute R-type; calculate memory address
- Mem: Read/write the data from/to the data memory
- WB: Write the result data into the register file

13 A Pipelined MIPS Processor
- Start the next instruction before the current one has completed
  - improves throughput: total amount of work done in a given time
  - instruction latency (execution time, delay time, response time: time from the start of an instruction to its completion) is not reduced

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
lw      IFetch   Dec      Exec     Mem      WB
sw               IFetch   Dec      Exec     Mem      WB
R-type                    IFetch   Dec      Exec     Mem      WB

- clock cycle (pipeline stage time) is limited by the slowest stage
- some stages don't need the whole clock cycle (e.g., WB)
- for some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)
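The overlapped schedule on this slide can be generated mechanically. The sketch below assumes an ideal pipeline with no stalls, where instruction i enters IFetch in cycle i + 1; the function names are illustrative.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]

def pipeline_diagram(instructions, stages=STAGES):
    """Return (name, cells) rows; cells[c] is the stage occupied in cycle c + 1."""
    n_cycles = len(instructions) + len(stages) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(stages):
            cells[i + s] = stage      # each instruction starts one cycle later
        rows.append((name, cells))
    return rows

def format_diagram(rows):
    """Render the rows as fixed-width text, one instruction per line."""
    return "\n".join(
        f"{name:<8}" + "".join(f"{cell:<8}" for cell in cells).rstrip()
        for name, cells in rows
    )

print(format_diagram(pipeline_diagram(["lw", "sw", "R-type"])))
```

Three instructions finish in 3 + 5 - 1 = 7 cycles, even though each one still takes five cycles from start to completion.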

14 Pipeline Performance
[Figure: instruction timing for single-cycle (Tc = 800 ps) vs. pipelined (Tc = 200 ps) execution]

15 Pipeline Speedup
- If all stages are balanced (i.e., all take the same time):
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
- If not balanced, speedup is less
- Speedup is due to increased throughput; latency (time for each instruction) does not decrease
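To make the unbalanced case concrete, the following sketch compares total execution time using the clock periods from the Pipeline Performance slide (800 ps single-cycle, 200 ps pipelined) and assumes a five-stage pipeline with no stalls. The function names are illustrative.

```python
def single_cycle_time(n, tc_ps=800):
    """Total time in ps for n instructions, one full clock period each."""
    return n * tc_ps

def pipelined_time(n, stages=5, tc_ps=200):
    """Total time in ps: fill the pipeline, then complete one instruction per cycle."""
    return (n + stages - 1) * tc_ps

for n in (3, 1_000_000):
    s, p = single_cycle_time(n), pipelined_time(n)
    print(n, s, p, round(s / p, 2))
```

For long instruction streams the speedup approaches 800/200 = 4 rather than 5, the number of stages, because the stages are not balanced and the slowest stage sets the pipelined clock.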

16 Single Cycle vs. Multicycle vs. Pipelined
[Figure: clock timing of Instr 1 through Instr 4, showing time needed vs. time allotted for single-cycle and for multicycle (3, 5, 3, and 4 cycles) execution, with the time saved]
[Figure: pipelined execution shown as (a) a task-time diagram and (b) a space-time diagram, with start-up and drainage regions. Legend: f = Fetch, r = Reg read, a = ALU op, d = Data access, w = Writeback]

17 MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw      IFetch   Dec      Exec     Mem      WB

18 Pipelining and ISA Design
MIPS ISA designed for pipelining
- All instructions are 32 bits
  - Easier to fetch and decode in one cycle
  - cf. x86: 1- to 17-byte instructions
- Few and regular instruction formats
  - Can decode and read registers in one step
- Load/store addressing
  - Can calculate address in 3rd stage, access memory in 4th stage
- Alignment of memory operands
  - Memory access takes only one cycle

19 Graphically Representing MIPS Pipeline
Can help with answering questions like:
- How many cycles does it take to execute this code?
- What is the ALU doing during cycle 4?
- Is there a hazard, why does it occur, and how can it be fixed?

20 Why Pipeline? For Performance!
[Figure: pipeline diagram of Inst 0 through Inst 4 in instruction order against time (clock cycles), marking the time to fill the pipeline]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1

21 Hazards
Situations that prevent starting the next instruction in the next cycle
- Structure hazards
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction

22 Structure Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch would have to stall for that cycle
  - Would cause a pipeline "bubble"
- Hence, pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches

23 A Single Memory Would Be a Structural Hazard
[Figure: pipeline diagram of lw followed by Inst 1 through Inst 4 against time (clock cycles); in the same cycle, lw is reading data from memory while Inst 3 is reading its instruction from memory]
Fix with separate instr and data memories (I$ and D$)

24 Data Hazards
An instruction depends on completion of data access by a previous instruction
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
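A dependence like the one between `add` and `sub` above can be found by scanning for a source register that a recent instruction writes. The sketch below is a simplified illustration, not the lecture's method. It assumes three-register instructions in the form `op $rd, $rs, $rt`, a five-stage pipeline without forwarding, and a register file that writes in the first half of a cycle and reads in the second half, so a consumer is safe once it is at least three instructions behind its producer. All names are hypothetical.

```python
def parse_rtype(text):
    """Parse 'op $rd, $rs, $rt' into (op, dest, [sources])."""
    op, operands = text.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]

def raw_hazards(program, min_safe_distance=3):
    """Return (producer_index, consumer_index, register) for each unsafe dependence."""
    parsed = [parse_rtype(line) for line in program]
    hazards = []
    for j, (_, _, sources) in enumerate(parsed):
        for reg in sources:
            # Walk backwards to the most recent earlier writer of this register.
            for i in range(j - 1, -1, -1):
                if parsed[i][1] == reg:
                    if j - i < min_safe_distance:
                        hazards.append((i, j, reg))
                    break
    return hazards

program = ["add $s0, $t0, $t1", "sub $t2, $s0, $t3"]
print(raw_hazards(program))
```

Here `sub` reads `$s0` one instruction after `add` writes it, so the pair is reported; placing two independent instructions between them would clear the hazard under these assumptions.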

25 Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: pipeline diagram of the following instructions in program order]
    add $1,
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Read-before-write data hazard

26 Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: pipeline diagram of the following instructions in program order]
    add $1,
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Read-before-write data hazard

27 Loads Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: pipeline diagram of the following instructions in program order]
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Load-use data hazard

28 How About Register File Access?
[Figure: pipeline diagram of "add $1,", Inst 1, Inst 2, and "add $2,$1," against time (clock cycles), marking the clock edge that controls register writing and the clock edge that controls loading of pipeline state registers]
Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
