EECS 470. Further review: Pipeline Hazards and More. Lecture 2 Winter 2018
1 EECS 470 Further review: Pipeline Hazards and More. Lecture 2, Winter 2018. Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin.
2 Bureaucracy & Scheduling: Announcements
- HW due. HW2 posted by end of the week. Use office hours!
- Programming assignment due Tuesday (7 days). Hand in electronically by :59pm.
- Should be reading C.-C.3 (review) 3., (new material)
- Get on 470's Piazza site (link on website)
3 Bureaucracy & Scheduling: Today
- Touch on performance
- Cover a bit on ISAs
- Pick up where we left off on pipelining
4 Performance: Key Points
- Amdahl's law: $S_{overall} = \frac{1}{(1 - f) + f/s}$
- Iron law: $\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$
- Averaging techniques: arithmetic mean for times, $\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$; harmonic mean for rates, $\frac{n}{\sum_{i=1}^{n} 1/\text{Rate}_i}$
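The formulas on this slide can be checked numerically. The following is a minimal sketch, not course code; the function names and the sample numbers are assumptions chosen for illustration.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the work is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


def iron_law_time(instructions, cycles_per_instruction, seconds_per_cycle):
    """Execution time as the product of the three iron-law terms."""
    return instructions * cycles_per_instruction * seconds_per_cycle


def arithmetic_mean(times):
    """Average of execution times."""
    return sum(times) / len(times)


def harmonic_mean(rates):
    """Average of rates (for example, instructions per second)."""
    return len(rates) / sum(1.0 / r for r in rates)


if __name__ == "__main__":
    # Illustrative values only.
    print(amdahl_speedup(0.5, 2.0))
    print(iron_law_time(1_000_000, 1.5, 1e-9))
    print(arithmetic_mean([2.0, 4.0]))
    print(harmonic_mean([1.0, 3.0]))
```

Note how speeding up half of the work by 2x gives an overall speedup well short of 2x, which is the point of Amdahl's law.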
5 Performance: Speedup
While speedup is generally used to explain the impact of parallel computation, we can also use it to discuss any performance improvement.
Speedup $= T_{old} / T_{new}$
Keep in mind that if execution time stays the same, speedup is 1. A 200% speedup means that it takes half as long to do something, so a 50% speedup actually means it takes twice as long.
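A short sketch of the ratio convention used here, with made-up timings (the function name and the numbers are assumptions):

```python
def speedup(t_old, t_new):
    """Speedup as the ratio of old execution time to new execution time."""
    return t_old / t_new


if __name__ == "__main__":
    # Same time, half the time, and double the time, printed as percentages.
    for t_old, t_new in [(10.0, 10.0), (10.0, 5.0), (10.0, 20.0)]:
        print(f"{t_old} s -> {t_new} s: speedup = {speedup(t_old, t_new):.0%}")
```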
6 ISA: Instruction Set Architecture
7 ISA: Instruction Set Architecture
"Instruction set architecture (ISA) is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine" (IBM, introducing the 360)
- The IBM 360 is a family of binary-compatible machines with distinct microarchitectures and technologies, ranging from the Model 30 (8-bit datapath, up to 64KB memory) to the Model 70 (64-bit datapath, 512KB memory) and later the Model 360/91 (the Tomasulo).
- The IBM 360 replaced 4 concurrent, but incompatible, lines of IBM architectures developed over the previous 10 years.
8 ISA: A contract between HW and SW
- ISA (instruction set architecture): a well-defined hardware/software interface
- The contract between software and hardware:
  - Functional definition of operations, modes, and storage locations supported by hardware
  - Precise description of how to invoke and access them
- No guarantees regarding:
  - How operations are implemented
  - Which operations are fast and which are slow, and when
  - Which operations take more power and which take less
9 ISA: Components of an ISA
- Programmer-visible states: program counter, general purpose registers, memory, control registers
- Programmer-visible behaviors (state transitions): what to do, when to do it
- Example register-transfer-level description of an instruction (the quoted instruction stands for a binary encoding):
  if imem[pc] == "add rd, rs, rt" then
    pc <- pc + 1
    gpr[rd] = gpr[rs] + gpr[rt]
- ISAs last 25 years (because of SW cost), so be careful what goes in
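The register-transfer description above maps directly onto a state-update function. This is a minimal sketch under stated assumptions: instructions are stored as Python tuples rather than a binary encoding, the machine has eight registers, and `step` is a hypothetical name.

```python
def step(state, imem):
    """Apply one instruction's state transition to the programmer-visible state."""
    insn = imem[state["pc"]]
    if insn[0] == "add":
        _, rd, rs, rt = insn
        state["pc"] = state["pc"] + 1                             # pc <- pc + 1
        state["gpr"][rd] = state["gpr"][rs] + state["gpr"][rt]    # gpr[rd] = gpr[rs] + gpr[rt]
    else:
        raise ValueError(f"unsupported instruction: {insn!r}")
    return state


if __name__ == "__main__":
    state = {"pc": 0, "gpr": [0] * 8}
    state["gpr"][1], state["gpr"][2] = 2, 3
    step(state, [("add", 3, 1, 2)])
    print(state)
```

Nothing in this description says how long the add takes or how it is implemented, which is the "no guarantees" half of the contract.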
10 ISA: RISC vs CISC
- Recall the iron law: (instructions/program) * (cycles/instruction) * (seconds/cycle)
- CISC (Complex Instruction Set Computing)
  - Improve instructions/program with complex instructions
  - Easy for assembly-level programmers, good code density
- RISC (Reduced Instruction Set Computing)
  - Improve cycles/instruction with many single-cycle instructions
  - Increases instructions/program, but hopefully not as much (help from a smart compiler)
  - Perhaps improve clock cycle time (seconds/cycle) via aggressive implementation allowed by simpler instructions
11 ISA: What Makes a Good ISA?
- Programmability: easy to express programs efficiently?
- Implementability: easy to design high-performance implementations? More recently: easy to design low-power implementations? High-reliability implementations? Low-cost implementations?
- Compatibility: easy to maintain programmability (implementability) as languages and programs (technology) evolve?
  - x86 (IA32) generations: 8086, 286, 386, 486, Pentium, PentiumII, PentiumIII, Pentium4,
12 ISA: Typical Instructions (Opcodes)
- Arithmetic and logical: and, add
- Data transfer: move, load
- Control: branch, jump, call, return
- System: trap, rett
- Floating point: add, mul, div, sqrt
- Decimal: addd, convert
- String: move, compare
What operations are necessary? {sub, ld & st, conditional br.} What is the minimum complete ISA for a von Neumann machine?
- Too little or too simple: not expressive enough; difficult to program (by hand); programs tend to be bigger
- Too much or too complex: most of it won't be used; too much baggage for implementation; difficult choices during compiler optimization
13 Basic Pipelining
14 Basic Pipelining: Before there was pipelining
- Basic datapath: fetch, decode, execute
- Single-cycle control: hardwired. Low CPI (1), but long clock period (to accommodate the slowest instruction)
- Multi-cycle control: micro-programmed. Short clock period, but high CPI
- Can we have both low CPI and short clock period? Not if the datapath executes only one instruction at a time. There is no good way to make a single instruction go faster.
[Figure: timelines. Single-cycle: insn0.fetch, dec, exec, then insn1.fetch, dec, exec. Multi-cycle: insn0.fetch, insn0.dec, insn0.exec, then insn1.fetch, insn1.dec, insn1.exec.]
15 Basic Pipelining: Pipelining
- Important performance technique: improves throughput at the expense of latency. Why does latency go up?
- Begin with the multi-cycle design. When an instruction advances from stage 1 to 2, allow the next instruction to enter stage 1.
- Each instruction still passes through all stages, but instructions enter and leave at a much faster rate
- Automotive assembly line analogy
[Figure: timelines. Multi-cycle: insn0.fetch, insn0.dec, insn0.exec, then insn1.fetch, insn1.dec, insn1.exec. Pipelined: insn1.fetch overlaps insn0.dec, and insn1.dec overlaps insn0.exec.]
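To make the CPI versus clock-period trade-off concrete, here is a rough timing model for the three designs. It is a sketch under simplifying assumptions that are not from the slides: every instruction uses every stage, the multi-cycle and pipelined clocks are set by the slowest stage, there are no hazards, and latch overhead is ignored. The stage delays are invented.

```python
def total_time(stage_delays, n_insns):
    """Time to run n_insns under each design, in the same units as stage_delays."""
    k = len(stage_delays)
    long_clock = sum(stage_delays)    # single-cycle: one long cycle per instruction
    short_clock = max(stage_delays)   # multi-cycle and pipelined: slowest stage sets the clock
    return {
        "single_cycle": n_insns * long_clock,         # CPI = 1, long clock
        "multi_cycle": n_insns * k * short_clock,     # CPI = k, short clock
        "pipelined": (k + n_insns - 1) * short_clock, # CPI near 1, short clock
    }


def insn_latency(stage_delays):
    """Start-to-finish time of one instruction."""
    return {
        "single_cycle": sum(stage_delays),
        "pipelined": len(stage_delays) * max(stage_delays),
    }


if __name__ == "__main__":
    delays = [2, 1, 3]  # fetch, decode, execute (illustrative)
    print(total_time(delays, 100))
    print(insn_latency(delays))
```

With unbalanced stages the pipelined latency exceeds the single-cycle latency because every stage is stretched to the slowest one, which is one answer to "why does latency go up?".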
16 Basic Pipelining: Pipeline Illustrated
[Figure: combinational logic between latches (L). One block of n gate delays: BW = ~(1/n). Two stages of n/2 gate delays each: BW = ~(2/n). Three stages of n/3 gate delays each: BW = ~(3/n).]
17 Basic Pipelining: 370 Processor Pipeline Review
- Stages: Fetch, Decode, Execute, Memory, (Write-back)
- Structures: PC, I-cache, Reg File, ALU, D-cache
- $T_{pipeline} = T_{base} / 5$
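The ideal relation $T_{pipeline} = T_{base} / 5$ assumes perfectly balanced stages and free latches. The sketch below adds a per-stage latch overhead term to show how the gain falls short of the stage count; the overhead parameter and all numbers are assumptions for illustration, not values from the lecture.

```python
def pipelined_cycle_time(t_base, stages, latch_overhead=0.0):
    """Cycle time when t_base of logic is split evenly across the stages."""
    return t_base / stages + latch_overhead


def throughput_gain(t_base, stages, latch_overhead=0.0):
    """Ratio of the unpipelined cycle time to the pipelined cycle time."""
    return t_base / pipelined_cycle_time(t_base, stages, latch_overhead)


if __name__ == "__main__":
    print(throughput_gain(10.0, 5))        # ideal five-stage split
    print(throughput_gain(10.0, 5, 0.5))   # same split with latch overhead
```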
18 Basic Pipelining: outline
- Data hazards: What are they? How do you detect them? How do you deal with them?
- Micro-architectural changes: pipeline depth, pipeline width, forwarding ISA (minor point)
- Control hazards (time allowing)
19-21 Basic Pipelining
[Figure, three variants: five-stage pipelined datapath (Fetch, Decode, Execute, Memory, WB) with PC, instruction memory, register file (R0-R7, read ports regA and regB), ALU, and data memory, separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers. The pipeline registers carry op, dest, offset, valA, valB, target, eq?, ALU result, and mdata. The third variant adds forwarding (fwd) paths.]
22 Basic Pipelining: Pipeline function for ADD
- Fetch: read instruction from memory
- Decode: read source operands from reg
- Execute: calculate sum
- Memory: pass results to next stage
- Writeback: write sum into register file
23 Pipelining & Data Hazards: Data Hazards
add 1 2 3
nand 3 4 5
[Figure: timeline. add: fetch, decode, execute, memory, writeback. nand, one cycle behind: fetch, decode, execute, memory, writeback.]
If not careful, you will read the wrong value of R3.
24 Pipelining & Data Hazards: Three approaches to handling data hazards
- Avoid: make sure there are no hazards in the code
- Detect and stall: if hazards exist, stall the processor until they go away
- Detect and forward: if hazards exist, fix up the pipeline to get the correct value (if possible)
25 Pipelining & Data Hazards: Handling data hazards: avoid all hazards
- Assume the programmer (or the compiler) knows about the processor implementation.
- Make sure no hazards exist: put noops between any dependent instructions.
add 1 2 3     (write R3 in cycle 5)
noop
noop
nand 3 4 5    (read R3 in cycle 6)
26 Pipelining & Data Hazards: Problems with this solution
- Old programs (legacy code) may not run correctly on new implementations: longer pipelines need more noops
- Programs get larger as noops are included. This is especially a problem for machines that try to execute more than one instruction every cycle. Intel EPIC: often 25% - 40% of instructions are noops
- Program execution is slower: CPI is one, but some instructions are noops
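The avoidance approach amounts to a compiler or assembler pass that pads dependent instructions. Below is a minimal sketch of such a pass. The tuple encoding `(opcode, dest, sources)`, the function name, and the default spacing of three slots (which reproduces the two noops on the slide) are assumptions; a different pipeline would need a different spacing, which is exactly the legacy-code problem noted above.

```python
NOOP = ("noop", None, ())


def insert_noops(program, min_distance=3):
    """Pad a program so every reader is at least min_distance slots after its writer.

    program: list of (opcode, dest_reg_or_None, tuple_of_source_regs).
    """
    out = []
    for insn in program:
        _, _, srcs = insn
        # Only the last (min_distance - 1) emitted instructions can be too close.
        window = out[-(min_distance - 1):] if min_distance > 1 else []
        needed = 0
        # back = 1 is the instruction immediately before this one.
        for back, (_, dest, _) in enumerate(reversed(window), start=1):
            if dest is not None and dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOOP] * needed)
        out.append(insn)
    return out


if __name__ == "__main__":
    program = [("add", 3, (1, 2)), ("nand", 5, (3, 4))]
    for insn in insert_noops(program):
        print(insn)
```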
27 Pipelining & Data Hazards: Handling data hazards: detect and stall
- Detection: compare regA with previous DestRegs (3-bit operand fields); compare regB with previous DestRegs (3-bit operand fields)
- Stall: keep current instructions in fetch and decode; pass a noop to execute
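The detection step can be sketched as a handful of equality comparisons. This sketch assumes the instruction in decode is compared against the destination registers held in the ID/EX and EX/Mem pipeline registers (four comparisons in total), and that `None` marks an instruction with no destination such as a noop; the function names are hypothetical.

```python
def hazard_signals(reg_a, reg_b, dest_id_ex, dest_ex_mem):
    """Return the four comparator outputs for the instruction in decode."""
    return {
        ("regA", "ID/EX"): dest_id_ex is not None and reg_a == dest_id_ex,
        ("regA", "EX/Mem"): dest_ex_mem is not None and reg_a == dest_ex_mem,
        ("regB", "ID/EX"): dest_id_ex is not None and reg_b == dest_id_ex,
        ("regB", "EX/Mem"): dest_ex_mem is not None and reg_b == dest_ex_mem,
    }


def must_stall(reg_a, reg_b, dest_id_ex, dest_ex_mem):
    """Detect and stall only needs the logical OR of the four signals."""
    return any(hazard_signals(reg_a, reg_b, dest_id_ex, dest_ex_mem).values())


if __name__ == "__main__":
    # nand reading registers 3 and 4 while an add writing register 3 is in execute.
    print(must_stall(3, 4, dest_id_ex=3, dest_ex_mem=None))
```

When `must_stall` is true, the fetch and decode stages hold their contents and a noop is passed to execute.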
28-32 Pipelining & Data Hazards
[Figure sequence: the pipelined datapath running add followed by nand. End of cycle 1: add is in IF/ID. End of cycle 2: nand is in IF/ID and add is in ID/EX. First half of cycle 3: hazard detection compares the nand's regA and regB fields against the destination registers of the instructions ahead of it; the comparison on register 3 matches, so a hazard is detected.]
33 Pipelining & Data Hazards: Handling data hazards: detect and stall the pipeline until ready
- Detection: compare regA with previous DestReg (3-bit operand fields); compare regB with previous DestReg (3-bit operand fields)
- Stall: keep current instructions in fetch and decode; pass a noop to execute
34 Pipelining & Data Hazards
[Figure: first half of cycle 3. Hazard detected on register 3 with nand in IF/ID and add in ID/EX; the figure adds enable (en) signals used to stall the pipeline.]
36-40 Pipelining & Data Hazards
[Figure sequence: stalling, cycle by cycle. End of cycle 3: nand is held in IF/ID, a noop is in ID/EX, add is in EX/Mem. First half of cycle 4: hazard detected again. End of cycle 4: nand is still in IF/ID; noop, noop, add occupy ID/EX, EX/Mem, Mem/WB. First half of cycle 5: no hazard. End of cycle 5: the next add is in IF/ID, nand is in ID/EX, followed by the two noops.]
41 Pipelining & Data Hazards: No more hazard: stalling
add 1 2 3
nand 3 4 5
[Figure: timeline. add: fetch, decode, execute, memory, writeback. nand: fetch, decode (hazard), decode (hazard), decode, execute.]
We are careful to get the right value of R3.
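The stalling timeline can be reproduced with a small in-order scheduling model. This is a sketch, not the course simulator. It assumes a five-stage pipeline, that the first instruction decodes in cycle 2, and that a value written back in a given cycle can be read by a decode finishing in that same cycle (consistent with the nand finishing decode in cycle 5 above). The tuple encoding and function name are hypothetical.

```python
def stall_schedule(program):
    """Compute decode and writeback cycles under detect-and-stall.

    program: list of (name, dest_reg_or_None, tuple_of_source_regs).
    """
    writeback_cycle = {}   # register -> cycle in which its latest producer writes back
    prev_decode_done = 1   # so that the first instruction decodes in cycle 2
    rows = []
    for name, dest, srcs in program:
        enter_decode = prev_decode_done + 1
        # A source is readable once its producer has reached writeback.
        ready = max((writeback_cycle.get(r, 0) for r in srcs), default=0)
        decode_done = max(enter_decode, ready)
        row = {
            "name": name,
            "decode": (enter_decode, decode_done),  # first and last cycle spent in decode
            "stalls": decode_done - enter_decode,
            "writeback": decode_done + 3,           # execute, memory, writeback follow decode
        }
        if dest is not None:
            writeback_cycle[dest] = row["writeback"]
        rows.append(row)
        prev_decode_done = decode_done
    return rows


if __name__ == "__main__":
    for row in stall_schedule([("add", 3, (1, 2)), ("nand", 5, (3, 4))]):
        print(row)
```

In this model the nand spends cycles 3 through 5 in decode, two of them as stalls, matching the timeline on the slide.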
42 Pipelining & Data Hazards: Problems with detect and stall
- CPI increases every time a hazard is detected!
- Is that necessary? Not always!
  - Re-route the result of the add to the nand: nand no longer needs to read R3 from the reg file; it can get the data later (when it is ready)
  - This lets us complete the decode this cycle
  - But we need more control to remember that the data that we aren't getting from the reg file at this time will be found elsewhere in the pipeline at a later cycle
43 Pipelining & Data Hazards: Handling data hazards: detect and forward
- Detection: same as detect and stall, except that all 4 hazards are treated differently, i.e., you can't logical-or the 4 hazard signals
- Forward: new datapaths to route computed data to where it is needed; new mux and control to pick the right data
44 Pipelining & Data Hazards: Detect and Forward Example
add 1 2 3   // r3 = r1 + r2
nand        // r5 = r3 NAND r4
add         // r7 = r3 + r6
lw          // r6 = MEM[r3+10]
sw          // MEM[r6+12] = r2
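A forwarding plan for this sequence can be worked out with a small model. The sketch below rests on several assumptions that are not stated on the slide: results are forwarded into the execute stage from the EX/Mem or Mem/WB pipeline register, a producer three or more cycles ahead is read from the register file, and a load's value is not available to the instruction immediately behind it, so that case costs one stall. The register numbers are taken from the comments in the example; the tuple encoding and names are hypothetical.

```python
FORWARD_FROM = {1: "EX/Mem", 2: "Mem/WB"}


def forwarding_plan(program):
    """Decide, per source operand, where the value comes from.

    program: list of (opcode, dest_reg_or_None, tuple_of_source_regs).
    """
    last_writer = {}   # register -> (execute cycle of its latest writer, opcode)
    prev_exec = 2      # so that the first instruction executes in cycle 3
    plan = []
    for op, dest, srcs in program:
        exec_cycle = prev_exec + 1
        for r in srcs:
            if r in last_writer:
                w_cycle, w_op = last_writer[r]
                # An ALU result is usable one cycle later; a loaded value two cycles later.
                exec_cycle = max(exec_cycle, w_cycle + (2 if w_op == "lw" else 1))
        sources = {}
        for r in srcs:
            if r in last_writer:
                gap = exec_cycle - last_writer[r][0]
                sources[r] = FORWARD_FROM.get(gap, "regfile")
            else:
                sources[r] = "regfile"
        plan.append({
            "op": op,
            "exec_cycle": exec_cycle,
            "stall": exec_cycle - (prev_exec + 1),
            "sources": sources,
        })
        if dest is not None:
            last_writer[dest] = (exec_cycle, op)
        prev_exec = exec_cycle
    return plan


if __name__ == "__main__":
    example = [
        ("add", 3, (1, 2)),      # r3 = r1 + r2
        ("nand", 5, (3, 4)),     # r5 = r3 NAND r4
        ("add", 7, (3, 6)),      # r7 = r3 + r6
        ("lw", 6, (3,)),         # r6 = MEM[r3+10]
        ("sw", None, (6, 2)),    # MEM[r6+12] = r2
    ]
    for row in forwarding_plan(example):
        print(row)
```

Under these assumptions only the sw stalls, because it needs the value loaded by the lw directly ahead of it; the other dependences are covered by forwarding.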
45-56 Pipelining & Data Hazards: Detect and Forward
[Figure sequence: the forwarding datapath (with fwd paths) running the example program, cycle by cycle. Detected hazards are tagged (H1, H2, H3) and carried down the pipeline with the instruction.
First half of cycle 3: hazard detected for nand on register 3. End of cycle 3: the second add is in IF/ID, nand in ID/EX, the first add in EX/Mem.
First half of cycle 4: new hazard for the second add. End of cycle 4: lw is in IF/ID; add, nand, add occupy ID/EX, EX/Mem, Mem/WB.
First half of cycle 5: no hazard for lw. End of cycle 5: sw is in IF/ID; lw, add, nand occupy ID/EX, EX/Mem, Mem/WB.
First half of cycle 6: hazard for sw on register 6; the enable (en) signals are shown. End of cycle 6: sw is still in IF/ID; noop, lw, add occupy ID/EX, EX/Mem, Mem/WB.
First half of cycle 7: hazard for sw on register 6. End of cycle 7: sw, noop, lw occupy ID/EX, EX/Mem, Mem/WB.
Cycle 8: sw and the noop advance to EX/Mem and Mem/WB.]
57 Pipelining & Control Hazards: Pipeline function for BEQ
- Fetch: read instruction from memory
- Decode: read source operands from reg
- Execute: calculate target address and test for equality
- Memory: send target to PC if test is equal
- Writeback: nothing left to do
58 Pipelining & Control Hazards: Control Hazards
[Figure: timeline t0 to t5 for beq followed by sub. beq: F D E M W. sub, one cycle behind: F D E M W, squashed.]
59 Pipelining & Control Hazards: Handling Control Hazards
- Avoid (static): no branches? Convert branches to predication; control dependence becomes data dependence
- Detect and stall (dynamic): stop fetch until the branch resolves
- Speculate and squash (dynamic): keep going past the branch, throw away instructions if wrong
60 Pipelining & Control Hazards: Via Predication
Source:
  if (a == b) { x++; y = n / d; }
With a branch:
  sub t <- a, b
  jnz t, PC+2
  add x <- x, #1
  div y <- n, d
With predicated instructions:
  sub t <- a, b
  add(!t) x <- x, #1
  div(!t) y <- n, d
With conditional moves:
  sub t <- a, b
  add t2 <- x, #1
  div t3 <- n, d
  cmov(!t) x <- t2
  cmov(!t) y <- t3
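The branch and conditional-move versions can be compared side by side in software. This is only an illustration of the transformation, written in Python with integer division standing in for div; the function names are hypothetical. One consequence worth noticing: the conditional-move form always executes the divide, so in this sketch it raises on d == 0 even when a != b, whereas the branch form skips it.

```python
def with_branch(a, b, x, y, n, d):
    """Control-dependent version: the body runs only when a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def with_cmov(a, b, x, y, n, d):
    """If-converted version: compute both results, then select on the predicate."""
    t = a - b                   # sub  t  <- a, b
    t2 = x + 1                  # add  t2 <- x, #1   (always executed)
    t3 = n // d                 # div  t3 <- n, d    (always executed)
    x = t2 if t == 0 else x     # cmov(!t) x <- t2
    y = t3 if t == 0 else y     # cmov(!t) y <- t3
    return x, y


if __name__ == "__main__":
    print(with_branch(1, 1, 0, 0, 8, 2), with_cmov(1, 1, 0, 0, 8, 2))
    print(with_branch(1, 2, 0, 0, 8, 2), with_cmov(1, 2, 0, 0, 8, 2))
```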
61 Pipelining & Control Hazards: Handling Control Hazards: Detect & Stall
- Detection: in decode, check if the opcode is a branch or jump
- Stall: hold the next instruction in Fetch; pass a noop to Decode
62 Pipelining & Control Hazards
[Figure: pipelined datapath with PC, instruction memory, register file, sign extend, ALU, data memory, and a Control block attached to decode; a bnz instruction is in IF/ID.]
63 Pipelining & Control Hazards: Control Hazards
[Figure: timeline for beq followed by sub. beq: fetch, decode, execute, memory, writeback. The following instruction is held in fetch while the branch is in flight; once the branch resolves, fetch brings in either sub or the target.]
64 Pipelining & Control Hazards: Problems with Detect & Stall
- CPI increases on every branch
- Are these stalls necessary? Not always! The branch is only taken half the time
- Assume the branch is NOT taken: keep fetching, treat the branch as a noop
- If wrong, make sure bad instructions don't complete
65 Pipelining & Control Hazards: Handling Control Hazards: Speculate & Squash
- Speculate Not-Taken: assume the branch is not taken
- Squash: overwrite opcodes in Fetch, Decode, Execute with noop; pass the target to Fetch
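The benefit of speculating not-taken can be estimated with a simple CPI model. This is a back-of-the-envelope sketch: it assumes a base CPI of 1, a three-cycle penalty (the three instructions squashed in Fetch, Decode, and Execute), and an invented branch frequency of 20%; only the "taken half the time" figure comes from the slide.

```python
def cpi_detect_and_stall(branch_fraction, penalty=3):
    """Every branch pays the full penalty."""
    return 1.0 + branch_fraction * penalty


def cpi_speculate_not_taken(branch_fraction, taken_fraction, penalty=3):
    """Only taken branches (wrong guesses) pay the penalty."""
    return 1.0 + branch_fraction * taken_fraction * penalty


if __name__ == "__main__":
    b, t = 0.2, 0.5   # assumed branch frequency; taken half the time
    print(cpi_detect_and_stall(b))
    print(cpi_speculate_not_taken(b, t))
```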
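66 Pipelining & Control Hazards
[Figure: speculate and squash in the datapath. Instruction memory holds beq, sub, add, nand. A Control block driven by the equal signal from the ALU selects noop in place of the instructions entering IF/ID, ID/EX, and EX/Mem, while beq continues to Mem/WB.]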
67 Pipelining & Control Hazards: Problems with Speculate & Squash
- Always assumes the branch is not taken
- Can we do better? Yes. Predict branch direction and target!
- Why possible? Program behavior repeats.
- More on branch prediction to come...
68 Pipelining & Control Hazards: Branch Delay Slot (MIPS, SPARC)
[Figure: timelines t0 to t5. Without a delay slot: branch F D E M W; next is fetched and squashed; target F D E. With a delay slot: branch, delay, and target each flow through F D E M W.]
- The instruction in the delay slot executes even on a taken branch
  i: beq 1, 2, tgt
  j: add 3, 4, 5
- What can we put here?
69 Improving pipeline performance
- Add more stages
- Widen pipeline
70 Improving pipeline performance: Adding pipeline stages
- Pipeline frontend: Fetch, Decode
- Pipeline middle: Execute
- Pipeline backend: Memory, Writeback
71 Improving pipeline performance: Adding stages to fetch, decode
- Delays hazard detection
- No change in forwarding paths
- No performance penalty with respect to data hazards
72 Improving pipeline performance: Adding stages to execute
- Check for structural hazards: ALU not pipelined; multiple ALU ops completing at the same time
- Data hazards may cause delays if a multicycle op hasn't computed its data before the dependent instruction is ready to execute
- Performance penalty for each stall
73 Improving pipeline performance: Adding stages to memory, writeback
- Instructions ready to execute may need to wait longer for a multi-cycle memory stage
- Adds more pipeline registers, thus more source registers to forward
- More complex hazard detection
- Wider muxes
- More control bits to manage muxes
74 Improving pipeline performance: Wider pipelines
[Figure: two parallel pipelines, each with fetch, decode, execute, mem, WB.]
- More complex hazard detection
  - 2× pipeline registers to forward from
  - 2× more instructions to check
  - 2× more destinations (muxes)
- Need to worry about dependent instructions in the same stage
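The "dependent instructions in the same stage" concern can be sketched as a grouping check at issue time. The following greedy packer is an illustration only: it considers read-after-write dependences within a group and nothing else (no structural limits, no write-after-write checks), and the tuple encoding and function name are assumptions.

```python
def issue_groups(program, width=2):
    """Pack instructions in order into groups of at most `width`.

    A new group is started when the group is full or when the next
    instruction reads a register written earlier in the same group.
    program: list of (opcode, dest_reg_or_None, tuple_of_source_regs).
    """
    groups = []
    current = []
    for insn in program:
        _, _, srcs = insn
        written = {d for _, d, _ in current if d is not None}
        if len(current) == width or written & set(srcs):
            groups.append(current)
            current = []
        current.append(insn)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        ("add", 3, (1, 2)),
        ("nand", 5, (3, 4)),   # reads r3 written by the add, so it cannot pair with it
        ("add", 7, (1, 2)),
        ("lw", 6, (7,)),
    ]
    for group in issue_groups(program):
        print([op for op, _, _ in group])
```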
75 Making forwarding explicit
  add r1 <- r2, EX/Mem ALU result
- Include direct mux controls into the ISA
- Hazard detection is now a compiler task
- New micro-architecture leads to new ISA. Is this why this approach always seems to fail? (e.g., simple VLIW, Motorola 88k)
- Can reduce some resources: eliminates complex conflict checkers
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationare Softw Instruction Set Architecture Microarchitecture are rdw
Program, Application Software Programming Language Compiler/Interpreter Operating System Instruction Set Architecture Hardware Microarchitecture Digital Logic Devices (transistors, etc.) Solid-State Physics
More informationMultiple Issue ILP Processors. Summary of discussions
Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More information10. Basic Processor Design: Single-Cycle and Multi-Cycle Datapaths
0. Basic Processor Design: Single-Cycle and ulti-cycle paths EECS 370 Introduction to Computer Organization Winter 2007 Prof. Valeria Bertacco & Prof. Scott ahlke EECS Department niversity of ichigan in
More information15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011
15-740/18-740 Computer Architecture Lecture 7: Pipelining Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 Review of Last Lecture More ISA Tradeoffs Programmer vs. microarchitect Transactional
More informationProcessor Architecture
Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationMark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control
EE 37 Unit Single-Cycle CPU path and Control CPU Organization Scope We will build a CPU to implement our subset of the MIPS ISA Memory Reference Instructions: Load Word (LW) Store Word (SW) Arithmetic
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationLecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)
Lecture Topics Today: Single-Cycle Processors (P&H 4.1-4.4) Next: continued 1 Announcements Milestone #3 (due 2/9) Milestone #4 (due 2/23) Exam #1 (Wednesday, 2/15) 2 1 Exam #1 Wednesday, 2/15 (3:00-4:20
More informationInstr. execution impl. view
Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction
More informationPipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science
Pipeline Overview Dr. Jiang Li Adapted from the slides provided by the authors Outline MIPS An ISA for Pipelining 5 stage pipelining Structural and Data Hazards Forwarding Branch Schemes Exceptions and
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationPipelined Datapath. One register file is enough
ipelined path The goal of pipelining is to allow multiple instructions execute at the same time We may need to perform several operations in a cycle Increment the and add s at the same time. Fetch one
More informationT = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good
CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle
More informationModern Computer Architecture
Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationComputer Architecture ELEC3441
Computer Architecture ELEC3441 RISC vs CISC Iron Law CPUTime = # of instruction program # of cycle instruction cycle Lecture 5 Pipelining Dr. Hayden Kwok-Hay So Department of Electrical and Electronic
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationComputer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士
Computer Architecture 计算机体系结构 Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review Hazards (data/name/control) RAW, WAR, WAW hazards Different types
More informationReminder: tutorials start next week!
Previous lecture recap! Metrics of computer architecture! Fundamental ways of improving performance: parallelism, locality, focus on the common case! Amdahl s Law: speedup proportional only to the affected
More informationEECS 470. Lecture 18. Simultaneous Multithreading. Fall 2018 Jon Beaumont
Lecture 18 Simultaneous Multithreading Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi,
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More information55:132/22C:160, HPCA Spring 2011
55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is
More informationPipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.
Pipelining Analogy Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =2n/05n+15 2n/0.5n 1.5 4 = number of stages 4.5 An Overview
More informationPipelining: Basic Concepts
Pipelining: Basic Concepts Prof. Cristina Silvano Dipartimento di Elettronica e Informazione Politecnico di ilano email: silvano@elet.polimi.it Outline Reduced Instruction Set of IPS Processor Implementation
More informationEECS 470 Lecture 13. Basic Caches. Fall 2018 Jon Beaumont
Basic Caches Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of
More informationMIPS An ISA for Pipelining
Pipelining: Basic and Intermediate Concepts Slides by: Muhamed Mudawar CS 282 KAUST Spring 2010 Outline: MIPS An ISA for Pipelining 5 stage pipelining i Structural Hazards Data Hazards & Forwarding Branch
More informationPIPELINING: HAZARDS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PIPELINING: HAZARDS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 1 submission deadline: Jan. 30 th This
More informationFall 2007 Prof. Thomas Wenisch
Basic Caches Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar
More informationEI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)
EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building
More informationPipeline Control Hazards and Instruction Variations
Pipeline Control Hazards and Instruction Variations Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H Appendix 4.8 Goals for Today Recap: Data Hazards Control Hazards
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationECE 587 Advanced Computer Architecture I
ECE 587 Advanced Computer Architecture I Instructor: Alaa Alameldeen alaa@ece.pdx.edu Spring 2015 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2015 1 When and Where? When:
More informationThe Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture
The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count
More informationCPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner
CPS104 Computer Organization and Programming Lecture 19: Pipelining Robert Wagner cps 104 Pipelining..1 RW Fall 2000 Lecture Overview A Pipelined Processor : Introduction to the concept of pipelined processor.
More information1. Truthiness /8. 2. Branch prediction /5. 3. Choices, choices /6. 5. Pipeline diagrams / Multi-cycle datapath performance /11
The University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 ANSWER KEY November 23 rd, 2010 Name: University of Michigan uniqname: (NOT your student ID
More informationImprove performance by increasing instruction throughput
Improve performance by increasing instruction throughput Program execution order Time (in instructions) lw $1, 100($0) fetch 2 4 6 8 10 12 14 16 18 ALU Data access lw $2, 200($0) 8ns fetch ALU Data access
More informationChapter 4 (Part II) Sequential Laundry
Chapter 4 (Part II) The Processor Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Sequential Laundry 6 P 7 8 9 10 11 12 1 2 A T a s k O r d e r A B C D 30 30 30 30 30 30 30 30 30 30
More informationChapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
More informationWhat do we have so far? Multi-Cycle Datapath (Textbook Version)
What do we have so far? ulti-cycle Datapath (Textbook Version) CPI: R-Type = 4, Load = 5, Store 4, Branch = 3 Only one instruction being processed in datapath How to lower CPI further? #1 Lec # 8 Summer2001
More informationComputer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture
Computer Science 324 Computer Architecture Mount Holyoke College Fall 2007 Topic Notes: MIPS Instruction Set Architecture vonneumann Architecture Modern computers use the vonneumann architecture. Idea:
More informationLecture 4 - Pipelining
CS 152 Computer Architecture and Engineering Lecture 4 - Pipelining John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #22 CPU Design: Pipelining to Improve Performance II 2007-8-1 Scott Beamer, Instructor CS61C L22 CPU Design : Pipelining to Improve Performance
More informationSingle-Cycle Examples, Multi-Cycle Introduction
Single-Cycle Examples, ulti-cycle Introduction 1 Today s enu Single cycle examples Single cycle machines vs. multi-cycle machines Why multi-cycle? Comparative performance Physical and Logical Design of
More informationMore advanced CPUs. August 4, Howard Huang 1
More advanced CPUs In the last two weeks we presented the design of a basic processor. The datapath performs operations on register and memory data. A control unit translates program instructions into
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationCS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz
CS 61C: Great Ideas in Computer Architecture Lecture 13: Pipelining Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 RISC-V Pipeline Pipeline Control Hazards Structural Data R-type
More informationCS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming
CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at
More informationGetting CPI under 1: Outline
CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12
Pipelined Datapath Lecture notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 ing Note: Appendices A-E in the hardcopy text correspond to chapters 7- in the online
More information