CS61C : Machine Structures
|
|
- Tamsyn Wiggins
- 5 years ago
- Views:
Transcription
1 inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #22 CPU Design: Pipelining to Improve Performance II Scott Beamer, Instructor CS61C L22 CPU Design : Pipelining to Improve Performance II (1)
2 Review: Processor Pipelining (1/2) Pipeline registers are added to the datapath/controller to neatly divide the single cycle processor into pipeline stages. Optimal Pipeline Each stage is executing part of an instruction each clock cycle. One inst. finishes during each clock cycle. On average, execute far more quickly. What makes this work well? Similarities between instructions allow us to use same stages for all instructions (generally). Each stage takes about the same amount of time as all others: little wasted time. CS61C L22 CPU Design : Pipelining to Improve Performance II (2)
3 Review: Pipeline (2/2) Pipelining is a BIG IDEA widely used concept What makes it less than perfect? Structural hazards: Conflicts for resources. Suppose we had only one cache? Need more HW resources Control hazards: Branch instructions effect which instructions come next. Delayed branch Data hazards: Data flow between instructions. Forwarding CS61C L22 CPU Design : Pipelining to Improve Performance II (3)
4 I n s t r. O r d e r Graphical Pipeline Representation (In Reg, right half highlight read, left half write) Time (clock cycles) Load Add Store Sub Or I$ Reg I$ Reg CS61C L22 CPU Design : Pipelining to Improve Performance II (4) I$ D$ Reg I$ Reg D$ Reg I$ Reg D$ Reg Reg D$ Reg D$ Reg
5 Control Hazard: Branching (1/8) I n s t r. O r d e r beq Instr 1 Instr 2 Instr 3 Instr 4 Time (clock cycles) I$ Reg D$ Reg I$ Reg D$ Reg I$ Reg D$ Reg I$ Where do we do the compare for the branch? Reg D$ Reg I$ Reg D$ Reg CS61C L22 CPU Design : Pipelining to Improve Performance II (5)
6 Control Hazard: Branching (2/8) We had put branch decision-making hardware in stage therefore two more instructions after the branch will always be fetched, whether or not the branch is taken Desired functionality of a branch if we do not take the branch, don t waste any time and continue executing normally if we take the branch, don t execute any instructions after the branch, just go to the desired label CS61C L22 CPU Design : Pipelining to Improve Performance II (6)
7 Control Hazard: Branching (3/8) Initial Solution: Stall until decision is made insert no-op instructions (those that accomplish nothing, just take time) or hold up the fetch of the next instruction (for 2 cycles). Drawback: branches take 3 clock cycles each (assuming comparator is put in stage) CS61C L22 CPU Design : Pipelining to Improve Performance II (7)
8 Control Hazard: Branching (4/8) Optimization #1: insert special branch comparator in Stage 2 as soon as instruction is decoded (Opcode identifies it as a branch), immediately make a decision and set the new value of the PC Benefit: since branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed Side Note: This means that branches are idle in Stages 3, 4 and 5. CS61C L22 CPU Design : Pipelining to Improve Performance II (8)
9 Control Hazard: Branching (5/8) I n s t r. O r d e r beq Instr 1 Instr 2 Instr 3 Instr 4 Time (clock cycles) I$ Reg D$ Reg I$ Reg D$ Reg I$ Reg D$ Reg I$ Branch comparator moved to Decode stage. Reg D$ Reg I$ Reg D$ Reg CS61C L22 CPU Design : Pipelining to Improve Performance II (9)
10 Control Hazard: Branching (6a/8) I n s t r. O r d e r User inserting no-op instruction add beq nop lw Time (clock cycles) I$ Reg D$ Reg I$ Reg D$ Reg bub ble bub ble I$ bub ble bub ble bub ble Reg D$ Reg Impact: 2 clock cycles per branch instruction slow CS61C L22 CPU Design : Pipelining to Improve Performance II (10)
11 Control Hazard: Branching (6b/8) I n s t r. O r d e r Controller inserting a single bubble add beq lw Time (clock cycles) I$ Reg D$ Reg I$ Reg D$ Reg bub ble I$ Reg D$ Reg Impact: 2 clock cycles per branch instruction slow CS61C L22 CPU Design : Pipelining to Improve Performance II (11)
12 Control Hazard: Branching (7/8) Optimization #2: Redefine branches Old definition: if we take the branch, none of the instructions after the branch get executed by accident New definition: whether or not we take the branch, the single instruction immediately following the branch gets executed (called the branch-delay slot) The term Delayed Branch means we always execute inst after branch This optimization is used on the MIPS CS61C L22 CPU Design : Pipelining to Improve Performance II (12)
13 Control Hazard: Branching (8/8) Notes on Branch-Delay Slot Worst-Case Scenario: can always put a no-op in the branch-delay slot Better Case: can find an instruction preceding the branch which can be placed in the branch-delay slot without affecting flow of the program re-ordering instructions is a common method of speeding up programs compiler must be very smart in order to find instructions to do this usually can find such an instruction at least 50% of the time Jumps also have a delay slot CS61C L22 CPU Design : Pipelining to Improve Performance II (13)
14 Example: Nondelayed vs. Delayed Branch Nondelayed Branch or $8, $9,$10 add $1,$2,$3 sub $4, $5,$6 beq $1, $4, Exit xor $10, $1,$11 Delayed Branch add $1,$2,$3 sub $4, $5,$6 beq $1, $4, Exit or $8, $9,$10 xor $10, $1,$11 Exit: Exit: CS61C L22 CPU Design : Pipelining to Improve Performance II (14)
15 Out-of-Order Laundry: Don t Wait T a s k O r d e r 12 2 AM 6 PM A B C D E F bubble A depends on D; rest continue; need more resources to allow out-of-order CS61C L22 CPU Design : Pipelining to Improve Performance II (15) Time
16 Superscalar Laundry: Parallel per stage T a s k O r d e r 12 2 AM 6 PM A B C D E F More resources, HW to match mix of parallel tasks? CS61C L22 CPU Design : Pipelining to Improve Performance II (16) Time (light clothing) (dark clothing) (very dirty clothing) (light clothing) (dark clothing) (very dirty clothing)
17 Superscalar Laundry: Mismatch Mix 12 2 AM 6 PM T a s k O r d e r A B C D Time (light clothing) (light clothing) (dark clothing) (light clothing) Task mix underutilizes extra resources CS61C L22 CPU Design : Pipelining to Improve Performance II (17)
18 Real-world pipelining problem You re the manager of a HUGE assembly plant to build computers. Box Main pipeline 10 minutes/ pipeline stage 60 stages Latency: 10hr Problem: need to run 2 hr test before done..help! CS61C L22 CPU Design : Pipelining to Improve Performance II (18)
19 Real-world pipelining problem solution 1 You remember: a pipeline frequency is limited by its slowest stage, so Box Main pipeline 2hours/ 10 minutes/ pipeline stage 60 stages Latency: 120hr 10hr Problem: need to run 2 hr test before done..help! CS61C L22 CPU Design : Pipelining to Improve Performance II (19)
20 Real-world pipelining problem solution 2 Create a sub-pipeline! Box Main pipeline 10 minutes/ pipeline stage 60 stages CS61C L22 CPU Design : Pipelining to Improve Performance II (20) 2hr test (12 CPUs in this pipeline)
21 Peer Instruction (1/2) Assume 1 instr/clock, delayed branch, 5 stage pipeline, forwarding, interlock on unresolved load hazards (after 10 3 loops, so pipeline full) Loop: lw $t0, 0($s1) addu $t0, $t0, $s2 sw $t0, 0($s1) addiu $s1, $s1, -4 bne $s1, $zero, Loop nop How many pipeline stages (clock cycles) per loop CS61C iteration L22 CPU Design : Pipelining to execute to Improve Performance this II (21) code?
22 Peer Instruction Answer (1/2) Assume 1 instr/clock, delayed branch, 5 stage pipeline, forwarding, interlock on unresolved load hazards iterations, so pipeline full. 2. (data hazard so stall) Loop: 1. lw $t0, 0($s1) 3. addu $t0, $t0, $s2 4. sw $t0, 0($s1) 5. addiu $s1, $s1, bne $s1, $zero, Loop 8. nop 6. (!= in DCD) (delayed branch so exec. nop) How many pipeline stages (clock cycles) per loop iteration to execute this code? CS61C L22 CPU Design : Pipelining to Improve Performance II (22)
23 Peer Instruction (2/2) Assume 1 instr/clock, delayed branch, 5 stage pipeline, forwarding, interlock on unresolved load hazards (after 10 3 loops, so pipeline full). Rewrite this code to reduce pipeline stages (clock cycles) per loop to as few as possible. Loop: lw $t0, 0($s1) addu $t0, $t0, $s2 sw $t0, 0($s1) addiu $s1, $s1, -4 bne $s1, $zero, Loop nop How many pipeline stages (clock cycles) per loop iteration to execute this code? CS61C L22 CPU Design : Pipelining to Improve Performance II (23)
24 Peer Instruction (2/2) How long to execute? Rewrite this code to reduce clock cycles per loop to as few as possible: Loop: 1. lw $t0, 0($s1) 2. addiu $s1, $s1, -4 addu $t0, $t0, $s2 bne $s1, $zero, Loop sw $t0, +4($s1) (no hazard since extra cycle) (modified sw to put past addiu) How many pipeline stages (clock cycles) per loop iteration to execute your revised code? (assume pipeline is full) CS61C L22 CPU Design : Pipelining to Improve Performance II (24)
25 And in Early Conclusion.. Pipeline challenge is hazards Forwarding helps w/many data hazards Delayed branch helps with control hazard in 5 stage pipeline Load delay slot / interlock necessary More aggressive performance: Superscalar Out-of-order execution CS61C L22 CPU Design : Pipelining to Improve Performance II (25)
26 Administrivia Assignments HW7 due 8/2 Proj3 due 8/5 Midterm Regrades due Today Logisim in lab is now java -jar ~cs61c/bin/logisim Valerie s OH on Thursday moved to for this week CS61C L22 CPU Design : Pipelining to Improve Performance II (26)
27 Why Doesn t It Work? DO NOT MESS WITH THE CLOCK Crafty veterans may do it very rarely and carefully Doing so will cause unpredictable and hard to track errors Following slides are from CS 150 Lecture by Prof. Katz CS61C L22 CPU Design : Pipelining to Improve Performance II (27)
28 Cascading Edge-triggered Flip-Flops Shift register New value goes into first stage While previous value of first stage goes into second stage Consider setup/hold/propagation delays (prop must be > hold) IN Q0 Q1 D Q D Q OUT CLK 100 IN Q0 Q1 CLK CS61C L22 CPU Design : Pipelining to Improve Performance II (28)
29 Cascading Edge-triggered Flip-Flops Shift register New value goes into first stage While previous value of first stage goes into second stage Consider setup/hold/propagation delays (prop must be > hold) IN Q0 Q1 D Q D Q OUT CLK Delay Clk1 100 IN Q0 Q1 CLK Clk1 CS61C L22 CPU Design : Pipelining to Improve Performance II (29)
30 Clock Skew The problem Correct behavior assumes next state of all storage elements determined by all storage elements at the same time Difficult in high-performance systems because time for clock to arrive at flip-flop is comparable to delays through logic (and will soon become greater than logic delay) Effect of skew on cascaded flip-flops: 100 In Q0 Q1 CLK CLK1 CLK1 is a delayed version of CLK original state: IN = 0, Q0 = 1, Q1 = 1 due to skew, next state becomes: Q0 = 0, Q1 = 0, and not Q0 = 0, Q1 = 1 CS61C L22 CPU Design : Pipelining to Improve Performance II (30)
31 Why Gating of Clocks is Bad! LD Reg Reg Clk GOOD Clk LD BAD gatedclk Do NOT Mess With Clock Signals! CS61C L22 CPU Design : Pipelining to Improve Performance II (31)
32 Why Gating of Clocks is Bad! LD generated by FSM shortly after rising edge of CLK Clk LD gatedclk Runt pulse plays HAVOC with register internals! NASTY HACK: delay LD through negative edge triggered FF to ensure that it won t change during next positive edge event Clk LDn gatedclk Clk skew PLUS LD delayed by half clock cycle What is the effect on your register transfers? Do NOT Mess With Clock Signals! CS61C L22 CPU Design : Pipelining to Improve Performance II (32)
33 The Big Picture Computer Processor (active) Control ( brain ) Datapath ( brawn ) Memory (passive) (where programs, data live when running) Devices Input Output Keyboard, Mouse Disk, Network Display, Printer CS61C L22 CPU Design : Pipelining to Improve Performance II (33)
34 Memory Hierarchy Storage in computer systems: Processor holds data in register file (~100 Bytes) Registers accessed on nanosecond timescale Memory (we ll call main memory ) Disk More capacity than registers (~Gbytes) Access time ~ ns Hundreds of clock cycles per memory access?! HUGE capacity (virtually limitless) VERY slow: runs ~milliseconds CS61C L22 CPU Design : Pipelining to Improve Performance II (34)
35 Motivation: Why We Use Caches (written $) Performance CPU DRAM µproc 60%/yr. Processor-Memory Performance Gap: (grows 50% / year) DRAM 7%/yr first Intel CPU with cache on chip 1998 Pentium III has two levels of cache on chip CS61C L22 CPU Design : Pipelining to Improve Performance II (35)
36 Memory Caching Mismatch between processor and memory speeds leads us to add a new level: a memory cache Implemented with same IC processing technology as the CPU (usually integrated on same chip): faster but more expensive than DRAM memory. Cache is a copy of a subset of main memory. Most processors have separate caches for instructions and data. CS61C L22 CPU Design : Pipelining to Improve Performance II (36)
37 Memory Hierarchy Processor Higher Levels in memory hierarchy Lower Level 1 Level 2 Level 3... Level n Increasing Distance from Proc., Decreasing speed Size of memory at each level As we move to deeper levels the latency goes up and price per bit goes down. CS61C L22 CPU Design : Pipelining to Improve Performance II (37)
38 Memory Hierarchy If level closer to Processor, it is: smaller faster subset of lower levels (contains most recently used data) Lowest Level (usually disk) contains all available data (or does it go beyond the disk?) Memory Hierarchy presents the processor with the illusion of a very large very fast memory. CS61C L22 CPU Design : Pipelining to Improve Performance II (38)
UC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 31 Caches I 2007-04-06 Powerpoint bad!! Research done at the Univ of NSW says that working memory, the brain part providing
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 30 Caches I 2006-11-08 Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Shuttle can t fly over Jan 1? A computer bug has
More information! CS61C : Machine Structures. Lecture 22 Caches I. !!Instructor Paul Pearce! ITʼS NOW LEGAL TO JAILBREAK YOUR PHONE!
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 22 Caches I 2010-07-28!!!Instructor Paul Pearce! ITʼS NOW LEGAL TO JAILBREAK YOUR PHONE! On Monday the Library of Congress added 5 exceptions
More informationUCB CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 30 Caches I 2008-04-11 Lecturer SOE Dan Garcia Hi to Kononov Alexey from Russia! Touted as the fastest CPU on Earth, IBM s new Power6
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Is this the beginning of the end for our beloved Lecture 32 Caches I 2004-11-12 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia The Incredibles!
More informationReview : Pipelining. Memory Hierarchy
CS61C L11 Caches (1) CS61CL : Machine Structures Review : Pipelining The Big Picture Lecture #11 Caches 2009-07-29 Jeremy Huddleston!! Pipeline challenge is hazards "! Forwarding helps w/many data hazards
More informationCS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST
CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C
More informationCS 61C: Great Ideas in Computer Architecture Pipelining and Hazards
CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.berkeley.edu/~cs61c/sp16 1 Pipelined Execution Representation Time
More informationCS61C : Machine Structures
CS C L.. Cache I () Design Principles for Hardware CSC : Machine Structures Lecture.. Cache I -- Kurt Meinz inst.eecs.berkeley.edu/~csc. Simplicity favors regularity Every instruction has operands, opcode
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,
More informationUCB CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 12 Caches I Lecturer SOE Dan Garcia Midterm exam in 3 weeks! A Mountain View startup promises to do Dropbox one better. 10GB free storage,
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #18 Introduction to CPU Design 2007-7-25 Scott Beamer, Instructor CS61C L18 Introduction to CPU Design (1) What about overflow? Consider
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 12 Caches I 2014-09-26 Instructor: Miki Lustig September 23: Another type of Cache PayPal Integrates Bitcoin Processors BitPay, Coinbase
More informationReview: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.
Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #23 Cache I 2007-8-2 Scott Beamer, Instructor CS61C L23 Caches I (1) The Big Picture Computer Processor (active) Control ( brain ) Datapath
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationMemory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky
Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data
More informationCS 61C: Great Ideas in Computer Architecture. Multiple Instruction Issue, Virtual Memory Introduction
CS 61C: Great Ideas in Computer Architecture Multiple Instruction Issue, Virtual Memory Introduction Instructor: Justin Hsia 7/26/2012 Summer 2012 Lecture #23 1 Parallel Requests Assigned to computer e.g.
More informationUC Berkeley CS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 24 Introduction to CPU Design 2007-03-14 CS61C L24 Introduction to CPU Design (1) Lecturer SOE Dan Garcia www.cs.berkeley.edu/~ddgarcia
More informationTruth Table! Review! Use this table and techniques we learned to transform from 1 to another! ! Gate Diagram! Today!
CS61C L17 Combinational Logic Blocks and Intro to CPU Design (1)! inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 17 Combinational Logic Blocks and Intro to CPU Design 2010-07-20!!!Instructor
More informationReview. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU
CS6C L9 CPU Design : Designing a Single-Cycle CPU () insteecsberkeleyedu/~cs6c CS6C : Machine Structures Lecture #9 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker Review
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationLecture 8 Introduction to Pipelines Adapated from slides by David Patterson
Lectue 8 Intoduction to Pipelines Adapated fom slides by David Patteson http://www-inst.eecs.bekeley.edu/~cs61c/ * 1 Review (1/3) Datapath is the hadwae that pefoms opeations necessay to execute pogams.
More informationLecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University
Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will
More informationCS61C - Machine Structures. Lecture 17 - Caches, Part I. October 25, 2000 David Patterson
CS1C - Machine Structures Lecture 1 - Caches, Part I October 25, 2 David Patterson http://www-inst.eecs.berkeley.edu/~cs1c/ Things to Remember Magnetic Disks continue rapid advance: %/yr capacity, 4%/yr
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #19 Designing a Single-Cycle CPU 27-7-26 Scott Beamer Instructor AI Focuses on Poker CS61C L19 CPU Design : Designing a Single-Cycle CPU
More informationClever Signed Adder/Subtractor. Five Components of a Computer. The CPU. Stages of the Datapath (1/5) Stages of the Datapath : Overview
Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 24 Introduction to CPU design Hi to Vitaly Babiy from Albany, NY! 2008-03-21 Stanford researchers developing
More informationLecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1
Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number
More informationPipelining. CSC Friday, November 6, 2015
Pipelining CSC 211.01 Friday, November 6, 2015 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory register file ALU data memory register file Not
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II 2005-07-21 Andy Carle CS 61C L19 Pipelining II (1) Review: Datapath for MIPS PC instruction memory rd rs rt registers
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count CPI and Cycle time Determined
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #21: Caches 3 2005-07-27 CS61C L22 Caches III (1) Andy Carle Review: Why We Use Caches 1000 Performance 100 10 1 1980 1981 1982 1983
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationCS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction
CS 61C: Great Ideas in Computer Architecture MIPS CPU Datapath, Control Introduction Instructor: Alan Christopher 7/28/214 Summer 214 -- Lecture #2 1 Review of Last Lecture Critical path constrains clock
More informationCS3350B Computer Architecture Winter 2015
CS3350B Computer Architecture Winter 2015 Lecture 5.5: Single-Cycle CPU Datapath Design Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationDepartment of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri
Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many
More informationComputer Architecture. Lecture 6.1: Fundamentals of
CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 35 Caches IV / VM I 2004-11-19 Andy Carle inst.eecs.berkeley.edu/~cs61c-ta Google strikes back against recent encroachments into the Search
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationQ1: Finite State Machine (8 points)
Q1: Finite State Machine (8 points) Answer the questions below for the finite state machine in this diagram: 1. Complete the truth table shown below. (2 points) Input Output State In State Out S 0 = 00
More informationCOMPUTER ORGANIZATION AND DESIGN
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationIn-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution
In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 4 Processor Part 2: Pipelining (Ch.4) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations from Mike
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationCS 61C: Great Ideas in Computer Architecture. The Memory Hierarchy, Fully Associative Caches
CS 61C: Great Ideas in Computer Architecture The Memory Hierarchy, Fully Associative Caches Instructor: Alan Christopher 7/09/2014 Summer 2014 -- Lecture #10 1 Review of Last Lecture Floating point (single
More informationCSE140: Components and Design Techniques for Digital Systems
CSE4: Components and Design Techniques for Digital Systems Tajana Simunic Rosing Announcements and Outline Check webct grades, make sure everything is there and is correct Pick up graded d homework at
More informationCS Basic Pipeline
CS 3220 Basic Pipeline Why not go directly to five stages? This is what we had in CS 2200! Will have more stages in Project 3, but We want to start with something easier Lots of things become more complicated
More informationComputer Science 61C Spring Friedland and Weaver. The MIPS Datapath
The MIPS Datapath 1 The Critical Path and Circuit Timing The critical path is the slowest path through the circuit For a synchronous circuit, the clock cycle must be longer than the critical path otherwise
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #24 Cache II 27-8-6 Scott Beamer, Instructor New Flow Based Routers CS61C L24 Cache II (1) www.anagran.com Caching Terminology When we try
More informationReview. Manage memory to disk? Treat as cache. Lecture #26 Virtual Memory II & I/O Intro
CS61C L26 Virtual Memory II (1) inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #26 Virtual Memory II & I/O Intro 2007-8-8 Scott Beamer, Instructor Apple Releases new imac Review Manage
More informationChapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationCS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design
CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design 2014-1-21 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 Today s lecture
More informationChapter 4 The Processor 1. Chapter 4A. The Processor
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationCPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner
CPS104 Computer Organization and Programming Lecture 19: Pipelining Robert Wagner cps 104 Pipelining..1 RW Fall 2000 Lecture Overview A Pipelined Processor : Introduction to the concept of pipelined processor.
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor
More informationCENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu
CENG 342 Computer Organization and Design Lecture 6: MIPS Processor - I Bei Yu CEG342 L6. Spring 26 The Processor: Datapath & Control q We're ready to look at an implementation of the MIPS q Simplified
More informationComputer Architecture
Computer Architecture An Introduction Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/
More informationPipelining. Maurizio Palesi
* Pipelining * Adapted from David A. Patterson s CS252 lecture slides, http://www.cs.berkeley/~pattrsn/252s98/index.html Copyright 1998 UCB 1 References John L. Hennessy and David A. Patterson, Computer
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/24/16 Fall 2016 - Lecture #16 1 Software
More informationThe Processor: Improving the performance - Control Hazards
The Processor: Improving the performance - Control Hazards Wednesday 14 October 15 Many slides adapted from: and Design, Patterson & Hennessy 5th Edition, 2014, MK and from Prof. Mary Jane Irwin, PSU Summary
More informationLecture 17 Introduction to Memory Hierarchies" Why it s important " Fundamental lesson(s)" Suggested reading:" (HP Chapter
Processor components" Multicore processors and programming" Processor comparison" vs." Lecture 17 Introduction to Memory Hierarchies" CSE 30321" Suggested reading:" (HP Chapter 5.1-5.2)" Writing more "
More informationELE 655 Microprocessor System Design
ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half
More informationLctures 33: Cache Memory - I. Some of the slides are adopted from David Patterson (UCB)
Lctures 33: Cache Memory - I Some of the slides are adopted from David Patterson (UCB) Outline Memory Hierarchy On-Chip SRAM Direct-Mapped Cache Review: ARM System Architecture Fast on-chip RAM External
More informationCS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz
CS 61C: Great Ideas in Computer Architecture Lecture 13: Pipelining Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 RISC-V Pipeline Pipeline Control Hazards Structural Data R-type
More informationComputer Architecture V Fall Practice Exam Questions
Computer Architecture V22.0436 Fall 2002 Practice Exam Questions These are practice exam questions for the material covered since the mid-term exam. Please note that the final exam is cumulative. See the
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 27: Single- Cycle CPU Datapath Design
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 27: Single- Cycle CPU Datapath Design Instructor: Sr Lecturer SOE Dan Garcia hgp://inst.eecs.berkeley.edu/~cs61c/sp13/ www.technologyreview.com/news/512891/software-makes-multiple-screens-less-distracting/
More informationThomas Polzer Institut für Technische Informatik
Thomas Polzer tpolzer@ecs.tuwien.ac.at Institut für Technische Informatik Pipelined laundry: overlapping execution Parallelism improves performance Four loads: Speedup = 8/3.5 = 2.3 Non-stop: Speedup =
More informationEECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141
EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #17 Single Cycle CPU Datapath CPS today! 2005-10-31 There is one handout today at the front and back of the room! Lecturer PSOE, new dad
More informationPhoto David Wright STEVEN R. BAGLEY PIPELINES AND ILP
Photo David Wright https://www.flickr.com/photos/dhwright/3312563248 STEVEN R. BAGLEY PIPELINES AND ILP INTRODUCTION Been considering what makes the CPU run at a particular speed Spent the last two weeks
More informationMemory. Lecture 22 CS301
Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch
More informationinst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 18 CPU Design: The Single-Cycle I ! Nasty new windows vulnerability!
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 18 CPU Design: The Single-Cycle I CS61C L18 CPU Design: The Single-Cycle I (1)! 2010-07-21!!!Instructor Paul Pearce! Nasty new windows vulnerability!
More informationECE 154A Introduction to. Fall 2012
ECE 154A Introduction to Computer Architecture Fall 2012 Dmitri Strukov Lecture 10 Floating point review Pipelined design IEEE Floating Point Format single: 8 bits double: 11 bits single: 23 bits double:
More informationChapter 4 The Processor 1. Chapter 4B. The Processor
Chapter 4 The Processor 1 Chapter 4B The Processor Chapter 4 The Processor 2 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can t always
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 1 Instructors: Nicholas Weaver & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/ Components of a Computer Processor
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More information14:332:331. Week 13 Basics of Cache
14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006
More informationCS 61C: Great Ideas in Computer Architecture. Virtual Memory
CS 61C: Great Ideas in Computer Architecture Virtual Memory Instructor: Justin Hsia 7/30/2012 Summer 2012 Lecture #24 1 Review of Last Lecture (1/2) Multiple instruction issue increases max speedup, but
More informationPipelining, Instruction Level Parallelism and Memory in Processors. Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010
Pipelining, Instruction Level Parallelism and Memory in Processors Advanced Topics ICOM 4215 Computer Architecture and Organization Fall 2010 NOTE: The material for this lecture was taken from several
More informationCS61C : Machine Structures
CS 61C L path (1) insteecsberkeleyedu/~cs61c/su6 CS61C : Machine Structures Lecture # path natomy: 5 components of any Computer Personal Computer -7-25 This week Computer Processor ( brain ) path ( brawn
More informationLec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements
Lec 13: Linking and Memory Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University PA 2 is out Due on Oct 22 nd Announcements Prelim Oct 23 rd, 7:30-9:30/10:00 All content up to Lecture on Oct
More informationCS Computer Architecture Spring Week 10: Chapter
CS 35101 Computer Architecture Spring 2008 Week 10: Chapter 5.1-5.3 Materials adapated from Mary Jane Irwin (www.cse.psu.edu/~mji) and Kevin Schaffer [adapted from D. Patterson slides] CS 35101 Ch 5.1
More informationComputer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra
Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 20 Advanced Processors I 2005-4-5 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last
More informationPipelined CPUs. Study Chapter 4 of Text. Where are the registers?
Pipelined CPUs Where are the registers? Study Chapter 4 of Text Second Quiz on Friday. Covers lectures 8-14. Open book, open note, no computers or calculators. L17 Pipelined CPU I 1 Review of CPU Performance
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationCS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Caches Part 3 Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/19/17 Fall 2017 - Lecture #16 1 Parallel
More informationL19 Pipelined CPU I 1. Where are the registers? Study Chapter 6 of Text. Pipelined CPUs. Comp 411 Fall /07/07
Pipelined CPUs Where are the registers? Study Chapter 6 of Text L19 Pipelined CPU I 1 Review of CPU Performance MIPS = Millions of Instructions/Second MIPS = Freq CPI Freq = Clock Frequency, MHz CPI =
More information