Pipelined RISC-V Processors
- Phebe Flynn
Due date: Tuesday November 20th 11:59:59pm EST

Getting started: To create your initial Lab 7 repository, please visit the repository creation page. Once your repository is created, you can clone it into your VM by running:

    git clone git@github.mit.edu:6004-fall18/labs-lab7-{yourmitusername}.git lab7

Turning in the lab: To turn in this lab, commit and push the changes you made to your git repository. After pushing, check the course website to verify that your submission passes all the tests. If you finish the lab in time but forget to push, you will incur the standard late submission penalties.

Check-off meeting: After turning in this lab, you are required to go to the lab for a check-off meeting within 10 days of the lab's due date (i.e., by Fri Nov 30th; this is more days than usual to account for the Thanksgiving holidays). See the course website for lab hours.

Pipelined RISC-V Processors

In this lab you will implement two pipelined RISC-V processors in Bluespec. For Bluespec-related questions, you may want to check out the Introductory Bluespec User Guide. To pass the lab you must complete all of the exercises and discussion questions and PASS all of the exercises.

Coding guidelines: You should only change the following files: TwoStage.bsv, TwoStagePlus.bsv and ThreeStage.bsv. Modifications to other files will be overwritten during didit grading. Please provide answers to the discussion questions in discussion.txt.

Debugging guidelines: If your processor does not work as expected, please read the Appendix, which describes general debugging strategies and shows how to use an optional pipeline visualization aid.

1 Two-Stage Pipelined Processor

1.1 Fixing the Two-Stage Pipelined Processor

TwoStage.bsv contains an implementation of a functional two-stage pipelined processor that correctly handles control hazards, but it is not properly pipelined.
It passes all the fullasmtests for functional correctness, but fails the pipetests, which also check that the cycle counts match those of a pipelined processor. The processor is not properly pipelined because the dofetch and doexecute rules conflict, so they cannot run in the same cycle.

Discussion Question 1 (10 points): Why do rules dofetch and doexecute conflict?

To resolve the rule conflict, you can split the conflicting part of rule doexecute into rule doredirection, such that:

- Rule doexecute saves the misprediction condition and redirected PC in two registers.
- Rule doredirection is executed only if the saved misprediction condition is true, and it updates the pc and epoch registers.
The topic of rule splitting is explained in the slides of Lecture 18.

Exercise 1 (20 points): Fix the two-stage pipelined processor by splitting the conflicting part of rule doexecute into rule doredirection. All the processor-related types are defined in ProcTypes.bsv. Build your two-stage pipelined processor by running make TwoStage. You can run a suite of tests on the processor by running ./test.sh, which dumps all the $display messages and PASSED or FAILED on the screen. You can silence the $display messages and only see PASSED or FAILED by running ./test.sh -s or ./test.sh --summary. You should pass both the fullasmtests (option 6) and the pipetests (option 11). If things don't work as expected, please refer to Appendix 3 for more Bluespec debugging guides.

1.2 Improving the Two-Stage Pipelined Processor

Now that you have fixed the two-stage pipelined processor, you can further improve it and save a cycle by sending the instruction memory load request in rule doredirection in case of a misprediction. In other words, if there is a misprediction, then doredirection should initiate the instruction fetch and update the program counter, just like dofetch does. If there is no misprediction, dofetch should perform the instruction fetch like before.

For the processor to work correctly, make sure that the guards of dofetch and doredirection are mutually exclusive. If they are not, then whenever rules dofetch and doredirection conflict, the Bluespec compiler will automatically schedule dofetch before doredirection, since dofetch appears before doredirection in the code. Thus, if doredirection and dofetch are correlated such that doredirection being ready always implies that dofetch is ready, doredirection will never fire. In such a case, Bluespec will print a warning: According to the generated schedule, rule doredirection can never fire.
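As a minimal sketch of mutually exclusive guards, assume a hypothetical Bool register named mispredicted that holds the saved misprediction condition. The module name mkGuardSketch, all register names, the reset values, and the cycle counter are illustrative and are not taken from the lab code. Instruction memory requests are left out, so the sketch only shows the guard structure: one rule is guarded on the register being False and the other on it being True, so at most one of them is ready in any cycle.

```bsv
module mkGuardSketch(Empty);
    // All names and reset values below are assumptions made for this sketch.
    Reg#(Bit#(32)) pc           <- mkReg(0);
    Reg#(Bool)     mispredicted <- mkReg(True);    // pretend a misprediction was just saved
    Reg#(Bit#(32)) redirectPC   <- mkReg('h100);   // pretend this is the saved redirect target
    Reg#(Bit#(32)) cycles       <- mkReg(0);

    // Count cycles and stop the simulation after a few of them.
    rule tick;
        cycles <= cycles + 1;
        if (cycles == 4) $finish;
    endrule

    // Ready only when no misprediction is pending.
    rule doFetch (!mispredicted);
        $display("[%d] doFetch       fetching pc=%h", cycles, pc);
        pc <= pc + 4;
    endrule

    // Ready only when a misprediction is pending: fetch from the redirect
    // target, move pc past it, and clear the pending flag.
    rule doRedirection (mispredicted);
        $display("[%d] doRedirection fetching pc=%h", cycles, redirectPC);
        pc <= redirectPC + 4;
        mispredicted <= False;
    endrule
endmodule
```

Because the two guards are a condition and its negation, the two rules are never ready in the same cycle, so neither can permanently block the other.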
To prevent rule doredirection from being blocked forever due to this problem, make sure that dofetch and doredirection have mutually exclusive guards.

Exercise 2 (20 points): Copy your working code from TwoStage.bsv to TwoStagePlus.bsv. Improve the two-stage pipelined processor by sending an instruction load request in rule doredirection. Build your two-stage pipelined processor by running make TwoStagePlus. You can run a suite of tests on the processor by running ./test.sh, which dumps all the $display messages and PASSED or FAILED on the screen. You can silence the $display messages and only see PASSED or FAILED by running ./test.sh -s or ./test.sh --summary. You should pass both the fullasmtests (option 6) and the pipetests (option 11). If things don't work as expected, please refer to Appendix 3 for more Bluespec debugging guides.

1.3 Synthesizing the Two-Stage Pipeline Processor

Synthesize your processor by running:

    synth TwoStagePlus.bsv mkproctwostageplus -l multisize

Note: synth has been updated for this lab. If you get an error when trying to run synth, close and reopen your terminal. This will automatically pull the latest version.

Discussion Question 2 (10 points): What are the critical-path delay and area (excluding memories) of your Two-Stage Processor? Which stage determines the critical path?
Hint: You can determine which stage is in the critical path by looking at the names of the start- and endpoints in the critical path. These could be either the inputs or outputs of the instruction and data memories, or the inputs or outputs of a register.

2 Three-Stage Pipelined Processor

2.1 Fixing the Three-Stage Pipelined Processor

To improve on the two-stage design, let's implement a three-stage pipelined processor with the following stages:

- The Fetch stage initiates an instruction memory read request and sets the PC to the predicted next-pc value (PC+4).
- The Decode stage decodes the fetched instruction and reads its source operands from the register file.
- The Execute stage executes the instruction, reading or writing to the data memory and writing to the register file as needed.

This design is like the one described in the slides of Lecture 18. Unfortunately, since the Decode and Execute stages can execute concurrently, there can be a data hazard in this processor pipeline: the Decode stage can read a stale value from the register file, which has not yet been updated by an earlier instruction that is still in the Execute stage.

One can resolve this data hazard by tracking all outstanding register file writes in a hardware structure called a Scoreboard, and stalling the Decode stage when the index of one of the source registers is found in the scoreboard. When an instruction writes to the register file, the item should be removed from the scoreboard, and the Decode stage can then proceed. The Scoreboard has the following interface:

    interface Scoreboard#(numeric type size);
        method Action insert(Maybe#(Bit#(5)) dst);
        method Action remove();
        method Bool search1(Maybe#(Bit#(5)) src1);
        method Bool search2(Maybe#(Bit#(5)) src2);
    endinterface

- size is the number of outstanding register write indices that the Scoreboard can hold.
- method insert inserts a destination register index into the Scoreboard. An Invalid dst is treated as a NOP on the register file write. Each Valid or Invalid dst occupies a slot in the Scoreboard, and a search for an Invalid dst will return False.
- method remove removes the oldest outstanding register write index from the Scoreboard. You also need to remove Invalid dst entries from the Scoreboard to free up space for later instructions.
- methods search1 and search2 match src register indices against the Valid register indices stored in the Scoreboard, and return True if a match is found. A search for register 0 is always False.

ThreeStage.bsv contains a non-functional three-stage pipelined processor that does not handle hazards correctly. Specifically, the code in ThreeStage.bsv has three issues discussed in the slides of Lecture 18:

1. Rule dodecode does not have the necessary logic to stall the Decode stage on a data hazard. In rule dodecode, a new instruction inst from imem (Instruction Memory) should not be processed if the previous instruction stalled. This follows from the request-response interface of imem: once imem.resp() is called, the value it returns is no longer available in imem; subsequent imem.resp() calls return the data for subsequent load requests. Therefore, if dodecode needs to stall (due to a data hazard), it needs to save the fetched instruction in fetchedinst to avoid losing it. Consequently, after a stall, dodecode should use the instruction previously saved in the fetchedinst register instead of calling imem.resp().

2. Rules doexecute and doloadwait do not have the necessary logic to remove the oldest item from the Scoreboard in the Execute stage when an instruction finishes execution. Specifically, these rules should call sb.remove in the following three cases:

   - For an instruction with a Valid dst, sb.remove and rf.wr should be called atomically, which is guaranteed if they are called together in the same rule.
   - For an instruction with an Invalid dst, sb.remove should also be called to make space for later instructions. Otherwise the pipeline would be stuck.
   - For an instruction on the wrong path of execution (i.e., a mispredicted instruction), sb.remove should also be called to make space for later instructions.

3. Finally, rules dofetch and doexecute conflict just like in the two-stage pipelined processor (and this conflict can be fixed in the same way).

Exercise 3 (30 points; only passing fullasmtests is worth 20 points): Fix the three issues in ThreeStage.bsv. Build your three-stage pipelined processor by running make ThreeStage. You can run a suite of tests on the processor by running ./test.sh, which dumps all the $display messages and PASSED or FAILED on the screen. You can silence the $display messages and only see PASSED or FAILED by running ./test.sh -s or ./test.sh --summary. You should pass both the fullasmtests (option 6) and the pipetests (option 11). If things don't work as expected, please refer to Appendix 3 for more Bluespec debugging guides.

Hint: You can first tackle issues 1 and 2, and that will pass fullasmtests. After this, you can apply the same strategy from Section 1.1 or Section 1.2 to solve issue 3 and pass pipetests.

2.2 Synthesizing the Three-Stage Pipeline Processor

Synthesize your processor by running:

    synth ThreeStage.bsv mkprocthreestage -l multisize

Discussion Question 3 (10 points): What are the critical-path delay and area (excluding memories) of your Three-stage pipelined processor? What stage determines the critical path? How does this design compare with your two-stage pipelined processor?
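The search semantics of the Scoreboard described in Section 2.1 can be sketched as plain combinational functions. This is only an illustration under stated assumptions, not the Scoreboard implementation provided with the lab: the function names slotMatches and searchSlots, the test module mkSearchSketch, and the choice to model the scoreboard contents as a Vector of Maybe values are all hypothetical. The sketch shows that an Invalid entry, an Invalid source, and register 0 never produce a match, and how a Decode stage might combine two searches into a stall condition.

```bsv
import Vector::*;

// True only when the slot holds a Valid destination that equals a Valid,
// nonzero source index. Invalid entries and register 0 never match.
function Bool slotMatches(Maybe#(Bit#(5)) entry, Maybe#(Bit#(5)) src);
    return isValid(entry) && isValid(src)
        && (fromMaybe(0, entry) == fromMaybe(0, src))
        && (fromMaybe(0, src) != 0);
endfunction

// True if any slot matches the source (the loop is unrolled at elaboration).
function Bool searchSlots(Vector#(n, Maybe#(Bit#(5))) slots, Maybe#(Bit#(5)) src);
    Bool found = False;
    for (Integer i = 0; i < valueOf(n); i = i + 1)
        if (slotMatches(slots[i], src)) found = True;
    return found;
endfunction

// Hypothetical test module: prints the search results for a fixed example.
module mkSearchSketch(Empty);
    rule check;
        // Two slots: a pending write to x5 and an entry with an Invalid dst.
        Vector#(2, Maybe#(Bit#(5))) slots = replicate(tagged Invalid);
        slots[0] = tagged Valid 5;

        Maybe#(Bit#(5)) src1 = tagged Valid 5;   // same index as the pending write
        Maybe#(Bit#(5)) src2 = tagged Valid 0;   // register 0

        // A Decode stage would stall if either source has a pending write.
        Bool stall = searchSlots(slots, src1) || searchSlots(slots, src2);

        $display("search x5 = %b", pack(searchSlots(slots, src1)));
        $display("search x0 = %b", pack(searchSlots(slots, src2)));
        $display("stall     = %b", pack(stall));
        $finish;
    endrule
endmodule
```

A real scoreboard also needs insert and remove to be callable in the same cycle from different rules, which this combinational sketch does not address.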
3 Appendix: Debugging Help

If your processor does not work as expected, there are some simple strategies you can follow to debug it. This appendix first discusses a general strategy to debug Bluespec circuits, then discusses a tool that's specific to pipelined designs.

3.1 General Guidelines

If things don't work as expected, start by adding $display statements to see what rules are being invoked and at which cycles. It helps to be systematic: we recommend that you first add $display("[%d] <rulename>", cycles); at the top of each rule. Many times this is sufficient to understand what's going wrong (e.g., if you forget to enqueue to a FIFO that's read by a rule, you'll see that the rule doesn't fire at all or stops firing). Then, refine by adding more $display statements or more output to each statement.

3.2 Pipeline Visualization with ScheduleMonitor

This lab contains a ScheduleMonitor module that you can optionally use to obtain a visual representation of which pipeline rules are firing each cycle. This module is simulation-only and produces no actual hardware. For a fully pipelined processor with no data or control hazards or load instructions, this module may produce an output similar to the one below:

    fetch decode execute
    F
    FD_

The names at the top are the names of each of the columns. These correspond to pipeline stages. The rows below correspond to what is happening in each clock cycle. The first row F means in the first clock cycle only fetch fired. The fifth row W means in the fifth clock cycle all 4 stages of the pipeline fired concurrently.

There are four other letters that may appear as output from the ScheduleMonitor integrated with the provided initial code:

    L - An instruction is in the LoadWait state of the execute stage.
    x - An instruction was killed in-place in the specified stage.
    s - An instruction stalled in the decode stage due to a data hazard.
    R - The execute stage fired and redirected the fetch stage due to a mispredicted next pc.
Using ScheduleMonitor

Using ScheduleMonitor is optional and requires adding some code to your processor. The module constructor for ScheduleMonitor (mkschedulemonitor) takes in a File object (either stdout, stderr, or an opened text file) and a vector of pipeline stage names. The order of names in this vector determines the order of the columns in the output. The code changes outlined below instantiate a ScheduleMonitor for a 3-stage pipeline that prints to stdout.
    ScheduleMonitor monitor <- mkschedulemonitor(stdout, vec("fetch", "decode", "execute"));

    rule dofetch;
        // do rest of fetch
        monitor.record("fetch", "F");
    endrule

    rule dodecode;
        // do rest of decode
        if (...) begin  // not stalling
            monitor.record("decode", "D");
        end else begin  // stalling
            monitor.record("decode", "s");
        end
    endrule

    rule doexecute;
        // do rest of execute
        if (...) begin  // not redirecting
            monitor.record("execute", "E");
        end else begin  // killed
            monitor.record("execute", "x");
        end
    endrule

    rule doloadwait;
        // do rest of loadwait
        monitor.record("execute", "L");
    endrule

    rule doredirection;
        // do rest of redirection
        monitor.record("execute", "R");
    endrule

The record method of ScheduleMonitor writes a character in the specified column of the pipeline schedule diagram. Typically the first letter of the pipeline stage is written in the column when the stage fires normally, but the above code uses some other letters to show special conditions.
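Returning to the general $display strategy of Section 3.1, the following is a small self-contained sketch of what "a $display at the top of each rule" can look like. The module name mkDisplaySketch, the cycles counter register, the toy pc register, and the rule bodies are assumptions made for illustration; they do not come from the lab's processor code.

```bsv
module mkDisplaySketch(Empty);
    Reg#(Bit#(32)) cycles <- mkReg(0);   // assumed free-running cycle counter
    Reg#(Bit#(32)) pc     <- mkReg(0);   // toy state, stands in for real pipeline state

    // Advance the cycle counter and stop the simulation after a few cycles.
    rule countCycles;
        cycles <= cycles + 1;
        if (cycles == 3) $finish;
    endrule

    rule doFetch;
        $display("[%d] doFetch", cycles);              // first statement in the rule
        pc <= pc + 4;
    endrule

    // Guarded rule: cycles in which its line is missing from the log are
    // cycles in which the rule did not fire.
    rule doExecute (pc != 0);
        $display("[%d] doExecute pc=%h", cycles, pc);  // first statement in the rule
    endrule
endmodule
```

Reading the log by cycle number shows which rules fired together in a given cycle and which rules are absent, which is usually the first clue when a pipeline stalls unexpectedly.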
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology L07-1 The Plan Non-pipelined processor Two-stage synchronous pipeline Two-stage asynchronous
More informationA More Sophisticated Snooping-Based Multi-Processor
Lecture 16: A More Sophisticated Snooping-Based Multi-Processor Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2014 Tunes The Projects Handsome Boy Modeling School (So... How
More informationDepartment of Computer Science. COS 122 Operating Systems. Practical 3. Due: 22:00 PM
Department of Computer Science COS 122 Operating Systems Practical 3 Due: 2018-09-13 @ 22:00 PM August 30, 2018 PLAGIARISM POLICY UNIVERSITY OF PRETORIA The Department of Computer Science considers plagiarism
More informationElastic Pipelines: Concurrency Issues
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology L09-1 Inelastic vs Elastic Pipelines In a Inelastic pipeline: pp typically
More informationIn-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution
In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on
More informationWeek 11: Assignment Solutions
Week 11: Assignment Solutions 1. Consider an instruction pipeline with four stages with the stage delays 5 nsec, 6 nsec, 11 nsec, and 8 nsec respectively. The delay of an inter-stage register stage of
More informationInstr. execution impl. view
Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction
More informationCS152 Computer Architecture and Engineering SOLUTIONS Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12
Assigned 2/28/2018 CS152 Computer Architecture and Engineering SOLUTIONS Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12 http://inst.eecs.berkeley.edu/~cs152/sp18
More informationCS433 Homework 2 (Chapter 3)
CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies
More informationOutline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception
Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add
More informationELE 655 Microprocessor System Design
ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationECE473 Computer Architecture and Organization. Pipeline: Control Hazard
Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction
More information5008: Computer Architecture HW#2
5008: Computer Architecture HW#2 1. We will now support for register-memory ALU operations to the classic five-stage RISC pipeline. To offset this increase in complexity, all memory addressing will be
More informationHigh Performance SMIPS Processor
High Performance SMIPS Processor Jonathan Eastep 6.884 Final Project Report May 11, 2005 1 Introduction 1.1 Description This project will focus on producing a high-performance, single-issue, in-order,
More informationCS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.
More informationCPSC 313, 04w Term 2 Midterm Exam 2 Solutions
1. (10 marks) Short answers. CPSC 313, 04w Term 2 Midterm Exam 2 Solutions Date: March 11, 2005; Instructor: Mike Feeley 1a. Give an example of one important CISC feature that is normally not part of a
More information3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?
CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationCS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07
CS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07 Objectives ---------- 1. To introduce the basic concept of CPU speedup 2. To explain how data and branch hazards arise as
More informationCSE 141L Computer Architecture Lab Fall Lecture 3
CSE 141L Computer Architecture Lab Fall 2005 Lecture 3 Pramod V. Argade November 1, 2005 Fall 2005 CSE 141L Course Schedule Lecture # Date Day Lecture Topic Lab Due 1 9/27 Tuesday No Class 2 10/4 Tuesday
More informationComputer Architecture ELEC3441
Computer Architecture ELEC3441 RISC vs CISC Iron Law CPUTime = # of instruction program # of cycle instruction cycle Lecture 5 Pipelining Dr. Hayden Kwok-Hay So Department of Electrical and Electronic
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More informationControl Hazards. Branch Prediction
Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional
More informationCISC 662 Graduate Computer Architecture Lecture 6 - Hazards
CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More information101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned
101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned 102. During a write operation if the required block is not
More informationComputer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović
Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are
More informationWhat is ILP? Instruction Level Parallelism. Where do we find ILP? How do we expose ILP?
What is ILP? Instruction Level Parallelism or Declaration of Independence The characteristic of a program that certain instructions are, and can potentially be. Any mechanism that creates, identifies,
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages