Pipelined RISC-V Processors


Due date: Tuesday November 20th 11:59:59pm EST

Getting started: To create your initial Lab 7 repository, please visit the repository creation page. Once your repository is created, you can clone it into your VM by running:

    git clone git@github.mit.edu:6004-fall18/labs-lab7-{yourmitusername}.git lab7

Turning in the lab: To turn in this lab, commit and push the changes you made to your git repository. After pushing, check the course website to verify that your submission passes all the tests. If you finish the lab in time but forget to push, you will incur the standard late submission penalties.

Check-off meeting: After turning in this lab, you are required to go to the lab for a check-off meeting within 10 days of the lab's due date (i.e., by Fri Nov 30th; this is more days than usual to account for the Thanksgiving holidays). See the course website for lab hours.

In this lab you will implement two pipelined RISC-V processors in Bluespec. For Bluespec-related questions, you may want to check out the Introductory Bluespec User Guide. To pass the lab you must complete all of the exercises and discussion questions and PASS all of the exercises.

Coding guidelines: You should only change the following files: TwoStage.bsv, TwoStagePlus.bsv and ThreeStage.bsv. Modifications to other files will be overwritten during didit grading. Please provide answers to the discussion questions in discussion.txt.

Debugging guidelines: If your processor does not work as expected, please read the Appendix, which describes general debugging strategies and shows how to use an optional pipeline visualization aid.

1 Two-Stage Pipelined Processor

1.1 Fixing the Two-Stage Pipelined Processor

TwoStage.bsv contains an implementation of a functional two-stage pipelined processor that correctly handles control hazards, but it is not properly pipelined. It passes all the fullasmtests for functional correctness, but fails the pipetests, which also check that the cycle counts match those of a pipelined processor. The processor is not properly pipelined because the dofetch and doexecute rules conflict, so they cannot run in the same cycle.

Discussion Question 1 (10 points): Why do rules dofetch and doexecute conflict?

To resolve the rule conflict, you can split the conflicting part of rule doexecute into rule doredirection, such that:

  - Rule doexecute saves the misprediction condition and the redirected PC in two registers.
  - Rule doredirection is executed only if the saved misprediction condition is true, and it updates the pc and epoch registers.
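The split described above can be sketched as a small behavioral model. This is Python rather than Bluespec, and it is only an illustration under stated assumptions: the names ProcState, mispredicted and redirect_pc are hypothetical, the epoch is assumed to be a single toggling bit, and the saved flag is assumed to be cleared once the redirection has been handled. It is not the lab's solution code.

```python
from dataclasses import dataclass


@dataclass
class ProcState:
    pc: int = 0
    epoch: int = 0
    mispredicted: bool = False  # saved misprediction condition (hypothetical name)
    redirect_pc: int = 0        # saved redirected PC (hypothetical name)


def do_fetch(s: ProcState):
    """Fetch at pc, tag the request with the current epoch, predict pc + 4."""
    request = (s.pc, s.epoch)
    s.pc += 4
    return request


def do_execute(s: ProcState, predicted_pc: int, actual_pc: int) -> None:
    """Only save the misprediction; pc and epoch are not touched here."""
    s.mispredicted = predicted_pc != actual_pc
    s.redirect_pc = actual_pc


def do_redirection(s: ProcState) -> bool:
    """Fire only if a misprediction was saved; update pc and epoch."""
    if not s.mispredicted:
        return False
    s.pc = s.redirect_pc
    s.epoch ^= 1            # assumption: one-bit epoch that toggles
    s.mispredicted = False  # assumption: flag is cleared once handled
    return True


s = ProcState()
do_fetch(s)                                   # fetch at 0, pc becomes 4
do_execute(s, predicted_pc=4, actual_pc=100)  # a taken branch to 100
do_redirection(s)
print(s.pc, s.epoch)                          # prints: 100 1
```

Because do_execute no longer writes pc or epoch, the only rule that shares those registers with the fetch logic is the redirection, which fires only after a misprediction.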

The topic of rule splitting is explained in the slides of Lecture 18.

Exercise 1 (20 points): Fix the two-stage pipelined processor by splitting the conflicting part of rule doexecute into rule doredirection. All the processor-related types are defined in ProcTypes.bsv. Build your two-stage pipelined processor by running make TwoStage. You can run a suite of tests on the processor by running ./test.sh, which dumps all the $display messages and PASSED or FAILED on the screen. You can silence the $display messages and only see PASSED or FAILED by running ./test.sh -s or ./test.sh --summary. You should pass both the fullasmtests (option 6) and the pipetests (option 11). If things don't work as expected, please refer to the Appendix (Section 3) for more Bluespec debugging guidance.

1.2 Improving the Two-Stage Pipelined Processor

Now that you have fixed the two-stage pipelined processor, you can further improve it and save a cycle by sending the instruction memory load request in rule doredirection in case of a misprediction. In other words, if there is a misprediction, then doredirection should initiate the instruction fetch and update the program counter, just like dofetch does. If there is no misprediction, dofetch should perform the instruction fetch as before.

For the processor to work correctly, make sure that the guards of dofetch and doredirection are mutually exclusive. If they are not, then whenever rules dofetch and doredirection conflict, the Bluespec compiler will automatically schedule dofetch ahead of doredirection, since dofetch appears before doredirection in the code. Thus, if the two guards are correlated such that doredirection being ready always implies that dofetch is ready, doredirection will never fire. In such a case, Bluespec will print a warning:

    According to the generated schedule, rule doredirection can never fire.
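The starvation effect described above can be illustrated with a toy model. It is a deliberately simplified Python sketch of "the first ready rule in source order wins among conflicting rules", not the Bluespec compiler's actual scheduling algorithm. The helper names first_ready and trace are hypothetical.

```python
def first_ready(rules, state):
    """Among mutually conflicting rules, fire the first ready one in source order."""
    for name, guard in rules:
        if guard(state):
            return name
    return None


def trace(rules, mispredicted_per_cycle):
    """Return which rule fires in each cycle, given the misprediction flag per cycle."""
    return [first_ready(rules, {"mispredicted": m}) for m in mispredicted_per_cycle]


cycles = [False, True, False]

# dofetch is always ready, so doredirection being ready implies dofetch is ready.
overlapping = [
    ("dofetch", lambda s: True),
    ("doredirection", lambda s: s["mispredicted"]),
]

# Mutually exclusive guards: exactly one of the two rules is ready in each cycle.
exclusive = [
    ("dofetch", lambda s: not s["mispredicted"]),
    ("doredirection", lambda s: s["mispredicted"]),
]

print(trace(overlapping, cycles))  # ['dofetch', 'dofetch', 'dofetch']
print(trace(exclusive, cycles))    # ['dofetch', 'doredirection', 'dofetch']
```

With the overlapping guards the redirection never appears in the trace, which is the situation the compiler warning reports. With exclusive guards it fires in exactly the cycle that has a pending misprediction.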
To prevent rule doredirection from being blocked forever in this way, make sure that dofetch and doredirection have mutually exclusive guards.

Exercise 2 (20 points): Copy your working code from TwoStage.bsv to TwoStagePlus.bsv. Improve the two-stage pipelined processor by sending an instruction load request in rule doredirection. Build your two-stage pipelined processor by running make TwoStagePlus. You can run a suite of tests on the processor by running ./test.sh, which dumps all the $display messages and PASSED or FAILED on the screen. You can silence the $display messages and only see PASSED or FAILED by running ./test.sh -s or ./test.sh --summary. You should pass both the fullasmtests (option 6) and the pipetests (option 11). If things don't work as expected, please refer to the Appendix (Section 3) for more Bluespec debugging guidance.

1.3 Synthesizing the Two-Stage Pipeline Processor

Synthesize your processor by running:

    synth TwoStagePlus.bsv mkproctwostageplus -l multisize

Note: synth has been updated for this lab. If you get an error when trying to run synth, close and reopen your terminal. This will automatically pull the latest version.

Discussion Question 2 (10 points): What are the critical-path delay and area (excluding memories) of your two-stage processor? Which stage determines the critical path?

Hint: You can determine which stage is in the critical path by looking at the names of the start- and endpoints in the critical path. These could be either the inputs or outputs of the instruction and data memories, or the inputs or outputs of a register.

2 Three-Stage Pipelined Processor

2.1 Fixing the Three-Stage Pipelined Processor

To improve on the two-stage design, let's implement a three-stage pipelined processor with the following stages:

  - The Fetch stage initiates an instruction memory read request and sets the PC to the predicted next-pc value (PC+4).
  - The Decode stage decodes the fetched instruction and reads its source operands from the register file.
  - The Execute stage executes the instruction, reading or writing the data memory and writing to the register file as needed.

This design is like the one described in the slides of Lecture 18. Unfortunately, since the Decode and Execute stages can execute concurrently, there can be a data hazard in this processor pipeline: the Decode stage can read a stale value from the register file, one that has not yet been updated by an earlier instruction that is still in the Execute stage.

One can resolve this data hazard by tracking all outstanding register file writes in a hardware structure called a Scoreboard, and stalling the Decode stage when the index of one of the source registers is found in the Scoreboard. When an instruction writes to the register file, its entry should be removed from the Scoreboard, and the Decode stage can then proceed.

The Scoreboard has the following interface:

    interface Scoreboard#(numeric type size);
        method Action insert(Maybe#(Bit#(5)) dst);
        method Action remove();
        method Bool search1(Maybe#(Bit#(5)) src1);
        method Bool search2(Maybe#(Bit#(5)) src2);
    endinterface

  - size is the number of outstanding register write indices that the Scoreboard can hold.
  - method insert inserts a destination register index into the Scoreboard. An Invalid dst is treated as a NOP on the register file write. Each Valid or Invalid dst occupies a slot in the Scoreboard, and a search for an Invalid dst will return False.
  - method remove removes the oldest outstanding register write index from the Scoreboard. You also need to remove Invalid dst entries from the Scoreboard to free up space for later instructions.
  - methods search1 and search2 match src register indices against the Valid register indices stored in the Scoreboard, and return True if a match is found. A search for register 0 is always False.

ThreeStage.bsv contains a non-functional three-stage pipelined processor that does not handle hazards correctly. Specifically, the code in ThreeStage.bsv has three issues, discussed in the slides of Lecture 18:

1. Rule dodecode does not have the necessary logic to stall the Decode stage on a data hazard. In rule dodecode, a new instruction inst from imem (Instruction Memory) should not be processed if the previous instruction stalled. This matters because of the request-response interface of imem: once imem.resp() is called, the value it returns is no longer available in imem; subsequent imem.resp() calls return the data for subsequent load requests. Therefore, if dodecode needs to stall (due to a data hazard), it needs to save the fetched instruction in fetchedinst to avoid losing it. Consequently, after a stall, dodecode should use the instruction previously saved in the fetchedinst register instead of calling imem.resp().

2. Rules doexecute and doloadwait do not have the necessary logic to remove the oldest item from the Scoreboard in the Execute stage when an instruction finishes execution. Specifically, these rules should call sb.remove in the following three cases:

  - For an instruction with a Valid dst, sb.remove and rf.wr should be called atomically, which is guaranteed if they are called together in the same rule.
  - For an instruction with an Invalid dst, sb.remove should also be called to make space for later instructions. Otherwise the pipeline would get stuck.
  - For an instruction on the wrong path of execution (i.e., a mispredicted instruction), sb.remove should also be called to make space for later instructions.

3. Finally, rules dofetch and doexecute conflict just like in the two-stage pipelined processor (and this conflict can be fixed in the same way).

Exercise 3 (30 points; passing only fullasmtests earns 20 points): Fix the three issues in ThreeStage.bsv. Build your three-stage pipelined processor by running make ThreeStage. You can run a suite of tests on the processor by running ./test.sh, which dumps all the $display messages and PASSED or FAILED on the screen. You can silence the $display messages and only see PASSED or FAILED by running ./test.sh -s or ./test.sh --summary. You should pass both the fullasmtests (option 6) and the pipetests (option 11). If things don't work as expected, please refer to the Appendix (Section 3) for more Bluespec debugging guidance.

Hint: You can first tackle issues 1 and 2, which is enough to pass fullasmtests. After this, you can apply the same strategy from Section 1.1 or Section 1.2 to solve issue 3 and pass pipetests.

2.2 Synthesizing the Three-Stage Pipeline Processor

Synthesize your processor by running:

    synth ThreeStage.bsv mkprocthreestage -l multisize

Discussion Question 3 (10 points): What are the critical-path delay and area (excluding memories) of your three-stage pipelined processor? Which stage determines the critical path? How does this design compare with your two-stage pipelined processor?
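Returning to the Scoreboard and the Decode stall from Section 2.1, the intended behavior can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions, not the provided Bluespec module: None stands in for an Invalid register index, instructions are reduced to hypothetical (dst, src1, src2) tuples, the Decode class is invented for illustration, and a full Scoreboard raises an exception where hardware would instead block the rule through the method's guard.

```python
from collections import deque
from typing import Optional


class Scoreboard:
    """FIFO of outstanding destination registers; None models an Invalid dst."""

    def __init__(self, size: int):
        self.size = size
        self.slots = deque()

    def insert(self, dst: Optional[int]) -> None:
        if len(self.slots) >= self.size:
            raise RuntimeError("scoreboard full")  # a guard in real hardware
        self.slots.append(dst)  # Valid and Invalid entries both take a slot

    def remove(self) -> None:
        self.slots.popleft()    # always removes the oldest entry

    def _search(self, src: Optional[int]) -> bool:
        if src is None or src == 0:  # Invalid and register 0 never match
            return False
        return src in self.slots

    search1 = _search
    search2 = _search


class Decode:
    """Hypothetical decode stage: each imem response can be taken only once."""

    def __init__(self, sb: Scoreboard, imem_responses):
        self.sb = sb
        self.imem = deque(imem_responses)
        self.fetched_inst = None  # instruction saved across a stall

    def step(self):
        """Return the decoded instruction, or None on a stall or no input."""
        if self.fetched_inst is not None:
            inst = self.fetched_inst    # reuse the saved instruction
        elif self.imem:
            inst = self.imem.popleft()  # consumes the response
        else:
            return None
        dst, src1, src2 = inst
        if self.sb.search1(src1) or self.sb.search2(src2):
            self.fetched_inst = inst    # stall: keep it for the next cycle
            return None
        self.fetched_inst = None
        self.sb.insert(dst)
        return inst


sb = Scoreboard(2)
decode = Decode(sb, [(5, 1, 2), (6, 5, None), (None, 3, 4)])
first = decode.step()    # (5, 1, 2): no hazard, x5 enters the scoreboard
stalled = decode.step()  # None: the next instruction reads x5
sb.remove()              # execute finishes the first instruction
second = decode.step()   # (6, 5, None): the saved instruction proceeds
```

The model makes the reason for fetchedinst visible: the second instruction is popped from the response queue during the stalled cycle, so without the saved copy it would be lost.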

3 Appendix: Debugging Help

If your processor does not work as expected, there are some simple strategies you can follow to debug it. This appendix first discusses a general strategy for debugging Bluespec circuits, then discusses a tool that's specific to pipelined designs.

3.1 General Guidelines

If things don't work as expected, start by adding $display statements to see which rules are being invoked and at which cycles. It helps to be systematic: we recommend that you first add $display("[%d] <rulename>", cycles); at the top of each rule. Many times this is sufficient to understand what's going wrong (e.g., if you forget to enqueue to a FIFO that's read by a rule, you'll see that the rule doesn't fire at all or stops firing). Then, refine by adding more $display statements or more output to each statement.

3.2 Pipeline Visualization with ScheduleMonitor

This lab contains a ScheduleMonitor module that you can optionally use to obtain a visual representation of which pipeline rules are firing each cycle. This module is simulation-only and produces no actual hardware. For a fully pipelined processor with no data or control hazards or load instructions, this module may produce an output similar to the one below:

    fetch decode execute
    F
    FD_

The names at the top are the names of the columns. These correspond to pipeline stages. The rows below correspond to what is happening in each clock cycle. The first row, F, means that in the first clock cycle only fetch fired. The fifth row, W, means that in the fifth clock cycle all 4 stages of the pipeline fired concurrently.

There are four other letters that may appear as output from the ScheduleMonitor integrated with the provided initial code:

  L - An instruction is in the LoadWait state of the execute stage.
  x - An instruction was killed in-place in the specified stage.
  s - An instruction stalled in the decode stage due to a data hazard.
  R - The execute stage fired and redirected the fetch stage due to a mispredicted next pc.

Using ScheduleMonitor

Using ScheduleMonitor is optional and requires adding some code to your processor. The module constructor for ScheduleMonitor (mkschedulemonitor) takes a File object (either stdout, stderr, or an opened text file) and a vector of pipeline stage names. The order of names in this vector determines the order of the columns in the output. The code changes outlined below instantiate a ScheduleMonitor for a 3-stage pipeline that prints to stdout.

    ScheduleMonitor monitor <- mkschedulemonitor(stdout, vec("fetch", "decode", "execute"));

    rule dofetch;
        // do rest of fetch
        monitor.record("fetch", "F");
    endrule

    rule dodecode;
        // do rest of decode
        if (...) // not stalling
            monitor.record("decode", "D");
        else // stalling
            monitor.record("decode", "s");
    endrule

    rule doexecute;
        // do rest of execute
        if (...) // not redirecting
            monitor.record("execute", "E");
        else // killed
            monitor.record("execute", "x");
    endrule

    rule doloadwait;
        // do rest of loadwait
        monitor.record("execute", "L");
    endrule

    rule doredirection;
        // do rest of redirection
        monitor.record("execute", "R");
    endrule

The record method of ScheduleMonitor writes a character in the specified column of the pipeline schedule diagram. Typically the first letter of the pipeline stage is written in the column when the stage fires normally, but the above code uses some other letters to show special conditions.
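To make the row-per-cycle output concrete, here is a small Python model of how record calls could be gathered into one row per clock cycle. It is a sketch under stated assumptions and does not reproduce the lab's ScheduleMonitor implementation: the end_cycle method is hypothetical (the real module presumably closes each row on its own at the clock edge), and printing "_" for a stage that did not fire is an assumption based on the sample output above.

```python
import sys


class ScheduleMonitor:
    """Collects one character per stage per cycle and prints one row per cycle."""

    def __init__(self, out, stages):
        self.out = out
        self.stages = list(stages)  # column order follows this list
        self.current = {}
        self.out.write(" ".join(self.stages) + "\n")

    def record(self, stage: str, char: str) -> None:
        if stage not in self.stages:
            raise KeyError(stage)
        self.current[stage] = char

    def end_cycle(self) -> str:
        """Hypothetical helper: emit the row for this cycle and start a new one."""
        row = "".join(self.current.get(stage, "_") for stage in self.stages)
        self.out.write(row + "\n")
        self.current = {}
        return row


monitor = ScheduleMonitor(sys.stdout, ["fetch", "decode", "execute"])

monitor.record("fetch", "F")
row1 = monitor.end_cycle()  # only fetch fired

monitor.record("fetch", "F")
monitor.record("decode", "s")
row2 = monitor.end_cycle()  # decode stalled on a data hazard

monitor.record("fetch", "F")
monitor.record("decode", "D")
monitor.record("execute", "R")
row3 = monitor.end_cycle()  # execute redirected the fetch stage
```

Reading a column from top to bottom shows what a single stage did over time, which is usually the quickest way to spot a stage that has stopped firing.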


More information

Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 2, 2014 http://csg.csail.mit.edu/6.s195/cdac

More information

Pipeline Architecture RISC

Pipeline Architecture RISC Pipeline Architecture RISC Independent tasks with independent hardware serial No repetitions during the process pipelined Pipelined vs Serial Processing Instruction Machine Cycle Every instruction must

More information

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz CS 61C: Great Ideas in Computer Architecture Lecture 13: Pipelining Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 RISC-V Pipeline Pipeline Control Hazards Structural Data R-type

More information

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties Instruction-Level Parallelism Dynamic Branch Prediction CS448 1 Reducing Branch Penalties Last chapter static schemes Move branch calculation earlier in pipeline Static branch prediction Always taken,

More information

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. Introduction to Pipelining Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L15-1 Performance Measures Two metrics of interest when designing a system: 1. Latency: The delay

More information

SISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version:

SISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version: SISTEMI EMBEDDED Computer Organization Pipelining Federico Baronti Last version: 20160518 Basic Concept of Pipelining Circuit technology and hardware arrangement influence the speed of execution for programs

More information

Pipelining. Parts of these slides are from the support material provided by W. Stallings

Pipelining. Parts of these slides are from the support material provided by W. Stallings Pipelining Raul Queiroz Feitosa Parts of these slides are from the support material provided by W. Stallings Objective To present the Pipelining concept, its limitations and the techniques for performance

More information

High Performance SMIPS Processor. Jonathan Eastep May 8, 2005

High Performance SMIPS Processor. Jonathan Eastep May 8, 2005 High Performance SMIPS Processor Jonathan Eastep May 8, 2005 Objectives: Build a baseline implementation: Single-issue, in-order, 6-stage pipeline Full bypassing ICache: blocking, direct mapped, 16KByte,

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 8: Advanced Out-of-Order Superscalar Designs Part II

CS252 Spring 2017 Graduate Computer Architecture. Lecture 8: Advanced Out-of-Order Superscalar Designs Part II CS252 Spring 2017 Graduate Computer Architecture Lecture 8: Advanced Out-of-Order Superscalar Designs Part II Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time

More information

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

More information

Lecture 7: Static ILP, Branch prediction. Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )

Lecture 7: Static ILP, Branch prediction. Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections ) Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections 2.2-2.6) 1 Predication A branch within a loop can be problematic to schedule Control

More information

Modeling Processors. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Modeling Processors. Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology L07-1 The Plan Non-pipelined processor Two-stage synchronous pipeline Two-stage asynchronous

More information

A More Sophisticated Snooping-Based Multi-Processor

A More Sophisticated Snooping-Based Multi-Processor Lecture 16: A More Sophisticated Snooping-Based Multi-Processor Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2014 Tunes The Projects Handsome Boy Modeling School (So... How

More information

Department of Computer Science. COS 122 Operating Systems. Practical 3. Due: 22:00 PM

Department of Computer Science. COS 122 Operating Systems. Practical 3. Due: 22:00 PM Department of Computer Science COS 122 Operating Systems Practical 3 Due: 2018-09-13 @ 22:00 PM August 30, 2018 PLAGIARISM POLICY UNIVERSITY OF PRETORIA The Department of Computer Science considers plagiarism

More information

Elastic Pipelines: Concurrency Issues

Elastic Pipelines: Concurrency Issues Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology L09-1 Inelastic vs Elastic Pipelines In a Inelastic pipeline: pp typically

More information

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 17: Pipelining Wrapup Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Outline The textbook includes lots of information Focus on

More information

Week 11: Assignment Solutions

Week 11: Assignment Solutions Week 11: Assignment Solutions 1. Consider an instruction pipeline with four stages with the stage delays 5 nsec, 6 nsec, 11 nsec, and 8 nsec respectively. The delay of an inter-stage register stage of

More information

Instr. execution impl. view

Instr. execution impl. view Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction

More information

CS152 Computer Architecture and Engineering SOLUTIONS Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12

CS152 Computer Architecture and Engineering SOLUTIONS Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12 Assigned 2/28/2018 CS152 Computer Architecture and Engineering SOLUTIONS Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12 http://inst.eecs.berkeley.edu/~cs152/sp18

More information

CS433 Homework 2 (Chapter 3)

CS433 Homework 2 (Chapter 3) CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies

More information

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception Outline A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception 1 4 Which stage is the branch decision made? Case 1: 0 M u x 1 Add

More information

ELE 655 Microprocessor System Design

ELE 655 Microprocessor System Design ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1 Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction

More information

5008: Computer Architecture HW#2

5008: Computer Architecture HW#2 5008: Computer Architecture HW#2 1. We will now support for register-memory ALU operations to the classic five-stage RISC pipeline. To offset this increase in complexity, all memory addressing will be

More information

High Performance SMIPS Processor

High Performance SMIPS Processor High Performance SMIPS Processor Jonathan Eastep 6.884 Final Project Report May 11, 2005 1 Introduction 1.1 Description This project will focus on producing a high-performance, single-issue, in-order,

More information

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.

More information

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions 1. (10 marks) Short answers. CPSC 313, 04w Term 2 Midterm Exam 2 Solutions Date: March 11, 2005; Instructor: Mike Feeley 1a. Give an example of one important CISC feature that is normally not part of a

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

CS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07

CS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07 CS311 Lecture: Pipelining, Superscalar, and VLIW Architectures revised 10/18/07 Objectives ---------- 1. To introduce the basic concept of CPU speedup 2. To explain how data and branch hazards arise as

More information

CSE 141L Computer Architecture Lab Fall Lecture 3

CSE 141L Computer Architecture Lab Fall Lecture 3 CSE 141L Computer Architecture Lab Fall 2005 Lecture 3 Pramod V. Argade November 1, 2005 Fall 2005 CSE 141L Course Schedule Lecture # Date Day Lecture Topic Lab Due 1 9/27 Tuesday No Class 2 10/4 Tuesday

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 Computer Architecture ELEC3441 RISC vs CISC Iron Law CPUTime = # of instruction program # of cycle instruction cycle Lecture 5 Pipelining Dr. Hayden Kwok-Hay So Department of Electrical and Electronic

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

Control Hazards. Branch Prediction

Control Hazards. Branch Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards CISC 662 Graduate Computer Architecture Lecture 6 - Hazards Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned

101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned 101. The memory blocks are mapped on to the cache with the help of a) Hash functions b) Vectors c) Mapping functions d) None of the mentioned 102. During a write operation if the required block is not

More information

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović

Computer Architecture and Engineering CS152 Quiz #3 March 22nd, 2012 Professor Krste Asanović Computer Architecture and Engineering CS52 Quiz #3 March 22nd, 202 Professor Krste Asanović Name: This is a closed book, closed notes exam. 80 Minutes 0 Pages Notes: Not all questions are

More information

What is ILP? Instruction Level Parallelism. Where do we find ILP? How do we expose ILP?

What is ILP? Instruction Level Parallelism. Where do we find ILP? How do we expose ILP? What is ILP? Instruction Level Parallelism or Declaration of Independence The characteristic of a program that certain instructions are, and can potentially be. Any mechanism that creates, identifies,

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy

More information

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? RISC remainder (our assumptions) What is Pipelining? Is a key implementation techniques used to make fast CPUs Is an implementation techniques whereby multiple instructions are overlapped in execution It takes advantage of parallelism

More information