Instruction Frequency CPI. Load-store 55% 5. Arithmetic 30% 4. Branch 15% 4
|
|
- Harry Hutchinson
- 5 years ago
- Views:
Transcription
1 PROBLEM 1: An application running on a 1GHz pipelined processor has the following instruction mix: Instruction Frequency CPI Load-store 55% 5 Arithmetic 30% 4 Branch 15% 4 a) Determine the overall CPI of the program. b) An embedded version of the processor that operates at 600 MHz is used to run the same application. In this version, the CPI of branch instruction becomes 6 while the other types CPI remain unchanged. A new compiler is used which eliminates 25% of the load-store instructions as well as 5% of the arithmetic instructions for this application. i. Determine the overall CPI of the program on the embedded processor with the new compiler. ii. Determine the factor by which the application on the embedded processor runs faster/slower. Solution: a) cycle/instruction b) First we calculate the new percentages for each type of instruction: i. Percentage of eliminated load-store from total instructions = Percentage of eliminated arithmetic from total instructions = Percentage of remaining instructions from total instructions = ( ) New percentage of load-store instructions 1
2 = New percentage of arithmetic instructions = New percentage of branch instructions = ii. (i.e the program now is slower) cycle/instr. ( ) PROBLEM 2: 1- Suppose a MIPS processor uses the simple 5-stage pipeline described in the text. Further suppose that: There is a single memory for both instructions and data which can do one read or write each cycle. No forwarding is used. An instruction cannot be fed into the pipeline until the hardware knows the instruction is to be executed certainly (no earlier than the end of the execution stage in case the current instruction is a branch). In the absence of hazards a new instruction can be fed into the pipeline each cycle. For the following MIPS code: lw R1, 0(R2) lw R3, 12(R4) add R5, R1, R3 beq R5, R5, L1 sw R5, 0(R3) L1: sw R5, 12(R4) 2
3 a) Show using a diagram, how many cycles does this code take to complete? b) Show using a diagram, how different hazard solving techniques can be used to decrease the total number of cycles for this program. Solution: a) As shown below, the code will take 15 cycles lw R1,0(R2) IF ID EX M WB lw R3,12(R4) IF ID EX M WB Add R5,R1,R3 IF ID EX M WB beq R5,R5,L1 IF ID EX M WB L1:sw R5,12(R4) IF ID EX M WB b) Using the following hazard solving techniques: Forwarding (to resolve some data hazards) Separate instruction and data memories (to resolve some structural hazards) Branch prediction Assuming branch prediction turns out to be correct, the code will take 11 cycles lw R1,0(R2) IF ID EX M WB lw R3,12(R4) IF ID EX M WB Add R5,R1,R3 IF ID EX M WB beq R5,R5,L1 IF ID EX M WB L1:sw R5,12(R4) IF ID EX M WB PROBLEM 3 3
4 2- A five-stage pipelined processor supports the following instruction types: Instruction Frequency Load 25% Store 15% Integer 30% Floating point 20% Branch 10% Assume the base CPI of the processor is equal to 1. Data hazards for floating point operations cause an average penalty of 0.9 stall cycles, branch instructions have a misprediction penalty of 1 stall cycle, while all other instructions run at maximum possible throughput. For branch instructions, the processor uses the predicted untaken scheme. If branch prediction turns out to be true 80% of the time, calculate the average CPI for this program. Solution: The average CPI = the base CPI + The average number of stalls per instruction = ( ) cycle/instr. 4
5 PROBLEM 4: a) Identify all WAR, WAW and RAW dependencies in the following instruction sequence: LD F2, 16(R6) ADDD F2, F2, F4 DIVD F6, F2, F0 SUBD F0, F2, F10 SD F6, 32(R3) b) Fill in the blank templates for executing this code with and without Tomasulo s Algorithm for this instruction sequence. Assume the following execution times: LW: 2 cycles ADD/SUB: 2 cycles BNEZ: 3 cycles MULT/DIV: 4 cycles For the original FP unit, assume one integer unit, one floating point multiply units, one F.P. add unit, one F.P. divide unit. For Tomasulo s, assume: Three FP ADD units, 2 FP MULT units, 6 load buffers and three store buffers. (Same units as in book example) Assume there is a cache miss causing a stall of 8 cycles on the execution of the 1 st LD. Assume FP adds/subs take 2 cycles, Mults take 10 cycles and Divides take 20 cycles. Assume the store is a cache hit and executes in one cycle. Assume many instructions can read from the register file simultaneously. For the Tomasulo example, recall that only one instruction can drive the CDB at a time. Solution: Without Tomasulo s Algorithm, and the processor is using Forwarding: LD F2,16(R6) IF ID EX MEM1 MEM2 WB ADDD F2,F2,F4 IF ID stall stall EX1 EX2 MEM WB 5
6 DIVD F6,F2,F0 IF stall stall ID stall EX1 EX2 EX3 EX4 MEM WB SUBD F0,F2,F10 stall stall IF stall ID stall stall stall EX1 EX2 MEM WB SD F6,32(R3) stall stall stall IF stall stall stall ID stall EX MEM1 MEM2 WB Notes: We considered we 1 execution unit and 1 memory unit and we had to respect this in order execution and in order completion to solve the stalls exactly as shown in slide 5 of the ILP chapter. With Tomasulo s Algorithm: We will use the same architecture shown in the lecture Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F2 16 R2 Load1 No ADDD F2 F2 F4 Load2 No DIVD F6 F2 F0 Load3 No SUBD F0 F2 F10 SD F6 32 R3 Add1 Add2 Add3 Mult1 Mult2 No NO No NO NO 0 FU Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F2 16 R2 1 2 Load1 Yes 16(R2) ADDD F2 F2 F4 Load2 No DIVD F^ F2 F0 Load3 No SUBD F0 F2 F10 SD F6 32 R3 Add1 Add2 Add3 Mult1 Mult2 No NO No NO NO 1 FU Load1 6
7 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F2 16 R2 1 1 Load1 Yes 16(R2) ADDD F2 F2 F4 2 Load2 No DIVD F^ F2 F0 Load3 No SUBD F0 F2 F10 SD F6 32 R3 Add1 YES ADD F4 Load1 Add2 NO Mult1 NO 2 FU ADD1 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address ADDD F2 F2 F4 2 Load2 No SUBD F0 F2 F10 4 SD F6 32 R3 2 Add1 YES ADD MEM(1) F4 Add2 YES SUBD F10 ADD1 Mult1 YES DIVD F0 ADD1 4 FU ADD2 ADD1 MULT1 Instruction status: Exec Write Instruction j k Issue Comp Result Busy Address LD F2 16 R Load1 Yes 16(R2) ADDD F2 F2 F4 2 Load2 No SUBD F0 F2 F10 SD F6 32 R3 Add1 YES ADD F4 Load1 Add2 NO Mult1 YES DIVD F0 ADD1 3 FU ADD1 MULT1 ADDD F2 F2 F4 2 Load2 No SUBD F0 F2 F Add1 YES ADD MEM(1) F4 Add2 NO SUBD F10 ADD1 Mult1 YES DIVD F0 ADD1 5 FU ADD2 ADD1 MULT1 7
8 ADDD F2 F2 F4 2 6 Load2 No SUBD F0 F2 F Add1 YES ADD MEM(1) F4 Add2 NO SUBD F10 ADD1 Mult1 YES DIVD F0 ADD1 6 FU ADD2 ADD1 MULT1 SUBD F0 F2 F Add2 YES SUBD res1 F10 4 Mult1 YES DIVD res1 F0 7 FU ADD2 res1 MULT1 SUBD F0 F2 F Add2 YES SUBD res1 F10 3 Mult1 YES DIVD res1 F0 8 FU ADD2 (RES) MULT1 SUBD F0 F2 F Add2 NO SUBD res1 F10 3 Mult1 YES DIVD res1 F0 9 FU ADD2 (RES) MULT1 8
9 SUBD F0 F2 F Add1 NO 0 Add2 YES SUBD res1 F10 2 Mult1 YES DIVD res1 F0 10 FU ADD2 res1 MULT1 SUBD F0 F2 F Add2 No 1 Mult1 YES DIVD res1 F0 11 FU res2 res1 MULT1 DIVD F6 F2 F Load3 No SUBD F0 F2 F Add2 No 0 Mult1 YES DIVD res1 F0 12 FU res2 res1 MULT1 DIVD F6 F2 F Load3 No SUBD F0 F2 F Time: 2 Store Yes 32(R3) Res3 0 Add2 No 0 Mult1 No 13 FU res2 res1 Res3 9
10 DIVD F6 F2 F Load3 No SUBD F0 F2 F Time: 1 Store Yes 32(R3) Res3 0 Add2 No 0 Mult1 No 14 FU res2 res1 Res3 DIVD F6 F2 F Load3 No SUBD F0 F2 F Time: 0 Store Yes 32(R3) Res3 0 Add2 No 0 Mult1 No 15 FU res2 res1 Res3 10
11 PROBLEM 5: Consider the following code. (The... marks indicate instructions that are ignored in this example) LOOP1: ADDI R4, R0, #4... LOOP 2: SUBI R4, R4, #1... BNEZ R4, LOOP2... BEQZ R8, LOOP1... a) Focusing on the inner loop (LOOP2) only, analyze the branch behavior. Assume no other instruction changes the value of register R4. What percentage of the time is the BNEZ branch instruction taken and not taken? Consider LOOP2 is taken N times, so it is easy to deduce that the branch will be taken N times in each N+1 iterations, i.e. the loop will be taken N/N+1 and not taken 1/N+1 Consider LOOP2 is taken N times, so it is easy to deduce that the branch will be taken N times in each N+1 iterations, i.e. the loop will be taken N/N+1 and not taken 1/N+1 b) Choose the best static branch prediction scheme for the BNEZ instruction. What percentage of the time will this static branch prediction be correct for LOOP2? Using Branch taken, we will reach N correct iterations out of every N+1 decisions. c) Now consider dynamic branch prediction. Draw the state machine for a one-bit branch predictor. Be sure to clearly identify or define the meaning of each state. For the inner loop (LOOP2), what will be the misprediction rate of the one-bit branch predictor? 11
12 Taken Not Taken Taken Not Taken For 1 bit branch predictor the FSM should look as above, studying LOOP2 only, Iteration N N+1 Prediction Decision Not Taken Taken Taken Taken Taken Taken Taken Taken Final Decision Taken Taken Taken Taken Taken Taken Taken Taken Not So, we would take wrong decision 2 times out of every N+1 times d) Now draw the state diagram for a 2-bit dynamic branch predictor. Again, clearly label all states. What will be the misprediction rate of the 2- bit branch predictor for LOOP2? Iteration N N+1 Prediction Decision Not Not Taken Taken Taken Taken Taken Taken Taken Final Decision Taken Taken Taken Taken Taken Taken Taken Taken Not 12
13 e) Taking both loops in consideration, the state diagram for a 2,2 bit collator type dynamic branch predictor. We will not use 2,2 as it is not described in the lecture, so we will just take the relation between both loop1 and loop2. So, if we consider LOOP2 is executed N times every LOOP 1 Iteration. It is clear that for the 1 st loop iteration prediction will have 3 misses then it will be only 1 miss until the end of loop1 13
CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.
More informationAdvanced Computer Architecture CMSC 611 Homework 3. Due in class Oct 17 th, 2012
Advanced Computer Architecture CMSC 611 Homework 3 Due in class Oct 17 th, 2012 (Show your work to receive partial credit) 1) For the following code snippet list the data dependencies and rewrite the code
More informationELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST *
ELE 818 * ADVANCED COMPUTER ARCHITECTURES * MIDTERM TEST * SAMPLE 1 Section: Simple pipeline for integer operations For all following questions we assume that: a) Pipeline contains 5 stages: IF, ID, EX,
More informationInstruction-Level Parallelism and Its Exploitation
Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic
More informationRecall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring
More informationILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)
Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case
More informationPage 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring
More informationFour Steps of Speculative Tomasulo cycle 0
HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly
More informationCISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3
CISC 662 Graduate Computer Architecture Lecture 10 - ILP 3 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationCPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation
Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Tomasulo
More informationCS433 Homework 2 (Chapter 3)
CS433 Homework 2 (Chapter 3) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies
More informationCS433 Homework 2 (Chapter 3)
CS Homework 2 (Chapter ) Assigned on 9/19/2017 Due in class on 10/5/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration..
More informationCOSC4201 Instruction Level Parallelism Dynamic Scheduling
COSC4201 Instruction Level Parallelism Dynamic Scheduling Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Outline Data dependence and hazards Exposing parallelism
More informationPage 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002
More informationCS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example
CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming John Kubiatowicz Electrical Engineering and Computer Sciences
More informationCPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation
Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction
More informationMetodologie di Progettazione Hardware-Software
Metodologie di Progettazione Hardware-Software Advanced Pipelining and Instruction-Level Paralelism Metodologie di Progettazione Hardware/Software LS Ing. Informatica 1 ILP Instruction-level Parallelism
More informationELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism
ELEC 5200/6200 Computer Architecture and Design Fall 2016 Lecture 9: Instruction Level Parallelism Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University,
More informationAdapted from David Patterson s slides on graduate computer architecture
Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Basic Compiler Techniques for Exposing ILP Advanced Branch Prediction Dynamic Scheduling Hardware-Based Speculation
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationCS433 Homework 3 (Chapter 3)
CS433 Homework 3 (Chapter 3) Assigned on 10/3/2017 Due in class on 10/17/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies
More informationSuperscalar Architectures: Part 2
Superscalar Architectures: Part 2 Dynamic (Out-of-Order) Scheduling Lecture 3.2 August 23 rd, 2017 Jae W. Lee (jaewlee@snu.ac.kr) Computer Science and Engineering Seoul NaMonal University Download this
More informationProcessor: Superscalars Dynamic Scheduling
Processor: Superscalars Dynamic Scheduling Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 (Princeton),
More informationWhat is ILP? Instruction Level Parallelism. Where do we find ILP? How do we expose ILP?
What is ILP? Instruction Level Parallelism or Declaration of Independence The characteristic of a program that certain instructions are, and can potentially be. Any mechanism that creates, identifies,
More informationCPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation
Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction
More informationReview: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction
Review: Evaluating Branch Alternatives Lecture 3: Introduction to Advanced Pipelining Two part solution: Determine branch taken or not sooner, AND Compute taken branch address earlier Pipeline speedup
More informationReduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by:
Reduction of Data Hazards Stalls with Dynamic Scheduling So far we have dealt with data hazards in instruction pipelines by: Result forwarding (register bypassing) to reduce or eliminate stalls needed
More informationExploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville
Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop
More informationLecture 8 Dynamic Branch Prediction, Superscalar and VLIW. Computer Architectures S
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Computer Architectures 521480S Dynamic Branch Prediction Performance = ƒ(accuracy, cost of misprediction) Branch History Table (BHT) is simplest
More informationChapter 3 Instruction-Level Parallelism and its Exploitation (Part 1)
Chapter 3 Instruction-Level Parallelism and its Exploitation (Part 1) ILP vs. Parallel Computers Dynamic Scheduling (Section 3.4, 3.5) Dynamic Branch Prediction (Section 3.3) Hardware Speculation and Precise
More informationHardware-based Speculation
Hardware-based Speculation M. Sonza Reorda Politecnico di Torino Dipartimento di Automatica e Informatica 1 Introduction Hardware-based speculation is a technique for reducing the effects of control dependences
More informationStatic vs. Dynamic Scheduling
Static vs. Dynamic Scheduling Dynamic Scheduling Fast Requires complex hardware More power consumption May result in a slower clock Static Scheduling Done in S/W (compiler) Maybe not as fast Simpler processor
More informationPage # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,
More informationReview: Compiler techniques for parallelism Loop unrolling Ÿ Multiple iterations of loop in software:
CS152 Computer Architecture and Engineering Lecture 17 Dynamic Scheduling: Tomasulo March 20, 2001 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
More informationFor this problem, consider the following architecture specifications: Functional Unit Type Cycles in EX Number of Functional Units
CS333: Computer Architecture Spring 006 Homework 3 Total Points: 49 Points (undergrad), 57 Points (graduate) Due Date: Feb. 8, 006 by 1:30 pm (See course information handout for more details on late submissions)
More informationUpdated Exercises by Diana Franklin
C-82 Appendix C Pipelining: Basic and Intermediate Concepts Updated Exercises by Diana Franklin C.1 [15/15/15/15/25/10/15] Use the following code fragment: Loop: LD R1,0(R2) ;load R1 from address
More informationEECC551 Exam Review 4 questions out of 6 questions
EECC551 Exam Review 4 questions out of 6 questions (Must answer first 2 questions and 2 from remaining 4) Instruction Dependencies and graphs In-order Floating Point/Multicycle Pipelining (quiz 2) Improving
More informationInstruction Level Parallelism
Instruction Level Parallelism The potential overlap among instruction execution is called Instruction Level Parallelism (ILP) since instructions can be executed in parallel. There are mainly two approaches
More informationChapter 3: Instruction Level Parallelism (ILP) and its exploitation. Types of dependences
Chapter 3: Instruction Level Parallelism (ILP) and its exploitation Pipeline CPI = Ideal pipeline CPI + stalls due to hazards invisible to programmer (unlike process level parallelism) ILP: overlap execution
More informationHardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.
Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)
More informationScoreboard information (3 tables) Four stages of scoreboard control
Scoreboard information (3 tables) Instruction : issued, read operands and started execution (dispatched), completed execution or wrote result, Functional unit (assuming non-pipelined units) busy/not busy
More informationMinimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline
Instruction Pipelining Review: MIPS In-Order Single-Issue Integer Pipeline Performance of Pipelines with Stalls Pipeline Hazards Structural hazards Data hazards Minimizing Data hazard Stalls by Forwarding
More informationPage 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer
CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson
More informationWebsite for Students VTU NOTES QUESTION PAPERS NEWS RESULTS
Advanced Computer Architecture- 06CS81 Hardware Based Speculation Tomasulu algorithm and Reorder Buffer Tomasulu idea: 1. Have reservation stations where register renaming is possible 2. Results are directly
More informationLecture 4: Introduction to Advanced Pipelining
Lecture 4: Introduction to Advanced Pipelining Prepared by: Professor David A. Patterson Computer Science 252, Fall 1996 Edited and presented by : Prof. Kurt Keutzer Computer Science 252, Spring 2000 KK
More informationLoad1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1
Instruction Issue Execute Write result L.D F6, 34(R2) L.D F2, 45(R3) MUL.D F0, F2, F4 SUB.D F8, F2, F6 DIV.D F10, F0, F6 ADD.D F6, F8, F2 Name Busy Op Vj Vk Qj Qk A Load1 no Load2 no Add1 Y Sub Reg[F2]
More informationECE 505 Computer Architecture
ECE 505 Computer Architecture Pipelining 2 Berk Sunar and Thomas Eisenbarth Review 5 stages of RISC IF ID EX MEM WB Ideal speedup of pipelining = Pipeline depth (N) Practically Implementation problems
More informationInstruction Level Parallelism. Taken from
Instruction Level Parallelism Taken from http://www.cs.utsa.edu/~dj/cs3853/lecture5.ppt Outline ILP Compiler techniques to increase ILP Loop Unrolling Static Branch Prediction Dynamic Branch Prediction
More informationThe Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.
The Processor Pipeline Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes. Pipeline A Basic MIPS Implementation Memory-reference instructions Load Word (lw) and Store Word (sw) ALU instructions
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationGood luck and have fun!
Midterm Exam October 13, 2014 Name: Problem 1 2 3 4 total Points Exam rules: Time: 90 minutes. Individual test: No team work! Open book, open notes. No electronic devices, except an unprogrammed calculator.
More informationDynamic Scheduling. Better than static scheduling Scoreboarding: Tomasulo algorithm:
LECTURE - 13 Dynamic Scheduling Better than static scheduling Scoreboarding: Used by the CDC 6600 Useful only within basic block WAW and WAR stalls Tomasulo algorithm: Used in IBM 360/91 for the FP unit
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not
More informationComputer Architecture Homework Set # 3 COVER SHEET Please turn in with your own solution
CSCE 6 (Fall 07) Computer Architecture Homework Set # COVER SHEET Please turn in with your own solution Eun Jung Kim Write your answers on the sheets provided. Submit with the COVER SHEET. If you need
More informationLecture-13 (ROB and Multi-threading) CS422-Spring
Lecture-13 (ROB and Multi-threading) CS422-Spring 2018 Biswa@CSE-IITK Cycle 62 (Scoreboard) vs 57 in Tomasulo Instruction status: Read Exec Write Exec Write Instruction j k Issue Oper Comp Result Issue
More informationLecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.SP96 1 Review: Evaluating Branch Alternatives Two part solution: Determine
More informationEE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes
NAME: STUDENT NUMBER: EE557--FALL 1999 MAKE-UP MIDTERM 1 Closed books, closed notes Q1: /1 Q2: /1 Q3: /1 Q4: /1 Q5: /15 Q6: /1 TOTAL: /65 Grade: /25 1 QUESTION 1(Performance evaluation) 1 points We are
More informationCS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.
CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in
More informationDYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD
DYNAMIC INSTRUCTION SCHEDULING WITH SCOREBOARD Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 3, John L. Hennessy and David A. Patterson,
More informationComplications with long instructions. CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. How slow is slow?
Complications with long instructions CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study So far, all MIPS instructions take 5 cycles But haven't talked
More informationSuper Scalar. Kalyan Basu March 21,
Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 3. Instruction-Level Parallelism and Its Exploitation
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation Introduction Pipelining become universal technique in 1985 Overlaps execution of
More informationCENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.
Exam 2 April 12, 2012 You have 80 minutes to complete the exam. Please write your answers clearly and legibly on this exam paper. GRADE: Name. Class ID. 1. (22 pts) Circle the selected answer for T/F and
More informationThe basic structure of a MIPS floating-point unit
Tomasulo s scheme The algorithm based on the idea of reservation station The reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from
More informationILP: Instruction Level Parallelism
ILP: Instruction Level Parallelism Tassadaq Hussain Riphah International University Barcelona Supercomputing Center Universitat Politècnica de Catalunya Introduction Introduction Pipelining become universal
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationThe Evolution of Microprocessors. Per Stenström
The Evolution of Microprocessors Per Stenström Processor (Core) Processor (Core) Processor (Core) L1 Cache L1 Cache L1 Cache L2 Cache Microprocessor Chip Memory Evolution of Microprocessors Multicycle
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2018 Static Instruction Scheduling 1 Techniques to reduce stalls CPI = Ideal CPI + Structural stalls per instruction + RAW stalls per instruction + WAR stalls per
More informationPipelining: Issue instructions in every cycle (CPI 1) Compiler scheduling (static scheduling) reduces impact of dependences
Dynamic Scheduling Pipelining: Issue instructions in every cycle (CPI 1) Compiler scheduling (static scheduling) reduces impact of dependences Increased compiler complexity, especially when attempting
More information吳俊興高雄大學資訊工程學系. October Example to eleminate WAR and WAW by register renaming. Tomasulo Algorithm. A Dynamic Algorithm: Tomasulo s Algorithm
EEF011 Computer Architecture 計算機結構 吳俊興高雄大學資訊工程學系 October 2004 Example to eleminate WAR and WAW by register renaming Original DIV.D ADD.D S.D SUB.D MUL.D F0, F2, F4 F6, F0, F8 F6, 0(R1) F8, F10, F14 F6,
More informationBranch prediction ( 3.3) Dynamic Branch Prediction
prediction ( 3.3) Static branch prediction (built into the architecture) The default is to assume that branches are not taken May have a design which predicts that branches are taken It is reasonable to
More informationCompiler Optimizations. Lecture 7 Overview of Superscalar Techniques. Memory Allocation by Compilers. Compiler Structure. Register allocation
Lecture 7 Overview of Superscalar Techniques CprE 581 Computer Systems Architecture, Fall 2013 Reading: Textbook, Ch. 3 Complexity-Effective Superscalar Processors, PhD Thesis by Subbarao Palacharla, Ch.1
More information4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?
Chapter 4: Assessing and Understanding Performance 1. Define response (execution) time. 2. Define throughput. 3. Describe why using the clock rate of a processor is a bad way to measure performance. Provide
More informationCSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1
CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level
More informationLecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S
Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching
More informationSlide Set 8. for ENCM 501 in Winter Steve Norman, PhD, PEng
Slide Set 8 for ENCM 501 in Winter 2018 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 501 Winter 2018 Slide Set 8 slide
More information5008: Computer Architecture
5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage
More informationHY425 Lecture 05: Branch Prediction
HY425 Lecture 05: Branch Prediction Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS October 19, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 05: Branch Prediction 1 / 45 Exploiting ILP in hardware
More informationAs the amount of ILP to exploit grows, control dependences rapidly become the limiting factor.
Hiroaki Kobayashi // As the amount of ILP to exploit grows, control dependences rapidly become the limiting factor. Branches will arrive up to n times faster in an n-issue processor, and providing an instruction
More informationHardware-based Speculation
Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions
More informationSolutions to exercises on Instruction Level Parallelism
Solutions to exercises on Instruction Level Parallelism J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas Computer Architecture ARCOS Group Computer Science and Engineering
More informationInstruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction
Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches
More informationGetting CPI under 1: Outline
CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more
More informationT T T T T T N T T T T T T T T N T T T T T T T T T N T T T T T T T T T T T N.
A1: Architecture (25 points) Consider these four possible branch predictors: (A) Static backward taken, forward not taken (B) 1-bit saturating counter (C) 2-bit saturating counter (D) Global predictor
More informationCSE 502 Graduate Computer Architecture. Lec 8-10 Instruction Level Parallelism
CSE 502 Graduate Computer Architecture Lec 8-10 Instruction Level Parallelism Larry Wittie Computer Science, StonyBrook University http://www.cs.sunysb.edu/~cse502 and ~lw Slides adapted from David Patterson,
More informationPredict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch
branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)
More informationECE473 Computer Architecture and Organization. Pipeline: Control Hazard
Computer Architecture and Organization Pipeline: Control Hazard Lecturer: Prof. Yifeng Zhu Fall, 2015 Portions of these slides are derived from: Dave Patterson UCB Lec 15.1 Pipelining Outline Introduction
More informationADVANCED COMPUTER ARCHITECTURES: Prof. C. SILVANO Written exam 11 July 2011
ADVANCED COMPUTER ARCHITECTURES: 088949 Prof. C. SILVANO Written exam 11 July 2011 SURNAME NAME ID EMAIL SIGNATURE EX1 (3) EX2 (3) EX3 (3) EX4 (5) EX5 (5) EX6 (4) EX7 (5) EX8 (3+2) TOTAL (33) EXERCISE
More information3.16 Historical Perspective and References
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell 247 Power4 Power5 Power6 Power7 Introduced 2001 2004 2007 2010 Initial clock rate (GHz) 1.3 1.9 4.7 3.6 Transistor count (M) 174 276 790
More informationCISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions
CISC 662 Graduate Computer Architecture Lecture 11 - Hardware Speculation Branch Predictions Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis6627 Powerpoint Lecture Notes from John Hennessy
More informationOutline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar, VLIW. CPE 631 Session 19 Exploiting ILP with SW Approaches
Session xploiting ILP with SW Approaches lectrical and Computer ngineering University of Alabama in Huntsville Outline Review: Basic Pipeline Scheduling and Loop Unrolling Multiple Issue: Superscalar,
More informationCS 614 COMPUTER ARCHITECTURE II FALL 2004
CS 64 COMPUTER ARCHITECTURE II FALL 004 DUE : October, 005 HOMEWORK II READ : - Portions of Chapters 5, 7, 8 and 9 of the Sima book and - Portions of Chapter 3, 4 and Appendix A of the Hennessy book ASSIGNMENT:
More informationCMSC411 Fall 2013 Midterm 2 Solutions
CMSC411 Fall 2013 Midterm 2 Solutions 1. (12 pts) Memory hierarchy a. (6 pts) Suppose we have a virtual memory of size 64 GB, or 2 36 bytes, where pages are 16 KB (2 14 bytes) each, and the machine has
More informationChapter 3 (CONT II) Instructor: Josep Torrellas CS433. Copyright J. Torrellas 1999,2001,2002,2007,
Chapter 3 (CONT II) Instructor: Josep Torrellas CS433 Copyright J. Torrellas 1999,2001,2002,2007, 2013 1 Hardware-Based Speculation (Section 3.6) In multiple issue processors, stalls due to branches would
More informationEE557--FALL 2000 MIDTERM 2. Open books and notes
NAME: Solutions STUDENT NUMBER: EE557--FALL 2000 MIDTERM 2 Open books and notes Time limit: 1hour and 20 minutes MAX. No extension. Q1: /12 Q2: /8 Q3: /9 Q4: /8 Q5: /8 Q6: /5 TOTAL: /50 Grade: /25 1 QUESTION
More informationCMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3. Complications With Long Instructions
CMSC 411 Computer Systems Architecture Lecture 6 Basic Pipelining 3 Long Instructions & MIPS Case Study Complications With Long Instructions So far, all MIPS instructions take 5 cycles But haven't talked
More informationCourse on Advanced Computer Architectures
Surname (Cognome) Name (Nome) POLIMI ID Number Signature (Firma) SOLUTION Politecnico di Milano, July 9, 2018 Course on Advanced Computer Architectures Prof. D. Sciuto, Prof. C. Silvano EX1 EX2 EX3 Q1
More informationCS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25
CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem
More informationMulticycle ALU Operations 2/28/2011. Diversified Pipelines The Path Toward Superscalar Processors. Limitations of Our Simple 5 stage Pipeline
//11 Limitations of Our Simple stage Pipeline Diversified Pipelines The Path Toward Superscalar Processors HPCA, Spring 11 Assumes single cycle EX stage for all instructions This is not feasible for Complex
More information